nicholas broad

63 posts

@nbroad1881

dogs with bandanas advocate fde at https://t.co/ySm7MceGj3 ideas are likely someone else's but i forgot so i think they're mine

san francisco · Joined June 2020
250 Following · 181 Followers
nicholas broad @nbroad1881
@natolambert Why don't they do a regex replace of "I'm Claude/Gemini/etc." with "I'm Kimi" in the training data?
1
0
1
385
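A minimal sketch of the regex replace suggested in the reply above. The name list, the target name, and the function name are all hypothetical, chosen only for illustration; nothing here reflects how any lab actually cleans its data. It only catches direct "I'm X" / "I am X" statements and would miss indirect identity claims such as mentions of the company that built the model.

```python
import re

# Hypothetical values for illustration; not taken from any real pipeline.
OTHER_NAMES = ["Claude", "Gemini", "ChatGPT"]
TARGET_NAME = "Kimi"

# Match "I'm <name>" or "I am <name>", allowing a straight or curly apostrophe.
IDENTITY_RE = re.compile(
    r"\b(I['’]m|I am)\s+(?:" + "|".join(map(re.escape, OTHER_NAMES)) + r")\b"
)


def rewrite_identity(text: str) -> str:
    """Replace self-identification as another assistant with TARGET_NAME."""
    # Keep the original "I'm" / "I am" wording and swap only the name.
    return IDENTITY_RE.sub(lambda m: f"{m.group(1)} {TARGET_NAME}", text)


print(rewrite_identity("Hi, I'm Claude, an AI assistant."))
# Hi, I'm Kimi, an AI assistant.
```

Mentions of a name outside an "I'm"/"I am" statement are left alone, so third-person text about other models is not rewritten.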
Nathan Lambert @natolambert
These behaviors from Chinese models thinking they’re built by American companies have a very large policy impact. It reinforces the theory that Chinese models are only good because they distill from closed Western models. Distillation from API models definitely helps Chinese models, especially in a compute crunch for training, but cutting off this behavior would not change the nature of the Chinese open ecosystem much at all. In fact, Chinese builders could improve each other’s models even more if forced to distill from them. I don’t expect this to happen: there are just too many strong API models out there, and it makes for a very nice workflow for doing post-training data synthesis. TLDR: maybe model builders should put a bit more effort into identity though (myself included).
Enrico - big-AGI @enricoros

Kimi-K2.5 believes it's an AI assistant named Claude. 🤔 Identity crisis, or training set? 😀

39
29
561
114.9K
Teknium 🪽 @Teknium
@osanseviero Does HF sequence classification support multiple labels with a % score for each, by chance? Asking before I get too far into this and find out later. Ideally I'd like something that can output this kind of response, but for different labels:
[image]
4
0
6
1K
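A rough sketch of what "multiple labels with a % score for each" usually means in practice: apply a sigmoid to each output logit independently, so every label gets its own 0-100% score and the scores need not sum to 100. To my knowledge, the transformers `...ForSequenceClassification` models switch to this setup when the config sets `problem_type="multi_label_classification"`; the code below is plain Python rather than that library, and the label names and logits are made up.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def label_percentages(logits, labels):
    """Score each label independently, as a percentage rounded to 0.1."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    return {name: round(100.0 * sigmoid(z), 1) for name, z in zip(labels, logits)}


# Hypothetical labels and logits, for illustration only.
print(label_percentages([0.0, 2.0, -2.0], ["helpful", "funny", "toxic"]))
# {'helpful': 50.0, 'funny': 88.1, 'toxic': 11.9}
```

A softmax would instead force the labels to compete for a single 100%, which fits single-label classification but not independent per-label scores.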
Teknium 🪽 @Teknium
Anyone have a pipeline that makes training a HF XXXModelForSequenceClassification easy or any colabs/guides?
6
1
36
8.9K
Eugene Yan @eugeneyan
What are some key hyperparameters to tweak when doing supervised fine tuning? Off the top of my head I recall a few papers mentioning learning rate and batch size (chinchilla, qlora)—anything else?
6
0
18
4.8K
Sebastian Raschka @rasbt
What happens if we train LLMs for multiple epochs? The question I asked multiple times in the past finally got answered in this new preprint, "To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis". 1/6
[image]
18
110
681
245.7K
nicholas broad @nbroad1881
BTW, Tower 33 VFX is one of @huggingface's customers, so you'll also get to work with our team 🤗
0
0
1
163
nicholas broad @nbroad1881
Individuals interested in this role can check out the company's current work on their website: tower33vfx.com. All inquiries and applications can be sent to production@tower33vfx.com
1
0
1
233
nicholas broad @nbroad1881
Calling all ML Engineers who want to work on visual effects! Tower 33 VFX studio has a very cool opportunity to use state-of-the-art models and approaches to build out a toolkit for improving the visual effects workflow. Apply here: tower33vfx.com/careers/
1
4
6
2.1K
nicholas broad reposted
Waseem @waseem_s
As part of our commitment to support the AI open source community, we're releasing a new model called Camel 🐪 (inspired by Llama, but 100% open-source)! Find it on @huggingface 🤗, and since everyone loves releasing models behind request forms, we made one too! Access it directly or use the form - your choice! 😂 #CamelModel #opensource Model => huggingface.co/Writer/camel-5… Request Form => huggingface.co/spaces/Writer/… Live Demo => chatcamel.vercel.app Powered by @baseten
8
32
203
135.3K
Richard Kuzma @rskuzma
@nbroad1881 @cerebras @huggingface It's our hardware, so we don't pay, but here's a screenshot from our website of example pricing for 20x tokens per param. We do offer fine-tuning, training on more tokens (e.g., LLaMA), longer MSL, etc
[image]
1
0
5
607
Richard Kuzma @rskuzma
LLMs aren't just for GPUs! @cerebras releases a family of models up to 13B parameters on @huggingface to promote open research into scaling laws and demonstrate the capability of CS-2 hardware. Why do this? 1/4
[image]
5
30
175
30.8K
nicholas broad reposted
Hugging Face @huggingface
Today we are excited to announce a new partnership with @awscloud! 🔥 Together, we will accelerate the availability of open-source machine learning 🤝 Read the post 👉 huggingface.co/blog/aws-partn…
10
152
690
123.2K
Yi Tay @YiTayML
glad that my perf rating this time sounds very much like a very famous architecture in deep learning.
10
1
62
38K
nicholas broad @nbroad1881
I created a video explaining Multiple Negatives Ranking Loss (MNRL) for training sentence embeddings, which can be useful for clustering and semantic search. This is my first time making a video like this, but I'll get better as I get more practice 🤞 youtu.be/b_2v9Hpfnbw
1
0
4
285
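For readers who want the gist of MNRL without the video, here is a minimal pure-Python sketch, not the video's code. It assumes a batch of (anchor, positive) embedding pairs where every other positive in the batch acts as a negative for a given anchor, cosine similarity as the score, and a scale factor of 20 (which I believe matches the sentence-transformers default). The function names and toy vectors are made up, and embeddings are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnrl(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Score the anchor against every positive in the batch.
        scores = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of picking the true pair.
        total += log_z - scores[i]
    return total / len(anchors)


# Toy 2-d "embeddings": aligned pairs give a near-zero loss,
# swapped pairs give a large one.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnrl(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnrl(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Because the negatives come for free from the rest of the batch, larger batches give each anchor more negatives to be ranked against.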