nicholas broad

63 posts

@nbroad1881

dogs with bandanas advocate · fde at https://t.co/ySm7MceGj3 · ideas are likely someone else's but i forgot so i think they're mine

san francisco · Joined June 2020

250 Following · 181 Followers

nicholas broad@nbroad1881·27 Jan

@natolambert Why don't they do a regex replace for "I'm Claude/Gemini/etc." and replacing it with "I'm Kimi" in the training data?
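A minimal sketch of the kind of regex replace suggested here, assuming plain-text training samples. The name list, the target name constant, and the helper `rewrite_identity` are hypothetical illustrations, not anything a model builder is known to use.

```python
import re

# Hypothetical list of assistant names to rewrite (an assumption for illustration).
OTHER_NAMES = ["Claude", "Gemini", "ChatGPT"]
TARGET_NAME = "Kimi"

# Matches "I'm Claude", "I am Gemini", etc. Only straight apostrophes are handled.
IDENTITY_PATTERN = re.compile(
    r"\b(I'm|I am)\s+(" + "|".join(map(re.escape, OTHER_NAMES)) + r")\b"
)

def rewrite_identity(text: str) -> str:
    """Swap self-introductions as another assistant for TARGET_NAME."""
    return IDENTITY_PATTERN.sub(lambda m: f"{m.group(1)} {TARGET_NAME}", text)

sample = "Hello! I'm Claude, an AI assistant."
print(rewrite_identity(sample))
```

A pattern like this only catches literal self-introductions. Indirect identity statements (for example, naming the company that built the model) would pass through unchanged.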

Nathan Lambert@natolambert·27 Jan

These behaviors, where Chinese models think they’re built by American companies, have a very large policy impact. They reinforce the theory that Chinese models are only good because they distill from closed Western models. Distillation from API models definitely helps Chinese models, especially in a compute crunch for training, but cutting off this behavior would not change the nature of the Chinese open ecosystem much at all. In fact, Chinese builders could improve each other’s models even more if forced to distill from them. I don’t expect this to happen; there are just too many strong API models out there, and they make for a very nice workflow for post-training data synthesis. TL;DR: maybe model builders should put a bit more effort into identity though (myself included).

Enrico - big-AGI@enricoros

Kimi-K2.5 believes it's an AI assistant named Claude. 🤔 Identity crisis, or training set? 😀

nicholas broad retweeted

Raja Biswas@raja_biswas·4 Feb

Really happy to share that my team (w/ @nbroad1881 & @UdbhavBamba) won the @kaggle LLM - Detect AI Generated Text competition. The objective was to detect whether an essay was written by a student or an LLM. Solution: kaggle.com/competitions/l… Repo: github.com/rbiswasfc/llm-…

nicholas broad@nbroad1881·23 Oct

@teknium @osanseviero set problem_type="multi_label_classification". See this notebook for an example: github.com/NielsRogge/Tra…
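To illustrate what that setting implies: as far as I know, `problem_type="multi_label_classification"` makes the Hugging Face sequence classification heads train with a binary cross-entropy-with-logits loss against float multi-hot labels, so each label gets its own independent percentage. The plain-Python sketch below mimics that behavior and is not the library's code; the label names and logits are made up.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def multi_label_probs(logits):
    """Independent per-label probabilities (they need not sum to 1)."""
    return [sigmoid(z) for z in logits]

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Hypothetical label names and model outputs for a single example.
labels = ["toxic", "spam", "off_topic"]
logits = [2.0, -1.0, 0.0]
for name, p in zip(labels, multi_label_probs(logits)):
    print(f"{name}: {p:.0%}")

# Training target is a float multi-hot vector, one entry per label.
print(bce_with_logits(logits, [1.0, 0.0, 0.0]))
```

At inference time, applying a sigmoid to each logit (rather than a softmax across labels) is what yields the per-label percentage scales asked about in the thread.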

Teknium 🪽@Teknium·19 Oct

@osanseviero Does hf sequence classification support multiple labels with % scales for each by chance, before I get too far into this and find out later - I would ideally like something that can output this kind of response, but for different labels:

Teknium 🪽@Teknium·19 Oct

Anyone have a pipeline that makes training a HF XXXModelForSequenceClassification easy or any colabs/guides?

nicholas broad@nbroad1881·2 Aug

@alyssamvance Donut could work: huggingface.co/docs/transform…

nicholas broad@nbroad1881·8 Jul

@eugeneyan github.com/google-researc…

Eugene Yan@eugeneyan·8 Jul

What are some key hyperparameters to tweak when doing supervised fine tuning? Off the top of my head I recall a few papers mentioning learning rate and batch size (chinchilla, qlora)—anything else?

nicholas broad@nbroad1881·31 May

@rskuzma @rasbt I don't see a result with 64k x

Richard Kuzma@rskuzma·31 May

@nbroad1881 @rasbt ^ this result with 4x repetition instead of 64, 256x seems more realistic, no?

Sebastian Raschka@rasbt·30 May

What happens if we train LLMs for multiple epochs? The question I asked multiple times in the past finally got answered in this new preprint, "To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis". 1/6

nicholas broad@nbroad1881·5 May

BTW, Tower 33 VFX is one of @huggingface's customers, so you'll also get to work with our team 🤗

nicholas broad@nbroad1881·5 May

Individuals interested in this role can check out the company's current work on their website - tower33vfx.com. All inquiries and applications can be sent to production@tower33vfx.com

nicholas broad@nbroad1881·5 May

Calling all ML Engineers who want to work on visual effects! Tower 33 VFX studio has a very cool opportunity to use state-of-the-art models and approaches to build out a toolkit for improving the visual effects workflow. Apply here: tower33vfx.com/careers/

nicholas broad retweeted

Waseem@waseem_s·18 Apr

As part of our commitment to support the AI open source community, we're releasing a new model called Camel 🐪 (inspired by Llama, but 100% open-source)! Find it on @huggingface 🤗, and since everyone loves releasing models behind request forms, we made one too! Access it directly or use the form - your choice! 😂 #CamelModel #opensource Model => huggingface.co/Writer/camel-5… Request Form => huggingface.co/spaces/Writer/… Live Demo => chatcamel.vercel.app Powered by @baseten

nicholas broad@nbroad1881·30 Mar

@rskuzma @cerebras @huggingface Is cost linear with tokens? If I wanted GPT-J on 1T tokens, would I multiply $45*1000/120?
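The arithmetic in the question can be worked out as below, under the assumption the question itself is testing: that cost scales linearly with token count. The helper `linear_cost` is hypothetical. The 120 is presumably GPT-J's roughly 6B parameters times 20 tokens per parameter, in billions of tokens, and 1T tokens is 1000 in the same unit. The unit of the $45 figure is not stated in the thread and is left as-is.

```python
def linear_cost(base_cost: float, base_tokens: float, target_tokens: float) -> float:
    """Scale a known cost to a new token budget, assuming cost is linear in tokens."""
    return base_cost * target_tokens / base_tokens

# Token counts in billions (assumption): 120B baseline, 1000B (1T) target.
estimate = linear_cost(base_cost=45, base_tokens=120, target_tokens=1000)
print(estimate)  # 375.0, in the same unit as the $45 baseline
```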

Richard Kuzma@rskuzma·30 Mar

@nbroad1881 @cerebras @huggingface It's our hardware, so we don't pay, but here's a screenshot from our website of example pricing for 20x tokens per param. We do offer fine-tuning, training on more tokens (e.g., LLaMA), longer MSL, etc

Richard Kuzma@rskuzma·28 Mar

LLMs aren't just for GPUs! @cerebras releases a family of models up to 13B parameters on @huggingface to promote open research into scaling laws and demonstrate the capability of CS-2 hardware. Why do this? 1/4

nicholas broad retweeted

Hugging Face@huggingface·21 Feb

Today we are excited to announce a new partnership with @awscloud! 🔥 Together, we will accelerate the availability of open-source machine learning 🤝 Read the post 👉 huggingface.co/blog/aws-partn…

nicholas broad@nbroad1881·13 Feb

@vad13irt hardest category

nicholas broad@nbroad1881·11 Feb

@YiTayML xlnet 😉

Yi Tay@YiTayML·9 Feb

glad that my perf rating this time sounds very much like a very famous architecture in deep learning.

nicholas broad@nbroad1881·7 Feb

Shout-out to @Nils_Reimers for creating sentence-transformers. I used his awesome library as a reference. github.com/UKPLab/sentenc…

nicholas broad@nbroad1881·7 Feb

The notebook also uses #Accelerate to utilize the 2x T4 GPUs offered on #Kaggle! github.com/huggingface/ac… @GuggerSylvain, @TheZachMueller, let me know what you think!

nicholas broad@nbroad1881·7 Feb

I created a video explaining Multiple Negatives Ranking Loss (MNRL) for training sentence embeddings, which can be useful for clustering/semantic search purposes. This is the 1st time making a video like this, but I'll get better as I get more practice 🤞 youtu.be/b_2v9Hpfnbw
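For readers who want the idea behind MNRL without watching the video, here is a rough plain-Python sketch, not the sentence-transformers implementation. It assumes a batch of (anchor, positive) embedding pairs, treats every other positive in the batch as a negative, and applies cross-entropy over scaled cosine similarities. The scale of 20 and the toy vectors are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnrl(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, p) for p in positives]
        top = max(scores)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)

# Toy 2-D embeddings (made up): each anchor is close to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnrl(anchors, positives))        # matched pairs: small loss
print(mnrl(anchors, positives[::-1]))  # mismatched pairs: larger loss
```

Because the negatives come for free from the rest of the batch, larger batches give each anchor more negatives to be contrasted against.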
