Neil Band
@neilbband
154 posts

PhD student @StanfordAILab @StanfordNLP @Stanford advised by Tatsunori Hashimoto and Tengyu Ma. Prev: @OATML_Oxford @CompSciOxford

Stanford, CA · Joined September 2020
787 Following · 1.2K Followers

Pinned Tweet
Neil Band @neilbband ·
When LLMs are unsure, they either hallucinate or abstain. Ideally, they should clearly express truthful confidence levels. Our #ICML2024 work designs an alignment objective to achieve this notion of linguistic calibration in *long-form generations*. arxiv.org/abs/2404.00474 🧵
10 replies · 44 reposts · 304 likes · 73.2K views
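The pinned paper's goal, linguistic calibration, can be checked with a standard calibration metric. A minimal sketch (the claims data and binning scheme below are hypothetical, not the paper's evaluation): after extracting claims and the model's stated confidences from long-form text, compare stated confidence against empirical accuracy:

```python
# Hypothetical data: each claim extracted from a long-form generation
# carries the model's stated confidence and whether it was actually correct.
claims = [
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.6, "correct": False},
    {"confidence": 0.6, "correct": True},
    {"confidence": 0.3, "correct": False},
]

def expected_calibration_error(claims, n_bins=5):
    """Bin claims by stated confidence; a calibrated model's mean
    confidence matches its empirical accuracy within each bin."""
    bins = [[] for _ in range(n_bins)]
    for c in claims:
        idx = min(int(c["confidence"] * n_bins), n_bins - 1)
        bins[idx].append(c)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(c["correct"] for c in b) / len(b)
        conf = sum(c["confidence"] for c in b) / len(b)
        ece += (len(b) / len(claims)) * abs(conf - acc)
    return ece
```

A perfectly calibrated set of claims yields an error of zero; the toy data above does not.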
Neil Band reposted
Rosinality @rosinality ·
A synthetic data generation method that, when a model is trained on the generated data, maximizes a chosen differentiable objective. For example, it can produce data that engraves a QR code into the weights of an LM head, or do more conventional things like translating documents to improve target-language loss.
1 reply · 34 reposts · 287 likes · 54.6K views
Neil Band reposted
Tristan Thrush @TristanThrush ·
New paper! Want to precisely optimize synthetic training data to do practical or even wacky things? Dataset Policy Gradients get you there, letting you target any differentiable training or post-training metric. We embedded a QR code in GPT-2’s weights using only training data!
4 replies · 32 reposts · 165 likes · 21K views
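A toy of the idea in these two tweets, optimizing the training data itself so that a model trained on it lands on a target objective. Everything here (scalar model, one synthetic example, finite-difference gradients) is an illustrative assumption, not the paper's Dataset Policy Gradients method:

```python
# Inner loop: a scalar model pred = w * x trained on one synthetic example.
def inner_train(w0, x, y, lr=0.1, steps=20):
    w = w0
    for _ in range(steps):
        grad = 2 * (w * x - y) * x        # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

# Outer objective: how far the *trained* weight lands from the weight we
# want to "engrave" (the QR-code-in-weights trick, in miniature).
def outer_objective(y, w_target=3.0, w0=0.0, x=1.0):
    w = inner_train(w0, x, y)
    return (w - w_target) ** 2

# Optimize the synthetic label y by finite-difference gradient descent
# on the outer objective (a stand-in for differentiating through training).
y, eps, lr = 0.0, 1e-4, 0.5
for _ in range(100):
    g = (outer_objective(y + eps) - outer_objective(y - eps)) / (2 * eps)
    y -= lr * g
```

Training on the optimized example (1.0, y) now drives the weight to roughly the target value 3.0, even though we never set the weight directly.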
Neil Band reposted
Karan Dalal @karansdalal ·
Our new paper, “End-to-End Test-Time Training for Long Context,” is a step towards continual learning in language models. We introduce a new method that blurs the boundary between training and inference: at test time, our model continues learning from the given context using the same next-token prediction objective as in training.

With this end-to-end objective, our model can efficiently compress substantial context into its weights and still use it effectively, unlocking extremely long context windows for complex reasoning and applications in agents and robotics.

Paper: test-time-training.github.io/e2e.pdf
Code: github.com/test-time-trai…
42 replies · 208 reposts · 1.2K likes · 182.8K views
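The test-time-training recipe can be sketched in miniature: keep applying the pretraining objective (next-token prediction) to the prompt's context, compressing it into the model's "weights" before answering. This toy swaps the neural LM for bigram counts, so it only illustrates the control flow, not the paper's architecture:

```python
from collections import defaultdict

class TinyTTTModel:
    def __init__(self):
        # Stand-in for model weights: bigram statistics.
        self.counts = defaultdict(lambda: defaultdict(int))

    def test_time_train(self, context_tokens):
        # Same objective as pretraining: predict each next token from its
        # predecessor. Here "gradient steps" are just count updates.
        for prev, nxt in zip(context_tokens, context_tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict_next(self, token):
        options = self.counts[token]
        return max(options, key=options.get) if options else None

model = TinyTTTModel()
# At test time, the model first "trains" on the given context...
model.test_time_train("the cat sat on the mat".split())
# ...and only then generates, using what it just compressed into its state.
```

The point the tweet makes is that the same objective serves both phases; only the data (the user's context) changes.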
Neil Band reposted
Jon Saad-Falcon @JonSaadFalcon ·
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×). As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 141 reposts · 454 likes · 226.3K views
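The IPW metric itself is a simple ratio, and the tweet's two factors compose multiplicatively. A sketch with made-up wattages and capability scores (only the 3.2× and 1.7× factors come from the thread):

```python
def intelligence_per_watt(capability_score, avg_power_watts):
    """IPW = intelligence delivered per unit of power consumed."""
    return capability_score / avg_power_watts

# Hypothetical local setup two years ago: some capability score at 40 W.
old_ipw = intelligence_per_watt(0.50, 40.0)

# Better models raise capability at fixed power (3.2x); better accelerators
# cut power for the same work (1.7x). The gains multiply.
new_ipw = intelligence_per_watt(0.50 * 3.2, 40.0 / 1.7)

improvement = new_ipw / old_ipw   # = 3.2 * 1.7, about 5.4x
```

This is why the thread can decompose the overall IPW improvement into a model factor and an accelerator factor.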
Neil Band reposted
Suhas Kotha @kothasuhas ·
Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws, becoming 5× more data-efficient and offering better performance with sufficient compute.
10 replies · 84 reposts · 447 likes · 151.8K views
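The claim has a standard functional form: a recipe that lowers the asymptote of the scaling law can lose at small compute yet win once compute is sufficient. A sketch with illustrative constants (not the paper's fitted values):

```python
# Saturating compute scaling law: L(C) = E + A * C^(-alpha),
# where E is the irreducible asymptote the recipe tries to lower.
def loss(C, E, A, alpha):
    return E + A * C ** (-alpha)

baseline = dict(E=2.0, A=10.0, alpha=0.3)
recipe   = dict(E=1.6, A=14.0, alpha=0.3)  # worse constant, better asymptote

low, high = 1e1, 1e9   # compute budgets (arbitrary units)
```

At `low` compute the baseline is ahead; at `high` compute the better asymptote dominates, which is the "better perf w/ sufficient compute" claim.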
Neil Band reposted
Kaiyue Wen @wen_kaiyue ·
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models trained to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup below 0.5B params and only 10% at 1.2B (8× Chinchilla)!
13 replies · 98 reposts · 444 likes · 183.2K views
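A sketch of how an optimizer "speedup" is commonly scored in such comparisons: the ratio of training tokens the tuned baseline needs to match the candidate's final loss. The loss curves below are synthetic stand-ins, not the paper's measurements:

```python
def tokens_to_reach(loss_curve, target_loss):
    """loss_curve: list of (tokens, loss) with loss decreasing in tokens.
    Returns the first token count at which target_loss is reached."""
    for tokens, loss in loss_curve:
        if loss <= target_loss:
            return tokens
    return None

# Hypothetical curves for a tuned AdamW baseline and a candidate optimizer.
adamw     = [(1e9, 3.5), (2e9, 3.2), (4e9, 3.0), (8e9, 2.9)]
candidate = [(1e9, 3.3), (2e9, 3.0), (4e9, 2.9), (8e9, 2.85)]

target = candidate[2][1]                       # loss 2.9, hit at 4e9 tokens
speedup = tokens_to_reach(adamw, target) / 4e9
```

Under-tuning the baseline inflates its token count and hence the reported speedup, which is the trap the tweet warns about.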
Neil Band reposted
Niklas Muennighoff @Muennighoff ·
Can AI solve open problems in math, physics, coding, medical sciences & beyond? We collected unsolved questions (UQ) & tested frontier LLMs. Some solutions passed expert validation…
27 replies · 188 reposts · 554 likes · 85.9K views
Neil Band reposted
CLS @ChengleiSi ·
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12 replies · 183 reposts · 635 likes · 152.3K views
Neil Band reposted
Jeff Dean @JeffDean ·
Very cool thread about the CS336 Language Models from Scratch course at Stanford taught by @percyliang et al. Makes me wish I was a student again!
Percy Liang @percyliang

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

19 replies · 82 reposts · 964 likes · 111.9K views
Neil Band reposted
Jon Saad-Falcon @JonSaadFalcon ·
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning models like Llama 3.3 70B Instruct! 🧵 (1/N)
11 replies · 60 reposts · 223 likes · 76.8K views
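The verifier-combination idea can be sketched as a weighted vote over candidate answers. The verifiers, weights, and scores below are hypothetical stand-ins, not Weaver's learned combination:

```python
def select_answer(candidates, verifiers, weights):
    """candidates: list of answers; verifiers: scoring functions returning
    a value in [0, 1]; weights: one weight per verifier."""
    def combined(ans):
        return sum(w * v(ans) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# Hypothetical weak verifiers: a reward-model proxy and an LM-judge proxy.
# Each is individually noisy; the weighted combination is what selects well.
reward_model = lambda ans: 0.9 if ans == "42" else 0.4
lm_judge     = lambda ans: 0.7 if ans.isdigit() else 0.2

best = select_answer(
    ["forty-two", "42", "41"],
    [reward_model, lm_judge],
    [0.6, 0.4],
)
```

The generation-verification gap closes when the combined score ranks an already-generated correct answer above the incorrect ones.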
Neil Band reposted
Percy Liang @percyliang ·
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
46 replies · 570 reposts · 4.9K likes · 677.5K views
Neil Band reposted
Ryan Marten @ryanmart3n ·
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data scales. Full details are in our ✨new paper✨ - below we share the highlights: BTW, it also works on non-Qwen models😉 (1/N)
34 replies · 192 reposts · 929 likes · 200.2K views
Neil Band reposted
Simon Guo @simonguozirui ·
Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by @percyliang @tatsu_hashimoto @marcelroed @neilbband @rckpudi

Covering four assignments 📚 that teach you how to 🧑‍🍳 cook an LLM from scratch:
- Build and Train a Tokenizer 🔤
- Write Triton kernels for Attention ⚡️
- Construct Scaling Laws 📉
- Implement GRPO 🐙
11 replies · 58 reposts · 649 likes · 68.1K views
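The first assignment on that list builds a tokenizer. The core of byte-pair encoding is repeatedly merging the most frequent adjacent pair; a minimal sketch of one merge step (toy input, not the course's reference implementation):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("abababc")
pair = most_frequent_pair(tokens)      # ('a', 'b') appears 3 times
tokens = merge(tokens, pair, "ab")     # ['ab', 'ab', 'ab', 'c']
```

A full BPE trainer just repeats this loop until the vocabulary reaches its target size, recording each merge in order.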
Neil Band reposted
Zitong Yang @ZitongYang0 ·
Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
Zitong Yang @ZitongYang0

Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K. Synthetic CPT downscales CPT to such small, targeted domains. 📜: arxiv.org/abs/2409.07431 🧵👇

1 reply · 12 reposts · 83 likes · 11.2K views
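The pipeline shape of synthetic CPT can be sketched with stubs: expand a tiny corpus into many synthetic documents about its entities, then continue pretraining on the expanded set. Both functions below are hypothetical stand-ins for the paper's LM-driven steps, not its actual algorithm:

```python
def extract_entities(corpus):
    # Stub: a real pipeline would prompt an LM; here, capitalized words.
    return sorted({w.strip(".,") for w in corpus.split() if w[0].isupper()})

def synthesize(corpus, n_docs):
    """Generate many synthetic documents grounded in a tiny corpus."""
    entities = extract_entities(corpus)
    docs = []
    for i in range(n_docs):
        a = entities[i % len(entities)]
        b = entities[(i + 1) % len(entities)]
        # Stub for: "prompt an LM to discuss how `a` relates to `b`
        # according to the corpus", which yields diverse grounded text.
        docs.append(f"Discussion {i}: how {a} relates to {b}.")
    return docs

corpus = "Alice sent the preprint to Bob before the ICLR deadline."
synthetic = synthesize(corpus, 100)   # far more text than the source
```

Continued pretraining then runs on `synthetic` instead of the under-sized original, which is how a <10K-token corpus can feed a CPT recipe built for billions of tokens.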
Neil Band reposted
Tatsunori Hashimoto @tatsu_hashimoto ·
I think CS336 has one of the best LLM problem sets of any AI/LM class thanks to our incredible TAs (@nelsonfliu,@GabrielPoesia,@marcelroed,@neilbband,@rckpudi). We're making it so you can do it all at home, and it's one of the best ways to learn LLMs deeply.
Stanford NLP Group @stanfordnlp

Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.

10 replies · 59 reposts · 723 likes · 81.8K views
Neil Band reposted
Stanford NLP Group @stanfordnlp ·
Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.
9 replies · 158 reposts · 1.1K likes · 200.7K views
Neil Band reposted
Etash Guha @etash_guha ·
Turns out, it’s possible to outperform DeepSeek-R1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
19 replies · 131 reposts · 466 likes · 89.6K views
Neil Band reposted
Yangjun Ruan @YangjunR ·
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”. arxiv.org/abs/2503.18866 Here’s how it works🧵
15 replies · 93 reposts · 489 likes · 51.5K views
Neil Band reposted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr ·
Reasoning to Learn from Latent Thoughts "Motivated by how humans apply deliberate thinking to learn from limited data, we train an LM to infer (or “decompress”) latent thoughts underlying the highly compressed observed data. These synthesized latent thoughts augment the raw observed data during pretraining, improving the LM’s data efficiency. This procedure can be iteratively applied through an EM algorithm and form a model self-improvement loop where increasingly capable LMs synthesize more effective latent thoughts, which in turn train more capable models."
12 replies · 114 reposts · 634 likes · 64.7K views
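The self-improvement loop quoted above alternates inferring latent thoughts (E-step) with training on them (M-step). A purely schematic sketch with stand-in functions rather than real models; only the loop structure reflects the description:

```python
def synthesize_thoughts(capability, data):
    # E-step stand-in: a more capable model infers better latent thoughts
    # "decompressing" the observed data; quality tracks capability.
    return [f"thought(q={capability:.2f}) for {d}" for d in data], capability

def train(data, thought_quality):
    # M-step stand-in: pretraining on data plus better thoughts raises
    # capability, with diminishing returns toward a ceiling of 1.0.
    return min(1.0, 0.5 * (1 + thought_quality))

data = ["doc1", "doc2"]
capability = 0.2
for _ in range(10):  # EM iterations of the self-improvement loop
    thoughts, quality = synthesize_thoughts(capability, data)
    capability = train(data + thoughts, quality)
```

Each round, the more capable model synthesizes better thoughts, which in turn train a more capable model, until the loop saturates.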