Neil Band
@neilbband
154 posts

PhD student @StanfordAILab @StanfordNLP @Stanford advised by Tatsunori Hashimoto and Tengyu Ma. Prev: @OATML_Oxford @CompSciOxford

Stanford, CA · Joined September 2020
787 Following · 1.2K Followers

Pinned Tweet
Neil Band @neilbband ·
When LLMs are unsure, they either hallucinate or abstain. Ideally, they should clearly express truthful confidence levels. Our #ICML2024 work designs an alignment objective to achieve this notion of linguistic calibration in *long-form generations*. arxiv.org/abs/2404.00474 🧵
10 replies · 44 reposts · 304 likes · 73.2K views
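The pinned paper's goal, linguistic calibration, can be checked with a standard calibration metric. A minimal sketch (the claims data and binning scheme below are hypothetical, not the paper's evaluation): after extracting claims and the model's stated confidences from long-form text, compare stated confidence against empirical accuracy:

```python
# Hypothetical data: each claim extracted from a long-form generation
# carries the model's stated confidence and whether it was actually correct.
claims = [
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.9, "correct": True},
    {"confidence": 0.6, "correct": False},
    {"confidence": 0.6, "correct": True},
    {"confidence": 0.3, "correct": False},
]

def expected_calibration_error(claims, n_bins=5):
    """Bin claims by stated confidence; a calibrated model's mean
    confidence matches its empirical accuracy within each bin."""
    bins = [[] for _ in range(n_bins)]
    for c in claims:
        idx = min(int(c["confidence"] * n_bins), n_bins - 1)
        bins[idx].append(c)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(c["correct"] for c in b) / len(b)
        conf = sum(c["confidence"] for c in b) / len(b)
        ece += (len(b) / len(claims)) * abs(conf - acc)
    return ece
```

A perfectly calibrated set of claims yields an error of zero; the toy data above does not.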
Neil Band reposted
Rosinality @rosinality ·
A synthetic data generation method that, when a model is trained on the generated data, maximizes a chosen differentiable objective. For example, it can produce data that engraves a QR code into the weights of an LM head, or do more conventional things like translating documents to improve target-language loss.
1 reply · 34 reposts · 287 likes · 54.6K views
Neil Band reposted
Tristan Thrush @TristanThrush ·
New paper! Want to precisely optimize synthetic training data to do practical or even wacky things? Dataset Policy Gradients get you there, letting you target any differentiable training or post-training metric. We embedded a QR code in GPT-2’s weights using only training data!
4 replies · 32 reposts · 165 likes · 21K views
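A toy of the idea in these two tweets, optimizing the training data itself so that a model trained on it lands on a target objective. Everything here (scalar model, one synthetic example, finite-difference gradients) is an illustrative assumption, not the paper's Dataset Policy Gradients method:

```python
# Inner loop: a scalar model pred = w * x trained on one synthetic example.
def inner_train(w0, x, y, lr=0.1, steps=20):
    w = w0
    for _ in range(steps):
        grad = 2 * (w * x - y) * x        # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

# Outer objective: how far the *trained* weight lands from the weight we
# want to "engrave" (the QR-code-in-weights trick, in miniature).
def outer_objective(y, w_target=3.0, w0=0.0, x=1.0):
    w = inner_train(w0, x, y)
    return (w - w_target) ** 2

# Optimize the synthetic label y by finite-difference gradient descent
# on the outer objective (a stand-in for differentiating through training).
y, eps, lr = 0.0, 1e-4, 0.5
for _ in range(100):
    g = (outer_objective(y + eps) - outer_objective(y - eps)) / (2 * eps)
    y -= lr * g
```

Training on the optimized example (1.0, y) now drives the weight to roughly the target value 3.0, even though we never set the weight directly.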
Neil Band reposted
Karan Dalal @karansdalal ·
Our new paper, “End-to-End Test-Time Training for Long Context,” is a step towards continual learning in language models. We introduce a new method that blurs the boundary between training and inference: at test time, our model continues learning from the given context using the same next-token prediction objective as in training.

With this end-to-end objective, our model can efficiently compress substantial context into its weights and still use it effectively, unlocking extremely long context windows for complex reasoning and applications in agents and robotics.

Paper: test-time-training.github.io/e2e.pdf
Code: github.com/test-time-trai…
42 replies · 208 reposts · 1.2K likes · 182.8K views
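The test-time-training recipe can be sketched in miniature: keep applying the pretraining objective (next-token prediction) to the prompt's context, compressing it into the model's "weights" before answering. This toy swaps the neural LM for bigram counts, so it only illustrates the control flow, not the paper's architecture:

```python
from collections import defaultdict

class TinyTTTModel:
    def __init__(self):
        # Stand-in for model weights: bigram statistics.
        self.counts = defaultdict(lambda: defaultdict(int))

    def test_time_train(self, context_tokens):
        # Same objective as pretraining: predict each next token from its
        # predecessor. Here "gradient steps" are just count updates.
        for prev, nxt in zip(context_tokens, context_tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict_next(self, token):
        options = self.counts[token]
        return max(options, key=options.get) if options else None

model = TinyTTTModel()
# At test time, the model first "trains" on the given context...
model.test_time_train("the cat sat on the mat".split())
# ...and only then generates, using what it just compressed into its state.
```

The point the tweet makes is that the same objective serves both phases; only the data (the user's context) changes.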
Neil Band reposted
Jon Saad-Falcon @JonSaadFalcon ·
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×). As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 141 reposts · 454 likes · 226.3K views
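The IPW metric itself is a simple ratio, and the tweet's two factors compose multiplicatively. A sketch with made-up wattages and capability scores (only the 3.2× and 1.7× factors come from the thread):

```python
def intelligence_per_watt(capability_score, avg_power_watts):
    """IPW = intelligence delivered per unit of power consumed."""
    return capability_score / avg_power_watts

# Hypothetical local setup two years ago: some capability score at 40 W.
old_ipw = intelligence_per_watt(0.50, 40.0)

# Better models raise capability at fixed power (3.2x); better accelerators
# cut power for the same work (1.7x). The gains multiply.
new_ipw = intelligence_per_watt(0.50 * 3.2, 40.0 / 1.7)

improvement = new_ipw / old_ipw   # = 3.2 * 1.7, about 5.4x
```

This is why the thread can decompose the overall IPW improvement into a model factor and an accelerator factor.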
Neil Band reposted
Suhas Kotha @kothasuhas ·
Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws, becoming 5× more data-efficient and offering better performance with sufficient compute.
10 replies · 84 reposts · 447 likes · 151.8K views
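The claim has a standard functional form: a recipe that lowers the asymptote of the scaling law can lose at small compute yet win once compute is sufficient. A sketch with illustrative constants (not the paper's fitted values):

```python
# Saturating compute scaling law: L(C) = E + A * C^(-alpha),
# where E is the irreducible asymptote the recipe tries to lower.
def loss(C, E, A, alpha):
    return E + A * C ** (-alpha)

baseline = dict(E=2.0, A=10.0, alpha=0.3)
recipe   = dict(E=1.6, A=14.0, alpha=0.3)  # worse constant, better asymptote

low, high = 1e1, 1e9   # compute budgets (arbitrary units)
```

At `low` compute the baseline is ahead; at `high` compute the better asymptote dominates, which is the "better perf w/ sufficient compute" claim.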
Neil Band reposted
Kaiyue Wen @wen_kaiyue ·
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models trained to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup below 0.5B params and only 10% at 1.2B (8× Chinchilla)!
13 replies · 98 reposts · 444 likes · 183.2K views
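A sketch of how an optimizer "speedup" is commonly scored in such comparisons: the ratio of training tokens the tuned baseline needs to match the candidate's final loss. The loss curves below are synthetic stand-ins, not the paper's measurements:

```python
def tokens_to_reach(loss_curve, target_loss):
    """loss_curve: list of (tokens, loss) with loss decreasing in tokens.
    Returns the first token count at which target_loss is reached."""
    for tokens, loss in loss_curve:
        if loss <= target_loss:
            return tokens
    return None

# Hypothetical curves for a tuned AdamW baseline and a candidate optimizer.
adamw     = [(1e9, 3.5), (2e9, 3.2), (4e9, 3.0), (8e9, 2.9)]
candidate = [(1e9, 3.3), (2e9, 3.0), (4e9, 2.9), (8e9, 2.85)]

target = candidate[2][1]                       # loss 2.9, hit at 4e9 tokens
speedup = tokens_to_reach(adamw, target) / 4e9
```

Under-tuning the baseline inflates its token count and hence the reported speedup, which is the trap the tweet warns about.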
Neil Band reposted
Niklas Muennighoff @Muennighoff ·
Can AI solve open problems in math, physics, coding, medical sciences & beyond? We collected unsolved questions (UQ) & tested frontier LLMs. Some solutions passed expert validation…
27 replies · 188 reposts · 554 likes · 85.9K views
Neil Band reposted
CLS @ChengleiSi ·
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.
12 replies · 183 reposts · 635 likes · 152.3K views
Neil Band reposted
Jeff Dean @JeffDean ·
Very cool thread about the CS336 Language Models from Scratch course at Stanford taught by @percyliang et al. Makes me wish I was a student again!
Percy Liang @percyliang

Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:

19 replies · 82 reposts · 964 likes · 111.9K views
Neil Band reposted
Jon Saad-Falcon @JonSaadFalcon ·
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning models like Llama 3.3 70B Instruct! 🧵 (1/N)
11 replies · 60 reposts · 223 likes · 76.8K views
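The verifier-combination idea can be sketched as a weighted vote over candidate answers. The verifiers, weights, and scores below are hypothetical stand-ins, not Weaver's learned combination:

```python
def select_answer(candidates, verifiers, weights):
    """candidates: list of answers; verifiers: scoring functions returning
    a value in [0, 1]; weights: one weight per verifier."""
    def combined(ans):
        return sum(w * v(ans) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# Hypothetical weak verifiers: a reward-model proxy and an LM-judge proxy.
# Each is individually noisy; the weighted combination is what selects well.
reward_model = lambda ans: 0.9 if ans == "42" else 0.4
lm_judge     = lambda ans: 0.7 if ans.isdigit() else 0.2

best = select_answer(
    ["forty-two", "42", "41"],
    [reward_model, lm_judge],
    [0.6, 0.4],
)
```

The generation-verification gap closes when the combined score ranks an already-generated correct answer above the incorrect ones.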
Neil Band reposted
Percy Liang @percyliang ·
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
46 replies · 570 reposts · 4.9K likes · 677.5K views
Neil Band reposted
Ryan Marten @ryanmart3n ·
Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on average over code, science, and math evals. We also release our dataset, OpenThoughts3-1.2M, which is the best open reasoning dataset across all data scales. Full details are in our ✨new paper✨ - below we share the highlights: BTW, it also works on non-Qwen models😉 (1/N)
34 replies · 192 reposts · 929 likes · 200.2K views
Neil Band reposted
Simon Guo @simonguozirui ·
Designed some graphics for Stanford CS336 (Language Modeling from Scratch) by @percyliang @tatsu_hashimoto @marcelroed @neilbband @rckpudi

Covering four assignments 📚 that teach you how to 🧑‍🍳 cook an LLM from scratch:
- Build and Train a Tokenizer 🔤
- Write Triton kernels for Attention ⚡️
- Construct Scaling Laws 📉
- Implement GRPO 🐙
11 replies · 58 reposts · 649 likes · 68.1K views
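The first assignment on that list builds a tokenizer. The core of byte-pair encoding is repeatedly merging the most frequent adjacent pair; a minimal sketch of one merge step (toy input, not the course's reference implementation):

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("abababc")
pair = most_frequent_pair(tokens)      # ('a', 'b') appears 3 times
tokens = merge(tokens, pair, "ab")     # ['ab', 'ab', 'ab', 'c']
```

A full BPE trainer just repeats this loop until the vocabulary reaches its target size, recording each merge in order.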
Neil Band reposted
Zitong Yang @ZitongYang0 ·
Synthetic Continued Pretraining (arxiv.org/pdf/2409.07431) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
Zitong Yang @ZitongYang0

Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K. Synthetic CPT downscales CPT to such small, targeted domains. 📜: arxiv.org/abs/2409.07431 🧵👇

1 reply · 12 reposts · 83 likes · 11.2K views
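The pipeline shape of synthetic CPT can be sketched with stubs: expand a tiny corpus into many synthetic documents about its entities, then continue pretraining on the expanded set. Both functions below are hypothetical stand-ins for the paper's LM-driven steps, not its actual algorithm:

```python
def extract_entities(corpus):
    # Stub: a real pipeline would prompt an LM; here, capitalized words.
    return sorted({w.strip(".,") for w in corpus.split() if w[0].isupper()})

def synthesize(corpus, n_docs):
    """Generate many synthetic documents grounded in a tiny corpus."""
    entities = extract_entities(corpus)
    docs = []
    for i in range(n_docs):
        a = entities[i % len(entities)]
        b = entities[(i + 1) % len(entities)]
        # Stub for: "prompt an LM to discuss how `a` relates to `b`
        # according to the corpus", which yields diverse grounded text.
        docs.append(f"Discussion {i}: how {a} relates to {b}.")
    return docs

corpus = "Alice sent the preprint to Bob before the ICLR deadline."
synthetic = synthesize(corpus, 100)   # far more text than the source
```

Continued pretraining then runs on `synthetic` instead of the under-sized original, which is how a <10K-token corpus can feed a CPT recipe built for billions of tokens.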
Neil Band reposted
Tatsunori Hashimoto @tatsu_hashimoto ·
I think CS336 has one of the best LLM problem sets of any AI/LM class thanks to our incredible TAs (@nelsonfliu,@GabrielPoesia,@marcelroed,@neilbband,@rckpudi). We're making it so you can do it all at home, and it's one of the best ways to learn LLMs deeply.
Stanford NLP Group @stanfordnlp

Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.

10 replies · 59 reposts · 723 likes · 81.8K views
Neil Band reposted
Stanford NLP Group @stanfordnlp ·
Want to learn the engineering details of building state-of-the-art Large Language Models (LLMs)? Not finding much info in @OpenAI’s non-technical reports? @percyliang and @tatsu_hashimoto are here to help with CS336: Language Modeling from Scratch, now rolling out to YouTube.
9 replies · 158 reposts · 1.1K likes · 200.7K views
Neil Band reposted
Etash Guha @etash_guha ·
Turns out, it’s possible to outperform DeepSeek-R1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by selecting quality instructions from diverse sources. 🧵 (1/n)
19 replies · 131 reposts · 466 likes · 89.6K views
Neil Band reposted
Yangjun Ruan @YangjunR ·
New paper on synthetic pretraining! We show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. We call this new paradigm “reasoning to learn”. arxiv.org/abs/2503.18866 Here’s how it works🧵
15 replies · 93 reposts · 489 likes · 51.5K views
Neil Band reposted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr ·
Reasoning to Learn from Latent Thoughts "Motivated by how humans apply deliberate thinking to learn from limited data, we train an LM to infer (or “decompress”) latent thoughts underlying the highly compressed observed data. These synthesized latent thoughts augment the raw observed data during pretraining, improving the LM’s data efficiency. This procedure can be iteratively applied through an EM algorithm and form a model self-improvement loop where increasingly capable LMs synthesize more effective latent thoughts, which in turn train more capable models."
12 replies · 114 reposts · 634 likes · 64.7K views
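The self-improvement loop quoted above alternates inferring latent thoughts (E-step) with training on them (M-step). A purely schematic sketch with stand-in functions rather than real models; only the loop structure reflects the description:

```python
def synthesize_thoughts(capability, data):
    # E-step stand-in: a more capable model infers better latent thoughts
    # "decompressing" the observed data; quality tracks capability.
    return [f"thought(q={capability:.2f}) for {d}" for d in data], capability

def train(data, thought_quality):
    # M-step stand-in: pretraining on data plus better thoughts raises
    # capability, with diminishing returns toward a ceiling of 1.0.
    return min(1.0, 0.5 * (1 + thought_quality))

data = ["doc1", "doc2"]
capability = 0.2
for _ in range(10):  # EM iterations of the self-improvement loop
    thoughts, quality = synthesize_thoughts(capability, data)
    capability = train(data + thoughts, quality)
```

Each round, the more capable model synthesizes better thoughts, which in turn train a more capable model, until the loop saturates.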