Myle Ott
@myleott
ML infra @thinkymachines
New York, NY · Joined September 2009
577 Following · 2.8K Followers
74 posts
Myle Ott retweeted
Thinking Machines@thinkymachines·
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When applying it to math reasoning and to an internal chat assistant, we find that on-policy distillation can outperform other approaches at a fraction of the cost. thinkingmachines.ai/blog/on-policy…
61 replies · 403 reposts · 2.8K likes · 1.9M views
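For intuition, here is a minimal pure-Python sketch of the idea (all shapes and values are illustrative, not taken from the post): score tokens that the student itself sampled with a per-token reverse KL against the teacher. Sampling from the student makes it on-policy like RL; grading every token makes the signal dense like SFT.

```python
import math

def logsoftmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def on_policy_distill_loss(student_logits, teacher_logits, sampled_tokens):
    # Per-token reverse-KL estimate on tokens the *student* sampled:
    #   log p_student(token) - log p_teacher(token)
    # averaged over positions. Zero when student and teacher agree.
    total = 0.0
    for s_log, t_log, tok in zip(student_logits, teacher_logits, sampled_tokens):
        s = logsoftmax(s_log)
        t = logsoftmax(t_log)
        total += s[tok] - t[tok]
    return total / len(sampled_tokens)

# Toy example: two positions over a 2-token vocabulary.
loss = on_policy_distill_loss(
    [[2.0, 0.0], [0.0, 1.0]],   # student logits per position
    [[1.0, 1.0], [0.0, 2.0]],   # teacher logits per position
    [0, 1],                     # tokens the student sampled
)
```

The loss is zero when the two models agree on the sampled tokens, which is why this kind of objective gives dense, well-shaped gradients compared with a sparse scalar reward.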
Myle Ott@myleott·
So excited about this! Tinker provides a simple + powerful interface for post-training/RL research. It also manages all the infrastructure so that users can focus on data and environments. Hidden behind that simple interface is a ton of interesting and complex ML systems challenges! In addition to the work of building an efficient RL stack (orchestration, numerics, parallelism, weight transfer, etc.), we also tackled a bunch of new challenges (transparent failure recovery, multi-tenant scheduling, autoscaling, etc.). I had a lot of fun working on early parts of this system and am excited to see what others are able to build with it!
Thinking Machines@thinkymachines

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker

5 replies · 12 reposts · 162 likes · 59.9K views
Myle Ott retweeted
Thinking Machines@thinkymachines·
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
82 replies · 560 reposts · 3.5K likes · 1.4M views
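For context, LoRA freezes the pretrained weight W and trains only a low-rank update, so the effective weight is W + (alpha/r)·B·A with B of shape d×r and A of shape r×k. A dependency-free sketch with illustrative shapes and values:

```python
def matmul(X, Y):
    # Plain nested-list matrix multiply: (rows x inner) @ (inner x cols).
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_apply(W, A, B, alpha, r):
    # Effective weight: W + (alpha / r) * B @ A.
    # Only A (r x k) and B (d x r) are trained; W (d x k) stays frozen,
    # so the trainable parameter count is r*(d + k) instead of d*k.
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Rank-1 update on a 2x2 weight: B is 2x1, A is 1x2, alpha=2, r=1.
W_eff = lora_apply(
    [[1.0, 0.0], [0.0, 1.0]],  # frozen W
    [[0.0, 1.0]],              # A (1x2)
    [[1.0], [0.0]],            # B (2x1)
    2.0, 1,
)
# W_eff == [[1.0, 2.0], [0.0, 1.0]]
```

The rank r controls the capacity/cost trade-off the post studies: for a d×k layer, LoRA trains r·(d+k) parameters instead of d·k.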
Myle Ott retweeted
Thinking Machines@thinkymachines·
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. We work toward a fundamental understanding of the geometry of neural network optimization. thinkingmachines.ai/blog/modular-m…
111 replies · 444 reposts · 2.9K likes · 1.5M views
Myle Ott retweeted
Woosuk Kwon@woosuk_k·
At Thinking Machines, our work includes collaborating with the broader research community. Today we are excited to share that we are building a vLLM team at @thinkymachines to advance open-source vLLM and serve frontier models. If you are interested, please DM me or @barret_zoph! Here are some example roles / projects:
* Distributed inference engineer to support large-scale models on Blackwell GPUs
* PyTorch & model optimization engineer to support & optimize latest OSS models
* MLSys generalist for various aspects of vLLM
41 replies · 80 reposts · 1.2K likes · 193.3K views
Myle Ott retweeted
Thinking Machines@thinkymachines·
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research: from kernel numerics to prompt engineering. Here we'll share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI: it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…
230 replies · 1.3K reposts · 7.6K likes · 3.4M views
Myle Ott retweeted
Mira Murati@miramurati·
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate.

We're excited that in the next couple months we'll be able to share our first product, which will include a significant open source component and be useful for researchers and startups developing custom models. Soon, we'll also share our best science to help the research community better understand frontier AI systems.

To accelerate our progress, we're happy to confirm that we've raised $2B led by a16z with participation from NVIDIA, Accel, ServiceNow, CISCO, AMD, Jane Street and more who share our mission.

We're always looking for extraordinary talent that learns by doing, turning research into useful things. We believe AI should serve as an extension of individual agency and, in the spirit of freedom, be distributed as widely and equitably as possible. We hope this vision resonates with those who share our commitment to advancing the field. If so, join us. thinkingmachines.paperform.co
640 replies · 676 reposts · 7.7K likes · 2.3M views
Myle Ott retweeted
Mira Murati@miramurati·
I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
- Helping people adapt AI systems to work for their specific needs
- Developing strong foundations to build more capable AI systems
- Fostering a culture of open science that helps the whole field understand and improve these systems
Our goal is simple: advance AI by making it broadly useful and understandable through solid foundations, open science, and practical applications. thinkingmachines.ai
683 replies · 886 reposts · 9.4K likes · 1.1M views
Myle Ott@myleott·
Great work by @groeneyy on Prompt Poet! This tool has revolutionized prompt management at @character_ai, simplifying complex prompts and making prompt design more intuitive, scalable and accessible. Check it out!
Character.AI@character_ai

Thrilled to share that we're open sourcing our innovative approach to prompt design! Discover how Prompt Poet is revolutionizing the way we build AI interactions in our latest blog post: research.character.ai/prompt-design-…

3 replies · 0 reposts · 23 likes · 4.7K views
Myle Ott retweeted
Irwan Bello@IrwanBello·
For example, we wrote our own high-performance distributed transformer implementation and were able to hit 250 TFLOP/s on A100s, or ~80% model FLOPs utilization (MFU). For comparison, MFU is reported at 54% for Megatron-LM and similar for MosaicML.
9 replies · 13 reposts · 239 likes
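The ~80% figure follows directly from the A100's peak: dense BF16 throughput on an A100 is roughly 312 TFLOP/s, so the arithmetic is just achieved throughput over hardware peak:

```python
# Model FLOPs utilization (MFU): achieved model throughput / hardware peak.
# A100 dense BF16 peak is ~312 TFLOP/s (without 2:4 sparsity).
achieved_tflops = 250.0
a100_peak_bf16_tflops = 312.0

mfu = achieved_tflops / a100_peak_bf16_tflops
print(f"MFU = {mfu:.0%}")  # prints "MFU = 80%"
```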
(((ل()(ل() 'yoav))))👾
where do you scrape 680,000 publicly available speech+transcript hours from, on the web? is it all thanks to the various accessibility laws?
12 replies · 0 reposts · 43 likes
Myle Ott retweeted
Lucas Caccia@LucasPCaccia·
🚨 Paper Alert 🚨 We explore and formalize the Anytime Learning at MAcroscale (ALMA) setting, where learners sequentially receive large data dumps over time. What new challenges emerge in ALMA? How can we learn efficiently? We answer this in our new @CoLLAs_Conf paper!
1 reply · 9 reposts · 30 likes
Myle Ott retweeted
PyTorch@PyTorch·
PyTorch 1.11 offers native support for FullyShardedDataParallel training of models with up to 1 trillion parameters. It does this by sharding the model across parallel processors, rather than being limited to a single GPU. pytorch.org/blog/introduci…
4 replies · 42 reposts · 231 likes
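A toy sketch of the sharding idea (simplified pure Python, not the real PyTorch implementation): each rank stores only a 1/world_size slice of the flattened parameters, and the full tensor is rebuilt by an all-gather only when a layer needs it, then freed again.

```python
def shard_flat_params(flat_params, world_size, rank):
    # Each rank keeps only its contiguous 1/world_size slice of the
    # flattened parameters, zero-padded so the length divides evenly.
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padded = flat_params + [0.0] * (shard_len * world_size - len(flat_params))
    return padded[rank * shard_len:(rank + 1) * shard_len]

def all_gather(shards, n_params):
    # Before a layer's forward/backward, ranks all-gather their shards to
    # rebuild the full parameter tensor (padding trimmed off the end).
    full = [x for shard in shards for x in shard]
    return full[:n_params]

# 5 parameters sharded across 2 ranks:
params = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = [shard_flat_params(params, 2, r) for r in range(2)]
# shards == [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
restored = all_gather(shards, len(params))  # == params
```

This is why per-GPU parameter memory scales as roughly 1/world_size, which is what lets model size grow past a single GPU's capacity.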
Myle Ott retweeted
Hugging Face@huggingface·
Few-shot learning beyond English 🌎 XGLM from @MetaAI is now available in Transformers. XGLM is a family of large-scale multilingual autoregressive language models which gives SoTA results on multilingual few-shot learning. Try it now on Spaces 👇 huggingface.co/spaces/valhall…
4 replies · 39 reposts · 183 likes
Myle Ott retweeted
Mikel Artetxe@artetxem·
We are releasing a family of dense and MoE language models, with dense models up to 13B parameters and MoEs up to 1.1T. We find that MoEs are more efficient, but the gap narrows at scale and varies greatly across domains and tasks. Paper: arxiv.org/abs/2112.10684 Models & code: github.com/pytorch/fairse…
4 replies · 25 reposts · 93 likes
Myle Ott retweeted
Xian Li@xl_nlp·
🌍Few-shot learning beyond English🌏 📢 Announcing XGLMs, a series of multilingual autoregressive language models setting a new SoTA on few-shot learning and outperforming English-centric models (e.g. GPT-3). Paper: arxiv.org/abs/2112.10668 Models and code: github.com/pytorch/fairse…
2 replies · 55 reposts · 217 likes