Myle Ott
@myleott
ML infra @thinkymachines
New York, NY · Joined September 2009
577 Following · 2.8K Followers
74 posts
Myle Ott retweeted
Thinking Machines@thinkymachines·
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When applying it to math reasoning and to an internal chat assistant, we find that on-policy distillation can outperform other approaches at a fraction of the cost. thinkingmachines.ai/blog/on-policy…
61 replies · 403 reposts · 2.8K likes · 1.9M views
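For intuition, here is a minimal pure-Python sketch of the idea (all shapes and values are illustrative, not taken from the post): score tokens that the student itself sampled with a per-token reverse KL against the teacher. Sampling from the student makes it on-policy like RL; grading every token makes the signal dense like SFT.

```python
import math

def logsoftmax(logits):
    # Numerically stable log-softmax over a list of logits.
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def on_policy_distill_loss(student_logits, teacher_logits, sampled_tokens):
    # Per-token reverse-KL estimate on tokens the *student* sampled:
    #   log p_student(token) - log p_teacher(token)
    # averaged over positions. Zero when student and teacher agree.
    total = 0.0
    for s_log, t_log, tok in zip(student_logits, teacher_logits, sampled_tokens):
        s = logsoftmax(s_log)
        t = logsoftmax(t_log)
        total += s[tok] - t[tok]
    return total / len(sampled_tokens)

# Toy example: two positions over a 2-token vocabulary.
loss = on_policy_distill_loss(
    [[2.0, 0.0], [0.0, 1.0]],   # student logits per position
    [[1.0, 1.0], [0.0, 2.0]],   # teacher logits per position
    [0, 1],                     # tokens the student sampled
)
```

The loss is zero when the two models agree on the sampled tokens, which is why this kind of objective gives dense, well-shaped gradients compared with a sparse scalar reward.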
Myle Ott@myleott·
So excited about this! Tinker provides a simple + powerful interface for post-training/RL research. It also manages all the infrastructure so that users can focus on data and environments. Hidden behind that simple interface is a ton of interesting and complex ML systems challenges! In addition to the work of building an efficient RL stack (orchestration, numerics, parallelism, weight transfer, etc.), we also tackled a bunch of new challenges (transparent failure recovery, multi-tenant scheduling, autoscaling, etc.). I had a lot of fun working on early parts of this system and am excited to see what others are able to build with it!
Thinking Machines@thinkymachines

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models! thinkingmachines.ai/tinker

5 replies · 12 reposts · 162 likes · 59.9K views
Myle Ott retweeted
Thinking Machines@thinkymachines·
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
82 replies · 560 reposts · 3.5K likes · 1.4M views
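For context, LoRA freezes the pretrained weight W and trains only a low-rank update, so the effective weight is W + (alpha/r)·B·A with B of shape d×r and A of shape r×k. A dependency-free sketch with illustrative shapes and values:

```python
def matmul(X, Y):
    # Plain nested-list matrix multiply: (rows x inner) @ (inner x cols).
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_apply(W, A, B, alpha, r):
    # Effective weight: W + (alpha / r) * B @ A.
    # Only A (r x k) and B (d x r) are trained; W (d x k) stays frozen,
    # so the trainable parameter count is r*(d + k) instead of d*k.
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Rank-1 update on a 2x2 weight: B is 2x1, A is 1x2, alpha=2, r=1.
W_eff = lora_apply(
    [[1.0, 0.0], [0.0, 1.0]],  # frozen W
    [[0.0, 1.0]],              # A (1x2)
    [[1.0], [0.0]],            # B (2x1)
    2.0, 1,
)
# W_eff == [[1.0, 2.0], [0.0, 1.0]]
```

The rank r controls the capacity/cost trade-off the post studies: for a d×k layer, LoRA trains r·(d+k) parameters instead of d·k.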
Myle Ott retweeted
Thinking Machines@thinkymachines·
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices. We work toward a fundamental understanding of the geometry of neural network optimization. thinkingmachines.ai/blog/modular-m…
111 replies · 444 reposts · 2.9K likes · 1.5M views
Myle Ott retweeted
Woosuk Kwon@woosuk_k·
At Thinking Machines, our work includes collaborating with the broader research community. Today we are excited to share that we are building a vLLM team at @thinkymachines to advance open-source vLLM and serve frontier models. If you are interested, please DM me or @barret_zoph! Here are some example roles / projects:
* Distributed inference engineer to support large-scale models on Blackwell GPUs
* PyTorch & model optimization engineer to support & optimize latest OSS models
* MLSys generalist for various aspects of vLLM
41 replies · 80 reposts · 1.2K likes · 193.3K views
Myle Ott retweeted
Thinking Machines@thinkymachines·
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference.” We believe that science is better when shared. Connectionism will cover topics as varied as our research: from kernel numerics to prompt engineering. Here we'll share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI: it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/defeating…
230 replies · 1.3K reposts · 7.6K likes · 3.4M views
Myle Ott retweeted
Mira Murati@miramurati·
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate.

We're excited that in the next couple months we'll be able to share our first product, which will include a significant open source component and be useful for researchers and startups developing custom models. Soon, we'll also share our best science to help the research community better understand frontier AI systems.

To accelerate our progress, we're happy to confirm that we've raised $2B led by a16z with participation from NVIDIA, Accel, ServiceNow, CISCO, AMD, Jane Street and more who share our mission.

We're always looking for extraordinary talent that learns by doing, turning research into useful things. We believe AI should serve as an extension of individual agency and, in the spirit of freedom, be distributed as widely and equitably as possible. We hope this vision resonates with those who share our commitment to advancing the field. If so, join us. thinkingmachines.paperform.co
640 replies · 676 reposts · 7.7K likes · 2.3M views
Myle Ott retweeted
Mira Murati@miramurati·
I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
- Helping people adapt AI systems to work for their specific needs
- Developing strong foundations to build more capable AI systems
- Fostering a culture of open science that helps the whole field understand and improve these systems
Our goal is simple: advance AI by making it broadly useful and understandable through solid foundations, open science, and practical applications. thinkingmachines.ai
683 replies · 886 reposts · 9.4K likes · 1.1M views
Myle Ott@myleott·
Great work by @groeneyy on Prompt Poet! This tool has revolutionized prompt management at @character_ai, simplifying complex prompts and making prompt design more intuitive, scalable and accessible. Check it out!
Character.AI@character_ai

Thrilled to share that we're open sourcing our innovative approach to prompt design! Discover how Prompt Poet is revolutionizing the way we build AI interactions in our latest blog post: research.character.ai/prompt-design-…

3 replies · 0 reposts · 23 likes · 4.7K views
Myle Ott retweeted
Irwan Bello@IrwanBello·
For example, we wrote our own high-performance distributed transformer implementation and were able to hit 250 TFLOP/s on A100s, or ~80% model FLOPs utilization (MFU). For comparison, MFU is reported at 54% for Megatron-LM and similar for MosaicML.
9 replies · 13 reposts · 239 likes
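The ~80% figure follows directly from the A100's peak: dense BF16 throughput on an A100 is roughly 312 TFLOP/s, so the arithmetic is just achieved throughput over hardware peak:

```python
# Model FLOPs utilization (MFU): achieved model throughput / hardware peak.
# A100 dense BF16 peak is ~312 TFLOP/s (without 2:4 sparsity).
achieved_tflops = 250.0
a100_peak_bf16_tflops = 312.0

mfu = achieved_tflops / a100_peak_bf16_tflops
print(f"MFU = {mfu:.0%}")  # prints "MFU = 80%"
```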
(((ل()(ل() 'yoav))))👾
where do you scrape 680,000 publicly available speech+transcript hours from, on the web? is it all thanks to the various accessibility laws?
12 replies · 0 reposts · 43 likes
Myle Ott retweeted
Lucas Caccia@LucasPCaccia·
🚨 Paper Alert 🚨 We explore and formalize the Anytime Learning at MAcroscale (ALMA) setting, where learners sequentially receive large data dumps over time. What new challenges emerge in ALMA? How can we learn efficiently? We answer this in our new @CoLLAs_Conf paper!
1 reply · 9 reposts · 30 likes
Myle Ott retweeted
PyTorch@PyTorch·
PyTorch 1.11 offers native support for FullyShardedDataParallel training of models with up to 1 trillion parameters. It does this by sharding the model across parallel processors, rather than being limited to a single GPU. pytorch.org/blog/introduci…
4 replies · 42 reposts · 231 likes
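A toy sketch of the sharding idea (simplified pure Python, not the real PyTorch implementation): each rank stores only a 1/world_size slice of the flattened parameters, and the full tensor is rebuilt by an all-gather only when a layer needs it, then freed again.

```python
def shard_flat_params(flat_params, world_size, rank):
    # Each rank keeps only its contiguous 1/world_size slice of the
    # flattened parameters, zero-padded so the length divides evenly.
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padded = flat_params + [0.0] * (shard_len * world_size - len(flat_params))
    return padded[rank * shard_len:(rank + 1) * shard_len]

def all_gather(shards, n_params):
    # Before a layer's forward/backward, ranks all-gather their shards to
    # rebuild the full parameter tensor (padding trimmed off the end).
    full = [x for shard in shards for x in shard]
    return full[:n_params]

# 5 parameters sharded across 2 ranks:
params = [1.0, 2.0, 3.0, 4.0, 5.0]
shards = [shard_flat_params(params, 2, r) for r in range(2)]
# shards == [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
restored = all_gather(shards, len(params))  # == params
```

This is why per-GPU parameter memory scales as roughly 1/world_size, which is what lets model size grow past a single GPU's capacity.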
Myle Ott retweeted
Hugging Face@huggingface·
Few-shot learning beyond English 🌎 XGLM from @MetaAI is now available in Transformers. XGLM is a family of large-scale multilingual autoregressive language models which gives SoTA results on multilingual few-shot learning. Try it now on Spaces 👇 huggingface.co/spaces/valhall…
4 replies · 39 reposts · 183 likes
Myle Ott retweeted
Mikel Artetxe@artetxem·
We are releasing a family of dense and MoE language models, with dense models up to 13B parameters and MoEs up to 1.1T. We find that MoEs are more efficient, but the gap narrows at scale and varies greatly across domains and tasks. Paper: arxiv.org/abs/2112.10684 Models & code: github.com/pytorch/fairse…
4 replies · 25 reposts · 93 likes
Myle Ott retweeted
Xian Li@xl_nlp·
🌍Few-shot learning beyond English🌏 📢 Announcing XGLMs, a series of multilingual autoregressive language models setting a new SoTA on few-shot learning and outperforming English-centric models (e.g. GPT-3). Paper: arxiv.org/abs/2112.10668 Models and code: github.com/pytorch/fairse…
2 replies · 55 reposts · 217 likes