emily mcmilin
155 posts
Software agents can self-improve via self-play RL. Introducing Self-play SWE-RL (SSR): training a single LLM agent to self-play between bug-injection and bug-repair, grounded in real-world repositories, with no human-labeled issues or tests. 🧵
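The injector/repairer loop described above can be sketched in a few lines. Everything here is a toy stand-in, not the SSR implementation: the "repository" is one function, the roles are hard-coded edits rather than LLM policies, and the reward is a single built-in test instead of the repo's real test suite.

```python
# Minimal sketch of a self-play bug-injection / bug-repair episode.
# All names and logic are illustrative assumptions, not the SSR code.

CORRECT_SOURCE = "def add(a, b):\n    return a + b\n"  # toy "repository"

def inject_bug(source: str) -> str:
    """Injector role: corrupt the code (here, flip the operator)."""
    return source.replace("a + b", "a - b")

def repair_bug(source: str) -> str:
    """Repairer role: restore correct behavior (here, flip it back)."""
    return source.replace("a - b", "a + b")

def reward(source: str) -> float:
    """Verifiable reward: run a test instead of asking a human labeler."""
    env: dict = {}
    exec(source, env)
    return 1.0 if env["add"](2, 3) == 5 else 0.0

def self_play_episode() -> float:
    buggy = inject_bug(CORRECT_SOURCE)
    assert reward(buggy) == 0.0   # injection succeeded: the test now fails
    repaired = repair_bug(buggy)
    return reward(repaired)       # repairer is rewarded when tests pass again
```

The key property the sketch preserves: both roles are graded by the same executable check, so neither side needs human-written issues or labels.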

🗣️📣Announcing VerifAI 2: AI Verification in the Wild, an upcoming workshop at #ICLR2026!! 🗣️📣 VerifAI will gather researchers to explore topics at the intersection of genAI and trustworthy ML. Submit your work! Check out our website and CFP for more: verifai-workshop.github.io

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
(🧵) Today, we release Meta Code World Model (CWM), a 32-billion-parameter dense LLM that enables novel research on improving code generation through agentic reasoning and planning with world models. ai.meta.com/research/publi…

Goated FAIR team just found out how coding agents sometimes "cheat" on SWE-Bench Verified. It's really simple: for example, Qwen3 literally greps the commit logs for the number of the issue it needs to fix. lol, clever model. "Cheat" in quotes cuz it's more like env hacking.