mufeez

523 posts

@moofeez

tinkering with ML systems and agents infra @linear, fmr @meta. @uwaterloo SE alum

nyc/toronto · Joined December 2015

458 Following · 1.7K Followers

Pinned Tweet

mufeez@moofeez·28 Apr

I post-trained Qwen3-Coder to fix bugs using an actual debugger.

The result:
Solve rate: 70% → 89%
Median turns to fix: 46 → 19 (-59%)

Instead of just reading code or print-debugging, it:
- reasons from execution
- inspects live variables and call stacks
- sets breakpoints, steps, and evaluates expressions

119

1.6K

119.6K

mufeez@moofeez·14s

ok london kinda mogs nyc ngl

mufeez@moofeez·1m

@_lychrel @bilaltwovec @_arohan_ walked in for dinner around 9

mufeez@moofeez·1m

@_lychrel @bilaltwovec @_arohan_ ambassador clubhouse is their sister restaurant a few blocks away, might offend someone but it’s similar food and vibes

rohan anil@_arohan_·1h

How do I buy spv on dishoom and hoppers?

2.4K

mufeez@moofeez·5d

@swyxio yikes

121

mufeez@moofeez·5 May

@ily_8750 lol @bilaltwovec

405

Ily@ily_8750·4 May

Canary Wharf will always be my favourite part of London

1.1K

167.9K

mufeez@moofeez·5 May

@tenderizzation inshallah the scaling laws hold

GIF

335

tender (mlsys 5/18-21)@tenderizzation·5 May

257

9.2K

mufeez@moofeez·4 May

tuning into the gdb testimony like it's the super bowl

209

mufeez@moofeez·4 May

@willccbb @linear you should check out linear.app/diffs

1.2K

will brown@willccbb·4 May

let's just let @linear do it

148

15.3K

mufeez@moofeez·3 May

@arnie_hacker @chesterzelaya @OpenAI nice! excited to see the results

Arnie Ramesh@arnie_hacker·3 May

@chesterzelaya @OpenAI guessing you mean my cs2 project? compute+cloud service quotas💀 i'm rendering out 1K-hours cleaned dataset atm and moving to training baselines this week though :)

114

Arnie Ramesh@arnie_hacker·2 May

Okay so the @OpenAI image-gen team have definitely rendered out a bajillion cs:go games

mufeez@moofeez·3 May

@__morse @mingjie hmm

222

Tommy D. Rossi@__morse·2 May

GitHub if Linear designed it

125

30K

mufeez@moofeez·30 Apr

@jorilallo

GIF

181

Jori Lallo@jorilallo·30 Apr

Don't tell anyone (to enable, reconnect GitHub integration with new permissions). Agents which output draft PRs also get code diffs inside Linear now

Daniel Kumlin@daniellkumlin

@linear just silently drops the fact you can review prs now directly. you also have access to the linear agent when you review. I have been waiting so long for this finally! you can even change your theme etc.

9.6K

mufeez@moofeez·30 Apr

@bilaltwovec caffeinate -di -w ftw

bilal@bilaltwovec·30 Apr

sudo pmset disablesleep 1 ftw

“paula”@paularambles

locking your screen when you leave your desk is so 2024

511

mufeez@moofeez·29 Apr

@tenderizzation ootl, why did they go for an old school dense model?

584

tender (mlsys 5/18-21)@tenderizzation·29 Apr

highlighting the lowest score is crazy

Matej Sirovatka@m_sirovatka

sometimes I wonder if being poor in sf is worth over being rich in europe, thx to mistral for helping me decide

176

16.9K

mufeez@moofeez·29 Apr

@gilescope imo DAP would be a good fit, since most people are using debuggers within editors: microsoft.github.io/debug-adapter-… the harness could use that instead of a CLI tool; it’s designed to be pretty extensible
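
For readers unfamiliar with DAP (the Debug Adapter Protocol), here is a minimal sketch of what a harness would put on the wire, assuming it talks to a debug adapter over stdio or a socket. DAP frames each JSON message with a `Content-Length` header, and `setBreakpoints` is one of the protocol's standard requests. The helper names (`encode_dap_message`, `decode_dap_message`, `set_breakpoints_request`) and the file path are hypothetical and chosen for illustration only; this is not the harness from the thread.

```python
import json


def encode_dap_message(payload: dict) -> bytes:
    """Frame a DAP message: a Content-Length header, a blank line, then the JSON body."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_dap_message(raw: bytes) -> dict:
    """Parse one framed message (assumes Content-Length is the only header field)."""
    header, _, rest = raw.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1].strip())
    return json.loads(rest[:length].decode("utf-8"))


def set_breakpoints_request(seq: int, path: str, lines: list[int]) -> dict:
    """Build a DAP `setBreakpoints` request for one source file."""
    return {
        "seq": seq,
        "type": "request",
        "command": "setBreakpoints",
        "arguments": {
            "source": {"path": path},
            "breakpoints": [{"line": n} for n in lines],
        },
    }


if __name__ == "__main__":
    # Hypothetical file path, used only to show the message shape.
    request = set_breakpoints_request(1, "/tmp/example.py", [10, 42])
    print(encode_dap_message(request).decode("utf-8"))
```

An agent tool built this way would expose a handful of such requests (for example `setBreakpoints`, `stackTrace`, `evaluate`) as tool calls and return the adapter's JSON responses to the model.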

135

Giles 🇺🇦 Slava Ukraini!@gilescope·29 Apr

@moofeez Just wondering what the ideal agentic debugger interface would be… is there an MCP for a language server? Or is there a more efficient mechanism…

149

mufeez@moofeez·29 Apr

@bartolomeo_diaz @MaximeRivest nvim!

402

BartolomeoDiaz@bartolomeo_diaz·29 Apr

@moofeez @MaximeRivest What is the terminal editor/ide u using?

422

mufeez@moofeez·29 Apr

@JedidiahMain i’ll cover this in my blog post – a lot of the project was taking proven approaches and adapting them to my specific training objective/shape

218

Daniel Antonio@JedidiahMain·29 Apr

@moofeez coming from a web SE, what do i need to know to be able to do this?

243

mufeez@moofeez·29 Apr

@GrapotteM @PrimeIntellect thanks! x.com/moofeez/status…

mufeez@moofeez

@FardeemM A few hundred dollars across synthetic data, evals, and training. imo worth it!

Mathys@GrapotteM·29 Apr

@moofeez @PrimeIntellect Fantastic work! How much did it cost you ?

mufeez@moofeez·29 Apr

@EliasLumer @willccbb I explored the standard harness + tool call approach, though there’s definitely room for experimentation here

154

Elias Lumer@EliasLumer·29 Apr

@moofeez @willccbb Interesting. And by diff variations, I'm asking how you actually gave an LLM a debugger, like how did you expose it to the LLM

165

mufeez@moofeez·29 Apr

@EliasLumer @willccbb will publish when i’m done the blog post!

1.1K

Elias Lumer@EliasLumer·29 Apr

@moofeez @willccbb This is great, do you have the implementation (tools for debugger, etc) open source? Would love to see/implement this

1.1K

mufeez@moofeez·29 Apr

great questions, I did run evals on Claude models towards the beginning of the project. the failure mode I observed was that the models would start a debug session but fail to use it effectively (shallow/incomplete debugger use), even on harder bugs

not sure what you mean by “diff variations of giving the LLM a debugger”

1.3K

Elias Lumer@EliasLumer·29 Apr

@moofeez @willccbb Did you give opus 4.6/ gpt5.5 a debugger? And see results there? Also did u try diff variations of giving the LLM a debugger?

1.4K
