Motoki Wu ๐ŸŒŠ๐Ÿฆˆ

1.4K posts

@plusepsilon

Resisting the urge to finetune @ Salient. ex-@cresta and other doodlings.

Oakland, CA · Joined October 2013
3.3K Following · 829 Followers
Pinned Tweet
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
All this compute power for language models, but most of our knowledge is in screenshots.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
Coding agents are so good now that you should be using them for everything.
Text classification? Coding agent.
RAG? Coding agent.
Prompt optimization? Coding agent.
Recipe for next week's Christmas potluck? Coding agent.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
If RL needs pre-training-level scaling, are we admitting that RL learns really slowly...
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
@Thom_Wolf My favorite was ฯ„-bench (arxiv.org/abs/2406.12045). It evaluates on states, so the trajectory of the conversation and the agent's implementation matter less, and there's no need to rely on LLM evals. It strongly influenced how I structure agent workflows.
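The state-based idea from ฯ„-bench can be illustrated with a tiny sketch (the function and field names here are mine, not the benchmark's): instead of asking an LLM judge to grade the transcript, compare the final environment state to the expected state after the episode.

```python
def state_match(final_state: dict, expected_state: dict) -> bool:
    """Pass iff every expected key/value appears in the final state.
    Extra keys (logs, intermediate artifacts) are ignored, so the agent's
    conversational path to the goal doesn't affect the score."""
    return all(final_state.get(k) == v for k, v in expected_state.items())

# e.g. a hypothetical booking episode: however the conversation went, the
# reservation record must end up in this state for the run to pass.
expected = {"flight": "SFO->JFK", "seats": 2, "status": "confirmed"}
final = {"flight": "SFO->JFK", "seats": 2, "status": "confirmed", "log_id": 17}
state_match(final, expected)  # True
```

Because the check reads only the end state, it stays deterministic and cheap, which is the appeal over transcript-grading evals.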
Thomas Wolf@Thom_Wolfยท
What was the most impactful/visible/useful release on evaluation in AI in 2024?
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
There's no reason to FOMO over a new model when you can wait one month and FOMO over a better model :)
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
Some Bad:
* you pretty much have to restart PyCharm if something goes wrong
* itโ€™s the only thing that makes the fans spin up on my M1
* not Cursor ๐Ÿ˜›
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
The Good:
* strong autocomplete
* codebase understanding with jump-to-definition
* jump to previous cursor position
* interactive tables
* stable enough for remote notebooks

Essentially, having all the IDE goodies in a notebook keeps me in the flow.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
After 10(?) years, I think I finally found a better alternative to just running plain Jupyter notebooks: PyCharm's notebook integration. I've been using it off and on for a while, but it's reached a point where I can wholeheartedly recommend it.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
@abacaj I've found loss masking to work well, but it does lower the number of tokens the model trains on, so you may have to compensate for that (more epochs / larger batch size).
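For context, the loss masking being discussed can be sketched like this (a plain-Python toy assuming Alpaca-style -100 labels; real pipelines apply the same mask to tensors before the loss):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_instruction_labels(input_ids, instruction_len):
    """Build labels from input_ids, masking the instruction prefix so
    cross-entropy is computed only on the response tokens. The model still
    conditions on the instruction; it just isn't trained to predict it."""
    return [IGNORE_INDEX] * instruction_len + list(input_ids[instruction_len:])

# toy sequence: 4 instruction tokens followed by 3 response tokens
ids = [101, 7592, 2088, 102, 999, 888, 777]
labels = mask_instruction_labels(ids, instruction_len=4)
# labels == [-100, -100, -100, -100, 999, 888, 777]
```

This is also why masking shrinks the effective token count: only the unmasked response positions contribute gradient signal.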
anton@abacajยท
Couple of questions that I haven't seen answered yet:
1. Why did the Stanford Alpaca team set instruction labels to -100, so the model conditions on them while training only on the outputs?
2. Why do I see a better model (including proper validation loss) training on the full sample, including instructions?
anton@abacajยท
Why would SFT show increasing validation loss and still an "improving" model? Two runs, same hparams; the only difference is training on "inputs".

In the original Stanford Alpaca, they set the instruction labels to -100 so the model conditions on them and the loss covers only the outputs. But removing this line shows a steady validation loss (blue line) and a model that performs better in my benchmarks... What gives?
[attached image: validation loss chart for the two runs]