Motoki Wu ๐ŸŒŠ๐Ÿฆˆ

1.4K posts

@plusepsilon

Resisting the urge to finetune @ Salient. ex-@cresta and other doodlings.

Oakland, CA · Joined October 2013
3.3K Following · 829 Followers
Pinned Tweet
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
All this compute power for language models, but most of our knowledge is in screenshots.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
Coding agents are so good now that you should be using them for everything.
Text classification? Coding agent.
RAG? Coding agent.
Prompt optimization? Coding agent.
Recipe for next week's Christmas potluck? Coding agent.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
If RL needs pre-training-level scaling, are we admitting that RL learns really slowly...
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
@Thom_Wolf My favorite was ฯ„-bench (arxiv.org/abs/2406.12045). It evaluates on states, so the trajectory of the conversation and the agent's implementation matter less, and there's no need to rely on LLM evals. It strongly influenced how I structure agent workflows.
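The state-based idea from ฯ„-bench can be illustrated with a tiny sketch (the function and field names here are mine, not the benchmark's): instead of asking an LLM judge to grade the transcript, compare the final environment state to the expected state after the episode.

```python
def state_match(final_state: dict, expected_state: dict) -> bool:
    """Pass iff every expected key/value appears in the final state.
    Extra keys (logs, intermediate artifacts) are ignored, so the agent's
    conversational path to the goal doesn't affect the score."""
    return all(final_state.get(k) == v for k, v in expected_state.items())

# e.g. a hypothetical booking episode: however the conversation went, the
# reservation record must end up in this state for the run to pass.
expected = {"flight": "SFO->JFK", "seats": 2, "status": "confirmed"}
final = {"flight": "SFO->JFK", "seats": 2, "status": "confirmed", "log_id": 17}
state_match(final, expected)  # True
```

Because the check reads only the end state, it stays deterministic and cheap, which is the appeal over transcript-grading evals.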
Thomas Wolf@Thom_Wolfยท
What was the most impactful/visible/useful release on evaluation in AI in 2024?
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
There's no reason to FOMO over a new model when you can wait one month and FOMO over a better model :)
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
Some Bad:
* you pretty much have to restart PyCharm if something goes wrong
* itโ€™s the only thing that makes the fans spin up on my M1
* not Cursor ๐Ÿ˜›
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
The Good:
* strong autocomplete
* codebase understanding with jump-to-definition
* jump to previous cursor position
* interactive tables
* stable enough for remote notebooks

Essentially, having all the IDE goodies in a notebook keeps me in the flow.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
After 10(?) years, I think I finally found a better alternative to just running plain Jupyter notebooks: PyCharm's notebook integration. I've been using it off and on for a while, but it's reached a point where I can wholeheartedly recommend it.
Motoki Wu ๐ŸŒŠ๐Ÿฆˆ@plusepsilonยท
@abacaj I've found loss masking to work well, but it does lower the number of tokens the model trains on, so you may have to compensate for that (more epochs / larger batch size).
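For context, the loss masking being discussed can be sketched like this (a plain-Python toy assuming Alpaca-style -100 labels; real pipelines apply the same mask to tensors before the loss):

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_instruction_labels(input_ids, instruction_len):
    """Build labels from input_ids, masking the instruction prefix so
    cross-entropy is computed only on the response tokens. The model still
    conditions on the instruction; it just isn't trained to predict it."""
    return [IGNORE_INDEX] * instruction_len + list(input_ids[instruction_len:])

# toy sequence: 4 instruction tokens followed by 3 response tokens
ids = [101, 7592, 2088, 102, 999, 888, 777]
labels = mask_instruction_labels(ids, instruction_len=4)
# labels == [-100, -100, -100, -100, 999, 888, 777]
```

This is also why masking shrinks the effective token count: only the unmasked response positions contribute gradient signal.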
anton@abacajยท
Couple of questions that I haven't seen answered yet:
1. Why did the Stanford Alpaca team set instruction labels to -100, so the model conditions on them while training only on the outputs?
2. Why do I see a better model (including proper validation loss) training on the full sample, including instructions?
anton@abacajยท
Why would SFT show increasing validation loss and still an "improving" model? Two runs, same hparams; the only difference is training on "inputs".

In the original Stanford Alpaca, they set the instruction labels to -100 so the model conditions on them and the loss covers only the outputs. But removing this line shows a steady validation loss (blue line) and a model that performs better in my benchmarks... What gives?
[attached image: validation loss chart for the two runs]