Vaisakh M
962 posts

@m__vaisakh
AI Efficiency Research | Independent Researcher
Kochi, India · Joined May 2018
1K Following · 193 Followers


turns out my place has high carbon monoxide!
fire department brought a whole troop
alarm beeped yesterday and today for a period of time
I did wake up a bit tired and felt a strain all day
so I didn't take any chances and called 911
ongoing situation -- fire department investigating source of CO, which doesn't seem to come from my place
I'll update as we have new findings

big lab burner account humor be like
chinese distillation. chinese? distillation. did you know that the chinese... distill? chinese. the chinese are distilling,
typedfemale @typedfemale
really exciting to see an LLM trained on pre-1930 data - post-2022 is already crowded with qwen, deepseek, and kimi
Vaisakh M retweeted

@difficultyang This is one part. The other part is that corporate can do this without involving the developers/researchers.
Vaisakh M retweeted

“Timing is very important. You need to pick hard problems to solve and be ambitious with them. But you've also got to pick the right time when the world and the context that you're in is the right kind of environment for those ideas to flourish.”
In his official Nobel Prize interview, Demis Hassabis discussed how his aspirations as a young gaming programmer were ahead of their time.
Watch our official interview: bit.ly/41DGkXr


Thinking of making an "ML intuitions bench", which would be MCQs about what happens if you make certain tweaks to transformers or other archs.
I have a bunch of findings that'll probably never make it into a paper, and most of them are pretty surprising to me. If LLMs can predict these accurately, then that's a pretty huge thing for autoresearch.
Vaisakh M retweeted

@andrew_n_carr I vaguely checked earlier and only found this (nowhere near 25T iiuc)
huggingface.co/collections/nv…
Vaisakh M retweeted

@kuchaev @natolambert Time here is the time a human takes to complete a task iirc. This eval only takes into account how reliably a model finishes a task and not the time taken to do it.

@natolambert I am not convinced that hours is a proper metric here. Anything can and will be made faster, so if, hypothetically, Anthropic made Claude faster (better SW, newer/more HW), would that show up as *worse* on this plot?

This'll surely settle the claude vs codex debate.
I'm still team claude.
METR @METR_Evals
We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.
Vaisakh M retweeted

@elliotarledge iirc I saw a mention that the method doesn't work (at all) in pixel space and requires a good latent representation to work.

some visuals of drifting vs diffusion on cifar10. can you tell the difference?




Elliot Arledge @elliotarledge
Giving Opus 4.6 and GPT 5.3 Codex a spare 8xH100 node to verify how huge this REALLY is.