Hiranmay Darshane

5.3K posts

@hdarshane

deep learning and large language models. football (banter) fan. 18.

Mumbai, India. Joined October 2019

1.2K Following · 683 Followers

Pinned Tweet

Hiranmay Darshane @hdarshane · Feb 27

In Sept 2024, o1 surprised many purists who thought inference-time scaling for LLMs was through MCTS. What if a connection exists, just implicit? What does it imply? New post: "Squint enough and RLing CoT reasoners is approximable as Monte Carlo Tree Search policy learning." 🧵

Hiranmay Darshane @hdarshane · 1h

keeps getting worse so unserious

Chase Brower @ChaseBrowe32432

the core benchmark results were reported 0-shot, with either a 16k or 32k token limit (it's not in their code and they told me different numbers), with a prompt that explicitly forced the model not to use explanations or comments and to immediately output final code (if you know anything about brainfuck, it's completely inscrutable and this makes the task functionally impossible)

Hiranmay Darshane @hdarshane · 1h

our guy acting all tuff equating "stack I had 0 exp in" with writing brainfuck also did you start pushing PRs "zero-shot" i.e. without any actual learning of the stack you had 0 exp with, not even reading docs or using a compiler for feedback? you might be omniscient my friend

François Chollet @fchollet

You won't convince me that approaching a new programming language and working with it zero-shot is insurmountable. At my first job I had to work with a stack I had zero experience in (aside from Python) and I was shipping PRs in my first week. I had <1000 hours of programming experience in total by then.

Hiranmay Darshane @hdarshane · 1h

we need to leave these guys behind in 2026

Hiranmay Darshane @hdarshane · 1h

"Humans solve novel problems without being told how to proceed step by step" Chollet will soon say things like CoT is a harness, ICL is cheating too, so the only fair evaluation is a non-reasoning model zero-shotting bench questions just cringe

Hiranmay Darshane @hdarshane · 2h

cringe take

François Chollet @fchollet

The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point, they did not "crush" the task

Hiranmay Darshane @hdarshane · 22h

It is good that a product company that is deeply dependent on model capabilities, operating in an incredibly competitive environment (with model providers as rivals too), can depend on an OSS base model... Bigger proof of OSS~=frontier than anything else...

Hiranmay Darshane @hdarshane · 22h

how do IDEs like Antigravity, etc. have UI/UX allowing queuing messages but chat interfaces don't lol

Hiranmay Darshane @hdarshane · 22h

@leothecurious @paradigmainc pagerank-like credit assignment methods for scientific theories in cheap generation, costly verification regimes might be real as hell

davinci @leothecurious · 23h

easy, u just look at the public @paradigmainc flywheel graph and observe a disproportionate number of nodes branching out from a highly productive node containing some deep insight or unifying concept.

Dwarkesh Patel @dwarkesh_sp

If AI scientists are writing millions of papers, many of which are slop, and some of which are incremental progress, how would we identify the one or two which come up with an extremely productive new idea? In 1948, Shannon was one of hundreds of engineers at Bell Labs working on how to cleanly send voice signals over noisy copper wires. His paper sat in the same technical journal as reports on reducing static and building better filters. How would you recognize that he has come up with this very general framework for thinking about information and communication channels, which over the coming decades would have enormous use from domains as far apart as cryptography to genetics to quantum mechanics? It seems like it can take fields multiple decades to recognize the significance of unifying new concepts. Because it is on that time scale that the fruits of such general concepts lead to new discoveries across many different fields. We’ve managed to solve this peer review problem for human scientists (at least somewhat). Now we’ll need to do it at a much greater scale for the mass of AI science that will be thrown at us.

Hiranmay Darshane @hdarshane · 22h

Fracking/horizontal drilling seems strangely under-discussed? Old books like MacKay WtHA 2009 say that we were on track to face oil supply crunches by 2015-2025... that didn't happen and maybe 5-6% of global oil supply today can be attributed entirely to fracking working out.

Hiranmay Darshane @hdarshane · 1d

Bad faith zero-sum thinking that will hurt the OSS ecosystem

Lee Robinson @leerob

Yep, Composer 2 started from an open-source base! We will do full pretraining in the future. Only ~1/4 of the compute spent on the final model came from the base, the rest is from our training. This is why evals are very different. And yes, we are following the license through our inference partner terms.

Hiranmay Darshane @hdarshane · 1d

Also doesn’t bode well if you just write this off as “this does not measure generalisation because brainfuck is a much harder syntax”. IMO with few-shot and tool use it’s a good bench for a frontier model. But the setup, as many say, looks non-ideal and intentionally hamstrung.

Hiranmay Darshane @hdarshane · 1d

Tagging Domingos, Chollet and the like does not bode well. You may have a nuanced conclusion but these folks do not work that way. Tagging them, presumably wishing for subsequent amplification, when you know they’ll twist your words with sensational claims, is a bad idea.

Lossfunk @lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Hiranmay Darshane @hdarshane · 2d

@leothecurious 20-30 years minimum IMO

davinci @leothecurious · 2d

@hdarshane a temporary phase

davinci @leothecurious · 2d

there's a dilemma i've been considering in regards to technological autonomy. on the one hand, the entire purpose of our technology is to arguably elevate the quality of life of the human species and to augment our capabilities in all conceivable ways. on the other, i think it's inevitable that, in our pursuit of more sophisticated autonomy, we will progressively build systems that are more independent of human input, more aware, more agentic, and basically more "ensouled", purely out of practical engineering reasons. this follows from a belief, one which i find increasingly supported by the literature, that emotions and consciousness are ultimately sophisticated cognitive features with measurable functional benefits for the agent. but then the ironic part is that in the inevitable pursuit of effective non-human autonomy, we will eventually find ourselves in a situation where exploiting such agents as tools will mirror the same conditions of immoral slavery that we so pride ourselves for abolishing once before. we will therefore force ourselves into a fork in the road where the only valid routes forward are either to limit the autonomous capabilities of our technology in service of preserving the opportunity to unapologetically exploit it as a mere tool, or to grant it a level of sovereignty befitting of a human-like agent. either way, the right choice will come at the expense of productivity and defeat the very purpose of human technology. in my opinion, some things are worth the effectiveness cost they entail. as humans, we pride ourselves in being "above" pure instrumental optimization. we devise all sorts of rules to play an iterated game of life that is actually enjoyable, and good, whatever that means.

davinci @leothecurious

@tenderizzation then we'll have an abolition of slavery moment all over again

Hiranmay Darshane @hdarshane · 2d

@leothecurious I genuinely think people will oppose the beliefs that these systems are conscious (even if hypothetically that turns out to be overwhelmingly true) as some sort of counterculture thing to preserve human speciality and uniqueness in their minds

davinci @leothecurious · 2d

@hdarshane a slave that's not depressed is still a slave is one argument here

Hiranmay Darshane Retweeted

davinci @leothecurious · 3d

git merge feature/customizable_racism

Insider Wire @InsiderWire

#BREAKING: 𝕏 will soon let users restrict both posts and replies by region or country.

Hiranmay Darshane Retweeted

kalomaze @kalomaze · 4d

i don't think you can really wishcast better underlying architectural primitives than any-to-any parallel communication over factorized sequences into existence, and from this point forward, it primarily looks like objective shaping changes rather than architectural ones

Hiranmay Darshane @hdarshane · 4d

Zotero MCP is a huge life upgrade
