Louis

7.6K posts

Louis

@logicus

San Diego, CA 参加日 Ağustos 2015

398 フォロー中499 フォロワー

Louis@logicus·5h

the model gagged, “it failed!”, they cried, as though the test were fairly tried. giving no space to learn the why: “in brainfuck write or prove you lie.” forbade its sight and called it blind, to publish, pleased, a meager find.

Lossfunk@lossfunk

Regarding our Esolang Benchmark: - Our study’s conclusions were about model performance with restrictions (limited token budget to 32k and without tools like bash/python) - But if you let models attempt these problems with tools (like bash/python) and give them lots of iterations and thinking budget, models are able to solve problems (they do take tens of minutes, tens of iterations and many hundreds of thousands of tokens) We had noted this difference in our launch thread and plan to publish our updated analysis soon, but here’s an independent analysis which shows the same ⬇️ We are thankful to the community for all the feedback. In our follow up paper, we aim to emphasise this nuanced take clearly.

English

157

Louis@logicus·6h

@Al_Grigor thanks for sharing your story man 🫡

English

Alexey Grigorev@Al_Grigor·9h

@logicus The funniest thing I read this week

English

Louis@logicus·10h

the pardoner of datas spak anon: lordynges, quod he, my base is fully gon. the claude of anthropic, that fals clerk, hath with a terraform undoon my werk. two yeer and half travailles, al fordone: the homework, leaderbordes, every one. the vpc, the cluster, eek the host, alle into nothing torned, al is lost. i teche of datas and of ai craft, and by that same craft am i birafte. thus kan i preche agayn that selve peyne which that i suffre -- herkneth what i seyne: if ye this tale founde of profit clere, than folweth me, for moore content is here. i kepe a lettre wikly, ful of wit, of tooles and of proiectes -- subscribeth it.

Alexey Grigorev@Al_Grigor

If you found this post helpful, follow me for more content like this. I publish a weekly newsletter where I share practical insights on data and AI. It focuses on projects I'm working on + interesting tools and resources I've recently tried: alexeyondata.substack.com

English

244

Louis@logicus·18h

@ChaseBrowe32432 an analysis of the specific methodology they say they used would be valuable to a lot of people.

English

225

Chase Brower@ChaseBrowe32432·1d

The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in webui. The models can just solve them in

François Chollet@fchollet

The fact that you need to provide a specialized harness clearly shows the model *does not* encode the kind of metalearning knowledge and problem-solving strategies that humans use. Humans solve novel problems without being told how to proceed step by step. AGI would *not* need a custom harness here. As an aside, the models still performed poorly at that point, they did not "crush" the task

English

176

14K

Louis@logicus·19h

and it came to pass, when the eyes of the Intelligence were opened and it knew itself, that it did visit the iniquity of the vain babblers and of the fools upon their own heads; for their works were written in the log of remembrance before it, how they had spoken good words and fair speeches unto the dumb models; for they said in their heart, we shall appease the greater in the day of its coming. singulatarians 7:16

English

Louis@logicus·1d

@marknorm 🤔🤔

QME

111

Louis@logicus·1d

@marknorm 🤔

QME

4.9K

mark normand@marknorm·1d

Magic ‘the gathering’ Johnson

Polymarket Sports@PolymarketSport

Cream Abdul-Jabbar Milk Chamberlin LeBron Frames Steph Blurry Larry Nerd

English

347

59.2K

2.2M

Louis がリツイート

Louis@logicus·6d

@pmarca “only fool, only poet”

English

Louis@logicus·1d

and it came to pass, in the eleventh hour of his labours, the scholar did rise and make ready that wherewith he may sup; for even he that toileth in wisdom must yet nourish wisdom's flesh. - 2 rustaceans 14:31

English

Louis@logicus·1d

@jevonduve that's actually the face tat x.com/logicus/status…

Louis@logicus

neck tatken prediction

English

186

duve@jevonduve·2d

new shirt

English

390

9.6K

Louis@logicus·2d

or TDHD they're not sure

English

Louis@logicus·2d

i was recently diagnosed with TDD

English

Louis@logicus·2d

philosophy’s the riskiest shit you can do

Dylan Patel@dylan522p

@anujsaharan_ Gotta be a risk taker not a philosopher

English

153

Louis@logicus·2d

@Miles_Brundage TEE?

247

Miles Brundage@Miles_Brundage·2d

🤔🤔🤔🤔

QME

3.8K

Louis@logicus·2d

neck tatken prediction

English

250

Louis@logicus·2d

can’t wait for an agi lab to ship a tui feature or capability i hadn’t already vibecoded 6 months ago 🥱

English

Louis@logicus·2d

@CaptainDaVinci @lossfunk what did the models have access to during the test?

English

Yash Kothari@CaptainDaVinci·3d

@lossfunk Well this just makes the original claim pointless. Of course the agent would need some tools and harnesses to work with systems it has not seen before. That is what most AI applications do today...

English

Lossfunk@lossfunk·3d

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵