Pinned Tweet
Jeroen van Lith
514 posts

@jmvanlith
Founder at https://t.co/HRbaCw2DUH - Interested in machine learning and architectural design. Also on Bluesky now!
Berlin, joined May 2010
770 Following, 483 Followers

@fchollet What principles will you use for designing ARC-AGI-v2? What did you use for v1, and what will be different in v2?

Does this mean the ARC-AGI benchmark has saturated?
Yes -- the v1 version of the benchmark is starting to saturate. There were already signs of this in the Kaggle competition this year -- an ensemble of all submissions would score 81%.
The competition next year will run on ARC-AGI-2, an updated version of the dataset that keeps the same format as v1, but features fewer tasks that can be easily brute-forced.
Early indications are that ARC-AGI-v2 will represent a complete reset of the state-of-the-art, and it will remain extremely difficult for o3. Meanwhile, a smart human or a small panel of average humans would still be able to score >95%.

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.
It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute force -- these capabilities are new territory and they demand serious scientific attention.


After 3 years of team work, we just shipped Rayon V2! 🎉
This new version represents Rayon’s best attempt to date to finally offer interior designers/architects an online software to draft, stylize & lay out their work, all in one place.
Try V2 today -> rayon.design

@tsarnick Wouldn’t this lead to more complex cases, as opposing lawyers would also have access to LLMs?

@jhn01 Great! Ah that’s a UI mistake, they are not different. Will fix in next release!

@jmvanlith Ooh nice, it reminds me of the “blockify” command in Bricscad. This is huge.

@jhn01 Yes it will look for similar (clusters of) geometries in the file, and for each of these, it will replace that with a block instance.

So excited that we can finally unveil the next iteration of @HyparAEC. We've rebuilt it from the ground up to focus relentlessly on 𝙞𝙣𝙩𝙚𝙡𝙡𝙞𝙜𝙚𝙣𝙩 𝙨𝙥𝙖𝙘𝙚 𝙥𝙡𝙖𝙣𝙣𝙞𝙣𝙜, informed by hundreds of user tests + interviews.
Check it out today @ hypar.io!
Hypar@HyparAEC
Hypar 2.0 is available today! Simple, intelligent space planning for the most complex and critical sectors like healthcare, labs, education, and workplace. Sign up for free and start building today at hypar.io.

How is @YouTube not able to flag fake Elon Bitcoin livestream scams that have 300k viewers and have already been streaming for 84 mins? 🥴


Rather than rewriting code/text, I wish @OpenAI could visualize proposed changes to a text in-place.