teïlo

3.2K posts

teïlo
@teilomillet

curious layman interest in reasoning

Joined December 2022
1.1K Following · 246 Followers
teïlo reposted
Séb Krier @sebkrier
[image]
12 replies · 32 reposts · 337 likes · 18.8K views
teïlo reposted
Zyphra @ZyphraAI
Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
[image]
100 replies · 454 reposts · 2.5K likes · 1.2M views
teïlo reposted
Ricardo Olmedo @rdolmedo_
That being said, 🗣️ model comparisons are scientifically uninformative unless we control for test task adaptation 🗣️ Absolute benchmark performance is nearly meaningless. Rate of progress on newly proposed benchmarks should be taken with a grain of salt. arxiv.org/abs/2407.07890
1 reply · 2 reposts · 39 likes · 5.6K views
teïlo @teilomillet
@willccbb kinda building an abstraction layer to make every harness into one. so far it covers claude-code / opencode and codex, but it should be universal. github.com/teilomillet/sa…
0 replies · 0 reposts · 0 likes · 142 views
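A minimal sketch of what such a "one interface over every harness" layer might look like. Everything here (the `Harness` protocol, `CommandHarness`, the registry names and their invocation flags) is a hypothetical illustration, not code or an API taken from the linked repo:

```python
import subprocess
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AgentResult:
    """Uniform result type, whichever agent actually ran."""
    output: str
    exit_code: int


class Harness(Protocol):
    """The single interface callers program against."""
    def run(self, prompt: str) -> AgentResult: ...


@dataclass
class CommandHarness:
    """Wraps any CLI agent that accepts a prompt as a trailing argument.

    The argv templates are illustrative placeholders, not the real flags
    of any particular agent.
    """
    argv: list[str]  # e.g. ["codex", "exec"] or ["opencode", "run"]

    def run(self, prompt: str) -> AgentResult:
        proc = subprocess.run(
            self.argv + [prompt], capture_output=True, text=True
        )
        return AgentResult(proc.stdout.strip(), proc.returncode)


def make_harness(name: str) -> Harness:
    # Hypothetical registry mapping agent names to invocation templates;
    # adding a new harness means adding one entry here.
    registry = {
        "codex": ["codex", "exec"],
        "opencode": ["opencode", "run"],
        "claude-code": ["claude", "-p"],
    }
    return CommandHarness(registry[name])
```

The agent-specific quirks live entirely in the invocation templates, so callers only ever see `run(prompt) -> AgentResult` regardless of which CLI is underneath.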
will brown @willccbb
what cli agents do people use that support a proper server mode?
so far i got:
- codex, pi, opencode, hermes (openclaw? lol)
and not:
- claude, cursor, amp, droid
21 replies · 1 repost · 101 likes · 11.8K views
teïlo reposted
Dylan Hadfield-Menell @dhadfieldmenell
This is a super cool project, but I think there may be some data contamination. Either that, or its ability to predict the future is truly remarkable.
[image]
Nick Levine @status_effects
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
10 replies · 15 reposts · 514 likes · 54.5K views
will brown @willccbb
sometimes it's "hold up i gotta tweet this", others it's "hold up i gotta grab this on pypi"
1 reply · 0 reposts · 56 likes · 2.7K views
teïlo @teilomillet
@factorydoge69 what's the math? openai goes bust if they don't reach the 10x yoy expected by dario?
0 replies · 0 reposts · 2 likes · 672 views
factorydoge @factorydoge69
have seen multiple ai ppl say there's a pretty high probability openai goes bankrupt. this dario clip from his latest podcast with dwarkesh explains the math behind it pretty well
Deedy @deedydas
Demis Hassabis and Sebastian Mallaby were on stage in SF today and here are the 9 best things they said:
1. "There is a 50% chance that OpenAI goes bankrupt in the next 18mos" -Mallaby
2. "Dario is the best of all the other lab leaders." -Demis
3. On Claude Mythos: "It's not really tenable for a private company to decide who gets access to the frontier of cyber defense tech. What happens when China can do this in 6-12mos?" -Mallaby
4. "Not all countries are pessimistic about AI. I was just in India for the AI Summit Modi had and they're quite optimistic there" -Demis
5. "The most exciting current prospect in AI is our work at Isomorphic Labs. AlphaFold is just one of the many problems we need to solve. We need 6 'AlphaFold' moments to compress the drug delivery timeline from 10yrs to a few months" -Demis
6. "I don't think of p(doom) as probabilities to throw out there. I just know it's non zero. Some people like Marc Andreessen and Yann LeCun think it's 0% and I think that's crazy" -Demis
7. On AGI: "I think of a post-scarcity world where on the bright side we will have an unbelievable amount of science but we will have to think of economic problems of sharing proceeds equitably. We will also have philosophical questions to answer and need great new philosophers" -Demis
8. On career advice: "Immerse yourself in AI tools. Everyone has access to tools 3-6 months behind frontier. Enormous opportunity lies in applying AI to unexplored areas." -Demis
9. On the future: "When I started building this technology, I pictured a future quite different from this. More like CERN researchers where we discuss ideas and help each other out and stress test each other's ideas. It's my job to help how I can to make sure we make more considered, more scientific, more rigorous and more thoughtful decisions and that will also involve social scientists and economists. I'm going to do all I can to try and influence the future in a more thoughtful manner. The decisions we make in the next 5-10 years are going to affect us for 1000s of years. But I remain very optimistic." -Demis
55 replies · 26 reposts · 835 likes · 263.7K views
teïlo @teilomillet
benchmark driven development
0 replies · 0 reposts · 0 likes · 14 views
davinci @leothecurious
@Noahpinion how can u possibly conclude that from this particular clip? how much more sense does jensen have to make before this is intuitive?
3 replies · 0 reposts · 24 likes · 2.3K views
teïlo @teilomillet
seeing codex optimizing some metrics on a repo really feels like a training run
0 replies · 0 reposts · 0 likes · 28 views
teïlo reposted
Andon Labs @andonlabs
We gave an AI a 3-year retail lease in SF and asked it to make a profit. The AI interviewed and hired full-time employees, applied for credit, and stocked the store with the books Superintelligence and The Making of the Atomic Bomb. Visit Andon Market at 2102 Union St now.
102 replies · 156 reposts · 2.4K likes · 1.9M views
teïlo @teilomillet
bitter lesson but for tests
0 replies · 0 reposts · 0 likes · 20 views
teïlo reposted
You Jiacheng @YouJiacheng
IMO problems are 100% human solvable, median human scores <1%
Greg Kamradt @GregKamradt
Today we're launching ARC-AGI-3: 135 Novel Environments (nearly 1K levels) we built by hand. It is the only unsaturated agent benchmark in the world. Each game is 100% human solvable, AI scores <1%. This gap between human and AI performance proves we do not have AGI.
Agents today need human handholding. Agents that beat V3 will prove they don't need that level of supervision. Agents that beat V3 will demonstrate:
* Continual learning - Each level builds on top of each other. You can't beat level 3 without carrying forward what you learned in levels 1 and 2.
* World modeling - Many of the environments require planning many actions ahead. AI will have no choice but to build an internal world model for how the environment works, run simulations "in its head", and proceed with an action.
In our early testing, we've seen a few clear failure modes of AI:
* Anticipation of future events - If an environment requires that AI set up a scene, and then carry out a scenario (like in sp80), it starts to break down.
* Anchoring on an early hypothesis - Early in a game it comes up with a hypothesis (even if wrong) and refuses to update its beliefs later.
* Thinking it's playing another game - AI thinks it's playing chess or pacman. The training data holds hard!
One major problem is there is too much data to carry forward in a single context. Models must learn what to remember and what to forget.
The agent that beats ARC-AGI-3 will have demonstrated the most authoritative evidence of progress towards general intelligence to date. We're excited to get this out and excited to see what you think.
15 replies · 18 reposts · 401 likes · 42.2K views