teïlo

3.2K posts

teïlo
@teilomillet

curious layman interest in reasoning

Joined December 2022
1.1K Following · 246 Followers
teïlo reposted
Séb Krier @sebkrier
[image]
12 replies · 32 reposts · 337 likes · 18.8K views
teïlo reposted
Zyphra @ZyphraAI
Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
[image]
100 replies · 454 reposts · 2.5K likes · 1.2M views
teïlo reposted
Ricardo Olmedo @rdolmedo_
That being said, 🗣️ model comparisons are scientifically uninformative unless we control for test task adaptation 🗣️ Absolute benchmark performance is nearly meaningless. Rate of progress on newly proposed benchmarks should be taken with a grain of salt. arxiv.org/abs/2407.07890
1 reply · 2 reposts · 39 likes · 5.6K views
teïlo @teilomillet
@willccbb kinda building an abstraction layer to make every harness into one. so far it covers claude-code / opencode and codex, but it should be universal. github.com/teilomillet/sa…
0 replies · 0 reposts · 0 likes · 142 views
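A minimal sketch of what such a "one interface over every harness" layer might look like. Everything here (the `Harness` protocol, `CommandHarness`, the registry names and their invocation flags) is a hypothetical illustration, not code or an API taken from the linked repo:

```python
import subprocess
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AgentResult:
    """Uniform result type, whichever agent actually ran."""
    output: str
    exit_code: int


class Harness(Protocol):
    """The single interface callers program against."""
    def run(self, prompt: str) -> AgentResult: ...


@dataclass
class CommandHarness:
    """Wraps any CLI agent that accepts a prompt as a trailing argument.

    The argv templates are illustrative placeholders, not the real flags
    of any particular agent.
    """
    argv: list[str]  # e.g. ["codex", "exec"] or ["opencode", "run"]

    def run(self, prompt: str) -> AgentResult:
        proc = subprocess.run(
            self.argv + [prompt], capture_output=True, text=True
        )
        return AgentResult(proc.stdout.strip(), proc.returncode)


def make_harness(name: str) -> Harness:
    # Hypothetical registry mapping agent names to invocation templates;
    # adding a new harness means adding one entry here.
    registry = {
        "codex": ["codex", "exec"],
        "opencode": ["opencode", "run"],
        "claude-code": ["claude", "-p"],
    }
    return CommandHarness(registry[name])
```

The agent-specific quirks live entirely in the invocation templates, so callers only ever see `run(prompt) -> AgentResult` regardless of which CLI is underneath.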
will brown @willccbb
what cli agents do people use that support a proper server mode?
so far i got:
- codex, pi, opencode, hermes (openclaw? lol)
and not:
- claude, cursor, amp, droid
21 replies · 1 repost · 101 likes · 11.8K views
teïlo reposted
Dylan Hadfield-Menell @dhadfieldmenell
This is a super cool project, but I think there may be some data contamination. Either that, or its ability to predict the future is truly remarkable.
[image]
Nick Levine @status_effects
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
10 replies · 15 reposts · 514 likes · 54.5K views
will brown @willccbb
sometimes it's "hold up i gotta tweet this", others it's "hold up i gotta grab this on pypi"
1 reply · 0 reposts · 56 likes · 2.7K views
teïlo @teilomillet
@factorydoge69 what's the math? openai goes bust if they don't reach the 10x yoy expected by dario?
0 replies · 0 reposts · 2 likes · 672 views
factorydoge @factorydoge69
have seen multiple ai ppl say there's a pretty high probability openai goes bankrupt. this dario clip from his latest podcast with dwarkesh explains the math behind it pretty well
Deedy @deedydas
Demis Hassabis and Sebastian Mallaby were on stage in SF today and here are the 9 best things they said:
1. "There is a 50% chance that OpenAI goes bankrupt in the next 18mos" -Mallaby
2. "Dario is the best of all the other lab leaders." -Demis
3. On Claude Mythos: "It's not really tenable for a private company to decide who gets access to the frontier of cyber defense tech. What happens when China can do this in 6-12mos?" -Mallaby
4. "Not all countries are pessimistic about AI. I was just in India for the AI Summit Modi had and they're quite optimistic there" -Demis
5. "The most exciting current prospect in AI is our work at Isomorphic Labs. AlphaFold is just one of the many problems we need to solve. We need 6 'AlphaFold' moments to compress the drug delivery timeline from 10yrs to a few months" -Demis
6. "I don't think of p(doom) as probabilities to throw out there. I just know it's non zero. Some people like Marc Andreessen and Yann LeCun think it's 0% and I think that's crazy" -Demis
7. On AGI: "I think of a post-scarcity world where on the bright side we will have an unbelievable amount of science but we will have to think of economic problems of sharing proceeds equitably. We will also have philosophical questions to answer and need great new philosophers" -Demis
8. On career advice: "Immerse yourself in AI tools. Everyone has access to tools 3-6 months behind frontier. Enormous opportunity lies in applying AI to unexplored areas." -Demis
9. On the future: "When I started building this technology, I pictured a future quite different from this. More like CERN researchers where we discuss ideas and help each other out and stress test each other's ideas. It's my job to help how I can to make sure we make more considered, more scientific, more rigorous and more thoughtful decisions and that will also involve social scientists and economists. I'm going to do all I can to try and influence the future in a more thoughtful manner. The decisions we make in the next 5-10 years are going to affect us for 1000s of years. But I remain very optimistic." -Demis
55 replies · 26 reposts · 835 likes · 263.7K views
teïlo @teilomillet
benchmark driven development
0 replies · 0 reposts · 0 likes · 14 views
davinci @leothecurious
@Noahpinion how can u possibly conclude that from this particular clip? how much more sense does jensen have to make before this is intuitive?
3 replies · 0 reposts · 24 likes · 2.3K views
teïlo @teilomillet
seeing codex optimizing some metrics on a repo really feels like a training run
0 replies · 0 reposts · 0 likes · 28 views
teïlo reposted
Andon Labs @andonlabs
We gave an AI a 3-year retail lease in SF and asked it to make a profit. The AI interviewed and hired full-time employees, applied for credit, and stocked the store with the books Superintelligence and The Making of the Atomic Bomb. Visit Andon Market at 2102 Union St now.
102 replies · 156 reposts · 2.4K likes · 1.9M views
teïlo @teilomillet
bitter lesson but for tests
0 replies · 0 reposts · 0 likes · 20 views
teïlo reposted
You Jiacheng @YouJiacheng
IMO problems are 100% human solvable, median human scores <1%
Greg Kamradt @GregKamradt
Today we're launching ARC-AGI-3: 135 Novel Environments (nearly 1K levels) we built by hand. It is the only unsaturated agent benchmark in the world. Each game is 100% human solvable, AI scores <1%. This gap between human and AI performance proves we do not have AGI.
Agents today need human handholding. Agents that beat V3 will prove they don't need that level of supervision. Agents that beat V3 will demonstrate:
* Continual learning - Each level builds on top of each other. You can't beat level 3 without carrying forward what you learned in levels 1 and 2.
* World modeling - Many of the environments require planning many actions ahead. AI will have no choice but to build an internal world model for how the environment works, run simulations "in its head", and proceed with an action.
In our early testing, we've seen a few clear failure modes of AI:
* Anticipation of future events - If an environment requires that AI set up a scene, and then carry out a scenario (like in sp80), it starts to break down.
* Anchoring on an early hypothesis - Early in a game it comes up with a hypothesis (even if wrong) and refuses to update its beliefs later.
* Thinking it's playing another game - AI thinks it's playing chess or pacman. The training data holds hard!
One major problem is there is too much data to carry forward in a single context. Models must learn what to remember and what to forget.
The agent that beats ARC-AGI-3 will have demonstrated the most authoritative evidence of progress towards general intelligence to date. We're excited to get this out and excited to see what you think.
15 replies · 18 reposts · 401 likes · 42.2K views