Tim Dingman






We're releasing a technical report describing how Composer 2 was trained.



Both xAI and Meta seem to be falling behind, based on the Grok 4.2 benchmarks and this reporting. Frontier AI models are really a three-way race at this point.







Nicholas Carlini at [un]prompted. If you know Carlini, you know this is a startling claim.



It is clear to me that @kalshi is going down the same path as Juul, and if they don’t pull back it is going to have the same conclusion. For those that don’t remember, Juul was one of the main vaping brands in the 2010s. It took a product that had a social good (helping smokers quit) but then aggressively pushed it into a new market: non-smokers, and particularly kids. (See the similarity?) The backlash took time to build, but when it did it was devastating for the company.

I’ve worked in the online gaming industry for over 25 years, all over the world. This type of marketing is actually extremely rare in real-money gaming. Firstly, and most importantly, it is rare because operators view it as highly unethical. It might surprise you that a lot of people in the gaming industry do actually care about things like underage and problem gambling. Secondly, it is also rare because it doesn’t work. Do you think the teenagers in these ads are going to keep playing when they lose all their rent money?

The only other company I can think of that pushed this type of advertising was Skillz, who aggressively pushed the “second income” line. Check out their share price if you want to see how that worked out for them.


Saying DeepSeek built MoE on top of Mixtral is nonsense: the DeepSeek MoE paper came out just 3 days after the Mixtral paper was posted on arXiv. Also, the Mixtral paper has literally no detail about the training, so "we released like everything that was needed to rebuild this kind of architecture" is also false. The paper just says "we use the Google GShard arch with simpler routing and MoE every layer," with no detail on data, hyperparameters, training tokens, ablations, etc. The architecture that DeepSeek MoE uses is actually different from GShard and more sparse (DeepSeek MoE doesn't even cite Mixtral in the paper, but GShard). Not saying Mixtral didn't have an impact on MoE, but what is said in this interview is a bit of rewriting the narrative to say "but look, China/DeepSeek is also copying Mistral!"
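Since the argument turns on what "simpler routing and MoE every layer" actually means, here is a minimal sketch of top-k token routing, my own PyTorch illustration rather than code from the Mixtral or DeepSeekMoE papers: a linear router scores experts per token, the top-k experts run, and their outputs are mixed with softmax weights. The real models differ in expert granularity (DeepSeekMoE splits into many finer experts plus shared ones), and the expert count, k, and dimensions below are placeholders.

```python
# Minimal top-k token-choice MoE layer (illustration only; sizes are placeholders).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one linear layer scoring every expert for every token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Experts: independent feed-forward blocks.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route 16 tokens through the layer.
moe = SparseMoE()
print(moe(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

The sparsity the post refers to is exactly this: each token only pays for k experts out of n, so total parameters grow with n while per-token compute grows with k.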

If you are wondering what is happening: last summer, the Ministry of Science of Korea created a program with 5 companies to train sovereign AI models and release them under a permissive license, so other South Korean companies can use them and hence expand their domestic AI ecosystem. To recap, we got:
- SK Telecom: A(.)X-K1, 519B total, 33B active
- LG: K-EXAONE, 236B total, 23B active
- NC-AI: VAETKI, 112B total, 10B active
- Upstage: Solar-Open, 102B total, 12B active
- Naver: HyperCLOVAX-SEED-Think, 32B dense

This is REALLY a big deal






