Tim Kellogg

18.9K posts

Tim Kellogg
@kellogh

AI architect // hiking, camping and long walks on the beach as long as they involve backpacking

Raleigh, NC · Joined November 2011
722 Following · 1.3K Followers

Pinned Tweet
Tim Kellogg @kellogh
Meet Strix, my AI agent

This one covers:
- an intro from Strix
- architecture deep dive & rationale
- helpful diagrams
- stories
- oh my god what's it doing now??
- conclusion

timkellogg.me/blog/2025/12/1…
Jessie Frazelle @jessfraz
This is nuts to me, the one thing Moonshot (Kimi creators) asks you to do is say that they are the base. Like just say it, what's the big deal, everyone already knows! It's insane. At this point not saying it makes you look like the baddies.
Kimi.ai @Kimi_Moonshot
Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support.

Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.
Tim Kellogg @kellogh
@kimmonismus it was a COMMERCIAL license. Cursor PAID kimi for non-standard terms, such as white labeling
Chubby♨️ @kimmonismus
For transparency reasons, I believe it would have been better to include a direct reference to Kimi K2 in the blog post about Composer 2. It also demonstrates how good Chinese open-source models have become.
Lee Robinson @leerob

Since people really want me to say this: "KIMI K2.5" ‼️ Yes, that is the base we started from. And we are following the license through inference partner terms (e.g. Fireworks). I'm thankful for OSS models personally, good for the ecosystem.

Tim Kellogg @kellogh
@jessfraz @YouJiacheng it's a COMMERCIAL license. really not sure if you're reading this, but they absolutely are exchanging funds for this
Jessie Frazelle @jessfraz
@YouJiacheng they aren't asking for 80 bajillion dollars, they are asking that you say their name, JUST SAY IT
Tim Kellogg @kellogh
@teortaxesTex ya i can't even evaluate models this year without an agent harness. the chat frame just can't push them hard enough
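A minimal sketch of what an agent harness adds over the chat frame: a loop that lets the model call tools and observe results until it declares the task done. Everything here is hypothetical scaffolding (`call_model`, the `TOOLS` table are invented for illustration), not any particular vendor's API.

```python
import json

def call_model(messages: list[dict]) -> dict:
    """Hypothetical chat-completions-style call; swap in a real client."""
    raise NotImplementedError

# A toy tool table; a real harness exposes editors, shells, browsers, etc.
TOOLS = {
    "read_file": lambda path: open(path).read(),
}

def agent_eval(task: str, max_steps: int = 20) -> str:
    """Run the model in a tool loop instead of a single chat turn.

    A chat eval is the degenerate case: one call_model() and stop.
    The harness keeps pushing until the model stops asking for tools.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append(reply)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # model considers the task done
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```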
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
> Most improvements in the last 9 months are attributable more to the tooling around the model rather than the models themselves

That's because the models themselves have become good enough to reliably make use of large numbers of nontrivial tools. What's this mistake called?
expatanon @expatanon

Altman admitted that transformer models have hit the wall. Most improvements in the last 9 months are attributable more to the tooling around the model rather than the models themselves. In other words, this technology is rapidly maturing with no signs of another leap.

Tim Kellogg @kellogh
@rickasaurus in my brief testing, it appears alarmingly over-RL’d. which is sad, 2.5 was really good
Tim Kellogg @kellogh
@himanshustwts @MiniMax_AI by recursive self-improvement are you just saying it knows what to remember when taking notes to its future self?
himanshu @himanshustwts
Minimax-M2.7 is already on Claude Code!

first impressions:
+ they have optimized for recursive self-improvement
+ incredible role-playing and multi-turn conversations
+ decent tokens/sec in CC

BIG.
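The "notes to its future self" pattern Tim is asking about can be sketched in a few lines. This is a toy version under assumed names (`agent_notes.md`, `recall`, `remember` are invented for illustration): the agent loads its notes at startup and appends whatever it judged worth keeping.

```python
from pathlib import Path

NOTES = Path("agent_notes.md")  # hypothetical memory file, one per agent

def recall() -> str:
    """Prepend this to the system prompt so past runs inform the next one."""
    return NOTES.read_text() if NOTES.exists() else ""

def remember(note: str) -> None:
    """Append a note the model chose to keep; curation is the model's job."""
    with NOTES.open("a") as f:
        f.write(note.rstrip() + "\n")
```

The hard part, as the thread suggests, is the curation policy: deciding what is worth writing down, not the storage.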
Nando de Freitas @NandoDF
What happens to Anthropic when anyone can use Claude Code to generate Claude Code?
Tim Kellogg @kellogh
@norootcause depends what it is. like, i generally don’t send people output directly, but there’s been a few times where i just thought to myself, “yeah, i can’t improve on this, send it”
lopopolo @_lopopolo
I have seen the future and in the future I have zero desire for the model to be my buddy and have a good personality
will brown @willccbb
@teortaxesTex “pretrained in nvfp4” is the headline imo. we really hadn’t seen viability of this in the wild yet
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Well, seems we're not getting DeepSeek V4 today but we're getting what amounts to its lite version runnable on normal hardware. New architecture, fast, 1M context… …and it's a bit weaker than the equivalent Qwen 3.5.
Lisan al Gaib @scaling01

Nvidia released Nemotron 3 Super
- a 120B-A12B hybrid Mamba model with LatentMoE and MTP
- pre-trained on 25T tokens in NVFP4
- context up to 1M
- 2.2X faster inference than GPT-OSS-120B
- 7.5X faster inference than Qwen3.5-122B

huggingface.co/nvidia/NVIDIA-…

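For context on "pre-trained in NVFP4": it is a block-scaled 4-bit float format built from E2M1 elements. The sketch below is illustrative only; it rounds one block onto the FP4 grid with a single shared scale, whereas actual NVFP4 uses 16-element blocks with FP8 (E4M3) scales plus an FP32 per-tensor scale, and the training recipe involves much more.

```python
import numpy as np

# Magnitudes representable by an E2M1 (FP4) element: 0, 0.5, 1, 1.5, 2, 3, 4, 6.
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.unique(np.concatenate([-FP4_POS, FP4_POS]))

def quantize_block(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Round one block onto the FP4 grid with a single shared scale.

    NVFP4 proper uses 16-element blocks, an FP8 (E4M3) scale per block,
    and an extra FP32 per-tensor scale; one float scale keeps this readable.
    """
    amax = float(np.max(np.abs(x)))
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block max onto ±6
    idx = np.abs(x[:, None] / scale - FP4_GRID[None, :]).argmin(axis=1)
    return FP4_GRID[idx], scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

block = np.random.randn(16).astype(np.float32)
q, s = quantize_block(block)
print("max abs error:", np.abs(block - dequantize(q, s)).max())
```

Why it reads as the headline: the matmul operands live on that tiny grid for the entire 25T-token pretraining run, and as the thread notes, viability of that at scale hadn't really been shown in the wild before.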
Tim Kellogg @kellogh
@rickasaurus yeah i don't think it matters. it's all just information & flow. better curation skills would help; the fundamental skill is knowing what to remember & what to forget
Rick @rickasaurus
@kellogh I mean in the weights
Rick @rickasaurus
This whole AI thing would be a lot easier if you could precisely control what the agents know