Jake

726 posts


@JakeKAllDay

Works in AI, still doesn't trust it

Austin, TX · Joined August 2012

346 Following · 495 Followers
Jake retweeted
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54), behind only Anthropic, Google, and OpenAI (all 57). Key takeaways:
➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring performance on knowledge work tasks such as preparing presentations and analyses. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness, Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom and placing it among other frontier models in this category.
➤ Low hallucination rate: Kimi K2.6 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it near other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%).
➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).
➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, the same as the previous two generations, Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.
➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s first party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail.
➤ Multimodality: Kimi K2.6 natively supports image and video input and text output. The model’s max context length remains 256k.
Further analysis in the threads below.
Artificial Analysis tweet media
4
10
165
8.1K
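The MoE sizing quoted above (1T total parameters, 32B active) can be sanity-checked with a quick sketch. The 8-bit weight assumption below is purely illustrative, not something the tweet states:

```python
# Sketch: what "1T total / 32B active" implies for an MoE model.
# Assumes (hypothetically) 8-bit weights; real deployments vary.
total_params = 1.0e12   # 1T parameters stored (all experts)
active_params = 32e9    # 32B parameters used per token

active_fraction = active_params / total_params
bytes_per_param = 1     # assumed 8-bit quantization

storage_gb = total_params * bytes_per_param / 1e9   # weights you must hold
compute_gb = active_params * bytes_per_param / 1e9  # weights read per token

print(f"active fraction: {active_fraction:.1%}")  # 3.2%
print(f"held: {storage_gb:.0f} GB, read per token: {compute_gb:.0f} GB")
```

This is the trade-off sparse MoE models make: storage cost scales with total parameters, per-token compute with active parameters.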
Jake
Jake@JakeKAllDay·
And now @ArtificialAnlys confirms it: new SOTA for OS, and closing in on the (larger and more expensive) US cloud models. x.com/ArtificialAnly…
Artificial Analysis@ArtificialAnlys

0
0
0
11
Jake retweeted
Jake
Jake@JakeKAllDay·
Moonshot AI continues to be the *most* open of the OS shops (#qwen and GLM are still great too!). #Kimi K2.6 is a legitimately frontier model, and making it available from the start puts great pressure on the Big 3 cloud providers to keep costs down. It should also be a boon to @cursor_ai users, as #Composer 2 was based on Kimi and provides vastly better value than Claude.
Kimi.ai@Kimi_Moonshot

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ Python (86.7), Math Vision w/ Python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - the K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents; command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

2
4
81
1.5K
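The "Agent Swarms" figures in the quoted tweet (300 sub-agents × 4,000 steps, up from K2.5's 100 × 1,500) imply a large jump in the maximum step budget per run; a minimal sketch of that arithmetic:

```python
# Upper-bound agent-step budget per run, from the figures in the tweet.
k25_budget = 100 * 1_500   # K2.5: 100 sub-agents x 1,500 steps each
k26_budget = 300 * 4_000   # K2.6: 300 sub-agents x 4,000 steps each

print(k25_budget)               # 150000
print(k26_budget)               # 1200000
print(k26_budget / k25_budget)  # 8.0  -> an 8x larger ceiling
```

These are ceilings, not typical runs; actual usage depends on how the orchestrator schedules sub-agents.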
Jake
Jake@JakeKAllDay·
@pHequals7 Worst at tool calling (and agentic capabilities by extension), but still plenty of post training to be had on accuracy, IF, alignment, domain specialization, etc.
0
0
0
42
Jake
Jake@JakeKAllDay·
*very* few companies voluntarily involute their products. They bias toward upselling. At a minimum, they try to increase volume and keep ASP the same. "Raise prices for more functionality" has long been the norm. Lowering prices means "owning" a smaller pie, vs getting the largest absolute piece you can.
0
0
1
9
Callum Williams
Callum Williams@econcallum·
This is a highly underrated point. Programmer productivity improvements SHOULD show up as falling software prices. This is the historical norm. In fact, though, in recent months software prices have actually been *rising*...the opposite of what should happen
Callum Williams tweet media
wanye@xwanyex

I don’t have to be convinced that LLMs make programmers more productive. But where’s all the stuff? We’ve now had months and months of 100x or 1000x programmer productivity improvements. Where’s all the stuff they’re building?

10
11
106
12.1K
Jake
Jake@JakeKAllDay·
@econcallum it isn't -- this data is based on NIPA. It is an inflation (product-level) metric; it has absolutely no way to account for what features exist within a given software bundle. apps.bea.gov/iTable/?1921=u…
0
0
3
43
Jake
Jake@JakeKAllDay·
@Goosehater123 @qcapital2020 'hard to disentangle' is a terrible excuse for misleading statistics. Amazon, by revenue, is mostly a retail company (not so by valuation). Measuring their AI capex, a tiny portion of their AWS business currently, against their retail revenue is dumb.
0
0
0
65
Moose
Moose@Goosehater123·
@JakeKAllDay @qcapital2020 Hard to disentangle. Machine learning has already been a core component of all their businesses. Just AMZ for ex: Amazon Ads, AMZ search, AWS, even AMZ logistics like FBA, warehousing, pick/fulfillment, etc. The line between ML and AI is hard to determine; it's more gradual.
2
0
0
85
Q-Cap
Q-Cap@qcapital2020·
Capex Bubble for ants
Q-Cap tweet media
23
58
779
70.2K
Jake retweeted
Jake
Jake@JakeKAllDay·
@qcapital2020 This is a crazy misleading chart. VAST majority of the revenue of Amazon/google/Meta/Oracle/MSFT has nothing to do with AI. Most of Amazon rev doesn’t even have to do with IT! Google + Meta are 0.5T in just Ad revenue. It’s apples and orangutans.
2
4
29
1.7K
Jake
Jake@JakeKAllDay·
The MCP standard reached critical mass during the 2.5/5/4 period, and then the corpus existed to do proper RL for tool calling. Coding capabilities also progressed, which was a tailwind on agentic scope. The Gemini series is the smartest internally, but their tool-calling RL is still the weakest. Hence the disparity between the benchmark and lay engineer (I said what I said) perspectives. Hopefully it gets fixed on the next cycle.
0
0
5
1.1K
Mike Knoop
Mike Knoop@mikeknoop·
Extremely clear what caused the qualitative leap from GPT 4 to o1 (test time adaptation via chain of thought reasoning). Not clear what caused the agentic leap from Gemini 2.5/GPT 5.1/Opus 4.1 to Gemini 3/GPT 5.2/Opus 4.5. Even crazier all three released ~3 weeks apart.
24
4
232
21.4K
Jake
Jake@JakeKAllDay·
@MichaelFKane One day, Children of Húrin will be made into a movie and Glaurung will get his due. @andyserkis already narrated it perfectly in audiobook form; hopefully he can make that happen.
0
0
0
5
Jake
Jake@JakeKAllDay·
@MichaelFKane “It’s only a REAL victory if there are zero detectable costs”
1
0
57
621
Michael F Kane
Michael F Kane@MichaelFKane·
The US will never be allowed a military victory again because arbitrary metrics unrelated to actual military goals will be applied to declare defeat. And this isn't particularly a shot at Chris, because he is right that oil price standard will be applied to Iran, despite oil being a tertiary concern at best to the administration's actual objectives in Iran. But we are so well off and comfortable that talking heads will apply any discomfort or inconvenience to the account and credit it as a loss.
Christopher F. Rufo ⚔️@christopherrufo

One way to measure that stable victory conditions have been met would be the price of oil returning to the pre-war median price

44
86
1.1K
44.6K
Jake
Jake@JakeKAllDay·
@ThePowerAudit I’ve never understood why people think Iran would seriously play the Red Sea card. It might not work, it would bring KSA into the fight, and it would cost them CCP support. It’s great for Russia, but that’s about it.
0
0
1
103
Chris Rollins
Chris Rollins@ThePowerAudit·
The only card Iran holds is a suicide card that assumes they can convince the Houthis to enact it too. I do not think they would at this point.

I also do not think the US will eliminate the power plants in Iran. Game theory says you don't, not unless you want the destruction of the entire region as mentioned. (There is a devastatingly sad long-term economic benefit to the US in this.) China is not going to let the IRGC harm them and make them even more indebted to the US by doing it either.

The "cards" Iran holds are not actually leverage. It is more of a threatened suicide vest strapped to the region. If I am the US, I would NOT take out the power plants or attempt to send Iran to the stone age. But if Iran does pull the pin, guess whose oil and LNG just became the hottest commodity in the world? US LNG at $20 JKM. US propane at $1.13/gal delivered ARA. US crude exports at a record 5.2 million barrels a day. That is the position Iran's suicide card actually strengthens.

Here is the game theory. Iran has two choices. Execute the threat and die, which makes US energy the strongest card on Earth and hands Trump the strongest hand in a generation. Or do not execute, and keep bleeding revenue through the filter every day the blockade operates. Both branches favor the US. There is no Iranian move that improves Iran's position. A threat where every outcome helps your opponent is not a threat. What sort of leverage is that?
Degrees of Change@DegreesOcean

@ThePowerAudit What are you talking about? Iran holds ALL the cards. It can destroy Abqaiq, Ras Tanura, Ras Laffan, Yanbu etc, etc and send the entire world back to the stone age. There is no way to stop them destroying all the energy infrastructure. Why can't you Americans see that???

5
1
21
3.4K
Jake
Jake@JakeKAllDay·
@StirlingForge @Alibaba_Qwen It’ll be considerably larger (hence the naming convention). Just wait for 3.6 27B dense
0
0
1
20
Qwen
Qwen@Alibaba_Qwen·
🚀 Introducing Qwen3.6-Max-Preview, an early preview of our next flagship model
Highlights:
⚡️ Improved agentic coding capability over Qwen3.6-Plus
📖 Stronger world knowledge and instruction following
🌍 Improved real-world agent and knowledge reliability performance
Smarter, sharper, still evolving. More Qwen3.6 models to come. Stay tuned! 🔗👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai/?models=qwen3.…
API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Qwen tweet media
152
378
3.7K
252.8K
Jake
Jake@JakeKAllDay·
@IrvingSwisher Gasoline prices were 4.4% of GDP at the peak of the 1980 shock. They’re typically less than 2% in today’s economy. Oil as a % of US energy consumed is vastly lower than it was before. Brent prices also =/= broader petroleum costs. open.substack.com/pub/jakekooker…
Jake tweet media
0
0
1
1.3K
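The share-of-GDP comparison in the tweet above is simple arithmetic: annual spend divided by nominal GDP. A sketch with clearly hypothetical inputs (the spend and GDP figures below are placeholders for illustration, not the tweet's underlying data):

```python
# Gasoline spend as a share of GDP: share = annual spend / nominal GDP.
# Inputs are illustrative placeholders only.
def gdp_share(annual_spend_usd: float, gdp_usd: float) -> float:
    return annual_spend_usd / gdp_usd

# e.g. a hypothetical $500B of gasoline spend against a $28T economy:
share = gdp_share(500e9, 28e12)
print(f"{share:.2%}")  # 1.79%
```

The tweet's point is that this ratio peaked near 4.4% in the 1980 shock and now typically sits under 2%, so a given price move bites the economy less than it once did.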
Jake
Jake@JakeKAllDay·
@VKMacro @stja42860 China is hoarding oil and has banned exports so it is logical to treat them as a separate, dislocated market.
0
0
1
56
VKMacro
VKMacro@VKMacro·
@stja42860 Not a different conclusion per se, but China inventories have been flat to up since the crisis began. Also, China demand can fluctuate significantly in the near term which offsets against RoW inventories.
3
0
3
684