Interconnects

324 posts

Interconnects

@interconnectsai

What you need to know about AI research trends, from @natolambert. Weekly on Wednesday mornings, with occasional extra posts.

Joined June 2023
2 Following · 8.1K Followers
Interconnects reposted
Nathan Lambert @natolambert
Open models, what comes next:
- Don't rely on open models catching the frontier
- The change from models to systems (tools/harness)
- Business models supporting open are far from viable (except Nvidia)
- How to change from a few key weights to a winning ecosystem
interconnects.ai/p/the-next-pha…
Replies: 10 · Reposts: 8 · Likes: 96 · Views: 18.3K
Interconnects reposted
Nathan Lambert @natolambert
For people who are just learning about Nemotron with the awesome Nemotron 3 Super drop, I recommend watching this interview I did with @ctnzr -- Nemotron as a project has been a LONG time coming. youtube.com/watch?v=Y3Vb6e…
Replies: 3 · Reposts: 11 · Likes: 105 · Views: 13.4K
Interconnects @interconnectsai
Dean Ball on open models and government control
Subtle precedents on the future of open models set by the unfolding Anthropic v. Department of War case. interconnects.ai/p/how-anthropi…
Replies: 0 · Reposts: 0 · Likes: 13 · Views: 10.9K
Interconnects @interconnectsai
Latest open artifacts (#19): @Alibaba_Qwen 3.5, @Zai_org GLM 5, @MiniMax_AI 2.5 — Chinese labs' latest push of the frontier.
Featuring breakdown & analysis of:
- Alibaba's Qwen 3.5 (from 0.8B to 397B), Z.ai's GLM-5 (744B), and @StepFun_ai's Step-3.5-Flash.
- Plus: introducing our Relative Adoption Metrics (RAM) to track underrated models like GPT-OSS.
- And covering new releases from: @MistralAI, @perplexity_ai, @cohere, @TrillionLabs, @OpenBMB, @nanbeige, @TheInclusionAI, @liquidai, @intern_lm, @JD_Corporate, and @meituan
By @natolambert and @xeophon
Replies: 2 · Reposts: 10 · Likes: 39 · Views: 8.2K
Interconnects reposted
Nathan Lambert @natolambert
Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier.
We're starting to roll out more analysis with the relative adoption metric (RAM). Winners: GPT-OSS, K2 Thinking, OCR models. Losers: DeepSeek v3.2. interconnects.ai/p/latest-open-…
Replies: 0 · Reposts: 11 · Likes: 78 · Views: 7.4K
Interconnects reposted
Nathan Lambert @natolambert
Open models are in a perpetual race to stay relevant at the frontier. While they're doing better than I, and many experts, would expect given the cost of building models, I don't see evidence that open models are accelerating past the best closed models. interconnects.ai/p/open-models-…
Replies: 22 · Reposts: 8 · Likes: 145 · Views: 50.5K
Interconnects @interconnectsai
Open models in perpetual catch-up
The open-closed gap, distillation, innovation timescales, how open models win, specialized models, what's missing, etc. interconnects.ai/p/open-models-…
Replies: 0 · Reposts: 1 · Likes: 7 · Views: 735
Interconnects reposted
Nathan Lambert @natolambert
After a long time testing the new Opus 4.6 and Codex 5.3 models, the most striking thing was how much trickier model releases are to read in 2026. I'm in my post-benchmark era. Claude is still king, but Codex is closer than ever. interconnects.ai/p/opus-46-vs-c…
Replies: 19 · Reposts: 24 · Likes: 263 · Views: 56.1K
Interconnects reposted
Nathan Lambert @natolambert
Top 100 LLMs by Downloads Since August 2025
Source: @interconnectsai HuggingFace Snapshots
Model list on GitHub: Interconnects-AI/tracked-models (~1.5K models)
Featuring: @alibaba_qwen: 40, @AIatMeta: 13, @deepseek_ai: 10, @Microsoft: 8, @GoogleAI: 7, @mistralai: 4, @OpenAI: 2, @allen_ai: 2, @vikhyatk: 1, @NVIDIAAI: 1, @huggingface: 1, @Zai_org: 1, @TencentGlobal: 1
1. meta-llama/Llama-3.1-8B-Instruct - 53.3M
2. Qwen/Qwen2.5-7B-Instruct - 52.4M
3. Qwen/Qwen2.5-VL-3B-Instruct - 49.5M
4. Qwen/Qwen2.5-3B-Instruct - 46.3M
5. Qwen/Qwen3-0.6B - 45.6M
6. openai/gpt-oss-20b - 43.1M
7. Qwen/Qwen2.5-1.5B-Instruct - 32.6M
8. meta-llama/Llama-3.2-1B-Instruct - 27.6M
9. Qwen/Qwen3-8B - 24.0M
10. Qwen/Qwen2.5-VL-7B-Instruct - 23.3M
11. openai/gpt-oss-120b - 22.3M
12. google/gemma-3-1b-it - 20.7M
13. Qwen/Qwen3-4B-Instruct-2507 - 19.7M
14. google/t5gemma-b-b-prefixlm - 17.5M
15. Qwen/Qwen3-4B - 17.1M
16. Qwen/Qwen3-32B - 15.5M
17. meta-llama/Llama-3.2-1B - 15.3M
18. Qwen/Qwen2-VL-2B-Instruct - 15.2M
19. Qwen/Qwen3-1.7B - 15.1M
20. deepseek-ai/DeepSeek-OCR - 15.0M
21. deepseek-ai/DeepSeek-R1-Distill-Qwen-32B - 14.5M
22. mistralai/Mistral-7B-Instruct-v0.2 - 13.3M
23. Qwen/Qwen3-Next-80B-A3B-Instruct - 13.0M
24. Qwen/Qwen2.5-0.5B-Instruct - 12.8M
25. meta-llama/Meta-Llama-3-8B - 12.0M
26. Qwen/Qwen2.5-Coder-0.5B-Instruct - 11.7M
27. meta-llama/Llama-3.2-3B-Instruct - 11.6M
28. vikhyatk/moondream2 - 11.2M
29. Qwen/Qwen2.5-14B-Instruct - 10.3M
30. Qwen/Qwen2.5-32B-Instruct - 9.2M
31. Qwen/Qwen3-VL-8B-Instruct - 8.7M
32. Qwen/Qwen2-VL-7B-Instruct - 8.6M
33. Qwen/Qwen2.5-7B - 8.5M
34. microsoft/Phi-3-mini-4k-instruct - 8.0M
35. meta-llama/Meta-Llama-3-8B-Instruct - 7.7M
36. google/gemma-3-27b-it - 7.7M
37. google/gemma-3-12b-it - 7.1M
38. llava-hf/llava-1.5-7b-hf - 7.1M
39. deepseek-ai/DeepSeek-R1-Distill-Llama-8B - 7.0M
40. google/gemma-3-4b-it - 7.0M
41. Qwen/Qwen3-VL-30B-A3B-Instruct - 6.9M
42. Qwen/Qwen3-4B-Base - 6.9M
43. deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B - 6.8M
44. Qwen/Qwen2.5-0.5B - 6.8M
45. meta-llama/Llama-3.1-8B - 6.8M
46. OpenGVLab/InternVL2-2B - 6.7M
47. Qwen/Qwen3-30B-A3B-Instruct-2507 - 6.5M
48. nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 - 6.2M
49. mistralai/Mistral-7B-Instruct-v0.3 - 6.2M
50. Qwen/Qwen2.5-VL-32B-Instruct - 6.2M
51. deepseek-ai/DeepSeek-R1-Distill-Qwen-7B - 6.0M
52. microsoft/phi-2 - 6.0M
53. Qwen/Qwen3-14B - 5.8M
54. meta-llama/Llama-2-7b-hf - 5.5M
55. Qwen/Qwen2-1.5B-Instruct - 5.5M
56. microsoft/Florence-2-large - 5.3M
57. HuggingFaceTB/SmolLM2-135M - 4.8M
58. microsoft/phi-4 - 4.7M
59. meta-llama/Llama-3.1-70B-Instruct - 4.6M
60. zai-org/chatglm2-6b - 4.2M
61. Qwen/Qwen2.5-Coder-7B-Instruct - 4.2M
62. rednote-hilab/dots.ocr - 4.1M
63. OpenGVLab/InternVL3_5-241B-A28B-Instruct - 4.1M
64. Qwen/Qwen2.5-1.5B - 4.0M
65. OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF - 3.9M
66. meta-llama/Llama-2-7b-chat-hf - 3.9M
67. Qwen/Qwen3-Coder-30B-A3B-Instruct - 3.9M
68. deepseek-ai/DeepSeek-R1 - 3.8M
69. mistralai/Mistral-Small-24B-Instruct-2501 - 3.7M
70. microsoft/Phi-3.5-vision-instruct - 3.7M
71. meta-llama/Llama-3.3-70B-Instruct - 3.7M
72. deepseek-ai/DeepSeek-V3 - 3.6M
73. OpenGVLab/InternVL3-78B - 3.6M
74. deepseek-ai/DeepSeek-R1-0528 - 3.5M
75. OpenGVLab/InternVL3-14B - 3.5M
76. Qwen/Qwen3-30B-A3B - 3.2M
77. Qwen/Qwen3-VL-2B-Instruct - 3.2M
78. meta-llama/Llama-3.2-3B - 3.2M
79. microsoft/Florence-2-base - 3.2M
80. google/paligemma2-3b-pt-224 - 3.2M
81. allenai/OLMo-2-0425-1B - 3.1M
82. Qwen/Qwen3-VL-32B-Instruct - 3.0M
83. tencent/HunyuanOCR - 3.0M
84. OpenGVLab/InternVL2-1B - 2.9M
85. Qwen/Qwen3-8B-Base - 2.8M
86. Qwen/Qwen2.5-VL-72B-Instruct - 2.8M
87. google/gemma-2-2b-it - 2.7M
88. llava-hf/llava-v1.6-mistral-7b-hf - 2.7M
89. microsoft/Phi-4-multimodal-instruct - 2.7M
90. mistralai/Mixtral-8x7B-Instruct-v0.1 - 2.7M
91. Qwen/Qwen3-VL-4B-Instruct - 2.7M
92. Qwen/Qwen2.5-Coder-1.5B - 2.7M
93. meta-llama/Llama-3.2-11B-Vision-Instruct - 2.6M
94. Qwen/Qwen2-0.5B - 2.6M
95. Qwen/Qwen3-0.6B-Base - 2.5M
96. Qwen/Qwen3-4B-Thinking-2507 - 2.5M
97. deepseek-ai/DeepSeek-R1-Distill-Llama-70B - 2.5M
98. deepseek-ai/deepseek-coder-1.3b-instruct - 2.4M
99. microsoft/Phi-3-mini-128k-instruct - 2.4M
100. allenai/olmOCR-2-7B-1025-FP8 - 2.3M
Replies: 13 · Reposts: 9 · Likes: 105 · Views: 76.7K
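The leaderboard in the thread above is just models sorted by cumulative download count. A minimal sketch of that ranking-and-formatting step, with illustrative sample numbers rather than the actual snapshot data (in practice the counts could come from a source such as the Hugging Face Hub API, e.g. `huggingface_hub.model_info(...).downloads`):

```python
# Sketch: rank models by download count and format a leaderboard.
# The sample numbers below are illustrative, not the real snapshot data.
sample_downloads = {
    "meta-llama/Llama-3.1-8B-Instruct": 53_300_000,
    "Qwen/Qwen2.5-7B-Instruct": 52_400_000,
    "openai/gpt-oss-20b": 43_100_000,
}

def format_leaderboard(downloads: dict[str, int]) -> list[str]:
    """Sort models by downloads (descending) and render '<rank>. <id> - <N.N>M' lines."""
    ranked = sorted(downloads.items(), key=lambda kv: kv[1], reverse=True)
    return [
        f"{rank}. {model} - {count / 1e6:.1f}M"
        for rank, (model, count) in enumerate(ranked, start=1)
    ]

for line in format_leaderboard(sample_downloads):
    print(line)  # e.g. "1. meta-llama/Llama-3.1-8B-Instruct - 53.3M"
```

This reproduces the "rank. org/model - downloads" layout of the tweet; the actual snapshot pipeline (the Interconnects-AI/tracked-models list) is assumed, not shown.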
Interconnects @interconnectsai
Why did @NVIDIA build Megatron? 🤖⚡ @ctnzr breaks down the origin story of the project that proved state-of-the-art Transformers could be built on NVIDIA hardware. The name? Let’s just say they wanted the "biggest and baddest" Transformer out there. 🦾
Replies: 0 · Reposts: 1 · Likes: 16 · Views: 1.4K