David K. (@flopsy42) - Twitter Profile

David K.@flopsy42·2d

@initjean Funny reference

Jean P.D. Meijer ― 🇪🇺 eu/acc@initjean·2d

"Well, Claude Max 20x is actually now Max 5x. But it is better than Pro."

David K.@flopsy42·2d

@thsottiaux He is back 🙏

Tibo@thsottiaux·2d

Hello. We have reset Codex usage limits across all plans to let everyone experiment with the magnificent plugins we just launched, and because it had been a while! You can just build unlimited things with Codex. Have fun!

David K.@flopsy42·2d

A surprise is coming soon on deepgithub.com. You'll like it.

David K. retweeted

DeepManim@manimable·4d

TurboQuant. AI models waste massive memory on vectors. Compressing them usually adds overhead, defeating the purpose. Google's new paper uses just 1 extra bit to eliminate that overhead. Result: same accuracy, way less memory. Accepted at ICLR 2026. The trick? Random rotations + a 50-year-old math theorem. Here is a deepmanim.com overview of the paper. #manim

David K.@flopsy42·4d

@cursor_ai Great job, Cursor! Here is an overview of the paper. Deepmanim.com truly did justice to the paper.

Cursor@cursor_ai·5d

We're releasing a technical report describing how Composer 2 was trained.

David K.@flopsy42·5d

@AnthropicAI Great blog! Here is an overview made by deepmanim.com

Anthropic@AnthropicAI·5d

New on the Anthropic Engineering Blog: How we use a multi-agent harness to push Claude further in frontend design and long-running autonomous software engineering. Read more: anthropic.com/engineering/ha…

David K.@flopsy42·5d

@cursor_ai Here are great animations made by deepmanim.com x.com/manimable/stat…

DeepManim@manimable

Cursor's AI agents spend most of their time doing one thing: grep. A tool from 1973. When your codebase is small, it's fine. But Enterprise customers have monorepos where a single grep takes 15+ seconds. So Cursor built a new index, from scratch. Here is an overview made by deepmanim.com

Cursor@cursor_ai·6d

Cursor can now search millions of files and find results in milliseconds. This dramatically speeds up how fast agents complete tasks. We're sharing how we built Instant Grep, including the algorithms and tradeoffs behind the design.

David K. retweeted

DeepManim@manimable·6d

Cursor's AI agents spend most of their time doing one thing: grep. A tool from 1973. When your codebase is small, it's fine. But Enterprise customers have monorepos where a single grep takes 15+ seconds. So Cursor built a new index, from scratch. Here is an overview made by deepmanim.com

David K.@flopsy42·23 Mar

It's just like humans, pattern matching machines, with a bit of luck from time to time. @terrence_tao

DeepManim@manimable

EsoLang-Bench. A programmer who learned Python can pick up most new languages pretty fast. This study tested if AI can do the same. 80 problems across 5 esoteric languages. Same logic as Python. Result: 0% on Medium/Hard. All models. All strategies. What does this tell us about how AI learns vs how humans learn? deepmanim.com made an overview of the paper. try it out. #ai #intelligence #manim

David K. retweeted

DeepManim@manimable·23 Mar

EsoLang-Bench. A programmer who learned Python can pick up most new languages pretty fast. This study tested if AI can do the same. 80 problems across 5 esoteric languages. Same logic as Python. Result: 0% on Medium/Hard. All models. All strategies. What does this tell us about how AI learns vs how humans learn? deepmanim.com made an overview of the paper. try it out. #ai #intelligence #manim

David K. retweeted

Dwarkesh Patel@dwarkesh_sp·21 Mar

Terence Tao spent a year at the Institute for Advanced Study - no teaching, no random events or committees, just unlimited time to think. But after a few months, he ran out of ideas. Terence thinks that mathematicians and scientists need a certain level of randomness and inefficiency to come up with new ideas.

David K.@flopsy42·21 Mar

@amanrsanger @simonw Man how can you “miss” such an Elephant? I mean it’s a gargantuan Elephant

Aman Sanger@amanrsanger·20 Mar

We've evaluated a lot of base models on perplexity-based evals and Kimi k2.5 proved to be the strongest! After that, we do continued pre-training and high-compute RL (a 4x scale-up). The combination of the strong base, CPT and RL, and Fireworks' inference and RL samplers make Composer-2 frontier level. It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model.

Kimi.ai@Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

David K.@flopsy42·20 Mar

@cerebras GLM 5 on Cerebras when?

Cerebras@cerebras·20 Mar

Packed room. Line around the block. Why? Because everyone wants to go faster. Appreciate everyone who showed up.

David K.@flopsy42·20 Mar

@AiBattle_ That's a great occasion to find out whether the AA-Intelligence Index is broken or not. What is your personal experience?

AiBattle@AiBattle_·20 Mar

Mistral Small 4 (reasoning), a 119B 6.5A MoE model, has the same AA-Intelligence Index score as the dense Qwen-3.5-4B (reasoning) model. The Qwen-3.5-4B (non-reasoning) model has a higher score than the Mistral Small 4 (non-reasoning) model.

Artificial Analysis@ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index
@MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes. In reasoning mode, Mistral Small 4 scores 27 on the Artificial Analysis Intelligence Index, a 12-point improvement from Small 3.2 (15) and now among the most intelligent models Mistral has released, surpassing Mistral Large 3 (23) and matching the proprietary Magistral Medium 1.2 (27). However, it lags open weights peers with similar total parameter counts such as gpt-oss-120B (high, 33), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), and Qwen3.5 122B A10B (Reasoning, 42).
Key takeaways:
➤ Reasoning and non-reasoning modes in a single model: Mistral Small 4 supports configurable hybrid reasoning with reasoning and non-reasoning modes, rather than the separate reasoning variants Mistral has released previously with their Magistral models. In reasoning mode, the model scores 27 on the Artificial Analysis Intelligence Index. In non-reasoning mode, the model scores 19, a 4-point improvement from its predecessor Mistral Small 3.2 (15)
➤ More token efficient than peers of similar size: At ~52M output tokens, Mistral Small 4 (Reasoning) uses fewer tokens to run the Artificial Analysis Intelligence Index compared to reasoning models such as gpt-oss-120B (high, ~78M), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, ~110M), and Qwen3.5 122B A10B (Reasoning, ~91M). In non-reasoning mode, the model uses ~4M output tokens
➤ Native support for image input: Mistral Small 4 is a multimodal model, accepting image input as well as text. On our multimodal evaluation, MMMU-Pro, Mistral Small 4 (Reasoning) scores 57%, ahead of Mistral Large 3 (56%) but behind Qwen3.5 122B A10B (Reasoning, 75%). Neither gpt-oss-120B nor NVIDIA Nemotron 3 Super 120B A12B support image input. All models support text output only
➤ Improvement in real-world agentic tasks: Mistral Small 4 scores an Elo of 871 on GDPval-AA, our evaluation based on OpenAI's GDPval dataset that tests models on real-world tasks across 44 occupations and 9 major industries, with models producing deliverables such as documents, spreadsheets, and diagrams in an agentic loop. This is more than double the Elo of Small 3.2 (339) and close to Mistral Large 3 (880), but behind gpt-oss-120B (high, 962), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 1021), and Qwen3.5 122B A10B (Reasoning, 1130)
➤ Lower hallucination rate than peer models of similar size: Mistral Small 4 scores -30 on AA-Omniscience, our evaluation of knowledge reliability and hallucination, where scores range from -100 to 100 (higher is better) and a negative score indicates more incorrect than correct answers. Mistral Small 4 scores ahead of gpt-oss-120B (high, -50), Qwen3.5 122B A10B (Reasoning, -40), and NVIDIA Nemotron 3 Super 120B A12B (Reasoning, -42)
Key model details:
➤ Context window: 256K tokens (up from 128K on Small 3.2)
➤ Pricing: $0.15/$0.6 per 1M input/output tokens
➤ Availability: Mistral first-party API only. At native FP8 precision, Mistral Small 4's 119B parameters require ~119GB to self-host the weights (more than the 80GB of HBM3 memory on a single NVIDIA H100)
➤ Modality: Image and text input with text output only
➤ Licensing: Apache 2.0 license
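
As a rough check on the self-hosting figure in the quoted post, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is illustrative only: the function names are hypothetical, it counts weights alone (no KV cache, activations, or runtime overhead), and it uses decimal gigabytes.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(weights_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count if only the weights had to fit."""
    return math.ceil(weights_gb / gpu_memory_gb)


# Figures taken from the quoted post: 119B parameters, FP8 (1 byte each),
# and 80 GB of HBM3 on a single NVIDIA H100.
fp8_gb = weight_memory_gb(119e9, 1)
print(f"FP8 weights: ~{fp8_gb:.0f} GB")
print(f"Fits on one 80 GB GPU: {fp8_gb <= 80}")
print(f"Minimum GPUs for weights alone: {gpus_needed(fp8_gb, 80)}")
```

In practice the real requirement is higher than this lower bound, since serving also needs memory for the KV cache and activations.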

David K.@flopsy42·20 Mar

@Angaisb_ @adonis_singh It has the big model stink?

Angel 🌼@Angaisb_·20 Mar

@adonis_singh I've also noticed it, I like Sonnet 4.6 much more, and for some reason (at least for me) it feels bigger

adi@adonis_singh·20 Mar

sonnet 4.6 feels qualitatively different than opus 4.6 in a way that is not just explained by 'smaller model'

David K.@flopsy42·20 Mar

@adonis_singh In what sense?

David K.@flopsy42·20 Mar

@thegenioo I think “europe” is one big reason

Hamza@thegenioo·20 Mar

@flopsy42 you think that’s the reason?

Hamza@thegenioo·20 Mar

Mistral is xAI of Open source labs

Artificial Analysis@ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index…

David K.@flopsy42·20 Mar

@theo That's kinda the curse of building in Europe: a lot of things just slow you down

Theo - t3.gg@theo·20 Mar

Since OpenAI dropped gpt-oss-120b, Mistral has released 4 models that are worse than gpt-oss-120b

Artificial Analysis@ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index…

David K.@flopsy42·20 Mar

@thegenioo No, the 1M context of Llama 4 doesn't help Llama. 👀

Hamza@thegenioo·20 Mar

@flopsy42 bcz it has 1M compared to 240k i think

Hamza@thegenioo·19 Mar

Idk how but MiniMax M2.7 is a better model on most benchmarks than Xiaomi MiMo-V2-Pro while being marginally cheaper too. I do see that MiMo-V2-Pro gives better UI sometimes but not as reliable overall as M2.7 in coding
