DeepBurner
@Deep_Burner
ML/CV engineer.
539 posts · Joined January 2025
634 Following · 39 Followers

DeepBurner@Deep_Burner·
@RogueWPA There's a large-ish RW substack called Reality's Last Stand, so yep
Cicada meth orgy fungus
Just remembered that twenty years ago very online libs described themselves as the "reality based community." Today if there were a substack or something called that, you'd know it was center-right, or at least anti-woke center-left.

Sauers@Sauers_·
@Deep_Burner Yes! Claude and I respond: "A jet is the data structure that forward-mode autodiff operates on. Every intermediate variable in the computation is also a jet."

Sauers@Sauers_·
A jet is just a function value with its derivatives up to some order. E.g., a 3rd-order jet is the tuple (f(x), f'(x), f''(x), f'''(x)). The Wikipedia page is difficult to understand for unknown reasons
Sauers tweet media
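The tuple arithmetic described above is easy to make concrete. A minimal sketch of a jet for forward-mode autodiff (the `Jet` class and its names are illustrative, not any particular library's API): store the truncated Taylor coefficients c_i = f^(i)(x)/i! and propagate them through arithmetic, then multiply back by i! to read off derivatives.

```python
from math import factorial

class Jet:
    """A function value with its derivatives up to some order,
    stored as Taylor coefficients c_i = f^(i)(x)/i!."""
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)

    @classmethod
    def variable(cls, x, order):
        # The identity function t -> t expanded at x: (x, 1, 0, ..., 0)
        return cls([x, 1.0] + [0.0] * (order - 1))

    def __add__(self, other):
        return Jet([a + b for a, b in zip(self.coeffs, other.coeffs)])

    def __mul__(self, other):
        # Cauchy product of Taylor series, truncated at the jet's order
        n = len(self.coeffs)
        out = [0.0] * n
        for i in range(n):
            for j in range(n - i):
                out[i + j] += self.coeffs[i] * other.coeffs[j]
        return Jet(out)

    def derivatives(self):
        # Recover (f(x), f'(x), f''(x), ...) from the coefficients
        return [c * factorial(i) for i, c in enumerate(self.coeffs)]

# f(x) = x^3 at x = 2: every intermediate product is itself a jet
x = Jet.variable(2.0, order=3)
f = x * x * x
print(f.derivatives())  # [8.0, 12.0, 12.0, 6.0]
```

This matches the "every intermediate variable is also a jet" framing: `x * x` is already a jet, and one pass through the computation yields all derivatives up to the chosen order.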
Benjamin Todd@ben_j_todd·
How is it possible to write a substack with 6000+ likes where the main message is “LeCun is right about everything”?
Benjamin Todd tweet media

DeepBurner@Deep_Burner·
@actsmaniac The algo is hyper-personalized now. I liked one post on Project Hail Mary and it's now 20 percent of my feed
space cadet 🇪🇺🌐🇩🇪
is it just my algorithm, or has the amount of discussion of rent control on politics twitter significantly increased? I figured YIMBYism had already won on zoning and supply questions; the next big hurdle for efficient housing markets is strict rent regulation.

davinci@leothecurious·
> Ilya believes vision is a "solved" problem and focuses on language

SSI ngmi if true
Kyle Chan@kyleichan

I’ve been working my way through this epic 7-hour interview with Xie Saining at AMI Labs. I also asked Gemini to give me top 10 takeaways. Biggest ones are that he turned down Ilya twice and believes world models, not LLMs, are the key to AGI.

1. Non-Linear Path to AI & Academic Freedom: Xie emphasizes that his journey wasn't a standard, hyper-competitive path of a "genius." During his time in Shanghai Jiao Tong University's ACM class, his "highlight" was playing video games in his dorm, teaching him the value of unstructured exploration over rigid academic competition [11:46]. He believes the best research is never linear; if a project ends exactly how you initially planned it, it's likely a "boring idea" [02:09:58].

2. Rejecting OpenAI & Ilya Sutskever (Twice!): In 2018, Xie turned down a job offer from OpenAI in favor of Facebook AI Research (FAIR), which led to an angry phone call from Ilya Sutskever [01:21:04]. More recently, he declined an invitation to join Ilya's new startup, SSI, because of a fundamental philosophical disagreement: Ilya believes vision is a "solved" problem and focuses on language, while Xie believes vision and physical world modeling are the true frontiers of AI [01:25:57].

3. Silicon Valley is "LLM-Pilled": Xie argues that the tech industry is currently hypnotized by Large Language Models (LLMs) [05:46:51]. While he acknowledges LLMs are revolutionary communication tools, he insists they are not true "world models" because they operate purely in a digital, text-based space and lack the ability to process high-dimensional, noisy, continuous signals of the physical world [04:29:36].

4. The Definition of a True World Model: According to Xie, a true world model must go beyond text and video generation. It must be a "predictive brain" that understands the physical world, possesses associative memory, can reason and plan, and can predict the consequences of actions in the real world [04:31:32].

5. Founding AMI Labs with Yann LeCun: Disillusioned by the current Silicon Valley narrative that treats AI research as a "finite game" of benchmark-chasing and product cycles, Xie co-founded AMI Labs with Turing Award winner Yann LeCun [05:00:58]. The startup acts as an "underdog" aiming to build true predictive world models based on LeCun's JEPA (Joint Embedding Predictive Architecture) vision, separate from the dominant LLM narrative [06:04:42].

Stefan Schubert@StefanFSchubert·
Big gender gap among young Spanish voters, too
Stefan Schubert tweet media

Chase Brower@ChaseBrowe32432·
I painstakingly ran all 20 EsoLang-Bench hard problems through Claude webui. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples, it just solves them natively. This benchmark just suffocated the models with constrictive scaffolding.
Lossfunk@lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

DeepBurner@Deep_Burner·
@peterrhague Yeah. And a yearly show will start to get tired eventually

DeepBurner@Deep_Burner·
@torchcompiled Yeah I seriously need to block that emoji and the word breaking, would massively improve the feed

Ethan@torchcompiled·
90% of my timeline is now “🚨BREAKING” followed by the most lukewarm sensationalized misinformation I’ve seen. What a terrible meta

Eric Glyman@eglyman·
If your kid’s lemonade stand processes 0.5–1% of US GDP, then yes, that’s a fair analogy for @tryramp. Ramp’s data is useful for the same reason it gets cited at all: it is quite consistent with the revenue figures OpenAI and Anthropic release. If it weren’t, no one would care.
Eric Glyman tweet media

DeepBurner@Deep_Burner·
It makes sense for academia to be credentialist, but it can be really frustrating to see; escaping that is one of the aspects I like most about being in industry.
Freda Shi@fredahshi

Our workshop was rejected by #ICML2026. Despite having 3 professors (2 full profs) and 2 senior research scientists, the only reason for rejection was "you got an undergrad on the organizing committee," who is actually a highly competent incoming PhD student. (1/)

DeepBurner@Deep_Burner·
@deanwball I think you're the only person I've heard using it

Dean W. Ball@deanwball·
I hope that in the “refocusing” OpenAI does not drop Pulse, which I find insanely useful for surfacing important but under-the-radar news items almost daily.

DeepBurner@Deep_Burner·
@crthpl_ @LinkofSunshine I think most people really just want it to learn from their interactions with it, so it isn't quite there yet.

theo@crthpl_·
@Deep_Burner @LinkofSunshine I think in-context learning and/or continued post-training are more than enough to satisfy that criterion

Basil🧡@LinkofSunshine·
For real though, I think Claude is obviously AGI. Not sure what else AGI would look like

Basil🧡@LinkofSunshine·
See the Wikipedia definition from 2003
Basil🧡 tweet media

DeepBurner@Deep_Burner·
@mattyglesias From internet discourse I would've guessed it's closer to 50%!

Matthew Yglesias@mattyglesias·
18 percent of people say it’s morally wrong to have billions of dollars
Matthew Yglesias tweet media

stephen balaban@stephenbalaban·
@yacineMTB For me it was the AlexNet paper and Graves 2013, but DeepDream really showed me how much compute was needed. Lambda ran Dreamscope, which was probably the most popular DeepDream app, and had to build a cluster to run it. That was the start of Lambda's cloud.
stephen balaban tweet media

kache@yacineMTB·
this is where it all began, for me
kache tweet media

DeepBurner@Deep_Burner·
The EU needs to seriously face the possibility that we just don't have a frontier lab anymore
Artificial Analysis@ArtificialAnlys

Mistral has released Mistral Small 4, an open weights model with hybrid reasoning and image input, scoring 27 on the Artificial Analysis Intelligence Index.

@MistralAI's Small 4 is a 119B mixture-of-experts model with 6.5B active parameters per token, supporting both reasoning and non-reasoning modes. In reasoning mode, Mistral Small 4 scores 27 on the Artificial Analysis Intelligence Index, a 12-point improvement from Small 3.2 (15), and is now among the most intelligent models Mistral has released, surpassing Mistral Large 3 (23) and matching the proprietary Magistral Medium 1.2 (27). However, it lags open weights peers with similar total parameter counts such as gpt-oss-120B (high, 33), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 36), and Qwen3.5 122B A10B (Reasoning, 42).

Key takeaways:

➤ Reasoning and non-reasoning modes in a single model: Mistral Small 4 supports configurable hybrid reasoning with reasoning and non-reasoning modes, rather than the separate reasoning variants Mistral has released previously with their Magistral models. In reasoning mode, the model scores 27 on the Artificial Analysis Intelligence Index. In non-reasoning mode, the model scores 19, a 4-point improvement from its predecessor Mistral Small 3.2 (15).

➤ More token efficient than peers of similar size: At ~52M output tokens, Mistral Small 4 (Reasoning) uses fewer tokens to run the Artificial Analysis Intelligence Index than reasoning models such as gpt-oss-120B (high, ~78M), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, ~110M), and Qwen3.5 122B A10B (Reasoning, ~91M). In non-reasoning mode, the model uses ~4M output tokens.

➤ Native support for image input: Mistral Small 4 is a multimodal model, accepting image input as well as text. On our multimodal evaluation, MMMU-Pro, Mistral Small 4 (Reasoning) scores 57%, ahead of Mistral Large 3 (56%) but behind Qwen3.5 122B A10B (Reasoning, 75%). Neither gpt-oss-120B nor NVIDIA Nemotron 3 Super 120B A12B supports image input. All models support text output only.

➤ Improvement in real-world agentic tasks: Mistral Small 4 scores an Elo of 871 on GDPval-AA, our evaluation based on OpenAI's GDPval dataset that tests models on real-world tasks across 44 occupations and 9 major industries, with models producing deliverables such as documents, spreadsheets, and diagrams in an agentic loop. This is more than double the Elo of Small 3.2 (339) and close to Mistral Large 3 (880), but behind gpt-oss-120B (high, 962), NVIDIA Nemotron 3 Super 120B A12B (Reasoning, 1021), and Qwen3.5 122B A10B (Reasoning, 1130).

➤ Lower hallucination rate than peer models of similar size: Mistral Small 4 scores -30 on AA-Omniscience, our evaluation of knowledge reliability and hallucination, where scores range from -100 to 100 (higher is better) and a negative score indicates more incorrect than correct answers. Mistral Small 4 scores ahead of gpt-oss-120B (high, -50), Qwen3.5 122B A10B (Reasoning, -40), and NVIDIA Nemotron 3 Super 120B A12B (Reasoning, -42).

Key model details:

➤ Context window: 256K tokens (up from 128K on Small 3.2)
➤ Pricing: $0.15/$0.6 per 1M input/output tokens
➤ Availability: Mistral first-party API only. At native FP8 precision, Mistral Small 4's 119B parameters require ~119GB to self-host the weights (more than the 80GB of HBM3 memory on a single NVIDIA H100)
➤ Modality: Image and text input with text output only
➤ Licensing: Apache 2.0 license

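The self-hosting claim in the quoted thread is simple arithmetic worth making explicit. A back-of-envelope sketch, assuming 1 byte per parameter at FP8 and counting weights only (no KV cache, activations, or framework overhead):

```python
# Weights-only memory estimate for Mistral Small 4 at FP8.
# Assumptions: 1 byte/parameter (FP8 = 8 bits); all 119B MoE parameters
# must be resident even though only 6.5B are active per token.
total_params = 119e9
bytes_per_param = 1
weights_gb = total_params * bytes_per_param / 1e9

h100_hbm_gb = 80  # HBM3 capacity of a single NVIDIA H100
print(f"weights ~ {weights_gb:.0f} GB; fits on one H100? {weights_gb <= h100_hbm_gb}")
# → weights ~ 119 GB; fits on one H100? False
```

This is why the thread notes that the 119GB of weights exceed a single H100's 80GB of HBM3: with MoE models, sparsity cuts per-token compute, not the memory footprint of the resident weights.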