Mu @__munael

349 posts

Applying science in AI and Informatics. Reading stories, writing some. Personal account; employer not involved; RT =/= Endorsement; etc.

Joined July 2018
3.3K Following · 140 Followers
Mu @__munael
@ChaseBrowe32432 @NigelHiggs7 Claude's chat interface now has access to storage and code execution. It's been using them more frequently than usual in my case, even with no memory and without being instructed to in the global instructions. So it's not purely chat at this point. Not saying it ran code (I'll check the traces), but...
Chase Brower @ChaseBrowe32432
@NigelHiggs7 It does, but I'd make two comments. 1) It only has the degree of tooling sensible for a general chat interface, not coding-specialized tooling (like Claude Code). 2) I don't buy that these problems are tractable for a human given no interpreter access.
Chase Brower @ChaseBrowe32432
I painstakingly ran all 20 EsoLang-Bench hard problems through the Claude web UI. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples; it just solves them natively. The benchmark simply suffocated the models with constrictive scaffolding.
Lossfunk @lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

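The "interpreter access" point in this thread is easy to make concrete: esolang programs are tedious to dry-run in your head but trivial to check mechanically with a few lines of code. Below is a minimal interpreter for a made-up stack-based esolang — the language and its ops are invented here purely for illustration and are not the actual EsoLang-Bench languages:

```python
def run(program: str, stack=None):
    """Interpret a tiny hypothetical stack language:
    digits push their value, '+' adds the top two items,
    '*' multiplies them, 'd' duplicates the top, 'p' pops it."""
    stack = list(stack or [])
    for op in program:
        if op.isdigit():
            stack.append(int(op))
        elif op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "d":
            stack.append(stack[-1])
        elif op == "p":
            stack.pop()
    return stack
```

For example, `run("34+")` returns `[7]` and `run("5d*")` returns `[25]` — checking a candidate solution takes milliseconds with the interpreter, versus careful manual tracing without it.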
Mu @__munael
@teortaxesTex What's the analogous work by DeepSeek?
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
First time ever, probably, that Kimi just straight up eclipses a major analogous work from DeepSeek. Not "same but different" or "same but bigger scale" or something, but "qualitatively stronger ideas". They have grown truly formidable.
Clifford Richardson @CorvusCrypto
Currently experimenting with a hybrid SSM-attention module. So far results are quite shit. Maybe another experiment that leads to failure. I'm going to let it run overnight and see what I get anyway. Quick sketch of the mechanism in case people want to try it out. H = SSM state
Clifford Richardson tweet media
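The tweet gives only a sketch ("H = SSM state"), so the wiring below is a guess, not the author's mechanism: a toy diagonal linear SSM scan mixed with single-head causal self-attention through a fixed gate, in plain NumPy. Every shape, the identity projections, and the convex-mix combination are assumptions made for illustration.

```python
import numpy as np

def ssm_scan(x, a, b):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t (elementwise)."""
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = a * h + b * x[t]
        out[t] = h
    return out

def attention(x):
    """Single-head causal self-attention with identity Q/K/V (toy)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def hybrid_block(x, a, b, gate=0.5):
    """One hypothetical hybrid wiring: convex mix of the two branches."""
    return gate * ssm_scan(x, a, b) + (1 - gate) * attention(x)
```

Other wirings (sequential SSM-then-attention, learned gates, per-head mixing) are equally plausible readings of the sketch.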
Mu @__munael
@Dorialexander @maciejpioro How would you operationalize that for published research though? 🤔 Would you have to rely on frontier LLMs for the "vibe eval"? Probably cost prohibitive for most architecture research.
Alexander Doria @Dorialexander
@maciejpioro I think iterating directly on the outputs ("vibe" eval) would actually turn out much better.
Alexander Doria @Dorialexander
i'm afraid autoresearch is going to make RL reward hacking look tame…
Mu @__munael
@viperr How did you get around the 800 limit? 🧐
typedfemale @typedfemale
til wandb charges 1 dollar per hour per model - what the hell is this pricing model?!
Mu @__munael
The economics of these categories of companies are disparate. I hold that the more coherent path to AI research is specifically public-funded, nonprofit, and for-public-good. Google, if they were "good", could pull it off. A new for-profit (like OAI)—I fail to see how they could.
Andrej Karpathy @karpathy

A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today.

Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for.

The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs top (math/algorithms) to bottom (megakernels/related), they have a great eye for talent and I think will be able to build something very special. Congrats on the launch and I look forward to what you come up with!

Mu retweeted
Xeophon @xeophon
If you are an (AI) researcher, it’s crucial to think about the implications of your research. I think this post from @giffmana is really thought-provoking:
Xeophon tweet media
Mu @__munael
@rosinality RLMs with a "restart LM only" model-callable command?
Rosinality @rosinality
Context management using tool use. The question I have is whether this is a temporary mitigation or not.
Rosinality tweet media
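The "restart LM only" reply above can be sketched as a tool the model itself may call: wipe the LM-visible message history while the system prompt and tool-side state survive the restart. This is a hypothetical Python sketch — the class, tool names, and message format are all invented here; no specific vendor API is implied:

```python
class ToolManagedContext:
    """Hypothetical chat-loop state where the model can call a
    'restart' tool: clear the message history (the LM's context)
    while keeping the system prompt and an external scratchpad."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.messages = []       # LM-visible conversation history
        self.scratchpad = {}     # tool-side state that survives restarts

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def handle_tool_call(self, name: str, args: dict):
        if name == "restart":
            # Restart the LM only: wipe its context, keep external state.
            summary = args.get("carry_over", "")
            self.messages = []
            if summary:
                self.add("user", f"(carried over) {summary}")
        elif name == "remember":
            self.scratchpad.update(args)

    def prompt(self):
        """Messages actually sent to the LM on the next turn."""
        return [{"role": "system", "content": self.system_prompt},
                *self.messages]
```

Whether this is a temporary mitigation or a durable pattern is exactly the open question in the quoted tweet; the sketch only shows the mechanics.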
Mu @__munael
@MiniMax_AI Do you have these articles posted anywhere outside X/Twitter? It's very hard to read and share them outside the app, and many in the field don't use it. I searched online for the titles of a couple of your articles but couldn't find them.
Yifan Zhang @yifan_zhang_
V4 Lite now live in the app. 1M context length. Text-only. Muon + mHC confirmed. Larger version is still on the way. @zephyr_z9 @teortaxesTex
Amr Awadallah 🤖 @awadallah
Cairo University Entrepreneurs: come join us this Tuesday 11:30am at Cairo University to learn about creating global companies.
Amr Awadallah 🤖 tweet media
Mu @__munael
@rosinality Can you elaborate? Is this referring to a recent development or to the "history of embeddings"?
Rosinality @rosinality
Synchronization of research topics frequently happens, but I think the embedding layer is an almost exceptional case. Did people discuss this at conferences, or was there a rumor that frontier companies were working on this?
Mu retweeted
Jonathan Gorard @getjonwithit
Like @davidbessis and others, I think that Hinton is wrong. To explain why, let me tell you a brief story. About a decade ago, in 2017, I developed an automated theorem-proving framework that was ultimately integrated into Mathematica (see: youtube.com/watch?v=mMaid2…) (1/15)
YouTube video
vitrupo @vitrupo

Geoffrey Hinton says mathematics is a closed system, so AIs can play it like a game. They can pose problems to themselves, test proofs, and learn from what works, without relying on human examples. “I think AI will get much better at mathematics than people, maybe in the next 10 years or so.”

Mu @__munael
@soft_fox_lad How is this (not?) different from having a singleton of a configuration class accessible by all code (if you can't keep passing it around)? With wrappers, namespacing may be easier without custom handlers in the config singleton, but sharing variables is harder. Or is it something else?
fox @soft_fox_lad
A trick for writing high-performing (in terms of quality more than speed) algorithms that I stole from Stockfish: wrapper types for any arbitrary variable. Whenever you have a variable of significance, you define it using that wrapper type. In normal builds, the wrapper is made…
Mu @__munael
@UnslothAI Those numbers look very enticing! Are there any docs or a tutorial on extending your implementations to run a model with a custom architecture and data loaders?
Unsloth AI @UnslothAI
You can now train LLMs 3× faster with no accuracy loss, via our new RoPE and MLP kernels. Our Triton kernels plus smart auto packing deliver ~3× faster training & 30% less VRAM vs optimized FA3 setups. Train Qwen3-4B 3× faster on just 3.9GB VRAM. Blog: docs.unsloth.ai/new/3x-faster-…
Unsloth AI tweet media
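The tweet doesn't specify what "smart auto packing" does internally; a common baseline it likely resembles is greedy first-fit packing of variable-length sequences into fixed-length batch rows, which cuts the padding a naive one-sequence-per-row batch wastes. A sketch under that assumption (not Unsloth's actual implementation):

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit packing: place each sequence into the first
    row that still has room, so fewer pad tokens are wasted."""
    bins = []        # remaining capacity of each batch row
    assignment = []  # row index chosen for each sequence
    for length in lengths:
        for i, free in enumerate(bins):
            if length <= free:
                bins[i] -= length
                assignment.append(i)
                break
        else:
            # No existing row fits: open a new one.
            bins.append(max_len - length)
            assignment.append(len(bins) - 1)
    return assignment, len(bins)

# e.g. packing [700, 300, 512, 200, 900] into 1024-token rows
# uses 3 rows instead of the naive 5.
```

Real implementations also need attention masks (or position-id resets) so packed sequences don't attend to each other, which is where the custom kernels come in.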
Mu retweeted
Chuang Gan @gan_chuang
ICLR has placed OpenReview in a difficult position, so I want to offer a few words about the OpenReview team working behind the scenes. OpenReview has long been operated at UMass Amherst as a non-profit organization founded by Andrew McCallum. Each year, Andrew must raise more than $2 million to support a 20-person team that provides essential infrastructure for most major conferences.

I once asked Andrew what might have been a naïve question: whether he had considered developing a business model for OpenReview, given its prominence and the seemingly obvious opportunities. He pushed back, explaining that everything he has done for OpenReview is driven by a commitment to serve and strengthen the academic community. He is willing to devote significant personal effort to ensure the platform remains freely accessible to all.

We should not blame such a brilliant and dedicated team for an accidental issue. Otherwise, fewer people would be willing to shoulder this kind of responsibility in the future. Deep respect to the OpenReview team! I’m grateful for their work and happy to support in any way!
Mu @__munael
@francoisfleuret Can you list the benchmarks used in the val (?)? They're not readable even on the "high quality" photo :( Also, what logging/dashboard software are you using here?
Mu @__munael
@__tinygrad__ Would a sufficiently advanced compiler (with ML-powered search or other) have no need for a library of handcrafted core components? Just trying to understand why you deem the current approach mutually exclusive to a compiler for the rest of the lang. Anywhere you expand on this?
the tiny corp @__tinygrad__
So I finally got a chance to look at Mojo/Modular. It's not what I thought it was, it's an OpenCL replacement + implementations of kernels, not an AI compiler. While this makes it a lot easier to get full performance quickly, I think Turing completeness is a mistake for this stuff. We finally get a chance to live in a pure dataflow world, why would we not take it?

Languages like this do not separate the definition of the compute from the scheduling of the compute. Read the Halide PhD, I'm obsessed with this idea. As neural networks become better and better at programming, what we want is the most concise way to express *exactly* what the program does without worrying about the details of how. Leave that to the machines.

Note the parameter "maybe_epilogue_func" here. What if you want two epilogue functions storing to different buffers, or chained reduces? The loop is inside this conv function, so it's too late to change. Read the tinygrad conv for contrast.

"In my decades of building compilers, I’ve never seen the myth of a “sufficiently smart compiler” actually work out!" -- @clattner_llvm

We are betting that with modern search techniques (read: AI) this will finally change. Though it's a totally fair bet to take the other side, and if it doesn't pan out in the next 10 years, Mojo is probably the right point in the trade-off space.
the tiny corp tweet media
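The Halide idea the tweet references (separate *what* is computed from *how* the loops run) can be shown in miniature: define a 1-D blur as a pure per-index specification, then realize it under two different schedules that produce identical results. A toy sketch, far simpler than Halide itself:

```python
# Algorithm: WHAT is computed, with no loop structure attached.
def blur1d(inp):
    """Spec: out[i] = (inp[i] + inp[i+1]) / 2."""
    return lambda i: (inp[i] + inp[i + 1]) / 2

# Schedules: HOW the index space is traversed.
def schedule_serial(f, n):
    """Plain left-to-right loop."""
    return [f(i) for i in range(n)]

def schedule_tiled(f, n, tile=4):
    """Same values, different iteration order: tiles of `tile` indices.
    In a real compiler this choice affects locality, not results."""
    out = [None] * n
    for start in range(0, n, tile):
        for i in range(start, min(start + tile, n)):
            out[i] = f(i)
    return out
```

Because the spec is side-effect-free, any schedule (serial, tiled, parallel, fused) must agree on the output, which is exactly what lets a search procedure, ML-powered or otherwise, explore schedules automatically.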