Mu @__munael

349 posts

Applying science in AI and Informatics. Reading stories, writing some. Personal account; employer not involved; RT =/= Endorsement; etc.

Joined July 2018
3.3K Following · 140 Followers
Mu @__munael
@ChaseBrowe32432 @NigelHiggs7 Claude's chat interface now has access to storage and code execution. It's been using them more frequently than usual in my case, even with no memory and without being instructed to in the global instructions. So it's not purely chat at this point. Not saying it ran code (I'll check the traces), but...
Chase Brower @ChaseBrowe32432
@NigelHiggs7 It does, but I'd make two comments. 1) It only has the degree of tooling sensible for a general chat interface, not coding-specialized tooling (like Claude Code). 2) I don't buy that these problems are tractable for a human given no interpreter access.
Chase Brower @ChaseBrowe32432
I painstakingly ran all 20 EsoLang-Bench hard problems through the Claude web UI. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples; it just solves them natively. The benchmark simply suffocated the models with constrictive scaffolding.
Lossfunk @lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

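The "interpreter access" point in this thread is easy to make concrete: esolang programs are tedious to dry-run in your head but trivial to check mechanically with a few lines of code. Below is a minimal interpreter for a made-up stack-based esolang — the language and its ops are invented here purely for illustration and are not the actual EsoLang-Bench languages:

```python
def run(program: str, stack=None):
    """Interpret a tiny hypothetical stack language:
    digits push their value, '+' adds the top two items,
    '*' multiplies them, 'd' duplicates the top, 'p' pops it."""
    stack = list(stack or [])
    for op in program:
        if op.isdigit():
            stack.append(int(op))
        elif op == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "d":
            stack.append(stack[-1])
        elif op == "p":
            stack.pop()
    return stack
```

For example, `run("34+")` returns `[7]` and `run("5d*")` returns `[25]` — checking a candidate solution takes milliseconds with the interpreter, versus careful manual tracing without it.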
Mu @__munael
@teortaxesTex What's the analogous work by DeepSeek?
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
First time ever, probably, that Kimi just straight up eclipses a major analogous work from DeepSeek. Not "same but different" or "same but bigger scale" or something, but "qualitatively stronger ideas". They have grown truly formidable.
Clifford Richardson @CorvusCrypto
Currently experimenting with a hybrid SSM-attention module. So far results are quite shit. Maybe another experiment that leads to failure. I'm going to let it run overnight and see what I get anyway. Quick sketch of the mechanism in case people want to try it out. H = SSM state
Clifford Richardson tweet media
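The tweet gives only a sketch ("H = SSM state"), so the wiring below is a guess, not the author's mechanism: a toy diagonal linear SSM scan mixed with single-head causal self-attention through a fixed gate, in plain NumPy. Every shape, the identity projections, and the convex-mix combination are assumptions made for illustration.

```python
import numpy as np

def ssm_scan(x, a, b):
    """Diagonal linear SSM: h_t = a * h_{t-1} + b * x_t (elementwise)."""
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = a * h + b * x[t]
        out[t] = h
    return out

def attention(x):
    """Single-head causal self-attention with identity Q/K/V (toy)."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def hybrid_block(x, a, b, gate=0.5):
    """One hypothetical hybrid wiring: convex mix of the two branches."""
    return gate * ssm_scan(x, a, b) + (1 - gate) * attention(x)
```

Other wirings (sequential SSM-then-attention, learned gates, per-head mixing) are equally plausible readings of the sketch.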
Mu @__munael
@Dorialexander @maciejpioro How would you operationalize that for published research though? 🤔 Would you have to rely on frontier LLMs for the "vibe eval"? Probably cost prohibitive for most architecture research.
Alexander Doria @Dorialexander
@maciejpioro I think iterating directly on the outputs ("vibe" eval) would actually turn out much better.
Alexander Doria @Dorialexander
i'm afraid autoresearch is going to make RL reward hacking look tame…
Mu @__munael
@viperr How did you get around the 800 limit? 🧐
typedfemale @typedfemale
til wandb charges 1 dollar per hour per model - what the hell is this pricing model?!
Mu @__munael
The economics of these categories of companies are disparate. I hold that the more coherent path to AI research is specifically public-funded, nonprofit, and for-public-good. Google, if they were "good", could pull it off. A new for-profit (like OAI)—I fail to see how they could.
Andrej Karpathy @karpathy

A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I listened to often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today.

Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for.

The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs top (math/algorithms) to bottom (megakernels/related), they have a great eye for talent and I think will be able to build something very special. Congrats on the launch and I look forward to what you come up with!

Mu retweeted
Xeophon @xeophon
If you are an (AI) researcher, it’s crucial to think about the implications of your research. I think this post from @giffmana is really thought-provoking:
Xeophon tweet media
Mu @__munael
@rosinality RLMs with a "restart LM only" model-callable command?
Rosinality @rosinality
Context management using tool use. The question I have is whether this is a temporary mitigation or not.
Rosinality tweet media
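The "restart LM only" reply above can be sketched as a tool the model itself may call: wipe the LM-visible message history while the system prompt and tool-side state survive the restart. This is a hypothetical Python sketch — the class, tool names, and message format are all invented here; no specific vendor API is implied:

```python
class ToolManagedContext:
    """Hypothetical chat-loop state where the model can call a
    'restart' tool: clear the message history (the LM's context)
    while keeping the system prompt and an external scratchpad."""

    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.messages = []       # LM-visible conversation history
        self.scratchpad = {}     # tool-side state that survives restarts

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})

    def handle_tool_call(self, name: str, args: dict):
        if name == "restart":
            # Restart the LM only: wipe its context, keep external state.
            summary = args.get("carry_over", "")
            self.messages = []
            if summary:
                self.add("user", f"(carried over) {summary}")
        elif name == "remember":
            self.scratchpad.update(args)

    def prompt(self):
        """Messages actually sent to the LM on the next turn."""
        return [{"role": "system", "content": self.system_prompt},
                *self.messages]
```

Whether this is a temporary mitigation or a durable pattern is exactly the open question in the quoted tweet; the sketch only shows the mechanics.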
Mu @__munael
@MiniMax_AI Do you have these articles posted anywhere outside X/Twitter? It's very hard to read and share them outside the app, and many in the field don't use it. I searched online for the titles of a couple of your articles but couldn't find them.
Yifan Zhang @yifan_zhang_
V4 Lite now live in the app. 1M context length. Text-only. Muon + mHC confirmed. Larger version is still on the way. @zephyr_z9 @teortaxesTex
Amr Awadallah 🤖 @awadallah
Cairo University Entrepreneurs: come join us this Tuesday 11:30am at Cairo University to learn about creating global companies.
Amr Awadallah 🤖 tweet media
Mu @__munael
@rosinality Can you elaborate? Is this referring to a recent development or to the "history of embeddings"?
Rosinality @rosinality
Synchronization of research topics frequently happens, but I think the embedding layer is an almost exceptional case. Did people discuss this at conferences, or was there a rumor that frontier companies were working on this?
Mu retweeted
Jonathan Gorard @getjonwithit
Like @davidbessis and others, I think that Hinton is wrong. To explain why, let me tell you a brief story. About a decade ago, in 2017, I developed an automated theorem-proving framework that was ultimately integrated into Mathematica (see: youtube.com/watch?v=mMaid2…) (1/15)
YouTube video
vitrupo @vitrupo

Geoffrey Hinton says mathematics is a closed system, so AIs can play it like a game. They can pose problems to themselves, test proofs, and learn from what works, without relying on human examples. “I think AI will get much better at mathematics than people, maybe in the next 10 years or so.”

Mu @__munael
@soft_fox_lad How is this (not?) different from having a singleton of a configuration class accessible by all code (if you can't keep passing it around)? With wrappers, namespacing may be easier without custom handlers in the config singleton, but sharing variables is harder. Or is it something else?
fox @soft_fox_lad
A trick for writing high-performing (in terms of quality more than speed) algorithms that I stole from Stockfish: wrapper types for any arbitrary variable. Whenever you have a variable of significance, you define it using that wrapper type. In normal builds, the wrapper is made…
Mu @__munael
@UnslothAI Those numbers look very enticing! Are there any docs or a tutorial on extending your implementations to run a model with a custom architecture and data loaders?
Unsloth AI @UnslothAI
You can now train LLMs 3× faster with no accuracy loss, via our new RoPE and MLP kernels. Our Triton kernels plus smart auto packing deliver ~3× faster training & 30% less VRAM vs optimized FA3 setups. Train Qwen3-4B 3× faster on just 3.9GB VRAM. Blog: docs.unsloth.ai/new/3x-faster-…
Unsloth AI tweet media
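The tweet doesn't specify what "smart auto packing" does internally; a common baseline it likely resembles is greedy first-fit packing of variable-length sequences into fixed-length batch rows, which cuts the padding a naive one-sequence-per-row batch wastes. A sketch under that assumption (not Unsloth's actual implementation):

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit packing: place each sequence into the first
    row that still has room, so fewer pad tokens are wasted."""
    bins = []        # remaining capacity of each batch row
    assignment = []  # row index chosen for each sequence
    for length in lengths:
        for i, free in enumerate(bins):
            if length <= free:
                bins[i] -= length
                assignment.append(i)
                break
        else:
            # No existing row fits: open a new one.
            bins.append(max_len - length)
            assignment.append(len(bins) - 1)
    return assignment, len(bins)

# e.g. packing [700, 300, 512, 200, 900] into 1024-token rows
# uses 3 rows instead of the naive 5.
```

Real implementations also need attention masks (or position-id resets) so packed sequences don't attend to each other, which is where the custom kernels come in.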
Mu retweeted
Chuang Gan @gan_chuang
ICLR has placed OpenReview in a difficult position, so I want to offer a few words about the OpenReview team working behind the scenes. OpenReview has long been operated at UMass Amherst as a non-profit organization founded by Andrew McCallum. Each year, Andrew must raise more than $2 million to support a 20-person team that provides essential infrastructure for most major conferences.

I once asked Andrew what might have been a naïve question: whether he had considered developing a business model for OpenReview, given its prominence and the seemingly obvious opportunities. He pushed back, explaining that everything he has done for OpenReview is driven by a commitment to serve and strengthen the academic community. He is willing to devote significant personal effort to ensure the platform remains freely accessible to all.

We should not blame such a brilliant and dedicated team for an accidental issue. Otherwise, fewer people would be willing to shoulder this kind of responsibility in the future. Deep respect to the OpenReview team! I’m grateful for their work and happy to support in any way!
Mu @__munael
@francoisfleuret Can you list the benchmarks used in the val (?)? They're not readable even on the "high quality" photo :( Also, what logging/dashboard software are you using here?
Mu @__munael
@__tinygrad__ Would a sufficiently advanced compiler (with ML-powered search or other) have no need for a library of handcrafted core components? Just trying to understand why you deem the current approach mutually exclusive to a compiler for the rest of the lang. Anywhere you expand on this?
the tiny corp @__tinygrad__
So I finally got a chance to look at Mojo/Modular. It's not what I thought it was, it's an OpenCL replacement + implementations of kernels, not an AI compiler. While this makes it a lot easier to get full performance quickly, I think Turing completeness is a mistake for this stuff. We finally get a chance to live in a pure dataflow world, why would we not take it?

Languages like this do not separate the definition of the compute from the scheduling of the compute. Read the Halide PhD, I'm obsessed with this idea. As neural networks become better and better at programming, what we want is the most concise way to express *exactly* what the program does without worrying about the details of how. Leave that to the machines.

Note the parameter "maybe_epilogue_func" here. What if you want two epilogue functions storing to different buffers, or chained reduces? The loop is inside this conv function, so it's too late to change. Read the tinygrad conv for contrast.

"In my decades of building compilers, I’ve never seen the myth of a “sufficiently smart compiler” actually work out!" -- @clattner_llvm

We are betting that with modern search techniques (read: AI) this will finally change. Though it's a totally fair bet to take the other side, and if it doesn't pan out in the next 10 years, Mojo is probably the right point in the trade-off space.
the tiny corp tweet media
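The Halide idea the tweet references (separate *what* is computed from *how* the loops run) can be shown in miniature: define a 1-D blur as a pure per-index specification, then realize it under two different schedules that produce identical results. A toy sketch, far simpler than Halide itself:

```python
# Algorithm: WHAT is computed, with no loop structure attached.
def blur1d(inp):
    """Spec: out[i] = (inp[i] + inp[i+1]) / 2."""
    return lambda i: (inp[i] + inp[i + 1]) / 2

# Schedules: HOW the index space is traversed.
def schedule_serial(f, n):
    """Plain left-to-right loop."""
    return [f(i) for i in range(n)]

def schedule_tiled(f, n, tile=4):
    """Same values, different iteration order: tiles of `tile` indices.
    In a real compiler this choice affects locality, not results."""
    out = [None] * n
    for start in range(0, n, tile):
        for i in range(start, min(start + tile, n)):
            out[i] = f(i)
    return out
```

Because the spec is side-effect-free, any schedule (serial, tiled, parallel, fused) must agree on the output, which is exactly what lets a search procedure, ML-powered or otherwise, explore schedules automatically.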