λndres Mariscal

3.1K posts

@SerialDev

Wrote anti-cheat ML, do ML/AI at places you know of and probably use && into graphics||compilers||DBs. I like tech, sloths, and chihuahuas.

Helsinki · Joined July 2015
3K Following · 330 Followers
Pinned Tweet
λndres Mariscal@SerialDev·
@karpathy We could just have a thread on the best talks/content you've seen there. youtu.be/8X69_42Mj-g?si… A 1-hour, absolutely fantastic talk (some follow-ups in the LLVM dev meetings). I would love to see the potential of LLMs with a chemistry DSL!
0 replies · 0 reposts · 1 like · 278 views
λndres Mariscal reposted
tetsuo.cpp (no slop)@tetsuo_cpp·
An awesome thread where @AgileJebrim talks about his custom language, compiler and programming model for GPUs. By restricting certain features/instructions, he is able to guarantee deterministic execution time, making it viable for real-time applications.
Jebrim@AgileJebrim

@tetsuo_cpp @blirbilize Not at a point yet where we’re ready for a public release. We’ve got a lot of infrastructure to build still as it’s to be a fully featured multi-user development environment and GPU simulator/debugger as well. Check back again in a year or two.

2 replies · 5 reposts · 34 likes · 3.4K views
λndres Mariscal@SerialDev·
@mpweiher I like this one, the deleters will be more experienced! But now let's make it more interesting: both teams are juniors with the same experience. Which team do you feel becomes more competent faster?
0 replies · 0 reposts · 0 likes · 20 views
Marcel Weiher 🇪🇺
At my first job, we had this idea (mostly as a joke) that there should be two software teams on every project:
- The first team's role is to create code.
- The second team's role is to delete code.
I'll let you guess which team would be the more experienced engineers.
1 reply · 0 reposts · 0 likes · 107 views
λndres Mariscal@SerialDev·
@HSVSphere Quantisation without notice lol, it's actively and measurably better at non-US hours
0 replies · 0 reposts · 2 likes · 415 views
HSVSphere@HSVSphere·
Claude has become totally useless for writing anything, only OK for code search. I'm back to writing it all by hand lol
26 replies · 7 reposts · 608 likes · 24.1K views
λndres Mariscal@SerialDev·
10x compression (32 bytes) looks great on paper, but jumping from 0.034 to 0.117 distortion is a total quality cliff. Johnson-Lindenstrauss lemma: cutting QJL from 128 to 64 bits doesn't just "lose precision", it breaks the distance-preservation guarantees. Ideas?
0 replies · 0 reposts · 0 likes · 36 views
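A toy version of the distortion experiment above, using a plain Gaussian Johnson-Lindenstrauss projection rather than QJL's quantized variant; all dimensions and sample counts below are illustrative, not the tweet's actual setup:

```python
# Toy JL experiment: halving the target dimension k inflates the
# worst-case pairwise-distance distortion roughly by sqrt(2).
# Plain Gaussian projection, not QJL; sizes chosen for illustration.
import math
import random

random.seed(0)

def project(vecs, k):
    # JL construction: k x d Gaussian matrix scaled by 1/sqrt(k).
    d = len(vecs[0])
    rows = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]
    return [[sum(r[j] * v[j] for j in range(d)) for r in rows] for v in vecs]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_distortion(vecs, proj):
    # Worst relative error over all pairwise distances.
    worst = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            orig = dist(vecs[i], vecs[j])
            worst = max(worst, abs(dist(proj[i], proj[j]) - orig) / orig)
    return worst

vecs = [[random.gauss(0, 1) for _ in range(256)] for _ in range(20)]
distortion = {k: max_distortion(vecs, project(vecs, k)) for k in (128, 64)}
print(distortion)
```

The qualitative point survives the toy setting: the guarantee scales as eps ~ sqrt(log n / k), so dropping k by half costs roughly a sqrt(2) factor of distortion before any quantization error is even added on top.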
λndres Mariscal reposted
Alex Zhurkevich@cudagdb·
Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/…
13 replies · 50 reposts · 329 likes · 141.8K views
λndres Mariscal reposted
TigerBeetle@TigerBeetleDB·
IronBeetle⚡️ Ep 105 Zig's comptime is A W E S O M E for CLI argument parsing youtu.be/BDGkD3jtWpM
1 reply · 3 reposts · 24 likes · 1.5K views
λndres Mariscal reposted
John Carmack@ID_AA_Carmack·
Without getting all the way down to performance counters, GPU power from nvidia-smi is a better indicator of true utilization than job scheduling or “gpu busy”. I would love to see animated “heat maps” of the big data centers, with each pixel being an individual GPU’s power draw. I am confident that inference and frontier training at the big labs is highly efficient, but I wonder how many GPUs would be dark due to scheduling and inefficient research code. With a little calibration for base load and peak, just the power bill for the datacenter would be a pretty good first order indicator of utilization.
74 replies · 64 reposts · 1K likes · 174.7K views
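A minimal sketch of the power-as-utilization idea, assuming the standard `nvidia-smi` query flags; the idle baseline `base_w` and the sample readings are invented for illustration:

```python
# Sketch: estimate fleet utilization from GPU power draw, as suggested
# above. The nvidia-smi query string uses the standard flags; base_w
# (idle draw) and the sample readings below are made up for illustration.
QUERY = ("nvidia-smi --query-gpu=power.draw,power.limit "
         "--format=csv,noheader,nounits")

def fleet_utilization(csv_text, base_w=60.0):
    # Calibrate so base_w reads as 0% and the board power limit as 100%.
    utils = []
    for line in csv_text.strip().splitlines():
        draw, limit = (float(x) for x in line.split(","))
        utils.append(max(0.0, min(1.0, (draw - base_w) / (limit - base_w))))
    return sum(utils) / len(utils)

# On a real box: csv_text = subprocess.check_output(QUERY.split(), text=True)
sample = "312.4, 400.0\n71.9, 400.0\n398.0, 400.0\n"  # three hypothetical GPUs
print(f"fleet utilization ~ {fleet_utilization(sample):.0%}")
```

Subtracting a calibrated base load matters: an idle GPU still draws tens of watts, so raw power over limit would overstate utilization exactly the way "gpu busy" does.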
λndres Mariscal reposted
gengstah@_gengstah·
Released WinDbg MCP: attach Claude (or any LLM) to a live Windows process and let it poke around. Set breakpoints, read memory, walk the stack, load crash dumps. 55 tools over MCP. github.com/gengstah/windb…
3 replies · 85 reposts · 263 likes · 12.8K views
λndres Mariscal reposted
Natalie Fratto@NatalieFratto·
One of these things is not like the other… The other day @PratapRanade brought home 3 RF circuits. Ok, "10 GHz band-pass filters" he says, to be precise. The first two are human-made; the third is what they're calling "an alien geometry" 👾 Look how funky it is. That's the world's first-ever AI-made RF circuit, achieved by the electromagnetism foundation model @arenaphysica. No human would have created it this way. It's odd, it looks random, but it really works & it might be the future guts inside every satellite, radar, microwave etc. one day.
Arya Hezarkhani@_i_am_arya

Today, we're announcing Heaviside, our foundation model for electromagnetism. Trained on tens of millions of designs and over 20 years of proprietary simulation data, Heaviside predicts electromagnetic behavior from geometry in 13ms, which is 800,000x faster than a commercial solver.

Heaviside is not a language model, and it's not a surrogate model. Heaviside marks a new class of foundation model for physics which understands the fundamental relationships between materials, the geometries, and the electromagnetic fields they generate.

We're releasing a research preview of Heaviside in Atlas RF Studio, an interactive agentic sandbox where you describe the EM behavior you want and the model generates the physical structure that produces it.

At @arenaphysica, we believe the implications of this class of model extend well beyond RF, as the frontier of exquisite hardware is electromagnetically governed: wireless communication, radar, power delivery, high-speed computing, and the interconnects inside every chip on earth.

In the months ahead, we're excited to scale Heaviside up to broader frequency ranges and design spaces, to support silicon-level designs, and to deploy it with our closest partners and collaborators in service of their biggest design challenges. If you've read our thesis, this is just Step 2 in our pursuit of electromagnetic superintelligence.

Read the full announcement and try Atlas RF Studio… tell us what you think: arenaphysica.com/publications/r…

117 replies · 380 reposts · 2.9K likes · 456.3K views
λndres Mariscal reposted
John Carmack@ID_AA_Carmack·
Paper review: LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels arxiv.org/pdf/2603.19312 Nice clean github: github.com/lucas-maes/le-…

This is the application of the LeJEPA results to world models, trained offline on experience from three different robotics-style tests with one to two million steps in each dataset. It re-states the benefits of the SigReg loss relative to prior world-model approaches.

Uses ImageNet-standard 224x224 RGB input images with an unmodified ViT-Tiny vision transformer from HuggingFace to generate latents. One extra post-projection step is needed to give SigReg the necessary freedom to perturb the latents into independent Gaussians, since ViT ends with a layernorm'd layer. Also tested with ResNet-18, which still performed well, but slightly worse.

Uses a 192-dimensional latent. Performance dropped slightly when doubling the latent size to 384; it would be nice to know if it was stable there, or if it continued worsening with excessive latents. There is a relationship between batch size and SigReg; the larger latent may have improved performance if the batch size was increased.

The predictor is implemented as a ViT-S backbone (why a vision transformer when the latent is flat?). Uses a history of 3 sets of latents for two of the benchmarks and 1 for the other. Performance was markedly better with the "small" ViT model than the "tiny", but the larger "base" model degraded notably, which is interesting. Dropout of 0.1 on the predictor significantly improved performance; 0.2 was still better than 0.0, but 0.5 was worse. Trained with a batch of 128 x 4 trajectories. I wish their training-loss graphs were more zoomed in, with grid lines.

Performs planning at test time instead of building a policy by training in imagination like Dreamer / Diamond. Rolls out 300 initially random sets of actions up to a planning horizon H of 5 (at frame-skip 5), iterating up to 30 times using the Cross-Entropy Method (CEM). The main paper body mentions using a Model Predictive Control (MPC) strategy, where only the first K planned actions are executed before replanning, but appendix D says they execute all 5 planned actions.

After training, they probe the latent space to demonstrate that it does capture and represent physically meaningful quantities. They also implement a decoder from the latent space back to pixels (not used by the algorithms, but helpful to see what the latent space is actually representing). They tested incorporating the reconstruction loss into training, but it hurt performance somewhat. They wound up with a 0.1 lambda for SigReg, as opposed to 0.05 in the LeJEPA paper, and 1024 SigReg projections, though they observe the number has negligible impact.

I like the JEPA framework, but so far my attempts to use it on Atari games with value functions have not matched my other efforts.
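The test-time planning loop described above (300 sampled action sequences, horizon 5, up to 30 CEM iterations) can be sketched with a toy stand-in for the world model; everything below, including the 1-D point-mass dynamics, the goal, and the elite count, is illustrative rather than the paper's actual latent setup:

```python
# Toy Cross-Entropy Method (CEM) planner: sample action sequences,
# keep the elite fraction, refit a Gaussian per timestep, repeat.
# Horizon/sample/iteration counts mirror the review; the "world model"
# is a made-up 1-D point mass with cost = distance to a goal of 1.0.
import random
import statistics

H, N_SAMPLES, N_ELITE, ITERS = 5, 300, 30, 30

def rollout_cost(actions, pos=0.0, goal=1.0):
    # Stand-in dynamics: integrate the actions, score distance to goal.
    for a in actions:
        pos += a
    return abs(pos - goal)

def cem_plan(seed=0):
    rng = random.Random(seed)
    mu, sigma = [0.0] * H, [1.0] * H
    for _ in range(ITERS):
        plans = [[rng.gauss(mu[t], sigma[t]) for t in range(H)]
                 for _ in range(N_SAMPLES)]
        plans.sort(key=rollout_cost)
        elite = plans[:N_ELITE]
        for t in range(H):
            vals = [p[t] for p in elite]
            mu[t] = statistics.fmean(vals)
            sigma[t] = statistics.pstdev(vals) + 1e-6  # keep exploration alive
    return mu

plan = cem_plan()
print("planned actions:", [round(a, 3) for a in plan],
      "cost:", round(rollout_cost(plan), 4))
```

The MPC ambiguity the review flags maps directly onto this sketch: under MPC you would execute only `plan[0]` (or the first K actions) and replan from the new state, whereas executing all H actions open-loop is what appendix D describes.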
Lucas Maes@lucasmaes_

JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io

40 replies · 96 reposts · 935 likes · 205.1K views
λndres Mariscal reposted
Eric Lengyel@EricLengyel·
New blog post: A Decade of Slug This talks about the evolution of the Slug font rendering algorithm, and it includes an exciting announcement: The patent has been dedicated to the public domain. terathon.com/blog/decade-sl…
48 replies · 377 reposts · 2.2K likes · 285.6K views
λndres Mariscal reposted
Anshel Sag@anshelsag·
.@Tenstorrent just launched the TT-QuietBox 2: The first RISC-V desktop AI workstation delivering teraflop-class inference. It runs 120B parameter models locally on a fully open-source stack. Quiet, liquid-cooled power starting at $9,999. Get ready for Q2 2026!
0 replies · 3 reposts · 5 likes · 658 views
Jeremy Howard@jeremyphoward·
WTF is going on at Qwen?!? Some kind of implosion? This is really sad and worrying. They've been *such* a strong team, and are losing some of their very best researchers.
Binyuan Hui@huybery

bye qwen, me too.

65 replies · 42 reposts · 1.3K likes · 269.8K views
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 728 reposts · 13.5K likes · 6.6M views
λndres Mariscal@SerialDev·
@preshing I'm quite curious how you wrote the fragmentation tests and led the agent toward the goal. From your observations, was the agent using the tests iteratively or just as a final validation?
1 reply · 0 reposts · 0 likes · 37 views
Jeff Preshing@preshing·
😲 Wow! Codex 5.3 wrote a complete, general-purpose C++ memory allocator for me in just 30 minutes. Bada bing bada boom. The code is clearly written, well-documented, efficient, handles fragmentation well and stands up to a battery of tests. I was able to submit the AI-generated work directly to the main branch with no additional changes on my part.

Of course, if you want the full story, I should also mention that I spent several days preparing the workspace, designing the fragmentation test, customizing the AGENTS file and iterating on the prompt in addition to those 30 minutes. But I still find it very cool. Using a powerful LLM is like using a fax machine to get the answer back from a parallel universe where the remaining work has already been completed. 📠

For anyone interested, the prompt used can be seen in the commit description: github.com/preshing/plywo…
17 replies · 16 reposts · 189 likes · 24.1K views