λndres Mariscal

3.1K posts

@SerialDev

Wrote anti-cheat ML, do ML/AI at places you know of and probably use && into graphics||compilers||DBs. I like tech, sloths, and chihuahuas.

Helsinki · Joined July 2015
3K Following · 330 Followers
Pinned Tweet
λndres Mariscal@SerialDev·
@karpathy We could just have a thread on the best talks/content you've seen there. youtu.be/8X69_42Mj-g?si… A 1-hour, absolutely fantastic talk (some follow-ups in the LLVM dev meetings). I would love to see the potential of LLMs with a chemistry DSL!
0 replies · 0 reposts · 1 like · 278 views
λndres Mariscal reposted
tetsuo.cpp (no slop)@tetsuo_cpp·
An awesome thread where @AgileJebrim talks about his custom language, compiler and programming model for GPUs. By restricting certain features/instructions, he is able to guarantee deterministic execution time, making it viable for real-time applications.
Jebrim@AgileJebrim

@tetsuo_cpp @blirbilize Not at a point yet where we’re ready for a public release. We’ve got a lot of infrastructure to build still as it’s to be a fully featured multi-user development environment and GPU simulator/debugger as well. Check back again in a year or two.

2 replies · 5 reposts · 34 likes · 3.4K views
λndres Mariscal@SerialDev·
@mpweiher I like this one, the deleters will be more experienced! But now let's make it more interesting: both teams are juniors with the same experience. Which team do you feel becomes more competent faster?
0 replies · 0 reposts · 0 likes · 20 views
Marcel Weiher 🇪🇺
At my first job, we had this idea (mostly as a joke) that there should be two software teams on every project:
- The first team's role is to create code.
- The second team's role is to delete code.
I'll let you guess which team would be the more experienced engineers.
1 reply · 0 reposts · 0 likes · 107 views
λndres Mariscal@SerialDev·
@HSVSphere Quantisation without notice lol, it's actively and measurably better at non-US hours
0 replies · 0 reposts · 2 likes · 415 views
HSVSphere@HSVSphere·
Claude has become totally useless for writing anything, only OK for code search. I'm back to writing it all by hand lol
26 replies · 7 reposts · 608 likes · 24.1K views
λndres Mariscal@SerialDev·
10x compression (32 bytes) looks great on paper, but jumping from 0.034 to 0.117 distortion is a total quality cliff. Johnson-Lindenstrauss lemma: cutting QJL from 128 to 64 bits doesn't just "lose precision", it breaks the distance-preservation guarantees. Ideas?
0 replies · 0 reposts · 0 likes · 36 views
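A toy version of the distortion experiment above, using a plain Gaussian Johnson-Lindenstrauss projection rather than QJL's quantized variant; all dimensions and sample counts below are illustrative, not the tweet's actual setup:

```python
# Toy JL experiment: halving the target dimension k inflates the
# worst-case pairwise-distance distortion roughly by sqrt(2).
# Plain Gaussian projection, not QJL; sizes chosen for illustration.
import math
import random

random.seed(0)

def project(vecs, k):
    # JL construction: k x d Gaussian matrix scaled by 1/sqrt(k).
    d = len(vecs[0])
    rows = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
            for _ in range(k)]
    return [[sum(r[j] * v[j] for j in range(d)) for r in rows] for v in vecs]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_distortion(vecs, proj):
    # Worst relative error over all pairwise distances.
    worst = 0.0
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            orig = dist(vecs[i], vecs[j])
            worst = max(worst, abs(dist(proj[i], proj[j]) - orig) / orig)
    return worst

vecs = [[random.gauss(0, 1) for _ in range(256)] for _ in range(20)]
distortion = {k: max_distortion(vecs, project(vecs, k)) for k in (128, 64)}
print(distortion)
```

The qualitative point survives the toy setting: the guarantee scales as eps ~ sqrt(log n / k), so dropping k by half costs roughly a sqrt(2) factor of distortion before any quantization error is even added on top.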
λndres Mariscal reposted
Alex Zhurkevich@cudagdb·
Trtllmgen kernels are now open. Fastest prefill and decode kernels for our target workloads. We wrote these to win InferenceX, MLPerf, other benchmarks. Powering some of today’s top served models. Dive in, learn, use them, or level up your own. Enjoy. github.com/flashinfer-ai/…
13 replies · 50 reposts · 329 likes · 141.8K views
λndres Mariscal reposted
TigerBeetle@TigerBeetleDB·
IronBeetle⚡️ Ep 105 Zig's comptime is A W E S O M E for CLI argument parsing youtu.be/BDGkD3jtWpM
1 reply · 3 reposts · 24 likes · 1.5K views
λndres Mariscal reposted
John Carmack@ID_AA_Carmack·
Without getting all the way down to performance counters, GPU power from nvidia-smi is a better indicator of true utilization than job scheduling or “gpu busy”. I would love to see animated “heat maps” of the big data centers, with each pixel being an individual GPU’s power draw. I am confident that inference and frontier training at the big labs is highly efficient, but I wonder how many GPUs would be dark due to scheduling and inefficient research code. With a little calibration for base load and peak, just the power bill for the datacenter would be a pretty good first order indicator of utilization.
74 replies · 64 reposts · 1K likes · 174.7K views
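A minimal sketch of the power-as-utilization idea, assuming the standard `nvidia-smi` query flags; the idle baseline `base_w` and the sample readings are invented for illustration:

```python
# Sketch: estimate fleet utilization from GPU power draw, as suggested
# above. The nvidia-smi query string uses the standard flags; base_w
# (idle draw) and the sample readings below are made up for illustration.
QUERY = ("nvidia-smi --query-gpu=power.draw,power.limit "
         "--format=csv,noheader,nounits")

def fleet_utilization(csv_text, base_w=60.0):
    # Calibrate so base_w reads as 0% and the board power limit as 100%.
    utils = []
    for line in csv_text.strip().splitlines():
        draw, limit = (float(x) for x in line.split(","))
        utils.append(max(0.0, min(1.0, (draw - base_w) / (limit - base_w))))
    return sum(utils) / len(utils)

# On a real box: csv_text = subprocess.check_output(QUERY.split(), text=True)
sample = "312.4, 400.0\n71.9, 400.0\n398.0, 400.0\n"  # three hypothetical GPUs
print(f"fleet utilization ~ {fleet_utilization(sample):.0%}")
```

Subtracting a calibrated base load matters: an idle GPU still draws tens of watts, so raw power over limit would overstate utilization exactly the way "gpu busy" does.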
λndres Mariscal reposted
gengstah@_gengstah·
Released WinDbg MCP: attach Claude (or any LLM) to a live Windows process and let it poke around. Set breakpoints, read memory, walk the stack, load crash dumps. 55 tools over MCP. github.com/gengstah/windb…
3 replies · 85 reposts · 263 likes · 12.8K views
λndres Mariscal reposted
Natalie Fratto@NatalieFratto·
One of these things is not like the other… The other day @PratapRanade brought home 3 RF circuits. Ok, "10 GHz band-pass filters" he says, to be precise. The first two are human-made; the third is what they're calling "an alien geometry" 👾 Look how funky it is. That's the world's first-ever AI-made RF circuit, achieved by the electromagnetism foundation model @arenaphysica. No human would have created it this way. It's odd, it looks random, but it really works & it might be the future guts inside every satellite, radar, microwave etc. one day.
Arya Hezarkhani@_i_am_arya

Today, we're announcing Heaviside, our foundation model for electromagnetism. Trained on tens of millions of designs and over 20 years of proprietary simulation data, Heaviside predicts electromagnetic behavior from geometry in 13ms, which is 800,000x faster than a commercial solver.

Heaviside is not a language model, and it's not a surrogate model. Heaviside marks a new class of foundation model for physics which understands the fundamental relationships between materials, the geometries, and the electromagnetic fields they generate.

We're releasing a research preview of Heaviside in Atlas RF Studio, an interactive agentic sandbox where you describe the EM behavior you want and the model generates the physical structure that produces it.

At @arenaphysica, we believe the implications of this class of model extend well beyond RF, as the frontier of exquisite hardware is electromagnetically governed: wireless communication, radar, power delivery, high-speed computing, and the interconnects inside every chip on earth.

In the months ahead, we're excited to scale Heaviside up to broader frequency ranges and design spaces, to support silicon-level designs, and to deploy it with our closest partners and collaborators in service of their biggest design challenges. If you've read our thesis, this is just Step 2 in our pursuit of electromagnetic superintelligence.

Read the full announcement and try Atlas RF Studio… tell us what you think: arenaphysica.com/publications/r…

117 replies · 380 reposts · 2.9K likes · 456.3K views
λndres Mariscal reposted
John Carmack@ID_AA_Carmack·
Paper review: LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels arxiv.org/pdf/2603.19312 Nice clean github: github.com/lucas-maes/le-…

This is the application of the LeJEPA results to world models, trained offline on experience from three different robotics-style tests with one to two million steps in each dataset. It re-states the benefits of the SigReg loss relative to prior world-model approaches.

Uses ImageNet-standard 224x224 RGB input images with an unmodified ViT-Tiny vision transformer from HuggingFace to generate latents. One extra post-projection step is needed to give SigReg the necessary freedom to perturb the latents into independent Gaussians, since ViT ends with a layernorm'd layer. Also tested with ResNet-18, which still performed well, but slightly worse.

Uses a 192-dimensional latent. Performance dropped slightly when doubling the latent size to 384; it would be nice to know if it was stable there, or if it continued worsening with excessive latents. There is a relationship between batch size and SigReg; the larger latent may have improved performance if the batch size was increased.

The predictor is implemented as a ViT-S backbone (why a vision transformer when the latent is flat?). Uses a history of 3 sets of latents for two of the benchmarks and 1 for the other. Performance was markedly better with the "small" ViT model than the "tiny", but the larger "base" model degraded notably, which is interesting. Dropout of 0.1 on the predictor significantly improved performance; 0.2 was still better than 0.0, but 0.5 was worse. Trained with a batch of 128 x 4 trajectories. I wish their training-loss graphs were more zoomed in, with grid lines.

Performs planning at test time instead of building a policy by training in imagination like Dreamer / Diamond. Rolls out 300 initially random sets of actions up to a planning horizon H of 5 (at frame-skip 5), iterating up to 30 times using the Cross-Entropy Method (CEM). The main paper body mentions using a Model Predictive Control (MPC) strategy, where only the first K planned actions are executed before replanning, but appendix D says they execute all 5 planned actions.

After training, they probe the latent space to demonstrate that it does capture and represent physically meaningful quantities. They also implement a decoder from the latent space back to pixels (not used by the algorithms, but helpful to see what the latent space is actually representing). They tested incorporating the reconstruction loss into training, but it hurt performance somewhat. They wound up with a 0.1 lambda for SigReg, as opposed to 0.05 in the LeJEPA paper, and 1024 SigReg projections, though they observe the number has negligible impact.

I like the JEPA framework, but so far my attempts to use it on Atari games with value functions have not matched my other efforts.
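The test-time planning loop described above (300 sampled action sequences, horizon 5, up to 30 CEM iterations) can be sketched with a toy stand-in for the world model; everything below, including the 1-D point-mass dynamics, the goal, and the elite count, is illustrative rather than the paper's actual latent setup:

```python
# Toy Cross-Entropy Method (CEM) planner: sample action sequences,
# keep the elite fraction, refit a Gaussian per timestep, repeat.
# Horizon/sample/iteration counts mirror the review; the "world model"
# is a made-up 1-D point mass with cost = distance to a goal of 1.0.
import random
import statistics

H, N_SAMPLES, N_ELITE, ITERS = 5, 300, 30, 30

def rollout_cost(actions, pos=0.0, goal=1.0):
    # Stand-in dynamics: integrate the actions, score distance to goal.
    for a in actions:
        pos += a
    return abs(pos - goal)

def cem_plan(seed=0):
    rng = random.Random(seed)
    mu, sigma = [0.0] * H, [1.0] * H
    for _ in range(ITERS):
        plans = [[rng.gauss(mu[t], sigma[t]) for t in range(H)]
                 for _ in range(N_SAMPLES)]
        plans.sort(key=rollout_cost)
        elite = plans[:N_ELITE]
        for t in range(H):
            vals = [p[t] for p in elite]
            mu[t] = statistics.fmean(vals)
            sigma[t] = statistics.pstdev(vals) + 1e-6  # keep exploration alive
    return mu

plan = cem_plan()
print("planned actions:", [round(a, 3) for a in plan],
      "cost:", round(rollout_cost(plan), 4))
```

The MPC ambiguity the review flags maps directly onto this sketch: under MPC you would execute only `plan[0]` (or the first K actions) and replan from the new state, whereas executing all H actions open-loop is what appendix D describes.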
Lucas Maes@lucasmaes_

JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io

40 replies · 96 reposts · 935 likes · 205.1K views
λndres Mariscal reposted
Eric Lengyel@EricLengyel·
New blog post: A Decade of Slug This talks about the evolution of the Slug font rendering algorithm, and it includes an exciting announcement: The patent has been dedicated to the public domain. terathon.com/blog/decade-sl…
48 replies · 377 reposts · 2.2K likes · 285.6K views
λndres Mariscal reposted
Anshel Sag@anshelsag·
.@Tenstorrent just launched the TT-QuietBox 2: The first RISC-V desktop AI workstation delivering teraflop-class inference. It runs 120B parameter models locally on a fully open-source stack. Quiet, liquid-cooled power starting at $9,999. Get ready for Q2 2026!
0 replies · 3 reposts · 5 likes · 658 views
Jeremy Howard@jeremyphoward·
WTF is going on at Qwen?!? Some kind of implosion? This is really sad and worrying. They've been *such* a strong team, and are losing some of their very best researchers.
Binyuan Hui@huybery

bye qwen, me too.

65 replies · 42 reposts · 1.3K likes · 269.8K views
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 728 reposts · 13.5K likes · 6.6M views
λndres Mariscal@SerialDev·
@preshing I'm quite curious how you wrote the fragmentation tests and led the agent toward the goal. From your observations, was the agent using the tests iteratively or just as a final validation?
1 reply · 0 reposts · 0 likes · 37 views
Jeff Preshing@preshing·
😲 Wow! Codex 5.3 wrote a complete, general-purpose C++ memory allocator for me in just 30 minutes. Bada bing bada boom. The code is clearly written, well-documented, efficient, handles fragmentation well and stands up to a battery of tests. I was able to submit the AI-generated work directly to the main branch with no additional changes on my part.

Of course, if you want the full story, I should also mention that I spent several days preparing the workspace, designing the fragmentation test, customizing the AGENTS file and iterating on the prompt in addition to those 30 minutes. But I still find it very cool. Using a powerful LLM is like using a fax machine to get the answer back from a parallel universe where the remaining work has already been completed. 📠

For anyone interested, the prompt used can be seen in the commit description: github.com/preshing/plywo…
17 replies · 16 reposts · 189 likes · 24.1K views