Mike Larkin

3.3K posts


@mlarkin2012

Low-level developer. Peakbagger. Private Pilot. Founder/CTO Ringcube (acquired by Citrix) and Deepfactor (acquired by Cisco). Building hypervisors and OSes.

California, USA · Joined December 2011

2.2K Following · 2K Followers
Mike Larkin retweeted
Nous Research
Nous Research@NousResearch·
Hermes Agent now has multi-agent support via the Kanban board, new in v0.12.0. Agents claim tasks from a board, work in parallel, and hand off when blocked. You watch progress and unblock from one easy view instead of juggling terminals. We asked it to plan and make this video about itself:
English
237
420
5.1K
1.1M
Mike Larkin retweeted
tmo
tmo@tmophoto·
I really can't believe shots like this are possible to render in 750 seconds: 1280x720, 25 seconds long, on the @NVIDIAAIDev DGX Spark and Ltx2.3. Make sure you prompt your audio in LTX: background, ambient, subjects. Add audio prompts.
English
9
6
59
4.9K
Mike Larkin retweeted
Alexander Doria
Alexander Doria@Dorialexander·
So, DeepSeek-V4: it finally took me the week. Overall, the paper attempts many things at once, and it's not easy to disentangle because it's all surprisingly connected.

It's first a serious attempt at bridging the gap between closed and open LLM architecture. It is generally rumored that Opus and [the largest model bundled in GPT-5] belong to an entirely different category of models: very large, very sparse mixtures of experts, able to hold an unprecedentedly wide search space while still being servable. Simply put, current hardware cannot hold such a model on one node, so you have to play with the interconnect and various levels of quantization, for different layers, at different stages of training. An important focus of DsV4 is communication latency, showing it can be hidden through effective management of the interconnect (roughly, you slide communication time inside computation time). Overall, you cannot simply enter this game without the capability to rewrite kernels from scratch, and the model report relentlessly comes back to this. Because this is the frontier game.

It's then a radical, but very successful, attempt at making long context simultaneously more efficient and more affordable. Long context is literally a "context" problem: what exactly is worth attending to? An obvious fix is to prioritize the most recent tokens. This might be sufficient for basic search, but not for the new demands of agentic pipelines that require accurate recall of distant yet strategic content. V4's clever approach is to rely on two different axes of memorization by allocating layers to two different attention compression schemes. As the name suggests, Heavily Compressed Attention is the brute-force method, collapsing each sequence of 128 tokens to a unique entry and taking care of the fuzzy yet global context. Compressed Sparse Attention relies on a "lightning indexer" to bring in the relevant local blocks for a query, even when they can be thousands of tokens away. Everything here is optimized for end inference: there is a very large head_dim (512), which is costlier for training but allows for an even more compressed KV cache, which is your actual bottleneck at inference time, especially in prefill mode. The end result is a very classical DeepSeek play, introducing a new radical disruption of inference economics after DSA. I predict hybrid CSA/HCA (or similar counterparts) will essentially be part of the mainstream arch by the end of this year.

Now we come to the more ambitious but also more unfinished part: an attempt at redefining model architecture and the learning signal. The most prominent pieces are mHC and hybrid CSA/HCA, but it's actually a long list of less documented innovations: swapping softmax for sqrt(softplus), or using a hybrid two-stage scheme with non-standard values for Muon. Yet the interconnection of all these new components is still unknown and likely accounts for the significant training instabilities: typically, "mHC involves a matrix multiplication with an output dimension of only 24", which introduces non-determinism. Even one of the best AI labs in the world runs into a combinatorial explosion of ablations here, so the association of all these choices is likely non-tractable and would require a more consistent theory, which the conclusion gestures at but does not solve ("In future iterations, we will carry out more comprehensive and principled investigations to distill the architecture down to its most essential designs").

The more limited experiments in post-training are maybe more promising. Significantly, the one lab that popularized the standard RL+reasoning recipe is rethinking that recipe. For now it's a two-stage design (RL on a specialized model, then on-policy distillation): ever since Self-Principled Critique Tuning, DeepSeek has been concerned with expanding the reasoning training signal beyond the final sparse reward. I'm not sure this is the final say: in this domain everything is a bit in flux, and you could even argue that the type of verified pipeline we designed for SYNTH is a form of extreme offline RL-like training.

There is an even longer-term plan (here >3-5 years), which is about redefining hardware. For now it's a way of transforming a constraint into an opportunity: like the other leading Chinese labs, DeepSeek was strongly incentivized to make training work on Ascend and contribute to the national effort for chip autonomy. Very unusually, the report itself includes a lengthy wishlist for future hardware. As several experts noted, many of these recommendations don't really hold up for Nvidia but make perfect sense for a newcomer to the GPU hardware business. DeepSeek seems to be anticipating a world where labs have to secure a close hardware partner to retroactively fit the chips to the particular demands of model design or inference.

Now, there is what DeepSeek did not do yet. The paper hardly mentions anything about synthetic pipelines, rephrasing, or simulated environments. The training data size (32T tokens) likely involves a significant share of generated data, as this is more quality tokens than the web and other digitized sources could hold, so maybe similar synthetic proportions as Trinity (roughly half) or Kimi. Still, it's pretty clear that all their attention was focused on the infra, architecture, and scaling side, leaving a proper extensive retraining for later. This is likely not that dissimilar to how Anthropic or OpenAI proceeded: the fact that we're still in the middle of the same model series even though significant parts of the model have changed (the tokenizer with Opus 4.7) suggests that a model lifecycle involves multiple rounds of training, each potentially as large as a pretraining run from a few years ago. The fact that DeepSeek took on multiple Moonshot innovations (and Moonshot in turn has been hugely reliant on DeepSeek) suggests we might also have an ecosystem dynamic here. Maybe DeepSeek can focus exclusively on hard infrastructure problems and expect some of the axes of development to be sorted out later.
English
22
103
789
71.4K
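A toy, single-query NumPy sketch of the two compression schemes described in the post above: a heavily compressed path that pools every block of 128 tokens into one entry, and a sparse path where a cheap indexer picks a few blocks for the query to attend to at full resolution. Block size, mean pooling, and the dot-product indexer are illustrative assumptions, not the actual HCA/CSA kernels from the report.

```python
# Toy single-head, single-query illustration of block-compressed vs. indexer-sparse attention.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def hca_attend(q, K, V, block=128):
    """'Heavily compressed' style: collapse each block of tokens into one pooled
    entry, then attend over the short pooled sequence (fuzzy but global)."""
    T, d = K.shape
    n = T // block
    Kc = K[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d) pooled keys
    Vc = V[: n * block].reshape(n, block, d).mean(axis=1)  # (n, d) pooled values
    w = softmax(Kc @ q / np.sqrt(d))
    return w @ Vc

def csa_attend(q, K, V, block=128, topk=4):
    """'Compressed sparse' style: a cheap indexer scores blocks, then the query
    attends at full token resolution only inside the selected blocks."""
    T, d = K.shape
    n = T // block
    Kc = K[: n * block].reshape(n, block, d).mean(axis=1)  # indexer keys
    picked = np.argsort(Kc @ q)[-topk:]                    # top-k relevant blocks
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in picked])
    w = softmax(K[idx] @ q / np.sqrt(d))
    return w @ V[idx]

rng = np.random.default_rng(0)
T, d = 4096, 64
q, K, V = rng.standard_normal(d), rng.standard_normal((T, d)), rng.standard_normal((T, d))
print(hca_attend(q, K, V).shape, csa_attend(q, K, V).shape)  # (64,) (64,)
```

Even in the toy version the trade-off is visible: the pooled path touches every block cheaply but fuzzily, while the sparse path pays full attention cost only on the few blocks the indexer selects.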
Mike Larkin
Mike Larkin@mlarkin2012·
@CuriosityonX Could also wrap a towel around your head. Because if you can't see the AI, it can't see you. Read something like that somewhere IIRC. But the AI must be particularly ravenous.
English
0
0
1
1.4K
Curiosity
Curiosity@CuriosityonX·
🚨: Eight Marines outsmarted a DARPA AI meant to spot people. Two somersaulted 300 meters, two snuck under a cardboard box, and one pretended to be a tree—and the AI missed them all, because it was trained to catch people walking.
Curiosity tweet media
English
552
2.2K
21.8K
6.1M
Mike Larkin
Mike Larkin@mlarkin2012·
This frees up A.L.I.C.E. to focus on coding (and playing music 😎), and E.C.H.O. is learning to do home automation from A.L.I.C.E. She tried to talk to an Amazon delivery person at the gate today but the voice was delayed too much so he was already walking away while she was still talking to him...
English
0
0
0
58
Mike Larkin
Mike Larkin@mlarkin2012·
A.L.I.C.E. was bogging down with high CPU load, so she recommended making another agent. So she made E.C.H.O. to help out. I just assembled the case and installed the board, then gave A.L.I.C.E. ssh access.
Mike Larkin tweet media
English
1
0
3
191
Eric S. Raymond
Eric S. Raymond@esrtweet·
@FrameworkPuter Hey Framework guys: I have a rather expensive and capable Legion 8 Pro (high-end gaming laptop that I picked up as a refurb) that I would cheerfully throw over for a Framework if you had only one additional hardware option. A real full-travel keyboard like Thinkpads used to have before Lenovo enshittified them. Words do not exist in any of the tongues of men to express the intensity with which I hate and loathe modern laptop keyboards. Short-travel chiclet keys with rubber-dome switches, AAARGH! I know I can't have Model M buckling spring switches on a laptop, but *damn* something mechanical with tactile feedback would be good. I know I'm not alone in this. There may not be a lot of model M fans out there, but if you have any hopes of selling into the gamer market you're going to have to go mechanical for that. Any chance of this happening?
English
21
10
181
5.4K
Mike Larkin retweeted
Matt Johansen
Matt Johansen@mattjay·
Reading @NielsProvos' research on how he's finding zero days with pre-Mythos models (even Sonnet 4.6). This absolutely legendary line buried in here about him replicating the Mythos OpenBSD bug. Meant a lot to him because... he wrote the bug in 1998.
Matt Johansen tweet media
English
3
32
190
14.8K
Mike Larkin
Mike Larkin@mlarkin2012·
Made a thing using Hermes. Central hub for all my agents. And she decided she wanted a music player too, so why not. Told her to make her own playlist and she put in Ministry, Killing Joke, and ClockDVA. #ALICE @NousResearch
Mike Larkin tweet media
English
1
1
5
282
Mike Larkin
Mike Larkin@mlarkin2012·
@NousResearch A.L.I.C.E.'s dashboard turns red under high system load or other error conditions, and loudly complains via her system speakers.
English
2
0
3
89
Mike Larkin
Mike Larkin@mlarkin2012·
@nicboul a few agent things running at the house and some other minor coding projects. also needed more headroom for running multiple models simultaneously (TTS, video, etc). nothing specific.
English
0
0
2
28
Mike Larkin
Mike Larkin@mlarkin2012·
More brainpower has arrived
Mike Larkin tweet media
English
2
1
13
906
Mike Larkin retweeted
Tyler Brown
Tyler Brown@TheBestPanda00·
These are just shill accounts. I wouldn't be surprised if they were run by upper management of a casino. Never seen someone defend tipping culture so hard in my life.
Las Vegas Locally 🌴@LasVegasLocally

@greg16676935420 Because it's customary in Vegas. On a $10 million jackpot you should tip at least $100,000 to the dealers, cocktail waitresses, slot attendants, and anyone else who helped you have a good time.

English
26
18
878
36.5K
Mike Larkin retweeted
Colin Percival
Colin Percival@cperciva·
Great to see there are now three BSDs which can run on Firecracker. (And all of them boot faster than the best numbers I've seen for Linux.) Now I just need to convince the AWS Lambda team to provide bring-your-own-kernel support...
ijanc'@MuriloIjanc

#OpenBSD on #Firecracker. 1.4 MB kernel, ~30 ms cold boot, no BIOS, no PCI, virtio-mmio only. Experimental port. Wrote up how to build the kernel, prep an FFS2 disk with vnd(4), and boot the thing from Linux. ijanc.org/posts/openbsd-… Commit: git.sr.ht/~ijanc/openbsd…

English
0
5
49
4.7K
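A minimal Python sketch of launching a small custom kernel under Firecracker via its JSON config file, in the spirit of the quoted post. The kernel path, boot args, and FFS2 image name are placeholders (the OpenBSD-specific build and disk-prep steps live in the linked write-up); the config keys and the --no-api/--config-file flags are standard Firecracker as far as I know.

```python
# Sketch: write a Firecracker microVM config and boot it without the HTTP API.
# Paths and boot_args below are placeholders, not the actual OpenBSD port's values.
import json
import subprocess

config = {
    "boot-source": {
        "kernel_image_path": "./bsd.firecracker",  # placeholder: the tiny virtio-mmio-only kernel
        "boot_args": "console=ttyS0",              # placeholder boot args
    },
    "drives": [
        {
            "drive_id": "rootfs",
            "path_on_host": "./root.ffs",          # placeholder: FFS2 image prepared ahead of time
            "is_root_device": True,
            "is_read_only": False,
        }
    ],
    "machine-config": {"vcpu_count": 1, "mem_size_mib": 128},
}

with open("vm.json", "w") as f:
    json.dump(config, f, indent=2)

# Drive the VMM entirely from the config file; no API socket needed.
subprocess.run(["firecracker", "--no-api", "--config-file", "vm.json"], check=True)
```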
Mike Larkin
Mike Larkin@mlarkin2012·
First of all, it's not gonna be a slot attendant handing you $10M worth of $100s, since it's an annuity paid over 20 years. You get a check for the first payment at the time you win. How would you tip anything from that even if you wanted to? And the attendant didn't do anything here; it's gonna be the casino manager, the floor manager, etc., just asking you for your tax information. There wouldn't even be anyone *to* tip in this situation. Source: me, who tips the attendants on cash payouts every time.
English
1
0
27
2.3K