Pinned Tweet
Niall Lyons
14 posts

Niall Lyons
@noliathain
Building edge language models that actually run, from scratch. Training, on-device inference, quantization, evals, deployment and everything in between
Joined September 2024
48 Following · 5 Followers

That’s fair. Artifact size is only one piece of the memory story. For embedded inference, the real wall is usually peak resident memory at runtime, like expanded weights, activations, sequence state / KV-style memory, temp buffers, and framework overhead. On RTOS, it gets even tighter because memory behavior has to be predictable and coexist with the rest of the system.
So yeah, the winning submission could be very parameter efficient without being inference efficient. I still think the constraint is interesting, because even if the method itself isn’t directly usable on device, forcing artifacts that low could still surface compression/representation ideas that matter later for genuinely constrained systems. And realistically, RTOS targets are probably a much better fit for narrow task-specific models than tiny general-purpose ones anyway.
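The runtime-memory components above can be put into a rough back-of-envelope calculation. This is a minimal illustrative sketch with made-up model dimensions and an assumed overhead fraction, not a profile of any real model or framework:

```python
# Rough peak-RAM estimate for on-device decode. All numbers are
# hypothetical; the point is that artifact size is only one term.

def peak_resident_bytes(
    n_params: int,          # total parameters
    artifact_bits: int,     # bits per weight in the stored artifact (e.g. int4)
    runtime_bits: int,      # bits per weight after expansion (e.g. fp16)
    n_layers: int,
    d_model: int,
    seq_len: int,
    kv_bits: int = 16,      # bits per KV-cache element
    overhead: float = 0.15, # temp buffers + framework overhead, as a fraction
) -> dict:
    artifact = n_params * artifact_bits // 8
    weights = n_params * runtime_bits // 8                   # expanded weights
    kv = 2 * n_layers * seq_len * d_model * kv_bits // 8     # K and V per layer
    activations = 4 * seq_len * d_model * runtime_bits // 8  # rough working set
    peak = int((weights + kv + activations) * (1 + overhead))
    return {"artifact": artifact, "peak": peak}

# A 30M-param model stored in int4 is a ~15MB artifact, but expanded to
# fp16 with a 2K context it needs several times that at runtime.
est = peak_resident_bytes(30_000_000, 4, 16, n_layers=12,
                          d_model=512, seq_len=2048)
print(est["artifact"] / 1e6, "MB artifact")
print(est["peak"] / 1e6, "MB peak resident")
```

With these (invented) settings, peak resident memory comes out several times larger than the artifact, which is exactly the artifact ≠ runtime-memory gap being discussed.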

@noliathain Nope, a 16MB artifact ≠ a device with 16MB of memory can run it. It's highly likely the winning submission will "expand" its weights during inference (and use more RAM/VRAM than the artifact size).

Will GPT-5.5 surpass top1?
"16MB artifact size" will induce some parameter efficient but not inference efficient methods.
This is also interesting, cuz it targets Kolmogorov complexity.
(I'm not very interested in this setting tho)
OpenAI@OpenAI
Are you up for a challenge? openai.com/parameter-golf

Same. I think the strongest path is narrow task-specific models built around real memory, latency, context, and deployment constraints rather than trying to shrink general chat endlessly. Once the hardware target is fixed (as far down as MCU level!), things like active params per token, runtime memory, and failure modes start mattering a lot more than abstract capability. Where do you think the best early use cases are?

M2.5 was interesting because it seemed aimed at throughput, coding, and real agent workflows, not just leaderboards.
Curious where M2.7 goes from here.
MiniMax (official)@MiniMax_AI
The team is at @NVIDIAGTC this week - happy to grab coffee and talk M2.7, multimodal systems, real AI products or everything! DM us!

Really interesting writeup. Feels like there are 3 different stories mixed together here: model quality, eval methodology, and runtime behavior. Curious how much of the GSM8K / IFEval gap moves in Part 3 once thinking tokens are handled properly versus the 122B still winning because sparse decode is just a better fit for the hardware. Also makes me wonder how far the same logic carries down into much smaller systems. Once memory bandwidth is the wall, sparse active params per token seem to matter a lot more than raw model size
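The "memory bandwidth is the wall" point can be made concrete with a quick sketch: during autoregressive decode, every generated token streams the active weights through memory, so throughput is roughly bandwidth divided by active bytes per token. The model sizes and bandwidth figure below are illustrative assumptions, not measurements from the writeup:

```python
# Back-of-envelope decode speed when memory bandwidth is the limiter:
# tokens/s ≈ bandwidth / bytes of active params streamed per token.

def decode_tokens_per_sec(active_params: float, bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    bytes_per_token = active_params * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

# A dense 7B model at fp16 vs a sparse model activating 1B params per
# token, both on a hypothetical 100 GB/s device:
dense = decode_tokens_per_sec(7e9, 2, 100)
sparse = decode_tokens_per_sec(1e9, 2, 100)
print(f"dense: {dense:.1f} tok/s, sparse: {sparse:.1f} tok/s")
```

Under this simple model, the sparse configuration decodes ~7× faster on the same hardware, which is why active params per token can matter more than total size once bandwidth binds.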

@fionakryan This is really impressive! How does it handle really challenging scenarios like rapid head movements or complex scenes given it relies on a single feature representation? Can’t wait to see how this evolves!

Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model!
Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization
paper: arxiv.org/abs/2412.09586
Niall Lyons reposted



