Niall Lyons

14 posts

@noliathain

Building edge language models that actually run, from scratch. Training, on-device inference, quantization, evals, deployment and everything in between

Joined September 2024
48 Following · 5 Followers
Pinned Tweet
Niall Lyons@noliathain·
Over the next few months I’ll be building edge language models from scratch.
The goal is to see if task-specific ELMs can actually work on MCUs with <32MB.
Training, quantization, evals, deployment, and all the tradeoffs in between.
Impossible, some might say. Let’s see how it goes...
1 reply · 0 reposts · 1 like · 45 views
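A quick sanity check on the <32MB goal. This is a back-of-envelope sketch; the 20M parameter count, 4-bit width, and 0.5 MiB overhead are illustrative assumptions, not actual numbers from the project:

```python
# Back-of-envelope artifact size for a quantized edge model.
# Hypothetical: a 20M-parameter task-specific model stored at 4-bit.
def artifact_mib(n_params: int, bits_per_weight: int, overhead_bytes: int = 0) -> float:
    """Weight bytes plus fixed overhead (tokenizer, quant scales, headers), in MiB."""
    weight_bytes = n_params * bits_per_weight / 8
    return (weight_bytes + overhead_bytes) / (1024 * 1024)

size = artifact_mib(n_params=20_000_000, bits_per_weight=4, overhead_bytes=512 * 1024)
print(f"{size:.1f} MiB")  # ~10.0 MiB, comfortably under the 32MB target
```

Note the same model at 8-bit would double to roughly 20 MiB, which is why the bit width ends up being a first-class design parameter at this scale.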
Marco Franzon@mfranz_on·
we need an IDE for agents, terminal is not enough.
6 replies · 0 reposts · 11 likes · 917 views
Niall Lyons@noliathain·
That’s fair. Artifact size is only one piece of the memory story. For embedded inference, the real wall is usually peak resident memory at runtime: expanded weights, activations, sequence state / KV-style memory, temp buffers, and framework overhead. On an RTOS it gets even tighter, because memory behavior has to be predictable and coexist with the rest of the system.

So yeah, the winning submission could be very parameter-efficient without being inference-efficient. I still think the constraint is interesting: even if the method itself isn’t directly usable on device, forcing artifacts that low could surface compression/representation ideas that matter later for genuinely constrained systems.

And realistically, RTOS targets are probably a much better fit for narrow task-specific models than tiny general-purpose ones anyway.
0 replies · 0 reposts · 3 likes · 156 views
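The artifact-size-vs-peak-resident-memory distinction from the reply above can be sketched as a rough accounting exercise. All shapes and bit widths here are hypothetical assumptions for illustration, not measurements:

```python
# Rough peak resident memory at inference time vs. on-flash artifact size.
# If 4-bit weights are dequantized to 8-bit for compute, the runtime copy
# is 2x the artifact; activations, KV state, and scratch come on top.
def peak_runtime_bytes(
    n_params: int,
    stored_bits: int,        # bits per weight in the artifact
    compute_bits: int,       # bits per weight after any dequant/expansion
    d_model: int,
    n_layers: int,
    seq_len: int,
    kv_bits: int = 8,
    scratch: int = 256 * 1024,  # temp buffers + framework overhead (a guess)
) -> int:
    weights = n_params * max(stored_bits, compute_bits) // 8
    activations = d_model * seq_len * 2                          # fp16, one layer live
    kv_cache = 2 * n_layers * seq_len * d_model * kv_bits // 8   # K and V per layer
    return weights + activations + kv_cache + scratch

artifact = 20_000_000 * 4 // 8  # 4-bit artifact: ~9.5 MiB on flash
peak = peak_runtime_bytes(20_000_000, 4, 8, d_model=512, n_layers=8, seq_len=256)
print(peak > artifact)  # True: runtime peak exceeds artifact size
```

Even in this toy accounting, the expanded-weights term alone doubles the footprint, which is exactly the "16MB artifact ≠ 16MB device" point.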
You Jiacheng@YouJiacheng·
@noliathain nope, a 16MB artifact ≠ a device with 16MB of memory can run it. It's highly likely that the winning submission will "expand" its weights during inference (and use more RAM/VRAM than the artifact size).
1 reply · 1 repost · 9 likes · 438 views
Niall Lyons@noliathain·
Same. I think the strongest path is narrow, task-specific models built around real memory, latency, context, and deployment constraints rather than trying to shrink general chat endlessly. Once the hardware target is fixed (as far down as MCU level!), things like active params per token, runtime memory, and failure modes start mattering a lot more than abstract capability. Where do you think the best early use cases are?
0 replies · 0 reposts · 1 like · 280 views
Morgan@morganlinton·
Insanely bullish on small, special purpose models.
242 replies · 566 reposts · 2.9K likes · 669.6K views
Niall Lyons@noliathain·
M2.5 was interesting because it seemed aimed at throughput, coding, and real agent workflows, not just leaderboards. Curious where M2.7 goes from here.
MiniMax (official)@MiniMax_AI

The team is at @NVIDIAGTC this week - happy to grab coffee and talk M2.7, multimodal systems, real AI products or everything! DM us!

0 replies · 0 reposts · 1 like · 161 views
Niall Lyons@noliathain·
Really interesting writeup. Feels like there are 3 different stories mixed together here: model quality, eval methodology, and runtime behavior. Curious how much of the GSM8K / IFEval gap moves in Part 3 once thinking tokens are handled properly versus the 122B still winning because sparse decode is just a better fit for the hardware. Also makes me wonder how far the same logic carries down into much smaller systems. Once memory bandwidth is the wall, sparse active params per token seem to matter a lot more than raw model size
1 reply · 0 reposts · 1 like · 14 views
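The bandwidth argument in that last sentence can be made concrete with a roofline-style estimate. The parameter counts, bit width, and 1 TB/s figure below are illustrative assumptions, not numbers from the writeup being discussed:

```python
# Once decode is memory-bandwidth bound, tokens/sec ≈ bandwidth / bytes of
# active weights streamed per token -- total model size drops out.
def decode_tok_per_s(active_params: int, bits: int, bandwidth_bytes_per_s: float) -> float:
    bytes_per_token = active_params * bits / 8
    return bandwidth_bytes_per_s / bytes_per_token

BW = 1e12  # hypothetical 1 TB/s of memory bandwidth
sparse = decode_tok_per_s(6_000_000_000, 8, BW)    # MoE activating 6B of its params
dense = decode_tok_per_s(70_000_000_000, 8, BW)    # dense 70B for comparison
print(sparse / dense)  # ~11.7x faster decode from active params alone
```

The ratio is just (dense params) / (active params): under a pure bandwidth wall, a big sparse model decodes like a small dense one, which is why active params per token matter more than raw size.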
Niall Lyons@noliathain·
There are billions of MCU-class devices already deployed. If task-specific models can run fully on device there, it changes what embedded products can do without depending on the cloud, which opens up a very different future for embedded systems.
0 replies · 0 reposts · 1 like · 21 views
Niall Lyons@noliathain·
Current max sequence length is 256, which is effectively the hardware ceiling right now. So the real question is not just how to get around that constraint, but when to redesign the task, prompting, memory, or system flow so that 256 is actually workable.
1 reply · 0 reposts · 1 like · 20 views
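One way to make a hard 256-token ceiling workable by design rather than by accident is to budget the window explicitly. A minimal sketch; the 32/64 split between instruction and answer tokens is a hypothetical example, not a prescribed configuration:

```python
# Budget a hard 256-token context: reserve space for the fixed instruction
# and the answer, then truncate the variable input to whatever remains.
MAX_SEQ = 256

def fit_input(instruction_tokens: int, answer_tokens: int, input_ids: list) -> list:
    budget = MAX_SEQ - instruction_tokens - answer_tokens
    if budget <= 0:
        raise ValueError("fixed parts alone exceed the 256-token ceiling")
    return input_ids[-budget:]  # keep only the most recent input tokens

kept = fit_input(instruction_tokens=32, answer_tokens=64, input_ids=list(range(300)))
print(len(kept))  # 160 tokens of input survive
```

Keeping the most recent tokens is one possible policy; a task-specific model might instead keep the head of the input, or summarize, which is exactly the "redesign the task" part of the tweet.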
Niall Lyons@noliathain·
@fionakryan This is really impressive! How does it handle really challenging scenarios like rapid head movements or complex scenes given it relies on a single feature representation? Can’t wait to see how this evolves!
0 replies · 0 reposts · 0 likes · 385 views
Fiona Ryan@fionakryan·
Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization. Paper: arxiv.org/abs/2412.09586
79 replies · 479 reposts · 4.3K likes · 427.2K views
Niall Lyons reposted
Jimmy Wu@jimmyyhwu·
When will robots help us with our household chores? TidyBot++ brings us closer to that future. Our new open-source mobile manipulator makes it more accessible and practical to do robot learning research outside the lab, in real homes!
19 replies · 103 reposts · 598 likes · 187.8K views