Pinned Tweet
Niall Lyons
14 posts

Niall Lyons
@noliathain
Building edge language models that actually run, from scratch. Training, on-device inference, quantization, evals, deployment and everything in between
Joined September 2024
48 Following · 5 Followers

That’s fair. Artifact size is only one piece of the memory story. For embedded inference, the real wall is usually peak resident memory at runtime, like expanded weights, activations, sequence state / KV-style memory, temp buffers, and framework overhead. On RTOS, it gets even tighter because memory behavior has to be predictable and coexist with the rest of the system.
So yeah, the winning submission could be very parameter efficient without being inference efficient. I still think the constraint is interesting, because even if the method itself isn’t directly usable on device, forcing artifacts that low could still surface compression/representation ideas that matter later for genuinely constrained systems. And realistically, RTOS targets are probably a much better fit for narrow task-specific models than tiny general-purpose ones anyway.
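The runtime-memory components above can be put into a rough back-of-envelope calculation. This is a minimal illustrative sketch with made-up model dimensions and an assumed overhead fraction, not a profile of any real model or framework:

```python
# Rough peak-RAM estimate for on-device decode. All numbers are
# hypothetical; the point is that artifact size is only one term.

def peak_resident_bytes(
    n_params: int,          # total parameters
    artifact_bits: int,     # bits per weight in the stored artifact (e.g. int4)
    runtime_bits: int,      # bits per weight after expansion (e.g. fp16)
    n_layers: int,
    d_model: int,
    seq_len: int,
    kv_bits: int = 16,      # bits per KV-cache element
    overhead: float = 0.15, # temp buffers + framework overhead, as a fraction
) -> dict:
    artifact = n_params * artifact_bits // 8
    weights = n_params * runtime_bits // 8                   # expanded weights
    kv = 2 * n_layers * seq_len * d_model * kv_bits // 8     # K and V per layer
    activations = 4 * seq_len * d_model * runtime_bits // 8  # rough working set
    peak = int((weights + kv + activations) * (1 + overhead))
    return {"artifact": artifact, "peak": peak}

# A 30M-param model stored in int4 is a ~15MB artifact, but expanded to
# fp16 with a 2K context it needs several times that at runtime.
est = peak_resident_bytes(30_000_000, 4, 16, n_layers=12,
                          d_model=512, seq_len=2048)
print(est["artifact"] / 1e6, "MB artifact")
print(est["peak"] / 1e6, "MB peak resident")
```

With these (invented) settings, peak resident memory comes out several times larger than the artifact, which is exactly the artifact ≠ runtime-memory gap being discussed.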

@noliathain Nope, a 16MB artifact ≠ a device with 16MB of memory can run it. It's highly likely the winning submission will "expand" its weights during inference (and use more RAM/VRAM than the artifact size).

Will GPT-5.5 surpass top1?
"16MB artifact size" will induce some parameter efficient but not inference efficient methods.
This is also interesting, cuz it targets Kolmogorov complexity.
(I'm not very interested in this setting tho)
OpenAI@OpenAI
Are you up for a challenge? openai.com/parameter-golf

Same. I think the strongest path is narrow task-specific models built around real memory, latency, context, and deployment constraints rather than trying to shrink general chat endlessly. Once the hardware target is fixed (as far down as MCU level!), things like active params per token, runtime memory, and failure modes start mattering a lot more than abstract capability. Where do you think the best early use cases are?

M2.5 was interesting because it seemed aimed at throughput, coding, and real agent workflows, not just leaderboards.
Curious where M2.7 goes from here.
MiniMax (official)@MiniMax_AI
The team is at @NVIDIAGTC this week - happy to grab coffee and talk M2.7, multimodal systems, real AI products or everything! DM us!

Really interesting writeup. Feels like there are 3 different stories mixed together here: model quality, eval methodology, and runtime behavior. Curious how much of the GSM8K / IFEval gap moves in Part 3 once thinking tokens are handled properly versus the 122B still winning because sparse decode is just a better fit for the hardware. Also makes me wonder how far the same logic carries down into much smaller systems. Once memory bandwidth is the wall, sparse active params per token seem to matter a lot more than raw model size
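The "memory bandwidth is the wall" point can be made concrete with a quick sketch: during autoregressive decode, every generated token streams the active weights through memory, so throughput is roughly bandwidth divided by active bytes per token. The model sizes and bandwidth figure below are illustrative assumptions, not measurements from the writeup:

```python
# Back-of-envelope decode speed when memory bandwidth is the limiter:
# tokens/s ≈ bandwidth / bytes of active params streamed per token.

def decode_tokens_per_sec(active_params: float, bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    bytes_per_token = active_params * bytes_per_param
    return mem_bandwidth_gbs * 1e9 / bytes_per_token

# A dense 7B model at fp16 vs a sparse model activating 1B params per
# token, both on a hypothetical 100 GB/s device:
dense = decode_tokens_per_sec(7e9, 2, 100)
sparse = decode_tokens_per_sec(1e9, 2, 100)
print(f"dense: {dense:.1f} tok/s, sparse: {sparse:.1f} tok/s")
```

Under this simple model, the sparse configuration decodes ~7× faster on the same hardware, which is why active params per token can matter more than total size once bandwidth binds.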

@fionakryan This is really impressive! How does it handle really challenging scenarios like rapid head movements or complex scenes given it relies on a single feature representation? Can’t wait to see how this evolves!

Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model!
Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization
paper: arxiv.org/abs/2412.09586
Niall Lyons reposted



