
Choosing the right fine-tuning method can save weeks of work and thousands of dollars in compute.
Three main approaches:
Full Fine-Tuning: Retrains every weight. Highest cost, maximum performance. Only worth it for mission-critical apps.
LoRA: Freezes the base model and trains small low-rank adapters. Moderate cost. Great for serving multiple tasks from one base model.
QLoRA: LoRA on top of a quantized (typically 4-bit) base model. Low cost, runs on consumer GPUs. Perfect for prototyping, but validate carefully before production.
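The LoRA idea above can be sketched in a few lines of NumPy (sizes, rank, and scaling are illustrative, not from the post): the base weight W stays frozen, and only two small matrices A and B are trained, so the trainable parameter count drops by orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: one 1024x1024 base weight, LoRA rank 8.
d = 1024
r = 8        # LoRA rank
alpha = 16   # LoRA scaling factor

W = rng.standard_normal((d, d))         # frozen base weight (not trained)
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; computed lazily here.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
y = lora_forward(x)

# With B initialized to zero, the adapter starts as an exact no-op.
assert np.allclose(y, x @ W.T)

full_params = W.size
lora_params = A.size + B.size
print(f"trainable: {lora_params} of {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")  # → 1.6%
```

This is why one base model can serve many tasks: each task only needs its own tiny A/B pair swapped in.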
Best workflow: Start QLoRA → validate → scale with LoRA → reserve full fine-tuning only if accuracy demands it.
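The QLoRA starting point hinges on quantizing the frozen base weights. A toy absmax int4 sketch of that idea (real QLoRA uses NF4 with double quantization; sizes and helper names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen base weight to be quantized.
W = rng.standard_normal((256, 256)).astype(np.float32)

# Absmax quantization to the signed 4-bit range [-8, 7].
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

def dequant(w_q, s):
    # Recover an approximate float weight on the fly.
    return w_q.astype(np.float32) * s

# LoRA adapters stay in full precision; they are the only trained weights.
r, alpha = 4, 8
A = (rng.standard_normal((r, 256)) * 0.01).astype(np.float32)
B = np.zeros((256, r), dtype=np.float32)

def qlora_forward(x):
    return x @ dequant(W_q, scale).T + (alpha / r) * (x @ A.T) @ B.T

err = np.abs(dequant(W_q, scale) - W).max()
print(f"max quantization error: {err:.4f}")  # bounded by scale / 2
```

Storing W_q instead of W is where the consumer-GPU memory savings come from (a real implementation also packs two 4-bit values per byte), at the price of the small reconstruction error printed above.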
