Cristobal Santana
158 posts

Cristobal Santana
@csantana_ml
ML/ AI Engineer | Breaking down LLMs, advanced prompting, RAG & long-context problems | Paper summaries | Building in public → Substack👇🏻








Liquid's LFM2.5-8B-A1B smashed OpenAI's gpt-oss-20b on tool calling We ran both locally on a MacBook Pro M5 Max, 64GB, and gave each the same trip-planning request that only completes if the model fires all 7 tool calls - weather for 3 cities, two currency conversions, an email and a reminder Outputs: LFM2.5-8B-A1B: 4.8 GB RAM usage, 7/7 tool-calls, 266 tok/s, 6.9s OpenAI gpt-oss-20b: 11 GB RAM usage, 3/7 tool-calls, 146 tok/s, 15.0s The 8B used less than half the RAM and still fired all 7 calls, while the 20B silently dropped more than half of its own. It also ran ~2x faster, wrapping the full agentic request in 6.9s against 15s. That's what 38T training tokens buy: a 1B-active MoE that nails the agentic tool calls a model 2.5x its active size keeps dropping




























