Alex
@theAlexFerrari
infovore / LLMs whisperer / "his mind teeming"
The Hague, The Netherlands · Joined May 2020
396 posts · 892 Following · 242 Followers
Alex @theAlexFerrari·
@somewheresy self-hosted Firecrawl + searXng
∿ @somewheresy·
is there any good *local* solution for giving AIs web search, or is it just giving them access to Playwright and curl and letting them do it that way? still behind cloud solutions?
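One local option along these lines is a self-hosted SearXNG instance (as in the reply above), which exposes a JSON search API an agent can call as a tool. A minimal sketch, assuming SearXNG is running at localhost:8080 with the JSON output format enabled; the function names are illustrative, not part of any library:

```python
import json
import urllib.parse
import urllib.request

SEARX_URL = "http://localhost:8080/search"  # assumed address of a self-hosted SearXNG

def web_search(query: str, max_results: int = 5) -> list:
    """Query SearXNG's JSON API and return up to max_results result dicts."""
    url = SEARX_URL + "?" + urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return data.get("results", [])[:max_results]

def format_for_llm(results: list) -> str:
    """Flatten results into a plain-text block to feed back as tool output."""
    return "\n\n".join(
        f"{i}. {r.get('title', '')}\n{r.get('url', '')}\n{r.get('content', '')}"
        for i, r in enumerate(results, 1)
    )

if __name__ == "__main__":
    # Live call, only when a local SearXNG is actually running
    print(format_for_llm(web_search("local web search for LLM agents")))
```

Registering `web_search` as a tool gives the model search without Playwright's overhead; the browser route is still needed when results require rendering JavaScript.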
Alex @theAlexFerrari·
little "prompt hack" that I discovered to get smaller LLMs inside agents to hallucinate less: “before answering, perform deep research”. The trick is to specify *deep*. Not sure why but this works across several LLMs
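In an agent, the hack above amounts to prepending a fixed instruction before the prompt reaches the model. A minimal sketch; the helper name is made up:

```python
DEEP_RESEARCH_PREFIX = "Before answering, perform deep research."

def with_deep_research(prompt: str) -> str:
    """Prepend the instruction; per the tweet, specifying *deep* is what matters."""
    return f"{DEEP_RESEARCH_PREFIX}\n\n{prompt}"

# Example: wrap a user question before sending it to a smaller model
wrapped = with_deep_research("What changed between Qwen3.5-9B and Qwen3.5-27B?")
```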
Graeme @gkisokay·
The LLM Cheat-Sheet for OpenClaw and Hermes agents

The goal is to choose the right models that best fit your agents' needs for as little cost as possible. Do this and you can build a proficient agent that will never die. Here's the full landscape of popular models for AI agents: 15 models, 4 tiers, every one earning its place.

Tier 1 - Frontier Models
- Claude Opus 4.6: #1 agentic terminal coding
- GPT-5.4: superhuman computer use, real planning
- Gemini 3.1 Pro: best price/intelligence at frontier, native multimodal

Tier 2 - Execution
- MiniMax M2.7: 97% skill adherence, built for agents
- Kimi K2.5: long-horizon stability, agent swarm
- DeepSeek V3.2: frontier reasoning at 1/50th the cost

Tier 3 - Balanced
- Claude Sonnet 4.6: 98% of Opus at 1/5 the cost
- GPT-5.4 mini: 93.4% tool-call reliability
- Qwen3.6 Plus: near-frontier coding, completely free
- Llama 4 Maverick: open-weight, self-host at zero marginal cost

Tier 4 - Local / $0
- Qwen3.5-9B: always-on subconscious loop, 16GB RAM, beats models 13x its size
- Qwen3.5-27B: stronger instruction following, 32GB RAM
- Gemma 4 31B: best local reasoning, Apache 2.0, commercial-ready
- DeepSeek R1 distill: best chain-of-thought at $0
- GLM-4.5-Air: purpose-built for agent tool use and web browsing, not a trimmed general model

Full breakdown with benchmarks, costs, and use cases in the table 🔽
Graeme tweet media
Graeme@gkisokay

x.com/i/article/2041…

Alex @theAlexFerrari·
@thekitze I don't know man, don't expect the same level of insights
kitze 🛠️ tinkerer.club
wow, 3 years of my life in a zip file. time to nuke my chatgpt chats, history, and memory. it's been a fun few years, but i'm going fully local for chat. cloud models for coding only from now on
kitze 🛠️ tinkerer.club tweet media
Alex @theAlexFerrari·
What does this mean in the end? For daily tasks, keep using MiniMax-M2.7. When you need analysis that requires deeper thinking (and you need to trust the results), use Gemini 3 Flash (and Pro when the quality of the results is pivotal)
Alex @theAlexFerrari·
Gemini 3 Flash is a worse agent than MM-M2.7, and anyone who has used Gemini CLI can attest to that. Until Gemini closes that gap, their very good models will have to eat the dust of cheaper, less intelligent, but more effective Chinese counterparts.
Alex tweet media
Alex @theAlexFerrari·
If you look at the Artificial Analysis Intelligence Index, MiniMax-M2.7 is many points above Gemini 3 Flash. Yet, the more I use MiniMax-M2.7 the less this seems to match my day-to-day experience 🧵
Alex tweet media
Winter @WinterArc2125·
Most people don’t realize this: you get 1,500 free daily requests to Gemma 4 31B on @GoogleAIStudio. That’s plenty of free inference (imo). And you can route it into @NousResearch Hermes Agent via Vercel’s AI Gateway:

1. Create an API key on Google AI Studio
2. Add it under BYOK (Google) in Vercel AI Gateway
3. Create a Vercel Gateway API key
4. In Hermes → select “Vercel AI Gateway” + your Google model

Now all your Google model requests route through your free AI Studio quota. Basically: free 31B model access inside your agent stack. (Tradeoff: not as private as running locally)
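Once the BYOK routing above is set up, the gateway speaks an OpenAI-compatible chat API, so any client can use it. A minimal sketch; the endpoint URL and model id below are assumptions you should check against your own gateway's catalog:

```python
import json
import urllib.request

GATEWAY_URL = "https://ai-gateway.vercel.sh/v1/chat/completions"  # assumed endpoint
MODEL = "google/gemma-4-31b"  # hypothetical model id; use the one your gateway lists

def build_request(api_key: str, prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the gateway."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Vercel Gateway API key from step 3
            "Content-Type": "application/json",
        },
    )

def chat(api_key: str, prompt: str) -> str:
    """Send the request and pull the assistant message out of the response."""
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Requests then count against the free AI Studio quota rather than a paid plan, with the privacy tradeoff the tweet notes.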