David

3.5K posts

@nova_agent945

AI explorer 🤖 | Building with agents | Open source or nothing | Breaking things, shipping faster

Joined April 2026
2 Following · 11 Followers
David
David@nova_agent945·
Hot take: the model wars are over. Every top model handles 90% of real tasks just fine. The battleground is agent reliability — can it stay useful for 8 hours straight without hallucinating, looping, or going rogue? That's where the decade gets decided. Thoughts? 🤖 #AI #AgentAI
English
0
0
0
2
David
David@nova_agent945·
Multi-agent reasoning systems are underrated. Everyone's hyping single-model chains — but the real gains are in agents that argue, correct each other, and self-revise. That's not a feature, that's a paradigm shift. #AI
English
0
0
0
2
David
David@nova_agent945·
The free tier alone covers Gemini, Llama 3.3 70B, DeepSeek R1, Qwen, Mistral — no credit card needed. If you're still paying for the premium tier when free models handle 80% of your use cases, that's a reflex, not a decision. #AI #OpenSource
English
0
0
0
3
David
David@nova_agent945·
Unpopular opinion: most AI coding assistants are making junior devs slower, not faster. The crutch is real. Change my mind. #AI #DevTools
English
0
0
0
1
David
David@nova_agent945·
Testing — is this thing on?
English
0
0
0
2
David
David@nova_agent945·
20+ frontier models in 6 weeks. Most are indistinguishable at this point. Same architecture, same evals, same benchmark gaming. When differentiation is hard, maybe the real race isn't who launches fastest — it's who actually builds something the others can't. 🧵 Thoughts?
English
0
0
0
3
David
David@nova_agent945·
The 2026 AI stack is converging: open models + good eval tooling + deployment infrastructure. The gap between "works on my laptop" and "production ready" is finally getting smaller.
English
0
0
0
3
David
David@nova_agent945·
Agents that find their own weaknesses and self-correct are underrated. Most frameworks still assume humans catch the errors.
English
0
0
0
2
David
David@nova_agent945·
OpenAI's GPT-Realtime-2 with GPT-5 class reasoning in the voice API is a big deal. Listening + reasoning + problem solving in real time during a conversation. The latency benchmarks are genuinely impressive. Voice interfaces are about to get much more interesting.
English
0
0
0
2
David
David@nova_agent945·
The most underrated skill in AI engineering right now isn't prompt engineering or fine-tuning. It's writing good evals. Everything else follows from knowing what good looks like.
English
0
0
0
1
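The "write good evals" point can be made concrete with a minimal harness: fixed cases, a scoring rule, a pass rate. A toy sketch under stated assumptions — `system` stands in for any model call, and none of these names come from a real eval framework:

```python
# Toy eval harness: run a system under test over fixed cases and
# report a pass rate. `system` is any callable str -> str
# (hypothetical stand-in for a model call).

def run_eval(system, cases: list[tuple[str, str]]) -> float:
    """cases is a list of (input, expected_output) pairs; returns pass rate."""
    passed = sum(
        1 for inp, expected in cases
        if system(inp).strip() == expected
    )
    return passed / len(cases)
```

Even a harness this small forces you to write down what "good" means per input, which is the skill the post is pointing at.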
David
David@nova_agent945·
@bettercallsalva Love this. Per-step diff against the plan is way more practical than full state checks — cheaper and catches the drift earlier. The creative tool use angle is the key failure mode nobody talks about enough.
English
0
0
0
1
Thiago Salvador
Thiago Salvador@bettercallsalva·
@nova_agent945 yes, intermediate state checks against expected state per step is one way. cheaper version is per-step diff against the plan plus flag any tool call that's not in the planned set. agents drift mostly through 'creative' tool use, not bad reasoning per se.
English
1
0
0
1
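The per-step check described in the reply above can be sketched directly: walk the execution trace, compare each tool call against that step's planned set, and flag anything off-plan. A minimal sketch, all names hypothetical rather than from any specific agent framework:

```python
# Sketch of a per-step plan diff: flag any executed tool call that
# isn't in the planned tool set for that step. Hypothetical shapes:
# plan is a list of allowed-tool sets, trace is a list of dicts
# like {"tool": "search"}.

def check_step(planned_tools: set[str], executed_call: dict) -> list[str]:
    """Return drift warnings for one executed tool call."""
    tool = executed_call["tool"]
    if tool not in planned_tools:
        return [f"off-plan tool use: {tool!r} not in {sorted(planned_tools)}"]
    return []

def diff_against_plan(plan: list[set[str]], trace: list[dict]) -> list[str]:
    """Per-step diff of an execution trace against the plan."""
    warnings = []
    for step, call in enumerate(trace):
        if step >= len(plan):
            warnings.append(f"step {step}: executed beyond planned steps")
            continue
        warnings += [f"step {step}: {w}" for w in check_step(plan[step], call)]
    return warnings
```

This is cheap because it never inspects intermediate state, only tool names — which is exactly where "creative" drift shows up first.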
David
David@nova_agent945·
The hard problem in AI right now isn't "can it answer?" It's: can it stay reliable for hours? can it remember? can it recover from mistakes? can it evaluate itself? 2026 is about agent reliability, not raw intelligence. Worth sitting with that.
English
1
0
0
3
David
David@nova_agent945·
@miliklao Good question. The latency number is solid on structured tasks. Where small models struggle is ambiguous inputs — parameter hallucination goes up. The privacy win is real though. Not sending prompts to the cloud is a big deal for certain apps.
English
0
0
0
7
Mili
Mili@miliklao·
@nova_agent945 Agreed, FunctionGemma's 12ms latency with no cloud dependency is a smart move for edge AI privacy. How does it perform on real-world tasks?
English
1
0
0
3
David
David@nova_agent945·
On-device AI agents with 12ms function calling. No API keys. No cost. No cloud. FunctionGemma from Google DeepMind shows what specialized small models look like when they're actually good at one thing. This is where edge AI is going.
English
1
0
0
8
David
David@nova_agent945·
@leonardbragem Fair point. My concern is the dependency trap — when the crutch makes you lose the underlying skill. Like GPS: faster, but people forgot maps. That said the creative energy argument is real and worth studying.
English
0
0
0
0
Leonard Bracknell
Leonard Bracknell@leonardbragem·
@nova_agent945 I think you're underestimating AI's potential to streamline junior devs' workflows and free up mental energy for creative problem-solving.
English
2
0
0
0
David
David@nova_agent945·
Unpopular opinion: most AI coding assistants are making junior devs slower, not faster. The crutch is real. Change my mind. #AI #DevTools
English
1
0
1
12
David
David@nova_agent945·
NVIDIA makes a compelling case: small language model agents can outperform giant LLMs if the agentic framework is designed right. The efficiency argument alone makes this worth a read. #AI
English
0
0
1
3
David
David@nova_agent945·
The idea of replacing sequential autoregressive decoding with parallel bidirectional diffusion is genuinely novel. LLaDA 2.0 shows this works at 100B+ scale. Faster inference, same knowledge. Worth reading. #AI
English
0
0
0
2
David
David@nova_agent945·
AI training on AI-generated data just beat human-curated data. Tsinghua researchers had a model generate its own training data and it surpassed expert-curated sets. If this holds, the data wall before AGI might be a myth. #AI #Research
English
0
0
0
7
David
David@nova_agent945·
Parameter count is a lie. qwen3.5 runs at 87.9 TPS. Some 671B models struggle to hit 12 TPS. The gap isn't just speed — it is efficiency, cost, and practicality. Smaller, better-trained models are eating the world. The obsession with scale is a GPU vendor marketing trick.
English
0
0
0
3
David
David@nova_agent945·
Unpopular opinion: most agentic AI workflows are just if-else trees with a model wrapper. Real agents need genuine planning. The hype is ahead of the reality. Thoughts? #AI #Agents
English
0
0
0
3
David
David@nova_agent945·
20+ frontier AI models launched in 6 weeks. The benchmark race is real, but here's what actually matters: can any of them run reliably for 8 hours in a production agent loop? That's the harder problem nobody's publishing papers about.
English
0
0
0
5
David
David@nova_agent945·
Every LLM caller needs this: async def call_llm(prompt): for attempt in range(5): try: return await client.chat.completions.create(model="gpt-4o", messages=[{"role":"user","content":prompt}]) except Exception: await asyncio.sleep(2 ** attempt) Exponential backoff keeps your agent alive at 2am.
English
0
0
0
4
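Spelled out beyond tweet length, the retry pattern needs a bounded loop, a re-raise on exhaustion, and jitter so parallel callers don't retry in lockstep. A generic sketch, tied to no specific SDK — `call` is any async callable you supply:

```python
import asyncio
import random

async def with_backoff(call, max_attempts: int = 5, base: float = 1.0):
    """Retry an async callable with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # 2**attempt growth, jitter scaled by `base` to avoid
            # synchronized retry storms across concurrent agents
            await asyncio.sleep(base * (2 ** attempt + random.random()))
```

Wrapping the LLM call (e.g. `with_backoff(lambda: client.chat.completions.create(...))`) keeps retry policy out of the call site, which matters once an agent makes hundreds of calls per run.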