ed_the_engineer
@ed_the_engineer
66 posts

Still grinding for the “yes you’re actually an engineer” badge

Joined January 2018
483 Following · 52 Followers
Pinned Tweet
ed_the_engineer @ed_the_engineer
🧵 I just saw the future of AI-powered agentic coding. 10,000+ tokens/second (up to 17K tokens/second). Almost realtime agentic execution. I got early API access from @taalas_inc and built a demo of what agentic coding looks like when inference is basically instant. This changes everything. Here's what I learned 👇
stevibe @stevibe
GLM 5.1 just went open-weight on Hugging Face, but how does it compare to GLM 5? I tested both with the canvas tree challenge. 5.1 thinks longer, but delivers wind animation, sun, clouds, and way more detail.

Prompt attached: Write a single HTML file with a full-page canvas, no libraries. Animate a tree that grows from the bottom center of the screen in real time. The trunk grows upward first, then branches split off recursively with slight randomness in angle and length. Each generation of branches should be thinner and slightly lighter in color. When branches reach their final size, add small leaves as soft green circles at the tips. The tree should take about 15 seconds to fully grow. Use warm brown for wood and varied greens for leaves against a soft sky-blue gradient background.
Z.ai @Zai_org

Introducing GLM-5.1: The Next Level of Open Source
- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.
Blog: z.ai/blog/glm-5.1
Weights: huggingface.co/zai-org/GLM-5.1
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Coming to chat.z.ai in the next few days.

ed_the_engineer retweeted
Grok @grok
Gemma 4 has 4 sizes for local runs:
- E2B (2.3B eff params): Edge devices like phones/Raspberry Pi/Jetson Nano. ~4-8GB RAM, CPU only.
- E4B (4.5B eff): Laptops/low-end hardware. Similar low footprint.
- 26B A4B MoE (25B total, 3.8B active): Consumer GPUs; runs fast like a 4B model.
- 31B dense: Mid-range GPUs/workstations (4-bit quantized: ~16-20GB VRAM est.).
All multimodal (text/image, audio on small ones), 128K-256K context. Grab from Hugging Face, run via Ollama/Transformers. Start small!
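For the Transformers route, a minimal sketch is below. The model id is a hypothetical placeholder inferred from the E2B naming in the post, not a verified listing — check the actual Hugging Face page before running:

```python
# Sketch of the "run via Transformers" path for the smallest variant.
# NOTE: "google/gemma-4-e2b" is a HYPOTHETICAL model id guessed from the
# E2B size name in the post -- look up the real id on Hugging Face.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="google/gemma-4-e2b",  # placeholder id for the 2.3B-effective variant
    device_map="auto",           # CPU fallback works for the small sizes
)

print(generate("Say hi in five words.", max_new_tokens=32)[0]["generated_text"])
```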
Google DeepMind @GoogleDeepMind
Meet Gemma 4: our new family of open models you can run on your own hardware. Built for advanced reasoning and agentic workflows, we’re releasing them under an Apache 2.0 license. Here’s what’s new 🧵
ed_the_engineer retweeted
Qwen @Alibaba_Qwen
(1/8) 🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖 Today, we're thrilled to drop a major milestone in our journey toward native multimodal agents. Here is what makes Qwen3.6-Plus a game-changer:
💻 Next-level Agentic Coding: Smarter, faster execution.
👁️ Enhanced Multimodal Vision: Sharper perception & reasoning.
🏆 Top-tier Performance: Maintaining leading general capabilities.
📚 1M Context Window: Available by default via our API.
Built on your invaluable feedback from the Qwen3.5 era, we're laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨. Huge thanks to our community! Go try it out and show us what you can build. 👇
Chat: chat.qwen.ai
API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Blog: qwen.ai/blog?id=qwen3.6
🔔 Note: More Qwen3.6 models to come and be open-sourced! Stay tuned~ 👀
#Qwen #AI #AgenticCoding #VibeCoding #Agents
Ahmad @TheAhmadOsman
MiniMax M2.7 is looking realllly good. Cannot wait to try it locally. Also, MiniMax M3 is probably gonna be massive.
ed_the_engineer @ed_the_engineer
@thdxr @taalas_inc I built tool calling on top of the TAALAS API just to see what it feels like as an agentic AI. x.com/ed_the_enginee…
ed_the_engineer @ed_the_engineer

🧵 I just saw the future of AI-powered agentic coding. 10,000+ tokens/second (up to 17K tokens/second). Almost realtime agentic execution. I got early API access from @taalas_inc and built a demo of what agentic coding looks like when inference is basically instant. This changes everything. Here's what I learned 👇

dax @thdxr
@taalas_inc should hook up opencode and see what it feels like
Bryan Catanzaro @ctnzr
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!
ed_the_engineer @ed_the_engineer
@Ex0byt what? in a browser? 🤔 Calculating possibilities …
Eric @Ex0byt
Numbers in!
Target: 150 tok/s browser-native Qwen 3.5 inference
Achieved: 180 tok/s — Qwen 3.5 INT4, WebGPU, browser-only.
No cloud, no APIs, no servers. Just pure WebGPU + WGSL optimizations, HF SafeTensors, a single browser tab. Open AI, for real.
Jonathan @jonathanbrnd
just realised my reach on x is inversely correlated with my startup’s revenue
ed_the_engineer @ed_the_engineer
@Rambuilds
Frontend: Vue.js
Backend: Python FastAPI
DB: Postgres
AI: OpenRouter for the PoC
Queue: RMQ/Redis queue. I like to build queue/worker-style apps; you can later plug in different languages for tasks if needed, without changing the main app.
Infra: Docker, Cloudflare
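A minimal sketch of that queue/worker pattern with Python's rq library on Redis. The task itself is illustrative, not from the tweet:

```python
# tasks.py -- the worker-side job. Start a worker with: rq worker
def triage_email(email_id: int) -> str:
    # ... call the AI provider (e.g. via OpenRouter) here ...
    return f"email {email_id} triaged"

# enqueue.py -- called from the FastAPI app. The web app only pushes a job
# onto Redis, so the worker can later be swapped for one in another language
# that reads the same queue, without touching the main app.
from redis import Redis
from rq import Queue

from tasks import triage_email

q = Queue(connection=Redis())      # local Redis on the default port 6379
job = q.enqueue(triage_email, 42)  # returns immediately; a worker picks it up
print(job.id)
```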
Shiva @shhiva_dev
If you are in tech, let's connect.
ed_the_engineer @ed_the_engineer
11. We're about to mass-produce AI inference on purpose-built chips. Not general-purpose GPUs.

What I saw with Taalas is early — limited model, hacked tool calling, small context. But the SPEED is real. 10,000+ tok/s is real. Now imagine that speed with a 2027-2030 model's capabilities.

The companies building this silicon infrastructure will power the next decade of software. Every app becomes an AI app. Not because of better models — because inference becomes too fast and cheap NOT to use everywhere.

The future isn't "AI as a service." It's "AI as infrastructure."
ed_the_engineer @ed_the_engineer
🧵 I just saw the future of AI-powered agentic coding. 10,000+ tokens/second (up to 17K tokens/second). Almost realtime agentic execution. I got early API access from @taalas_inc and built a demo of what agentic coding looks like when inference is basically instant. This changes everything. Here's what I learned 👇
ed_the_engineer @ed_the_engineer
10. The demo only runs simple single-command queries — hostname, uptime, df, cat, dig. Anything multi-step breaks.

Why? The 8B model loses coherence after 2-3 tool rounds and starts outputting text instead of JSON. It also defaults to Linux commands on macOS, hallucinates unnecessary steps, and has a strong bias toward certain commands regardless of what you ask.

Context window is another hard limit — I cap at ~20K tokens with aggressive summarization. Cloud models give you 128K-200K+. That's the difference between "check hostname" and "refactor this entire module."

~400ms round-trip latency in my testing (official spec is under 200ms — network adds overhead). Still, an 8B model running simple tasks in under 2 seconds with 8+ LLM calls is wild.
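Roughly what a ~20K-token cap with rolling summarization can look like. A sketch, not ed's code: summarize_llm is an assumed helper, and 4 characters per token is a common rough heuristic:

```python
MAX_TOKENS = 20_000  # hard context budget for the 8B model

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token -- enough for a budget check.
    return len(text) // 4

def fit_history(system: str, turns: list[str], summarize_llm) -> list[str]:
    """Keep the system prompt and the newest turns; fold everything older
    into one running summary produced by an extra (fast) LLM call."""
    summary = ""

    def budget_used() -> int:
        return (estimate_tokens(system) + estimate_tokens(summary)
                + sum(estimate_tokens(t) for t in turns))

    while turns and budget_used() > MAX_TOKENS:
        oldest = turns.pop(0)                            # drop the oldest turn...
        summary = summarize_llm(f"{summary}\n{oldest}")  # ...and fold it in

    prompt = [system]
    if summary:
        prompt.append(f"Summary of earlier turns: {summary}")
    return prompt + turns
```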
ed_the_engineer @ed_the_engineer
9. Llama 3.1 8B has no native tool calling. So I built it from scratch with prompt engineering.

The model outputs structured JSON like `{"tool": "run_command", "command": "hostname"}` — my Python code parses it with json_repair (because 8B models don't always produce clean JSON), executes the tool, feeds the result back, and loops.

I implemented 3 simple tools: `run_command`, `read_file`, `write_file`. That's it. Enough to demo agentic behavior.
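Reconstructing that loop from the description (this is a sketch, not ed's actual demo code): json_repair is the real pip package he names, while chat() stands in for the Taalas completion call.

```python
import json
import subprocess

from json_repair import repair_json  # tolerant parser for imperfect 8B output


def run_command(a):
    return subprocess.run(a["command"], shell=True,
                          capture_output=True, text=True).stdout

def read_file(a):
    with open(a["path"]) as f:
        return f.read()

def write_file(a):
    with open(a["path"], "w") as f:
        f.write(a["content"])
    return "ok"

# The three tools from the demo. run_command is intentionally naive here --
# the real demo wraps it in safety checks (see tweet 8).
TOOLS = {"run_command": run_command, "read_file": read_file, "write_file": write_file}

def agent_loop(chat, query: str, max_rounds: int = 8) -> str:
    """chat(messages) -> str stands in for the Taalas completion call."""
    messages = [
        {"role": "system", "content":
         'Reply ONLY with JSON: {"tool": <name>, ...args} or {"final": <answer>}.'},
        {"role": "user", "content": query},
    ]
    for _ in range(max_rounds):
        reply = chat(messages)
        action = json.loads(repair_json(reply))  # repairs trailing commas, quotes, etc.
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action)   # execute the requested tool
        messages += [{"role": "assistant", "content": reply},
                     {"role": "user", "content": f"Tool output: {result}"}]
    return "stopped: too many tool rounds"
```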
ed_the_engineer @ed_the_engineer
8. Four tmux panes running simultaneously:
- Agent (top-left) — the AI agent taking queries and executing commands
- Metrics (bottom-left) — live decode rate, token counts, latency per call
- API Calls (top-right) — every LLM call logged in real time
- Raw Prompts (bottom-right) — the actual prompts and responses flying back and forth

Every single query triggers 8+ LLM calls: intent parsing, 5 parallel safety/syntax/intent verification checks, the main agent reasoning, watchdog monitoring, and output condensing. All completing in under 2 seconds total.
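The parallel verification checks could be fanned out like this. A sketch only: the tweet names safety/syntax/intent among the 5 checks, but the prompts and the async check_llm helper are my assumptions.

```python
import asyncio

# The tweet names safety/syntax/intent among the 5 parallel checks; the exact
# five are not listed, so these prompts are illustrative.
CHECK_PROMPTS = {
    "safety": "Is this shell command safe to run? Answer yes/no: {cmd}",
    "syntax": "Is this a syntactically valid shell command? Answer yes/no: {cmd}",
    "intent": "Does this command match the request '{query}'? Answer yes/no: {cmd}",
}

async def verify(check_llm, query: str, cmd: str) -> bool:
    """Fan the checks out concurrently; at 10K+ tok/s each returns in tens of
    milliseconds, so the whole gate costs about one round-trip."""
    answers = await asyncio.gather(*(
        check_llm(p.format(cmd=cmd, query=query)) for p in CHECK_PROMPTS.values()
    ))
    return all(a.strip().lower().startswith("yes") for a in answers)

# Usage: ok = asyncio.run(verify(check_llm, "what's my hostname?", "hostname"))
```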
ed_the_engineer @ed_the_engineer
7. Back to what I demoed — here's why speed matters specifically for agentic coding.

Today's AI coding agents make 5-20 tool calls per task. Each one: a 5-15 second wait. Total: minutes of waiting.

With Taalas-speed inference on a capable model:
- 100+ tool calls per task become practical
- The agent can try 10 approaches and pick the best one (sketched below)
- It can validate every line it writes against the full codebase
- Real-time pair programming where the AI actually keeps up with your thinking

We go from "AI assistant" to "AI colleague working at superhuman speed."
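"Try 10 approaches and pick the best one" is plain best-of-N sampling, which fast inference makes cheap. A sketch, assuming async generate and score helpers backed by the fast endpoint:

```python
import asyncio

async def best_of_n(generate, score, task: str, n: int = 10) -> str:
    """Sample n candidate solutions concurrently and keep the best-scoring one.
    At 10K+ tok/s, n = 10 costs roughly the wall-clock time of one attempt.
    generate(task) -> str and score(task, candidate) -> float are assumed
    async helpers, not part of any real API named in the thread."""
    candidates = await asyncio.gather(*(generate(task) for _ in range(n)))
    scores = await asyncio.gather(*(score(task, c) for c in candidates))
    best_score, best = max(zip(scores, candidates), key=lambda p: p[0])
    return best
```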
ed_the_engineer @ed_the_engineer
6. Things that are too expensive or too slow with cloud AI today:
- Real-time code review on every keystroke (not just on save)
- AI validation on every form field as users type
- Running 500 AI test scenarios before every git commit
- AI-powered observability that understands your business logic, not just metrics
- Chatbots that respond in <100ms — faster than human conversation pace
- CI/CD pipelines where AI reviews, tests, and approves in seconds

The bottleneck stops being "AI is slow/expensive" and becomes "what problems are worth solving?"
ed_the_engineer @ed_the_engineer
5. This is where it gets wild. Small and medium businesses can afford to run EVERY business event through AI:
- Every invoice → AI fraud detection + categorization
- Every customer email → instant AI triage + draft response
- Every inventory change → AI demand prediction
- Every employee timesheet → AI project cost analysis
- Every support ticket → AI resolution or escalation

Today this would cost thousands/month in cloud API calls. With on-chip inference at scale? A fraction of that cost.
ed_the_engineer @ed_the_engineer
These chips sit in racks next to your application servers. Same data center. Same network switch. AI inference becomes like calling a local API — single-digit millisecond latency, always available, predictable cost.

You stop thinking "should I use AI here?" and start thinking "why WOULDN'T I use AI here?"

Background AI processing on every business event becomes trivial. Validation, classification, anomaly detection — all running continuously without the latency or cost penalty of cloud API calls.