Abhishek (key/value)

13.9K posts


@StalwartCoder

scaling @smallest_ai | Ex-@yugabyte DB | @ThePSF Fellow | Pythonista 🐍 |🤹‍♂️ @pyconindia, @gdgchennai, @fossunited| try https://t.co/YROrRELJAQ 🤖

LRU cache | BLR · Joined December 2013
5.5K Following · 4.5K Followers
Abhishek (key/value) retweeted
Diptanu Choudhury @diptanu
A few architectural choices we made for Tensorlake sandboxes are helping us move quickly now -
1. Runtime environment driver architecture on the dataplane. This lets us run CloudHypervisor, Firecracker, and gVisor transparently for different use cases using the same control plane.
2. Dataplane and Control Plane separation via outbound mTLS connections. We can bring up a BYOC deployment in private clusters in under 15 minutes.
3. Durable sandboxes with a focus on optimizing snapshot restore speed. This has been a game changer for resuming coding agent sessions, automatic build caches for CI use cases, and for debugging RL environments.
4. A workflow for working with Docker images and building VM images from Dockerfiles.
5. A dynamic cluster scheduler optimized for high-throughput server-less functions and sandboxes. A single scheduler can support dedicated machines for customers and shared machines for general cloud usage.
At the time of building all this, it felt we were just plumbing but it's paying off now as we scale.
4
5
44
3.5K
Abhishek (key/value) retweeted
smallest.ai @smallest_AI
Wednesday someone on the team asked if lightning v3 was good enough to do a real podcast. By friday we had podcast.smallest.ai. paste a url, two ai voices talk about it, flip the 3d toggle and watch them lip-sync the whole thing in your browser. It's Free.
0
6
13
1.5K
Abhishek (key/value) retweeted
smallest.ai @smallest_AI
Most STT vendors publish WER on audio that is normalized to -10 dBFS. But in the real world, audio doesn't come pre-normalized. On raw FLEURS English:
- Grok jumps to 60% (claimed: 7.58%)
- Deepgram jumps to 11.86% (claimed: 6.57%)
- Pulse: 6.03%. Stable across raw and normalized
2
3
11
731
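For readers unfamiliar with the metric in the post above, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the system's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition only; the function name `wer` and the whitespace tokenization are assumptions, and it is not the scoring pipeline used by any vendor named in the post.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Assumption: words are split on whitespace with no text normalization,
    which real benchmarks usually add (casing, punctuation, numerals).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many inserted words.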
Abhishek (key/value) retweeted
Ash @_akashnagaraj
career update: been at @smallest_AI for a bit now
10
6
58
7.5K
Abhishek (key/value) @StalwartCoder
agent psychosis is real
Mitchell Hashimoto @mitchellh

I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem. As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)! I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results? 88ms => 1.5ms 150K allocs => ~500 allocs Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path. This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput. The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity. Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.

0
0
0
91
Abhishek (key/value) retweeted
GREG ISENBERG @gregisenberg
Claude Code just dropped "dynamic workflows" and it's pretty cool. You type "create a workflow" or turn on "ultracode" in the effort menu and it spins up hundreds of parallel agents that check each other's work. The unit of work you can hand off jumps from a file to an entire codebase. Migrations, audits, rewrites, framework swaps, stuff you used to plan in sprints now finishes overnight. The part that got me:....the agents argue with each other before showing you the result. Independent attempts at the same problem, then adversarial agents trying to break the answer. It keeps iterating until they converge. That's how senior engineering teams work. Except this team runs at 3am and never gets tired. Also if the workflow gets interrupted, it picks up where it left off. That means you can kick off work that runs for days. Not sessions. Days. Fair warning though: this burns through tokens FAST. Anthropic says so themselves. But if the task is a codebase migration that would have taken a team 3 months, spending $500 in tokens to do it in a week is the best trade in software. The ceiling on what one person can build just moved again. Classic. Going to be playing with this all week. Pretty cool.
cat @_catwu

Excited to share our most powerful new Claude Code feature: dynamic workflows! Mention "workflow" in a prompt and Claude will dynamically create an orchestration plan that it strictly follows, allowing you to confidently trust that every stage happens in the right order even across 100s of agents.

147
186
2.2K
349.9K
Abhishek (key/value) retweeted
Mitchell Hashimoto @mitchellh
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem. As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)! I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results? 88ms => 1.5ms 150K allocs => ~500 allocs Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path. This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput. The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity. Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
292
922
8.4K
718.1K
Sarah Fong @MilksandMatcha
Changed my last name and fled the country 💍
144
8
1.3K
147.7K
Abhishek (key/value) retweeted
smallest.ai @smallest_AI
On WildASR, Pulse hits 9.63% word error rate. Deepgram Nova-3 hits 28.17%. Nearly 3x the WER, on the same audio. WildASR tests STT on real production conditions: far-field mics, reverb, phone codec compression, clipping, background noise. Check out all the benchmarks in our docs. Link below.
3
10
35
1.3K