

Vaibhav Gupta
884 posts

@vaibcode
Making a programming language - BAML. @boundaryml, 🦄 aitw podcast: https://t.co/g4CJWgOsrj prev YC, google, msft, deshaw, and other things










Parsing 3.6 million historical names with frontier models was too slow, too expensive, and inaccurate. We switched to small, fine-tuned Qwen 3.5 models and hit 96% accuracy. The secret wasn't just the model size—it was the format.


been running coding agents for long horizon (48h+) tasks. without deterministic + cheap verification to work against, agents cheat and fail. the path to ultra long agents requires deterministic feedback.
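
The point about deterministic feedback can be sketched as a loop in which every agent step is checked by a cheap, repeatable verifier before the next step runs. This is a minimal illustration, not the setup described in the post: `run_with_verifier`, `propose`, `verify`, and the toy stand-ins are all hypothetical names, and the "agent" here is just a fixed list of guesses.

```python
from typing import Callable, Optional


def run_with_verifier(
    propose: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_steps: int = 10,
) -> Optional[str]:
    """Loop until `verify` accepts a candidate or the step budget runs out."""
    feedback: Optional[str] = None
    for _ in range(max_steps):
        candidate = propose(feedback)  # hypothetical stand-in for one agent step
        feedback = verify(candidate)   # deterministic: same input, same verdict
        if feedback is None:           # None means the check passed
            return candidate
    return None                        # budget exhausted without a verified result


# Toy stand-ins (assumptions for illustration only).
guesses = iter(["4", "5", "6"])


def toy_propose(feedback: Optional[str]) -> str:
    # A real agent would use the feedback; this one just tries the next guess.
    return next(guesses)


def toy_verify(candidate: str) -> Optional[str]:
    # Exact, cheap check: returns an error message, or None on success.
    return None if int(candidate) == 2 + 3 else f"expected 5, got {candidate}"


result = run_with_verifier(toy_propose, toy_verify)
```

Because the verifier is an exact check rather than another model's opinion, the loop cannot report success on a wrong answer; it either returns a verified candidate or gives up when the step budget is spent.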



Being at the "frontier" is just about stacking while loops and "intelligence" is just a measure of how long while true can run uninterrupted. on a separate note, this is also a good way to measure human intelligence.






