
Thrilled to share we've raised $21.7m for @ColumnTax to scale up our fundamentally new tax filing product forbes.com/sites/igorbosi… It's been so much fun building the first-ever IRS authorized tax filing API with the amazing team here
Column Tax

@ColumnTax
Building the future of personal income tax APIs. Also hiring: https://t.co/eqZXSRC6ee

Benchmarks are the operating system for product truth. But most generic evals miss the factors that actually determine success in real-world verticals: domain rules, tool use, multi-step workflows, and the need for auditable, line-item-level correctness.

This week, I sat down with @michaelrbock, CTO & co-founder of @ColumnTax, to unpack why vertical benchmarks matter, how they built one for the tax industry (TaxCalcBench), and how other founders can adopt similar playbooks across their own verticals.

Some takeaways:
➡️ Even top frontier models compute fewer than one-third of tax returns correctly under strict criteria.
➡️ Models that look strong in best-of-N settings often become inconsistent when run repeatedly, a sign of how fragile model reasoning still is.
➡️ The right benchmark can guide technical strategy, like Column Tax’s decision to double down on developing Iris, their tax-development agent.
➡️ Vertical evals compound into moats: they encode data quality, edge cases, domain rules, and institutional knowledge directly into code.

As Michael put it, “If you’re building any sort of AI or agent-based functionality, you need an eval – full stop. Building an agent without an eval is like trying to drive a car blindfolded.”

Accuracy-critical industries demand proof. Vertical benchmarks are how you build that proof. They are the quality gate every prompt, model, or agent must clear, and the foundation for delivering systems that work consistently in the messy, high-stakes reality of the real world.
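
To make the gap between "strict criteria", best-of-N, and repeated runs concrete, here is a minimal sketch in Python. It is not TaxCalcBench's actual scoring code: the function names, the dict-of-lines return format, and every dollar amount are assumptions made up for illustration.

```python
from typing import Dict, List

# Hypothetical format: one return is a mapping of line label -> whole-dollar amount.
Return = Dict[str, int]


def strictly_correct(predicted: Return, expected: Return) -> bool:
    """Strict criterion: every expected line must be present and match exactly."""
    return all(predicted.get(line) == amount for line, amount in expected.items())


def line_accuracy(predicted: Return, expected: Return) -> float:
    """Lenient view: fraction of expected lines that match exactly."""
    matches = sum(predicted.get(line) == amount for line, amount in expected.items())
    return matches / len(expected)


def best_of_n(runs: List[Return], expected: Return) -> bool:
    """Passes if at least one of the repeated runs is strictly correct."""
    return any(strictly_correct(run, expected) for run in runs)


def all_of_n(runs: List[Return], expected: Return) -> bool:
    """Passes only if every repeated run is strictly correct."""
    return all(strictly_correct(run, expected) for run in runs)


# Made-up amounts, not real tax figures.
expected = {"wages": 52000, "taxable_income": 37400, "total_tax": 4256}
runs = [
    {"wages": 52000, "taxable_income": 37400, "total_tax": 4256},
    {"wages": 52000, "taxable_income": 37400, "total_tax": 4260},  # off by $4 on one line
    {"wages": 52000, "taxable_income": 37400, "total_tax": 4256},
]

print("best-of-3 passes:", best_of_n(runs, expected))
print("all-of-3 passes:", all_of_n(runs, expected))
print("line accuracy of run 2:", line_accuracy(runs[1], expected))
```

In this toy case the same model output set passes best-of-3 but fails the all-runs check, and the failing run still gets two of three lines right. That is the pattern the takeaways describe: a lenient or best-of-N score can look strong while strict, repeated evaluation does not.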







