Dillon Erb

323 posts

@dlnrb

building something new @a____t____g — prev: CEO / co-founder @hellopaperspace (acquired by @digitalocean)

New York, USA · Joined August 2010

1.1K Following · 1.6K Followers

Dillon Erb@dlnrb·30 Jun

Super proud to have backed this team early and excited to see this launch! One of the most impressive founding teams I have had the pleasure of working with. Congrats @Etched 👏

Etched@Etched

We're coming out of stealth. We've built our first racks after a successful A0 tapeout, $1B+ in customer contracts, and $800m raised. Early customer tests show us achieving SOTA throughput, latency, and power efficiency on inference workloads. Our first racks ship this summer.

Dillon Erb reposted

Andriy Mulyar@andriy_mulyar·22 May

hiring a growth engineer at @nomic_ai the job: build agentic systems that get us in front of every built environment company in the U.S. (~20k companies). orchestrate agents to automate non-spammy outbound, ad campaigns, linkedin touchpoints, events, seo — all wired together. i've personally been building our internal gtm system myself from scratch the last 6 months with very impressive results - time to scale! this isn't a marketing role. it's engineering role where your measured output is qualified customer calls and sign ups. if you like low latency feedback loops between prompt and customer demand surges this might be the role for you. link: nomic.ai/careers

Dillon Erb@dlnrb·2 Apr

This is very cool. Having spent a lot of time in architecture world this is really powerful!

Nomic@nomic_ai

Today, we're launching AEC-Bench — the first open, multimodal agent benchmark for construction. 196 tasks across real construction documents. Full agent harness. Automated evaluation. Apache 2.0. We benchmarked Claude Code, Codex, and our own agent. Here's what we found 🧵

Dillon Erb reposted

PrismML@PrismML·31 Mar

Today, we are emerging from stealth and launching PrismML, an AI lab with Caltech origins that is centered on building the most concentrated form of intelligence. At PrismML, we believe that the next major leaps in AI will be driven by order-of-magnitude improvements in intelligence density, not just sheer parameter count. Our first proof point is the 1-bit Bonsai 8B, a 1-bit weight model that fits into 1.15 GBs of memory and delivers over 10x the intelligence density of its full-precision counterparts. It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter-class. We are open-sourcing the model under Apache 2.0 license, along with Bonsai 4B and 1.7B models. When advanced models become small, fast, and efficient enough to run locally, the design space for AI changes immediately. We believe in a future of on-device agents, real-time robotics, offline intelligence and entirely new products that were previously impossible. We are excited to share our vision with you and keep working in the future to push the frontier of intelligence to the edge.

Dillon Erb reposted

turbopuffer@turbopuffer·12 Feb

queue.json on object storage is all you need to build a reliable distributed job queue
→ FIFO execution
→ at-least-once delivery
→ 10x lower tail latencies
tpuf.link/queue

Dillon Erb@dlnrb·8 Jan

@MichaelFlores @garrytan @ycombinator Looking into it thx for the heads up!

Michael Flores@MichaelFlores·8 Jan

@dlnrb @garrytan @ycombinator Just a heads up the waitlist form doesn’t allow submitting nor look correctly positioned when opened from X browser on iOS. (Reach out if you need someone to help on web! michaelflores.io/contact)

Dillon Erb@dlnrb·8 Jan

Excited to finally come out of stealth and share what we've been building! Introducing Autonomous — a superintelligent financial advisor at 0% advisory fees. We just announced our $15M fundraise led by @garrytan @ @ycombinator along with some other amazing investors! Get early access → becomeautonomous.com We are hiring across multiple roles in NYC and SF @a____t____g

Dillon Erb reposted

Y Combinator@ycombinator·8 Jan

The founders of Paperspace (YC W15) just announced Autonomous (@a____t____g), an AI-native wealth strategist that brings elite strategies used by the ultra-wealthy, now available to everyone at 0% advisory fees. Millions of people already ask AI what to do with money. Autonomous is building the missing piece: the Cursor "apply" button that connects your financial life with AI. Get early access: becomeautonomous.com

Dillon Erb@dlnrb·18 Dec

Great work 👏

xjdr@_xjdr

today we’re open-sourcing nmoe: github.com/Noumena-Networ…
i started this because training deepseek-shaped ultra-sparse moes should be straightforward at research scale, but in practice it’s painful:
- expert flops get stranded (router shatters your batch → tiny per-expert gemms → gpus idle)
- router stability is fragile (especially without deepseek’s batch sizes)
- data + mixtures dominate (proxy runs are useless if mixtures aren’t deterministic/resumable)
nmoe is our attempt at a clean, production-grade reference path for moe training that you can actually read + modify (outside of the highly optimized kernels).
what’s inside:
- rdep (replicated dense / expert parallel): replicate dense/attention, shard experts, pool+dispatch routed tokens so per-expert batches are hot (no nccl all-to-all on the moe path; direct dispatch/return via ipc + nvshmem)
- mixed precision experts (bf16/fp8/nvfp4), with a focus on killing the usual “mixed precision overhead” taxes
- a frontier-ish data pipeline: deterministic mixtures, exact resume, and tooling for building/inspecting datasets (including hydra-style grading)
- metrics + nviz: sqlite experiments + duckdb timeseries + a dashboard that reads from shared storage
- container-first + toml-first, and intentionally narrow: b200-only (sm_100a), no tensor parallel, no expert all-to-all
this repo started in the spirit of nanochat (small, hackable, end-to-end), then grew into a rewrite of a bunch of the core components we wish existed as a public reference for moe training.
over the next few weeks i’ll post deep dives on:
- rdep + why per-expert batch size is the whole moe problem
- router stability in small runs
- fp8/nvfp4 expert training without drowning in overhead
- deterministic mixtures + why “close enough” sampling breaks proxy validity
- the metrics/nviz stack and what we track that actually matters

Dillon Erb@dlnrb·5 Dec

This is awesome

Will Bryk@WilliamBryk

We embedded all 5000+ NeurIPS papers! exa.ai/neurips
Cool queries:
- "new retrieval techniques"
- "the paper that elon would love most"
- "intersection of coding agents and biology, poster session 5"
It uses our in-house model trained for precise semantic retrieval 😌

Dillon Erb reposted

NIK@ns123abc·1 Dec

DeepSeek: “Frontier has no knowledge advantage. Compute is the only serious differentiator left. Time to get more GPUs.”

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

enough preamble. The most important part in every Whale paper, as I've said so many times over these years, is “Conclusion, Limitation, and Future Work”. They say: Frontier has no knowledge advantage. Compute is the only serious differentiator left. Time to get more GPUs.

Dillon Erb@dlnrb·1 Dec

“Speciale” — the high-performance, track-focused version

DeepSeek@deepseek_ai

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents!
🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.
🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.
📄 Tech report: huggingface.co/deepseek-ai/De…
1/n

Dillon Erb@dlnrb·1 Dec

Amazing 👏

DeepSeek@deepseek_ai

🏆 World-Leading Reasoning
🔹 V3.2: Balanced inference vs. length. Your daily driver at GPT-5 level performance.
🔹 V3.2-Speciale: Maxed-out reasoning capabilities. Rivals Gemini-3.0-Pro.
🥇 Gold-Medal Performance: V3.2-Speciale attains gold-level results in IMO, CMO, ICPC World Finals & IOI 2025.
📝 Note: V3.2-Speciale dominates complex tasks but requires higher token usage. Currently API-only (no tool-use) to support community evaluation & research.
2/n

Dillon Erb@dlnrb·24 Nov

Tool search tool!

Claude@claudeai

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

Dillon Erb@dlnrb·24 Nov

Amazing

Keller Jordan@kellerjordan0

New training speed record for @karpathy's NanoGPT setup: 3.28 Fineweb val loss in 22.3 minutes
Previous record: 24.9 minutes
Changelog:
- Removed learning rate warmup, since the optimizer (Muon) doesn't need it
- Rescaled Muon's weight updates to have unit variance per param
1/5

Dillon Erb reposted

Sebastian Raschka@rasbt·23 Nov

Implemented Olmo 3 from scratch (in a standalone notebook) this weekend! If you are a coder, probably the best way to read the architecture details at a glance: github.com/rasbt/LLMs-fro…

Sebastian Raschka@rasbt

Olmo models are always a highlight due to them being fully transparent and their nice, detailed technical reports. I am sure I'll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks. In the meantime, here's the side-by-side architecture comparison with Qwen3.
1) As we can see, the Olmo 3 architecture is relatively similar to Qwen3. However, it's worth noting that this is essentially likely inspired by the Olmo 2 predecessor, not Qwen3.
2) Similar to Olmo 2, Olmo 3 still uses a post-norm flavor instead of pre-norm, as they found in the Olmo 2 paper that it stabilizes the training.
3) Interestingly, the 7B model still uses multi-head attention similar to Olmo 2. However, to make things more efficient and shrink the KV cache size, they now use sliding window attention (e.g., similar to Gemma 3.)
Next, let's look at the 32B model.
4) Overall, it's the same architecture but just scaled up. Also, the proportions (e.g., going from the input to the intermediate size in the feed forward layer, and so on) roughly match the ones in Qwen3.
5) My guess is the architecture was initially somewhat smaller than Qwen3 due to the smaller vocabulary, and they then scaled up the intermediate size expansion from 5x in Qwen 3 to 5.4 in Olmo 3 to have a 32B model for a direct comparison.
6) Also, note that the 32B model (finally!) uses grouped query attention.

Dillon Erb reposted

Ryan D’Onofrio@rsdgpt·23 Nov

I built osgrep It’s a local code search tool that understands natural language. Works as a standalone CLI or a plugin for Claude Code. No API keys or subscription. I wanted the power of "semantic search" without the latency, price, or privacy trade-offs. Video in realtime.

Dillon Erb@dlnrb·20 Nov

This 100%

Liz Wessel@lizwessel

I’m meeting more + more investors (angels & VCs) who tell founders that they invest at the seed (to get a meeting) but when you look at all the companies they brag about having been an investor in, they actually came in at Series B-D when ‘likelihood of success’ was much more proven. Example: Someone I know invested in the recent SPV of Anthropic just to say he’s an angel in anthropic. And wild enough, it works for him to get into new deals — most founders don’t think to ask what stage he came in at.
