Derek Xu

49 posts

@derekzxu

eng @fireworksai_hq / prev @tryramp, @ucberkeleymet

Berkeley, CA · Joined April 2023

298 Following · 109 Followers

Derek Xu retweeted

Fireworks AI@FireworksAI_HQ·7 Apr

Fireworks Training is now in preview. You can now full-parameter fine-tune Kimi K2.5 (1T params, 256k context) with custom loss functions (GRPO, DRO, DAPO, or bring your own) on managed infra. @genspark_ai built their proprietary model stack in four weeks. @vercel hit 93% error-free generation with RFT. @cursor_ai runs their RL rollout fleet on Fireworks. Full-parameter from 8B to 1T. Multi-LoRA serving. Managed or bring your own training loop. Your model is your product. Your data is your moat. fireworks.ai/blog/training-…

Derek Xu retweeted

Max Weinbach@mweinbach·28 Mar

Fireworks AI fire pass is so good It's Kimi K2.5 Turbo right now at like 250 tok/s and idk what the limits are but it's HIGH oh and free trial but $7 per week

Derek Xu@derekzxu·25 Mar

@chuyishang incredible

Derek Xu retweeted

chuyi shang@chuyishang·24 Mar

Wrote a deep dive on implementing a language model from scratch in JAX and scaling it with distributed training! If you’re coming from PyTorch and want to see how the same ideas look in JAX, or just want a hands-on intro to distributed training, check out this blog post: chuyishang.com/blog/2026/jax-… Comes with code + an assignment and test cases so you can follow along!

Derek Xu retweeted

Dmytro Dzhulgakov@dzhulgakov·20 Mar

Composer 2 beats Opus on TerminalBench at a fraction of the cost. The ingredients: coding focus only, data flywheel, cracked RL team, and infrastructure that can keep up. @FireworksAI_HQ powered the inference and RL scaling behind Composer 2. Scaling RL is still genuinely hard, and we're proud we could help make it less so. Congrats to @cursor_ai on shipping a great model!

Cursor@cursor_ai

Composer 2 is now available in Cursor.

Derek Xu@derekzxu·17 Mar

@DimitrisPapail noticed tau bench airline wasn't called out anywhere here. any interesting findings there? my sense is labs are moving away from it (or using the modified Anthropic version), due to variance from the simulated user, which also probably makes it hard to predict.

Derek Xu retweeted

Dimitris Papailiopoulos@DimitrisPapail·25 Feb

x.com/i/article/2026…

Derek Xu retweeted

Nishkarsh@contextkingceo·12 Mar

We've raised $6.5M to kill vector databases. Every system today retrieves context the same way: vector search that stores everything as flat embeddings and returns whatever "feels" closest. Similar, sure. Relevant? Almost never. Embeddings can’t tell a Q3 renewal clause from a Q1 termination notice if the language is close enough. A friend of mine asked his AI about a contract last week, and it returned a detailed, perfectly crafted answer pulled from a completely different client’s file. Once you’re dealing with 10M+ documents, these mix-ups happen all the time. VectorDB accuracy goes to shit. We built @hydra_db for exactly this. HydraDB builds an ontology-first context graph over your data, maps relationships between entities, understands the 'why' behind documents, and tracks how information evolves over time. So when you ask about 'Apple,' it knows you mean the company you're serving as a customer. Not the fruit. Even when a vector DB's similarity score says 0.94. More below ⬇️

Derek Xu@derekzxu·12 Mar

it was great working on this together! kernel is an awesome tool :)

KERNEL@usekernel

we worked with @fireworksai_hq to make training VLM browser agents with open source tools as easy as possible.

Derek Xu retweeted

KERNEL@usekernel·12 Mar

we worked with @fireworksai_hq to make training VLM browser agents with open source tools as easy as possible.

Derek Xu retweeted

Fireworks AI@FireworksAI_HQ·10 Mar

New blog from the team at Fireworks: Where training–inference parity breaks in MoE models Kernel fusions that are mathematically identical can still drift numerically. We walk through the bugs we hit while serving Kimi K2.5 and training Qwen3.5-MoE, and how we fixed them. Worth a read if you're building high-performance inference: fireworks.ai/blog/when-fast…

Derek Xu@derekzxu·10 Mar

@itsayaanmomin tuff

Ayaan Momin@itsayaanmomin·10 Mar

coding /kōdiNG/ — noun asking your agent to write code handcrafting code — noun an outdated human practice involving writing software line by line

Derek Xu@derekzxu·5 Mar

@VihaarNandigala incredible

Vihaar Nandigala@VihaarNandigala·5 Mar

We built something a little dangerous for GTM teams. It’s called GTM Claw. A workflow library that turns OpenClaw + Claude Code into a customer-finding machine.
Think:
• Find people talking about your problem on Reddit, LinkedIn, Twitter
• Identify companies showing buying signals
• Enrich decision makers automatically
• Check ICP fit
• Push qualified leads into sequences or your CRM
Just continuous discovery of in-market accounts.
We’ve been using it internally at Orange Slice to:
• scrape LinkedIn reactors on competitors’ posts
• detect product complaints on Reddit
• identify operators switching companies
• surface companies hiring GTM engineers
We’re opening GTM Claw in beta. Only 100 users for the next couple months. ~30 spots already filled.
If you want access: Comment “CLAW” and follow me so I can dm you the link.
Turn Claude Code + Open Claw into a GTM menace. 🦞

Derek Xu@derekzxu·6 Feb

@VihaarNandigala let’s gooooooooooo

Derek Xu retweeted

Vihaar Nandigala@VihaarNandigala·6 Feb

We just raised a $5.3M seed round for Orange Slice, co-led by 1984 Capital and Moxxie Ventures, with participation from angels like Paul Graham. We’re building AI agents, inside a spreadsheet, that help sales teams find companies that already want to buy. The reality is most sales teams don’t struggle with effort - they struggle with timing. Reps spend huge amounts of time working static lists and broad targeting, chasing leads that were never going to convert. That creates noise, low reply rates, and wasted cycles. Top companies like Ramp solve this with dedicated growth engineers building internal data workflows. We’re making that same capability accessible to everyone else. At its core, the challenge is simple: finding customers who already have the problem you solve. Orange Slice turns the spreadsheet into a system for discovering buying signals - agents research company sites, news, social signals, and niche sources like court records or building permits, then structure that information directly into columns teams can act on. Not “might be a fit.” But “likely in-market.” So instead of guessing who to target, teams build and refine living lists of high-intent accounts inside a sheet. Still early. Still learning. But we’re excited to keep building. Kishan and I met sophomore year on a Bollywood dance team at Michigan — and I couldn’t ask for a better co-founder. Grateful to our team, customers, and investors for believing in this vision.

Derek Xu retweeted

Cal Lavicka@CalLavicka·6 Feb

LLMs suck at creating tests. Their tests are too basic and they cheat all the time, validating buggy behavior to get 100% test coverage rather than flagging real bugs. So, I created an opencode plugin to fix this

Derek Xu retweeted

Dmytro Dzhulgakov@dzhulgakov·27 Jan

🌕 Kimi K2.5 = open SOTA reasoning + vision + 256K context + agentic coding 🏎 200+ t/s on @FireworksAI_HQ (soon even faster) ✅ Nails @simonw's "pelican on a bike" test in both directions Try it now on Fireworks and hats off to @Kimi_Moonshot

Derek Xu@derekzxu·14 Jan

@jasminecoded_ where is dublin california 😓

jasmine@jasminecoded_·14 Jan

Ultimate Silicon Valley City Tier List:
S+: Portola Valley, Woodside, Atherton
S: Los Altos (Hills), Palo Alto (Old Palo Alto only - Crescent Park, Professorville)
A+: Menlo Park, Los Altos (flats)
A: New Palo Alto (Midtown, Barron Park, South PA)
B: Mountain View, Cupertino, Sunnyvale (North of El Camino)
C: Redwood City, Sunnyvale (South)
D: San Mateo, Santa Clara, Fremont
F: San Jose, Milpitas
You’re welcome.

Derek Xu@derekzxu·23 Dec

if you're curious, the example was a text-to-sql dataset with the full runnable code here: github.com/eval-protocol/…

Derek Xu@derekzxu·23 Dec

just read this blog, reminded me of an experiment i just ran! super cool to see these sigmoid learning curves appear in post-training :)

Dwarkesh Patel@dwarkesh_sp

New blog post. Recently, people have been talking about how it takes way more compute to get a single sample in RL than it does in pretraining. But this is only half the trouble. In RL, that expensive sample is also usually giving you way fewer bits. And this has implications for how well RLVR will scale, plus helps us understand why self-play and curriculum learning are so helpful for RL, why RLed models are bizarrely jagged, and how we can think about what humans do differently. Link below.
