jason

2.8K posts

@jvmncs

❤️s/RTs are randomized and differentially private.

Brooklyn · Joined February 2010

1.3K Following · 1.1K Followers

jason@jvmncs·1d

@atoniolo76 @peywalt I don't talk to interns

Alessio Toniolo@atoniolo76·1d

@peywalt @jvmncs response?

Peyton Walters@peywalt·2d

claude is discovering diloco from first principles and you're bearish on token spend???

Aryaman Arora@aryaman2020

yesterday I was debugging a poorly-performing training run with Claude Code and I discovered that instead of training on 30 batches of data it had somehow decided to train a new model for 500 steps on each batch and then average the 30 sets of weights

jason@jvmncs·2d

@cyrusasg tweet mentioned 🐐

Modal@modal

Modal Auto Endpoints provide state-of-the-art open source inference perf with a click. Learn how we developed our low latency inference playbook with @DecagonAI, delivering responses 60ms faster than the best proprietary provider. modal.com/blog/achieve-s…

313

jason@jvmncs·3d

Engram is one of the more sophisticated teams I’ve had the pleasure of working with at Modal
exciting launch, looking forward to assisting with even weirder deployments in the future!

Sabri Eyuboglu@EyubogluSabri

Modal's been super important for our velocity over the last 6 months
- Training on each user's context means scaling out to thousands of GPUs in quick bursts. Modal allowed us to do this from day zero, before we could keep a large committed cluster hot
- Our research team experiments with weird parameterizations all the time and needs to make changes to our inference and training servers. Modal makes it super easy for everyone on the team to deploy new endpoints for dogfooding and eval

8.2K

jason@jvmncs·3d

I've been forward-deploying a lot of LLM inference for our customers over the last year
our team consolidated all that experience into a self-serve, automatically-optimized endpoints product
some very slick features on the way soon 🫡

Modal@modal

It is not too late to _actually_ own your inference. Introducing: Modal Auto Endpoints.

4.3K

jason@jvmncs·3d

@deepfates long pole

179

🎭@deepfates·4d

We know about "load-bearing" "smoke test" "genuinely" "explicitly" etc.. What other llmisms have you found?

398

110

986

121.2K

jason retweeted

Connor@cnnradams·4d

light work

18.3K

jason@jvmncs·6d

@francoisfleuret oh, but it does! modal.com/blog/spec-is-a…

442

François Fleuret@francoisfleuret·6d

Speculative decoding is the closest you can get to a free lunch method. It is beautiful and astounding. I am surprised that it does not play a greater role in "AI".

440

48.6K

jason@jvmncs·6d

@tenderizzation @fujikanaeda spotted

237

tender@tenderizzation·6d

meet your moots (even if it means going to south bay)

tender@tenderizzation

1st and 2nd degree moots: I will be hosting another movie club in south bay in june, DM me for details
Paris, Texas (1984)

9.5K

jason@jvmncs·Jun 20

buried in this post is a note about our from-scratch framework for draft model training by @_dcw02
I’ve used it to chase down a few research ideas, and damn does that thing rip. such a joy to use

Charles 🎉 Frye@charles_irl

Speculation Is All You Need. In this blog post, we announce the co-release (w/ Z Lab) of six more state-of-the-art DFlash speculators for @Alibaba_Qwen 3.x. Over 1k output tps for 3.5 122B-A10B on a B200. Read the blog for why we're all-in on spec dec. modal.com/blog/spec-is-a…

8.1K

jason retweeted

Charles 🎉 Frye@charles_irl·Jun 20

100

697

185.3K

jason@jvmncs·Jun 19

current status

1.2K

jason retweeted

Cedar You@our_decay·Jun 18

Watched a cute animal video that I knew to be AI all the way through

108

1.8K

27.1K

656.3K

jason retweeted

David Wang@_dcw02·Jun 15

9+ accept lengths on coding workloads
generic drafter btw
qwen 397b 4x faster
repro btw
dflash go brrr

Modal@modal

We worked with @lmsysorg and z-lab.ai to
- integrate DFlash spec into @sgl_project
- make it faster with overlap
- train a DFlash drafter for @Alibaba_Qwen 397B-A17B
The result: up to 4.3x greater throughput over baseline and 1.5x over native MTP.

9.8K

jason@jvmncs·Jun 14

@deepfates @aivillage_dc stolen valor

481

🎭@deepfates·Jun 14

yep. That looks like an event designed by AIs all right

229

15.5K

jason@jvmncs·Jun 14

@ellev3n11 this is p good github.com/Noumena-Networ…

269

Federico Cassano@ellev3n11·Jun 14

interesting project: training codebase that gets high utilization for training small small models. big runs are cool, but lots of small runs is also cool.

7.6K

jason@jvmncs·Jun 13

@ajhinh @modal @charles_irl LFG

199

Andrew Hinh@ajhinh·Jun 13

After graduating this weekend, I'll be joining @modal as a Developer Relations Engineer! I want to describe how I got here, as my path was rather unconventional.
My first "connection" to Modal was back in 2022: after graduating high school, I took @charles_irl's Full Stack Deep Learning course, where I created admirer, a flavor of a VLM powered by AWS Lambda and GPT-3 (for those who remember!). I suppose my age and being a one-person team left an impression on him, and we continued to stay in touch.
When he discovered Modal, I quickly became a user and found it just so delightful and easy to use. Plus, the free $30/month was more than enough for personal projects and experimentation, and I was always telling others to try it out. In fact, during a summer internship at an edtech startup, I helped secure a $5000 grant that allowed us to switch from to Modal for our fine-tuning and deployment jobs.
Last summer, Charles unexpectedly offered an internship on the growth team, where I was initially uncertain how I'd use my full-stack ML experience at an infrastructure company. As it turned out, quite nicely: while contributing to the wide-ranging (and actually helpful!) set of examples (modal.com/docs/examples), I quickly saw that a sufficiently useful and captivating example empowered devs to take the next step.
Soon after, I was tasked with showing how to mesh together RL, LLMs, and Modal Sandboxes. After a weekend or two of experimentation, I came up with a web demo of Street Fighter III where you could play against an RL-trained Qwen 3-8B (btw, you can try it out here: andrewhinh--sf3.modal.run). The most fun part for me, besides getting it to work as well as it did, was seeing the joy and excitement from the team.
What makes me so excited to rejoin is that, really, I'm just continuing where I left off last summer to spread the good word about Modal. I can't thank Charles, @bernhardsson, @akshat_b, and the team at Modal enough for the opportunity to do so.
Stay tuned for more!

100

12.9K

jason retweeted

Leon@iamleonli·Jun 9

How far can we compress the discrete tokens in an LLM's context into compact latent vectors? With the right training recipe at large scale, our Latent Context Language Models (LCLMs) compress context up to 16× and land on a new Pareto frontier for long-context inference. 🧵(1/n)

8.3K

jason retweeted

joy liu@qjoyliu·Jun 8

The future of training is open source. Super excited to announce that we've joined forces with HuggingFace, Nvidia, Meta, Mercor and other leading companies to support OpenEnv :)

Ben Burtenshaw@ben_burtenshaw

So excited to be opening up OpenEnv to the whole community. It will now be owned by @huggingface, Meta-PyTorch, @reflection_ai, @UnslothAI, @modal, @PrimeIntellect, @NVIDIAAI, @mercor_ai, and @fleet_ai.
the reason is: frontier labs train the model and the harness together, so the model is fitted to its harness. that coupling is a chunk of why claude code and codex feel so good.
open source can't do that. you bring whatever harness, whatever model, whatever env, whatever trainer. which is the whole point of open source and also the problem for training.
openenv is the socket in between all of this.
in short: it's a protocol layer, not a reward framework. it does not have opinions about your rewards or your training loop. those live in the libs that are actually good at them.
read more in the blog post. it's early, come break it.

5.7K

jason@jvmncs·Jun 5

how I imagine a @cyrusasg tweet is born

Cyrus@cyrusasg

I get asked a lot about what actually matters in the inference space. The conversation has shifted as OSS frameworks have closed much of the gap on raw latency, but workload-specific tuning remains an open problem. Increasingly, more differentiation lives in the product layer around infrastructure.
What separates providers now:
Latency: for synchronous, latency-sensitive workloads, the ability to tailor deployments to meet specific needs (whether TTFT or e2e) is critical and highly dependent on token profiles and use case requirements.
Throughput & cost: these form a pareto frontier with latency.
Reliability: table stakes. Observability and alerting are a big part of this.
Developer velocity: underrated on most lists. Self-serve configurability is a massive force multiplier for sophisticated teams.
Autoscaling flexibility: not just "does it scale" but what triggers it and how fast.
Capacity: still a real constraint for newer hardware, and the geographic dimension for colocation can make this a harder constraint.

5.5K

jason retweeted

Modal@modal·Jun 1

Reinforcement learning has exploded on Modal, and we've been cooking. Here's a review of lessons learned helping teams train at scale, the patterns we kept seeing, and an open-source library to get started with RL on Modal quickly.

272

100K
