Jasper Lu

102 posts

Jasper Lu

@lu__jasper

teaching models to design @figma, formerly @nuro. Knicks in 5

NYC · Joined July 2009
101 Following · 90 Followers
Jasper Lu reposted
Josh Hart
Josh Hart@joshhart·
FROM NOW ON ADDRESS ME AS CHAMP! 🧡💙
4.1K
27.7K
225.1K
2.6M
Legion Hoops
Legion Hoops@LegionHoops·
BREAKING: Knicks will be flying back to New York to celebrate tonight, per @ChrisBHaynes
63
829
19.9K
1.1M
Jasper Lu
Jasper Lu@lu__jasper·
@ar0cket1 Have you tried this in the top-k distillation setting as well (as opposed to just sampled tokens)?
1
0
0
156
Jasper Lu
Jasper Lu@lu__jasper·
@QiaochuYuan You can probably test this hypothesis by comparing a diffusion LLM vs an autoregressive one of the same family, e.g. diffusion Gemma vs the usual one
0
0
0
34
QC
QC@QiaochuYuan·
interesting hypothesis that the "not X, but Y" LLMism is an artifact of "not" being a high-probability completion since it can continue in so many different ways, and that other LLMisms can be understood similarly. anyone know if any work has been done on this?
QC tweet media
23
15
247
11.4K
Jasper Lu
Jasper Lu@lu__jasper·
@barrowjoseph One fun direction I've been wanting to play around with (once I figure out how to do it without breaking the bank) is to turn indexing into precomputing KV caches over an entire corpus and then dumping them into an object store for faster filtering
1
0
1
37
Joe Barrow
Joe Barrow@barrowjoseph·
@lu__jasper Thinking the same! Thankfully “sonnet-level models” are getting cheaper and smaller.
1
0
1
206
Jasper Lu
Jasper Lu@lu__jasper·
Been thinking about this topic a lot while playing around with OBLIQ-bench. IMO, hard search will increasingly converge towards map-filter workflows. As small models get smarter and compute gets cheaper, it's hard to imagine that search doesn't just become: have an agent retrieve as many relevant docs as possible and then filter through all of them with a Sonnet-level model.
Joe Barrow@barrowjoseph

x.com/i/article/2065…

2
0
5
947
Jasper Lu
Jasper Lu@lu__jasper·
@edwardzhou_ /goal make me a benchmark to test /goal loops to the point of performance degradation
English
0
0
0
43
Edward
Edward@edwardzhou_·
now that loops are trendy… are there any benchmarks where we test a model’s ability to extend its TTC infinitely via standard loops & measure the point of performance degradation? e.g. how good is it at following a minimal /goal setup?
1
0
0
83
Jasper Lu
Jasper Lu@lu__jasper·
@signulll Excited for this. Once on-device AI is good enough, I think we’ll start to see intelligence embedded into apps in more fun ways than just being a chatbot
0
0
0
235
signüll
signüll@signulll·
my lord i am convinced on device ai will be good enough very very soon which will finally enable zero marginal cost ai products. that means network effects can actually take place. this will be a huge shift for consumer experiences.
79
33
993
90.8K
Jasper Lu
Jasper Lu@lu__jasper·
Is there a name for this kind of collage aesthetic I've been seeing lately
[2 images]
0
0
2
77
Jasper Lu
Jasper Lu@lu__jasper·
I've noticed in my own daily use that previous LLMs are pretty bad at writing complex SFT / RL pipelines unless I send VERY detailed prompts. From their report, it seems like these are the use cases they were targeting with nerfs. But... competing labs probably already have the right talent in-house, so these nerfs probably wouldn't hurt that much.
0
0
0
1.4K
Jack Morris
Jack Morris@jxmnop·
An underrated part of this discussion is that (a) there's huge leverage in improving data, and (b) there's no way Anthropic could safeguard this. xAI could instruct Fable to look through EVERY row of pretraining data and fix any typos and errors. this is probably the single highest-leverage activity for a lab playing catch-up, and it's not possible for Anthropic to prevent this without completely kneecapping the model itself, because data quality work looks like any other kind of knowledge work ("check this text for errors", "rewrite this in a formal tone")
Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

15
2
161
107.4K
Jasper Lu
Jasper Lu@lu__jasper·
The new test of if you're really doing "cutting edge work" is whether Fable nerfs itself for you
0
0
1
48
Jasper Lu
Jasper Lu@lu__jasper·
@SaiMandhan Always found it a little odd that RL env companies command such high multiples. Manual creation of environments has always felt a little bitter lesson pilled to me.
0
0
0
71
Sai Mandhan
Sai Mandhan@SaiMandhan·
I’m curious how durable these RL env / human data companies are long term. They’re essentially just selling shovels until the mine learns to dig itself. If RSI takes off, models will generate, solve, critique, and expand their own curricula faster than any human can design new environments. Feels very much like a business model with an expiration date. They print money tho lol
18
1
104
22.4K
Jasper Lu
Jasper Lu@lu__jasper·
Starting to see more work targeting a key failure mode in on-policy (self) distillation: when the student's rollout drifts too far from the teacher's distribution, the reward signal on later tokens can get noisy.

The common thread people seem to be converging on is the idea of teacher intervention: instead of training on raw student rollouts, you set up a teacher to help shape the trajectory before distillation.

Three techniques that stood out to me:

1/ Pedagogical RL
You first RL a copy of the base model to be good at taking in privileged information (e.g. an answer key) and generating distillation-friendly rollouts. Then, you sample from this model during training instead of the student and run distillation.

2/ Trajectory-Refined Distillation
For each training example, you first sample a rollout from the student, then ask a teacher model to rewrite the trajectory, potentially using some privileged information. Finally, you distill the rewritten rollout into the student.

3/ Speculative Knowledge Distillation
A little older than the current wave of OP(S)D techniques, but still interesting: during student sampling, you compare each token with the teacher's top-k. If it falls outside, you sample the token from the teacher instead. This helps keep the rollout from going too far off-track.

My thoughts:
- All three approaches sort of blur the line of what "on-policy" really means.
- Pedagogical RL feels the most elegant, but IMO requires a little too much effort to gain widespread adoption.
- To me, something like SKD but rebuilt for self-distillation feels like the most natural next step. Anything out there doing this already?
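A minimal sketch of the token-level check described in 3/, not the paper's implementation. It assumes toy `student` and `teacher` callables (hypothetical names) that map a token prefix to a probability list over a small vocabulary; real models would return logits over a full vocabulary, and the distillation loss itself is left out.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical interface: a "model" maps a token prefix to next-token probabilities.
Model = Callable[[List[int]], Sequence[float]]


def top_k_ids(probs: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])


def sample(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one token id according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def skd_rollout(
    student: Model,
    teacher: Model,
    prompt: List[int],
    steps: int,
    k: int,
    rng: random.Random,
) -> Tuple[List[int], List[bool]]:
    """Generate `steps` tokens, letting the teacher override off-track proposals.

    Returns the generated tokens and, per token, whether the teacher replaced it.
    """
    tokens = list(prompt)
    replaced: List[bool] = []
    for _ in range(steps):
        proposal = sample(student(tokens), rng)
        teacher_probs = teacher(tokens)
        if proposal in top_k_ids(teacher_probs, k):
            # Student token is plausible under the teacher: keep it.
            tokens.append(proposal)
            replaced.append(False)
        else:
            # Outside the teacher's top-k: sample this token from the teacher instead.
            tokens.append(sample(teacher_probs, rng))
            replaced.append(True)
    return tokens[len(prompt):], replaced


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy distributions over a 4-token vocabulary (illustrative values only).
    student = lambda prefix: [0.1, 0.1, 0.1, 0.7]
    teacher = lambda prefix: [0.6, 0.3, 0.05, 0.05]
    generated, flags = skd_rollout(student, teacher, [0], steps=8, k=2, rng=rng)
    print(generated, flags)
```

The `replaced` flags make it easy to see how much of a rollout is still the student's own, which is one way to look at the "what does on-policy mean here" question.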
1
0
3
247
Jasper Lu
Jasper Lu@lu__jasper·
Awesome to see that Google is pushing on open-source diffusion LLMs. I've long thought that dLLMs could be a better fit than autoregressive LLMs for tasks like frontend design, and have been patiently waiting for open-source models to mature so that I can play around with them.
Google Gemma@googlegemma

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

0
0
2
126
Jasper Lu reposted
Figma
Figma@figma·
Imagine a world where you could copy/paste websites into editable Figma layers (jk you don’t have to imagine you can do this now with our Chrome extension)
GIF
201
555
5.9K
2.3M