gene yang
@geneyang4 · @scsatcmu
22 posts
Joined January 2014
1.2K Following · 78 Followers
gene yang retweeted
Swadesh Sistla @SwadeshSistla
How do we steer AIs toward safe multi-agent cooperation? Idea: instead of acting as black-box policies, agents submit open-source programs to act on their behalf. Can transparency enable trust? Check out our NeurIPS 2025 paper: arxiv.org/abs/2512.00371🧵
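A minimal sketch of the general "open-source agents" idea (my own toy example, under my own assumptions, not the paper's protocol): each player submits a program that can inspect the other program's source code before choosing to cooperate or defect, so commitments become checkable.

```python
# Toy sketch (not the paper's protocol): agents act through programs that can
# read each other's source, so "cooperate only with programs like me" becomes
# a verifiable commitment rather than a promise.
import inspect

def clique_bot(opponent_source: str) -> str:
    # Cooperates only if the opponent's source is identical to its own.
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    # Ignores the opponent's source and always defects.
    return "D"

def play(p1, p2):
    s1, s2 = inspect.getsource(p1), inspect.getsource(p2)
    return p1(s2), p2(s1)

print(play(clique_bot, clique_bot))  # ('C', 'C') - transparency enables mutual cooperation
print(play(clique_bot, defect_bot))  # ('D', 'D') - defectors get defected against
```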
gene yang retweeted
Matthew Yang @_matthewyang
Almost nobody does proper credit assignment in RL-on-LLMs 💀 Learning only from the final outcome → punishes good steps 😭 → rewards bad steps 😭😭 🚨New Paper🚨 A new paradigm for credit assignment: LLMs identify their own mistakes ❌ and propose targeted fixes 🎯 🧵[1/n]
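A toy illustration of the failure mode the tweet points at (my own made-up numbers, not the paper's method): with outcome-only rewards, every step in a trajectory inherits the final score, while step-level credit can single out the actual mistake.

```python
# Toy illustration (not the paper's method): outcome-only credit assignment
# gives every step the trajectory's final reward, so the two correct steps
# below are punished along with the one step that actually caused the failure.
steps = ["set up equation", "correct algebra", "arithmetic slip"]
final_reward = 0.0  # the final answer was wrong

outcome_only_credit = {step: final_reward for step in steps}

# Step-level credit (e.g., from a model locating its own error) isolates the slip.
step_level_credit = {
    "set up equation": 1.0,
    "correct algebra": 1.0,
    "arithmetic slip": 0.0,
}

print(outcome_only_credit)  # every step gets 0.0
print(step_level_credit)    # only the faulty step gets 0.0
```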
Jeffrey Wang @jeffreygwang
The Amalfi Coast! (…as seen from Strawberry Hill in Golden Gate Park)
Quentin Romero Lauro @Qromerolauro
Editing front-end is now as easy as commenting on a page and asking for a change. Five weeks ago we wrote our first line of code on Inspector. Now, we're making it available for everyone to use. Try it out now! Much more coming soon ;)
gene yang retweeted
Miles Turpin @milesaturpin
Thrilled to share that I joined @Meta to work on safety and alignment evaluations for our Superintelligence effort. Excited to keep working with @_julianmichael_ and @summeryue0!
Kashu Yamazaki @kashu_yamazaki
I'm honored to have been selected for Forbes JAPAN's "30 Under 30" list of people changing the world! I'll keep working hard on my research. Let's make Japan the center of robotics again!!!!! @forbesjapan_30 #u30fj
jet @jw_source
I'm super excited to announce that I’ve joined the Cerebras Fellows Program! 🚀 A huge thank you to @cerebras and @BainCapVC for this incredible opportunity to start building the next generation of AI applications!
Cerebras @cerebras

Come build with us! Cerebras inference is powering the next generation of AI applications — 70x faster than on GPUs. We are so excited to announce the Cerebras Fellows Program, in partnership with @BainCapVC. The fellows program invites engineers, researchers, and students to build impactful, next-level products unlocked by instant AI. Join us for exclusive access to free Cerebras inference, higher rate limits, and more. Learn more at cerebras.ai/fellows

gene yang @geneyang4
@rohankalia_ Instruct models are unfunny because their goal is helpfulness; my guess is the predictability issue can probably be circumvented with hidden cot + seeding
rohan @rohankalia_
a simple case for why llms will not be funny in the foreseeable future: humor can be modeled as a next-token prediction task for the word/action that (a) minimizes cross-entropy (it contextually makes sense and is satisfying) and (b) has a low logprob (people don't see it coming). during training llms minimize cross-entropy, and during inference they sample the highest-logprob token. even if you increase temperature, you're relying on rng that it finds the "correct" thing to say. RLHFing on humor is also not good as of now, I think mostly because simulating something both correct and unexpected during post-training conflicts too heavily with the base model. --> llms cannot be funny (for now)
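A toy illustration of the sampling argument (my own made-up logits, not from the thread): a low-logprob "punchline" token stays unlikely even at higher temperature, so you really are relying on rng.

```python
# Toy illustration (hypothetical numbers): under softmax sampling, a
# low-logprob "punchline" token stays unlikely even as temperature rises -
# greedy decoding never picks it, and raising temperature mostly spreads
# probability mass over everything rather than surfacing the punchline.
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits; index 3 is the unexpected-but-fitting punchline.
logits = [5.0, 4.5, 4.0, 1.0, 0.5]
for t in (0.7, 1.0, 1.5):
    probs = softmax(logits, t)
    print(f"T={t}: punchline probability = {probs[3]:.3f}")
```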
Michael Liao @michaelcfix
bro how is a flight from Toronto to VANCOUVER more expensive than Toronto to SF??
Bruce Tang @brucetangg
if you had one month with no commitments & no thinking about ai what would you do?
gene yang retweeted
Andy Zou @andyzou_jiaming
We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵
gene yang retweeted
Tanishq Mathew Abraham, Ph.D. @iScienceLuvr
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains "We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative improvement on HealthBench-1k compared to simple Likert-based approaches, while matching or surpassing the performance of reward signals derived from expert-written references."
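A simplified sketch of the general mechanism (my own hedged version, not the paper's implementation): each checklist item is scored independently, and the weighted fraction of items satisfied becomes the scalar reward fed to the policy update.

```python
# Simplified sketch (not the RaR implementation): a rubric is a checklist of
# weighted items, and the reward is the weighted fraction of items satisfied.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RubricItem:
    description: str
    check: Callable[[str], bool]  # in practice this would likely be an LLM judge
    weight: float = 1.0

def rubric_reward(response: str, rubric: List[RubricItem]) -> float:
    total = sum(item.weight for item in rubric)
    satisfied = sum(item.weight for item in rubric if item.check(response))
    return satisfied / total  # scalar in [0, 1], usable as a reward for GRPO

# Hypothetical rubric for a medical-advice prompt.
rubric = [
    RubricItem("recommends consulting a clinician", lambda r: "doctor" in r.lower()),
    RubricItem("mentions dosage limits", lambda r: "dose" in r.lower(), weight=2.0),
]
print(rubric_reward("See a doctor before changing your dose.", rubric))  # 1.0
```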
Chase Brower @ChaseBrowe32432
It sort of does: CoT RL models are usually run with some temperature (1), which means they can get derailed by a single bad token sample. Even one chance to re-evaluate and delete an erroneous token could be useful (then the model gets to re-sample). But it would be nice to do something more like the latter: either allow it to delete whole patches, or give it some sort of state so it can remember the deleting actions. Maybe a patch in context, or an actual state space (if you're really willing to screw with the architecture).
Chase Brower @ChaseBrowe32432
Has anyone tried adding backspace to the LLMs' vocabulary? It would be hard to incorporate in the pre-training regime, but you could add it during cold-start SFT and then use it in RL.
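A minimal sketch of what the decoding side might look like (my own illustration, with a stub sampler standing in for a real model): when a special backspace token is sampled, the previously generated token is popped before decoding continues.

```python
# Hedged sketch (my illustration, not a proposal from the thread): a decoding
# loop where sampling a special <BACKSPACE> token removes the last generated
# token, giving the model a chance to re-sample after a bad draw.
import random

BACKSPACE, EOS = "<BACKSPACE>", "<EOS>"

def sample_next_token(sequence):
    # Stand-in for a real model's sampler.
    return random.choice(["the", "cat", "dog", BACKSPACE, EOS])

def decode_with_backspace(prompt_tokens, max_steps=20):
    sequence = list(prompt_tokens)
    for _ in range(max_steps):
        token = sample_next_token(sequence)
        if token == EOS:
            break
        if token == BACKSPACE:
            if len(sequence) > len(prompt_tokens):  # never delete the prompt
                sequence.pop()
            continue
        sequence.append(token)
    return sequence

print(decode_with_backspace(["a"]))
```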
Justin @justinwangx
how is it october already
gene yang retweeted
Dan Hendrycks @hendrycks
@polynoamial For one of them I want it to have questions that are harder than what humans can answer so that it can measure different levels of superintelligence.