Nan Jiang

31 posts


@nanjiangwill

building fun stuff @modal

San Francisco, CA · Joined January 2018
375 Following · 98 Followers
Nan Jiang retweeted
Songlin Yang@SonglinYang4·
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381
Nan Jiang retweeted
Wenting Zhao@wzhao_nlp·
Coding agents can debug their own outputs, but what if none of the fixes are correct? We overcome sparse rewards by making them continuous📈 Instead of having binary execution rewards, we introduce a learned verifier to measure how close the current solution is to a correct one📏
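The reward-shaping idea in the tweet above can be made concrete. A minimal sketch, assuming a list of unit-test outcomes as the execution signal: the sparse reward gives credit only for a fully correct solution, while the dense reward measures how close the solution is. The pass-rate heuristic here is a stand-in; the actual approach uses a learned verifier model, not this function.

```python
# Sketch: sparse vs. continuous rewards for a coding agent.
# The "verifier" below is an illustrative pass-rate heuristic;
# the real method replaces it with a *learned* verifier model.

def binary_reward(test_results: list[bool]) -> float:
    """Sparse reward: 1.0 only if every test passes, else 0.0."""
    return 1.0 if all(test_results) else 0.0

def continuous_reward(test_results: list[bool]) -> float:
    """Dense reward: fraction of tests passed, a graded measure of
    how close the current solution is to a correct one."""
    return sum(test_results) / len(test_results)

results = [True, True, False, True]   # 3 of 4 tests pass
print(binary_reward(results))         # 0.0 -> no learning signal
print(continuous_reward(results))     # 0.75 -> graded progress signal
```

The point of the continuous variant is that partially correct fixes still move the reward, so the agent gets gradient even when no candidate fix is fully correct.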
Nan Jiang retweeted
Sasha Rush@srush_nlp·
I teach a class where students code up an ML library from scratch in Python. Wenting showed me that a Claude Agent (with interactive unit test feedback and the spec) could solve it 100%. We thought it would be fun to scale this idea to every Python library in the world.
Wenting Zhao@wzhao_nlp

Introducing the commit0 interactive environment for coding agents. Challenge: generate Python libraries from scratch. Commit0 is designed with interactivity, dependencies, and specifications as first-class considerations. We include a benchmark with 50+ challenging libraries.

Nan Jiang@nanjiangwill·
So... can agents now build a package from scratch? Test them on Commit0! This is an amazing and fun project this summer! Huge thanks to Wenting and to everyone in the lab for their support and guidance! 🚀👏
Wenting Zhao@wzhao_nlp

Introducing the commit0 interactive environment for coding agents. Challenge: generate Python libraries from scratch. Commit0 is designed with interactivity, dependencies, and specifications as first-class considerations. We include a benchmark with 50+ challenging libraries.

Nan Jiang@nanjiangwill·
@DimitrisPapail @Krafton_inc @SeoulNatlUni @UMich @UWMadison Interesting work! We also looked into this problem across other architectures: x.com/nanjiangwill/s… It would be interesting to explore alternative architectures further.
Nan Jiang@nanjiangwill

❓Are attention-based models needed for In-Context Learning(ICL)? 🤔Can emerging architectures perform ICL? 🎉Check out our #ICLR2024 paper "Exploring the Relationship Between Model Architecture and In-Context Learning Ability" 🎉 #LLM Paper: arxiv.org/abs/2310.08049 🧵[1/9]

Nan Jiang@nanjiangwill·
@akyurekekin Yes!! It would also be interesting to see what might happen when adding syntax information.
Ekin Akyürek@akyurekekin·
@nanjiangwill You are asking: if I train a model with ICL on a class of languages (we did with regular languages in the paper), how well does it generalize to other languages? That's a great follow-up question, which we mentioned in the intro.
Ekin Akyürek@akyurekekin·
A really interesting corollary that I realized: copying ("x|x") = ICLL with singleton languages
Ekin Akyürek@akyurekekin

@EranMalach @brandfonbrener Ah, I found the connection between the copying task ("x|x") and in-context language learning (ICLL): copying is the subset of ICLL with regular languages such that each language consists of a single element and each instance has 2 examples. copying = ICLL with singleton languages

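The corollary above can be sketched in a few lines. This is an illustrative toy (the `singleton_icll_prompt` helper and the "x|x" prompt format are my construction, not code from the thread): when the language has exactly one element, every draw from it is forced to be that element, so an ICLL instance collapses to the copying format.

```python
# Sketch: copying ("x|x") as ICLL over a singleton language {x}.
# With one string in the language, both the in-context example and
# the target are forced draws of x, so "learning the language in
# context" is exactly copying.

def singleton_icll_prompt(x: str) -> str:
    """Build a 2-example ICLL instance over the language {x}."""
    language = {x}                 # singleton regular language
    example = target = x           # the only possible draws
    assert example in language and target in language
    return f"{example}|{target}"   # collapses to the copying format

print(singleton_icll_prompt("abba"))  # abba|abba
```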
Nan Jiang@nanjiangwill·
@akyurekekin Ah, sorry for not making it clear: I mainly want to discuss OOD cases, which are interesting to think about with ICLL. What if we try to get rot-2("f|?") with rot-2("x|z") and rot-2("a|c") in context, but the model was trained only on rot-1 examples?
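The OOD probe in the tweet above is easy to make concrete. A minimal sketch, where the `rot_k` helper (a Caesar shift over lowercase letters) and the prompt string are my illustrative construction: the context shows two rot-2 pairs, and the held-out query asks the model to infer the shift even though, in the hypothetical, it was trained only on rot-1 data.

```python
import string

def rot_k(s: str, k: int) -> str:
    """Shift each lowercase letter forward by k positions (Caesar shift)."""
    alpha = string.ascii_lowercase
    return "".join(alpha[(alpha.index(c) + k) % 26] for c in s)

# The in-context examples from the tweet, generated with k=2:
prompt = f"x|{rot_k('x', 2)} a|{rot_k('a', 2)} f|?"
print(prompt)          # x|z a|c f|?
print(rot_k("f", 2))   # h <- the answer a model must infer in context,
                       #      despite (hypothetically) only seeing rot-1
                       #      examples during training
```

Whether a model answers "h" here probes in-context generalization to an unseen shift, not memorization of the training shift.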
Ekin Akyürek@akyurekekin·
@nanjiangwill Could you unpack "complicated"? When I say "x|x" I mean x is a string/sequence of tokens, not a single character: the task from the "repeat after me" paper.
Nan Jiang retweeted
Wai Keen Vong@wkvong·
1/ Today in Science, we train a neural net from scratch through the eyes and ears of one child. The model learns to map words to visual referents, showing how grounded language learning from just one child's perspective is possible with today's AI tools. science.org/doi/10.1126/sc…
Zachary Novack@zacknovack·
A sick study on how model architecture (i.e. attention alternatives) influences ICL ability. Check it out! (p.s. @nanjiangwill is on the PhD market!! 🥳)
Nan Jiang@nanjiangwill

❓Are attention-based models needed for In-Context Learning(ICL)? 🤔Can emerging architectures perform ICL? 🎉Check out our #ICLR2024 paper "Exploring the Relationship Between Model Architecture and In-Context Learning Ability" 🎉 #LLM Paper: arxiv.org/abs/2310.08049 🧵[1/9]

Nan Jiang@nanjiangwill·
We're excited to contribute to the exploration of alternative architectures and emergent capabilities!! 🎉🎉🎉 Huge congrats and many thanks to Ivan Lee and Prof. Taylor Berg-Kirkpatrick @BergKirkpatrick 🧵[9/9]
Nan Jiang@nanjiangwill·
Section 3.1: A Simple Few-Shot Natural Language Task. 1) Stronger models tend to perform worse when they cannot rely on semantics. 2) Most architectures fail in the flipped setting, while Hyena performs best among the models that are not pre-trained. 🧵[8/9]
Nan Jiang@nanjiangwill·
❓Are attention-based models needed for In-Context Learning(ICL)? 🤔Can emerging architectures perform ICL? 🎉Check out our #ICLR2024 paper "Exploring the Relationship Between Model Architecture and In-Context Learning Ability" 🎉 #LLM Paper: arxiv.org/abs/2310.08049 🧵[1/9]