Ebtesam

35 posts

@ebtesamdotpy

AI/SE Research | CS PhD @GeorgeMasonU | Prev @MSFTResearch

Washington, DC · Joined October 2021
199 Following · 114 Followers
Ebtesam retweeted
Jia-Bin Huang @jbhuang0604:
Scrolling the AI news timeline as a researcher feels like a teenager browsing Instagram: "Everyone else has figured everything out!" Reliable home robots imminent, 100× productivity AI agents, insane visual generation ... Exciting, but anxiety-inducing. What am I doing? 😬
Ebtesam retweeted
Omar Khattab @lateinteraction:
crazy that they called it context window when attention span was right there
Ebtesam retweeted
Sebastian Raschka @rasbt:
As we all know by now, reasoning models often generate longer responses, which raises compute costs. Now, this new paper (arxiv.org/abs/2504.05185) shows that this behavior comes from the RL training process, not from an actual need for long answers for better accuracy.

The RL loss tends to favor longer responses when the model gets negative rewards, which I think explains the "aha" moments and longer chains of thought that arise from pure RL training. That is, if the model gets a negative reward (i.e., the answer is wrong), the math behind PPO causes the average per-token loss to become smaller when the response is longer. So, the model is indirectly encouraged to make its responses longer. This is true even if those extra tokens don't actually help solve the problem.

What does the response length have to do with the loss? When the reward is negative, longer responses can dilute the penalty per individual token, which results in lower (i.e., better) loss values, even though the model is still getting the answer wrong. So the model "learns" that longer responses reduce the punishment, even though they do not help correctness.

In addition, the researchers show that a second round of RL (using just a few problems that are sometimes solvable) can shorten responses while preserving or even improving accuracy. This has big implications for deployment efficiency.
[image]
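To make the dilution argument concrete, here is a toy numeric sketch (my own illustration, not code from the paper): a single sequence-level reward covers the whole response, and the loss is averaged over tokens, so the per-token penalty for a wrong answer shrinks as the answer grows.

```python
# Toy sketch of the length bias (illustrative only; not the paper's
# PPO objective): one terminal reward is spread over the response,
# and the loss is normalized by response length.

def avg_per_token_penalty(reward: float, response_len: int) -> float:
    """Average per-token loss when a single sequence-level reward is
    diluted across the whole response."""
    return -reward / response_len  # negative reward -> positive penalty

for length in (10, 100, 1000):
    print(length, avg_per_token_penalty(reward=-1.0, response_len=length))
# 10 0.1
# 100 0.01
# 1000 0.001  -> longer wrong answers look "less bad" to the optimizer
```

Under this simplification, a wrong 1000-token answer incurs a hundred times less average penalty than a wrong 10-token one, which is exactly the incentive toward padding that the thread describes.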
Ebtesam retweeted
Tristan T @trirpi:
[image]
Ebtesam retweeted
I Am Devloper @iamdevloper:
vibe coding, where 2 engineers can now create the tech debt of at least 50 engineers
Ebtesam retweeted
Nabeel S. Qureshi @nabeelqu:
For the confused, it's actually super easy:
- GPT 4.5 is the new Claude 3.6 (aka 3.5)
- Claude 3.7 is the new o3-mini-high
- Claude Code is the new Cursor
- Grok is the new Perplexity
- o1 pro is the 'smartest', except for o3, which backs Deep Research
Obviously. Keep up.
Ebtesam retweeted
Hamel Husain @HamelHusain:
New post re: Devin (the AI SWE). We couldn't find many reviews of people using it for real tasks, so we went MKBHD mode and put Devin through its paces. We documented our findings here. Would love to know if others have had a different experience. answer.ai/posts/2025-01-…
[image]
Ebtesam retweeted
Jiaxin Pei @jiaxin_pei:
It's common to add personas in system prompts, assuming this helps LLMs. However, by analyzing 162 roles x 4 LLMs x 2410 questions, we show that adding a persona mostly makes *no* statistically significant difference compared with the no-persona setting. When there is a difference, it is *negative*. It's time to rethink the usage of personas in system prompts!
Quoted: Mingqian Zheng @elisazmq_zheng:

🎙️ What if the way we prompt LLMs might actually hold them back? 🚨 Assigning personas like "helpful assistant" in system prompts might *not* be as helpful as we think! ✨ Check out our work accepted to Findings of @emnlpmeeting ✨ 📜 arxiv.org/abs/2311.10054 🧵 [1/7]

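The setup being compared looks roughly like this (an illustrative sketch, not the authors' evaluation code): the same question is asked with and without a persona line in the system prompt, in the standard chat-message format.

```python
# Sketch of the persona vs. no-persona conditions (my own example).

def build_messages(question: str, persona: str | None = None) -> list[dict]:
    """Assemble a chat request; persona=None is the no-persona baseline."""
    messages = []
    if persona is not None:
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": question})
    return messages

question = "Which planet has the most moons?"
no_persona = build_messages(question)
with_persona = build_messages(question, persona="You are a helpful assistant.")
print(no_persona)
print(with_persona)
```

The paper's claim is that, averaged over many such question pairs, the persona variant does not reliably outperform the baseline.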
Ebtesam retweeted
Ishan @radshaan:
If you get frequent urges to go deep into a subject, do not ignore them. Pick a weekend, stop everything else, and give in to the urge. Fresh insights await at the other end.
Ebtesam retweeted
Upol Ehsan @UpolEhsan:
Is hallucination in LLMs inevitable even with an idealized model architecture and perfect training data? This work argues YES and offers a formal proof. Let's dig in ⤵ 🧵1/n
[image]
Ebtesam retweeted
Edward Grefenstette @egrefen:
Instead, evaluation processes should track the diverse notions of extrinsic utility found both in everyday usage of our technology today and in anticipation of how people might use technology tomorrow.
Ebtesam retweeted
Dr Meming @Dr_Meming:
Heck
[image]
Ebtesam retweeted
Dr. Amy Lee @minisciencegirl:
Never name a manuscript draft "_FINAL"
Ebtesam retweeted
Dr Meming @Dr_Meming:
Academic research: months of experiments and data analysis that ends up being a few sentences in a paper
[image]
Ebtesam retweeted
will depue @willdepue:
I feel like "large language model" is a bit reductive when GPT-2 is in the same class as GPT-4. gigantic language models? enormous language models? big ass language models? Nimitz-class language models? better suggestions needed
Ebtesam retweeted
MIT CSAIL @MIT_CSAIL:
Happy birthday to Python creator Guido van Rossum. The open source language was named after the comedy troupe Monty Python: bit.ly/2B8R7h6 Image via Midjourney
[image]
Ebtesam retweeted
François Chollet @fchollet:
When I got started with programming, I debugged using printf() statements. Today, I debug with print() statements.

The purpose of debugging is to correct your mental model of what your code does, and no tool can do that for you. The best any tool can do is provide visibility into code execution, and targeted print statements already do a tremendous job at that.
Quoted: MIT CSAIL @MIT_CSAIL:

“The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.” — Brian Kernighan, co-creator of Unix

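A trivial sketch of the "judiciously placed print" style being described (my own example, not Chollet's code): print the one piece of intermediate state you need to check your mental model, right where the behavior surprises you.

```python
# Targeted print debugging: expose the accumulator state each step
# so a wrong mental model of the loop is caught immediately.

def running_mean(values):
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
        print(f"after adding {v!r}: total={total}, count={count}")
    return total / count

print(running_mean([1, 2, 3]))  # inspect the trace, then the result: 2.0
```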