Jebish7

39 posts

@jebish7

Siuuuuuu

Joined February 2023
225 Following · 22 Followers
AWS Developers @awsdevelopers
Reply to this tweet with "AWS" and we’ll tell you which AWS Service you are
3.3K replies · 55 reposts · 2K likes · 547.2K views
Jebish7 @jebish7
@universeinanegg Thanks for bringing this to my feed. Went through a few threads. My biggest takeaway is that we are really lacking in how we evaluate models. A really fascinating idea; it also kind of shows how agents operate in social settings.
0 replies · 0 reposts · 1 like · 29 views
Jebish7 @jebish7
@sarahookr With no bias whatsoever, I can say Momo is the best.
0 replies · 0 reposts · 2 likes · 504 views
Sara Hooker @sarahookr
My favorite type of distribution shift.
[Image attached]
15 replies · 14 reposts · 245 likes · 29.7K views
Jebish7 @jebish7
@skoularidou Congratulations. As an aspiring researcher, I don’t think 3K is trivial. It shows that researchers need your paper.
0 replies · 0 reposts · 1 like · 149 views
Maria Skoularidou (she/her) @skoularidou
I understand that this is quite trivial to most of the researchers in here, but I feel like sharing that I hit >3,000 citations. It is not much, but it somehow feels nice, as, despite the health issues, life goes on.
[Image attached]
32 replies · 11 reposts · 556 likes · 46.3K views
Jebish7 @jebish7
@tomssilver Relocate to South Asia? Your deadline will fall at 5-6 PM, so typical office hours.
0 replies · 0 reposts · 1 like · 522 views
Tom Silver @tomssilver
Considering relocating my lab to Anywhere on Earth so we can go to bed at a reasonable hour after deadlines
12 replies · 4 reposts · 304 likes · 19.1K views
Jebish7 @jebish7
@universeinanegg Tried this with Claude and GPT, and it’s the same. Gemini 3 Pro did give two 3s (after thinking a lot). Still, the first two numbers from all of them are 3 followed by 1 (irrespective of temperature). Models seem to love 3.
0 replies · 0 reposts · 0 likes · 50 views
Ari Holtzman @universeinanegg
my favorite part about Google's AI is it will NEVER allow duplicates if you ask for random numbers with replacement
[Image attached]
1 reply · 0 reposts · 5 likes · 865 views
Jebish7 retweeted
Cheng Qian @qiancheng1231
🔮 Can a world model (simulator) give today’s AI agents foresight? We tested “world model as a tool”… and found it often doesn’t help; sometimes it hurts. Check our newest paper here: arxiv.org/pdf/2601.03905… #AIagents #WorldModel #ToolUse
[Image attached]
1 reply · 18 reposts · 52 likes · 8K views
Jebish7 retweeted
David Chiang @davidweichiang
@ReviewAcl pretty please extend the Jan deadline
0 replies · 1 repost · 9 likes · 441 views
Jebish7 @jebish7
@PingbangHu There was sudden score inflation after the leak, so this was the only realistic way to fix it. They could have rolled everything back to just before the leak, but that would’ve been unfair to people whose reviewers hadn’t responded yet.
1 reply · 0 reposts · 1 like · 446 views
Pingbang Hu 🇹🇼 @PingbangHu
Hot take on ICLR's action to revert reviews and scores: It's acceptable, even reasonable, if you believe in academic integrity. I’ve seen people describe this decision as “treating everyone as guilty.” Unfortunately, that’s true in a sense, but this is about academic integrity, which is inherently much stricter than ordinary moral standards. From that perspective, I find the decision completely reasonable, even though I personally benefited from the rebuttal process.
[Image attached]
6 replies · 2 reposts · 70 likes · 16.1K views
Jebish7 retweeted
Catherine Arnett @linguist_cat
Very excited to see that Global PIQA is already being used to evaluate multilingual capabilities in new models!
Google DeepMind @GoogleDeepMind

Our first release is Gemini 3 Pro, which is rolling out globally starting today. It significantly outperforms 2.5 Pro across the board: 🥇 Tops LMArena and WebDev @arena leaderboards 🧠 PhD-level reasoning on Humanity’s Last Exam 📋 Leads long-horizon planning on Vending-Bench 2

1 reply · 7 reposts · 28 likes · 1.8K views
Jebish7 @jebish7
@Mengyue_Yang_ They previously released a notice about increased security against cheating and the use of third-party apps. Maybe they were anticipating this.
0 replies · 0 reposts · 1 like · 40 views
Mengyue Yang @Mengyue_Yang_
As a world-model researcher and long-time Genshin player, this is absolutely mind-blowing to see 🤯🔥 Huge respect to the team, this is the kind of crossover I never expected (yeah but not out of my imagination). But seriously… when did Genshin’s environment become open enough for training agents? 😂 We really need that!
Weihao Tan @WeihaoTan64

🚀Introducing Lumine, a generalist AI agent trained within Genshin Impact that can perceive, reason, and act in real time, completing hours-long missions and following diverse instructions within complex 3D open-world environments.🎮 Website: lumine-ai.org 1/6

13 replies · 53 reposts · 682 likes · 113.6K views
Jebish7 retweeted
Multilingual Representation Workshop @ EMNLP 2025
Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year’s MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.
[Image attached]
2 replies · 57 reposts · 114 likes · 26.5K views
Jebish7 retweeted
Cohere Labs @Cohere_Labs
3 days. Worldwide. Inspiring & starting new research collaborations. Introducing the Connect conference. 🖇️ Join for incredible speakers, including @1vnzh @jpineau1 @mziizm & @ShayneRedford + >20 researchers discussing how collaboration and open science are driving progress. 🚀
[Image attached]
1 reply · 17 reposts · 52 likes · 17K views
Jebish7 retweeted
Ram Kadiyala @_1024_m
Three of our papers have been accepted at AACL 2025 @aaclmeeting (2 Main, 1 Findings).
1. DSBC: Data Science task Benchmarking with Context engineering arxiv.org/pdf/2507.23336
2. Uncovering Cultural Representation Disparities in Vision-Language Models arxiv.org/pdf/2505.14729
3. Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance arxiv.org/pdf/2504.09753
Grateful to the co-authors @SidYaeger @Siddartha_10 @jebish7 @delliott @alexrs95 @_sumand @_srishtiyadav @KanwalMehreen2
This was made possible through research grants from @TraversaalAI @AnthropicAI @Cohere_Labs
1 reply · 3 reposts · 8 likes · 755 views