Jiri

152 posts

Jiri @JIRIGESI

Random AI person

Joined September 2015
1.6K Following · 175 Followers
Jiri retweeted
Chris @chatgpt21
I literally just watched GPT-5.5 via codex beat an Amazon customer associate in real time. 💀

I asked it to get me a refund, and I watched it navigate the settings, cancel the subscription, then go a step further into the help page. I thought it was going to request a phone call (which would prompt me to take over). Instead, it opened: “Chat with an associate now.” That’s when I sat up on my couch, because I knew it was about to get real.

The agent said: “Your subscription is active.” And GPT-5.5 immediately explained that it only shows as active because cancellation leaves access through the billing period, but that I wanted it stopped now and refunded. My jaw just hung open. It was the first time I watched sand handle a customer service agent for me in real time.

Once the agent confirmed the refund, it just ended the chat. No mercy, no thank you, LMAO. First time I’ve watched a human customer service agent get outmaneuvered by AI in real time. And it made me $15! Almost paid for itself in 5 minutes.
192 replies · 282 reposts · 5.4K likes · 1.4M views
Shunyu Yao @ShunyuYao12
Our goal is to build practical models with comprehensive capabilities beyond open benchmarks. And the only way to do that is to co-design with diverse products while scaling solidly. Tencent has the best product ecosystem and a solid, low-ego culture, and we are just getting started!
Tencent Hy @TencentHunyuan

👋 Hi /haɪ/, we're the Tencent Hy /haɪ/ team 🐧 Today, we open-source Hy3 preview (295B A21B), a leading reasoning and agent model in its size class, with great cost efficiency. Give us feedback to help us improve the official Hy3! 🤗 hf.co/tencent/Hy3-pr… 📖 hy.tencent.com/hy3-preview

50 replies · 151 reposts · 1.9K likes · 864.5K views
Jiri retweeted
Eigen AI @Eigen_AI_Labs
The future of AI is open -- but it also needs to be fast, efficient, reliable, and production-ready. Excited to partner with @NebiusAI to bring optimized frontier open models to Token Factory. 🚀

Together, we’re helping developers and enterprises run leading open-source models in production with greater speed, reliability, and scale by combining Eigen AI’s deep inference optimization with Nebius’s production-grade infrastructure. ⚡

Read more below. 🤝 nebius.com/blog/posts/neb…

#AI #OpenSourceAI #Inference #LLM #GenAI #AIInfrastructure #Nebius #MLSys #EigenAI
Nebius Token Factory @nebiustf

Open models are improving fast. Running them efficiently in production is still hard. @nebiustf × @Eigen_AI_Labs are partnering to bring optimized frontier open models to Token Factory. DeepSeek, GPT-OSS, Kimi, Qwen, Llama, GLM and more, optimized for speed and efficiency at scale. High-performance open model inference without building the optimization stack yourself. Read more: nebius.com/blog/posts/neb…

1 reply · 9 reposts · 25 likes · 2.7K views
Ryan Hanrui Wang @hanrui_w
Super proud of what our team has built! Seeing our team featured on Jensen Huang’s GTC keynote slide as the #1 speed inference provider is a deeply meaningful moment for us. This recognition reflects the technical depth, intensity, and persistence of a team that has been relentlessly focused on performance. We are just getting started. 🚀
Eigen AI @Eigen_AI_Labs

We are incredibly excited to share that Eigen AI was recognized as the #1 speed inference provider on Jensen Huang's keynote slide at NVIDIA GTC 2026. 🚀

This is a surreal and deeply meaningful moment for our team. ❤️ From day one, we set out to build world-class AI infrastructure with a focus on extreme performance, efficiency, and real-world deployment. To see Eigen AI recognized on one of the biggest stages in AI is an incredible honor, and a testament to the hard work, technical depth, and persistence of our entire team.

Beyond Kimi K2.5, Eigen AI is also currently ranked #1 on another 25 models on Artificial Analysis, reflecting the breadth and consistency of our inference optimization across leading open-source models. ⚡

We are proud to be pushing the frontier of fast, scalable inference for leading open-source models, and even more excited about what comes next. 🌍 Huge thank you to everyone who has supported us on this journey. We are just getting started.

Find more at eigenai.com

#GTC #NVIDIA #AI #Inference #GenAI #LLM #AIInfrastructure #EigenAI #Infrastructure #keynote #GTC2026

7 replies · 5 reposts · 31 likes · 3.2K views
Jiri retweeted
Eigen AI @Eigen_AI_Labs
(1/7)🚀 Eigen AI inference milestone. We’ve reached industry-leading speed on Artificial Analysis across 11 major models— including DeepSeek-V3, Qwen3, Qwen3-VL, and Llama 4. This wasn’t achieved via per-model hacks, but by building a production-grade inference stack that scales across architectures and workloads. #artificial_intelligence #DeepLearning #LLMs
2 replies · 4 reposts · 17 likes · 3.5K views
Jiri @JIRIGESI
@chijinML Awesome! Looking forward to what you'll cook up there.
0 replies · 0 reposts · 1 like · 438 views
Chi Jin @chijinML
Life update🙂: I’m on sabbatical from Princeton and have started at OpenAI, working on building AGI. Happy to be back in the Bay Area after 6 years! Bay Area friends—DMs open for food & hikes.
38 replies · 11 reposts · 662 likes · 64.7K views
Jiri retweeted
Zhaopeng Tu @tuzhaopeng
Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔

Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and generalizable agents.

1️⃣ We identify "inefficient exploration" in standard RL: agents achieve success through flawed reasoning paths (e.g., repetitive actions, illogical steps), leading to brittle policies that fail on new tasks.

2️⃣ RLVMR provides dense, process-level rewards for verifiable meta-reasoning behaviors:
• 🎯 Planning: reward strategic thinking
• 🔍 Exploration: reward discovering new states
• 💭 Reflection: reward error correction

3️⃣ Results on ALFWorld & ScienceWorld:
• 🏆 New SOTA: 83.6% success on hardest unseen tasks (7B model)
• 📉 Significant reduction in repetitive actions
• 🚀 Enhanced generalization to novel scenarios

🧑‍💻 Code: github.com/Tencent/Digita…
📃 Paper: arxiv.org/abs/2507.22844
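The process-level reward idea in the RLVMR announcement can be illustrated with a minimal sketch: the shaped reward combines the task outcome with small bonuses for planning, discovering new states, and reflecting. All function names, tags, and weights below are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of RLVMR-style reward shaping. The agent self-labels
# each step with a meta-reasoning tag; verifiable bonuses are added on top
# of the sparse outcome reward. Weights here are arbitrary placeholders.

def meta_reasoning_reward(step, visited_states, outcome_reward=0.0,
                          w_plan=0.1, w_explore=0.1, w_reflect=0.1):
    """Return a shaped reward for one agent step.

    step: dict with 'tag' ('plan', 'explore', 'reflect', or 'act')
          and 'state' (an environment state id).
    visited_states: set of state ids seen so far (mutated in place).
    """
    bonus = 0.0
    if step["tag"] == "plan":
        bonus += w_plan                  # reward strategic thinking
    if step["state"] not in visited_states:
        bonus += w_explore               # reward discovering new states
        visited_states.add(step["state"])
    if step["tag"] == "reflect":
        bonus += w_reflect               # reward error correction
    return outcome_reward + bonus
```

In this toy version, a repeated action in an already-visited state earns no exploration bonus, which is the mechanism the thread credits for reducing repetitive actions.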
8 replies · 87 reposts · 420 likes · 58.8K views
Jiri retweeted
Giosue Migliorini @joh_sweh
@JIRIGESI Hi Jiri, I am a fourth-year PhD candidate at UCI interested in probabilistic modeling, RL, and multi-modal generative models. I’d love to grab a coffee if you are available!
1 reply · 0 reposts · 1 like · 96 views
Jiri @JIRIGESI
I’ll be at NeurIPS. If you’re interested in a 2026 PhD research internship with Amazon Store Foundation AI and want to work on agents, RL, and multi-modal models, I’d love to connect at the conference.
3 replies · 1 repost · 17 likes · 1.6K views
Jiri retweeted
Dakuo Wang @dakuowang
Our team will be giving a demo on “LLM Agent as Digital Twins of Online Shopping Customers” at #NeurIPS2025. This is a collaboration between @amazon and the @Northeastern human-centered AI lab. We are actively hiring PhD students, postdocs, and interns. Stop by the Amazon booth tomorrow (Wednesday) from 11:30am to 12:00pm. @JIRIGESI @leoleoasd Jing Huang, Jin Lai
2 replies · 4 reposts · 15 likes · 1.9K views
Jiri @JIRIGESI
@ShipengLiu5 Thanks for your interest! It would be great if we could talk there.
0 replies · 0 reposts · 0 likes · 59 views
Shipeng Liu @ShipengLiu5
@JIRIGESI Hi Jiri, I’m a Ph.D. student at USC working on agents and human–AI interaction. I’m very excited to learn more about the internship opportunities with your team and would love to connect. I also sent you a LinkedIn connection request 😊 and hope to stay in touch.
1 reply · 0 reposts · 1 like · 110 views
Lingming Zhang @LingmingZhang
🤯🤯🤯 Gemini 3 Pro + Live-SWE-agent hits 77.4% on SWE-bench Verified, beating ALL existing models, including Claude 4.5!! 🤖 Live-SWE-agent is the first live software agent that autonomously self-evolves on the fly — and it even outperforms the manually engineered scaffold used by the Gemini 3 Pro team (76.2%)
32 replies · 69 reposts · 474 likes · 113.4K views
Jiri @JIRIGESI
@Yong18850571 @PrincetonPLI @thinkymachines Congrats, Yong! It was truly inspiring to work with you. I learned so much from how you aim high, move fast, and stay focused on what’s truly impactful. Excited to see what you build at TML!
0 replies · 0 reposts · 1 like · 299 views
Yong Lin @Yong18850571
[Life update] I’ve officially left @PrincetonPLI and joined Thinking Machines Lab @thinkymachines. It feels like the right time to look back on my journey at Princeton — one and a half years that were truly transformative.

During this period, I made many friends, learned tremendously, and co-founded and co-led the Goedel Project. It was one of the most rewarding experiences of my life: a small, close-knit team of about ten people working with a clear purpose, moving fast, and ultimately building something impactful.

In mid-July, we released Goedel-Prover-V2 (32B), a model that significantly outperformed the previous state-of-the-art DeepSeek-Prover-V2-671B on formal mathematical reasoning, using nearly 20× fewer parameters and dramatically less compute. Even now, four months after release, it still sits at the top of the open-source leaderboard. What makes this achievement especially meaningful is that we accomplished it entirely with academic resources. Competing against large industrial labs and still coming out ahead felt almost unreal. Seeing so many research teams now building on top of Goedel-Prover-V2 is deeply gratifying — it’s proof that open, academic AI can still make a real impact.

Equally fulfilling was the journey itself. Unlike industrial teams with access to large-scale, off-the-shelf RL infrastructure, we — a group of students and researchers from academia with zero prior experience in massive model training — had to build almost everything from scratch. We learned quickly, identified problems as they emerged, and fixed them with remarkable speed. Designing, scaling, and successfully training a 32B-parameter model within just three months remains one of the things I’m most proud of.

The Goedel Project began in October 2024. At that time, we had no serious experience training models that could compete with the best labs. DeepSeek-Prover-V1 and V1.5 looked unbeatable — they had started a year earlier and already set an incredibly high bar. We experimented with many ideas — agentic pipelines, divide-and-conquer methods — most of which turned out to be too costly or impractical given our limited resources. Eventually, we discovered a simple yet powerful iterative-training approach that allowed us to scale efficiently within our compute limits. Bit by bit, we caught up with DeepSeek-Prover-V1.5 — and then surpassed it.

Princeton winters are brutally cold. It was the first time I’d ever seen snow last for weeks. I spent the entire winter break at home, running experiments, analyzing results, and adjusting training methods and data again and again. That persistence paid off: in February, we released Goedel-Prover-V1-7B, which captured the top spot on the leaderboard. It was our first major milestone — proof that an academic team could compete with frontier models.

Our celebration was short-lived. In April, Kimi-Prover-72B and DeepSeek-Prover-V2-671B both arrived — and completely outperformed us. It was a tough moment. We couldn’t even host DeepSeek-Prover-V2-671B for inference locally; communication errors kept crashing our limited infrastructure. None of us had experience deploying or training models of that scale. Still, we decided to aim higher — to beat them in the next version.

We began by identifying the bottlenecks in DeepSeek and Kimi’s provers, exploring every possible angle for improvement. We experimented with compiler-based feedback loops, curriculum data synthesis, self-improvement strategies, model distillation, and model merging to improve diversity during RL. But the most critical insight was about efficiency — optimizing how we allocated limited resources across training design, data generation, and scaling. Every GPU hour had to count. After two months of exploration and countless small-scale tests, we finally established a systematic framework for the next release: Goedel-Prover-V2.

I led "The Big Run" — a nearly month-long sequence of two self-improvement fine-tuning cycles followed by one large-scale RL round. We completed training just a few days before our scheduled release, leaving barely enough time for evaluation. Those last nights were intense — running tests, fixing scripts, collecting metrics — but everything came together perfectly. When we saw the final results, we could hardly believe them: Goedel-Prover-V2 solved twice as many problems on PutnamBench as DeepSeek-Prover-V2-671B.

Many people have since asked what the “key” was — how an academic team managed to outperform frontier labs using a fraction of their resources. There isn’t a single magic trick, but rather a combination of principles that guided us:
* build solid infrastructure early
* focus on real bottlenecks instead of chasing novelty
* investigate broadly with small-scale experiments
* fix problems in real time
* optimize resource allocation carefully
* execute the final big run with precision

Each of these steps sounds simple, but together they made all the difference.

Now, at Thinking Machines Lab, I’m shifting focus beyond formal reasoning toward building general-purpose models. I’m deeply inspired by TML’s mission — developing interactive AI systems and advancing open science. I’m thrilled to begin this new chapter and look forward to sharing more in the future.
18 replies · 18 reposts · 646 likes · 78.3K views
Samu @SamuelNellessen
@_lewtun thanks a lot!!!
1 reply · 0 reposts · 1 like · 98 views
Lewis Tunstall @_lewtun
When you unintentionally mirror your memes
9 replies · 13 reposts · 338 likes · 75.9K views
Yiling Lou @yiling__LOU
Thrilled to announce that I'll be joining UIUC CS @siebelschool as an Assistant Professor in Spring 2026! 📢 I’m looking for Fall '26 PhD students who are interested in the intersection of Software Engineering and AI, especially in LLM4Code and Code Agents. Please drop me an email if you are interested in working with me.
44 replies · 68 reposts · 701 likes · 79.2K views
Jiri retweeted
Andrej Karpathy @karpathy
Hah, judging by mentions overnight, people seem to find the ghost analogy provocative. I swear I don't wake up just trying to come up with new memes, but to elaborate briefly on why I thought it was a fun comparison:

1) It captures the idea that LLMs are purely digital artifacts that don't interact with the physical world (unlike animals, which are very embodied).

2) Ghosts are a kind of "echo" of the living, in this case a statistical distillation of humanity.

3) There is an air of mystery over both ghosts and LLMs, as in we don't fully understand what they are or how they work.

4) The process of training LLMs is a bit like summoning a ghost, i.e. a kind of elaborate computational ritual on a summoning platform of an exotic megastructure (GPU cluster). I've heard earlier references to LLM training as "summoning a demon," and that never sounded right because it implies and presupposes evil. Ghosts are a much more neutral entity, just like LLMs, and may or may not be evil. For example, one of my favorite cartoons when I was a child was Casper the Friendly Ghost, clearly a friendly and wholesome entity. Same in Harry Potter, e.g. Nearly Headless Nick and such.

5) It is a nod to the earlier phrase "ghost in the machine," from the context of Descartes' mind-body dualism, and of course later derived references, "Ghost in the Shell" etc. As in the mind (ghost) that animates a body (machine).

Probably a few other things in the embedding space. Among the ways the analogy isn't great is that while ghosts may or may not be evil, they are almost always spooky, which feels too unfair. But anyway, I like that while no analogy is perfect, analogies let you pull in structure laterally from one domain to another as a way of generating entropy and reaching unique thoughts.
88 replies · 78 reposts · 1K likes · 263.2K views
Chi Jin @chijinML
Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton!🎉 6 years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, colleagues for being with me on this remarkable journey!
149 replies · 60 reposts · 2.7K likes · 114K views
Jiri retweeted
Bohan Lyu @Lyubh22
Building upon Goedel-Prover-V2, Hilbert Prover achieved 99.2% on miniF2F and solved over 70% of PutnamBench problems 😱 Amazing news from my old home, @yuqirose's lab. At ICML this year, someone asked why the model struggled with Putnam problems. I said it was a matter of time, and now here we are! I still vividly remember explaining our V2 work to Sumanth over spaghetti and meatballs the day after the blog post went live. What a journey. Congrats! Paper: arxiv.org/abs/2509.22819
0 replies · 4 reposts · 17 likes · 2.1K views