Jiri

152 posts

Jiri

@JIRIGESI

Random AI person

Katılım Eylül 2015

1.6K Takip Edilen175 Takipçiler

Jiri retweetledi

Chris@chatgpt21·6d

I literally just watched GPT-5.5 via codex beat an Amazon customer associate in real time. 💀 I asked it to get me a refund, and I watched it navigate the settings, cancel the subscription, then it went step further into the help page. I thought it was going to request a phone call (which would prompt me to take over) Instead, it opened: “Chat with an associate now.” That’s when I sat up on my couch because I knew it was going to get real The agent said: “Your subscription is active.” And GPT-5.5 immediately explained that it only shows as active because cancellation leaves access through the billing period, but that I wanted it stopped now and refunded. And my jaw just hung open, it was the first time I watched sand handle a customer service agent for me in real time Once the agent confirmed the refund, it just ended the chat no mercy no thank you LMAO First time I’ve watched a human customer service agent get outmaneuvered by AI in real time. And it made me 15$! almost paid for itself in 5 minutes

English

192

282

5.4K

1.4M

Jiri@JIRIGESI·23 Nis

@ShunyuYao12 Congratulations!

English

694

Shunyu Yao@ShunyuYao12·23 Nis

Our goal is to build practical models with comprehensive capabilities beyond open benchmarks. And the only way to do it to co-design with diverse products while scaling solidly. Tencent has the best product ecosystem and a solid, low-ego culture, and we are just getting started!

Tencent Hy@TencentHunyuan

👋Hi /haɪ/, we're the Tencent Hy /haɪ/ team🐧 Today, we open source Hy3 preview (295B A21B), a leading reasoning and agent model in its size, with great cost efficiency. Give us feedback to help improve Hy3 official! 🤗 hf.co/tencent/Hy3-pr… 📖 hy.tencent.com/hy3-preview

English

151

1.9K

864.5K

Jiri retweetledi

Eigen AI@Eigen_AI_Labs·17 Mar

The future of AI is open -- but it also needs to be fast, efficient, reliable, and production-ready. Excited to partner with @NebiusAI to bring optimized frontier open models to Token Factory. 🚀 Together, we’re helping developers and enterprises run leading open-source models in production with greater speed, reliability, and scale by combining Eigen AI’s deep inference optimization with Nebius’s production-grade infrastructure. ⚡ Read more below. 🤝 nebius.com/blog/posts/neb… #AI #OpenSourceAI #Inference #LLM #GenAI #AIInfrastructure #Nebius #MLSys #EigenAI

Nebius Token Factory@nebiustf

Open models are improving fast. Running them efficiently in production is still hard. @nebiustf × @Eigen_AI_Labs are partnering to bring optimized frontier open models to Token Factory. DeepSeek, GPT-OSS, Kimi, Qwen, Llama, GLM and more, optimized for speed and efficiency at scale. High-performance open model inference without building the optimization stack yourself. Read more: nebius.com/blog/posts/neb…

English

2.7K

Jiri@JIRIGESI·17 Mar

@hanrui_w @samir_khaki Huge congratulations!

English

Ryan Hanrui Wang@hanrui_w·17 Mar

Super proud of what our team has built! Seeing our team featured on Jensen Huang’s GTC keynote slide as the #1 speed inference provider is a deeply meaningful moment for us. This recognition reflects the technical depth, intensity, and persistence of a team that has been relentlessly focused on performance. We are just getting started. 🚀

Eigen AI@Eigen_AI_Labs

We are incredibly excited to share that Eigen AI was recognized as #1 𝐬𝐩𝐞𝐞𝐝 𝐢𝐧𝐟𝐞𝐫𝐞𝐧𝐜𝐞 𝐩𝐫𝐨𝐯𝐢𝐝𝐞𝐫 on Jensen Huang's keynote slide at NVIDIA GTC 2026. 🚀 This is a surreal and deeply meaningful moment for our team. ❤️ From day one, we set out to build world-class AI infrastructure with a focus on extreme performance, efficiency, and real-world deployment. To see Eigen AI recognized on one of the biggest stages in AI is an incredible honor, and a testament to the hard work, technical depth, and persistence of our entire team. Beyond Kimi K2.5, Eigen AI is also currently ranked #1 on another 25 𝐦𝐨𝐝𝐞𝐥𝐬 on Artificial Analysis, reflecting the breadth and consistency of our inference optimization across leading open-source models. ⚡ We are proud to be pushing the frontier of fast, scalable inference for leading open-source models, and even more excited about what comes next. 🌍 Huge thank you to everyone who has supported us on this journey. We are just getting started. Find more at eigenai.com #GTC #NVIDIA #AI #Inference #GenAI #LLM #AIInfrastructure #EigenAI #Infrastructure #keynote #GTC2026

English

3.2K

Jiri retweetledi

Eigen AI@Eigen_AI_Labs·6 Şub

(1/7)🚀 Eigen AI inference milestone. We’ve reached industry-leading speed on Artificial Analysis across 11 major models— including DeepSeek-V3, Qwen3, Qwen3-VL, and Llama 4. This wasn’t achieved via per-model hacks, but by building a production-grade inference stack that scales across architectures and workloads. #artificial_intelligence #DeepLearning #LLMs

English

3.5K

Jiri@JIRIGESI·28 Oca

@chijinML Awesome! looking forward what you will cook there

English

438

Chi Jin@chijinML·28 Oca

Life update🙂: I’m on sabbatical from Princeton and have started at OpenAI, working on building AGI. Happy to be back in the Bay Area after 6 years! Bay Area friends—DMs open for food & hikes.

English

662

64.7K

Jiri retweetledi

Zhaopeng Tu@tuzhaopeng·31 Tem

Are RL agents truly learning to reason, or just finding lucky shortcuts? 🤔 Introducing RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards — a novel framework that rewards not just outcomes, but the quality of reasoning itself, creating more robust and generalizable agents. 1️⃣ We identify "inefficient exploration" in standard RL: agents achieve success through flawed reasoning paths (e.g., repetitive actions, illogical steps), leading to brittle policies that fail on new tasks. 2️⃣ RLVMR provides dense, process-level rewards for verifiable meta-reasoning behaviors: • 🎯 Planning: Reward strategic thinking • 🔍 Exploration: Reward discovering new states • 💭 Reflection: Reward error correction 3️⃣ Results on ALFWorld & ScienceWorld: • 🏆 New SOTA: 83.6% success on hardest unseen tasks (7B model) • 📉 Significant reduction in repetitive actions • 🚀 Enhanced generalization to novel scenarios 🧑‍💻 Code: github.com/Tencent/Digita… 📃 Paper: arxiv.org/abs/2507.22844

English

420

58.8K

Jiri retweetledi

Beidi Chen@BeidiChen·23 Oca

We’re recruiting postdocs this year! Help us spread the world 🙏

Infini-AI-Lab@InfiniAILab

🚀 InfiniAI Lab @ CMU is hiring Postdocs! We are looking for outstanding postdoctoral researchers in ML systems and security to join InfiniAI Lab at Carnegie Mellon University. Research directions include (but are not limited to): 🤖 AI Agents & RL 🔐 Machine Learning Security 🎥 Video Models 🏗️ AI Systems & Architecture Design We especially encourage candidates interested in applying for the CMU–Bosch Institute (CBI) Postdoctoral Fellowship, which provides strong support for independent, high-impact research: 👉 carnegiebosch.cmu.edu/fellowships/in… 🗓️ CBI application deadline: January 30, 2026 How to apply: Please fill out the form and send us an email via 👉 infini-ai-lab.cmu.edu/vacancies

English

148

26.8K

Jiri@JIRIGESI·3 Ara

@joh_sweh Sure! DM you

English

Giosue Migliorini@joh_sweh·3 Ara

@JIRIGESI Hi Jiri, I am a fourth year PhD candidate at UCI interested in probabilistic modeling, RL, and multi modal generative models. I’d love to grab a coffee if you are available!

English

Jiri@JIRIGESI·27 Kas

I’ll be at NeurIPS, if you’re interested in a 2026 PhD research internship with Amazon Store Foundation AI and want to work on agents, RL, and multi-modal, I’d love to connect at the conference.

English

1.6K

Jiri retweetledi

Dakuo Wang@dakuowang·3 Ara

Our team will be giving a demo on “LLM Agent as Digital Twins of Online Shopping Customers” at #NeurIPS2025 . This is a collaboration between @amazon and @Northeastern human-centered AI lab. We are actively hiring PhD, postdoc, interns. Stop by the Amazon booth tomorrow Wednesday at 11:30am to 12:00pm. @JIRIGESI @leoleoasd Jing Huang, Jin Lai

English

1.9K

Jiri@JIRIGESI·27 Kas

@ShipengLiu5 Thanks for your interest, would be great if we can talk there

English

Shipeng Liu@ShipengLiu5·27 Kas

@JIRIGESI Hi Jiri, I’m a Ph.D. student at USC working on agents and human–AI interaction. I’m very excited to learn more about the internship opportunities with your team and would love to connect. I also sent you a LinkedIn connection request 😊 and hope to stay in touch.

English

110

Jiri@JIRIGESI·23 Kas

@LingmingZhang Wow, congratulations! Awesome work!

English

Lingming Zhang@LingmingZhang·21 Kas

🤯🤯🤯 Gemini 3 Pro + Live-SWE-agent hits 77.4% on SWE-bench Verified, beating ALL existing models, including Claude 4.5!! 🤖 Live-SWE-agent is the first live software agent that autonomously self-evolves on the fly — and it even outperforms the manually engineered scaffold used by the Gemini 3 Pro team (76.2%)

English

474

113.4K

Jiri@JIRIGESI·14 Kas

@Yong18850571 @PrincetonPLI @thinkymachines Congrats, Yong! It was truly inspiring to work with you. I learned so much from how you aim high, move fast, and stay focused on what’s truly impactful. Excited to see what you build at TML!

English

299

Yong Lin@Yong18850571·14 Kas

[Life update] I’ve officially left @PrincetonPLI and joined Thinking Machines Lab @thinkymachines . It feels like the right time to look back on my journey at Princeton — one and a half years that were truly transformative. During this period, I made many friends, learned tremendously, and co-founded and co-led the Goedel Project. It was one of the most rewarding experiences of my life: a small, close-knit team of about ten people working with a clear purpose, moving fast, and ultimately building something impactful. In mid-July, we released Goedel-Prover-V2 (32B), a model that significantly outperformed the previous state-of-the-art DeepSeek-Prover-V2-671B on formal mathematical reasoning, using nearly 20× fewer parameters and dramatically less compute. Even now, four months after release, it still sits at the top of the open-source leaderboard. What makes this achievement especially meaningful is that we accomplished it entirely with academic resources. Competing against large industrial labs and still coming out ahead felt almost unreal. Seeing so many research teams now building on top of Goedel-Prover-V2 is deeply gratifying — it’s proof that open, academic AI can still make a real impact. Equally fulfilling was the journey itself. Unlike industrial teams with access to large-scale, off-the-shelf RL infrastructure, we — a group of students and researchers from academia with zero prior experience in massive model training — had to build almost everything from scratch. We learned quickly, identified problems as they emerged, and fixed them with remarkable speed. Designing, scaling, and successfully training a 32B-parameter model within just three months remains one of the things I’m most proud of. The Goedel Project began in October 2024. At that time, we had no serious experience training models that could compete with the best labs. DeepSeek-Prover-V1 and V1.5 looked unbeatable — they had started a year earlier and already set an incredibly high bar. We experimented with many ideas — agentic pipelines, divide-and-conquer methods — most of which turned out to be too costly or impractical given our limited resources. Eventually, we discovered a simple yet powerful iterative-training approach that allowed us to scale efficiently within our compute limits. Bit by bit, we caught up with DeepSeek-Prover-V1.5 — and then surpassed it. Princeton winters are brutally cold. It was the first time I’d ever seen snow last for weeks. I spent the entire winter break at home, running experiments, analyzing results, and adjusting training methods and data again and again. That persistence paid off: in February, we released Goedel-Prover-V1-7B, which captured the top spot on the leaderboard. It was our first major milestone — proof that an academic team could compete with frontier models. Our celebration was short-lived. In April, Kimi-Prover-72B and DeepSeek-Prover-V2-671B both arrived — and completely outperformed us. It was a tough moment. We couldn’t even host DeepSeek-Prover-V2-671B for inference locally; communication errors kept crashing our limited infrastructure. None of us had experience deploying or training models of that scale. Still, we decided to aim higher — to beat them in the next version. We began by identifying the bottlenecks in DeepSeek and Kimi’s provers, exploring every possible angle for improvement. We experimented with compiler-based feedback loops, curriculum data synthesis, self-improvement strategies, model distillation, and model merging to improve diversity during RL. But the most critical insight was about efficiency — optimizing how we allocated limited resources across training design, data generation, and scaling. Every GPU hour had to count. After two months of exploration and countless small-scale tests, we finally established a systematic framework for the next release: Goedel-Prover-V2. I led "The Big Run" — a nearly month-long sequence of two self-improvement fine-tuning cycles followed by one large-scale RL round. We completed training just a few days before our scheduled release, leaving barely enough time for evaluation. Those last nights were intense — running tests, fixing scripts, collecting metrics — but everything came together perfectly. When we saw the final results, we could hardly believe them: Goedel-Prover-V2 solved twice as many problems on PutnamBench as DeepSeek-Prover-V2-671B. Many people have since asked what the “key” was — how an academic team managed to outperform frontier labs using a fraction of their resources. There isn’t a single magic trick, but rather a combination of principles that guided us: * build solid infrastructure early * focus on real bottlenecks instead of chasing novelty * investigate broadly with small-scale experiments * fix problems in real time * optimize resource allocation carefully * execute the final big run with precision. Each of these steps sounds simple, but together they made all the difference. Now, at Thinking Machines Lab, I’m shifting focus beyond formal reasoning toward building general-purpose models. I’m deeply inspired by TML’s mission — developing interactive AI systems and advancing open science. I’m thrilled to begin this new chapter and look forward to sharing more in the future.

English

646

78.3K

Jiri@JIRIGESI·6 Kas

@Yong18850571 Yep…

Yong Lin@Yong18850571·5 Kas

Like I previously said, you should wait for at least 3 months to draw meaningful conclusion on a trading strategy.

yifei e/λ (meetmeinshibuya april 26)@yifever

congrats to llama 3 large for winning the LLM trading contest by not participating

English

Jiri@JIRIGESI·12 Eki

@SamuelNellessen @_lewtun Thanks for the great insight!

English

Samu@SamuelNellessen·11 Eki

@_lewtun thanks a lot!!!

English

Lewis Tunstall@_lewtun·10 Eki

When you unintentionally mirror your memes

English

338

75.9K

Jiri@JIRIGESI·6 Eki

@yiling__LOU @siebelschool congratulations!

English

796

Yiling Lou@yiling__LOU·6 Eki

Thrilled to announce that I'll be joining UIUC CS @siebelschool as an Assistant Professor in Spring 2026! 📢 I’m looking for Fall '26 PhD students who are interested in the intersection of Software Engineering and AI, especially in LLM4Code and Code Agents. Please drop me an email if you are interested in working with me.

English

701

79.2K

Jiri retweetledi

Andrej Karpathy@karpathy·2 Eki

Hah judging by mentions overnight people seem to find the ghost analogy provocative. I swear I don't wake up just trying to come with new memes but to elaborate briefly why I thought it was a fun comparison: 1) It captures the idea that LLMs are purely digital artifacts that don't interact with the physical world (unlike animals, which are very embodied). 2) Ghosts are a kind of "echo" of the living, in this case a statistical distillation of humanity. 3) There is an air of mystery over both ghosts and LLMs, as in we don't fully understand what they are or how they work. 4) The process of training LLMs is a bit like summoning a ghost, i.e. a kind of elaborate computational ritual on a summoning platform of an exotic megastructure (GPU cluster). I've heard earlier references of LLM training as that of "summoning a demon" and it never sounded right because it implies and presupposes evil. Ghosts are a lot more neural entity just like LLMs, and may or may not be evil. For example, one of my favorite cartoons when I was a child was Casper the Friendly Ghost, clearly a friendly and wholesome entity. Same in Harry Potter, e.g. Nearly Headless Nick and such. 5) It is a nod to an earlier reference "ghost in the machine", in the context of Decartes' mind-body dualism, and of course later derived references, "Ghost in the shell" etc. As in the mind (ghost) that animates a body (machine). Probably a few other things in the embedding space. Among the ways the analogy isn't great is that while ghosts may or may not be evil, they are almost always spooky, which feels too unfair. But anyway, I like that while no analogy is perfect, they let you pull in structure laterally from one domain to another as as a way of generating entropy and reaching unique thoughts.

English

263.2K

Jiri@JIRIGESI·2 Eki

@chijinML Congratulations!

English

144

Chi Jin@chijinML·2 Eki

Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton!🎉 6 years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, colleagues for being with me on this remarkable journey!

English

149

2.7K

114K

Jiri retweetledi

Bohan Lyu@Lyubh22·30 Eyl

Building upon Goedel-Prover-V2, Hilbert Prover achieved 99.2% on Minif2f and solved over 70% PutnamBench problems😱 Amazing news from my old home @yuqirose's lab. At ICML this year, someone asked why the model struggled with Putnam problems. I said it was a matter of time, and now here we are! I still vividly remember explaining our V2 work to Sumanth over spaghetti and meatballs the day after the blog post went live. What a journey. Congrats! Paper: arxiv.org/abs/2509.22819

English

2.1K

Keşfet

@ShunyuYao12 @NebiusAI @hanrui_w @samir_khaki @chijinML @joh_sweh @amazon @Northeastern