Thinh

@thinhphp_vt · 26 posts

PhD student @VT_CS, supervised by @tuvllms. Interested in search-augmented LLMs. Ex AI resident @VinAI_Research

Blacksburg, VA · Joined July 2023
598 Following · 94 Followers
Thinh @thinhphp_vt
🔥Our paper "SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models" has been accepted to #ICLR 2026!! 🎉🎉🎉 Huge thanks to my supervisor @tuvllms and the other co-authors for all your hard work! See you in Brazil ✈️
3 replies · 5 reposts · 33 likes · 1.8K views
Thinh retweeted

alex zhang @a1zhang
Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs). It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs! Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025! arxiv.org/pdf/2512.24601
253 replies · 1.1K reposts · 7.4K likes · 2M views
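The core RLM idea in the tweet — a model treating its own prompt as an object it manipulates by writing code that invokes LLMs — can be sketched as a toy recursion. This is my own minimal illustration, not the paper's method; `call_llm` is a hypothetical stand-in for a real LLM API call.

```python
# Toy sketch of the Recursive Language Model (RLM) idea: a huge prompt is
# treated as an external object, split into pieces, and each piece is
# handled by a sub-call to an LLM; partial results are then combined.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    # For illustration we "summarize" by keeping the first 40 characters.
    return prompt[:40]

def recursive_answer(prompt: str, max_chunk: int = 100) -> str:
    # Base case: the prompt is small enough to process directly.
    if len(prompt) <= max_chunk:
        return call_llm(prompt)
    # Recursive case: split the prompt, process each half with a
    # sub-call, then combine the partial results with a final call.
    mid = len(prompt) // 2
    left = recursive_answer(prompt[:mid], max_chunk)
    right = recursive_answer(prompt[mid:], max_chunk)
    return call_llm(left + " " + right)
```

The point of the sketch is only the control flow: the outer "model" never has to fit the whole prompt into one context window, because every `call_llm` invocation sees at most a bounded slice.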
Thinh retweeted

Kimi.ai @Kimi_Moonshot
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling — scaling both thinking tokens and tool-calling turns.
K2 Thinking is now live on kimi.com in chat mode, with full agentic mode coming soon. It is also accessible via API.
🔌 API is live: platform.moonshot.ai
🔗 Tech blog: moonshotai.github.io/Kimi-K2/thinki…
🔗 Weights & code: huggingface.co/moonshotai
581 replies · 1.5K reposts · 9.7K likes · 4.8M views
Thinh retweeted

Sentient @SentientAGI
Announcing ROMA (Recursive Open Meta Agent): our new multi-agent framework that sets SOTA in reasoning + search.
Seal-0: 45.6%
FRAMES: 81.7%
SimpleQA: 93.9%
🧵 Read more about how recursive coordination lets agents tackle complex queries.
764 replies · 598 reposts · 2.4K likes · 505K views
Thinh retweeted

Rohan Paul @rohanpaul_ai
OpenAI released a new paper: "Why language models hallucinate".
Short answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty. The paper puts this on a statistical footing with simple, test-like incentives that reward confident wrong answers over honest "I don't know" responses.
The fix is to grade differently: give credit for appropriate uncertainty and penalize confident errors more than abstentions, so models stop being optimized for blind guessing.
OpenAI shows that 52% abstention gives substantially fewer wrong answers than 1% abstention, demonstrating that letting a model admit uncertainty reduces hallucinations even if accuracy looks lower.
Abstention means the model refuses to answer when it is unsure and simply says something like "I don't know" instead of making up a guess.
Hallucinations drop because most wrong answers come from bad guesses. If the model abstains instead of guessing, it produces fewer false answers.
🧵 Read on 👇
96 replies · 327 reposts · 2.4K likes · 371.5K views
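The grading change the tweet describes — credit for abstaining, a penalty for confident wrong answers — is easy to see with toy numbers. This sketch uses my own illustrative scoring weights and made-up model behavior (two hypothetical models over 100 questions, each correct on 40 of the questions it answers), not the paper's exact metric.

```python
# Minimal sketch of abstention-aware grading: +1 for a correct answer,
# 0 for abstaining ("I don't know"), -1 for a confident wrong guess.
# Under this rubric, blind guessing no longer dominates honest abstention.

def grade(answers):
    # `answers` is a list of (correct, abstained) pairs per question.
    score = 0.0
    for correct, abstained in answers:
        if abstained:
            score += 0.0          # abstention: no credit, no penalty
        elif correct:
            score += 1.0          # correct answer: full credit
        else:
            score -= 1.0          # confident wrong answer: penalized
    return score

# Hypothetical models over 100 questions, both correct on 40:
# - guesser: abstains on 1 question, guesses wrong on 59
# - cautious: abstains on 52 questions, wrong on only 8
guesser  = [(True, False)] * 40 + [(False, True)] * 1  + [(False, False)] * 59
cautious = [(True, False)] * 40 + [(False, True)] * 52 + [(False, False)] * 8

print(grade(guesser))   # -19.0: heavily penalized for 59 wrong guesses
print(grade(cautious))  #  32.0: abstaining avoids most of the penalty
```

Note the cautious model wins despite a lower raw accuracy (40/100 answered correctly either way): the rubric, not the model, is what changes the incentive.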
Thinh retweeted

Ken Liu @kenziyuliu
New paper! We explore a radical paradigm for AI evals: assessing LLMs on *unsolved* questions. Instead of contrived exams where progress ≠ value, we eval LLMs on organic, unsolved problems via reference-free LLM validation & community verification. LLMs solved ~10/500 so far:
15 replies · 76 reposts · 370 likes · 67.1K views
Thinh retweeted

basvanopheusden @basvanopheusden
A few weeks ago, I started a new job at @OpenAI. I wrote a document about my interview process and recommendations for anyone on the job market for AI research positions. I hope it's helpful! docs.google.com/document/d/1ZV…
64 replies · 343 reposts · 4.1K likes · 335.7K views
Thinh retweeted

Sheryl Hsu @SherylHsu02
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻
198 replies · 288 reposts · 2.7K likes · 2.5M views
Thinh retweeted

Intelligent Internet @ii_posts
Most search models need the cloud. II-Search-4B doesn't: a 4B model tuned for reasoning with search tools, built for local use, matching the performance of models 10× its size. Search that is small, smart, and open.
21 replies · 106 reposts · 647 likes · 499.9K views
Thinh retweeted

Peter H. Diamandis, MD @PeterDiamandis
.@EMostaque came back on the show to chat about:
-- how we can't compete against AI agents
-- his solution for a POSITIVE AI world
-- why UBI won't work but UBAI might
-- we need to be focused on incentivizing the right outcomes
-- nations need sovereign AI stacks or they'll be left behind by the mega-models
24 replies · 63 reposts · 244 likes · 27.5K views
Thinh retweeted

Jasper Dekoninck @j_dekoninck
We just released the evaluation of LLMs on the 2025 IMO on MathArena! Gemini scores best, but is still unlikely to achieve the bronze medal with its 31% score (13/42). 🧵(1/4)
13 replies · 39 reposts · 218 likes · 37.3K views
Thinh retweeted

Sukjun (June) Hwang @sukjun_hwang
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
95 replies · 737 reposts · 4.7K likes · 786.6K views
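The dynamic-chunking idea in the H-Net tweet — replacing a fixed tokenizer with boundaries decided inside the model — can be illustrated with a toy splitter. This is not the actual H-Net: `boundary_score` here is a hypothetical placeholder heuristic standing in for a learned boundary predictor.

```python
# Toy sketch of dynamic chunking: instead of a fixed tokenizer, a scorer
# decides where byte-level chunk boundaries go, so the model operates
# over data-dependent units rather than a static vocabulary.

def boundary_score(prev: int, cur: int) -> float:
    # Placeholder heuristic: put a boundary right after whitespace bytes.
    # In H-Net this score would come from a learned network.
    return 1.0 if chr(prev).isspace() else 0.0

def dynamic_chunks(data: bytes, threshold: float = 0.5) -> list[bytes]:
    chunks, start = [], 0
    for i in range(1, len(data)):
        # Cut the stream wherever the scorer exceeds the threshold.
        if boundary_score(data[i - 1], data[i]) > threshold:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])  # final chunk up to end of stream
    return chunks

print(dynamic_chunks(b"hello world foo"))
# With the whitespace heuristic: [b'hello ', b'world ', b'foo']
```

With a learned scorer in place of the heuristic, the same loop discovers units that depend on the data itself, which is the end-to-end property the tweet highlights.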