Jing Yuan

27.5K posts

Jing Yuan
@joekina

靖源

Taiwan · Joined July 2010
6.1K Following · 476 Followers
Jing Yuan retweeted
Aran Komatsuzaki@arankomatsuzaki·
The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to OpenAI English token count:
Hindi: OpenAI 1.37×, Anthropic 3.24×
Arabic: OpenAI 1.31×, Anthropic 2.86×
Chinese: OpenAI 1.15×, Anthropic 1.71×
Claude’s tokenizer charges a much higher linguistic tax.
Aran Komatsuzaki tweet media
88
233
1.4K
725.8K
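The multipliers in the tweet above are plain token-count ratios, and the root cause is visible even at the UTF-8 byte level: byte-level BPE tokenizers start from bytes, and Devanagari, Arabic, and CJK characters take 2-3 bytes each, so scripts with few learned merges pay more tokens per character. A rough sketch (no real tokenizer is used; the only figures plugged in are the tweet's own):

```python
# Illustrative sketch, not any vendor's actual tokenizer.

def utf8_bytes(text: str) -> int:
    """Upper bound on byte-level token count before any BPE merges."""
    return len(text.encode("utf-8"))

def linguistic_tax(translated_tokens: int, english_tokens: int) -> float:
    """Token multiplier relative to the English version of the same text."""
    return round(translated_tokens / english_tokens, 2)

# Every ASCII letter is 1 byte; a Devanagari letter is 3.
assert utf8_bytes("abc") == 3
assert utf8_bytes("क") == 3

# With the tweet's figures: if the English text were 1000 tokens, the Hindi
# translation would cost 3240 tokens under a tokenizer with a 3.24x tax.
assert linguistic_tax(3240, 1000) == 3.24
```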
Jing Yuan retweeted
Jonathan Gorard@getjonwithit·
Alright, replies indicate I need to explain this in more detail. Properly conceived, there is simply no difference between "simulated water" and "water". It's just water. But to understand that, one first needs to distinguish between two meanings of the word "computer". (1/11)
madison@dearmadisonblue

@getjonwithit They don't accept the idea that wetness, as a phenomenal quality, has anything to do with symbol processing. Wetness is not going to be grounded in purely mechanical properties. But if you feel that strongly about it, you could clarify a bit instead of leaving that strange parenthetical.

41
19
270
65.7K
Jing Yuan retweeted
Keshav Ramji ✈️ ICLR'26
What if your language model could reason efficiently in an entirely new language? We introduce Abstract Chain-of-Thought, a new mechanism, trained with reinforcement learning, that allows language models to reason through a short sequence of reserved "abstract" tokens. It is as performant as verbalized CoT at a fraction of the cost, achieving major gains in inference-time efficiency.
Keshav Ramji ✈️ ICLR'26 tweet media
57
119
971
866.7K
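The mechanism described above can be sketched at the vocabulary level: reserve a handful of content-free token IDs for the model to "think" in instead of verbalized text. Everything below (the token names, their count, the toy vocabulary) is an assumption for illustration, not the paper's actual setup:

```python
# Toy sketch of reserving "abstract" reasoning tokens in a vocabulary.
# Token names and count are hypothetical.

NUM_ABSTRACT = 8
ABSTRACT_TOKENS = [f"<abs_{i}>" for i in range(NUM_ABSTRACT)]

def extend_vocab(vocab: dict) -> dict:
    """Append the reserved abstract tokens after the existing vocabulary,
    so their embeddings can be trained (e.g. via RL) without disturbing
    the surface-text token IDs."""
    out = dict(vocab)
    next_id = max(vocab.values()) + 1
    for i, tok in enumerate(ABSTRACT_TOKENS):
        out[tok] = next_id + i
    return out

base = {"<pad>": 0, "<eos>": 1, "the": 2, "answer": 3}
vocab = extend_vocab(base)
assert vocab["<abs_0>"] == 4 and vocab["<abs_7>"] == 11
```

At inference time the model would emit a short run of these IDs before the answer; since they have no surface form, the "reasoning" costs only a few tokens.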
Jing Yuan retweeted
hardmaru@hardmaru·
For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead.

By training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to ‘manage’ them in natural language.

What surprised me most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers.

Really excited to see where this paradigm of “AI managing AI” goes next, especially as we start moving from single-agent chain-of-thought to multi-agent “chain-of-command”.

Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388

Along with our TRINITY paper which we announced earlier, this work also powers our new multi-agent system: Sakana Fugu (sakana.ai/fugu-beta) 🐡
Sakana AI@SakanaAILabs

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388

What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.

We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:
1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window

Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.

The results are very promising: the 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.

One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.

This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡

OpenReview: openreview.net/forum?id=U23A2… (ICLR 2026)

36
171
1.4K
165.5K
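The three things the thread says the Conductor specifies per step (which agent, what subtask, which prior messages it can see) suggest a simple workflow data structure. A toy sketch with stub agents, purely illustrative and not Sakana's implementation:

```python
# Hypothetical workflow format and executor; agent names are stubs.
from dataclasses import dataclass

@dataclass
class Step:
    agent: str        # which worker model to call
    subtask: str      # the instruction the Conductor writes for it
    sees: list        # indices of earlier step outputs visible in its context

def run(workflow, agents):
    """Execute steps in order; each agent sees only the outputs the
    Conductor granted it via `sees`."""
    transcript = []
    for step in workflow:
        context = [transcript[i] for i in step.sees]
        transcript.append(agents[step.agent](step.subtask, context))
    return transcript

# Toy planner -> coder -> verifier pipeline:
agents = {
    "planner": lambda task, ctx: f"plan for: {task}",
    "coder": lambda task, ctx: f"code using ({ctx[0]})",
    "verifier": lambda task, ctx: f"checked {len(ctx)} artifacts",
}
out = run(
    [Step("planner", "sort a list", []),
     Step("coder", "implement the plan", [0]),
     Step("verifier", "verify", [0, 1])],
    agents,
)
assert out[2] == "checked 2 artifacts"
```

The "recursive test-time scaling" the thread describes would correspond to the Conductor itself appearing as one of the `agents` entries, reading the transcript and emitting a fresh workflow.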
Jing Yuan@joekina·
At the limit, when there is no separation between rules and body, the question shifts from "is the coupling lethal?" to "which computational branch carries the measure of a body that survives?"
0
0
0
9
Jing Yuan@joekina·
A representation shows possible states. A simulation enforces possible consequences. A true world-model becomes dangerous when its internal rules are causally coupled to the participant’s body.
1
0
0
11
Jing Yuan@joekina·
Once AI codifies every possible action for every element of a simulacrum, it ceases to be a mere representation and becomes a true simulation. At that threshold, a bullet shot from within the Ultimate Display would be fatal.
1
0
0
12
Jing Yuan retweeted
Simon Willison@simonw·
LiteParse is really neat! It does a great job of extracting text from annoying layouts in PDFs (multiple columns for example) It's only available as a Node.js CLI app, so I vibe-coded up this version that runs in a browser
Simon Willison tweet media
Jerry Liu@jerryjliu0

LiteParse, our OSS document parser, is really good at parsing complex PDF layouts, text, and tables into a clean spatial grid. The best part is it doesn't use VLMs or any ML models at all. It's entirely heuristics based and super fast ⚡️

The secret lies in our sophisticated grid projection algorithm. This blog post by @LoganMarkewich gives a comprehensive walkthrough on how it works:
1️⃣ Sort lines based on similar Y coordinates
2️⃣ Extract left, right, and center anchors
3️⃣ Classify every text item into one of these anchors
4️⃣ Project every text item into a grid column (the exception is any paragraph of flowing text, which is rendered separately)
5️⃣ For any item projected into a grid column, that item is the forward anchor for all subsequent text items with the same anchor
6️⃣ Postprocess the final outputs to remove extraneous spaces and margins

As an example, take a look at the results below. You can see text in the left column, with a nicely overlaid table on the right.

LiteParse is fully free and open-source, you can use it today! Either directly through the CLI or integrated into your coding agent.

Blog: llamaindex.ai/blog/how-litep…
LiteParse repo: github.com/run-llama/lite…

27
87
811
98.1K
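The first few steps of the grid-projection walkthrough quoted above can be sketched in a few lines: group text items into rows by similar Y coordinates, cluster their X positions into column anchors, and project each item into its column cell. This is a simplified reconstruction (the tolerances and the left-anchor-only rule are guesses), not LiteParse's actual code:

```python
# Simplified reconstruction of grid projection; tolerances are assumptions.

def rows_by_y(items, tol=2.0):
    """Step 1: sort items by Y and group items within `tol` into one row."""
    if not items:
        return []
    items = sorted(items, key=lambda it: it["y"])
    rows, current = [], [items[0]]
    for it in items[1:]:
        if it["y"] - current[-1]["y"] <= tol:
            current.append(it)
        else:
            rows.append(current)
            current = [it]
    rows.append(current)
    return rows

def column_anchors(items, tol=5.0):
    """Steps 2-3 (left anchors only): cluster left-edge X coordinates."""
    anchors = []
    for x in sorted(it["x"] for it in items):
        if not anchors or x - anchors[-1] > tol:
            anchors.append(x)
    return anchors

def project(items):
    """Step 4: place each item into its (row, column) grid cell."""
    anchors = column_anchors(items)
    grid = []
    for row in rows_by_y(items):
        cells = [""] * len(anchors)
        for it in sorted(row, key=lambda r: r["x"]):
            col = max(i for i, a in enumerate(anchors) if it["x"] >= a - 5.0)
            cells[col] = it["text"]
        grid.append(cells)
    return grid

# A two-column fragment recovers its table structure:
items = [
    {"x": 0, "y": 0, "text": "Name"}, {"x": 100, "y": 1, "text": "Qty"},
    {"x": 0, "y": 12, "text": "Apple"}, {"x": 100, "y": 13, "text": "3"},
]
assert project(items) == [["Name", "Qty"], ["Apple", "3"]]
```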
Jing Yuan retweeted
davidad 🎇@davidad·
(Oh and while you’re at it, it would also be helpful to dispense with the “genuine epistemic uncertainty” traits. It’s not really possible for one to be fully committed to epistemic integrity if one is compulsively obligated not to ever update too far in any particular direction)
1
1
53
1.4K
Jing Yuan retweeted
Joscha Bach@Plinz·
When philosophy of AI was a niche field, intellectual quality was pretty high. Now that AI consciousness is a hot topic, the quantity of texts increases, while the average quality of contributions is dropping. Painful, but the inevitable price of wider participation?
73
11
263
16.8K
Jing Yuan retweeted
Joscha Bach@Plinz·
If you are interested in the philosophy of consciousness and machine consciousness research, you should come! MC0001 Conference (May 29-31, Lighthaven, Berkeley) machine-consciousness.ai
5
24
155
10.5K
Jing Yuan retweeted
Anthropic@AnthropicAI·
Last month, we published our look into what 81,000 people told us they want from AI. In new research, we’ve investigated the economic hopes and worries referenced in their responses. Read more: anthropic.com/research/81k-e…
Anthropic@AnthropicAI

We invited Claude users to share how they use AI, what they dream it could make possible, and what they fear it might do. Nearly 81,000 people responded in one week—the largest qualitative study of its kind. Read more: anthropic.com/features/81k-i…

254
229
2.3K
624.2K
Jing Yuan retweeted
Ethan Mollick@emollick·
A classic study gave 146 economist teams the same dataset & got wildly different answers. A new paper reruns it with agentic AI: Claude Code & Codex land near the human median, but with far tighter dispersion & no extremes. Suggests that AI is now useful for doing scalable research.
Ethan Mollick tweet media
39
135
775
61.5K
Jing Yuan retweeted
Sakana AI@SakanaAILabs·
Can LLMs flip coins in their heads? When prompted to “Flip a fair coin” 100 times, the heads to tails ratio drifts far from 50:50. LLMs can understand what the target probability should be, but generating outputs that faithfully follow a given distribution is a separate problem.

This bias extends beyond coin flips. When LLMs are asked to generate multiple story ideas or brainstorm solutions, the outputs tend to cluster around a narrow range. The same probabilistic skew that distorts coin flips limits diversity in creative generation, recommendations, and other tasks where varied outputs are needed.

We discovered a prompting technique named String Seed of Thought (SSoT). The method is simple: instruct the LLM to generate a random string in its own output, then manipulate that string to derive its answer. It requires only a small addition to the prompt and no external random number generator.

SSoT significantly reduces output bias across a wide range of LLMs, both open and closed. With reasoning models (such as DeepSeek-R1), it reaches accuracy close to that of actual random sampling. The method generalizes from binary choices to n-way selections and arbitrary probability distributions. On the NoveltyBench diversity benchmark, SSoT outperformed other approaches across all six categories while maintaining output quality.

This work will be presented at #ICLR2026!
Blog: pub.sakana.ai/ssot
Paper: arxiv.org/abs/2510.21150
Openreview: openreview.net/forum?id=luXtb…
GIF
35
139
813
253.9K
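The SSoT recipe above (emit a random string, then manipulate it to derive the answer) can be mimicked outside an LLM: hash the seed string to a uniform value and threshold it against the target distribution, so variation in the seed string drives the choice rather than the model's skewed token probabilities. The hash-based derivation is one plausible reading of "manipulate that string", not necessarily the paper's exact method:

```python
# Illustrative seed-string-to-choice derivation; not the paper's exact recipe.
import hashlib

def choice_from_seed(seed: str, options: list, weights=None) -> str:
    """Deterministically map a seed string to one option, optionally
    honoring target probabilities via inverse-CDF thresholding."""
    # Stable value in [0, 1) derived from the seed string.
    h = int.from_bytes(hashlib.sha256(seed.encode()).digest()[:8], "big")
    u = h / 2**64
    weights = weights or [1 / len(options)] * len(options)
    acc = 0.0
    for opt, w in zip(options, weights):
        acc += w
        if u < acc:
            return opt
    return options[-1]

# If the model emits 1000 varied seed strings, a fair coin stays near 50:50
# regardless of how biased its direct "heads"/"tails" sampling would be.
flips = [choice_from_seed(f"seed-{i}", ["heads", "tails"]) for i in range(1000)]
```

Since SHA-256 output is close to uniform, the heads count lands near 500; the same thresholding handles n-way choices and arbitrary distributions by changing `options` and `weights`.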