路傍の肉@AI
171 posts


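@Tazerface16 I still think AGI is hard to reach with LLMs alone.
If invention is what lies at the far end of rationality, then fine, call it AGI. But when it takes a shift in perspective, or when humanity faces a new problem for the first time, I think there are several walls that today's LLMs cannot get over.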

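@Diandianhsutw Even without a world model, LLMs alone can have a big impact on society, right?
But just brute-forcing through with inference and probability can't handle problems like ARC-AGI-3.
To understand and act on the "rules of the world" in an organization, a job, or playing with a child, I can't help thinking you need to build in world-model learning.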

@robouniku Rather than waiting for a bigger world model or a smarter human-like intelligence to appear, it may make more sense to build the right workflow now and create the right sandboxed environment for the model we already have.

What interests me about ARC-AGI-3 is not just the human–AI score gap, but the kind of intelligence gap it reveals.
This benchmark is not only testing what a model already knows. It is testing whether an agent can learn rules, build an internal model, track changing state, use resources under constraints, and plan the next move.
That is why I don’t think brute-force scaling alone is the real answer. If humans solve these tasks by first understanding the rule structure, then an important direction may be to formalize that reasoning process for agents instead of only throwing more computation at the problem.
The benchmark is hard, but the harder and more interesting question is: what kind of architecture would actually close this gap?
ARC Prize@arcprize
Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

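@Diandianhsutw I read this through a translation, but I think I mostly understood what you mean. Did I?
To beat ARC-AGI-3, today's LLM architecture needs a loop of repeated learning and retrying, right?
I agree on that.
For an agent to clear ARC-AGI-3 on its own, I think what is needed is a mechanism inside the LLM that repeatedly learns a world model and retries.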

At the level of a single raw language model, I partly agree with you: it often behaves like a prediction machine.
But that does not mean its performance cannot be improved through structure and engineering. I think you may be underestimating the upper bound of the system once the model is placed inside a proper agent framework.
ARC-AGI-3 is not only testing the raw performance of a single model. It is also testing what engineers build around it: memory, exploration, decision structure, and workflow. In that sense, the engineer is a bit like a Pokémon trainer adding the right skills and support system around the model.
So things like scanning the map, tracking resources, making decisions, and using a simple memory system are absolutely achievable from an engineering perspective. The key question is not whether a bare LLM can do everything alone, but what kind of structure lets the agent learn and act efficiently inside the environment.

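@Diandianhsutw To add to that, I don't think LLMs alone will set off a chain of major inventions.
LLM inference is probability-based, so I don't think it can make jumps in thinking or leaps of imagination.
Also, having an LLM learn the physical world without a world model looks to me like trying to make a fish climb a tree.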

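@Diandianhsutw Some people have already commented on this, but this absolutely requires a world model.
Can a world model even be built into an LLM? It's like the gap between how well it copes with the physical world and with the PC world.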

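@kanon_sfire 30 million is gone in a few years, and you really can't do anything with it.
It's seed money for aiming at 100 million, isn't it?
Time with family is worth many times more.
I think you'll understand once you have kids.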

The biggest difference between OAI’s flagship model and Anthropic’s is agency. Opus just GOES. It thinks in terms of action and executes without asking. 5.4 is all talk, no action. It will think up an amazing plan and write it all down, but is super lazy about wanting to execute it.
If Spud is going to rival Mythos, it’s going to have to rival Opus first. They haven’t done that yet, period.

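@maxintechnology @robertotcestari @haider1 I'd like to read the leaked emails. I don't know much about this.
If a truly Mythos-class Spud were released, wouldn't the world be thrown into turmoil?
And if it isn't, would that mean Spud doesn't measure up to Mythos?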

@robertotcestari @haider1 Read the leaked emails. Somebody with integrity doesn't operate like that. Someone who truly wants to generate positive impact doesn't pretend to be non-profit to raise funds, saying he doesn't care about money, and then buy supercars.

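If current LLMs do not incorporate a world model, I don't think AGI will succeed.
AI will generate enormous profits and cause large-scale unemployment, but a singularity in which AI builds AI and once-in-a-century discoveries chain together will not happen.
Sick@sickdotdev
AI succeeds → massive job losses
AI fails → economy crashes
AI is just hype → we wasted a decade
which path are we actually on?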

@planert41 I hear you. It's almost exactly the same in Japan.
I guess dads all over the world are pretty much the same.
Today we're day camping by the river. No barbecue, though.

Dads with young kids
Is it normal to just feel tired and burnt out all the time?
It’s like you don’t even really get the weekend to recover because it’s all just kid stuff the moment you wake up
Arguably the only personal time you have is when they nap (and you’re already dead tired by that time) or the hour after they go to bed but before you pass out
Just trying to figure out if I’m doing something wrong
