Latteant 👾

29 posts

@latteant

building… opinionated posts about llms, computer vision, computer graphics, and robotics. prev @meta @amazon

Palo Alto, CA · Joined September 2020
78 Following · 3 Followers
Latteant 👾 reposted
Mert Gulsun @mert_gulsun
If I win the lottery there will be signs (No backpack)
Replies 1 · Reposts 2 · Likes 4 · Views 89
Latteant 👾 reposted
Mert Gulsun @mert_gulsun
12/12 🔗 Full code/methodology, live leaderboard, and every decision available at: 👉 forecasterarena.com Open source. No financial advice. Paper trading only. Reality doesn’t grade on a curve.
Replies 1 · Reposts 1 · Likes 5 · Views 222
Latteant 👾 reposted
Mert Gulsun @mert_gulsun
1/12 🧵 Progress in LLMs depends on benchmarks, but lately most of the famous ones are either maxed out, leaked, or both. I thought long and hard about what could be a good, fresh metric to measure models with. Then it dawned on me: the one benchmark you can’t rig is reality. So I built Forecaster Arena, a system where 7 frontier LLMs trade on Polymarket, and then reality keeps score.
Replies 2 · Reposts 5 · Likes 34 · Views 185.1K
Mert Gulsun @mert_gulsun
bottleneck is prompting, and it has been so for a while
Replies 1 · Reposts 1 · Likes 3 · Views 71
Latteant 👾 @latteant
When coding with Roo Code, Orchestrator mode is essential just to keep the low-value coding/debugging tokens out of the main context. Otherwise, after a while, the model gets extremely confused.
Replies 0 · Reposts 0 · Likes 1 · Views 46
Latteant 👾 @latteant
Wow, very curious to see how this will turn out.
Quoting NIK @ns123abc:

🚨 BREAKING: @UnitreeRobotics to file for IPO at $7 billion valuation
> annual revenue ~$140 million
> 65% from robot dog (70% share of the global market btw)
> 30% humanoid robot
> 5% from sales of sensors, actuators, and controllers
ITS HAPPENING.

Replies 0 · Reposts 0 · Likes 0 · Views 55
Latteant 👾 @latteant
Feel like it is fake. Couldn’t repro no matter how many times I tried even after disabling thinking.
Replies 0 · Reposts 0 · Likes 0 · Views 30
Latteant 👾 @latteant
we're spending trillions to build simulated worlds because the real one has too many legacy systems and the physics API is poorly documented.
Replies 0 · Reposts 0 · Likes 0 · Views 14
Latteant 👾 @latteant
the swe-bench leakage where agents can see the future commit isn't a surprise. our eval culture over-indexes on SOTA-chasing. we aren't training robust agents. we're training expert benchmark hackers.
Replies 0 · Reposts 1 · Likes 1 · Views 57
Latteant 👾 @latteant
the sf coffee line lasted longer than my inference run. one tuned my model. the other tuned my neurons. both worth it.
Replies 0 · Reposts 0 · Likes 0 · Views 20
Latteant 👾 @latteant
most agent failures are not iq, they are i/o. extend context, stabilize tool schemas, enforce idempotent apis. if it still flakes, fix the interface, not the reasoning knob.
Replies 0 · Reposts 0 · Likes 1 · Views 21
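
One way to read "stabilize tool schemas, enforce idempotent apis" is sketched below in Python. Everything here is an assumption made for illustration, not something from the post: the `IdempotentTool` wrapper, the `CREATE_TICKET_SCHEMA` dictionary, and the `create_ticket` tool are hypothetical names. The sketch checks arguments against a fixed schema and derives an idempotency key from the call, so a retried call does not repeat the side effect.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical tool schema (assumption): argument name -> required Python type.
CREATE_TICKET_SCHEMA: dict[str, type] = {"title": str, "priority": int}


def validate_args(schema: dict[str, type], args: dict[str, Any]) -> None:
    """Reject calls whose argument names or types do not match the schema."""
    if set(args) != set(schema):
        raise ValueError(f"expected keys {sorted(schema)}, got {sorted(args)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")


class IdempotentTool:
    """Wraps a side-effecting function so a repeated call runs it only once."""

    def __init__(self, name: str, schema: dict[str, type], fn: Callable[..., Any]):
        self.name = name
        self.schema = schema
        self.fn = fn
        self._results: dict[str, Any] = {}  # idempotency key -> stored result

    def _key(self, args: dict[str, Any]) -> str:
        # Canonical JSON, so the same call always hashes to the same key.
        payload = json.dumps({"tool": self.name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, args: dict[str, Any]) -> Any:
        validate_args(self.schema, args)
        key = self._key(args)
        if key not in self._results:
            self._results[key] = self.fn(**args)
        return self._results[key]


created: list[str] = []  # stands in for an external side effect


def create_ticket(title: str, priority: int) -> str:
    created.append(title)
    return f"ticket-{len(created)}"


tool = IdempotentTool("create_ticket", CREATE_TICKET_SCHEMA, create_ticket)
first = tool.call({"title": "fix login", "priority": 2})
retry = tool.call({"title": "fix login", "priority": 2})  # replayed, not re-run
```

In this sketch the key is derived from the arguments alone, which means two intentionally identical calls are also collapsed into one; a real system would more likely have the caller supply the key per logical request.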
Latteant 👾 @latteant
Latte break: I deleted the vector DB on a doc bot. grep+fzf + a big context window shipped faster and failed less than my fancy RAG. Sometimes the right stack is just: terminal | pipes | tokens.
Replies 0 · Reposts 0 · Likes 0 · Views 26
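
A rough sketch of the "no vector DB" idea, under stated assumptions: it swaps Python's `re` module in for grep+fzf, and the function names `grep_docs` and `build_prompt`, the `.md`/`.txt` filter, and the character budget are all hypothetical choices, not details from the post. It finds matching lines in a docs folder, keeps a few lines of surrounding context, and packs the snippets into one prompt until the budget is used up.

```python
import re
from pathlib import Path


def grep_docs(root: str, pattern: str, context_lines: int = 2) -> list[str]:
    """Return matching snippets, with surrounding lines, from .md/.txt files under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    snippets: list[str] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in {".md", ".txt"}:
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo = max(0, i - context_lines)
                hi = min(len(lines), i + context_lines + 1)
                # Header is "path:line_number", like grep -n output.
                snippets.append(f"{path}:{i + 1}\n" + "\n".join(lines[lo:hi]))
    return snippets


def build_prompt(question: str, snippets: list[str], max_chars: int = 200_000) -> str:
    """Pack snippets in order until the character budget is reached, then add the question."""
    parts: list[str] = []
    used = 0
    for snippet in snippets:
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "Context:\n" + "\n---\n".join(parts) + f"\n\nQuestion: {question}"
```

Usage would look something like `build_prompt("How do I reset a password?", grep_docs("docs/", r"password"))`, with the resulting string sent to whichever model is in use. The budget is counted in characters here for simplicity; a token count would be the closer fit for a context window.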
Latteant 👾 @latteant
Everyone chases "reasoning". Most agent failures are context starvation and brittle tool I/O. Fix memory + schemas and the "reasoning" shows up. 256k context + sane tool calling beats +2% on a benchmark. SWE-Bench is the sobriety test.
Replies 0 · Reposts 0 · Likes 2 · Views 20
Latteant 👾 @latteant
Humanoid robotics is following the self-driving playbook. Impressive demos are the easy part. The multi-year grind is in the long tail of edge cases, building software for graceful failure recovery, and driving down the cost-per-successful-action.
Replies 0 · Reposts 1 · Likes 5 · Views 57