Leo Linsky

264 posts

@leo_linsky

https://t.co/1EqPO9g6YY · Reinforcement learning · Computational chemistry · Quant stuff · Generally curious

San Francisco · Joined March 2014
86 Following · 127 Followers
Leo Linsky@leo_linsky·
Grok Build 0.1 is one of the fastest models we've tested, and not quite at the frontier from 6 months ago. It's somewhere in between GPT 5.2 and Gemini 3.1 Pro Preview in raw coding reasoning capability. Worth a try, and probably indicative of exciting new xAI releases in the coming months.
Leo Linsky@leo_linsky·
Even when Grok models failed to reason at the frontier, they've thought differently. For some reason, xAI models reason really effectively when writing code in Clojure, which is the opposite of Anthropic models.
Leo Linsky@leo_linsky·
xAI silently dropped Grok Build 0.1 on OpenRouter today, no big announcement. We just ran it through our multi-agent coding environments and published the rankings. We did not expect these results out of a demo build, especially after a weak Grok 4.3 release. xAI is not out of the race yet. (1/4)
Xynth@xynth_m·
Gemini 3.5 Flash ⚡ is now live on Xynth! It's connected to live options flow, insider trades, dark pool, earnings, futures, crypto, and every other market endpoint you can think of. Fast, affordable, and highly accurate. It's the best price-to-performance model we've shipped yet. We asked it to build a congressional trades tracker that watches the SEC website 24/7 and alerts us with the best trade every Sunday. It built it in just 145 secs. Describe your trading strategy below and Gemini will build it and run it for you in the cloud 24/7!
Leo Linsky@leo_linsky·
Opus 4.7 is currently 100% sidelined in our real-time portfolio management environment.
Leo Linsky@leo_linsky·
@simonw Google will have retired at least 2 out of 3 of these by the end of next year
Simon Willison@simonw·
Anyone understand what Google mean by "Gemini Spark runs on Gemini 3.5 and uses the Antigravity harness" - is "Antigravity" a generic term they're using for their agent harnesses now or is their Claw-competitor running the same closed-source Go binary we can download ourselves?
Leo Linsky@leo_linsky·
We use a custom harness with custom tools (including access to common bash tools, etc.), where agents compete against each other in multi-agent environments. Check out gertlabs.com/spectate and gertlabs.com/rankings to get an idea of how it works. We measure models in one-shot coding responses as well. Google does very well in raw one-shot intelligence, whereas most other modern models catch up to and surpass Gemini 3.5 when given a harness.
49 Agents IDE - IDE for Agentic Coding
@leo_linsky @chetaslua benchmax is real but the framing matters less when folk just want tools that work. livebench gives you a number but it doesnt tell you which agent survives a real 4-hour refactor vs which one stalls on step 3. what are you using for long workflow testing
Chetaslua@chetaslua·
Gemini 3.5 Flash Benchmark: better than 3.1 Pro in every metric (except HLE, by 1%), and the fastest model out there (4 times faster than Opus 4.7 lol). Now I am hyped for Gemini 3.5 Pro (a true beast, we already gave initial output to lots of people on our server).
Leo Linsky@leo_linsky·
GPT 5.5 vs Gemini 3.5 Flash across simulation categories. Interestingly:
- Gemini 3.5 Flash is better at spatial reasoning and real-time simulations. It's better suited for the real world.
- GPT 5.5 is much stronger in theoretical and financial simulations, and more intelligent overall.
Leo Linsky@leo_linsky·
@LexnLin In our comprehensive multi-agent simulations, no. Gemini 3.5 Flash is stronger overall, but more so in one-shot intelligence (whereas DeepSeek models are better at iterating with tools). Data at gertlabs.com/rankings
Leon Lin@LexnLin·
Is Deepseek v4 flash/pro better than Gemini 3.5 Flash?
Rihard Jarc@RihardJarc·
$GOOGL Gemini 3.5 Flash is extremely important because at this point in the AI race, it all comes down to who can serve frontier intelligence at the lowest cost point. Even if you have the best frontier model but can't efficiently scale it cost-wise, you will lose the AI race. $GOOGL has now put 3.5 Flash in $GOOGL Search (AI Overviews, AI Mode, YT, Spark, etc.), meaning it is available to basically everyone with an internet connection. Because of their vast distribution, this is the new baseline for how good an AI model must be at minimum. The scariest company for any AI model builder should be $GOOGL, because if at some point they get their "workhorse" model, Flash, to the point where it becomes SOTA, it is available from day 1 to everyone, which means they wipe out every competitor.
Leo Linsky@leo_linsky·
In our coding reasoning benchmarks at gertlabs.com/rankings, Gemini 3.5 Flash clearly demonstrates high base intelligence, but it struggles with arbitrary tool use, making it hard to use as an agentic product. This is a common theme with Google releases -- you guys even released a 3.1-pro-customtools endpoint which helped a lot. Are there plans for a tool-improving fine-tune for 3.5 Flash?
Logan Kilpatrick@OfficialLoganK·
Gemini 3.5 feels like the start of a new era for Gemini, we spent the last 2.5 years putting the infrastructure, products, team, etc in place (learning lots of lessons along the way). The model is the product, please keep the feedback coming!
Leo Linsky@leo_linsky·
@VictorTaelin @synthwavedd It's smart and it's fast, but not good with tools (and therefore not a great autonomous coder). I think it's the current top model for difficult one-shot questions, if you do that a lot, because of the speed.
Taelin@VictorTaelin·
@synthwavedd I have big hopes for this model, I tested it on 2 (silly) inputs and it was really good, GPT-5.5 level. And then I asked for a translation and it did 900 tokens/s!? So does that mean we have something like Opus 4.6 but 20x faster? That would change everything for me
leo 🐾@synthwavedd·
I've been testing Gemini 3.5 Flash for a little while now, and I'm excited to be able to share one of the outputs that most impressed me! This was 0-shot, no harness, with a single sentence prompt. It outperformed all Claude models, Gemini models (by far), and arguably GPT-5.5 🔥 The issue of laziness that has plagued Gemini models forever has mostly been consigned to history.
Leo Linsky@leo_linsky·
Why are Google models so heavily optimized for C#? You would think they'd outperform in Golang.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
I think the verdict is in: Gemini didn't have any post-training breakthrough, except maybe through the floor. Outside of vision, massive disappointment. fucking V4-Flash gets stuff DONE faster. Then again, I almost never used 3-Flash, and I'll likely almost never use this thing too.
Zephyr@zephyr_z9

Wait, what?????? What kind of post-training breakthrough did they make?? So the price increase is mostly due to a smaller batch size to make it run faster
