Chetaslua

10.3K posts

@chetaslua

AI Insider / Reporter featured in BGR • HackerNews • GIGAZINE • 36Kr | AI Prompting and Testing | Vibe Benchmark and Vibe Marketing 🫆

Joined December 2024
149 Following · 23.1K Followers
Chetaslua
Chetaslua@chetaslua·
@Gc_qube Yeah but it's cheap and they used 3 flash as main system
0
0
1
141
pozitiv4ik
pozitiv4ik@Gc_qube·
@chetaslua This is a human + AI as an assistant. This differs from the results of GPT 5.5 Pro, which solved the problems itself
1
0
1
327
Chetaslua
Chetaslua@chetaslua·
Google DeepMind's AI agent autonomously solved 9 of 353 open Erdős problems using last-gen SOTA. Think of new-gen models like 3.5 Flash + 3.5 Pro: we can get better results in a month
> The basic proving loop uses Gemini 3.1 Pro for multi-turn proof generation
> The full system uses Gemini 3.0 Flash for high-throughput rating/evaluation
> Gemini 3.1 Pro for harder prover work
(frontier reasoning model + cheaper fast model + formal verifier + AlphaProof)
Mark Kretschmann@mark_k

Google DeepMind just dropped one of the clearest signals yet for where math is going. The @GoogleDeepMind AlphaProof Nexus agent autonomously resolved 9 of 353 open Erdős problems, with the proofs checked in Lean. The reported inference cost: a few hundred dollars per problem. That is wild. Not because 9/353 means math is "solved". It clearly doesn’t. Most problems still resisted, and the full search cost is more complicated. But the direction is obvious: AI agents are moving from contest math into real research-level mathematics. And formal verification turns the output from "sounds plausible" into "actually compiles". generate, test, verify, repeat.

9
13
158
15.7K
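
The "generate, test, verify, repeat" loop from the quoted post can be sketched roughly as below. This is a toy illustration, not the actual AlphaProof Nexus code: `prove_with_verifier`, `toy_generate`, and `toy_verify` are hypothetical names, and the "verifier" here just checks a divisor instead of compiling a Lean proof.

```python
from typing import Callable, List, Optional


def prove_with_verifier(
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, str], bool],
    problem: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Ask the generator for candidates until one passes the verifier.

    Failed candidates are passed back to the generator so it can avoid
    repeating them (a stand-in for multi-turn feedback).
    """
    failed: List[str] = []
    for _ in range(max_attempts):
        candidate = generate(problem, failed)
        if verify(problem, candidate):
            return candidate  # only verified output is ever returned
        failed.append(candidate)
    return None  # budget exhausted, problem stays open


# Toy stand-ins (assumptions for illustration only):
# the "problem" is an integer, a "proof" is a nontrivial divisor of it.
def toy_generate(problem: str, failed: List[str]) -> str:
    n = int(problem)
    tried = {int(f) for f in failed}
    for d in range(2, n):
        if d not in tried:
            return str(d)
    return "1"


def toy_verify(problem: str, candidate: str) -> bool:
    n, d = int(problem), int(candidate)
    return 1 < d < n and n % d == 0


result = prove_with_verifier(toy_generate, toy_verify, "91", max_attempts=10)
unsolved = prove_with_verifier(toy_generate, toy_verify, "97", max_attempts=10)
print(result, unsolved)
```

The point of the design is that the generator can be wrong as often as it likes: nothing leaves the loop unless the checker accepts it, and a problem that exhausts its budget simply comes back as `None`.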
marfin
marfin@marfinxx·
@chetaslua It looks like AI will actually start winning Nobel Prizes in a year
1
0
0
204
Chetaslua
Chetaslua@chetaslua·
😯 Kimi K2.6 + Cerebras with Grok-like 4 parallel reasoning just made this paper physics website in one shot @Kimi_Moonshot you guys cooked , guys i have 5-6 more demos to share with this workflow , should i also share the orchestrator ?
Chetaslua@chetaslua

HOLLLLY SHIIIIIT 😳 GPT 5.5 xhigh in codex made this paper physics website with wind effect one shot check the physics , ui and interaction @sama you guys cooked and this is my first time i can suggest chatgpt is go to ai solution for everything

8
23
344
50K
Maksim
Maksim@MaksimXBT·
@chetaslua 48 seconds sounds fast till you need to maintain it over 1000 tests
1
0
0
235
Chetaslua
Chetaslua@chetaslua·
Holy shit, Kimi K2.5 cooked the voxel cube test 🤯 Best open-source coding model for me rn. I asked for one thing: a Rubik's cube
> It gave me a better version than Gemini 3.5
> At this point I will choose Kimi over Gemini, never thought I would say this
Chetaslua@chetaslua

Holy shiitt Gemini 3.2 is insanely fast and intelligent 🤯 It passed the voxel cube test with more than flying colours and gave a much better output, including things I did not even ask for. 1700 lines of code in 48 seconds and it solved it, proof attached so that everyone can see. @demishassabis thanks for hearing us and giving us the first non-lazy Google model 🥹

13
10
136
14.6K
Chetaslua
Chetaslua@chetaslua·
@notjazii They can. The upcoming GPT 5.6 is very good and will mog Opus 4.8
2
0
7
1.9K
Chetaslua
Chetaslua@chetaslua·
@Ubannoblesse deepseek is best for normal to do type apps , good for all app store wrappers
0
0
6
848
Chetaslua
Chetaslua@chetaslua·
Gemini 3.5 Flash vs Gemini 3.1 Pro. You have seen a lot of tweets from the Google team saying 3.5 Flash is better and faster than 3.1 Pro. Reality: it is 4 times more costly on the same task because it consumes more tokens, and it is super dumb and jagged. See, it can't solve kids' math
Chetaslua@chetaslua

Gemini 3.5 Flash vs GPT-5.5 instant vs Sonnet 4.6 Remember guys. #1 in Finance Agent v2. SOTA performance right here. lol 🤣 Prompt : " 300+140=460 Is this correct? Breakdown? "

45
21
344
58.5K
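
For reference, the claim in the prompt is false: 300 + 140 is 440, not 460. A minimal sketch of checking such a claim mechanically, where `check_addition_claim` is a hypothetical helper written for this example and only handles the `a+b=c` shape with non-negative integers:

```python
import re
from typing import Tuple


def check_addition_claim(claim: str) -> Tuple[bool, int]:
    """Return (is_correct, actual_sum) for a claim shaped like '300+140=460'."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*", claim)
    if match is None:
        raise ValueError(f"unsupported claim format: {claim!r}")
    a, b, claimed = (int(g) for g in match.groups())
    actual = a + b
    return actual == claimed, actual


is_correct, actual = check_addition_claim("300+140=460")
print(is_correct, actual)  # False 440
```

A model answering the prompt correctly should say the equation is wrong and give 440 as the sum.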
Nick
Nick@nick_kango·
I was curious if this was true so created my own eval on a bunch of sota models (took <2min with Kaggle btw) Turns out only Opus 4.7 got it wrong. Every other model — Flash 3.5, GPT 5.5, GLM-5, and even qwen3 — got it right. Models are truly jagged & spikey
Chetaslua@chetaslua

Gemini 3.5 Flash vs GPT-5.5 instant vs Sonnet 4.6 Remember guys. #1 in Finance Agent v2. SOTA performance right here. lol 🤣 Prompt : " 300+140=460 Is this correct? Breakdown? "

2
2
17
2.7K