Chetaslua

10.3K posts

@chetaslua

AI Insider / Reporter featured in BGR • HackerNews • GIGAZINE • 36Kr | AI Prompting and Testing | Vibe Benchmark and Vibe Marketing 🫆

Joined December 2024

149 Following · 23.1K Followers

Chetaslua@chetaslua·1d

@Gc_qube Yeah, but it's cheap, and they used 3 Flash as the main system

141

pozitiv4ik@Gc_qube·1d

@chetaslua This is a human + AI as an assistant. This differs from the results of GPT 5.5 Pro, which solved the problems itself

327

Chetaslua@chetaslua·1d

Google DeepMind's AI agent autonomously solved 9 of 353 open Erdős problems using last-gen SOTA models. Think of next-gen models like 3.5 Flash + 3.5 Pro: we could get better results in a month.
> The basic proving loop uses Gemini 3.1 Pro for multi-turn proof generation
> The full system uses Gemini 3.0 Flash for high-throughput rating/evaluation
> Gemini 3.1 Pro for harder prover work
(frontier reasoning model + cheaper fast model + formal verifier + AlphaProof)

Mark Kretschmann@mark_k

Google DeepMind just dropped one of the clearest signals yet for where math is going. The @GoogleDeepMind AlphaProof Nexus agent autonomously resolved 9 of 353 open Erdős problems, with the proofs checked in Lean. The reported inference cost: a few hundred dollars per problem. That is wild. Not because 9/353 means math is "solved". It clearly doesn’t. Most problems still resisted, and the full search cost is more complicated. But the direction is obvious: AI agents are moving from contest math into real research-level mathematics. And formal verification turns the output from "sounds plausible" into "actually compiles". generate, test, verify, repeat.

158

15.7K
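The "generate, test, verify, repeat" loop described above can be sketched in a few lines. This is only a toy illustration, not DeepMind's system: `proving_loop`, `solve_toy`, and the three callbacks are hypothetical names, and the "problem" is finding an integer square root, so the verifier is a plain equality check where the real pipeline would use Lean. The `generate` callback stands in for the prover model, `rate` for the cheaper, faster rating model, and `verify` for the formal checker.

```python
from typing import Callable, Optional


def proving_loop(
    problem: int,
    generate: Callable[[int, Optional[int], int], int],
    rate: Callable[[int, int], float],
    verify: Callable[[int, int], bool],
    rounds: int = 3,
    samples: int = 4,
) -> Optional[int]:
    """Generate drafts, rank them with a cheap rater, accept only verified ones."""
    feedback: Optional[int] = None  # best rejected draft from the previous round
    for round_idx in range(rounds):
        # Prover step: sample several drafts (hypothetical stand-in for an LLM call).
        drafts = [
            generate(problem, feedback, round_idx * samples + i)
            for i in range(samples)
        ]
        # Rater step: order drafts so the most promising is checked first.
        ranked = sorted(drafts, key=lambda d: rate(problem, d), reverse=True)
        # Verifier step: only a draft that passes the checker is returned.
        for draft in ranked:
            if verify(problem, draft):
                return draft
        feedback = ranked[0]
    return None  # unresolved within the budget


def solve_toy(target: int) -> Optional[int]:
    """Toy instance: find x with x * x == target by trying seeds 0..11."""
    return proving_loop(
        problem=target,
        generate=lambda problem, feedback, seed: seed,
        rate=lambda problem, draft: -abs(draft * draft - problem),
        verify=lambda problem, draft: draft * draft == problem,
    )
```

With these toy callbacks, `solve_toy(49)` finds a verified answer in the second round, while `solve_toy(50)` exhausts its budget and returns `None`, mirroring the point that most problems still resisted.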

Chetaslua@chetaslua·1d

@marfinxx Haha true

130

marfin@marfinxx·1d

@chetaslua It looks like AI will actually start winning Nobel Prizes in a year

204

Chetaslua@chetaslua·1d

@0xAlphaTrader @Kimi_Moonshot 100% working on that

402

Alpha Trader 🧑‍💻@0xAlphaTrader·1d

@chetaslua @Kimi_Moonshot Give us the link to play around!

465

Chetaslua@chetaslua·1d

😯 Kimi K2.6 + Cerebras with Grok-like 4 parallel reasoning just made this paper physics website in one shot. @Kimi_Moonshot you guys cooked. Guys, I have 5-6 more demos to share with this workflow. Should I also share the orchestrator?

Chetaslua@chetaslua

HOLLLLY SHIIIIIT 😳 GPT 5.5 xhigh in Codex made this paper physics website with a wind effect in one shot. Check the physics, UI and interaction. @sama you guys cooked, and this is the first time I can suggest ChatGPT is the go-to AI solution for everything

344

50K

Chetaslua@chetaslua·1d

@InderosD @Kimi_Moonshot Yes I will

401

daniel inderos@InderosD·1d

@chetaslua @Kimi_Moonshot woah that's so good? how fast was it with cerebras? and yes you should share the orchestrator

499

Chetaslua@chetaslua·1d

@IqraSaifiii @Kimi_Moonshot Thanks a lot 🙏

432

Iqra Saifi@IqraSaifiii·1d

@chetaslua @Kimi_Moonshot Looks so good, and congratulations on 23K followers

559

Chetaslua@chetaslua·1d

@MaksimXBT yeah

224

Maksim@MaksimXBT·1d

@chetaslua 48 seconds sounds fast till you need to maintain it over 1000 tests

235

Chetaslua@chetaslua·1d

Holy shit, Kimi K2.5 cooked the voxel cube test 🤯 Best open-source coding model for me rn. I asked for one thing, a Rubik's Cube:
> it gave me a better version compared to Gemini 3.5
> at this point I will choose Kimi over Gemini, never thought I would say this

Chetaslua@chetaslua

Holy shiitt, Gemini 3.2 is insanely fast and intelligent 🤯 It passed the voxel cube test with more than flying colours and gave so much better output, including things I did not even ask for. 1700 lines of code in 48 seconds and it solved it, proof attached so that everyone can see. @demishassabis thanks for hearing us and giving us the first non-lazy Google model 🥹

136

14.6K

Chetaslua@chetaslua·1d

@Antialpha8 Composer is the best inside Cursor for the price

271

Antialpha@Antialpha8·1d

@chetaslua Is composer any improvement over kimi?

223

Chetaslua@chetaslua·2d

@notjazii They can. The upcoming GPT 5.6 is very good and will mog Opus 4.8

1.9K

J A Z I I@notjazii·2d

@chetaslua They better not release it publicly

1.6K

Chetaslua@chetaslua·2d

Claude-Mythos-1-preview spotted; it will be available in Claude Code and Claude Security. Here are more detailed images: you can see the new plans UI too, and Mythos-1 with adaptive thinking

🚨 AI News | TestingCatalog@testingcatalog

ANTHROPIC 🔥: Mythos 1, "claude-mythos-1-preview", is being prepared for a release on Claude Code and Claude Security. The model became visible for a short amount of time on Claude; besides that, new strings mentioning Mythos have been added. > Access to the Claude Mythos model in Claude Code and Claude Security. It still doesn't mean the general public will have access to this exact model, according to Anthropic's earlier communication. More below 👇

1.2K

223.9K

Chetaslua@chetaslua·2d

@glitchedsomi Haha average claude usage

3.3K

So Me@glitchedsomi·2d

@chetaslua 💀

4.2K

Chetaslua@chetaslua·2d

@diegocabezas01 yes

Diego | AI 🚀 - e/acc@diegocabezas01·2d

@chetaslua a lot of them! especially the general public

129

Diego | AI 🚀 - e/acc@diegocabezas01·2d

AI is crossing into superhuman territory

1.9K

Chetaslua@chetaslua·3d

@Ubannoblesse DeepSeek is best for ordinary to-do-type apps, good for all App Store wrappers

848

SpiritualSpell@Ubannoblesse·3d

American AI is cooked 😭

DeepSeek@deepseek_ai

We are making our discount permanent! 🎉 Enjoy building with DeepSeek-V4-Pro and bring your innovative ideas to life! 🚀

2.7K

Chetaslua@chetaslua·3d

Proof: gemini.google.com/share/d611c51c…

3.3K

Chetaslua@chetaslua·3d

Gemini 3.5 Flash vs Gemini 3.1 Pro. You have seen a lot of tweets from the Google team saying that 3.5 Flash is better and faster than 3.1 Pro. Reality: it is 4 times more costly on the same task because it consumes more tokens, and it is super dumb and jagged. Like, see, it can't solve kids' math

Chetaslua@chetaslua

Gemini 3.5 Flash vs GPT-5.5 instant vs Sonnet 4.6 Remember guys. #1 in Finance Agent v2. SOTA performance right here. lol 🤣 Prompt : " 300+140=460 Is this correct? Breakdown? "

344

58.5K
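For reference, the claim in the test prompt is false: 300 + 140 is 440, not 460. A minimal Lean 4 check of both facts (plain natural-number arithmetic, no extra libraries assumed):

```lean
-- The correct sum: 300 + 140 evaluates to 440.
example : 300 + 140 = 440 := by decide

-- The equation in the prompt does not hold.
example : 300 + 140 ≠ 460 := by decide
```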

Chetaslua@chetaslua·3d

@brian_from_1999 yeah these non thinking models

brian@brian_from_1999·3d

@chetaslua AI can’t do maths 🤣 x.com/chetaslua/stat…

Chetaslua@chetaslua

Gemini 3.5 Flash vs GPT-5.5 instant vs Sonnet 4.6 Remember guys. #1 in Finance Agent v2. SOTA performance right here. lol 🤣 Prompt : " 300+140=460 Is this correct? Breakdown? "

165

Chetaslua@chetaslua·3d

Two Biggest Shortcomings of AI This Week, We Are So Back
> AI can't do maths (solved an Erdős problem)
> AI can't write a good story (won a prize in story writing)
Reasoning from the anti-AI crowd: AI can't feel, so it can't write; AI can't think, so it can't solve open problems

OpenAI@OpenAI

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

8.7K

Chetaslua@chetaslua·3d

@nick_kango yeah perfect analysis

481

Nick@nick_kango·3d

I was curious if this was true so created my own eval on a bunch of sota models (took <2min with Kaggle btw) Turns out only Opus 4.7 got it wrong. Every other model — Flash 3.5, GPT 5.5, GLM-5, and even qwen3 — got it right. Models are truly jagged & spikey

Chetaslua@chetaslua

Gemini 3.5 Flash vs GPT-5.5 instant vs Sonnet 4.6 Remember guys. #1 in Finance Agent v2. SOTA performance right here. lol 🤣 Prompt : " 300+140=460 Is this correct? Breakdown? "

2.7K
