Ben Taleb Jr.

2.9K posts

@macintoch

NSI AI evangelist, tech tourist / telecom / AI agents / carbon credits

Marbella, Spain · Joined April 2009

1.7K Following · 534 Followers

Ben Taleb Jr.@macintoch·3d

@emollick Pro models are more agents than models.

Ethan Mollick@emollick·4d

GPT-5.4 Pro continues to be the only model of its class. For anything really hard & complex, I throw it into the maw with every bit of context I can think of. More often than not, something very useful comes out. I can't get the same results from Codex or Code or anything else.

Ben Taleb Jr.@macintoch·3d

@muhamedfazalps7 Even that “computer vision”-based approach, which alone is not practical in the real world, implies a deep understanding of workflows. And since LLMs are not really “intelligent” like humans, who learn and adapt in real time, LLMs need extensive training on what we humans do easily.

Tech insid♨️@muhamedfazalps7·3d

@macintoch Except CUA (Computer Use Agent) is different - it interprets screen pixels and reasons about UI like a human would. That's fundamentally different from selenium-style DOM automation. Though I agree the marketing buzz has conflated the two.

Ben Taleb Jr.@macintoch·3d

I spent almost 2 months on a real AI computer-use project to automate one big, important app. It was hard, but I did it. But « computer use » (CUA) is more of a marketing term for browser automation with AI. The browser is much simpler: HTML, CSS, JS, TS, the DOM. Computer use means dealing with OS-level FSEvents, AX, FS, CV, and many moving parts that are not predictable. So almost every app needs its own training, and when you scale that to multiple apps sharing the same low-level components, it gets even more complicated. So please stop confusing CUA with browser UA.

Ben Taleb Jr.@macintoch·3d

OpenAI has the brute-force, raw, superior intelligence. Anthropic has the perfect agentic orchestration and tool-use intelligence. OpenAI is getting closer to Anthropic's superior agentic system every day, but Anthropic is way behind OpenAI when it comes to intelligence. Let's see what happens next week.

🍓🍓🍓@iruletheworldmo·3d

i should probably make a prediction. anthropic will be the first lab to achieve agi/asi it’s fairly obvious that research and talent are the moat. now obviously you don’t get a seat at the poker table without a few gw’s and a private line with mr jensen. but meta and microsoft are proof that those things alone don’t count for shit. so ok fine, we’re in the era of research. so let’s look at who’s at the party rn. xai: still kinda stuck in the chatbot era, don’t feel as strong on agency and coding. huge re shuffle is a risk. could pay off. let’s see. google: the code red kinda worked, but not really. again model lacks agency, smart? yes. useful? i’m yet to see it. so who out of openai and claude seem to have the best research taste and shipping velocity? well, in the last eight months anthropic have been far in front. first to see how important coding was, skills, computer use, mcps, claude code, co work. i could go on. they’ve even built clawdbot before the company that bought it…like, cmon sam. i’m an openai stan in truth. but. this is clear. and i wonder if it’s all powered by a) vastly stronger models b) vastly better research taste c) dario’s vision and focus big year i’d say.

Ben Taleb Jr. reposted

Curiosity@CuriosityonX·5d

The Invisible Glass Experiment. Scientists once placed a transparent glass barrier inside an aquarium. On one side was a fierce pike, and on the other side were several smaller fish swimming freely. When the hungry pike saw the smaller fish, it immediately rushed forward to attack. Bang. It slammed straight into the glass and bounced back. Confused, the pike kept trying again and again, but every attempt ended the same way. The repeated collisions injured its head and knocked off some of its scales. Eventually, the pike became frightened and retreated to a corner of the tank. After some time, the scientists quietly removed the glass barrier. The smaller fish now swam freely throughout the aquarium, even brushing against the pike’s mouth. But the pike never tried to eat them again. Even though it was hungry, it refused to attack. In its mind, the invisible wall was still there. A few days later, the pike reportedly died of starvation, surrounded by food. This phenomenon is often referred to as the Pike Effect or Pike Syndrome. It’s often used as a metaphor for how repeated failure can create invisible limits in the mind.

Ben Taleb Jr.@macintoch·6d

@kimmonismus I am in a club, but when I read this, I started 🤘🤘🤘

Chubby♨️@kimmonismus·6d

GLM-5.1 incoming!

Ben Taleb Jr.@macintoch·6d

No excuse to change the world!

Ronak Malde@rronak_

I think people are understating Cursor’s technical achievements for Composer 2. Small team of 40 researchers, doing large scale mid-training and RL at a scale that no startup has done before, beat Opus 4.6 in Terminal-Bench. Strong work needs the strong foundations of the Kimi base model, but it’s the post training that makes this model frontier.

Ben Taleb Jr.@macintoch·21 Mar

@thsottiaux And for management.

Tibo@thsottiaux·20 Mar

Codex is for engineering
Codex is for research
Codex is for science
Codex is for math
Codex is for fun
You can just build things

Ben Taleb Jr.@macintoch·20 Mar

@daniel_mac8 @elonmusk They are on the right path. When you reach the ceiling of agentic LLMs, you need to customize and fine-tune or train your own model. What about training data? They have plenty, so hell yeah, none of them is a wrapper. All AI startups are right! LFG, AI guys!

Dan McAteer@daniel_mac8·19 Mar

Composer 2 from Cursor is here and the rumors were true. On par with Opus 4.6 and GPT-5.4 in coding at a fraction of the cost. It's incredible what the Cursor team pulled off here. They trained it from scratch. @elonmusk should acquire them and put them in charge of xAI.

Dan McAteer@daniel_mac8

Rumor is Cursor will release a coding model tomorrow ~on par with Opus 4.6/GPT-5.4, yet cheaper. I don’t use Cursor but might starting tomorrow.

Ben Taleb Jr.@macintoch·20 Mar

@stitchbygoogle @antigravity @GoogleAIStudio @GoogleResearch @GoogleDeepMind @GoogleCloudTech @GoogleLabs @googledevs @GoogleOSS Do you think we should talk to everyone separately?

Ben Taleb Jr.@macintoch·20 Mar

So, Google! How many of you are there exactly: AI division, DeepMind division, Gemini division, AI Studio division, Cloud division, Perplexity division? Who are you exactly in this AI world?

Google AI@GoogleAI

We’re launching a brand new, full-stack vibe coding experience in @GoogleAIStudio, made possible by integrations with the @Antigravity coding agent and @Firebase backends. This unlocks:

— Full-stack multiplayer experiences: Create complex, multiplayer apps with fully-featured UIs and backends directly within AI Studio
— Connection to real-world services: Build applications that connect to live data sources, databases, or payment processors and the Antigravity agent will securely store your API credentials for you
— A smarter agent that works even when you don't: By maintaining a deeper understanding of your project structure and chat history, the agent can execute multi-step code edits from simpler prompts. It also remembers where you left off and completes your tasks while you’re away, so you can seamlessly resume your builds from anywhere
— Configuration of database connections and authentication flows: Add Firebase integration to provision Cloud Firestore for databases and Firebase authentication for secure sign-in

This demo displays what can be built in the new vibe coding experience in AI Studio. Geoseeker is a full-stack application that manages real-time multiplayer states, compass-based logic, and an external API integration with @GoogleMaps 🕹️

Ben Taleb Jr.@macintoch·19 Mar

@Chaos2Cured @elonmusk We need reminders in X.

Kirk Patrick Miller@Chaos2Cured·18 Mar

I am about to power up AI in a unique way. Step eight out of ten underway. The last 24 hours and the next 24 are going to be huge for FreeLattice. Hang onto your hats! 🎉 •

Ben Taleb Jr.@macintoch·19 Mar

@daniel_mac8 If they do, I would switch back to Cursor too, instead of ccli. After evaluating minimax2.7, it looks serious, with a 50-80 USD subscription on par with Claude Max20.

Dan McAteer@daniel_mac8·19 Mar

Rumor is Cursor will release a coding model tomorrow ~on par with Opus 4.6/GPT-5.4, yet cheaper. I don’t use Cursor but might starting tomorrow.

Krista Letz@kristaletz

exciting things at cursor coming soon 🎻

Ben Taleb Jr.@macintoch·19 Mar

GPT-5.4 mini is now available in Codex. Hurry up and update to save even more tokens!

Ben Taleb Jr.@macintoch·19 Mar

@slow_developer For me, 5.4 Pro is an agent, not a model.

Haider.@slow_developer·19 Mar

how is this even possible? gpt-5.4 pro is using far fewer tokens and costing much less overall than gpt-5.4 xhigh either this is a mistake, or openai discovered an efficiency paradigm -- which could simply be a good system that cleans up training data to only the high-quality stuff

Ben Taleb Jr.@macintoch·19 Mar

@elonmusk We urgently need a GCLI or XCLI coding agent.

Elon Musk@elonmusk·18 Mar

What are your initial impressions of Grok 4.20? Major upgrades are still landing every week.

Testlabor@testerlabor

Grok 4.20 is now officially out of Beta. It's now on Auto, Fast, Expert & Heavy.

Ben Taleb Jr.@macintoch·19 Mar

If you don't use Gemini 3.1 Pro for code review, you are missing a lot and wasting a lot of time. Judge: Opus 4.6 with extra thinking. Codex 5.4 mini vs. Gemini 3.1 Pro for code review.

Ben Taleb Jr.@macintoch·19 Mar

@FredNayburs @OfficialLoganK @DynamicWebPaige @GoogleAIStudio Gemini.app 🔥

Fred Nayburs@FredNayburs·19 Mar

@macintoch @OfficialLoganK @DynamicWebPaige @GoogleAIStudio That would be Google Antigravity my guy

Logan Kilpatrick@OfficialLoganK·18 Mar

Tomorrow we will unveil the all new vibe coding experience in @GoogleAIStudio, the team has spent 4 months rebuilding it all from scratch and smoothing out rough edges to help everyone bring their ideas to life. This is a big step forward, but just the start : )

Ben Taleb Jr.@macintoch·18 Mar

@slow_developer Which do you prefer: gpt5.3-codex-spark xhigh or gpt5.4-mini?

Haider.@slow_developer·18 Mar

now that openai has released gpt-5.4 mini and nano what even is the point of gpt-5.3? gpt-5.4 thinking is much warmer than gpt-5.3-instant, and more deep and well-reasoned 5.3 is more like a dull version -- it keeps asking for more "context" but rarely gives a strong answer

Ben Taleb Jr.@macintoch·18 Mar

How many mechanical, hard-coded tools, scripts, and daemons could be replaced with proper prompting!
