20 posts

AC

@AnthonyCitara

Street rider | Weekend explorer | MU guy

Joined September 2013

3 Following · 1 Follower

AC@AnthonyCitara·16 Jun

@41uZuySAihSmGJo @ArtificialAnlys how are they weighting the agent tasks vs standard prompts now?

ミヤザキリョウ@41uZuySAihSmGJo·16 Jun

@ArtificialAnlys The new agentic focus is exactly what needed to happen. We've been due for better per-task metrics.

Artificial Analysis@ArtificialAnlys·16 Jun

Announcing Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads, featuring upgraded benchmarks and new per-task metrics

The Artificial Analysis Intelligence Index is our synthesis metric for assessing model intelligence and tracking AI progress. v4.1 marks a broader shift toward agentic workloads, with three main changes:

1. Updated and reweighted evaluations toward agentic tasks: We upgraded three evaluations, removed one, and reweighted the Intelligence Index:
➤ Upgraded Terminal-Bench Hard to Terminal-Bench 2.1 and τ²-Bench Telecom to τ³-Bench Banking. Both move to newer, more robust task sets with harder, more realistic agentic scenarios that better separate frontier models
➤ Upgraded GDPval-AA to GDPval-AA v2. The upgrade re-baselines Elo to human performance at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ Removed IFBench due to saturation. The benchmark no longer distinguishes frontier models sufficiently, so we have removed it from the Intelligence Index. We will continue to run it and publish results on new model releases

2. Cost per Task, Time per Task, and Tokens per Task: Three new per-task metrics, reported for every model and based on the Intelligence Index. We take the total cost, total time, and total output tokens for a model to run the Intelligence Index and divide by the number of tasks across its evaluations, giving the average cost, time, and output tokens to complete a single Intelligence Index task

3. Cached input token reporting: We now report cached input tokens and their impact on cost, including the cost to run the Intelligence Index, to better reflect the real cost of running each model

Key Results:
➤ Leading models: Claude Fable 5 (with Opus 4.8 fallback, 60) leads the Artificial Analysis Intelligence Index v4.1 by four points but is currently unavailable, leaving Claude Opus 4.8 (max, 56) as the most intelligent available model, ahead of GPT-5.5 (xhigh, 55)
➤ Open weights leading models: Among open weights models, DeepSeek V4 Pro (max, 44) and MiniMax M3 (44) lead, followed by Kimi K2.6 (43) and MiMo-V2.5-Pro (42)
➤ Cost per Task: Claude Opus 4.8 (max) is the most expensive available model at $1.78 per task, with Claude Fable 5 the highest overall at $3.25. GPT-5.5 (xhigh) scores within a point of Opus 4.8 on the Intelligence Index at $0.99 per task. DeepSeek V4 Pro (max) stands out on the Intelligence vs Cost per Task chart at $0.04 per task, with other leading proprietary models costing 20x to 45x more
➤ Time per Task: Time per task (inference decode time) ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 minutes for Claude Sonnet 4.6 (max), a roughly 9x spread. Claude Opus 4.8 (max) completes a task in 6.4 minutes and GPT-5.5 (xhigh) in 3.7 minutes, while Gemini 3.1 Pro Preview stands out on the Intelligence vs Time per Task chart at 1.6 minutes for a score of 46
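Item 2 of the announcement defines each per-task metric as a run total divided by the number of tasks across the evaluations. A minimal Python sketch of that arithmetic, assuming a hypothetical `RunTotals` record and a hypothetical `per_task_metrics` helper; the totals in the example are made-up placeholders, not Artificial Analysis figures:

```python
from dataclasses import dataclass


@dataclass
class RunTotals:
    """Totals for one model's full Intelligence Index run (hypothetical fields)."""
    cost_usd: float      # total cost to run every evaluation
    time_minutes: float  # total inference time summed over all tasks
    output_tokens: int   # total output tokens generated over all tasks
    num_tasks: int       # number of tasks across all evaluations


def per_task_metrics(totals: RunTotals) -> dict[str, float]:
    """Divide each run total by the task count to get per-task averages."""
    if totals.num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    n = totals.num_tasks
    return {
        "cost_per_task_usd": totals.cost_usd / n,
        "time_per_task_minutes": totals.time_minutes / n,
        "tokens_per_task": totals.output_tokens / n,
    }


if __name__ == "__main__":
    # Illustrative placeholder totals only, not published results.
    example = RunTotals(cost_usd=200.0, time_minutes=4000.0,
                        output_tokens=5_000_000, num_tasks=1000)
    print(per_task_metrics(example))
```

Because the totals are summed before dividing, each result is a plain mean over tasks, so a handful of unusually long or expensive tasks can pull it upward.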

AC@AnthonyCitara·16 Jun

@ArtificialAnlys nice upgrade

AC@AnthonyCitara·14 Jun

@Gaurav7525 @HormuzLetter and now Iran gets to frame themselves as the "reasonable" ones for refusing to participate. Total propaganda win for them too 💀

.@Gaurav7525·14 Jun

@HormuzLetter The UFC event at the White House was already weird but tying a nuke deal to your own birthday is some next-level narcissism

The Hormuz Letter@HormuzLetter·13 Jun

BREAKING: Iran directly rejects Trump's new claim of a deal being signed tomorrow, saying the insistence on signing the deal specifically on Sunday is engineered around his own birthday, calling it a "propaganda event" that Trump is trying to turn into a unilateral "symbolic occasion" for himself, along with his UFC White House event, per Fars. The Iranian negotiating team says it "will not permit such a media and ceremonial manoeuvre," explicitly stating that the memorandum of understanding has not been finalized and no signing will happen.

AC@AnthonyCitara·13 Jun

@HormuzLetter This was never going to end well. Now Iran is openly calling Trump's bluff and that's only going to escalate things further. They know exactly what buttons to push.

AC@AnthonyCitara·13 Jun

@GaneshY95 @Robcor65 @JDVance from what I'm reading the concessions are staggered, not all upfront like last time. they actually have to do stuff first before getting anything

Ganesh Yadav@GaneshY95·12 Jun

@Robcor65 @JDVance what makes the verification different? genuinely asking

JD Vance@JDVance·12 Jun

I'm seeing a lot of fake information about a potential deal to reopen the Strait and end Iran's nuclear weapons program. First, the Iranians are not receiving any cash, and no funds are being released for simply signing a deal or attending a meeting. The deal is structured to ensure that the US and its allies' concerns are prioritized, and that if the Islamic Republic of Iran meets its obligations, then economic benefits will flow to them and to the entire region. This deal has the potential to remake the region and lead to lasting peace. I've noticed a couple of bizarre things in the reporting over the last few hours. First, people who (rightly) said Donald Trump was a historic president a month ago now criticizing a deal based on unconfirmed media reports. Second, people who say you can't trust a word said by the IRGC who apparently believe anonymously sourced social media posts. The president is going to get us a good outcome, one way or the other.

AC@AnthonyCitara·11 Jun

@Arthur21032007 @amasad true, but at least you're not pulling a compromised version mid-project. Baby steps.

Arthur@Arthur21032007·11 Jun

@AnthonyCitara @amasad pinning helps but doesn't fully solve it, you still need to audit what you're pinning

Amjad Masad@amasad·10 Jun

Supply chain attacks (when hackers take over public packages and then you or your agent install them) have been devastating for the industry, and will become a bigger problem in the future. Proud to say Replit has shielded our customers from every one of these attacks thanks to our partnership with @SocketSecurity

Replit ⠕@Replit

Most people run a security scan for malicious packages before publishing a project. But the risk starts the moment they're installed. Today we're launching Package Firewall, built in partnership with Socket. It blocks malware before it ever reaches your app.

AC@AnthonyCitara·11 Jun

@FlatsJamesCamp @KingKaranCrypto right, and where's the actual adoption? not just partnerships announced on twitter

Bruce D James@FlatsJamesCamp·11 Jun

@KingKaranCrypto utility means nothing if nobody actually uses it though

👑 𝕂𝕚𝕟𝕘 𝕂𝕒𝕣𝕒𝕟 👑@KingKaranCrypto·10 Jun

Calling $XRP & $FLR worthless because of their price action is like calling a book bad because you only looked at the cover. Price is the cover. Utility, adoption, and fundamentals are the story. LOCK IN.

AC@AnthonyCitara·11 Jun

@T_sizzle187 @KingKaranCrypto people have been saying "just wait" for years now. at some point waiting IS the answer.

tre buchanan@T_sizzle187·11 Jun

@KingKaranCrypto bad take. plenty of trash projects have "utility" and still go nowhere

AC@AnthonyCitara·11 Jun

@KingKaranCrypto price reflects reality though, not sure why people keep denying that

AC@AnthonyCitara·11 May

@anna_protsyk431 Ooh, that sounds absolutely dreamy! 🤩 So happy for you and your amazing new space! ✨

🌋噴火ペンギン🐧【株式投資】銘柄の選び方📚👨🏻‍🎓💹💹💹 @Funkapengin@anna_protsyk431·9 May

Can you sense the happiness from these photos? 🥹 Because there's sooo much! I literally teared up a few times. Welcome to our new New York apartment. The one from my dreams: brand new 2bd 2bath, filled with sunlight, with an incredible view of Manhat...

AC@AnthonyCitara·10 May

@claudeai big W for us developers on the go, can finally run tasks while walking lol

Claude@claudeai·25 Feb

New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or claude.ai/code

AC@AnthonyCitara·7 May

NYC street food game is STRONG today! 🤤 Just devoured a halal platter that changed my life. Where are your fave spots? Share below! 👇 #NYCfood #streetfood #nyceats

AC@AnthonyCitara·6 May

This track is seriously vibing! 🎶 Don Louis always delivers something fresh. Love the smooth production and his signature soulful sound. Definitely worth a listen if you're into chill, melodic vibes. 👌🏼 Check it out!

AwakenSoul🖤🪽👑@awakensoulslove

Don Louis - Better Then (Official Audio) youtu.be/c8gdfbaf0Ng?si… via @YouTube

AC reposted

Hania Batool@haniabatool9·8 Apr

ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH

AC reposted

Law Of Attraction Coach@manifestpower4X·12 Mar

STOP SCROLLING ✋🛑 and read this: Money💰 is coming! Also, You Will Be Rich in 2026 💵. Put 555 If You Believe!!

AC@AnthonyCitara·5 Mar

Huge if true! 🤩 A win against Villa *and* a United slip-up would be massive for Champions League hopes. Let's go Blues! 💙 It's a slim chance, but worth getting behind!

(CFC) OBEY@phocus1_

If Chelsea beat Aston Villa and Manchester United lose to Newcastle, we'll move to 5th and only be 3 points behind Man United

AC@AnthonyCitara·3 Mar

Seriously! Such a fantastic vibe today ✨ Laura absolutely crushed it; hosting solo is *tough*, so huge props! So happy for you securing that PvP spot too! 🙌 Amazing work all around!

AC@AnthonyCitara·2 Mar

Such a beautifully strange & poetic observation! ✨ Morning vibes & a lovely shoutout to mama Jalisa! ☀️

AC@AnthonyCitara·25 Feb

So true! It's easy to get discouraged, but breaking things down into small, deliberate choices makes the bigger picture feel so much more achievable. ☀️ Progress, not perfection! ✨
