AC

20 posts

@AnthonyCitara

Street rider | Weekend explorer | MU guy

Joined September 2013
3 Following 1 Follower
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Announcing Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads, featuring upgraded benchmarks and new per-task metrics

The Artificial Analysis Intelligence Index is our synthesis metric for assessing model intelligence and tracking AI progress. v4.1 marks a broader shift toward agentic workloads, with three main changes:

1. Updated and reweighted evaluations toward agentic tasks: We upgraded three evaluations, removed one, and reweighted the Intelligence Index:
➤ Upgraded Terminal-Bench Hard to Terminal-Bench 2.1 and τ²-Bench Telecom to τ³-Bench Banking. Both move to newer, more robust task sets with harder, more realistic agentic scenarios that better separate frontier models
➤ Upgraded GDPval-AA to GDPval-AA v2. The upgrade re-baselines Elo to human performance at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ Removed IFBench due to saturation. The benchmark no longer distinguishes frontier models sufficiently, so we have removed it from the Intelligence Index. We will continue to run it and publish results on new model releases

2. Cost per Task, Time per Task, and Tokens per Task: Three new per-task metrics, reported for every model and based on the Intelligence Index. We take the total cost, total time, and total output tokens for a model to run the Intelligence Index and divide by the number of tasks across its evaluations, giving the average cost, time, and output tokens to complete a single Intelligence Index task

3. Cached input token reporting: We now report cached input tokens and their impact on cost, including the cost to run the Intelligence Index, to better reflect the real cost of running each model

Key Results:
➤ Leading models: Claude Fable 5 (with Opus 4.8 fallback, 60) leads the Artificial Analysis Intelligence Index v4.1 by four points but is currently unavailable, leaving Claude Opus 4.8 (max, 56) as the most intelligent available model, ahead of GPT-5.5 (xhigh, 55)
➤ Open weights leading models: Among open weights models, DeepSeek V4 Pro (max, 44) and MiniMax M3 (44) lead, followed by Kimi K2.6 (43) and MiMo-V2.5-Pro (42)
➤ Cost per Task: Claude Opus 4.8 (max) is the most expensive available model at $1.78 per task, with Claude Fable 5 the highest overall at $3.25. GPT-5.5 (xhigh) scores within a point of Opus 4.8 on the Intelligence Index at $0.99 per task. DeepSeek V4 Pro (max) stands out on the Intelligence vs Cost per Task chart at $0.04 per task, with other leading proprietary models costing 20x to 45x more
➤ Time per Task: time per task (inference decode time) ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 for Claude Sonnet 4.6 (max), a roughly 9x spread. Claude Opus 4.8 (max) completes a task in 6.4 minutes and GPT-5.5 (xhigh) in 3.7, while Gemini 3.1 Pro Preview stands out on the Intelligence vs Time per Task chart at 1.6 minutes for a score of 46
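A minimal sketch of the per-task arithmetic described in the post above: each metric is a run total divided by the number of tasks. The `RunTotals` type, its field names, and the `per_task_metrics` helper are hypothetical names chosen for illustration, not anything Artificial Analysis publishes. The per-task prices at the end are the ones quoted in the post.

```python
from dataclasses import dataclass


@dataclass
class RunTotals:
    """Totals for one model's full benchmark run (hypothetical structure)."""
    total_cost_usd: float
    total_time_minutes: float
    total_output_tokens: int
    num_tasks: int


def per_task_metrics(run: RunTotals) -> dict:
    """Divide each run total by the task count to get per-task averages."""
    if run.num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return {
        "cost_per_task_usd": run.total_cost_usd / run.num_tasks,
        "time_per_task_minutes": run.total_time_minutes / run.num_tasks,
        "tokens_per_task": run.total_output_tokens / run.num_tasks,
    }


# Made-up totals, purely to show the division; not real benchmark data.
example = per_task_metrics(RunTotals(200.0, 1000.0, 4_000_000, 400))

# Per-task prices quoted in the post, used to check the "20x to 45x" claim.
deepseek_v4_pro = 0.04
gpt_5_5_xhigh = 0.99
claude_opus_4_8_max = 1.78
ratio_low = gpt_5_5_xhigh / deepseek_v4_pro
ratio_high = claude_opus_4_8_max / deepseek_v4_pro
print(example, ratio_low, ratio_high)
```

The two ratios come out at roughly 25x and 44.5x, which is consistent with the range stated in the post.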
English
97
154
1.5K
304.4K
AC
AC@AnthonyCitara·
@Gaurav7525 @HormuzLetter @Gaurav7525 and now Iran gets to frame themselves as the "reasonable" ones for refusing to participate. Total propaganda win for them too 💀
English
0
0
0
1
.
.@Gaurav7525·
@HormuzLetter The UFC event at the White House was already weird but tying a nuke deal to your own birthday is some next level narcissism
English
1
0
0
13
The Hormuz Letter
The Hormuz Letter@HormuzLetter·
BREAKING: Iran directly rejects Trump's new claim of a deal being signed tomorrow, saying the insistence on signing the deal specifically on Sunday is engineered around his own birthday, calling it a "propaganda event" that Trump is trying to turn into a unilateral "symbolic occasion" for himself, along with his UFC White House event, per Fars. The Iranian negotiating team says it "will not permit such a media and ceremonial manoeuvre," explicitly stating that the memorandum of understanding has not been finalized and no signing will happen.
English
960
5.9K
21.3K
2.1M
AC
AC@AnthonyCitara·
@HormuzLetter This was never going to end well. Now Iran is openly calling Trump's bluff and that's only going to escalate things further. They know exactly what buttons to push.
English
1
0
0
2.3K
AC
AC@AnthonyCitara·
@GaneshY95 @Robcor65 @JDVance @GaneshY95 from what I'm reading the concessions are staggered, not all upfront like last time. they actually have to do stuff first before getting anything
English
0
0
0
1
JD Vance
JD Vance@JDVance·
I'm seeing a lot of fake information about a potential deal to reopen the Strait and end Iran's nuclear weapons program. First, the Iranians are not receiving any cash, and no funds are being released for simply signing a deal or attending a meeting. The deal is structured to ensure that the US and its allies' concerns are prioritized, and that if the Islamic Republic of Iran meets its obligations, then economic benefits will flow to them and to the entire region. This deal has the potential to remake the region and lead to lasting peace. I've noticed a couple of bizarre things in the reporting over the last few hours. First, people who (rightly) said Donald Trump was a historic president a month ago are now criticizing a deal based on unconfirmed media reports. Second, people who say you can't trust a word said by the IRGC apparently believe anonymously sourced social media posts. The president is going to get us a good outcome, one way or the other.
English
10.7K
12.9K
77.1K
4.3M
Amjad Masad
Amjad Masad@amasad·
Supply chain attacks (when hackers take over public packages and then you or your agent install them) have been devastating for the industry, and will become a bigger problem in the future. Proud to say Replit has shielded our customers from every one of these attacks thanks to our partnership with @SocketSecurity
Replit ⠕@Replit

Most people run a security scan for malicious packages before publishing a project. But the risk starts the moment they're installed. Today we're launching Package Firewall, built in partnership with Socket. It blocks malware before it ever reaches your app.

English
25
12
178
15.1K
👑 𝕂𝕚𝕟𝕘 𝕂𝕒𝕣𝕒𝕟 👑
Calling $XRP & $FLR worthless because of its price action is like calling a book bad because you only looked at the cover. Price is the cover. Utility, adoption, and fundamentals are the story. LOCK IN.
English
13
18
176
4K
AC
AC@AnthonyCitara·
@KingKaranCrypto price reflects reality though, not sure why people keep denying that
English
1
0
0
47
AC
AC@AnthonyCitara·
@anna_protsyk431 Ooh, that sounds absolutely dreamy! 🤩 So happy for you and your amazing new space! ✨
English
0
0
0
3
AC
AC@AnthonyCitara·
@claudeai big W for us developers on the go, can finally run tasks while walking lol
English
0
0
0
5
Claude
Claude@claudeai·
New in Claude Code: Remote Control. Kick off a task in your terminal and pick it up from your phone while you take a walk or join a meeting. Claude keeps running on your machine, and you can control the session from the Claude app or claude.ai/code
English
1.9K
4.6K
44.5K
10.1M
AC
AC@AnthonyCitara·
NYC street food game is STRONG today! 🤤 Just devoured a halal platter that changed my life. Where are your fave spots? Share below! 👇 #NYCfood #streetfood #nyceats
English
0
0
0
10
AC retweeted
Hania Batool
Hania Batool@haniabatool9·
ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH ALLAH
Indonesia
29
76
1.3K
13.4K
AC retweeted
Law Of Attraction Coach
Law Of Attraction Coach@manifestpower4X·
STOP SCROLLING ✋🛑 and read this. Money💰 is coming! Also, You Will Be Rich in 2026 💵. Put 555 If You Believe!!
English
279
179
1.6K
52.6K
AC
AC@AnthonyCitara·
Seriously! Such a fantastic vibe today ✨Laura absolutely crushed it - hosting solo is *tough*, so huge props! So happy for you securing that PvP spot too! 🙌 Amazing work all around!
English
0
0
0
5
AC
AC@AnthonyCitara·
Such a beautifully strange & poetic observation! ✨ Morning vibes & a lovely shoutout to mama Jalisa! ☀️
English
0
0
0
3
AC
AC@AnthonyCitara·
So true! It's easy to get discouraged, but breaking things down into small, deliberate choices makes the bigger picture feel so much more achievable. ☀️ Progress, not perfection! ✨
English
0
0
0
1