Jonny

2.2K posts

@jonnyboy

Jonny · Building Shepherdly, the OS for pastors @ https://t.co/rWMoFldTpe

Joined September 2024
171 Following · 231 Followers
Pinned Tweet
Jonny (@jonnyboy)
if you want to try Shepherdly at launch, let me know in the comments or DM me. If your pastor is online, tag him!

Jonny (@jonnyboy)
@theo Theo trying to get people to not press the red button, and it’s the strongest red button vote I’ve seen yet.

Theo - t3.gg (@theo)
If >50% of people press the blue button, everyone survives. Red button pressers always survive, but they’ll get a “red button presser” badge on their Twitter profile. What do you press?

vogel (@ryanvogel)
nobody wants to give me a free laptop and it hurts

Daniel Green (@dgrreen)
Chat, please vote. 🔉🎶 orange or red?

Tibo (@thsottiaux)
Dear audience, do you code as your professional career?

Jonny (@jonnyboy)
@jpschroeder the technical gap is now big enough to compensate for the goblin UI taste, whereas before it wasn't.

Justin Schroeder (@jpschroeder)
Everyone is moving to Codex.
1. Welcome and hello.
2. Why'd that take so long? 5.4 was > Opus too.

Jonny (@jonnyboy)
@ashleybchae dario wants people to benefit from superpowerful AI, not necessarily to have access to it.

Ashley Ha (@ashleybchae)
I used to genuinely think that anthropic was like the Fellowship and openai was Sauron, but these days i’m starting to seriously question myself. i’ve also been using codex 5.5 90% of the time, so...
Quoting nick (@thecsguy):

@ashleybchae Clearly Dario wants only one thing. Complete control.

Jonny (@jonnyboy)
@scaling01 no, anthropic may actually have hyperstitioned a bad lab into existence.

Jonny (@jonnyboy)
@jshobrook Are you sure sonnet is bigger?

Jonathan Shobrook (@jshobrook)
We beat Sonnet 4.6 with a 500B model. Bigger runs are on the way.
Quoting Artificial Analysis (@ArtificialAnlys):

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20. The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.
Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% less than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA but still trails GPT-5.5 (xhigh) by 276 Elo points, which corresponds to an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains the 81% IFBench score of Grok 4.20 0309 v2.
➤ It gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on Non-Hallucination Rate, followed by MiMo-V2.5-Pro, which is in line with Grok 4.3.
Congratulations to @xAI and @elonmusk on the impressive release!
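The ~17% win-rate figure can be sanity-checked against the standard Elo expected-score formula, E = 1 / (1 + 10^((R_B - R_A) / 400)), which depends only on the rating gap. The sketch below assumes the usual 400-point logistic scale (the post does not say which constants Artificial Analysis uses), and the function name is purely illustrative; the ratings are the ones quoted above.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula,
    assuming the conventional 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Figures from the post: Grok 4.3 is up 321 points from 1179 on GDPval-AA
# and trails GPT-5.5 (xhigh) by 276 points. Only the gap matters, so the
# implied GPT-5.5 rating (1500 + 276) is used just to make the call concrete.
grok_43 = 1179 + 321
gpt_55_implied = grok_43 + 276

win_rate = elo_expected_score(grok_43, gpt_55_implied)
print(f"Grok 4.3 Elo: {grok_43}")
print(f"Expected score vs GPT-5.5 (xhigh): {win_rate:.1%}")
```

With a 276-point deficit the expected score comes out at roughly 0.17, consistent with the ~17% in the post; two equally rated models would each get exactly 0.5.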

Jonny (@jonnyboy)
there is no "/goal" in my codex? any thoughts @thsottiaux

Amanda Askell (@AmandaAskell)
I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it first hand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼‍♀️

Jonny (@jonnyboy)
@craigzLiszt Anthropic hyperstitioned an evil AGI lab into existence.

Craig Weiss (@craigzLiszt)
anthropic might actually be the bad guys of this story

Jonny (@jonnyboy)
@scaling01 this is basically sonnet 4.6 fast at half the cost instead of double. 2 months ago, this would've shaken things up. you can't really sleep on any of these companies right now.

anita (@anitakirkovska)
it's insane how most people on my feed went from claude to codex in just 2 days. this industry is crazy.

Jonny (@jonnyboy)
@nptacek Synthetic data for reasoning; live data for context

CuddlySalmon (@nptacek)
i have never understood the synthetic data skeptics. it's like they haven't played around with models at all and just stuck to whatever the prevailing view was.

Jediah Katz (@jediahkatz)
Let's talk about why Cursor's agent harness is so good. There's a misconception that first-party harnesses from the labs will always outperform. For many reasons, that isn't true. There are roughly 6 layers that go into a good agent harness: orchestration, context, routing, transport, state, and execution (tools). Some of those involve careful context engineering, and others look a lot like the traditional craft of building great software. Each of these layers needs to be optimized. And if any one of them is degraded, it can severely impact your experience with the agent. This blog post is mostly focused on context and tools, and there's still so much more to talk about there. We also want to spotlight the other (very crucial!) areas soon. There's a lot more misinformation around the agent harness out there. I'll keep writing about how we build it behind the scenes.
Quoting Cursor (@cursor_ai):

Our agent harness makes models inside Cursor faster, smarter, and more token-efficient. Here's how we test improvements to the harness, monitor and repair degradations, and customize it for different models. cursor.com/blog/continual…

Jonny (@jonnyboy)
@aaronp613 I mean, anthropic publicly said Apple was one of the 12 with mythos access!

Aaron (@aaronp613)
Apple accidentally left Claude.md files in today's Apple Support app update (v5.13)

Jonny (@jonnyboy)
Say after about 10 seconds, this is not Gemini, but I am ready for Gemini @OfficialLoganK