Jonny

2.2K posts

@jonnyboy

Building Shepherdly, the OS for pastors, at https://t.co/rWMoFldTpe

Joined September 2024
171 Following · 231 Followers
Pinned Tweet
Jonny (@jonnyboy)
If you want to try Shepherdly at launch, let me know in the comments or DM me. If your pastor is online, tag him!
[Image]

Jonny (@jonnyboy)
@thsottiaux I can't get "/goal" to show up. Any thoughts?

Tibo (@thsottiaux)
Dear audience, do you code as your professional career?

Jonny (@jonnyboy)
@jpschroeder The technical gap is now big enough to make up for the goblin UI taste, whereas before it wasn't.

Justin Schroeder (@jpschroeder)
Everyone is moving to Codex.
1. Welcome and hello.
2. Why'd that take so long? 5.4 was > Opus too.

Jonny (@jonnyboy)
@ashleybchae Dario wants people to benefit from superpowerful AI, not necessarily to have access to it.

Ashley Ha (@ashleybchae)
I used to genuinely think that Anthropic was like the Fellowship and OpenAI was Sauron, but these days I’m starting to seriously question myself. I’ve also been using Codex 5.5 90% of the time, so...
Quoting nick (@thecsguy):

@ashleybchae Clearly Dario wants only one thing. Complete control.


Jonny (@jonnyboy)
@scaling01 No, Anthropic may actually have hyperstitioned a bad lab into existence.

Jonny (@jonnyboy)
@jshobrook Are you sure Sonnet is bigger?

Jonathan Shobrook (@jshobrook)
We beat Sonnet 4.6 with a 500B model. Bigger runs are on the way.
Quoting Artificial Analysis (@ArtificialAnlys):

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.

The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.

Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% less than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA but still trails GPT-5.5 (xhigh) by 276 Elo points, which corresponds to an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1, and maintains the 81% IFBench score of Grok 4.20 0309 v2.
➤ Grok 4.3 gains 8 points on AA-Omniscience Accuracy, but its AA-Omniscience Non-Hallucination Rate drops by 8 points. Grok 4.20 0309 v2 therefore still leads on AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, which is in line with Grok 4.3.

Congratulations to @xAI and @elonmusk on the impressive release!

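The ~17% figure in the quoted post follows from the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). The sketch below is a minimal Python check of that arithmetic, assuming only the numbers stated in the post: Grok 4.3 at 1500 on GDPval-AA and a 276-point gap to GPT-5.5 (xhigh). The 1776 rating for GPT-5.5 (xhigh) is derived from that gap rather than reported, and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Return player A's expected score against player B under the standard Elo model.

    The expected score is A's win probability, with draws counted as half a win.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# GDPval-AA figures as stated in the quoted post (assumption: the 276-point gap is exact).
GROK_4_3 = 1500.0
GAP_TO_LEADER = 276.0
GPT_5_5_XHIGH = GROK_4_3 + GAP_TO_LEADER  # 1776, derived from the stated gap, not reported

if __name__ == "__main__":
    p = elo_expected_score(GROK_4_3, GPT_5_5_XHIGH)
    print(f"Expected win rate: {p:.1%}")  # prints "Expected win rate: 17.0%"
```

Only the rating difference enters the formula, so any two models 276 points apart get the same ~17% expected score, and the two sides' expected scores always sum to 1.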

Jonny (@jonnyboy)
There is no "/goal" in my Codex? Any thoughts, @thsottiaux?

Amanda Askell (@AmandaAskell)
I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it first hand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼‍♀️

Jonny (@jonnyboy)
@craigzLiszt Anthropic hyperstitioned an evil AGI lab into existence.

Craig Weiss (@craigzLiszt)
Anthropic might actually be the bad guys of this story.

Jonny (@jonnyboy)
@scaling01 This is basically Sonnet 4.6 fast at half the cost instead of double. Two months ago this would've shaken things up. You can't really sleep on any of these companies right now.

anita (@anitakirkovska)
It's insane how most people on my feed went from Claude to Codex in just 2 days. This industry is crazy.

Jonny (@jonnyboy)
@nptacek Synthetic data for reasoning; live data for context

CuddlySalmon (@nptacek)
I have never understood the synthetic data skeptics. It's like they haven't played around with models at all and just stuck to whatever the prevailing view was.

Jediah Katz (@jediahkatz)
Let's talk about why Cursor's agent harness is so good.

There's a misconception that first-party harnesses from the labs will always outperform. For many reasons, that isn't true.

There are roughly 6 layers that go into a good agent harness: orchestration, context, routing, transport, state, and execution (tools). Some of those involve careful context engineering, and others look a lot like the traditional craft of building great software.

Each of these layers needs to be optimized. And if any one of them is degraded, it can severely impact your experience with the agent.

This blog post is mostly focused on context and tools, and there's still so much more to talk about there. We also want to spotlight the other (very crucial!) areas soon.

There's a lot more misinformation around the agent harness out there. I'll keep writing about how we build it behind the scenes.
[Image]
Quoting Cursor (@cursor_ai):

Our agent harness makes models inside Cursor faster, smarter, and more token-efficient. Here's how we test improvements to the harness, monitor and repair degradations, and customize it for different models. cursor.com/blog/continual…


Jonny (@jonnyboy)
@aaronp613 I mean, Anthropic publicly said Apple was one of the 12 with Mythos access!

Aaron (@aaronp613)
Apple accidentally left Claude.md files in today's Apple Support app update (v5.13)
[2 images]

Jonny (@jonnyboy)
Say after about 10 seconds, this is not Gemini, but I am ready for Gemini @OfficialLoganK

MTS (@MTSlive)
.@geoffreywoo says internet virality has a half-life, and Clavicular may already be reaching peak shock. "You broke through the noise, now you gotta say something useful. You gotta add value." "Once that shock factor gets old, which arguably it's tapering off, each new concept has probably like a six-month half-life." "What's the next polarizing narrative I can think about? How do I keep entertaining?" "If you want to be a billionaire, you want to be an entrepreneur, you want to be an investor, you actually have to have some real substance."

Jonny (@jonnyboy)
@Laurentia___ What are the behavioral or personality traits that you optimize for? This is very interesting to me!

Laurentia Romaniuk (@Laurentia___)
So excited that 5.5 is out in the world! This model has made day to day work so much easier; it feels like a true assistant in helping me GSD.