Simon Maple

23.4K posts

@sjmaple

Founding DevRel @tessl_io. Java Champion, @virtualJUG founder. Previously VP DevRel @snyksec, ZeroTurnaround, @IBM, LJC co-leader.

Basingstoke, joined March 2009
959 following, 15.2K followers
Simon Maple retweeted
AI Native Dev @ainativedev
Your eval leaderboard can change completely depending on which model grades the answers.

Simon Maple (@sjmaple) reran the same benchmark suite using three different LLM judges: Sonnet, GPT-5.5, and Opus-4-7. Nothing else changed. The tasks, rubrics, scenarios, and model outputs were identical. The scores were not.

One model moved by 47 points on a single skill depending on the judge. gpt-5.3 ranked near the top under Sonnet, then dropped sharply under GPT-5.5. Opus consistently scored itself higher than the other judges did. GPT-5.5 turned out to be dramatically stricter overall, averaging almost 7 points lower than Sonnet across the benchmark.

What makes this especially interesting is that the instability wasn’t evenly distributed. Tasks with concrete pass/fail conditions stayed relatively consistent across judges. But as soon as the rubric involved interpretation, structure, writing quality, or “best practices”, the variance widened fast. Two judges could look at the exact same output and disagree by double digits on whether the model had actually solved the task properly or just approximated it convincingly.

That has pretty major implications for how people read benchmark charts right now. A lot of public evals are presented as if the number is objective, when in reality the scoring model itself is shaping the outcome. In some cases, the judge preference is large enough to reorder the leaderboard entirely.

The interesting exception was Opus. It stayed in first place regardless of which model acted as the judge. Everything below it shifted around.

Read the full breakdown here: tessl.io/blog/your-benc…
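To make the judge effect described above concrete, here is a minimal Python sketch of how one might measure it. Everything in it is an assumption for illustration: the judge and model names are placeholders and the scores are invented, not the figures from the linked post. It computes each judge's leaderboard, the per-model spread between the most and least generous judge, and each judge's average score.

```python
from statistics import mean

# Invented scores (0-100) for illustration only; NOT data from the post.
# Outer key: hypothetical judge, inner key: hypothetical model being graded.
scores = {
    "judge_a": {"model_x": 90.0, "model_y": 85.0, "model_z": 70.0},
    "judge_b": {"model_x": 88.0, "model_y": 60.0, "model_z": 72.0},
    "judge_c": {"model_x": 93.0, "model_y": 80.0, "model_z": 75.0},
}


def ranking(judge_scores):
    """Return model names ordered from highest to lowest score."""
    return sorted(judge_scores, key=judge_scores.get, reverse=True)


def judge_spread(all_scores):
    """Per model: highest score minus lowest score across all judges."""
    models = next(iter(all_scores.values())).keys()
    return {
        m: max(s[m] for s in all_scores.values())
        - min(s[m] for s in all_scores.values())
        for m in models
    }


def judge_means(all_scores):
    """Average score each judge hands out, a rough strictness signal."""
    return {judge: mean(s.values()) for judge, s in all_scores.items()}


rankings = {judge: ranking(s) for judge, s in scores.items()}
spread = judge_spread(scores)
means = judge_means(scores)

for judge, order in rankings.items():
    print(judge, order, round(means[judge], 1))
print("spread per model:", spread)
```

In this toy data the top model keeps first place under every judge while the two below it swap places, which is the kind of reordering the thread describes. A large spread for one model is a hint that its rank depends on who is grading.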

Simon Maple retweeted
AI Native Dev @ainativedev
We audit packages. We audit infra. We don’t audit the instructions shaping our AI agents. That’s the problem tessl-audit is tackling.

In a new post, Simon Maple (@sjmaple) introduces an open-source CLI that scans the skills and plugins loaded into your agent’s context for security findings, quality issues, and actual task uplift. Because agent skills are not just “extra context”. They directly influence how models reason, which patterns they follow, and what they decide to do.

The workflow is intentionally simple:

npx tessl-audit

The tool reads your tessl.json, fetches registry data for installed skills, and produces a posture report showing risky plugins, weak guidance, and skills that have never been evaluated against real tasks.

What makes this interesting is how familiar the problem suddenly feels. AI agents are starting to develop their own dependency graphs, except the dependencies are prompts, policies, evals, and reasoning layers instead of libraries. And unlike broken code, bad context often fails quietly.

The post also walks through optimizing low-quality skills, generating eval scenarios, and measuring whether a plugin genuinely improves agent performance before trusting it in production.

Read the full post here: tessl.io/blog/stop-trus…

Luke @lukeroyal1871
@ReadingFC Surprised Couhig didn’t cut up the turf and sell it tbh.

Reading FC @ReadingFC
The first steps of our pitch renovation project are now underway at the Select Car Leasing Stadium. 🚧🌱 A major summer of investment continues… More updates to follow throughout the off-season.

sudox @kmcnam1
Terrible fashion part 2

Cynthia Bell McGillis @cynthiamcgillis
How are y'all handling company-wide skills? Putting them in a repo? Does that work for Cowork and less technical teams? I feel like there has to be a better way to organize these.

Simon Maple @sjmaple
Hey #readingfc fans - here's a spreadsheet to work out whether subscription or season ticket is best for you. Copy the spreadsheet for yourself, answer questions in column H, and you'll get some idea of overall costs over the season for each subscription option in row 26! I did this to help me, hope it helps others too :) Disclaimer, my sums may not add up. docs.google.com/spreadsheets/d… May be of interest @TalkReading @ReadingFC @RFCCommunity @RFC_Analysis @ElmParkRoyals

Simon Maple retweeted
Rohan Sharma @rrs00179
Simon Maple’s (@sjmaple) benchmark (1,742 tests):
5.5 vs 5.4 → tied with skills (89.4 vs 89.3)
$0.49 vs $0.30 → +63% for +0.1
Only win: speed (89s vs 135s)
Read more here: dev.to/tessl/gpt-55-i…
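The arithmetic behind those figures can be checked with a short Python sketch. The numbers are the ones quoted in the tweet. The variable names and the `pct_change` helper are illustrative assumptions, and the tweet does not say what unit the cost figures are measured per, so the sketch treats them as plain dollar amounts.

```python
def pct_change(new, old):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Figures as quoted in the tweet (5.5 first, 5.4 second).
score_55, score_54 = 89.4, 89.3   # benchmark score with skills
cost_55, cost_54 = 0.49, 0.30     # quoted cost in dollars (unit not stated)
time_55, time_54 = 89, 135        # quoted time in seconds

score_gain = round(score_55 - score_54, 1)      # difference in score points
cost_increase = pct_change(cost_55, cost_54)    # how much more 5.5 costs, in %
time_change = pct_change(time_55, time_54)      # negative means 5.5 is faster

print(f"score gain: {score_gain:+.1f} points")
print(f"cost change: {cost_increase:+.0f}%")
print(f"time change: {time_change:+.0f}%")
```

The cost line reproduces the quoted +63% for a 0.1 point gain. The time figures work out to roughly a third less time for 5.5, which matches the tweet's "only win: speed".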

Simon Maple @sjmaple
@ChrisH1871 @Stuwayne8 You get strange looks when you bring in a four pack of doombar though. Agree it’s the better way to do it though, apart from teas and beers 🍻 ☕️

Christopher Hamblin @ChrisH1871
@Stuwayne8 @sjmaple If you find a local food and drink shop and bring it with you, you'll save a shed load more. I get two bottles of Pepsi for 2.70, instead of 3 quid for one at the ground etc. High price and low quality at our place now

Andy @Brownie1871
@sjmaple As car park season ticket is £276 for the 23 league games, expect individual car park tickets to be at least £12, if not more.

Simon Maple @sjmaple
@Steve1871 Feels like it's the difference in loyalty points across the subscription products that most are pissed off about.

Simon Maple @sjmaple
@Steve1871 Agree, although I think this is consistent across all plans. I'm not sure anyone would not renew because of this, but maybe that's the case :)

Simon Maple @sjmaple
@Stuwayne8 Yeh, I’m thinking the same. I figure we’ll just use the same discount from the elite, and use cores pretty much as regular season tickets.

Madstad1871 @Stuwayne8
@sjmaple I've just renewed and went with 3 core and 1 elite as we drive to every game and do get food and drink etc., so on that basis it was worth it for us as a family. We miss at most 1 home game a season too.