Simon Maple retweeted

Your eval leaderboard can change completely depending on which model grades the answers.
Simon Maple (@sjmaple) reran the same benchmark suite using three different LLM judges: Sonnet, GPT-5.5, and Opus-4-7. Nothing else changed. The tasks, rubrics, scenarios, and model outputs were identical.
The scores were not.
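The post doesn't include the harness itself, but the shape of the experiment is simple to sketch. Below is a minimal, hypothetical version of a judge-swap re-score in Python: the candidate answers are produced once, and only the judge model changes between grading runs. The `grade` callable, the prompt wording, and the judge labels are assumptions for illustration, not Tessl's actual code.

```python
from statistics import mean
from typing import Callable

# Judge models being compared; labels only, taken from the post.
JUDGES = ["sonnet", "gpt-5.5", "opus-4-7"]

# The rubric text is shared verbatim across judges (illustrative wording).
RUBRIC = "Score the answer from 0-100 for correctness, structure, and best practices."

def build_prompt(task: str, answer: str) -> str:
    """Identical grading prompt for every judge: same task, same rubric, same answer."""
    return (
        f"Task:\n{task}\n\nRubric:\n{RUBRIC}\n\n"
        f"Candidate answer:\n{answer}\n\nReply with a single number from 0 to 100."
    )

def rescore(tasks: list[str], answers: list[str],
            grade: Callable[[str, str], float]) -> dict[str, float]:
    """Re-grade the same fixed (task, answer) pairs once per judge.

    `grade(judge, prompt)` stands in for whatever API call returns a judge's
    numeric score; the answers themselves never change between judges.
    """
    return {
        judge: mean(grade(judge, build_prompt(t, a)) for t, a in zip(tasks, answers))
        for judge in JUDGES
    }
```

The only variable in a run like this is which judge reads the prompt, which is what makes the score swings attributable to the judge rather than to the models being evaluated.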
One model's score on a single skill shifted by 47 points depending on the judge: gpt-5.3 ranked near the top under Sonnet, then dropped sharply under GPT-5.5.
Opus, acting as judge, consistently scored its own outputs higher than the other judges scored them. GPT-5.5 turned out to be dramatically stricter overall, averaging almost 7 points lower than Sonnet across the benchmark.
What makes this especially interesting is that the instability wasn’t evenly distributed.
Tasks with concrete pass/fail conditions stayed relatively consistent across judges. But as soon as the rubric involved interpretation, structure, writing quality, or “best practices”, the variance widened fast. Two judges could look at the exact same output and disagree by double digits on whether the model had actually solved the task properly or just approximated it convincingly.
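That split is easy to picture if you imagine how each kind of criterion gets scored. A concrete condition can be checked mechanically and returns the same answer under any judge; an interpretive criterion has to be delegated to the judge model, so its leniency or strictness flows straight into the number. The check and criteria below are made-up examples, not the benchmark's actual rubric.

```python
import subprocess

def concrete_check(repo_dir: str) -> float:
    """Pass/fail condition: the test suite either passes or it doesn't.
    This score is identical no matter which model is judging."""
    result = subprocess.run(["pytest", "-q"], cwd=repo_dir)
    return 100.0 if result.returncode == 0 else 0.0

# Interpretive criteria have no mechanical oracle, so they go to the judge model.
INTERPRETIVE_CRITERIA = [
    "Is the solution well structured?",
    "Does it follow the language's best practices?",
    "Is the accompanying explanation clear?",
]

def interpretive_score(answer: str, ask_judge) -> float:
    """Average the judge's 0-100 rating per criterion; the result inherits
    whatever biases that particular judge brings to the reading."""
    ratings = [
        ask_judge(f"{question}\n\nAnswer:\n{answer}\n\nScore from 0 to 100:")
        for question in INTERPRETIVE_CRITERIA
    ]
    return sum(ratings) / len(ratings)
```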
That has pretty major implications for how people read benchmark charts right now.
A lot of public evals are presented as if the number is objective, when in reality the scoring model itself is shaping the outcome. In some cases, the judge preference is large enough to reorder the leaderboard entirely.
The interesting exception was Opus. It stayed in first place regardless of which model acted as the judge. Everything below it shifted around.
Read the full breakdown here:
tessl.io/blog/your-benc…