Quesma

99 posts

Quesma

@QuesmaOrg

Make AI agents production-ready through independent evaluation and training.

Joined January 2024

14 Following · 125 Followers

Pinned Tweet

Quesma@QuesmaOrg·26 Jan

Recently we built OTelBench – a benchmark to test how well LLMs handle OpenTelemetry instrumentation. We tested 14 models. The best (Claude Opus 4.5) hit only 29%. These weren't trick questions, just a small subset of typical SRE tasks. Link here: quesma.com/blog/introduci…

907

Quesma retweeted

Piotr Migdal@pmigdal·27 Feb

AI + Ghidra by NSA = reverse-engineering fun. I am speaking at @AITinkerers Warsaw, 4th Mar 2026. One of my favorite event series - by and for the creators community. Vibe-resurrecting an old game from binaries 👾 and vibe-hardware-ing an LED backpack 🎒🌈.

229

Quesma retweeted

Piotr Migdal@pmigdal·10 Feb

Claude can code, but can it read machine code? We gave AI agents access to Ghidra (a decompiler by the NSA) and tasked them with finding hidden backdoors in servers - working solely from binaries, without any access to source code. See our BinaryAudit: quesma.com/blog/introduci…

181

1.5K

231K

Quesma retweeted

Ryan Marten@ryanmart3n·25 Jan

Great to see the community releasing benchmarks in @harborframework now. These are invaluable resources for collectively building the most useful agents.

Jacek Migdal@jakozaur

@ryanmart3n Last week @QuesmaOrg released “terminal-bench-sre-part-1” called OTelBench in Harbor. Another release coming soon. Maybe even next week.

1.7K

Quesma retweeted

Jacek Migdal@jakozaur·9 Jan

I used to cite Gartner, now I quote @GergelyOrosz and his Pragmatic Engineer. Enjoy our new blog post: quesma.com/blog/prompts-s…

215

Quesma@QuesmaOrg·24 Nov

Finally, an AI that can draw a map without getting lost. Nano Banana Pro uses tools to create factually correct infographics - and it's a game-changer. quesma.com/blog/nano-bana…

232

Quesma retweeted

Jacek Migdal@jakozaur·5 Nov

Postmortems are painful to write, especially this one. Sharing my startup Quesma journey so far. quesma.com/blog/database-…

Quesma@QuesmaOrg·24 Oct

Interesting use case for AWS Lambda that we explored: sandboxing AI-generated code. We tried WebAssembly first but hit a wall. So, we scrapped our experiment for AWS Lambda with Docker containers in an isolated VPC. Full writeup from @pmigdal: awsfundamentals.com/blog/sandboxin…

Tobias Schmidt@tpschmidt_

Lambda has tons of use cases, but one I've missed: using it as some kind of sandbox for running AI-generated code. Lambda's isolation and scaling are a solid fit for this problem.

160

Quesma retweeted

AISecHub@AISecHub·22 Oct

The security paradox of local LLMs - quesma.com/blog/local-llm… by @jakozaur at @QuesmaOrg If you’re running a local LLM for privacy and security, you need to read this. Our research on gpt-oss-20b (for OpenAI’s Red‑Teaming Challenge) shows they are much more prone to being tricked than frontier models. When attackers prompt them to include vulnerabilities, local models comply with up to 95% success rate. These local models are smaller and less capable of recognizing when someone is trying to trick them. #AISecurity #LLMSecurity #LocalLLM #GenAI #MLOps #ModelRisk #DataPrivacy #AIPrivacy #PromptInjection #AIThreats #AIGovernance #EdgeAI

327

Quesma@QuesmaOrg·18 Sep

See the full ranking and every run (logs, commands, binaries), methodology & code: ▶️ compilebench.com 💻 github.com/QuesmaOrg/Comp… 📃 quesma.com/blog/introduci…

Quesma@QuesmaOrg·18 Sep

Cost-efficiency crown: @OpenAI. Across difficulties, OpenAI models dominate the Pareto frontier of cost. GPT-5-mini (high reasoning) is a great price/perf pick; GPT-4.1 is the fastest with solid wins.

129

Quesma@QuesmaOrg·18 Sep

Can AI compile 22-year-old code? We built CompileBench to find out. We know that LLMs can vibe-code or even win IOI, but what about dependency hell or legacy build systems? (image based on XKCD 2347)

184

Quesma@QuesmaOrg·17 Sep

Our blog post is second on Hacker News. Enjoy!

2.9K

Quesma@QuesmaOrg·22 May

Our new blog post about Apache Iceberg limitations: quesma.com/blog-detail/ap…

143

Quesma@QuesmaOrg·9 May

quesma.com/blog-detail/hi…

Quesma@QuesmaOrg·9 May

At #IcebergSummit 2025, Ryan Blue unveiled Iceberg beyond Java, plus the path to Table Spec V3 & forward to V4. Przemysław Delewski’s new blog covers Fokko Driesprong on PyIceberg, Matt Topol on Go, Julien Le Dem on modular DBs. Essential read for next-gen data platforms. Link👇

189

Quesma@QuesmaOrg·8 May

Iceberg Table V3 is coming: dremio.com/blog/apache-ic…

110

Quesma retweeted

Piotr Migdal@pmigdal·24 Apr

Everything is better when Kawaii 🌸🌸🌸: Titanic survival rates with freshly-released Quesma Charts. app.charts.quesma.com/s/20bvqu At @DataCouncilAI conference in Oakland with Jacek Migdał. #dataViz @QuesmaOrg @jakozaur

308
