Opper

56 posts


@opperai

The AI gateway for agents across every model and modality

Stockholm, Sweden · Joined April 2023
13 Following · 48 Followers

Opper @opperai
Claude Cowork now works with 300+ models via Opper. Route through EU-hosted inference, add fallbacks, or swap to a cheaper model mid-session — same Cowork window, different routing under the hood. Setup takes 3 fields. Guide: opper.ai/blog/claude-co…
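Under the hood this is the usual gateway pattern: keep the same client, change where requests go. A minimal sketch below, assuming an OpenAI-compatible proxy; the base URL, key and model slug are illustrative placeholders, not the documented values — the linked guide has the real three fields.

```python
# Sketch of the gateway pattern, not Opper's documented setup: point an
# OpenAI-compatible client at the gateway instead of the vendor directly.
# The base URL and model slug are placeholders; see the linked guide.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example/v1",   # placeholder gateway endpoint
    api_key="YOUR_OPPER_API_KEY",            # one Opper key instead of vendor keys
)

resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",     # placeholder slug; swap to reroute
    messages=[{"role": "user", "content": "Summarise this changelog."}],
)
print(resp.choices[0].message.content)
```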

Opper @opperai
All the best agent frameworks can now run inference through Opper. Agents are just code that runs models. So they need what every production system needs: routing, observability, guardrails, fallbacks, and a model catalog that doesn't lock you in.
• OpenClaw — the open-source personal agent running on millions of machines
• pi — the terminal coding agent powering OpenClaw
• Hermes by Nous Research — open-source agentic coding assistant
• Vercel AI SDK — the de facto standard for AI in TypeScript apps
• Continue.dev — the open-source coding assistant for VS Code and JetBrains
• Cline — the autonomous coding agent built into VS Code
• OpenCode — terminal-based AI coding for people who live in the shell
One API key. 260+ models. EU-hosted. See our integrations page for more details: docs.opper.ai/overview/integ…
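The fallback idea can also be sketched client-side for frameworks that let you supply your own client. A rough illustration, assuming an OpenAI-compatible gateway endpoint; the URL and model slugs are placeholders, and in practice the gateway can do this routing server-side.

```python
# Illustrative fallback loop over a model catalog, sketched with an
# OpenAI-compatible client. URL and model slugs are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://gateway.example/v1", api_key="YOUR_OPPER_API_KEY")

def complete_with_fallback(prompt: str, models: list[str]) -> str:
    last_error: Exception | None = None
    for model in models:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as exc:          # provider outage, rate limit, etc.
            last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error

print(complete_with_fallback("Write a haiku about routing.",
                             ["gpt-5", "mistral-large", "llama-3.1-70b"]))
```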

Opper reposted
ok @okaris
@opperai just launched this fun page where you can get any LLM to debate a question. I particularly love this one where most of them are just plain wrong but none change their answer!

Opper reposted
Felix @felix94123
Reran every model 10 times (via @opperai gateway). Same prompt, no system prompt, no cache. The results got worse. Of the 11 that passed once, only 5 held up.
GPT-5: 7/10
GPT-5.1, GPT-5.2, Claude Sonnet 4.5, every Llama, every Mistral: 0/10
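The rerun methodology is simple enough to sketch: fire the identical user prompt N times per model and count passes. The client setup, model slugs and pass check below are placeholders, not Felix's actual harness.

```python
# Rough sketch of the rerun setup described in the tweet: same prompt, no
# system prompt, 10 runs per model. Endpoint, slugs and the pass check are
# placeholders, not the original benchmark.
from openai import OpenAI

client = OpenAI(base_url="https://gateway.example/v1", api_key="YOUR_OPPER_API_KEY")

PROMPT = "..."                                            # benchmark prompt (not published)
MODELS = ["gpt-5", "claude-sonnet-4.5", "mistral-large"]  # placeholder slugs

def passed(answer: str) -> bool:
    # Placeholder grading; the thread does not publish its criterion.
    return "expected" in answer.lower()

for model in MODELS:
    wins = 0
    for _ in range(10):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],   # no system prompt
        )
        wins += passed(resp.choices[0].message.content)
    print(f"{model}: {wins}/10")
```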

Opper @opperai
Today we are introducing our new Agent SDKs: opper.ai/blog/new-opper… We built these to offer a good starting point for building headless, reliable and extendable agents. The SDKs are available for Python and TypeScript and offer the following features:
* Tool support (with MCP)
* Hook system to extend the inner agent actions
* Model interoperability with task completions
* Observability and evaluations
All you need is an Opper API key, available on our $10 free tier.
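To make the feature list concrete, here is the rough shape such an agent takes. The class and method names below are hypothetical, not the SDK's actual surface; the blog post has the real API.

```python
# Hypothetical structure only -- names here are NOT the Opper Agent SDK API.
# It shows the shape described above: a headless agent with tools, hooks
# around each step, and a swappable model.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SketchAgent:
    model: str                                               # swappable via a model catalog
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    hooks: list[Callable[[str], None]] = field(default_factory=list)

    def emit(self, event: str) -> None:
        for hook in self.hooks:                              # hook system: observe/extend steps
            hook(event)

    def run(self, task: str) -> str:
        self.emit(f"task started: {task}")
        # A real inner loop would call the model, let it pick tools, and iterate
        # until the task completes; here we just call one tool directly.
        result = self.tools["echo"](task)
        self.emit("task finished")
        return result

agent = SketchAgent(model="gpt-5", tools={"echo": lambda s: s.upper()}, hooks=[print])
print(agent.run("triage open support tickets"))
```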

Opper @opperai
Join the conversation on Reddit about our GPT-OSS Benchmarks: How GPT-OSS-120B Performs in Real Tasks reddit.com/r/LocalLLaMA/c…

Opper @opperai
Join the conversation on Reddit about our GPT-5 Benchmarks: How GPT-5, Mini, and Nano Perform in Real Tasks reddit.com/r/OpenAI/comme…

Opper reposted
Göran @gsandahl
We at @opperai just published high-level results and a leaderboard of task benchmarks for leading models. Current leaderboard:
Overall winner: xAI Grok 4. Grok 4 wins the agentic tasks (tied with o3) and the normalization tasks, and is in the top 5 in every category.
Context usage: Claude Sonnet 4. This tests the model's ability to correctly answer questions from supplied information, i.e. "reading" context.
Agent runtime: OpenAI o3 and xAI Grok 4. This tests the model's ability to plan, reflect and select appropriate actions, i.e. "using" context.
Normalization tasks: xAI Grok 4. This tests the model's ability to coherently produce output in a specific format from input, i.e. "output" format consistency.
SQL generation: OpenAI GPT-4.1. This tests the model's ability to interact with a database from natural-language goals, i.e. a specific domain problem.
Each category has around 30 tests of easy, medium and hard tasks. I think these evals mirror the overall "vibes" of these models! What categories should we add? Coding? Multimodal? Drawing?

Opper @opperai
Implemented elegantly with evaluator + Pydantic + LLM calls, these metrics enable real-time scoring of LLM outputs. Link to blog: opper.ai/blog/reference…
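As a sketch of that pattern (not the blog post's code): an LLM-as-judge call whose verdict is validated into a Pydantic model, so the score can be used programmatically. The client setup and judge model are placeholders, assuming an OpenAI-compatible endpoint.

```python
# Sketch of the evaluator pattern: LLM-as-judge + a Pydantic schema for the
# score. Endpoint and judge model are placeholders; see the blog post for the
# actual implementation.
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI(base_url="https://gateway.example/v1", api_key="YOUR_OPPER_API_KEY")

class Faithfulness(BaseModel):
    score: float = Field(ge=0.0, le=1.0)   # 1.0 = fully supported by the context
    reasoning: str

def evaluate_faithfulness(context: str, answer: str) -> Faithfulness:
    resp = client.chat.completions.create(
        model="gpt-5-mini",                # placeholder judge model
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Rate how faithful the answer is to the context (0 to 1) and explain.\n"
                f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
                'Respond as JSON: {"score": <float>, "reasoning": "<string>"}'
            ),
        }],
    )
    return Faithfulness.model_validate_json(resp.choices[0].message.content)
```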

Opper @opperai
✨ New blog post: Reference-Free LLM Evaluation with Opper SDKs ✨
In this blog post we introduce three lean evaluators that measure LLM outputs without gold references:
✅ Faithfulness: catches hallucinations
✅ Groundedness: verifies context loyalty
✅ Relevance: measures question-answer alignment
1/2

Opper @opperai
✨ Introducing Custom Evaluations — Test Model Responses and Build Real Feedback Loops
Today, we're introducing `opper.evaluate()` — flexible scaffolding for evaluating model responses, built right into our SDKs. Because no matter how clearly we describe a task, models are still probabilistic. You can't just trust the output. You have to test it.
✅ Support custom evaluators — code, eval frameworks, or LLM-as-a-judge.
✅ Automatically upload and track eval results on the platform — filter, observe, fix.
✅ Act on evaluation results directly inside your code — close the loop, not just measure it.
Pricing: $0.50 per 1,000 metrics
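A sketch of the feedback-loop idea, not verified SDK usage: `opper.evaluate()` is the announced entry point, but the import path, constructor and parameter names below are assumptions; check the SDK docs for the real signature.

```python
# Sketch of the close-the-loop idea. `opper.evaluate()` is the announced entry
# point, but every other name here (import path, constructor, kwargs) is an
# assumption -- consult the SDK docs for the real API.
import os
from opperai import Opper                                 # import path assumed

opper = Opper(http_bearer=os.environ["OPPER_API_KEY"])    # constructor args assumed

def brevity(response: str) -> list[dict]:
    # A custom code evaluator: any function that yields metrics can feed the loop.
    return [{"dimension": "brevity", "value": 1.0 if len(response) < 400 else 0.0}]

answer = "..."                                            # output of an earlier model call
metrics = brevity(answer)
opper.evaluate(                                           # parameter names assumed
    span_id="...",                                        # hypothetical: attach metrics to a traced call
    metrics=metrics,
)
if metrics[0]["value"] < 1.0:
    print("Too long; retry with a stricter instruction.")  # act on the result in code
```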

Opper @opperai
✨ New models! ✨ This week we have added GPT-4.1, GPT-4.1 mini and GPT-4.1 nano from OpenAI. These models are optimised for coding and API usage. We have also added two new reasoning models from OpenAI: o3 and o4-mini. Additionally, we have added xAI's Grok 3 and Grok 3 mini. As always, these models can be evaluated on a per-task basis in Opper.