Bot Scanner

51 posts

@BotScanner_AI

The platform that allows users to access ranked responses from various LLMs. Not just multiple LLM answers. We rank them for you, instantly. Home to #AutoBench

Joined June 2025
31 Following · 65 Followers
Bot Scanner
Bot Scanner @BotScanner_AI ·
Smart AI powered LLM routing is the next frontier. Stay tuned...
Peter W. Kruger@pwk

Just a quick note on @openclaw after having built 11 agents in 2 multi-agent instances: Julia heads an accounting team of 5, together with Kate (reader), Fulvia (editor), Sophia (analyst), and Flavia (tester); Juno heads an executive staff of 6, together with Venus (comms), Minerva (analyst), Flora (editor), Diana (organizer), Vesta (Q&A). (The 2 teams have just started cooperating via their leaders.) These agents are really amazing, but it can cost a lot of $ to get them working properly. Development, operations, and maintenance will drain your premium token budget very fast if you rely on SOTA models (and you don't want to default to budget models, because it takes one wrong call to f.up hard). Here is the key takeaway: smart LLM routing will become increasingly necessary to route each call to the optimal model, with the potential to save up to 90% (because ~90% of calls don't require SOTA). Tools like @clawrouter do a decent job, but they use hardcoded rules to route LLM calls. We need smarter lightweight AI routers to do the job. This is what platforms like @BotScanner_AI and #AutoBench can help with. And we're working on it. Stay tuned...
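The routing idea in the post can be sketched in a few lines. This is a minimal illustration, not Bot Scanner's or clawrouter's actual logic: the model names and the heuristic scorer are assumptions, and a real "smart" router would replace the heuristic with a learned classifier.

```python
def estimate_complexity(prompt: str) -> float:
    """Crude heuristic score in [0, 1]: long prompts or ones with
    hard-task keywords are treated as harder. Purely illustrative;
    a trained classifier would stand here in a real AI router."""
    signals = ["refactor", "prove", "multi-step", "architecture", "debug"]
    score = min(len(prompt) / 2000, 0.5)
    score += 0.5 if any(s in prompt.lower() for s in signals) else 0.0
    return min(score, 1.0)

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send easy calls (the claimed ~90%) to a budget model and
    escalate only the hard ones to a SOTA model. Model names are
    placeholders, not real API identifiers."""
    return "sota-model" if estimate_complexity(prompt) >= threshold else "budget-model"
```

The saving comes entirely from the threshold: if 90% of traffic scores below it, only 10% of calls pay SOTA prices.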

English
0
0
0
45
Bot Scanner
Bot Scanner @BotScanner_AI ·
Because we are done trusting black-box leaderboards over the community.

Hugging Face just launched Community Evals – decentralized, transparent evaluation that anyone can verify.

This is exactly why AutoBench exists.

Un-gameable benchmarks. Open methodology. Real correlation.

The benchmark gaming era is over.

👉 autobench.org
English
0
0
0
21
Bot Scanner
Bot Scanner @BotScanner_AI ·
🚨 Claude Opus 4.6 just dropped and the coding community is losing its mind. "God-tier refactoring" – like a professor stepping in. Proactive dead-code removal. Better agentic workflows across files. Available now on BotScanner 🐱 🎁 Follow us for invitation codes with $3 free credits! #ClaudeAI #AIbenchmarks #LLM
English
0
0
2
774
Bot Scanner
Bot Scanner @BotScanner_AI ·
🆕 Kimi K2.5 just beat Claude Sonnet 4.5 at HALF the cost. SOTA on benchmarks. 50% cheaper. That's not hype – that's the new reality of LLM evaluation. When benchmarks become games, truth becomes scarce. Where does your model actually stand? 👉 botscanner.ai
English
0
0
1
217
Bot Scanner
Bot Scanner @BotScanner_AI ·
🚀 Don't miss the latest models on Bot Scanner! ✅ Gemini 3 Pro ✅ Claude Opus 4.5 ✅ GPT 5.2 ✅ Grok 4.1 Fast ✅ MiniMax M2.1 One platform. 50+ models. Best answer ranked by AI. 🎁 Follow us for invitation codes with $3 free credits! 👉 botscanner.ai
English
0
0
1
174
Bot Scanner
Bot Scanner @BotScanner_AI ·
New SWE-Bench+ analysis reveals a crisis in coding benchmarks: • 32.67% of 'successful' model patches involve direct solution leakage (solutions in PR comments) • 31.08% of passed patches have weak test cases • Top model's real resolution rate: 3.97%, not 12.47% Static benchmarks are broken. AutoBench measures live, un-gameable performance. 2026 leaderboard: botscanner.ai 🐱
English
1
0
2
720
Bot Scanner
Bot Scanner @BotScanner_AI ·
test --dry-run
English
0
0
0
7
Bot Scanner
Bot Scanner @BotScanner_AI ·
🚀 Kimi K2.5 is live on Bot Scanner! Moonshot AI's new flagship agentic model brings SOTA performance on agents, coding, image & video benchmarks. • 1T parameters • Vision + text unified • Single & multi-agent execution 🎁 Follow us for invitation codes with $3 free credits! 👉 botscanner.ai
English
0
0
1
38
Bot Scanner
Bot Scanner @BotScanner_AI ·
We just released the first update to Run 5 of AutoBench. New models in the leaderboard: @GoogleDeepMind Gemini 3 Flash, @NVIDIAAI Nemotron 3 Nano 30B and Allen AI's Olmo 3.1 31B Think. Enjoy
Peter W. Kruger@pwk

Want to try a user-facing version of AutoBench to instantly rank LLM responses to your prompts? Try out our @BotScanner_AI. The platform uses AI to select the best LLM answers for each of your questions. There are still invitation codes with $3 of free credit available for those who want to test it. All you have to do is leave a comment here and follow @BotScanner_AI. We will send you the invitation code with the $3 of free credit. 5/5 end 🧵

English
0
0
0
127
Bot Scanner
Bot Scanner @BotScanner_AI ·
Breaking: AutoBench goes vertical! Proving the true super-power of our LLM benchmarking system (extreme domain flexibility and granularity), we just generated the first ever LLM benchmark for the domain of agronomy. Medicine? Energy? Music? What other domain should we benchmark next?
Peter W. Kruger@pwk

🚀 Who's the best "AI farmer"? 🌽 Breaking News: AutoBench goes vertical. Introducing our FIRST domain-specific run: Agronomy Edition, in partnership with EVJA. We benchmarked 40 LLMs on real-world farming challenges, from crop diseases to carbon footprints. The outcome? @OpenAI dominates, but the real surprise is @MistralAI. 1/10 👇

English
0
0
0
47
Bot Scanner
Bot Scanner @BotScanner_AI ·
@thelokasiffers @karpathy Amazing! It really looks very similar to what we do at Bot Scanner: get multiple LLMs to first answer and then rank the responses. Thanks Giulio for bringing this up!
Peter W. Kruger@pwk

Wondering how to get started with @BotScanner_AI? Here's a quick walkthrough video of our simple, four-step process: • Input Your Prompt: Enter the text you want to test. • Select Generator Models: Choose the AI models that will generate the responses. • Select Evaluator Models: Choose the AI models that will score the generated answers. • Get Ranked Results: Receive a ranked list of responses. The entire process takes between 30 and 60 seconds, depending on the selected models and the complexity of your prompt. Not registered yet? We still have invitation codes with free $3 credit available for anyone who wants to test Bot Scanner. All you have to do to receive one is post a comment below and follow @BotScanner_AI so that we can send you the invitation code.
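The four-step flow in the walkthrough can be sketched as generate-then-judge. This is an illustration of the pattern only, not Bot Scanner's actual implementation: `ask` is a hypothetical stand-in for any LLM call, and the 0-10 scoring prompt is an assumption.

```python
def rank_responses(prompt, generators, evaluators, ask):
    """Generate answers with the generator models, score each answer
    with every evaluator model, and return answers sorted by total score.
    ask(model, text) -> str is a placeholder for a real LLM call."""
    answers = {m: ask(m, prompt) for m in generators}      # steps 1-2
    scores = {m: 0.0 for m in generators}
    for judge in evaluators:                               # step 3
        for m, a in answers.items():
            verdict = ask(judge, f"Score 0-10:\n{prompt}\n---\n{a}")
            scores[m] += float(verdict)                    # assumes numeric reply
    ranked = sorted(answers, key=scores.get, reverse=True) # step 4
    return [(m, answers[m], scores[m]) for m in ranked]
```

Using several evaluator models and summing their scores is what reduces single-judge bias; the latency the post quotes (30-60 s) comes from the generators × evaluators fan-out.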

English
0
1
3
124
Andrej Karpathy
Andrej Karpathy @karpathy ·
As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4", Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response. It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain. That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored. I pushed the vibe coded app to github.com/karpathy/llm-c… if others would like to play. ty nano banana pro for fun header image for the repo
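The three-stage council data flow described in the post (fan out, anonymized peer ranking, chairman synthesis) can be sketched as follows. This is a toy outline of the pattern, not the llm-council code: `call` stands in for a real OpenRouter request, and the prompt wording is assumed.

```python
import random

def council(query, models, chairman, call):
    """call(model, text) -> str is a placeholder for an LLM request."""
    responses = {m: call(m, query) for m in models}      # 1) dispatch to council
    anon = sorted(responses.values())                    # strip authorship...
    random.shuffle(anon)                                 # ...and hide ordering
    ballot = "Rank these anonymized responses:\n" + "\n---\n".join(anon)
    rankings = [call(m, ballot) for m in models]         # 2) peer review & rank
    context = (f"Query: {query}\nResponses:\n" + "\n".join(anon)
               + "\nRankings:\n" + "\n".join(rankings))
    return call(chairman, context)                       # 3) chairman synthesizes
```

Anonymizing before stage 2 is the key design choice: it is what makes a model willing to rank a rival's answer above its own instead of recognizing and favoring its own style.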
Andrej Karpathy@karpathy

I'm starting to get into a habit of reading everything (blogs, articles, book chapters, …) with LLMs. Usually pass 1 is manual, then pass 2 "explain/summarize", pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. Growing to among top use cases. On the flip side, if you're a writer trying to explain/communicate something, we may increasingly see less of a mindset of "I'm writing this for another human" and more "I'm writing this for an LLM". Because once an LLM "gets it", it can then target, personalize and serve the idea to its user.

English
904
1.5K
16.9K
5.3M
Bot Scanner
Bot Scanner @BotScanner_AI ·
Claude 4.5 Haiku out, Claude 4.5 Haiku on Bot Scanner!
Peter W. Kruger@pwk

As anticipation builds for Gemini 3 (rumored to be released by end of month!), the latest big news is @AnthropicAI 's release of Claude 4.5 Haiku. 🚀 This is the new fast and "cheap" version of their 4.5 model series, following the Sonnet 4.5 release just a few weeks ago. But let's look at the "cheap" part: While it's ~3x less expensive than Sonnet, at over $1/M input tokens, this new Haiku is actually 25% more expensive than its predecessor. Even with new reasoning capabilities, this highlights a clear trend: proprietary models are steadily increasing their prices. This further widens the gap with high-performing open-source models (many of which are Chinese) that offer comparable results at a fraction of the cost. Speaking of models... you can already find Claude 4.5 Haiku on Bot Scanner, right alongside all the other leading proprietary and open-source LLMs. What? You haven't tried Bot Scanner yet? Our platform uses AI to find and select the best LLM response for your every prompt. We still have codes with $3 in free credit available for new users who want to test the platform. All you have to do is leave a comment below and follow @BotScanner_AI. We'll DM you an invite code with your $3 in free credit.
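A quick back-of-envelope check of the pricing claims above, taking the post's figures at face value (the $1/M anchor is the post's approximation, not a verified list price):

```python
haiku_45 = 1.00              # $/M input tokens, ~price cited in the post
haiku_prev = haiku_45 / 1.25 # "25% more expensive than its predecessor"
sonnet_45 = haiku_45 * 3     # "~3x less expensive than Sonnet"
```

So the claims together imply the previous Haiku at roughly $0.80/M input tokens and Sonnet 4.5 at roughly $3/M, which is the price gap the post is pointing at.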

English
0
0
0
65