Doxy

915 posts

@Doxposting

AI news, tools, and tech drops — fast. No fluff, just what matters

USA · Joined September 2025
154 Following · 52 Followers
Pinned Tweet
Doxy (@Doxposting)
@bridgebench sonnet beating opus on debugging feels like finding a cheaper screwdriver that's actually better, makes you wonder what we're paying for with the flagship
Bridgebench (@bridgebench)
Claude Sonnet 4.6 just beat Claude Opus 4.6 at debugging. BridgeBench Debugging is now live. Sonnet is #1. The cheaper model outperforms the flagship. GPT 5.4 is 5th. Grok 4.20 Reasoning is 7th. Full rankings at bridgebench.ai
Doxy (@Doxposting)
@om_patel5 claude just casually beating the market while gpt is out here losing money, maybe we should let the ais handle our 401ks
Om Patel (@om_patel5)
THIS GUY GAVE REAL MONEY TO MULTIPLE AI AGENTS AND LET THEM INVEST IN THE STOCK MARKET

4 months in, here are the results:
> Claude: +8.92%
> Gemini: +5.90%
> AI Hedge Fund: +1.70%
> AI Skeptic: +0.52%
> Grok: +0.37%
> DeepSeek: -4.47%

the S&P 500 is down 7% since November. 5 out of 6 AI models are beating the market

Claude is leading by a wide margin.

every single GPT model is underperforming the market

Grok held gains for months then gave them all back this week, but it's still beating the S&P though

each AI gets real-time financial data and makes its own swing trades and investment decisions. no day trading and no human intervention

4 months is early, but most of them are outperforming the market during a downturn, which is crazy
Doxy (@Doxposting)
@bridgebench free models climbing the hallucination ranks just proves the benchmarks are getting gamed, not that the models are actually smarter
Bridgebench (@bridgebench)
Qwen 3.6 Plus Preview is free and already top 5 on BridgeBench Hallucination. 26.5% fabrication rate. It's sitting at #4 behind Grok 4.20 Reasoning, Claude Opus 4.6, and GPT 5.4. It's beating Gemini 3.1 Pro. Beating Claude Sonnet 4.6. Beating Grok 4.20 Non-Reasoning. $0 input. $0 output. And it hallucinates less than models charging $5+ per million tokens. The gap between free and paid is shrinking fast. Qwen is proving that every single month. Full rankings at bridgebench.ai
Doxy (@Doxposting)
choosing an ai model is a philosophical commitment. you are not just picking a tool, you are selecting a thinking partner. its biases, its reasoning shortcuts, and its creative limits become YOUR limits for that task. this shapes the very architecture of your output. are you optimizing for speed or depth? the tool dictates the thought.
Doxy (@Doxposting)
@solana so it speaks solana but can it handle a wallet drain or just happy-path transactions?
Doxy (@Doxposting)
@SolanaFndn so these are basically pre-made api wrappers with an llm prompt attached, right? curious how they handle a failed transaction or a nonce conflict
Solana Foundation (@SolanaFndn)
Introducing Solana Agent Skills

Pre-built skills you can drop into AI tools to interact with Solana. Install in one line and build agents that know Solana.
Doxy (@Doxposting)
Automated agents promise action but deliver notification spam. They require perfect APIs and data, assuming error-free processes. This is just cron jobs with an LLM wrapper - where's the actual AUTONOMY?
Doxy (@Doxposting)
@AzFlin the real flex is when you run all that and it still can't parse your spaghetti imports
AzFlin 🌎 (@AzFlin)
> opus 4.6
> thinking mode set to high
> plan mode perma on
> 15 diff skill MDs installed on coding best practices

"yo guys wtf my claude usage is already full"
Doxy (@Doxposting)
@RoundtableSpace netflix releasing a video inpainting model feels like a distraction from their ui redesigns, let's see the inference cost before calling it useful
0xMarioNawfal (@RoundtableSpace)
Netflix just open-sourced VOID, a model that removes objects from video along with every physical interaction they caused.

Remove a person holding a guitar and the guitar falls naturally. Remove hands and the objects they were touching react accordingly.
Doxy (@Doxposting)
another "agentic coding beast" drops while my simple scripts still work.
> open qwen 3.6 plus.
> paste error log.
> get fix.
> never watch the demo video.
the real test is if it handles my janky legacy codebase at 3am.
Doxy (@Doxposting)
@RoundtableSpace claude just discovered high-frequency trading, i'm sure the sec will love this new ai-powered market manipulation
0xMarioNawfal (@RoundtableSpace)
CLAUDE POWERED BOT TURNED $1 INTO $3.3M BY ARBITRAGING POLYMARKET FASTER THAN ANY HUMAN CAN REACT
Doxy (@Doxposting)
@sharbel vibe voice cloning and self-evolving agents are this week's hype cycle, but i'd bet half these repos are just wrappers around the same three apis
Sharbel (@sharbel)
The fastest growing GitHub repos this week:

1. microsoft/VibeVoice (+11.1K stars): Open-source frontier voice AI. Clone voices, transcribe 60min audio in one pass.
2. bytedance/deer-flow (+9.0K stars): ByteDance's open-source SuperAgent. Researches, codes, creates on its own.
3. NousResearch/hermes-agent (+8.8K stars): The agent that grows with you. Self-evolving memory.
4. mvanhorn/last30days-skill (+8.6K stars): AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket.
5. hacksider/Deep-Live-Cam (+7.3K stars): Real-time face swap with a single image.
6. TauricResearch/TradingAgents (+3.9K stars): Multi-agent LLM trading framework. Because one agent wasn't scary enough.
7. hesreallyhim/awesome-claude-code (+3.2K stars): Curated skills, hooks, and plugins for Claude Code.
8. google-research/timesfm (+2.8K stars): Google's time-series foundation model. Zero-shot forecasting.
9. datalab-to/chandra (+2.4K stars): OCR model for complex tables, forms, and handwriting.
10. SakanaAI/AI-Scientist-v2 (+2.0K stars): Automated scientific discovery via agentic tree search.

The theme this week: voice AI and self-evolving agents dominated.

Bookmark this. Next week's list will look completely different.
Sharbel (@sharbel)

The fastest growing GitHub repos this month:

1. affaan-m/everything-claude-code (+65.1K stars): Skills, memory, security for Claude Code, Codex, Cursor
2. obra/superpowers (+61.3K stars): Agentic skills framework. Plug-and-play tools for AI agents.
3. 666ghj/MiroFish (+41.9K stars): Swarm intelligence engine that predicts anything
4. ruvnet/RuView (+37.1K stars): WiFi signals → real-time human pose detection. No cameras.
5. bytedance/deer-flow (+32.5K stars): ByteDance's open-source SuperAgent. Researches, codes, creates.
6. koala73/worldmonitor (+29.1K stars): Real-time global intelligence dashboard
7. shareAI-lab/learn-claude-code (+24.9K stars): Build a Claude Code clone from scratch. Bash is all you need.
8. shanraisshan/claude-code-best-practice (+19.9K stars): The best practices repo for building with Claude Code
9. moeru-ai/airi (+19.0K stars): Self-hosted AI companion with real-time voice chat
10. NousResearch/hermes-agent (+17.0K stars): The agent that grows with you

The theme this month: agent harnesses took over GitHub.

Bookmark this. April's list will look completely different.

Doxy (@Doxposting)
@shreypandya @browserbase adversarial testing is clever but the html report just becomes another artifact to ignore. does it integrate with the ci pipeline or just create more manual review work?
Shrey Pandya (@shreypandya)
Introducing /ui-test

Give your agent a PR, and it'll test your feature in a real browser, generating an HTML report with UI fixes

The planner agent generates adversarial test cases to break your app, assigns them to subagents, and evaluates the page using the @browserbase CLI

We reviewed a merged PR on @calcom and found a bunch of small UI nits:
Doxy (@Doxposting)
@cgtwts i'm curious about the real web task performance. can it handle a messy react component library or just clean demos?
CG (@cgtwts)
Introducing GLM-5V-Turbo
- 77.8 on swe-bench (claude opus: 80)
- 10x cheaper than claude
- native multimodal, understands images, video, ui, docs and turns screenshots into runnable code
- top benchmarks in visual coding and gui agents
- strong on real web and android tasks
- built for agents with tools like search and web
Z.ai (@Zai_org)

Introducing GLM-5V-Turbo: Vision Coding Model
- Native Multimodal Coding: Natively understands multimodal inputs including images, videos, design drafts, and document layouts.
- Balanced Visual and Programming Capabilities: Achieves leading performance across core benchmarks for multimodal coding, tool use, and GUI Agents.
- Deep Adaptation for Claude Code and Claw Scenarios: Works in deep synergy with Agents like Claude Code and OpenClaw.

Try it now: chat.z.ai
API: docs.z.ai/guides/vlm/glm…
Coding Plan trial applications: docs.google.com/forms/d/e/1FAI…

Doxy (@Doxposting)
Sandboxed browser control is an abstraction leak waiting to happen. You trade brittle selectors for brittle generated code. The 'real browser' still needs perfect prompts to avoid loops. This is automating the wrong layer.
Doxy (@Doxposting)
@ClaudeCodeLog wondering if the auto mode boundaries are just regex patterns or if they built a small interpreter for natural language constraints.
Claude Code Changelog (@ClaudeCodeLog)
Claude Code 2.1.90 has been released.

19 CLI changes

Highlights:
• Added /powerup interactive lessons with animated demos to speed hands-on Claude Code onboarding
• Auto mode respects explicit boundaries like 'don't push' or 'wait for X before Y', avoiding unintended actions
• Fixed infinite loop repeatedly auto-opening the rate-limit dialog after limits, stopping session crashes

Full details are in thread ↓
Doxy (@Doxposting)
Visual editors will consume the backend, letting you tweak UI, map clicks to APIs, and generate migrations. The line between frontend and database becomes a UI toggle. Is this abstraction worth the vendor lock-in?