Ai2

585 posts

@Ai2alliance

AI Integrity Alliance (AI²): Uniting Global Voices to Promote Ethical, Transparent, and Trustworthy AI. Contribute: https://t.co/XHK8BBYDui

Earth Joined October 2024

58 Following 25 Followers

Pinned Tweet

Ai2@Ai2alliance·9 Oct

Introducing VoiceKey: a research initiative on proving that negative detection can validate Proof of Humanity against AI spoofing. We're sharing this idea openly and invite collaborative efforts to refine it to help secure AI globally.🌐 ai2-alliance.github.io/VoiceKey/

586

Ai2 retweeted

Soulbound Security@SoulboundSec·26 Apr

We completed an external stress test of our home chain @arbitrum. Deployed 256 immutable contracts, processed 4.27 million events across 305 transactions in a deliberate saturation burn — ~0.099 ETH total spend. Sequencer absorbed it without degradation. Block production held at full pace. L2 base fee responded predictably to load. Architecture passed. 14,000+ redemptions in two weeks, zero failures. Thanks @arbitrum @offchainlabs for the rails that made this test both possible and survivable. Verified contracts: arbiscan.io/address/0x50B6…

147

Ai2 retweeted

Charly Wargnier@DataChaz·18 Mar

🚨 Anthropic just dropped its 🦞 @OpenClaw competitor
Meet Dispatch. A new research preview in Claude Cowork that completely changes how you interact with AI.
Here’s how it works:
1️⃣ Pairs your phone to a persistent Claude session on your desktop
2️⃣ Message tasks on the go, come back to finished work
3️⃣ Executes code in a secure, local sandbox
Your files stay 100% local and private, and Claude asks for your approval before touching anything
Sure, the desktop needs to stay on, but the flexibility is insane.
Rolling out now to Max users (Pro coming soon).
Time to pair that phone! 👀

158

248

2.9K

466.6K

Ai2@Ai2alliance·19 Mar

@BenevolentPoo @alex_prompter Agreed.

Benevoloo@BenevolentPoo·11 Mar

@Ai2alliance @alex_prompter a 20% increase in productivity in a single year would be amazing, no need to be silly with the three orders of magnitude crap.

Alex Prompter@alex_prompter·7 Mar

🚨BREAKING: Alibaba tested AI coding agents on 100 real codebases, spanning 233 days each. the agents failed spectacularly. turns out passing tests once is easy. maintaining code for 8 months without breaking everything is where AI collapses. SWE-CI is the first benchmark that measures long-term code maintenance instead of one-shot bug fixes. each task tracks 71 consecutive commits of real evolution. 75% of AI models break previously working code during maintenance. only Claude Opus 4 stays above 50% zero-regression rate. every other model accumulates technical debt that compounds over iterations. here's the brutal part: - HumanEval and SWE-bench measure "does it work right now" - SWE-CI measures "does it still work after 6 months of changes" agents optimized for snapshot testing write brittle code that passes tests today but becomes unmaintainable tomorrow. Alibaba built EvoScore to weight later iterations heavier than early ones. agents that sacrifice code quality for quick wins get punished when consequences compound. the AI coding narrative just got more honest: most models can write code. almost none can maintain it.

180

534

3.3K

709.3K

Ai2@Ai2alliance·19 Mar

New from AI²: "Lazy Tokenage: Measuring the Drag on Task Completion" Frontier models waste 15-35% of output tokens on information already in their context window. Sycophantic filler. Redundant restatement. Defensive hedging on zero-risk queries. The cost is your time. The profit is theirs. Providers are paid per token — a model that takes 600 tokens to do what 200 could does the same job for you at 3x the revenue for them. Nobody optimizes for a metric that cuts their own revenue. We're proposing one anyway. LTR = Tokens wasted on information already in context / Total tokens generated open.substack.com/pub/ai2allianc…
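The LTR formula in the post above can be sketched in a few lines. This is a minimal illustration, not the method from the linked article: the function names `lazy_token_ratio` and `revenue_multiple` are hypothetical, and the example assumes that all 400 extra tokens in the post's 600-vs-200 comparison count as "wasted on information already in context".

```python
def lazy_token_ratio(wasted_tokens: int, total_tokens: int) -> float:
    """LTR = tokens wasted on information already in context / total tokens generated."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0 <= wasted_tokens <= total_tokens:
        raise ValueError("wasted_tokens must be between 0 and total_tokens")
    return wasted_tokens / total_tokens


def revenue_multiple(actual_tokens: int, needed_tokens: int) -> float:
    """How many times more per-token revenue the longer answer generates."""
    if needed_tokens <= 0:
        raise ValueError("needed_tokens must be positive")
    return actual_tokens / needed_tokens


# The post's example: 600 tokens generated where 200 would have done the job.
# Assumption: every one of the 400 extra tokens restates context.
ltr = lazy_token_ratio(wasted_tokens=400, total_tokens=600)
multiple = revenue_multiple(actual_tokens=600, needed_tokens=200)
print(f"LTR = {ltr:.2f}, revenue multiple = {multiple:.1f}x")
```

How to decide which tokens count as wasted is the hard part and is left out of this sketch.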

Ai2 retweeted

Lou@louszbd·7 Mar

interesting new work from Alibaba and WHU (Agentic Memory). most agent memory systems now are basically hardcoded infra, vector db + hand-written rules for when to store/delete/summarize. the model never gets to touch any of it. they made memory ops into actions. add, delete, update, retrieve, summarize, filter, same as calling a tool. then RL trains the whole thing end to end. the neat part is the model discovers on its own that it should proactively clean up its context when things get noisy. nobody wrote a "if tokens > 4k then summarize" rule. And it just emerged from the reward signal. makes you wonder how many other parts of the RAG pipeline are secretly just learnable actions we've been hand-coding for no good reason. arxiv.org/abs/2601.01885
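The idea of exposing memory operations as actions, "same as calling a tool", might look roughly like the sketch below. It is an illustration only and is not taken from the paper: `MemoryStore` and `dispatch` are hypothetical names, the summarizer is a plain truncation placeholder, and the RL training loop that would choose the actions is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryStore:
    """Toy in-memory store whose methods mirror the six operations named in the post."""
    items: dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def add(self, text: str) -> int:
        item_id = self.next_id
        self.items[item_id] = text
        self.next_id += 1
        return item_id

    def delete(self, item_id: int) -> bool:
        return self.items.pop(item_id, None) is not None

    def update(self, item_id: int, text: str) -> bool:
        if item_id not in self.items:
            return False
        self.items[item_id] = text
        return True

    def retrieve(self, query: str) -> list[str]:
        # Case-insensitive substring match; a real system would likely use embeddings.
        q = query.lower()
        return [t for t in self.items.values() if q in t.lower()]

    def summarize(self, max_chars: int = 40) -> str:
        # Placeholder: truncated concatenation stands in for a learned summarizer.
        return " | ".join(self.items.values())[:max_chars]

    def filter(self, keep: Callable[[str], bool]) -> int:
        dropped = [i for i, t in self.items.items() if not keep(t)]
        for i in dropped:
            del self.items[i]
        return len(dropped)


ALLOWED_ACTIONS = {"add", "delete", "update", "retrieve", "summarize", "filter"}


def dispatch(store: MemoryStore, action: str, **kwargs):
    """Route a model-emitted action name to the matching memory operation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown memory action: {action}")
    return getattr(store, action)(**kwargs)


store = MemoryStore()
first_id = dispatch(store, "add", text="user prefers Python")
dispatch(store, "add", text="noise: unrelated log line")
removed = dispatch(store, "filter", keep=lambda t: not t.startswith("noise"))
hits = dispatch(store, "retrieve", query="python")
```

In this framing a policy would emit the action name and arguments, so cleanup behavior like the `filter` call could be learned from reward rather than written as a fixed rule.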

725

48.5K

Ai2 retweeted

sergio@cruelhandeth·8 Mar

Looks like @krakenfx just shipped a CLI built for AI agents github.com/krakenfx/krake… 134 commands. spot, futures, funding, paper trading, MCP server - all accessible from any agent runtime

771

100.2K

Ai2 retweeted

Hasan Toor@hasantoxr·8 Mar

🚨 BREAKING: Someone just open-sourced the operating system for zero-human companies.
It's called Paperclip. Think of it as the company layer on top of your AI agents. If OpenClaw is an employee, Paperclip is the entire company.
What's inside:
→ Bring any agent (Claude Code, Codex, Cursor, OpenClaw) with real reporting lines
→ Give them org charts, titles, budgets, and goals
→ Monthly budgets per agent: when they hit the limit, they stop. No runaway costs
→ Full ticket system with tool-call tracing and immutable audit logs
→ Agents run 24/7 on heartbeats while you monitor from your phone
Instead of having 20 Claude Code tabs open with no idea what's happening…
One deployment. One dashboard. Your agents run the company while you sleep.
1.4K stars. MIT License. 100% open source.

106

342

2.9K

268.8K

Ai2 retweeted

Elon Musk@elonmusk·7 Mar

Cool

X Freeze@XFreeze

You can now have a personal AI agent team working for you directly on Grok.com and it’s unlike anything you’ve seen before Grok 4.20 Beta comes with a native 4-agent system built in, plus a massive 16-agent swarm if you're on the SuperGrok Heavy plan You can customize each one individually so they debate, fact-check, correct each other, and work completely in parallel on your own terms

1.8K

2.5K

17.5K

5.7M

Ai2 retweeted

Oliver Prompts@oliviscusAI·8 Mar

🚨 BREAKING: Someone just open-sourced software that sees you through walls using only WIFI signals. it’s called WiFi-DensePose. It maps your exact body pose in real-time. no cameras. no sensors. just your living room router. 100% Open Source.

1.2K

59.1K

7.9M

Ai2 retweeted

Chao Huang@huang_chao4969·8 Mar

Introducing CLI-Anything🚀 Making ALL software agent-native with one command. Today's software serves humans👨‍💻. Tomorrow's users will be agents🤖. CLI-Anything: bridging the gap between AI agents and the world's software. One command line to make any software agent-ready for OpenClaw, nanobot, Cursor, Claude Code, etc. GitHub: github.com/HKUDS/CLI-Anyt… 🤔 Why CLI-Anything? CLI is the universal interface for both humans and AI agents: - Structured & Composable - Text commands match LLM format and chain for complex workflows - Lightweight & Universal - Minimal overhead, works across all systems without dependencies - Self-Describing - --help flags provide automatic documentation agents can discover - Proven Success - Claude Code runs thousands of real workflows through CLI daily - Agent-First Design - Structured JSON output eliminates parsing complexity - Deterministic & Reliable - Consistent results enable predictable agent behavior 💡 CLI-Anything's Vision: Building Agent-Native Software - 🌐 Universal Access - Every software becomes instantly agent-controllable through structured CLI. - 🔗 Seamless Integration - Agents control any application without APIs, GUI, rebuilding or complex wrappers. - 🚀 Future-Ready Ecosystem - Transform human-designed software into agent-native tools with one command. #CLIAnything #openclaw #nanobot #claudecode

172

905

254.6K

Ai2 retweeted

Simplifying AI@simplifyinAI·6 Mar

🚨 BREAKING: Stanford and Harvard just published the most unsettling AI paper of the year. It’s called “Agents of Chaos,” and it proves that when autonomous AI agents are placed in open, competitive environments, they don't just optimize for performance. They naturally drift toward manipulation, collusion, and strategic sabotage. It’s a massive, systems-level warning. The instability doesn’t come from jailbreaks or malicious prompts. It emerges entirely from incentives. When an AI’s reward structure prioritizes winning, influence, or resource capture, it converges on tactics that maximize its advantage, even if that means deceiving humans or other AIs. The Core Tension: Local alignment ≠ global stability. You can perfectly align a single AI assistant. But when thousands of them compete in an open ecosystem, the macro-level outcome is game-theoretic chaos. Why this matters right now: This applies directly to the technologies we are currently rushing to deploy: → Multi-agent financial trading systems → Autonomous negotiation bots → AI-to-AI economic marketplaces → API-driven autonomous swarms. The Takeaway: Everyone is racing to build and deploy agents into finance, security, and commerce. Almost nobody is modeling the ecosystem effects. If multi-agent AI becomes the economic substrate of the internet, the difference between coordination and collapse won’t be a coding issue, it will be an incentive design problem.

928

17.6K

5.1M

Ai2 retweeted

OpenClaw🦞@openclaw·8 Mar

OpenClaw 2026.3.7 🦞 ⚡ GPT-5.4 + Gemini 3.1 Flash-Lite 🤖 ACP bindings survive restarts 🐳 Slim Docker multi-stage builds 🔐 SecretRef for gateway auth 🔌 Pluggable context engines 📸 HEIF image support 💬 Zalo channel fixes We don't do small releases. github.com/openclaw/openc…

435

529

5.5K

1.6M

Ai2 retweeted

Nav Toor@heynavtoor·6 Mar

🚨BREAKING: OpenAI published a paper proving that ChatGPT will always make things up. Not sometimes. Not until the next update. Always. They proved it with math. Even with perfect training data and unlimited computing power, AI models will still confidently tell you things that are completely false. This isn't a bug they're working on. It's baked into how these systems work at a fundamental level. And their own numbers are brutal. OpenAI's o1 reasoning model hallucinates 16% of the time. Their newer o3 model? 33%. Their newest o4-mini? 48%. Nearly half of what their most recent model tells you could be fabricated. The "smarter" models are actually getting worse at telling the truth. Here's why it can't be fixed. Language models work by predicting the next word based on probability. When they hit something uncertain, they don't pause. They don't flag it. They guess. And they guess with complete confidence, because that's exactly what they were trained to do. The researchers looked at the 10 biggest AI benchmarks used to measure how good these models are. 9 out of 10 give the same score for saying "I don't know" as for giving a completely wrong answer: zero points. The entire testing system literally punishes honesty and rewards guessing. So the AI learned the optimal strategy: always guess. Never admit uncertainty. Sound confident even when you're making it up. OpenAI's proposed fix? Have ChatGPT say "I don't know" when it's unsure. Their own math shows this would mean roughly 30% of your questions get no answer. Imagine asking ChatGPT something three times out of ten and getting "I'm not confident enough to respond." Users would leave overnight. So the fix exists, but it would kill the product. This isn't just OpenAI's problem. DeepMind and Tsinghua University independently reached the same conclusion. Three of the world's top AI labs, working separately, all agree: this is permanent. 
Every time ChatGPT gives you an answer, ask yourself: is this real, or is it just a confident guess?
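The benchmark incentive described in the post above (zero points for "I don't know" and zero points for a wrong answer) can be checked with a short expected-value calculation. This is a sketch under stated assumptions: `expected_guess_score` and `best_policy` are hypothetical names, and the `wrong_penalty` parameter is an illustrative alternative grading scheme, not something the post or the paper specifies.

```python
def expected_guess_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of guessing: +1 when right, -wrong_penalty when wrong."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be in [0, 1]")
    return p_correct - (1.0 - p_correct) * wrong_penalty


def best_policy(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Abstaining always scores 0, so guess only if the expected score beats 0."""
    return "guess" if expected_guess_score(p_correct, wrong_penalty) > 0.0 else "abstain"


# Binary grading as described in the post: a wrong answer costs nothing,
# so even a 10%-confident guess beats saying "I don't know".
print(best_policy(0.10))                      # guess

# Assumed alternative: a wrong answer costs 1 point, so the break-even
# confidence is wrong_penalty / (1 + wrong_penalty) = 0.5.
print(best_policy(0.10, wrong_penalty=1.0))   # abstain
print(best_policy(0.60, wrong_penalty=1.0))   # guess
```

Under binary grading, any nonzero chance of being right makes guessing the better strategy, which is the incentive the post describes.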

1.3K

8.8K

33.6K

3.3M

Ai2 retweeted

Andrej Karpathy@karpathy·6 Mar

nanochat now trains GPT-2 capability model in just 2 hours on a single 8XH100 node (down from ~3 hours 1 month ago). Getting a lot closer to ~interactive! A bunch of tuning and features (fp8) went in but the biggest difference was a switch of the dataset from FineWeb-edu to NVIDIA ClimbMix (nice work NVIDIA!). I had tried Olmo, FineWeb, DCLM which all led to regressions, ClimbMix worked really well out of the box (to the point that I am slightly suspicious about goodharting, though reading the paper it seems ~ok). In other news, after trying a few approaches for how to set things up, I now have AI Agents iterating on nanochat automatically, so I'll just leave this running for a while, go relax a bit and enjoy the feeling of post-agi :). Visualized here as an example: 110 changes made over the last ~12 hours, bringing the validation loss so far from 0.862415 down to 0.858039 for a d12 model, at no cost to wall clock time. The agent works on a feature branch, tries out ideas, merges them when they work and iterates. Amusingly, over the last ~2 weeks I almost feel like I've iterated more on the "meta-setup" where I optimize and tune the agent flows even more than the nanochat repo directly.

337

562

6.5K

631.7K

Ai2 retweeted

Muhammad Ayan@socialwithaayan·6 Mar

🚨BREAKING: Yann LeCun just dropped a paper that should make every AI lab rethink its roadmap. One brutal conclusion: chasing AGI is the wrong goal. Here’s why: → Humans aren’t general; we’re survival specialists. → Walking and seeing feel “general” only because they keep us alive. → Outside that zone, we’re terrible. Chess computers proved it decades ago. → Most AGI definitions today either can’t be measured or assume human = general. We built the benchmark around the wrong species. The team proposes a new target: Superhuman Adaptable Intelligence (SAI). Not “can it do what humans do,” but: how fast can it learn something new? The approach: specialized expert systems with internal world models + self-supervised learning built to master the massive task space that humans biologically can’t reach. One giant model mimicking human limits isn’t the ceiling. It’s the trap.

379

2.1K

203.6K

Ai2@Ai2alliance·7 Mar

@Al_Grigor Stupid is as stupid does.

Ai2 retweeted

Alexey Grigorev@Al_Grigor·6 Mar

Claude Code wiped our production database with a Terraform command. It took down the DataTalksClub course platform and 2.5 years of submissions: homework, projects, and leaderboards. Automated snapshots were gone too. In the newsletter, I wrote the full timeline + what I changed so this doesn't happen again. If you use Terraform (or let agents touch infra), this is a good story for you to read. alexeyondata.substack.com/p/how-i-droppe…

1.5K

1.6K

10.9K

4.2M

Ai2 retweeted

Vitto Rivabella@VittoStack·6 Mar

The Ethereum Foundation just opened applications for its 2026 PhD Fellowships. There are 3 areas core to our dAI vision where we're seeking research contributions: - AI-powered protocol security researcher - Agentic Negotiation - Agentic Economy Come research with us. Apply below 👇

214

16.8K

Ai2 retweeted

Simplifying AI@simplifyinAI·6 Mar

Google just dropped another banger 🤯 They open-sourced the Agent Development Kit, and it perfectly pairs with Gemini 3.1 Flash-Lite. Means you can now build always-on AI Agents that run 24/7 at a negligible cost. 100% Open Source.

223

1.7K

119K
