
BlockchainGirl
9.3K posts

@BlockchainGirll
Obsessed with all things #Blockchain. ✨ #Bitcoin and #AI. Are you IN or OUT? 🇬🇷 #Web3







Can't go public or sell yourself? Try selling your codebase to an AI lab as training data! In this morning's AI Agenda, we get into this growing trend, as data curation firms like Turing and AfterQuery pick up failed startups' codebases. theinformation.com/articles/turin…





🔥 𝗘𝗻𝘁𝗲𝗿𝗽𝗿𝗶𝘀𝗲𝗢𝗽𝘀-𝗚𝘆𝗺 𝗶𝘀 𝘁𝗮𝗸𝗶𝗻𝗴 𝗼𝗳𝗳: 2K downloads in 3 days (trending #6 dataset + #3 paper of the day) 🏆

So we re-ran the leaderboard on the 𝗹𝗮𝘁𝗲𝘀𝘁 𝗳𝗿𝗼𝗻𝘁𝗶𝗲𝗿 𝗰𝗹𝗼𝘀𝗲𝗱 𝗺𝗼𝗱𝗲𝗹𝘀… and the results were promising.
✅ Claude versions show a meaningful jump in reliability on enterprise tasks.
✅ Gemini 3.1 Pro is catching up fast, now much closer to Sonnet 4.6 than earlier releases.

And yet, the bigger takeaway is still the same:
- Big room for improvement on enterprise-grade agentic tasks.
- These workflows punish "seemingly correct." One wrong default, one policy miss, one unintended side effect... and the task fails.

📢 𝗖𝗮𝗹𝗹𝗼𝘂𝘁 (especially if you’re working on agents): As we prepare our next NeurIPS/COLM submissions, try your agents on EnterpriseOps-Gym and see how they hold up on realistic, policy-constrained, long-horizon tasks.

🌐 Website: enterpriseops-gym.github.io
🤗 Dataset: huggingface.co/datasets/Servi…

@ServiceNowRSRCH, @sagardavasam, @turingcom, @turingcomdev, @Mila_Quebec, @shiva_malay, @PShravannayak




Oscars week. Great time to talk about the first crypto project to win an Emmy. White Rabbit was crowdfunded on Ethereum, community-directed, and never pitched to a studio. @pplpleasr1 on building Shibuya.

0:00 Starting with a Fortune cover
1:49 White Rabbit and winning an Emmy
5:32 From Dickens to onchain storytelling
6:52 Why crypto unlocks capital formation
8:42 Lightning Round

🧵 Introducing 𝐄𝐧𝐭𝐞𝐫𝐩𝐫𝐢𝐬𝐞𝐎𝐩𝐬-𝐆𝐲𝐦 🚀: a rigorous new benchmark for stateful agentic planning and tool use in real enterprise environments.

1,150 expert-curated tasks · 512 tools · 164 DB tables · 8 domains. Every task is verified by hand-written SQL that checks goal completion, state integrity, and policy compliance 🔥

𝐓𝐡𝐞 𝐡𝐞𝐚𝐝𝐥𝐢𝐧𝐞: Claude Opus 4.5, our best-performing model, succeeds on just 37.4% of tasks. With oracle tool access. No tool discovery required.

📄 arxiv.org/abs/2603.13594 (trending #4 on daily-papers)
🌐 enterpriseops-gym.github.io
🤗 huggingface.co/datasets/Servi…
💻 github.com/ServiceNow/Ent…
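
For readers wondering what "verified by hand-written SQL" could look like in practice, here is a minimal sketch. It is not the benchmark's actual verifier: the `tickets` table, the "close ticket 42" task, and the three queries are all hypothetical, chosen only to illustrate one check each for goal completion, state integrity, and policy compliance.

```python
import sqlite3

# Hypothetical checks for an imagined task: "close ticket 42 with a resolution note".
# Each query returns 1 (pass) or 0 (fail) against the final database state.
CHECKS = {
    # Goal completion: the target ticket ended up closed.
    "goal": "SELECT COUNT(*) = 1 FROM tickets WHERE id = 42 AND status = 'closed'",
    # State integrity: no other ticket changed status as a side effect.
    "integrity": "SELECT COUNT(*) = 0 FROM tickets WHERE id != 42 AND status != 'open'",
    # Policy compliance: every closed ticket carries a non-empty note.
    "policy": "SELECT COUNT(*) = 0 FROM tickets "
              "WHERE status = 'closed' AND (note IS NULL OR note = '')",
}

def verify_task(conn):
    """Run every check and return a dict of check name -> passed."""
    return {name: bool(conn.execute(sql).fetchone()[0]) for name, sql in CHECKS.items()}

def make_db(rows):
    """Build an in-memory database standing in for the post-run environment state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, status TEXT, note TEXT)")
    conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", rows)
    return conn

# An agent that closed only ticket 42 and left a note.
good = verify_task(make_db([(41, "open", None), (42, "closed", "fixed")]))
# An agent that reached the goal but also closed ticket 41 and skipped the note.
bad = verify_task(make_db([(41, "closed", "oops"), (42, "closed", "")]))

# A task counts as solved only if every check passes.
task_passed = all(good.values())
```

In this sketch the second run satisfies the goal check yet fails the other two, so the task as a whole fails, which is the "seemingly correct" failure mode in miniature.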