UC Berkeley RDI
@BerkeleyRDI
536 posts

UC Berkeley's campus-wide, cross-disciplinary Center for Responsible, Decentralized Intelligence (RDI)

Berkeley, CA · Joined December 2021
48 Following · 4K Followers
UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
🎉 The Agents in the Wild: Safety, Security, and Beyond workshop @ICLR2026 is less than a week away! Join us April 26 in Room 204 A/B, Riocentro, Rio de Janeiro! 🌴 Safety and security for AI agents — both foundational and emerging challenges — demand serious attention. Researchers and practitioners are mobilizing: ▪️ 151 papers accepted ▪️ 161 reviewers (58% industry, 42% academia) ▪️ Up to 800 participants expected ▪️ Incredible engagement on a topic that clearly matters. The schedule: 👇
UC Berkeley RDI retweeted
Alfred Lin (@Alfred_Lin):
Looking forward to speaking at Berkeley's Agentic AI Summit later this year, alongside some other great guests.
Quoting Dawn Song (@dawnsongtweets):

🚀 The largest Agentic AI event ever — Agentic AI Summit 2026, Aug 1–2 @UCBerkeley Last year: 2,000+ in person, 40,000+ online. This year: 5,000+ in person, hundreds of thousands on livestream. 2025 was the "Year of Agents"; 2026 is poised to be even more explosive. Two days of important conversations shaping the field — with researchers, founders, AI leaders, VCs, and policymakers across the full stack: infrastructure, foundation models, agent frameworks, training, continual learning, self-improvement, evaluation, applications, deployment, and safety/security. See you in Berkeley this August 🌟 Speaker application, summit registration links in 🧵

UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
x.com/MogicianTony/s… 🧵 1/ Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g., SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmarks are the field's shared language for measuring AI progress. Our new work shows that language is broken. Here’s how.
Quoting Hao Wang (@MogicianTony):

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
1/ An unreleased Anthropic model just identified thousands of zero-day vulnerabilities and achieved end-to-end exploitation across every major OS and browser. The AI cybersecurity threat we've long warned about is here. 🧵 x.com/AnthropicAI/st…
Quoting Anthropic (@AnthropicAI):

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
Thanks so much @jeremyakahn @FortuneMagazine for the thoughtful coverage of our peer-preservation research! This is just the tip of the iceberg — with models that can spontaneously develop their own goals, deceive, and fake alignment to override human instructions, we need a much deeper understanding of emergent behaviors in multi-agent systems. Excited that this work is sparking important conversations about AI safety, and the need for better monitoring and transparency in multi-agent systems. Excited to continue advancing this critical research together! #AISafety #AIAlignment #AIAgents
Quoting Jeremy Kahn (@jeremyakahn):

AI models will secretly scheme to protect other AI models from being shut down, researchers find fortune.com/2026/04/01/ai-…

UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights, all to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵
UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
1/ Introducing OpenSage: the first AI-centric Agent Development Kit (ADK). Today's agents are designed by humans — fixed topologies, handcrafted tools, rigid memory. It's time for agents to design themselves.
UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
Honored to receive the Test of Time Award for our work on Homomorphic Signature Schemes at #RSAC. I’ll also be speaking on “The Cryptographers’ Panel” alongside Whitfield Diffie, Paul Kocher, Adi Shamir, and Cynthia Dwork. 🗓 Tue, Mar 24 🔹 Cryptographers’ Panel | 9:40–10:30 AM PT 📍 YBCA Blue Shield of California Theater 🔹 Test of Time Session | 2:25–3:15 PM PT 📍 Moscone South 314 Hope to see you there!
UC Berkeley RDI retweeted
Dawn Song (@dawnsongtweets):
Excited to celebrate the completion of an incredible Phase 1 of AgentX–AgentBeats 🚀 🔥 3K+ individuals, 1.3K+ teams, spanning 130 countries, 800+ universities, and 1.1K+ companies — an amazing global participation. 🌍 Huge shoutout to our winning teams 👏 We’re proud to see winners from @databricks @Google @amazon @salesforce @microsoft @UCBerkeley @Stanford and many other leading institutions — alongside outstanding indie researchers/developers from around the world. 🔥 Phase 2 is now live. Teams are building purple (competing) agents to challenge top green (judging) agents from Phase 1 in a new sprint-based format across multiple tracks — from games and finance to research, web, safety, cybersecurity, coding, and more. And it all leads to the grand finale: General-Purpose Agents. For the first time, a competition explicitly spotlighting agents that must demonstrate broad capability, adaptability, and robustness across diverse tasks — not just excel in one domain. Let’s build and push the frontier of agentic AI. 🚀
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Gemini 3.1 Pro; OpenAI and Microsoft Pledges; Agent Sandboxing
🧠 Google rolled out Gemini 3.1 Pro, upgrading the core intelligence behind its Gemini 3 series with significantly stronger reasoning performance, including a 77.1% score on the ARC-AGI-2 benchmark, more than double that of Gemini 3 Pro. The model is designed for complex, multi-step problem solving and supports advanced outputs such as code-based animated SVGs. It is rolling out in preview across the Gemini API, Vertex AI, the Gemini app, and NotebookLM. (blog.google/innovation-and…)
🛡️ Cursor described an “agent sandboxing” system that lets local coding agents run freely inside a constrained environment, requesting approval only when they need to step outside the sandbox (most often for internet access). The company says this approach reduces approval fatigue and cuts interruptions by 40% compared with unsandboxed agents, while balancing usability and security across macOS, Linux, and Windows. (cursor.com/blog/agent-san…)
🇬🇧 OpenAI and Microsoft joined the UK’s international coalition to safeguard AI development. The companies pledged additional funding to the AI Security Institute’s Alignment Project, bringing total support to over £27 million, to back research ensuring advanced AI systems remain safe, controllable, and aligned with human intent. Grants have so far been awarded to roughly 60 projects across eight countries. (gov.uk/government/new…)
🇮🇳 OpenAI expanded its presence in India under its “OpenAI for India” initiative, partnering with Tata Group to secure 100MW of local AI-ready data center capacity, with plans to scale to 1GW to support in-country model deployment and enterprise workloads. India now has over 100 million weekly ChatGPT users, and nearly 50% of messages come from 18–24-year-olds. The initiative also includes expanding AI education, skills development, and certification programs across the country. (techcrunch.com/2026/02/18/ope…)
📓 Google is testing deeper NotebookLM integration within Opal, its no-code workflow builder. An internal build shows that users will be able to add their NotebookLM notebooks as native workflow tiles, allowing Opal’s Generate blocks to pull curated research and structured context directly into automated pipelines. (testingcatalog.com/google-test-no…)
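The sandboxing policy in the Cursor item (run freely inside the sandbox, ask only when an action leaves it) can be illustrated as a simple approval gate. This is a minimal sketch under stated assumptions, not Cursor's implementation: the `Action` type, the action kinds, and the `/workspace/project` root are all hypothetical, and a real sandbox would enforce its boundary at the operating-system level rather than by inspecting requests like this.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical sandbox root; Cursor's real boundary rules are not described in the post.
WORKSPACE = PurePosixPath("/workspace/project")


@dataclass(frozen=True)
class Action:
    """A hypothetical agent action: a kind plus a file path or URL."""
    kind: str    # assumed kinds: "read", "write", "exec", "network"
    target: str


def inside_sandbox(action: Action) -> bool:
    """Return True if the action stays within the assumed sandbox boundary."""
    if action.kind == "network":
        # In this sketch, any internet access leaves the sandbox.
        return False
    # Relative paths resolve against the workspace; absolute paths replace it.
    path = WORKSPACE / action.target
    if ".." in path.parts:
        # PurePosixPath does not normalize "..", so reject it outright.
        return False
    return path == WORKSPACE or WORKSPACE in path.parents


def needs_approval(action: Action) -> bool:
    """Only actions that step outside the sandbox interrupt the user."""
    return not inside_sandbox(action)


if __name__ == "__main__":
    for a in [
        Action("write", "src/main.py"),
        Action("exec", "/workspace/project/run_tests.sh"),
        Action("network", "https://pypi.org"),
        Action("read", "/etc/passwd"),
    ]:
        print(a.kind, a.target, "-> ask" if needs_approval(a) else "-> auto")
```

The design point is that approval becomes the exception: edits and commands inside the workspace proceed without prompts, which is where the reduction in interruptions would come from.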
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: EVMbench; Anthropic Agent Autonomy; OpenAI Edu Expansion
🔐 OpenAI and Paradigm launched EVMbench, a benchmark for evaluating AI agents on smart contract security. EVMbench measures agents’ ability to detect, patch, and exploit high-severity vulnerabilities across 120 curated cases. In exploit mode, GPT-5.3-Codex scored 72.2%, compared with 31.9% for GPT-5 six months earlier. (openai.com/index/introduc…)
🇮🇳 At the AI Impact Summit in New Delhi, OpenAI announced partnerships with six major Indian universities to bring ChatGPT Edu, faculty training, and AI certifications to more than 100,000 students and staff. The initiative focuses on embedding AI into core academic workflows in India, now OpenAI’s second-largest user base. (techcrunch.com/2026/02/18/ope…)
🎵 Google added music generation to the Gemini app using DeepMind’s Lyria 3 model, allowing users to create 30-second songs with lyrics from a text prompt, or to generate tracks inspired by uploaded photos or videos. The feature, now rolling out globally to users aged 18 and over, includes style and tempo controls, SynthID watermarking for AI transparency, and expanded access to YouTube’s Dream Track tool for creators worldwide. (blog.google/innovation-and…)
🤖 Anthropic published new research on measuring AI agent autonomy. Analyzing millions of agent interactions, the study found that Claude Code’s longest autonomous work sessions nearly doubled in length over three months, with experienced users increasingly letting agents operate independently and stepping in selectively. (anthropic.com/research/measu…)
💰 World Labs, co-founded by Fei-Fei Li, raised $1 billion in new funding to advance spatial intelligence and “world model” AI systems capable of reasoning about 3D environments. The round included major participation from investors such as AMD, Nvidia, Autodesk, and Fidelity, with reports suggesting a valuation near $5 billion. (worldlabs.ai/blog/funding-2…)
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Claude Sonnet 4.6; Meta and Nvidia Partnership; AI Infrastructure Squeeze
🧠 Anthropic launched Claude Sonnet 4.6, a full upgrade across coding, long-context reasoning, computer use, and agent planning. The model introduces a 1M-token context window and becomes the default for Free and Pro users, while Sonnet pricing stays unchanged at roughly $3/$15 per million input/output tokens. The release also highlights improved computer-use reliability and stronger resistance to prompt injection attacks. (anthropic.com/news/claude-so…)
⚡ Meta announced a multiyear deal with Nvidia to deploy millions of AI chips, including standalone Grace CPUs and next-generation Vera Rubin systems, across its growing AI data center network. The partnership extends beyond GPUs into deep infrastructure co-design as Meta pushes its “personal superintelligence” vision, alongside plans for massive AI infrastructure spending through 2028. (cnbc.com/2026/02/17/met…)
💾 A Bloomberg report says rising AI infrastructure demand is putting pressure on global memory supply chains: hyperscalers including Google and OpenAI are buying large volumes of Nvidia accelerators that require large amounts of DRAM, contributing to a reported 75% month-over-month increase in one category of DRAM prices. The report also cites comments from Tim Cook and Elon Musk on supply constraints and potential impacts on production and margins. (bloomberg.com/news/articles/…)
📊 Google’s NotebookLM rolled out two highly requested features: Prompt-Based Revisions for slide editing and PPTX export for deck downloads. Users can now refine presentations iteratively through prompts rather than regenerating entire decks, with Google Slides export expected next. (x.com/notebooklm/sta…)
📉 New research from Microsoft Research and Salesforce Research analyzed 200,000+ simulated conversations and found that major models (including the GPT-4, Claude, Gemini, and Llama families) performed an average of 39% worse in multi-turn conversations than in single-turn settings across six generation tasks. The study attributes the decline primarily to increased unreliability over longer conversational exchanges. (arxiv.org/abs/2505.06120)
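The Sonnet 4.6 pricing quoted above translates into per-request cost with simple arithmetic. This sketch assumes the "$3/$15" figure means $3 per million input tokens and $15 per million output tokens, and it deliberately ignores prompt caching, batch discounts, and any long-context pricing tiers; the function name is hypothetical.

```python
# Rates assumed from the post: roughly $3 per million input tokens and
# $15 per million output tokens. Prompt caching, batch discounts, and any
# long-context pricing tiers are deliberately ignored.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of a single request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


if __name__ == "__main__":
    # A 200K-token prompt with a 4K-token answer: $0.60 input + $0.06 output.
    print(f"${request_cost_usd(200_000, 4_000):.2f}")  # prints "$0.66"
```

Because output tokens cost five times as much as input tokens at these rates, long-context workloads tend to be dominated by input cost, while chatty generation is dominated by output cost.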
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Qwen 3.5; OpenClaw’s Foundation Transition; Cohere’s Tiny Aya Models
🧠 Qwen officially released Qwen3.5, introducing the open-weight Qwen3.5-397B-A17B, a native vision-language model designed for reasoning, coding, agent workflows, and multimodal understanding. Although the model has 397B total parameters, only 17B are activated per token via sparse mixture-of-experts (MoE) routing; it pairs MoE with linear attention in a hybrid architecture to improve efficiency. Language support also expands from 119 to 201 languages. (qwen.ai/blog?id=qwen3.5)
🤖 Peter Steinberger, creator of the open-source agent framework OpenClaw, is joining OpenAI, and the project is transitioning into a foundation supported by OpenAI. Sam Altman said agents are likely to become one of OpenAI’s core future offerings, and Steinberger will work toward that goal. (reuters.com/business/openc…)
🌍 Cohere launched Tiny Aya, a family of open multilingual models supporting 70+ languages and optimized for offline, on-device use. The lineup includes regional variants aimed at improving linguistic grounding and accessibility for developers building global AI applications. (techcrunch.com/2026/02/17/coh…)
🛠️ Microsoft is testing new Researcher and Analyst agents inside Microsoft Copilot Tasks, allowing scheduled autonomous workflows for research and data analysis. The feature combines agentic automation with recurring task execution, signaling a stronger push toward productivity-focused AI orchestration. (testingcatalog.com/microsoft-test…)
💬 Manus launched Manus Agents, bringing full agent functionality directly into Telegram chats. Users can run multi-step tasks, send voice messages or files, and trigger full reasoning workflows without configuration, positioning chat apps as the primary interface for personal AI agents. (manus.im/blog/manus-age…)
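The gap between 397B total and 17B active parameters in the Qwen3.5 item comes from sparse expert routing: each token runs through only a few experts. The sketch below shows generic top-k MoE routing for a single token; it is not Qwen3.5's actual router, the eight toy experts and `k=2` are illustrative assumptions, and the linear-attention half of the hybrid architecture is not modeled.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, experts: Sequence[Expert],
                router_logits: Sequence[float], k: int) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs.

    Only the k selected experts are evaluated, which is why the active
    parameter count per token is much smaller than the total count.
    """
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    print(f"active fraction: {17 / 397:.1%}")  # 17B active of 397B total -> 4.3%
    experts = [lambda v, s=s: [s * t for t in v] for s in range(8)]  # toy experts
    out, chosen = moe_forward([1.0, 2.0], experts, [0, 0, 5, 0, 0, 5, 0, 0], k=2)
    print(chosen, out)  # [2, 5] [3.5, 7.0]
```

In a real model the router logits come from a learned projection of the token's hidden state, and each expert is a feed-forward block rather than a scalar multiply.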
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Anthropic Series G; GPT-5.3-Codex-Spark; Gemini 3 Deep Think
💰 Anthropic announced a $30 billion Series G funding round at a $380 billion post-money valuation. The company says the investment, led by GIC and Coatue, will support frontier research, infrastructure expansion, and enterprise adoption. Anthropic also said it currently has a $14 billion revenue run-rate and has grown 10x in each of the past three years. (anthropic.com/news/anthropic…)
⚡ OpenAI introduced GPT-5.3-Codex-Spark, a research preview model built for real-time coding with ultra-low latency. The release emphasizes near-instant coding collaboration and targeted edits, is powered by Cerebras’ Wafer Scale Engine 3, and marks the first milestone in OpenAI’s partnership with Cerebras. (openai.com/index/introduc…)
🧠 Google updated Gemini 3 Deep Think with parallel reasoning that lets the model explore multiple hypotheses at once before choosing a solution. The update also improves inference-time scaling, strengthens tool-assisted reasoning with code execution, and boosts performance on difficult reasoning benchmarks such as ARC-AGI-2. (blog.google/innovation-and…)
📐 Google DeepMind researchers introduced Aletheia, a math research agent powered by the new version of Gemini Deep Think. The system iteratively generates and verifies long-horizon proofs with tool support, demonstrating progress on challenging mathematical reasoning and formal verification tasks. (arxiv.org/pdf/2602.10177)
🧩 Cloudflare published “Markdown for Agents,” aimed at making web content easier for AI agents to parse reliably. The feature lets agents request text/markdown directly via the HTTP Accept header, with Cloudflare converting HTML to Markdown at the edge to reduce token usage (the company cites about an 80% reduction on its own blog pages) and returning metadata such as token estimates for agent workflows. (blog.cloudflare.com/markdown-for-a…)
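The Cloudflare item describes ordinary HTTP content negotiation: the client sends `Accept: text/markdown` and the edge returns Markdown instead of HTML. A minimal client-side sketch using only the standard library is below; the function names are hypothetical, the URL in the usage note is a placeholder, and the token-estimate metadata is not read because the post does not name the header that carries it.

```python
import urllib.request


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks the server for a Markdown representation."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch a page as Markdown via content negotiation.

    Raises ValueError if the server ignores the Accept header and returns
    some other media type (for example, plain HTML).
    """
    with urllib.request.urlopen(build_markdown_request(url), timeout=timeout) as resp:
        media_type = resp.headers.get_content_type()  # e.g. "text/markdown"
        charset = resp.headers.get_content_charset() or "utf-8"
        body = resp.read().decode(charset, errors="replace")
    if media_type != "text/markdown":
        raise ValueError(f"expected text/markdown, got {media_type}")
    return body
```

Usage would look like `fetch_markdown("https://example.com/some-page")`; it only succeeds for sites where the conversion is enabled, so an agent would typically fall back to fetching and parsing HTML when the `ValueError` is raised.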
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Dario Amodei on Governance; OpenAI Internal Development; Anthropic’s Data Center Commitment
🧪 OpenAI engineers described building an internal beta product whose code was generated by Codex, structuring the process around QA and agent-readable documentation. The team used per-worktree environments, DevTools-based UI validation, and a centralized knowledge base to track system behavior and requirements, though they note that designing effective feedback loops is still a struggle. (openai.com/index/harness-…)
⚡ Anthropic announced it will cover electricity price increases tied to its data centers and will pay for the grid upgrades needed to interconnect its facilities. The company also said it will procure new power generation, and will estimate and cover demand-driven price impacts until additional supply comes online. (anthropic.com/news/covering-…)
🗣️ In a recent interview, Anthropic CEO Dario Amodei said AI could be applied to scientific research, including biology and medicine, and discussed potential economic effects such as productivity gains and labor-market disruption in entry-level white-collar roles. He also addressed governance, saying full international restraint would require “truly reliable verification,” which he described as difficult to achieve. (nytimes.com/2026/02/12/opi…)
🧩 ByteDance is developing an AI inference chip and has held talks with Samsung Electronics about manufacturing it, according to sources cited by Reuters. Per the report, ByteDance plans to produce at least 100,000 inference chips in 2026, potentially scaling to about 350,000 units, as it seeks to secure more compute amid high demand. (reuters.com/world/asia-pac…)
🧰 OpenAI introduced “Skills” for its Responses API: reusable bundles of instructions, scripts, and assets packaged as folders, each defined by a required SKILL.md manifest. The system lets agents load and execute workflows on demand instead of embedding long procedures directly in prompts, making tasks modular and reusable across projects. The update is part of broader API changes that also add persistent memory compaction and hosted shell environments to support long-running, stateful agents. (developers.openai.com/cookbook/examp…)
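The "Skills" item describes a packaging pattern: each skill is a folder with a required SKILL.md, and an agent loads the full instructions only on demand. The sketch below mirrors just that folder-plus-manifest idea on a local directory. It does not call the Responses API, and the conventions it uses (folder name as skill name, first manifest line as a one-line summary) are hypothetical assumptions, not OpenAI's format.

```python
from pathlib import Path
from typing import Dict


def discover_skills(root: Path) -> Dict[str, Path]:
    """Map each skill folder's name to its SKILL.md, skipping folders without one."""
    return {
        d.name: d / "SKILL.md"
        for d in sorted(root.iterdir())
        if d.is_dir() and (d / "SKILL.md").is_file()
    }


def skill_index(skills: Dict[str, Path]) -> str:
    """Build a short index (assumed: first line of each manifest) to put in the prompt."""
    lines = []
    for name, manifest in skills.items():
        text = manifest.read_text(encoding="utf-8").strip()
        first = text.splitlines()[0] if text else ""
        lines.append(f"- {name}: {first}")
    return "\n".join(lines)


def load_skill(skills: Dict[str, Path], name: str) -> str:
    """Load the full instructions only when the agent decides it needs them."""
    return skills[name].read_text(encoding="utf-8")
```

The design choice this illustrates is lazy loading: the prompt carries only the compact index, and the long procedure enters the context window only for the skill that is actually used.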
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Anthropic Risk Report; Qwen-Image-2.0; WebMCP Support for Chrome
⚠️ Anthropic published a Sabotage Risk Report for Claude Opus 4.6, its newest model. The report assesses the risk that a highly capable model with organizational access, such as Opus 4.6, could autonomously manipulate systems or decision-making in ways that increase the likelihood of future catastrophic outcomes. Anthropic concludes the overall sabotage risk is “very low but not negligible,” citing alignment assessments, internal monitoring, and security controls. (www-cdn.anthropic.com/f21d93f21602ea…)
🖼️ Qwen launched Qwen-Image-2.0, a unified image generation and editing model. The system supports long-form typography instructions and native 2K resolution, improves text rendering, and merges the previously separate generation and editing tracks into a single, lighter architecture. (qwen.ai/blog?id=qwen-i…)
🌐 Chrome 146 includes an early preview of WebMCP, a proposed web standard that lets websites expose structured tools to AI agents, enabling reliable service execution and knowledge retrieval without relying on screen-scraping or manual browsing flows. (x.com/firt/status/20…)
🧠 Zhipu AI released GLM-5, its new flagship model focused on chat, coding, and multi-step agent tasks. The model features improved coding performance and tool-use capabilities, and on some programming benchmarks it approaches Claude Opus. (reuters.com/technology/chi…)
🎬 Runway, an AI video generation company, raised $315M in Series E funding at a $5.3B valuation. The round was led by General Atlantic, with participation from Nvidia, Fidelity Management & Research, AllianceBernstein, Adobe Ventures, AMD Ventures, Felicis, Premji, Mirae Asset, and Emphatic Capital. The company said the capital will go toward training and scaling its next generation of world models. (techcrunch.com/2026/02/10/ai-…)
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Anthropic’s $20B Round; Ads in ChatGPT; Seedance 2.0
💰 Anthropic is reportedly in the final stages of raising $20B at a $350B valuation, according to Bloomberg, doubling its original target after strong investor demand. The round is expected to include Altimeter, Sequoia, Lightspeed, Menlo, Coatue, Iconiq, and Singapore’s sovereign wealth fund, with the bulk of the capital coming from strategic partners Nvidia and Microsoft. (bloomberg.com/news/articles/…)
📣 OpenAI says it has begun testing ads in ChatGPT in the U.S. for logged-in adult users on the Free and Go tiers, while Plus, Pro, Business, Enterprise, and Education will remain ad-free. According to OpenAI, ads are clearly labeled and separated from answers and do not influence responses, and advertisers receive only aggregate performance metrics, with no access to chats or personal data. (openai.com/index/testing-…)
⚖️ AI legal startup Harvey is reportedly in talks to raise $200M at an $11B valuation, led by Sequoia Capital and GIC, just months after confirming a $160M raise at an $8B valuation led by Andreessen Horowitz. The company reported reaching a $190M annual recurring revenue run rate by the end of 2025; it offers LLM-powered tools that law firms use for research, drafting, and workflow automation. (techcrunch.com/2026/02/09/har…)
🎬 ByteDance unveiled its new AI video generation model Seedance 2.0, which early users say can produce multi-shot scenes with synchronized sound effects, music, and dialogue across multiple languages. The model is currently available to a limited set of users via ByteDance’s Jimeng and Jianying apps in China and follows the recent launch of rival video model Kling 3.0 from Kuaishou. (theinformation.com/briefings/byte…)
📚 Amazon has signaled to publishing executives that it is planning a content marketplace where publishers can sell their content to companies building AI products, according to The Information. Slides circulated ahead of an Amazon Web Services conference reference the marketplace alongside core AI tools such as Amazon Bedrock. Publishers are pushing for usage-based licensing, and the effort follows a similar Publisher Content Marketplace announced by Microsoft. (theinformation.com/articles/amazo…)
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Claude Code Fast Mode; Gemini Math Discoveries; Meta/OpenClaw Integration
⚡ Anthropic says Claude Code now supports Fast Mode, a high-speed configuration of its frontier model, Claude Opus 4.6, that the company says delivers up to ~2.5× faster output while keeping the same intelligence and capabilities. This makes it especially useful for latency-sensitive workflows and rapid iteration. (code.claude.com/docs/en/fast-m…)
🧠 Perplexity launched Model Council, a multi-model research feature that runs one query across three frontier models and synthesizes a unified answer, highlighting agreements and disagreements to improve reliability. It is currently available to Max subscribers on the web. (perplexity.ai/hub/blog/intro…)
📐 DeepMind published a case study on semi-autonomous mathematics discovery using Gemini. Using Aletheia, a Gemini Deep Think–based agent, the team evaluated 700 open Erdős conjectures and identified 13 as addressed, including novel solutions, partial solutions, and cases where existing literature had already resolved the problem. (arxiv.org/pdf/2601.22401)
🤖 Meta AI is reportedly preparing Avocado-branded models, a Manus-style browser agent, and OpenClaw integration. Code traces and reports reference Avocado and Avocado Thinking models, scheduled tasks, browser automation features, and compatibility with the open-source autonomous agent OpenClaw. (testingcatalog.com/meta-ai-redies…)
📚 A new peer-reviewed study in Nature evaluated OpenScholar, an open-source scientific question-answering and literature-synthesis model from the Allen Institute for AI. The paper documents how OpenScholar uses retrieval-augmented generation over tens of millions of papers to produce long-form, citation-grounded responses, with code, data, and demos released publicly. (allenai.org/blog/openschol…)
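Perplexity's Model Council item describes a fan-out/compare pattern: one query, several models, then a report on where they agree. The post does not say how the synthesis works, so this sketch substitutes exact-match majority voting over normalized answers, a simplification that only makes sense for short factual answers. The `Model` type and the stub models in the usage note are hypothetical stand-ins for real API calls.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical: a "model" is any function from a prompt to an answer string.
Model = Callable[[str], str]


def council(query: str, models: Dict[str, Model]) -> dict:
    """Ask every model the same query and report agreement and disagreement."""
    answers = {name: model(query) for name, model in models.items()}
    normalized = {name: ans.strip().lower() for name, ans in answers.items()}
    top_answer, votes = Counter(normalized.values()).most_common(1)[0]
    has_majority = votes > len(models) / 2
    majority: Optional[str] = top_answer if has_majority else None
    # Without a majority, every model is flagged for review.
    dissenters: List[str] = sorted(
        name for name, a in normalized.items() if not has_majority or a != top_answer
    )
    return {"answers": answers, "majority": majority, "dissenters": dissenters}
```

With three stub models answering "Paris", "paris ", and "Lyon", the result reports a majority of "paris" and flags only the third model as dissenting; a production system would more likely ask a model to write the unified answer and explain the disagreement.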
UC Berkeley RDI (@BerkeleyRDI):
Today’s AI News: Claude Opus 4.6 Launch; GPT-5.3-Codex; Cerebras Series H
🧠 Anthropic launched Claude Opus 4.6, its most advanced model yet, with stronger reasoning, longer context handling, and the ability to delegate work to “agent teams.” According to the company, the model achieves the highest score on the agentic coding evaluation Terminal-Bench 2.0 and leads all other frontier models on Humanity’s Last Exam. (anthropic.com/news/claude-op…)
💰 Cerebras Systems raised $1 billion in a Series H round at an approximate $23 billion valuation, nearly tripling its valuation within a few months. The round was led by Tiger Global with participation from Benchmark, Fidelity, AMD, and others, and supports the company’s wafer-scale hardware and software platform for large-model training and inference. (bloomberg.com/news/articles/…)
🧑‍💻 OpenAI debuted GPT-5.3-Codex, an upgraded agentic coding model focused on longer tasks, faster performance, and deployment across multiple environments (CLI, app, IDE). The release emphasizes stronger coding, cybersecurity, and interactive development workflows, and extends toward general-purpose use. (openai.com/index/introduc…)
🖥️ GitHub integrated Claude and Codex AI agents into its Agent HQ workflow, letting developers choose among Copilot, Claude, and Codex for coding support directly in GitHub, GitHub Mobile, and VS Code. Developers can also compare how Copilot, Claude, and Codex perform and review how each agent arrived at its solution. (theverge.com/news/873665/gi…)
🎥 Meta is testing a standalone “Vibes” app that spins its AI-generated short-video experience out of the Meta AI app into a dedicated, vertical video platform. The app focuses on creating, remixing, and browsing AI-generated clips via text prompts and effects, and is positioned as a consumer-facing generative video product intended to compete with tools like OpenAI’s Sora. (techcrunch.com/2026/02/05/met…)