Utopic e/λ

7.5K posts

Utopic e/λ

@UtopicDev

AI Designer and Builder. Technology to save the world. There Is No Planet B... The hyperlink guy 😉

Sumali Haziran 2023

5K Sinusundan323 Mga Tagasunod

Utopic e/λ@UtopicDev·10h

@alexinexxx why isn't the codex on the list?

English

109

alexine 🏴‍☠️@alexinexxx·12h

it never stops

English

2.4K

Utopic e/λ nag-retweet

Mustafa Suleyman@mustafasuleyman·18h

Shared language =/= shared meaning. And that can turn multi-agent systems into a game of telephone without any human in the loop being the wiser. Our @MicrosoftAI pre-print tests a solve: if agents don't agree on a definition, they can't use the term. The results: disagreement drops 72% - 96%. Full paper: arxiv.org/pdf/2602.16424

English

10K

Utopic e/λ nag-retweet

Alex Goldie@AlexDGoldie·16h

1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵

English

114

15K

Utopic e/λ@UtopicDev·16h

@levelsio @exame @exame_noticias If you ignore this, you're further validating this fake AI mafia that's only growing in Brazil.

English

@levelsio@levelsio·16h

Interesting @exame @exame_noticias is also using my name and image to sell AI courses, seems many do it now, it's kinda flattering but also they never asked permission Do you think I should do something about this is or just ignore it? What's normal here?

Darshan Gajara@WeirdoWizard

@levelsio He used you as a case study in a sponsored ad segment for some online tech school. Something on the lines of... Levels built solo businesses because he can code and understands business. Join this school so you can learn these skills...

English

136

317

84.7K

Utopic e/λ nag-retweet

Mixedbread@mixedbreadai·1d

For Agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge intensive benchmarks.

English

146

70K

Utopic e/λ nag-retweet

Omar Khattab@lateinteraction·1d

Look at these results carefully. Codex and Gemini 3, with gemini file search and codex default tools, versus with @mixedbread’s new late interaction model. Soon enough, if your coding agent is not an RLM with ColBERT file search, you’re ngmi.

Mixedbread@mixedbreadai

English

213

21.6K

Utopic e/λ nag-retweet

Sakana AI@SakanaAILabs·18h

The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature Nature: nature.com/articles/s4158… Blog: sakana.ai/ai-scientist-n… When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process. Today, we are happy to announce that “The AI Scientist: Towards Fully Automated AI Research,” our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement. Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable. Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science. This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune

GIF

English

286

1.4K

362.3K

Utopic e/λ nag-retweet

Google Gemini@GeminiApp·18h

Longer tracks are here with Lyria 3 Pro in Gemini! From experimenting with different styles to generating tracks with complex transitions, Lyria 3 Pro makes it easier to bring your full vision to life. Rolling out today to Google AI Plus, Pro, and Ultra users. Learn more 🧵

English

111

142

1.2K

193.5K

Utopic e/λ nag-retweet

Alex Ziskind@digitalix·20h

32GB of VRAM for under $1000! The Intel Arc Pro B70 just landed.

English

308

309

4.8K

868.8K

Utopic e/λ nag-retweet

DailyPapers@HuggingPapers·1d

Ai2 just released MolmoWeb on Hugging Face A fully open multimodal web agent that autonomously controls browsers to complete tasks, achieving SOTA results and surpassing GPT-4o based agents on WebVoyager and Mind2Web.

English

6.4K

Utopic e/λ nag-retweet

Oliver Prompts@oliviscusAI·1d

Someone built a Chromium browser that runs entirely in your terminal. It's called Carbonyl, and it renders actual web pages in your command line. The best part is it runs with 0% CPU usage when idle. - Full Chromium engine in the terminal. - dles at exactly 0% CPU. - Fast, lightweight, and completely terminal-native. 100% Open Source.

English

129

445

3.7K

309K

Utopic e/λ nag-retweet

Marktechpost AI Dev News ⚡@Marktechpost·1d

NVIDIA AI Introduces PivotRL: A New AI Framework Achieving High Agentic Accuracy With 4x Fewer Rollout Turns Efficiently Training long-horizon agents—for coding, terminal use, or web search—usually forces a choice: the speed of Supervised Fine-Tuning (SFT) or the generalization of End-to-End RL (E2E RL). SFT is fast but brittle; E2E RL is robust but incredibly expensive. PivotRL bridges this gap by operating on existing SFT trajectories to deliver RL-level accuracy at a fraction of the cost. But how does it work? - Pivot Filtering: Instead of full rollouts, it targets "pivots"—critical intermediate turns where actions show high outcome variance. - Functional Rewards: It ditches rigid string matching for domain-specific verifiers that reward any locally acceptable action. The Results: (1) In-Domain Boost: +4.17% higher accuracy than SFT across agentic domains. (2) OOD Stability: +10.04% higher out-of-domain accuracy in non-agentic tasks compared to SFT. (3) Massive Efficiency: On SWE-Bench, PivotRL matched E2E RL accuracy with 4x fewer rollout turns and ~5.5x faster wall-clock time. This isn't just theory based approach—PivotRL is the workhorse behind NVIDIA’s Nemotron-3-Super-120B-A12B..... Full analysis: marktechpost.com/2026/03/25/nvi… Paper: arxiv.org/pdf/2603.21383 @kuchaev @nvidia @NVIDIAAI @NVIDIARobotics

English

48.7K

Utopic e/λ nag-retweet

Omma@omma_ai·1d

Today, we are launching Omma. Create 3D, Websites, and Apps with AI agents. Start now on omma.build

English

389

30.9K

Utopic e/λ nag-retweet

AiBattle@AiBattle_·1d

ARC-AGI-3 launches tomorrow - The first interactive reasoning benchmark built to test human-like intelligence in AI - 1,000+ levels across 150+ environments requiring exploration, learning, planning, and adaptation - Video-game-like tasks with no instructions, requiring multi-step reasoning and rule discovery The highest score on ARC-AGI-1 currently is Gemini 3.1 Pro with 98%, while on ARC-AGI-2 it is Gemini 3 Deep Think with 84.6%

English

686

59.5K

Utopic e/λ@UtopicDev·1d

@mattpocockuk And it can get 100 times worse depending on the country 🤯

English

231

Matt Pocock@mattpocockuk·1d

6 months ago I disabled my home feed on X via CSS I got a new laptop yesterday that didn't have my CSS resets, so I did a cursory scroll Holy shit, the AI discourse is so dumb

English

270

37.6K

Utopic e/λ nag-retweet

Chubby♨️@kimmonismus·1d

OpenAI's Sora team is now working on world-models - they prioritize longer-term world simulation research especially as it pertains to robotics. tl;dr what we know so far: - Sora has been cancelled because they needed the compute for their new LLM - they renamed product organization to "AGI Deployment" - the LLM (codename Spud) is "very very strong" and "accelerates the economy" - release in a few weeks - Sam is going to focus on "raising capital, supply chains and “building datacenters at unprecedented scale” my take: To me, it really sounds like they are preparing for the IPO and will make AGI official beforehand.

Chubby♨️@kimmonismus

Either OpenAI officially achieved AGI or this is the biggest troll move ever: - they rename product organization to "AGI Deployment" - Altman says the next LLM is a "very strong model" - it very much accelerate the economy Quote: "Altman also said that the company would be renaming senior executive Fidji Simo’s product organization to “AGI Deployment,” a reference to artificial general intelligence, or AI that’s roughly on par with humans." However, Altman says "Spud is very strong model" in “a few weeks” that the team believes “can really accelerate the economy.”

English

738

101.4K

Utopic e/λ nag-retweet

Harrison Chase@hwchase17·1d

⛳️async subagents in deepagents==0.5.0a1 i think in future we will "chat" with a single agent, and it will manage multiple longer running agents in the background you can now do this with deepagents we released an alpha release (0.5.0a1) with this functionality - try it out and let us know what you think! docs.langchain.com/oss/python/rel…

English

156

20.5K

Utopic e/λ@UtopicDev·1d

@EnidPinxit the idea is perfect for these entities to live their lives

English

Enid Pinxit@EnidPinxit·1d

@UtopicDev kind of hoping they have something that flips the table and makes everyone go, w-t-f!?!? in a good way

English

Utopic e/λ@UtopicDev·1d

Why doesn't Zuckerberg pivot the metaverse into a universe designed for AI agents?

English

Utopic e/λ nag-retweet

Teknium (e/λ)@Teknium·1d

API Server with Responses API Hermes can now act as an OpenAI-compatible backend — any frontend (Open WebUI, LobeChat, LibreChat, ChatBox, etc.) can connect to it. Exposes both /v1/chat/completions and /v1/responses (stateful, with previous_response_id chaining). Full agent stack behind the API: tools, skills, memory, cron.

English

264

56.3K

Tuklasin

@alexinexxx @MicrosoftAI @levelsio @exame @exame_noticias @mixedbread @Nature @_chris_lu_