Utopic e/λ

7.5K posts

Utopic e/λ banner
Utopic e/λ

Utopic e/λ

@UtopicDev

AI Designer and Builder. Technology to save the world. There Is No Planet B... The hyperlink guy 😉

Sumali Haziran 2023
5K Sinusundan323 Mga Tagasunod
Utopic e/λ nag-retweet
Mustafa Suleyman
Mustafa Suleyman@mustafasuleyman·
Shared language =/= shared meaning. And that can turn multi-agent systems into a game of telephone without any human in the loop being the wiser. Our @MicrosoftAI pre-print tests a solve: if agents don't agree on a definition, they can't use the term. The results: disagreement drops 72% - 96%. Full paper: arxiv.org/pdf/2602.16424
Mustafa Suleyman tweet media
English
21
15
93
10K
Utopic e/λ nag-retweet
Alex Goldie
Alex Goldie@AlexDGoldie·
1/ 🪩 Automating the discovery of new algorithms could unlock significant breakthroughs in ML research. But optimising agents for this research has been limited by too few tasks to learn from! Introducing DiscoGen, a procedural generator of algorithm discovery tasks 🧵
Alex Goldie tweet media
English
2
33
114
15K
@levelsio
@levelsio@levelsio·
Interesting @exame @exame_noticias is also using my name and image to sell AI courses, seems many do it now, it's kinda flattering but also they never asked permission Do you think I should do something about this is or just ignore it? What's normal here?
@levelsio tweet media
Darshan Gajara@WeirdoWizard

@levelsio He used you as a case study in a sponsored ad segment for some online tech school. Something on the lines of... Levels built solo businesses because he can code and understands business. Join this school so you can learn these skills...

English
136
4
317
84.7K
Utopic e/λ nag-retweet
Mixedbread
Mixedbread@mixedbreadai·
For Agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge intensive benchmarks.
Mixedbread tweet media
English
4
22
146
70K
Utopic e/λ nag-retweet
Omar Khattab
Omar Khattab@lateinteraction·
Look at these results carefully. Codex and Gemini 3, with gemini file search and codex default tools, versus with @mixedbread’s new late interaction model. Soon enough, if your coding agent is not an RLM with ColBERT file search, you’re ngmi.
Mixedbread@mixedbreadai

For Agentic tasks, Oracle-level performance is the maximum performance a system can achieve, assuming it is able to retrieve all relevant documents perfectly, every time. We're proud to show that Mixedbread Search approaches the Oracle on multiple knowledge intensive benchmarks.

English
10
18
213
21.6K
Utopic e/λ nag-retweet
Sakana AI
Sakana AI@SakanaAILabs·
The AI Scientist: Towards Fully Automated AI Research, Now Published in Nature Nature: nature.com/articles/s4158… Blog: sakana.ai/ai-scientist-n… When we first introduced The AI Scientist, we shared an ambitious vision of an agent powered by foundation models capable of executing the entire machine learning research lifecycle. From inventing ideas and writing code to executing experiments and drafting the manuscript, the system demonstrated that end-to-end automation of the scientific process is possible. Soon after, we shared a historic update: the improved AI Scientist-v2 produced the first fully AI-generated paper to pass a rigorous human peer-review process. Today, we are happy to announce that “The AI Scientist: Towards Fully Automated AI Research,” our paper describing all of this work, along with fresh new insights, has been published in @Nature! This Nature publication consolidates these milestones and details the underlying foundation model orchestration. It also introduces our Automated Reviewer, which matches human review judgments and actually exceeds standard inter-human agreement. Crucially, by using this reviewer to grade papers generated by different foundation models, we discovered a clear scaling law of science. As the underlying foundation models improve, the quality of the generated scientific papers increases correspondingly. This implies that as compute costs decrease and model capabilities continue to exponentially increase, future versions of The AI Scientist will be substantially more capable. Building upon our previous open-source releases (github.com/SakanaAI/AI-Sc…), this open-access Nature publication comprehensively details our system's architecture, outlines several new scaling results, and discusses the promise and challenges of AI-generated science. This substantial milestone is the result of a close and fruitful collaboration between researchers at Sakana AI, the University of British Columbia (UBC) and the Vector Institute, and the University of Oxford. Congrats to the team! @_chris_lu_ @cong_ml @RobertTLange @_yutaroyamada @shengranhu @j_foerst @hardmaru @jeffclune
GIF
English
37
286
1.4K
362.3K
Utopic e/λ nag-retweet
Google Gemini
Google Gemini@GeminiApp·
Longer tracks are here with Lyria 3 Pro in Gemini! From experimenting with different styles to generating tracks with complex transitions, Lyria 3 Pro makes it easier to bring your full vision to life. Rolling out today to Google AI Plus, Pro, and Ultra users. Learn more 🧵
English
111
142
1.2K
193.5K
Utopic e/λ nag-retweet
Alex Ziskind
Alex Ziskind@digitalix·
32GB of VRAM for under $1000! The Intel Arc Pro B70 just landed.
English
308
309
4.8K
868.8K
Utopic e/λ nag-retweet
DailyPapers
DailyPapers@HuggingPapers·
Ai2 just released MolmoWeb on Hugging Face A fully open multimodal web agent that autonomously controls browsers to complete tasks, achieving SOTA results and surpassing GPT-4o based agents on WebVoyager and Mind2Web.
DailyPapers tweet media
English
1
12
52
6.4K
Utopic e/λ nag-retweet
Oliver Prompts
Oliver Prompts@oliviscusAI·
Someone built a Chromium browser that runs entirely in your terminal. It's called Carbonyl, and it renders actual web pages in your command line. The best part is it runs with 0% CPU usage when idle. - Full Chromium engine in the terminal. - dles at exactly 0% CPU. - Fast, lightweight, and completely terminal-native. 100% Open Source.
English
129
445
3.7K
309K
Utopic e/λ nag-retweet
Marktechpost AI Dev News ⚡
NVIDIA AI Introduces PivotRL: A New AI Framework Achieving High Agentic Accuracy With 4x Fewer Rollout Turns Efficiently Training long-horizon agents—for coding, terminal use, or web search—usually forces a choice: the speed of Supervised Fine-Tuning (SFT) or the generalization of End-to-End RL (E2E RL). SFT is fast but brittle; E2E RL is robust but incredibly expensive. PivotRL bridges this gap by operating on existing SFT trajectories to deliver RL-level accuracy at a fraction of the cost. But how does it work? - Pivot Filtering: Instead of full rollouts, it targets "pivots"—critical intermediate turns where actions show high outcome variance. - Functional Rewards: It ditches rigid string matching for domain-specific verifiers that reward any locally acceptable action. The Results: (1) In-Domain Boost: +4.17% higher accuracy than SFT across agentic domains. (2) OOD Stability: +10.04% higher out-of-domain accuracy in non-agentic tasks compared to SFT. (3) Massive Efficiency: On SWE-Bench, PivotRL matched E2E RL accuracy with 4x fewer rollout turns and ~5.5x faster wall-clock time. This isn't just theory based approach—PivotRL is the workhorse behind NVIDIA’s Nemotron-3-Super-120B-A12B..... Full analysis: marktechpost.com/2026/03/25/nvi… Paper: arxiv.org/pdf/2603.21383 @kuchaev @nvidia @NVIDIAAI @NVIDIARobotics
Marktechpost AI Dev News ⚡ tweet media
English
2
14
63
48.7K
Utopic e/λ nag-retweet
Omma
Omma@omma_ai·
Today, we are launching Omma. Create 3D, Websites, and Apps with AI agents. Start now on omma.build
English
22
39
389
30.9K
Utopic e/λ nag-retweet
AiBattle
AiBattle@AiBattle_·
ARC-AGI-3 launches tomorrow - The first interactive reasoning benchmark built to test human-like intelligence in AI - 1,000+ levels across 150+ environments requiring exploration, learning, planning, and adaptation - Video-game-like tasks with no instructions, requiring multi-step reasoning and rule discovery The highest score on ARC-AGI-1 currently is Gemini 3.1 Pro with 98%, while on ARC-AGI-2 it is Gemini 3 Deep Think with 84.6%
AiBattle tweet media
English
25
62
686
59.5K
Matt Pocock
Matt Pocock@mattpocockuk·
6 months ago I disabled my home feed on X via CSS I got a new laptop yesterday that didn't have my CSS resets, so I did a cursory scroll Holy shit, the AI discourse is so dumb
English
30
4
270
37.6K
Utopic e/λ nag-retweet
Chubby♨️
Chubby♨️@kimmonismus·
OpenAI's Sora team is now working on world-models - they prioritize longer-term world simulation research especially as it pertains to robotics. tl;dr what we know so far: - Sora has been cancelled because they needed the compute for their new LLM - they renamed product organization to "AGI Deployment" - the LLM (codename Spud) is "very very strong" and "accelerates the economy" - release in a few weeks - Sam is going to focus on "raising capital, supply chains and “building datacenters at unprecedented scale” my take: To me, it really sounds like they are preparing for the IPO and will make AGI official beforehand.
Chubby♨️ tweet media
Chubby♨️@kimmonismus

Either OpenAI officially achieved AGI or this is the biggest troll move ever: - they rename product organization to "AGI Deployment" - Altman says the next LLM is a "very strong model" - it very much accelerate the economy Quote: "Altman also said that the company would be renaming senior executive Fidji Simo’s product organization to “AGI Deployment,” a reference to artificial general intelligence, or AI that’s roughly on par with humans." However, Altman says "Spud is very strong model" in “a few weeks” that the team believes “can really accelerate the economy.”

English
50
74
738
101.4K
Utopic e/λ nag-retweet
Harrison Chase
Harrison Chase@hwchase17·
⛳️async subagents in deepagents==0.5.0a1 i think in future we will "chat" with a single agent, and it will manage multiple longer running agents in the background you can now do this with deepagents we released an alpha release (0.5.0a1) with this functionality - try it out and let us know what you think! docs.langchain.com/oss/python/rel…
Harrison Chase tweet media
English
23
21
156
20.5K
Enid Pinxit
Enid Pinxit@EnidPinxit·
@UtopicDev kind of hoping they have something that flips the table and makes everyone go, w-t-f!?!? in a good way
English
1
0
1
7
Utopic e/λ
Utopic e/λ@UtopicDev·
Why doesn't Zuckerberg pivot the metaverse into a universe designed for AI agents?
English
1
0
2
25
Utopic e/λ nag-retweet
Teknium (e/λ)
Teknium (e/λ)@Teknium·
API Server with Responses API Hermes can now act as an OpenAI-compatible backend — any frontend (Open WebUI, LobeChat, LibreChat, ChatBox, etc.) can connect to it. Exposes both /v1/chat/completions and /v1/responses (stateful, with previous_response_id chaining). Full agent stack behind the API: tools, skills, memory, cron.
English
24
15
264
56.3K