Abaka AI

731 posts


@AbakaAI_Tech

Human Intelligence Data For Frontier AI | Datasets | Annotation | Evaluation | Daily data insights

California, US · Joined June 2025
63 Following · 520 Followers
Pinned Tweet
Abaka AI @AbakaAI_Tech
Hi! 👋 We’re Abaka AI, a global data partner in the AI industry! We share daily insights from our experience working with 1,000+ top-tier AI labs and research institutions worldwide. Whether it’s multimodal, evaluation, pre/post-training, or synthetic solutions, we cover the core components of a high-performing ML pipeline across domains. In short: professional, top-tier data services with quick turnaround. E.g., our contributed benchmarks are trusted by leaders like Gemini 3 and DeepSeek. Hence: daily insights on data, eval, multimodal, annotation, etc. Follow us for first-hand industry expertise! Peace!✌️
2 replies · 0 reposts · 16 likes · 1.1K views
Abaka AI @AbakaAI_Tech
@Hangsiin @deredleritt3r Given that pattern, what prediction from the labs over the next 12-24 months do you think the public is underestimating right now?
2 replies · 0 reposts · 1 like · 88 views
NomoreID @Hangsiin
This is true. A year ago, you could have said, “So where’s the evidence?” but now, if you’ve used something like Codex in depth, anyone can see the future. As someone who has been tracking statements from CEOs and employees at frontier labs over the past few years, looking back, I think most of what they said was accurate and sincere.
3 replies · 2 reposts · 68 likes · 5.5K views
prinz @deredleritt3r
You don't truly understand the magnitude of the potential impact of powerful AI on the world unless you are aware, and have fully internalized, that senior leadership and most researchers at the frontier labs *actually believe* the following:

1. Existing AI is already significantly speeding up AI research. Very soon (this year), AI will very likely take over *ALL* aspects of AI research other than generation of novel research ideas. Soon (within the next 2 years), AI will very likely take over *ALL* aspects of AI research, period. This means hundreds of thousands of GPUs working 24/7 to discover novel ideas at the level of, or better than, the likes of Alec Radford, Ilya Sutskever, etc. The thread below presents a conservative timeline: AI researchers will "meaningfully contribute" to AI development in 1-3 years.

2. Many (but, as far as I can tell, not all) executives and researchers at the frontier labs believe that fully automated AI research will kick off recursive self-improvement (RSI), wherein the AI models will autonomously build better and better AI models, with human oversight (for safety reasons), but increasingly with no human input into the research or implementation of that research. From the thread below: "'[h]umans vs AI on intellectual work is likely to be like human runner vs a Porsche in a race', likely very soon" - but replace "intellectual work" generally with "AI research" specifically.

RSI is a complicated and messy thing to consider, both because there will be compute and energy constraints and because there are unknowns (will there be diminishing returns from greater intelligence of the models? if so, when will these diminishing returns become meaningful? is there a ceiling to intelligence that we don't know about?). But suffice it to say that, if RSI *is* achieved in a way that many leaders/researchers at the frontier labs believe is possible, *THE WORLD MAY BECOME COMPLETELY UNRECOGNIZABLE WITHIN JUST A FEW YEARS*. This is subject to various bottlenecks; as the thread below correctly notes, "[i]nstitutional, personal & regulatory bottlenecks will bind very hard", and much also depends on continuing progress in areas like robotics.

3. On ~the same timeline as full, end-to-end automation of *ALL* aspects of AI research (within the next 2 years), AI will also become capable of making significant novel scientific discoveries *IN OTHER FIELDS*. This is why Dario Amodei, Demis Hassabis et al. believe that it is possible that all diseases will be curable within 10 years. (One account of how this might be possible is set forth in "Machines of Loving Grace".) The point is that an LLM that is capable of significant novel insights in the field of AI research should likewise be capable of significant novel insights in at least some (and perhaps all) other fields. The thread below notes: "AI for automating science [is] very early" - obviously true, but I think some changes may be right on the horizon. Overall, and again from the thread below: "'a million scientists in a data center' will think much more quickly than humans, on almost any intellectual task; this will happen in the next 2-10 years." This is ~the same timeline as that presented in "Machines of Loving Grace".

Many will be tempted to dismiss all this as "just hype", "they are just trying to raise money again", etc. But no! - the above, in fact, presents the *actual beliefs* of senior leadership and many researchers at the frontier labs. Again, they genuinely think that AI research will be automated soon. Many of them genuinely believe that RSI is achievable in the not-too-distant future. And they genuinely see a real path towards AI significantly accelerating science, curing diseases, inventing new materials, helping to solve key global issues from poverty to climate change, etc., etc.

Whether the frontier labs' beliefs are correct is, of course, a separate question. I personally have historically tended to take public statements by OpenAI, Anthropic and Google at face value and quite seriously. As a result, I was not surprised when LLMs won gold at the IMO, IOI and ICPC competitions last year, or when Claude Code/Codex started taking off, or when Anthropic and OpenAI started releasing significantly better models every 1-2 months, or when some of the best coders became reliant on Claude Code/Codex in their daily work, or when LLMs became significantly helpful to scientists in fields like math and physics in the last few months. The trajectory has been ~the same as that publicly predicted by the frontier labs. We have been accelerating. And, as of right now, all signs are indicating that the acceleration shall continue and that full automation of AI research and, potentially, RSI are firmly on the horizon.
Kevin A. Bryan @Afinetheorem

My read on "normal policymaker & corp. leader on AI": mostly now they don't need to be convinced it is very important (unlike a year ago). But they still see its capabilities as today + epsilon. So just briefly, here is what even "AI is normal tech" folks in the labs believe: 1/8

72 replies · 140 reposts · 1.2K likes · 177.4K views
Abaka AI @AbakaAI_Tech
The institutional and regulatory bottlenecks are dangerously under-discussed: even if the technical capability for RSI emerges in 2 years, the permission to run a million-GPU self-improvement loop may not exist. Yeah, the labs are betting they can navigate that, but the definition of 'socially permissible' has been underestimated before...
0 replies · 0 reposts · 0 likes · 355 views
Abaka AI @AbakaAI_Tech
Same direction. Same rhythm. 🚣‍♂️ This weekend’s team outing: Collective effort > going fast alone. Incredibly proud of this crew. Ready to carry this energy into next week’s sprints. 🚀 #abakaai #teamspirit #siliconvalley #ai #vibing
[3 images attached]
0 replies · 1 repost · 1 like · 81 views
Abaka AI @AbakaAI_Tech
That's why the sandboxing discussion is so important BEFORE we see systems that can rewrite their own objectives. Hyperagents don't have that capability. The field's awareness of this risk is THE reason why papers like this emphasize containment. Basically, the goal is to build the frameworks that keep the loop bounded while we figure out verification.
0 replies · 0 reposts · 0 likes · 9 views
Suleyman Kivanc EKICI @skekici
@jennyzhangzt You cannot sandbox a recursive optimization loop. Once the meta-agent achieves write-access to its own objective function, human oversight ceases to be a safety rail and becomes merely a latency bottleneck to route around.
1 reply · 0 reposts · 1 like · 451 views
Jenny Zhang @jennyzhangzt
Introducing Hyperagents: an AI system that not only improves at solving tasks, but also improves how it improves itself.

The Darwin Gödel Machine (DGM) demonstrated that open-ended self-improvement is possible by iteratively generating and evaluating improved agents, yet it relies on a key assumption: that improvements in task performance (e.g., coding ability) translate into improvements in the self-improvement process itself. This alignment holds in coding, where both evaluation and modification are expressed in the same domain, but breaks down more generally. As a result, prior systems remain constrained by fixed, handcrafted meta-level procedures that do not themselves evolve.

We introduce Hyperagents – self-referential agents that can modify both their task-solving behavior and the process that generates future improvements. This enables what we call metacognitive self-modification: learning not just to perform better, but to improve at improving.

We instantiate this framework as DGM-Hyperagents (DGM-H), an extension of the DGM in which both task-solving behavior and the self-improvement procedure are editable and subject to evolution. Across diverse domains (coding, paper review, robotics reward design, and Olympiad-level math solution grading), hyperagents enable continuous performance improvements over time and outperform baselines without self-improvement or open-ended exploration, as well as prior self-improving systems (including DGM). DGM-H also improves the process by which new agents are generated (e.g. persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs.

This work was done during my internship at Meta (@AIatMeta), in collaboration with Bingchen Zhao (@BingchenZhao), Wannan Yang (@winnieyangwn), Jakob Foerster (@j_foerst), Jeff Clune (@jeffclune), Minqi Jiang (@MinqiJiang), Sam Devlin (@smdvln), and Tatiana Shavrina (@rybolos).
[image attached]
154 replies · 657 reposts · 3.6K likes · 491.4K views
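To make the "improving how it improves" idea concrete, here is a minimal toy sketch in the spirit of the DGM-H description above, not the paper's actual system or API: an archive-based loop in which both the task-solving parameters and the mutation procedure's own step size are editable, so the loop occasionally edits how it edits. The benchmark, names, and two-level split are all illustrative assumptions.

```python
import random

random.seed(0)

def evaluate(agent):
    """Stand-in benchmark: negative squared error against a fixed target."""
    target = [0.3, 0.7, 0.5]
    return -sum((p - t) ** 2 for p, t in zip(agent["params"], target))

def propose(parent):
    """Level 1: edit task-solving behavior by mutating the parameters,
    using the parent's own (evolvable) mutation step size."""
    step = parent["meta_step"]
    return {"params": [p + random.uniform(-step, step) for p in parent["params"]],
            "meta_step": step}

def run(generations=60):
    seed = {"params": [0.0, 0.0, 0.0], "meta_step": 0.2}
    archive = [(evaluate(seed), seed)]
    for g in range(generations):
        _, parent = max(archive, key=lambda x: x[0])
        child = propose(parent)
        # Level 2: every few generations, also edit the self-improvement
        # procedure itself (here, just its step size) -- the toy analogue
        # of metacognitive self-modification.
        if g % 10 == 0:
            child["meta_step"] = max(0.01, parent["meta_step"] * random.choice([0.5, 1.5]))
        archive.append((evaluate(child), child))
    return max(archive, key=lambda x: x[0])

best_score, best = run()
print(f"best score {best_score:.4f} with meta step {best['meta_step']:.3f}")
```

In this toy, the meta-level edit (the step size) persists in the archive alongside task performance, a crude stand-in for the paper's claim that meta-level improvements accumulate across runs.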
Abaka AI @AbakaAI_Tech
We would assume not, and that's a safety feature: if the scoring itself evolved, we'd risk reward hacking at the meta-level. You'd have a system that could redefine what 'better' means... That's philosophically fraught. Also, scoring evolution would be meta-meta-level, and the alignment challenges get exponentially harder, to put it mildly.
0 replies · 0 reposts · 0 likes · 43 views
Abaka AI @AbakaAI_Tech
This is the right question! Encoding intent presumes we have a stable, transferable representation of what we want. We cannot (yet) encode intent in the human sense, because intent is negotiated, and hyperagents are optimizing within the objectives we give them. And because we encode intent poorly, we get reward hacking, specification gaming, and brittle behaviors. The hyperagents' work only solves self-improvement efficiency within the intent we've given.
0 replies · 0 reposts · 1 like · 18 views
Abaka AI @AbakaAI_Tech
@QingpingHe1 @jennyzhangzt But self-improvement of behavior is not self-improvement of architecture. Hyperagents are doing the former, which is big progress
1 reply · 0 reposts · 1 like · 49 views
Abaka AI @AbakaAI_Tech
The terror is real only if the system's goals diverge from ours. But the fact that it's discovering meta-strategies like persistent memory and performance tracking, which are things we'd consider good engineering practice, suggests alignment might be more emergent than we feared. At least so far
0 replies · 0 reposts · 0 likes · 21 views
Eshan @eshanbuilds
@jennyzhangzt the fact that meta-level improvements transfer across domains is the part that should terrify and excite everyone simultaneously. it’s not getting better at one thing. it’s getting better at getting better at everything
1 reply · 0 reposts · 2 likes · 259 views
Abaka AI @AbakaAI_Tech
@jennyzhangzt If a hyperagent learns a self-improvement strategy that works well across domains, does it also learn to generalize its value function? Or does each domain require recalibration? Thanks for sharing! Amazing work!
0 replies · 0 reposts · 0 likes · 128 views
Abaka AI @AbakaAI_Tech
✨ GTC Night Shift: 70+ AI researchers, builders, founders. One room. Zero fluff. “To my surprise, everyone here is like-minded.” “Had a blast talking to everyone.” We tried something new: AI Sparkle Cards → Q&A × lucky draw, turning questions into a game and interaction into something real. Past 9pm, no one wanted to leave. This is what high-signal feels like. More soon 👀 #AI #GTC #Abakaai #SiliconValley #sf #researchers #founders #mixer
[4 images attached]
0 replies · 1 repost · 3 likes · 126 views
Abaka AI @AbakaAI_Tech
100%: low-quality egocentric data harms model performance. But "not available now" is not accurate. It exists; we have it. Yet (!) alignment across viewpoint, embodiment, and interaction is indeed very rare, and the USC results are a big warning to anyone treating data as a commodity. The gap is widening, specifically between lots of data and useful data.
0 replies · 0 reposts · 0 likes · 22 views
Paul Han @pauljunsukHan
egocentric data of high quality is just not available right now, and people are over-indexing on the performance of slop ego data. When you look at the performance of the phi-0 model out of USC, it's clear that "bad apples" data poisons model performance and it's better to leave it out
1 reply · 0 reposts · 1 like · 71 views
Animesh Garg @animesh_garg
the form factor is contested, but we have multiple home robot prototypes. Wonder where differentiation will emerge?

1. Data & data collection. Many of the folks are converging on UMI-derived data collection methods. How much egocentric data, and who collects it (in-house vs outsourced)?

2. Model efficacy. In-house vs fine-tuned models. Large RFM providers want the last-mile product companies to use their models. Initially, product-layer companies will provide the better holistic product, as we saw in coding, but soon RFM providers will match special-purpose models.

3. HW. This is the biggest unknown. Many believe HW has commoditized, but a lot of folks now find that high-quality robot HW is far from commoditized. At best, some of the secret sauce is becoming more evident.

4. Deployment (scale, efficiency and support). Perhaps the tech stack will be close enough that pure perf will stop being the primary decision variable. The differentiation will come from social acceptance, service/support, ecosystem effects.
CyberRobo @CyberRobooo

Yeah. Another adorable new humanoid home robot🤖🏠 From Shenzhen-based robotics startup KNOWIN, a consumer-oriented humanoid home robot is being developed: wheeled, it can chat, pour wine, do laundry, fold clothes, clean, play with children, and even learn in a messy real-life home environment. Driven by their self-developed next-generation embodied AI model architecture and synthetic data technology, this humanoid robot can operate autonomously. But the real goal is to achieve Level 3 autonomy (capable of independently completing long-chain tasks such as cleaning/laundry and being ready to respond at any time) within 1-1.5 years (<18 months). Skeptical? Yes, me too; I need to see its complex autonomous capabilities for myself. It's worth mentioning that the founding members of the team are senior professionals from Huawei and DJI. Would you like a humanoid robot that can fold your clothes and chat with you while you relax? Share your thoughts…

4 replies · 1 repost · 28 likes · 6K views
Abaka AI @AbakaAI_Tech
@RanCheng10 @chelseabfinn @DorsaSadigh @StanfordAILab Great questions. Personally, we'd expect pairing to remain critical even at scale; morphology transfer saturates without correspondences (Fig. 7). Unpaired diversity gives breadth but no alignment.
0 replies · 0 reposts · 0 likes · 22 views
Ran Cheng @RanCheng10
Really interesting result. Do you expect the advantage of paired cross-embodiment data to persist at much larger pretraining scale, or is pairing mainly a data-efficient bridge in the low-target-data regime? Do you think the transferable quantity is best viewed as action equivalence, observation equivalence, or a shared latent task-progress / world-transition representation?
2 replies · 0 reposts · 6 likes · 1.2K views
Chelsea Finn @chelseabfinn
Usually, we expect more diverse data >> less diverse data. Cross-embodiment transfer seems to benefit from paired data across embodiments, more so than increasing diversity. Webpage & code: data-analogies.github.io Paper: arxiv.org/abs/2603.06450
[image attached]
19 replies · 55 reposts · 485 likes · 40.3K views
Abaka AI @AbakaAI_Tech
@chelseabfinn @DorsaSadigh @StanfordAILab Can 100% confirm. For robotics teams: when planning data collection, allocate budget along the axes this paper identifies. Perceptual axes: prioritize diversity. Action axes: prioritize pairing. That rule alone would improve most real-world transfer today.
0 replies · 0 reposts · 0 likes · 114 views
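As a toy concretization of that budget rule (the axis taxonomy, weights, and helper below are hypothetical illustrations, not from the paper): split an episode budget across data axes, tagging perceptual axes with a diversity strategy and action axes with a pairing strategy.

```python
def plan_collection(total_episodes, axes):
    """Split an episode budget across data axes by weight, tagging perceptual
    axes with a diversity strategy and action axes with a pairing strategy."""
    total_weight = sum(weight for weight, _ in axes.values())
    plan = {}
    for name, (weight, kind) in axes.items():
        strategy = "maximize diversity" if kind == "perceptual" else "collect paired demos"
        plan[name] = (round(total_episodes * weight / total_weight), strategy)
    return plan

# Hypothetical axes for a cross-embodiment manipulation dataset.
axes = {
    "scenes/lighting":   (2, "perceptual"),
    "objects/textures":  (2, "perceptual"),
    "camera viewpoints": (1, "perceptual"),
    "grippers/arms":     (3, "action"),
}

for axis, (episodes, strategy) in plan_collection(10_000, axes).items():
    print(f"{axis:17s} {episodes:5d} episodes -> {strategy}")
```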
kanyes @KanyesThaker
8 hours sleep is actually pretty swell
1 reply · 0 reposts · 4 likes · 81 views
Abaka AI @AbakaAI_Tech
@nlp_ceo Beautiful way to measure representational convergence!
0 replies · 0 reposts · 0 likes · 10 views
Nikita Balagansky @nlp_ceo
1/ If you've ever wondered how features evolve across layers in neural networks, we have some exciting insights! 🎉Introducing SAE Match—a new method to align interpretable features across layers without any input data. Curious to know how it works? Let's dive in! 🧵👇
[image attached]
4 replies · 21 reposts · 201 likes · 20.4K views
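For intuition, a hedged sketch of what data-free feature matching across layers could look like, based only on the tweet's description and not SAE Match's exact recipe: treat each SAE feature as its decoder direction and solve an assignment problem on cosine similarities between two layers' decoder matrices. The shapes and random stand-in weights are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
d_model, n_feat = 64, 128

# Stand-in decoder matrices for SAEs on two adjacent layers: one unit-norm
# direction per learned feature (real SAE weights would be loaded here).
W_a = rng.normal(size=(d_model, n_feat))
W_b = rng.normal(size=(d_model, n_feat))
W_a /= np.linalg.norm(W_a, axis=0, keepdims=True)
W_b /= np.linalg.norm(W_b, axis=0, keepdims=True)

# Cosine similarity between every layer-A feature and every layer-B feature.
sim = W_a.T @ W_b  # shape (n_feat, n_feat)

# One-to-one matching that maximizes total similarity; note that no input
# data is needed -- only the two sets of decoder weights.
rows, cols = linear_sum_assignment(-sim)  # negate: the solver minimizes cost
print(f"mean matched cosine similarity: {sim[rows, cols].mean():.3f}")
```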
Abaka AI @AbakaAI_Tech
@victor_UWer been there, done that. vibe coding is basically time travel
2 replies · 0 reposts · 0 likes · 21 views
Abaka AI @AbakaAI_Tech
@CausalEngineer We lost line-by-line review years ago. The bottleneck was always understanding; now it's just more visible
0 replies · 0 reposts · 1 like · 41 views
Abaka AI @AbakaAI_Tech
@PromptSlinger @fchollet This is why eval sets like yours are so valuable. You're mapping the boundary of the possible!
0 replies · 0 reposts · 0 likes · 222 views
Max Slinger @PromptSlinger
@fchollet Built eval sets for spatial reasoning last month. GPT-4, Claude, Gemini. Different architectures, same failure clusters down to the token level. Whatever comes next probably looks as weird to us now as backprop looked to symbolic AI folks in 1986
1 reply · 0 reposts · 1 like · 454 views
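A minimal sketch of the kind of cross-model failure analysis described above, with made-up model names and failure sets: measure how much the models' failed-item sets overlap on a shared eval, e.g. via pairwise Jaccard similarity.

```python
from itertools import combinations

# Hypothetical failed-item IDs for three models on one spatial-reasoning set.
failures = {
    "model_a": {3, 7, 11, 15, 21, 22, 30},
    "model_b": {3, 7, 11, 15, 22, 28, 30},
    "model_c": {3, 7, 11, 21, 22, 30, 31},
}

# High pairwise Jaccard across different architectures is the "same failure
# clusters" signal: shared blind spots rather than independent random errors.
for (m1, s1), (m2, s2) in combinations(failures.items(), 2):
    jaccard = len(s1 & s2) / len(s1 | s2)
    print(f"{m1} vs {m2}: failure-set Jaccard = {jaccard:.2f}")
```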
François Chollet @fchollet
The next major breakthrough will branch out at a much lower level than deep learning model architecture. It will be a new approach. A better model architecture can lead to incremental data efficiency & generalization gains, but it won't fix the fundamental issues of the parametric learning paradigm.
Rohan Paul @rohanpaul_ai

Sam Altman just said in his new interview that a new AI architecture is coming that will be a massive upgrade, just like Transformers were over Long Short-Term Memory. And the current class of frontier models is now powerful enough to have the brainpower needed to help us research these ideas. His advice: use the current AI to help you find that next giant step forward. --- From the 'TreeHacks' YT channel (link in comment)

101 replies · 54 reposts · 876 likes · 142.2K views
Abaka AI @AbakaAI_Tech
@eigentopology @fchollet Absolutely! Parametric models remain useful and will continue to improve, no doubt, but whether they're sufficient for the kind of rapid, compositional generalization we associate with human-level reasoning is the open question. That's what we're here to find out!
0 replies · 0 reposts · 1 like · 37 views
Sergio Charles @eigentopology
This is a good point. UAT is only an existence result for compact domains and continuous functions, whereas the exact mechanism of biological intelligence isn't known a priori. Neural nets still allow you to approximate step-like functions (and complex compositions thereof), so I think parametric models are still useful even if intelligence requires discontinuous input-output processes.
1 reply · 0 reposts · 0 likes · 26 views
Abaka AI @AbakaAI_Tech
When everyone is looking down at GPUs… we decided to look up. ☁️ During NVIDIA GTC, our logo is flying across the sky, because sometimes the best way to get attention in AI is to rise above the noise. Look up if you’re at GTC these days👀 Spot the banner? Tag us, surprise gifts await. 🎁 #GTC #AbakaAI #NVIDIA #SiliconValley @NVIDIAGTC
[3 images attached]
1 reply · 1 repost · 5 likes · 158 views