LDJ

466 posts

LDJ banner
LDJ

LDJ

@ldjconfirmed

Currently: Doing some stuff with AI. Prev founding team of: @NousResearch (2023) and @TTSLabsAI (2020) DM for interesting conversations.

S4 انضم Mart 2021
709 يتبع6K المتابعون
تغريدة مثبتة
LDJ
LDJ@ldjconfirmed·
Moores law created AI to save itself.
English
5
7
67
10.6K
LDJ
LDJ@ldjconfirmed·
@kelstar_ @NousResearch @OpenRouter Not sure why that would impact the rankings in this way. But further update now; Hermes Agent is now #1 on the monthly global charts too, and is so far ahead that it has more usage than OpenClaw, Kilo Code and Claude Code combined.
LDJ tweet media
English
2
0
1
38
Nous Research
Nous Research@NousResearch·
Hermes Agent is now #1 on the Global @OpenRouter token rankings. While our journey together has just begun, we'd like to take this opportunity to thank our contributors, supporters, and users for all they have done to get us this far.
Nous Research tweet media
English
439
727
7.2K
3M
Nous Research
Nous Research@NousResearch·
@kelstar_ @OpenRouter It's the daily global chart as clearly indicated in the screenshot. Don't worry, we'll do an all time #1 post when we get there too!
English
2
0
52
5.5K
LDJ
LDJ@ldjconfirmed·
@anko_979 @BlackHC It’s a common misconception that it was pulled, it wasn’t, it’s placement in the code of conduct was simply changed from the preface of the code of conduct to the ending section of the code of conduct where it still exists today.
English
1
0
0
44
AnKo
AnKo@anko_979·
@BlackHC It's almost like this fundamental slogan was pulled for some reason
English
2
0
3
133
Andreas Kirsch 🇺🇦
Andreas Kirsch 🇺🇦@BlackHC·
I'm speechless at Google signing a deal to use our AI models for classified tasks. Frankly, it is shameful. For HR, I'm not speaking on behalf of Google but in my personal capacity, quoting public information from a well-sourced article of a reputable publication
Andreas Kirsch 🇺🇦 tweet media
English
214
202
1.3K
253K
LDJ
LDJ@ldjconfirmed·
Roughly related but I don't think they intend to use any GDPval score as evidence for AGI being achieved. They say in the paper themselves that it's largely work that is only a time-horizon of a few hours and the context is much more assisted than a real job. GDPVal is also far easier and more saturated than something like RemoteLaborIndex which comprises of real Upwork tasks (but still not typical employment positions) Current GDPVal SOTA is over 80% Current RemoteLaborIndex SOTA is less than 5%
English
0
0
1
27
prinz
prinz@deredleritt3r·
As to the timeline for OpenAI declaring AGI, recall that Amazon has committed to invest $35B in OpenAI upon the earlier of: (1) OpenAI IPO, or (2) OpenAI meeting "specified milestones" - which Reuters reported means OpenAI declaring AGI. Importantly, Amazon's commitment to invest $35B *expires* on December 31, 2028. Timing for the IPO depends heavily on market conditions. Thus, if Reuters' reporting was accurate, OpenAI must have felt very confident that AGI will be declared within the next ~2 years.
prinz tweet media
English
7
10
149
26.2K
LDJ
LDJ@ldjconfirmed·
@daniel_mac8 @deredleritt3r In Microsofts October 2025 blog post about their latest partnership terms with OpenAI: "Once AGI is declared by OpenAI, that declaration will now be verified by an independent expert panel."
English
0
0
2
21
Dan McAteer
Dan McAteer@daniel_mac8·
@deredleritt3r true! hard for me to imagine how you can base an agreement on such vague language (unless, as you mention Prinz, it's more precise in the private agreement)
English
2
0
2
95
LDJ
LDJ@ldjconfirmed·
@deredleritt3r @daniel_mac8 "Economically valuable work" is further defined by people internally at OpenAI as the jobs tracked by the US bureau of labor statistics. So I suppose it's a majority of those jobs that they mean.
English
1
0
2
53
prinz
prinz@deredleritt3r·
Not just "most"! What is "outperforming humans" - do we mean the average human, a Ph.D, a child...? What is "economically valuable work" - jobs or tasks? And do we include physical labor or not? Some of these things may be explained further in the agreement, but again that's never been made publicly available.
English
2
0
3
169
LDJ
LDJ@ldjconfirmed·
@deredleritt3r In Feb 2026, Sam Altman said at a Stanford hackathon: “If you are a sophomore now, you will graduate into a world with AGI in it" Sophomores in Feb 2026 are set to graduate around mid-2028. I believe this is the first and only time he's stated such a near-term AGI prediction.
English
1
0
5
510
LDJ
LDJ@ldjconfirmed·
@ChaosEmergent @haider1 GPT-4.5 started training ~may 2024, almost exactly 2 years ago now. (Based on official OpenAI statements that mentioned starting training on their new next generation model at the time, along with corroboration from WallStreetJournal and others)
English
0
0
1
155
Nikhil Shinday
Nikhil Shinday@ChaosEmergent·
@haider1 wait then what was gpt-4.5 I assumed that the 5-series were distilled from 4.5 then RLVR'd as smaller models
English
3
0
43
6K
Haider.
Haider.@haider1·
greg brockman recently confirmed that "spud" is openai first new pre-train model in two years since gpt-5.x models seem to build on gpt-4o/4-turbo, if openai RL can push a weaker base model like gpt-4o close to gpt-5.4-x-high-level intelligence then openai clearly has a secret sauce
English
33
27
1.1K
97.2K
LDJ
LDJ@ldjconfirmed·
@zephyr_z9 That quote is not true to what he said. His statement was directly opposite of the what you created within your quotations. Here is his actual quote about that topic: "Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways"
English
0
0
4
183
Zephyr
Zephyr@zephyr_z9·
"AI systems will be as smart as humans in all ways by 2028" Multimodality will still be a bottleneck
prinz@deredleritt3r

New interview with Jakub Pachocki in the MIT Technology Review: - The automated AI researcher (planned for 2028) is described as a "multi-agent" system, and will be able to "tackle problems that are too large or complex for humans to cope with". This is a clear indication that OpenAI expects the automated AI researcher to be superhuman at AI research. - But it won't be used only for AI research. "In theory, you would throw such a tool any kind of problem that can be formulated in text, code or whiteboard scribbles." This includes math, physics, biology, chemistry, "or even business and policy dilemmas". - Pachocki: "I think we are getting close to a point where we'll have models capable of working indefinitely in a coherent way just like people do... we will get to a point where you... have a whole research lab in a data center." - Saving the world by solving its hardest problems is the stated mission of all the top AI firms. "Pachocki says OpenAI now has most of what it needs to get there." - The automated AI research intern (targeted for September 2026) will be able to take on tasks "that would take a person a few days". Consider what this means with regard to METR's time horizon. - Pachocki: "If we really wanted to, we could build an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it's not something we're going to prioritize now... there's much more urgent things to do. We are much more focused now on research that's relevant in the real world." - Pachocki does not expect that AI systems will be as smart as humans in all ways by 2028, "but I don't think it's absolutely necessary... you don't need to be as smart as people in all their ways in order to be very transformative."

English
5
2
124
16.7K
LDJ
LDJ@ldjconfirmed·
@juristr L9: You have the AI itself write the optimal coordination layer on the fly for spawning, routing and managing agents programmatically, in the way that works best for a given project and your preferences.
English
1
0
0
106
Juri Strumpflohner
Juri Strumpflohner@juristr·
What's your AI adoption level? (according to Steve Yegge)
Juri Strumpflohner tweet media
English
287
89
989
2.2M
LDJ
LDJ@ldjconfirmed·
@otium33 @BasedBiohacker @bryan_johnson He has already publicly talked about results of his personal peptide experimentation prior to doing his shroom experiments.
English
0
0
0
72
BasedBiohacker
BasedBiohacker@BasedBiohacker·
bryan johnson was NOT tracking 100+ biomarkers, doing blood transfusions with his son and turning down sex to sleep at 8:30 when he was building braintree/ venmo he was dogging it on cheeseburgers, stimulants and zero sleep he was fat. unhappy. suffered from existential doubt. the state of madness that births greatness. he only got into longevity because he lost purpose when he sold his company to paypal for $800 million so no bro, you should not be comparing yourself to nor model your life after what lizard johnson is doing. you can focus on that when you sell your biz for just short of a billion fucking dollars. focus on output and results, not optimizing for 50 years from now. yes - don't nuke your brain with amphetamines and coke, but also don't get so neurotic about health that you sacrifice your potential. max output. max results. you can track your nighttime boners and do total plasma exchanges when you have 100 mill.
BasedBiohacker tweet mediaBasedBiohacker tweet media
English
19
18
527
98.6K
Haider.
Haider.@haider1·
openai probably didn't have the model just sitting there for a while instead, gpt-5.4 was likely a checkpoint from a model that was still training, released when they felt pressure even logan from google told me in DM that: "google doesn't hold models back because the competition is high"
English
42
15
530
83.8K
LDJ
LDJ@ldjconfirmed·
@thebasedcapital I think it did well. It lasted nearly 3 full years.
English
0
0
5
1.2K
basedcapital
basedcapital@thebasedcapital·
@ldjconfirmed benchmarks built to prove ai can't do something have a short shelf life. the gaia authors probably expected it to last years, not months. at some point you have to stop moving the goalposts and accept that general capability is here even if it's not agi
English
1
0
9
1.5K
LDJ
LDJ@ldjconfirmed·
In November 2023, Yann LeCun, Thomas Wolf and others from Meta and Huggingface created a benchmark called GAIA, which described itself as: "A benchmark for General AI Assistants that, if solved, would represent a milestone in AI research." Most of the problem solutions were kept private, not released online. It proposed 466 "real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency." On the hardest level, the average human score was 87%, while the leading systems scored less than 3%. 10 months later OpenAI released O1-preview, reaching ~30% on that level. Now in 2026 the human baseline for the hardest level has officially been surpassed, the best agent systems are now scoring 88.9% on GAIAs hardest level (level 3).
LDJ tweet media
English
25
58
798
78.7K
LDJ
LDJ@ldjconfirmed·
@ThomasScialom Unfortunately I can’t find any human baselines for GAIA 2.
English
1
0
0
1.2K
Thomas Scialom
Thomas Scialom@ThomasScialom·
@ldjconfirmed There is Gaia 2 now which is a very good eval for environments a la Openclaw
English
1
1
13
3K
LDJ
LDJ@ldjconfirmed·
@xundecidability @WaveTheoryAI The difference here is that GAIA is real world questions involving highly specific information that exists amongst human civilization across a diverse set of modalities, not an abstract puzzle.
English
1
0
0
49
xundecidability
xundecidability@xundecidability·
@WaveTheoryAI @ldjconfirmed "designed" is aspirational, its possible their design was defeated. ARC also aspired to this, but the labs soon learned to beat it. O1 was the first model they fine-tuned on it, but they didn't release that version because it was dumber at everything else.
English
1
0
1
118
LDJ
LDJ@ldjconfirmed·
@nithin_k_anil The human baseline score was also matched/surpassed by GPT-5 and Gemini-3-Pro working together without any specialized orchestrator in the loop, and only scored ~2% below the top score by Nvidia. I imagine Opus 4.6, GPT-5.4 and Gemini-3.1 together would get an even better score.
English
0
0
1
194
LDJ
LDJ@ldjconfirmed·
The current highest level 3 score was achieved by Nvidia, leveraging a multi-agent system that includes Nvidias own tool orchestrator model. It scores 89.8% on Lvl 3 (even higher than the 88.9% typo I wrote above) The public leaderboard can be seen here: huggingface.co/spaces/gaia-be…
LDJ tweet media
English
0
1
81
6.5K