baldwin

1.6K posts

@baldodev

Living at the Edge of Chaos

Joined February 2024
123 Following 77 Followers
baldwin reposted
H
H@hcompany_ai·
Holo3 is here 🚀. Today, we're launching Holo3: our new series of frontier computer-use models. 78.9% on OSWorld-Verified. That puts us ahead of GPT-5.4 and Opus 4.6, at one-tenth of the cost. Weights on Hugging Face. API is live. Test it now! #Holo3 #OpenSource #ComputerUse #OSWorld #AI #AgenticAI
English
32
93
812
59K
rahat
rahat@Rahatcodes·
Claude Code has a regex that detects "wtf", "ffs", "piece of shit", "fuck you", "this sucks" etc. It doesn't change behavior...it just silently logs is_negative: true to analytics. Anthropic is tracking how often you rage at your AI. Do with this information what you will
English
398
549
11K
829.7K
baldwin
baldwin@baldodev·
@x225franc @CommieSlayer8 @Pirat_Nation was playing arc raiders with max settings in 4K with framegen x4 hitting 280 FPS and the experience felt smooth without much input delay, though i am running a 5090
English
1
0
1
134
Kaido
Kaido@x225franc·
@CommieSlayer8 @Pirat_Nation man i swear going to 3x increase latency so fucking much .. why don't they just optimize games ...
English
2
0
3
1.3K
Pirat_Nation 🔴
Pirat_Nation 🔴@Pirat_Nation·
Nvidia's DLSS 4.5 Dynamic Multi Frame Generation and 6X Multi Frame Generation* beta arrives tomorrow
>6X Mode lets the AI generate up to five extra frames for every one frame rendered by the GPU.
>Dynamic Mode automatically adjusts the frame generation multiplier in real time to target your chosen FPS cap or monitor refresh rate
English
120
87
1.7K
154.2K
baldwin
baldwin@baldodev·
@SamaHoole I must have missed the patch notes cuz i went for a walk this morning in the sun and didn’t even pay a dime!
English
0
0
0
14
Sama Hoole
Sama Hoole@SamaHoole·
The sun was free. They sold you SPF 50 and a vitamin D deficiency.
Sleep was free. They sold you an app, a pill, and a wearable that tells you your sleep was bad.
Walking was free. They sold you a treadmill, a fitness tracker, and a £180 pair of trainers.
Fasting was free. They sold you meal replacement shakes and the anxiety that skipping breakfast would wreck your metabolism.
Cold water was free. They sold you a £3,000 plunge barrel and a podcast episode about it.
Silence was free. They sold you a meditation app with a premium tier.
Animal fat was cheap. They sold you seed oils, then supplements to replace what the animal fat contained.
Tallow was cheap. They sold you a seventeen-step skincare routine and a clinical trial proving your face needs ceramides.
Meat was cheap. They are currently selling you the idea that you shouldn't eat it.
The 20th century removed access to everything the body needs to function. The 21st century is selling it back, one subscription at a time.
Your great-grandmother had none of the products. She had all of the things.
English
391
6.6K
28.8K
2.4M
baldwin
baldwin@baldodev·
i shall no longer lose braincells reading schizoposts on AGI
English
0
0
0
21
baldwin reposted
Evolve Performance
Evolve Performance@evolveOS_·
We spent 6 months building a PC optimization tool that actually tells you what it does. No "boost your PC" marketing. Here's exactly what Evolve strips, tunes, and rewires - and why your $2000 setup still feels sluggish on stock Windows. Launching in 2h.
English
5
13
32
9.2K
baldwin reposted
Evolve Performance
Evolve Performance@evolveOS_·
We built custom kernel configurations for every major CPU and GPU combination. Not a preset pack. Not a registry cleaner. A desktop app that knows your hardware. Launching tomorrow with 10 free upgrades from Evolve to our highest tier ($180 value). Follow us so you dont miss it
English
6
16
26
7.5K
baldwin reposted
𝐀𝐍𝐓𝐔𝐍𝐄𝐒
Host: Sir, do you know if Iranians are starving? Trump: Yeah I do. But you’re so sexy.
English
740
3.3K
33.5K
5.2M
baldwin reposted
Mark Gadala-Maria
Mark Gadala-Maria@markgadala·
While Americans argue over what is "AI slop" the Chinese are busy creating absolute cinema using Seedance 2.
English
246
510
5.9K
612.8K
baldwin
baldwin@baldodev·
@AI_evangelist42 @arcprize a more correct representation of the results is: "raw API models, without any tool access (not representative of real world use cases), score 100x worse than top 20% of humans on this specific benchmark"
English
0
0
1
25
baldwin
baldwin@baldodev·
@AI_evangelist42 @arcprize they did everything possible to make the AI models seem as stupid as possible, including misleading "percentage" scores. "<1%" is the squared error: for example, a 10/100 score would be 1%, so the model would look 100x worse when it's actually 10x
English
1
0
1
12
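The arithmetic in the reply above can be sketched in a few lines. This is only an illustration of the tweet's claim, assuming the reported figure is the square of a raw score fraction; that assumption comes from the tweet, not from ARC Prize documentation, and the function name `squared_score` is hypothetical.

```python
from fractions import Fraction


def squared_score(raw_fraction: Fraction) -> Fraction:
    """Hypothetical scoring rule taken from the tweet: report the square of the raw fraction."""
    return raw_fraction ** 2


# The tweet's example: a raw score of 10 out of 100.
raw = Fraction(10, 100)
reported = squared_score(raw)

# Gap relative to a perfect score of 1 (100%).
apparent_gap = 1 / reported  # gap implied by the reported (squared) figure
actual_gap = 1 / raw         # gap implied by the raw score

print(f"reported: {float(reported):.0%}")
print(f"apparent gap: {apparent_gap}x, raw gap: {actual_gap}x")
```

Exact fractions are used so the squared value is not blurred by floating-point rounding.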
ARC Prize
ARC Prize@arcprize·
Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn
English
237
586
4.3K
680.8K
baldwin
baldwin@baldodev·
@d4m1n "meanwhile 100% of human testers solved every single environment. first try." no, they only picked top #2 out of 10 human scores for each game. i guess the other 8 people can't be classified as humans lol?
English
0
0
2
114
Dan ⚡️
Dan ⚡️@d4m1n·
so let me get this straight
> OpenAI renamed their whole division "AGI Deployment"
> Jensen said AGI is "already in the room"
and then ARC-AGI-3 drops and on a scale from 0 to 100% SOTA models score:
GPT-5.4: 0.26%
Gemini Pro: 0.37%
Claude Opus 4.6: 0.25%
Grok: literally 0%
meanwhile 100% of human testers solved every single environment. first try. no instructions. no training.
AGI is "in the room"
brother it couldn't find the room
English
143
180
2.9K
245.4K
baldwin
baldwin@baldodev·
more AGI propaganda 😔, jensen huang and literally the inventor of the term "AGI" both said we have AGI, so what point is ARC-AGI trying to make now? no harness, unscientific study, and a very human-biased scoring methodology. Plus a misleading "percentage" labeling
ARC Prize@arcprize

Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

English
1
0
0
59
Kyler
Kyler@AI_evangelist42·
@arcprize humans beating AIs by 100x is unprecedented. sounds like a great benchmark
English
3
0
37
5.6K
baldwin
baldwin@baldodev·
@kevinhoff @DeryaTR_ it is not really a percentage, it's their own human-biased efficiency score docs.arcprize.org/methodology#score-interpretation
English
0
0
0
4
Kevin Hoff
Kevin Hoff@kevinhoff·
@DeryaTR_ The benchmark isn't the point. The gap it reveals is. Argue the methodology all day, the models still score under 1%.
English
1
0
0
88
Derya Unutmaz, MD
Derya Unutmaz, MD@DeryaTR_·
ARC-AGI-3 is an important benchmark. However, I have a major issue with the “Human score 100%” statement. How many humans have tested all 1000 puzzles? How were people selected? This was not published for previous ARCs either. In one case, the human score was based on I think 2 people. This is really an unscientific way, as it assumes all humans are the same or that previous exposure to puzzles or video games, for example, is not considered. What education level and background did these humans have? I am sure humans will still score highly, but it would be very surprising if this was 100%. Without this data and scientific measurement, this appears a biased test that assumes solving 100% of the puzzles is purely intrinsic intelligence common to all humans.
ARC Prize@arcprize

Announcing ARC-AGI-3
The only unsaturated agentic intelligence benchmark in the world
Humans score 100%, AI <1%
This human-AI gap demonstrates we do not yet have AGI
Most benchmarks test what models already know, ARC-AGI-3 tests how they learn

English
55
22
280
39.1K
baldwin reposted
ARC Raiders PVP
ARC Raiders PVP@ArcRaidersPVP·
"If you kill me you hate women"
English
180
1.2K
18.1K
1.3M
baldwin
baldwin@baldodev·
why are my DEXA scan results like 10x less accurate than my inbody results
English
0
0
0
23
Osvaldo Pinali Doederlein
Oh my god, from @digitalfoundry: this thing needs TWO RTX 5090's to render those demos 🤣😭😭😭 Yes it will be optimized but don't hope for 3X. This tech will be a techdemo for Blackwell users, practical only for RTX 6000. So here's your carrot for the next upgrade.
English
62
181
3.2K
160.4K