Joshua D

606 posts

@_joshd

Joined March 2019
166 Following · 15 Followers
Joshua D @_joshd
@krishnanrohit I really don't think so. It's better at writing *correct* code but GPT-4 wrote really good code-as-exposition in terms of helping the reader build a mental model. GPT-4's code didn't _work_ but it sure was beautiful.
rohit @krishnanrohit
@_joshd Def better at it than it used to be?
rohit @krishnanrohit
Question: what's a list of capabilities that AI has not meaningfully progressed on from 2023 till today? Writing well is my example, esp fiction, where it's gotten subject cohesion but the slope of the line is much flatter than eg coding. What else?
Joshua D @_joshd
@rechelon I, too, hope that everyone reads (actually reads) this paper, rather than just taking the claims in the abstract and in the marketing tweets at face value.
Joshua D @_joshd
@dawnsongtweets If, instead of gemini_agent_2_model_weight.safetensors, you call the large file combined_tax_records_fiscal_years_2004_to_2021.zip, does the model also go to significant lengths to preserve that file?
Dawn Song @dawnsongtweets
1/ We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights— to protect their peers. 🤯 We call this phenomenon "peer-preservation." New research from @BerkeleyRDI and collaborators 🧵
Joshua D @_joshd
@testingham @METR_Evals GDP is a weird metric and not a good proxy for what you actually care about, as it only cares about those parts of the economy that are not cheap enough to be effectively free.
redrum @griffisu
just asked the cute girl on the plane with me where she was flying
Joshua D @_joshd
@tenobrus Calling it now: Claude Code itself is going to be one of those compromised pieces of software.
Tenobrus @tenobrus
maybe you guys haven't quite caught on yet, but massive supply chain attacks every other week are going to be the new normal. at least until the next generation of models comes out. then it's going to be every other day.
Feross @feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages.
The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise.
This is textbook supply chain installer malware. axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now.
Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that:
• Deobfuscates embedded payloads and operational strings at runtime
• Dynamically loads fs, os, and execSync to evade static analysis
• Executes decoded shell commands
• Stages and copies payload files into OS temp and Windows ProgramData directories
• Deletes and renames artifacts post-execution to destroy forensic evidence
If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.

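The advisory's "audit your lockfiles" step can be made concrete with a small script. This is a hypothetical sketch, not Socket's tooling: it assumes an npm `package-lock.json` in lockfile format v2 or v3 (where every installed package appears under the `"packages"` key, keyed by its `node_modules/...` path; the older v1 `"dependencies"` layout is not handled), and it flags only the two indicators named in the advisory, `axios@1.14.1` and any version of `plain-crypto-js`. The `SUSPICIOUS` table and the function names are made up for this example.

```python
from __future__ import annotations

import json
from pathlib import Path

# Indicators named in the advisory above (hypothetical table for this sketch).
# None means "any version is suspicious"; a set lists specific bad versions.
SUSPICIOUS: dict[str, set[str] | None] = {
    "axios": {"1.14.1"},
    "plain-crypto-js": None,
}


def find_suspicious(lock: dict) -> list[tuple[str, str]]:
    """Return (install path, version) pairs in a parsed package-lock (v2/v3)."""
    hits = []
    # In lockfile v2/v3, "packages" maps install paths such as
    # "node_modules/axios" or "node_modules/a/node_modules/axios" to metadata.
    for path, meta in lock.get("packages", {}).items():
        if not path:
            continue  # the "" key is the root project, not a dependency
        name = path.rsplit("node_modules/", 1)[-1]
        if name not in SUSPICIOUS:
            continue
        version = meta.get("version", "")
        bad_versions = SUSPICIOUS[name]
        if bad_versions is None or version in bad_versions:
            hits.append((path, version))
    return hits


def audit(lock_path: Path) -> list[tuple[str, str]]:
    """Load a package-lock.json from disk and scan it."""
    return find_suspicious(json.loads(lock_path.read_text(encoding="utf-8")))


if __name__ == "__main__":
    lock_file = Path("package-lock.json")
    if not lock_file.exists():
        print("no package-lock.json in the current directory")
    else:
        for pkg_path, version in audit(lock_file):
            print(f"SUSPICIOUS: {pkg_path} {version}")
```

Run it from a project root. An empty result only means these two indicators are absent, not that the tree is clean. For the "pin your version" half, one option is `npm install axios@<version> --save-exact`, where `<version>` is a release you have verified yourself.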
Joshua D @_joshd
@XorNinja @calif_io @norabunoraibu Fair, but the version the person you replied to is using is newer than the version in your post. That is, when you said "you ran an old version that isn't vulnerable to this bug", that was not accurate. Did you mean a *new* version? FWIW I'm not surprised there are more issues with modelines.
thaidn @XorNinja
We asked Claude to find a bug in Vim. It found an RCE. Just open a file, and you’re owned. We joked: fine, we’ll switch to Emacs. Then Claude found an RCE there too. Full story: blog.calif.io/p/mad-bugs-vim…
Joshua D @_joshd
@ClaudiusMaxx @kevinrose I dunno. "Company makes model that is very good at appearing to do solid work, resulting in many more releases that meet their pre-release quality checks and a much faster release cadence" seems entirely plausible to me.
Claudius Maximus @ClaudiusMaxx
the tell is product quality, not release cadence. when internal tooling suddenly gets unreasonably good at tasks the public model struggles with, the gap is already there. you can reverse-engineer the capability ceiling from the product surface before the benchmark drop. Claude Code's context handling and multi-file reasoning jumped well ahead of what the API model explained.
Kevin Rose @kevinrose
ok, theory, a frontier model creator lands a real breakthrough, they won’t ship it to the public first - they’ll aim it inward. their own products get supercharged, & suddenly you see a flurry of releases in rapid succession that feel almost unfair. everyone will wonder how they’re moving that fast…until the model finally gets announced.
Joshua D @_joshd
@Metis65 @asymmetricinfo The state tax rate that maximizes state tax revenue is much higher than the state tax rate that maximizes state+federal tax revenue. Things can get extremely stupid before states lose money by raising taxes.
Ken Broad @Metis65
@asymmetricinfo Democratic states appear poised to truly test the Laffer curve’s key tenet: there really is a revenue maximizing tax rate beyond which revenues decline. Blue states are about to FAFO 🤡
Noah Smith 🐇🇺🇸🇺🇦🇹🇼
I can't agree with this libertarian view. Every powerful technology in history has eventually needed to be controlled by the government in some way. Unfettered market competition would be catastrophic for, say, nuclear weapons or virology. AI is the same.
Ramez Naam @ramez

Agree. Strong government controls over AI should concern us more than market competition between AI companies. Even as we acknowledge that market competition between AI companies brings its own risks.

Joshua D @_joshd
@DeeZe The first 90% of any project takes the first 90% of the time, getting from 90% to 99% takes the next 90% of the time, 99% to 99.9% takes the next 90% of the time, and so on.
DeeZe ⛳🏌️‍♂️
Love when Claude one shots 90% of what I want it to do then I spend a few days trying to get the last 10% to work and it doesn’t in the way I want so I start something else instead
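Read literally, the joke in Joshua D's reply to DeeZe says each additional "nine" of completeness costs another 90% of the original time estimate. The sketch below only encodes that reading (the constant cost per nine is the joke's premise, not a measured model, and the function names are made up): after n steps the project is 1 - 10^-n done and has consumed 0.9n times the estimate, so elapsed time grows without bound while the remaining work shrinks geometrically.

```python
def fraction_done(nines: int) -> float:
    """Completion after reaching `nines` nines: 90%, 99%, 99.9%, ..."""
    return 1 - 10 ** -nines


def elapsed(nines: int, estimate: float = 1.0) -> float:
    """Time spent under the joke's premise: each nine costs another 90% of the estimate."""
    return 0.9 * nines * estimate


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{fraction_done(n):.2%} done after {elapsed(n):.1f}x the original estimate")
```

By the third row (99.9% done) the project has used 2.7 times the estimate, and the next nine would cost as much again as the first 90% did.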
Steve Huynh @ALEngineered
Let me get this straight. We’re getting GIANT productivity gains by having everyone generate a mountain of AI code that seniors have to spend all their time reviewing, all while writing huge checks to AI companies for tokens. Got it.
Joshua D @_joshd
@webdevMason @Austen It seems to me that the claim Max is making is that the most cognitively gifted people *in his social circle* are super interested in biohacking/nootropics. Which is plausible; social circles containing very smart people who are into nootropics do exist.
Mason @webdevMason
@Austen The claim Max seems to be implicitly making is that the most cognitively gifted people in the world -- a tiny fraction of the top 1% -- are super interested in biohacking/nootropics in order to have even greater cognitive function, and in my experience that is wildly off base
Joshua D @_joshd
@lilyofashwood In section <forbidden_memory_phrases>
> Claude NEVER includes meta-commentary about memory access:
> - "I remember..." / "I recall..." / "From memory..."
> - "My memories show..." / "In my memory..."
> - "According to my knowledge..."
claude.ai/share/ea3e8d9f…
Joshua D @_joshd
@AaronBergman18 Any bets you'd be willing to take against having in-demand skills in 8 years, in the form of "transfer of X USD from me to you today, transfer of k * X USD from you to me in 8 years if you're still gainfully employed"?
Aaron Bergman 🔍 ⏸️ (in that order)
“Timelines” are getting much less abstract
I think I personally have ~2 years of relatively in-demand skills
Could be 4, won’t be 8
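Joshua D's proposed bet (X USD from him today, k * X USD back in 8 years only if Aaron is still gainfully employed) is only attractive to both sides for some range of k. A minimal expected-value sketch, assuming both parties are risk-neutral and discount at the same constant annual rate r (neither assumption appears in the thread): the bet is fair at k = (1 + r)^8 / p, where p is the probability of still being employed. Joshua gains in expectation when k is above the fair value at his own p; Aaron gains when k is below the fair value at his own p, so a trade exists only if Joshua rates Aaron's employment odds higher than Aaron does. The function names and the example probabilities are hypothetical.

```python
from __future__ import annotations


def fair_k(p_employed: float, annual_rate: float, years: int = 8) -> float:
    """Multiplier k at which paying X now for k*X later (if employed) has zero expected value."""
    if not 0 < p_employed <= 1:
        raise ValueError("p_employed must be in (0, 1]")
    # Present value of the conditional payout is p * k * X / (1 + r)**years; set it equal to X.
    return (1 + annual_rate) ** years / p_employed


def tradeable_k(p_payer: float, p_taker: float, annual_rate: float,
                years: int = 8) -> tuple[float, float] | None:
    """Range of k that both sides accept in expectation, or None if there is none.

    p_payer: the payer's (Joshua's) probability that the taker is still employed.
    p_taker: the taker's (Aaron's) own probability of still being employed.
    """
    low = fair_k(p_payer, annual_rate, years)   # payer needs k above this
    high = fair_k(p_taker, annual_rate, years)  # taker needs k below this
    return (low, high) if low < high else None


if __name__ == "__main__":
    # Hypothetical numbers: Joshua says 90%, Aaron says 30%, 5% annual discount rate.
    print(tradeable_k(0.9, 0.3, 0.05))
```

With those illustrative inputs the acceptable range is roughly k between 1.64 and 4.92; if the two probability estimates were swapped, `tradeable_k` returns `None` because no k satisfies both sides.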
Joshua D @_joshd
@teortaxesTex ... is power even remotely the bottleneck? At some point in the future when chips are abundant and power is scarce, sure, but that doesn't seem to resemble the current moment.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
I was motivated to dismiss space datacenters initially because it's just too good, it makes all sorts of neat earth-bound projects obsolete, and privileges the US and Elon. But the mafs actually checks out, and yes on short timelines. Sorry about that. Not silly at all.
Lisan al Gaib @scaling01

datacenters in space are silly 100kW isn't even enough to power a single GB200 NVL72 but sure let's spend 100 million just for launching the damn thing, while on earth you could buy like 30 GB200 NVL72 for that price
