edward

1.9K posts

@edwardmarkdown

international disputes resolution lawyer, victim of ai psychosis

Paris, France · Joined August 2014

1K Following · 230 Followers

Pinned Tweet

edward@edwardmarkdown·Feb 11

we used to proofread drafts with our bare eyes

edward@edwardmarkdown·5h

@jackclarkSF @deredleritt3r also, just read my initial reply, and it says 'should probably be discounted'. did claude tell you that i'm *confident* "in being able to discount what [you] think and how [you] think" or are you hallucinating too?

edward@edwardmarkdown·5h

c'mon man don't be such a pussy. this wasn't meant to attack you personally. all i've said is that you're not a technical person and being a co-founder of anthropic makes you biased. if i got any part wrong, i apologize. what makes me confident? nothing really. but i can't take your word for it either, partly due to reasons above, partly because i don't know you personally, partly because i don't believe everything people post on fucking twitter. what i (hopefully) have is critical thinking and it makes me suspicious when a biased non-expert claims something deeply technical which also conveniently helps your company fundraise. this is what i meant by discounting. i'm not confident in anything really. your claims might be correct, or they might not be (i hope they are correct). but i don't know and i can't trust either you or claude for god's sake. i can't fathom how anything above could offend you. but maybe it's just me so if i did offend you, i, again, apologize.

prinz@deredleritt3r·8h

Jack Clark believes that there's a ~60% likelihood that we will reach fully automated AI research in 2028, but only a ~30% likelihood we will reach it in 2027. The full Substack post (linked in the tweet below) explains his rationale and is worth reading in its entirety.

Alex Imas@alexolegimas

In today's newsletter @jackclarkSF predicted that full no-human-involved AI R&D will happen by the end of 2028. Much of the pushback against RSI has been that AI has not yet shown the capacity to generate fully new ideas. This is the key part from Jack's post: the majority of AI research doesn't need this to happen. RSI in AI research can just be driven by 'meat and potatoes' engineering work. open.substack.com/pub/importai/p…

edward@edwardmarkdown·5h

@jackclarkSF @andrksl @deredleritt3r
> have claude rate my understanding
excellent methodology lol

Jack Clark@jackclarkSF·6h

@andrksl @edwardmarkdown @deredleritt3r It's 90% my own understanding, 10% colleagues. My hobby is sitting around and reading arxiv papers. I read the papers until I understand them, then I write summaries, have claude rate my understanding, etc. For Import AI, I've now read something like ~5,000 papers over 10 yrs

edward@edwardmarkdown·5h

@sama @jparkjmc openai trained a model to shitpost on twitter on behalf of sam
ai ceo is coming sooner than expected

Sam Altman@sama·8h

@jparkjmc i do

jpark@jparkjmc·16h

do you guys think sam altman will respond to a RL env company shillpost

edward@edwardmarkdown·7h

@signulll
> we need to add back some friction
why

signüll@signulll·19h

uh i think traveling is a bit too easy these days, we need to add back some friction somehow. ppl are casually doing 97 countries by the time they are like 25.

edward@edwardmarkdown·7h

@flowersslop probably because ~99% of instant's users are normies on free tier and they're already satisfied with 5.3 quality

Flowers ☾@flowersslop·8h

why does 5.5 not have an instant version if it's based on a new pretrain?

edward@edwardmarkdown·7h

ofc i know he’s anthropomorphic’s cofounder. just saying that he’s not a technical person, so he’s just saying what his employees tell him, multiplied by marketing + fundraising goals. this is not to criticize your post. i thank you for distributing the info because it’s interesting what jack has got to say. just think that what he says should probably be discounted by a large margin because of his obvious bias and because he doesn’t have the necessary expertise to make such claims.

prinz@deredleritt3r·7h

@edwardmarkdown Jack is a co-founder of Anthropic, and I certainly take the things he says very seriously.

edward@edwardmarkdown·9h

starting today, i'll refer to so called anthropic exclusively as 'anthropomorphic'

edward@edwardmarkdown·9h

@VictorTaelin fear-mongering is a lie. anthropomorphic just doesn't have the compute to serve mythos to the general public.

Taelin@VictorTaelin·1d

I am still upset about Mythos
I don't think I've ever felt so betrayed by a company
the evil in me is secretly hoping Bend2 is a massive fucking hit just so I can leave them out (:
ridiculous
we had GPT 5.5 for a week and fucking nothing happened

AI Security Institute@AISecurityInst

OpenAI’s GPT-5.5 is the second model to complete one of our multi-step cyber-attack simulations end-to-end 🧵

edward@edwardmarkdown·14h

@flowersslop just leave it at xhigh and see that it’s already true, ie the model won’t think too much on simple questions

Flowers ☾@flowersslop·23h

gpt models coming in reasoning strengths from low to xxhigh is kinda bad UX tbh, if the models are so smart then they should just know how much to think without me having to tell them, meaning they should just automatically know which tasks require how much reasoning

edward@edwardmarkdown·23h

@nikitabier funny that this tweet doesn’t have a “made with ai” disclaimer

Nikita Bier@nikitabier·1d

What it’s like working at X

edward@edwardmarkdown·23h

@TheAhmadOsman @sama 5.5 thinking and pro are generally available
mythos is not
tell me who democratizes ai and who doesn’t?

Ahmad@TheAhmadOsman·1d

@sama So am I invited? :) Would love to hear more about “democratize a lot of super capable AI” and discuss what that means for the folks like me that believe in “Open” AI

Ahmad@TheAhmadOsman

I am the only one in the Bay Area who didn’t sign up for the OpenAI party
Because if I were them I wouldn’t let me in 😂

Ahmad@TheAhmadOsman·1d

The difference between Anthropic and OpenAI is that one of them consistently keeps gaslighting us about not being an evil company
Big brother energy in the worst possible way

edward@edwardmarkdown·23h

@TheAhmadOsman you can’t mean that anthrop[omorph]ic doesn’t gaslight us about not being evil

edward@edwardmarkdown·2d

@eriskiiii a good one

Eris@eriskiiii·2d

Wtf is this benchmark

prinz@deredleritt3r

Added to prinzbench: DeepSeek-V4 (Pro). This model did not perform well on my benchmark. Its result (23/99) was comparable to those of older models like Grok 4 and Kimi K2 Thinking.

edward@edwardmarkdown·3d

@sethsaler @jshobrook AHAHAHAHHHAHAHAHAHAHAHHAHAHAH

Seth Saler@sethsaler·3d

@jshobrook Hoping this timeline stays true.

Elon Musk@elonmusk

@techdevnotes Supplemental training has been added to 4.3. Grok 4.4 will be twice the size (1T) with training data through early April. Probably ready for release in early May. Grok 4.5 will be 1.5T and hopefully out by late May.

Jonathan Shobrook@jshobrook·3d

We beat Sonnet 4.6 with a 500B model. Bigger runs are on the way.

Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.
The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing cost to run the benchmark suite.
Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1. Grok 4.3 maintains an 81% IFBench score from Grok 4.20 0309 v2.
➤ Gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.
Congratulations to @xAI and @elonmusk on the impressive release!
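
For reference, the ~17% expected win rate quoted above can be reproduced from the 276-point gap. This is a minimal sketch, assuming "the standard Elo formula" means the usual logistic expected-score formula on a 400-point scale; the function name `elo_expected_score` is illustrative and does not come from the thread.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the lower-rated side, given how many Elo
    points it trails by (assumes the common 400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


# Gap taken from the tweet: Grok 4.3 trails GPT-5.5 (xhigh) by 276 Elo points.
gap = 276
print(f"expected win rate: {elo_expected_score(gap):.1%}")
```

With a gap of 276 this comes out at roughly 17%, consistent with the figure in the tweet; a gap of 0 gives exactly 50%.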

edward@edwardmarkdown·3d

@hopes_revenge have you tried sleeping until 10?

hope hopes hoping@hopes_revenge·3d

my chungus morning with the expensive coffee maker Skooks influenced ( tricked ) me into purchasing . whereupon i soon realized i don’t really care what my coffee tastes like as long as it helps me withstand the onslaught of pain and boredom that assails me sharply at 9 am .

edward@edwardmarkdown·3d

@uwukko @luciascarlet say the name

wukko@uwukko·3d

@luciascarlet i pay a small fee for access to a piracy streaming service with all movies/shows from everywhere ever for convenient streaming on all devices instead of maintaining a home server or a torrent box

† lucia scarlet 🩸@luciascarlet·4d

what’s your most embarrassing sub mine is either ChatGPT Plus or SoundCloud Pro for that one account I don’t even release music on

edward@edwardmarkdown·3d

@deredleritt3r @ValsAI @xai @elonmusk also sad and pathetic

edward@edwardmarkdown·3d

@deredleritt3r @ValsAI @xai @elonmusk hahahahaha hilarious

Vals AI@ValsAI·4d

Today @xai just rearranged our leaderboards… Grok 4.3 jumped 25 points to take #1 on CaseLaw v2 and climbed 21 spots to lead CorpFin at 68.5%. Congrats @xai @elonmusk 🚀
