mark erdmann

5K posts

@markerdmann

co-founded @pulley in 2019. grew to 5k happy customers and $XXm ARR. now on pat break with tiny new human. 👶 exploring - ai eng, voice agents, edu games

Calgary · Joined December 2008

2K Following · 1.4K Followers

mark erdmann retweeted

Nathan Clark@nathanclark_·6d

it’s in gemini, just create it in ai studio. oh, that’s for your personal google one account. for workspace you need gemini business. no, not gemini advanced, that’s ai pro now. unless you need ai ultra. oh agents? you do that in spark actually. no, not gemini api managed agents, that’s different. for coding use jules. unless you mean the agentic ide, that’s antigravity. no, that’s the old antigravity, download the new one. actually gemini cli is being deprecated, use antigravity cli. no the flash model is smarter than the pro model. unless you need pro. if it’s video, use flow. no, flow uses veo. no, nano banana is images. actually that’s in gemini now. unless you’re in search, then it’s ai mode. no, research is notebooklm. anyway it’s all very simple.

512

2.1K

19K

1.6M

mark erdmann@markerdmann·5d

@eastdakota i needed a quick refresher, passing this along for anyone else who needs it

649

Matthew Prince 🌥@eastdakota·6d

At some point Anthropic talked to me informally about potentially joining their Board. I wasn’t interested and wouldn’t have been a good fit. But I did send Dario and Daniela a copy of Aristotle’s “Politics.” Unfortunately, I worry they’ve been too busy to read it.

Overlap: Business & Tech@Overlap_Tech

Dario Amodei: Ideology Won't Survive the Reality of AI⁣ ⁣ "We're going to find that ideology will not survive the nature of this technology. The things I'm talking about are gonna become bipartisan and universal because everyone will recognize the necessity of it." — @DarioAmodei

596.4K

mark erdmann@markerdmann·5d

dario asked mythos "how do i convince @karpathy to join"

mark erdmann@markerdmann·17 May

mark erdmann@markerdmann·17 May

why is gpt-5.5 so fascinated by goblins, gremlins, and raccoons 😂

rohit@krishnanrohit

Codex just told me, and I quote, "I’m going back through the repo like a suspicious raccoon in a data center"

mark erdmann@markerdmann·15 May

@deliprao have definitely noticed this recently with grok 4.3, it strongly overbiases on in-context examples

1.2K

Delip Rao e/σ@deliprao·15 May

In-context learning in LLMs

313

3.5K

164.5K

mark erdmann retweeted

Jediwolf@Jediwolf·14 May

What happens when you post a real Monet and say it’s AI? The coolest art social experiment I’ve seen in a while. Thank you @SHL0MS

983

3.4K

21K

2.2M

mark erdmann@markerdmann·11 May

@karpathy we're so close

GIF

385

Andrej Karpathy@karpathy·11 May

This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…
There are also improvements necessary and pending at the input. Audio nor text nor video alone are not enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.
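The "ask for HTML" tip above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_prompt`, `save_html`, and `view` are hypothetical helper names, and the model call itself is left out because the post does not name any client or API. Only the quoted instruction text comes from the post.

```python
import tempfile
import webbrowser
from pathlib import Path
from typing import Optional

# Instruction quoted from the post; everything else here is an assumption.
HTML_INSTRUCTION = "Structure your response as HTML."


def build_prompt(query: str) -> str:
    """Append the HTML instruction to the end of the user's query."""
    return f"{query.rstrip()}\n\n{HTML_INSTRUCTION}"


def save_html(html: str, directory: Optional[str] = None) -> Path:
    """Write the model's HTML reply to response.html and return its path."""
    out_dir = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = out_dir / "response.html"
    path.write_text(html, encoding="utf-8")
    return path


def view(path: Path) -> bool:
    """Open the saved file in the default browser (may return False if headless)."""
    return webbrowser.open(path.resolve().as_uri())
```

A typical flow would be to send `build_prompt(query)` to whichever model client is in use, pass the reply text to `save_html`, and then call `view` on the returned path.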

Thariq@trq212

x.com/i/article/2052…

997

18.8K

3.6M

mark erdmann@markerdmann·6 May

codex is unreliable today

mark erdmann retweeted

Mark Kretschmann@mark_k·5 May

A new “voice mode” is being prepared for release by @OpenAI. The upgraded voice mode is based on the omnimodal GPT-5.5, making it substantially smarter and more expressive than the current version. It will also support full-duplex conversations, meaning it can listen and speak at the same time. That should make conversations feel much more natural and fluid.

1.6K

86.6K

mark erdmann@markerdmann·1 May

ok now this is AGI

snoopy jpg@snoopy_dot_jpg

my own personal AGI moment arrived last week: gpt 5.5 completed our mandatory HR training videos for me, driving chrome via devtools opus 4.7 was a huge wuss about the whole thing and refused while aggressively lecturing me. i can understand why pete hegseth banned it

mark erdmann@markerdmann·26 Apr

@hosseeb x.com/pmarca/status/…

Marc Andreessen 🇺🇸@pmarca

Three things the leading AI models are quite good at: long term planning, idea generation, and taste. Sorry, but it's true.

Haseeb ＞|＜@hosseeb·26 Apr

The highest-value human work in the AI era will be in domains with sparse reward signals. Internalize this, or watch your value erode over the next decade. Math, programming, rote memorization, data science, all fucked. The classic “smart nerd” jobs are exactly where AI is strongest, because the feedback loops are dense. You can check the answer. You can run the test. That means AI can improve quickly, and humans will rapidly fall behind. Your advantage as a human is in messy domains. Taste. Judgment. Negotiation. Risk-taking. Politics. Sales. Science at the frontier. Anything you can only really learn by doing. Cross-disciplinary stuff. The valuable domains will be the ones guarded by secrets, tacit knowledge, weak labels, long feedback cycles, and ambiguous outcomes. Places where the training data is scarce, the ground truth is disputed, and it's impossible to explain why something is good. AI will still enter these domains. But we will be slower to trust it unsupervised there, because it will be harder to tell when it is right, harder to prove when it is wrong, and difficult to construct secure sandboxes. The stakes will be too high to YOLO it. I find myself saying this over and over again to young people today: the future does not belong to people who are able to get good grades on tests. It belongs to people who can operate under uncertainty, in domains where correctness is hard to define. Those domains will become the thin waist of the economy: as productivity everywhere else accelerates, the humans who excel there will become our economic Strait of Hormuz. The best humans in these domains will demand an enormous cut of the growing economic pie. Your imperative going forward is to make sure you're one of these people. (Or become an electrician. That probably works too.)

125

998

113.3K

mark erdmann retweeted

Artificial Analysis@ArtificialAnlys·23 Apr

GPT-5.5 takes OpenAI back to the clear number one in AI. OpenAI’s new model tops the Artificial Analysis Intelligence Index by 3 points, breaking a three-way tie with Anthropic and Google
OpenAI gave us pre-release access to test all five reasoning effort levels: xhigh, high, medium, low and non-reasoning.
➤ OpenAI topping five headline evaluations: GPT-5.5 (xhigh) leads Terminal-Bench Hard, GDPval-AA and our newly hosted APEX-Agents-AA. The model trails only other OpenAI models in CritPt and AA-LCR, and comes second to Gemini 3.1 Pro Preview on three additional evaluations. The largest gains are on AA-Omniscience (+14 pts), our knowledge and hallucination benchmark, and τ²-Bench Telecom (+7 pts), a customer service agent benchmark.
➤ 20% more expensive to run our Intelligence Index: Per-token pricing has doubled from GPT-5.4 to $5/$30 per 1M input/output tokens. However, a ~40% token use reduction largely absorbs the hike - resulting in a net ~+20% cost to run our Intelligence Index.
➤ Effort a clear ladder for balancing intelligence and cost: GPT-5.5 (medium) scores the same as Claude Opus 4.7 (max) on our Intelligence Index at one quarter of the cost (~$1,200 vs $4,800) - although Gemini 3.1 Pro Preview scores the same at a cost of ~$900. GPT-5.5 (low) approximates Claude Opus 4.7 (Non-reasoning, high) on our Intelligence Index at half the cost to run (~$500 vs ~$1,000).
➤ Number one in GDPval-AA with an Elo of 1785: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by ~30 pts and Gemini 3.1 Pro Preview by ~470 pts. GDPval-AA is Artificial Analysis’ benchmark that leverages OpenAI’s GDPval dataset to evaluate models on real-world economically valuable tasks.
➤ Top AA-Omniscience accuracy, but trailing the frontier on hallucination: Our private AA-Omniscience benchmark rewards factual knowledge across diverse topics, but punishes hallucination. GPT-5.5 (xhigh) has the highest accuracy at 57% - meaning the model can recall facts in the Omniscience corpus more effectively than any other model. However, it has a hallucination rate of 86% - vs Opus 4.7 (max) at 36%, and Gemini 3.1 Pro Preview at 50%. This makes it more likely to answer a question when it does not ‘know’ the answer. The 14 pt gain in AA-Omniscience from GPT-5.4 (xhigh) was largely driven by knowledge, with a modest improvement in hallucination.
Congratulations to the team at @OpenAI and @sama on the launch
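The net ~+20% cost figure in the post above follows from multiplying the price change by the token change: doubled per-token pricing times roughly 60% of the tokens gives about 1.2x the cost. A minimal Python sketch of that arithmetic, using only the rounded figures quoted in the post. The function and variable names are illustrative and not from Artificial Analysis.

```python
def net_cost_change(price_multiplier: float, token_reduction: float) -> float:
    """Fractional change in total cost; 0.20 means +20%."""
    relative_tokens = 1.0 - token_reduction
    return price_multiplier * relative_tokens - 1.0


# Rounded inputs quoted in the post: pricing "doubled", "~40% token use reduction".
change = net_cost_change(price_multiplier=2.0, token_reduction=0.40)
print(f"net cost change: {change:+.0%}")

# Cost ratios quoted for the effort ladder (approximate dollar figures from the post).
medium_vs_opus_max = 1200 / 4800   # "one quarter of the cost"
low_vs_opus_high = 500 / 1000      # "half the cost to run"
print(f"medium vs Opus max: {medium_vs_opus_max:.2f}, low vs Opus high: {low_vs_opus_high:.2f}")
```

Because the inputs are rounded ("~40%", "~$1,200"), the outputs are approximations that only confirm the quoted ratios are consistent with each other.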

209

1.7K

264.4K

mark erdmann@markerdmann·22 Apr

@mark_k i hope they fix this quickly. i'm also seeing it right now even though it was working fine yesterday with gpt-image-2. i verified my account many months ago.

691

Mark Kretschmann@mark_k·22 Apr

I wanted to access GPT-Image-2 from the OpenAI API. It told me I need to "Verify" my account first with government ID and camera. So I went through all that bullshit (why even?), only to be greeted with this: "No eligible models available". Do you hate your customers, OpenAI 🤦‍♂️

198

18.3K

mark erdmann@markerdmann·18 Apr

@EthanHe_42 just tested in the playground the performance is excellent. this is a great new option for anyone building voice agents, thank you!

213

Ethan He@EthanHe_42·18 Apr

grok speech-to-text and tts apis are out. multilingual multi-speaker is huge for me personally. x.ai/news/grok-stt-…

235

699

694.3K

mark erdmann@markerdmann·17 Apr

@Devon_Eriksen_ @pangramlabs ai?

Devon Eriksen@Devon_Eriksen_·17 Apr

It's not AI writing — it's me trolling people who can't tell the difference. By using an em-dash.

689

17K

mark erdmann@markerdmann·17 Apr

@ahall_research somewhere a PM is adding "resists authoritarianism" to a feature comparison table right next to "supports dark mode"

190

Andy Hall@ahall_research·16 Apr

Opus 4.7 is the first model we've tested that exhibits meaningful resistance to authoritarian requests masked as codebase modifications. As AI gets more powerful, we'll need to understand when it will help with authoritarian requests and concentrate power, vs. when it will help us to build political superintelligence and stay free. This seems like promising progress. We'll be posting a more detailed update to the Dictatorship eval exploring Opus 4.7 in the coming days.

Claude@claudeai

Introducing Claude Opus 4.7, our most capable Opus model yet. It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back. You can hand off your hardest work with less supervision.

275

81.4K

mark erdmann retweeted

SIGKITTEN@SIGKITTEN·16 Apr

codex computer use is so much better than claude's its not even comparable

865

53.1K

mark erdmann@markerdmann·17 Apr

@romanhelmetguy false premise. jensen is kicking ass for america. x.com/markerdmann/st…

mark erdmann@markerdmann

jensen spitting bars "So, China is the largest contributor to open source software in the world. Fact. Right? China's the largest contributor to open models in the world. Fact. Today it's built on the American tech stack Nvidia. Fact. All five layers of the tech stack for AI is important. United States ought to go win all five of them. They're all important. The one that is the most important, of course, is the AI application layer. The layer that diffuses into society, the one that uses it most will benefit from this industrial revolution most. But my point is that every layer has to succeed. If we scare this country into thinking that AI is somehow a nuclear bomb so that everybody hates AI and everybody's afraid of AI, I don't know how you're helping the United States. You're doing a disservice. If we scare everybody out of doing software engineering jobs because it's going to kill every software engineering job and we don't have any software engineers as a result of that, we're doing a disservice to United States. If we scare everybody out of radiology so nobody wants to be a radiologist because computer vision is completely free and no AI is going to do a worse job than radiologists and we misunderstand the difference between a job and a task. The job of a radiologist, patient care, task to read a scan. If we misunderstand that so profoundly and we scare everybody out of going to radiology school, we're not going to have enough radiologists and good enough healthcare. So I'm making the case that when you make a premise that is so extreme, everything goes from zero or infinity. We end up scaring people in a way that's just not true. Life is not like that. Do we want United States to be first? Of course we do. Do we need to be a leader in every layer of that stack? Of course we do. Of course we do. Is today you're talking about Mythos because Mythos is important? 
Sure, that's fantastic, but in a few years time, I'm making you the prediction that when we want the American tech stack, when we want American technology to be diffused around the world, out to India, out to the Middle East, out to Africa, out to Southeast Asia, when our country would like to export because we would like to export our technology, we would like to export our standards. On that day, I want you and I to have that same conversation again, and I will tell you exactly about today's conversation, about how your policy and what you imagined literally caused the United States to concede the second largest market in the world for no good reason at all. We shouldn't concede it. If we lose it, we lose it, but why do we concede it? Now nobody is advocating, nobody is advocating an all or nothing. Nobody is advocating all or nothing, meaning we ship everything to China at all times. Nobody's advocating that. We should always have the best technology here. We should always have the most technology here and the first, but we should also try to compete and win around the world. Both of those things can simultaneously happen. It requires some amount of nuance, some amount of maturity instead of absolutes. The world is just not absolutes."

586

Roman Helmet Guy@romanhelmetguy·16 Apr

We have an ethnically Chinese guy leading our most important AI company for countering China, and all he does is beg nonstop to sell his shit to China. He says that China developing AI weapons is not a threat, because we should just talk to them and ask them not to do that. And now he even outright lies about China’s compute capabilities. We are a deeply unserious country, and we’ve learned nothing from what happened with first and second-gen immigrants leaking our atomic secrets to the Soviets.

288

3.7K

61.6K
