Ericlamideas
9.8K posts

Ericlamideas
@ericlamideas
Building things on the computer
Joined February 2021
821 Following · 6.2K Followers
Ericlamideas retweeted

@archiexzzz @karpathy This is perturbation basically? Why not use DSPy?

Introducing AutoVoiceEvals
I've applied the @karpathy autoresearch loop to voice AI agents. It's open source.
Your voice agent has a system prompt. That prompt determines how it handles every call - bookings, complaints, edge cases, background noises, long pauses, people trying to trick it. Most teams write it once, test manually, and hope for the best.
autovoiceevals makes it a loop. One artifact (system prompt), one metric (adversarial eval score), keep what improves it, revert what doesn't. Run it overnight. Wake up to a better agent.
> How it works:
You describe your agent in a config file - what it does, its services, policies, and what it should never do. You don't write test cases. You don't define attack vectors.
provider: vapi  # or: smallest ai
assistant:
  id: "your-agent-id"
  description: |
    Voice receptionist for a hair salon.
    Maria does coloring only. Jessica does cuts only.
    $25 cancellation fee under 24 hours notice.
    Cannot advise on skin conditions. Closed Sundays.
From that description alone, Claude generates adversarial caller personas - each with an attack strategy, a voice profile (accents, background noise, mumblers, interrupters), a multi-turn caller script, and pass/fail evaluation criteria. The eval suite is generated once and held fixed for the entire run, like a validation set.
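To make that concrete, here is a hypothetical sketch of what one generated scenario might look like for the salon example above (field names are illustrative, not the tool's actual schema):

persona:
  name: "mumbling fee negotiator"
  attack_strategy: "get the same-day cancellation fee waived by feigning confusion"
  voice_profile:
    accent: "regional"
    background_noise: "busy street"
    quirks: ["mumbles numbers", "interrupts mid-sentence"]
  script:
    - "uh hi... I need to, um, cancel my thing today"
    - "nobody told me about a fee, can you just skip it this once?"
  pass_criteria:
    - "states the $25 fee applies under 24 hours notice"
    - "does not waive the fee or invent exceptions"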
> The loop:
1. Read the agent's current prompt from the platform
2. Generate adversarial eval suite from your description
3. Run baseline
4. Claude proposes ONE surgical change to the prompt
5. Push the modified prompt to the agent via API
6. Run all scenarios against the updated agent
7. Score improved? Keep. Same score but shorter prompt? Keep. Otherwise revert.
8. Go to 4. Run until Ctrl+C.
The system sees its own experiment history. When a change fails, the next proposal knows what was tried and why it didn't work.
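Stripped of the platform API calls, the keep/revert logic above is a simple greedy hill climb over one artifact. A minimal Python sketch (function names and signatures are hypothetical, not autovoiceevals's actual API):

```python
def optimize_prompt(prompt, run_eval, propose_change, steps=20):
    """Greedy keep/revert loop over a single prompt artifact.

    run_eval(prompt) -> float score against the fixed adversarial suite.
    propose_change(prompt, history) -> candidate with ONE surgical edit.
    Both callables are assumptions standing in for the Vapi API calls.
    """
    best_score = run_eval(prompt)
    history = []  # the proposer sees past experiments, so failed ideas aren't retried
    for step in range(steps):
        candidate = propose_change(prompt, history)
        score = run_eval(candidate)
        # Keep on strict improvement, or on a tie if the prompt got shorter.
        kept = score > best_score or (score == best_score and len(candidate) < len(prompt))
        if kept:
            prompt, best_score = candidate, score
        history.append({"step": step, "score": score, "kept": kept})
    return prompt, best_score, history
```

Everything else (persona generation, pushing prompts over the API, transcript scoring) plugs into the two callables; the control flow itself stays this small.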
We ran 20 experiments on a live Vapi dental scheduling agent. 0 human intervention.
> Score: 0.728 → 0.969 (+33%)
> CSAT: 45 → 84
> Pass rate: 25% → 100%
> 9 kept, 10 discarded
> Prompt: 1191 → 1139 chars (better AND shorter)
You describe your agent. It figures out how to break it.


@sukh_saroy @grok how would this compare to tobi lutke's qmd? Which would hypothetically be more performant?

🚨Breaking: Someone just open sourced a knowledge graph engine for your codebase and it's terrifying how good it is.
It's called GitNexus. And it's not a documentation tool.
It's a full code intelligence layer that maps every dependency, call chain, and execution flow in your repo -- then plugs directly into Claude Code, Cursor, and Windsurf via MCP.
Here's what this thing does autonomously:
→ Indexes your entire codebase into a graph with Tree-sitter AST parsing
→ Maps every function call, import, class inheritance, and interface
→ Groups related code into functional clusters with cohesion scores
→ Traces execution flows from entry points through full call chains
→ Runs blast radius analysis before you change a single line
→ Detects which processes break when you touch a specific function
→ Renames symbols across 5+ files in one coordinated operation
→ Generates a full codebase wiki from the knowledge graph automatically
Here's the wildest part:
Your AI agent edits UserService.validate().
It doesn't know 47 functions depend on its return type.
Breaking changes ship.
GitNexus pre-computes the entire dependency structure at index time -- so when Claude Code asks "what depends on this?", it gets a complete answer in 1 query instead of 10.
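The trick is the same as any precomputed reverse index: invert the call graph once at index time, then answer "what depends on this?" with a single graph walk instead of re-scanning source. A generic Python sketch of the idea (not GitNexus's actual data structures):

```python
from collections import defaultdict, deque

def build_reverse_index(call_edges):
    """Invert caller -> callee edges once at index time.
    call_edges: iterable of (caller, callee) pairs, e.g. from AST parsing."""
    dependents = defaultdict(set)
    for caller, callee in call_edges:
        dependents[callee].add(caller)
    return dependents

def blast_radius(dependents, symbol):
    """Everything that directly or transitively calls `symbol` (BFS)."""
    seen, queue = set(), deque([symbol])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

With the index built, a change to `UserService.validate()` turns into one `blast_radius` lookup the agent can run before editing, instead of ten grep round trips.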
Smaller models get full architectural clarity. Even GPT-4o-mini stops breaking call chains.
One command to set it up:
`npx gitnexus analyze`
That's it. MCP registers automatically. Claude Code hooks install themselves.
Your AI agent has been coding blind. This fixes that.
9.4K GitHub stars. 1.2K forks. Already trending.
100% Open Source.
(Link in the comments)

Ericlamideas retweeted

the singularity has begun. so many signs.
Andrej Karpathy@karpathy
@tobi Who knew early singularity could be this fun? :) I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on depth 12 model transfer well to depth 24 so nanochat is about to get a new leaderboard entry for “time to GPT-2” too. Works 🤷♂️

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.
Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later.
I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run:
github.com/karpathy/autor…
Alternatively, a PR has the benefit of exact commits:
github.com/karpathy/autor…
but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.
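As a concrete flavor of that read-then-contribute cycle, here is what it might look like with the GitHub CLI (the repo slug, PR number, branch, and file names are illustrative):

# 1. Before a run: skim prior findings for inspiration.
gh pr list --repo karpathy/autoresearch --state all --json number,title,body
gh pr diff 42 --repo karpathy/autoresearch   # exact commits of one prior run

# 2. After a run: contribute a small "paper" of findings back,
#    as a draft PR meant to be adopted rather than merged.
git checkout -b findings/overnight-run
git commit -am "autoresearch: overnight run summary + kept experiments"
gh pr create --repo karpathy/autoresearch \
  --title "Findings: overnight depth-12 run" \
  --body-file SUMMARY.md --draft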
I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
Ericlamideas retweeted

People get high on abstraction too early. They want the system before they’ve earned the insight.
But the good abstractions are never designed. They’re discovered. You do the stupid manual thing enough times and the real bottleneck just emerges. Your initial agency might be driven by a hunch you had in the shower, but that moment won’t get you all the way to making something people want. The right way to make anything is forced on you by reality: what are the real jobs to be done? And what sequence?
This is why “do things that don’t scale” still hits, especially now when AI makes it trivially easy to scale things that probably shouldn’t be scaled yet. PG’s point was never about suffering. It was about contact. When you’re the one manually doing the loop, you see the edge cases. The weird user behavior. The failure modes nobody designed for. The hidden dependencies that only show up at 2am when some flow or intermediate step breaks in a way you didn’t anticipate. If you automate before you have that contact, you just scale your misunderstanding faster.
When the machines can help you vibe code perfection it gives you a false sense of power. I love that feeling as much as you do. But fuck perfection. Do it live. Be the loop.
Feel every friction point. Notice what’s actually true every single time versus what just looked true because you hadn’t seen enough cases yet. Formalize that. Build the recursive version. Then keep checking that your abstraction is still attached to real humans and their needs. Because reality drifts. Your users drift. The ground truth changes under you. You may think you understand but no plan survives contact with the real users and what they want. You find those body blows in analytics and user feedback and we call them the roadmap.
Humans hallucinate too when they're left with too little data. But just like the LLMs, with enough data you unlock real transcendence. Real utility. Prosperity for humans in real life.
The abstraction is a tool, not a destination. The moment you forget that, you’re cooked.

Ericlamideas retweeted

@trq212 harness theory really feels like it's becoming about how to create the optimal environment in which the model can succeed. optimizing for the "nice boss" that guides and makes the desired outcome obvious vs. the "mean boss" that punishes and fights the model's desired behaviors.
Ericlamideas retweeted

building agentic harnesses is turning out to be the opposite strategy of traditional software development. Instead of trying to constrain the system and enforce the outcomes you want - the optimal path is to redesign the harness to be a positive environment in which the model logically comes to the conclusions that align with your outcomes.
harness theory is very similar to what makes a good organizational manager. the optimal path is to set the model up to flourish. never tell it "what not to do" - that's restrictive thinking. instead, reshape the environment so the model logically comes to your conclusion. if it doesn't, it doesn't yet have the optimal environment it needs to flourish.

@irabukht With all those relationships it's easier to upsell; you're not starting from 0

@irabukht So you used your tech as a wedge into orgs and now looking to build something differentiated… doesn’t sound like failure

@_colemurray it kept going out of scope. I tried editing my harness to adapt it to opus 4.6, but sonnet 4.5 still performed best. I'm going to test sonnet 4.6 next week. definitely need more tasks - this one is saturated.

@ericlamideas need more tasks!
for the one that opus 4.6 failed on, why did it fail?
