Eli Gerhard
@eligerhard
195 posts
Prompt engineering shouldn't exist | Ex-Stripe
NYC · Joined May 2015
359 Following · 72 Followers
Eli Gerhard@eligerhard·
@weswinham @AlecStapp Oh good catch. I was using their announcement date. 🤦‍♂️ But the core premise still stands: it's a previous-generation chip that sits roughly halfway between the best US chips and the best Chinese chips. Thoughts on it being a good balance of maintaining the US lead while still being a useful "carrot"?
1
0
0
6
Eli Gerhard@eligerhard·
That's a bit of an absolute statement if we're aiming for practicality, don't you think? China's 2026 legal acquisitions of H200s (3-year-old chips) exceed 10x the compute of ALL chips ever smuggled (not just the latest B200/B300s). I'm not saying smuggling isn't an issue, but you can't hand-wave away the impact of legal sales just because <1% of manufactured chips get smuggled.
0
0
0
13
Eli Gerhard@eligerhard·
@PedagogicalMess @AlecStapp Sure, but you can assume a similar baseline rate of smuggled B200s in either case, right? My question is about the net natsec risk of allowing an additional stream of H200s, given the tradeoff of: ⬇️ P(Conflict) ⬆️ P(Damage, given Conflict)
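The tradeoff in this reply can be made concrete with a toy expected-value calculation. All probabilities below are invented purely for illustration; nothing in the thread supplies real numbers.

```python
# Toy model of the tradeoff: selling H200s may lower P(conflict) but raise
# P(damage | conflict). Net risk is compared via expected damage.
# All numbers are hypothetical and for illustration only.

def expected_damage(p_conflict: float, p_damage_given_conflict: float) -> float:
    """Expected damage = P(conflict) * P(damage | conflict)."""
    return p_conflict * p_damage_given_conflict

# Hypothetical baseline: no additional H200 stream.
baseline = expected_damage(p_conflict=0.20, p_damage_given_conflict=0.50)

# Hypothetical with sales: conflict less likely, but worse if it happens.
with_sales = expected_damage(p_conflict=0.15, p_damage_given_conflict=0.60)

print(baseline, with_sales)  # whichever is lower wins on this toy metric
```

Under these made-up numbers the sales scenario comes out slightly ahead (0.09 vs 0.10), but the point is only that the question turns on the relative sizes of the two probability shifts.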
1
0
1
17
Eli Gerhard@eligerhard·
@deanwball How am I supposed to generate views with that as my headline??
0
0
4
260
Dean W. Ball@deanwball·
For a moment, substitute the notion of “believing in short AGI timelines” for: “acknowledging the idea of AGI as an ill-defined thing that will nonetheless probably exist within a strategically relevant timeframe, the pursuit of which will produce importantly capable artifacts along the way, whose arrival will be even sooner than the so-called ‘AGI,’ and so we don’t really need to quibble all that much about exact AGI definitions and timelines, because the mega-capable artifacts already kinda resemble ‘AGI,’ have national security implications, and seem like they’re going to keep improving rapidly, so functionally we just have to accept that we live in AGI world now, regardless of whether one’s personal definition of AGI is satisfied in 2028, 2035, or, indeed, 2026.”

If this is your view—and it is mine—then it is not so much about short timelines to AGI as it is “short timelines to the importantly useful artifacts produced along the path to AGI, so capable that maybe in some ways they blend into AGI.” Thus “Mythos” or “the latest frontier model” can be substituted for “AGI” in many debates about timelines.

The rhetoric of “we should sell China more chips” or “AI is the next internet platform business and it should be regulated exactly like prior waves of internet platform businesses, which is to say ‘basically not at all’” pivots not so much on short timelines to AGI but instead short timelines to models that *matter,* national-security-wise. The fatal flaw in the 2024-era accelerationist view, epitomized by Jensen, was that models would never matter in this way; or at least, that you should not think so much about the world in which models mattered in that way. It is much easier to justify “doing what we have been doing” if you don’t believe neural networks will ever truly matter to national security.
Basically all AI debates hinge not so much on “AGI timelines” but on “will LLMs ever matter, really, to national security.” The near-term existence of LLMs with national-security-relevant capabilities can therefore be thought of as, to borrow a phrase, an inconvenient truth.
6
15
159
18.1K
Eli Gerhard@eligerhard·
@signulll You have to follow the hierarchy & condense as updates happen at the task level, not with a 30,000-ft / retroactive consolidation
0
0
0
545
signüll@signulll·
can ppl tell me how they manage their linears? ours is getting out of hand cuz it has so much stuff & a lot of it isn't relevant anymore or is outdated. i wish an ai agent would proactively consolidate & clean up. would love that feature. like a "linear master" agent that is continuously running.
46
0
82
19.6K
Eli Gerhard@eligerhard·
Bold. This isn't even a commentary on LLMs - nothing in my life is this accurate at predicting my just-in-time intentions (not Google Ads, the YouTube algo, my girlfriend). With no (direct) user input, is the defense against spam to be highly-selective in what gets surfaced? An empty notifications bar (or home page) is a real dopamine hit.
0
0
0
78
signüll@signulll·
@eligerhard nope! just straight up ai lol. no optionality. the onboarding is just connectors.
1
0
1
390
signüll@signulll·
building an ambient ai that knows your email, calendar, finances, health, & location (aka your entire fucking life) means we're staring down three really really hard problems:

1. privacy. we're asking for the most sensitive data on your phone, & obviously "trust us" isn't good enough. every company says that. we need to make our privacy posture architecturally legible, with real structural constraints on what we can do with your data. this is the existential challenge & we're going to be obsessively transparent about how we approach it. e.g. we built an ai agent in production that monitors audit logs & alerts *you* the user with a push notification if anyone besides the system or you accesses your data. we'll experiment way more here with creative mechanics for ensuring user trust.

2. cost. ambient means always on. always on means continuous inference. sophisticated inference isn't cheap, especially when your ai agent needs real reasoning to decide what matters to you right now or to figure out how to complete your reminder task. model costs are dropping fast, but the economics of running a personal intelligence layer for every user is a problem we have to design around from day one. we have several business model ideas, esp some creative ones around "premium agents". i'll post about this soon.

3. relevance. for skye, surfacing the right thing at the right time is the whole product. what matters to you at 7am monday vs 9pm friday is completely different. we have an ai agent managing feed ranking today. it's good, but it's expensive, & getting this right across millions of lives with varying signal density is a challenge.

we're solving these in public. more soon on each, but these are insanely fascinating problems to deal with. most labs are focused on very different things, like improving benchmarks; we have to focus on primitives.
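The audit-log watcher described in point 1 (alert the user if anyone besides them or the system touches their data) could be sketched roughly like this. The event fields and the flagging logic are hypothetical stand-ins, not the product's actual API.

```python
# Sketch of an audit-log watcher: scan access events and flag any access
# not made by the user themselves or by the system. Field names ("actor",
# "resource") are invented for illustration.

def find_suspicious_accesses(events: list[dict], user_id: str) -> list[dict]:
    """Return audit events where someone other than the user or system accessed data."""
    allowed = {user_id, "system"}
    return [e for e in events if e["actor"] not in allowed]

events = [
    {"actor": "system", "resource": "calendar"},
    {"actor": "u123", "resource": "email"},
    {"actor": "support-agent-7", "resource": "email"},  # should trigger an alert
]

for event in find_suspicious_accesses(events, user_id="u123"):
    # In the described product this would become a push notification.
    print(f"ALERT: {event['actor']} accessed {event['resource']}")
```

The interesting design choice is that the alert goes to the *user*, not to an internal security team, which is what makes the privacy posture "architecturally legible" rather than a policy promise.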
37
8
256
18.8K
Eli Gerhard@eligerhard·
@signulll Is this primarily a progress tracking & coordination win? Is there any escalation through the hierarchy of key decision points the sub-agents encounter, as would be done with a human org?
0
0
0
566
signüll@signulll·
for our first product, we built an agent that manages other agents. & it’s so god damn fucking beautiful cuz it works so well. the architecture is deceptively simple: define responsibility sets, abilities to delegate, & orchestration. but what emerges is something surprisingly elegant. each layer builds on the others, with an internal agent notification system (like you’d tap on someone’s shoulder when complete). there’s a reason org design is its own discipline. it turns out the same principles that make human organizations readable make agentic systems composable. programming this way is genuinely lovely, my lord. my life is so fuckin boring that this is what excites me.
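The responsibility-set / delegation / shoulder-tap pattern described above can be sketched in a few lines. Class and method names here are illustrative assumptions, not the actual architecture.

```python
# Minimal sketch of org-chart-style agent orchestration: an agent owns a
# responsibility set, delegates tasks it can't handle to its reports, and
# gets "tapped on the shoulder" (notified) when a report completes a task.

class Agent:
    def __init__(self, name: str, responsibilities: set[str], reports=()):
        self.name = name
        self.responsibilities = responsibilities
        self.reports = list(reports)
        self.completed = []  # (report_name, task) notifications received

    def notify(self, report: "Agent", task: str) -> None:
        # Internal notification system: a report signals completion upward.
        self.completed.append((report.name, task))

    def handle(self, task: str):
        if task in self.responsibilities:
            return f"{self.name} did {task}"
        for report in self.reports:
            result = report.handle(task)
            if result is not None:
                self.notify(report, task)  # shoulder tap on completion
                return result
        return None  # nobody in this subtree owns the task

worker = Agent("email-agent", {"draft-reply"})
manager = Agent("orchestrator", set(), reports=[worker])

print(manager.handle("draft-reply"))  # delegated down, notified back up
```

Composability falls out of the recursion: a manager is just another agent, so layers stack the same way human org charts do.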
34
13
566
38.4K
Eli Gerhard@eligerhard·
Motion to make [DNN] (descriptive, not normative) tags standard practice for policy discussions. Would go a long way in changing discourse from "here's why I disagree" to "let's address the public's concern".
Patrick Collison@patrickc

@mattyglesias @RuxandraTeslo I definitely don’t mean to imply that any such requirement would be helpful — indeed, aesthetic requirements are part of what I view as the problem today. (Massing rules and similar.) I’m just making a descriptive point about how I perceive the political economy.

0
0
0
29
Eli Gerhard@eligerhard·
@dexhorthy I'm more of the "wait till they rip" type, but I guess I can accommodate the obligatory NYC clothes souvenir
0
0
0
50
dex@dexhorthy·
@eligerhard does that mean we're going shopping cuz i need new jeans
1
0
0
228
dex@dexhorthy·
hey new york friends I'll be in NYC Friday 17th and Saturday 18th - who wants to meet up for lunch/coffee and/or talk about agentic coding and/or walk in the park etc
19
0
43
4.2K
Eli Gerhard@eligerhard·
@dexhorthy But it is worth the effort to get models to be self-aware enough to recognize when the human needs to be hands-on
0
0
0
243
dex@dexhorthy·
plans vs outlines vs design docs: I think the value of a plan-for-the-model is to take intent + codebase research and build a "guide to building this feature for idiots" - at least 80-90% correct, close enough that you have a high chance of success when you spray out the diffs and fix anything that was missed during typecheck/testing.

The 1) "where are we going / what is the correct architecture" and 2) "how do we split up the work to optimize for verifiability/backpressure" are still things that humans excel at so much that it's just not worth the wasted effort to try to figure out how to get models to do them well in a hands-off way. Because when it's wrong you're spelunking thousands of LOC, and that's a lot less leverage
3
5
86
5K
Eli Gerhard@eligerhard·
@cremieuxrecueil Is the assumption that high UPP% also corresponds with larger Group 3 : Group 2 : Group 1 ratios? In other words, does higher Group 4 consumption mean higher Group 2+, or is it simply replacing Group 2/3 foods?
1
0
1
555
Crémieux@cremieuxrecueil·
My latest article is about how: It's not clear what the term "ultraprocessed food" means, what it correlates with, if it can even be used as a valid variable in epidemiology, or if it matters one bit for population health. Link below!
23
17
194
48.8K
Eli Gerhard@eligerhard·
Can you reconcile this with the talking point of "we can't bring enough power online" for AI? I assume compute demand predictions (and bubble risk) have already been factored in by the industry, but I'm confused on a few talking points:
- Pre-war, were we actually close to peak build-out capacity for wells, or are there other infrastructure limitations (e.g. insufficient power plants to consume the net new oil) preventing build-out?
- Are purchase guarantees (e.g. of excess oil from net new wells if the war ends) with hyperscalers feasible? Would this require new storage infra? Is there a guaranteed amount which would offset secondary effects like price drops and pull-forward?
0
1
1
418
Zack Golden@CSI_Starbase·
I've noticed there appears to be a fundamental misunderstanding about how the oil and gas industry actually works, especially in the US. So let me try to put this into proper perspective.

You can’t dramatically increase oil output without bringing new wells online. There is no magic dial you can just turn up whenever you need more. From leasing and permitting, to site prep, to drilling, to completion, to production, the process can take anywhere from 4 to 12 months. So companies don’t respond to today’s prices, they respond to where they expect prices to be over that entire window. If the expectation is that current conditions are short-lived, there’s very little incentive to ramp activity. If the message being communicated is that the conflict is already over, or close to it, that reinforces the idea that by the time new wells come online, prices will likely have normalized.

Because of that, we’re unlikely to see a meaningful increase in rig count in the US in the near term, and in fact, activity has already begun to trend lower. The only thing that reliably drives higher activity is sustained confidence that elevated prices will persist long enough to justify the investment. In other words, they are waiting for a clear signal that current conditions will last, not just spike. In this case that signal would probably be an openly communicated commitment to a lengthy ground war.

Everything I've just stated is basically a fact, although I'm sure some will disagree about specific details. Now here's where the speculation starts. If someone were trying to plan for this potential disruption in advance without a clear understanding of how the industry works, they might assume Venezuela, the country with the largest oil reserves, could simply be brought in to offset any future supply disruption. And from that perspective, you could imagine taking aggressive action in an attempt to force that into reality. But that was never a realistic short-term solution. Even under ideal conditions, increasing production there would require tens of billions of dollars in investment and many years of development before meaningful volumes reach the market. Unfortunately, having reserves is not the same thing as having supply.
28
10
342
16.8K
Eli Gerhard@eligerhard·
Fault does not need punishment, but offloading responsibility onto abstractions like "process", "culture", or "infra" is a surprisingly bureaucratic take that seems better suited to managing AI agents than high-talent humans at Anthropic.
Boris Cherny@bcherny

Mistakes happen. As a team, the important thing is to recognize it’s never an individual’s fault — it’s the process, the culture, or the infra. In this case, there was a manual deploy step that should have been better automated. Our team has made a few improvements to the automation for next time, a couple more on the way.

0
0
0
50
Eli Gerhard@eligerhard·
@signulll Huh... I'm not up to speed on your product, but it seems like the decay slope should be monotonic under a consistent relevancy criterion?
0
0
0
107
signüll@signulll·
@eligerhard for us new memories aren’t immediately prioritized unless absolutely relevant to the context.
1
0
2
146
signüll@signulll·
this post cuts to something i’ve been personally thinking & posting about a lot, which is how the human mind’s forgetting machinery is underrated as a design primitive. in our first product we’ve built our memory model around a specific decay factor influenced by multiple variables: each memory degrades by default unless actively reinforced, relying on a combination of recency, retrieval frequency, & contextual reactivation. this ain’t perfect by any means. but it’s annoying af that current llm memory implementations essentially treat every retrieved fact as equally alive. that’s likely not how cognition works. idk if our approach is the final answer but i’m increasingly convinced the forgetting curve is as important as the learning curve. & the right memory model may be way more about what you let go of than what you store.
Andrej Karpathy@karpathy

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.

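The decay model described above (default decay, reinforced by recency and retrieval frequency) can be sketched like this. The half-life value and the reinforcement boost are invented for illustration; the product's actual variables aren't public.

```python
import math

# Sketch of a forgetting-curve memory score: exponential decay since the
# last reinforcement, with a diminishing boost from repeated retrievals.
# half_life_hours and the log1p boost are illustrative assumptions.

def memory_score(hours_since_reinforced: float,
                 retrieval_count: int,
                 half_life_hours: float = 72.0) -> float:
    recency = 0.5 ** (hours_since_reinforced / half_life_hours)
    reinforcement = 1.0 + math.log1p(retrieval_count)  # diminishing returns
    return recency * reinforcement

fresh_unused = memory_score(hours_since_reinforced=1, retrieval_count=0)
old_reinforced = memory_score(hours_since_reinforced=240, retrieval_count=10)
print(fresh_unused, old_reinforced)
```

This directly addresses Karpathy's complaint in the quoted tweet: a two-month-old question with no further retrievals decays toward zero instead of staying "equally alive" forever.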
70
30
656
58.3K
Eli Gerhard@eligerhard·
@signulll Yeah I'm not implying that it's useless, but I've found older-but-topical context to be more valuable than newer-but-unrelated context
1
0
2
153