saam

1.8K posts

saam

@snwmanst

building @openai

Katılım Şubat 2009

3.1K Takip Edilen509 Takipçiler

saam retweetledi

will depue@willdepue·18 Mar

we're launching a challenge, inspired by @kellerjordan0's NanoGPT Speedrun, to train the best model under extreme parameter limitations anytime i get asked how to break into ML, i point to challenges like this: just start training models! also, we're covering $1M of compute

OpenAI@OpenAI

Are you up for a challenge? openai.com/parameter-golf

English

801

135.9K

saam@snwmanst·17 Mar

with codex we can be much more ambitious as it unlocks parts of the solution space that were out of reach before

English

saam retweetledi

roon@tszzl·14 Mar

the value of this technology will mostly not be captured by its inventors, the labs, or even the chipmakers, but rather will be captured by the consumers as surplus. these are highly competitive markets without any natural monopolistic effects like many other technologies before it, machine intelligence democratizes abilities previously only available to the wealthy, in this case by commoditizing the services of the white collar elite who mostly live in rich countries it’s not that there are no programmers, it’s that really anybody can make software now now so the “rents” of the “human capital” of knowing how to write JavaScript for example should shrink dramatically this will reduce the inequality between countries: services that previously required lots of human capital now require chatbot subscriptions at worst, or may even be given away for free you can receive medical advice worthy of a $1000/hr American specialist doctor likely for free while living under a thatched roof in eg Papua New Guinea somewhere while I think Americans have plenty of reason to be excited by AI, I would be more excited as someone in a poor country

Olivia Moore@omooretweets

The U.S. has a weird cultural relationship with AI Despite the fact that we’ve driven the vast majority of AI breakthroughs, we still rank among the lowest countries in terms of consumer trust (Data from Edelman 2025 study) 👇

English

123

140

1.6K

213.4K

saam retweetledi

Greg Brockman@gdb·6 Şub

Software development is undergoing a renaissance in front of our eyes. If you haven't used the tools recently, you likely are underestimating what you're missing. Since December, there's been a step function improvement in what tools like Codex can do. Some great engineers at OpenAI yesterday told me that their job has fundamentally changed since December. Prior to then, they could use Codex for unit tests; now it writes essentially all the code and does a great deal of their operations and debugging. Not everyone has yet made that leap, but it's usually because of factors besides the capability of the model. Every company faces the same opportunity now, and navigating it well — just like with cloud computing or the Internet — requires careful thought. This post shares how OpenAI is currently approaching retooling our teams towards agentic software development. We're still learning and iterating, but here's how we're thinking about it right now: As a first step, by March 31st, we're aiming that: (1) For any technical task, the tool of first resort for humans is interacting with an agent rather than using an editor or terminal. (2) The default way humans utilize agents is explicitly evaluated as safe, but also productive enough that most workflows do not need additional permissions. In order to get there, here's what we recommended to the team a few weeks ago: 1. Take the time to try out the tools. The tools do sell themselves — many people have had amazing experiences with 5.2 in Codex, after having churned from codex web a few months ago. But many people are also so busy they haven't had a chance to try Codex yet or got stuck thinking "is there any way it could do X" rather than just trying. - Designate an "agents captain" for your team — the primary person responsible for thinking about how agents can be brought into the teams' workflow. - Share experiences or questions in a few designated internal channels - Take a day for a company-wide Codex hackathon 2. Create skills and AGENTS[.md]. - Create and maintain an AGENTS[.md] for any project you work on; update the AGENTS[.md] whenever the agent does something wrong or struggles with a task. - Write skills for anything that you get Codex to do, and commit it to the skills directory in a shared repository 3. Inventory and make accessible any internal tools. - Maintain a list of tools that your team relies on, and make sure someone takes point on making it agent-accessible (such as via a CLI or MCP server). 4. Structure codebases to be agent-first. With the models changing so fast, this is still somewhat untrodden ground, and will require some exploration. - Write tests which are quick to run, and create high-quality interfaces between components. 5. Say no to slop. Managing AI generated code at scale is an emerging problem, and will require new processes and conventions to keep code quality high - Ensure that some human is accountable for any code that gets merged. As a code reviewer, maintain at least the same bar as you would for human-written code, and make sure the author understands what they're submitting. 6. Work on basic infra. There's a lot of room for everyone to build basic infrastructure, which can be guided by internal user feedback. The core tools are getting a lot better and more usable, but there's a lot of infrastructure that currently go around the tools, such as observability, tracking not just the committed code but the agent trajectories that led to them, and central management of the tools that agents are able to use. Overall, adopting tools like Codex is not just a technical but also a deep cultural change, with a lot of downstream implications to figure out. We encourage every manager to drive this with their team, and to think through other action items — for example, per item 5 above, what else can prevent a lot of "functionally-correct but poorly-maintainable code" from creeping into codebases.

English

414

1.6K

12.2K

2.1M

saam retweetledi

Sebastien Bubeck@SebastienBubeck·20 Ağu

Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof it's correct. Details below.

English

306

1.2K

7.3M

saam retweetledi

Keren Gu 🌱👩🏻‍💻@KerenGu·17 Tem

We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe. 🧵

OpenAI@OpenAI

We’ve decided to treat this launch as High Capability in the Biological and Chemical domain under our Preparedness Framework, and activated the associated safeguards. This is a precautionary approach, and we detail our safeguards in the system card. We outlined our approach on preparing for future AI capabilities in biology through a blog post earlier this month. openai.com/index/preparin…

English

130

1.2K

550.2K

saam retweetledi

Judea Pearl@yudapearl·16 Tem

Exciting breakthrough from @eliasbareinboim -- a counterfactual-calculus (cfl-calculus), akin to do-calculus, for handling identification problems in Layer-3 of the Causal Hierarchy.

Elias Bareinboim@eliasbareinboim

That’s a really good and fundamental question; thank you, Borya! :) It’s non-trivial to answer, since it requires some maturity: there are various moving parts involved, and we tried to address this in the cited paper (also in the textbook, and more recently here: causalai.net/r130.pdf). There are various ways of seeing this connection, but I’ll try to be brief here. The invariances of any mechanism f_i, pre- and post-interventions, imply the whole calculus -- both do- and ctf- -- for layers 2 and 3. They differ in that they carve out different types of constraints over the collection of distributions induced by the SCM M (ie, over the mechanisms and exogenous distributions). The attached figure is just one possible representation. Note that we have different fragments of P* (represented by different colors) depending on the layer being discussed. (Ignore 2.25 and 2.5, since it’s a more fine-grained slicing related to some other discussion, and confusion, in the literature.) What’s interesting is that probabilistic consistency "pops up" regardless of the SCM. Rules 2 and 3 of the ctf-calculus apply depending on the SCM, reflecting a more fine-grained relationship between the endogenous variables involved. A complementary interpretation I like is through the notions of a local basis and global facts. The local part ties naturally to the locality of each mechanism f_i at the SCM level. In reality, as with any deductive system, the calculus is just a tool to verify the validity of facts that are true but not explicitly stated in the model. Each mechanism f_i, at the structural level, implies a basis of constraints relating V_i, its observed parents Pa_i, and its unobserved parents U_i. These leave imprints on the set of distributions P*. The calculus relates to the global part, and it’s simply a method to take these local facts and expand them into broader facts composed of multiple local ones. (Why do we care at all about that, one may ask? Well, the local basis is usually more parsimonious (depending on the case, it can be polynomial), but it encodes an exponential number of truths. So, the reason is computational and feasibility, since locality/parsimony and compression are key, necessary in any kind of intelligent behavior.) The do-calculus, and of course the ctf-calculus, perform this kind of "gluing" of the local facts to ascertain the validity of global ones. The example in Eqs. 14-18 in R-130 exemplifies this process, from local (what is in the model) to global (what the do-calculus or ctf-calculus says that it's true). Footnotes 47-48 in R-60 make this connection for the first time, and recall that the celebrated d-separation criterion -- foundational to probabilistic reasoning (layer 1) -- performs a similar role via the graphoid axioms in terms of basic probabilities. Finally, I should mention that for a more syntactic comparison of the calculi, see Appendix C.2 in R-115. Even though it contains a fair amount of algebra, I think it offers insight into the relationship between the layers. (Q2, 7, 8, 9 in the FAQ, p. 35, is possibly helpful as well.) TLDR: To answer your question -- it's the same in one way, but not the same in another, since different facts are being stated about the real world depending on the model interpretation being chosen (i.e., the layer/color in the figure). Happy to talk more when we meet!

English

3.2K

saam@snwmanst·19 Nis

@MajmudarAdam and now we can use llms to capture our thoughts in high dimensional spaces and shape them for consumption by others. Ideas can be personalized and shared at higher bandwidth w/ more clarity. (words → image / video, personalized using receiver history) x.com/snwmanst/statu…

saam@snwmanst

We used to speak words to evoke a thing deeper within, now we deploy bots who understand that deeper thing and generate infinite manifestations of our thoughts, customized and reaching further.

English

adammaj@MajmudarAdam·19 Nis

it’s crazy how well the concept of embeddings in deep learning translates directly to human communication and how clear it makes everything every idea, feeling, concept in your head can be framed as a physical embedding in your brain represented by some set of neural activations the goal of good communication is to reconstruct the same embedding (whether it be an idea, feeling, etc.) in another persons brain so they can experience the same thing you do good articulation is our ability to: 1. inspect an embedding in our head 2. extract out the essential information that is most representative of this embedding 3. find the most efficient combination of words to communicate this information ideally, the sentences we select are a near lossless compression of the embedding in our head. when this happens, the other person receives a near identical copy of the feeling/idea another noteworthy detail is that the recipients system for recognizing signal in your sentences and reconstructing the related embedding is also a factor at play; you have to cater your sentences to the specific intricacies of their current understanding

English

491

60.8K

saam retweetledi

Beff (e/acc)@beffjezos·10 Nis

This is such a brilliant and succinct explanation of unsupervised learning

English

278

3.4K

282.4K

saam@snwmanst·1 Nis

human-to-human communication bandwidth just got an upgrade 🚀

English

saam@snwmanst·18 Mar

We used to speak words to evoke a thing deeper within, now we deploy bots who understand that deeper thing and generate infinite manifestations of our thoughts, customized and reaching further.

English

201

saam@snwmanst·16 Mar

Consumer companies are going to have to compete with end users empowered by AI customizing and building their own experiences. Cos like @tillermoney deliver the data and let you build your own dashboards with @marimo_io or do custom analysis with @juliusai

English

105

saam retweetledi

Ian Macomber@iandmacomber·15 Mar

IMO data teams have two internal mandates right now -- 1. How can they make their own workflows more effective 2. How can they make their stakeholder's workflows more effective I have less of a use for text-to-SQL in my own dev work, but it's been helpful for 2. We've one-shotted SQL responses to help-data slack channel questions with text-to-SQL wrapped in context and ability to run and return Snowflake queries. Which expands the window of self-serve analytics and lets stakeholders ask a few more questions before they get stuck and take a data scientist out of flow state. So we spend more time in Hex!

English

14.9K

saam@snwmanst·1 Mar

@kevinakwok @googledocs Imagine a world where they bring in better threading, integration between comments and revision history, some features of Google Wave (rip)… such a massive opportunity to reimagine collaboration

English

Kevin Kwok@kevinakwok·1 Mar

@snwmanst @googledocs The way google docs handles comments is so obviously suboptimal. It’s crazy no attempt has been made to improve

English

206

saam@snwmanst·1 Mar

It should be possible to read comments in @googledocs without having to turn on suggestion mode. To read comments today is to see a bunch of cursors jumping around and highlighting, while trying to be careful to not make a suggestion when you’re simply trying to read.

English

338

saam@snwmanst·1 Mar

Biggest unlock was when I realized it’s not what you say or how you say it, but rather which part of their latent space you light up.

English

saam@snwmanst·25 Şub

x.com/karpathy/statu…

Andrej Karpathy@karpathy

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency? Grok explanation is ~close: “Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path. People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next. It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”

ZXX

saam@snwmanst·4 Şub

Agency is the new currency.

English

128

saam retweetledi

Joran Dirk Greef@jorandirkgreef·19 Şub

Towards the D in ACID, how many DBMSs: - fsync() on commit - fsync() on opening the WAL - daisy chain checksums (cf. misdirected I/O) - open the WAL with O_DIRECT (cf. fsyncgate) - have 2 WALs (cf. Protocol-Aware Recovery) - don't trust the inode to get WAL size - test this?

English

432

53.5K

saam retweetledi

Awni Hannun@awnihannun·13 Şub

Two really nice recent papers from Apple machine learning research: - Scaling laws for MoEs - Scaling laws for knowledge distillation Work by @samira_abnar @danbusbridge et al.

English

397

23.3K

saam@snwmanst·12 Şub

Related: saamtalaie.com/safety-nets-un…

English

saam@snwmanst·12 Şub

safety nets >> guardrails safety nets allow you to explore the space freely, make mistakes and learn, guardrails artificially constrain the exploration space and prevent you from learning important lessons firsthand

English

Keşfet

@kellerjordan0 @eliasbareinboim @MajmudarAdam @tillermoney @marimo_io @juliusai @kevinakwok @googledocs