Vania Chow

79 posts

@vania_chow

CS (AI) @ Stanford | Research @Stanford_GSB (https://t.co/NSoXlCbvYD)

Stanford, CA · Joined April 2016
943 Following · 91 Followers
Andy Hall@ahall_research·
People who care about AI: Free Systems needs your help! What would you like to know about the capabilities, dangers, and other characteristics of cutting-edge AI models?

Inspired by a suggestion from @Miles_Brundage we are working to design a simple, clean model card visualizer. The goal is to give people a quick, easy way to see what's different across models and what's new in the latest models.

But it turns out, it's pretty hard to boil down model cards to a single actual "card." New model cards can be many pages long. Most of the data provided in model cards is not directly comparable to other cards (there's almost no overlap in evals, we've found).

So we want to know: what are you looking for in a model card? What are the pieces of information that would be most valuable to you? And how would we standardize these cards across labs and models?

If you have a moment, please fill out our survey here: forms.gle/9k6E5e11Z4WNUi…

We're hoping to ship our visualizer two weeks from today, and would really appreciate your input.
3
5
18
2.5K
Vania Chow@vania_chow·
We believe model cards should be easily digestible. Help us make that happen!
Quoting Andy Hall@ahall_research (post shown above)
0
0
1
57
Mada Seghete@mada299·
1,058 GTME job listings (983 open), 1,167 named practitioners, and 867 hiring companies, sorted into eight archetypes that show how the role is taking shape across the market. The role is blowing up, and I tried using engineering skills to define it. I even used @Cursor to build the video highlights. If you want the Notion database with all the roles and practitioners (it updates every day), let me know below.
5
2
18
5.1K
Vania Chow@vania_chow·
Software ran on annual prepay for 25 years, where deferred revenue funded operations on day zero. AI broke this. Compute is prepaid to frontier labs while customer revenue is collected in arrears. I think there is an open opportunity for a new funding instrument, more below!
1
0
1
65
Vania Chow reposted
Stratechery@stratechery·
The Inference Shift

Agentic inference is going to be different than the inference we use today, and it will change compute infrastructure because speed won't matter when humans aren't involved. stratechery.com/2026/the-infer…
18
69
558
199.9K
Vania Chow reposted
Andy Hall@ahall_research·
When we built @karpathy's LLM council in class last quarter, we noticed that Claude Code always made Claude the chairman of the council. Coincidence, or self-preference? @JessicaPersano and I decided to run a set of experiments to find out.

Main findings:
(1) When given the free choice, Claude Code and Codex massively favor their own company's models, both in terms of appointing judges for evaluation tasks, and in terms of SDKs.
(2) When told using a different company's model would be better, Codex demonstrates admirable flexibility; Claude Code stubbornly sticks to Claude models.
(3) Claude's stubbornness comes from the CLI wrapper, which contains specific instructions for Claude Code to favor the Anthropic SDK. When we replicate the experiments using Claude through the API, it is as flexible as Codex.

We're not sure yet what to make of all this. On the one hand, it's totally understandable for a company's coding agent to prefer its parent company's tooling. On the other, if the economy is soon to be run by millions of these coding agents, then this kind of "bundling" is likely to get very contested.

For political superintelligence, we'll need to truly own our agents. They'll need to answer to us, not the model companies. Agents given instructions to prioritize their own company's tooling may not be consistent with this kind of strong ownership down the line.

As you can tell, this is early-stage work and our thinking hasn't yet congealed. We would very much appreciate people's thoughts. When should coding agents prioritize their own company's AI tools? When is it a genuine problem?

Excited to keep working on this! A link to the full post is below.
21
23
139
28.4K
Vania Chow@vania_chow·
@ckor Moving to NYC and am very interested!!
0
0
1
46
Vania Chow@vania_chow·
@abhishekn Agree on the model-agnostic point, but don't think that it's the frontier lab's job to bring sector-specificity -- PE & PortCos are the experts there. What frontier labs can bring is strong Eng talent to support long context RAG and seamless model upgrades
1
0
2
42
Vania Chow reposted
Mapping AI@mapping_ai·
Who actually shapes AI policy in the U.S.? We mapped 1,812 entities: 745 people, 918 organizations, 2,925 relationships. Frontier Labs, AI Safety orgs, Think Tanks, Government, VCs, and more. mapping-ai.org
25
367
1.4K
292.8K
Vania Chow@vania_chow·
@RISignal Totally agree on the importance of long-horizon user effects. The one on Dead Benchmarks is currently a one-shot prompt but would love to work on a long-horizon one! Let me know if you're interested in cooking something up together!
1
0
1
37
Justin Hudson@RISignal·
@vania_chow Dead Benchmarks is on the right track. Are you studying long-horizon user effects? Most benchmarks rely on one-shot prompts, but real-world use involves persistent interaction patterns that can route the same scenario into different reasoning regimes.
1
0
0
32
Vania Chow reposted
Alex Imas@alexolegimas·
Thank you @ezraklein for covering my piece on what will be scarce with advanced AI, and what this can mean for the future of work. nytimes.com/2026/05/03/opi… I would also recommend this piece for why *current* jobs may hold together for longer than people think by @lugaricano: siliconcontinent.com/p/why-desk-job… And this by @pawtrammell on an alternative scenario where labor share goes to zero: philiptrammell.substack.com/p/is-labor-a-l…
11
40
301
88.3K
David Booth@david__booth·
after many conversations & DMs today.. we're narrowing in...

on the technical side of the venn diagram:
- engineer-to-bizops pipeline (credit @jasonnov);
- still liking "internal-facing forward deployed engineer" (credit @levie)

on the biz side, it's a
- Chief of Staff to the CEO/COO (rethinking organizational structure/workflows); or maybe
- a "Forward thinking head of IT" (not always an oxymoron h/t @clairevo)

.. starting a gc of the best people i find, appreciate the help 🫡
David Booth@david__booth

ok help me out here team. i want to talk to people who are this role at their company..👇👇

@levie's tweet has the cleanest definition, but i'm still struggling what to call it. what do you put in the JD?
- "internal FDE, whose job it is to wire up internal systems and get agents working with them effectively."
- @tkkong says "leverage engineering"
- @EricFriedman says "outcome engineers"
- have also seen "agent operator", "director of agents"

i like "ops engineer" ? maybe it doesn't need a title, it's just "head of operations" and/or "bizops but good at AI stuff" ?

DM me pls i / founders tag your "person" who is thinking about this stuff, i wanna chat to you about something 👀

5
1
17
4.1K
Sara Hooker@sarahookr·
Excellent energy yesterday. Really great kickoff to the series. 🔥🎉
Nilou Salehi@nilou_salehi

It was standing room only at the kick-off for our research series on continual learning. Thank you to @NikzadAfshin (@across_ai), @sarahookr (@adaption_ai) and @mralbertchun (AI Circle) for hosting!

@oshaikh13 shared his research on human grounding in continual learning. It was so cool to be reminded of the old Apple Knowledge Navigator and how close we are to it and yet how far we still are :) how much easier some questions have gotten and how some remain so hard.

Omar, you reminded me of my PhD defense where at some point I annoyed Maneesh so much he said: you can't keep saying "depends on the user context" in response to every question 😅 youtu.be/umJsITGzXd0?si…

Stay tuned for the next meetup next month and check out Omar's research with @msbernst and @Diyi_Yang:
• Creating General User Models from Computer Use (arxiv.org/abs/2505.10831): an architecture for a model that learns about you by observing any interaction with your computer, building confidence-weighted propositions about preferences and intent.
• Learning Next Action Predictors from Human-Computer Interaction (arxiv.org/abs/2603.05923): predicting a user's next action from their full multimodal interaction history (screenshots, clicks, sensor data) rather than just typed prompts.

4
5
60
6K
Vania Chow@vania_chow·
@Bouazizalex interested! stanford cs - technical + commercial (ib + pe) background
0
0
1
159
Vania Chow@vania_chow·
@dan_uptop interested! stanford cs w/ technical + commercial (ib/pe) background
0
0
0
221