tim weingarten

161 posts

@timweingarten

Anthropic product team, head of product @ Adept AI Labs, product @ Airbnb, product @ Pinterest & Co-founder/CEO The Hunt (acquired by Pinterest)

San Francisco · Joined May 2007

2.2K Following · 570 Followers

tim weingarten @timweingarten · 31 Mar

This was a fun side project. I used Dispatch in the Claude mobile app while I was traveling last week and built this harness for Claude Code. It takes your task as input, generates a rubric for it, scores Claude's outputs against the rubric, and runs in a loop until the score plateaus. The eval shows a +20pp average lift over baseline on tasks like writing an investment memo, writing a counter-argument to a claim, designing a schema for a billing system, and more. It's available here: github.com/timwein/auto-v…
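
A minimal sketch of the rubric-and-score loop described above, assuming the rubric generation, output generation, and scoring steps are supplied as plain Python callables. The names `refine_until_plateau`, `make_rubric`, `generate`, and `score`, and the `min_gain`/`patience` plateau rule, are hypothetical illustrations, not the code in the linked repo.

```python
from typing import Callable, List, Tuple

Rubric = List[str]


def refine_until_plateau(
    task: str,
    make_rubric: Callable[[str], Rubric],
    generate: Callable[[str, Rubric, str], str],
    score: Callable[[str, Rubric], float],
    min_gain: float = 0.01,
    patience: int = 2,
    max_rounds: int = 10,
) -> Tuple[str, float, List[float]]:
    """Hypothetical rubric-scored refinement loop (not the repo's actual code)."""
    rubric = make_rubric(task)                # derive grading criteria from the task
    best_output = generate(task, rubric, "")  # first draft, no previous output
    best_score = score(best_output, rubric)   # grade the draft against the rubric
    history = [best_score]
    stalls = 0
    for _ in range(max_rounds - 1):
        # Ask for a revision of the best output so far, then grade it.
        candidate = generate(task, rubric, best_output)
        s = score(candidate, rubric)
        history.append(s)
        if s - best_score >= min_gain:
            best_output, best_score, stalls = candidate, s, 0
        else:
            stalls += 1
            if stalls >= patience:  # no meaningful gain for `patience` rounds
                break
    return best_output, best_score, history


if __name__ == "__main__":
    # Toy stand-ins for model calls: the rubric is a list of required terms,
    # each revision adds one missing term, and the score is the fraction present.
    def toy_rubric(task: str) -> Rubric:
        return ["thesis", "risks", "valuation"]

    def toy_generate(task: str, rubric: Rubric, prev: str) -> str:
        missing = [term for term in rubric if term not in prev]
        return prev + (" " + missing[0] if missing else "")

    def toy_score(output: str, rubric: Rubric) -> float:
        return sum(term in output for term in rubric) / len(rubric)

    output, best, history = refine_until_plateau(
        "write an investment memo", toy_rubric, toy_generate, toy_score
    )
    print(best, history)
```

Here a plateau means `patience` consecutive rounds that fail to beat the best score by at least `min_gain`; a real harness would back these callables with model calls rather than the toy stubs.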

tim weingarten retweeted

Mike Krieger @mikeyk · 29 Sep

We asked every version of Claude to make a clone of Claude(dot)ai, including today’s Sonnet 4.5… see what happened in the video

tim weingarten retweeted

Ethan Mollick @emollick · 10 Sep

Claude's new ability to work with Excel files is the best I have seen so far. I have given it existing spreadsheets to work with and asked it to create new ones. Good use of formatting, formulas, etc. It created all of this, including 406 formulas, from one prompt (& it's solid).

tim weingarten retweeted

Hao AI Lab @haoailab · 28 Feb

Claude-3.7 was tested on Pokémon Red, but what about more real-time games like Super Mario 🍄🌟? We threw AI gaming agents into LIVE Super Mario games and found Claude-3.7 outperformed other models with simple heuristics. 🤯 Claude-3.5 is also strong, but less capable of planning complex maneuvers. Gemini-1.5-pro and GPT-4o perform less well.

tim weingarten retweeted

Anthropic @AnthropicAI · 25 Feb

A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:

tim weingarten retweeted

Mike Krieger @mikeyk · 24 Feb

Today, we introduced Claude 3.7 Sonnet and Claude Code! Claude 3.7 Sonnet is our smartest model yet and the first hybrid reasoning model on the market. Claude Code is a command line tool for agentic coding, so developers can hand off complex engineering jobs to Claude. Can't wait for folks to try them out, let me know what you think!

Anthropic @AnthropicAI

Introducing Claude 3.7 Sonnet: our most intelligent model to date. It's a hybrid reasoning model, producing near-instant responses or extended, step-by-step thinking. One model, two ways to think. We’re also releasing an agentic coding tool: Claude Code.

tim weingarten retweeted

Mike Krieger @mikeyk · 23 Jan

Exciting times at @AnthropicAI: we're actively hiring for product engineering roles across Growth, Mobile, and more. Open roles here: anthropic.com/jobs. Let me know if you apply!

tim weingarten retweeted

Gradio @Gradio · 26 Jan

Fuyu-Heavy from @AdeptAILabs looks very promising. This demo shows that the model excels at multimodal reasoning and has brilliant UI understanding. 🙌 Gradio demos are an effective tool to showcase your models' capabilities. Use them to communicate the value of your models.

tim weingarten retweeted

ruperts.world @rupertmanfredi · 3 May

AI interfaces should go beyond a chatbot: generative UI is next. Here's a demo of a library for LLMs that I've been working on. It prompts models with a vocabulary of rich components described using natural language. Presented at @causalislands last week.