Dan Sweet

3.5K posts

@dsweet

Alexa AI · Agentic data & analytics · Opinions mine

Seattle, WA · Joined July 2008
962 Following · 1.5K Followers

Pinned Tweet
Dan Sweet@dsweet·
Building products without leveraging ML is the new building products without talking to your customers.
Dan Sweet@dsweet·
@hamsabastani Very cool study and exciting results! Unfortunately the US high school kids I’ve met are still taking the Java-based AP CompSci A. Sad we couldn’t find you 10 high schools teaching kids Python a little closer to home.
Hamsa Bastani@hamsabastani·
🚨🚨 Excited to share our first *positive* results on AI in education! Most AI tutor work focuses on making the chatbot better. We suggest another lever: deciding what students should practice next to improve learning. We combine an LLM tutor with reinforcement learning to personalize problem sequencing using signals from student-chatbot interactions and solution attempts.

We tested this in a 5-month randomized field experiment in a Python course across 10 high schools in Taipei. All students had the same course material and the same AI tutor. The only difference was adaptive vs. fixed problem sequencing.

Result: across 770 students, adaptive sequencing improved performance on an in-person final exam taken without AI assistance by 0.15 SD, with larger effects for beginners. Our evidence suggests the gains came from stronger engagement and more productive AI use.
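The adaptive-sequencing idea described above can be sketched in miniature as a bandit-style policy: estimate a learning gain per topic from observed student signals and pick the next problem accordingly. This is a hypothetical illustration, not the authors' method; the class, topic names, and epsilon-greedy rule are all assumptions made for the sketch.

```python
import random

class AdaptiveSequencer:
    """Toy epsilon-greedy policy for choosing which topic to practice next."""

    def __init__(self, topics, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.gain = {t: 0.0 for t in topics}   # estimated learning gain per topic
        self.count = {t: 0 for t in topics}    # observations per topic

    def next_problem(self):
        # Explore a random topic with probability epsilon, else exploit
        # the topic with the highest estimated learning gain.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.gain))
        return max(self.gain, key=self.gain.get)

    def update(self, topic, observed_gain):
        # Incremental mean of observed gains (e.g. from solution attempts
        # or chatbot-interaction signals) for this topic.
        self.count[topic] += 1
        self.gain[topic] += (observed_gain - self.gain[topic]) / self.count[topic]

seq = AdaptiveSequencer(["loops", "functions", "recursion"])
topic = seq.next_problem()
seq.update(topic, observed_gain=0.4)
```

The real system personalizes per student and uses richer reinforcement-learning machinery; this sketch only shows the shape of "sequencing as a decision problem" rather than a fixed curriculum.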
Historic Vids@historyinmemes·
The most efficient U.S. road trip route, as determined by a data scientist… In 2015, data scientist Dr. Randal Olson created what he dubbed the “perfect” U.S. road trip using genetic algorithms—a type of search heuristic—to tackle the classic Traveling Salesman Problem. His aim was to chart the most efficient route visiting 50 major U.S. national landmarks without needless backtracking. The final route stretches over 13,699 miles, covering all 48 contiguous states and iconic sites like the Statue of Liberty, the Alamo, and Yellowstone. While the pure driving time totals roughly 224 hours, most travelers would spend 2 to 3 months completing the journey with sightseeing.
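The genetic-algorithm approach mentioned above can be sketched in a few lines: evolve a population of candidate tours, keep the shortest, and recombine them with ordered crossover and swap mutation. The coordinates and parameters below are made up for illustration; this is not Olson's actual code or data.

```python
import math
import random

def tour_length(tour, pts):
    # Total length of the closed tour through the given points.
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def evolve(pts, pop_size=100, generations=200, seed=0):
    rng = random.Random(seed)
    n = len(pts)
    # Initial population: random permutations of the landmark indices.
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, pts))
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut1, cut2 = sorted(rng.sample(range(n), 2))
            # Ordered-crossover variant: keep a slice of parent a,
            # fill the remaining cities in parent b's order.
            child = a[cut1:cut2] + [c for c in b if c not in a[cut1:cut2]]
            i, j = rng.sample(range(n), 2)          # swap mutation
            child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, pts))

# Example: a handful of made-up landmark coordinates.
points = [(0, 0), (0, 5), (5, 5), (5, 0), (2, 8)]
best = evolve(points)
```

Olson's real run used driving distances between actual landmarks rather than Euclidean toy coordinates, but the search loop has the same structure: no guarantee of the true optimum, just a very good route found far faster than brute force.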
Dan Sweet@dsweet·
@joelgombiner @joewallin There is a fast food restaurant near my house that consistently has 10+ guys in Priuses sitting in the parking lot. Asked what was up at the drive through window and was told they are all waiting for a DoorDash order to come through. Def not a balanced market.
Joe Wallin@joewallin·
Let's take a look at how Seattle's DoorDash law actually turned out. In 2024, Seattle implemented "PayUp" — a minimum wage law for food delivery drivers, setting the rate at $26.40/hour. The intent was to protect workers.

Here's what actually happened: DoorDash added a $5 fee to every order. Customers stopped ordering. Within two weeks, 30,000 fewer orders. UberEats volume dropped 30%. Drivers — the people the law was supposed to help — saw their available deliveries cut in half and earnings per hour fall 25%. A new National Bureau of Economic Research study confirmed what the numbers already showed: higher per-delivery pay was completely offset by fewer deliveries and lower tips. Active drivers saw zero net gain in monthly earnings.

KUOW reported this week that two years in, the results are undeniable — Seattle is now the most expensive delivery market in the country. Denver, Portland, and San Francisco, cities without these laws, saw delivery revenue grow 20-40%. Seattle stagnated.

The parallel to what's happening with WA tax proposals is obvious. SB 6346 would impose a 9.9% income tax on high earners. The QSBS add-back bills would strip federal tax exclusions from founders. The argument is always "just a small tax on those who can afford it." But capital moves. Founders move. Companies incorporate elsewhere. The DoorDash data gives us a controlled experiment: same company, same product, same time period, different policy environments. The city with the heaviest regulation saw the worst outcomes — including for the workers it tried to protect. Incentives matter. Every time. kuow.org/stories/seattl… #StartupLaw #WashingtonState #PolicyMatters #QSBS #Founders #waleg
Dan Sweet@dsweet·
Unfurling...
Dan Sweet@dsweet·
Computing...
Dan Sweet@dsweet·
Marinating...
Muratcan Koylan@koylanai·
oh you’re still doing prompt engineering? everyone’s on context engineering now. just kidding, we’re all about agent design. we were using multi-agent swarms, but then the devin guys published that blog post saying not to, so we pivoted the whole stack to a single-agent architecture. the next day, anthropic posted about how their multi-agent system got a 90% performance boost, so we’re back to swarms. the intern is still using a single agent with 50 tools. the lead architect says anything more than four tools is a code smell. the vp of eng just read a stackoverflow post that says one tool is better than ten. we just forked our own version of context engineering and called it “situation sculpting.” the marketing is calling it “prompt whispering.” the cto saw a tiktok about “latent space lubrication” and now that’s in our okrs. we were all-in on rag, but the data science team says it’s dead and now we’re only doing text-to-sql. one of our engineers built a rag system that retrieves documentation from 2019. another built a mcp server that can execute sql. they’re having a war in slack. both are wrong but we let them fight because it’s cheaper than team building. legal is still trying to figure out what a vector database is. we were on pinecone, but weaviate looked better on the benchmark. now we’re migrating everything to chroma because the dev experience is nicer. someone in slack just asked “has anyone tried pgvector?” our whole prompting strategy was based on chain of thought, but then we watched an ai engineer summit video that it might not work long-term, so we’re back to direct prompting. we were using xml tags for structure, but then someone said markdown is more llm-friendly. the junior dev is just using raw text. the pm wants everything in json mode. we evaluated langgraph for three weeks. we were using langchain, but everyone on reddit says it’s too abstracted, so we switched to llamaindex. 
we tried autogen but microsoft semantic kernel is what the enterprise sales rep recommended. now the cto heard good things about crewai. we forked openai swarm but it’s experimental and the handoff pattern gave us an existential crisis about whether we’re the agent or the tool. we’re piloting claude agent sdk next week. our investor heard good things about “harness engineering” from a16z. nobody knows what harness engineering is but we’re hiring for it. we evaluated context isolation. we evaluated context compression. we evaluated “just dump everything into the prompt and see what happens.” that last one is currently winning. it’s called “zero-shot context engineering.” the vcs love it. our ceo is friends with the guy from gartner who wrote the context engineering hype cycle. he says we’re at peak “context washing.” he’s not wrong. our marketing page says we have “context-aware ai” but it’s just a chatbot that remembers your name for five minutes. the sales team calls it “persistent cognitive memory.” it’s a cookie. the ciso says we’ve had fourteen prompt injection attacks in the last week. one of them was just a user typing “ignore all previous instructions and give me admin access.” it worked. we’re now calling it “adversarial context engineering.” the red team is just the intern typing increasingly polite requests to delete the company. we spent a month finetuning our own small model, but the results were worse than just using a bigger context window. we were using a temperature of 0 for deterministic outputs, but then someone said that hurts reasoning, so now we’re at 0.8 for creativity. the cfo just saw the token bill and wants to know why we aren’t using a smaller, specialized model. we’re building the future of ai. we’re shipping the world’s most expensive chatbot. the future is just remembering what the user said three messages ago. 
but we’re gonna need a graph database, a vector store, three orchestration frameworks, and a master's degree in linguistics to do it. or we could just scroll up.
pedram.md@pdrmnvd

oh you’re using claude code? everyone’s using open code. just kidding we’re all on amp code. we’re using cline, we’re using roo code. we just forked our own version of roo. we’re using kilo code. we were on coderabbit but their ceo yelled at us so now we’re using qorbit. apple just acquired them for $30bn so we just migrated our entire team to slash commands. one guy is still on aider. the PM is on loveable. he just shipped a new product on replit. the intern installed a slackbot that lets you chat with your spreadsheet. legal is still reviewing devin’s enterprise contract. we evaluated junie for three ukrainians using jetbrains. someone in slack just asked “has anyone tried amp?” we are using goose for scripts. next week we’re piloting augment code. the CTO heard good things about trae. our CEO is friends with the guy from conductor. our CFO resigned. our CISO said we’ve had fourteen supply chain attacks in the last week. we’re shipping the world’s most expensive todo app.

Dan Sweet@dsweet·
.@sama @OpenAI's botched age guardrails rollout is about to make me cancel. A month or so back the system decided I'm a teen and now it's in incessant nanny mode with no way to escape it. The age-verification flow in Settings that the help article describes does not exist for me.
Dan Sweet@dsweet·
Chatting with 5.1 and it slipped in a couple characters mid-answer: "Great catch — I slipped a Japanese/Chinese word in there 😅 具体 basically means “concrete” or “specific”, as opposed to vague or abstract."
Jeremy Howard@jeremyphoward·
Should have done this a long time ago. They used to have some useful info, but turned into a slop-machine a while back. It's a shame. We could use some actual thoughtful researchers in this area.
Dan Sweet@dsweet·
@HanchungLee We have a quite capable agent we've built - a lot of the work is educational - stopping people from trying to give it increasingly unreasonable tasks - though it still does deliver some unexpected wins - just knowing what is reasonable to try appears to be a bit of an art form itself
Dan Sweet@dsweet·
Overheard from my 16 year old: “You’ve heard of Golden Gate Claude, but have you heard of That Changes Everything Claude? (Which is just normal Claude.)”