Ben De La Haye

684 posts

@FuriousBenD

Joined April 2016
934 Following · 148 Followers

Scott Stevenson @scottastevenson
"o1 is just doing more compute per token" is an insanely bad take.
So is "it's just parroting the next token" from 3 years ago.
Even the smartest minds sink to technical romanticism.
I think one of the biggest secrets @OpenAI/@sama knows is this:
[Image]
4 replies · 5 reposts · 58 likes · 5.4K views

Ben De La Haye @FuriousBenD
@scottastevenson Given WFH, no meeting/focus days etc. what other things would you need for Fridays to feel so productive? Is it something around the knowledge that no one else will contact you that day? You're truly isolated from work colleagues, in a way that isn't the same during the week.
0 replies · 0 reposts · 0 likes · 74 views

Scott Stevenson @scottastevenson
Same here. I have worked virtually every Saturday for the past 5 years, and I really look forward to making material progress on hard/creative problems every week. I don't think I could enjoy any job without "Saturday time", because weeks are too hectic and noisy, and leave you feeling stressed like you are only treading water. It is the feeling of actually moving the boulder on a hard + significant problem on Saturdays that really keeps me going. I would burn out without that.
delian @zebulgar

I'd estimate that over the past decade
About 60-70% of the most impactful work I've done has been on a Sunday
It's a day meant for being in a quiet office, doing deep work with your cofounder, to ensure that the direction of the activities during the week is the right one

7 replies · 2 reposts · 58 likes · 19.8K views

Ben De La Haye @FuriousBenD
@scottastevenson @SpellbookLegal Likewise. Our evals are especially challenging as they're all multimodal, so the cost and time are pretty high. I imagine those are code-based evals? Is that a dedicated codebase for y'all, or a Jupyter notebook / local code snippets?
0 replies · 0 reposts · 1 like · 12 views

Scott Stevenson @scottastevenson
@FuriousBenD @SpellbookLegal For a V1 I prefer subjective iteration in our product. Evals don’t tell you what prompts to try, or give you a subjective sense of how the landscape “feels”.
Once you get to a good v1 I think some kind of eval is a lot more helpful for fine iteration and preventing regression.
1 reply · 0 reposts · 1 like · 31 views

Scott Stevenson @scottastevenson
One of the highest leverage things we can do at @SpellbookLegal is to spend a deep day iterating on and massaging prompts.

This fundamentally changes the prioritization stack in software companies. SQL queries never needed to be massaged the same way, nor did they create the kind of leverage an agent chain can.

It’s not (just) Software Engineering. It’s not ML. It’s not product management. It’s not design. There is a new discipline emerging.

There is absurd alpha in mastering this discipline before it becomes named and socially validated. Most people will not be motivated to master this craft until it is given social status. But the status structures of yesterday are lagging far behind where the leverage is today.

AI woodworking, with patience for relentless sanding, is going to be one of the highest leverage skills of the next 5-10 years IMHO.
4 replies · 1 repost · 40 likes · 2.8K views

Ben De La Haye @FuriousBenD
@scottastevenson @SpellbookLegal Same with us. We're trying to get the team more involved, spread the knowledge and learnings, but it causes a pretty hard hit to iteration speed. Are you guys just vibe checking the results as you iterate in a session? Or running a suite of evals on the fly?
1 reply · 0 reposts · 0 likes · 35 views

Scott Stevenson @scottastevenson
@FuriousBenD @SpellbookLegal We have still not figured out the repeatable formula honestly. It's usually me and 1-2 engineers sitting down and iterating a bunch together
1 reply · 0 reposts · 1 like · 59 views

Scott Stevenson @scottastevenson
Legibility bias rules everything around me
And going against it is where all the alpha is
2 replies · 1 repost · 12 likes · 836 views

Scott Stevenson @scottastevenson
All the "picks and shovels" LLM tooling for eval and observability is not solving the core problem for app developers: LLM prompt chain evaluation is so subjective that it crosses over into the realm of UX design, more than engineering.

Many of our successes resulted from 80 hours of "iterating until it feels right". This process is very similar to tweaking a design in Figma for 2 weeks, or fiddling around with CSS until the frontend "feels right".

We are all under the illusion that great eval tools will fix this, but I think this is like thinking a great design system will solve design for you. It's a mirage. Creative work is solved with enormous elbow grease and patient subjective iteration. There is no escaping this when building new LLM features IMHO.

The problem is that we need to get the fiddly UX/design/frontend people able to iterate on prompt chains effectively. A lot of creative/subjective work is simply getting allocated to the wrong team, because you currently need pretty good coding skills to develop complex prompt chains.
24 replies · 20 reposts · 214 likes · 35.5K views

Ben De La Haye reposted
Find me on bsky @colin-fraser.net
I wrote a new post about The Hallucination Problem. In this post I really lay out what I think "hallucinations" really are, a bit about why they happen, and whether we can expect them to get better. Here's a thread with some of the main points.
medium.com/@colin.fraser/hallucinations-errors-and-dreams-c281a66f3c35
5 replies · 8 reposts · 61 likes · 22.2K views

Ben De La Haye reposted
Alvaro Cintas @dr_cintas
You can transcribe videos and audio fast with Whisper Web. It’s free and takes seconds. Here is how:
[Image]
31 replies · 135 reposts · 892 likes · 194.5K views

Ben De La Haye reposted
Linus ✦ Ekenstam @LinusEkenstam
this is what AI is for 🔊 sound on
190 replies · 895 reposts · 5.3K likes · 811.5K views

Ben De La Haye @FuriousBenD
@visakanv I love these kinds of epiphanies, where two completely unrelated subjects have an echo of the same trend or pattern
0 replies · 0 reposts · 2 likes · 123 views

Visa is doing marketing consults (see pinned!)
notesprawl: the rapid expansion of the volume of notes, characterized by low-density single-use notes, relying on cheap storage and search. caused in part by the problem of info overload, but also correlated with increased cognitive load and decrease in mental cohesion
[2 images]
12 replies · 4 reposts · 169 likes · 21.4K views

Ben De La Haye @FuriousBenD
Despite the limited brain power of their hosted models, Groq's insane speed makes it perfect for NLP "glue steps" in LLM workflows, like formatting responses, and sanity checking output. Even more so with structured output.
LangChain @LangChain

🚨🧱 Groq tool calling + structured output 🧱🚨
@GroqInc just dropped tool calling! We've added LangChain support (including the popular `withStructuredOutput` method!) so you can try it in your favorite chains and apps. It supports @MistralAI Mixtral, Llama 70B, and Google Gemma.
See docs below:
Python 🐍: python.langchain.com/docs/guides/st…
JavaScript ☕: js.langchain.com/docs/integrati…

0 replies · 0 reposts · 2 likes · 91 views
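
A rough sketch of the kind of "glue step" mentioned in the post above: sanity checking a model's structured output before it is handed to the next stage of a workflow. It is not tied to Groq or LangChain and calls no model. The schema in `EXPECTED_FIELDS`, the `check_structured_output` helper, and the sample payloads are all hypothetical, chosen only for illustration. In practice the raw string would come from a fast hosted model.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"title": str, "sentiment": str, "confidence": float}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass
class CheckResult:
    ok: bool
    errors: List[str] = field(default_factory=list)
    data: Optional[dict] = None


def check_structured_output(raw: str) -> CheckResult:
    """Parse a model response as JSON and sanity-check it against EXPECTED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return CheckResult(False, [f"not valid JSON: {exc.msg}"])
    if not isinstance(data, dict):
        return CheckResult(False, ["top-level value is not a JSON object"])

    errors = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
            continue
        value = data[name]
        # Accept an integer such as 1 where a float is expected, but never a bool.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            data[name] = float(value)
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {name}: expected {expected.__name__}")

    # Vocabulary and range checks only apply to values that passed the type check.
    sentiment = data.get("sentiment")
    if isinstance(sentiment, str) and sentiment not in ALLOWED_SENTIMENTS:
        errors.append(f"unexpected sentiment: {sentiment!r}")
    confidence = data.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append(f"confidence out of range: {confidence}")

    return CheckResult(not errors, errors, data if not errors else None)
```

A failed check could trigger a retry or a fallback instead of passing malformed data downstream; which one makes sense depends on the workflow.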

Ben De La Haye @FuriousBenD
These are the kinds of non-obvious tasks that are easy to overlook when working with LLMs, yet are actually the most reliably performed, with real benefits. Exciting times ahead!
Michele Catasta @pirroh

Today we announced @Replit Code Repair, the first low-latency code repair AI agent. 🧵 1/6

0 replies · 0 reposts · 0 likes · 63 views

Lisa @lisas8_
10 days until we host the most ‘black mirror’ event at Mission Control. You apply and receive an AI agent based on your personality and identity. Your agent interacts with all of the other guests in a simulation prior to the party to predict who you’ll vibe with most irl. We’ll suggest who you should chat with at the party based on your agents’ top preferences. Let’s see if your AI agent can reliably predict chemistry 🧑‍🔬 DM @edgarhnd for access
Edgar Haond @edgarhnd

i'm throwing the first ever AI simulated party. it's 3 days long. day 1 and day 2 are in the simulation. day 3 you pull up irl to Mission Control in sf.
here's how it works:
1. every guest gets an AI character.
2. you customize it to your personality.
3. your character is thrown into a virtual world where it meets everyone else attending the party.
4. the day of the irl party, you get a report of the top 3 ppl to meet and more importantly, who to avoid lmao.
this is the future of irl parties. drop a 🎉 now and ill send u an invite.

7 replies · 8 reposts · 94 likes · 42.3K views

jason @jxnlco
instructor 1.0.0 is here
python.useinstructor.com/blog/2024/04/0…
I've earned a layer of indirection and here's why: with 120k pypi downloads, 20k monthly unique visitors, 500k monthly page views on the docs. the philosophy was to keep it simple, as more sdks come online I want to protect users from downstream changes.
instructor.from_openai will look and feel the same as instructor.patch, but what's new? proper autocomplete, and helper methods to create partials, iterables, and get the original response back.
13 replies · 35 reposts · 338 likes · 54.5K views