Ben De La Haye

684 posts

@FuriousBenD

Joined April 2016

934 Following 148 Followers

Ben De La Haye@FuriousBenD·Oct 12

@MartinGTobias We're already predicting human behaviour for marketing at neuronsinc.com, and this will surely be going into our AI suite.

Martin Tobias (Pre-Seed VC)@MartinGTobias·Oct 12

Ok, who is productizing this. I will fund it.

Robert Youssef@rryssf

Market research firms are cooked 😳 PyMC Labs + Colgate just published something wild. They got GPT-4o and Gemini to predict purchase intent at 90% reliability compared to actual human surveys. Zero focus groups. No survey panels. Just prompting. The method is called Semantic Similarity Rating (SSR). Instead of the usual "rate this 1-5" they ask open ended questions like "why would you buy this" and then use embeddings to map the text back to a numerical scale. Which is honestly kind of obvious in hindsight but nobody bothered trying it until now. Results match human demographic patterns, capture the same distribution shapes, include actual reasoning. The stuff McKinsey charges $50K+ for and delivers in 6 weeks. Except this runs in 3 minutes for under a buck. I've been watching consulting firms tell everyone AI is coming for their industry. Turns out their own $1M market entry decks just became a GPT-4o call. Bad week to be charging enterprise clients for "proprietary research methodologies."
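The mapping step described in the post above (free-text answer, then embeddings, then a numerical scale) can be sketched roughly as follows. This is a minimal illustration, not the published method: the anchor statements are hypothetical, and a toy bag-of-words vector stands in for a real embedding model so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

# Hypothetical anchor statements, one per point on a 1-5 purchase-intent scale.
ANCHORS = {
    1: "i would definitely not buy this",
    2: "i would probably not buy this",
    3: "i might or might not buy this",
    4: "i would probably buy this",
    5: "i would definitely buy this",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ssr_distribution(response):
    """Map a free-text response to a probability distribution over scale points."""
    vec = embed(response)
    sims = {point: cosine(vec, embed(text)) for point, text in ANCHORS.items()}
    total = sum(sims.values())
    if total == 0:
        # No overlap with any anchor: fall back to a uniform distribution.
        return {point: 1 / len(ANCHORS) for point in ANCHORS}
    return {point: sim / total for point, sim in sims.items()}


def expected_rating(response):
    """Collapse the distribution into a single expected rating."""
    return sum(point * p for point, p in ssr_distribution(response).items())
```

Calling `expected_rating("I would probably buy this, the price seems fair")` returns a value between 1 and 5. With a real embedding model in place of `embed`, the same normalisation step would apply to the similarity scores.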

568

97.5K

Ben De La Haye@FuriousBenD·Sep 13

@scottastevenson @OpenAI @sama The Scrub is a great article 👌 Excited to test it in action for our use cases

136

Scott Stevenson@scottastevenson·Sep 13

"o1 is just doing more compute per token" is an insanely bad take. So is "it's just parroting the next token" from 3 years ago. Even the smartest minds sink to technical romanticism. I think one of the biggest secrets @OpenAI/@sama knows is this:

5.4K

Ben De La Haye@FuriousBenD·Jul 8

@scottastevenson Given WFH, no meeting/focus days etc. what other things would you need for Fridays to feel so productive? Is it something around the knowledge that no one else will contact you that day? You're truly isolated from work colleagues, in a way that isn't the same during the week.

Scott Stevenson@scottastevenson·Jul 7

Same here. I have worked virtually every Saturday for the past 5 years, and I really look forward to making material progress on hard/creative problems every week. I don't think I could enjoy any job without "Saturday time", because weeks are too hectic and noisy, and leave you feeling stressed like you are only treading water. It is the feeling of actually moving the boulder on a hard + significant problem on Saturdays that really keeps me going. I would burn out without that.

delian@zebulgar

I'd estimate that over the past decade, about 60-70% of the most impactful work I've done has been on a Sunday. It's a day meant for being in a quiet office, doing deep work with your cofounder, to ensure that the direction of the activities during the week is the right one.

19.8K

Ben De La Haye@FuriousBenD·Jul 5

@scottastevenson @SpellbookLegal Likewise. Our evals are especially challenging as they're all multimodal, so the cost and time are pretty high. I imagine those are code-based evals? Is that a dedicated codebase for y'all, or a Jupyter notebook / local code snippets?

Scott Stevenson@scottastevenson·Jul 5

@FuriousBenD @SpellbookLegal For a V1 I prefer subjective iteration in our product. Evals don’t tell you what prompts to try, or give you a subjective sense of how the landscape “feels” Once you get to a good v1 I think some kind of eval is a lot more helpful for fine iteration and preventing regression.

Scott Stevenson@scottastevenson·Jul 4

One of the highest leverage things we can do at @SpellbookLegal is to spend a deep day iterating on and massaging prompts. This fundamentally changes the prioritization stack in software companies. SQL queries never needed to be massaged the same way, nor did they create the kind of leverage an agent chain can. It’s not (just) Software Engineering. It’s not ML. It’s not product management. It’s not design. There is a new discipline emerging. There is absurd alpha in mastering this discipline before it becomes named and socially validated. Most people will not be motivated to master this craft until it is given social status. But the status structures of yesterday are lagging far behind where the leverage is today. AI woodworking, with patience for relentless sanding, is going to be one of the highest leverage skills of the next 5-10 years IMHO.

2.8K

Ben De La Haye@FuriousBenD·Jul 5

@scottastevenson @SpellbookLegal Same with us. We're trying to get the team more involved, spread the knowledge and learnings, but it causes a pretty hard hit to iteration speed. Are you guys just vibe checking the results as you iterate in a session? Or running a suite of evals on the fly?

Scott Stevenson@scottastevenson·Jul 5

@FuriousBenD @SpellbookLegal We have still not figured out the repeatable formula honestly. It's usually me and 1-2 engineers sitting down and iterating a bunch together

Ben De La Haye@FuriousBenD·Jul 4

@scottastevenson I see you ribbonfarm.com/2010/07/26/a-b…

Scott Stevenson@scottastevenson·Jul 4

Legibility bias rules everything around me. And going against it is where all the alpha is.

836

Ben De La Haye@FuriousBenD·Jun 29

@kenneth_skovhus Looks amazing, congrats! How long did it take to implement this?

147

Kenneth Skovhus@kenneth_skovhus·Jun 28

My most visual contribution to our planning features is our start and target date picker.

Linear@linear

With the release of Initiatives, Linear now enables you to plan your product end-to-end within a single, purpose-built system. ⬢ Set the direction ⬢ Map out your project journey ⬢ Navigate from idea to launch linear.app/plan

378

53.8K

Ben De La Haye@FuriousBenD·Jun 17

@scottastevenson Having this exact same issue, and literally building the tooling to solve it.

478

Scott Stevenson@scottastevenson·Jun 16

All the "picks and shovels" LLM tooling for eval and observability is not solving the core problem for app developers: LLM prompt chain evaluation is so subjective that it crosses over into the realm of UX design, more than engineering. Many of our successes resulted from 80 hours of "iterating until it feels right". This process is very similar to tweaking a design in Figma for 2 weeks, or fiddling around with CSS until the frontend "feels right". We are all under the illusion that great eval tools will fix this, but I think this is like thinking a great design system will solve design for you. It's a mirage. Creative work is solved with enormous elbow grease and patient subjective iteration. There is no escaping this when building new LLM features IMHO. The problem is that we need to get the fiddly UX/design/frontend people able to iterate on prompt chains effectively. A lot of creative/subjective work is simply getting allocated to the wrong team, because you currently need pretty good coding skills to develop complex prompt chains.

214

35.5K

Ben De La Haye retweeted

Find me on bsky @colin-fraser.net@colin_fraser·Apr 18

I wrote a new post about The Hallucination Problem. In this post I really lay out what I think "hallucinations" really are, a bit about why they happen, and whether we can expect them to get better. Here's a thread with some of the main points. medium.com/@colin.fraser/hallucinations-errors-and-dreams-c281a66f3c35

22.2K

Ben De La Haye retweeted

Alvaro Cintas@dr_cintas·Apr 12

You can transcribe videos and audio fast with Whisper Web. It’s free and takes seconds. Here is how:

135

892

194.5K

Ben De La Haye retweeted

Linus ✦ Ekenstam@LinusEkenstam·Apr 6

this is what AI is for 🔊 sound on

190

895

5.3K

811.5K

Ben De La Haye@FuriousBenD·Apr 6

This is a wonderful article that everyone should read once.

Rob Henderson@robkhenderson

"the mediocrity trap: situations that are bad-but-not-too-bad keep you forever in their orbit...Terrible situations, once exited, often become funny stories or proud memories. Mediocre situations, long languished in, simply become Lost Years" experimental-history.com/p/so-you-wanna…

Ben De La Haye@FuriousBenD·Apr 4

@visakanv I love these kind of epiphanies where two completely unrelated subjects have an echo of the same trend or pattern

123

Visa is doing marketing consults (see pinned!)@visakanv·Apr 4

notesprawl: the rapid expansion of the volume of notes, characterized by low-density single-use notes, relying on cheap storage and search. caused in part by the problem of info overload, but also correlated with increased cognitive load and decrease in mental cohesion

169

21.4K

Ben De La Haye@FuriousBenD·Apr 4

Despite the limited brain power of their hosted models, Groq's insane speed makes it perfect for NLP "glue steps" in LLM workflows, like formatting responses, and sanity checking output. Even more so with structured output.

LangChain@LangChain

🚨🧱 Groq tool calling + structured output 🧱🚨 @GroqInc just dropped tool calling! We've added LangChain support (including the popular `withStructuredOutput` method!) so you can try it in your favorite chains and apps. It supports @MistralAI Mixtral, Llama 70B, and Google Gemma. See docs below: Python 🐍: python.langchain.com/docs/guides/st… JavaScript ☕: js.langchain.com/docs/integrati…
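The "sanity checking output" glue step mentioned above can be pictured as a small validator that runs on a model's structured reply before the next step in a workflow uses it. The sketch below is only an illustration under stated assumptions: the schema in `REQUIRED_FIELDS` is hypothetical, the model call itself is left out, and nothing here uses the Groq or LangChain APIs.

```python
import json

# Hypothetical schema for a structured reply: field name -> expected Python type.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}


def sanity_check(raw_output):
    """Return (parsed, errors) for a model reply expected to be a JSON object."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["expected a JSON object"]

    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            errors.append(f"wrong type for {name}")
    # Only hand the parsed object onward when every check passed.
    return (data if not errors else None), errors
```

A workflow could retry the fast model whenever `errors` is non-empty, which is cheap when each call returns quickly.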

Ben De La Haye@FuriousBenD·Apr 3

These are the kind of non-obvious tasks that are easy to overlook when working with LLMs, yet are actually the most reliably performed, with real benefits. Exciting times ahead!

Michele Catasta@pirroh

Today we announced @Replit Code Repair, the first low-latency code repair AI agent. 🧵 1/6

Ben De La Haye@FuriousBenD·Apr 3

Why indeed?

Codex@codexeditor

Why is it a webpage? Why isn't it a webbox? Or webroom? Or webcathedral? Why do we insist on doing things in 2D?

Ben De La Haye@FuriousBenD·Apr 2

@LisaShmulyan @andruyeung

121

Lisa@lisas8_·Apr 2

10 days until we host the most ‘black mirror’ event at Mission Control. You apply and receive an AI agent based on your personality and identity. Your agent interacts with all of the other guests in a simulation prior to the party to predict who you’ll vibe with most irl. We’ll suggest who you should chat with at the party based on your agents’ top preferences. Let’s see if your AI agent can reliably predict chemistry 🧑‍🔬 DM @edgarhnd for access

Edgar Haond@edgarhnd

i'm throwing the first ever AI simulated party. it's 3 days long. day 1 and day 2 are in the simulation. day 3 you pull up irl to Mission Control in sf. here's how it works: 1. every guest gets an AI character. 2. you customize it to your personality. 3. your character is thrown into a virtual world where it meets everyone else attending the party. 4. the day of the irl party, you get a report of the top 3 ppl to meet and more importantly, who to avoid lmao. this is the future of irl parties. drop a 🎉 now and ill send u an invite.

42.3K

Ben De La Haye@FuriousBenD·Apr 2

We've certainly come a ways, for better or worse.

Nabeel S. Qureshi@nabeelqu

GPT-2 (2019) vs. GPT-4 (2024) look how beautifully original gpt-2 is! it's like comparing gpt's poetic child drawings vs. corporate emails from middle age.

Ben De La Haye@FuriousBenD·Apr 1

@jxnlco Ha! Yeah that'll do it

jason@jxnlco·Apr 1

@FuriousBenD trick is to write no code

jason@jxnlco·Apr 1

instructor 1.0.0 is here python.useinstructor.com/blog/2024/04/0… I've earned a layer of indirection and heres why: with 120k pypi downloads, 20k monthly unique visitors, 500k monthly page views on the docs. the philosophy was to keep it simple, as more sdks come online I want to protect users from downstream changes. instructor.from_openai will look and feel the same as instructor.patch, but whats new? proper autocomplete, and helper methods to create partials, iterables, and get the original response back.

338

54.5K
