Patrick Hunt

2.8K posts

@phunt

Apache ZooKeeper committer, Cloudera Employee, Hacker, Architect, Husband, Dad

San Francisco Bay Area · Joined January 2008
180 Following · 3.6K Followers
Patrick Hunt @phunt
My AI/ML team at Cloudera is expanding - we have open Manager as well as IC positions (more coming, keep an eye) in the USA and in India. Find out more here, great opportunity to work with some awesome folks on latest technologies to enable customers: cloudera.wd5.myworkdayjobs.com/External_Caree…
English
0
0
1
159
Patrick Hunt @phunt
What character is this from Jules? Fail Octopus? Fail Cuttlefish? That thing from Alien? (and why can't I use images in X polls....?)
Patrick Hunt tweet media
English
0
0
1
130
Patrick Hunt @phunt
Got a chance to try Aider with gemini-2.5-pro-exp-03-25 this weekend - very impressed! This is the first time I've been able to use Gemini models successfully to vibe-code a project from scratch (with some help from grok as PM). Kudos to the Google team. Need. More. Quota. :-)
English
0
1
6
333
Patrick Hunt retweeted
Chaithanya Kumar @ChaithanyaK42
This has to be one of the best blogs that I have read on model context protocol (MCP) by @swyx @latentspacepod
Chaithanya Kumar tweet media
English
3
6
21
2.2K
Patrick Hunt retweeted
Amr Awadallah 🤖 @awadallah
🎉 Big milestone alert! Our Hughes Hallucination Detection Model (HHEM) just hit 2 MILLION downloads on @huggingface! Launched last Nov, we hit 1M downloads 3 weeks ago, and now we hit 2M downloads. This recent acceleration is a clear leading indicator that the #RAG AI Assistants/Agents market is maturing and moving into more production use cases where accuracy is paramount. Thank you to our amazing community for making this possible! bit.ly/4h1f8sy #NLP #HallucinationDetection #MachineLearning #RAG
Amr Awadallah 🤖 tweet media
English
1
12
24
5K
Patrick Hunt retweeted
Vectara @vectara
With this funding, we're launching Mockingbird, a groundbreaking LLM for Retrieval-Augmented Generation (#RAG). Perfect for healthcare, law, and banking with unparalleled accuracy & performance. Join us as we revolutionize #AI for regulated industries!
English
2
6
13
927
Andrej Karpathy @karpathy
Nice, a serious contender to @lmsysorg in evaluating LLMs has entered the chat.

LLM evals are improving, but not so long ago their state was very bleak, with qualitative experience very often disagreeing with quantitative rankings. This is because good evals are very difficult to build - at Tesla I probably spent 1/3 of my time on data, 1/3 on evals, and 1/3 on everything else. They have to be comprehensive, representative, of high quality, and measure gradient signal (i.e. not too easy, not too hard), and there are a lot of details to think through and get right before your qualitative and quantitative assessments line up. My go-to pointer for some of the fun subtleties is probably the Open LLM Leaderboard MMLU writeup: github.com/huggingface/bl…

The other non-obvious part is that any open (non-private) test dataset inevitably leaks into training sets. This is something people strongly intuitively suspect, and also why this GSM1k made rounds recently arxiv.org/html/2405.00332

Even if LLM developers do their best, preventing test sets from seeping into training sets (and answers getting memorized) is difficult. Sure, you can do your best to filter out exact matches. You can also filter out approximate matches with n-gram overlaps or so. But how do you filter out synthetic data re-writes, or related online discussions about the data? Once we start routinely training multi-modal models, how do you filter out images/screenshots of the data? How do you prevent developers from e.g. vector embedding the test sets, and specifically targeting training to data that has high alignment (in the embedding space) with the test sets?

And the last component of this is that not all LLM tasks we care about are automatically evaluateable (e.g. think summarization, etc), and at that point you want to involve humans. And when you do, how do you control for all the variables involved, e.g. how much people pay attention to the actual answer, or the length, or the style, or how refusals are treated, etc.

Anyway, good evals are unintuitively difficult, highly work-intensive, but quite important, so I'm happy to see more organizations join the effort to do it well.
Andrej Karpathy tweet media (two images)
Alexandr Wang @alexandr_wang

1/ We are launching SEAL Leaderboards—private, expert evaluations of leading frontier models. Our design principles: 🔒Private + Unexploitable. No overfitting on evals! 🎓Domain Expert Evals 🏆Continuously Updated w/new Data and Models Read more in 🧵 scale.com/leaderboard

English
42
304
2.4K
462.7K
Simon Willison @simonw
@vijayabhaskarj Even 100,000 tokens (or a million tokens) isn't enough for most of the things I'd want to use RAG for - I tried my blog's full archive with Gemini Pro 1.5 and it didn't fit into their million token window
English
2
0
9
1.4K
Vijayabhaskar J @vijayabhaskarj
Just wondering, If Groq serves bigger context models with the same speed, do we need RAGs anymore? Just load the entire dataset in parallel as context, and we can get a perfect answering machine. It would be costly, but it should easily be the most accurate one out there.
Simon Willison @simonw

I pulled together notes on all of the LLM plugins that have worked for me for Llama 3 - both for hosting locally (I've run 8B and 70B on my 64GB M2) and access via APIs (Groq is SO FAST for that) Options for accessing Llama 3 from the terminal using LLM simonwillison.net/2024/Apr/22/ll…

English
1
0
2
2.5K
Eli Collins @elicollins
The Gemini era continues with our introduction of Gemma, a new family of lightweight state-of-the-art open models. We’re releasing two models, built from the same research & tech as Gemini - yet they can run on a laptop. Looking forward to seeing what people create!
Google DeepMind @GoogleDeepMind

Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1

English
2
0
19
2.8K
Patrick Hunt retweeted
Amr Awadallah 🤖 @awadallah
(plz retweet) Today we launched my new startup Vectara & announced the general availability of our easy-to-use API-first #NeuralSearch as a service. It comes with 15,000 queries/month for free. I invite you to learn more and signup for the free offering at vectara.com/meet-vectara-p…
Palo Alto, CA 🇺🇸 English
20
92
609
0
Patrick Hunt retweeted
Wes Kao 🏛 @wes_kao
I've got a secret to tell you... Your boss is tired of being your manager. They want you to manage them. Managing up: How to get what you want & give your boss what they need 🧵
English
476
2.6K
14.2K
0
Patrick Hunt retweeted
Henry Robinson @HenryR
I recently changed role at Slack, to focus on the HHVM runtime that executes so much of our product code. This is the first time I’ve worked primarily on a compiler and VM, so had to get myself up to speed. Here are some fantastic resources that have helped me do so *quickly*.
English
5
25
188
0
Patrick Hunt retweeted
just k @SBinLondon
just k tweet media
ZXX
46
1.2K
7.8K
0
Patrick Hunt retweeted
Mark Grover @mark_grover
Excited to announce Stemma's launch today. It's been 3 years since Lyft's data catalog - Amundsen was born. It's enabled Lyft and many others like Instacart, Square & Brex to be more effective at data-driven decision making. We are excited to bring the same to the broader market.
English
2
7
30
0
Patrick Hunt retweeted
Tristan Zajonc @tristanzajonc
We're live! Introducing Continual – the missing AI layer for the modern data stack. Get continually improving predictions directly in your data warehouse without complex engineering. Request access starting today. continual.ai/post/introduci…
English
2
4
47
0
Patrick Hunt retweeted
@al3x@hachyderm.io @al3xandru
Please fill in the dots: "It's 2021. I am starting a new web application in Java. I'll use ...." Please retweet and help me get a wide range of answers. Thank you
English
27
14
13
0