Patrick Hunt

2.8K posts

@phunt

Apache ZooKeeper committer, Cloudera Employee, Hacker, Architect, Husband, Dad

San Francisco Bay Area · Joined January 2008

180 Following · 3.6K Followers

Patrick Hunt@phunt·26 Oct

More news on the hiring front @cloudera : we are hiring over 175 people across Europe, Asia and the Americas to build an amazing data platform! linkedin.com/posts/phunt1_c…

151

Patrick Hunt@phunt·9 Oct

My AI/ML team at Cloudera is expanding - we have open Manager as well as IC positions (more coming, keep an eye) in the USA and in India. Find out more here, great opportunity to work with some awesome folks on latest technologies to enable customers: cloudera.wd5.myworkdayjobs.com/External_Caree…

159

Patrick Hunt@phunt·21 May

What character is this from Jules? Fail Octopus? Fail Cuttlefish? That thing from Alien? (and why can't I use images in X polls....?)

130

Patrick Hunt@phunt·31 Mar

Got a chance to try Aider with gemini-2.5-pro-exp-03-25 this weekend - very impressed! This is the first time I've been able to use Gemini models successfully to vibe-code a project from scratch (with some help from grok as PM). Kudos to the Google team. Need. More. Quota. :-)

333

Patrick Hunt retweeted

Chaithanya Kumar@ChaithanyaK42·11 Mar

This has to be one of the best blogs that I have read on model context protocol (MCP) by @swyx @latentspacepod

2.2K

Patrick Hunt retweeted

Amr Awadallah 🤖@awadallah·12 Oct

🎉 Big milestone alert! Our Hughes Hallucination Detection Model (HHEM) just hit 2 MILLION downloads on @huggingface! Launched last Nov, we hit 1M downloads 3 weeks ago, and now we hit 2M downloads. This recent acceleration is a clear leading indicator that the #RAG AI Assistants/Agents market is maturing and moving into more production use cases where accuracy is paramount. Thank you to our amazing community for making this possible! bit.ly/4h1f8sy #NLP #HallucinationDetection #MachineLearning #RAG

Patrick Hunt@phunt·16 Jul

@ofermend @vectara Congratulations @ofermend , @awadallah and the @vectara team! I'm excited to give Mockingbird a try - the details look amazing! vectara.com/blog/mockingbi… Please consider providing model access through @ollama - it's my main consumption route these days for testing out LLMs.

584

Ofer Mendelevitch@ofermend·16 Jul

I'm super excited to share the news about @vectara Series A, and the launch of our RAG focused Mockingbird LLM: businesswire.com/news/home/2024…

3.5K

Patrick Hunt retweeted

Vectara@vectara·16 Jul

With this funding, we're launching Mockingbird, a groundbreaking LLM for Retrieval-Augmented Generation (#RAG). Perfect for healthcare, law, and banking with unparalleled accuracy & performance. Join us as we revolutionize #AI for regulated industries!

927

Patrick Hunt@phunt·29 May

@karpathy @lmsysorg Our generation's version of Low-background Steel? en.wikipedia.org/wiki/Low-backg…

339

Andrej Karpathy@karpathy·29 May

Nice, a serious contender to @lmsysorg in evaluating LLMs has entered the chat. LLM evals are improving, but not so long ago their state was very bleak, with qualitative experience very often disagreeing with quantitative rankings. This is because good evals are very difficult to build - at Tesla I probably spent 1/3 of my time on data, 1/3 on evals, and 1/3 on everything else. They have to be comprehensive, representative, of high quality, and measure gradient signal (i.e. not too easy, not too hard), and there are a lot of details to think through and get right before your qualitative and quantitative assessments line up. My goto pointer for some of the fun subtleties is probably the Open LLM Leaderboard MMLU writeup: github.com/huggingface/bl… The other non-obvious part is that any open (non-private) test dataset inevitably leaks into training sets. This is something people strongly intuitively suspect, and also why this GSM1k made rounds recently arxiv.org/html/2405.00332 Even if LLM developers do their best, preventing test sets from seeping into training sets (and answers getting memorized) is difficult. Sure, you can do your best to filter out exact matches. You can also filter out approximate matches with n-gram overlaps or so. But how do you filter out synthetic data re-writes, or related online discussions about the data? Once we start routinely training multi-modal models, how do you filter out images/screenshots of the data? How do you prevent developers from e.g. vector embedding the test sets, and specifically targeting training to data that has high alignment (in the embedding space) with the test sets? And the last component of this is that not all LLM tasks we care about are automatically evaluateable (e.g. think summarization, etc), and at that point you want to involve humans. And when you do, how do you control for all the variables involved, e.g. how much people pay attention to the actual answer, or the length, or the style, or how refusals are treated, etc. Anyway, good evals are unintuitively difficult, highly work-intensive, but quite important, so I'm happy to see more organizations join the effort to do it well.
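
[Editor's sketch] The exact-match and n-gram-overlap filtering mentioned in the post above could look roughly like the following. This is a minimal illustration, not a method taken from the post or from any particular lab's pipeline: the function names, the whitespace tokenizer, the trigram size, and the 0.5 threshold are all assumptions chosen for the example.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(candidate: str, test_item: str, n: int = 3) -> float:
    """Fraction of the test item's n-grams that also occur in the candidate."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return 0.0
    return len(test_grams & ngrams(candidate, n)) / len(test_grams)


def decontaminate(train_docs: List[str], test_items: List[str],
                  n: int = 3, threshold: float = 0.5) -> List[str]:
    """Drop training docs that match a test item exactly or by n-gram overlap.

    The threshold is a hypothetical value; a real pipeline would tune it.
    """
    kept = []
    for doc in train_docs:
        contaminated = any(
            doc.strip() == item.strip() or overlap_ratio(doc, item, n) >= threshold
            for item in test_items
        )
        if not contaminated:
            kept.append(doc)
    return kept


# Made-up data for illustration only.
test_items = ["Natalia sold clips to 48 of her friends in April"]
train_docs = [
    "Natalia sold clips to 48 of her friends in April",             # exact copy
    "so natalia sold clips to 48 of her friends in april and more",  # near copy
    "In April, 48 friends each bought hair clips from Natalia",      # rewrite
    "The weather in Paris is mild in spring",                        # unrelated
]
clean = decontaminate(train_docs, test_items)
```

With this toy data the exact and near copies are dropped, while the reworded version survives the filter, which is the gap the post points at: surface-level overlap checks do not catch rewrites of the same test item.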

Alexandr Wang@alexandr_wang

1/ We are launching SEAL Leaderboards—private, expert evaluations of leading frontier models. Our design principles: 🔒Private + Unexploitable. No overfitting on evals! 🎓Domain Expert Evals 🏆Continuously Updated w/new Data and Models Read more in 🧵 scale.com/leaderboard

304

2.4K

462.7K

Patrick Hunt@phunt·22 Apr

@simonw @vijayabhaskarj I realize it's not a "solution" - but have you considered/tried prompt compression? eg microsoft.com/en-us/research…

Simon Willison@simonw·22 Apr

@vijayabhaskarj Even 100,000 tokens (or a million tokens) isn't enough for most of the things I'd want to use RAG for - I tried my blog's full archive with Gemini Pro 1.5 and it didn't fit into their million token window

1.4K

Vijayabhaskar J@vijayabhaskarj·22 Apr

Just wondering, If Groq serves bigger context models with the same speed, do we need RAGs anymore? Just load the entire dataset in parallel as context, and we can get a perfect answering machine. It would be costly, but it should easily be the most accurate one out there.

Simon Willison@simonw

I pulled together notes on all of the LLM plugins that have worked for me for Llama 3 - both for hosting locally (I've run 8B and 70B on my 64GB M2) and access via APIs (Groq is SO FAST for that) Options for accessing Llama 3 from the terminal using LLM simonwillison.net/2024/Apr/22/ll…

2.5K

Patrick Hunt@phunt·21 Feb

@elicollins Great to see it's already available on ollama! : ollama.com/library/gemma

Eli Collins@elicollins·21 Feb

The Gemini era continues with our introduction of Gemma, a new family of lightweight state-of-the-art open models. We’re releasing two models, built from the same research & tech as Gemini - yet they can run on a laptop. Looking forward to seeing what people create!

Google DeepMind@GoogleDeepMind

Introducing Gemma: a family of lightweight, state-of-the-art open models for developers and researchers to build with AI. 🌐 We’re also releasing tools to support innovation and collaboration - as well as to guide responsible use. Get started now. → dpmd.ai/3UJu1Y1

2.8K

Patrick Hunt@phunt·13 Oct

I'm really impressed with the results, this is stellar "meaningful search" -- congrats!

Amr Awadallah 🤖@awadallah

(plz retweet) Today we launched my new startup Vectara & announced the general availability of our easy-to-use API-first #NeuralSearch as a service. It comes with 15,000 queries/month for free. I invite you to learn more and signup for the free offering at vectara.com/meet-vectara-p…

Patrick Hunt retweeted

Amr Awadallah 🤖@awadallah·12 Oct

Palo Alto, CA 🇺🇸

609

Patrick Hunt retweeted

Wes Kao 🏛@wes_kao·23 Oct

I've got a secret to tell you... Your boss is tired of being your manager. They want you to manage them. Managing up: How to get what you want & give your boss what they need 🧵

476

2.6K

14.2K

Patrick Hunt retweeted

Henry Robinson@HenryR·3 Dec

I recently changed role at Slack, to focus on the HHVM runtime that executes so much of our product code. This is the first time I’ve worked primarily on a compiler and VM, so had to get myself up to speed. Here are some fantastic resources that have helped me do so *quickly*.

188

Patrick Hunt retweeted

just k@SBinLondon·8 Jul

1.2K

7.8K

Patrick Hunt@phunt·25 Jun

Well deserved!

fpj@fpjunqueira

10 years later, I’m very proud of what we accomplished. I’m happy and humbled to receive this award alongside my co-authors Ben and Marco. Thanks @DsnIeee.

Patrick Hunt retweeted

Mark Grover@mark_grover·2 Jun

Excited to announce Stemma's launch today. It's been 3 years since Lyft's data catalog - Amundsen was born. It's enabled Lyft and many others like Instacart, Square & Brex to be more effective at data-driven decision making. We are excited to bring the same to the broader market.

Patrick Hunt retweeted

Tristan Zajonc@tristanzajonc·2 Jun

We're live! Introducing Continual – the missing AI layer for the modern data stack. Get continually improving predictions directly in your data warehouse without complex engineering. Request access starting today. continual.ai/post/introduci…

Patrick Hunt retweeted

@[email protected]@al3xandru·1 May

Please fill in the dots: "It's 2021. I am starting a new web application in Java. I'll use ...." Please retweet and help me get a wide range of answers. Thank you
