Jonathan M

44 posts

@DataWithJon

CEO @ StealthAI | Ex-Founder bit(.)io (acquired by @databricks) | AI, Data, Security Innovator | Exploring AI infra by day, managing @virtubuzz.

Joined October 2022
27 Following · 42 Followers
Jonathan M @DataWithJon
🤔 How many examples does an LLM need to master competition-level math? Conventional wisdom says 100,000+ examples. Our finding? Just 817 carefully selected ones 🤩 With pure SFT, LIMO achieves: 📌 57.1% on AIME 📌 94.8% on MATH LIMO: Less is More for Reasoning 📝
2
0
1
756
Jonathan M retweeted
Tsarathustra @tsarnick
Anthropic CEO Dario Amodei says AI safety evaluations conducted on DeepSeek showed that it was the worst-performing model they had ever tested at generating potentially dangerous information
465
135
1.1K
1.1M
Jonathan M @DataWithJon
PDF parsing at scale is essentially a solved problem now. With Gemini 2 Flash offering $0.40 per million tokens and a 1M token context, you can parse 6,000-page PDFs with near-perfect accuracy for just $1.
0
0
0
368
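The per-page figure implied by the PDF parsing post above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation that takes the post's numbers at face value ($0.40 per million tokens, $1 for 6,000 pages). The function names are hypothetical, and it assumes a single flat token price, which may not match how the API actually bills input and output tokens.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
PRICE_PER_MILLION_TOKENS_USD = 0.40
BUDGET_USD = 1.00
PAGES = 6_000


def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """Number of tokens a budget buys at a flat per-million-token price."""
    return budget_usd / price_per_million_usd * 1_000_000


def tokens_per_page(budget_usd: float, price_per_million_usd: float, pages: int) -> float:
    """Average tokens available per page if the whole budget is spread evenly."""
    return tokens_for_budget(budget_usd, price_per_million_usd) / pages


token_budget = tokens_for_budget(BUDGET_USD, PRICE_PER_MILLION_TOKENS_USD)
per_page = tokens_per_page(BUDGET_USD, PRICE_PER_MILLION_TOKENS_USD, PAGES)

print(f"Tokens bought with ${BUDGET_USD:.2f}: {token_budget:,.0f}")
print(f"Implied tokens per page: {per_page:.0f}")
```

Under these assumptions, $1 buys 2.5 million tokens, so the claim works out to roughly 417 billed tokens per page.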
Jonathan M retweeted
1LittleCoder💻 @1littlecoder
The entire Deepseek story in 1 pic! How R1, R1 Zero and Distilled models are created!
2
75
431
27.9K
Jonathan M @DataWithJon
Every engineer is doing vibe coding. But that’s just the beginning. Soon, all knowledge work will be based on vibes: have an idea, add some rough context, and transform it into an essay, email, or PowerPoint with a deep researcher agent. The top performers will be those with the best vibes and the fastest lowercase typing skills.
0
0
0
236
Jonathan M @DataWithJon
Could this handle more structured data, like finance or CRM datasets, in addition to contracts? Seems like the architecture could generalize well to those use cases too.
Jerry Liu @jerryjliu0

We built a knowledge agent that can do automated contract review against any knowledge base in minutes 📜🔎 Think: matching contracts against your company policies, compliance rules, negotiation playbooks, past agreements - it takes a legal/ops team a few hours to do this manually. I’ve heard this use case in 2-3 calls already this week. You can do this by interleaving @llama_index agentic workflows with the right architecture for parsing, indexing, and retrieving your data (LlamaCloud). Notebook: github.com/run-llama/llam… If you’re interested in this use case come talk to us! llamaindex.ai/contact

0
0
0
249
Jonathan M retweeted
Omar Khattab @lateinteraction
ColBERT got ridiculously fast in 2022 with PLAID. I thought that was as fast as it could get. But Luca Scheerer taught us that you can make it 3x faster: a single CPU core can encode the query *and* search hundreds of millions of tokens in 100ms. WARP—worth a thread tmrw?
Sumit @_reachsumit

WARP: An Efficient Engine for Multi-Vector Retrieval Introduces an efficient engine that significantly reduces query latency for multi-vector retrieval systems through implicit decompression and dynamic similarity imputation. 📝arxiv.org/abs/2501.17788 👨🏽‍💻github.com/jlscheerer/xtr…

10
30
244
32.2K
Jonathan M retweeted
Liana @lianapatel_
We've been building LOTUS at Stanford and Berkeley to make LLM-powered data processing fast, easy and declarative. LOTUS is an open-source query engine that makes programming as easy as writing Pandas and optimizes your programs for up to 400x speedups. To celebrate the holidays, we're excited to share our release of LOTUS 1.0.0 with a batch of new updates that make reasoning over your data faster, easier and better than ever! Code: github.com/guestrin-lab/l… 🧵👇
17
183
1.3K
159.6K
Jonathan M retweeted
Unity Catalog @unitycatalog_io
🚀 Excited to announce the release of Unity Catalog 0.2.1! 🎉 This update brings: 🐍 New unitycatalog-client Python library 📋 MODIFY permissions for tables 📈 Enhanced model APIs/UI ❄️ Iceberg Catalog APIs updates Read the release notes 👉 hubs.la/Q02_PzHJ0 #opensource
0
2
6
1.6K
Jonathan M retweeted
Jared Quincy Davis @jaredq_
Today's AI landscape is reminiscent of the early automotive and aviation industries. Although we have seen remarkable demonstrations and early successes, the full transformative impact and proliferation of LLM systems are bottlenecked by robustness and reliability challenges.
Building on the analogy, massive leaps were needed to progress from the Wright Brothers' initial Kitty Hawk breakthrough to the contemporary aviation industry, where over 2M humans fly daily. Notably, the gap from Kitty Hawk to what is considered the dawn of commercial aviation with Jannus was ~10 years.
In this paper, Ion Stoica, along with collaborators @matei_zaharia, @joseph_gonzalez, @Ken_Goldberg, @haozhangml, @ml_angelopoulos, @shishirpatil_, @ChenLingjiao, @infwinston, and I, surveys the landscape and lays out a vision for advancing today’s LLM systems design into a mature engineering discipline with even broader deployed impact.
This paper begins to address how we can reconcile the tensions arising from the value of these systems partially being their stochasticity and “creativity” (hallucination) and the engineering imperative to build robust, reliable 'compound AI' systems out of these noisy components.
SPECIFICATIONS: THE MISSING LINK TO MAKING THE DEVELOPMENT OF LLM SYSTEMS AN ENGINEERING DISCIPLINE
arxiv.org/pdf/2412.05299
2
28
110
15.6K
Jonathan M retweeted
Naveen Rao @NaveenGRao
Gen AI is still very much in the phase of fast price reduction/quality increase! Mosaic's Law (4x increase perf/$ yoy) in full effect. databricks.com/blog/making-ai…
0
3
10
2.8K
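The compounding behind the "4x increase perf/$ yoy" figure in the Mosaic's Law post above can be sketched as follows. This is an illustration only: it assumes the 4x rate stays constant over several years, which the post does not claim, and `relative_cost` is a hypothetical helper.

```python
def relative_cost(years: int, yearly_gain: float = 4.0) -> float:
    """Cost of a fixed level of performance after `years`, relative to today (1.0).

    Assumes performance per dollar multiplies by `yearly_gain` every year.
    """
    return 1.0 / (yearly_gain ** years)


# Show how the relative cost shrinks if the rate held for three years.
for year in range(4):
    print(f"year {year}: {relative_cost(year):.4f}x of today's cost")
```

If the rate held, the same performance would cost one quarter as much after one year and one sixty-fourth as much after three.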
Jonathan M retweeted
Databricks @databricks
Introducing Salesforce Connectors for Lakehouse Federation and LakeFlow Connect! These connectors provide seamless access to Salesforce CRM and Data Cloud data in @databricks Unity Catalog for comprehensive data lifecycle management. dbricks.co/4dLBa04
1
7
32
3.4K
Jonathan M retweeted
Databricks @databricks
We’re excited to announce that Databricks SQL Serverless is now GA on Google Cloud Platform! Optimize your data for BI use cases with: ▪️ Instant and elastic compute ▪️ Lower infrastructure costs ▪️ No management overhead ▪️ Improved performance bit.ly/3yO85lL
0
5
25
3.3K