Wangda Tan
@leftnoteasy

120 posts

CTO & Co-Founder at https://t.co/l3GZZc31FW. ex SQL @ Snowflake, Member of Apache Software Foundation

Joined June 2009
279 Following · 212 Followers
Wangda Tan @leftnoteasy
(6/n) Looking forward: While these new reasoning models aren't fully practical yet, they show huge potential. Once they solve the fast/slow thinking problem and learn when to stop deliberating, they'll be game-changing. My prediction? The future belongs to such thinking models!
Wangda Tan @leftnoteasy
A Quick DeepSeek R1 Test for SQL Generation (with Waii). Finally got to test DeepSeek R1! Tried both versions: the 8B LLaMA model distilled from R1 (runs locally in Ollama) and the full DeepSeek R1 (671B) from Fireworks. Here's what I discovered. 🧵
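For context, here is a minimal sketch of the kind of schema-grounded prompt such a SQL-generation test sends to the model; the helper name and wording are hypothetical, not Waii's actual prompt.

```python
# Hypothetical sketch: build a schema-grounded prompt for an LLM (e.g. the
# distilled R1 8B served by Ollama) to generate SQL. Not Waii's real prompt.
def build_sql_prompt(schema_ddl: str, question: str) -> str:
    """Format the table schema and the user's question into one prompt."""
    return (
        "You are a SQL expert. Given the schema below, write one SQL query "
        "that answers the question. Return only SQL.\n\n"
        f"Schema:\n{schema_ddl}\n\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_sql_prompt(
    "CREATE TABLE sales (day DATE, store TEXT, amount REAL);",
    "What is the total sales amount per store?",
)
print(prompt)
```

The distilled model's failure mode in this thread is that, given such a schema-in-prompt input, it loops instead of emitting a query.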
Wangda Tan @leftnoteasy
(5/n) About the distilled model - don't get too excited. It can't handle complexity:
- Struggles with AMC-8 (8th-grade math competition) questions -- it went into an infinite loop of output and never gave me an answer.
- Can't generate SQL from the schema input.
Wangda Tan @leftnoteasy
(4/n) More downsides:
- Query generation takes 4-5 mins vs 10-20 secs with GPT-4o.
- Most of that time is spent on unnecessary self-debate: the solution is clear within the first 20-30% of the output (40-60 secs), but the model keeps rewriting and second-guessing the correct solution.
Wangda Tan @leftnoteasy
(3/n) Downsides? Everything comes with a cost:
- The model produces long <think> output for any task, even simple entity extraction.
- I still had to use GPT-4o for quick tasks (reranking, entity extraction) during the test, otherwise it would take forever.
Wangda Tan @leftnoteasy
(2/n) R1 is REALLY good at understanding aggregation. Models like GPT-4o and Claude 3.5 sometimes mess up complex aggregations (like the average of daily average sales) or window functions with filters. But R1 handles these consistently better than other models I've tested. Example:
[image]
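The "average of daily average" pitfall above can be made concrete with a toy table (illustrative data, not from the thread): averaging all rows directly weights days by their row counts, which is not the same as averaging the per-day averages.

```python
import sqlite3

# Toy data: two days with uneven row counts (illustrative, not from the thread).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (day TEXT, amount REAL);
INSERT INTO sales VALUES
  ('2024-01-01', 10), ('2024-01-01', 30),   -- day 1 avg = 20
  ('2024-01-02', 100);                      -- day 2 avg = 100
""")

# Naive: average over all rows -- implicitly weights days by row count.
naive = conn.execute("SELECT AVG(amount) FROM sales").fetchone()[0]

# Correct "avg of daily avg": aggregate per day first, then average the days.
correct = conn.execute("""
SELECT AVG(daily_avg) FROM (
  SELECT day, AVG(amount) AS daily_avg FROM sales GROUP BY day
)
""").fetchone()[0]

print(naive, correct)  # naive = (10+30+100)/3 ≈ 46.67, correct = (20+100)/2 = 60
```

Models that "mess up" this aggregation typically emit the naive single-level `AVG` instead of the nested per-day aggregation.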
Wangda Tan @leftnoteasy
(1/n) The self-debate is fascinating - it's like watching a real, capable but hesitant person think through a problem. Check out this snippet from one of the queries:
[image]
Wangda Tan retweeted
Zipeng Fu @zipengfu
Mobile ALOHA's hardware is very capable. We brought it home yesterday and tried more tasks! It can:
- do laundry👔👖
- self-charge⚡️
- use a vacuum
- water plants🌳
- load and unload a dishwasher
- use a coffee machine☕️
- obtain drinks from the fridge and open a beer🍺
- open doors🚪
- play with pets🐱
- throw away trash
- turn on/off a lamp💡
Project website: mobile-aloha.github.io
Co-lead @tonyzzhao, advised by @chelseabfinn (amazing photography from @qingqing_zhao_ )
JJ @JosephJacks_
Last week @MistralAI launched pricing for the Mixtral MoE: $2.00~ / 1M tokens. Hours later @togethercompute took the weights and dropped pricing by 70% to $0.60 / 1M. Days later @abacusai cut 50% deeper to $0.30 / 1M. Yesterday @DeepInfra went to $0.27 / 1M. Who’s next ??? 📉
Wangda Tan @leftnoteasy
Wrote a blog post about how to build an enterprise-ready text-to-SQL system. Planning to build one yourself? Check out the post! medium.com/querymind/buil…
[image]
Wangda Tan retweeted
Jim Fan @DrJimFan
AlphaCode-2 was also announced today, but seems to be buried in the news. It's a competitive coding model fine-tuned from Gemini. In the technical report, DeepMind shares a surprising amount of detail on an inference-time search, filtering, and re-ranking system. Could this be Google's Q*? 🤔 They also discuss the fine-tuning procedure, which is 2 rounds of GOLD (an offline RL algorithm for LLMs from 2020), and the training dataset. AlphaCode-2 scores at the 87th percentile among human competitors. Don't miss it: storage.googleapis.com/deepmind-media…
[images]
Wangda Tan @leftnoteasy
Excited to be featured on @llama_index's blog! 🚀 We're blending text-to-SQL with PDF data for powerful enterprise solutions. Check it out! 👉
LlamaIndex 🦙 @llama_index
Building advanced text-to-SQL is hard. Building advanced QA over both structured and unstructured docs is even harder. We're excited to feature a blog by @leftnoteasy (Waii.ai) - build an agent that can query enterprise-grade DBs along with PDF data, with @llama_index + Waii.ai

The enterprise text-to-SQL system consists of the following:
✅ Knowledge graph modeling of metadata/query history to help table/schema selection
✅ Semantic rules: guide producing the right queries
✅ Automatic error correction through the query compiler

We use this over a SQL database of retail data, and combine it with a @llama_index RAG pipeline over a Deloitte PDF report. This allows our agent to compare structured/unstructured data ⚖️ - e.g. the top items sold during the holidays.

Check out the full blog! blog.llamaindex.ai/llamaindex-wai…
Notebook: colab.research.google.com/drive/1hL_Ztb1…
Sign up with Waii here: waii.ai/#request-demo
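The "automatic error correction through the query compiler" idea can be sketched as a compile-and-retry loop. This is a minimal illustration under assumed names, not Waii's implementation: `compile_check` validates a candidate query against the schema, and `regenerate` stands in for the LLM repair call.

```python
import sqlite3

# Minimal sketch of compiler-driven error correction (hypothetical names,
# not Waii's actual implementation).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

def compile_check(sql: str):
    """Return None if the query compiles against the schema, else the error."""
    try:
        conn.execute("EXPLAIN " + sql)  # parses and plans without running it
        return None
    except sqlite3.Error as e:
        return str(e)

def regenerate(sql: str, error: str) -> str:
    # Stand-in for an LLM repair call; here we just fix a known bad column.
    return sql.replace("amount", "total")

sql = "SELECT SUM(amount) FROM orders"  # model's first attempt: wrong column
for _ in range(3):
    error = compile_check(sql)
    if error is None:
        break
    sql = regenerate(sql, error)

print(sql)  # SELECT SUM(total) FROM orders
```

The key design point is that the compiler's error message (e.g. "no such column") is fed back into the regeneration step, so the model can repair its own query without human intervention.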
Wangda Tan @leftnoteasy
Fine-Tuned GPT-3.5 vs. GPT-4 for SQL Generation. The results? We found that the fine-tuned version outperformed GPT-4, achieving higher accuracy at 1/3 of the cost! We also explore how fine-tuning affects readability and SQL statement usage. Link: medium.com/querymind/fine… 🚀
Wangda Tan @leftnoteasy
@yakrobat 5/5 🔮 #GPT4 has ushered in a new era of AI capabilities for SQL generation. Stay tuned for more updates as we push the boundaries of what's possible in automating SQL generation and data analytics!
Wangda Tan @leftnoteasy
GPT-4's SQL Mastery: Has the 'Text to SQL' Problem Been Solved? @yakrobat and I collaborated on research that demonstrates GPT-4's impressive SQL generation abilities through fine-tuning and optimized techniques (read our blog post: lnkd.in/gCe3YyVC). See this 🧵 for a summary.
Wangda Tan @leftnoteasy
@yakrobat 4/5 🌐 Our goal is refining GPT-4's query generation for enterprise warehouses. Challenges include testing on real-world databases, handling wide data models, and generating complex queries. We're working on addressing these limitations. #AI #Enterprise
Wangda Tan @leftnoteasy
@yakrobat 3/5 🛠️ Techniques like constraints, query examples, samples, semantics, and human guidance help optimize SQL generation. However, complexity remains a challenge: more complex queries are less likely to succeed. #FineTuning #AI
Wangda Tan @leftnoteasy
@yakrobat 2/5 🧪 We tested GPT-4 on the Spider dataset, a SQL generation benchmark. Our evaluation focused on query result correctness. With proper prompting and tuning, GPT-4 outperforms previous methods. #Benchmarking #ML
[image]
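Evaluation by "query result correctness" (as on Spider) can be sketched as executing both the gold and the generated query and comparing the result sets. A minimal sketch under assumed setup: a toy table and an order-insensitive comparison, not the actual Spider harness.

```python
import sqlite3

# Minimal sketch of execution-based evaluation (toy data, not Spider itself).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'A', 30), (2, 'B', 25), (3, 'C', 30);
""")

def results_match(gold_sql: str, pred_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order."""
    gold = sorted(conn.execute(gold_sql).fetchall())
    pred = sorted(conn.execute(pred_sql).fetchall())
    return gold == pred

# Syntactically different queries can still count as correct if their
# results agree on the database instance.
ok = results_match(
    "SELECT name FROM singer WHERE age = 30 ORDER BY name",
    "SELECT name FROM singer WHERE age >= 30",
)
print(ok)  # True
```

Note the usual caveat with execution-based metrics: two queries that happen to agree on one database instance may still differ in general, so benchmarks mitigate this with multiple instances or value perturbation.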