Bai Li

152 posts

@libai_94

ML Engineer & PhD in NLP

Vancouver, Canada · Joined August 2009
184 Following · 157 Followers

Bai Li @libai_94
Voice Writer is live on Product Hunt 👍🔼 Check it out and show your support by giving an upvote if you find it useful! producthunt.com/posts/voice-wr…

Bai Li @libai_94
@joshalbrecht Great work, team, and kudos on the very fast progress! 👏

Isabel Papadimitriou @isabelpapad
Really really excited to be joining @UBCLinguistics! I'm so happy to get to work with the lovely people in the department. I'll be going to @KempnerInst in the interim, again lucky to work with lovely, interdisciplinary people. I'd love to hang if you're in Boston or Vancouver!
UBC Linguistics @UBCLinguistics

We are thrilled that Isabel Papadimitriou (@isabelpapad) will be joining @UBCLinguistics as an Assistant Professor as of Sept 2025!

Bai Li @libai_94
Now, with voice technology and AI for grammar correction, I find I can produce 3-4x more content, between 2,000 and 3,000 words, without worrying about minor language issues! 🚀 This has allowed me to express my ideas more fully and creatively.

Bai Li @libai_94
I've been using Voice Writer for my blog posts, and I am much more productive. 🌟 Before, my posts averaged around 750 words.

Bai Li @libai_94
I built a voice writer tool to help you write things quickly. ⚡️ It uses AI for speech recognition and grammar correction. I have been using it for my book reviews, emails, Slack messages, and more. Here is a demo video. 😊 Try it out here: efficientnlp.com/voice-writer

Bai Li @libai_94
In this video, I cover the top 10 most cited papers in the history of natural language processing, ranked by number of Google Scholar citations. 📚 We cover milestones like the Transformer model, RNN, word vectors, and even go back to the roots with WordNet!

Bai Li @libai_94
Challenges we faced:
- Teochew is related to Mandarin, a high-resource language, but how do we apply transfer learning?
- With zero resources for training, we had to build our dataset from scratch. 🛠️
- Teochew doesn't even have a writing system! How do we model that? 🤔

Bai Li @libai_94
🎥 New Video! In this video, we train a speech recognition model (using OpenAI's Whisper) to recognize our family's Chinese dialect, Teochew, or Chaozhou dialect (潮州话). It has about 10 million speakers and is a part of the Min Nan language family. youtube.com/watch?v=JH_78K…

Bai Li @libai_94
More seriously - we'll use the RAG pattern, indexing HuggingFace metadata, integrating OpenAI embeddings with pgvector and chat models. I'll also explain some tips on how to rerank the chatbot's suggestions, deploy the project efficiently, and more.

Bai Li @libai_94
📹 New Video! Ever had trouble deciding which AI to use for your projects? Let's solve that with AI. In this video, I will build an AI to find the best AI for you 🤯 youtu.be/2r-SqtxhgmY

Bai Li @libai_94
@osanseviero I've made a video about the KV cache, and I've also got videos on other LLM topics like RoPE embeddings, speculative sampling, quantization, etc. If you like learning through colorful animated videos, check them out! youtube.com/watch?v=80bIUg…

Omar Sanseviero @osanseviero
Which are the best resources to learn about LLM topics such as KV cache for attention, parallelism techniques, MQA and GQA, GPTQ, AWQ, RoPE, and so on?

Bai Li @libai_94
It's a new technique called speculative sampling. A smaller LLM generates the easier tokens and a larger LLM checks them. And using a rejection sampling trick, there is no difference in accuracy! Check out my video on how this works ➡️youtube.com/watch?v=S-8yr_…
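
The rejection sampling trick mentioned above can be illustrated with a minimal, single-token sketch. This is not the code from the video or the paper: the function names are hypothetical, and `p` and `q` are toy probability lists standing in for the large (target) and small (draft) model outputs. The draft proposes a token from `q`, it is accepted with probability `min(1, p/q)`, and on rejection a token is redrawn from the normalized positive part of `p - q`. Under these assumptions the result is distributed exactly as `p`.

```python
import random

def residual_distribution(p, q):
    """Normalized max(0, p - q), used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        return list(p)  # p == q, so a rejection never actually happens
    return [d / total for d in diff]

def speculative_sample(p, q, rng=random):
    """Draw one token: propose from draft q, accept or resample per target p."""
    tokens = range(len(q))
    x = rng.choices(tokens, weights=q)[0]          # draft model proposes x
    if rng.random() < min(1.0, p[x] / q[x]):       # target model checks it
        return x
    # Rejected: redraw from the part of p that q under-covers.
    return rng.choices(tokens, weights=residual_distribution(p, q))[0]

def output_distribution(p, q):
    """Exact distribution of speculative_sample's result (for checking)."""
    accept = [min(pi, qi) for pi, qi in zip(p, q)]  # q(x) * min(1, p(x)/q(x))
    reject_mass = 1.0 - sum(accept)
    res = residual_distribution(p, q)
    return [a + reject_mass * r for a, r in zip(accept, res)]

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution (assumed values)
    q = [0.2, 0.6, 0.2]  # toy draft distribution (assumed values)
    print(output_distribution(p, q))
```

Comparing `output_distribution(p, q)` against `p` is a quick way to see why the larger model's accuracy is preserved even though the smaller model proposes the tokens.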

Bai Li @libai_94
📹 New Video! #LLMs can be slow, so this @GoogleDeepMind paper proposes speeding them up by running two LLMs at the same time. 😕 Wait what?

Bai Li @libai_94
Just published a comprehensive video highlighting EVERY area of Natural Language Processing research, in 24 categories. From Phonology to Translation to Summarization to LLMs, explore all of NLP in 30 minutes!

Bai Li @libai_94
@sherwinwu ChatGPT clearly states that sensitive information should not be fed to it. My question to you is: if OpenAI does not train on user data, then why is this warning in place?

Sherwin Wu @sherwinwu
I chatted with some customers of our API today and was surprised that they didn’t know the answer to this. So here’s an experiment. Does OpenAI train on the data that you send in through the API?