Ivan Chan

80 posts

@ivanchanavinah

CTO/Cofounder at @RunLocalAI (YC S24). Helping engineering teams ship better on-device AI faster and without the hassle. 🇭🇰 / 🇬🇧

Joined December 2019
524 Following, 127 Followers
Ivan Chan retweeted
Ismail Salim @IssySalim
🔥 VLMs on mobile devices with world-facing cameras key for proactive, intelligent computing. Local/on-device inference key for real-time, private experiences. Great to see an emphasis on smaller VLMs. Excited to see where @huggingface, @moondreamai, etc. take things 🚀
Miquel Farré @micuelll

Holy shit! Did we just open-source the smallest video-LM in the world? SmolVLM2 is running natively on your iPhone 🚀 huggingface.co/blog/smolvlm2

Ivan Chan retweeted
Ismail Salim @IssySalim
Snap's announcement about their on-device text-to-image model seems to have slipped under the radar… Apparently, it generates 1024x1024 images with quality that's comparable to cloud-oriented models like Stable Diffusion XL. But it can do that locally on an iPhone 16 Pro Max in <1.5 seconds! 🤯 Snap are planning to ship it soon to their ~450m daily active users, and I wouldn't be surprised if it's free. I wonder how all these subscription-driven, cloud-based image generation apps will respond… Announcement: newsroom.snap.com/ai-text-to-ima… Paper: arxiv.org/abs/2412.09619
Ivan Chan retweeted
Ismail Salim @IssySalim
Short but sweet talk about the WebNN API: youtube.com/watch?v=FoYBWz… Def worth checking out the YouTube playlist from @jason_mayes WebAI Summit last year. It's packed with great talks! Looking forward to the next summit!
Ivan Chan retweeted
Ismail Salim @IssySalim
Awesome work from @soldni and team at @allen_ai! If you're interested in shipping on-device AI language features in your app, I highly recommend checking out their demo app to get a sense of what's possible these days on an iPhone: apps.apple.com/us/app/ai2-olm…
Ai2 @allen_ai

We took our most efficient model and made an open-source iOS app📱but why? As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime. Learn more from @soldni👇

Stephen Panaro @flat
Is there a “research paper → LLM text” tool? Something that doesn’t mangle formulas, weird text layouts, images for VLMs, etc
Stephen Panaro @flat
Two ways to run ModernBERT on Apple Neural Engine: A: Slow, inaccurate, but easy. B: Fast, accurate, but a lil’ tricky. Wrote about how to get from A to B.
Ivan Chan @ivanchanavinah
will it ever be feasible to run a 10M+ token prompt using a local LLM, even if the LLM supports it?
Andrew Wilkinson @awilkinson
Has anyone made a simple web app where you can upload huge amounts of PDF docs/text files/images and have it convert them into a lightweight text/CSV format to maximize LLM context window?
Ivan Fioravanti ᯅ @ivanfioravanti
Incredible difference in accuracy 4bit vs 8bit with 8B models like Dolphin3.0-Llama3.1-8B 🤯 18.4% of difference! 🤯 Tested with MLX on French wines reviews: - 4bit accuracy: 40.60% - 8bit accuracy: 59.20% 4bit is pretty bad on 8B models, no?
Ivan Chan @ivanchanavinah
@omarsar0 I think NotebookLM does CAG, and it’s far better than any RAG solution I’ve ever used
elvis @omarsar0
Don't do RAG
Proposes cache-augmented generation (CAG) to eliminate retrieval latency and minimize retrieval errors.
What is CAG? CAG aims to leverage the capabilities of long-context LLMs by preloading the LLM with all relevant docs in advance and precomputing the key-value (KV) cache. The preloaded context helps the model to provide contextually accurate answers without the need for additional retrieval during runtime.
When to apply CAG? It's a useful alternative to RAG for cases where the documents/knowledge for retrieval are of limited, manageable size.
My thoughts: As LLMs advance in capabilities, I suspect that what we know as RAG today could change significantly either architecturally or how it's optimized. CAG is one in a growing list of developments and new ideas that have emerged recently to address limitations like poor retrieval relevancy and latency. There could also be hybrid methods that combine preloading with selective retrieval.
Don't sleep on long-context LLMs. They are here to stay.
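The preload-then-reuse idea described in the post can be sketched roughly as below. This is a toy illustration under stated assumptions, not the paper's implementation and not any real library's API: `ToyModel`, `encode_context`, `answer`, and `CagSession` are hypothetical names. `encode_context` stands in for the expensive forward pass that would build the KV cache, and `answer` stands in for decoding against that cache. The only point shown is that the documents are processed once up front and no per-query retrieval or re-encoding takes place.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for a long-context LLM (not a real API)."""
    encode_calls: int = 0  # how many times the expensive context pass ran

    def encode_context(self, docs: list[str]) -> list[str]:
        # Stand-in for the forward pass that would precompute the KV cache.
        self.encode_calls += 1
        return [word for doc in docs for word in doc.lower().split()]

    def answer(self, cache: list[str], query: str) -> int:
        # Stand-in for decoding against the cached context: count how many
        # query words also appear in the preloaded documents.
        context = set(cache)
        return sum(1 for word in query.lower().split() if word in context)


@dataclass
class CagSession:
    """Hypothetical wrapper: preload all documents once, then answer queries."""
    model: ToyModel
    docs: list[str]
    cache: list[str] = field(init=False)

    def __post_init__(self) -> None:
        # Preload step: every document is encoded before any query arrives.
        self.cache = self.model.encode_context(self.docs)

    def ask(self, query: str) -> int:
        # No retrieval and no re-encoding: only the query is new work.
        return self.model.answer(self.cache, query)


model = ToyModel()
session = CagSession(model, ["Quantization shrinks weights", "Pruning removes weights"])
first = session.ask("pruning removes weights")
second = session.ask("quantization")
```

In a real system the cache would be the model's attention keys and values rather than a word list, and its size would grow with the preloaded context, which is why the post limits CAG to document sets of manageable size.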
Ivan Fioravanti ᯅ @ivanfioravanti
@cognitivecompai Yes and it was slightly lower than 8B that sounded strange to me. I’ll retry tomorrow with more reviews. It’s local, virtually 0 costs 😎
Ivan Chan @ivanchanavinah
@flat Sorry I meant *best* inference time (not highest)
Ivan Chan @ivanchanavinah
Let me get back to you once I’m with my computer :) but I think you’re right, because we do compression “sweeps” all the time. And usually, the pruning configs with the lowest accuracy have the highest inference time. So yes, it sounds like higher ratios lead to faster inference times
Stephen Panaro @flat
Lengthening Llama 3.2 1B’s context slows it down. What about speeding it up? Swept ~all quantization options on Apple Neural Engine.
Takeaways:
→ Under 4 bit not worth it on M1
→ Some bottleneck >6 bit on iPhone 15 Pro
Confused by the second one, but measured twice.
Ivan Chan retweeted
Sam @SamuelCrombie
Hello, world!
Ivan Chan retweeted
Julia Turc @juliarturc
There’s so much OSS out there, and I’m like a kid in the candy store. I want to try it all, but there are only so many hours in the day. Sometimes I just want to ask some high-level questions before deciding whether to invest in a repo. We made Code Sage to scratch that itch: fresh knowledge about OSS repos, with Perplexity-like references. DM me if you want a particular repo supported!
Ivan Chan retweeted
David Lieb @dflieb
I already take @Waymo for granted, but I just found this photo from 2012 of the first time I saw one on the road in Mountain View. 12 years of persistence and hard work.