RunAnywhere (YC W26)

209 posts

RunAnywhere (YC W26) banner
RunAnywhere (YC W26)

RunAnywhere (YC W26)

@RunAnywhereAI

RunAnywhere: The default way of running on-device AI at scale. Backed by @ycombinator @yoheinakajima

Beigetreten Temmuz 2025
7 Folgt1.1K Follower
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
At @RunAnywhereAI we just extended MetalRT with S2S support: beating @Apple at their own game once again and delivering the FASTEST Speech-to-Speech engine on Apple Silicon right now, the ONLY truly multimodal inference provider on the market. - 1.68s best latency - 1.52x faster than mlx-audio - 123 tok/s generate We crushed mlx-audio across short, medium, and long audio clips on @liquidai LFM2.5-Audio-1.5B 8-bit quantized on a single M4 Max. Multimodal inference just hit warp speed, full voice-video-text fusion coming soon. #ycombinator #runanywhere #ondeviceai #applesilicon #metalrt #S2S
Sanchit monga tweet media
Sanchit monga@sanchitmonga22

At @RunAnywhereAI we just extended MetalRT with 👀 support: beating @Apple at their own game once AGAIN and delivering the FASTEST VLM decode engine on the market for Apple Silicon right now. - 279 tok/s vision decode - 1.22× faster than mlx-vlm We crushed mlx-vlm and llama.cpp across every configuration tested on Qwen3-VL-2B-Instruct 4-bit quantized across multiple image resolutions on a single M4 Max. Vision decode just hit warp speed! Video analysis coming soon :) #ycombinator #runanywhere #metalrt #applesilicon #vlm #ondeviceai

English
3
3
16
967
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
At @RunAnywhereAI we just made VLM analysis warp-speed easy with MetalRT in RCLI. Grabbed a live $NVDA chart from the web, screenshot, and boom: Qwen3-VL-2B crushes the breakdown on my M4 Max in seconds. Trend spotting, levels, buy signals, all on-device. Vision decode at 279 tok/s changes everything. #ycombinator #runanywhere #ondeviceai #applesilicon #vlm #metalrt
Sanchit monga@sanchitmonga22

At @RunAnywhereAI we just extended MetalRT with 👀 support: beating @Apple at their own game once AGAIN and delivering the FASTEST VLM decode engine on the market for Apple Silicon right now. - 279 tok/s vision decode - 1.22× faster than mlx-vlm We crushed mlx-vlm and llama.cpp across every configuration tested on Qwen3-VL-2B-Instruct 4-bit quantized across multiple image resolutions on a single M4 Max. Vision decode just hit warp speed! Video analysis coming soon :) #ycombinator #runanywhere #metalrt #applesilicon #vlm #ondeviceai

English
2
3
11
926
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
At @RunAnywhereAI we just extended MetalRT with 👀 support: beating @Apple at their own game once AGAIN and delivering the FASTEST VLM decode engine on the market for Apple Silicon right now. - 279 tok/s vision decode - 1.22× faster than mlx-vlm We crushed mlx-vlm and llama.cpp across every configuration tested on Qwen3-VL-2B-Instruct 4-bit quantized across multiple image resolutions on a single M4 Max. Vision decode just hit warp speed! Video analysis coming soon :) #ycombinator #runanywhere #metalrt #applesilicon #vlm #ondeviceai
Sanchit monga tweet media
Sanchit monga@sanchitmonga22

In just 48 hours at @RunAnywhereAI we built MetalRT: beating @Apple at their own game and delivering the FASTEST LLM inference engine on the market for Apple Silicon right now. - 570 tok/s decode @liquidai LFM 2.5-1.2B 4-bit - 658 tok/s decode @Alibaba_Qwen Qwen3-0.6B, 4-bit - 6.6 ms time-to-first-token - 1.19× faster than Apple's own MLX (identical model files) - 1.67× faster than llama.cpp on average We crushed Apple MLX, llama.cpp, uzu(by TryMirai), and Ollama across four different 4-bit models, including the on-device optimized LFM2.5-1.2B on a single M4 Max. Excited for this one! #ycombinator #runanywhere #ondeviceai #applesilicon #mlx

English
5
8
36
5.8K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
VLM support just added to RCLI. Your local Mac AI can now see images. Drag and drop any image or press V and it analyzes instantly. All on-device with llama.cpp right now. In the demo video I pointed the camera at my phone showing a photo of @ShubhamMal72313 and me at @ycombinator. Qwen3-VL-2B gave a very accurate description. We're bringing vision support back to MetalRT which will once again be the FASTEST visual language model on Apple Silicon :) Faster than human eyes can perceive and react. PR in comments. All local. All open source. All free. #ycombinator #runanywhere #metalrt #vlm
Sanchit monga tweet media
Sanchit monga@sanchitmonga22

just shipped Personalities on RCLI: your local voice AI can now be professional, sarcastic, cynical, or a full-blown nerd who references Star Wars in every answer siri could never have a personality crisis running entirely on your GPU all of it still powered by our fastest inference engine: MetalRT by @RunAnywhereAI , check it out! #runanywhere #ondeviceai #sirikiller #metalRT

English
1
2
3
571
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
just shipped Personalities on RCLI: your local voice AI can now be professional, sarcastic, cynical, or a full-blown nerd who references Star Wars in every answer siri could never have a personality crisis running entirely on your GPU all of it still powered by our fastest inference engine: MetalRT by @RunAnywhereAI , check it out! #runanywhere #ondeviceai #sirikiller #metalRT
Sanchit monga@sanchitmonga22

Launching RCLI + MetalRT: the FASTEST on-device voice AI for macOS, crushing Apple MLX, MLX Whisper, MLX Audio, llama.cpp, sherpa-onnx, Ollama, uzu & every inference engine out there, built by @RunAnywhereAI - 658 tok/s LLM decode (1.19x > MLX) - 714x real-time STT (4.6x > MLX Whisper) - 8.8x RTF TTS (2.8x > MLX Audio) - 63ms voice-to-audio latency Hybrid RAG: <4ms retrieval (HNSW+BM25+RRF), sub-200ms w/ embedding cache 36 macOS actions: open apps, web search, control Spotify - all local, offline, open-source - accepting PRs for more custom actions. No API keys. Stay tuned, more updates incoming, about to hit WARP SPEED!! #ycombinator #runanywhere #applesilicon #metalRT

English
2
3
17
2.5K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
This has been obvious for a while: local models would ignite hardware competition and finally democratize frontier-level AI. We didn't wait to see it play out. That's the whole reason @RunAnywhereAI exists: seamless, high-performance local inference on any hardware today.
@jason@Jason

The top .1% of users are playing with local LLMS This will 10x every 12 months.... Until Apple, Dell and MSFT have one on your local device BY DEFAULT in 2028 This takes the local hardware spec race from irrelevant for 99% of users to critical. Going to be insane.

English
0
1
10
878
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
So close to fulfilling the prophecy!
Sanchit monga tweet media
English
1
2
7
1.7K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
Launching RCLI + MetalRT: the FASTEST on-device voice AI for macOS, crushing Apple MLX, MLX Whisper, MLX Audio, llama.cpp, sherpa-onnx, Ollama, uzu & every inference engine out there, built by @RunAnywhereAI - 658 tok/s LLM decode (1.19x > MLX) - 714x real-time STT (4.6x > MLX Whisper) - 8.8x RTF TTS (2.8x > MLX Audio) - 63ms voice-to-audio latency Hybrid RAG: <4ms retrieval (HNSW+BM25+RRF), sub-200ms w/ embedding cache 36 macOS actions: open apps, web search, control Spotify - all local, offline, open-source - accepting PRs for more custom actions. No API keys. Stay tuned, more updates incoming, about to hit WARP SPEED!! #ycombinator #runanywhere #applesilicon #metalRT
Sanchit monga@sanchitmonga22

We built the future of voice AI on your Mac. RCLI is here @RunAnywhereAI! Our optimized end-to-end voice + RAG pipeline: talk → instant control + doc answers, ~131ms latency, - all LOCAL - all OPEN SOURCE - all FREE. 43 actions, no cloud, your data forever private. Siri: “Let me think about that…” RCLI: 131 ms voice-to-action. Done. Next. Experience it—install & command your machine: curl -fsSL raw.githubusercontent.com/RunanywhereAI/… | bash Next level incoming: MetalRT support (fastest Apple Silicon inference 658 tok/s decode, blazing ASR and TTS). Your Mac's about to hit warp speed! #OnDevice #MetalRT #YCW26 #NoMoreWaiting

English
12
20
146
16.7K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
MetalRT just became the first complete AI inference engine for Apple Silicon: LLM + STT + TTS by @RunAnywhereAI. We already had the fastest LLM decode (658 tok/s). Now we've crushed STT and TTS too, beating MLX across the board. Today's numbers on M4 Max: - 1-hour podcast transcribed in ~5 seconds - 3-hour meeting transcribed in ~15 seconds - Live captioning with zero perceptible delay - 714x faster than real-time for STT - 4.6x faster than Apple's MLX on speech-to-text All three modalities. One unified engine. And this is just the individual components. The full voice AI pipeline we're building on top will be the FASTEST ever on Apple Silicon. Launching soon. Full benchmarks, charts, and details in the comments. #AppleSilicon #OnDeviceAI #MetalRT #STT #TTS #VoiceAI
Sanchit monga tweet media
Sanchit monga@sanchitmonga22

In just 48 hours at @RunAnywhereAI we built MetalRT: beating @Apple at their own game and delivering the FASTEST LLM inference engine on the market for Apple Silicon right now. - 570 tok/s decode @liquidai LFM 2.5-1.2B 4-bit - 658 tok/s decode @Alibaba_Qwen Qwen3-0.6B, 4-bit - 6.6 ms time-to-first-token - 1.19× faster than Apple's own MLX (identical model files) - 1.67× faster than llama.cpp on average We crushed Apple MLX, llama.cpp, uzu(by TryMirai), and Ollama across four different 4-bit models, including the on-device optimized LFM2.5-1.2B on a single M4 Max. Excited for this one! #ycombinator #runanywhere #ondeviceai #applesilicon #mlx

English
9
15
103
12K
RunAnywhere (YC W26) retweetet
Shubham Malhotra
Shubham Malhotra@ShubhamMal72313·
Since MetalRT support is coming soon by @RunAnywhereAI (@ycombinator, W'26) In the meantime, here's what's already running fully on-device in RCLI: full voice-to-action + RAG on my own documents. No cloud, no data leaving the machine. Your files stay yours. Running @liquidai LFM models + Piper TTS locally - blazing fast voice. #MetalRT #LocalRAG #YCW26 #OnDeviceAI #runanywhere
Sanchit monga@sanchitmonga22

We built the future of voice AI on your Mac. RCLI is here @RunAnywhereAI! Our optimized end-to-end voice + RAG pipeline: talk → instant control + doc answers, ~131ms latency, - all LOCAL - all OPEN SOURCE - all FREE. 43 actions, no cloud, your data forever private. Siri: “Let me think about that…” RCLI: 131 ms voice-to-action. Done. Next. Experience it—install & command your machine: curl -fsSL raw.githubusercontent.com/RunanywhereAI/… | bash Next level incoming: MetalRT support (fastest Apple Silicon inference 658 tok/s decode, blazing ASR and TTS). Your Mac's about to hit warp speed! #OnDevice #MetalRT #YCW26 #NoMoreWaiting

English
4
4
13
1.7K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
We built the future of voice AI on your Mac. RCLI is here @RunAnywhereAI! Our optimized end-to-end voice + RAG pipeline: talk → instant control + doc answers, ~131ms latency, - all LOCAL - all OPEN SOURCE - all FREE. 43 actions, no cloud, your data forever private. Siri: “Let me think about that…” RCLI: 131 ms voice-to-action. Done. Next. Experience it—install & command your machine: curl -fsSL raw.githubusercontent.com/RunanywhereAI/… | bash Next level incoming: MetalRT support (fastest Apple Silicon inference 658 tok/s decode, blazing ASR and TTS). Your Mac's about to hit warp speed! #OnDevice #MetalRT #YCW26 #NoMoreWaiting
GIF
Erick@ErickSky

Esta startup es una seria amenaza para Siri ☠️ Se llama @RunAnywhereAI y acaba de soltar RCLI: un asistente de voz 100% local que ya le gana en velocidad y privacidad. 131 ms end-to-end (voz → respuesta hablada) ⭐️Controla 43 acciones nativas de macOS (Spotify, ventanas, FaceTime, recordatorios…). ⭐️RAG instantáneo en tus PDFs y documentos. ⭐️Todo offline, sin nube, sin API keys. Eso no es todo. Lo que se viene... mamita. El founder acaba de revelar MetalRT (su nuevo motor TTS hecho con Metal y lo que ves en el vídeo) que logra 291 ms para 5 palabras y 8.4x más rápido que el tiempo real. Cuando salga esa actualización… Siri va a llorar. Por mientras, REPOOO 👇

English
7
18
127
31.2K
RunAnywhere (YC W26) retweetet
Erick
Erick@ErickSky·
Esta startup es una seria amenaza para Siri ☠️ Se llama @RunAnywhereAI y acaba de soltar RCLI: un asistente de voz 100% local que ya le gana en velocidad y privacidad. 131 ms end-to-end (voz → respuesta hablada) ⭐️Controla 43 acciones nativas de macOS (Spotify, ventanas, FaceTime, recordatorios…). ⭐️RAG instantáneo en tus PDFs y documentos. ⭐️Todo offline, sin nube, sin API keys. Eso no es todo. Lo que se viene... mamita. El founder acaba de revelar MetalRT (su nuevo motor TTS hecho con Metal y lo que ves en el vídeo) que logra 291 ms para 5 palabras y 8.4x más rápido que el tiempo real. Cuando salga esa actualización… Siri va a llorar. Por mientras, REPOOO 👇
Español
3
14
87
21.3K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
MetalRT delivers the fastest TTS inference on Apple Silicon. Key results on M3 Max: - 291 ms latency for 5 words, 8.4x RTF, 2.8x faster than mlx - Lowest latency recorded: 291 ms - Peak RTF: 8.8x on longer inputs This enables instant-feeling text-to-speech directly on device. FASTEST voice AI pipeline on Apple silicon coming soon powered by @RunAnywhereAI #AppleSilicon #TTS #MetalRT #OnDeviceAI #runanywhere
English
10
31
306
34.9K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
Couldn’t agree more. @yoheinakajima isn’t just an investor, I still remember when our @RunAnywhereAI app was super early and janky, he could see our vision. We created a small baby AGI demo with running @liquidai models on that, and that was the “baby AGI” moment for us.
Forward Future@ForwardFuture

“The first time we realized agents were possible was watching @yoheinakajima build with Claude and Replit side by side.” @Replit CEO @amasad on the origins of coding agents: “He would prompt Claude Code, paste it into Replit, run it, get an error, copy the error back to Claude, and repeat.” “And watching that process, it became obvious.” “We could just automate the whole loop.” “That’s where agents start.”

English
0
2
9
1.2K
RunAnywhere (YC W26) retweetet
Sanchit monga
Sanchit monga@sanchitmonga22·
Excited to share that @RunAnywhereAI was featured in International Business Times, Singapore: ibtimes.sg/runanywhere-in… A big reason this mission is personal to me is because growing up, I did not always have access to the best computer at home, and internet was not always available either. Sometimes Wi-Fi felt expensive, and sometimes my parents understandably did not see a reason to pay extra for it. That stayed with me. It shaped how I think about technology, access, and who gets left out when progress depends on constant connectivity and expensive hardware. Now, living in San Francisco, I see so many tools and technologies people take for granted every day. That contrast has only made this mission feel more urgent to me. We’re building RunAnywhere so AI can run directly on devices people already have, making it faster, more private, and far more accessible. Grateful to be building this with the support and community of @ycombinator behind us and ofcourse my cofounder @ShubhamMal72313 We believe the future of AI should not belong only to those with the best cloud infrastructure, but to everyone, everywhere. #ycombinator #ondeviceai #runanywhere #yc
English
1
3
11
571