Bohan Hou

51 posts

@bohanhou1998

CS Ph.D. student @ CMU

Pittsburgh, PA · Joined August 2020
89 Following · 806 Followers
Bohan Hou retweeted
Tianqi Chen (@tqchenml)
📢Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interop with each other. Ship one library across PyTorch, JAX, CuPy, etc., runnable across Python, C++, and Rust: tvm.apache.org/2025/10/21/tvm…
Bohan Hou retweeted
Tim Dettmers (@Tim_Dettmers)
Happy to announce that I joined the CMU Catalyst with three of my incoming students. Our research will bring the best models to consumer GPUs with a focus on agent systems and MoEs. It is amazing to see so many talented people at Catalyst -- a very exciting ecosystem!
CMU School of Computer Science (@SCSatCMU)

Huge thank you to @NVIDIADC for gifting a brand new #NVIDIADGX B200 to CMU’s Catalyst Research Group! This AI supercomputing system will afford Catalyst the ability to run and test their work on a world-class unified AI platform.

Bohan Hou retweeted
Tianqi Chen (@tqchenml)
Really thrilled to receive #NVIDIADGX B200 from @nvidia. Looking forward to cooking with the beast. Together with an amazing team at CMU Catalyst group @BeidiChen @Tim_Dettmers @JiaZhihao @zicokolter, we are looking to innovate across the entire stack, from model to instructions.
Bohan Hou retweeted
Zhihao Jia (@JiaZhihao)
Thank you to @NVIDIA for gifting our Catalyst Research Group the latest NVIDIA DGX B200! The B200 platform will greatly accelerate our research in building next-generation ML systems.🚀 #NVIDIADGX #DGXB200 @NVIDIADC
Bohan Hou retweeted
Hongyi Jin (@HongyiJin258)
🚀Making cross-engine LLM serving programmable. Introducing LLM Microserving: a new RISC-style approach to designing LLM serving APIs at the sub-request level. Scale LLM serving with programmable cross-engine serving patterns, all in a few lines of Python. blog.mlc.ai/2025/01/07/mic…
Bohan Hou retweeted
Ruihang Lai (@ruihanglai)
Announcing MLCEngine, a universal LLM deployment engine with ML Compilation. We rebuilt the engine with state-of-the-art serving optimizations and maximum local env portability. Fully OpenAI compatible for both cloud and local use cases. Check out the blog blog.mlc.ai/2024/06/07/uni…
Bohan Hou retweeted
Charlie Ruan (@charlie_ruan)
Llama 3 from @AIatMeta is now up on WebLLM! Try it on webllm.mlc.ai with local inference accelerated by @WebGPU. Or start building your local agent with the web-llm package -- everything in-browser!
Bohan Hou retweeted
Tianqi Chen (@tqchenml)
#Llama3 🦙🦙 running fully locally on iPad without an internet connection. Credits to @ruihanglai and the team.
Bohan Hou retweeted
Ruihang Lai (@ruihanglai)
Deploy #Llama3 locally with native GPU acceleration on CUDA/ROCm/Vulkan/Metal with MLC LLM. Check out llm.mlc.ai/docs/ for quick start instructions.
Bohan Hou retweeted
Tianqi Chen (@tqchenml)
Please spread the word: #MLSys2024 will feature a full-day, single-track young professional symposium with invited talks, panels, round tables, and poster sessions. Submit your 1-page abstract by April 1st and present your work at our poster session. sites.google.com/view/mlsys24yps
Bohan Hou retweeted
Mishaal Rahman (@MishaalRahman)
I asked @Google's Gemma 2B LLM to write me a poem. This is being run using the MLCChat app for Android on my Samsung Galaxy S24 Ultra.
Bohan Hou retweeted
Junru Shao (@junrushao)
(1/3) 🦙🌟 Looking to run Llama2-70B? With two NV/AMD GPUs or more? 💥🔥 Machine learning compilation (MLC) now supports multi-GPU. ⚡️💻 We achieve 34 tok/sec on 2 x RTX 4090, the fastest solution at $3.2k. 🌐💡Two AMD 7900 XTX GPUs deliver 30 tok/sec at $2k. blog.mlc.ai/2023/10/19/Sca…
Bohan Hou retweeted
Junru Shao (@junrushao)
While LLMs are resource hungry and challenging to run at satisfactory speed on small devices, we show that ML compilation (MLC) techniques make it possible to generate tokens at 5 tok/sec on a $100 Orange Pi with a Mali GPU. blog.mlc.ai/2023/08/09/GPU…
Bohan Hou (@bohanhou1998)
Making @AMD @amdradeon GPUs competitive for LLM inference! 130 toks/s for Llama 2 7B and 75 toks/s for 13B with ROCm 5.6 + 7900 XTX + 4-bit quantization: 80% of the performance of the Nvidia RTX 4090. See how we do this in detail and try out our Python packages here: blog.mlc.ai/2023/08/09/Mak…
Bohan Hou retweeted
Ruihang Lai (@ruihanglai)
Running Llama 2 directly in the web browser with @WebGPU acceleration. Try it out at webllm.mlc.ai. Build your own web app with Web LLM in 35 lines of code 👇, with the npm package at npmjs.com/package/@mlc-a…
Bohan Hou retweeted
Zihao Ye (@ye_combinator)
MLC-LLM now supports deploying Llama-2-70B-chat locally (needs an Apple Silicon Mac w/ 50GB VRAM to run).🦙💬🔥 The decoding speed can achieve ~10.0 tokens/s on an M2 Ultra! Try it out at: mlc.ai/mlc-llm/docs/g… and join our discord server: discord.gg/9Xpy2HGBuD
Junru Shao (@junrushao)

(1/2) 🦙 Buckle up and ready for a wild llama ride with 70B Llama-2 on a single MacBook 💻 🤯 Now 70B Llama-2 can be run smoothly on an 64G M2 max with 4bit quantization. 👉 Here is a step-by-step guide: mlc.ai/mlc-llm/docs/g… 🚀 How about the performance? It's
