Festus

3.6K posts

@_enfinity

10X Builder | AI Performance Engineer | Co-founded @mybridgecard(YC S22)

Making GPUs Go Brrr
Joined July 2018
1.4K Following 462 Followers
Pinned Tweet
Festus@_enfinity·
Emulating @deepseek_ai's Christmas Day release, I'm in my “DeepSeek era”. Over the next 12 days I will be writing an article series called Nanomechat. Nanomechat is a deep-dive series into LLM fine‑tuning from fundamentals, 🧵
Festus@_enfinity·
# Switch to the new fix branch
git checkout mig-fix
# Install the modified version
pip install -e . --no-build-isolation
Festus@_enfinity·
These commands should suffice:
# Clone the repository
git clone github.com /tmp/vllm-mig-fix
# Move into the directory
cd /tmp/vllm-mig-fix
# Fetch the specific fix from the pull request
git fetch origin pull/35526/head:mig-fix
Festus@_enfinity·
In a recent post I explained how MIG works. I have now tried it on a real benchmark: two @vllm_project instances, each on a 1g.5gb MIG slice, with one tenant sending steady traffic and the other bursting. 🧵 Image source: @nvidia Link: docs.nvidia.com/datacenter/tes… x.com/_enfinity/stat…
Festus@_enfinity

You know how you can split a CPU into cores and have different processes use separate, dedicated cores? Well, did you know you can do something almost identical on a GPU? 🧵 Image source: @nvidia Link: docs.nvidia.com/datacenter/tes…
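
A two-slice setup like the one described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the benchmark setup from the thread: the GPU index, the model name, the ports, and the MIG UUID placeholders are all hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: GPU index, model name, ports, and MIG UUIDs are placeholders,
# not values taken from the thread.
GPU_INDEX="${GPU_INDEX:-0}"
MODEL="${MODEL:-your-model-name}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Enable MIG mode, carve out two 1g.5gb slices, then list the MIG devices.
setup_mig_slices() {
  run sudo nvidia-smi -i "$GPU_INDEX" -mig 1
  run sudo nvidia-smi mig -i "$GPU_INDEX" -cgi 1g.5gb,1g.5gb -C
  run nvidia-smi -L
}

# Launch one vLLM server pinned to a single MIG device.
# With DRY_RUN=0 this blocks, so start each tenant from its own shell.
launch_tenant() {
  local mig_uuid="$1" port="$2"
  run env CUDA_VISIBLE_DEVICES="$mig_uuid" vllm serve "$MODEL" --port "$port"
}

setup_mig_slices
launch_tenant "MIG-<uuid-of-slice-1>" 8000
launch_tenant "MIG-<uuid-of-slice-2>" 8001
```

The real MIG UUIDs come from the `nvidia-smi -L` output; each server then sees only its own slice, which is what keeps the steady and bursting tenants isolated from each other.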

Festus@_enfinity·
Think of MIG as "QoS for GPUs." If you want predictable, isolated slices, turn it on. If you want maximum raw, pooled performance, stick to the full GPU and use a smart scheduler.
Festus@_enfinity·
When should you skip it? If your workload relies on fast cross-GPU communication or needs every ounce of pooled performance to hit peak numbers, MIG is going to be your biggest bottleneck.
Festus@_enfinity·
You know how you can split a CPU into cores and have different processes use separate, dedicated cores? Well, did you know you can do something almost identical on a GPU? 🧵 Image source: @nvidia Link: docs.nvidia.com/datacenter/tes…