Festus

@_enfinity

10X Builder | AI Performance Engineer | Co-founded @mybridgecard (YC S22)

Making GPUs Go Brrr · Joined July 2018

1.4K Following · 462 Followers

Pinned Tweet

Festus @_enfinity · Dec 25

Emulating @deepseek_ai's Christmas Day release, I'm in my “DeepSeek era”. Over the next 12 days I will be documenting Nanomechat, an article series that deep-dives into LLM fine‑tuning from the fundamentals. 🧵

Festus @_enfinity · 7h

youtu.be/LUGxx1XNqcM?si…

Festus @_enfinity · 1d

Had to terminate my 1xA100 node on Lambda earlier this morning, and by the time I refreshed the capacity was gone! A 1xH100 is currently available on Lambda, though.

snwy @snwy_me

where the fuck are all of the GPUs going?!? i need literally one 8xH100 node and i cannot for the life of me get one ANYWHERE

Festus @_enfinity · 1d

# Switch to the new fix branch
git checkout mig-fix
# Install the modified version
pip install -e . --no-build-isolation

Festus @_enfinity · 1d

These commands should suffice:
# Clone the repository
git clone github.com /tmp/vllm-mig-fix
# Move into the directory
cd /tmp/vllm-mig-fix
# Fetch the specific fix from the pull request
git fetch origin pull/35526/head:mig-fix

Festus @_enfinity · 1d

In a recent post I explained how MIG works. Then I actually tried it on a real benchmark: two @vllm_project instances, each on its own 1g.5gb MIG slice, with one tenant serving steady traffic and the other bursting. 🧵 Image source: @nvidia Link: docs.nvidia.com/datacenter/tes… x.com/_enfinity/stat…

Festus @_enfinity

You know how you can split a CPU into cores and have different processes use separate, dedicated cores? Well, did you know you can do something almost identical on a GPU? 🧵 Image source: @nvidia Link: docs.nvidia.com/datacenter/tes…
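
A minimal sketch of that two-tenant setup as a shell script. It assumes an A100-40GB (where 1g.5gb is a valid MIG profile), sudo access, and vLLM already installed. The model name, the MIG UUIDs, and the DRY_RUN switch are my own hypothetical placeholders and are not part of the benchmark above. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: two 1g.5gb MIG slices on one A100-40GB, one vLLM server per slice.
# Prints the commands by default; set DRY_RUN=0 to execute them.
set -euo pipefail

# Hypothetical placeholders: use your own (small) model, and the MIG UUIDs
# that `nvidia-smi -L` reports after the slices have been created.
MODEL="${MODEL:-your-org/your-small-model}"
MIG_UUID_A="${MIG_UUID_A:-MIG-<uuid-of-slice-0>}"
MIG_UUID_B="${MIG_UUID_B:-MIG-<uuid-of-slice-1>}"

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_mig() {
  run sudo nvidia-smi -i 0 -mig 1                # enable MIG mode on GPU 0 (may require a GPU reset)
  run sudo nvidia-smi mig -cgi 1g.5gb,1g.5gb -C  # two GPU instances, each with a default compute instance
  run nvidia-smi -L                              # list devices, including the new MIG UUIDs
}

serve_tenants() {
  # One vLLM server per slice; each process only sees the slice it is pinned to.
  run env CUDA_VISIBLE_DEVICES="$MIG_UUID_A" vllm serve "$MODEL" --port 8000 &  # steady tenant
  run env CUDA_VISIBLE_DEVICES="$MIG_UUID_B" vllm serve "$MODEL" --port 8001 &  # bursting tenant
  wait
}

case "${1:-all}" in
  setup) setup_mig ;;
  serve) serve_tenants ;;
  *)     setup_mig; serve_tenants ;;
esac
```

To run it for real, execute the `setup` step first and copy the two MIG UUIDs from `nvidia-smi -L`. Then start the servers with `MIG_UUID_A=<uuid-a> MIG_UUID_B=<uuid-b> DRY_RUN=0 bash mig_vllm.sh serve`, where the filename is hypothetical. Pinning each server to a MIG UUID through CUDA_VISIBLE_DEVICES is what keeps the bursting tenant off the steady tenant's slice.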

Festus @_enfinity · 1d

Think of MIG as "QoS for GPUs." If you want predictable, isolated slices, turn it on. If you want maximum raw, pooled performance, stick to the full GPU and use a smart scheduler.
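
Sticking to the full GPU means tearing the slices down again. Here is a minimal sketch, assuming sudo access and that GPU 0 is the only GPU that was put into MIG mode. The DRY_RUN switch is my own convention, and by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: remove all MIG slices and return GPU 0 to a single, pooled GPU.
# Prints the commands by default; set DRY_RUN=0 to execute them.
set -euo pipefail

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

disable_mig() {
  run sudo nvidia-smi mig -dci      # destroy all compute instances on MIG-enabled GPUs
  run sudo nvidia-smi mig -dgi      # then destroy all GPU instances
  run sudo nvidia-smi -i 0 -mig 0   # turn MIG mode off on GPU 0 (may require a GPU reset)
}

disable_mig
```

The order matters. A GPU instance cannot be destroyed while compute instances still exist inside it, so the compute instances have to go first.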

Festus @_enfinity · 1d

When should you skip it? If your workload relies on fast cross-GPU communication or needs every ounce of pooled performance to hit peak numbers, MIG is going to be your biggest bottleneck.

Festus @_enfinity · 1d