Nikola P. Borisov

62 posts

@nikolaborisof

CEO, Co-founder @DeepInfra, ex @imoim

Los Altos, CA · Joined March 2011
93 Following · 179 Followers
Nikola P. Borisov @nikolaborisof
Had a great time digging into this with @realmtbman. The supply chain behind inference matters more than most enterprises realize. youtu.be/DS2-iheW6pI
Yohann Calpu @realmtbman

Enterprises ask "is your AI compliant?" The better question: who actually runs the inference? Nikola Borisov, co-founder of @DeepInfra ($107M Series B raise - including NVIDIA) on @palebluenexus: "You want to make sure you're not giving it to someone that will give it to someone that will give it to someone. And maybe the final inference happens in China."

Replies 0 · Reposts 1 · Likes 2 · Views 95
DeepInfra @DeepInfra
Moonshot AI's Kimi 2 is now live on DeepInfra, as always at the best price of $0.55/$2.20, full tool call and context support. Best open source non-reasoning model available according to multiple benchmarks. Running on Nvidia Blackwell🇺🇸.
Replies 13 · Reposts 10 · Likes 158 · Views 15.3K
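A rough sketch of what a price pair like $0.55/$2.20 means for a single request. The tweet does not state the units, so this assumes the first figure is USD per 1M input tokens and the second is USD per 1M output tokens. The function name and defaults are illustrative only, not part of any DeepInfra API.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.55,   # assumed: USD per 1M input tokens
    output_price_per_m: float = 2.20,  # assumed: USD per 1M output tokens
) -> float:
    """Estimate the cost of one request under per-million-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 8,000 prompt tokens and 1,000 generated tokens.
    print(f"${request_cost_usd(8_000, 1_000):.4f}")
```

Under these assumed units, output tokens cost four times as much as input tokens, so long generations dominate the bill faster than long prompts.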
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
Get Nvidia B200 GPUs for $1.99/h on demand until the end of July. Why not?
Replies 5 · Reposts 6 · Likes 29 · Views 3.1K
nixCraft 🐧 @nixcraft
Linus Torvalds & Bill Gates just met each other for the first time
Replies 383 · Reposts 1.3K · Likes 14.5K · Views 1M
Nikola P. Borisov retweeted
Supermicro @Supermicro
Introducing Supermicro DLC-2: Superior liquid cooling that reduces power, water usage, noise, and space. The new liquid-cooled 4U NVIDIA HGX B200 8-GPU system doubles cooling capacity with advanced cold plates and a 250kW CDU.
Replies 20 · Reposts 49 · Likes 379 · Views 2.1M
Nikola P. Borisov @nikolaborisof
GCP going down today was kind of crazy. Gather.town didn't work, and Google Meet was kind of broken. Someone on the team had to join audio on WA. DeepInfra was not affected, but our GPUs cooled down a bit because some of our clients were using GCP.
Replies 0 · Reposts 1 · Likes 3 · Views 220
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
Our execs came back from vacation and decided to have a pricing meeting. I guess llama3-70b is 35c now.
Replies 4 · Reposts 7 · Likes 25 · Views 3.7K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
We just launched a TURBO version of the popular MythoMax model. You can get up to 120 tokens per second. Same price of 0.13 USD per 1M tokens.
Replies 1 · Reposts 3 · Likes 9 · Views 1.3K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
Official Mixtral-8x22b-Instruct model just got released and is now on @DeepInfra. This is the best open LLM and we are hosting at the best price of $0.65 / 1M tokens. deepinfra.com/mistralai/Mixt…
Replies 0 · Reposts 6 · Likes 13 · Views 1.2K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
Also we just dropped pricing on most of our 7b, 13b and 70b models to $0.10, $0.18, $0.64 per million input tokens. We will always have the best prices. deepinfra.com/pricing
Replies 1 · Reposts 5 · Likes 9 · Views 1.2K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
We set the bar when we launched the first mixtral, and we're going to do it again! The new mixtral, with 65K context, at ... 65c / 1 million tokens! This new model is almost 3 times larger. deepinfra.com/mistralai/Mixt…
Replies 1 · Reposts 9 · Likes 21 · Views 2.6K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
You can now host your custom LLMs at DeepInfra. It's a managed LLM hosting service. Pay per GPU-hour: $2/A100, $4/H100. It's super simple. Read more here deepinfra.com/blog/custom-ll…
Replies 0 · Reposts 8 · Likes 13 · Views 1.9K
Nikola P. Borisov retweeted
Hao AI Lab @haoailab
Still optimizing throughput for LLM Serving? Think again: Goodput might be a better choice! Splitting prefill from decode to different GPUs yields
- up to 4.48x goodput
- up to 10.2x stricter latency criteria
Blog: hao-ai-lab.github.io/blogs/distserv…
Paper: arxiv.org/abs/2401.09670
Replies 4 · Reposts 52 · Likes 179 · Views 78.2K
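To make the throughput vs. goodput distinction in the tweet above concrete, here is a minimal sketch under a simplified reading: throughput counts every completed request per second, while goodput counts only those that met both latency targets. The field names, the two SLO metrics (time to first token and time per output token), and the thresholds are assumptions chosen for illustration. The linked blog and paper give the authors' actual definition.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    ttft_s: float  # time to first token, in seconds (prefill latency)
    tpot_s: float  # average time per output token, in seconds (decode latency)


def throughput_and_goodput(
    timings: list[RequestTiming],
    window_s: float,
    ttft_slo_s: float,
    tpot_slo_s: float,
) -> tuple[float, float]:
    """Return (throughput, goodput) in requests per second over one window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    completed = len(timings)
    # Only requests that satisfy BOTH latency targets count toward goodput.
    within_slo = sum(
        1 for t in timings if t.ttft_s <= ttft_slo_s and t.tpot_s <= tpot_slo_s
    )
    return completed / window_s, within_slo / window_s


if __name__ == "__main__":
    sample = [
        RequestTiming(ttft_s=0.2, tpot_s=0.03),
        RequestTiming(ttft_s=1.5, tpot_s=0.03),  # slow first token
        RequestTiming(ttft_s=0.3, tpot_s=0.20),  # slow decoding
        RequestTiming(ttft_s=0.1, tpot_s=0.04),
    ]
    print(throughput_and_goodput(sample, window_s=2.0, ttft_slo_s=0.5, tpot_slo_s=0.05))
```

A system can raise throughput by batching more aggressively while goodput falls, because more requests miss their latency targets. That gap is the reason to measure the two separately.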
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
Guided JSON response is now available in the @DeepInfra API. Read more about it here deepinfra.com/blog/json-mode. Restricting the output to JSON has almost no performance penalty and is FREE.
Replies 0 · Reposts 6 · Likes 9 · Views 1.1K
Nikola P. Borisov retweeted
DeepInfra @DeepInfra
We just shipped function calling. With great power comes great responsibility. If the LLM tells you to `shutil.rmtree("/")` maybe don't do it. deepinfra.com/blog/function-…
Replies 2 · Reposts 9 · Likes 13 · Views 2.5K
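One way to act on the warning in the function-calling tweet above is to run only tools you have explicitly registered, and never evaluate a model-supplied name directly. The sketch below assumes the model returns a tool name plus a JSON string of arguments, which is a common shape but not confirmed here for the DeepInfra API. `get_weather`, `ALLOWED_TOOLS`, and `dispatch_tool_call` are hypothetical names.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical, harmless tool used only for illustration."""
    return f"weather for {city}: unknown (stub)"


# Only functions listed here can ever be executed on the model's behalf.
ALLOWED_TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run a model-requested tool only if it is on the allowlist."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not allowed")
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    return ALLOWED_TOOLS[name](**args)


if __name__ == "__main__":
    print(dispatch_tool_call("get_weather", '{"city": "Sofia"}'))
    try:
        dispatch_tool_call("shutil.rmtree", '{"path": "/"}')
    except PermissionError as err:
        print("refused:", err)
```

The allowlist keeps the decision about what may run in your code, not in the model's output. Validating the argument values for each tool would be a natural next step.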