Alluxua

241 posts

Alluxua

@alluxio_f

Machine Learning Engineer. Love investment, love machine learning . Create the best version of myself! Be a winner of life !!

Katılım Temmuz 2016

285 Takip Edilen54 Takipçiler

Alluxua@alluxio_f·26 Şub

@JFPuget @summeryue0 🤪llama3.1 🤪

Español

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱@JFPuget·23 Şub

@summeryue0 Courageous of you to share. Thanks for that. What model were you using with openclaw? Would compaction quality depend on the model? Would some models retain instructions better?

English

4.8K

Summer Yue@summeryue0·23 Şub

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.

English

2.3K

1.7K

17.4K

10.1M

Alluxua retweetledi

the tiny corp@__tinygrad__·15 Eki

Thanks to sirhcm, tinygrad now supports all the backends in Mesa by rendering to NIR. One of the Mesa backends is NAK, and with it, we can compile to SASS. An NVIDIA free stack! github.com/tinygrad/tinyg…

English

159

11.8K

Alluxua@alluxio_f·29 Tem

@__tinygrad__ Wow, that is amazing! Can I get a bunch of GPUs from lambda and use tingrad to cluster them together?

English

657

the tiny corp@__tinygrad__·29 Tem

tinygrad now supports NVIDIA 5090/4090 without the kernel driver, similar to AMD. It's tested in our CI.

English

1.1K

83.5K

Alluxua@alluxio_f·31 Oca

@ZhidingYu He did great at Nvidia, but that doesn’t mean DeepSeek follows the same approach, without hidden agenda. ZipLab is tied to Zhejiang University, a government-funded institution. Most friends who returned to China learned the hard way—no exception. Disclaimer: Not into politics.

English

Zhiding Yu@ZhidingYu·27 Oca

Zizheng was one of our interns at NVIDIA back in summer 2023. Later, when we were considering to make him a FT offer, he chose to join DeepSeek without much hesitance. Back then, the DeepSeek multimodal team only has 3 people. I am still very much impressed by Zizheng’s decision at that time. He has been an important contributor of several important works at DeepSeek, including DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1. I am personally very happy for his decision and the great achievements. Zizheng’s case is a very typical example of what I have witnessed in recent years. Many of our best talents come from China, and these talents don’t have to succeed only in a US company. Instead, we learn a lot from them. The same Sputnik Moment has already happened in AV back in 2022, and it will continue to happen in Robotics and LLM industry as well. I love NVIDIA and want to see her as a continued major contributor to the path of AGI and general autonomy. But if we keep cooking up geo-political agendas and creating hostile opinions to Chinese researchers, we will shoot ourselves in the foot and lose even more competitiveness. We need more talent density, professionalism, learnings, creativity and stronger execution. We don’t need political narratives and clowns like Alexandr Wang.

Zizheng Pan@zizhpan

This moment is absolutely phenomenal to me.

English

225

9.4K

2.1M

Alluxua@alluxio_f·31 Oca

@tunguz Curious—why do you think DeepSeek is good? I tested it for days: one old OpenAI model, one keyword matcher. No real China knowledge, just government PR docs.

English

160

Bojan Tunguz@tunguz·31 Oca

China bad. We good.

English

171

19.1K

Alluxua@alluxio_f·31 Oca

@SamAltsMan @tunguz Well, they find a way to get INTO your system. Absolutely not distillations. There are 2 types of model into their system: one is from OpenAI, another is key words match to their internal documents.

English

Bojan Tunguz@tunguz·30 Oca

The average age of a DeepSeek employee is 16.

English

418

58.3K

Alluxua@alluxio_f·31 Oca

@tunguz Deep seek knows keywords match. That is it. Everything else is hacking into the system!

English

Alluxua@alluxio_f·30 Oca

@markchen90 @dylan522p @Mayhem4Markets If you test DeepSeek-R1 extensively, you’ll see it’s not a distilled model but a full model, strikingly similar to an older OpenAI model—without a real Chinese knowledge base. Test it for a day, and you’ll see for yourself.

English

Mark Chen@markchen90·29 Oca

@dylan522p is more often right than wrong (h/t @Mayhem4Markets)

Mark Chen@markchen90

However, I think the external response has been somewhat overblown, especially in narratives around cost. One implication of having two paradigms (pre-training and reasoning) is that we can optimize for a capability over two axes instead of one, which leads to lower costs.

English

247

159K

Alluxua@alluxio_f·30 Oca

@alexandr_wang If you test DeepSeek-R1 extensively, you’ll see its performance doesn’t match the claims. It’s not a distilled model but a full model, strikingly similar to an older OpenAI model—without a real Chinese knowledge base. Test it for a day, and you’ll see for yourself.

English

131

Alexandr Wang@alexandr_wang·29 Oca

What does DeepSeek R1 & v3 mean for LLM data? Contrary to some lazy takes I’ve seen, DeepSeek R1 was trained on a shit ton of human-generated data—in fact, the DeepSeek models are setting records for the disclosed amount of post-training data for open-source models: - 600,000 reasoning data [1] - 200,000 non-reasoning SFT data [2] - human preference (RLHF) dataset of undisclosed size [3] - human-processed synthetic data for cold-start data [4] According to Chinese AI engineers, DeepSeek actually values data annotation even more than other Chinese labs, with the CEO personally labeling data for the model [5] (This reminds me of @karpathy who used to spend a quarter of his time labeling at Tesla). The DeepSeek-v3 paper even has a dedicated acknowledgement section for Data Annotation [6]. DeepSeek-V3, which was distilled from DeepSeek-R1, was also trained on an instruction-tuning dataset of 1.5M samples. [7] These SFT datasets are even larger than other open-source models: - Qwen-2.5 was trained on 1M SFT samples [8] - the last time Meta disclosed was for Llama 2, which was trained on only 30k SFT samples and 3M RLHF samples [9] - Kimi k1.5 was trained on roughly 1M SFT, 1M multi-modal SFT, 800k samples for classic reward modeling, and another 800k CoT labeled examples for reasoning [10] It’s interesting that the size of the RLHF dataset was undisclosed, while they disclosed the size of the SFT and reasoning datasets. This could be because it is much larger than one would expect, or it reveals some interesting technical detail they don’t care to share. Human preference datasets are often much larger than SFT datasets in most models, so a reasonable estimate would be that DeepSeek’s models are probably trained on at least 3-5M samples, which is quite a large preference dataset! The main technical breakthrough of DeepSeek-R1 is that for reasoning, you can forgo SFT data in favor of reasoning data—but reasoning data is still human data of difficult problems&answers in a variety of domains. The reasoning dataset is actually quite large—600k reasoning samples is a LOT. This is in line with a broader trend we’ve seen from SFT data towards other data types like human preference/RLHF data and reasoning data. This is for technical reasons—SFT caps the performance of the model at a certain level, whereas RLHF or other methods enable the models to continue improving without bound beyond the limits of the dataset. DeepSeek R1 is a very exciting model, and it’s great to see o1 reasoning capabilities replicated in the wild. In terms of training data, however, the DeepSeek models are actually setting open-source records in terms of the amount of human data used. [1] [2] [3] [4] arxiv.org/pdf/2501.12948 [5] chinatalk.media/p/deepseek-the… [6] arxiv.org/html/2412.1943… [7] arxiv.org/html/2412.1943… [8] arxiv.org/pdf/2412.15115 [9] arxiv.org/pdf/2307.09288 [10] arxiv.org/html/2501.1259…

English

180

304

1.9K

547.3K

Alluxua@alluxio_f·29 Oca

@markchen90 @dylan522p @Mayhem4Markets They may just get the leaked openai model from some sources. Their output not consistent and tons of answers not make sense at all. They may have faked every thing to short Nvidia .

English

210

Alluxua retweetledi

Alberta Independence@albertaseparate·17 Kas

RETWEET If you give ALBERTA SEPARATION your THUMBS UP! #albertaseparation

English

515

2.9K

4.4K

Alluxua@alluxio_f·2 Eyl

@realGeorgeHotz Can you make it 12 months no interest payment?

English

Alluxua@alluxio_f·26 Tem

Music Created by AI: Evening Chill

English

Alluxua@alluxio_f·25 Tem

Please enjoy my AI music🥰🎊🎈 youtu.be/wkCrcfxhbxo?si…

YouTube

English

Alluxua retweetledi

Elon Musk@elonmusk·14 Tem

I fully endorse President Trump and hope for his rapid recovery

English

82.8K

333.2K

2.2M

224.1M

Alluxua@alluxio_f·7 Tem

@ilyasut Congrats 🎉🎊🎈

English

Alluxua retweetledi

Ilya Sutskever@ilyasut·19 Haz

I am starting a new company:

SSI Inc.@ssi

Superintelligence is within reach. Building safe superintelligence (SSI) is the most important technical problem of our time. We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence. It’s called Safe Superintelligence Inc. SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI. We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead. This way, we can scale in peace. Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures. We are an American company with offices in Palo Alto and Tel Aviv, where we have deep roots and the ability to recruit top technical talent. We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else. If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age. Now is the time. Join us. Ilya Sutskever, Daniel Gross, Daniel Levy June 19, 2024

English

1.5K

3.1K

30.4K

7.4M

Alluxua@alluxio_f·2 Tem

@__tinygrad__ @realGeorgeHotz Have interest. Where to purchase it?

English

186

the tiny corp@__tinygrad__·19 Haz

Customer tinybox #1

English

133

2.5K

205.6K

Alluxua retweetledi

Shubham Saboo@Saboo_Shubham_·28 Haz

Claude Sonnet 3.5 can now search the web and generate images with ChatLLM. With just $10 you can use Claude Sonnet 3.5, GPT-4o, & Llama 3 in a single AI playground. Plus, build custom AI agents with RAG without writing a single line of Python Code.

English

529

79.8K

Alluxua@alluxio_f·14 Nis

@__tinygrad__ Can I still buy 6 new Nvidia 4090 and get it working as you did? Or only the old Nvidia 4090 can do tricks. Thank you!

English

the tiny corp@__tinygrad__·12 Nis

Source is here. With some cleanups, it might even be upstreamable. It relies on large BAR support. There's a decent writeup of how it works in the README. github.com/tinygrad/open-…

English

279

22.6K

the tiny corp@__tinygrad__·12 Nis

We added P2P support to 4090 by modifying NVIDIA's driver. Works with tinygrad and nccl (aka torch). 14.7 GB/s AllReduce on tinybox green!

English

1.1K

192.5K

Keşfet

@JFPuget @summeryue0 @__tinygrad__ @ZhidingYu @tunguz @SamAltsMan @markchen90 @dylan522p