Alluxua

241 posts

@alluxio_f

Machine Learning Engineer. Love investing, love machine learning. Create the best version of myself! Be a winner at life!!

Joined July 2016
285 Following · 54 Followers
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
@summeryue0 Courageous of you to share. Thanks for that. What model were you using with openclaw? Would compaction quality depend on the model? Would some models retain instructions better?
3 replies · 1 retweet · 29 likes · 4.8K views
Summer Yue @summeryue0
Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.
[3 images]
2.3K replies · 1.7K retweets · 17.4K likes · 10.1M views
Alluxua retweeted
the tiny corp @__tinygrad__
Thanks to sirhcm, tinygrad now supports all the backends in Mesa by rendering to NIR. One of the Mesa backends is NAK, and with it, we can compile to SASS. An NVIDIA-free stack! github.com/tinygrad/tinyg…
[image]
5 replies · 7 retweets · 159 likes · 11.8K views
Alluxua @alluxio_f
@__tinygrad__ Wow, that is amazing! Can I get a bunch of GPUs from Lambda and use tinygrad to cluster them together?
0 replies · 0 retweets · 0 likes · 657 views
the tiny corp @__tinygrad__
tinygrad now supports NVIDIA 5090/4090 without the kernel driver, similar to AMD. It's tested in our CI.
[image]
24 replies · 62 retweets · 1.1K likes · 83.5K views
Alluxua @alluxio_f
@ZhidingYu He did great at Nvidia, but that doesn’t mean DeepSeek follows the same approach, without hidden agenda. ZipLab is tied to Zhejiang University, a government-funded institution. Most friends who returned to China learned the hard way—no exception. Disclaimer: Not into politics.
0 replies · 0 retweets · 0 likes · 48 views
Zhiding Yu @ZhidingYu
Zizheng was one of our interns at NVIDIA back in summer 2023. Later, when we were considering to make him a FT offer, he chose to join DeepSeek without much hesitance. Back then, the DeepSeek multimodal team only has 3 people. I am still very much impressed by Zizheng’s decision at that time. He has been an important contributor of several important works at DeepSeek, including DeepSeek-VL2, DeepSeek-V3, and DeepSeek-R1. I am personally very happy for his decision and the great achievements. Zizheng’s case is a very typical example of what I have witnessed in recent years. Many of our best talents come from China, and these talents don’t have to succeed only in a US company. Instead, we learn a lot from them. The same Sputnik Moment has already happened in AV back in 2022, and it will continue to happen in Robotics and LLM industry as well. I love NVIDIA and want to see her as a continued major contributor to the path of AGI and general autonomy. But if we keep cooking up geo-political agendas and creating hostile opinions to Chinese researchers, we will shoot ourselves in the foot and lose even more competitiveness. We need more talent density, professionalism, learnings, creativity and stronger execution. We don’t need political narratives and clowns like Alexandr Wang.
Zizheng Pan @zizhpan

This moment is absolutely phenomenal to me.

225 replies · 1K retweets · 9.4K likes · 2.1M views
Alluxua @alluxio_f
@tunguz Curious—why do you think DeepSeek is good? I tested it for days: one old OpenAI model, one keyword matcher. No real China knowledge, just government PR docs.
0 replies · 0 retweets · 0 likes · 160 views
Bojan Tunguz @tunguz
China bad. We good.
28 replies · 3 retweets · 171 likes · 19.1K views
Alluxua @alluxio_f
@SamAltsMan @tunguz Well, they find a way to get INTO your system. Absolutely not distillation. There are 2 types of model in their system: one is from OpenAI, the other is keyword matching against their internal documents.
0 replies · 0 retweets · 0 likes · 35 views
Bojan Tunguz @tunguz
The average age of a DeepSeek employee is 16.
54 replies · 16 retweets · 418 likes · 58.3K views
Alluxua @alluxio_f
@tunguz DeepSeek knows keyword matching. That is it. Everything else is hacking into the system!
0 replies · 0 retweets · 0 likes · 43 views
Alluxua @alluxio_f
@markchen90 @dylan522p @Mayhem4Markets If you test DeepSeek-R1 extensively, you’ll see it’s not a distilled model but a full model, strikingly similar to an older OpenAI model—without a real Chinese knowledge base. Test it for a day, and you’ll see for yourself.
0 replies · 0 retweets · 0 likes · 42 views
Alluxua @alluxio_f
@alexandr_wang If you test DeepSeek-R1 extensively, you’ll see its performance doesn’t match the claims. It’s not a distilled model but a full model, strikingly similar to an older OpenAI model—without a real Chinese knowledge base. Test it for a day, and you’ll see for yourself.
0 replies · 0 retweets · 0 likes · 131 views
Alexandr Wang @alexandr_wang
What does DeepSeek R1 & v3 mean for LLM data?
Contrary to some lazy takes I’ve seen, DeepSeek R1 was trained on a shit ton of human-generated data. In fact, the DeepSeek models are setting records for the disclosed amount of post-training data for open-source models:
- 600,000 reasoning data [1]
- 200,000 non-reasoning SFT data [2]
- human preference (RLHF) dataset of undisclosed size [3]
- human-processed synthetic data for cold-start data [4]
According to Chinese AI engineers, DeepSeek actually values data annotation even more than other Chinese labs, with the CEO personally labeling data for the model [5] (This reminds me of @karpathy who used to spend a quarter of his time labeling at Tesla). The DeepSeek-v3 paper even has a dedicated acknowledgement section for Data Annotation [6].
DeepSeek-V3, which was distilled from DeepSeek-R1, was also trained on an instruction-tuning dataset of 1.5M samples. [7]
These SFT datasets are even larger than other open-source models:
- Qwen-2.5 was trained on 1M SFT samples [8]
- the last time Meta disclosed was for Llama 2, which was trained on only 30k SFT samples and 3M RLHF samples [9]
- Kimi k1.5 was trained on roughly 1M SFT, 1M multi-modal SFT, 800k samples for classic reward modeling, and another 800k CoT labeled examples for reasoning [10]
It’s interesting that the size of the RLHF dataset was undisclosed, while they disclosed the size of the SFT and reasoning datasets. This could be because it is much larger than one would expect, or it reveals some interesting technical detail they don’t care to share. Human preference datasets are often much larger than SFT datasets in most models, so a reasonable estimate would be that DeepSeek’s models are probably trained on at least 3-5M samples, which is quite a large preference dataset!
The main technical breakthrough of DeepSeek-R1 is that for reasoning, you can forgo SFT data in favor of reasoning data, but reasoning data is still human data of difficult problems & answers in a variety of domains. The reasoning dataset is actually quite large: 600k reasoning samples is a LOT. This is in line with a broader trend we’ve seen from SFT data towards other data types like human preference/RLHF data and reasoning data. This is for technical reasons: SFT caps the performance of the model at a certain level, whereas RLHF or other methods enable the models to continue improving without bound beyond the limits of the dataset.
DeepSeek R1 is a very exciting model, and it’s great to see o1 reasoning capabilities replicated in the wild. In terms of training data, however, the DeepSeek models are actually setting open-source records in terms of the amount of human data used.
[1] [2] [3] [4] arxiv.org/pdf/2501.12948
[5] chinatalk.media/p/deepseek-the…
[6] arxiv.org/html/2412.1943…
[7] arxiv.org/html/2412.1943…
[8] arxiv.org/pdf/2412.15115
[9] arxiv.org/pdf/2307.09288
[10] arxiv.org/html/2501.1259…
[4 images]
180 replies · 304 retweets · 1.9K likes · 547.3K views
Alluxua @alluxio_f
@markchen90 @dylan522p @Mayhem4Markets They may have just gotten a leaked OpenAI model from some source. Their output is not consistent, and tons of answers do not make sense at all. They may have faked everything to short Nvidia.
0 replies · 0 retweets · 0 likes · 210 views
Alluxua @alluxio_f
Music Created by AI: Evening Chill
0 replies · 0 retweets · 0 likes · 40 views
Alluxua retweeted
Elon Musk @elonmusk
I fully endorse President Trump and hope for his rapid recovery
82.8K replies · 333.2K retweets · 2.2M likes · 224.1M views
Alluxua retweeted
Ilya Sutskever @ilyasut
I am starting a new company:
SSI Inc. @ssi

Superintelligence is within reach.
Building safe superintelligence (SSI) is the most important technical problem of our time.
We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence.
It’s called Safe Superintelligence Inc.
SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI.
We approach safety and capabilities in tandem, as technical problems to be solved through revolutionary engineering and scientific breakthroughs. We plan to advance capabilities as fast as possible while making sure our safety always remains ahead. This way, we can scale in peace.
Our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures.
We are an American company with offices in Palo Alto and Tel Aviv, where we have deep roots and the ability to recruit top technical talent.
We are assembling a lean, cracked team of the world’s best engineers and researchers dedicated to focusing on SSI and nothing else.
If that’s you, we offer an opportunity to do your life’s work and help solve the most important technical challenge of our age.
Now is the time. Join us.
Ilya Sutskever, Daniel Gross, Daniel Levy
June 19, 2024

1.5K replies · 3.1K retweets · 30.4K likes · 7.4M views
the tiny corp @__tinygrad__
Customer tinybox #1
[image]
70 replies · 133 retweets · 2.5K likes · 205.6K views
Alluxua retweeted
Shubham Saboo @Saboo_Shubham_
Claude Sonnet 3.5 can now search the web and generate images with ChatLLM. With just $10 you can use Claude Sonnet 3.5, GPT-4o, & Llama 3 in a single AI playground. Plus, build custom AI agents with RAG without writing a single line of Python Code.
37 replies · 73 retweets · 529 likes · 79.8K views
Alluxua @alluxio_f
@__tinygrad__ Can I still buy 6 new Nvidia 4090s and get them working as you did? Or can only the older Nvidia 4090s do the trick? Thank you!
0 replies · 0 retweets · 0 likes · 67 views
the tiny corp @__tinygrad__
Source is here. With some cleanups, it might even be upstreamable. It relies on large BAR support. There's a decent writeup of how it works in the README. github.com/tinygrad/open-…
3 replies · 17 retweets · 279 likes · 22.6K views
the tiny corp @__tinygrad__
We added P2P support to 4090 by modifying NVIDIA's driver. Works with tinygrad and nccl (aka torch). 14.7 GB/s AllReduce on tinybox green!
48 replies · 89 retweets · 1.1K likes · 192.5K views