Brian Dezhou Shen🇨🇳🇬🇧

1.7K posts

@dezhou

Pythonist. Researcher. Data/Computer Scientist. C1@Oxford School of English, United Kingdom. CS Master's@Tsinghua University, China.

Joined March 2009
100 Following, 62 Followers
Pinned Tweet
Brian Dezhou Shen🇨🇳🇬🇧
#ChatGPT as an Analyst
Motivation: I want to ask ChatGPT to estimate the AI market value in 2023, given the historical data. Here is my question, and ChatGPT's output is astonishing.
1
0
0
879
Sam Bhagwat
Sam Bhagwat@calcsam·
last month we wrote a new agents book: patterns for building ai agents it has everything you need to take your agents from prototype to production, like agent design patterns, the basics of security, etc reply to this tweet with BOOK and we'll dm you so you can get a copy
4.1K
450
5.1K
589K
Brian Dezhou Shen🇨🇳🇬🇧 retweeted
Zhuang Liu
Zhuang Liu@liuzhuang1234·
New paper - Transformers, but without normalization layers (1/n)
76
577
4.1K
1.3M
vLLM
vLLM@vllm_project·
👀 @vllm_project will be testing and integrating these GEMM kernels ASAP as well.
DeepSeek@deepseek_ai

🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts
🔗 GitHub: github.com/deepseek-ai/De…

8
16
251
22.7K
Benjamin De Kraker
Benjamin De Kraker@BenjaminDEKR·
I resigned from xAI tonight. It makes me very sad, but was the right thing to do -- and here's why.

xAI told me I either had to delete the post quoted below, or face being fired.

After reviewing everything and thinking a lot, I've decided that I'm not going to delete the post -- which is very clearly a harmless personal opinion.

Why did they tell me to remove this opinion? Well, according to them, the reason is that I acknowledged that Grok 3... exists. I wish I was joking. I'm not. That's the reason -- the fact that I wrote "Grok 3 (TBD)" is grounds for being fired.

But wait, hasn't Grok 3 been officially acknowledged by xAI? Yes. Yes it has. I'll post below the official xAI blog post talking about Grok 3, along with many public Elon posts and video where it is repeatedly acknowledged.

To be clear, the post they wanted me to remove is 100% just my personal opinion. I do not know where Grok 3 will stack up against other SOTA models. Hopefully it does well, I don't know. That's why it says "opinion" and "to be determined." It will probably be pretty good at some things and imperfect at others. I didn't think this was a particularly wild opinion.

Again, their official demand said that even writing "Grok 3 - TBD" is somehow "confidential information." This is absolutely absurd, since it's repeatedly been acknowledged by the company and its famous CEO.

Are they mad that my clearly-labeled opinion didn't guess that the still-unreleased Grok 3 will be higher? Maybe. Probably. Again, maybe it is at the top, I genuinely don't know. That's why it says "to be determined."

The specific feature of Grok I spent the majority of my time working on with a really hard-working team is very cool and I hope it works extremely well for everyone. I won't say what it is because that would be **actual** confidential information. (Maybe after it comes out.)

I still hope Elon and xAI win. Yet......

It's very disappointing to me that a company and leaders who supposedly champion free speech and openness would try to fire a low-level employee over a clearly-labeled opinion that contains absolutely nothing controversial, but here we are.

The entire situation has been very strange. I thought about just deleting the damn thing.... But you know, once you start caving and giving up holding mild personal opinions, the slope becomes very slippery.

I'll keep my speech and dignity and get another job, or build one.

Catch ya on the flip side.
Benjamin De Kraker@BenjaminDEKR

The ranking currently (my opinion), for code:
ChatGPT o1-pro
o1
o3-mini
(all kind of tied)
Grok 3 (expected, tbd)
Claude 3.5 Sonnet
DeepSeek
GPT-4o
Grok 2
Gemini 2.0 Pro Series (might be higher, will probably move up)

3K
1.3K
21.7K
6.3M
Brendan Dolan-Gavitt
Brendan Dolan-Gavitt@moyix·
Who is serving DeepSeek R1 right now (aside from the official API)? It’s an open model so surely lots of other providers have sprung up?
7
1
13
4.3K
Tanishq Mathew Abraham, Ph.D.
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
A new tutorial on RL by Kevin Patrick Murphy, a Research Scientist at Google DeepMind who also wrote several comprehensive, well-regarded textbooks on ML/DL. This ought to be a good read 👀
18
266
2.3K
224K
Wolfram Ravenwolf
Wolfram Ravenwolf@WolframRvnwlf·
@dezhou @Alibaba_Qwen Since others are already conducting those evaluations, I focus on testing models that run effectively on my own system (48 GB VRAM). I care more about getting real, practical results than theoretical possibilities. It's what I can evaluate myself and most importantly run myself.
2
0
4
198
Wolfram Ravenwolf
Wolfram Ravenwolf@WolframRvnwlf·
Finished my @Alibaba_Qwen QwQ-32B-Preview benchmark (MMLU-Pro, CS category) just now – remember this is a 32B model at 8-bit EXL2 quantization that's overtaking Llama 405B and 70B, Mistral 123B, and even ChatGPT/GPT-4o in these tests!
27
56
452
117.7K
Brendan Dolan-Gavitt
Brendan Dolan-Gavitt@moyix·
So we have pre-training. And now, post-training. So when exactly does the training happen???
2
0
23
1.6K
Nathan Lambert
Nathan Lambert@natolambert·
Question — who came up with the term “post-training?” Emerged in the last 12-18 months but I don’t know where from, and I need to know. 🙇
26
3
119
26.5K
Jakob Foerster
Jakob Foerster@j_foerst·
My group at Oxford (@FLAIR_Ox) is talent rich but GPU poor (both compared to industry), so adding more GPUs would be a win for open science, but is difficult to finance from grants. Does anyone have leads for possible donors? Christmas is coming up so I guess I am allowed to dream
47
28
572
75.9K
Victor M
Victor M@victormustar·
When going offline, what LLMs are your go-to choices? ✈️
5
2
23
2.2K
Brian Dezhou Shen🇨🇳🇬🇧 retweeted
Adina Yakup
Adina Yakup@AdinaYakup·
Exciting release from @Alibaba_Qwen 🔥
Qwen 2.5-Coder is now live on @huggingface 👉huggingface.co/collections/Qw…
✨ Apache 2.0 license
✨ 0.5B, 1.5B, 3B, 7B, 14B, 32B base & instruct
✨ 128K long context support
✨ SOTA performance on coding benchmarks
1
6
20
3.6K
Rohan Paul
Rohan Paul@rohanpaul_ai·
"Attention Is All You Need" was truly a landmark paper. However, the original "vanilla" transformers are seldom used now. The key upgrade is the use of RoPE, or Rotary Positional Embeddings.

**Vanilla Decoder** - Input tokens -> Embeddings -> Embeddings + Positional Encoding -> Decoder Blocks

**RoPE Decoder** - Input tokens -> Embeddings -> Decoder Blocks

**Rotary Positional Embeddings**

RoPE is used in attention blocks, which combine information from many tokens and need to know their relative positions.

For example, consider this sentence: "It's a big thrill to climb a big mountain." "mountain" should focus more on the nearby "big."

RoPE applies a rotation matrix to queries and keys, not values. The rotation angle grows with position: if "mountain" is the 9th word, it is rotated the most, while earlier words are rotated less, aligning "mountain" more with the second "big."

This approach is efficient as it applies positional embeddings only where needed and keeps token magnitudes unchanged. RoPE scales well to longer contexts, allowing models to be pre-trained on 4k contexts and fine-tuned for up to 4M by adjusting rotation speed.
9
77
466
25.6K
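The rotation described in the RoPE post above can be sketched in a few lines of plain Python. This is a minimal illustration and not any particular model's implementation: the helper names `rope` and `dot`, the toy 4-dimensional vectors, the base of 10000, and the choice to rotate consecutive pairs of dimensions are all assumptions made for the example (some implementations pair dimensions differently). It rotates a query and a key by angles proportional to their positions, then compares attention scores for the same relative offset at two different absolute positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair (vec[i], vec[i+1]) by an angle that
    grows linearly with the token position `pos`. Hypothetical helper."""
    d = len(vec)
    if d % 2:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an unnormalized attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (illustrative values only).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Same relative offset (3 positions apart) at two different absolute positions.
score_near_start = dot(rope(q, 5), rope(k, 2))
score_far_along = dot(rope(q, 105), rope(k, 102))

# Squared length of q before and after rotation.
norm_before = dot(q, q)
norm_after = dot(rope(q, 7), rope(q, 7))
```

Under these assumptions, the two scores agree up to floating-point error, which is the sense in which the attention score depends only on relative position, and the squared length of `q` is unchanged by the rotation, matching the post's point that token magnitudes are kept.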