Chao Wang

8K posts

@excel_wang

Associate Professor in health and social care statistics at Kingston University. PhD in econometrics.

London, England · Joined May 2014
405 Following · 1.6K Followers
Pinned Tweet
Chao Wang
Chao Wang@excel_wang·
Great that the data & code for the Bangladesh mask RCT have been released: gitlab.com/emily-crawford…. I tried running their code, and the results seem to differ only very slightly from what was reported in the paper.
English
8
64
187
0
Chao Wang
Chao Wang@excel_wang·
@EpochAIResearch The top four models likely don’t have any statistically significant difference, given the substantial overlap in their confidence intervals.
English
0
0
0
6
Epoch AI
Epoch AI@EpochAIResearch·
GPT-5.5 Pro achieves a new high score of 159 on the Epoch Capabilities Index! ECI is our statistical tool that combines multiple benchmarks into a unified scale.
English
22
90
784
145K
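The point in the reply above about overlapping confidence intervals can be checked roughly when only point estimates and 95% intervals are published. The sketch below is a minimal illustration, not Epoch AI's method: the function name `z_for_difference` and all scores and intervals are hypothetical, and it assumes the two estimates are independent and approximately normal, which may not hold for models scored on the same benchmarks.

```python
import math

def z_for_difference(est_a, ci_a, est_b, ci_b, z_crit=1.96):
    """Approximate z statistic for est_a - est_b, given two 95% CIs.

    Assumes independent, roughly normal estimates (an assumption,
    not something stated in the source).
    """
    # Recover each standard error from the CI width.
    se_a = (ci_a[1] - ci_a[0]) / (2 * z_crit)
    se_b = (ci_b[1] - ci_b[0]) / (2 * z_crit)
    # Standard error of the difference under independence.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / se_diff

# Hypothetical scores and intervals, NOT taken from the ECI results.
z = z_for_difference(159.0, (155.0, 163.0), 157.0, (153.0, 161.0))
significant = abs(z) > 1.96
print(f"z = {z:.2f}, significant at 5%: {significant}")
```

With these made-up numbers the intervals overlap heavily and the difference is well short of significance at the 5% level. Overlap alone is only a rough guide, since two intervals can overlap slightly while the difference is still significant.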
Chao Wang
Chao Wang@excel_wang·
On the other hand, @EpochAIResearch's "capability" index seems more promising. Here is the technical paper: arxiv.org/abs/2512.00193. I haven't fully read it yet, but it says it uses a method similar to an IRT (item response theory) model.
English
0
0
0
19
Chao Wang
Chao Wang@excel_wang·
@thdxr Unlike the IRT model, the weights used in calculating the AA Intelligence Index are quite arbitrary (four broad categories at 25% each; subcategories given predetermined ratios). I know which one to trust more.
English
0
0
0
283
Chao Wang
Chao Wang@excel_wang·
@uwunetes Did you even see the figure??? It is more intelligent but cheaper than DeepSeek Pro. Why would I use DeepSeek over Grok? You have a supercomputer and want to run this big model locally?
English
0
0
0
47
addison
addison@uwunetes·
xai is the most unserious US lab lmao why would u ever release this? its a closed source model worse than open source models like why would i use this over deepseek or kimi
Artificial Analysis@ArtificialAnlys

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.
The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.
Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2’s score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1. Grok 4.3 maintains an 81% IFBench score from Grok 4.20 0309 v2.
➤ Gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.
Congratulations to @xAI and @elonmusk on the impressive release!

English
58
4
238
30.8K
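The "~17% expected win rate" quoted in the Artificial Analysis post above follows from the standard Elo expected-score formula, E = 1 / (1 + 10^(-d/400)), where d is the rating difference. The sketch below reproduces that arithmetic; the function name `elo_expected_score` is illustrative, and it assumes GDPval-AA uses the conventional 400-point scale, which the post implies but does not spell out.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` points above
    (positive) or below (negative) the opponent, on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

# Figures quoted in the post: Grok 4.3 at 1500, up from 1179,
# and 276 Elo points behind GPT-5.5 (xhigh).
gain = 1500 - 1179
p_win = elo_expected_score(-276)

print(f"Elo gain: {gain}")
print(f"Expected win rate when trailing by 276: {p_win:.1%}")
```

The two expected scores for a given gap sum to 1, so the same formula gives the leading model an expected win rate of roughly 83% in this pairing.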
Chao Wang
Chao Wang@excel_wang·
GPT 5.5 now available on Microsoft 365 Copilot.
Français
0
0
0
34
Chao Wang
Chao Wang@excel_wang·
@MatthewBerman What’s the practical benefit of an “open” model (really just open-weight, since a neural network model is a black box) for most people? Running the model locally? Running the distillation process yourself?
English
0
0
1
44
Matthew Berman
Matthew Berman@MatthewBerman·
Demis says he wants to see a Western open source AI stack and that we’re losing to China. He also says Google doesn’t have enough compute to build two frontier (open and closed) models, which is why Gemma is a smaller family of models. Watch this incredible clip. Shout out @ycombinator and @garrytan for the fantastic interview.
Matthew Berman@MatthewBerman

American open source AI is in trouble. China is eating our lunch. This is a bigger problem than people realize.

English
90
146
1.4K
293.1K
spicylemonade
spicylemonade@spicey_lemonade·
Gemini 3.1 is in the top 3 of almost every main benchmark, yet no one uses it. I think vibecodebench, swe atlas, and AA agent index are well calibrated.
English
66
10
499
66.2K
Chao Wang
Chao Wang@excel_wang·
@EndWokeness Rumour has it they became more open to the idea of a king after hearing about what happened to Charles I following his clash with Parliament.
English
0
1
0
20
End Wokeness
End Wokeness@EndWokeness·
"NO KINGS" crowd greets King Charles with a standing ovation
English
6.1K
27.9K
166K
4.8M
Chao Wang
Chao Wang@excel_wang·
@SenAshleyMoody They clapped after hearing Charles promise he would not rule America.
English
0
0
0
27
Senator Ashley Moody
Senator Ashley Moody@SenAshleyMoody·
Why did I just watch every Democrat in Congress stand and clap for an actual King? 🤔
English
2.5K
2.4K
11K
392.9K
Chao Wang retweeted
Acyn
Acyn@Acyn·
Standing ovation for this line from King Charles: The U.S. Supreme Court Historical Society has calculated that Magna Carta is cited in at least 160 Supreme Court cases since 1789, not least as the foundation of the principle that executive power is subject to checks and balances.
English
395
4.7K
22.2K
1.5M
Jen Zhu
Jen Zhu@jenzhuscott·
5. Massive price disadvantages compared to 🇨🇳 competitors
6. Elon (xAI and lawsuit)
7. Microsoft stops rev sharing + indigenous efforts and platform hedging (note MSFT’s recent $5bn investment in Anthropic)
8. Disruptive startups pursuing orthogonal approaches like Ilya’s SSI
9. Compute shortfalls (if data center buildout gets delayed by input bottlenecks, regulatory hurdles, public backlash, etc.)
10. Massive burn rate + increasing competition
What did I miss? The question is not if, it’s when.
English
8
5
21
2K
Jen Zhu
Jen Zhu@jenzhuscott·
A few tough facts OpenAI is facing:
1. Anthropic leads in critical coding capabilities.
2. Anthropic’s overall strengths in enterprise
3. Gemini’s consumer growth at expense of ChatGPT
4. Threat from high quality Open Source models from China 🧵
English
21
27
130
15.9K
Chao Wang retweeted
Terence Shen
Terence Shen@Terenceshen·
Is Mark Zuckerberg the most desperate tech tycoon in the world? Learned Mandarin, jogged through Tiananmen Square, read Xi Jinping's book, even asked Xi to name his baby, got rejected. Hosted Chinese officials at Facebook, tried to re-enter China, got rejected. Built China-friendly censorship tools, tested a China-only app, got rejected. Now trying to buy a Singapore AI firm founded by some Chinese… still getting rejected.
English
54
70
368
47.6K
Chao Wang retweeted
Denise Wu
Denise Wu@denisewu·
I feel sorry for Chinese engineers who realize their intellectual property belongs to the state, not to them, after the Manus order. 🥲
English
171
36
393
23.7K