Yi Z.

402 posts

@yz

Curious ML/DL hacker. Love cats, daydreaming, chatting with AI, and pondering over all things logical. Meta; ex: Stakefish, Twitter, Google, CMU, Bowdoin.

San Francisco, CA · Joined September 2009
740 Following · 2.6K Followers
Yi Z. retweeted
Artificial Analysis
Artificial Analysis@ArtificialAnlys·
Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights.

Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.

For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release.

The model is not open source and is not yet accessible via an API, but Meta has shared they expect this to come soon. Meta is also integrating Muse Spark into their first party products including their Meta AI chat product, Facebook, Instagram and Threads.

Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, Grok 4.20 and behind Gemini 3.1 Pro Preview, GPT-5.4 and Claude Opus 4.6
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5 (110M)
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved 5th highest in CritPT with a score of 11%, an eval that is focused on difficult physics research questions. This is substantially above Gemini 3 Flash (9%) and Claude Sonnet 4.6 (3%)
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%

Key model details:
➤ Modalities: Multimodal including text and vision input, text output
➤ License: Proprietary, Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing. Meta expects to provide API access soon. Meta has started integration into their first party AI offering Meta AI and inside Facebook, Instagram, and Threads
English
76
323
2.5K
497.2K
Yi Z.
Yi Z.@yz·
My 14-year-old cat passed away on my birthday, after battling lymphoma for 8 months.
English
3
0
3
3.2K
Yi Z. retweeted
Tiezhen WANG
Tiezhen WANG@Xianbao_QIAN·
Let’s be clear: DeepSeek r1 isn’t about who “races faster”—it’s about the inherent flaw in closed models. The future of AI isn’t owned by those who hide code or stockpile chips. It’s built on trust, and trust requires transparency. When models are black boxes, you surrender control over data privacy, culturally aligned ethics, and post-training customization for real world scenarios. That’s not leadership—it’s liability. China’s pretraining consolidation proves a simple truth: AI is becoming a commodity. The real value lies not in the model but in what you do with it. Why waste billions reinventing closed-source base models when the market craves applications that solve poverty, climate crises, or healthcare gaps? Labs clinging to secrecy risk irrelevance—like doubling down on fax machines as email took over. Consider the irony: Today, free models come from quant firms while “nonprofits” charge premiums for access. OpenAI’s pivot from “open” to walled gardens betrays the very ethos that birthed modern AI. Meanwhile, market economy principles prevail: restrict access, and replacements emerge. The lesson? Trust beats control. Open models engage developers. Closed models breed suspicion—and suspicion fuels replacement. How to “maintain leads”? Stop gatekeeping. Build open infrastructure the world trusts, like the internet. History doesn’t reward those clinging to scarcity. It rewards those who empower the many. The choice is yours.
Alexandr Wang@alexandr_wang

DeepSeek is a wake up call for America, but it doesn’t change the strategy:
- USA must out-innovate & race faster, as we have done in the entire history of AI
- Tighten export controls on chips so that we can maintain future leads
Every major breakthrough in AI has been American

English
73
444
2K
208.9K
Yi Z.
Yi Z.@yz·
@wangtian Our team wanted to write some cool code, so we decided to get together in the arctic to do it.
English
2
0
3
0
Yi Z.
Yi Z.@yz·
Ghost town Pyramiden: once a busy USSR mining town in the arctic with a library, theatre, canteen, and swimming pool; now abandoned, with a Lenin statue watching over the glaciers in solitude.
English
2
0
10
0
Pavan Yalamanchili
Pavan Yalamanchili@pavan_ky·
@yz @wangtian Technically the first one is the *min* number of bytes. You can easily store that in a 64-bit integer too.
English
1
0
0
0
Yi Z.
Yi Z.@yz·
@wangtian They are on Twitter as well! > 50% of engineers I interview fail to tell me correctly how many bytes is 0xDEAD. > 80% of engineers I interview cannot calculate the decimal value of 0x01FF without Googling for a hex converter. And these are engineer candidates.
English
2
0
1
0
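As a quick check of the two interview questions in this thread, here is a minimal Python sketch. The helper `min_bytes` is a hypothetical name introduced only for illustration; it computes the minimum number of whole bytes a value needs, which is the reading the reply in the thread points out.

```python
def min_bytes(value: int) -> int:
    """Minimum number of whole bytes needed to hold a non-negative integer."""
    # Each byte holds 8 bits; round the bit count up to a whole byte.
    # Zero still occupies one byte.
    return max(1, (value.bit_length() + 7) // 8)


# 0xDEAD has four hex digits; each hex digit is 4 bits, so 16 bits = 2 bytes.
dead_bytes = min_bytes(0xDEAD)

# 0x01FF by hand: 0*16^3 + 1*16^2 + 15*16 + 15 = 256 + 240 + 15
manual_01ff = 1 * 16**2 + 15 * 16 + 15

print(dead_bytes)           # 2
print(manual_01ff, 0x01FF)  # 511 511
```

The same value fits in any wider type as well (for example a 64-bit integer), which is why two bytes is a lower bound rather than the only valid answer.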
Tian Wang 王天
Tian Wang 王天@wangtian·
@yz Where did you find the people, teenagers on discord?
English
1
0
0
0
Yi Z.
Yi Z.@yz·
@qiqicoin @stakefish @Ledger You need to come to our Consensus booth to pick it up 😃. (And I forgot to mention, f2pool or stakefish employees don't qualify for prizes!)
English
0
0
2
0
Yi Z.
Yi Z.@yz·
Would you hire an engineer who could build beautiful frontend React apps but tries to "sudo cd Desktop"? 😂
English
0
0
2
0
Yi Z.
Yi Z.@yz·
Bought a guitar with a touch screen and apps. Given that it has Wi-Fi, speakers, and a microphone, let's see if future software updates will bring a phone app. Can't wait to call my mom from a guitar. 🤔
English
2
0
19
0
Yi Z. retweeted
Naval
Naval@naval·
If you can buy happiness, buy it.
English
543
3.8K
22.2K
0
Juliano Siloto Assine
Juliano Siloto Assine@julianosiloto·
@yz My suggestion was using 2 encoders, image(X) and brain data(Y) and one decoder for image (X') and your loss would be something like L = MMSE(X, X')+cosine_similarity(enc1(X), enc2(Y))
English
1
0
0
0
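The loss suggested in the reply above could be sketched roughly as follows. This is an illustration under stated assumptions, not the replier's actual method: it reads "MMSE" as plain mean squared error, it takes the two encoder outputs as already-computed vectors (the names `z_img` and `z_brain` are hypothetical), and it uses `1 - cosine_similarity` for the alignment term so that minimizing the loss pulls the two embeddings together, whereas the tweet writes the similarity added directly.

```python
import math


def mse(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combined_loss(x, x_hat, z_img, z_brain):
    """Reconstruction term plus an embedding-alignment term.

    x, x_hat : image and decoded image (flattened to lists of floats)
    z_img    : assumed output of the image encoder, enc1(X)
    z_brain  : assumed output of the brain-data encoder, enc2(Y)
    """
    reconstruction = mse(x, x_hat)
    # Assumption: 1 - cos, so identical directions contribute zero loss.
    alignment = 1.0 - cosine_similarity(z_img, z_brain)
    return reconstruction + alignment


# Perfect reconstruction and perfectly aligned embeddings give zero loss.
print(combined_loss([1.0, 2.0], [1.0, 2.0], [1.0, 0.0], [1.0, 0.0]))  # 0.0
```

In a real training setup the two terms would usually be weighted against each other and computed on batched tensors with automatic differentiation; the pure-Python version here only shows how the terms combine.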
Yi Z.
Yi Z.@yz·
The pandemic will end. 🌞
English
2
0
48
0
Yi Z. retweeted
Wenzhe Shi 🐕🐎
Wenzhe Shi 🐕🐎@trustswz·
After a long wait, we are finally announcing the #Recsys2021 challenge! This year we are releasing around 1 billion samples, the largest social network dataset by far, with an added focus on fair recommendations. Please check out our website recsys-twitter.com for more details.
ACM RecSys@ACMRecSys

We are proud to announce the #RecSys2021 challenge! The data for this year's challenge is provided by @TwitterResearch. This year's goal is two-fold: predict different engagement types, while providing fair recommendations. More details on our website: recsys.acm.org/recsys21/chall…

English
0
30
67
0
Yi Z.
Yi Z.@yz·
@nojeshua Definitely interested. Ever since I quit my job, I found myself pondering over all these “unproductive questions”, such as origins of life/DNA, why do we get old, what’s gravity, etc. I just started reading a couple of books other people suggested on these topics.
English
0
0
0
0
Yi Z.
Yi Z.@yz·
Quarantine daydreaming: so DNA encodes information about how to interpret & replicate itself, like how to assemble proteins (e.g. helicase, primase) required for replication. This is like writing a C compiler in C. How did the "first DNA sequence" bootstrap?
English
5
0
8
0