Yi Z.

402 posts

@yz

Curious ML/DL hacker. Love cats, daydreaming, chatting with AI, and pondering over all things logical. Meta; ex: Stakefish, Twitter, Google, CMU, Bowdoin.

San Francisco, CA · Joined September 2009

740 Following · 2.6K Followers

Yi Z. retweeted

Artificial Analysis@ArtificialAnlys·Apr 8

Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights.

Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.

For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release.

The model is not open source and is not yet accessible via an API, but Meta has shared that they expect this to come soon. Meta is also integrating Muse Spark into their first-party products, including their Meta AI chat product, Facebook, Instagram, and Threads.

Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, and Grok 4.20, and behind Gemini 3.1 Pro Preview, GPT-5.4, and Claude Opus 4.6.
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M), and GLM-5 (110M).
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%).
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved the 5th-highest score on CritPT, an eval focused on difficult physics research questions, at 11%. This is substantially above Gemini 3 Flash (9%) and Claude Sonnet 4.6 (3%).
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 (1648) and GPT-5.4 (1676) but ahead of Gemini 3.1 Pro Preview (1320). On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%.

Key model details:
➤ Modalities: Multimodal, with text and vision input and text output
➤ License: Proprietary; Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing; Meta expects to provide API access soon. Meta has started integrating the model into its first-party AI offering, Meta AI, and inside Facebook, Instagram, and Threads.
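To put the "notably token efficient" claim in proportion, this small sketch divides Muse Spark's reported 58M output tokens by each comparison model's count. The token figures are the ones quoted in the post; the dictionary and the `relative_usage` function are illustrative names, not part of any Artificial Analysis tooling.

```python
# Output tokens (millions) used to run the Intelligence Index, as quoted in the post.
OUTPUT_TOKENS_M = {
    "Gemini 3.1 Pro Preview": 57,
    "Claude Opus 4.6 (max effort)": 157,
    "GPT-5.4 (xhigh)": 120,
    "GLM-5": 110,
}
MUSE_SPARK_M = 58


def relative_usage(ours: float, others: dict[str, float]) -> dict[str, float]:
    """Return ours / theirs for each model, rounded to two decimals."""
    return {name: round(ours / tokens, 2) for name, tokens in others.items()}


if __name__ == "__main__":
    for name, ratio in relative_usage(MUSE_SPARK_M, OUTPUT_TOKENS_M).items():
        print(f"Muse Spark used {ratio:.2f}x the output tokens of {name}")
```

On these numbers Muse Spark used roughly 0.37x the tokens of Claude Opus 4.6 and 0.48x those of GPT-5.4, while landing at about 1.02x Gemini 3.1 Pro Preview.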

Yi Z.@yz·Jun 16

My 14-year-old cat passed away on my birthday after battling lymphoma for 8 months.

Yi Z. retweeted

Tiezhen WANG@Xianbao_QIAN·Jan 28

Let’s be clear: DeepSeek r1 isn’t about who “races faster”. It’s about the inherent flaw in closed models. The future of AI isn’t owned by those who hide code or stockpile chips. It’s built on trust, and trust requires transparency. When models are black boxes, you surrender control over data privacy, culturally aligned ethics, and post-training customization for real-world scenarios. That’s not leadership; it’s liability. China’s pretraining consolidation proves a simple truth: AI is becoming a commodity. The real value lies not in the model but in what you do with it. Why waste billions reinventing closed-source base models when the market craves applications that solve poverty, climate crises, or healthcare gaps? Labs clinging to secrecy risk irrelevance, like doubling down on fax machines as email took over. Consider the irony: today, free models come from quant firms while “nonprofits” charge premiums for access. OpenAI’s pivot from “open” to walled gardens betrays the very ethos that birthed modern AI. Meanwhile, market economy principles prevail: restrict access, and replacements emerge. The lesson? Trust beats control. Open models engage developers. Closed models breed suspicion, and suspicion fuels replacement. How to “maintain leads”? Stop gatekeeping. Build open infrastructure the world trusts, like the internet. History doesn’t reward those clinging to scarcity. It rewards those who empower the many. The choice is yours.

Alexandr Wang@alexandr_wang

DeepSeek is a wake-up call for America, but it doesn’t change the strategy:
- USA must out-innovate & race faster, as we have done in the entire history of AI
- Tighten export controls on chips so that we can maintain future leads

Every major breakthrough in AI has been American

Yi Z. retweeted

Philipp Schmid@_philschmid·May 4

Introducing StarCoder ⭐️ a 15B open-source Code-LLM created by @huggingface and @ServiceNow through @BigCodeProject
🔡 8192-token context window
📊 trained on 1 trillion tokens
💭 80+ programming languages
🔐 only permissively licensed data
✅ commercial use
huggingface.co/bigcode/starco…

Yi Z.@yz·Aug 17

@wangtian Our team wanted to write some cool code, so we decided to get together in the Arctic to do it.

Tian Wang 王天@wangtian·Aug 17

@yz Why are you here..

Yi Z.@yz·Aug 16

Ghost town Pyramiden: once a busy USSR mining town in the Arctic with a library, theatre, canteen, and swimming pool; now abandoned, with a Lenin statue watching over the glaciers in solitude.

Yi Z.@yz·Aug 2

@pavan_ky @wangtian Agree.

Pavan Yalamanchili@pavan_ky·Aug 1

@yz @wangtian Technically, the first one is the *min* number of bytes. You can easily store that in a 64-bit integer too.
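Pavan's distinction, the minimum number of bytes a value needs versus the width of the integer you store it in, can be shown with Python's standard `struct` module. This is a minimal sketch assuming unsigned big-endian packing; `min_bytes` is a hypothetical helper, not a library function.

```python
import struct


def min_bytes(value: int) -> int:
    """Smallest number of bytes that can hold a non-negative integer."""
    # bit_length() is 0 for 0, but storing zero still takes one byte.
    return max(1, (value.bit_length() + 7) // 8)


value = 0xDEAD
packed_u16 = struct.pack(">H", value)  # 2-byte unsigned, big-endian
packed_u64 = struct.pack(">Q", value)  # 8-byte (64-bit) unsigned, big-endian

print(min_bytes(value))   # 2
print(packed_u16.hex())   # dead
print(packed_u64.hex())   # 000000000000dead
```

The value is the same in both encodings; the 64-bit version just pads it with six leading zero bytes.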

Yi Z.@yz·Aug 1

@wangtian They are on Twitter as well! >50% of engineers I interview fail to tell me correctly how many bytes 0xDEAD is. >80% of engineers I interview cannot calculate the decimal value of 0x01FF without Googling for a hex converter. And these are engineer candidates.
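Neither question needs a converter. Each hex digit is 4 bits, so two digits make one byte: 0xDEAD has four digits and fits in 2 bytes. The decimal value comes from positional weights, 0x01FF = 0·16³ + 1·16² + 15·16 + 15 = 256 + 240 + 15 = 511. The sketch below spells that out digit by digit with a hypothetical `hex_to_decimal` helper; in practice Python's built-in `int(text, 16)` does the same job.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_to_decimal(text: str) -> int:
    """Convert a hex string such as '0x01FF' to an int using positional weights."""
    digits = text.lower().removeprefix("0x")
    total = 0
    for ch in digits:
        # Shift the running total one hex place left (x16), then add this digit.
        total = total * 16 + HEX_DIGITS.index(ch)
    return total


print(hex_to_decimal("0x01FF"))  # 511
print(hex_to_decimal("0xDEAD"))  # 57005
```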

Tian Wang 王天@wangtian·Aug 1

@yz Where did you find the people, teenagers on Discord?

Yi Z. retweeted

Chun@satofishi·Jun 9

.@f2pool and @stakefish team heading to #Consensus2022.

Austin, TX 🇺🇸

Yi Z.@yz·Jun 6

@qiqicoin @stakefish @Ledger You need to come to our Consensus booth to pick it up 😃. And I forgot to mention: f2pool and stakefish employees don't qualify for prizes!

Yi Z.@yz·Apr 12

Staking & relaxing with @stakefish

Yi Z.@yz·Mar 9

Would you hire an engineer who could build beautiful frontend React apps but tries to "sudo cd Desktop"? 😂
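The joke works because `cd` is a shell builtin: the working directory belongs to a process, and `sudo` runs its command as a separate child process, which cannot change its parent shell's directory. Even where `sudo` can find some `cd` executable, it would only move that short-lived child. A minimal Python sketch of the same process boundary, assuming a POSIX system with `sh` on the PATH:

```python
import os
import subprocess

before = os.getcwd()

# Run `cd /` in a child shell. The child changes *its own* working directory,
# prints it, and exits; the parent (this Python process) is unaffected.
result = subprocess.run(
    ["sh", "-c", "cd / && pwd"],
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout.strip())   # /
print(os.getcwd() == before)   # True
```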

Yi Z.@yz·Feb 1

Bought a guitar with a touch screen and apps. Given that it has Wi-Fi, speakers, and a microphone, let's see if future software updates will bring a phone app. Can't wait to call my mom from a guitar. 🤔

Yi Z.@yz·Dec 8

@squarecog @__lucab @satanjeev @cayley @kevinweil @sritchie @posco @niels @thesteggie 😂 Wow! You still remember that incident.

Dmitriy Ryaboy 🇺🇸🇮🇱@squarecog·Dec 8

@__lucab @satanjeev @cayley @kevinweil @sritchie @posco @niels @thesteggie @yz The one where all batch jobs started recomputing everything since the dawn of time? If so, that one was later and also amazing.

Yi Z. retweeted

Square@Square·Dec 2

We’re changing our company name so we can give the full @Square brand to our Seller business. So now we need a name to tie @Square, @CashApp, @TIDAL, and @TBD54566975 together into one. That name is “Block.” Why?

Block@blocks

Block is @Square, @CashApp, @spiralbtc, @TIDAL, @TBD54566975, and our foundational teams who support them. We’re here to build simple tools to increase access to the economy. block.xyz

Yi Z. retweeted

Naval@naval·Sep 17

If you can buy happiness, buy it.

Yi Z.@yz·Jun 3

@julianosiloto Thanks for the ideas!

Juliano Siloto Assine@julianosiloto·Jun 3

@yz My suggestion was using 2 encoders, image(X) and brain data(Y) and one decoder for image (X') and your loss would be something like L = MMSE(X, X')+cosine_similarity(enc1(X), enc2(Y))
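Juliano's suggestion pairs a reconstruction term on the image with an alignment term between the image and brain-data embeddings. As written, the formula adds cosine similarity, and minimizing that would push the two embeddings apart, so this sketch assumes the intended alignment term is 1 − cosine similarity, reads "MMSE" as mean squared error, and decodes X' from the image embedding during training. The encoders and decoder are hypothetical stand-ins (fixed linear maps), not a trained model, and the weight `alpha` is an added assumption.

```python
import math
from typing import Callable

Vector = list[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def linear(weights: list[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for an encoder/decoder: a fixed linear map."""
    return lambda v: [sum(w * x for w, x in zip(row, v)) for row in weights]


def two_encoder_loss(x, y, enc_image, enc_brain, dec_image, alpha=1.0):
    """Image reconstruction MSE plus alpha * (1 - cos) embedding alignment."""
    z_img = enc_image(x)      # enc1(X)
    z_brain = enc_brain(y)    # enc2(Y)
    x_hat = dec_image(z_img)  # X'
    return mse(x, x_hat) + alpha * (1.0 - cosine_similarity(z_img, z_brain))


identity = linear([[1.0, 0.0], [0.0, 1.0]])
# Orthogonal embeddings: perfect reconstruction, worst alignment, so loss is 1.0.
print(two_encoder_loss([1.0, 2.0], [-2.0, 1.0], identity, identity, identity))
```

With parallel embeddings (for example Y = [2.0, 4.0]) the alignment term drops to approximately zero, which is the behavior the sign flip is meant to reward.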

Yi Z.@yz·Mar 27

The pandemic will end. 🌞

Yi Z. retweeted

Wenzhe Shi 🐕🐎@trustswz·Mar 23

After a long wait, we are finally announcing the #Recsys2021 challenge! This year we are releasing around 1 billion samples, the largest social network dataset by far, with an added focus on fair recommendations. Please check out our website recsys-twitter.com for more details.

ACM RecSys@ACMRecSys

We are proud to announce the #RecSys2021 challenge! The data for this year's challenge is provided by @TwitterResearch. This year's goal is two-fold: predict different engagement types, while providing fair recommendations. More details on our website: recsys.acm.org/recsys21/chall…

Yi Z.@yz·Mar 7

@nojeshua Definitely interested. Ever since I quit my job, I have found myself pondering over all these “unproductive questions”, such as the origins of life/DNA, why we get old, what gravity is, etc. I just started reading a couple of books other people suggested on these topics.

Yi Z.@yz·Mar 7

Quarantine daydreaming: so DNA encodes information about how to interpret & replicate itself, like how to assemble proteins (e.g. helicase, primase) required for replication. This is like writing a C compiler in C. How did the "first DNA sequence" bootstrap?
