James G. Beldock

3.4K posts

@jbeldock

Day Job: Facebook, formerly @ShotSpotter. Night Job: SF Bay Area Immigrant. Life Job: New Yorker and lesser half of @TatyanaDB. https://t.co/jF6JNGdp0c

San Francisco, CA · Joined March 2009

1.5K Following · 730 Followers

James G. Beldock@jbeldock·2h

The work @summeryue0 and our @Meta Superintelligence Lab's Safety research teams just shared set a new high-water mark for model safety, preparedness, and risk assessment. Hats off to the whole team. #meta #AIsafety

Summer Yue@summeryue0

🚀 Muse Spark Safety & Preparedness Report for Meta AI is out. We start with our pre-deployment assessment under Meta's Advanced AI Scaling Framework, covering chemical and biological, cybersecurity, and loss of control risks. Our assessment flagged potentially elevated chem/bio risk, so we implemented safeguards and validated mitigations before deployment - bringing residual risk to within acceptable levels. Beyond the Framework, we also share findings and early explorations of model behavior (honesty, intent understanding, etc.), jailbreak robustness, eval awareness, and more. We're sharing this report to give a closer look at how we evaluate advanced AI safety. Always more work to do, and we welcome feedback from the community. ai.meta.com/static-resourc…

James G. Beldock reposted

Alexandr Wang@alexandr_wang·4d

the muse spark API will be coming soon! we have been thrilled with the amount of excitement amongst developers who want to try muse spark inside their agentic harnesses. stay tuned!

123

1.7K

146.8K

James G. Beldock reposted

Dimitry Nakhla | Babylon Capital®@DimitryNakhla·2d

Mark Zuckerberg was asked to describe the business AI agent opportunity at $META. The answer is worth paying attention to. A few things that stood out: 💸 The end goal for advertising: any business comes with an objective and a budget — $META delivers the results. Zuck calls it “the ultimate business results machine.” 📈 Advertising has historically been ~1% of global GDP — but that includes enormous inefficiency. As it transforms into an AI-driven results machine, that share grows. 🤖 Every business will have an AI agent living in messaging platforms handling customer support and sales. Thailand and Vietnam are already the proof of concept — and they’re META’s 10th and 11th revenue countries despite ranking in the 30s by global GDP. 💬 WhatsApp, from a revenue perspective, is just getting started. ___ 🎙️ YouTube: Stripe | A conversation with Mark Zuckerberg (05/08/2025)

283

39.2K

James G. Beldock@jbeldock·2d

Agree with @vu0tran; it's felt like @Meta really is a startup nearly every day of my 10 years here. After being a startup CEO for 15 years (@soundthinking_, etc.), the transition felt totally natural.

Vu Tran@vu0tran

Crazy to see the positive response to Muse Spark. I joined Meta in Dec and was surprised how startup-y MSL feels. Dec through now, through the holidays was literally nonstop building for me. Everyone on the team cares. People want to do great work

James G. Beldock reposted

Rohan Paul@rohanpaul_ai·2d

Mark Zuckerberg: Most businesses will not own frontier AI in the way Meta or OpenAI does. But many will end up with something that feels like their own AI: a customized operational layer that reflects how that company actually works. He says, "OpenAI, Google, they're building an AI. But I think we're gonna have a lot of different AI systems, just like we're gonna have, we have a lot of different apps. I think in the future, every business, just like I have a website and a phone number and an email address, a social media account, is also going to have an AI that can interact with their customers to help them sell things, help them give support." --- What he is really describing is that a company’s “own AI” will usually not be a frontier model trained from scratch, but a layer built on top of shared models, shaped by its products, policies, customer history, and way of working. Support, sales, and basic operations can be handled through a system that knows the business well enough to answer, route, recommend, and escalate without feeling generic. --- From 'Cleo Abram' YT channel (link in comment)

Rohan Paul@rohanpaul_ai

Meta is back. 🔥 Finally dropped its first model since Zuckerberg started writing checks like crazy. Launched Muse Spark (originally codenamed Avocado). It's a natively multimodal reasoning model that can look, reason, use tools, and split hard work across multiple cooperating agents. Claims it can reach similar capability with 10x+ less training compute than Llama 4 Maverick. They are not positioning Muse Spark as a top-of-the-line model, but are instead highlighting its efficiency and “competitive performance” on various tasks. The old bottleneck in AI is that one model often has to read, plan, call tools, and solve everything in one stream, which wastes compute and slows hard tasks. The key idea here is multi-agent orchestration, where several copies of the model work on the same problem in parallel and then compare or merge results, which is closer to a small team than a single assistant. That changes the scaling story because better performance no longer comes only from making 1 model bigger, but also from spending compute more intelligently at run time. So Muse is a stack built around 3 scaling axes: stronger pretraining for basic world and code understanding, steadier RL for improving answers after pretraining, and test-time reasoning so the model spends extra compute only when a problem is hard. The most interesting part is multi-agent orchestration, where several copies of the model reason in parallel and compare work, which raised Humanity’s Last Exam to 58% and FrontierScience Research to 38% in its heavier Contemplating mode. Meta also says the new pretraining recipe reaches similar capability with over 10x less compute than Llama 4 Maverick, which matters because cheaper training usually means faster iteration and more room to scale.

829

425.1K

James G. Beldock reposted

François Fleuret@francoisfleuret·2d

Wow, muse spark passes *my* test with flying colors!

Alexandr Wang@alexandr_wang

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

121

29.6K

James G. Beldock@jbeldock·3d

@LoveHelloMolly Need your help! My daughter has struggled 7+ weeks to get return credit for goods returned to you (pristine, < 20 days). Emailed your info@ many times; gets the run-around. This seems inconsistent with the mission of the "Customer Love Support Team" you are proud of. Help!

James G. Beldock@jbeldock·3d

@JackTripleU You have a LOT to be proud of, @JackTripleU. You and your team are amazing.

700

James G. Beldock reposted

Jack Wu@JackTripleU·3d

Contemplating mode is rolling out slowly, everyone will get a chance to try it soon!

Mostly Borrowed Ideas@borrowed_ideas

Not usually a Meta AI user, but wanted to give them a shot after the latest model release (it's free anyway). So I installed the app on my desktop, and noticed "contemplating" mode (didn't see that on the mobile app btw). When I asked a question, 16 agents simultaneously started working on the question which looks pretty cool!

160

57.4K

James G. Beldock reposted

Arena.ai@arena·4d

Meta is back in the Arena! Muse Spark debuts as a top frontier model across both Text and Vision:
- Text Arena: #3 tied with Gemini-3.1-Pro and Claude-Opus-4.6
- Vision Arena: #2 tied with Claude-Opus-4.6
This marks Meta’s first major release since early 2025. Highlights:
- #4 Hard Prompts, #6 Coding, #9 Creative Writing, #10 Instruction Following, #27 Expert
- #3 tied for Business, Management, & Financial Ops, #7 Legal & Government, #12 Writing & Literature
Meta is back at the frontier. Huge congrats to @AIatMeta on this incredible milestone!

AI at Meta@AIatMeta

Introducing Muse Spark, the first in the Muse family of models developed by Meta Superintelligence Labs. Muse Spark is a natively multimodal reasoning model with support for tool-use, visual chain of thought, and multi-agent orchestration. Muse Spark is available today at meta.ai and the Meta AI app. We’re also making it available in private preview via API to select partners, and we hope to open-source future versions of the model. Learn more: go.meta.me/43ea00

801

145.4K

James G. Beldock reposted

Alexandr Wang@alexandr_wang·3d

okay this is too exciting :) meta AI is now #2 in the app store, top AI app! we are so back!

262

117

2.2K

325.2K

James G. Beldock reposted

Tornado guy@fanofaliens·5d

This is the most insane magical model I've ever used. I just provided some AI generated images, fed them into the chat, and told it to generate a game. It used the image as a billboard and gave it some actions. And it did this. Really great

Alexandr Wang@alexandr_wang

35.8K

James G. Beldock reposted

Pietro Schirano@skirano·6d

Ok this is actually pretty impressive and I truly didn't see any model doing this before or being able to do it to this extent. When I asked Muse Spark from Meta to convert this image into code, it cut out the assets from the screens so it could use them correctly!

870

152.7K

James G. Beldock reposted

Alexandr Wang@alexandr_wang·5d

Meta AI is up to #6 in the App Store overnight, and still growing :) Also who knew the 7-Eleven app was so popular

206

2.2K

264.8K

James G. Beldock reposted

Alexandr Wang@alexandr_wang·5d

cool to see people finding new emergent capabilities within Muse Spark!

Nainish Rai@Nain1sh

That clip is the part most people will underestimate. Image-to-code was already impressive. What Meta’s Muse Spark seems to be doing is one level higher: it’s not just recreating pixels, it’s inferring product logic. I gave it a calendar screenshot and I am blown

348

37.7K

James G. Beldock reposted

Alexandr Wang@alexandr_wang·5d

muse spark is impressively multimodal!

Pietro Schirano@skirano

Visual AGI is here btw

409

35.8K

James G. Beldock reposted

Ritesh@treadon·6d

VERDICT: Meta Muse Spark is the REAL DEAL. I ran several tests, including reading a menu. Newly released Meta Muse Spark was the ONLY frontier AI to get all the items correct. Sorry ChatGPT, there is no "Slapped Wagyu Dog" on the menu💀. riteshkhanna.com/blog/muse-spar…

125

23.3K

James G. Beldock reposted

Leon Lin@LexnLin·6d

I tried to recreate the Meta AI animation using different AI models: 1/ (top left) original, 2/ (top right) Gemini 3.1 Pro High, 3/ (bottom left) Muse Spark, 4/ (bottom right) ChatGPT 5.4 Thinking Extended. for this round, muse spark won

107

17.9K

James G. Beldock@jbeldock·6d

Benchmarks for @Meta Superintelligence Labs Muse Spark, the first model from our new lab. Exciting and only 9 months into the journey.

Shengjia Zhao@shengjia_zhao

Excited to share what we’ve been building at Meta Superintelligence Labs! We just released Muse Spark, our first AI model. It's a natively multimodal reasoning model and the first step on our path to personal superintelligence. We've overhauled our entire stack to support scaling, and this is just the beginning. ai.meta.com/blog/introduci…

James G. Beldock@jbeldock·6d

Exciting day for our team at @Meta Superintelligence Labs.

Artificial Analysis@ArtificialAnlys

Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights.
Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.
For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release. The model is not open source and is not yet accessible via an API, but Meta has shared they expect this to come soon. Meta is also integrating Muse Spark into their first party products including their Meta AI chat product, Facebook, Instagram and Threads.
Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, Grok 4.20 and behind Gemini 3.1 Pro Preview, GPT-5.4 and Claude Opus 4.6
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5 (110M)
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved 5th highest in CritPT with a score of 11%, an eval that is focused on difficult physics research questions. This is substantially above Gemini 3 Flash (9%) and Claude 4.6 Sonnet (3%)
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%
Key model details:
➤ Modalities: Multimodal including text and vision input, text output
➤ License: Proprietary, Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing. Meta expects to provide API access soon. Meta has started integration into their first party AI offering Meta AI and inside Facebook, Instagram, and Threads
