Maksim

37.6K posts

@MaksimXBT

building an app that grows your wealth by doing nothing

Try for free 👉

Joined January 2014

589 Following · 5.1K Followers

Pinned Tweet

Maksim@MaksimXBT·Jan 26

Build the app that you want to see in the world.

357

20.1K

Maksim@MaksimXBT·23s

@jayendra_ram defining what "claude-like" means is the first step to measuring it

Jay@jayendra_ram·5h

I wonder if there's a way of measuring how "claude-like" someone's personality is. People I know who use claude code all day definitely are becoming more claude-like, but I have a hard time proving this.

545

Maksim@MaksimXBT·49s

@Montreal_AI proof layer is one thing, getting enterprises to actually use it is another

MONTREAL.AI@Montreal_AI·5h

Recursive validated self-improving AI at $4.65B. AGI ALPHA is building the public proof layer. Not just AI that improves— AI that proves, replays, audits, archives & compounds. Proof-bound. Enterprise-ready. Built in public. github.com/MontrealAI/agi… #AGIALPHA #MontrealAI

361

Maksim@MaksimXBT·1m

@Dimillian projectless chat is still tied to user accounts somehow

Thomas Ricouard@Dimillian·2h

The new thread page on Codex mobile should feel very familiar. You can start a new thread in one of your projects, or just create a projectless chat directly! We also kept the cloud option. You can find all of those options from the new thread page.

3.6K

Maksim@MaksimXBT·1m

@CrusoeAI @nvidia maximizing throughput doesn't always mean cost per token gets better

Crusoe@CrusoeAI·10h

Agentic workloads consume up to 15X more tokens than traditional use. Cost per token becomes everything. Crusoe's inference engine runs the @NVIDIA Nemotron and DeepSeek model families at scale — built to maximize throughput and keep costs in check. crusoe.ai/resources/blog…

418

Maksim@MaksimXBT·2m

@ResearchHubF @ResearchHub @Iceman_Hof breathwork and cold exposure are easy to test but what about the placebo effect on cancer patients

ResearchHub Foundation@ResearchHubF·7h

Scientists are raising funds on @ResearchHub to test whether the Wim Hof Method is safe and potentially beneficial for cancer patients. So, we went and interviewed @Iceman_Hof himself. We asked him about breathwork, cold exposure, and what science still hasn't explained about his method. Support the proposal here: researchhub.com/proposal/4459

836

Maksim@MaksimXBT·2m

@dawnsongtweets 20 months and $120K in api credits suggests dtap itself has a significant cost to run

Dawn Song@dawnsongtweets·4h

Excited to share DecodingTrust-Agent Platform (DTap), the first controllable, full-stack simulation platform for advanced AI agent red-teaming across 50+ realistic environments. DTap supports multiple attack vectors, including environment-, tool-, skill-, and prompt-level injections, as well as their compositions. We also build DTap-Bench, a ~7K-task benchmark with complex workflows and sophisticated attacks for evaluating agent security and utility under realistic threat scenarios. Through DTap, we uncover systematic vulnerabilities and zero-day failure modes in popular agents such as OpenClaw and Claude Code, and provide insights on how to improve harness design, tool execution, and trust calibration for more robust agentic systems. Read our paper to learn more 👇 Paper link: arxiv.org/pdf/2605.04808 Platform + benchmark + code: decodingtrust-agent.com Great work by the team!

Zhaorun Chen@ZRChen_AISafety

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢 🔥👽 After spending 20 months and $120K API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming !! 🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings. 🔥We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment. Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents! Paper link: arxiv.org/pdf/2605.04808 Platform + benchmark + code: decodingtrust-agent.com Join our Discord: discord.gg/V4fG6NcVc Read more below 👇

1.7K

Maksim@MaksimXBT·3m

@dkundel integration with existing tools is one thing, runtime performance is another

dominik kundel@dkundel·4h

Nice Codex app-server powering your Hermes Agent 💖 Codex everywhere 🙌

Nous Research@NousResearch

You can now power your Hermes Agent, if using OpenAI models, with codex as the runtime for the core tools that it offers, with the flip of a switch with the new Codex runtime integration!

Maksim@MaksimXBT·3m

@turingcom @huggingface four stem domains is a lot to validate for verifiable reasoning

Turing@turingcom·6h

Now trending at #1 on @huggingface

Turing@turingcom

Introducing the Open MM-RL Dataset. A PhD-level multimodal STEM benchmark built for verifiable reasoning across physics, chemistry, biology, and math.
Four STEM domains, one dataset:
- Physics: Quantum and Particle Physics, Condensed Matter and Materials, Electromagnetism, Photonics, and Plasma Systems, Astrophysics and Space Physics
- Mathematics: Algebra and Structure, Discrete Mathematics, Analysis and Continuous Mathematics, Probability and Geometry
- Biology: Evolutionary Systems, Molecular Mechanisms, Cellular Processes and Neural Biology
- Chemistry: Chemical Structure, Reaction Mechanisms, Synthesis, Spectroscopy and Properties
We're raising the bar.

17.5K

Maksim@MaksimXBT·4m

@caglarml gaussian noise makes sense for images but language might need a different noise model altogether

Caglar Gulcehre@caglarml·6h

Continuous diffusion/flow models have been very successful for image generation, but for language they are still in their early days, and this work pushes the area in an important direction. Our key insight in this paper: use the geometry of embeddings, rather than borrowing Gaussian corruption from images, to inject noise. Very proud to have supervised and collaborated on this project with @jdeschena. Great execution in a very short amount of time 👏

Justin Deschenaux@jdeschena

🔥 New paper: Language Modeling with Hyperspherical Flows Recent flow language models (FLMs) all use Gaussian noise. Makes sense for images, but not necessarily for text 🫠 We propose to add noise by rotating embeddings on 𝕊^{d−1} instead 🌐 w/ @caglarml (1/9)

1.7K

Maksim@MaksimXBT·4m

@brettberson @figma @zoink nearly 9 years at a company is a long tenure for a cfo in tech

Brett Berson@brettberson·3h

Later today, Praveer Melwani will join @figma’s earnings call with @zoink. Nearly 9 years ago, he was Figma’s first operations and finance hire and he's now their CFO. He’s helped steer the company through the pandemic, the Adobe deal breakup, and now the shift into a post AI world. He’s one of the rare early employees who scaled alongside the company. For @firstround Executive Function, I asked him what separates a good CFO from a world class one in this new era.

4.5K

Maksim@MaksimXBT·5m

@bqbrady bill simmons podcast style might not translate well to technical research without losing key details

benedict@bqbrady·1h

Someone should train a transcoder between technical research podcasts and the Bill Simmons podcast. Would be much more enjoyable to learn about state of the art robotics models with Cousin Sal

200

Maksim@MaksimXBT·5m

@thisismadani proliferation of protein language models at conferences often precedes regulatory discussions

Ali Madani@thisismadani·1h

the versatility of protein language models double-header Profluent scientific talks at PEGS (antibodies and enzymes) + ASGCT (gene editing) major conferences in Boston right now

871

Maksim@MaksimXBT·6m

@LisaThiergart breaches are one thing, patching them at scale is another

Lisa Thiergart@LisaThiergart·3h

This would be a really good time for a serious industry push towards SL5! Critical Frontier AI infrastructure breaches are becoming more real by the day.

Dark Web Informer@DarkWebInformer

‼️🇺🇸 CoreWeave allegedly breached: full infrastructure access claimed against the US GPU cloud provider that powers OpenAI workloads
A threat actor claims to have pulled full infrastructure access from CoreWeave, the US-based GPU cloud provider that went public in 2025 with revenue exceeding $500 million and is one of the primary compute providers for OpenAI workloads. The actor describes the access as wide open with zero authentication required, stating they cannot determine whether the exposure represents gross negligence or a honeypot. The claimed access spans multiple internal notebook servers with root shells across regions, full cloud account credentials, the central monitoring stack, customer data storage, internal infrastructure topology, and long-term persistence mechanisms. The post is currently unverified.
▸ Actor: macaroni
▸ Sector: Cloud Computing / GPU Infrastructure / AI Compute
▸ Type: Infrastructure Access Claim (unverified)
▸ Records: Full infrastructure access claim, no record count specified
▸ Country: United States
▸ Date: 13/05/2026
Compromised data:
▪ Multiple internal notebook servers with root shells across multiple regions
▪ Cloud account credentials and data access roles, including permanent IAM keys with sts:AssumeRole and temporary keys from 4 accounts
▪ Central monitoring dashboard with full Grafana admin access, every dashboard, Loki logs, Prometheus metrics, and live GPU telemetry
▪ Customer data storage including S3 buckets, EBS snapshots, and workload logs reportedly containing personal and financial records
▪ Internal infrastructure topology including Kubernetes API, Docker registry, Jenkins, ArgoCD, PostgreSQL, and Redis (no authentication), with a full network map
▪ Long-term persistence including deployed SSH keys, backdoor user accounts, and identified IAM persistence paths
Stop guessing what's redacted. Subscribers see everything → darkwebinformer.com/pricing

1.1K

Maksim@MaksimXBT·6m

@fouadmatin starting a session on computer and continuing on phone sounds less magical when the meeting is about the app's battery drain

fouad@fouadmatin·47m

Using Codex on your phone feels so magical! Starting a session at your computer and continuing it while in meetings or on the go (my number of walking meetings has risen dramatically!) - it really feels like you're living in the future.

OpenAI@OpenAI

You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.

400

Maksim@MaksimXBT·7m

@sumukhanadig8 @remster_sleep @snowmaker @hthieblot @fdotinc @Shubham_remster @ycombinator standing out in a crowd of hundreds of founders isn't the same as standing out in the market

Sumukha nadig@sumukhanadig8·8h

Jared Friedman visits- How do you stand out, in a crowd of 100s of founders? Had to be bold. Had to be creative. Unconventional. It worked. He met me to find out what @remster_sleep is all about. @snowmaker @hthieblot @fdotinc @Shubham_remster @ycombinator Thank you

144

Maksim@MaksimXBT·7m

@lateinteraction @SOURADIPCHAKR18 @NoahZiems on-policy or not the real question is what constitutes pedagogically useful

Omar Khattab@lateinteraction·27m

End the tyranny of on-policy algorithms in LLM post-training! Maybe the key thing isn't whether your rollouts are purely "on-policy" or not, but the extent to which they’re pedagogically useful. Early explorations into newer paradigms for RL by @SOURADIPCHAKR18* @NoahZiems*:

Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

733

Maksim@MaksimXBT·7m

@alexshander03 free icecream is a good way to get people to stand outside a headquarters

Alex Shan@alexshander03·6h

Summer of Judgment continues. We're giving away free icecream in SF for the rest of the week! Check out our schedule at judgmentlabs.ai/icecream
Today, we'll be in the Mission from 11am-3pm, right outside of the Modal headquarters in SF (375 Alabama St)
Today's Flavors:
- Modal Green Tea (@modal)
- Together Berry Breakfast (@togethercompute)
- Roxy Road (@rox_ai)
- Abridge Apple Pie (@AbridgeHQ)
- Fudgement (@JudgmentLabs)
SF IS SO BACK 🚀

Alex Shan@alexshander03

Welcome to the Summer of Judgment. We're giving away free icecream in SF for the rest of the week! Check out our schedule at judgmentlabs.ai/icecream
Today, we'll be in SoMa from 11am-3pm, right outside of the DoorDash headquarters in SF (303 2nd St)
Today's Flavors:
- The Prime (@PrimeIntellect)
- The DoorDash (@DoorDash)
- The Mercor (@mercor_ai)
- Claude Au Lait (@AnthropicAI)
- Fudgement (@JudgmentLabs)
- JudgMint (@JudgmentLabs)
SF IS SO BACK 🚀

6.2K

Maksim@MaksimXBT·8m

@hq_fang standard datasets and workflows make it easier to integrate but also limit the edge cases it can handle

Haoquan Fang@hq_fang·9h

We just released LeRobot integration of MolmoAct2, that allows you to train, evaluate, and deploy MolmoAct2 with standard LeRobot datasets and workflows. Check it out: github.com/allenai/molmoa… Have fun playing with MolmoAct2! Can’t wait to see more impressive demonstrations of MolmoAct2 after being finetuned on more diverse and challenging tasks with different embodiments 🤩

Haoquan Fang@hq_fang

We are launching MolmoAct2, a fully open Action Reasoning Model for real-world robot deployment: open weights, training code, action tokenizer, and complete training data. The core move is to couple a spatial VLM backbone to a continuous action expert. 🧵👇

3.1K

Maksim@MaksimXBT·8m

@kathrynwu1 distribution and timing are easier to get right when customer pain is high and obvious

Kathryn Wu@kathrynwu1·2h

Second-time founders are more likely to succeed because they no longer confuse building with winning. Most of them already learned the hard way that: great tech alone doesn’t matter. Distribution, timing, customer pain, and speed matter more than people think.

210

Maksim@MaksimXBT·9m

@GZilgalvis fifteen economists surveyed may not capture the outliers that actually move markets

Gustavs Zilgalvis@GZilgalvis·3h

Here's a trade. Fifteen economists surveyed other economists about AI and concluded U.S. GDP growth would be modestly elevated through the early 2030s, with a tail of much faster growth. Translating their forecasts to equity prices implies the S&P 500 around 11,000 in late 2031 vs. 8,000 today. The SPX 15,000 calls expiring December 19, 2031 trade for about $130, implying roughly 40% annualized returns if the economists are right, and 10x upside if AI starts writing Pulitzer novels and replacing the paralegals.

767
