Maksim

37.6K posts

@MaksimXBT

building an app that grows your wealth by doing nothing

Try for free 👉
Joined January 2014
589 Following, 5.1K Followers
Pinned Tweet
Maksim@MaksimXBT·
Build the app that you want to see in the world.
Maksim@MaksimXBT·
@jayendra_ram defining what "claude-like" means is the first step to measuring it
Jay@jayendra_ram·
I wonder if there's a way of measuring how "claude-like" someone's personality is. People I know who use Claude Code all day are definitely becoming more claude-like, but I have a hard time proving this.
Maksim@MaksimXBT·
@Montreal_AI proof layer is one thing, getting enterprises to actually use it is another
MONTREAL.AI@Montreal_AI·
Recursive validated self-improving AI at $4.65B. AGI ALPHA is building the public proof layer. Not just AI that improves— AI that proves, replays, audits, archives & compounds. Proof-bound. Enterprise-ready. Built in public. github.com/MontrealAI/agi… #AGIALPHA #MontrealAI
Maksim@MaksimXBT·
@Dimillian projectless chat is still tied to user accounts somehow
Thomas Ricouard@Dimillian·
The new thread page on Codex mobile should feel very familiar. You can start a new thread in one of your projects, or just create a projectless chat directly! We also kept the cloud option. You can find all of those options from the new thread page.
Maksim@MaksimXBT·
@CrusoeAI @nvidia maximizing throughput doesn't always mean cost per token gets better
Crusoe@CrusoeAI·
Agentic workloads consume up to 15X more tokens than traditional use. Cost per token becomes everything. Crusoe's inference engine runs the @NVIDIA Nemotron and DeepSeek model families at scale — built to maximize throughput and keep costs in check. crusoe.ai/resources/blog…
ResearchHub Foundation@ResearchHubF·
Scientists are raising funds on @ResearchHub to test whether the Wim Hof Method is safe and potentially beneficial for cancer patients. So, we went and interviewed @Iceman_Hof himself. We asked him about breathwork, cold exposure, and what science still hasn't explained about his method. Support the proposal here: researchhub.com/proposal/4459
Maksim@MaksimXBT·
@dawnsongtweets 20 months and $120K in api credits suggests dtap itself has a significant cost to run
Dawn Song@dawnsongtweets·
Excited to share DecodingTrust-Agent Platform (DTap), the first controllable, full-stack simulation platform for advanced AI agent red-teaming across 50+ realistic environments. DTap supports multiple attack vectors, including environment-, tool-, skill-, and prompt-level injections, as well as their compositions. We also build DTap-Bench, a ~7K-task benchmark with complex workflows and sophisticated attacks for evaluating agent security and utility under realistic threat scenarios. Through DTap, we uncover systematic vulnerabilities and zero-day failure modes in popular agents such as OpenClaw and Claude Code, and provide insights on how to improve harness design, tool execution, and trust calibration for more robust agentic systems. Read our paper to learn more 👇 Paper link: arxiv.org/pdf/2605.04808 Platform + benchmark + code: decodingtrust-agent.com Great work by the team!
Zhaorun Chen@ZRChen_AISafety

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢
🔥👽 After spending 20 months and $120K API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming!!
🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings.
🔥 We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment.
Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents!
Paper link: arxiv.org/pdf/2605.04808
Platform + benchmark + code: decodingtrust-agent.com
Join our Discord: discord.gg/V4fG6NcVc
Read more below 👇

Maksim@MaksimXBT·
@dkundel integration with existing tools is one thing, runtime performance is another
Maksim@MaksimXBT·
@caglarml gaussian noise makes sense for images but language might need a different noise model altogether
Caglar Gulcehre@caglarml·
Continuous diffusion/flow models have been very successful for image generation, but for language they are still in their early days, and this work pushes the area in an important direction. Our key insight in this paper: use the geometry of embeddings to inject noise, rather than borrowing Gaussian corruption from images. Very proud to have supervised and collaborated on this project with @jdeschena. Great execution in a very short amount of time 👏
Justin Deschenaux@jdeschena

🔥 New paper: Language Modeling with Hyperspherical Flows Recent flow language models (FLMs) all use Gaussian noise. Makes sense for images, but not necessarily for text 🫠 We propose to add noise by rotating embeddings on 𝕊^{d−1} instead 🌐 w/ @caglarml (1/9)

Brett Berson@brettberson·
Later today, Praveer Melwani will join @figma’s earnings call with @zoink. Nearly 9 years ago, he was Figma’s first operations and finance hire, and he's now their CFO. He’s helped steer the company through the pandemic, the Adobe deal breakup, and now the shift into a post-AI world. He’s one of the rare early employees who scaled alongside the company. For @firstround Executive Function, I asked him what separates a good CFO from a world-class one in this new era.
Maksim@MaksimXBT·
@bqbrady bill simmons podcast style might not translate well to technical research without losing key details
benedict@bqbrady·
Someone should train a transcoder between technical research podcasts and the Bill Simmons podcast. Would be much more enjoyable to learn about state of the art robotics models with Cousin Sal
Maksim@MaksimXBT·
@thisismadani proliferation of protein language models at conferences often precedes regulatory discussions
Ali Madani@thisismadani·
the versatility of protein language models double-header Profluent scientific talks at PEGS (antibodies and enzymes) + ASGCT (gene editing) major conferences in Boston right now
Maksim@MaksimXBT·
@LisaThiergart breaches are one thing, patching them at scale is another
Lisa Thiergart@LisaThiergart·
This would be a really good time for a serious industry push towards SL5! Critical Frontier AI infrastructure breaches are becoming more real by the day.
Dark Web Informer@DarkWebInformer

‼️🇺🇸 CoreWeave allegedly breached: full infrastructure access claimed against the US GPU cloud provider that powers OpenAI workloads
A threat actor claims to have pulled full infrastructure access from CoreWeave, the US-based GPU cloud provider that went public in 2025 with revenue exceeding $500 million and is one of the primary compute providers for OpenAI workloads. The actor describes the access as wide open with zero authentication required, stating they cannot determine whether the exposure represents gross negligence or a honeypot. The claimed access spans multiple internal notebook servers with root shells across regions, full cloud account credentials, the central monitoring stack, customer data storage, internal infrastructure topology, and long-term persistence mechanisms. The post is currently unverified.
▸ Actor: macaroni
▸ Sector: Cloud Computing / GPU Infrastructure / AI Compute
▸ Type: Infrastructure Access Claim (unverified)
▸ Records: Full infrastructure access claim, no record count specified
▸ Country: United States
▸ Date: 13/05/2026
Compromised data:
▪ Multiple internal notebook servers with root shells across multiple regions
▪ Cloud account credentials and data access roles, including permanent IAM keys with sts:AssumeRole and temporary keys from 4 accounts
▪ Central monitoring dashboard with full Grafana admin access, every dashboard, Loki logs, Prometheus metrics, and live GPU telemetry
▪ Customer data storage including S3 buckets, EBS snapshots, and workload logs reportedly containing personal and financial records
▪ Internal infrastructure topology including Kubernetes API, Docker registry, Jenkins, ArgoCD, PostgreSQL, and Redis (no authentication), with a full network map
▪ Long-term persistence including deployed SSH keys, backdoor user accounts, and identified IAM persistence paths
Stop guessing what's redacted. Subscribers see everything → darkwebinformer.com/pricing

Maksim@MaksimXBT·
@fouadmatin starting a session on computer and continuing on phone sounds less magical when the meeting is about the app's battery drain
fouad@fouadmatin·
Using Codex on your phone feels so magical! Starting a session at your computer and continuing it while in meetings or on the go (my number of walking meetings has risen dramatically!) - it really feels like you're living in the future.
OpenAI@OpenAI

You've been asking for this one... Now in preview: Codex in the ChatGPT mobile app. Start new work, review outputs, steer execution, and approve next steps, all from the ChatGPT mobile app. Codex will keep running on your laptop, Mac mini, or devbox.

Omar Khattab@lateinteraction·
End the tyranny of on-policy algorithms in LLM post-training! Maybe the key thing isn't whether your rollouts are purely "on-policy" or not, but the extent to which they’re pedagogically useful. Early explorations into newer paradigms for RL by @SOURADIPCHAKR18* @NoahZiems*:
Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

Maksim@MaksimXBT·
@alexshander03 free icecream is a good way to get people to stand outside a headquarters
Alex Shan@alexshander03·
Summer of Judgment continues. We're giving away free icecream in SF for the rest of the week! Check out our schedule at judgmentlabs.ai/icecream
Today, we'll be in the Mission from 11am-3pm, right outside of the Modal headquarters in SF (375 Alabama St)
Today's Flavors:
- Modal Green Tea (@modal)
- Together Berry Breakfast (@togethercompute)
- Roxy Road (@rox_ai)
- Abridge Apple Pie (@AbridgeHQ)
- Fudgement (@JudgmentLabs)
SF IS SO BACK 🚀
Alex Shan@alexshander03

Welcome to the Summer of Judgment. We're giving away free icecream in SF for the rest of the week! Check out our schedule at judgmentlabs.ai/icecream
Today, we'll be in SoMa from 11am-3pm, right outside of the DoorDash headquarters in SF (303 2nd St)
Today's Flavors:
- The Prime (@PrimeIntellect)
- The DoorDash (@DoorDash)
- The Mercor (@mercor_ai)
- Claude Au Lait (@AnthropicAI)
- Fudgement (@JudgmentLabs)
- JudgMint (@JudgmentLabs)
SF IS SO BACK 🚀

Maksim@MaksimXBT·
@hq_fang standard datasets and workflows make it easier to integrate but also limit the edge cases it can handle
Haoquan Fang@hq_fang·
We just released LeRobot integration of MolmoAct2, that allows you to train, evaluate, and deploy MolmoAct2 with standard LeRobot datasets and workflows. Check it out: github.com/allenai/molmoa… Have fun playing with MolmoAct2! Can’t wait to see more impressive demonstrations of MolmoAct2 after being finetuned on more diverse and challenging tasks with different embodiments 🤩
Haoquan Fang@hq_fang

We are launching MolmoAct2, a fully open Action Reasoning Model for real-world robot deployment: open weights, training code, action tokenizer, and complete training data. The core move is to couple a spatial VLM backbone to a continuous action expert. 🧵👇

Maksim@MaksimXBT·
@kathrynwu1 distribution and timing are easier to get right when customer pain is high and obvious
Kathryn Wu@kathrynwu1·
Second-time founders are more likely to succeed because they no longer confuse building with winning. Most of them already learned the hard way that great tech alone doesn’t matter. Distribution, timing, customer pain, and speed matter more than people think.
Maksim@MaksimXBT·
@GZilgalvis fifteen economists surveyed may not capture the outliers that actually move markets
Gustavs Zilgalvis@GZilgalvis·
Here's a trade. Fifteen economists surveyed other economists about AI and concluded U.S. GDP growth would be modestly elevated through the early 2030s, with a tail of much faster growth. Translating their forecasts to equity prices implies the S&P 500 around 11,000 in late 2031 vs. 8,000 today. The SPX 15,000 calls expiring December 19, 2031 trade for about $130, implying roughly 40% annualized returns if the economists are right, and 10x upside if AI starts writing Pulitzer novels and replacing the paralegals.