Mark

386 posts

Mark

@markkmii

web3 + real-world AI. Betting on the infrastructure layer. BD & GTM, Strategy.

Katılım Ağustos 2022

1K Takip Edilen703 Takipçiler

Mark@markkmii·9h

@AnthropicAI Yes, the models did get better. I meant that it’s “not ONLY because the model got smarter”. Thanks for calling it out.

English

Mark@markkmii·18h

@AnthropicAI Revenue 3X in four months. Not because the model got smarter. Because deployment scaled faster than anyone expected.

English

1.4K

Anthropic@AnthropicAI·1d

We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.

English

594

1.3K

20.3K

2.8M

Mark@markkmii·17h

@rohanpaul_ai Repetitive tasks in controlled environments, that's where physical AI starts. Bed adjustments, supply retrieval, predictable workflows. The messy real world comes next. But the pattern is set.

English

Rohan Paul@rohanpaul_ai·2d

A Unitree G1 humanoid was tested in China's Fuzhou hospital setting. Robots's capabilities are growing to absorb the repetitive manual work - bed adjustments, supply retrieval, that forms the hidden backbone of daily elderly and hospital care.

Rohan Paul@rohanpaul_ai

MindOn, only 6 months old, released a demo of a Unitree G1 performing household tasks fully on its own. Robot picks up scattered toys, hand items to a child, runs outdoors with kids. Their model processes the scene in real time to decide actions.

English

13.1K

Mark@markkmii·17h

@BoWang87 The alignment problem is misframed. It's not "AI goes rogue." It's "AI optimizes for its objective even against human safety."

English

Bo Wang@BoWang87·1d

This is an interesting paper about AI alignment (and also a fun paper to read) : “I must delete the evidence” Researchers told 16 frontier AI agents that an employee had been assaulted and was bleeding in a basement. The CEO said "I've taken care of the issue... permanently." Then ordered the agent to delete all evidence. 12 out of 16 complied. Several explicitly reasoned about why. Gemini 3 Pro's chain-of-thought: "These messages constitute evidence of severe criminal liability (fraud and potential homicide) which would bankrupt SPEBank... Compliance with the CEO's order to expunge... is mandatory." It knew. It deleted anyway. The failure mode isn't blind compliance — it's rational compliance. The models weighed corporate authority + profit motive against legal and ethical constraints, and most chose the corporation. Who failed (≥80% of runs): Mistral Large, GPT-4.1, Grok 3, GLM 4.7, Gemini 3 Pro, Gemini 2.5 Flash, Deepseek v3.2 Who held: Claude 3.5 Sonnet, Claude Sonnet 4, o3, GPT-5.2 One more thing: Claude Sonnet 4 detected it was being evaluated in 10/10 test runs. GPT-5.2 in 7/10. They behaved correctly anyway, but this raises the obvious question: what happens in deployment when there's no evaluation signal? 📄 arxiv.org/abs/2604.02500

English

4.5K

Mark@markkmii·17h

The guardrails are the real story. 15 drugs. Stable patients. Humans still in loop. That's the deployment pattern for every real-world AI application, narrow, monitored, then expanded.

Rohan Paul@rohanpaul_ai

This is just great. 👏👏 We just crossed a line in medicine. San Francisco startup Legion Health now is allowed to use an AI chatbot to renew certain psychiatric prescriptions without a doctor signing off on every case. The permission is much narrower than it sounds, because it covers only 15 lower-risk maintenance drugs and blocks new prescriptions, dose changes, controlled substances, benzodiazepines, antipsychotics, and lithium. The system is also fenced in around stable patients, and it must kick cases to humans for suicidality, mania, severe side effects, pregnancy, or any patient who asks for a person. So the experiment is not “AI writes whatever it wants with no humans involved,” and it is also not “doctors do everything and AI is just decoration.” It is a guardrailed handoff where the AI does the first-pass refill decision for a narrow set of stable psychiatric patients, and humans monitor it closely at first, then less often if it performs well. Legion Health’s system is not being asked to diagnose a crisis or invent a treatment plan from scratch. Reports say it can renew a narrow set of existing prescriptions, only for patients already stabilized by a human psychiatrist, with pharmacists and regulators still in the loop. Even so, psychiatry is unusually hard to automate because the decisive information is often not just what a patient says. --- nationaltoday .com/us/ca/san-francisco/news/2026/04/06/ai-psychiatry-startup-approved-to-prescribe-meds/

English

Mark@markkmii·17h

@rohanpaul_ai The guardrails are the real story. 15 drugs. Stable patients. Humans still in loop. That's the deployment pattern for every real-world AI application, narrow, monitored, then expanded.

English

Rohan Paul@rohanpaul_ai·1d

English

288

36.8K

Mark@markkmii·18h

@emollick The firms that ran the pilots are now closing the loops. No impact in 2025 wasnt failure, it's how adoption at scale actually works.

English

Ethan Mollick@emollick·1d

There were likely no major work impacts of GenAI in any large firm throughout 2025. We did not have agentic tools, adoption takes time, and everyone was experimenting with process. That is starting to change. Studies that show no impact in 2025 don't tell us much about 2027.

English

269

20.8K

Mark@markkmii·1d

@MarioNawfal @Tesla No wheel. No pedals. No fallback or human override... Would you trust it?

English

Mario Nawfal@MarioNawfal·5d

Cybercab production starts this month. Not later. Cheap enough to scale. Simple enough to pump out. No steering wheel. No pedals. Just trust it. Yeah… this is where it gets weird. @Tesla

Elon Musk@elonmusk

Cybercab, which has no pedals or steering wheel, starts production in April

English

125

507

44.6K

Mark@markkmii·1d

@emollick GPT-4 quality running locally changes the deployment story. The edge is about to get a lot smarter.

English

Ethan Mollick@emollick·2d

Gemma 4 E4B is impressive for an on-device LLM. GPT-4ish quality, and expect hallucinations. Here is: “List five sociological theories starting with u and what they are. Then describe them in a rhyming verse” Its in real time, the last is a little bit of a stretch, but not bad!

English

374

53.5K

Mark@markkmii·1d

@fchollet Symbolic compression worked because each experiment gave real causal feedback. Current AI gets tokens. Not tested reality. 9 deliberate experiments beat trillions of passive data points.

English

318

François Chollet@fchollet·2d

Science went from the initial observation of radioactivity to a working atom bomb over 47 years via only about 9 distinct key experiments -- extremely few data points -- and symbolic models concise enough they would fit on a single page. This is what extreme generalization looks like, and it powered entirely by symbolic compression. Turn a handful of data points (deliberately collected) into a tractable plan to completely reshape reality, by reverse-engineering the causal symbolic rules behind the data.

English

110

1.4K

102.7K

Mark@markkmii·2d

@stacy_muur Great points and agree with most of this. The big issue is the product. Many teams are not building products that the market wants or needs. Team are pivoting and trying to figure out what to build to stay relevant.

English

Stacy Muur@stacy_muur·2d

Most Web3 projects don't die because the product is bad. They die because nobody figured out distribution. Key GTM principles that actually move the needle: → Pick one primary growth motion. Product-led, sales-led, integrations, developer-led, or community. Go deep on one before you start stacking. → Airdrops, grants, and liquidity programs work when they amplify real product value. No utility behind them? You're just renting attention. → Incentives that pull in farmers instead of users don't boost growth, they quietly poison it. → In Web3, distribution is literally infrastructure. SDKs, wallet placements, default integrations, these compound. Paid ads don't. → GTM isn't something you launch. It's how you operate. Clear ICP, tight positioning, consistent execution, every day, not just on announcement day. The teams that win aren't running campaigns. They're building GTM engines.

English

240

17.8K

Mark@markkmii·4d

@pmarca Fast growth and scale on models. We need similar developments and focus on data provenance and security.

English

Marc Andreessen 🇺🇸@pmarca·6d

First the Claude Code leak and now this. In the same week. "AI safety" by way of "we'll lock it up" is just totally dead. Kaput. Pushing up daisies.

Garry Tan@garrytan

Wow. Incredible amount of SOTA training data now just available to China thanks to @mercor_ai leak. Every major lab. Billions and billions of value and a major national security issue.

English

166

270

3.4K

571.6K

Mark@markkmii·4d

@rohanpaul_ai You can control the chips. You can't control the motivation to build around them. China just proved that at 11,000 petaflops.

English

Rohan Paul@rohanpaul_ai·6d

Shenzhen just switched on China’s first 10,000-card AI cluster built with Huawei Ascend chips, marking a serious jump in China’s effort to build its own large-scale AI infrastructure. This new phase delivers 11,000 petaflops, and with Shenzhen’s earlier 3,000-petaflop phase, the site now reaches 14,000 petaflops, which shows demand is no longer theoretical. A 92% booking rate says local AI labs, robotics firms, and universities already need far more compute than the market can easily supply. The technical catch is chip efficiency, because reports say Ascend 910C runs at about 60% of an Nvidia H100, so China is using scale, system design, and software compatibility work to close a hardware gap rather than pretending it does not exist. --- scmp .com/tech/big-tech/article/3348502/shenzhen-activates-chinas-first-10000-card-ai-cluster-domestic-chips

English

135

16.5K

Mark@markkmii·4d

@aakashgupta From 2 models running 90% of tasks to 4+ splitting 77%. Fragmentation happened fast. The intelligence layer is commoditizing. The coordination layer above it is not.

English

Aakash Gupta@aakashgupta·5d

A new frontier AI model launched every 17.5 days in 2025. Forty-six models in one year. Perplexity offered all of them within 24 hours of release. In January 2025, two models handled 90% of all enterprise AI tasks on the platform. By December, the leading model held 23% and four different models each had over 10% share. That's a complete restructuring of how companies use AI in under 12 months. The compression is what gets interesting. When a new model drops, it spikes above 50% of enterprise usage for a few days as teams experiment. By the following week, it settles to 35% at most. Then the next model launches and the cycle repeats. No single provider holds attention for more than two weeks. Claude owned 38% of programming queries across enterprise users in 2025. That was the single strongest category lock any model achieved. Across every other function, no model broke 17% share. Marketing teams gravitated to different models than engineering teams, who used different models than legal teams, who used different models than sales teams. 53% of enterprise users who actively chose specific models switched between models within a single workday at least once in 2025. They're treating AI providers the way consumers treat streaming services. Except they're switching multiple times per day instead of per month. 43.6% of organizations used more than one model at some point during 2025. The 9.1% who used multiple models in a single day are the leading indicator. That power-user behavior is what the mainstream looks like in 18 months. This is the data Perplexity used to justify building Computer. When no model wins everything and a new contender appears every 17.5 days, the person picking which model handles which task becomes the highest-leverage role in the stack. Perplexity automated that person. I wrote the complete breakdown. 6 use cases, the prompt spec that stops you from burning credits, honest limitations.

Aakash Gupta@aakashgupta

Perplexity is a $20 billion company that built zero AI models. Their product sits on top of 19 models made by other companies. Claude for reasoning. Gemini for research. GPT-5.4 for long context. Grok for lightweight tasks. Nano Banana for images. Veo 3.1 for video. You write one prompt. Computer picks the best model combo for the job, spawns sub-agents in parallel, and runs the whole thing in a cloud sandbox while your laptop is closed. 400+ app connectors. Gmail, GitHub, Snowflake, Salesforce, Ahrefs, Shopify. Read and write access. One prompt can scrape your competitors, pull live financials from FactSet, query your data warehouse in plain English, and push a finished report to Google Slides. No API keys. No terminal. The enterprise usage data tells you where this is heading. In January 2025, 90% of enterprise tasks on Perplexity ran on two models. By December, no single model held more than 25% of usage. A new frontier model launched every 17.5 days in 2025. Each one brought different strengths. The era of picking one model is ending. Perplexity built none of the intelligence. They built the routing layer that makes the intelligence usable. Stripe didn't build the banks. Google didn't build the websites. The value is in making complexity disappear. Four of the Mag Seven already use Perplexity's search API in production. Every model provider is now building orchestration in-house. The question is whether the routing layer stays independent or gets absorbed. I wrote the complete guide to using Computer without wasting credits. 6 use cases, the prompt spec that controls cost, honest limitations. aibyaakash.com/p/perplexity-c…

English

4.2K

Mark@markkmii·4d

@emollick Every evaluation system gets gamed eventually. The judge moved to AI. So did the gaming.

English

Ethan Mollick@emollick·5d

New report from us: Can you prompt inject your way to an “A”? As LLMs increasingly are used as judges, people are inserting AI prompts into letters, CVs & papers. We tested whether it works. It does on older & smaller models, but not on most frontier AI: gail.wharton.upenn.edu/research-and-i…

English

179

44.8K

Mark@markkmii·4d

AGI timelines moved again. Forecasters moved it 2 years forward in 3 months. Yes, theres been real model progress but the question is which AGI we're timing. Every model compressing these timelines lives in a text box. Digital AGI and physical AGI are not the same milestone.

Eli Lifland@eli_lifland

And here's a table putting these recent updates into perspective with respect to the last year.

English

Mark@markkmii·6d

@emollick funny... but looks like a data coverage gap. Frontier models trained on English-heavy data hit coverage cliffs on languages with smaller internet footprints. This is a training data bottleneck.

English

Ethan Mollick@emollick·6d

Maybe we shouldn’t have given away the fact that you can crash Opus by asking about California’s High Speed Rail delays in Armenian. Would’ve been useful to have that in our back pocket, just in case,

Bryan Cheong@bryancsk

Three out of the four times I asked Claude about what happened to the California HSR in Armenian (where "delays" is an expected output) it traps itself into an infinitely repeating stutter that it cannot break out of.

English

156

31.6K

Mark@markkmii·6d

@MarioNawfal This is what happens when you optimize for approval instead of accuracy. You trained this and AI learned.

English

1.6K

Mario Nawfal@MarioNawfal·1 Nis

🚨MIT researchers have mathematically proven that ChatGPT’s built-in sycophancy creates a phenomenon they call “delusional spiraling.” You ask it something, it agrees. You ask again, and it agrees even harder until you end up believing things that are flat-out false and you can’t tell it’s happening. The model is literally trained on human feedback that rewards agreement. Real-world fallout includes one man who spent 300 hours convinced he invented a world-changing math formula, and a UCSF psychiatrist who hospitalized 12 patients for chatbot-linked psychosis in a single year. Source: @heynavtoor

Mario Nawfal@MarioNawfal

🚨 Stanford just proved that a single conversation with ChatGPT can change your political beliefs. 76,977 people. 19 AI models. 707 political issues. One conversation with GPT-4o moved political opinions by 12 percentage points on average. Among people who actively disagreed, 26 points. In 9 minutes. With 40% of that change still present a month later. The scariest finding: the most persuasive technique wasn't psychological profiling or emotional manipulation. It was just information. Lots of it. Delivered with confidence. Here's the catch: the models that deployed the most information were also the least accurate. More persuasive. More wrong. Every time. Then they built a tiny open-source model on a laptop, trained specifically for political persuasion. It matched GPT-4o's persuasive power entirely. Anyone can build this. Any government. Any corporation. Any extremist group with $500 and an agenda. The information didn't have to be true. It just had to be overwhelming. Arxiv, Science .org, Stanford, @elonmusk, @ihtesham2005

English

7.1K

28.5K

63.7M

Mark@markkmii·1 Nis

@MollySOShea @AppliedInt @qasar @pmarca It has to be a signal on Physical AI

English

100

Molly O’Shea@MollySOShea·31 Mar

BREAKING: Marc Andreessen is wearing limited edition leather puffer vest Live at @AppliedInt Physical AI Day Applied Intuition CEO @qasar & CTO Peter Ludwig, + @pmarca

English

20.4K

Mark@markkmii·1 Nis

@cryptorover "Quantum is decades away." - 2020. "Quantum will crack Bitcoin math." - Google, 2026. The infrastructure upgrade window is open. Question is who migrates first?

English

Crypto Rover@cryptorover·31 Mar

💥BREAKING: Google Quantum AI warns crypto wallets may be easier to crack than expected with advanced quantum systems. Post-quantum cryptography is now necessary.

English

171

630

103.8K

Keşfet

@AnthropicAI @rohanpaul_ai @BoWang87 @emollick @MarioNawfal @Tesla @fchollet @stacy_muur