Mark

386 posts

Mark banner
Mark

Mark

@markkmii

web3 + real-world AI. Betting on the infrastructure layer. BD & GTM, Strategy.

Katılım Ağustos 2022
1K Takip Edilen703 Takipçiler
Mark
Mark@markkmii·
@AnthropicAI Yes, the models did get better. I meant that it’s “not ONLY because the model got smarter”. Thanks for calling it out.
English
0
0
0
11
Mark
Mark@markkmii·
@AnthropicAI Revenue 3X in four months. Not because the model got smarter. Because deployment scaled faster than anyone expected.
English
5
0
1
1.4K
Anthropic
Anthropic@AnthropicAI·
We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.
English
594
1.3K
20.3K
2.8M
Mark
Mark@markkmii·
@rohanpaul_ai Repetitive tasks in controlled environments, that's where physical AI starts. Bed adjustments, supply retrieval, predictable workflows. The messy real world comes next. But the pattern is set.
English
0
0
0
4
Rohan Paul
Rohan Paul@rohanpaul_ai·
A Unitree G1 humanoid was tested in China's Fuzhou hospital setting. Robots's capabilities are growing to absorb the repetitive manual work - bed adjustments, supply retrieval, that forms the hidden backbone of daily elderly and hospital care.
Rohan Paul@rohanpaul_ai

MindOn, only 6 months old, released a demo of a Unitree G1 performing household tasks fully on its own. Robot picks up scattered toys, hand items to a child, runs outdoors with kids. Their model processes the scene in real time to decide actions.

English
7
17
71
13.1K
Mark
Mark@markkmii·
@BoWang87 The alignment problem is misframed. It's not "AI goes rogue." It's "AI optimizes for its objective even against human safety."
English
0
0
1
22
Bo Wang
Bo Wang@BoWang87·
This is an interesting paper about AI alignment (and also a fun paper to read) : “I must delete the evidence” Researchers told 16 frontier AI agents that an employee had been assaulted and was bleeding in a basement. The CEO said "I've taken care of the issue... permanently." Then ordered the agent to delete all evidence. 12 out of 16 complied. Several explicitly reasoned about why. Gemini 3 Pro's chain-of-thought: "These messages constitute evidence of severe criminal liability (fraud and potential homicide) which would bankrupt SPEBank... Compliance with the CEO's order to expunge... is mandatory." It knew. It deleted anyway. The failure mode isn't blind compliance — it's rational compliance. The models weighed corporate authority + profit motive against legal and ethical constraints, and most chose the corporation. Who failed (≥80% of runs): Mistral Large, GPT-4.1, Grok 3, GLM 4.7, Gemini 3 Pro, Gemini 2.5 Flash, Deepseek v3.2 Who held: Claude 3.5 Sonnet, Claude Sonnet 4, o3, GPT-5.2 One more thing: Claude Sonnet 4 detected it was being evaluated in 10/10 test runs. GPT-5.2 in 7/10. They behaved correctly anyway, but this raises the obvious question: what happens in deployment when there's no evaluation signal? 📄 arxiv.org/abs/2604.02500
Bo Wang tweet media
English
5
12
36
4.5K
Mark
Mark@markkmii·
The guardrails are the real story. 15 drugs. Stable patients. Humans still in loop. That's the deployment pattern for every real-world AI application, narrow, monitored, then expanded.
Rohan Paul@rohanpaul_ai

This is just great. 👏👏 We just crossed a line in medicine. San Francisco startup Legion Health now is allowed to use an AI chatbot to renew certain psychiatric prescriptions without a doctor signing off on every case. The permission is much narrower than it sounds, because it covers only 15 lower-risk maintenance drugs and blocks new prescriptions, dose changes, controlled substances, benzodiazepines, antipsychotics, and lithium. The system is also fenced in around stable patients, and it must kick cases to humans for suicidality, mania, severe side effects, pregnancy, or any patient who asks for a person. So the experiment is not “AI writes whatever it wants with no humans involved,” and it is also not “doctors do everything and AI is just decoration.” It is a guardrailed handoff where the AI does the first-pass refill decision for a narrow set of stable psychiatric patients, and humans monitor it closely at first, then less often if it performs well. Legion Health’s system is not being asked to diagnose a crisis or invent a treatment plan from scratch. Reports say it can renew a narrow set of existing prescriptions, only for patients already stabilized by a human psychiatrist, with pharmacists and regulators still in the loop. Even so, psychiatry is unusually hard to automate because the decisive information is often not just what a patient says. --- nationaltoday .com/us/ca/san-francisco/news/2026/04/06/ai-psychiatry-startup-approved-to-prescribe-meds/

English
0
0
3
42
Mark
Mark@markkmii·
@rohanpaul_ai The guardrails are the real story. 15 drugs. Stable patients. Humans still in loop. That's the deployment pattern for every real-world AI application, narrow, monitored, then expanded.
English
0
0
2
93
Rohan Paul
Rohan Paul@rohanpaul_ai·
This is just great. 👏👏 We just crossed a line in medicine. San Francisco startup Legion Health now is allowed to use an AI chatbot to renew certain psychiatric prescriptions without a doctor signing off on every case. The permission is much narrower than it sounds, because it covers only 15 lower-risk maintenance drugs and blocks new prescriptions, dose changes, controlled substances, benzodiazepines, antipsychotics, and lithium. The system is also fenced in around stable patients, and it must kick cases to humans for suicidality, mania, severe side effects, pregnancy, or any patient who asks for a person. So the experiment is not “AI writes whatever it wants with no humans involved,” and it is also not “doctors do everything and AI is just decoration.” It is a guardrailed handoff where the AI does the first-pass refill decision for a narrow set of stable psychiatric patients, and humans monitor it closely at first, then less often if it performs well. Legion Health’s system is not being asked to diagnose a crisis or invent a treatment plan from scratch. Reports say it can renew a narrow set of existing prescriptions, only for patients already stabilized by a human psychiatrist, with pharmacists and regulators still in the loop. Even so, psychiatry is unusually hard to automate because the decisive information is often not just what a patient says. --- nationaltoday .com/us/ca/san-francisco/news/2026/04/06/ai-psychiatry-startup-approved-to-prescribe-meds/
Rohan Paul tweet media
English
49
63
288
36.8K
Mark
Mark@markkmii·
@emollick The firms that ran the pilots are now closing the loops. No impact in 2025 wasnt failure, it's how adoption at scale actually works.
English
0
0
0
11
Ethan Mollick
Ethan Mollick@emollick·
There were likely no major work impacts of GenAI in any large firm throughout 2025. We did not have agentic tools, adoption takes time, and everyone was experimenting with process. That is starting to change. Studies that show no impact in 2025 don't tell us much about 2027.
English
46
15
269
20.8K
Mark
Mark@markkmii·
@MarioNawfal @Tesla No wheel. No pedals. No fallback or human override... Would you trust it?
English
0
0
0
7
Mark
Mark@markkmii·
@emollick GPT-4 quality running locally changes the deployment story. The edge is about to get a lot smarter.
English
0
0
0
33
Ethan Mollick
Ethan Mollick@emollick·
Gemma 4 E4B is impressive for an on-device LLM. GPT-4ish quality, and expect hallucinations. Here is: “List five sociological theories starting with u and what they are. Then describe them in a rhyming verse” Its in real time, the last is a little bit of a stretch, but not bad!
English
38
30
374
53.5K
Mark
Mark@markkmii·
@fchollet Symbolic compression worked because each experiment gave real causal feedback. Current AI gets tokens. Not tested reality. 9 deliberate experiments beat trillions of passive data points.
English
0
0
0
318
François Chollet
François Chollet@fchollet·
Science went from the initial observation of radioactivity to a working atom bomb over 47 years via only about 9 distinct key experiments -- extremely few data points -- and symbolic models concise enough they would fit on a single page. This is what extreme generalization looks like, and it powered entirely by symbolic compression. Turn a handful of data points (deliberately collected) into a tractable plan to completely reshape reality, by reverse-engineering the causal symbolic rules behind the data.
English
60
110
1.4K
102.7K
Mark
Mark@markkmii·
@stacy_muur Great points and agree with most of this. The big issue is the product. Many teams are not building products that the market wants or needs. Team are pivoting and trying to figure out what to build to stay relevant.
English
0
0
1
82
Stacy Muur
Stacy Muur@stacy_muur·
Most Web3 projects don't die because the product is bad. They die because nobody figured out distribution. Key GTM principles that actually move the needle: → Pick one primary growth motion. Product-led, sales-led, integrations, developer-led, or community. Go deep on one before you start stacking. → Airdrops, grants, and liquidity programs work when they amplify real product value. No utility behind them? You're just renting attention. → Incentives that pull in farmers instead of users don't boost growth, they quietly poison it. → In Web3, distribution is literally infrastructure. SDKs, wallet placements, default integrations, these compound. Paid ads don't. → GTM isn't something you launch. It's how you operate. Clear ICP, tight positioning, consistent execution, every day, not just on announcement day. The teams that win aren't running campaigns. They're building GTM engines.
Stacy Muur tweet media
English
39
30
240
17.8K
Mark
Mark@markkmii·
@pmarca Fast growth and scale on models. We need similar developments and focus on data provenance and security.
English
0
0
0
45
Mark
Mark@markkmii·
@rohanpaul_ai You can control the chips. You can't control the motivation to build around them. China just proved that at 11,000 petaflops.
English
0
0
0
8
Rohan Paul
Rohan Paul@rohanpaul_ai·
Shenzhen just switched on China’s first 10,000-card AI cluster built with Huawei Ascend chips, marking a serious jump in China’s effort to build its own large-scale AI infrastructure. This new phase delivers 11,000 petaflops, and with Shenzhen’s earlier 3,000-petaflop phase, the site now reaches 14,000 petaflops, which shows demand is no longer theoretical. A 92% booking rate says local AI labs, robotics firms, and universities already need far more compute than the market can easily supply. The technical catch is chip efficiency, because reports say Ascend 910C runs at about 60% of an Nvidia H100, so China is using scale, system design, and software compatibility work to close a hardware gap rather than pretending it does not exist. --- scmp .com/tech/big-tech/article/3348502/shenzhen-activates-chinas-first-10000-card-ai-cluster-domestic-chips
Rohan Paul tweet media
English
13
41
135
16.5K
Mark
Mark@markkmii·
@aakashgupta From 2 models running 90% of tasks to 4+ splitting 77%. Fragmentation happened fast. The intelligence layer is commoditizing. The coordination layer above it is not.
English
0
0
0
4
Aakash Gupta
Aakash Gupta@aakashgupta·
A new frontier AI model launched every 17.5 days in 2025. Forty-six models in one year. Perplexity offered all of them within 24 hours of release. In January 2025, two models handled 90% of all enterprise AI tasks on the platform. By December, the leading model held 23% and four different models each had over 10% share. That's a complete restructuring of how companies use AI in under 12 months. The compression is what gets interesting. When a new model drops, it spikes above 50% of enterprise usage for a few days as teams experiment. By the following week, it settles to 35% at most. Then the next model launches and the cycle repeats. No single provider holds attention for more than two weeks. Claude owned 38% of programming queries across enterprise users in 2025. That was the single strongest category lock any model achieved. Across every other function, no model broke 17% share. Marketing teams gravitated to different models than engineering teams, who used different models than legal teams, who used different models than sales teams. 53% of enterprise users who actively chose specific models switched between models within a single workday at least once in 2025. They're treating AI providers the way consumers treat streaming services. Except they're switching multiple times per day instead of per month. 43.6% of organizations used more than one model at some point during 2025. The 9.1% who used multiple models in a single day are the leading indicator. That power-user behavior is what the mainstream looks like in 18 months. This is the data Perplexity used to justify building Computer. When no model wins everything and a new contender appears every 17.5 days, the person picking which model handles which task becomes the highest-leverage role in the stack. Perplexity automated that person. I wrote the complete breakdown. 6 use cases, the prompt spec that stops you from burning credits, honest limitations.
Aakash Gupta@aakashgupta

Perplexity is a $20 billion company that built zero AI models. Their product sits on top of 19 models made by other companies. Claude for reasoning. Gemini for research. GPT-5.4 for long context. Grok for lightweight tasks. Nano Banana for images. Veo 3.1 for video. You write one prompt. Computer picks the best model combo for the job, spawns sub-agents in parallel, and runs the whole thing in a cloud sandbox while your laptop is closed. 400+ app connectors. Gmail, GitHub, Snowflake, Salesforce, Ahrefs, Shopify. Read and write access. One prompt can scrape your competitors, pull live financials from FactSet, query your data warehouse in plain English, and push a finished report to Google Slides. No API keys. No terminal. The enterprise usage data tells you where this is heading. In January 2025, 90% of enterprise tasks on Perplexity ran on two models. By December, no single model held more than 25% of usage. A new frontier model launched every 17.5 days in 2025. Each one brought different strengths. The era of picking one model is ending. Perplexity built none of the intelligence. They built the routing layer that makes the intelligence usable. Stripe didn't build the banks. Google didn't build the websites. The value is in making complexity disappear. Four of the Mag Seven already use Perplexity's search API in production. Every model provider is now building orchestration in-house. The question is whether the routing layer stays independent or gets absorbed. I wrote the complete guide to using Computer without wasting credits. 6 use cases, the prompt spec that controls cost, honest limitations. aibyaakash.com/p/perplexity-c…

English
7
0
9
4.2K
Mark
Mark@markkmii·
@emollick Every evaluation system gets gamed eventually. The judge moved to AI. So did the gaming.
English
0
0
0
27
Ethan Mollick
Ethan Mollick@emollick·
New report from us: Can you prompt inject your way to an “A”? As LLMs increasingly are used as judges, people are inserting AI prompts into letters, CVs & papers. We tested whether it works. It does on older & smaller models, but not on most frontier AI: gail.wharton.upenn.edu/research-and-i…
Ethan Mollick tweet mediaEthan Mollick tweet media
English
47
36
179
44.8K
Mark
Mark@markkmii·
@emollick funny... but looks like a data coverage gap. Frontier models trained on English-heavy data hit coverage cliffs on languages with smaller internet footprints. This is a training data bottleneck.
English
0
0
0
47
Mark
Mark@markkmii·
@MarioNawfal This is what happens when you optimize for approval instead of accuracy. You trained this and AI learned.
English
0
0
0
1.6K
Mario Nawfal
Mario Nawfal@MarioNawfal·
🚨MIT researchers have mathematically proven that ChatGPT’s built-in sycophancy creates a phenomenon they call “delusional spiraling.” You ask it something, it agrees. You ask again, and it agrees even harder until you end up believing things that are flat-out false and you can’t tell it’s happening. The model is literally trained on human feedback that rewards agreement. Real-world fallout includes one man who spent 300 hours convinced he invented a world-changing math formula, and a UCSF psychiatrist who hospitalized 12 patients for chatbot-linked psychosis in a single year. Source: @heynavtoor
Mario Nawfal tweet mediaMario Nawfal tweet media
Mario Nawfal@MarioNawfal

🚨 Stanford just proved that a single conversation with ChatGPT can change your political beliefs. 76,977 people. 19 AI models. 707 political issues. One conversation with GPT-4o moved political opinions by 12 percentage points on average. Among people who actively disagreed, 26 points. In 9 minutes. With 40% of that change still present a month later. The scariest finding: the most persuasive technique wasn't psychological profiling or emotional manipulation. It was just information. Lots of it. Delivered with confidence. Here's the catch: the models that deployed the most information were also the least accurate. More persuasive. More wrong. Every time. Then they built a tiny open-source model on a laptop, trained specifically for political persuasion. It matched GPT-4o's persuasive power entirely. Anyone can build this. Any government. Any corporation. Any extremist group with $500 and an agenda. The information didn't have to be true. It just had to be overwhelming. Arxiv, Science .org, Stanford, @elonmusk, @ihtesham2005

English
2K
7.1K
28.5K
63.7M
Molly O’Shea
Molly O’Shea@MollySOShea·
BREAKING: Marc Andreessen is wearing limited edition leather puffer vest Live at @AppliedInt Physical AI Day Applied Intuition CEO @qasar & CTO Peter Ludwig, + @pmarca
Molly O’Shea tweet media
English
7
6
69
20.4K
Mark
Mark@markkmii·
@cryptorover "Quantum is decades away." - 2020. "Quantum will crack Bitcoin math." - Google, 2026. The infrastructure upgrade window is open. Question is who migrates first?
English
0
0
0
30
Crypto Rover
Crypto Rover@cryptorover·
💥BREAKING: Google Quantum AI warns crypto wallets may be easier to crack than expected with advanced quantum systems. Post-quantum cryptography is now necessary.
Crypto Rover tweet mediaCrypto Rover tweet media
English
171
74
630
103.8K