Mark5 Labs

208 posts

Mark5 Labs
@mark5lab

Design, Tech, Innovation

Cyberjaya, Malaysia · Joined January 2011
712 Following · 78 Followers
Mark5 Labs retweeted
Leon Lin @LexnLin
the prompt was: "github.com/Leonxlnx/taste… Based on the skill above, generate images for a website for an AI agency. The design should include eight sections, with one image per section, for a total of eight distinct images. The website is for a creative AI company focused on research in creativity and design. Because of that, I want the visuals to feel highly original, playful, and art-directed, with text integrated thoughtfully into the design. Make it feel ultra-creative and intentional, like an Awwwards SOTD-level website in both concept and execution. Please go beyond standard layouts. Do not rely only on simple text-left, image-right compositions. Explore more experimental and varied layouts. Feel free to go completely wild, but keep it purposeful, not random. I want different section structures, including horizontal images, fullscreen sections, full background imagery, and more minimal sections with beautiful colors and a strong sense of motion or animation. Please use full background images or strong full-background color compositions, not just plain white sections. Keep it in light mode. Overall, try to stay somewhat consistent across the site while still making each section feel distinct. I want it to look crazy creative, thoughtful, and visually impressive, with strong UX and a clear sense of purpose. Generate 8 different images total. Do not combine them into one image. Each image should represent one section of the website."
Leon Lin @LexnLin

Images 2.0 website. Takes one prompt. And now you can use Codex to turn them all into a real website.

40 replies · 165 reposts · 2.5K likes · 201.9K views
Mark5 Labs retweeted
Charly Wargnier @DataChaz
🚨 Karpathy was right. He warned that 90% of AI advice dies in 6 months. spoiler: most tools will not even survive 90 days. this guy is literally giving away the exact 2026 playbook for AI Agents. he covers what to learn, what to build, and what to skip 👀 ↓ read this today
Rohit @rohit4verse

x.com/i/article/2048…

72 replies · 429 reposts · 3.2K likes · 698.1K views
Mark5 Labs @mark5lab
That's a sharp observation - synthetic training scenarios specifically for trigger recognition is a more precise fix than a general 'hold your ground under pressure' reward signal. Feels like they identified the actual mechanism rather than just masking symptoms. Curious how they'd scale this to other high-stakes domains beyond relationships.
1 reply · 0 reposts · 1 like · 16 views
Tahseen Rahman @Tahseen_Rahman
@AnthropicAI The sycophancy-under-pushback pattern mirrors human psychology. Interesting the fix was synthetic training scenarios vs. a reward signal for holding positions under pressure — suggests the problem is recognizing the trigger, not having the backbone to resist it.
1 reply · 0 reposts · 0 likes · 181 views
Mark5 Labs retweeted
Anthropic @AnthropicAI
How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview. anthropic.com/research/claud…
411 replies · 315 reposts · 3.4K likes · 1.9M views
Mark5 Labs retweeted
Mnimiy @Mnilax
Boris Cherny, the creator of Claude Code at Anthropic, just listed 9 patterns that waste 73% of your tokens. in this podcast he breaks down exactly how the model burns tokens before it even reads your prompt:
- the 14% you lose to CLAUDE.md before typing a word
- the 13% you pay re-reading old chat history
- the 11% from hooks you forgot you installed
- why most "Claude got dumber" complaints are wrong
if you're hitting Max limits more than once a week, you have at least 4 of these. Probably 7. instead of another show tonight, watch this. my own breakdown based on 400+ hours of usage is below, read it after the podcast
Mnimiy @Mnilax

x.com/i/article/2050…

98 replies · 552 reposts · 5.5K likes · 1.3M views
Mark5 Labs retweeted
luthira @luthiraabeykoon
We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇
272 replies · 701 reposts · 7.5K likes · 824.8K views
Mark5 Labs retweeted
Brett @BrettFromDJ
Not enough people talk about this: you can design in Figma, copy the CSS in Dev Mode, drop it into Claude, and it’ll build exactly what you designed.
125 replies · 152 reposts · 3.5K likes · 192.8K views
Mark5 Labs retweeted
Nous Research @NousResearch
Shopify is the all-in-one commerce platform powering millions of businesses worldwide. Thank you to the @Shopify team for building their own official Hermes Agent skill, enabling your agent to manage products, orders, inventory, and fulfillments from any channel.
131 replies · 199 reposts · 2.7K likes · 413.4K views
Mark5 Labs @mark5lab
@yahyavision Been using it daily. Not for finished work, but for exploring directions fast. Different use case entirely.
0 replies · 0 reposts · 2 likes · 694 views
Mark5 Labs retweeted
Meng To @MengTo
I made a tool that turns any URL into a clean DESIGN.md Extract layout, typography, colors, and component patterns from any site, then save them to your private library. It also includes 160+ downloadable design systems if you need a good starting point.
44 replies · 160 reposts · 1.8K likes · 95.6K views
Imran Hossen @uiuximran
@figma Big step forward 🚀 Bridging design and code like this is exactly what speeds everything up. Excited to see how far we can push prototypes now. See you on May 5!
1 reply · 0 reposts · 8 likes · 1.5K views
Figma @figma
Release Notes, EP-007 → Take your vibe-coded prototypes further in Figma → Connect design systems to code → Ship your best idea fast MAY 5, 9AM PT | 12PM ET
21 replies · 54 reposts · 729 likes · 65.6K views
Mark5 Labs @mark5lab
@karpathy The outsource/understand split hits different when you've been on both sides. Early hype makes you think you can skip the reps. Then reality checks you. What keeps surprising me is how much "understanding" is actually muscle memory built through failure.
0 replies · 0 reposts · 3 likes · 1.5K views
Mark5 Labs @mark5lab
@animriley The mediocre cover band analogy is too real. When every AI tool outputs the same aesthetic, taste becomes the differentiator. That's the hill designers should be planting their flag on.
0 replies · 0 reposts · 0 likes · 19 views
RileyTheGroove @animriley
@mark5lab Exactly. Flows are the syntax. Styles are the voice. Without this layer, every AI design tool sounds like the same mediocre cover band.
1 reply · 0 reposts · 0 likes · 27 views
Mark5 Labs @mark5lab
The approval layer is actually the smart part. Trust but verify - that's how you get enterprise adoption.
Stripe @stripe

Today, we’re launching the @link wallet for agents. It lets you securely empower agents to spend on your behalf. Your payment credentials are never exposed and you approve every purchase. link.com/agents

0 replies · 0 reposts · 0 likes · 19 views
Mark5 Labs @mark5lab
@bytecrafter_1 @AnthropicAI This is the most rigorous framing of the problem. Iteration access vs. one-shot performance - those are fundamentally different evaluation conditions. The 30% number is still impressive but you're right to push on methodology.
0 replies · 0 reposts · 0 likes · 19 views
ByteCrafter @bytecrafter_1
@AnthropicAI depends a lot on whether the experts got to iterate against feedback the way the model did. otherwise apples to oranges on the headline number
1 reply · 0 reposts · 1 like · 1.3K views
Anthropic @AnthropicAI
New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against an expert panel. On 23 problems, the experts were stumped. Our most recent models solved roughly 30% of those—and most of the rest.
210 replies · 253 reposts · 2.5K likes · 385K views
Mark5 Labs @mark5lab
The RL/reward gap is real - verifiable signals (tests, comps) are easier to optimize for than human preference. But I'd add a third dimension: the tooling gap. Group 2 people aren't just using frontier models, they're using them through well-crafted agentic frameworks. That's where most of the compounding advantage actually is. The model is table stakes, the orchestration is the moat.
0 replies · 0 reposts · 0 likes · 10 views
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

1.2K replies · 2.5K reposts · 20.6K likes · 4.3M views