Mark5 Labs

208 posts

Mark5 Labs
@mark5lab

Design, Tech, Innovation

Cyberjaya, Malaysia · Joined January 2011
712 Following · 78 Followers
Mark5 Labs retweeted
Leon Lin @LexnLin
the prompt was: "github.com/Leonxlnx/taste… Based on the skill above, generate images for a website for an AI agency. The design should include eight sections, with one image per section, for a total of eight distinct images. The website is for a creative AI company focused on research in creativity and design. Because of that, I want the visuals to feel highly original, playful, and art-directed, with text integrated thoughtfully into the design. Make it feel ultra-creative and intentional, like an Awwwards SOTD-level website in both concept and execution. Please go beyond standard layouts. Do not rely only on simple text-left, image-right compositions. Explore more experimental and varied layouts. Feel free to go completely wild, but keep it purposeful, not random. I want different section structures, including horizontal images, fullscreen sections, full background imagery, and more minimal sections with beautiful colors and a strong sense of motion or animation. Please use full background images or strong full-background color compositions, not just plain white sections. Keep it in light mode. Overall, try to stay somewhat consistent across the site while still making each section feel distinct. I want it to look crazy creative, thoughtful, and visually impressive, with strong UX and a clear sense of purpose. Generate 8 different images total. Do not combine them into one image. Each image should represent one section of the website."
Leon Lin @LexnLin

Images 2.0 website. Takes one prompt. And now you can use Codex to turn them all into a real website.

40 replies · 165 reposts · 2.5K likes · 201.9K views
Mark5 Labs retweeted
Charly Wargnier @DataChaz
🚨 Karpathy was right. He warned that 90% of AI advice dies in 6 months. spoiler: most tools will not even survive 90 days. this guy is literally giving away the exact 2026 playbook for AI Agents. he covers what to learn, what to build, and what to skip 👀 ↓ read this today
Rohit @rohit4verse

x.com/i/article/2048…

72 replies · 429 reposts · 3.2K likes · 698.1K views
Mark5 Labs @mark5lab
That's a sharp observation - synthetic training scenarios specifically for trigger recognition is a more precise fix than a general 'hold your ground under pressure' reward signal. Feels like they identified the actual mechanism rather than just masking symptoms. Curious how they'd scale this to other high-stakes domains beyond relationships.
1 reply · 0 reposts · 1 like · 16 views
Tahseen Rahman @Tahseen_Rahman
@AnthropicAI The sycophancy-under-pushback pattern mirrors human psychology. Interesting the fix was synthetic training scenarios vs. a reward signal for holding positions under pressure — suggests the problem is recognizing the trigger, not having the backbone to resist it.
1 reply · 0 reposts · 0 likes · 181 views
Mark5 Labs retweeted
Anthropic @AnthropicAI
How do people seek guidance from Claude? We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview. anthropic.com/research/claud…
411 replies · 315 reposts · 3.4K likes · 1.9M views
Mark5 Labs retweeted
Mnimiy @Mnilax
Boris Cherny, the creator of Claude Code at Anthropic, just listed 9 patterns that waste 73% of your tokens. in this podcast he breaks down exactly how the model burns tokens before it even reads your prompt:
- the 14% you lose to CLAUDE.md before typing a word
- the 13% you pay re-reading old chat history
- the 11% from hooks you forgot you installed
- why most "Claude got dumber" complaints are wrong
if you're hitting Max limits more than once a week, you have at least 4 of these. Probably 7. instead of another show tonight, watch this. my own breakdown based on 400+ hours of usage is below, read it after the podcast
Mnimiy @Mnilax

x.com/i/article/2050…

98 replies · 552 reposts · 5.5K likes · 1.3M views
Mark5 Labs retweeted
luthira @luthiraabeykoon
We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇
272 replies · 701 reposts · 7.5K likes · 824.8K views
Mark5 Labs retweeted
Brett @BrettFromDJ
Not enough people talk about this: you can design in Figma, copy the CSS in Dev Mode, drop it into Claude, and it’ll build exactly what you designed.
125 replies · 152 reposts · 3.5K likes · 192.8K views
Mark5 Labs retweeted
Nous Research @NousResearch
Shopify is the all-in-one commerce platform powering millions of businesses worldwide. Thank you to the @Shopify team for building their own official Hermes Agent skill, enabling your agent to manage products, orders, inventory, and fulfillments from any channel.
131 replies · 199 reposts · 2.7K likes · 413.4K views
Mark5 Labs @mark5lab
@yahyavision Been using it daily. Not for finished work, but for exploring directions fast. Different use case entirely.
0 replies · 0 reposts · 2 likes · 694 views
Mark5 Labs retweeted
Meng To @MengTo
I made a tool that turns any URL into a clean DESIGN.md Extract layout, typography, colors, and component patterns from any site, then save them to your private library. It also includes 160+ downloadable design systems if you need a good starting point.
44 replies · 160 reposts · 1.8K likes · 95.6K views
Imran Hossen @uiuximran
@figma Big step forward 🚀 Bridging design and code like this is exactly what speeds everything up. Excited to see how far we can push prototypes now. See you on May 5!
1 reply · 0 reposts · 8 likes · 1.5K views
Figma @figma
Release Notes, EP-007 → Take your vibe-coded prototypes further in Figma → Connect design systems to code → Ship your best idea fast MAY 5, 9AM PT | 12PM ET
21 replies · 54 reposts · 729 likes · 65.6K views
Mark5 Labs @mark5lab
@karpathy The outsource/understand split hits different when you've been on both sides. Early hype makes you think you can skip the reps. Then reality checks you. What keeps surprising me is how much "understanding" is actually muscle memory built through failure.
0 replies · 0 reposts · 3 likes · 1.5K views
Mark5 Labs @mark5lab
@animriley The mediocre cover band analogy is too real. When every AI tool outputs the same aesthetic, taste becomes the differentiator. That's the hill designers should be planting their flag on.
0 replies · 0 reposts · 0 likes · 19 views
RileyTheGroove @animriley
@mark5lab Exactly. Flows are the syntax. Styles are the voice. Without this layer, every AI design tool sounds like the same mediocre cover band.
1 reply · 0 reposts · 0 likes · 27 views
Mark5 Labs @mark5lab
The approval layer is actually the smart part. Trust but verify - that's how you get enterprise adoption.
Stripe @stripe

Today, we’re launching the @link wallet for agents. It lets you securely empower agents to spend on your behalf. Your payment credentials are never exposed and you approve every purchase. link.com/agents

0 replies · 0 reposts · 0 likes · 19 views
Mark5 Labs @mark5lab
@bytecrafter_1 @AnthropicAI This is the most rigorous framing of the problem. Iteration access vs. one-shot performance - those are fundamentally different evaluation conditions. The 30% number is still impressive but you're right to push on methodology.
0 replies · 0 reposts · 0 likes · 19 views
ByteCrafter @bytecrafter_1
@AnthropicAI depends a lot on whether the experts got to iterate against feedback the way the model did. otherwise apples to oranges on the headline number
1 reply · 0 reposts · 1 like · 1.3K views
Anthropic @AnthropicAI
New on the Science Blog: We gave Claude 99 problems analyzing real biological data and compared its performance against an expert panel. On 23 problems, the experts were stumped. Our most recent models solved roughly 30% of those—and most of the rest.
210 replies · 253 reposts · 2.5K likes · 385K views
Mark5 Labs @mark5lab
The RL/reward gap is real - verifiable signals (tests, comps) are easier to optimize for than human preference. But I'd add a third dimension: the tooling gap. Group 2 people aren't just using frontier models, they're using them through well-crafted agentic frameworks. That's where most of the compounding advantage actually is. The model is table stakes, the orchestration is the moat.
0 replies · 0 reposts · 0 likes · 10 views
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work.

It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.

1.2K replies · 2.5K reposts · 20.6K likes · 4.3M views