MrDizzie

652 posts

@MrDizzie

15 yo

Trenches Joined July 2025
47 Following 254 Followers
MrDizzie
MrDizzie@MrDizzie·
@cap100x Hey man, I do those things daily. I'm also building a trading platform with CT integrated, and I'd love to make this for you.
0
0
1
70
cap
cap@cap100x·
Looking for a dev to make me an automated CT tracker on TG, fully focused on perps, perps news, and anything that affects perp prices. Paying good $. Msg below if ur a dev or are interested in the job!
30
6
60
5.8K
MrDizzie
MrDizzie@MrDizzie·
if she says you bought enough sol just leave her
0
0
1
30
MrDizzie
MrDizzie@MrDizzie·
no one is atheist with 50x leverage
0
0
1
26
Yzn
Yzn@yazanbruv·
i got 10/10 score on my android app /design that i made for less than $1 via deepseekv4 @CommandCodeAI
Ahmad Awais@MrAhmadAwais

how did we fix the ai design slop problem in llms - DeepSeek/Kimi/Qwen or Claude/GPT?!

i've been thinking about "why do all ai-generated designs look the same?" is it a model problem or a harness problem?

context: we're fixing the llm design problem with `/design` for @CommandCodeAI - atm it has 16 modes, 24 reference documents, ~4,500+ lines of encoded design taste from some of the best designers in the world. it reads your codebase, identifies what's broken, and edits real files. no figma. no markdown mockups. the output stops looking like ai slop.

i've been staring at ai-generated uis for a while now and noticed something that i think is underappreciated: llms can write css fluently but have essentially zero design taste. and the failure mode is not random, it's a very specific, very small distribution.

let me explain.

when you ask a model to build a landing page, it reaches into the mode of its training distribution. the mode of all landing pages on the internet is: centered hero, gradient text, glassmorphism card, three identical feature tiles, indigo accent, Inter font, bounce animation. this is the "average website."

the llm is doing exactly what we trained it to do - predicting the most likely next token given "build a landing page." the most likely landing page is the average landing page. the average landing page is mediocre by definition.

this is not a capability problem. the model knows oklch(). it knows prefers-reduced-motion. it knows golden ratio. it knows how to set a 65ch measure. it just doesn't know when to use these things, because "when" is taste, and taste is not well-represented as a statistical prior over internet css.

so we thought what if we gave every llm a design taste with `/design`. here's what we found:

1/ the failure design dataset is surprisingly small.

we talked to a bunch of designers with great design taste and asked them to label AI-generated UIs. what are the tells?
turns out there are basically ~10 and they account for ~90% of the "this looks AI-generated" signal:

- tech gradient (blue-violet glossy energy on everything)
- generic tech hue (indigo because "software" not purple btw)
- feature tile grid (icon + heading + sentence x N, all equal weight, nothing prioritized)
- accent rail (colored stripe on card edge = decoration pretending to be organization)
- unearned blur (glassmorphism without a depth system)
- stat monument (oversized numbers filling space where a product story belongs)
- icon topper (rounded-square icon above every heading as template filler)
- bounce everywhere (elastic easing because the API has it, not because it's purposeful)
- default type (whatever font the training distribution likes this year)
- center stack (everything centered because no composition decision was made)

this is super similar to what we see in other llm tool failures. tool calling errors? 4-16 types. fixing that made deepseek outperform opus 4.7, i wrote about that before! so i started researching: maybe a dozen common patterns are design tells? 10. the failure distribution is narrow and we could repair ai design. this means it's a tractable and deterministic problem.

`/design smell` hunts all these and scores severity on a /10 scale.

2/ the deeper problem is compositional, not cosmetic.

the more interesting thing i found was that most of these tells are symptoms, not causes. the actual bug is that the model chooses layout before it chooses purpose.

a dashboard and a landing page have completely different jobs. a dashboard is a Monitor surface - status, alerts, metrics, live data. a landing page is a Decide surface - proof, risk reduction, one clear action. these need fundamentally different spatial compositions. but the LLM reaches for the same centered-hero-plus-cards layout for both, because that's the mode of the training distribution.

so we built work-pattern-first composition.
before the agent touches any visual property, it must identify which of 7 patterns the surface serves:

- Monitor: status boards, alerts, metrics, live priority
- Operate: command bars, canvases, inspectors, direct manipulation
- Compare: tables, matrices, split views, ranked lists
- Configure: grouped settings, forms, previews, commit areas
- Learn: article flow, walkthrough rhythm, progressive sections
- Decide: focused pitch, proof, risk reduction, one dominant action
- Explore: search, filters, maps, galleries, reversible discovery

this is essentially chain-of-thought for design - force the model to reason about the *purpose* of the layout before generating the layout.

i think there's a general lesson here. when an LLM is generating something compositional (code, UI, writing), forcing it to commit to a structural frame *before* generating tokens within that frame helps a lot. it's the same reason chain-of-thought helps with math. you're reducing the entropy of the generation by conditioning on a high-level plan.

this single constraint eliminated more generic-looking UIs than any aesthetic rule we wrote. many phenomenal skills exist in the space, i bet they had the taste for great design but didn't know they were fixing the chain-of-thought problem instead of the style problem. i think that's why their skills are super loopy instead of being reliably good.

3/ validate-then-repair, again.

my first version tried to audit and fix design simultaneously. this is what many design skills do, and they fail. it's the "preprocess" approach and it fails for the same reason it failed in tool calling: you're encoding a prior about what's broken, and you get false positives that silently corrupt things. it would recolor something that needed relayout, or polish typography on a composition that was fundamentally wrong.

the thing that worked: separate diagnostic from treatment, but make them a mandatory pair. audit modes (`checkup`, `smell`, `review`) produce structured reports.
treatment modes (`redesign`, `relayout`, `recolor`, `typeset`, `motion`, `interaction`, `responsive`) consume those reports before making changes. the audit localizes the problem. the treatment mode only spends "repair budget" where the audit actually disagreed.

same shape as tool calling repair. let the design system complain first, then fix only what it complained about. the validator does the localization work for you. cheap-then-careful, fast-path-then-evidence. i keep seeing this pattern everywhere.

treatment modes don't just do report cleanup. they run their own full pass after absorbing the report. the report is more context, it's not a todo list.

4/ why oklch() color fn matters for llms

personally, i always struggled a bit with the oklch() css fn but llms understand it super well. this one is fun.

llms default to hsl because that's what's in the training data. HSL lightness is perceptually nonlinear - hsl(60, 100%, 50%) (yellow) and hsl(240, 100%, 50%) (blue) have the same L value but look completely different to a human eye. so when the model tries to build a "consistent" palette by keeping L constant, the result looks wrong in ways the model can't diagnose from the css alone.

oklch has perceptually uniform lightness. this means the model can reason about color mathematically and have the result match perceptually. equal steps in the number space produce equal steps in the visual space. it's the right abstraction for an llm to work in, because it makes the optimization landscape smooth: small changes in the parameters produce small changes in the output. hsl has cliffs and plateaus everywhere.

i think this generalizes: when you're designing an interface for an llm to work through (whether it's a color space, a schema, or an api), choose representations where the distance in parameter space correlates with the distance in output space. the model optimizes over parameters.
if the mapping from parameters to outputs is nonlinear and full of discontinuities, the model will struggle even if it "knows" the right answer in principle.

we go further: the agent picks emotion before hue. calm vs urgency vs trust vs momentum. then it builds the palette in oklch with constraints - clamp chroma at lightness extremes, tint neutrals toward brand hue, 60-30-10 distribution. the agent can't default to indigo. the system requires a reason before a hue. no more indigo slop. and it's indigo, not purple.

5/ state coverage is the most honest metric.

the most quantitative signal we found: count the number of interaction states per component. a human designer ships 7-9 states (idle, hover, active, focus, loading, empty, error, disabled, overflow). an AI agent ships 1-2 (idle, maybe hover).

this is a clean, measurable proxy for design quality that requires zero subjective judgment. we just... count. does this button have a focus state? does this form handle empty? does this list handle overflow?

the median AI-generated component has 1.5 states. the median human-designed component has 6+. that's roughly a 4x gap, and it is trivially detectable.

6/ a meta-observation beats an infinite loop.

the biggest failure mode of AI design tools i found is you detect problem → attempt fix → the fix creates a new problem → attempt fix → loops forever. the agent re-runs the same mode hoping for a different result. it never converges.

we solved this with a reward model written in plain English. after each mode completes, the system recommends 2-3 specific next modes:

redesign → checkup, review (validate the change)
smell → finish, refine (fix what was found)
recolor → responsive, motion (test viewports, add transitions)
finish → typeset, recolor (fine-tune the details)

the flow is: build → audit → refine → style → frontend → ship. the agent knows what to do next instead of re-running what it just did.
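
A minimal sketch of the next-mode recommendation as a plain lookup, assuming a dictionary is enough to represent it. `NEXT_MODES` and `recommend_next` are hypothetical names for illustration, not Command Code's actual API; only the four mode pairs come from the post.

```python
# Hypothetical table: the mode pairs are copied from the post,
# the structure and names are assumptions.
NEXT_MODES = {
    "redesign": ["checkup", "review"],     # validate the change
    "smell": ["finish", "refine"],         # fix what was found
    "recolor": ["responsive", "motion"],   # test viewports, add transitions
    "finish": ["typeset", "recolor"],      # fine-tune the details
}


def recommend_next(completed_mode):
    """Return follow-up modes, never the mode that just ran."""
    candidates = NEXT_MODES.get(completed_mode, [])
    return [mode for mode in candidates if mode != completed_mode]


print(recommend_next("smell"))
```

Filtering out the mode that just completed is the part that breaks the loop: the recommendation can never be "run the same thing again."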
this is a trivial intervention - a lookup table, basically, but it eliminated the looping problem almost entirely, which is super common in most design skills out there.

7/ truthful completion is the hardest constraint.

the most insidious AI design behavior: claiming work that isn't visible. "added hover states" when no hover CSS was written. "improved spacing" when margins didn't change. "enhanced motion" when no keyframes exist.

every mode has a "bar" - the minimum visible change required for the mode to count as complete. `typeset` must change body text, heading scale, labels, button text, form text, metadata, and responsive behavior. changing only the hero headline is not enough. `motion` must add animation to at least 8 transition moments. changing one easing value is not enough.

the agent can't claim "motion improved" because it changed a duration from 200ms to 250ms. the user must be able to see new or clearly better behavior. this is surprisingly hard to enforce and the single most important quality constraint in the system.

8/ finally here's my meta-observation about design taste in general

what we built is basically a reward model for design, implemented as structured english instead of a neural network. it defines what good looks like across 24 reference documents, gives the llm a rubric, and lets it self-evaluate. the 10 smells are negative rewards. the 9 states are a completeness check. the 7 work patterns are a structural prior. i'm sure this will grow.

this is taste engineering in the limit. you're not writing instructions. you're writing a curriculum. the model already has the capability (it can write any CSS). what it lacks is the policy on when to use which capability, and what "good" looks like.

i find it interesting that the policy is so compact. ~4,500 lines to encode "design taste" well enough that the output passes designer review. that suggests taste (at least for UI design) is lower-dimensional than it feels.
it's not an infinite space of subjective preferences. it's a finite set of principles, applied consistently, with a small catalog of common violations.

the model didn't change. we told it what good taste looks like.

same lesson as tool calling: "capability gap" is usually "contract gap." the model knows how to write css. it just hasn't been told what good css looks like for *this specific surface*.

i now believe that different llms have different baseline design capabilities, but it's your coding agent, the harness, that makes the difference in the end. the model didn't get better at design. the harness taught it what designers actually look for.

i'm sharing my learnings so every harness out there can benefit not just our agent.

try it yourself with what we built in Command Code. `npm i -g command-code && cmd` then `/design smell` on any project. read the md or html report.

i care about design more than most engineers do, and seeing this work feels super good. a lot of what looks like a model capability gap is actually a contract gap. fix your harness. design slop is your "coding agent skill issue," not the model's.
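
A small sketch of the HSL point from section 4/: the yellow and blue named there share the same HSL lightness, yet their measured brightness differs widely. This uses WCAG relative luminance as a stand-in for perceived lightness (an assumption for illustration; it is not the OKLCH lightness itself), and the function names are hypothetical.

```python
import colorsys


def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(hue_deg, saturation, lightness):
    """WCAG-style relative luminance of an HSL color (s, l in [0, 1])."""
    # colorsys takes arguments in HLS order, not HSL.
    r, g, b = colorsys.hls_to_rgb(hue_deg / 360.0, lightness, saturation)
    r, g, b = (srgb_to_linear(min(max(c, 0.0), 1.0)) for c in (r, g, b))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


yellow = relative_luminance(60, 1.0, 0.5)   # hsl(60, 100%, 50%)
blue = relative_luminance(240, 1.0, 0.5)    # hsl(240, 100%, 50%)
print(f"yellow: {yellow:.3f}  blue: {blue:.3f}")
```

Both colors have L = 50% in HSL, but the luminance comes out at about 0.93 for yellow and about 0.07 for blue, which is the kind of mismatch a palette built on constant HSL lightness runs into.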

2
0
6
552
MrDizzie
MrDizzie@MrDizzie·
I’m looking to connect with founders across the world. DM me if you are a founder, let’s build a community!
1
0
2
64
MrDizzie
MrDizzie@MrDizzie·
Building the Bloomberg Terminal for on-chain DeFi. Raising $1M.
Looking for:
1. Memecoin traders
2. Solana, BNB and BASE builders
3. UI/UX Designer
4. Investors
Drop a comment or tag someone that might be interested.
4
2
5
227
MrDizzie
MrDizzie@MrDizzie·
2 months ago some mf tried to buy 50% of @VerveExchange for 100k. I said no, and to this day I haven’t been able to raise anything. I still don’t regret my decision. I know my startup’s worth, and it is definitely worth more than 200k!
1
1
3
536
MrDizzie
MrDizzie@MrDizzie·
this week I had an interview with @nullfellows. super chill guys, I hope I get accepted tho. See you all there
0
0
4
127
MrDizzie
MrDizzie@MrDizzie·
kindly share your GitHub profile, i wanna judge you
0
0
1
55
MrDizzie
MrDizzie@MrDizzie·
Real leverage is about upgrading your orchestration. The tool is only as sharp as the operator. I'm dropping a deep-dive workflow on agent chaining tomorrow. Bookmark this thread so you don't lose it, and drop your biggest bottleneck below.
0
0
1
41
MrDizzie
MrDizzie@MrDizzie·
99% of people use AI completely wrong. they treat it like a Google search bar instead of treating it like an elite team of engineers. here are 5 mental shifts that will put you years ahead of the crowd: 👇
6
0
1
51
MrDizzie
MrDizzie@MrDizzie·
6. Build Multi-Agent Chains. A single chatbot has cognitive limits. Real leverage happens when you chain them. Have Agent A build the concept, Agent B act as a brutal critic to tear it apart, and Agent C rewrite it. You run a synthetic company.
0
0
2
32
MrDizzie
MrDizzie@MrDizzie·
4. It’s a reasoning engine, not an encyclopedia. Stop asking AI for static facts from a year ago. Feed it the raw documentation, the current codebase, or the API spec first. Give it the sandbox, then tell it to build.
0
0
1
27
MrDizzie
MrDizzie@MrDizzie·
3. The 1-Prompt Delusion. Amateurs expect perfection on the first try. Pros know it’s an iterative loop. Feed the output back in and ask: "What are the 3 weakest points of your own response, and how do we fix them?" Let it fight itself.
0
0
2
46
MrDizzie
MrDizzie@MrDizzie·
2. Shift to System Architecture. The best builders in 2026 aren't wasting hours debugging syntax. They orchestrate parallel agents. Stop trying to write every line of code. Act as the Lead Architect and let AI handle the execution.
0
0
1
24
MrDizzie
MrDizzie@MrDizzie·
1. Stop giving empty prompts. Asking AI to "write a strategy" gets you generic fluff. It needs raw constraints to thrive. Give it a specific Role, a Target Dataset, and a Negative Constraint. Tell it exactly what not to do.
0
0
1
19
MrDizzie
MrDizzie@MrDizzie·
Looking for people to do insane motion graphics for me. DM me if you are ready to make the best motion graphic video in the space.
29
1
39
1.4K