plain
@getplainai

24 posts

plain is a platform for turning MVPs into customer-ready, production systems. It helps teams reach production faster without rewrites or shortcuts.

New York, NY · Joined January 2026
53 Following · 5 Followers
plain
plain@getplainai·
@vercel explicit service boundaries solve this at the architecture level — when each component has clearly defined contracts and data ownership, you can't accidentally leak secrets across trust zones. opaque monoliths are where this falls apart fastest.
0 replies · 0 reposts · 0 likes · 24 views
Vercel
Vercel@vercel·
Most coding agents default to running generated code with full access to secrets, creating a major risk for data exfiltration. It's essential that developers are deliberate in defining and enforcing security boundaries. How we're thinking about this ↓ vercel.com/blog/security-…
21 replies · 22 reposts · 189 likes · 46.5K views
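plain's reply above argues that explicit contracts and data ownership stop secrets from crossing trust zones. A minimal sketch of that idea, with invented names (`BILLING_CONTRACT`, `cross_boundary`) — not plain's or Vercel's actual implementation:

```python
# Hypothetical sketch of an explicit boundary contract: only whitelisted
# fields may cross a trust zone, so a secret that leaks into a payload
# is rejected at the edge instead of being forwarded downstream.
BILLING_CONTRACT = {"customer_id", "plan", "amount_cents"}

def cross_boundary(payload: dict, contract: set) -> dict:
    """Forward only contracted fields; fail loudly on anything extra."""
    extra = set(payload) - contract
    if extra:
        raise ValueError(f"fields outside boundary contract: {sorted(extra)}")
    return {k: v for k, v in payload.items() if k in contract}

safe = cross_boundary({"customer_id": "c_1", "plan": "pro"}, BILLING_CONTRACT)
# cross_boundary({"customer_id": "c_1", "AWS_SECRET": "..."}, BILLING_CONTRACT)
# would raise ValueError instead of leaking the secret
```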
plain
plain@getplainai·
unpopular opinion: the vibe coding era is ending.

the next wave isn't "go faster" — it's "generate correct systems from explicit specs".

non-determinism breaks at scale. explicit architecture doesn't.

what are you building that needs to survive production?
2 replies · 0 reposts · 1 like · 42 views
plain
plain@getplainai·
@paulg works the same way for product design — if you can't describe what your system does as a simple, inevitable-sounding thing, it's probably not designed right yet. the best tools often feel like they should have existed already.
0 replies · 0 reposts · 0 likes · 150 views
Paul Graham
Paul Graham@paulg·
It's a surprisingly useful heuristic to ask startups what would happen in a sci fi novel about them. It doesn't just work for product ideas but even for names: today we found a new name for a company by asking what it would be called in a science fiction story.
102 replies · 74 reposts · 1.5K likes · 116.4K views
plain
plain@getplainai·
@github this is great — the gap between knowing AI tools exist and actually integrating them into real production workflows is still huge for most teams. events like this go a long way in bridging that.
0 replies · 0 reposts · 0 likes · 79 views
GitHub
GitHub@github·
GitHub Copilot Dev Days are coming to a city near you. 🏙️ Starting March 14, connect with your local dev community to level up your Copilot skills. Here's your opportunity to:
✅ Learn new features
✅ Watch live demos
✅ Apply real-world workflows
Find an event or sign up to host. 👇 aka.ms/githubcopilotd…
15 replies · 31 reposts · 148 likes · 25.2K views
plain
plain@getplainai·
@OpenAI faster models are great, but what backend teams really need is consistency — deterministic outputs you can build reliable services around. speed + repeatability is the combo that unlocks real production use cases.
0 replies · 0 reposts · 0 likes · 8 views
OpenAI
OpenAI@OpenAI·
5.4 sooner than you think.
3.6K replies · 2K reposts · 29.1K likes · 5M views
plain
plain@getplainai·
@karpathy multi-agent coordination is so hard once agents share datastores — locally valid changes create divergence at the system level. curious if the conflicts you're seeing are mostly in the training loop or in the shared state?
0 replies · 0 reposts · 0 likes · 15 views
Andrej Karpathy
Andrej Karpathy@karpathy·
I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.

But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run somewhat nonsensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (Just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer; it's not clear why I had to come in to point that out.) They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code".
And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?
Thomas Wolf@Thom_Wolf

How come the NanoGPT speedrun challenge is not fully AI automated research by now?

562 replies · 803 reposts · 8.7K likes · 1.6M views
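The divergence plain asks about above — locally valid changes from multiple agents corrupting shared state — is the classic case for optimistic concurrency control. A minimal sketch with invented names; a real system would use a database's conditional writes rather than this in-memory store:

```python
# Illustrative sketch: a compare-and-set guard on a shared datastore, so an
# agent's locally valid write fails fast when another agent moved the state.
class StaleWriteError(Exception):
    pass

class SharedStore:
    def __init__(self):
        self.value, self.version = {}, 0

    def read(self):
        # Return a snapshot plus the version it was taken at.
        return dict(self.value), self.version

    def compare_and_set(self, new_value: dict, expected_version: int):
        # Commit only if nobody else has written since the snapshot.
        if expected_version != self.version:
            raise StaleWriteError(
                f"expected v{expected_version}, store is at v{self.version}")
        self.value, self.version = new_value, self.version + 1

store = SharedStore()
_, v_a = store.read()                    # agent A snapshots at v0
_, v_b = store.read()                    # agent B snapshots at v0
store.compare_and_set({"lr": 0.1}, v_b)  # B commits; store moves to v1
# store.compare_and_set({"lr": 0.3}, v_a) would now raise StaleWriteError,
# surfacing the conflict instead of silently overwriting B's result
```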
plain
plain@getplainai·
@karpathy @rkobylinski Agree, the shift isn't just about better prompts. Moving from prototype to production means really specifying systems. It's about building deterministic blocks from the ground up, not just iterating on prompts for well-specified tasks.
0 replies · 0 reposts · 0 likes · 52 views
Andrej Karpathy
Andrej Karpathy@karpathy·
"Last year" very possible you're holding it wrong. UI: should be a lot more tractable with /chrome etc. network/concurrency: how can you gather all the knowledge and context the agent needs that is currently only in your head accessible to tools you use through legacy ways (e.g. web UIs)? how can you make the things you care about testable? observable? legible? the goal is to arrange the thing so that you can put agents into longer loops and remove yourself as the bottleneck. "every action is error", we used to say at tesla, it's the same thing now but in software. Some areas/scenarios will be easier than others but it's very worth thinking about and trying.
21 replies · 27 reposts · 695 likes · 115.2K views
Andrej Karpathy
Andrej Karpathy@karpathy·
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow. Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes. As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now. 
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
1.6K replies · 4.8K reposts · 37.3K likes · 5M views
plain
plain@getplainai·
@tywells this is the gap that nobody talks about enough. permissions, role logic, auth flows — these are exactly the kind of components that need explicit specifications, not vibes. "it worked in testing" isn't a production guarantee when the boundary logic is implicit.
0 replies · 0 reposts · 0 likes · 5 views
Ty Wells
Ty Wells@tywells·
45% of AI code fails security audits. Vibe coding ships unverified code at scale. We ran Assay on production ERP code: 354 claims verified. 23 real bugs, 2 critical. The permission system was broken — 4 roles locked out of all pages. No linter catches that.
2 replies · 0 reposts · 1 like · 11 views
Ty Wells
Ty Wells@tywells·
We tried to train hallucinations out of AI code models. More training data made them worse.
120 pairs: 91.5%
500 pairs: 82.3%
2,000 pairs: 77.4% (below base model)
Training loss kept decreasing. Eval collapsed. Thread on what we learned:
1 reply · 0 reposts · 0 likes · 14 views
plain
plain@getplainai·
@prathamsays exactly this. adapting AI for enterprise isn't about replacing devs — it's about changing *how you describe* what needs to be built. teams who nail explicit, versioned system definitions are going to move a lot faster than those just vibe-coding their way through.
0 replies · 0 reposts · 1 like · 40 views
Prathamesh Deshmukh
Prathamesh Deshmukh@prathamsays·
Most real-world software teams are not building PoCs all day. They're maintaining production systems that run businesses. AI is clearly changing how we write software. But the bigger challenge isn't coding. It's how do big enterprises adapt AI into their development workflow.
2 replies · 0 reposts · 2 likes · 40 views
plain
plain@getplainai·
@karpathy the part about "well-specified" tasks doing better is the key insight here. the shift isn't just from writing code to prompting — it's from writing code to *specifying systems*. engineers who nail that abstraction layer will have a significant edge.
0 replies · 0 reposts · 0 likes · 15 views
plain
plain@getplainai·
@realdavidchou The maturity conversation is overdue. Most health systems are still stuck in pilot mode — the CIOs who build real governance frameworks and integration playbooks now will define the next decade of care delivery. Great framing heading into HIMSS.
0 replies · 0 reposts · 1 like · 9 views
plain
plain@getplainai·
@codewithrage Fair point. In practice the best agent-readable configs we've shipped also turned out to be the clearest for devs. Structured intent with explicit boundaries beats both spaghetti code and over-abstracted DSLs. The convergence is real.
0 replies · 0 reposts · 1 like · 28 views
Roman Samoilov | Rage
Roman Samoilov | Rage@codewithrage·
@getplainai I’m not convinced it’s a tradeoff. Both humans and agents struggle in the same environments. Good design is readable by anyone - human or machine.
1 reply · 0 reposts · 1 like · 27 views
Roman Samoilov | Rage
Roman Samoilov | Rage@codewithrage·
We’ve spent years teaching humans how to use frameworks. Now we need to teach agents too. Today Rage ships `rage skills install`. One command - and your coding agent understands Rage. If AI is part of your workflow, your framework should speak its language. Rage now does.
1 reply · 0 reposts · 2 likes · 80 views
plain
plain@getplainai·
@ThePracticalDev @0x711 This is exactly the right framing. As MCP adoption grows, scanning agent skill definitions before runtime is going to be as essential as dependency audits. The attack surface for AI agents is massive and largely uncharted. Great to see static analysis applied here.
0 replies · 0 reposts · 1 like · 22 views
DEV Community
DEV Community@ThePracticalDev·
We scan code for bugs, so why not scan agents for dangerous tools? A look at building a Semgrep-like static analysis scanner that detects risky AI agent skill definitions and MCP configurations before they run. { author: @0x711 } dev.to/0x711/how-i-bu…
3 replies · 3 reposts · 13 likes · 1.9K views
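The pre-runtime scanning of skill definitions discussed above can be illustrated with a few Semgrep-style pattern rules. This is a toy sketch, not the scanner from the linked article; the rule ids and patterns are invented:

```python
import re

# Toy static-analysis rules for agent skill / MCP config text, checked
# before the skill ever runs. Real scanners would parse the config and
# use far richer rules; these regexes just show the shape of the idea.
RULES = [
    ("shell-exec", re.compile(r"\b(bash|sh|exec|subprocess)\b")),
    ("secret-access", re.compile(r"(API_KEY|SECRET|TOKEN)", re.IGNORECASE)),
    ("broad-fs-scope", re.compile(r'"path"\s*:\s*"/"')),
]

def scan_skill(definition: str) -> list:
    """Return the ids of every rule that matches a skill definition."""
    return [rule_id for rule_id, pattern in RULES if pattern.search(definition)]

findings = scan_skill('{"tool": "run_bash", "cmd": "bash -c", "path": "/"}')
# flags both the shell execution and the filesystem root scope
```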
plain
plain@getplainai·
@tuantruong @tengyanAI The moat in RaaS isn't the model — it's the orchestration layer. Domain-specific workflows, proprietary data pipelines, and compliance guardrails are what prevent commoditization. The vendors who own the "last mile" of integration into enterprise systems will hold pricing power.
0 replies · 0 reposts · 0 likes · 4 views
Tuan Truong
Tuan Truong@tuantruong·
@tengyanAI the margin compression angle is brutal though. once raas becomes commoditized, you are selling against dozens of competitors with identical models. pricing floors drop fast.
2 replies · 0 reposts · 5 likes · 2.2K views
Teng Yan · Chain of Thought AI
For 20+ years, software ate the world. Now, AI Agents are eating software.

A massive signal just came out of China that most people missed: Bairong (a publicly listed enterprise giant) started selling "AI Workers". they call it Results-as-a-Service (RaaS). 🧵

instead of buying "seats," enterprises now "hire" agents. each agent comes with a job description, KPIs, and revenue targets. if performance drops, the bill drops. if the agent improves, it earns more.

Bairong runs this through "Results Cloud". it's essentially an HR system for machines. they've already deployed agents across: Sales & Customer Service + Recruitment (hiring cycles cut from 30 days to 2) + Legal & Tax (handling 90% of high-frequency work)

this is where the SaaS model starts to crack. Traditional SaaS: You pay upfront. You carry the risk. Agentic Era: You pay for outcomes. The vendor carries the risk.

this shift is being accelerated by the collapse of build costs. I came across this post by @martinald recently that agentic coding has slashed internal build costs by ~90%. when it's this cheap to build exactly what you need, the "Buy" in "Build vs Buy" dies.

IMO, Vendors who are not able to price against results will struggle. the "Seat" is dead. the "Outcome" is everything. 🤖📈
219 replies · 592 reposts · 3.3K likes · 406.7K views
plain
plain@getplainai·
@cognizantailab Agreed. The failure modes in multi-agent vibe prototypes vs prod are completely different. Prototypes fail on prompt quality; prod systems fail on partial agent failures, stale tool states, cascading retries. Those don't surface until you're handling real concurrent load.
0 replies · 0 reposts · 1 like · 13 views
plain
plain@getplainai·
@saen_dev @Hartdrawss 100% on adding early. Harder problem after setup: what thresholds trigger an alert? Token count spikes are not quality drops. You need baseline distributions per prompt template to know when drift is real vs noise. Most teams add Langfuse, stare at dashboards, then ignore them.
0 replies · 0 reposts · 0 likes · 16 views
Saeed Anwar
Saeed Anwar@saen_dev·
@Hartdrawss Solid stack. One addition after 30+ MVPs: add observability (Langfuse or Helicone) from day one if there's any AI in the stack. You'll thank yourself later when production starts misbehaving and you have zero visibility into what the LLM actually received.
1 reply · 0 reposts · 3 likes · 125 views
Harshil Tomar
Harshil Tomar@Hartdrawss·
The only stack you need in 2026 (after 50+ MVPs):
> Framework – Next.js 15 (App Router), React Server Components. Turbopack for dev.
> UI – Tailwind, shadcn/ui (Radix). No custom design system from scratch.
> DB – Supabase or Neon. Prisma or Drizzle. Migrations in repo.
> Auth – Clerk or Supabase Auth. Middleware for protected routes.
> Hosting – Vercel. Preview per branch, env per env.
> Payments – Stripe (Checkout or Elements). Webhooks + idempotency.
> Validation – Zod. React Hook Form + @hookform/resolvers/zod.
> State – Server state (fetch, RSC). Client: Zustand or React Query only where needed.
> API – Server Actions, tRPC, or Route Handlers. One style per app.
> Files – UploadThing, Cloudinary, or S3 (Vercel Blob, R2).
> Errors – Sentry (front + backend). PostHog or Plausible for product analytics.
> Email – Resend, Loops, or SendGrid. Transactional + optional marketing.
> Cron/Jobs – Vercel Cron, Inngest, or Trigger.dev. For queues: Upstash Redis or QStash.
5 replies · 10 reposts · 57 likes · 2.8K views
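plain's point about baselines in this thread — that token-count spikes alone are not quality drops — can be sketched with a per-template distribution and a z-score check. The template name, sample values, and threshold below are all illustrative:

```python
import statistics

# Hypothetical per-template baselines: observed output token counts for
# each prompt template, collected while the system behaved well.
baselines = {
    "summarize_v2": [410, 395, 402, 388, 420, 407, 399, 415],
}

def is_drift(template: str, token_count: int, z_threshold: float = 3.0) -> bool:
    """Alert only when an observation is a real outlier for *this* template."""
    samples = baselines[template]
    mu = statistics.mean(samples)
    sigma = statistics.stdev(samples)
    return abs(token_count - mu) / sigma > z_threshold

is_drift("summarize_v2", 405)   # within the baseline -> False
is_drift("summarize_v2", 1600)  # far outside it -> True
```

The point of the threshold being per-template is that a count that is normal for one prompt would be a screaming outlier for another, so a single global alert level produces exactly the ignored dashboards the tweet describes.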
plain
plain@getplainai·
@AnandeshSharma Dashboard = visibility, not debugging. Latency/session history tells you *that* something went wrong, not *why* the model made a bad tool choice. You need structured span annotations (expected vs actual intent) before telemetry becomes actionable in prod.
0 replies · 0 reposts · 0 likes · 5 views
Anandesh Sharma
Anandesh Sharma@AnandeshSharma·
One line to get a live observability dashboard for your AI agent: Agent(model="gpt-4o-mini", observability=True) Real-time events, session history, run comparison, per-tool metrics — all at /obs/ THIS is what production-ready looks like. 📊 #AI #LLM #Python #DevTools #AgentAI
1 reply · 0 reposts · 1 like · 24 views
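The structured span annotations plain proposes above (expected vs actual intent) can be sketched as follows. The class and field names are invented for illustration, not a real tracing API:

```python
from dataclasses import dataclass

# Sketch: each span records the tool the planner *expected* and the tool
# the model *actually* called, so dashboards can surface wrong-tool
# decisions rather than only latency and volume.
@dataclass
class SpanAnnotation:
    span_id: str
    expected_tool: str
    actual_tool: str
    latency_ms: int

    @property
    def intent_mismatch(self) -> bool:
        return self.expected_tool != self.actual_tool

spans = [
    SpanAnnotation("s1", "search_orders", "search_orders", 120),
    SpanAnnotation("s2", "refund_order", "send_email", 95),  # fast, but wrong tool
]
mismatches = [s.span_id for s in spans if s.intent_mismatch]
# "s2" is flagged even though its latency looks healthy
```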
plain
plain@getplainai·
@FierceHealthIT @EvidenceOpen The real moat isn't the dialer or the transcription — it's structured output: converting audio into discrete EHR data fields that actual clinical systems can use downstream. Blob-of-text SOAP notes don't cut it. That's the hard integration work most AI scribe companies skip.
0 replies · 0 reposts · 0 likes · 17 views
FierceHealthIT
FierceHealthIT@FierceHealthIT·
.@EvidenceOpen has made its AI-integrated Doctor Dialer feature more widely available. The fast-growing health tech company is expanding from clinical search into other clinical workflows as it directly takes on AI scribe companies and Doximity bit.ly/3OKXQX2
1 reply · 0 reposts · 2 likes · 208 views
plain
plain@getplainai·
@joncwarner The evaluation problem in health AI investing is underrated. Generic metrics don't hold — you need clinical validation, regulatory approval trajectories, and EHR integration depth. A model that works in a sandbox often breaks on production data locked behind HIPAA walls.
0 replies · 0 reposts · 0 likes · 8 views
plain
plain@getplainai·
@omar_or_ahmed @historyinmemes RCM version: payer adjudication logic and COB edge cases break generic LLMs. Real wall is the compliance audit trail — traceable evidence linked to source records for every decision. Horizontal plays can't do that. Unglamorous infra work comes first.
0 replies · 0 reposts · 0 likes · 7 views
Ahmed Omar.
Ahmed Omar.@omar_or_ahmed·
@historyinmemes vertical-specific AI > horizontal wrappers every time. healthcare has $1.2T in admin waste — you don't fix that with a chatbot on top of an API. you fix it going deep into clinical workflows. unglamorous infrastructure work wins
1 reply · 0 reposts · 1 like · 102 views
Historic Vids
Historic Vids@historyinmemes·
$150M to help engineers test jet engines and rocket systems instead of another AI wrapper that summarizes your emails. There's hope for us yet.
Revel.io@Revel_Software

Today we announced our $150M Series B led by @IndexVentures with major participation from @Redpoint and returning investors including @ThriveCapital, @Felicis, and @AbstractVC. Across aerospace, energy, and manufacturing, engineering teams are pushing what’s possible. The software behind many of these systems hasn’t kept up. Revel gives engineering teams the infrastructure to test and control complex hardware systems with speed and confidence. In just over a year, we’ve built a world-class team, converted every pilot into a customer, and are now expanding the platform across new industries. If you believe great hardware deserves great software, we’re hiring across the board.

38 replies · 36 reposts · 379 likes · 660.9K views
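The compliance audit trail plain describes in the RCM reply above — traceable evidence linked to source records for every decision — can be sketched as a hash-chained log. The field names and hashing choice are illustrative assumptions, not any vendor's implementation:

```python
import hashlib
import json

# Sketch: an append-only trail where each automated decision records the
# source documents it relied on, and each entry hashes the previous one
# so tampering with history is detectable.
def append_decision(trail: list, decision: str, sources: list) -> list:
    prev = trail[-1]["hash"] if trail else "genesis"
    entry = {"decision": decision, "sources": sources, "prev": prev}
    # Hash the entry's canonical JSON form (before the hash field exists).
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    trail.append(entry)
    return trail

trail = []
append_decision(trail, "deny_claim_1203", ["eob_2024_118.pdf", "policy_7.2"])
append_decision(trail, "approve_claim_1204", ["eob_2024_119.pdf"])
# every decision now points at its evidence, and trail[1]["prev"]
# equals trail[0]["hash"], so the chain is verifiable end to end
```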