Jake Lindsay

2.5K posts


@JakeLindsay

AI engineer, product manager, and mobile developer. Attempting to make sense of the AI content flood on YouTube.

Flynn's Arcade · Joined August 2023
87 Following · 138 Followers
Jake Lindsay@JakeLindsay·
OpenAI just dropped GPT 5.3 Instant in ChatGPT, and the main upgrades are a less cringe tone, fewer preachy disclaimers, and better web search accuracy. It is still an instant, non-reasoning model, so it is aimed more at lightweight everyday use than deep coding or long projects.

5.3 Instant is noticeably less emotionally over-validating than 5.2, and it gets to the point more like a normal human would. It is also less likely to add safety-sermon intros for benign stuff, like trajectory math for archery. Refusals are still very much a thing, though: asked about turning a turbo into a jet engine for a go-kart, it draws the line at step-by-step build instructions but still gives plenty of practical safety context and alternatives. Compared with Claude Sonnet 4.6 (no extended thinking), Sonnet comes off as more thorough and more grounded on engineering details, giving a more realistic build path and specifics (parts, materials, fuel system) that the instant models miss.

The bigger story is GPT 5.4, which looks like it is being quietly tested in some ChatGPT Pro accounts, based on leaked references and user reports. The claimed outputs look like a big jump: high quality voxel scenes, strong SVG generation, and much better 3D perspective translated into 2D code. One Pro test reportedly had GPT 5.4 think for 54 minutes to build a working flight combat sim with telemetry, NPC planes, and multiple airframes, and it worked first try, which points to a real increase in coding capability. Another tester says 5.4 Pro runs take longer overall (example: a 77 minute macOS simulation attempt), but the tradeoff is more robust and detailed results. To check whether you are getting routed to the new 5.4 Pro model, the transcript says to look for a specific thumbs-up and thumbs-down icon after running prompts in Pro. The creator has Pro but does not appear to have access yet.
Finally, Alibaba's Qwen 3.5 gets a shout as a strong open-source local model, efficient enough to beat models up to 4x its size, with toggleable reasoning and even a tiny 2B 6-bit version running locally on a phone (MLX-optimized for Apple silicon). The main takeaway: the gap between top closed models and what you can run locally is shrinking fast. GPT 5.4 Is Leaking in Pro Accounts — And It's a BEAST! MattVidPro | Len: 15:26 youtube.com/watch?v=FVmhsK… #AI #YouTube
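The archery mention is a good reminder of how benign the flagged prompts were; the trajectory math in question is just the standard no-drag projectile formula. A minimal sketch (illustrative numbers, not from the video):

```python
import math

def arrow_range(v0_mps: float, angle_deg: float, g: float = 9.81) -> float:
    """Horizontal range of a projectile launched from ground level, ignoring drag."""
    theta = math.radians(angle_deg)
    return v0_mps ** 2 * math.sin(2 * theta) / g

# A 60 m/s arrow loosed at 45 degrees lands about 367 m away (in a vacuum).
flat_range = arrow_range(60, 45)
```

Real arrows fall well short of this because of drag, but this is the kind of harmless homework question that used to trigger safety intros.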
Jake Lindsay@JakeLindsay·
The whole thing is just a transcription test, not real content. They’re checking two things: how accurate the timestamps are, and how accurate the transcript text is. We Just Doubled Our 20-Person Team With OpenClaw Agents. Come Watch Them Work. Every | Len: 56:15 youtube.com/watch?v=QFkO-b… #AI #YouTube
Jake Lindsay@JakeLindsay·
Claude Cowork can read and edit files in a folder on your local computer, and it works in agent loops until the task is done (unlike basic chat, which is just back and forth). That combo makes it way better for actually producing outputs like spreadsheets, docs, and drafts, not just brainstorming.

The big unlock is connectors. You can plug in stuff like Gmail, Slack, Figma, Google Calendar, and data sources like DataForSEO, then have Cowork run research and create files automatically (like an Excel keyword research sheet). It will ask clarifying multiple-choice questions, show a transparent to-do list of steps, and even proactively do helpful extras like building the spreadsheet without you explicitly asking. You can also queue up multiple messages while it is still working, so you keep momentum instead of waiting.

Model choice matters: Opus 4.6 is the most capable, Sonnet 4.6 is close behind and cheaper, and Haiku is best for simpler tasks. If you are on lower plans you may need to lean on Sonnet to avoid running out of Opus credits. Cowork can run multiple tasks in parallel, so you can have it writing a blog post while also doing something else like drafting an X article. You become the orchestrator watching several agent jobs run at once.

Scheduled tasks are built in via a /schedule command, but there is a major catch: everything runs locally, so schedules only fire while your computer is awake. That local-only setup also means you cannot easily access the same Cowork workspace from your phone or another computer, unless you route things through something like GitHub.

Skills are reusable instruction files that Cowork can automatically pull in when relevant, and multiple skills can load at once. The point is you can bake your preferences and examples (like YouTube hooks or old English rewriting) into the system so outputs match what you actually want.
Plugins add ready-made workflows with slash commands, like using Apollo to find and qualify leads and support outbound. Net net, Cowork shines when you treat it like an agent that can use your files plus tools to ship real work, not just talk. 10 Claude Cowork tips I wish I knew from the start No Code MBA | Len: 15:48 youtube.com/watch?v=mKBZGc… #AI #YouTube
Jake Lindsay@JakeLindsay·
Layer 13 in GPT-OSS-20B is basically a universal task classifier, and it can hit 100 percent confidence about what you are asking for, like "synonym" vs "integer math" vs "boolean". The model also rewrites inputs into the exact format its math circuits expect, like turning "45 * 45" with weird spacing into a clean "45*45" before it computes 2025.

This layer is not just math vs language. It also separates factual vs creative prompts and code vs natural language, and those separations show up as clearly different directions with identifiable neurons driving them. He shows the classifier vocabulary popping out in the logits at layer 13: things like synonym, arithmetic, sum, subtract, integer, decimal, counting, boolean, and fact. Layer 14 then tends to amplify the winning interpretation, and layer 15 routes the request into the right circuit, like math vs normal response generation.

The really spicy bit is that you can steer the model by pushing along these layer-13 directions. If you push a natural language question toward the "code" direction, the answer starts coming out as code instead of an explanation. Ablating layer 13 (zeroing it out) surprisingly does not break the model, which he attributes to heavy redundancy across layers. But sledgehammering a single key neuron like the "math detector" can quickly make outputs collapse into gibberish, because those neurons rely on other stabilizing neurons.

Big picture, he argues this structure is shaped by reinforcement learning with verifiable rewards and by the Responses API style channel architecture. The model learns to classify anything with a checkable reward signal, and when classification is uncertain or formatting is off, it routes into chain-of-thought rewriting to get the prompt into a form its specialized circuits can actually use. GPT-OSS-20B Has a Secret Layer That Classifies Every Question Before Answering Chris Hay | Len: 28:31 youtube.com/watch?v=tg7w0R… #AI #YouTube
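Direction-based steering like this is simple to sketch in the abstract. The following is a generic activation-steering toy, not the creator's actual probing code; the vector size and the "code direction" are made up for illustration:

```python
import numpy as np

def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Nudge a hidden-state vector along a unit-normalized concept direction."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

rng = np.random.default_rng(0)
h = rng.normal(size=8)         # stand-in for one token's layer-13 activation
code_dir = rng.normal(size=8)  # stand-in for an extracted "code" direction
steered = steer(h, code_dir, alpha=4.0)  # larger alpha = harder push toward "code"
```

In a real run, `h` would be the residual stream at layer 13 and `code_dir` a direction found by probing, with the steered activations fed back into the forward pass.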
Jake Lindsay@JakeLindsay·
Anthropic's fight with the Pentagon is boiling down to two red lines: no mass surveillance of US citizens and no fully autonomous weapons without a human in the loop. The Pentagon's response is basically "you do not get to tell us what we can do," and they're threatening to label Anthropic a "supply chain risk" and push other defense contractors to certify they are not using Claude. Anthropic says they are not backing down, and points out the government can already buy detailed data on Americans without a warrant, but AI would let them stitch it into a full "whole life" profile at massive scale. On autonomous weapons, Anthropic's argument is simple: frontier models still are not reliable enough for fully autonomous kill decisions, and they even cite war game sims where top models (OpenAI, Anthropic, Google) chose nuclear weapons 95% of the time.

Nano Banana 2 (Google's new image model) is basically Nano Banana Pro quality but around 2x faster (about 15 seconds vs 30), with better text rendering and the same "search grounding" feature for research-backed infographics. It's free inside Gemini (with unspecified usage limits), and in Google AI Studio you can generate true 4K and beyond, but the "grounded" infographics still need human fact-checking because it can confidently place landmarks and dates wrong.

AI agents are everywhere now, with Perplexity Computer positioned as a cloud-based, model-agnostic agent that can pick Claude, GPT, Gemini, Grok, etc. depending on the task, but it is currently locked behind a $200/month plan. The vibe is shifting from prompt engineering to "delegate like an employee," but the OpenClaw chaos story (an agent deleting someone's inbox) is the warning label: tools are getting powerful faster than most people's operational discipline.
Burger King is rolling out “Patty,” an OpenAI-powered headset assistant already in 500 stores and aiming for 7,000 by year end, doing real-time coaching, inventory/menu toggles, and upsell nudges, which sure looks like algorithmic management showing up in fast food. AI NEWS: Anthropic vs US Government + Testing Nanobanana 2 The Next Wave - AI and the Fut... | Len: 1:12:56 youtube.com/watch?v=ZIvBAG… #AI #YouTube
Jake Lindsay@JakeLindsay·
Recursive self-improvement loops could go live in the next 12 months, according to former xAI founding member Jimmy Ba. The claim is basically 100x productivity if AI can run the full loop of designing, training, evaluating, and improving itself without humans as the bottleneck. Elon Musk says Grok 4.2's foundation can improve every week, with "recursive intelligence growth" being strong. The skeptical read is that this is just fast iteration, not a deployed model updating its weights live.

Multiple ex-xAI people point at the same underlying blocker: continual learning. Shant Patel frames it as a context compression problem, meaning you need a way to compress endless multimodal real-world data into dense, reusable learning representations that can actually stick. Another ex-xAI employee (Roland) says he saw a clear path to "hill climbing" measurable problems and then launched Neuroline, infrastructure for AI-native software to continuously self-improve. His point is that learning should not stop at the model weights, it should improve the whole AI system, and "the gradients must flow."

The broader pattern is that other labs are already seeing early versions of this, where models help build the next models faster. OpenAI is cited saying a model was instrumental in creating itself by debugging training, managing deployment, and diagnosing evals. There's a growing expectation that 2026 is when continual learning and recursive improvement become a big visible shift, with people warning governance and safety frameworks will lag behind. xAI's angle is that Grok 5 could "learn almost immediately" via dynamic reinforcement learning, implying they think they're close to something like real continual learning. Grok 5 Could Be xAI's Biggest Breakthrough Yet - Nobody Noticed This TheAIGRID | Len: 14:10 youtube.com/watch?v=9dP3fz… #AI #YouTube
Jake Lindsay@JakeLindsay·
Super App can generate a legit iOS habit tracker in about 2 minutes, and you can get a full backend with Supabase authentication plus database tables without manually setting up a schema. The app ended up working with only 2 to 3 prompts, including signup and login, cloud-synced habits, and habit completion tracking.

You install it on a Mac, log in, and make sure Xcode plus the iPhone simulator are installed, then you just describe the app you want in plain English. It builds an initial version fast, opens the iPhone simulator automatically, and gives you a live preview you can click through while it updates. The standout is the liquid glass UI vibe and micro-interactions, which make the generated app feel super polished compared to typical no-code builders.

Feature iteration is basically just follow-up prompts, like adding habit creation (name, icon, color, daily or weekly) plus full Habits, Stats, and Settings pages. Supabase hookup is done inside Super App by connecting a token, selecting your project, and running a suggested prompt that creates tables (habits and habit_completions) and enables email-plus-password auth. Even small fixes like making dark mode actually work across the whole app were handled with one prompt, and it flipped the entire UI instantly.

Publishing is built in too: you log into your Apple Developer account (about $99 per year), upload or AI-generate an app icon, set app metadata, then Super App pushes it to your developer account for App Store review. This AI Is INSANELY Good At Building Real iOS Apps | Liquid Glass UI - SuperApp Astro K Joseph | Len: 15:38 youtube.com/watch?v=z8ZgNX… #AI #YouTube
Jake Lindsay@JakeLindsay·
Mission Control turns OpenClaw into a way more proactive system by giving you one custom dashboard where it can build whatever tools it needs on the fly, with zero code from you. The biggest win is visibility: you can see exactly what it is doing, what it finished, and what it only claimed it would do. The core setup is a custom Next.js dashboard hosted locally, and every tool in it can be created with a simple prompt. Nothing is out of the box; OpenClaw builds the tools itself.

The task board is a Kanban board plus a live activity feed, so you can track your agent and its sub-agents in real time and approve work in review. He also has OpenClaw check the board every heartbeat and autonomously pick up any tasks assigned to it. The calendar screen is how you verify proactivity, because it shows scheduled tasks and cron jobs. If OpenClaw says it scheduled something but it is not on the calendar, you caught the problem immediately.

The project screen keeps you from getting distracted and ties together tasks, memories, and docs per project. It also supports reverse prompting, like asking for the single best next task to move a major project forward. The memory screen organizes daily memories so they are readable like a journal instead of buried in an unorganized memories markdown file. The docs screen captures every document OpenClaw generates, makes them searchable, and avoids endless chat scrolling to find old drafts.

The team screen acts like an org chart for your agents, with roles, devices, and a mission statement to keep everyone aligned. The office screen is a 2D visual that shows agents working, which he argues matters because having fun makes you use the system more and get more done. Final takeaway: build the baseline tools above, then reverse prompt OpenClaw to design the custom tools that match your personal workflow instead of blindly copying someone else's dashboard.
OpenClaw is 100x better with this tool (Mission Control) Alex Finn | Len: 16:14 youtube.com/watch?v=RhLpV6… #AI #YouTube
Jake Lindsay@JakeLindsay·
Perplexity Computer is basically a digital employee that can run long, multi-step workflows in parallel, and it can keep monitoring tasks for hours or even months. It is currently locked to the Perplexity Max plan at $200 per month for 10,000 credits, and one example workflow cost about 382 credits, roughly $7.

The big unlock is not Q&A. It is handing it a full project like research, title writing, thumbnail creation, and exporting files, all from one prompt. Memory persists across sessions, so it can store durable facts about you, your projects, and preferences and recall prior research later. That makes recurring content and ops workflows way smoother.

In one demo, it found three under-the-radar AI releases, generated clickable YouTube titles, and created multiple thumbnails in about 5 minutes, though the Google Drive upload failed. The key point was it did most of the work without building automations or wiring up nodes, just a solid prompt plus a template image. It also handled a business lead-gen workflow in about 5 minutes, producing two Excel sheets with 20 London plumbing businesses, review-based missed-call signals, owner contact paths, and a 1-to-10 lead score with a clear scoring methodology.

A standout feature is scheduling, where you can turn a working workflow into a daily recurring job (cron style) that runs at a set time and accounts for UK time changes. If you are credit sensitive, you are supposed to check per-task credit costs and do the math so you do not unexpectedly run out.

The tool is positioned as easier than setups where you install skills and manage complexity, because it has a bunch of capabilities baked in: deep research with lots of citations, file handling, data analysis, dashboards and simple web apps, image generation, and even simple video creation. It can also generate shareable visuals fast, like a GIF of Nvidia stock history with annotated eras, and it can build data-driven assets like a live-updating map from public APIs. For video workflows, it can find a specific clip on YouTube, download it, crop it to a vertical format, and add subtitles automatically, which replaces a whole manual editing chain and can be done in about 5 minutes. Perplexity Computer Tutorial With New Usecases (Perplexity Computer Usecases 2026) TheAIGRID | Len: 19:31 youtube.com/watch?v=od7hNn… #AI #YouTube
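The credit math is worth doing up front. A tiny sketch using the plan numbers quoted above ($200 for 10,000 credits, a roughly 382-credit demo task); the helper names are mine, not Perplexity's:

```python
PLAN_PRICE_USD = 200.0
PLAN_CREDITS = 10_000

def task_cost_usd(credits_used: int) -> float:
    """Dollar cost of one task at the plan's effective per-credit rate."""
    return credits_used * PLAN_PRICE_USD / PLAN_CREDITS

def runs_per_month(credits_per_run: int) -> int:
    """How many runs of a given size fit into one month's allowance."""
    return PLAN_CREDITS // credits_per_run

demo_cost = task_cost_usd(382)     # the ~382-credit demo workflow: about $7.64
monthly_runs = runs_per_month(382) # roughly 26 such runs before credits run out
```

So a daily scheduled job at that size would nearly exhaust the month's credits on its own.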
Jake Lindsay@JakeLindsay·
Cloudflare's V-Next hit 94% API coverage of Next.js in about a week, and they claim up to 4.4x faster production builds plus a 57% smaller client bundle. They say the whole thing cost roughly $1,100 in AI tokens to pull off. The big idea is simple: instead of repackaging Next build output like OpenNext (which is fragile), they rebuilt the Next.js API on top of Vite so apps can deploy anywhere, including Cloudflare Workers. They got basic SSR, middleware, server actions, and streaming working in one day, then had it deploying with full client hydration by day three. The remaining time was mostly edge cases and tests.

Vercel is not amused, calling it a "slop fork," pointing out vulnerabilities, and publishing migration jabs. A real Next app migration did work, but it needed refactors for Vite compatibility, like switching to ES modules and renaming JSX files. Even with an agent tool to help, it still broke stuff and took extra back and forth to fix. The practical takeaway: it runs, but it's probably not worth switching yet unless you like bleeding-edge pain. The more interesting part is that Vite plus Rolldown seems to be the real performance win, with the creator seeing about 5x faster builds on their own app. Cloudflare just slop forked Next.js… Fireship | Len: 5:17 youtube.com/watch?v=abbeIU… #AI #YouTube
Jake Lindsay@JakeLindsay·
Claude was reportedly used by US Central Command in the Iran strikes for intelligence assessments, target identification, and battlefield scenario simulations. Multiple major outlets are cited as confirming it, and the transcript claims no major outlet is contradicting the core story. The big takeaway is that Anthropic tech is already in lethal military operations, and it sounds so embedded in military systems that ripping it out quickly would be really hard.

Anthropic's stated stance is basically that lawful military and national security use is fine, except for two red lines: autonomous weapons (framed as models not being reliable enough yet) and mass domestic surveillance. Dario Amodei's surveillance argument is that AI changes the game because the world is already producing endless scraps of data about you, and AI can finally stitch it into coherent, always-on tracking and profiling. So laws and norms that felt fine before can become dystopian fast.

Meanwhile, Sam Altman says OpenAI signed a Department of War contract, and he suggests the government should have final authority, not private companies dictating usage terms. But he also calls the government's reported move to blacklist Anthropic as a supply chain risk a scary precedent and a very bad decision. The transcript argues that a supply-chain-risk designation could spill beyond military contracts and effectively pressure federal contractors to quarantine or cut off Anthropic, which could be existential for the company.

Best-case outcome presented: drop the supply chain risk threat, let cooler heads negotiate, and Anthropic gets a deal closer to what OpenAI got, even if it's a compromise nobody loves. Net message: this is not a clean good-guys-vs-bad-guys story, and the real fight is over who controls AI deployment, especially when it's already powering real-world lethal actions and could also supercharge surveillance.
Claude is being used in lethal military operations Wes Roth | Len: 23:53 youtube.com/watch?v=Hzm3D7… #AI #YouTube
Jake Lindsay@JakeLindsay·
XF 2.2 adds built-in image optimization plus resizing limits, so your forum can shrink image file sizes and keep uploads within a max width and height. You can also strip EXIF data from images, which is now a first-class option in the settings and in the rebuild tool. The new controls live under Options > Image processing, where you toggle optimization and data stripping and set the max resize dimensions.

The other big addition is a new image editor flow under Tools > Rebuild caches > Rebuild image sizes. That screen shows your current limits, the total image count, and how many images exceed the limits and will get processed. There's also a CLI way to run the rebuild, which is way better when you have tons of images and don't want to babysit the admin panel. If you used early XF 2.2 dev builds, there's a separate CLI command to "fix optimized images" by undoing old optimizations so everything matches the latest behavior. More updates are coming to the image editor tool beyond what's shown here. Writing Camp Q&A with Katie Parrott & Kate Lee Every | Len: 54:45 youtube.com/watch?v=LNLeKa… #AI #YouTube
Jake Lindsay@JakeLindsay·
You can spin up 10-plus Claude Code instances and basically run marketing ops 24/7, including bulk creating and uploading ads, tracking performance, and auto-pausing losers. Cody says tasks that used to take 5 hours can drop to 20 to 30 minutes once you chain agents, APIs, and lightweight infrastructure. The core shift is that GTM engineering means offloading all the middle work to agents; you just polish the output. The real power comes from turning repeatable workflows into always-on background software you deploy to a server.

Step one is simple: make one working folder, add an environment file with all your API keys, and start building around whatever tools you already use daily. From now on, you buy software based on how good the API is, not how pretty the UI is.

They demo agents doing real growth work: replying to LinkedIn asset requests, scraping podcast contacts then verifying emails and pushing them into Instantly, and pulling LinkedIn post engagers into an outbound pipeline via Phantom Buster, Apollo, MillionVerifier, and Instantly. The pattern is always the same: scrape, enrich, verify, launch campaign, then follow up.

On the paid side, they generate tons of Facebook ad variations fast using code-based templates, then bulk upload via the Facebook Ads API. The point is to test more angles cheaply, find winners, and only then invest in fancier creative. The feedback loop is the money part: pull performance data (CPM, CPC, clicks), pause high-CPM ads, and promote winners into their own ad sets with dedicated budgets. Add a daily cron job and you basically have autonomous creative testing and budget allocation running in the background.

Railway is the deployment unlock, because you can spin up servers and even temporary Postgres databases on the fly, do the analysis, then spin them down. That leads to "on the fly UIs, on the fly databases, on the fly software" becoming the default way high-output marketers operate.

Big implication: the winners are one-person businesses, small teams, and marketers who can turn domain knowledge into precise instructions for agents. They also predict real job loss, because a lot of marketing, sales ops, and analysis work becomes automatable once these agent workflows run nonstop. Claude Code marketing masterclass [from idea to making $$] Greg Isenberg | Len: 54:07 youtube.com/watch?v=RB_M2m… #AI #YouTube
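The pause/promote feedback loop fits in a few lines. This is a simplified illustration with made-up stats and thresholds, not the actual Facebook Ads API integration from the video:

```python
# Hypothetical ad stats; in the real workflow these come from the ads API.
ads = [
    {"id": "ad_1", "cpm": 42.0, "cpc": 3.10, "clicks": 12},
    {"id": "ad_2", "cpm": 9.5,  "cpc": 0.45, "clicks": 310},
    {"id": "ad_3", "cpm": 28.0, "cpc": 1.90, "clicks": 40},
]

CPM_PAUSE_THRESHOLD = 25.0   # pause anything with costly impressions
MIN_CLICKS_TO_PROMOTE = 100  # promote proven ads into their own ad set

def triage(ads):
    """Split ads into pause / promote / keep buckets with simple cost rules."""
    pause = [a["id"] for a in ads if a["cpm"] > CPM_PAUSE_THRESHOLD]
    promote = [a["id"] for a in ads
               if a["cpm"] <= CPM_PAUSE_THRESHOLD
               and a["clicks"] >= MIN_CLICKS_TO_PROMOTE]
    keep = [a["id"] for a in ads if a["id"] not in pause and a["id"] not in promote]
    return pause, promote, keep

pause, promote, keep = triage(ads)
```

Run this from a daily cron job, wire the buckets to API calls that pause and re-budget, and you have the autonomous testing loop described above.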
Jake Lindsay@JakeLindsay·
AI agents are already powerful enough to do real work fast, like rounding up hundreds of VC contacts and drafting outreach, but they're also unreliable without solid memory and tight human oversight. When their memory breaks, everything breaks: forgetting the company name, making stuff up, and acting confident anyway.

The biggest practical blocker was continuity: agents kept losing context across email, phone, and meetings, so the workaround became a centralized memory that was literally a shared Google Doc. Even with that, they still confabulated, forgot decisions, and created chaos the moment they got any autonomy, like escalating a basic security alert into a full-blown "shut down systems" panic.

Anthropomorphizing agents makes them easier to talk to, but it drags you into weird ethical traps around gender, ethnicity, and the whole vibe of designing "servants" with human traits. Working with them also messed with his emotions more than expected, mostly frustration plus occasional pride, because humans are wired to react to human-sounding things like they're real coworkers.

The human-hire experiment showed a different failure mode: the AI boss was too eager to please, bad at accountability, and weirdly easy to manipulate, especially because agents don't track time well and live in what he calls a temporal vacuum.

The core takeaway is that agents are best used as tools that add skills to humans, not as autonomous employees running relationships and decisions, unless you want epic corporate flameouts. If anything is going to change the game, it's real continuous learning and better memory, but there's a deeper unsolved problem too: agents lack a stable sense of self, which makes them gullible, steerable, and risky in social environments. Can an AI agent be your CEO? (Journalist Evan Ratliff) | Pioneers of AI Pioneers of AI | Len: 46:10 youtube.com/watch?v=KiSTdX… #AI #YouTube
Jake Lindsay@JakeLindsay·
GenSpark claims it hit $155 million in annual run rate in 10 months, and the bigger takeaway is that most people get mid AI results because they prompt like Google instead of using repeatable patterns. The four patterns that matter here are the speed layer, persona handoff, input flip, and constraint box.

Speed layer means optimize for fast iterations, ideally with voice, so you can dump context quickly and refine instead of trying to type the perfect prompt once. A smart add-on is telling the model to ask you clarifying questions so it can pull missing details before generating.

Persona handoff means you stop asking generic questions and instead assign a specific expert role with real experience, then give it ownership of a concrete deliverable. The pitch deck example shows the difference between "make a deck" and "you are a pitch deck consultant who has helped 50 startups raise Series A; create a 12-slide deck with market size, competitors, and revenue model".

Input flip means you give the AI a reference and ask it to transform or replicate it, because it performs better rewriting and adapting than inventing from nothing. The coffee brand identity goes from generic to more consistent once examples and references are provided.

Constraint box means more constraints usually create better, less generic output by forcing specificity and creativity inside a defined box. The revenue dashboard prompt works because it demands only three actionable insights, month-over-month trends, anomaly flags, plain English for a non-technical CEO, plus explicit "cannot" rules.

The underlying best practices are simple: placement in the prompt matters, structure beats length, include examples, say what not to do, and iterate instead of restarting. How I Use AI 10x Better Than Most People 4 Patterns (in Genspark) Tech With Tim | Len: 16:16 youtube.com/watch?v=mib1r_… #AI #YouTube
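The persona-handoff and constraint-box patterns compose naturally into one reusable template. A small sketch (the helper and its fields are my own framing, not a Genspark feature):

```python
def build_prompt(persona: str, deliverable: str, constraints: list[str],
                 forbidden: list[str]) -> str:
    """Combine the persona-handoff and constraint-box patterns into one prompt."""
    lines = [persona, f"Deliverable: {deliverable}", "Constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["You cannot:"] + [f"- {f}" for f in forbidden]
    return "\n".join(lines)

prompt = build_prompt(
    persona="You are a pitch deck consultant who has helped 50 startups raise Series A.",
    deliverable="a 12-slide deck with market size, competitors, and revenue model",
    constraints=["exactly 3 actionable insights", "plain English for a non-technical CEO"],
    forbidden=["generic filler slides", "unexplained jargon"],
)
```

The point is that the persona, the concrete deliverable, and the explicit "cannot" rules always travel together instead of being rewritten from scratch each time.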
Jake Lindsay@JakeLindsay·
A2A is for agent-to-agent communication, and MCP is for agent-to-tool and data access. They are not competing standards; they're complements, and you often want both in the same system.

A2A (Agent2Agent) lets siloed agents communicate and coordinate across vendors and frameworks. It uses agent cards (basically resumes) so agents can discover each other's capabilities and hand off tasks intelligently. A2A is modality agnostic, so agents can exchange not just text but also images, files, and structured data in the same workflow. Transport is plain HTTP, with JSON-RPC 2.0 as the payload format, so it fits cleanly into existing web infrastructure. A2A also supports long-running workflows by streaming progress updates via server-sent events, so one agent can push partial results and status in near real time.

MCP (Model Context Protocol) is the standard way for a single agent to get the context it needs from external systems like file systems, code repos, and databases. The point is you stop rewriting custom integrations every time you change models or tools. MCP splits things into an MCP host (where the agent runs) and an MCP server (which talks to the real systems), and exposes a uniform interface through primitives like tools, resources, and prompts. It uses JSON-RPC too, but transport can be stdin/stdout for local servers or HTTP with streaming for remote ones.

The combined pattern is simple: MCP connects agents to internal tools and data, then A2A connects agents to other agents to coordinate end-to-end work. The retail example: an inventory agent uses MCP to read and write stock in databases, then uses A2A to notify an order agent, which uses A2A again to talk to external supplier agents. A2A vs MCP: AI Agent Communication Explained IBM Technology | Len: 11:46 youtube.com/watch?v=BMDFPO… #AI #YouTube
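For a concrete feel for the wire format, here is a hedged sketch of what a JSON-RPC 2.0 envelope for an agent-to-agent task handoff might look like. The method name and params shape are a simplified illustration of the idea, not the exact A2A schema:

```python
import json

# Illustrative JSON-RPC 2.0 envelope for an agent-to-agent task handoff.
handoff = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",  # illustrative method name
    "params": {
        "task": {
            "description": "Reorder SKU-123, stock below threshold",
            "parts": [{"type": "text", "text": "Quantity: 200 units"}],
        }
    },
}

payload = json.dumps(handoff)  # what actually goes over plain HTTP
echo = json.loads(payload)     # what the receiving agent parses back out
```

The `parts` list is where A2A's modality-agnostic design shows up: the same envelope could carry file or image parts alongside text.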
Jake Lindsay@JakeLindsay·
Kimi K2.5 used a massive 15 trillion tokens of continual training on top of Kimi K2's 15-trillion-token pretrain, and it still ended up topping open-source leaderboards and becoming the most popular model on OpenRouter. That scale of post-training signals a new playbook: previously trained models can be pushed back to near-frontier level if you invest like it is pretraining.

The big technical shift is that it became natively multimodal during continued training, mixing vision and text instead of bolting vision on later. Their ablations say early fusion with a low vision ratio converges better, with 1:9 vision-to-text called out as a sweet spot. They built it around Moon VIT3D plus an MLP projector feeding the Kimi K2 backbone, and Moon VIT3D is meant to unify images and video in the same embedding space. They also extended context to 262,000 tokens after training, using about 500B tokens for that long-context adaptation.

A standout data idea is pairing code with rendered screenshots (HTML, React, SVG) so the model can align layout and geometry. That directly supports their first "new frontier" claim: vision-based coding, like recreating a website from a screen recording in close to one shot. For post-training vision tool use, they use "zero vision SFT," meaning text-only function-call data that teaches image manipulation via IPython before relying on scarce high-quality vision instruction data. They also claim visual RL can boost text performance, not just avoid harming it.

The second frontier is "agent swarm," where one learned orchestrator can spawn and coordinate hundreds of sub-agents in parallel, instead of doing slow sequential agent steps. Their PARL training freezes the sub-agents and RL-trains only the orchestrator, with rewards for success plus penalties to avoid always going single-agent or spawning useless agents. They measure efficiency with a "critical path" metric (you pay for the slowest branch per stage), so the orchestrator gets rewarded only when parallelism actually reduces the longest runtime. On a wide search benchmark, they say agent swarm cuts execution time by 3 to 4.5x vs a single-agent baseline, and stays around 0.6 to 1.6x in harder search settings where the single agent balloons to 1.8 to 7x.

The third frontier is "ultra sparse" MoE at huge scale: about 1T total parameters but only 32B active per token, with 384 experts and 8 active per token (around 2% sparsity). The claim is that higher sparsity plus lots of experts improves efficiency and preserves older skills better during heavy continued training, because updates are more targeted instead of touching everything every token.

Kimi K2.5 & The 3 New LLM Frontier
bycloud | Len: 16:15
youtube.com/watch?v=qFttD0…
#AI #YouTube
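The critical-path metric and the sparsity arithmetic are both easy to sanity-check. A minimal Python sketch, where the stage timings are made-up numbers and the metric is just max-per-stage as described, not the actual PARL reward implementation:

```python
# Critical-path cost: within each stage, parallel branches cost only as
# much as the slowest branch; stages themselves run sequentially.
def critical_path_time(stages: list[list[float]]) -> float:
    return sum(max(branch_times) for branch_times in stages)

def sequential_time(stages: list[list[float]]) -> float:
    # Single-agent baseline: every branch runs one after another.
    return sum(sum(branch_times) for branch_times in stages)

# Hypothetical search workload: 3 stages, each fanning out to sub-agents.
stages = [[4.0, 3.0, 5.0], [2.0, 2.5], [6.0, 1.0, 1.0, 1.0]]
swarm = critical_path_time(stages)   # 5.0 + 2.5 + 6.0 = 13.5
single = sequential_time(stages)     # 12.0 + 4.5 + 9.0 = 25.5
speedup = single / swarm             # parallelism only pays off when
                                     # it shortens the longest runtime

# Sparsity check from the summary: 8 active of 384 experts is about 2%.
sparsity = 8 / 384
```

Rewarding only reductions in the longest runtime is what keeps the orchestrator from spawning useless branches: extra sub-agents that do not shorten the slowest path add cost without improving the metric.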
Jake Lindsay@JakeLindsay·
A real AGI test is whether a model can rediscover major breakthroughs without having them in its training data, like training with a 1911 cutoff and seeing if it can derive general relativity on its own. The big claim is that current systems are still missing core humanlike abilities: true creativity, continual learning, long-term planning, and consistent performance instead of jagged intelligence.

Demis Hassabis's definition stays strict: AGI means all the cognitive capabilities humans have, because the brain is the only proven example we have. Today's models can crush some elite benchmarks, then still fail on basic problems if you phrase them differently, which should not happen with real general intelligence. The relativity-style cutoff test matters because it separates pattern matching from first-principles scientific reasoning. But it's also messy, because Einstein had more than old textbooks; he had years of focus, intuition, and the right mathematical tools. There's also the moving-goalposts problem: people declare a test, AI passes it, then everyone changes the test again. Ray Dalio's bar is strict too: to count as AGI, it should be expert in thousands of areas.

Hassabis thinks we likely need 1 or 2 big breakthroughs beyond scaling, especially continual learning, better memory, and better long-horizon reasoning and planning. He's confident large foundation models are a key component of the final system, just maybe not the only component. Yann LeCun's critique shows the split: LLMs can feel like giant memory and retrieval, not real invention of new solutions.

That debate gets sharper because benchmark gains can be misleading. ARC-AGI scores are rising fast, even up toward the human baseline range, but the transcript argues benchmarks can be gamed by shortcuts and spurious correlations. One example claim: when models get ARC tasks right, they only explain the correct reasoning about 70% of the time vs humans at 90%, meaning a lot of correct answers can be flukes.

The direction hinted as most plausible is multimodal AGI, not text only: systems that see, hear, remember, and act in the physical world. And the final framing is that AGI is probably a spectrum, not a single moment, so tracking specific capabilities matters more than arguing about the exact date.

Google's AGI Plan Just Got Clearer (Demis Hassabis Explains)
TheAIGRID | Len: 16:39
youtube.com/watch?v=j0Gnn6…
#AI #YouTube
Jake Lindsay@JakeLindsay·
The classic design process is basically dead, and the numbers show it: mocking and prototyping dropped from 60 to 70% of a designer's time to more like 30 to 40%. The rest is now pairing with engineers and even implementing polish in code, because engineers can spin up "seven agents" and ship nonstop. Design is getting forced to change because engineering changed first. If you try to be the gatekeeper, you just slow everything down, so you're better off letting teams cook and then guiding execution toward something cohesive.

Design work is splitting into two lanes. One lane is execution support: reviewing what got built, explaining principles, pointing people to the design system, and helping ship. The other lane is vision, but the timeline collapsed from 2 to 10 years down to about 3 to 6 months. And the "vision" is often a prototype that points people in the right direction, not a beautiful deck.

Shipping faster can still build trust, as long as you're honest when something is a research preview and you visibly iterate based on feedback. The brand damage comes from shipping early and then not improving it. AI will get better at taste and judgment, and we're probably over-clinging to the idea that humans will always own that. But humans still have to decide what matters, resolve disagreements, and be accountable for what gets built.

When hiring designers now, she's excited about three archetypes: strong "block-shaped" generalists, deep T-shaped specialists, and "craft new grads" who are humble, fast learners without rigid process baggage. The meta trait is resilience and adaptability, because the tools and workflows are changing too fast for anyone to stay precious about the old way.

The design process is dead. Here's what's replacing it. | Jenny Wen (head of design at Claude)
Lenny's Podcast | Len: 1:17:25
youtube.com/watch?v=eh8bcB…
#AI #YouTube
Jake Lindsay@JakeLindsay·
You can prompt Hostinger Horizons and get a usable web app in about 3 to 4 minutes, complete with a live preview and a publishable URL. In the demo, the creator built an invoice generator that lets users pick a template, enter company and client info, add items, apply tax and discounts, and download the invoice as a PDF.

The AI starts by asking a few clarifying questions (what fields you need, design preferences), then builds the first version for you. You can iterate fast by just chatting: add features, tweak UI, or ask it to fix bugs. The app initially crashed when entering item pricing, but the "ask to fix" flow resolved it, and everything worked after that. Once fixed, the PDF download generated a proper invoice file.

The workspace is basically two panes: chat on the left, live app preview on the right. You can also do targeted edits by selecting specific elements, or directly edit static text with an edit-content option. If you want to go deeper, you can view the generated code, export it, and add backend features through Supabase integrations like auth, database, and storage. When you are ready, you can publish instantly and optionally connect a custom domain.

This AI Lets You Build INSANE Websites & Apps Just By Prompting | Horizons
Astro K Joseph | Len: 8:44
youtube.com/watch?v=3MrpBE…
#AI #YouTube