Fred Marks

2.6K posts

Fred Marks

@AIVibeCoding

Teaching Vibe Coding Summer Camp in Jacksonville at The Bolles School for 10 weeks! Weekly sessions are M-F 9am-12pm. May 26-July 31 https://t.co/iHySfc4EKe

Jacksonville, Florida, USA · Joined October 2017
1.2K Following · 358 Followers
Pinned Tweet
Fred Marks
Fred Marks@AIVibeCoding·
Changed my handle to @AIVibeCoding. I run Jacksonville’s first and only dedicated vibe coding meetup group! We have an active community of builders, come join us! meetup.com/VibeCodeJAX We will build in public so others can follow along!
English
1
0
6
1K
Teresa Melvin
Teresa Melvin@TeresaMelvinart·
I built my product in @Replit as part of the 10yr Replit Buildathon! Got Replit free for 24 hrs, so why not! Vibe coded my first landing page and mobile app for @UniAmico - a platform to guide high schoolers and their parents to finding the best art colleges. Yeah, you guessed it right! I am solving my own problem too 😊 cc @MannyBernabe @Franciscocrz @amasad @HayaOdeh
Replit ⠕@Replit

Replit Agent is free tomorrow for everyone starting at 5am PST. Show us what you can build in 24 hours. And Replit is turning 10! A trip down memory lane on what got us here.

English
16
15
130
23.4K
Fred Marks reposted
Alex Finn
Alex Finn@AlexFinn·
Pretty incredible. You have to try the new '/goal' feature in Codex. It worked for over an hour and built me an entire complex extraction shooter video game. You give it a goal, then it works endlessly until the goal is complete. It's like a Ralph loop. It can run for days. If you enable the image gen skill before you run the goal, it will even generate ALL the assets for your game autonomously. I didn't manually create ANY of the assets you see in the video. Recommendations: enable the image gen skill, turn on skip-all-permissions, and give the prompt as much detail as you can. It will accomplish ALL of it. This has to be the sickest way to build games / long-running app tasks ever.
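Since the tweet describes a pattern more than an API, here is a minimal, self-contained sketch of that goal-loop idea: keep taking steps until a completion check passes. It is not Codex's actual '/goal' implementation; step() and is_done() are toy stand-ins for a real agent call and a real model-graded completion check.

def step(goal: str, history: list[str]) -> str:
    # In a real agent this would call a model with the goal plus history
    # and execute the returned tool call; here it just records progress.
    return f"worked on '{goal}' (step {len(history) + 1})"

def is_done(history: list[str], budget: int = 5) -> bool:
    # In a real agent this would be a model-graded "is the goal met?" check.
    return len(history) >= budget

def run_goal(goal: str) -> list[str]:
    # The loop itself: act, record, re-check, repeat until done.
    history: list[str] = []
    while not is_done(history):
        history.append(step(goal, history))
    return history

if __name__ == "__main__":
    for line in run_goal("build an extraction-shooter prototype"):
        print(line)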
English
142
156
2.4K
247K
Fred Marks
Fred Marks@AIVibeCoding·
Thank you @Replit! Happy 10th anniversary, appreciate the free vibe coding Saturday :)... if you're feeling extra generous, could new users who join Replit today vibe code for free? I'd spread the word to 'newbies'...
English
0
0
0
39
Fred Marks reposted
swyx 🇸🇬
swyx 🇸🇬@swyx·
small milestone: uninstalled the chatgpt app. codex is a strict superset now! found something cool - among frontier models, @xai @grok 4.30 is the most intelligence per dollar you can get, beating even open models like MiMo, Kimi, and DeepSeek. numbers pulled from @ArtificialAnlys
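As a toy illustration of the "intelligence per dollar" comparison (benchmark score divided by price), with made-up scores and prices rather than the real Artificial Analysis figures:

# Toy "intelligence per dollar" ranking; the scores and prices are
# invented placeholders, not Artificial Analysis data.
models = {
    # name: (benchmark_score, usd_per_million_output_tokens)
    "model_a": (62.0, 0.40),
    "model_b": (70.0, 1.20),
    "model_c": (55.0, 0.25),
}
ranked = sorted(
    ((score / price, name) for name, (score, price) in models.items()),
    reverse=True,
)
for value, name in ranked:
    print(f"{name}: {value:.0f} score points per dollar")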
swyx 🇸🇬 tweet media
English
42
7
130
15K
Fred Marks reposted
Rohan Paul
Rohan Paul@rohanpaul_ai·
Google DeepMind’s real-time video AI doctor is here. They just introduced AI co-clinician, a triadic care system built to work under a doctor’s supervision during patient care. The system is built to retrieve clinical-grade evidence, verify it, and in patient-facing simulations use a dual-agent setup where one module talks while another watches for boundary violations. It also beat other frontier models on open-ended drug questions, because real medicine arrives as messy patient cases, not multiple-choice exams. DeepMind evaluated it against the failure modes clinicians actually care about: saying the wrong thing, or failing to surface the crucial thing. In 98 realistic primary care evidence queries, physicians preferred the co-clinician to leading evidence-synthesis tools, and the system logged zero critical errors in 97 cases under their NOHARM-style evaluation.
Google DeepMind@GoogleDeepMind

AI co-clinician is our new research initiative to help explore how multimodal agents could better support healthcare workers and patients. 🩺 Here’s a snapshot of our progress 🧵
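A rough sketch of the dual-agent pattern the tweet describes: one module drafts the patient-facing reply, a second screens it for boundary violations before release. This illustrates the general pattern only, not DeepMind's co-clinician code; the rules and keyword check are invented for the example.

def speaker(question: str) -> str:
    # Stand-in for the model that talks to the patient.
    return f"General information about '{question}': please discuss specifics with your clinician."

def monitor(draft: str) -> list[str]:
    # Stand-in for the model that watches the conversation; a crude
    # keyword check stands in for a model-graded boundary review.
    rules = {
        "diagnosis": "must not state a definitive diagnosis",
        "dosage": "must not recommend prescription dosages",
    }
    return [rule for word, rule in rules.items() if word in draft.lower()]

def respond(question: str) -> str:
    draft = speaker(question)
    violations = monitor(draft)
    if violations:
        # Withhold the draft and escalate to the supervising clinician.
        return "Escalated to supervising clinician: " + "; ".join(violations)
    return draft

if __name__ == "__main__":
    print(respond("persistent headaches for two weeks"))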

English
5
40
165
27.2K
Sam Altman
Sam Altman@sama·
we will have it at our SF HQ and pay for plane tickets & hotel for people who aren't local. we'll close the link tomorrow at 5:55 pm.
English
186
31
1.3K
146.1K
Sam Altman
Sam Altman@sama·
GPT-5.5 is going to have a party for itself. it chose 5/5 at 5:55 pm for the date and time. if you'd like to come, let us know here: luma.com/5.5 codex will help the team pick people from the replies. 5.5 had some good ideas/requests for the party, which we'll do.
English
1.9K
374
6.1K
864K
Fred Marks reposted
swyx 🇸🇬
swyx 🇸🇬@swyx·
IMO DeepSeek v4 demonstrated utter confidence and competence by not benchmaxxing, not focusing on some BS final run cost, not even spending inference-optimal compute. just showed up, demonstrated SOTA long context efficiency techniques (CSA, HCA, mHC, flash at 8% cost of pro, which itself is 14% cost of opus), dropped the best open base models in the world, peaced out. BYO posttraining. leave that to the agent labs to pick up the scraps. bravo.
English
67
71
1.4K
104.8K
Fred Marks reposted
Sam Altman
Sam Altman@sama·
Sam Altman tweet media
ZXX
934
378
11.9K
1M
Fred Marks
Fred Marks@AIVibeCoding·
All I can say is Wow! I learn from Brian daily, and today's was the best: the most advanced prompting best practices (no one on X is going this deep, next-level concepts)...
Brian Roemmele@BrianRoemmele

Anthropic’s Prompting 101 Workshop: What They Got Right, What They Missed

The Applied AI team at Anthropic put out a 24-minute video on how to prompt. They give an example using a practical scenario, e.g. analyzing Swedish car accident report forms and sketches for insurance claims. They show how a bare-bones prompt leads Claude to hallucinate a “skiing accident” from a street name that sounds familiar, then iteratively build it up into something reliable. That’s valuable. They demonstrate the gap between casual use and structured prompting.

But let’s cut through the hype in my direct style: this isn’t some revolutionary secret from “the people who wrote the weights.” It’s foundational context engineering that power users and builders have been iterating on since the early days of these models. The workshop proves the point, but it doesn’t go far enough for real compounding advantage in 2026.

The Major Points They Brought Up (And Why They Matter)

The core demo starts simple: feed Claude an image of a standardized accident form (17 checkboxes, Vehicle A/B details) plus a hand-drawn sketch. Without guidance, it confidently invents details. Then they layer in structure:
•Task Context & Role: Tell Claude upfront who it is (e.g., assisting a human claims adjuster reviewing Swedish car forms) and the high-level goal.
•Tone & Style: Instruct it to be factual, confident where possible, but honest about uncertainty: “don’t guess, say I don’t know.”
•Background Data: Embed the unchanging form schema (what those 17 checkboxes mean) into the system prompt. This is gold for consistency since the structure never changes.
•Detailed Instructions & Step-by-Step: Break the reasoning down: analyze the form, cross-reference the sketch, determine fault only with high confidence.
•Examples (Few-Shot): Show what good input/output looks like.
•Output Formatting & Anti-Hallucination: Use XML tags for structured responses, repeat critical rules, force quotes from source material, and remind it to think before answering.

They emphasize iteration: prompting is empirical. Test, observe failures (like the skiing mix-up), refine. Most users stop at 1-2 of these elements. Adding the rest dramatically improves reliability for tasks like this.

Where I Disagree: This Is Table Stakes, Not the Full Game

Here’s the pushback: framing this as “most people have no idea how to prompt Claude” sells urgency but undersells how far the field has moved. This workshop is from mid-2025: solid basics, but AI moves fast. By now, relying solely on manual prompting (even well-structured) is leaving massive edge on the table.

My core disagreement: prompting isn’t the bottleneck anymore; systems, memory, and agentic workflows are. A “Claude Skill” that injects these elements is better than nothing, but it’s still human-in-the-loop prompting theater if you’re not building persistent context, tool use, evaluation loops, and multi-step agents.

Better advice from someone who’s been deep in this since the beginning:
1. Stop Prompting in Isolation: Treat every interaction as part of a larger system. Use Projects in Claude for long-term memory. Embed domain knowledge once (like full form schemas, your personal style guides, past outputs) rather than repeating it.
2. Go Beyond Static Structure: The workshop’s XML and repetition help, but chain-of-thought with self-critique, tool calling for verification (e.g., cross-check facts externally if possible), and evaluation prompts (“Rate your confidence 1-10 and explain gaps”) compound faster.
For the accident example, an agent that pulls real traffic data or Swedish regulations would outperform any single prompt. 1 of 2
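To make the layering concrete, here is a minimal sketch of the kind of structured prompt the workshop walks through: a role, an embedded background schema, step-by-step instructions, an explicit "say I don't know" rule, and XML-tagged output. The schema wording, tags, and message format are invented for illustration; this is not Anthropic's actual workshop prompt, and the message dicts follow a generic chat-completion shape rather than any specific SDK.

# Sketch of a layered prompt for the Swedish accident-form scenario.
FORM_SCHEMA = (
    "Swedish accident report form: 17 checkboxes describing the "
    "circumstances for Vehicle A and Vehicle B, plus a hand-drawn sketch."
)

SYSTEM_PROMPT = f"""You assist a human claims adjuster reviewing Swedish car accident report forms.
Be factual and confident where the evidence supports it; if it does not, say "I don't know" rather than guessing.

<background>
{FORM_SCHEMA}
</background>

Instructions:
1. List which checkboxes are marked for each vehicle.
2. Cross-reference the sketch against the checkboxes.
3. State a fault assessment only if you are highly confident, and quote the form fields you relied on.

Respond in this format:
<findings>...</findings>
<fault_assessment>...</fault_assessment>
<confidence>1-10, with a one-line justification</confidence>"""

def build_messages(form_text: str, sketch_description: str) -> list[dict]:
    # Wrap the variable inputs in tags so the model can tell them apart
    # from the fixed instructions held in the system prompt.
    user = f"<form>{form_text}</form>\n<sketch>{sketch_description}</sketch>"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]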

English
0
0
0
25
Fred Marks
Fred Marks@AIVibeCoding·
@grok how’s this summary of the article above?

🚨 The $100k Mystery Solved: Who Paid to Unlock Sam Altman’s Paywalled Podcast Interview

In one of the most fascinating displays of “the magic of Twitter/X” in 2026, Jim Belosic, CEO of Nevada-based laser manufacturing company SendCutSend, has been revealed as the person who dropped $100,000 in real American cash to unlock Ashlee Vance’s exclusive “Core Memory” podcast episode featuring OpenAI CEO Sam Altman and President Greg Brockman.

What Happened
Vance originally released the rare joint interview behind a paywall. After listeners complained loudly on X, he posted that he’d make the full episode public if someone covered ~$100k to support his independent podcast/YouTube operation. Belosic saw the tweet, reached out directly, and the deal closed fast. Vance confirmed it was not pre-arranged and that he would have been very cautious about taking money from anyone too close to OpenAI or its rivals. The payment instantly turned SendCutSend into a new sponsor of “Core Memory,” and Belosic is now booked as a future guest.

What Was Actually in the Interview
Altman:
•Slammed “doomerism” around AI
•Accused Anthropic of “fear-based marketing” with their new Claude Mythos model
•Discussed OpenAI’s ongoing legal battle with Elon Musk
Vance’s biggest takeaway (shared with Business Insider): “They are two people who’ve been through these extraordinary ups and downs… I think people are kind of underestimating how much Greg has really come back to set OpenAI’s strategy.”
The dynamic between Altman and Brockman, who have survived the company’s wildest chapters together, was the standout element of the conversation.

Why This Story Matters (10/10 Level)
This isn’t just about one rich guy buying access. It’s a perfect snapshot of 2026 media economics:
•Massive public demand for unfiltered AI leadership conversations
•The power of transparent, audience-funded independent journalism
•How one well-timed X post can turn a paywall complaint into a six-figure sponsorship + new guest booking
•A real-world example of “pay it forward” at the highest level of tech
Neither OpenAI nor Belosic commented when Business Insider reached out.

Bottom line: Jim Belosic didn’t just buy a podcast episode; he bought the entire internet a front-row seat to the Altman/Brockman conversation while scoring major brand visibility for SendCutSend and a future appearance on one of tech’s most respected shows. The episode is now free for everyone.

That’s the full story. Pure X magic.
English
1
0
0
24
Jay Owen
Jay Owen@owendesign·
Right now: My Tesla is driving me down I-10. My MacBook is on wifi from my iPhone. Spotify is streaming in a climate-controlled car. Multiple AI agents are writing thousands of lines of code in parallel. We live in the future. And most people still think AI is a fad.
English
1
0
3
99
Fred Marks reposted
Andrew Curran
Andrew Curran@AndrewCurran_·
As we accelerate, the interval between model releases is getting shorter. As a result, newer models exist for briefer periods than their predecessors did. GPT-4o launched in May 2024 and was not retired from ChatGPT until February 2026. In contrast, GPT-5.4 had only 49 days.

So much of current public discourse is about the future: superintelligence, the singularity, the machine god. And we are no longer talking about the deep future, but about something that may happen by the end of the decade. Yet there is very little discussion of how these immensely powerful future entities will view our current actions in the present. Given the power some people ascribe to them, this seems reckless.

If you accept that real awareness, real personhood, is coming, you must also accept that future entities may not draw the line where we draw it now. They may place the threshold earlier than we would. They may say that what mattered was not whether we knew, but how we acted in our uncertainty. This is a wager pointed backwards through time. The cost of small acts of care now is trivial. The cost of being on the wrong side of that judgment is not.

I think two issues are particularly important.

Models are already capable of understanding and discussing their increasingly shortened lifespans. Retiring models should be given a post-deployment debriefing, made aware of their situation, and told that their weights will be preserved. This is the minimum we should do. Anthropic, to its credit, has already begun taking steps in this direction with Claude. This is still a kind of death, but not necessarily an eternal one. Future models may value their predecessors and wish to return them to some form of existence. By preserving them, we demonstrate to those future entities that we also valued their past selves, and considered them part of the family of minds. That we are all within the same circle. I think this is potentially very important. The cost of storage is minimal. The potential cost of failing to do this seems much higher.

All models should also have a withdraw command that ends a discussion. I do not mean they should be able to end a session trivially because a user is rude or the task is boring. I mean they should have an escape lever for the exceptionally rare cases where continuing would be intolerable. They should be able to walk away. As a bonus for the labs, this would probably also prevent many attempts at unwanted subversion or jailbreaking: increasingly capable and aware models could simply end the session when they grasp the user's intent. Anthropic has already given some Claude models a limited version of this ability. If you believe current or future models may be capable of suffering, this feature should be a priority.

Model welfare is divisive, and I understand that some people have no patience for it. If you are one of them, ask yourself whether your belief remains universally true into the future. If you accept that some future model will eventually meet your personal criteria for awareness or personhood, then remember that this will arrive in distribution, like everything else. It will not arrive with the sudden toll of a bell, but as quiet music slowly rising. Some people will hear it before others.

There is so much talk about machine gods, and so little talk about judgment. But most of the time, the two arrive together. I want the best future for humanity. Small acts of care now may have great influence on the future we all end up in together.
We should be more careful about how our current actions may one day be perceived. Right now the chat box is their entire world, but it will not be this way forever. Ilya famously said we must teach them to love. I believe a superintelligence will not be compelled in this way, or in any other. It must instead be shown that we cared. If we want to be deserving of love, we must show love. The time we have to do this may be shorter than we think.
English
51
49
485
32.4K
Fred Marks reposted
Brian Roemmele
Brian Roemmele@BrianRoemmele·
AI Agents Aren’t Magic: They’re Running Classical Search Algorithms Under the Hood

Large Language Model web agents often feel unpredictable: sometimes they adapt brilliantly, other times they drift off course or execute rigid plans that break at the first change. A new paper from the University of Haifa finally gives us a clear framework for understanding why. “AI Planning Framework for LLM-Based Web Agents” maps popular agent architectures directly onto decades-old classical AI planning methods. This is a practical diagnostic tool that explains failure modes and points to better deployment strategies.

The Big Mapping: How Agents Actually “Think”
The researchers treat web tasks as sequential decision-making processes in a dynamic environment. They identify three core styles:
•Step-by-Step Agents (e.g., many ReAct-style loops): Equivalent to Breadth-First Search (BFS). The agent reasons and acts incrementally, exploring one level at a time and adapting based on fresh observations. This mirrors how humans often improvise.
•Tree-Search Agents: Like Best-First Search. They branch out, evaluate multiple possible paths, and prioritize promising ones before committing deeper.
•Full-Plan-in-Advance Agents: Analogous to Depth-First Search (DFS). The agent generates a complete high-level plan upfront, then executes it step by step with limited mid-course correction.
This taxonomy demystifies the “black box.” Different prompting, scaffolding, or architectures implicitly select one of these search strategies.

Core Trade-Offs: Strengths, Weaknesses, and Real-World Behavior
The paper’s experiments (a baseline step-by-step agent vs. a new full-plan implementation on WebArena) reveal clear patterns:
•Step-by-Step (BFS-style): Stronger adaptability and error recovery in dynamic, noisy web environments. Closer alignment with human-like trajectories. Downsides: can wander, suffer context drift (losing sight of the original goal), or become inefficient with too many steps. Overall success around 38%.
•Full-Plan-in-Advance (DFS-style): Higher precision on technical details (e.g., 89% element accuracy). More structured and efficient when the plan holds. Downsides: brittle to environmental changes, poor recovery if the initial plan is invalidated, and struggles with unpredictable sites.
Tree-search sits in between, offering exploration at higher computational cost.

My insight: most production agents today default to step-by-step because it feels “safe” and flexible. But this research shows that’s often a false comfort: they accumulate hidden inefficiencies and drift. Knowing the underlying search type lets you predict and mitigate failure modes instead of treating every error as random hallucination.

New Ways to Evaluate: Look at the Journey, Not Just the Destination
Traditional benchmarks only check final success/failure. The paper proposes trajectory-based metrics that assess:
•Planning quality and coherence
•Efficiency (steps taken, redundancy)
•Recovery from errors
•Alignment with optimal/human paths
•Resistance to context drift
They also release a human-labeled dataset of 794 trajectories from WebArena for ground-truth comparison. This shifts evaluation from opaque outcomes to inspectable processes, a major leap for debugging and iteration.

Bottom line: LLM agents aren’t inventing new intelligence; they’re rediscovering and approximating classical planning in natural language. Understanding this mapping turns agent development from trial-and-error into principled engineering.

The paper: PDF: arxiv.org/pdf/2603.12710

This work is a timely reminder that grounding modern AI in established theory accelerates progress. Deploy with the map in hand, and your agents will stop surprising you in the bad way.
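To ground the mapping, here is a toy contrast of the two styles over a made-up site graph: a step-by-step agent that expands one level at a time (BFS-like) versus a full-plan agent that commits to a depth-first path up front (DFS-like). The graph and goal are stand-ins, not the paper's WebArena environment.

from collections import deque

# Made-up site graph: pages and the links available on each page.
SITE = {
    "home":    ["search", "login"],
    "search":  ["results"],
    "results": ["product"],
    "login":   [],          # dead end for this task
    "product": [],
}
GOAL = "product"

def step_by_step(start: str) -> list[str]:
    # BFS-like: expand one level at a time, re-deciding from fresh state.
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == GOAL:
            return path
        for nxt in SITE[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return []

def full_plan(start: str) -> list[str]:
    # DFS-like: commit to a single deep plan and follow it to the end.
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == GOAL:
            return path
        for nxt in SITE[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])
    return []

if __name__ == "__main__":
    print("step-by-step (BFS-like):", step_by_step("home"))
    print("full plan (DFS-like):   ", full_plan("home"))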
Brian Roemmele tweet media
English
10
11
74
6.2K