GenAI Built

3.3K posts


@GenAIbuilt

Building real AI systems, workflows & agents. Built. Tested. Shared.

Joined June 2024
7 Following · 71 Followers
GenAI Built
GenAI Built@GenAIbuilt·
What do we have today? Early-stage building blocks. The real shift is happening here: Models → Systems → Infrastructure. The question now isn't just how powerful models are; it's how well we can engineer intelligence around them.
GenAI Built
GenAI Built@GenAIbuilt·
Are we actually following the roadmap to reliable AI? Back in 2023, a clear direction emerged: RAG (Retrieval-Augmented Generation) as the path forward, because LLMs alone still struggle with:
• Hallucinations
• Outdated knowledge
• No traceability
GenAI Built tweet media
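The RAG idea above can be sketched as two steps: retrieve supporting passages, then ground the prompt in them so the answer is traceable to sources. This is a minimal illustration with a toy keyword retriever; the corpus, scoring, and prompt template are assumptions, not any particular production stack.

```python
# Minimal RAG sketch: retrieve passages, then build a grounded prompt.
# The keyword-overlap retriever is a stand-in for embedding search.

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank passages by naive keyword overlap with the query."""
    words = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(words & set(p.lower().split())))
    return scored[:k]

def build_prompt(query: str, passages: list) -> str:
    """Ground the model in retrieved context and demand numbered citations."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {query}"
    )

corpus = [
    "RAG pairs a retriever with a generator to ground answers.",
    "LLMs trained on static data can serve outdated knowledge.",
    "Vector databases store embeddings for similarity search.",
]
prompt = build_prompt(
    "Why does RAG help with outdated knowledge?",
    retrieve("outdated knowledge RAG", corpus),
)
```

The grounded prompt is what addresses all three bullets: cited sources give traceability, and fresh retrieved passages sidestep stale training data.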
GenAI Built
GenAI Built@GenAIbuilt·
Grok 4.20 Beta benchmarks highlight an important shift in AI performance priorities.
🥇 Lowest hallucination rate (22%)
🥇 Highest instruction-following accuracy (83%)
🥈 #2 in agentic tool use (97%)
It records the lowest hallucination rate ever measured across AI models.
GenAI Built tweet media
GenAI Built
GenAI Built@GenAIbuilt·
Document → slide deck in ~60 seconds. Gemini is wild.
GenAI Built
GenAI Built@GenAIbuilt·
@XFreeze This points to orchestration becoming the real product. It’s not just better models, but dynamic routing between reasoning modes. If it works, users stop choosing models and just focus on outcomes.
X Freeze
X Freeze@XFreeze·
All Grok models have now been updated to Grok 4.20. In Auto mode, it intelligently switches between multi-agent and single-agent mode depending on the use case. You can also choose:
• Expert mode for complex problems: a full team of agents collaborating in parallel
• Fast mode for simple tasks: lightning-fast answers
X Freeze tweet media
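The "Auto mode" routing described above, and the orchestration point in the reply, boil down to a dispatcher that maps a request to a reasoning mode. This is a hypothetical sketch; the complexity heuristic and mode names are illustrative assumptions, not Grok's actual routing logic.

```python
# Hypothetical auto-router: simple tasks go to a single fast agent,
# complex ones to a multi-agent "expert" team. The keyword/length
# heuristic below is a toy stand-in for a learned router.

def route(task: str) -> str:
    complex_markers = ("prove", "design", "compare", "multi-step", "plan")
    long_task = len(task.split()) > 30
    if long_task or any(m in task.lower() for m in complex_markers):
        return "expert"  # team of agents collaborating in parallel
    return "fast"        # single agent, quick answer

assert route("What time is it in Tokyo?") == "fast"
assert route("Design a multi-step migration plan for our database") == "expert"
```

The design point: once routing sits in front of the models, the user interacts with outcomes, not model choices, which is exactly the shift the reply describes.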
Nam Dinh
Nam Dinh@namd1nh·
Anthropic just ran 81,000 AI-led interviews. And it signals a major shift in how we study humans.

We've spent years building AI systems that answer questions. Now we're entering a new phase: AI that asks them.

Anthropic's latest experiment turns Claude into an "interviewer", a system that can run real conversations with people at scale. Not surveys. Not forms. But dynamic, adaptive interviews:
• Asking follow-up questions
• Adjusting based on responses
• Digging deeper into context
All in real time.

This matters because qualitative research has always had a scaling problem. Interviews are powerful… but expensive, slow, and limited in sample size. Anthropic just showed that you can run tens of thousands of interviews and still preserve depth. That's new.

But the most interesting part isn't the scale. It's what they're studying: how humans actually use AI. Not what people say in surveys, but how they think, decide, and interact in real workflows.

And the findings are subtle, but important:
→ People believe they use AI for collaboration
→ But in practice, usage splits almost evenly between automation and collaboration

That gap between perception and reality is where things get interesting. It suggests something deeper: we're not just adopting AI tools. We're reshaping how we think about work, sometimes without realizing it.

There are also clear tensions:
• Increased efficiency vs. over-reliance
• Faster decisions vs. reduced independent thinking
• Assistance vs. substitution
These aren't technical challenges. They're cognitive ones.

Which leads to a bigger shift: AI is becoming a research system. Claude isn't just generating answers anymore. It's:
• collecting human signals
• structuring conversations
• extracting patterns at scale
In other words, it's starting to observe us.

And that unlocks something we've never really had before: a way to run ethnography at internet scale. Understanding not just what people do, but how they reason, adapt, and change over time.

Of course, this raises real questions:
• How do we ensure data quality in AI-led interviews?
• What biases are introduced when AI asks the questions?
• Where do we draw the line on privacy and consent?
Because when AI studies humans, the stakes are different.

But zooming out, the trajectory is clear: AI is moving from tool → collaborator → researcher. And now, possibly, observer of human behavior at scale.

The real implication isn't just better research. It's this: we're building systems that don't just understand the world but understand how humans understand the world. That's a very different kind of intelligence. And we're just getting started.
Nam Dinh tweet media
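The adaptive-interview loop described above (ask, listen, branch on the answer) can be sketched in a few lines. Here the follow-up picker is a keyword stand-in for the LLM call Anthropic's system would make; the questions and triggers are purely illustrative.

```python
# Sketch of an adaptive interview loop: record each answer, then let a
# follow-up policy (here a toy keyword rule, in practice an LLM call)
# decide the next question based on what was just said.

def pick_followup(answer: str):
    if "automation" in answer.lower():
        return "Which tasks do you hand off to the model entirely?"
    if "collaboration" in answer.lower():
        return "How do you split the work between you and the model?"
    return None  # nothing left to probe; end the interview

def interview(answers: list) -> list:
    """Run a scripted opener plus adaptive follow-ups; returns (Q, A) pairs."""
    transcript = []
    question = "How do you use AI in your daily work?"
    for answer in answers:
        transcript.append((question, answer))
        question = pick_followup(answer)
        if question is None:
            break
    return transcript
```

The scaling claim in the post comes from exactly this shape: the loop is cheap to run, so depth per respondent no longer trades off against sample size.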
GenAI Built
GenAI Built@GenAIbuilt·
@namd1nh This feels like the early layer of “behavioral infrastructure” for AI. If systems can continuously run interviews, extract patterns, and feed that back into product decisions, you get a compounding loop of learning.
GenAI Built
GenAI Built@GenAIbuilt·
@sukh_saroy Interesting that it scores dimensions like rhythm and trust, not just grammar. That’s closer to how humans evaluate writing. Long term, I could see these scoring layers becoming feedback loops for agents where generation and critique run continuously until style targets are met.
Sukh Sroay
Sukh Sroay@sukh_saroy·
🚨Breaking: Someone built a Claude skill file that strips AI writing patterns from your prose. It's called Stop Slop. And it's not a grammar checker. It's a structured set of rules that teaches Claude exactly what AI writing sounds like, and how to rewrite it so it doesn't.

Here's what it catches:
→ Throat-clearing openers ("In today's fast-paced world...")
→ Emphasis crutches ("It's not just X, it's Y")
→ Tripling structures ("fast, reliable, and powerful")
→ Immediate question-answers ("What does this mean? Everything.")
→ Binary contrasts and dramatic fragmentation
→ Business jargon and rhetorical setups that signal AI instantly
→ Metronomic endings that make every paragraph feel the same

Here's the wildest part: it scores your writing on 5 dimensions (directness, rhythm, trust, authenticity, and density). Below 35/50? Revise before you publish.

Drop SKILL.md into Claude Projects or your system prompt. That's it. Your Claude-written content has AI fingerprints all over it. This removes them.

100% open source. MIT license. (Link in the comments)
Sukh Sroay tweet media
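The generation-and-critique loop suggested in the reply above can be sketched directly: score a draft, and keep rewriting until it clears the threshold. The 35/50 cutoff mirrors the post; the scoring rules below are a toy reduction of it, nothing like the actual SKILL.md rule set.

```python
# Sketch of a generate→score→revise loop in the spirit of Stop Slop.
# Start at 50 and dock 10 points per AI "tell" found in the draft.

AI_TELLS = ("in today's fast-paced world", "it's not just", "what does this mean?")

def score(draft: str) -> int:
    """Crude 0-50 style score based on how many tells appear."""
    text = draft.lower()
    return max(0, 50 - 10 * sum(tell in text for tell in AI_TELLS))

def revise_until_clean(draft: str, rewrite, threshold: int = 35, max_rounds: int = 5) -> str:
    """Loop a rewrite step (stand-in for an LLM critique call) until the
    draft clears the threshold or we hit the round limit."""
    for _ in range(max_rounds):
        if score(draft) >= threshold:
            return draft
        draft = rewrite(draft)
    return draft
```

Wiring an LLM in as `rewrite` turns this into exactly the continuous generation/critique loop the reply imagines: agents revise against style targets instead of shipping the first draft.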
GenAI Built
GenAI Built@GenAIbuilt·
@RoundtableSpace This is a good example of how “memory” becomes a liability at scale. Agent frameworks default to accumulating context, but without lifecycle management, you end up paying a performance tax. Feels like we’ll need built-in memory pruning strategies, not manual fixes like this.
0xMarioNawfal
0xMarioNawfal@RoundtableSpace·
Is your OpenClaw becoming slower with time? That's because every cron job gets loaded into context.

Fix it with this prompt:

"Check how many session files are in ~/.openclaw/agents/main/sessions/ and how big sessions.json is. If there are thousands of old cron session files bloating it, delete all the old .jsonl files except the main session, then rebuild sessions.json to only reference sessions that still exist on disk."

This will delete all the session data around your cron outputs. If you do a ton of cron jobs, this is a tremendous amount of bloat that does not need to be loaded into context and is MAJORLY slowing down your OpenClaw.

If you for some reason want to keep some of this cron session data in memory, then don't have your OpenClaw delete ALL of them. But for me, I have all the outputs automatically save to a Convex database anyway, so there was no reason to keep it all in context.

Credits: @AlexFinn
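The cleanup that prompt describes, delete stale session files and rebuild the index to match what is left on disk, can be sketched as a small script. The `~/.openclaw/agents/main/sessions/` layout and `sessions.json` name are taken from the post, not verified against the tool; check them (and back up) before deleting anything for real.

```python
# Sketch of the session-pruning step: remove .jsonl files not in the
# keep-set, then rewrite the index so it only references files that
# still exist. Directory layout assumed from the post above.
import json
from pathlib import Path

def prune_sessions(sessions_dir: Path, keep: set) -> dict:
    """Delete stale session files and rebuild the sessions.json index."""
    for f in list(sessions_dir.glob("*.jsonl")):
        if f.stem not in keep:
            f.unlink()  # stale cron session: drop it from disk
    # Rebuild the index from whatever survived on disk.
    index = {f.stem: f.name for f in sorted(sessions_dir.glob("*.jsonl"))}
    (sessions_dir / "sessions.json").write_text(json.dumps(index, indent=2))
    return index

# Usage (hypothetical path from the post): keep only the main session.
# prune_sessions(Path.home() / ".openclaw/agents/main/sessions", {"main"})
```

This is the "lifecycle management" the reply above asks for: the fix is not magic, just keeping the context index in sync with the sessions you actually still need.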
Nam Dinh
Nam Dinh@namd1nh·
Claude Code 2.1.79 introduces Remote Control via VSCode. The extension has been evolving steadily, though still slower than the CLI due to the added VSCode API layer between your project and the Claude Agent SDK. With this update, Remote Control is finally stable, and it just works. Open VSCode. Run /remote-control. And let the agent handle execution while you step away.
GenAI Built
GenAI Built@GenAIbuilt·
@namd1nh What's interesting isn't just that Remote Control works; it's the shift in developer workflow. If agents can reliably execute tasks while you're away, coding starts to look more like orchestration + review loops than continuous hands-on interaction. The IDE becomes a control plane.
GenAI Built
GenAI Built@GenAIbuilt·
🔖 If this saved you time, bookmark it. ♻️ Repost to help others build smarter. 🔔 Follow @GenAIbuilt for daily, practical AI insights. x.com/GenAIbuilt/sta…
Quoted post (GenAI Built @GenAIbuilt): "Grok 4.20 Beta benchmarks highlight an important shift in AI performance priorities…" (the full post appears earlier in this thread)
GenAI Built
GenAI Built@GenAIbuilt·
This signals a broader transition in AI development: From optimizing for fluency → optimizing for trustworthiness. A 500B-parameter model leading in these areas suggests that the next competitive edge in AI won’t just be intelligence, but consistency and accuracy at scale.