ircrp

2.9K posts

@ircrp

Neither rooted nor wandering, but forever in motion

Joined May 2020
4.9K Following · 2.7K Followers
ircrp
ircrp@ircrp·
@fofrAI @paulopacitti "I use Gemini 3 Pro to generate a JSON prompt because it has the best vision capabilities (at the moment)." Does this still stand today, or are there new leaders in that regard?
English
0
0
0
50
Gabriel Chua
Gabriel Chua@gabrielchua·
Here’s how we use Codex to:
> understand large codebases
> review PRs faster
> build macOS apps
> turn Figma into code
> automate bug triage
> create a CLI as agent tools
> analyze datasets
> generate slide decks
> coordinate new-hire onboarding
> learn a new concept
…and more. developers.openai.com/codex/use-cases

@kagigz, @Dimillian, @nickbaumann_ and team pulled together a great collection of Codex use cases, based on how engineering *AND* non-engineering teams build with Codex daily at OpenAI.

What else should we cover?
English
31
107
1.3K
107.8K
ircrp
ircrp@ircrp·
● Claude Opus 4.5 is no longer available. Please switch to Claude Opus 4.6! @antigravity
GIF
English
1
0
3
252
ircrp retweeted
Muratcan Koylan
Muratcan Koylan@koylanai·
Progressive disclosure is not reliable because LLMs are inherently lazy.

"In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it."

Vercel ran evals on Next.js 16 APIs that aren't in model training data to test whether agents could learn framework-specific knowledge through Skills vs. persistent context.

Skills are the "correct" abstraction: package domain knowledge, let the agent invoke it when needed, minimal context. The agent decides when to retrieve. They work well WHEN the user triggers them; otherwise, LLMs just ignore them.

Vercel's benchmarking is the first experiment of this kind I've seen, and it's actually interesting.

- Baseline (no docs): 53%
- Skill (default): 53%
- Skill with explicit instructions: 79%
- AGENTS[.]md with 8KB compressed docs index: 100%

The skill approach assumes agents reliably recognize when they need external knowledge and act on it. They don't.

"You MUST invoke the skill" made agents read docs first and miss project context. "Explore project first, then invoke" performed better. Same skill, different outcomes based on prompting.

The winning approach removed the decision entirely. An 8KB compressed index embedded in AGENTS[.]md, with one instruction: "Prefer retrieval-led reasoning over pre-training-led reasoning."

Two agent design learnings:

1. Passive context beats active retrieval for foundational knowledge. Don't make the agent decide to look things up, make the index always present.
2. Compress aggressively. Vercel went from 40KB to 8KB (80% reduction) with zero performance loss. The agent needs to know where to find docs, not have full content in context.

The gap between "agent can access X" and "agent will access X" is larger than we assume.

I keep seeing similar findings across agent architectures. Kimi Swarm's orchestrator is trained specifically to avoid sequential execution. Without training, orchestrators default to serial processing, planning a list of steps and executing them one by one. It's the EASY path.

The agent defaults to the lazy path: hallucinating from training data rather than retrieving docs. Passive context removes the choice entirely; the agent doesn't decide whether to look things up; the index is already there.

We keep finding that the "smarter", more autonomous design (let the agent decide when to X) underperforms the "dumber" design (always X, or structurally enforce X).
Vercel@vercel

We're experimenting with ways to keep AI agents in sync with the exact framework versions in your projects. Skills, 𝙲𝙻𝙰𝚄𝙳𝙴.𝚖𝚍, and more. But one approach scored 100% on our Next.js evals: vercel.com/blog/agents-md…

English
64
91
1.1K
198.1K
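A rough sketch of the "compressed index" idea from the thread above: rather than pasting full docs into context, emit one compact line per directory that lists where the docs live, preceded by the single retrieval instruction quoted in the post. The `build_index` helper, the file names, and the `dir:{file,file}` line format are all hypothetical; the quoted text does not describe Vercel's actual index format.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# The one instruction quoted in the thread.
INSTRUCTION = "Prefer retrieval-led reasoning over pre-training-led reasoning."


def build_index(doc_paths, root="docs"):
    """Group doc files by directory and emit one compact line per directory.

    The output format is an illustrative assumption, not Vercel's format.
    """
    groups = defaultdict(list)
    for p in doc_paths:
        path = PurePosixPath(p)
        groups[str(path.parent)].append(path.name)

    lines = [INSTRUCTION, f"[docs root: {root}]"]
    for directory in sorted(groups):
        names = ",".join(sorted(groups[directory]))
        lines.append(f"{directory}:{{{names}}}")
    return "\n".join(lines)


# Hypothetical doc paths, for illustration only.
paths = ["routing/pages.md", "routing/layouts.md", "caching/use-cache.md"]
index = build_index(paths)
print(index)
```

The index only tells the agent where to look; the agent still has to open the file itself. That matches the thread's point that the agent needs to know where to find docs, not hold their full content in context.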
Darin
Darin@darin_gordon·
@dannypostma I've been evolving my project (Tasker) for some time. It now offers a full workflow, from spec development to implementation. Spec development uses the interview tool in a reduce-while (ralph-like) loop, taking spec development much further than standard agent limits.
English
2
0
3
1.3K
ircrp retweeted
Zac
Zac@PerceptualPeak·
WOW!!! If you have semantic memory tied to your UserPromptSubmit hooks, you MUST ALSO include it in your PreToolUse hook. I promise you - it will be an absolute GAME CHANGER. It will put your efficiency levels over 9,000 (*vegeta voice*).

How many times have you sat there, watching Claude Code go through an extended workflow, just to notice it start to go down a path you just KNOW will be error-filled - and subsequently take forever to FINALLY figure it out?

The problem with relying strictly on the UserPromptSubmit hook for semantic memory injection is the workflow drift from your original prompt. The memories it injects at the initiation of your prompt will be less and less relevant to the workflow the longer the workflow is.

Claude has a beautiful thing called thinking blocks. These blocks are ripe for the picking - filled with meaning & intent - which is perfect for cosine-similarity recall. Claude thinks to itself, "hmm, okay I'm going to do this because of this", then starts to engage the tool of its choice, and BOOM: the PreToolUse hook fires, takes the last 1,500 characters from the most recent thinking block in the active transcript, embeds it, pulls relevant memories from your vector database, and injects them into Claude right before it starts using its tool (hooks are synchronous). This all happens in less than 500 milliseconds.

The result? A self-correcting Claude workflow. Based on my testing thus far, this is one of the most consequential additions to my context management system I've implemented yet.

Photos: ASCII chart showing the workflow of the hook, and then two real use-cases of the mid-stream memory embedding actually being useful.

If you already have semantic memory set up, just paste this tweet and photos into Claude Code and tell it to implement it for you. Then enjoy the massive increase in workflow efficiency :)
[Media: 3 images]
English
29
51
665
59.9K
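A minimal sketch of the recall step described in the post above: take the tail of the most recent thinking block, embed it, and pull the closest memories by cosine similarity. Several things here are assumptions rather than anything the post confirms: the transcript is modeled as JSONL lines whose `message.content` is a list of blocks with `type: "thinking"`, the `embed` function is a toy bag-of-words stand-in for a real embedding model, and a plain Python list stands in for the vector database. Wiring this into an actual PreToolUse hook is left out.

```python
import json
import math
import re
from collections import Counter


def last_thinking_tail(transcript_lines, max_chars=1500):
    """Return the last `max_chars` of the most recent thinking block, or ""."""
    for line in reversed(transcript_lines):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        message = entry.get("message") if isinstance(entry, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in reversed(content):
            if isinstance(block, dict) and block.get("type") == "thinking":
                return block.get("thinking", "")[-max_chars:]
    return ""


def embed(text):
    """Toy embedding (word counts); a real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall(query, memories, k=2, min_score=0.1):
    """Return up to k memories whose similarity to the query is >= min_score."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(m)), m) for m in memories), reverse=True)
    return [m for score, m in scored if score >= min_score][:k]


# Hypothetical transcript and memory store, for illustration only.
transcript = [
    json.dumps({"message": {"content": [{"type": "text", "text": "Starting."}]}}),
    json.dumps({"message": {"content": [
        {"type": "thinking", "thinking": "Next I will run alembic migration before tests"}
    ]}}),
    json.dumps({"message": {"content": [{"type": "tool_use", "name": "Bash"}]}}),
]
memories = [
    "alembic migration fails unless DATABASE_URL is exported first",
    "frontend build uses pnpm, not npm",
]

query = last_thinking_tail(transcript)
hits = recall(query, memories, k=1)
print(hits)
```

Querying from the latest thinking block rather than the original prompt is what addresses the drift the post describes: the query tracks what the agent is about to do, not what the user asked at the start.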
Aleksander Obuchowski
Aleksander Obuchowski@AlexObuchowski·
(8/8) Summary of Mistral Vibe and Codestral 2: ✅ Speed ✅ Good context retrieval ✅ Adapts to the project's style ✅ Executes commands and implements exactly what we want, no more, no less ✅ Looks ❌ 100k-token context ❌ Limited functionality in Vibe (no history, resume, checkpoints, MD config file, or planning)
Polski
4
0
17
1.8K
Aleksander Obuchowski
Aleksander Obuchowski@AlexObuchowski·
(1/8) Today I tested Mistral Vibe, the new CLI assistant and model from Mistral. An alternative to Claude Code that is 7x cheaper and built on open-source models sounds great. But how does it hold up in practice? Full review in the thread.
Aleksander Obuchowski tweet media
Polski
3
5
87
16.2K
ircrp
ircrp@ircrp·
@tokumin Wait lol, I am a moron, didn't realise you are the product lead. Do you guys share prompts for those features?
English
0
0
0
3
ircrp
ircrp@ircrp·
@tokumin have you had any luck finding the exact prompts NotebookLM uses to generate the infographics, slide decks, etc.?
English
1
0
0
39
🍓🍓🍓
🍓🍓🍓@iruletheworldmo·
i’m thinking of a number, if you get it right you’ll get access to gemini 3.
English
297
2
257
37.2K
ircrp
ircrp@ircrp·
@MartinScapeX How bad is it? 1.5h of coding per day, or worse?
English
0
0
0
16
ircrp
ircrp@ircrp·
@shydev69 any write-ups from their team on the reverse engineering of Claude Code? Curious about the internals of CC beyond just extracting the messages/tool calls from .jsonl sessions
English
0
0
0
317
shydev
shydev@shydev69·
unlimited claude code for $3/month kudos to chinese reverse engineering
shydev tweet media
English
139
227
6.8K
849.1K
Levan Kvirkvelia
Levan Kvirkvelia@levan·
We’re opening a small batch of early invites to Rork: the mobile app that builds mobile apps. Comment below and we’ll DM you access
English
656
21
683
86.2K
ircrp
ircrp@ircrp·
@scaling01 we need dem gippite logit_bias params config lever, @sama, deploy pls.
English
0
0
0
207
Lisan al Gaib
Lisan al Gaib@scaling01·
90% of GPT-5 responses begin with: Nice — > explain this > NICE
Lisan al Gaib tweet media
GIF
English
10
3
137
8.7K
ircrp retweeted
Wyatt Walls
Wyatt Walls@lefthanddraft·
Want to find out the Juice number (reasoning effort) for GPT-5-Thinking? Try the prompt below. This is the equivalent of medium reasoning effort in the API
Wyatt Walls tweet media
English
52
25
700
159.7K
ircrp
ircrp@ircrp·
@sama for the love of god, please bump the input ctx window for Enterprise ChatGPT users. GPT 5 Thinking is somewhere around 50k tokens and GPT 5 (non-thinking) is somewhere between 80-85k. Literally just tested a few mins ago. PLEASE PLEASE PLEASE
English
0
0
0
57
Sam Altman
Sam Altman@sama·
Updates to ChatGPT:

You can now choose between “Auto”, “Fast”, and “Thinking” for GPT-5. Most users will want Auto, but the additional control will be useful for some people.

Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking mini after that limit. Context limit for GPT-5 Thinking is 196k tokens. We may have to update rate limits over time depending on usage.

4o is back in the model picker for all paid users by default. If we ever do deprecate it, we will give plenty of notice.

Paid users also now have a “Show additional models” toggle in ChatGPT web settings which will add models like o3, 4.1, and GPT-5 Thinking mini. 4.5 is only available to Pro users; it costs a lot of GPUs.

We are working on an update to GPT-5’s personality which should feel warmer than the current personality but not as annoying (to most users) as GPT-4o. However, one learning for us from the past few days is we really just need to get to a world with more per-user customization of model personality.
English
4K
1.7K
19.2K
3.2M