ircrp

2.9K posts

@ircrp

Neither rooted nor wandering, but forever in motion

Joined May 2020

4.9K Following, 2.7K Followers

ircrp@ircrp·10 May

@fofrAI @paulopacitti "I use Gemini 3 Pro to generate a JSON prompt because it has the best vision capabilities (at the moment)." Does this still hold today, or are there new leaders in that regard?

fofr@fofrAI·9 May

@paulopacitti I wrote a blog post about it fofr.ai/prompting-with…

fofr@fofrAI·9 May

I had Nano Banana Pro recreate this kind of image with a JSON prompt:

Guy@nosilverv

The relationship between the number of frames and how the image is interpreted is non-linear

ircrp@ircrp·14 Apr

@gabrielchua how do we enable the imagegen tool in the Codex app?

Gabriel Chua@gabrielchua·12 Apr

Here’s how we use Codex to:
> understand large codebases
> review PRs faster
> build macOS apps
> turn Figma into code
> automate bug triage
> create a CLI as agent tools
> analyze datasets
> generate slide decks
> coordinate new-hire onboarding
> learn a new concept
…and more.

developers.openai.com/codex/use-cases

@kagigz, @Dimillian, @nickbaumann_ and team pulled together a great collection of Codex use cases, based on how engineering *AND* non-engineering teams build with Codex daily at OpenAI.

What else should we cover?

ircrp@ircrp·7 Feb

● Claude Opus 4.5 is no longer available. Please switch to Claude Opus 4.6! @antigravity

GIF

ircrp retweeted

Muratcan Koylan@koylanai·29 Jan

Progressive disclosure is not reliable because LLMs are inherently lazy.

"In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it."

Vercel ran evals on Next.js 16 APIs that aren't in model training data to test whether agents could learn framework-specific knowledge through Skills vs. persistent context.

Skills are the "correct" abstraction: package domain knowledge, let the agent invoke it when needed, minimal context. The agent decides when to retrieve. They work well WHEN the user triggers them; otherwise, LLMs just ignore them.

Vercel's benchmarking is the first experiment of this kind I've seen, and it's actually interesting.
- Baseline (no docs): 53%
- Skill (default): 53%
- Skill with explicit instructions: 79%
- AGENTS[.]md with 8KB compressed docs index: 100%

The skill approach assumes agents reliably recognize when they need external knowledge and act on it. They don't.

"You MUST invoke the skill" made agents read docs first and miss project context. "Explore project first, then invoke" performed better. Same skill, different outcomes based on prompting.

The winning approach removed the decision entirely. An 8KB compressed index embedded in AGENTS[.]md, with one instruction: "Prefer retrieval-led reasoning over pre-training-led reasoning."

Two agent design learnings:
1. Passive context beats active retrieval for foundational knowledge. Don't make the agent decide to look things up, make the index always present.
2. Compress aggressively. Vercel went from 40KB to 8KB (80% reduction) with zero performance loss. The agent needs to know where to find docs, not have full content in context.

The gap between "agent can access X" and "agent will access X" is larger than we assume.

I keep seeing similar findings across agent architectures. Kimi Swarm's orchestrator is trained specifically to avoid sequential execution. Without training, orchestrators default to serial processing, planning a list of steps and executing them one by one. It's the EASY path.

The agent defaults to the lazy path: hallucinating from training data rather than retrieving docs. Passive context removes the choice entirely; the agent doesn't decide whether to look things up; the index is already there.

We keep finding that the "smarter", more autonomous design (let the agent decide when to X) underperforms the "dumber" design (always X, or structurally enforce X).
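The "always-present compressed index" idea from the post can be sketched roughly as below. This is only an illustration, not Vercel's actual tooling: the helper names `build_docs_index` and `reduction_percent`, the pipe-delimited line format, and the example file paths are all hypothetical. The only pieces taken from the post are the one-line instruction and the 40KB to 8KB figures.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# The single instruction quoted in the post.
INSTRUCTION = "Prefer retrieval-led reasoning over pre-training-led reasoning."


def build_docs_index(doc_paths):
    """Collapse doc file paths into one compact line per directory.

    The "directory|file,file" format is an assumption made for this sketch;
    the point is that the agent sees where docs live, not their full content.
    """
    by_dir = defaultdict(list)
    for raw in doc_paths:
        path = PurePosixPath(raw)
        by_dir[str(path.parent)].append(path.name)

    lines = [INSTRUCTION]
    for directory in sorted(by_dir):
        names = ",".join(sorted(by_dir[directory]))
        lines.append(f"{directory}|{names}")
    return "\n".join(lines)


def reduction_percent(before_bytes, after_bytes):
    """Size reduction as a whole-number percentage."""
    return round(100 * (before_bytes - after_bytes) / before_bytes)


if __name__ == "__main__":
    # Hypothetical doc layout, for illustration only.
    example_paths = [
        "docs/routing/layouts.md",
        "docs/routing/params.md",
        "docs/caching/revalidate.md",
    ]
    print(build_docs_index(example_paths))
    print(reduction_percent(40_000, 8_000))  # the 40KB -> 8KB figure from the post
```

The resulting text would be pasted into the always-loaded instructions file, so the agent never has to decide whether to look the index up.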

Vercel@vercel

We're experimenting with ways to keep AI agents in sync with the exact framework versions in your projects. Skills, 𝙲𝙻𝙰𝚄𝙳𝙴.𝚖𝚍, and more. But one approach scored 100% on our Next.js evals: vercel.com/blog/agents-md…

ircrp@ircrp·1 Feb

@darin_gordon @dannypostma

GIF

Darin@darin_gordon·1 Feb

@dannypostma I've been evolving my project (Tasker) for some time. It now offers a full workflow, from spec development to implementation. Spec development uses the interview tool in a reduce-while (ralph-like), taking spec development much further than standard agent limits.

ircrp retweeted

Zac@PerceptualPeak·28 Jan

WOW!!! If you have semantic memory tied to your UserPromptSubmit hooks, you MUST ALSO include it in your PreToolUse hook. I promise you - it will be an absolute GAME CHANGER. It will put your efficiency levels over 9,000 (*vegeta voice*).

How many times have you sat there, watching Claude Code go through an extended workflow, just to notice it start to go down a path you just KNOW will be error-filled - and subsequently take forever to FINALLY figure it out?

The problem with relying strictly on the UserPromptSubmit hook for semantic memory injection is the workflow drift from your original prompt. The memories it injects at the initiation of your prompt will be less and less relevant to the workflow the longer the workflow is.

Claude has a beautiful thing called thinking blocks. These blocks are ripe for the picking - filled with meaning & intent - which is perfect for cosine similarity recall. Claude thinks to itself, "hmm, okay I'm going to do this because of this", then starts to engage the tool of its choice, and BOOM: PreToolUse hook fires, takes the last 1,500 characters from the most recent thinking block from the active transcript, embeds it, pulls relevant memories from your vector database, and injects them into Claude right before it starts using its tool (hooks are synchronous). This all happens in less than 500 milliseconds.

The result? A self-correcting Claude workflow. Based on my testing thus far, this is one of the most consequential additions to my context management system I've implemented yet.

Photos: ASCII chart showing the workflow of the hook, and then two real use-cases of the mid-stream memory embedding actually being useful.

If you already have semantic memory set up, just paste this tweet and photos into Claude Code and tell it to implement it for you. Then enjoy the massive increase of workflow efficiency :)
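A rough sketch of the recall step described in the post, under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, a plain Python list stands in for the vector database, and `recall_memories` is a hypothetical helper, not part of Claude Code. Reading the active transcript and wiring the result into an actual PreToolUse hook are deliberately left out.

```python
import math
from collections import Counter

# The post takes the last 1,500 characters of the most recent thinking block.
TAIL_CHARS = 1500


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_memories(thinking_blocks, memories, top_k=2):
    """Return up to top_k memories most similar to the latest thinking tail."""
    if not thinking_blocks:
        return []
    query = embed(thinking_blocks[-1][-TAIL_CHARS:])
    scored = [(cosine_similarity(query, embed(m)), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop memories with no overlap at all so nothing irrelevant is injected.
    return [memory for score, memory in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Made-up memories and thinking text, for illustration only.
    stored = [
        "database migrations must run before the tests",
        "use pnpm instead of npm in this repo",
    ]
    thoughts = ["I will run the tests now, but first check database migrations"]
    for memory in recall_memories(thoughts, stored, top_k=1):
        print("inject:", memory)
```

Because the query comes from the latest thinking block instead of the original prompt, the recalled memories track where the workflow currently is, which is the drift problem the post describes.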

ircrp@ircrp·10 Dec

@AlexObuchowski x.com/MistralAI/stat…

Mistral AI@MistralAI

We've doubled the Vibe context limit from 100k to 200k. Happy shipping! → uv tool install mistral-vibe

Aleksander Obuchowski@AlexObuchowski·10 Dec

(8/8) Summary of Mistral Vibe and Codestral 2: ✅ Speed ✅ Good context retrieval ✅ Adapts to the project's style ✅ Runs commands and implements exactly what we want, no more, no less ✅ Look and feel ❌ 100k-token context ❌ Limited functionality in Vibe (no history, resume, checkpoints, MD config file, or planning)

Aleksander Obuchowski@AlexObuchowski·10 Dec

(1/8) Today I tested Mistral Vibe, the new CLI assistant and model from Mistral. An alternative to Claude Code, 7x cheaper and based on open-source models: sounds great. But how does it hold up in practice? Full review in the thread

ircrp@ircrp·25 Nov

@tokumin Wait lol, I am a moron, didn't realise you are a lead of the product. Do you guys share prompts for those features?

ircrp@ircrp·25 Nov

@tokumin have you had any luck finding the exact prompts NotebookLM uses to generate the infographics, slide decks, etc.?

Simon@tokumin·22 Nov

you must try notebooklm.google.com Slide Decks

ircrp@ircrp·15 Nov

@producer_ai I need an invite!

ircrp@ircrp·14 Oct

@iruletheworldmo -1. Baited

🍓🍓🍓@iruletheworldmo·14 Oct

i’m thinking of a number, if you get it right you’ll get access to gemini 3.

ircrp@ircrp·12 Oct

@MartinScapeX How bad is it? 1.5h coding per day or worse?

ircrp@ircrp·10 Oct

@shydev69 any write-ups from their team on the reverse engineering of Claude Code? Curious about the internals of CC beyond just extracting the messages/tool calls from .jsonl sessions

shydev@shydev69·10 Oct

unlimited claude code for $3/month kudos to chinese reverse engineering

ircrp@ircrp·28 Aug

@levan

GIF

Levan Kvirkvelia@levan·28 Aug

We’re opening a small batch of early invites to Rork: the mobile app that builds mobile apps. Comment below and we’ll DM you access

ircrp@ircrp·21 Aug

@scaling01 we need dem gippite logit_bias params config lever, @sama, deploy pls.
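For readers unfamiliar with the `logit_bias` lever requested above, here is a minimal sketch of the mechanism: a per-token offset is added to the model's logits before sampling, so a strongly negative bias effectively bans a token. The token strings and logit values below are made up for illustration; APIs that expose such a parameter typically key the bias by token ID, not by string, and nothing here calls a real API.

```python
import math


def softmax(logits):
    """Convert a token -> logit mapping into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in logits.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


def apply_logit_bias(logits, bias):
    """Add a per-token bias to the logits; unlisted tokens are unchanged."""
    return {token: value + bias.get(token, 0.0) for token, value in logits.items()}


if __name__ == "__main__":
    # Hypothetical first-token logits where "Nice" dominates.
    first_token_logits = {"Nice": 3.0, "Sure": 1.0, "The": 1.0}
    print(softmax(first_token_logits))

    # A large negative bias pushes the probability of "Nice" to roughly zero.
    biased = apply_logit_bias(first_token_logits, {"Nice": -100.0})
    print(softmax(biased))
```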

Lisan al Gaib@scaling01·20 Aug

90% of GPT-5 responses begin with: Nice — > explain this > NICE

GIF

ircrp@ircrp·14 Aug

@lefthanddraft @Davidedeferrar Is it true that the full system prompt is massive, about 30k tokens?

Wyatt Walls@lefthanddraft·14 Aug

@ircrp @Davidedeferrar That method is a bit of a shortcut. The more reliable method (and how I knew it was under valid channels) was extracting the full system prompt x.com/lefthanddraft/…

Wyatt Walls@lefthanddraft

Medium effort, low verbosity

Wyatt Walls@lefthanddraft·14 Aug

gpt-5-high is only available through the API and has the highest juice: 200. Even on the $200/month plan, thinking and thinking-pro are limited to a juice of 128. Consumers are drunk on the juice and have just caught sight of some untapped kegs.

Ember@accelerate27

If I am paying $200 and can’t get GPT-5 High, then how is one supposed to get GPT-5 High with all those bells and whistles (iterative search + ragdb on the chat session)? They are showing benchmarks of GPT-5 High, but when people pay for the Pro sub they can’t get it? C’mon, OpenAI. They should put it on the plan upgrade page that Pro subscribers won’t get the best OpenAI has but still get something better than Plus users have (parallel test-time compute on GPT-5 Pro, medium effort).

ircrp@ircrp·14 Aug

@Davidedeferrar @lefthanddraft

Davide@Davidedeferrar·14 Aug

@ircrp @lefthanddraft how do you get this number?

ircrp retweeted

Wyatt Walls@lefthanddraft·12 Aug

Want to find out the Juice number (reasoning effort) for GPT-5-Thinking? Try the prompt below. This is the equivalent of medium reasoning effort in the API

ircrp@ircrp·13 Aug

@sama for the love of god, please bump the input ctx window for Enterprise ChatGPT users. GPT 5 Thinking is somewhere around 50k tokens and GPT 5 (non-thinking) is somewhere between 80-85k. Literally just tested a few mins ago. PLEASE PLEASE PLEASE

Sam Altman@sama·13 Aug

Updates to ChatGPT:

You can now choose between “Auto”, “Fast”, and “Thinking” for GPT-5. Most users will want Auto, but the additional control will be useful for some people.

Rate limits are now 3,000 messages/week with GPT-5 Thinking, and then extra capacity on GPT-5 Thinking mini after that limit. Context limit for GPT-5 Thinking is 196k tokens. We may have to update rate limits over time depending on usage.

4o is back in the model picker for all paid users by default. If we ever do deprecate it, we will give plenty of notice. Paid users also now have a “Show additional models” toggle in ChatGPT web settings which will add models like o3, 4.1, and GPT-5 Thinking mini. 4.5 is only available to Pro users—it costs a lot of GPUs.

We are working on an update to GPT-5’s personality which should feel warmer than the current personality but not as annoying (to most users) as GPT-4o. However, one learning for us from the past few days is we really just need to get to a world with more per-user customization of model personality.
