tae

582 posts

@taeshindev

builder

Joined May 2019

885 Following · 72 Followers

tae retweeted

✦ VISUAL AI ✦ @VisualconAI · 3d

This Indian developer shows how ChatGPT Images 2.0 designs the complete UI of your app or video game from a single prompt. Production-ready interfaces that are coherent and genuinely usable, without touching Figma or hiring a designer. One prompt. Complete UI. Ready for your app.

anul agarwal @anulagarwal

ChatGPT Images 2.0 can generate really cool UI, either for your apps or games. Here I asked it to generate UI for a game based around 'Falcons' and for a random health app. It generates everything PERFECTLY, and it's usable in your actual game/app. The only thing I HATE about it is that it can't generate transparent images, which is a regression from the previous version of its image model. Experimenting more, stay tuned!

tae @taeshindev · 3d

currently in my “claude token saving” broke era → sonnet only. curious how everyone else is using it these days

tae @taeshindev · 5d

This is fun

takkyu @takkyuO2

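I developed a prompting technique called SSoT! I'll be presenting it at #ICLR2026 this week 🇧🇷 It isn't limited to coin tosses; it can, for example, increase output diversity on open-ended tasks, which makes it a favorite of mine for being both fun and useful. The paper packs in plenty of discussion (e.g., why not just use a pseudo-random number generator?). Please give it a read!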

tae @taeshindev · 5d

“What do you do for a living?” “I wait for Claude.”

tae @taeshindev · 6d

Claude Design, this is insane. I am cooked

tae @taeshindev · Apr 19

We did a prototyping workshop internally, then suddenly one agent starts building backend stuff 👉 the PM has no idea what's happening and just sits there waiting. We were seriously thinking, “do we need to build our own tool for this?” Tried Claude Design and… wow, this problem just disappears. This isn't just for designers; it's actually amazing for PMs too.

Claude @claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

tae retweeted

DAIR.AI @dair_ai · Apr 16

Agent evals are drifting away from production reality. Most benchmarks use clean tasks, well-specified requirements, deterministic metrics, and retrospective curation. Production work is messier, with implicit constraints, fragmented multimodal inputs, undeclared domain knowledge, long-horizon deliverables, and expert judgment that evolves over time. This paper introduces AlphaEval, a production-grounded benchmark for evaluating agents as complete products. AlphaEval contains 94 tasks sourced from seven companies deploying AI agents in core business workflows, spanning six O*NET domains. It evaluates systems like Claude Code and Codex as commercial agent products, not just model APIs. The benchmark combines multiple evaluation paradigms: LLM-as-a-Judge, reference-driven metrics, formal verification, rubric-based assessment, automated UI testing, and domain-specific checks. Why it matters: organizations need benchmarks that start from real production requirements, then become executable evals with minimal friction. Paper: arxiv.org/abs/2604.12162 Learn to build effective AI agents in our academy: academy.dair.ai

tae @taeshindev · Apr 16

this race is getting real interesting

OpenAI @OpenAI

Codex for (almost) everything. It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.

tae @taeshindev · Apr 16

Future me

el hombre pulpo @coproduto

I took an Uber whose driver was sending a voice message to a buddy on WhatsApp, saying you need a GitHub with good projects for the consultancies to call you. I don't know if this is a sign of the top or the bottom of the market.

tae @taeshindev · Apr 16

1. Duct tape 2. Original

tae @taeshindev · Apr 16

👀👀

Geek @geekbb

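Vercel built a web terminal emulator, wterm. The core is written in Zig and compiled to a WASM bundle of about 12 KB, with performance not far off native. Unlike typical terminals drawn on a Canvas, it renders directly to the DOM, so text selection, copy/paste, search, and screen readers all come built into the browser with no extra work. github.com/vercel-labs/wt…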

tae retweeted

Ding @dingyi · Apr 16

Wow, finally a component library like this: npm install border-beam

tae retweeted

Yoonho Lee @yoonholeee · Apr 15

We just released code for Meta-Harness! github.com/stanford-iris-… Aside from replicating paper experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation

Yoonho Lee @yoonholeee

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

tae @taeshindev · Apr 15

Claude Code debugging must be insane: context + local env + harnesses + who knows what else. Genuinely curious how Anthropic devs debug this.

James Rogers - Cinematographer; Truth@24fps @ChronoScopeFilm

Would really appreciate it if you could do something about the, at this point, obvious degradation in Opus 4.6’s performance. Overnight, for me and apparently countless others, Opus went from being truly next-level to being a bumbling IDIOT, and I mean that. Lying about completing tasks, just…stopping mid-task, making simply idiotic mistakes which I then have to literally spell out in order to get it to fix them, on and on. And this is happening both in Claude desktop (in just normal chat) AND in Claude Code. I sure got the $200 a month I was paying for at first, but the last week has been truly unbearable.

tae @taeshindev · Apr 15

3 years ago, my team had a culture of deep code reviews. The people who built it all left — but I kept doing it. No one was asking me to anymore. It was frustrating. A lot of the time, my feedback didn’t land, and explaining things took way too long. I thought it was a waste of time. Now in the AI era, it’s one of the most valuable skills I have.

tae retweeted

Indie Fox @indie_maker_fox · Apr 14

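The architecture diagrams this skill draws are genuinely high quality! github.com/Cocoon-AI/arch… Below is the architecture diagram for OpenHarness; the color scheme is really easy on the eyes.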

tae @taeshindev · Apr 13

No way

Bill The Investor @billtheinvestor

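Found a retro-style music website on Dribbble, it's so cool… I couldn't resist recreating it by vibe coding. I used Gemini 3.1 Pro in @GoogleAIStudio, and… it basically rebuilt the entire site, even implementing audio playback. This shows that if your design can be reproduced this quickly, the moat is no longer the code. Execution speed is the only advantage now :) Remix: ai.studio/apps/c973d305-…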

tae @taeshindev · Apr 11

Infrastructure is needed — and it won’t be cheap.

Dr. 67. Pump @real_dr_pump

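The day before yesterday, @paradigm organized an autoresearch hackathon. The core idea was to test a claim by OpenAI co-founder @karpathy: “Give an agent a problem, a deterministic evaluator, and enough compute, and it will find solutions that would take domain experts longer to reach.” The challenge was to develop a market-making algorithm: optimizationarena.com/prediction-mar… 1. As in the classic MM problem, the market contains informed traders (arbitrageurs), uninformed traders (retail), and other MMs (simplified here to the dumbest kind, which only posts one type of order). 2. The difference is that the underlyer is unknown and jumps at every step, which is why the backstory is dressed up as a prediction market. After reading the problem today, my first instinct was to brute-force it with a belief filter + model-based RL (MCTS, etc.). Only later did I realize that @paradigm's intent was for people to use AI to explore the problem automatically. @SurfAI founder @ryanli and @zhimao_liu took first and third place, and each wrote a blog post reviewing their strategy. After reading them, I understood that the real question here is whether you have large-scale, scalable infra for attacking a whole class of optimization problems. The simplest approach: manually run several claude-code/codex instances at once. But we would also like the different agents to learn from each other and evolve, with the whole process highly parallelized. @zhimao_liu also mentioned Mimir, a K8s-native agent orchestration system used internally at @SurfAI, which can be used here to launch AI instances in parallel to run experiments. P.S. Apparently someone cracked it outright after the competition 🤣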