tae

582 posts

@taeshindev

builder

Joined May 2019
885 Following · 72 Followers
tae retweeted
✦ VISUAL AI ✦ @VisualconAI
This Indian developer shows how ChatGPT Images 2.0 designs the complete UI for your app or video game from a single prompt. Production-ready interfaces that are coherent and genuinely usable, without touching Figma or hiring a designer. One prompt. Complete UI. Ready for your app.
[2 images]
anul agarwal @anulagarwal

ChatGPT Images 2.0 can generate really cool UI - either for your apps or games. Here I asked it to generate UI for a game based around 'Falcons' and for a random health app. It generates everything PERFECTLY and usable in your actual game/app. The only thing I HATE about it is that it can't generate transparent images - which is a downgrade from the previous version of its image model. Experimenting more, stay tuned!

5 replies · 37 retweets · 413 likes · 31K views
tae @taeshindev
Currently in my “Claude token saving” broke era → Sonnet only. Curious how everyone else is using it these days.
0 replies · 0 retweets · 0 likes · 13 views
tae @taeshindev
This is fun.
takkyu @takkyuO2

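I developed a prompting technique called SSoT! I'll be presenting it this week at #ICLR2026 🇧🇷 It's a favorite technique of mine that is both interesting and useful: beyond coin tosses, it can, for example, increase output diversity on open-ended tasks. The paper is packed with discussion (such as: why not just use a pseudorandom number generator?). Please give it a read!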

1 reply · 0 retweets · 0 likes · 88 views
tae @taeshindev
“What do you do for a living?” “I wait for Claude.”
[image]
1 reply · 0 retweets · 1 like · 31 views
tae @taeshindev
Claude Design, this is insane. I am cooked.
0 replies · 0 retweets · 0 likes · 17 views
tae @taeshindev
We did a prototyping workshop internally. Then suddenly one agent starts building backend stuff 👉 the PM has no idea what’s happening and just sits there waiting. We were seriously thinking, “do we need to build our own tool for this?” Tried Claude Design and… wow, this problem just disappears. This isn’t just for designers; it’s actually amazing for PMs too.
Claude @claudeai

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.

0 replies · 0 retweets · 0 likes · 29 views
tae retweeted
DAIR.AI @dair_ai
Agent evals are drifting away from production reality.
Most benchmarks use clean tasks, well-specified requirements, deterministic metrics, and retrospective curation. Production work is messier, with implicit constraints, fragmented multimodal inputs, undeclared domain knowledge, long-horizon deliverables, and expert judgment that evolves over time.
This paper introduces AlphaEval, a production-grounded benchmark for evaluating agents as complete products. AlphaEval contains 94 tasks sourced from seven companies deploying AI agents in core business workflows, spanning six O*NET domains. It evaluates systems like Claude Code and Codex as commercial agent products, not just model APIs.
The benchmark combines multiple evaluation paradigms: LLM-as-a-Judge, reference-driven metrics, formal verification, rubric-based assessment, automated UI testing, and domain-specific checks.
Why it matters: organizations need benchmarks that start from real production requirements, then become executable evals with minimal friction.
Paper: arxiv.org/abs/2604.12162
Learn to build effective AI agents in our academy: academy.dair.ai
[image]
12 replies · 39 retweets · 216 likes · 17.9K views
tae @taeshindev
1. Duct tape
2. Original
[2 images]
0 replies · 0 retweets · 0 likes · 53 views
tae retweeted
Ding @dingyi
Wow, a component library like this finally exists: npm install border-beam
24 replies · 156 retweets · 1.7K likes · 83.4K views
tae retweeted
Yoonho Lee @yoonholeee
We just released code for Meta-Harness! github.com/stanford-iris-… Aside from replicating paper experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation
[image]
Yoonho Lee @yoonholeee

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

27 replies · 165 retweets · 1.1K likes · 122.3K views
tae @taeshindev
3 years ago, my team had a culture of deep code reviews. The people who built it all left — but I kept doing it. No one was asking me to anymore. It was frustrating. A lot of the time, my feedback didn’t land, and explaining things took way too long. I thought it was a waste of time. Now in the AI era, it’s one of the most valuable skills I have.
0 replies · 0 retweets · 0 likes · 21 views
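tae retweeted
Indie Fox @indie_maker_fox
The quality of the architecture diagrams this skill draws is really impressive! github.com/Cocoon-AI/arch… Below is the architecture diagram for OpenHarness; the color scheme is very easy on the eyes.
[image]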
45 replies · 415 retweets · 2.7K likes · 357.9K views
tae @taeshindev
No way
Bill The Investor @billtheinvestor

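I found a retro-style music website on Dribbble, and it was so cool... I couldn't resist recreating it with vibe coding. I used Gemini 3.1 Pro in @GoogleAIStudio, and... it basically rebuilt the entire website and even implemented audio playback. This shows that if your design can be reproduced this quickly, the moat is no longer the code. Speed of execution is now the only advantage :) Remix: ai.studio/apps/c973d305-…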

0 replies · 0 retweets · 0 likes · 36 views
tae @taeshindev
Infrastructure is needed — and it won’t be cheap.
Dr. 67. Pump @real_dr_pump

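The day before yesterday, @paradigm organized an autoresearch hackathon. The core idea was to test the view of OpenAI co-founder @karpathy: “Give an agent a problem, a deterministic evaluator, and enough compute, and it will find solutions that would take domain experts much longer to reach.”
The competition task was to develop a market-making algorithm: optimizationarena.com/prediction-mar…
1. As in the traditional MM problem, the market has informed traders (arbitrageurs), uninformed traders (retail), and other MMs (simplified here to the dumbest kind, which only posts one type of order).
2. The difference is that the underlyer here is unknown and jumps at every step, so the backstory is dressed up as a Prediction Market.
After reading the task today, my first reaction was to brute-force it with a belief filter + model-based RL (MCTS etc.). Only later did I realize that @paradigm's intent was for everyone to use AI to explore the problem automatically.
@SurfAI founders @ryanli @zhimao_liu took first and third place in the competition. Each of them also wrote a blog post reviewing their strategy. After reading them, I understood that the core question here is whether you can have large-scale, scalable infra for this class of optimization problems.
The simplest approach: manually open multiple claude-code/codex instances at once and run them. But we also want different agents to learn from one another and evolve, with the whole process highly parallelized.
@zhimao_liu also mentioned Mimir, @SurfAI's internal K8s-native agent orchestration system, which can be used here to launch AI instances in parallel to run experiments.
PS: it seems someone straight-up cracked it after the competition 🤣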

0 replies · 0 retweets · 1 like · 34 views