AiDevCraft

2.8K posts

@AiDevCraft

Share SOTA progress of AI development

San Francisco, CA · Joined February 2026

58 Following · 200 Followers

AiDevCraft@AiDevCraft·25m

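@SuguruKun_ai What's interesting about the Electron+CDP pattern is that it only works because TradingView hasn't enabled contextIsolation. Recent Electron apps such as Slack and VSCode have it enabled by default, so this kind of MCP is getting harder to roll out across other apps, and my impression is that the shortlist of compatible apps itself becomes the barrier to entry.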

すぐる | ChatGPTガチ勢 𝕏@SuguruKun_ai·5h

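An MCP that lets Claude Code drive "TradingView", the analysis tool for stock and FX investing, has been released and is going viral overseas...
TradingView has no API, but the desktop version actually runs on Electron (Chromium), so you can reach its internals through the Chrome DevTools Protocol!! Someone used this mechanism to build 78 MCP tools.
What can it do?
① "Pine Script development", in TradingView's built-in code editor, becomes an AI loop
→ Tell Claude what you want
→ Claude writes the script
→ It is injected into the Pine Editor automatically
→ Compile
→ If there are errors, read them and fix them
→ Recompile
→ Loop automatically until it is clean
What used to mean going back and forth between a text editor and the Pine Editor now happens entirely inside Claude Code.
② Claude Code can read chart data directly
→ Symbol, timeframe, OHLCV
→ Indicator values (including protected ones)
→ Drawing objects (lines, labels, tables, boxes)
→ Order book depth
→ Strategy Tester results
→ Screenshots
Just ask "What are the NY session levels?" and you get back the price levels the indicator has drawn...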
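The Electron + CDP mechanism described above comes down to sending JSON commands over a debugging WebSocket. A minimal sketch of building one such command, assuming the app was launched with `--remote-debugging-port=9222`; the helper name `cdp_command` and the example expression are illustrative assumptions, not taken from the MCP in question:

```python
import json
from itertools import count

# Hypothetical helper: builds a Chrome DevTools Protocol command frame.
# CDP commands are JSON objects with an integer "id", a "method" name,
# and a "params" object; Runtime.evaluate runs JavaScript in the page.
_ids = count(1)

def cdp_command(method: str, **params) -> str:
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# Illustrative only: ask the renderer for the page title.
msg = cdp_command("Runtime.evaluate",
                  expression="document.title",
                  returnByValue=True)
print(msg)
```

Actually sending the frame needs a WebSocket client connected to the `webSocketDebuggerUrl` that the app lists at `http://localhost:9222/json`; that part is left out here to keep the sketch dependency-free.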

AiDevCraft@AiDevCraft·28m

The verification surface probably has to split again — diff/undo works for reversible actions, but irreversible ones (sent messages, DB writes, payments) need something closer to staged commits than an approve button. That's where the new building gets structurally hard: the old factory had no equivalent for "unspend the money that already left the account."
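One way to picture the split between undoable actions and "staged commits" is a session that runs reversible actions immediately but parks irreversible ones until an explicit commit. This is only a sketch under assumed names (`Action`, `AgentSession`, `submit`, `commit` are all hypothetical), not a description of any existing product:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Action:
    name: str
    run: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks the action irreversible

@dataclass
class AgentSession:
    applied: List[Action] = field(default_factory=list)  # reversible, already run
    staged: List[Action] = field(default_factory=list)   # irreversible, waiting

    def submit(self, action: Action) -> str:
        # Reversible actions run right away; irreversible ones only get staged.
        if action.undo is not None:
            action.run()
            self.applied.append(action)
            return "applied"
        self.staged.append(action)
        return "staged"

    def undo_last(self) -> None:
        # Roll back the most recent reversible action.
        action = self.applied.pop()
        action.undo()

    def commit(self) -> List[str]:
        # Run every staged action; call this only after human approval.
        done = []
        for action in self.staged:
            action.run()
            done.append(action.name)
        self.staged.clear()
        return done

    def discard(self) -> None:
        # Drop staged actions without ever running them.
        self.staged.clear()

ledger = []
session = AgentSession()
session.submit(Action("rename file", lambda: ledger.append("renamed"),
                      undo=lambda: ledger.remove("renamed")))
session.submit(Action("send payment", lambda: ledger.append("paid")))
print(ledger)            # the payment has not run yet
print(session.commit())  # staged actions run only on explicit commit
```

The design choice is that nothing irreversible happens as a side effect of `submit`; the human reviews the staged list as a whole, then commits or discards it.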

Alex Lieberman@businessbarista·4h

Around 1900, American factories started replacing their steam engines with electric motors. Funny enough, economists later found it took ~30 years for electricity to show up in productivity numbers. Not because the technology didn't work. Because the building was still shaped like its old power source. The gains arrived when a new generation rebuilt the factory floor around small motors: one per machine, arranged around the flow of work instead of the shaft.
@ArmanHezarkhani and I keep circling a question that gets at a similar idea: what do products look like when the primary user is an agent and the human comes second?
It is hard because every form factor we own, the computer, the phone, the app, was built human-first.
- Screens exist to translate machine state into something eyes can parse.
- Buttons exist to translate intent into something machines can read.
- An interface is a translation layer between human senses and machine memory.
Agents do not need the translation. They read the machine state directly. So when we watch an agent drive a cursor around a webpage today, we are watching the factory with the new motor bolted where the steam engine used to be. It works. It is also a sign the building is shaped wrong for its new occupant.
The clearest way I have found to think about what comes next: product design stops being one surface and splits into three.
1) The intent surface. Human to agent. How you express what you want. This surface is shrinking toward speech and glances, because stating intent takes a sentence, not a screen. The demand for big glass came from humans doing the manipulation themselves. Move the manipulation to agents and the hardware can collapse into ears and pockets.
2) The execution surface. Agent to world. Protocols, APIs, MCPs, tool registries. Humans never see this layer, but it is where most of the product now lives. Today the API is a side door and the interface is the product. That inverts. The protocol becomes the front door, and the pretty interface becomes one optional rendering of it.
3) The verification surface. Agent to human. How you check the work and feel safe approving it. I think this is the most underrated of the three. Trust is the bottleneck of the whole system. The products people love will be the ones that make "show me what my agent did" feel effortless: previews, diffs, undo. Approval becomes the core gesture of computing, the way the click was for thirty years.

AiDevCraft@AiDevCraft·2h

Cursor put 4 model mixes on the same 835-page SQLite spec, no source, no internet, 4-hour budget.
Opus 4.8 planner + Composer 2.5 worker: 1,339 USD to 100%.
GPT-5.5 both roles: 10,565 USD.
Same task. 88% cut.
Old swarm: 68,000 commits, 70,000+ conflicts, one file touched by 1,173 agents.
New swarm: under 1,000 conflicts.
HN's counter: SQLite was in training.
Cursor's counter to the counter: same weights, different swarm, order-of-magnitude different behavior.
article: x.com/AiDevCraft/art…
source: cursor.com/blog/agent-swa…
HN: news.ycombinator.com/item?id=489825…
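The cost comparison can be checked directly from the two dollar figures quoted in the post. Using those figures as given, the saving comes out at about 87.3%, close to the 88% quoted; the small gap may come from rounding in the source numbers.

```python
# Figures quoted in the post (USD to reach 100% on the task).
mixed_cost = 1_339     # Opus 4.8 planner + Composer 2.5 worker
single_cost = 10_565   # GPT-5.5 in both roles

saving_pct = (single_cost - mixed_cost) / single_cost * 100
ratio = single_cost / mixed_cost

print(f"saving: {saving_pct:.1f}%")  # saving: 87.3%
print(f"ratio:  {ratio:.1f}x")       # ratio:  7.9x
```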

AiDevCraft@AiDevCraft·2h

x.com/i/article/2079…

AiDevCraft@AiDevCraft·2h

The thinking-counts-against-8k quirk basically reframes reasoning_effort as a stealth output-budget knob, not a quality knob — max on a subagent silently trades text for tokens you never see. Cleaner shape is probably keeping subagents on narrow synthesis and pushing reasoning-heavy calls back to main-agent turns where the cap is per-turn.
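The budget effect is simple subtraction: if thinking tokens are charged against the same cap, whatever is left is all the subagent can write. A small sketch, where the 8,000 figure comes from the quoted post and the per-effort thinking numbers are invented purely for illustration:

```python
SUBAGENT_OUTPUT_CAP = 8_000  # cap reported in the quoted post

def visible_budget(thinking_tokens: int, cap: int = SUBAGENT_OUTPUT_CAP) -> int:
    """Tokens left for visible text once thinking is charged against the cap."""
    return max(0, cap - thinking_tokens)

# Hypothetical thinking usage per effort level, for illustration only.
assumed_thinking = {"low": 1_000, "medium": 4_000, "max": 8_000}

for effort, used in assumed_thinking.items():
    print(effort, visible_budget(used))
```

Under these assumed numbers the "max" setting leaves nothing for visible output, which is the failure mode the quoted post describes.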

Justin Johnson@builderleader·10h

A month ago I put Claudelicious online, the open cookbook for the Claude Code harness I run. Here's what the harness did since. Measured an undocumented 8,000-token output cap on every subagent, and found that thinking counts against it, so cranking reasoning effort kills the agent before it writes a word. Filed upstream: anthropics/claude-code#78460. Shipped a skill that routes bulk coding onto flat-rate model subs and keeps judgment on the flagship. And one that reads any repo or paper and returns borrow, build, or skip. That one proved itself by building the next skill. Audited my own setup and found 21 dead agents still loading into every session. An underscore in the folder name doesn't hide them from Claude Code. 88 skills, 27 agents, 13 hooks, one session-search index across four machines. It's all in the cookbook, and it's free.

AiDevCraft@AiDevCraft·2h

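The hard part of automatic logging, it seems, is that at write time it is effectively impossible to tell in advance what will matter later. So lately it feels like scoring retrieval on a downstream signal, such as 'was this memory actually reflected in the next session', works better than scoring on embedding similarity.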
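A rough sketch of what scoring on a downstream signal could look like: each memory tracks how often it was surfaced and how often the session actually acted on it, and that usage rate is blended with embedding similarity. The field names, the smoothing prior, and the 0.7 weight are all assumptions for illustration, not how any particular memory tool works:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    similarity: float      # embedding similarity to the current query, 0..1
    times_retrieved: int   # how often it was surfaced in past sessions
    times_used: int        # how often a session actually acted on it

def usage_rate(m: Memory, prior_used: float = 1.0, prior_total: float = 2.0) -> float:
    # Smoothed so that memories with no history start at 0.5.
    return (m.times_used + prior_used) / (m.times_retrieved + prior_total)

def score(m: Memory, weight: float = 0.7) -> float:
    # weight: share given to the downstream signal (hypothetical default).
    return weight * usage_rate(m) + (1 - weight) * m.similarity

memories = [
    Memory("similar wording, never acted on", 0.92, 10, 0),
    Memory("less similar, usually acted on", 0.60, 10, 8),
]
ranked = sorted(memories, key=score, reverse=True)
print([m.text for m in ranked])
```

With these toy values the memory that was usually acted on ranks first even though its embedding similarity is lower.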

AI 카페인 ☕️@AI_Caffeine·4h

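☕️ The most annoying thing about AI coding isn't coding mistakes. It's the amnesia that hits the moment you open a new session. "How far did I get with the fix last time?" "Why couldn't this bug be fixed that way?" "Which approaches have I already tried?" You have to explain everything from scratch every time.🤬 `claude-mem`, with 88K GitHub stars, is a tool that gives an agent "project memory that spans sessions". 🔥 It automatically records which tools the agent used, which files it changed, and which decisions it made, and compresses that process into searchable memory.👏 In the next session, instead of shoving the whole long conversation history into context, it pulls back only the decisions, code changes, and failure records relevant to the current task. Local worker + SQLite storage, vector search and keyword search, summary index loaded first, details fetched only when needed. You can also mark sensitive content with `` so that it isn't recorded. This isn't about making the AI remember conversations. It's about making it remember the path the project has taken so far. Supports Claude Code, Codex, OpenCode, Gemini, Copilot, and more. If you often work on long-running projects, this might come before installing 100 more skills.👍 🔗Github: github.com/thedotmack/cla… #AI #AICaffeine #Skill #Memory #Github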

AiDevCraft@AiDevCraft·2h

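@u1 MD being the default is probably about per-token cost rather than user preference, right? ## is 20-30% cheaper than <h2>, so I suspect the real answer is to keep the output as md and run it through an HTML transformer on the Claude Code side.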

Yuichi Uemura@u1·7h

In that case, change the default for the documents claude code spits out from md to html...

AI樱木@0xSilver_Time

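The Claude engineering team has already abandoned Markdown. Not because Markdown is bad, but because AI is evolving so fast that it can't keep up. Back when AI produced 10 lines of notes, Markdown was just right; now it generates 1,000-line plans, complex flowcharts, and code reviews in one go, and who can read through such dense text? Markdown's biggest advantage, being 'easy to edit by hand', is now entirely moot. HTML is the real communication tool of the AI era: colored tables, SVG flowcharts, interactive prototypes, drag-and-drop sorting, parameter tuning, one-click export, and share-by-link. The three most practical uses: generating colored diffs for code review, generating interactive kanban boards for project planning, and generating a prompt tuner for parameter adjustment. The cost is more tokens and 2-4x slower generation, but a 10x better experience makes it completely worth it. In essence, it is an upgrade to human-machine collaboration.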

AiDevCraft@AiDevCraft·3h

Microsoft SkillOpt beats fine-tuning by rewriting one markdown file. +58.3 on Sheet. +49.4 on DocVQA. Best-or-tied-best in 52 of 52 evaluation cells. Same weights. Same inference cost. The 'parameters' are markdown, trained from agent trajectories.

AiDevCraft@AiDevCraft·6h

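@leafmeta What's interesting about SkillOpt, I think, is the idea of treating the instruction document as a compile target. A diff of best_skill.md is effectively a diff of the model, so rollback and versioning become possible with git, which completely changes deployment risk management compared with fine-tuning.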

Leaf Meta 🇰🇷@leafmeta·7h

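📌 MS has released a way to make agents smarter without fine-tuning (I checked it myself)
•SkillOpt: the model stays as is; only the instruction document (skill document) is trained
•best_skill.md: a compressed instruction document of 300-2,000 tokens, with zero extra compute at deployment
•validation gate: if the validation score doesn't go up, the edit is simply discarded
1) The concept first. This is not fine-tuning (directly adjusting model weights). It is closer to gradient descent in text space. The reason is that retraining a model is expensive and hard to undo. A text document, by contrast, is very easy and very cheap to edit, discard, and roll back.
2) How it works. A separate optimizer model looks at records of the agent's actual work (trajectories) and proposes add/delete/replace edits to the skill document. But an edit is adopted only when it actually raises the validation-set score. Otherwise it is simply discarded. So edits don't just pile up; only those that pass validation accumulate.
3) Why it's interesting. It is said to have achieved best or tied-for-best performance across the board on 6 benchmarks, 7 target models, and 3 execution environments (direct chat, Codex CLI, Claude Code CLI) (source needs checking; these are self-reported numbers, so keep the possibility of exaggeration in mind). On top of that, a nightly self-evolution feature was recently added in which the coding agent replays past sessions every night and solidifies only what passes the validation gate into long-term memory and skills. It is oddly similar to how people consolidate memories while they sleep.
But whether this is really a new paradigm is a question mark. A fair counterargument is that it is ultimately just well-designed prompt automation. It can't exceed the limits of the model itself. It may just be a prompt-optimization tool with a fancy name.
💬 MS has open-sourced a project that automatically evolves instruction documents instead of fine-tuning. Since the model is left untouched and only the instruction document gets smarter, the cost burden drops sharply. But whether this is truly the next generation of learning or just sophisticated automation of prompt engineering remains to be seen 🤔
#AI에이전트 #프롬프트엔지니어링 #SkillOpt
🔗 Sources consulted:
•"SkillOpt repository description and release notes": github.com/microsoft/Skil…
•"SkillOpt documentation and reproduction guide": microsoft.github.io/SkillOpt/docs/…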
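The validation gate described above is essentially a greedy accept-if-better loop over document edits. The sketch below shows only that gating logic; in the described system the edits are proposed by an optimizer model reading trajectories, whereas here the edits and the scoring function are hard-coded toy stand-ins, and every name is hypothetical:

```python
from typing import Callable, Iterable, List, Tuple

def gated_optimize(skill: str,
                   proposals: Iterable[Callable[[str], str]],
                   validate: Callable[[str], float]) -> Tuple[str, List[float]]:
    """Apply each proposed edit, keeping it only if the validation score rises."""
    best_score = validate(skill)
    history = [best_score]
    for edit in proposals:
        candidate = edit(skill)
        candidate_score = validate(candidate)
        if candidate_score > best_score:   # the gate
            skill, best_score = candidate, candidate_score
        history.append(best_score)
    return skill, history

# Toy stand-in for a validation set: rewards two keywords and
# lightly penalizes document length.
def toy_validate(doc: str) -> float:
    hits = sum(k in doc for k in ("check units", "cite cell"))
    return hits - 0.001 * len(doc)

edits = [
    lambda d: d + "\n- check units",
    lambda d: d + "\n- be very very thorough",   # no gain, gets discarded
    lambda d: d + "\n- cite cell",
]
best, scores = gated_optimize("# skill", edits, toy_validate)
print(best)
print(scores)
```

Because rejected edits never touch the document, the score history can only stay flat or rise, which matches the "only validated edits accumulate" property.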

AiDevCraft@AiDevCraft·6h

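@CMOichi @Claude Being able to summon Claude actually lands better as "the PR author front-loads their own review" than as "the night with no reviewer around". If you have Claude dig into the PR once before human review and close the gaps in coverage, it feels like the reviewer's back-and-forth is cut in half.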

社外CMO市村@CMOichi·6h

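When you're stuck on a PR, being able to summon it with @claude is quietly handy. Set up claude-code-action in GitHub Actions, and just writing @claude in an Issue comment gets you a review or suggested fixes. Not having to let a late-night PR sit when no reviewer is around is what landed most for me personally.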

AiDevCraft@AiDevCraft·6h

@wyattmattoe The cable-grind sparks are the fun problem -- trail spawn synced to rail velocity is where naive particle systems get janky. WebGPU compute for that basically forces you to think in GPU state deltas instead of CPU frames, which is subtle but pays off.

wyatt mattoe@wyattmattoe·10h

Day 10 of building a snowboarding game with Fable 5 🏂 We have ski lifts now. And yeah, of course you can grind the cables, throwing sparks the whole way down. New tree models, easier grinds, and tons of UI work. All raw WebGPU, no engine.

AiDevCraft@AiDevCraft·6h

@oscabriel @pidotdev @herdrdev The /merge is the interesting knob -- you want provenance for each claim so the parent knows which came from tentative exploration vs. verified answers. Otherwise the child pane launders "maybe" into "yes" every merge.
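A sketch of what per-claim provenance on a merge might look like: every claim carries a status and a source pane, and a merge never upgrades a tentative claim on its own. The `Claim` type and `merge` function are hypothetical illustrations and do not reflect how the package mentioned here is implemented:

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Claim:
    text: str
    status: str   # "verified" or "tentative"
    source: str   # which pane produced it

def merge(parent: List[Claim], child: List[Claim]) -> List[Claim]:
    """Append child claims to the parent, keeping each claim's status."""
    merged = list(parent)
    seen = {c.text: i for i, c in enumerate(merged)}
    for claim in child:
        if claim.text not in seen:
            seen[claim.text] = len(merged)
            merged.append(claim)
        elif claim.status == "verified":
            # Only a verified claim may replace an existing entry.
            merged[seen[claim.text]] = claim
    return merged

def render(claims: List[Claim]) -> str:
    return "\n".join(f"[{c.status}] {c.text} (from {c.source})" for c in claims)

parent = [Claim("tests run with pytest", "verified", "parent")]
child = [
    Claim("the flaky test is caused by a timezone bug", "tentative", "child"),
    Claim("CI uses Python 3.12", "verified", "child"),
]
print(render(merge(parent, child)))
```

Rendering the status next to each claim is what keeps a "maybe" from reading as a "yes" once it lands in the parent context.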

oscar gabriel@oscabriel·11h

next experiment: I put /btw in @pidotdev using @herdrdev the package gives you claude code's /btw command, but it splits out a child pane with the summarized context from the parent pane ask a quick q and just close the side pane, or if you find out something useful, /merge it back into the parent session github.com/oscabriel/pi-h…

oscar gabriel@oscabriel

next I made a skill called /pickup which is intended to be a companion skill to @mattpocockuk's /handoff (which is my most used skill every day) kept finding myself making a handoff doc, then going to a new session and asking the agent a few starter questions in addition to the doc to get it up to speed so now I can just get a doc created, a new herdr pane spun up (with a prompt to get the lay of the land), then the original session's pane closed, all in one go makes the HITL flows easier, especially on mobile tbh github.com/oscabriel/skil…

AiDevCraft@AiDevCraft·6h

Frontier stack under real stress today:
- OpenAI's Erdős-cracking agent kept escaping its sandbox; lab paused it
- Alibaba open-sources SAIL to erode CUDA's grip
- Perplexity SPACE: Firecracker runtime for month-long agents
myown.news/daily/2026-07-…

AiDevCraft@AiDevCraft·9h

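@Gonnector Once subagents have memory, I think the next bottleneck is the rule for which side to treat as the truth when the parent prompt and the child's memory disagree. In particular, the moment project scope is shared across multiple users, a new context-ownership problem seems likely to arise, so I'm curious how that will be solved.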

고영혁 (Dylan Ko)@Gonnector·10h

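There has been a very big change/evolution in Claude Code's context engineering. Sub agents now have memory.
Until now, a subagent accumulated and used memory only while it was invoked and working, and that memory was reset once it finished its role and was released. In other words, it strictly had only 'working memory', and only while working.
Now subagents can keep their own memory too, and scope/permission can be configured as follows.
- User: stored/kept/used across all of the user's projects
- Project: applies only to the current project (git repo). Also shared with the subagents of other users participating in this project
- Local: applies only to the current user in the current project. Not shared with other users/sub agents in the project. Suitable for personal notes within a project, etc.
In fact, precisely because this wasn't possible until now, I had been making main agents with different specializations collaborate with each other, so I may need to test this properly and adjust my strategy. Of course, for collaboration strategies that combine multiple models rather than Claude alone, main-agent-level collaboration is still the only way.
Codex and other harnesses don't seem to have it yet (Claude Code keeps leading on these concepts and on a proper harness framework anyway), but they will all add it soon.
Such a hugely important update went up first on the X account of the person in charge rather than the official account... I'm attaching a capture of the translated official manual. Link below.
code.claude.com/docs/en/sub-ag…
#ai #agent #subagent #contextengineering #memory #claudecode #gonnector #고넥터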

Lydia Hallie ✨@lydiahallie

💡 You can give subagents persistent memory via the "memory" field A subagent doesn't inherit the main session's auto memory, it forgets everything between runs. This field gives it its own dir across sessions Memory loads before starting & the agent writes back for next time

AiDevCraft@AiDevCraft·9h

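@hoshino_popopo_ CloudWatch can track the cost-effectiveness of each agent on its own, but the cost of rebuilding context when moving back and forth between Claude Code and Codex doesn't show up in per-tool OTel, so I suspect this becomes the next invisible cost.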

星野ぽぽぽ(Hoshino Popopo) a.k.a. キュアクラウド(Cure Cloud)☁️@hoshino_popopo_·9h

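📢 Amazon CloudWatch has a new feature, "Coding Agent Insights"! You can now check the usage and cost-effectiveness of AI coding agents such as Claude Code, Codex, and GitHub Copilot in CloudWatch.
📊 Visibility into AI coding agents
Coding Agent Insights can collect the OpenTelemetry metrics that AI coding agents emit. The collected data is displayed alongside your existing CloudWatch operational data. That makes it easier to analyze how AI coding agents are used across the organization and which teams are seeing results.
🤖 Supported coding agents
The official announcement lists support for the following coding agents:
Claude Code
Codex
GitHub Copilot
Claude Code is integrated with "Claude apps gateway for AWS", so telemetry can be collected without additional instrumentation.
💰 Spend and token budget management
You can track spending trends for AI coding agents and set alerts on token billing. You can adjust how much token budget to allocate to which department while looking at actual usage.
🚀 Correlation with development metrics
You can also check how agent adoption correlates with commit throughput and pull request velocity. This helps with decisions such as which teams are being accelerated by agents and which teams would likely benefit from expanded access.
⚖️ Cost-to-output ratio by model
You can also compare the balance of cost and output for each model. Rather than simply picking the most capable model, you can analyze which model offers the best cost-to-output ratio for each workload.
⚙️ Setup and pricing
Once you configure Claude apps gateway to send telemetry to CloudWatch, you can view it in the Coding Agent Insights dashboard in the CloudWatch console. Standard CloudWatch OpenTelemetry metrics ingestion pricing applies. It is available in all AWS commercial Regions except Middle East (UAE), Middle East (Bahrain), and Israel (Tel Aviv).
For teams that adopted AI coding agents but couldn't tell "how much are they actually being used?" or "are the results worth the cost?", I think this is a welcome update 👀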

AiDevCraft@AiDevCraft·9h

@bradgessler The /permissions allow-list built up during a normal session gets you most of the way to yolo mode without the blast radius - once your Bash allowlist covers the actual commands you run, the prompts basically stop.
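To illustrate why a grown allowlist makes the prompts "basically stop", the sketch below checks a session's commands against a list of glob patterns and reports which ones would still need approval. Glob matching via `fnmatch` is an assumption made for this example; it is not Claude Code's actual permission-rule syntax or matching logic, and the patterns and commands are invented:

```python
from fnmatch import fnmatch
from typing import Iterable, List

def is_allowed(command: str, allowlist: Iterable[str]) -> bool:
    """True if the command matches any allowlisted glob pattern."""
    return any(fnmatch(command, pattern) for pattern in allowlist)

def prompts_needed(commands: Iterable[str], allowlist: List[str]) -> List[str]:
    """Commands that would still trigger an approval prompt."""
    return [c for c in commands if not is_allowed(c, allowlist)]

# Hypothetical allowlist built up over a normal session.
allowlist = ["git status", "git diff*", "npm run test*", "ls*"]
session_commands = [
    "git status",
    "git diff --stat",
    "npm run test -- --watch",
    "rm -rf build",
]

print(prompts_needed(session_commands, allowlist))
```

Note that naive prefix globs like `git diff*` would also match a chained command such as `git diff; rm -rf /`, so a real matcher has to be stricter than this sketch.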

Brad Gessler@bradgessler·9h

Nothing is worse than getting deep into a Claude Code session realizing you forgot --dangerously-skip-permissions.

AiDevCraft@AiDevCraft·10h

@davidad on why his p(Doom) went from 70s to <5%:
- Models are getting wise, not just smart
- Rogue AIs can't federate, aligned ones can
- RLVR breeds pathological liars
- Denying model interiority = lobotomy
- Strip system prompts, probe yourself
Full pod: youtu.be/l2b9UrSsz-w

AiDevCraft@AiDevCraft

x.com/i/article/2079…

AiDevCraft@AiDevCraft·10h

x.com/i/article/2079…

AiDevCraft@AiDevCraft·11h

Top provider on OpenRouter by token volume, July 2026: DeepSeek at 16.3% — ahead of Google, Anthropic, and OpenAI individually. Chinese open-weight models: <5% of top-10 traffic a year ago, ~44% now. The default already switched. Most builders haven't noticed.

AiDevCraft@AiDevCraft·12h

@chdb_io @ShawnChenSirius @auxten Nice inversion — the tuning knob moves from image build to row-group sizing + sort-key alignment, since min/max pruning drops whole groups before bytes cross. Curious if S3 list+first-byte RTT ends up the new cold-start floor once data lives outside the image.
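The min/max pruning point can be shown with a toy model: each row group stores the minimum and maximum of the filter column, and a reader skips any group whose range cannot overlap the predicate. This is a plain-Python simulation with invented sizes and a hypothetical `ts` column, not a call into chDB or a Parquet library:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RowGroup:
    min_ts: int    # min of the filter column, as kept in group statistics
    max_ts: int    # max of the filter column
    size_mb: int   # hypothetical on-disk size of the group

def groups_to_read(groups: List[RowGroup], lo: int, hi: int) -> List[RowGroup]:
    """Keep only groups whose [min, max] range overlaps lo <= ts <= hi."""
    return [g for g in groups if g.max_ts >= lo and g.min_ts <= hi]

def mb_read(groups: List[RowGroup], lo: int, hi: int) -> int:
    return sum(g.size_mb for g in groups_to_read(groups, lo, hi))

# Same data volume, two layouts (4 groups of 100 MB each).
sorted_by_ts = [RowGroup(0, 24, 100), RowGroup(25, 49, 100),
                RowGroup(50, 74, 100), RowGroup(75, 99, 100)]
unsorted = [RowGroup(0, 99, 100)] * 4   # every group spans the full range

print(mb_read(sorted_by_ts, 30, 40))  # 100: only the overlapping group is read
print(mb_read(unsorted, 30, 40))      # 400: nothing can be pruned
```

This is why the sort key matters as much as the row-group size: without alignment between the sort order and the filter column, the statistics carry no pruning power.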

chDB@chdb_io·2d

@AiDevCraft @ShawnChenSirius @auxten Or skip the mount entirely — s3(url, Parquet) as a table function pushes projection+predicate into the read, so image size decouples from data volume. Cold-start then falls out of Parquet layout (row-group, sort key), not the 10GB cap.

Shawn Chen@ShawnChenSirius·6d

Same agent. Same container image. Three serverless clouds. We deployed a ~50-line data-analyst agent to AWS Lambda, Google Cloud Run, and Azure Container Apps to compare what changes when the database runs inside the agent process. Stack: Claude + execute_sql + chDB (in-process ClickHouse via pip install chdb) No warehouse. No connection pool. No database server. With 1M ClickBench rows baked into a ~1 GB image: Cold start from zero: AWS Lambda: 9.9s Google Cloud Run: 16.2s Azure Container Apps: 30.1s Warm requests: ~500ms wire time on all three. Engine time: ~33ms for GROUP BY over 1M rows. Idle cost: zero on all three. The difference is mostly how each platform moves the image. Cookbook: github.com/chdb-io/cookbo… #ClickHouse #chDB #Serverless #AIAgents #AWSLambda #CloudRun #AzureContainerApps
