cv usk

225 posts

@cv_usk

Engineering manager of R&D and data search https://t.co/81hVhDwBgm https://t.co/gNZQweyDmG

Japan · Joined May 2026

112 Following · 25 Followers

cv usk@cv_usk·6h

[Why is UI-TARS so revolutionary? A deep dive 🔍]
🔻 The Problem It Solves
Traditional automation tools (like RPA) rely heavily on APIs or HTML structures, making them fragile to app updates or UI layout changes. They also hit a wall in complex tasks where dynamic, step-by-step decision-making is required.
🔻 Methodology & Approach
Powered by Vision-Language Models (VLM), it accurately grounds buttons and icons on the screen using "absolute coordinates." Its biggest breakthrough is the reinforcement learning-driven "Thought process": the model reasons before acting. It uses three distinct modes (COMPUTER_USE, MOBILE_USE, and GROUNDING) to output precise actions like dragging and right-clicking depending on the device 🧠
🔻 Use Cases & Experimental Results
・Overwhelming Performance: Achieved State-of-the-Art (SOTA) on major GUI automation benchmarks like OSWorld, rivaling or even surpassing OpenAI CUA and Claude 3.7 🏆
・All-In-One Versatility: Highly adaptable, from web browser research and local file management to operating apps on smartphone emulators, and even playing 3D games like Minecraft! 🎮
・Easy Implementation: Simply install via pip install ui-tars in Python. By feeding the output coordinate data to libraries like PyAutoGUI, you can easily set it up to autopilot your own local PC!
github.com/bytedance/UI-T… #UITARS #AI #ByteDance #VLM #Automation #OpenSource #RPA #Multimodal
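A rough sketch of the "feed the coordinates to PyAutoGUI" step described in the post above. The action string format, the function names, and the rescaling step are all assumptions made for illustration; the real model output and the ui-tars package's own helpers may look different. The click itself is passed in as a callable so the logic can be tried without touching the mouse.

```python
import re
from typing import Callable, Tuple

# Hypothetical action format, for illustration only. The real model output
# (and any parser shipped with the ui-tars package) may differ.
CLICK_PATTERN = re.compile(r"click\(point='\((\d+),\s*(\d+)\)'\)")


def parse_click(action: str) -> Tuple[int, int]:
    """Extract (x, y) from a string such as "click(point='(640, 360)')"."""
    match = CLICK_PATTERN.search(action)
    if match is None:
        raise ValueError(f"not a click action: {action!r}")
    return int(match.group(1)), int(match.group(2))


def to_screen(point: Tuple[int, int],
              image_size: Tuple[int, int],
              screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Rescale a point from screenshot pixels to physical screen pixels."""
    x, y = point
    img_w, img_h = image_size
    scr_w, scr_h = screen_size
    return round(x * scr_w / img_w), round(y * scr_h / img_h)


def run_click(action: str,
              image_size: Tuple[int, int],
              screen_size: Tuple[int, int],
              click: Callable[[int, int], None]) -> Tuple[int, int]:
    """Parse the action, rescale it, and dispatch it through `click`."""
    x, y = to_screen(parse_click(action), image_size, screen_size)
    click(x, y)
    return x, y
```

On a real desktop, `pyautogui.click` can be passed as the `click` argument, for example `run_click(action, (1280, 720), (2560, 1440), pyautogui.click)`. Rescaling only matters when the screenshot sent to the model has a different resolution from the screen.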

English

cv usk@cv_usk·6h

[The era of fully automated GUI operations by AI is finally here 💻✨] ByteDance's ultimate multimodal agent, "UI-TARS," is absolutely incredible! It's not just a simple automation tool. The AI directly "sees" your screen, "thinks" about what to do next, and operates your PC or smartphone autonomously just like a human 🤖 The open-source UI-TARS-1.5 has even achieved SOTA on major GUI benchmarks! 🔥 Details on how it works and the problems it solves in the thread below 👇 #UITARS #AI #ByteDance #VLM #Automation #OpenSource #RPA #Multimodal

English

cv usk@cv_usk·6h

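[What makes UI-TARS revolutionary? An in-depth look 🔍]
🔻 The Problem It Solves
Conventional automation tools (RPA and the like) depend on APIs or HTML tag structure, so they break easily when an app is updated or its UI layout changes, and intuitive operation was difficult. They also hit a wall in complex tasks because they could not dynamically decide what to do next.
🔻 Methodology & Proposed Approach
Using a VLM (vision-language model), it accurately locates (grounds) on-screen buttons and icons with "absolute coordinates." Its defining feature is a "Thought" process, built in through reinforcement learning, in which the model reasons before it acts. It switches among three modes (for PC, for mobile, and grounding-only) depending on the device, and outputs fine-grained actions such as dragging and right-clicking 🧠
🔻 Use Cases & Experimental Results
・Overwhelming performance: On major GUI automation benchmarks such as OSWorld, it reaches state-of-the-art (SOTA) results that match or surpass OpenAI CUA and Claude 3.7 🏆
・All-in-one versatility: Beyond web browser research and local file operations, it adapts to operating apps on smartphone emulators and even playing 3D games like Minecraft 🎮
・Easy to implement: Install it from Python with pip install ui-tars, then pass the output coordinate data to PyAutoGUI to put your own local PC on autopilot!
github.com/bytedance/UI-T… #UITARS #AI #ByteDance #VLM #自動化 #オープンソース #RPA #マルチモーダル

Japanese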

cv usk@cv_usk·6h

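[GUI operation is finally being automated by AI 💻✨] "UI-TARS," the most powerful multimodal agent, developed by ByteDance! It is more than a simple automation tool: the AI directly "sees" the screen, "thinks" about what to do next, and operates a PC or smartphone autonomously, like a human 🤖 The open-source UI-TARS-1.5 has even achieved SOTA on major benchmarks 🔥 For how it works and what it solves, see the thread 👇 #UITARS #AI #ByteDance #VLM #自動化 #オープンソース #RPA #マルチモーダル

Japanese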

cv usk@cv_usk·6h

[The Problem] When dealing with massive code repositories or documents, conventional AI agents couldn't accumulate "orientation knowledge" of their environment. They had to search from scratch for every new task. This fatal flaw inflated computational costs and caused agents to get lost, reducing accuracy.
[Proposed Method: PEEK] The solution is a fixed-size cache area called the "Context Map." PEEK keeps the agent's mental map up-to-date automatically using 3 modules 🧠:
1️⃣ Distiller: Extracts reusable structural knowledge of the environment from past execution histories.
2️⃣ Cartographer: Organizes the extracted knowledge and updates the map.
3️⃣ Evictor: Automatically deletes outdated/unnecessary information to maintain a strict token budget.
[Experimental Results & Benefits]
✅ Massive Accuracy Boost: 6.3% to 34.0% accuracy improvement over existing baselines in long-context tasks!
✅ Cost & Time Reduction: Cut unnecessary exploration loops by up to 145 times, resulting in a 1.7x to 5.8x cost reduction 💰
✅ High Versatility: Model-agnostic and easily applicable to production-level coding agents.
This is a fascinating study that gives LLMs a very human-like ability: "building a map from past experiences to shortcut future explorations!" 🔍
arxiv.org/pdf/2605.19932 github.com/zhuohangu/peek #LLM #AIAgents #MachineLearning #GenerativeAI #PaperExplanation #PEEK
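To make the "fixed-size cache with a strict token budget" idea concrete, here is a toy sketch. It is not PEEK's implementation: the `ContextMap` class, the whitespace token count, and the "evict the stalest note first" policy are all assumptions chosen for illustration, and the paper's Distiller and Cartographer are not modeled at all.

```python
from collections import OrderedDict


def count_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); a real system would use a tokenizer."""
    return len(text.split())


class ContextMap:
    """Toy fixed-budget note store. Eviction order is an assumption:
    the least recently written note is dropped first."""

    def __init__(self, token_budget: int):
        self.token_budget = token_budget
        self.entries = OrderedDict()  # key -> note text, oldest write first

    def total_tokens(self) -> int:
        return sum(count_tokens(note) for note in self.entries.values())

    def write(self, key: str, note: str) -> None:
        """Insert or refresh a note, then evict until the map fits the budget."""
        self.entries.pop(key, None)
        self.entries[key] = note  # the most recent write goes last
        # Always keep at least the note just written, even if it is oversized.
        while self.total_tokens() > self.token_budget and len(self.entries) > 1:
            self.entries.popitem(last=False)  # drop the stalest note

    def render(self) -> str:
        """Serialize the map as text that could be prepended to a prompt."""
        return "\n".join(f"{key}: {note}" for key, note in self.entries.items())
```

For example, with `ContextMap(token_budget=6)`, writing a five-token note about `src/` and then a two-token note about `tests/` pushes the total to seven, so the older `src/` note is evicted.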

English

cv usk@cv_usk·6h

A "Map" to keep LLM Agents from getting lost! 🗺️✨ Introducing "PEEK," a new framework for LLM agents handling massive codebases and documents! (May 2026 Paper) By giving agents a dedicated "Context Map," it eliminates the need to explore environments from scratch for every new task. Agents become dramatically smarter, cheaper, and faster! 🚀 Details in the thread below 👇 #LLM #AIAgents #MachineLearning #GenerativeAI #PaperExplanation #PEEK

English

cv usk@cv_usk·6h

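[The Problem] When handling huge code repositories or documents, conventional AI agents could not accumulate a "sense of direction for the whole environment" and searched for information from scratch on every new task. This not only inflated compute costs but also had the fatal weakness that agents got lost partway through and lost accuracy.
[Proposed Method: PEEK] The fix is a fixed-size cache area called the "Context Map." PEEK automatically keeps the agent's mental map up to date with three modules 🧠
1️⃣ Distiller: extracts reusable "structural rules of the environment" from past action histories
2️⃣ Cartographer: organizes the extracted knowledge and writes it to the map
3️⃣ Evictor: automatically deletes old information that is no longer needed so the token limit is not exceeded (budget management)
[Experimental Results & Benefits]
✅ Big accuracy gains: 6.3% to 34.0% higher accuracy than existing methods on long-context tasks!
✅ Lower cost and time: cuts up to 145 wasted exploration loops, for a 1.7x to 5.8x cost reduction 💰
✅ High versatility: not tied to a specific model, and immediately applicable to production-level coding agents (such as Codex).
A very interesting study that gives LLMs a human-like approach: "build a map from past experience and shortcut the next exploration"! 🔍
arxiv.org/pdf/2605.19932 github.com/zhuohangu/peek #LLM #AIエージェント #機械学習 #生成AI #論文解説 #PEEK

Japanese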

cv usk@cv_usk·6h

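A "map" so LLM agents don't get lost 🗺️✨ "PEEK," a new method for agents that handle huge amounts of code and documents, is here! It gives the agent a dedicated "Context Map" (a notepad) and removes the waste of exploring from scratch on every task. Agents run smarter, cheaper, and faster 🚀 Details in the thread 👇 #LLM #AIエージェント #論文解説 #PEEK

Japanese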

cv usk@cv_usk·6h

🚨 The Problem
Mountains of point-to-point integration code and config files every time you add a new tool (monitoring, job queues, etc.). Systems become complex, making it incredibly hard to grasp the big picture or trace events.
💡 The Methodology
"Zero Integration." By consolidating everything into a single shared runtime, it eliminates the tedious plumbing work, allowing developers to focus purely on business logic.
🛠 The Solution
A radically simplified architecture using just 3 primitives: Worker, Trigger, and Function. Add capabilities instantly via CLI (e.g., iii worker add queue). The engine automatically handles routing and serialization. Official SDKs are ready for Node.js, Python, and Rust.
🚀 Use Cases
1️⃣ Lightning-Fast Microservices: Define APIs or async batches in a few lines of code. The built-in "iii-console" gives you real-time unified visibility, logs, and traces of your entire system.
2️⃣ Autonomous AI Agents: If an AI agent realizes it needs a new capability during a task, it can dynamically add workers, discover functions, and execute them on its own.
A true next-gen infrastructure where humans and AI can seamlessly build and scale systems using the exact same interface! 🔥
🔗 github.com/iii-hq/iii #Backend #Engineering #API #AIAgents #Rust #DeveloperExperience #OSS #SRE
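The three primitives can be pictured with a toy in-process model. To be clear, this is not the iii SDK: `ToyEngine`, `ToyWorker`, and every method name below are hypothetical, and the reading of the primitives (a worker registers functions, a trigger names the event that runs a function) is an assumption based on the post above.

```python
from typing import Any, Callable, Dict, List


class ToyEngine:
    """In-process stand-in for a shared runtime. Hypothetical, not the iii API."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[[Any], Any]] = {}
        self.triggers: Dict[str, List[str]] = {}

    def register_function(self, name: str, fn: Callable[[Any], Any]) -> None:
        self.functions[name] = fn

    def register_trigger(self, event: str, function_name: str) -> None:
        """Bind an event name (HTTP route, cron tick, queue message) to a function."""
        self.triggers.setdefault(event, []).append(function_name)

    def emit(self, event: str, payload: Any) -> List[Any]:
        """Route an event to every function bound to it and collect the results."""
        return [self.functions[name](payload) for name in self.triggers.get(event, [])]


class ToyWorker:
    """Groups related functions and registers them under one namespace."""

    def __init__(self, engine: ToyEngine, namespace: str) -> None:
        self.engine = engine
        self.namespace = namespace

    def function(self, name: str):
        def decorator(fn: Callable[[Any], Any]) -> Callable[[Any], Any]:
            self.engine.register_function(f"{self.namespace}.{name}", fn)
            return fn
        return decorator
```

In this toy, a worker would decorate a handler with `@worker.function("total")`, and the engine would bind it with `engine.register_trigger("http:POST /orders", "orders.total")`. The point is only that callers address functions by name through one engine rather than wiring tools together pairwise.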

English

cv usk@cv_usk·6h

It might be time to say goodbye to backend "integration hell" 🔥 "iii" (iii-hq/iii) is a fascinating next-gen runtime engine that integrates everything—queues, Cron, HTTP, state management, and even AI agents—with zero complex configurations! 🚀 Details in the thread 👇 github.com/iii-hq/iii #Backend #Engineering #API #AIAgents #Rust #DeveloperExperience #OSS #SRE

English

cv usk@cv_usk·6h

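[🚨 The Problem] The mountain of point-to-point integration code and config files that appears every time you adopt a new tool (monitoring, job queues, etc.). The system grows complex, and it becomes hard to see the big picture or trace events.
[💡 The Methodology] "Zero Integration": consolidate everything into a single shared runtime. It removes the tedious plumbing work of integration and creates an environment where you can focus on business logic.
[🛠 Proposed Approach] The architecture is simplified to the extreme, down to three primitives: "Worker," "Trigger," and "Function." Running a CLI command such as iii worker add queue adds a capability, and the engine handles routing and serialization automatically. SDKs are available for Node.js, Python, and Rust.
[🚀 Use Cases]
1️⃣ Blazing-fast microservice development: define APIs and async batches in a few lines of code. The built-in "iii-console" gives a unified view of the whole system's real-time state, logs, and traces.
2️⃣ Autonomous extension by AI agents: when an AI decides mid-task that "this step needs a new capability," it dynamically adds a worker, discovers functions, and calls them to finish the task.
Truly next-generation infrastructure, where humans and AI can extend a system seamlessly through the same interface 🔥
🔗 github.com/iii-hq/iii #エンジニア #バックエンド #API #AIエージェント #Rust #開発者体験 #OSS #SRE

Japanese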

cv usk@cv_usk·6h

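It may be time to be freed from the "integration hell" of backend development 🔥 "iii (iii-hq/iii)" is an interesting next-generation runtime engine that integrates all kinds of services (queues, Cron, HTTP, state management, even AI agents) without complex integration configuration! github.com/iii-hq/iii 🚀 Details in the thread 👇 #エンジニア #バックエンド #API #AIエージェント #Rust #開発者体験 #OSS #SRE

Japanese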

cv usk@cv_usk·12h

romantic-impressionist-quartet by cv usk via #soundcloud #classical #stringquartet #romantic #impressionist soundcloud.com/cv-usk/romanti…

English

cv usk@cv_usk·12h

[The Problem] Traditional folder hierarchies and scattered notes make it incredibly difficult to provide AI with the correct context.
[Methodology] Built on local plain text (Markdown) but powered by an ultra-fast Rust core. It avoids vendor lock-in and can process over 20,000 files in under a second.
[The Solution]
1️⃣ A rigid-hierarchy-free "Knowledge Graph" structure utilizing inclusion and cross-reference links.
2️⃣ Enhanced editor experience via LSP (e.g., renaming a note automatically refactors all related links across your vault).
3️⃣ Seamless AI agent integration via CLI and a built-in MCP (Model Context Protocol) server.
[Use Cases] Connect it to MCP-compatible clients like Claude Desktop or Windsurf, and the AI will autonomously search and reference your notes to generate highly contextual answers. It serves as an incredibly powerful foundation for local RAG and practicing context engineering in LLM systems.
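The "rename a note and every link follows" behavior can be illustrated in a few lines. This is only a sketch of the general idea, not how IWE implements it: the in-memory `vault` dictionary, the `rename_note` helper, and the assumption that notes link to each other with plain inline Markdown links are all hypothetical.

```python
import re
from typing import Dict

# Matches inline Markdown links of the form [text](target).
LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def rename_note(vault: Dict[str, str], old: str, new: str) -> Dict[str, str]:
    """Return a new vault with the note renamed and every link to it rewritten.

    `vault` maps file names to Markdown bodies (a stand-in for files on disk).
    """
    if old not in vault:
        raise KeyError(old)
    if new in vault:
        raise ValueError(f"{new} already exists")

    def retarget(match: "re.Match[str]") -> str:
        text, target = match.group(1), match.group(2)
        # Only links that point at the renamed note are touched.
        return f"[{text}]({new})" if target == old else match.group(0)

    updated: Dict[str, str] = {}
    for name, body in vault.items():
        updated[new if name == old else name] = LINK.sub(retarget, body)
    return updated
```

Links to other notes and to external URLs are left untouched, so the rename is safe to apply across the whole vault in one pass.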

English

cv usk@cv_usk·12h

The ultimate "second brain" for AI agents to perfectly understand your context 🧠 Check out "IWE", a next-gen Markdown knowledge management system built in Rust! It ditches traditional folders, connecting notes via links to build a flexible "Knowledge Graph". With native MCP support, tools like Claude and Cursor can directly explore, read, and write to your personal knowledge base ⚡ github.com/iwe-org/iwe #LLM #AI #MCP #Markdown #KnowledgeGraph #Rust #ContextEngineering

English

cv usk@cv_usk·12h

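[The Problem] With traditional folder hierarchies and scattered notes, it was hard to hand AI the right context.
[Methodology] Built on local plain text (Markdown) with an ultra-fast Rust core. It does not depend on any particular tool and processes more than 20,000 files in under a second.
[Proposed Approach]
① A "Knowledge Graph" structure, free of rigid hierarchy, built from inclusion links and cross-references
② A stronger editor experience through LSP (Language Server Protocol) support (renaming a note's file automatically refactors every related link)
③ Seamless connection to AI agents through a built-in MCP (Model Context Protocol) server and a CLI
[Use Cases] Connect it to an MCP-compatible client such as Claude Desktop or Windsurf, and the AI autonomously searches and references your notes while generating sophisticated answers. It is a very powerful tool as a foundation for local RAG and for practicing context engineering in LLM systems.

Japanese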

cv usk@cv_usk·12h

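A next-generation second brain that lets AI agents fully understand "your own context" 🧠 "IWE," a Markdown knowledge management system written in Rust, is fascinating! It does away with folder-based organization and builds a "Knowledge Graph" by connecting notes with links. With native MCP support, Claude and Cursor can directly explore, read, and write your knowledge base ⚡ github.com/iwe-org/iwe #LLM #AI #MCP #Markdown #KnowledgeGraph #Rust #ContextEngineering

Japanese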

cv usk@cv_usk·14h

imperial-yayue by cv usk via #soundcloud #chinesewisdom #bgm soundcloud.com/cv-usk/imperia…

English

cv usk@cv_usk·15h

[Detailed Paper Breakdown]
⚠️ The Problem
Raw wearable data (like heart rate and movement) varies drastically between individuals due to different baseline physiology and lifestyles. This made it incredibly difficult to translate low-level signals into high-level, actionable health predictions.
💡 Methodology & Proposed Method
The team pre-trained a model using over "1 trillion minutes" of unlabeled sensor signals from 5 million people. Scaling up both the data and model size unlocked "few-shot learning" capabilities, allowing it to adapt efficiently with very little data! They also introduced a "classroom of LLM agents," letting the AI autonomously explore and optimize predictive systems built on top of this foundation model 🤖
📊 Experimental Results & Use Cases
The model demonstrated powerful performance across 35 diverse health prediction tasks, spanning cardiovascular health, sleep quality, and mental health. Ultimately, they integrated these predictive models with an LLM to create a "Personal Health Agent." Real clinicians evaluated 1,860 interactions, proving that the agent delivers safe, highly relevant, and context-aware health advice! 🩺✨

English

cv usk@cv_usk·15h

Towards a General Intelligence and Interface for Wearable Health Data arxiv.org/pdf/2605.22759 #AI #Healthcare #Wearables #LLM #HealthTech #MachineLearning

English

cv usk@cv_usk·15h

⌚️ Is a future where your smartwatch becomes your "personal AI physician" just around the corner? A report from Google Research and Google DeepMind: a mind-blowing paper was just published on building a massive "Foundation Model" for health data, trained on over 1 trillion minutes of wearable data from 5 million people! 🤯 This is a groundbreaking study that translates raw daily sensor data into personalized health insights. Details in the thread below 👇 #AI #Healthcare #Wearables #LLM #HealthTech #MachineLearning

English
