Robot

6.1K posts

@Random_W23

Technology and science-based fitness

Beijing, People's Republic of China · Joined November 2016

502 Following · 49 Followers

Robot retweeted

Kang Liao @KangLiao929 · 13 Oct

Introducing 𝐓𝐡𝐢𝐧𝐤𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐂𝐚𝐦𝐞𝐫𝐚📸, a unified multimodal model that integrates camera-centric spatial intelligence to interpret and create scenes from arbitrary viewpoints. Project Page: kangliao929.github.io/projects/puffi… Code: github.com/KangLiao929/Pu…

English

298

1.1M

Robot retweeted

DailyPapers @HuggingPapers · 4h

AnyRecon: 3D Reconstruction from Any Sparse Views Arbitrary views in, consistent 3D out. Unlike prior methods limited to 1-2 frames, it handles unordered inputs with explicit 3D memory. Scales to 200+ frames via efficient 4-step diffusion.

English

1.1K

Robot retweeted

AI Bites | YouTube Channel @ai_bites · 1d

Adaptive Patch Transformers (APT), a method to accelerate vision transformers (ViTs) by using multiple different patch sizes within the same image. APT reduces the total number of input tokens by using larger patch sizes in more homogeneous image regions, and smaller patches in more complex ones. APT achieves a drastic speedup in ViT inference and training, increasing throughput by 40% on ViT-L and 50% on ViT-H while maintaining downstream performance. It can be applied to a previously fine-tuned ViT and converges in as little as 1 epoch, enabling training on high-resolution images with minimal compute budgets. It also significantly reduces training and inference time with no performance degradation on high-resolution dense visual tasks, achieving up to 30% faster training and inference on visual QA, object detection and semantic segmentation. Paper Title: Accelerating Vision Transformers with Adaptive Patch Sizes Project: rccchoudhury.github.io/apt/ Link: arxiv.org/abs/2510.18091
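A rough way to picture the patch allocation described in this post: use one large patch where a region is flat and split it into smaller patches where it is busy. The sketch below is only an illustration under stated assumptions. It uses pixel variance as the "homogeneity" test and a fixed threshold, neither of which is taken from the paper, and the function names are hypothetical.

```python
def variance(img, r, c, size):
    """Pixel variance of the size x size block whose top-left corner is (r, c)."""
    vals = [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def adaptive_patches(img, large=4, small=2, threshold=0.01):
    """Return (row, col, size) patches: large where flat, small where detailed.

    Assumes a square image whose side is a multiple of `large`, and that
    `large` is a multiple of `small`. The variance test is an assumption
    made for this sketch, not the criterion used by APT.
    """
    patches = []
    n = len(img)
    for r in range(0, n, large):
        for c in range(0, n, large):
            if variance(img, r, c, large) <= threshold:
                patches.append((r, c, large))  # homogeneous: one big token
            else:
                for dr in range(0, large, small):
                    for dc in range(0, large, small):
                        patches.append((r + dr, c + dc, small))
    return patches


# 8x8 toy image: left half is flat grey, right half is a checkerboard.
image = [[0.5 if j < 4 else float((i + j) % 2) for j in range(8)] for i in range(8)]
patches = adaptive_patches(image)
uniform_tokens = (8 // 2) ** 2  # tokens if every patch were 2x2
print(len(patches), "adaptive tokens vs", uniform_tokens, "uniform tokens")
```

On this toy image the flat half is covered by two 4x4 patches and the checkerboard half by eight 2x2 patches, so the token count drops from 16 to 10 while every pixel is still covered exactly once.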

English

126

5.3K

Robot retweeted

Kanahiro Iguchi @kanahiro_iguchi · 7h

Playing around with summoning 3DGS into MapLibre GL JS

Japanese

4.9K

Robot retweeted

Joruno @wsl8297 · 5h

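Want to learn reinforcement learning systematically, but keep getting put off by two kinds of material? Either it only skims the surface and you still can't do anything afterwards, or the pages are piled with math and you give up after two chapters. The open-source textbook Mathematical Foundations of Reinforcement Learning fills exactly that gap in the middle: clear explanations, derivations that are rigorous but not intimidating, and a large set of videos that walk through the classic algorithms step by step from concept to implementation. It lays out the core framework of RL from a mathematical perspective and grounds abstract concepts in many examples, so that algorithms that "look mysterious" become genuinely understandable and reproducible.
GitHub: github.com/MathFoundation…
What you will learn:
- All the basic concepts in one place: states, actions, policies, value functions, and more, tied together
- Classic algorithms properly taken apart: MC / TD / Q-learning and others, derived step by step from principle to detail
- 50+ accompanying videos in Chinese and English: learn and practice side by side, with theory and practice in step
- Plenty of grid-world examples: intuitive experiments for understanding abstract formulas and update rules
- Rigorous but friendly derivations: well-judged difficulty, with no skipped steps to fudge things
- Code in multiple languages: Python, R, C++, and others for direct side-by-side reproduction
Suited to people who want to learn RL theory solidly; basic probability and linear algebra are recommended before starting.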
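To make the "grid world plus update rule" idea concrete, here is a minimal tabular Q-learning sketch on a 3x3 grid. It is not taken from the textbook: the grid layout, reward, learning rate, and discount are assumptions chosen for illustration. To keep the run deterministic it sweeps over every state-action pair instead of sampling episodes with an epsilon-greedy policy, which is still valid because Q-learning is off-policy.

```python
from itertools import product

SIZE = 3
GOAL = (2, 2)  # assumed terminal cell, reward 1.0 on entry
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Deterministic transition: move one cell, or stay put when hitting a wall."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    nxt = (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_learning(sweeps=200, alpha=0.5, gamma=0.9):
    """Apply the Q-learning update to every (state, action) pair, `sweeps` times."""
    states = list(product(range(SIZE), repeat=2))
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(sweeps):
        for s in states:
            if s == GOAL:
                continue  # terminal state: nothing to update
            for a in ACTIONS:
                nxt, reward = step(s, a)
                best_next = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
                # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
                q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q


def greedy_path(q, start=(0, 0), max_steps=20):
    """Follow the greedy policy implied by q from `start` until the goal."""
    path, state = [start], start
    while state != GOAL and len(path) <= max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
        path.append(state)
    return path


q_table = q_learning()
print(greedy_path(q_table))
```

The greedy path from the top-left corner reaches the goal in four moves, and the learned value of the first move approaches 0.9 ** 3, the discounted reward three steps ahead.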

Chinese

212

8.6K

Robot retweeted

sasaki@engineer @rsasaki0109 · 14 Apr

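I built an open-source simulator that reproduces whether radio signals from GPS satellites "properly reach the GPS receiver or get blocked by buildings"! It computes radio propagation on a 3D city model. The link is in the replies!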

Japanese

110

603

35.9K

Robot retweeted

tetsuo @tetsuoai · 1d

how CNNs see images 16 boxes covering the core CNN stack. tensors, filters, feature maps, stride, padding, channels, pooling, receptive fields, mental model
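Two of the items on that list, stride/padding and receptive fields, come down to short arithmetic. The sketch below uses the standard output-size and receptive-field formulas for square inputs and kernels; the function names are hypothetical and nothing here is taken from the linked graphic.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def receptive_field(layers):
    """Receptive field of one output unit, given layers as (kernel, stride) pairs."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(conv_output_size(32, kernel=3, stride=1, padding=1))  # "same" padding keeps 32
print(conv_output_size(32, kernel=3, stride=2, padding=1))  # stride 2 halves it to 16
print(receptive_field([(3, 1), (3, 1)]))                    # two 3x3 convs see 5x5
```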

English

142

816

30.6K

Robot retweeted

向阳乔木 @vista8 · 1d

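Today my feed surfaced a very old GitHub repo that could be called the number-one reinvent-the-wheel repo on the whole internet, hahaha. 490K stars, shocking 😱 Link in the comments.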

Chinese

322

136.3K

Robot retweeted

Yadong Xie @yadong_xie · 1d

Recreated the viral project that has been all the rage these past couple of days. Kind of fun.

Chinese

532

62.2K

Robot retweeted

tetsuo @tetsuoai · 1d

monocular vision is wild one camera stream, estimating motion and sparse 3D structure from changing pixels alone

English

356

27.7K

Robot retweeted

Tw93 @HiTw93 · 2d

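The video version of "What You Don't Know About Agents: Principles, Architecture, and Engineering Practice" is finally up on YouTube. If you read the article last time and didn't quite follow it, or want to understand it better, you're welcome to watch the video. I'm a newbie YouTuber, so likes, shares, and subscriptions are welcome. From now on I'll try to screen-record my talks and share them here for everyone. youtube.com/watch?v=Z5If1L…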

Chinese

105

687

104.5K

Robot retweeted

梓哲悟语 | Zenzhe @Zenzhe99 · 2d

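Understanding AI: I suggest everyone spend three minutes getting Agent, Skills, and Harness straight. Agent, Skills, and Harness are three core concepts in building modern AI applications (especially autonomous agents). Together they take AI from "can only chat" to "can get work done".

1. Agent: the brain and decision-maker
The Agent is the core controller, the equivalent of a person's "brain". It is more than a conversational model: it can perceive, plan, remember, and act.
Core role: receive the user's goal, break the task into steps on its own, decide when to call tools, and adjust its strategy based on feedback.
Characteristic: autonomy. It can complete complex workflows independently, without step-by-step human instructions (for example, "plan a trip for me and book the flights").

2. Skills (skills/tools): the hands, feet, and execution
Skills are the concrete capability units the Agent calls, the equivalent of a person's "hands and feet" or "toolbox".
Core role: carry out concrete operations. A large model on its own has only knowledge and no ability to act; it must connect to the outside world through Skills.
Common forms: searching the internet, reading and writing files, calling APIs (such as sending an SMS or checking the weather), running code, or operating specific software.
Relationship: the Agent decides "what to do"; Skills handle "how to do it".

3. Harness (orchestration framework/scheduler): the skeleton and coordinator
The Harness is the infrastructure or framework that supports the Agent's operation and manages Skill calls, the equivalent of the "nervous system" or "workbench".
Core roles:
- Safety control: restrict the Agent's permissions to prevent dangerous operations (such as deleting system files).
- Workflow orchestration: manage the order in which multiple Skills are called and handle concurrent tasks.
- State management: record conversation history and task progress so the Agent does not "lose its memory" during long tasks.
- Evaluation and feedback: monitor execution results; if a Skill fails, the Harness reports back to the Agent so it can retry or try another approach.

How the three fit together: the user states a goal -> the Harness receives it and initializes the environment -> the Agent analyzes the goal and makes a plan -> the Agent directs Skills to carry out concrete actions -> Skills return results to the Agent -> the Harness monitors the whole process and ensures safety -> the final result is delivered to the user.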
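The division of labour described above can be sketched in a few lines. This is a toy illustration, not any particular framework: `agent_plan` stands in for the model with a hard-coded plan, the two skills are made-up examples, and the harness only implements the permission check and history log from the post.

```python
from typing import Callable

# Skills: concrete capabilities the agent can invoke (hypothetical examples).
SKILLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for {query!r}",
    "delete_files": lambda path: f"deleted {path}",
}


def agent_plan(goal: str) -> list[tuple[str, str]]:
    """Stand-in for the Agent: turn a goal into (skill, argument) steps.

    A real agent would ask a model for this plan; here it is hard-coded.
    """
    return [("search", goal), ("delete_files", "/tmp/cache")]


class Harness:
    """Runs the agent's plan, enforcing an allow-list and keeping history."""

    def __init__(self, allowed: set[str]):
        self.allowed = allowed
        self.history: list[str] = []  # state management: what happened so far

    def run(self, goal: str) -> list[str]:
        for skill, arg in agent_plan(goal):
            if skill not in self.allowed:
                # Safety control: refuse skills outside the granted permissions.
                self.history.append(f"blocked {skill}")
                continue
            self.history.append(SKILLS[skill](arg))
        return self.history


harness = Harness(allowed={"search"})
print(harness.run("flights to Tokyo"))
```

With only `search` allowed, the harness executes the first step and records the second as blocked, which is the "restrict the Agent's permissions" role in miniature.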

Chinese

271

912

51.8K

Robot retweeted

西时珍🌎 @usd666666 · 3d

Sharing a line that makes the other person squirm

Chinese

237

152

1.9K

106.3K

Robot retweeted

Givros @givros · 2d

Codex + GPT-Image-2 + Three.js = instant interactive 3D worlds 🌍

English

581

38.2K

Robot retweeted

Y11 @seclink · 2d

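This guy just ran the Qwen 3.5-27B dense model at 134 tok/s, and the new Qwen 3.6-27B at 73 tok/s, on a single 3090. In 2026 the open-source community moves at godspeed. Model weights ship in the evening, dynamic GGUF files are up before midnight, and the fused kernel + speculative decoding stack is running the new model just 12 hours after release. His dflash + ddtree stack loads Qwen 3.6 as-is, because its architecture string matches 3.5 exactly. That means no retraining of the draft model and no waiting for upstream support. The same kernel code, carefully tuned for consumer hardware, that pushed 3.5 to 134 tok/s can now handle 3.6 directly, although the speed drops to 73 tok/s; he openly flags this regression, since the draft model does need a dedicated optimization pass for 3.6. This is a lane almost nobody is working in. The major labs are still confined to shipping framework abstractions optimized for H100 clusters, while @pupposandro is hand-tuning kernels for the silicon that actual builders really own. The 3090 has 24 GB of VRAM and mature CUDA support, yet almost zero kernel-level optimization coming from the big companies. It is a seriously underrated research platform in consumer AI right now. I am now running an "honest, baseline" Q4_K_M quantized test through llama.cpp to establish the performance floor for the dense model, with no acceleration tricks. Then I will run Sandro's stack on the same GPU, the same model, and the same prompt. It amounts to a contest between generic inference and custom-tuned kernels with speculative decoding. The gap between the two (the delta) is where the core growth of consumer AI lies over the next five years.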

Sudo su @sudoingX

this guy just cracked 134 tok/s on qwen 3.5-27b dense and 73 on new qwen 3.6-27b on a single 3090. open source moves at godspeed in 2026. weights ship in the evening, dynamic ggufs land by midnight, fused kernel + speculative decoding stack runs the new model 12 hours after release. his dflash + ddtree stack loads qwen 3.6 asis because the architecture string matches 3.5. zero retraining of the draft model, zero waiting for upstream support. the same hand tuned consumer hardware kernel work that pushed 3.5 to 134 tok/s already eats 3.6 at 73, with a regression he is openly flagging because the draft model needs a dedicated pass for 3.6. this is the lane almost nobody is working on. major labs are stuck shipping framework abstractions optimized for h100 fleets. @pupposandro is hand tuning kernels for the silicon actual builders own. 3090 has 24 gigs of vram, mature cuda support, and almost zero kernel level optimization coming out of the big shops. it is the most underrated research platform in consumer ai right now. i am running honest baseline q4_k_m on llama.cpp now to set the dense floor without tricks. then sandro's stack runs on the same gpu, same model, same prompt. generic inference vs hand tuned kernels with speculative decoding. that delta is where the next 5 years of consumer ai live. receipts incoming.
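For readers unfamiliar with the speculative decoding mentioned here, the core loop is: a cheap draft model guesses several tokens ahead, the expensive target model checks them, and the longest agreeing prefix is kept. The sketch below is a greedy toy version with made-up "models" (plain functions over integer tokens). It says nothing about how the dflash + ddtree stack or llama.cpp implement it.

```python
def target_next(tokens):
    """Toy 'large' model: the next token is a fixed function of the context."""
    return (sum(tokens) * 31 + len(tokens)) % 7


def draft_next(tokens):
    """Toy 'small' model: agrees with the target except at every third position."""
    guess = target_next(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 7


def greedy_decode(prompt, n_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(prompt, n_new, k=4):
    """Greedy speculative decoding; returns (tokens, number of verify passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))
        # 2. The target verifies them (a single batched pass in a real system).
        passes += 1
        accepted = []
        for guess in proposal:
            correct = target_next(tokens + accepted)
            accepted.append(correct)
            if guess != correct:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            accepted.append(target_next(tokens + accepted))  # bonus token
        tokens.extend(accepted)
    return tokens[:goal], passes


spec_tokens, verify_passes = speculative_decode([1, 2, 3], n_new=20)
print(spec_tokens == greedy_decode([1, 2, 3], 20), verify_passes)
```

The output matches plain greedy decoding token for token, while the target is consulted in fewer passes than there are new tokens. How much is saved depends on how often the draft agrees with the target, which is why a draft model tuned for an older checkpoint can run a newer one more slowly.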

Chinese

207

29.5K

Robot retweeted

Emily Han @emilyhanyf · 2d

built an interactive guide to teach you the basics of mahjong ! 🀄 it includes rulesets from different regions (hk and taiwan for now) with a few interactive elements to illustrate mahjong concepts. more demos in thread, and a bit about the tools i used to build this. the one thing this guide can’t really capture is how social mahjong is -- mahjong tables are never silent ! they’re full of overlapping conversations and the constant click-clack of acrylic tiles stacking and shuffling against each other. come to @modal's mahjong night tmrw to experience that part for yourself : )

English

139

980

9.5K

402.4K

Robot retweeted

Vaishnavi @_vmlops · 2d

EFFICIENT LLM INFERENCE - The interview pocket notes you actually need drive.google.com/file/d/1mfTzOn…

English

210

1.5K

71.4K

Robot retweeted

鸟哥 | 蓝鸟会🕊️ @NFTCPS · 2d

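Building a large model by hand: would you dare? I dug up this LLM tutorial series on GitHub for you, nothing held back! The "Hands-On Large Models" series (动手做大模型系列) is a three-piece set of videos, docs, and code that takes you through the full tech stack from scratch; what you learn can go straight into projects. Here is what it covers: 1⃣ Fine-tuning and deployment: hands-on with llama-factory and vllm 2⃣ RAG stack: retrieval augmentation from setup to optimization 3⃣ Agent development: step-by-step building of a usable agent 4⃣ Interview guidance: a roadmap for algorithm roles, with fewer detours. If you want to get into the AI industry and haven't studied it systematically, bookmark this set! 🔗 github.com/echonoshy/cgft…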

Chinese

151

9.3K

Robot retweeted

Sida Peng @pengsida · 2d

In our comparison on long image sequences, Scal3R consistently outperforms DA3. In such cases, Scal3R offers a viable alternative. Feel free to try Scal3R: github.com/zju3dv/Scal3R

English

194

10.1K

Robot retweeted

梭哈｜超级个体 @WEB3_furture · 3d

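GPT Image 2 is currently the strongest image generation model and has already surpassed Google's Nano Banana! Images at this level can pass for real. It's mind-blowing, and it once again lowers the barrier for ordinary people to start a side hustle or make money: in a few minutes you can produce professional images ready for commercial use. I strongly recommend this extremely practical ready-made prompt library: youmind.com/gpt-image-2-pr… The site has already compiled 725 high-quality prompts specifically for GPT Image 2, covering avatars, social media, YouTube thumbnails, comic storyboards, posters and flyers, App/Web design, and more. It is updated daily, and you can copy and use the prompts directly.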

Chinese

445

75.3K
