Frank Lin

1.8K posts

@developerlin

Create Limitless Value with AI | Exploring AI’s Future

Joined October 2007
713 Following · 87 Followers
Frank Lin retweeted
Songyou Peng @songyoupeng
Yay, finally! Introducing Vision Banana🍌 from @GoogleDeepMind, our unified model that outperforms SoTA specialist models on various vision tasks! By treating 2D/3D vision tasks as image generation, we unlock a new foundation for CV. Project page: vision-banana.github.io (1/5)
53 replies · 293 reposts · 2.1K likes · 248.4K views
Min Zhou @fMinZhou
GPT Image 2 is insanely good... I generated a 360° equirectangular panorama in Happycapy with just a skill + prompt.
Step 1: Select the generate-image skill
Step 2: Enter a prompt like: “Use a frontend 360 viewer to display an equirectangular image of […] using the GPT-Image-2 model.”
Wanna see how you all get creative with this.
64 replies · 474 reposts · 4.3K likes · 317.4K views
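An equirectangular panorama maps 360° of longitude and 180° of latitude onto a single image, so a valid one has a 2:1 aspect ratio with equal angular resolution on both axes. A minimal, generic sanity check for generated panoramas (not tied to Happycapy or GPT-Image-2):

```python
def degrees_per_pixel(width, height):
    """An equirectangular image spans 360 deg horizontally and
    180 deg vertically."""
    return 360 / width, 180 / height

def is_equirectangular(width, height):
    # 2:1 aspect ratio means horizontal and vertical deg/px match
    dx, dy = degrees_per_pixel(width, height)
    return abs(dx - dy) < 1e-9

print(is_equirectangular(4096, 2048))  # typical panorama resolution
print(is_equirectangular(1920, 1080))  # ordinary 16:9 frame
```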
Frank Lin retweeted
Carlos Miguel Patiño @cmpatino_
On-policy distillation with 100B+ teacher models is now possible in TRL, and up to 40x faster than naive implementations! We distilled Qwen3-235B into a 4B student and gained 39+ points on AIME25. Two engineering optimizations made it work. Blogpost: huggingface.co/spaces/Hugging…
4 replies · 64 reposts · 354 likes · 27.3K views
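In on-policy distillation the student generates the sequences and the teacher only scores them, so the per-token objective is typically a divergence (often reverse KL) between student and teacher next-token distributions evaluated on student-sampled text. A toy, pure-Python sketch of that objective only; it is not TRL's implementation, and the two engineering optimizations the post mentions are not shown here:

```python
import math

def reverse_kl(student_probs, teacher_probs):
    """Per-token reverse KL(student || teacher): expectations are
    taken under the student's own samples, hence 'on-policy'."""
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)

def distill_loss(sampled_token_dists):
    """Average divergence over tokens the *student* generated;
    the teacher never produces text, it only scores."""
    return sum(reverse_kl(s, t)
               for s, t in sampled_token_dists) / len(sampled_token_dists)

# Two positions of a student-sampled sequence (toy 3-token vocab).
batch = [
    ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]),   # student close to teacher
    ([0.1, 0.8, 0.1], [0.1, 0.3, 0.6]),   # student overconfident
]
loss = distill_loss(batch)
print(round(loss, 4))
```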
Frank Lin retweeted
Guri Singh @heygurisingh
🚨BREAKING: An open-source agentic video production system just dropped. 11 pipelines, 49 tools, and a full product ad produced for $0.69 total.

It's called OpenMontage. And it's not a text-to-video tool. It's a full production orchestration system where your AI coding assistant (Claude Code, Cursor, Copilot, Windsurf) becomes the director. Describe what you want in plain language. The agent researches, scripts, generates assets, edits, and renders the final video.

Here's what the pipeline actually does:
→ Live web research first: 15-25+ searches across YouTube, Reddit, and news sites before writing a single word of script
→ 12 video generation providers: Kling, Runway Gen-4, Google Veo 3, MiniMax, plus local GPU options (WAN 2.1, Hunyuan, CogVideo)
→ 8 image generation providers: FLUX, Google Imagen 4, DALL-E 3, Stable Diffusion locally
→ 4 TTS providers: ElevenLabs, Google (700+ voices), OpenAI, and Piper offline for free
→ WhisperX word-level subtitles burned in automatically
→ Remotion for React-based animated composition with spring physics, transitions, and TikTok-style captions
→ Budget governance: cost estimate before execution, per-action approval above $0.50, hard cap at $10

Here's the wildest part: one product ad with 4 AI-generated images, TTS narration, royalty-free music, word-level subtitles, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.

Works with zero API keys too. Piper narrates locally, Pexels/Pixabay provide free stock, Remotion animates everything. No spend required to start.

100% Open Source. AGPL v3 License. (Link in the comments)
46 replies · 175 reposts · 1.1K likes · 110.8K views
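The budget governance described above (estimate first, per-action approval above $0.50, hard $10 cap) is just a small policy check before each spend. A hedged sketch of that logic: the thresholds come from the post, but the function shape and names are invented for illustration:

```python
APPROVAL_THRESHOLD = 0.50   # per-action approval above this (from the post)
HARD_CAP = 10.00            # absolute spending limit (from the post)

def govern(action_cost, spent_so_far, approve):
    """Decide whether an action may run.
    `approve` is a callback standing in for the human-in-the-loop."""
    if spent_so_far + action_cost > HARD_CAP:
        return "rejected: hard cap"
    if action_cost > APPROVAL_THRESHOLD and not approve(action_cost):
        return "rejected: not approved"
    return "ok"

print(govern(0.10, 0.0, approve=lambda c: False))   # cheap: runs without asking
print(govern(2.00, 0.0, approve=lambda c: True))    # pricey: needs approval
print(govern(2.00, 9.50, approve=lambda c: True))   # would blow the cap
```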
Paweł Huryn @PawelHuryn
Claude Code doesn't show you how many tokens you're using for subscriptions. No breakdown by model. No breakdown by project. Just a progress bar that says "63% used."

So I built a local dashboard that reads the files Claude Code already writes to your machine. Turns out every session, every turn, every token is logged to ~/.claude/projects/ in JSONL files. Input tokens, output tokens, cache reads, cache creation, model name, timestamp. It's all there. You just can't see it.

My numbers over the last 30 days: 440 sessions. 18,000 turns. $1,588 in API-equivalent costs. On one day, the cache spiked to 700M tokens: a visible cache bug, two days in a row.

The dashboard scans those local files, builds a SQLite database, and serves charts on localhost:8080. Filter by model (Opus, Sonnet, Haiku). Filter by time range (7d, 30d, 90d, all time). Cost estimates based on current Anthropic API pricing. Works retroactively: the first run processes your entire Claude Code history.

Install:
git clone github.com/phuryn/claude-…
cd claude-usage
python3 cli.py dashboard
On Windows, use python instead of python3.

Zero dependencies. Python standard library only. Open source, MIT. Star it. Fork it. Make it your own.
128 replies · 219 reposts · 2.3K likes · 292.3K views
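The dashboard's core step is simple: walk the JSONL logs, parse each line, and aggregate token counts per model. A minimal sketch of that aggregation; the field names (`model`, `usage.input_tokens`, `usage.output_tokens`) are assumptions for illustration, not Claude Code's verified log schema:

```python
import json, io

def aggregate_usage(jsonl_stream):
    """Sum token counts per model from a JSONL stream.
    Field names here are assumed, not a documented schema."""
    totals = {}
    for line in jsonl_stream:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        usage = rec.get("usage", {})
        per_model = totals.setdefault(rec.get("model", "unknown"),
                                      {"input": 0, "output": 0})
        per_model["input"] += usage.get("input_tokens", 0)
        per_model["output"] += usage.get("output_tokens", 0)
    return totals

sample = io.StringIO(
    '{"model": "opus", "usage": {"input_tokens": 1200, "output_tokens": 300}}\n'
    '{"model": "opus", "usage": {"input_tokens": 800, "output_tokens": 100}}\n'
    '{"model": "haiku", "usage": {"input_tokens": 50, "output_tokens": 20}}\n'
)
print(aggregate_usage(sample))
```

The real tool additionally persists to SQLite and prices tokens per model; this shows only the parse-and-sum core.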
Frank Lin retweeted
Simplifying AI @simplifyinAI
🚨 Someone just built a fully open-source mocap system that works with any camera. It's called FreeMoCap, a markerless 3D tracking system that runs on ordinary webcams. It turns multiple camera feeds into research-grade skeletal data automatically. 100% Open Source.
64 replies · 735 reposts · 6.3K likes · 355.5K views
Frank Lin retweeted
Nav Toor @heynavtoor
🚨BREAKING: Researchers built an AI that designs better AI than humans can. It discovered 105 new architectures that beat human-designed models. Nobody guided it. It taught itself.

The paper is called "ASI-Evolve: AI Accelerates AI." Published this week by researchers at Shanghai Jiao Tong University. Fully open-sourced. And what it demonstrates should stop every AI researcher cold.

They built a system that runs the entire AI research loop on its own. It reads scientific papers. It forms hypotheses. It designs experiments. It runs them. It analyzes the results. Then it uses what it learned to design better experiments. Over and over. Without human intervention.

They pointed it at neural architecture design first. Over 1,773 rounds of autonomous exploration, the system generated 1,350 candidate architectures. 105 of them beat the best human-designed model. The top architecture surpassed DeltaNet by +0.97 points. That is nearly 3 times the gain of the most recent human-designed state-of-the-art improvement. Humans spent years to get +0.34 points. The AI got +0.97 on its own.

Then they pointed it at training data. The AI designed its own data curation strategies and improved average benchmark performance by +3.96 points. On MMLU, the most widely used knowledge benchmark, the improvement exceeded 18 points.

Then they pointed it at learning algorithms. The AI invented novel reinforcement learning algorithms that outperformed the leading human-designed method, GRPO, by up to +12.5 points on competition math.

Three pillars of AI development. Data. Architecture. Algorithms. The AI improved all three by itself.

Then they tested whether what the AI built actually works in the real world. They applied an AI-discovered architecture to drug-target interaction prediction. It achieved a +6.94 point improvement in scenarios involving completely unseen drugs. The AI designed something that works better than human experts in biomedicine.

This is the first system to demonstrate AI-driven discovery across all three foundational components of AI development in a single framework. The recursive loop is now closed. AI is building AI. And it is already better at it than we are.
63 replies · 212 reposts · 1K likes · 97.5K views
Maziyar PANAHI @MaziyarPanahi
Gemma 4 watches raw video. Understands the scene. Then prompts SAM 3 to segment and RF-DETR to track. One AI directing two others. Fighter jets. Crowds. Aerial defense footage. All three models running locally on a MacBook. No cloud. What scene should I point this at next?
102 replies · 182 reposts · 3K likes · 364.2K views
Frank Lin @developerlin
This concept is highly valuable. It is currently workspace-based, but it could also function as a skill, is capable of self-evolution, and could even be used as an agent's memory.
Andrej Karpathy@karpathy

Wow, this tweet went very viral! I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point in sharing the specific code/app; you just share the idea, and the other person's agent customizes and builds it for your specific needs. So here's the idea in a gist format: gist.github.com/karpathy/442a6… You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it, etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And of course, people can adjust the idea or contribute their own in the Discussion, which is cool.

0 replies · 0 reposts · 0 likes · 7 views
Frank Lin retweeted
Sida Peng @pengsida
The training code for InfiniDepth is now open-source. Feel free to use our framework to train a monocular depth estimation model as well as a depth sensor augmentation model on your own data. github.com/zju3dv/InfiniD…
Sida Peng@pengsida

Excited to share our work InfiniDepth (CVPR 2026), casting monocular depth estimation as a neural implicit field, which enables:
🔍 Arbitrary resolution
📐 Accurate metric depth
📷 Large-view novel view synthesis
Feel free to try our code: github.com/zju3dv/InfiniD…

1 reply · 32 reposts · 201 likes · 15.5K views
Frank Lin @developerlin
Interesting
Bo Wang@BoWang87

Apple Research just published something really interesting about post-training of coding models.

You don't need a better teacher. You don't need a verifier. You don't need RL. A model can just… train on its own outputs. And get dramatically better.

Simple Self-Distillation (SSD): sample solutions from your model, don't filter them for correctness at all, fine-tune on the raw outputs. That's it.

Qwen3-30B-Instruct: 42.4% → 55.3% pass@1 on LiveCodeBench. +30% relative. On hard problems specifically, pass@5 goes from 31.1% → 54.1%. Works across Qwen and Llama, at 4B, 8B, and 30B. One sample per prompt is enough. No execution environment. No reward model. No labels.

SSD sidesteps this by reshaping distributions in a context-dependent way: suppressing distractors at locks while keeping diversity alive at forks. The capability was already in the model. Fixed decoding just couldn't access it.

The implication: a lot of coding models are underperforming their own weights. Post-training on self-generated data isn't just a cheap trick; it's recovering latent capacity that greedy decoding leaves on the table.

paper: arxiv.org/abs/2604.01193
code: github.com/apple/ml-ssd

0 replies · 0 reposts · 0 likes · 20 views
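The SSD recipe as described is almost embarrassingly short: draw one sample per prompt, keep it unfiltered, and fine-tune on the (prompt, sample) pairs. A schematic sketch of that data pipeline, with a stub sampler standing in for the model; this illustrates the recipe's shape as stated in the post, not the paper's actual code:

```python
def build_ssd_dataset(prompts, sample_fn, k=1):
    """Simple Self-Distillation data construction:
    k samples per prompt (k=1 suffices per the post), NO correctness
    filtering, no reward model, no execution environment."""
    return [(p, sample_fn(p)) for p in prompts for _ in range(k)]

# Stub sampler standing in for the model's own (possibly wrong) output.
sample_fn = lambda p: f"def solve():  # model's raw attempt at: {p}"

dataset = build_ssd_dataset(["reverse a list", "parse a date"], sample_fn)
for prompt, completion in dataset:
    print(prompt, "->", completion)
# Fine-tuning on these pairs with the usual objective is the whole recipe.
```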
Frank Lin retweeted
Google Gemma @googlegemma
Meet Gemma 4! Purpose-built for advanced reasoning and agentic workflows on the hardware you own, and released under an Apache 2.0 license. We listened to invaluable community feedback in developing these models. Here is what makes Gemma 4 our most capable family of open models yet: 👇
166 replies · 842 reposts · 7.2K likes · 621.6K views
Frank Lin retweeted
Yasser Dahou @dahou_yasser
We are releasing Falcon Perception, an open-vocabulary referring expression segmentation model. Along with it, a 0.3B OCR model that is on par with 3-10x larger competitors.

Current systems solve this with complex pipelines (separate encoders, late fusion, matching algorithms). We developed a novel, simpler "bitter" approach: one early-fusion Transformer (image + text from the first layer) with a shared parameter space, and let scale + training signal do the work.

Please check our work!
📄 Paper: arxiv.org/pdf/2603.27365
💻 Code: github.com/tiiuae/falcon-…
🎮 Playground: vision.falcon.aidrc.tii.ae
🤗 Blogpost: huggingface.co/blog/tiiuae/fa…
26 replies · 165 reposts · 990 likes · 117.2K views
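"Early fusion" here means image and text enter one shared Transformer as a single token sequence from the very first layer, instead of separate encoders joined late by a matching step. A toy illustration of that sequence construction only; the actual Falcon Perception architecture is in the paper, and these names are invented:

```python
def early_fusion_sequence(image_patches, text_tokens):
    """One shared sequence from the first layer: every Transformer
    block attends jointly over image and text; no separate encoders,
    no late-fusion matching step."""
    return [("img", p) for p in image_patches] + \
           [("txt", t) for t in text_tokens]

seq = early_fusion_sequence(["p0", "p1", "p2"], ["the", "red", "car"])
print(len(seq))        # both modalities in one stream
print(seq[0], seq[3])
```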
Frank Lin retweeted
Jim Fan @DrJimFan
The power of the Claw, in the palm of a robot hand. Agentic robotics is here!

Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs and actuation APIs, and auto-synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are "just" API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.

And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far:
- We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots.
- CaP-Gym: LLM's first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can't wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
- CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper.
- CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
- CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap.

3 years ago, our team created Voyager, one of the earliest agentic AIs that plays and learns in Minecraft continuously. Its key ideas (skill libraries, self-reflection loops, and in-context planning) have since influenced many modern agentic designs. Today, the agent graduates from Minecraft and gets a real job.

It's April Fools', but this Claw is getting its hands dirty for real! Link in thread:
96 replies · 114 reposts · 694 likes · 66.4K views
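The "policies are just API calls" framing means a skill is ordinary code composed from perception and actuation primitives, with learned policies sitting behind the same call interface and finished skills accumulating in a library. A toy sketch with stub APIs; the real CaP-X toolkit (SAM3 segmentation, IK solvers, grasp planners, etc.) is far richer, and everything named here is invented for illustration:

```python
# Stub perception/actuation APIs standing in for the real toolkit.
def detect(obj_name):            # perception: would call a segmenter
    return {"name": obj_name, "pos": (0.4, 0.1, 0.02)}

def grasp(pos):                  # actuation: would call a grasp planner
    return f"grasped at {pos}"

def pick(obj_name):
    """A synthesized skill: plain code composing the primitives."""
    target = detect(obj_name)
    return grasp(target["pos"])

# Skills accumulate in a library the agent can reuse and extend.
skill_library = {"pick": pick}
print(skill_library["pick"]("mug"))
```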
Frank Lin retweeted
Markets & Mayhem @Mayhem4Markets
TurboQuant is looking pretty solid. 🔥
> Original idea was to use it just for the KV cache where context tokens are stored
> Now it is expanding to be used with models
> On Qwen 3.5-27B it shrinks the model down to 12.9B
> 6X memory savings vs 16-bit precision
> Stays accurate
76 replies · 155 reposts · 1.6K likes · 289.2K views
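A 6x saving versus 16-bit precision works out to roughly 2.7 effective bits per weight. A back-of-the-envelope check of that arithmetic (generic memory math, not TurboQuant's actual quantization scheme):

```python
def model_bytes(n_params, bits_per_weight):
    """Raw weight storage: parameters times bits, divided by 8."""
    return n_params * bits_per_weight / 8

params = 27e9                       # Qwen 3.5-27B (from the post)
fp16 = model_bytes(params, 16)      # 16-bit baseline
quant = fp16 / 6                    # the claimed 6x saving
print(f"fp16:  {fp16 / 1e9:.1f} GB")
print(f"quant: {quant / 1e9:.1f} GB")
print(f"effective bits/weight: {quant * 8 / params:.2f}")
```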
Frank Lin retweeted
Andrej Karpathy @karpathy
New supply chain attack, this time for npm axios, the most popular HTTP client library with 300M weekly downloads.

Scanning my system, I found a usage imported from googleworkspace/cli from a few days ago, when I was experimenting with a gmail/gcal CLI. The installed version (luckily) resolved to an unaffected 1.13.5, but the project dependency is not pinned, meaning that if I had done this earlier today the code would have resolved to latest and I'd be pwned.

It's possible to personally defend against these to some extent with local settings, e.g. release-age constraints, containers, etc., but I think ultimately the defaults of package management projects (pip, npm, etc.) have to change so that a single infection (usually, luckily, fairly temporary in nature due to security scanning) does not spread through users at random and at scale via unpinned dependencies.

More comprehensive article: stepsecurity.io/blog/axios-com…
Feross@feross

🚨 CRITICAL: Active supply chain attack on axios -- one of npm's most depended-on packages. The latest axios@1.14.1 now pulls in plain-crypto-js@4.2.1, a package that did not exist before today. This is a live compromise.

This is textbook supply chain installer malware. axios has 100M+ weekly downloads. Every npm install pulling the latest version is potentially compromised right now.

Socket AI analysis confirms this is malware. plain-crypto-js is an obfuscated dropper/loader that:
• Deobfuscates embedded payloads and operational strings at runtime
• Dynamically loads fs, os, and execSync to evade static analysis
• Executes decoded shell commands
• Stages and copies payload files into OS temp and Windows ProgramData directories
• Deletes and renames artifacts post-execution to destroy forensic evidence

If you use axios, pin your version immediately and audit your lockfiles. Do not upgrade.

559 replies · 1.1K reposts · 10.5K likes · 1.5M views
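The concrete takeaway above is pinning: a caret range like `^1.14.0` can silently float to a freshly compromised release, while an exact version cannot. A small sketch that flags floating semver specs in a package.json-style dependency map; this is a rough heuristic for illustration (it treats anything that is not a bare `x.y.z` as floating), not a replacement for auditing lockfiles:

```python
import re

def unpinned(dependencies):
    """Flag version specs that can float to a new release: range
    operators (^, ~, >, *, etc.) or 'latest'. Exact x.y.z passes."""
    exact = re.compile(r"^\d+\.\d+\.\d+$")   # e.g. 1.13.5
    return sorted(name for name, spec in dependencies.items()
                  if not exact.match(spec))

deps = {
    "axios": "^1.14.0",       # floats: could pull a compromised latest
    "left-pad": "1.3.0",      # pinned exactly
    "lodash": "latest",       # floats
}
print(unpinned(deps))
```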
Frank Lin retweeted
Akshay 🚀 @akshay_pachaar
Microsoft did it again!

Speech AI models have a major limitation. They slice long recordings into tiny chunks, lose track of who's speaking, and forget all context halfway through. This is exactly what Microsoft's VibeVoice solves. It's an open-source family of frontier voice AI models for both speech recognition and speech generation.

Here's what it can do:
> VibeVoice-ASR processes up to 60 minutes of audio in a single pass. No chunking. It outputs structured transcriptions with who spoke, when they spoke, and what they said.
> You can feed it custom hotwords like names, technical jargon, or domain-specific terms. The model uses them to significantly improve accuracy on specialized content.
> VibeVoice-TTS generates up to 90 minutes of multi-speaker speech with up to 4 distinct speakers. Natural turn-taking, emotional expression, all in one pass.
> VibeVoice-Realtime is a 0.5B streaming TTS model with ~300ms first-audio latency. Small enough to deploy practically anywhere.

All of this is powered by continuous speech tokenizers running at just 7.5 Hz. This ultra-low frame rate preserves audio quality while making long sequences computationally feasible.

I have shared the link to the GitHub repo in the replies!
25 replies · 62 reposts · 520 likes · 45K views
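The 7.5 Hz figure is what makes 60-minute single-pass audio tractable: sequence length scales linearly with the tokenizer's frame rate. A quick arithmetic check, comparing against a ~50 Hz rate as an illustrative "typical" baseline (the 50 Hz number is my assumption, not from the post):

```python
def audio_tokens(minutes, frames_per_second):
    """Sequence length for an audio clip at a given tokenizer rate."""
    return int(minutes * 60 * frames_per_second)

hour_at_7_5hz = audio_tokens(60, 7.5)   # VibeVoice's stated rate
hour_at_50hz = audio_tokens(60, 50)     # illustrative baseline rate
print(hour_at_7_5hz)   # 27000 tokens for a full hour
print(hour_at_50hz)    # 180000 tokens: ~6.7x longer sequences
```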