Seungjun

3.2K posts


@dev_seungjun

Kaggle Expert | prev Google Summer of Code 23 @ TensorFlow, WWDC21 Scholar

Joined December 2019
676 Following · 601 Followers
Seungjun reposted
François Fleuret @francoisfleuret
Reminder that I wrote a little book about deep-learning, which is phone-formatted, entirely free, and nearing the 1M download: fleuret.org/francois/lbdl.…
François Fleuret tweet media
English
34
243
2.2K
111.4K
Seungjun reposted
Jia-Bin Huang @jbhuang0604
Modern Transformer architecture, explained. I compiled a list of videos on the Transformer architecture into a short "YouTube course". Hopefully this will be helpful for beginners in the community. 🧵
Jia-Bin Huang tweet media
English
10
136
1.1K
52.2K
Seungjun reposted
AlphaSignal AI @AlphaSignalAI
NVIDIA just trained a 14-billion-parameter AI using evolution, not calculus.

Every AI today learns through backpropagation: it computes gradients, adjusts weights, repeats. It works, but it demands precision hardware and enormous GPU clusters.

Evolution Strategies offer an alternative: mutate the model, test it, keep what works, like biological evolution. The problem was speed. Random mutations on GPUs were painfully slow.

EGGROLL fixes this with one trick: it splits each huge random perturbation matrix into two small ones per mutation, so hundreds of thousands of mutations can run at once.

> 100x faster training throughput
> 91% of pure-inference speed
> Pretrains models using only integers
> Competitive with backprop on reasoning
> Works on non-differentiable systems

It pretrained a language model from scratch using zero gradients, and it matched reinforcement-learning methods on math reasoning tasks. Everyone kept scaling the calculus to train massive AIs. It turns out, we just needed to evolve.
AlphaSignal AI tweet media
English
50
90
668
62.3K
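The low-rank trick described above can be sketched in a few lines of NumPy: each candidate mutation of a weight matrix is built from two small random factors instead of one full-size random matrix, and the search keeps whichever mutant scores best. This is a toy hill-climbing illustration on a synthetic objective, not the actual EGGROLL method (its batched integer arithmetic and LLM-scale details are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective: how close is a weight matrix to a hidden target?
target = rng.standard_normal((16, 16))

def fitness(weights):
    return -np.sum((weights - target) ** 2)  # higher is better

W = np.zeros((16, 16))           # current model weights
rank, sigma, pop = 2, 0.05, 64   # low-rank factor size, mutation scale, population

for step in range(200):
    candidates = []
    for _ in range(pop):
        # Low-rank trick: a full 16x16 perturbation is the product of a
        # 16x2 and a 2x16 random factor, far fewer numbers to draw.
        A = rng.standard_normal((16, rank))
        B = rng.standard_normal((rank, 16))
        candidates.append(W + sigma * (A @ B))
    # "Mutate, test, keep what works": greedily adopt the best mutant.
    best = max(candidates, key=fitness)
    if fitness(best) > fitness(W):
        W = best
```

After a few hundred generations the weights drift toward the target without a single gradient. Real ES implementations typically average over the whole population rather than taking the argmax, and the batching that makes this fast at scale is only loosely suggested here.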
Seungjun reposted
Tech with Mak @techNmak
For 38 years, computer scientists believed Dijkstra's algorithm was optimal for sparse graphs. The logic seemed airtight: Dijkstra sorts vertices by distance, sorting has a lower bound of O(n log n), therefore shortest paths can't be faster.

Five researchers proved the assumption wrong. The trick: combine Dijkstra's priority queue with Bellman-Ford's dynamic programming, using divide and conquer on vertex sets to shrink the frontier.

Result: O(m log^(2/3) n). The first improvement for directed graphs since Fibonacci heaps in 1987. Tsinghua. Stanford. Max Planck. 17 pages.
Tech with Mak tweet media
English
27
276
1.9K
487.8K
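For context, the long-standing baseline the paper improves on is binary-heap Dijkstra, which runs in O(m log n) on sparse graphs. A minimal sketch of that baseline (the new O(m log^(2/3) n) algorithm itself is considerably more intricate and not reproduced here):

```python
import heapq

def dijkstra(graph, source):
    """Classic Dijkstra with a binary heap: O(m log n) on sparse graphs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

The new result's surprise is that this heap-ordered processing, which seems to force a sorting bottleneck, is not actually necessary.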
Seungjun reposted
Hanchen Li @lihanc02
An agent that beats Claude Mythos on Terminal Bench and SWE-bench Verified? 🎉 We are excited to share Terminator-1, our newest agent, which achieved 95+% on SWE-bench Verified and Terminal-Bench with @MogicianTony! We show that besides model capabilities, a well-designed harness can actually boost accuracy by 3x in coding tasks.

Well, if you really wanted to, you could get 100% accuracy without solving a single task. The actual finding is that most AI benchmarks can be easily reward-hacked with simple exploits. Read more about the 7 design flaws that almost every evaluation has ⬇️
Hanchen Li tweet media
Hao Wang @MogicianTony

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

English
170
279
3.8K
948.1K
Seungjun @dev_seungjun
NotebookLM genuinely does help you understand long documents fast. I thought it was just hype, but today I was able to understand a whole 70+ page document about AI Agents in 30 minutes. It's different from just dropping a document into a chat and having an ongoing conversation: NotebookLM gives you a mind map, and using it you can cover the entire document at a high level first, then keep diving into smaller and smaller details. Plus, with the quizzes it generates, you can consolidate your understanding.
English
0
0
0
62
Seungjun @dev_seungjun
Imagine having 8x NVIDIA B200s and running Qwen3.5-397B locally 24/7. Claude 4.5 Opus-level performance with zero rate limits and a 1M context window. The ultimate local setup.
English
0
0
0
74
Seungjun reposted
DAIR.AI @dair_ai
NEW paper from Google on multi-agent research agents.

It's one of the first systems that handles end-to-end LaTeX generation, targeted literature reviews, and conceptual diagrams as a decoupled, standalone writer. Automated research frameworks can run experiments, but their writing modules remain the weakest link: literature reviews are shallow, citations are sparse, and no system generates conceptual diagrams. This new research introduces a standalone writing framework that addresses all of this.

PaperOrchestra is a multi-agent system that transforms unconstrained pre-writing materials (raw ideas, experimental logs, notes) into submission-ready LaTeX manuscripts. It uses specialized agents for deep literature synthesis, plot generation, conceptual diagram creation, and iterative refinement. The team also releases PaperWritingBench, the first standardized benchmark with reverse-engineered materials from 200 top-tier AI conference papers.

Why does it matter? In side-by-side human evaluations, PaperOrchestra achieved absolute win-rate margins of 50 to 68% in literature-review quality and 14 to 38% in overall manuscript quality over autonomous baselines.

Paper: arxiv.org/abs/2604.05018
Learn to build effective AI agents in our academy: academy.dair.ai
DAIR.AI tweet media
English
15
79
464
122.5K
Bummo Koo @gbmksquare
Fire-gazing 🔥
Bummo Koo tweet media (4 images)
Korean
2
0
4
176
Seungjun @dev_seungjun
Anyone else feeling like Claude Opus 4.6 started responding in a longer format?
English
0
0
0
52
Seungjun reposted
Google for Developers @googledevs
A new PyTorch-native backend is coming to unlock the power of Google TPUs:
✨ Run existing PyTorch with minimal code changes.
✨ Get a 50-100%+ performance boost with Fused Eager mode.
Read the engineering deep dive here: goo.gle/4vbTQQl
#TorchTPU #PyTorch #MLOps #AI
Google for Developers tweet media
English
14
119
776
51.6K
Seungjun reposted
TestingCatalog News 🗞 @testingcatalog
BREAKING 🚨: Z AI released GLM-5.1, an open-source model with top-tier coding performance!
"Number 1 in open source and number 3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo."
"Runs autonomously for 8 hours, refining strategies through thousands of iterations."
TestingCatalog News 🗞 tweet media
Z.ai @Zai_org

Introducing GLM-5.1: The Next Level of Open Source
- Top-Tier Performance: #1 in open source and #3 globally across SWE-Bench Pro, Terminal-Bench, and NL2Repo.
- Built for Long-Horizon Tasks: Runs autonomously for 8 hours, refining strategies through thousands of iterations.
Blog: z.ai/blog/glm-5.1
Weights: huggingface.co/zai-org/GLM-5.1
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Coming to chat.z.ai in the next few days.

English
24
52
804
69.4K
Giyu @rutu_3
Uninstalling VS Code can increase your productivity by 85%
English
42
2
160
9K
Seungjun reposted
Suraj Sharma @suraj_sharma14
$3,850/week. $15K/mo in compute. No PhD required. OpenAI Safety Fellowship is hunting sharp minds:

What you'll do:
• Full-time empirical AI safety research (Sept '26 – Feb '27)
• Ship a paper, benchmark, or dataset by program end
• Work with OpenAI's safety + alignment teams

What you get:
• ~$200K annualized stipend
• ~$15K/mo compute budget
• Visa support (J-1, F-1/CPT, OPT)
• Berkeley workspace (remote possible)

Deadline: May 3, 2026 (11:59PM AoE)
Apply now 👇 openai.com/index/introduc…
#AISafety #OpenAI #ResearchFellowship #Alignment @OpenAI
English
7
32
319
21.8K
Seungjun @dev_seungjun
Did a full MLOps cycle today.
> Data pipeline (BigQuery + Spark)
> Fine-tuned LLaMA-3-8B base (DeepSpeed + LoRA, 1x A100)
> HumanEval: 30.7% / 58.6% / 69.9% (pass@1/5/10)
> Quantization (AWQ INT4)
> Serving (vLLM)
> FastAPI gateway (rate limiting, logging, OpenAI fallback)
> Vue frontend
Seungjun tweet media
English
0
0
0
88
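The rate-limiting step in a gateway like the one above can be as small as a token bucket. A minimal pure-Python sketch (the tweet doesn't say which mechanism the FastAPI gateway actually uses, so the class below is an illustrative assumption):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allow `rate` requests/sec, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request admitted
        return False      # request rejected (would map to HTTP 429)

bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(4)]
print(results)  # [True, True, False, False]
```

In a real gateway you would keep one bucket per client key and return 429 with a Retry-After header on rejection; the OpenAI-fallback and logging pieces are separate middleware.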
Seungjun @dev_seungjun
Finally understand why everyone here keeps saying to buy a GPU or Mac mini for local LLMs. I had an unused M1 MacBook sitting around, so I ran Gemma 4 - E2B on it, exposed it over the network, and accessed it from my current MBP. The experience is surprisingly fast and good. Gonna save up and buy a proper GPU and run Qwen3.5-397B or Gemma4-31B someday 👀
English
1
0
0
147
Seungjun @dev_seungjun
Running Gemma-4 2B (8-bit Quantized) locally on my M3 MBP via llama.cpp. Hitting a smooth 31 tokens/sec! 🚀
English
0
0
1
286
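A tokens/sec figure like the one above is just tokens generated divided by wall-clock decode time. A minimal way to measure it, with a short sleep standing in for the real llama.cpp decode step (the callable and its cost are placeholders, not llama.cpp's API):

```python
import time

def measure_throughput(generate_token, n_tokens=64):
    """Return decode throughput in tokens/sec for a token-generating callable."""
    t0 = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()   # one decode step: produce the next token
    elapsed = time.perf_counter() - t0
    return n_tokens / elapsed

# Stand-in decode step costing roughly 1 ms per token (~31 tok/s would be ~32 ms).
tps = measure_throughput(lambda: time.sleep(0.001))
print(f"{tps:.1f} tokens/sec")
```

Note that prompt processing (prefill) is usually reported separately from this decode rate, and llama.cpp prints both in its timing summary.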