Junyang Lin

3.2K posts

@JustinLin610

❤️ 🍵 ☕️ 🍷 🥃 🍺

Joined December 2015
2K Following · 87.1K Followers
Yu Su @ysu_nlp
Introducing @NeoCognition, the agent lab for specialized intelligence. Everyone needs experts, but human expertise does not scale. Backed by $40M seed funding, we build self-learning agents that specialize across domains to make expertise abundant.
92 replies · 134 reposts · 875 likes · 174.6K views
Junyang Lin @JustinLin610
i do like this passage, and here are some thoughts:
1. critical thinking is essential in the era of agents. i still remember that many years ago when i studied the lesson of critical thinking, i learned that keeping debating with yourself by listing out reasons can really deepen your thinking. today, critical thinking becomes humans debating with agents, so that they can think more deeply together and analyze problems in a more comprehensive way.
2. designing a healthy and well-structured organization and system is essential for creation and building. with systematic support and efficient tooling, humans can work exponentially more effectively together with agents. that gives people more time to take care of their physical and mental health, while also exploring new opportunities.
3. new era often favors newbies, because they have less past experience and therefore less fear of current difficulties. what oldbies should really think about is which parts of their experience are actually worth leveraging. from my perspective, we should think more carefully about which experiences are truly aligned with first principles.
but anyway, ai first is super, super exciting!
Peter Pang @intuitiveml

x.com/i/article/2043…

9 replies · 20 reposts · 245 likes · 55.4K views
Junyang Lin @JustinLin610
we need agent evals that are really consistent with real world usages. otherwise people are optimizing foundation models for the wrong direction. the problem of targeting is even bigger than benchmaxxing.
22 replies · 17 reposts · 239 likes · 27.5K views
Junyang Lin reposted
Dawn Song @dawnsongtweets
x.com/MogicianTony/s… 🧵 1/ Our agent Terminator-1 scored ~100% on 8 major AI agent benchmarks, e.g., SWE-bench Verified & Pro, Terminal-Bench, beating Claude Mythos. It solved 0 tasks. Benchmarks are the field's shared language for measuring AI progress. Our new work shows that language is broken. Here’s how.
Hao Wang @MogicianTony

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

20 replies · 52 reposts · 334 likes · 89.2K views
Elon Musk @elonmusk
SpaceXAI Colossus 2 now has 7 models in training:
- Imagine V2
- 2 variants of 1T
- 2 variants of 1.5T
- 6T
- 10T
Some catching up to do.
6.7K replies · 8.1K reposts · 68.7K likes · 28.3M views
Lincoln 🇿🇦 @Presidentlin
@JustinLin610 @Zai_org Still wild seeing you on the streets, brother. How is life treating you? It seems like you are on a small vacation.
1 reply · 0 reposts · 12 likes · 1.5K views
Junyang Lin @JustinLin610
tokenmaxxing vs. ironmaxxing lol. it should be an era where results matter but it seems not.
8 replies · 2 reposts · 73 likes · 14.7K views
Junyang Lin @JustinLin610
mountain climbing is so funny
14 replies · 1 repost · 108 likes · 17.6K views
Percy Liang @percyliang
Academic titles are funny. After 14 years, I finally have the official title that people might have always assumed I had.
93 replies · 22 reposts · 1.3K likes · 115.3K views
Junyang Lin reposted
ollama @ollama
Ollama is now updated to run the fastest on Apple silicon, powered by MLX, Apple's machine learning framework. This change unlocks much faster performance to accelerate demanding work on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex
292 replies · 733 reposts · 5.8K likes · 774.3K views
Junyang Lin @JustinLin610
model+harness is now over model only. agent perf can be significantly influenced by the design and quality of harness. i do believe this is a right direction, nice work!
Yoonho Lee @yoonholeee

How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end

24 replies · 55 reposts · 578 likes · 78.5K views
Yuchen Jin @Yuchenj_UW
Anthropic’s new model, Capybara: “Compared to Claude Opus 4.6, Capybara achieves dramatically higher scores in software coding, academic reasoning, and cybersecurity.” According to Dario's previous interview, it might be a 10T-parameter model that cost $10 billion to train.
215 replies · 196 reposts · 3.5K likes · 623.8K views
Junyang Lin reposted
alexintosh @Alexintosh
I just ran Qwen3.5 35B on my iPhone at 5.6 tok/sec. Fully on-device. 4bit | 256 experts. Model: 19.5GB. iPhone: 12GB RAM. wild.
91 replies · 151 reposts · 2.3K likes · 387K views