Yiwei Hou

14 posts

@yiwei_hou

CS PhD student @berkeley_ai BAIR, @BerkeleySky Sky Lab. Prev: MS @Tsinghua_Uni, Summer @UChicago. In the intersection of Security, Software Systems, and AI.

Berkeley, California · Joined January 2020
278 Following · 139 Followers
Yiwei Hou retweeted
Qiuyang Mang @MangQiuyang
Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks, and even outperform human-expert curation. The FrontierCS team is releasing FrontierSmith, a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates and filters them, then builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench.
Blog: frontier-cs.org/blog/frontiers…
Paper: arxiv.org/abs/2605.14445
Code: github.com/FrontierCS/Fro…
Model: huggingface.co/runyuanhe/qwen…
14
71
331
93.2K
Yiwei Hou retweeted
Chris. @chefmade_92
Canvas was hit with a nasty ransomware attack, so a lot of students are getting this message when they try to log in to Canvas. Canvas getting hacked during finals week is insane.
42
222
2.6K
215.8K
Yiwei Hou retweeted
Koushik Sen @koushik77
I am thrilled to announce the release of a new version of KISS Sorcar at github.com/ksenxx/kiss_ai. KISS Sorcar is a general-purpose AI assistant and IDE, implemented as a Visual Studio Code extension and a web/mobile app, built on its KISS Agent Framework. It runs locally, is free and open source, and uses model API keys from major LLM providers such as Anthropic. It also supports Claude Code and OpenAI Codex. The new KISS Sorcar adds parallel agents, Git worktree isolation, multiple tabs, a Sorcar web/mobile app, third-party agents, and skills. At its core, KISS Sorcar is a reliable coding and research assistant with strong browser support through Chromium and Playwright, multimodal support, Docker container support, and the ability to run agents for extended periods. KISS Sorcar scored 62.2% on Terminal Bench 2.0, slightly ahead of Cursor agent at 61.7% and Claude Code at 58%.
6
4
47
280.5K
Yiwei Hou @yiwei_hou
5/5 The economics have flipped. A scan that costs hundreds of dollars, not a six-figure audit budget, can surface CVE-worthy bugs. Defenders, not attackers, should scan first.
1
0
2
195
Yiwei Hou @yiwei_hou
The agent harness is as important as the model for cybersecurity. $300 in compute, 9 OSS-Fuzz projects, 14 security issues, and 5 CVEs. The key lesson: you don’t need a secret model to find real security issues. You need an effective, affordable, reliable harness. 5 takeaways 🧵
1
8
16
1.4K
Yiwei Hou retweeted
Hao Wang @MogicianTony
SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵
22
91
680
825.5K
Yiwei Hou @yiwei_hou
@slimshetty_ @METR_Evals Really resonate with your take on evals, such an important gap. Lucky to be a labmate and learn from you. Huge congrats on the PhD and all the best!
0
0
1
78
Manish Shetty @slimshetty_
1/ Thrilled to share that I’m joining @METR_Evals after finishing my PhD at Berkeley!
12
3
181
10.6K
Yiwei Hou retweeted
Melissa Pan @melissapan
The Sky’s Fun Committee, representing the ppl of sky, just dropped the new lab theme: ⚫️💖 Black Pink x Halloween 🎃🦇 We have: - Gru & the minions - kpop ??? 🫰😉
8
8
52
7.6K
Yiwei Hou retweeted
Yihao Chen @PorcupineAndrew
Introducing matrix-bgpsim: a GPU-accelerated, matrix-based BGP simulator. ✅ Internet-scale, all-pairs route generation in just hours ✅ PyTorch/CuPy backend for high performance ✅ Research-friendly for large-scale analytics. pypi.org/project/matrix… #BGP #Networking #Research
0
1
3
327