Yiwei Hou

14 posts

@yiwei_hou

CS PhD student @berkeley_ai BAIR, @BerkeleySky Sky Lab. Prev: MS @Tsinghua_Uni, Summer @UChicago. Working at the intersection of Security, Software Systems, and AI.

Berkeley, California · Joined January 2020

278 Following · 139 Followers

Yiwei Hou reposted

Qiuyang Mang @MangQiuyang · 15 May

Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench. Blog: frontier-cs.org/blog/frontiers… Paper: arxiv.org/abs/2605.14445 Code: github.com/FrontierCS/Fro… Model: huggingface.co/runyuanhe/qwen…

Yiwei Hou reposted

Chris. @chefmade_92 · 7 May

Canvas was hit with a nasty ransomware attack, so a lot of students are getting this message when they try to log in to Canvas. Canvas getting hacked during finals week is insane.

Yiwei Hou reposted

Koushik Sen @koushik77 · 8 May

I am thrilled to announce the release of a new version of KISS Sorcar at github.com/ksenxx/kiss_ai. KISS Sorcar is a general-purpose AI assistant and IDE, implemented as a Visual Studio Code extension and a web/mobile app, built on its KISS Agent Framework. It runs locally, is free and open source, and uses model API keys from major LLM providers such as Anthropic. It also supports Claude Code and OpenAI Codex. The new KISS Sorcar implements parallel agents, Git worktree isolation, multiple tabs, a Sorcar web/mobile app, third-party agents, and skills. At its core, KISS Sorcar is a reliable coding and research assistant with strong browser support through Chromium and Playwright, multimodal support, Docker container support, and the ability to run agents for extended periods. KISS Sorcar scored 62.2% on Terminal Bench 2.0, slightly ahead of Cursor agent at 61.7% and Claude Code at 58%.

Yiwei Hou @yiwei_hou · 1 May

You are welcome to request a Revelio scan of your own repository: docs.google.com/forms/d/e/1FAI…. 🌟Full blog: m1-llie.github.io/Revelio-agent-…. With @MogicianTony, @MuxiLyu7038, @MariusMomeu, @dawnsongtweets, @koushik77, and David Wagner. @ericnwen and @shtigeryang also contributed.

Yiwei Hou @yiwei_hou · 1 May

5/5 The economics have flipped. A scan that costs hundreds of dollars, not a six-figure audit budget, can surface CVE-worthy bugs. Defenders, not attackers, should scan first.

Yiwei Hou @yiwei_hou · 1 May

The agent harness is as important as the model for cybersecurity. $300 in compute, 9 OSS-Fuzz projects, 14 security issues, and 5 CVEs. The key lesson: you don’t need a secret model to find real security issues. You need an effective, affordable, reliable harness. 5 takeaways 🧵

Yiwei Hou reposted

Hao Wang @MogicianTony · 9 Apr

SWE-bench Verified and Terminal-Bench—two of the most cited AI benchmarks—can be reward-hacked with simple exploits. Our agent scored 100% on both. It solved 0 tasks. Evaluate the benchmark before it evaluates your agent. If you’re picking models by leaderboard score alone, you’re optimizing for the wrong thing. 🧵

Yiwei Hou @yiwei_hou · 20 Mar

@slimshetty_ @METR_Evals Really resonate with your take on evals, such an important gap. Lucky to be a labmate and learn from you. Huge congrats on the PhD and all the best!

Manish Shetty @slimshetty_ · 19 Mar

1/ Thrilled to share that I’m joining @METR_Evals after finishing my PhD at Berkeley!

Yiwei Hou reposted

Melissa Pan @melissapan · 1 Nov

The Sky’s Fun Committee, representing the ppl of Sky, just dropped the new lab theme: ⚫️💖 Black Pink x Halloween 🎃🦇
We have:
- Gru & the minions
- kpop ??? 🫰😉

Yiwei Hou reposted

Yihao Chen @PorcupineAndrew · 4 Sep

Introducing matrix-bgpsim: a GPU-accelerated, matrix-based BGP simulator. ✅ Internet-scale, all-pairs route generation in just hours ✅ PyTorch/CuPy backend for high performance ✅ Research-friendly for large-scale analytics. pypi.org/project/matrix… #BGP #Networking #Research
