Alex Tong

8 posts

@ATong_04

research @Harvard, math&stats @UCBerkeley, @generalcatalyst fellow. searched for aliens in a past life @SETIInstitute

Joined March 2024
107 Following · 28 Followers
steven
steven@stevenlu0·
finally made it official while waiting in the airport in boarding group 6! I’ll be starting a PhD at @SCSatCMU in the fall, excited for the journey to come 🥳🎉
Angelina Lue
Angelina Lue@angelina_lue·
Hey twitter/x, one of my goals this year is to share more things that excite me with the world. I’m starting here so let me introduce myself: My name is Angelina, I’m 22, and I currently live in SF! For the past six months, I’ve been working at Meta Superintelligence Labs on model training infra and data strategy👩🏻‍💻 Before that I was at UCLA studying CS and Econ and spent a lot of my time in college building in fintech and investing in early stage companies (General Catalyst Venture Fellows, NEA, Mantis VC). I love food, traveling to new places, a good story, snowboarding, and hosting dinners and game nights🕺🏻 I also love meeting new people, feel free to say hi :)
Alex Tong
Alex Tong@ATong_04·
@giffmana funny how this showed up on my feed the moment i started noticing unusual questions in the benchmark I’m currently working with
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
A lot of datasets are actually really bad! Even big conference ones, even ones that got awards! It made me blanket lose trust. It's simple to find out: Just spend 30min looking at it randomly. For vision, finetune a blind and a non-blind model and compare. That's all it takes.
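The blind-vs-non-blind check Beyer mentions can be sketched in a few lines. This is an illustrative sketch, not his code: `ask_model` is a hypothetical stand-in for whatever VLM call you use, and the sample dicts are an assumed schema.

```python
# Sketch of a "blind vs. non-blind" dataset sanity check: score the same
# QA benchmark twice, once with the image withheld from the model.
# `ask_model(question, image)` is a hypothetical callable; swap in your own.

def accuracy(samples, ask_model, use_image=True):
    """Fraction of samples where the model's answer matches the labeled GT."""
    correct = 0
    for s in samples:
        image = s["image"] if use_image else None
        answer = ask_model(s["question"], image)
        correct += answer.strip().lower() == s["answer"].strip().lower()
    return correct / len(samples)

def blind_gap(samples, ask_model):
    """Full-input accuracy minus blind accuracy. A gap near zero means the
    images barely matter: the benchmark likely leaks answers through the
    question text, or the labels are unreliable."""
    full = accuracy(samples, ask_model, use_image=True)
    blind = accuracy(samples, ask_model, use_image=False)
    return full - blind
```

A healthy vision benchmark should show a large gap; a blind model scoring close to the full model is exactly the red flag Beyer describes.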
Lei Yang@diyerxx

Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment. So here’s what happened.

Earlier this month, a colleague shared an Apple paper on arXiv with me — it was also under review for ICLR 2026. The benchmark they proposed was perfectly aligned with a project we’re working on. I got excited after reading it. I immediately stopped my current tasks and started adapting our model to their benchmark. Pulled a whole weekend crunch session to finish the integration… only to find our model scoring absurdly low.

I was really frustrated. I spent days debugging, checking everything — maybe I used it wrong, maybe there was a hidden bug. During this process, I actually found a critical bug in their official code:

* When querying the VLM, it only passed in the image path string, not the image content itself.

The most ridiculous part? After I fixed their bug, the model's scores got even lower! The results were so counterintuitive that I felt forced to do deeper validation. After multiple checks, the conclusion held: fixing the bug actually made the scores worse.

At this point I decided to manually inspect the data. I sampled the first 20 questions our model got wrong, and I was shocked:

* 6 out of 20 had clear GT errors.
* The pattern suggested the “ground truth” was model-generated with extremely poor quality control, leading to tons of hallucinations.
* Based on this quick sample, the GT error rate could be as high as 30%.

I reported the data quality issue in a GitHub issue. After 6 days, the authors replied briefly and then immediately closed the issue. That annoyed me — I’d already wasted a ton of time, and I didn’t want others in the community to fall into the same trap — so I pushed back. Only then did they reopen the GitHub issue.

Then I went back and checked the examples displayed in the paper itself. Even there, I found at least three clear GT errors.
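The path-vs-pixels bug described above is a recognizable failure class. The thread does not show the actual code, so this is a hedged reconstruction of what that bug typically looks like, with hypothetical function names; the point is that the buggy variant makes the model answer blind from a filename string.

```python
# Hypothetical reconstruction of the bug class: interpolating the image
# *path* into the prompt text instead of attaching the image content.
import base64

def build_query_buggy(question, image_path):
    # The model receives "/data/img_0042.png" as literal text and never
    # sees the pixels — any score above chance comes from text-side leakage.
    return {"text": f"{question}\nImage: {image_path}", "image": None}

def build_query_fixed(question, image_path):
    # Actually read the file and attach its content (base64-encoded here,
    # as many HTTP vision APIs expect).
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"text": question, "image": encoded}
```

Under the buggy variant, a benchmark with sound ground truth should collapse toward chance; the thread's observation that scores *rose* with the bug is itself evidence the labels were contaminated.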
It’s hard to believe the authors were unaware of how bad the dataset quality was, especially when the paper claims all samples were reviewed by annotators. Yet even the examples printed in the paper contain blatant hallucinations and mistakes.

When the ICLR reviews came out, I checked the five reviews for this paper. Not a single reviewer noticed the GT quality issues or the hallucinations in the paper's examples. So I started preparing a more detailed GT error analysis and wrote a Public Comment on OpenReview to inform the reviewers and the community about the data quality problems.

The next day — the authors withdrew the paper and took down the GitHub repo.

Fortunately, ICLR is an open conference with Public Comment. If this had been a closed-review venue, this kind of shoddy work would have been much harder to expose. So here’s a small call to the community: for any paper involving model-assisted dataset construction, reviewers should spend a few minutes checking a few samples manually. We need to prevent irresponsible work from slipping through and misleading everyone.

Looking back, I should have suspected the dataset earlier based on two red flags:

* The paper’s experiments claimed that GPT-5 had been surpassed by a bunch of small open-source models.
* The original code, with a ridiculous bug, produced higher scores than the bug-fixed version.

But because it was a paper from Big Tech, I subconsciously trusted the integrity and quality, which prevented me from spotting the problem sooner. This whole experience drained a lot of my time, energy, and emotion — especially because accusing others of bad data requires extra caution. I’m sharing this in hopes that the ML community remains vigilant and pushes back against this kind of sloppy, low-quality, and irresponsible behavior before it misleads people and wastes collective effort.

#ICLR #ICLR2026 #NeurIPS #CVPR #openreview #MachineLearning #LLM #VLM
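The 6-of-20 spot check behind the "as high as 30%" claim is a small sample, and a quick interval calculation (my addition, not from the thread) shows how wide the plausible range really is. Note the sample was drawn from questions the model got wrong, so this bounds the error rate among failures, not the whole dataset.

```python
# 95% Wilson score confidence interval for a binomial proportion,
# applied to the 6-errors-in-20-samples spot check from the thread.
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval; better behaved than the normal approximation
    at small n and proportions far from 0.5."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(6, 20)
print(f"point estimate 30%, 95% CI roughly {lo:.0%} to {hi:.0%}")
# → point estimate 30%, 95% CI roughly 15% to 52%
```

Even the lower bound (~15% of failures traceable to bad ground truth) is far beyond what an annotator-reviewed benchmark should tolerate, which supports the thread's conclusion without needing the 30% point estimate to be exact.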

Sean Cai
Sean Cai@SeanZCai·
Excited to announce that I'm joining @costanoavc as an investor! Having spent the past few years at General Catalyst and Hummingbird Ventures experiencing the best of both megacap and boutique venture, I firmly believe that Costanoa represents the best of both worlds.
Chris Samra
Chris Samra@crsamra·
for the next 24hrs, im giving away camera glasses comment "Waves" picking 20 ppl :)
Alex Tong
Alex Tong@ATong_04·
@jlinbio genuinely so confused what you’re doing at this point….
Alex Tong
Alex Tong@ATong_04·
@jxmnop seems like it’s partially a byproduct of the current landscape… not 100% sure how we can align the incentives better
dr. jack morris
dr. jack morris@jxmnop·
The case for more ambition

i wrote about how AI researchers should ask bigger and simpler questions, and publish fewer papers:
Doug McCracken
Doug McCracken@DougMcCracken·
Thanks to everyone for coming out to our @a16zGames Tech X Games Mixer tonight I love visiting our NY friends and making new ones! 🗽#TechWeek