Raphael Tang

128 posts

@ralph_tang

AI researcher working on foundation models for language, vision, speech, and everything in between @ucl_nlp

The Internet · Joined April 2012
92 Following · 337 Followers
Raphael Tang@ralph_tang·
Ironic that this derivative came from Schmidhuber's lab
Yuntian Deng@yuntiandeng

Glad to see followups to neural-os.com, but disappointed that neither the blog (with 34 refs) nor the code repo acknowledged NeuralOS, even though the released data and code appear to build directly on top of ours. That omission is hard to understand given our shared vision.

Raphael Tang retweeted
Simone Foti@simo_foti·
It's time to bring 3D meshes into modern machine learning properly! 🛸 Our work solves the non-differentiability of the Exp map on meshes, enabling gradients to flow directly through geodesics. It’s differentiable, GPU-fast, and fully parallelised. circle-group.github.io/research/DSG
Raphael Tang retweeted
return of the research era ꙮ@byebyescaling·
This seems like a pretty big deal to me. OpenAI's circuit-sparsity release potentially entails that MoEs are a dead end. We've been isolating weights into "experts" as a crude approximation of sparsity just to appease dense matrix kernels. It fragments the manifold. The real target is inherent sparsity: projecting into massive nominal dimensions (d ≫ d_model) with strict k-sparse activation. This forces features to be monosemantic and orthogonal by design, solving superposition natively rather than relying on router hacks to disentangle interference. It appears that we aren't just scaling parameters anymore, we're scaling the basis!
AK@_akhaliq

OpenAI just released circuit-sparsity huggingface.co/openai/circuit…

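The "strict k-sparse activation" the tweet describes can be illustrated as a top-k projection: map into a very wide space, keep only the k largest activations per example, and zero the rest. A minimal NumPy sketch of that idea, illustrative only and not OpenAI's actual circuit-sparsity implementation:

```python
import numpy as np

def k_sparse_activation(x, W, k):
    """Project x into a wide space via W, then keep only the k
    largest activations in each row, zeroing everything else."""
    h = x @ W                                   # (batch, d_wide), d_wide >> d_model
    topk = np.argpartition(h, -k, axis=-1)[:, -k:]  # indices of the k largest per row
    mask = np.zeros_like(h, dtype=bool)
    np.put_along_axis(mask, topk, True, axis=-1)
    return np.where(mask, h, 0.0)               # at most k nonzeros per row

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))     # d_model = 16
W = rng.normal(size=(16, 256))   # d_wide = 256, the "massive nominal dimension"
out = k_sparse_activation(x, W, k=8)
```

With activations forced to at most k nonzero coordinates, features compete for a small budget of wide-basis directions, which is the sparsity pressure the tweet contrasts with MoE routing.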
Raphael Tang retweeted
Yao Lu@yaolu_nlp·
What if you could build a competitive multilingual LLM with just a translator, no matter the level of data resource available? 🤔TL;DR: We translate FineWeb to TransWeb, showing multilingual modelling can be addressed by the simple "translate everything" idea. w/@togethercompute
Raphael Tang retweeted
Wenyan Li@Wenyan62·
I will be presenting our Lost in Embeddings poster at EMNLP! Hope to see many old and new friends in Suzhou!🤗🤗 📍Time/Date: Fri. Nov 7, 12:30-13:30 Location: Hall C Also happy to chat about anything on VLMs, RAG, and fintech, a domain I've recently gotten into. #EMNLP2025
Wenyan Li@Wenyan62

Happy to share (with a bit of delay tho) our paper on quantifying visual information loss in VLMs --- "Lost in Embeddings: Information Loss in Vision-Language Models" is accepted to EMNLP 2025 findings: arxiv.org/pdf/2509.11986 💃code is also released: github.com/lyan62/vlm-inf…

Raphael Tang retweeted
Avi Chawla@_avichawla·
Finally, Python 3.14 lets you disable the GIL! It's a big deal because earlier, even if you wrote multi-threaded code, only one thread could execute Python bytecode at a time, so CPU-bound threads gave no performance benefit. But now, Python can run your multi-threaded code in parallel. And uv fully supports it!
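A quick way to see what changes is a CPU-bound threading microbenchmark: on a standard (GIL) build, four threads take roughly as long as four sequential runs, while on a free-threaded build (e.g. a `python3.14t` install) they genuinely overlap. A minimal sketch that runs on any recent Python:

```python
import threading
import time

def busy(n):
    # Pure-Python CPU-bound work; serialized by the GIL on standard builds
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(num_threads, n=2_000_000):
    """Wall-clock time to run `busy` in num_threads concurrent threads."""
    threads = [threading.Thread(target=busy, args=(n,)) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print(f"1 thread: {timed(1):.2f}s   4 threads: {timed(4):.2f}s")
```

On a GIL build the two timings scale roughly linearly with thread count; on a free-threaded build the four-thread run should approach the single-thread time, assuming enough cores.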
Raphael Tang@ralph_tang·
📢Our new paper critically examines arena-style LLM evaluation, e.g., LMArena, questioning whether draws actually mean equal model ability. TL;DR: simply ignoring draws improves rating systems by 1-3%, and query difficulty/subjectivity relate more strongly to draws than model ratings do.
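The "simply ignoring draws" intervention can be illustrated with a plain Elo updater that skips drawn battles instead of scoring them as 0.5. This is a sketch of the idea only, not the paper's exact rating system:

```python
def elo_update(r_a, r_b, outcome, k=32):
    """One Elo update. outcome: 1.0 if A wins, 0.0 if B wins, None for a draw."""
    if outcome is None:
        return r_a, r_b  # draw: skip the update entirely
    e_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))       # expected score for A
    return (r_a + k * (outcome - e_a),
            r_b + k * ((1 - outcome) - (1 - e_a)))

ra, rb = elo_update(1500.0, 1500.0, 1.0)   # A beats B → (1516.0, 1484.0)
ra, rb = elo_update(ra, rb, None)          # draw → ratings unchanged
```

The contrast is with conventional Elo, which would treat a draw as `outcome = 0.5` and pull both ratings toward each other even when the draw reflects query difficulty rather than equal ability.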
Gregory Wieber@dreamwieber·
Alright, friends – my NEW Apple Vision Pro app Metaballs: Spatial has launched! 🤯 This was SO much work, and would mean the world to me if you share, and go download it today!
DailyPapers@HuggingPapers·
Microsoft research reveals information loss in Vision-Language Models VLMs lose 40-60% semantic context after the projection step, distorting visual representations & impacting downstream tasks. See k-NNs diverge from fruit to mushrooms!
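The k-NN divergence mentioned here can be quantified by comparing each embedding's nearest neighbors before and after the projection step: low overlap means the projection distorted the local geometry (e.g. a fruit's neighbors becoming mushrooms). A toy sketch where a random linear map stands in for the VLM projector; this is not the paper's code:

```python
import numpy as np

def knn_indices(X, k):
    """Indices of each row's k nearest neighbors (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def knn_overlap(X_before, X_after, k=5):
    """Mean fraction of k-nearest neighbors preserved per point."""
    a, b = knn_indices(X_before, k), knn_indices(X_after, k)
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)]))

rng = np.random.default_rng(0)
vis = rng.normal(size=(100, 64))        # "visual" embeddings
proj = vis @ rng.normal(size=(64, 64))  # stand-in for the projection step
print(f"k-NN overlap after projection: {knn_overlap(vis, proj):.2f}")
```

An overlap of 1.0 would mean the projection preserved every point's neighborhood; values well below that are the kind of local-geometry distortion the tweet describes.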
Raphael Tang retweeted
Wenyan Li@Wenyan62·
Happy to share (with a bit of delay tho) our paper on quantifying visual information loss in VLMs --- "Lost in Embeddings: Information Loss in Vision-Language Models" is accepted to EMNLP 2025 findings: arxiv.org/pdf/2509.11986 💃code is also released: github.com/lyan62/vlm-inf…
Raphael Tang@ralph_tang·
📈WhisTLE + TTS adaptation ⇒ 50%+ relative reduction in word error rate vs. no adaptation, winning in 27/32 scenarios. Outperforms TTS adaptation alone by a relative 12.3% in WER.
Raphael Tang@ralph_tang·
We introduce WhisTLE (arxiv.org/abs/2509.10452): the first deeply supervised, text-only domain adaptation method for pretrained ASR models like Whisper. TL;DR: fast, no extra runtime cost, 50%+ relative word error rate reduction.
Raphael Tang retweeted
David McAllister@davidrmcall·
Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.
Raphael Tang retweeted
Shashwat Goel@ShashwatGoel7·
There's been a hole at the heart of #LLM evals, and we can now fix it. 📜New paper: Answer Matching Outperforms Multiple Choice for Language Model Evaluations.

❗️We found MCQs can be solved without even knowing the question. Looking at just the choices helps guess the answer and get high accuracies. This affects popular benchmarks like MMLU-Pro, SuperGPQA, etc., and even "multimodal" benchmarks like MMMU-Pro, which can be solved without even looking at the image ⁉️. Such choice-only shortcuts are hard to fix: prior attempts at fixing them, GoldenSwag (for HellaSwag) and TruthfulQA v2, ended up worsening the problem.

MCQs are inherently a discriminative task, only requiring picking the correct choice among a few given options. Instead, we should evaluate language models for the generative capabilities they are used for. We show discrimination is easier than even verification, let alone generation.

🤔 But how do we grade generative responses outside "verifiable domains" like code and math? So many paraphrases are valid answers... We show a scalable alternative, Answer Matching, works surprisingly well. It's simple: get generative responses to existing benchmark questions that are specific enough to have a semantically unique answer, without showing choices. Then use an LM to match the response against the ground-truth answer.

👨‍🔬We conduct a meta-evaluation by comparing to ground-truth verification on MATH, and human grading on MMLU-Pro and GPQA-Diamond questions. Answer Matching outcomes give near-perfect alignment, even with small (recent) models like Qwen3-4B. In contrast, LLM-as-a-judge, even with frontier reasoning models like o4-mini, fares much worse. This is because without the reference answer, the model is tasked with verification, which is harder than what answer matching requires: paraphrase detection, a skill modern language models have aced💡

Let's shift the benchmarking ecosystem from MCQs to Answer Matching.

Impacts:
Leaderboards: model rankings can change and accuracies go down, making benchmarks seem less saturated.
Benchmark creation: instead of creating harder MCQs, we should focus our efforts on creating questions suited for answer matching, much like SimpleQA, GAIA, etc.
🤑 Cost: to our great surprise, answer matching evals are cheaper to run than MCQs!

See our paper for more, it's packed with insights. 🧵 has paper and more result figures.
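The answer-matching step described above can be sketched as: show a grader LM the question, the reference answer, and the model's free-form response, and ask for a binary match verdict. A minimal illustrative harness, where the `match_prompt` wording and the string-normalization fallback are my own inventions, not the paper's:

```python
import re

def match_prompt(question, gold, response):
    """Prompt for a grader LM: does `response` convey the same answer as `gold`?"""
    return (
        f"Question: {question}\n"
        f"Reference answer: {gold}\n"
        f"Model response: {response}\n"
        "Does the model response express the same answer as the reference? "
        "Reply with exactly MATCH or NO_MATCH."
    )

def normalized_match(gold, response):
    """Cheap fallback: check the reference answer appears in the response
    after lowercasing and stripping punctuation."""
    norm = lambda s: re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return norm(gold) in norm(response)

# normalized_match("Paris", "The capital is Paris.") handles trivial paraphrase;
# the LM grader handles genuine rewordings the string check would miss.
```

The key asymmetry the tweet leans on: the grader only needs paraphrase detection against a known reference, not open-ended verification of whether an arbitrary answer is correct.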
Raphael Tang retweeted
Xueguang Ma@xueguang_ma·
Sharing our recent efforts on applying OmniEmbed to large-scale video retrieval MultiVENT2.0! tl;dr, we achieve SoTA on the MAGMAR shared task leaderboard. More importantly, we provide in-depth analysis on the effectiveness of different input modalities for video retrieval.
Raphael Tang retweeted
WIRED@WIRED·
On Yupp, chatbot users earn cash by saying which of two prompts they prefer—info that has great value to the AI companies running the models wired.com/story/yupp-cha…
Raphael Tang retweeted
Jimmy Lin@lintool·
💥 My awesome @UWaterloo ugrad student @sisi_xili - with the help of @rpradeep42 - slapped an MCP server in front of Pyserini to create MCPyserini and connected it to Claude to create DeepResearcherini! 🤪 Here, an example of RAG using the MS MARCO v1 passage collection.