Bill Byrne

46 posts

@unattributed

speech and language processing researcher, also machine learning, professor, emigrant, bicyclist

Cambridge UK · Joined May 2010
547 Following · 74 Followers
Desi R. Ivanova @desirivanova
Not sure about Cambridge, but Oxford courses are like ~10 years out of date. In stats, we are still teaching SVMs and kernels in the "advanced" ML classes. AFAICT, there aren't any serious DL classes in other departments (CS, math) either, of the likes of e.g. Stanford's offering
Ferenc Huszár @fhuszar

European academia 2010-2026

Ian Wu @ianwu97
1/ What’s the best way to supervise LLMs using LLM judges? We show that Minimum Bayes Risk (MBR) decoding is the way to go! With MBR decoding you can: 💡 Trade compute for performance at inference time 🧰 Generate data for self-training without needing external labels
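The MBR recipe in the tweet above can be sketched generically: sample several candidates, score each against all the others with a utility function, and keep the consensus candidate. This is a minimal sketch, not the paper's implementation — here a toy Jaccard word-overlap utility stands in for the LLM judge.

```python
def token_overlap(a, b):
    """Toy utility: Jaccard overlap of word sets (stand-in for an LLM judge)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def mbr_select(candidates, utility=token_overlap):
    """Return the candidate with the highest average utility vs. the rest."""
    best, best_score = None, float("-inf")
    for i, y in enumerate(candidates):
        others = candidates[:i] + candidates[i + 1:]
        score = sum(utility(y, yp) for yp in others) / len(others)
        if score > best_score:
            best, best_score = y, score
    return best

samples = ["the cat sat", "the cat sat down", "a dog ran"]
print(mbr_select(samples))  # "the cat sat" — most agreement with the others
```

Swapping `token_overlap` for a judge-based utility gives the compute-for-performance trade-off in the thread: more samples, better consensus.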
Bill Byrne @unattributed
Time to dust off all the old Boltzmann Machine papers...
Kyunghyun Cho @kchonyc
it's taking surprisingly long for an LLM bro to announce the discovery of task oriented dialogue systems
Bill Byrne @unattributed
@natolambert Very nice work. Following your notes in the discussion, it'll be really interesting to see if the `reference-free` reward for DPO also works. Having the reward depend on the regularisation mechanism used in training doesn't seem quite right.
Nathan Lambert @natolambert
Excited to share something that we've needed since the early open RLHF days: RewardBench, the first benchmark for reward models.

1. We evaluated 30+ of the currently available RMs (w/ DPO too).
2. We created new datasets covering chat, safety, code, math, etc.

We learned a lot. We hope this is a major step in understanding why reward models work, rather than just how they do for RLHF. In short, we created pairs of responses, one good and one bad (with manual review), and see where reward models agree! It's a simple and powerful process.

Key takeaways:
* Running reward models is hard; we built infra to make this easier.
* We're already using this to learn more about PPO RLHF training (more on this soon).
* Reward models mirror the refusals behavior we're confused about in RLHF. Some refuse everything (including llama 2 style stuff), some refuse nothing, and few models handle both cases well.
* Datasets like Anthropic HH / Learning to Summarize only take us so far (and don't work for DPO).
* Scaling matters (big models win again).

Here's the current leaderboard. I'm very excited about future work: figuring out what values are reflected, generative RMs, better RMs for training, more on DPO, and everything in between.

Links!
Leaderboard: huggingface.co/spaces/allenai…
Code: github.com/allenai/reward…
Paper (arxiv soon): github.com/allenai/reward…
Eval dataset: huggingface.co/datasets/allen…
YouTube walkthrough: youtu.be/CAaHAfCqrBA
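The pairwise evaluation described in the thread above — one good response, one bad, check whether the reward model scores the good one higher — reduces to a few lines. A minimal sketch, not RewardBench's actual code; `toy_rm` (response length) is a deliberately crude stand-in for a learned reward model, and the pairs are made up.

```python
def pairwise_accuracy(reward_fn, pairs):
    """Fraction of (chosen, rejected) pairs where the RM ranks chosen higher."""
    correct = sum(reward_fn(good) > reward_fn(bad) for good, bad in pairs)
    return correct / len(pairs)

# Toy stand-in RM: longer responses score higher (real RMs are learned models).
toy_rm = len
pairs = [
    ("A detailed, helpful answer.", "idk"),
    ("Step-by-step explanation with sources.", "no"),
    ("ok", "A long but rejected rambling reply."),
]
print(pairwise_accuracy(toy_rm, pairs))  # 2/3: length fails on the third pair
```

The third pair is where a shallow heuristic disagrees with the human label — exactly the kind of case the benchmark is built to surface.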
Bill Byrne retweeted
SIGdial @sigdial
@JChiyahGarcia @ale_suglia @arash_eshghi @hfhastie And well done @alexcoca23, Bo-Hsiang Tseng, Jinghong Chen, Weizhe Lin, Weixuan Zhang, Tisha Anders, Bill Byrne for winning best long paper!! (Let me know if we have failed to find your handle and you want to be tagged)
Bill Byrne retweeted
GITT @GITT2024
When you register for @EAMT_2023 , don't forget to sign up for our workshop (it's free!) so you can hear our 2nd 🔑🎵 Danielle Saunders talk about the challenges and needs in #Gender #inclusive #MachineTranslation, such as applying methods at scale (data size & language variety)
EAMT2024 @EAMT_2024

Last day to register for #EAMT2023 for the 💶regular fee💶 is today! And if you can't make it to Tampere in person, choose the hybrid light option with live stream of the main conference. More details here: events.tuni.fi/eamt23/registr…

Benjamin Marie @bnjmn_marie
Since MBR decoding is making a comeback, I gave it another try for MT system combination (so not during decoding itself as in recent work) using COMET. medium.com/mlearning-ai/m…
Bill Byrne @unattributed
@bnjmn_marie Nice writeup! It's great to see early work by Vaibhava Goel cited, but that 2000 paper was about A* search over lattices. Andreas Stolcke and colleagues had a '97 paper (MBR over n-best lists), which we cite, and which is probably the earliest citation in ASR...
Bill Byrne @unattributed
@zngu Agree entirely about UK visa process. Do you have details on the Huawei scheme? How does it avoid the 'working away' problem?
Margaret Li @ Neurips ‘25
Train an LM made of independent expert LMs (no syncs! no shared params!) ➡️ ➕ new or ➖ existing experts. At. Any. Time. ➡️ Ensemble OR parameter average(!!) to outperform dense & sparse LMs & ensemble baselines with less compute, a fraction of the simultaneous GPU usage. 🌳/n
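The parameter-averaging half of the claim above can be sketched in miniature, assuming each expert exposes its weights as a name-to-list mapping (the parameter names and layout here are illustrative, not the paper's): merging independently trained experts is just a per-parameter arithmetic mean.

```python
def average_experts(expert_state_dicts):
    """Merge independently trained experts by per-parameter arithmetic mean."""
    n = len(expert_state_dicts)
    merged = {}
    for name in expert_state_dicts[0]:
        # zip the same parameter across all experts and average element-wise
        cols = zip(*(sd[name] for sd in expert_state_dicts))
        merged[name] = [sum(vals) / n for vals in cols]
    return merged

e1 = {"w": [1.0, 2.0], "b": [0.0]}
e2 = {"w": [3.0, 4.0], "b": [2.0]}
print(average_experts([e1, e2]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```

Because the merge is elementwise and order-free, experts can be added or dropped at any time by recomputing the mean — the "no syncs, no shared params" property in the tweet.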
Bill Byrne retweeted
Giving Voice to Digital Democracies
Tomorrow we will be hosting our free event on "Understanding and Automating #Counterspeech". We are very excited to discuss this important topic with experts on philosophy of language, law, computer science, media & political communication, counter terrorism and online activism.
Arvind Narayanan @random_walker
When a machine learning system uses argmax to select outputs from a probability distribution — and most of them do — it's a clue that it might be biased. That's because argmax selects the "most probable" output, which may amplify tiny data biases into perfectly biased outputs.
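A toy illustration of the amplification effect described above, with made-up 51/49 numbers: argmax turns a near-even distribution into a single deterministic output, while sampling roughly preserves the underlying ratio.

```python
import random

# Hypothetical output distribution with a tiny 51/49 skew from the data.
probs = {"he": 0.51, "she": 0.49}

# argmax: the slight skew becomes a 100% biased output, every single time.
argmax_picks = [max(probs, key=probs.get) for _ in range(1000)]
print(argmax_picks.count("he"))  # 1000 — the 2% skew is fully amplified

# sampling: outputs roughly track the original 51/49 distribution.
random.seed(0)
sampled = random.choices(list(probs), weights=probs.values(), k=1000)
print(sampled.count("he"))  # roughly 510
```

The point is not that sampling is always preferable, but that argmax discards the margin: a 0.51 vs 0.49 split and a 0.99 vs 0.01 split produce identical outputs.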