GenBench

193 posts

@GenBench

State-of-the-art generalisation testing in NLP. Tag us for a RT of your NLP generalisation paper tweet!

Joined April 2022
15 Following, 436 Followers
Pinned Tweet
GenBench@GenBench·
The GenBench workshop is back! Do you work on generalisation (benchmarking) in #NLProc? Submit to the 2nd edition (genbench.org/workshop/) co-located with #EMNLP2024. We have a regular track and a ✨collaborative benchmarking task (CBT)✨ that's fully LLM-focused this year (1/6)
Robin Jia@robinomial·
@GenBench @mrdrozdov @_dieuwke_ @najoungkim @kylelostat @sameer_ Interesting, my first thought is that overfitting is a subset of reward hacking 😅 overfitting is hacking the supervised learning “reward function” but the reward function could be different (and have more degenerate solutions)
GenBench retweeted
Najoung Kim 🫠@najoungkim·
so proud of @HayleyRossLing for getting a best paper award at @GenBench this year!! 🎉🪅🎉 I'm sure @TeaAnd_OrCoffee would be too :) check out our paper and share if you think homemade cats are cats!
Hayley Ross@HayleyRossLing

New paper with @najoungkim and @TeaAnd_OrCoffee testing if LLMs can draw adjective-noun inferences like humans! Turns out they often can, and even generalize to unseen combinations. But they're more optimistic about "artificial intelligence" than humans. arxiv.org/abs/2410.17482

GenBench@GenBench·
Congrats to all the authors!
GenBench@GenBench·
Best paper!
GenBench@GenBench·
And we also have an honourable mention!
GenBench@GenBench·
Come listen to the hot takes of our panelists in the Brickell room! Do we still need generalisation evaluation? 🧐 #GenBench2024 #EMNLP2024
GenBench@GenBench·
Still at the poster session? Come join us for keynote 3 by @sameer_!
GenBench@GenBench·
Did you miss the GenBench poster session? Don't worry, we've got you: here are (nearly all) posters! 😉 #GenBench2024 #EMNLP2024 Next up: keynote by Sameer Singh at 3!
GenBench@GenBench·
Last spotlight presentation: MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models aclanthology.org/2024.genbench-… Unfortunately the authors couldn't make it, so the work is kindly presented by their colleague Hengyi Wang 🙏
GenBench@GenBench·
Continuing with Bastian Bunzeck, presenting The SlayQA benchmark of social reasoning: testing gender-inclusive generalization with neopronouns aclanthology.org/2024.genbench-…
GenBench@GenBench·
@kylelostat He already had the whole room snickering by slide 3! 😁
GenBench@GenBench·
Join us for our second keynote by Olmo co-lead @kylelostat