Cor-Paul Bezemer

62 posts

@corpaul

Associate professor in Software Engineering @ University of Alberta ECE

Edmonton, Alberta · Joined September 2008
393 Following · 500 Followers
Cor-Paul Bezemer retweeted
taesiri (@taesiri)
Happy to announce that GlitchBench has been accepted to #CVPR2024🎉 twitter.com/_akhaliq/statu…
AK (@_akhaliq)

GlitchBench: Can large multimodal models detect video game glitches? paper page: huggingface.co/papers/2312.05… Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models.

Cor-Paul Bezemer retweeted
AK (@_akhaliq)
GlitchBench: Can large multimodal models detect video game glitches? paper page: huggingface.co/papers/2312.05… Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models.
Cor-Paul Bezemer retweeted
taesiri (@taesiri)
Excited to share GlitchBench! 🚀 It is a new benchmark designed specifically for large multimodal models. GlitchBench sets a new standard by incorporating tasks from actual game quality assurance scenarios 🎮, bringing real-world challenges into focus. #AI #MachineLearning #GameDev ArXiv: arxiv.org/abs/2312.05291 Project Website: glitchbench.github.io Hugging Face 🤗 Dataset: huggingface.co/datasets/glitc… Leaderboard 🏆: huggingface.co/spaces/glitchb…
English
3
5
11
1.4K
Cor-Paul Bezemer retweeted
Anh Totti Nguyen (@anh_ng8)
How to score > 90% on ImageNet? Our new study on the spatial biases of ImageNet and relevant ImageNet-scale, OOD benchmarks reveals that all common image classifiers tested can score > 90%, if the model looks at the correct crop, i.e., ⭐️ Zoom 🔎 is all you need! ⭐️ 1/n
Cor-Paul Bezemer retweeted
Philipp Leitner (@xLeitix@discuss.systems)
Ad for our teaching professor position is now out: web103.reachmee.com/ext/I005/1035/… 100%, permanent from start, min. 25% of worktime reserved for research (can be increased with grants). Hit me up if you want to know more, and RTs appreciated ;)
Philipp Leitner (@xLeitix@discuss.systems) (@xLeitix)

My division at @cse_gbg is hiring 1-2 Assistant / Associate Teaching Professors in Software Engineering and/or Interaction Design. Positions will be 100% & permanent from start.

Cor-Paul Bezemer retweeted
Philipp Leitner (@xLeitix@discuss.systems)
Give yourself an early Easter present and join us at ICPE! It’s free of charge and we have great speakers (keynote and otherwise :) )!
SPEC (@spec_perf)

Very excited to see the #ICPE2022 keynote from @Google's John Wilkes on "Building Warehouse-Scale Computers," taking place at the 4/9-4/13 virtual conference. Learn more about this year's keynotes at: ow.ly/U0Nc50IrGb0 Don't forget to register - it's free!

Cor-Paul Bezemer retweeted
Simon Eismann (@simon_eismann)
Ever wondered how the performance of #Serverless applications changes WITHOUT code changes? Over 10 months, we observed significant changes on #AWS in our @JSSoftware paper "A case study on the stability of performance tests for serverless applications" bit.ly/38kG5ZH
Cor-Paul Bezemer retweeted
Karim Ali (كريم علي) (@karimhamdanali)
Game developers! We are running an anonymous research survey on the current practices, goals, and needs for quality assurance in #gamedev and would love your input. Over $4,000 in random draw prizes. Please spread the word! surveymonkey.com/r/gamedevtesti…
Cor-Paul Bezemer retweeted
The ASGAARD Lab (@asgaard_lab)
Also, (Twitter-anonymous) Mikael's paper "Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress" was accepted at ICPE 2022! Preprint available @ asgaard.ece.ualberta.ca/studying-the-p…
Cor-Paul Bezemer retweeted
Diego Elias Costa (@DiegoEliasCosta)
Great to see @gvwilson summary of our work in bad practices of Java benchmarking! Work done in collaboration with @xLeitix, @corpaul, and Artur Andrzejak. :)
Cor-Paul Bezemer retweeted
The ASGAARD Lab (@asgaard_lab)
Mikael and Chloe's systematic literature survey on Applications of Generative Adversarial Networks in Anomaly Detection is available now on arXiv: arxiv.org/abs/2110.12076