Ahmet Iscen (@ahmetius) - Twitter Profili | Zamantika Mersobahis Locabet

Ahmet Iscen retweetledi

Alireza Fathi@alirezafathi·5 Ağu

Our team at Google DeepMind Foundational Research has an opening for a full-time Research Scientist! Areas of Interest are Multimodal, 3D and Spatial Reasoning, Self-improving Agents. Looking for candidates with strong publications at top ML and CV conferences. Email: af_hiring@google.com

English

2

28

346

37.5K

Ahmet Iscen retweetledi

Alireza Fathi@alirezafathi·7 Oca

Our team at Google DeepMind Foundational Research is hiring full-time Research Scientists and Research Interns! Multimodal, Reasoning, self-improving agents, Video Understanding. Looking for candidates with strong papers at top ML and CV conferences. Email: af_hiring@google.com

English

13

64

618

61.4K

Ahmet Iscen@ahmetius·5 Ara

Want to work on the future of multimodal AI? Our Google DeepMind team in Grenoble, led by @CordeliaSchmid, is hiring interns for multimodal AI research (long-video understanding and visual reasoning in 2D and 3D). Email ai.gnb.hiring@gmail.com or find me at #NeurIPS2024!

English

5

16

182

16K

Ahmet Iscen@ahmetius·8 Kas

Work done with excellent collaborators @mcaron31 , @alirezafathi and @CordeliaSchmid

English

0

2

421

Ahmet Iscen@ahmetius·8 Kas

Our new #NeurIPS2024 paper tackles web-scale visual entity recognition by automatically curating a training dataset with a multimodal LLM, achieving SOTA results (+6.9% on OVEN)! Learn how we use multimodal LLMs for label verification and data enrichment: arxiv.org/abs/2410.23676

English

2

3

25

4K

Ahmet Iscen retweetledi

Alireza Fathi@alirezafathi·17 Eyl

Our team at Google DeepMind is seeking a Research Scientist with a strong publication record (multiple first-author papers) on multi-modal LLMs in top ML venues like NeurIPS, ICLR, CVPR. Email me at af_hiring@google.com @CordeliaSchmid

English

4

47

378

53.2K

Ahmet Iscen retweetledi

Dmytro Mishkin 🇺🇦@ducha_aiki·7 Ağu

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval @SumaPavel @g_kordo @ahmetius @giotolias tl;dr: global+local similarity via transformer->binarize descrs+distill. Crucial:train with random number of descriptors. arxiv.org/abs/2408.03282

English

1

11

25

2.8K

Ahmet Iscen retweetledi

Yisong Yue@yisongyue·1 Ağu

In case you missed our #ICML2024 oral presentation, check out SceneCraft, an LLM Agent for writing Blender-executable code that can render complex scenes with up to a hundred 3D assets. Paper: arxiv.org/abs/2403.01248 The SceneCraft agent is able to do complex spatial planning and arrangement, by maintaining a scene graph blueprint, and detailing spatial relationships among assets in the scene. SceneCraft leverages VLMs to iteratively refine a scene, and library learning to build a reusable spatial skill library. Taken together, SceneCraft is able to handle increasingly complex scenes and descriptions without external human expertise or LLM parameter tuning. This work was led by amazing @acbuller in collaboration with awesome colleagues at Google: @ahmetius, @aashi7jain, @tkipf, David Ross, @CordeliaSchmid, Alireza Fathi.

English

3

18

80

8.5K

Ahmet Iscen retweetledi

Arsha Nagrani@NagraniArsha·19 Haz

Ahmet @ahmetius, @CordeliaSchmid and I are looking to hire a student researcher at @GooglDeepMind this fall! Start: September Loc: Cam, USA but flexible Unfortunately I’m not at @CVPRConf this year (2 month old baby!!👶) but pls find Ahmet or Cordelia @CVPR if interested!

English

6

10

81

24.3K

Ahmet Iscen@ahmetius·17 Haz

Cordelia Schmid will now present "Multistage reasoning for video understanding and scene generation." in Summit 321 ! #CVPR2024

Ahmet Iscen@ahmetius

Xuhui Jui, the author of the Instruct-Imagen (CVPR24 oral paper), will present his work in 20 minutes! Come to Summit 321! #CVPR24

English

0

866

Ahmet Iscen@ahmetius·17 Haz

Xuhui Jui, the author of the Instruct-Imagen (CVPR24 oral paper), will present his work in 20 minutes! Come to Summit 321! #CVPR24

Ahmet Iscen@ahmetius

🔥 Calling all #CVPR2024 attendees! 🔥 Join us for the 1st Tool-Augmented VIsion (TAVI) Workshop on Monday morning in Summit 321! 💡 5 inspiring keynote talks 🎨 5 invited posters from the main conference Don't miss out! ➡️ More info: sites.google.com/corp/view/tavi…

English

0

4

2.4K

Ahmet Iscen@ahmetius·14 Haz

🔥 Calling all #CVPR2024 attendees! 🔥 Join us for the 1st Tool-Augmented VIsion (TAVI) Workshop on Monday morning in Summit 321! 💡 5 inspiring keynote talks 🎨 5 invited posters from the main conference Don't miss out! ➡️ More info: sites.google.com/corp/view/tavi…

English

1

7

21

12.4K

Ahmet Iscen@ahmetius·11 Mar

VLMs are great, but can we use their generative capabilities for web-scale entity recognition? GERALD leverages VLMs to generate unambiguous, language-based and discriminative codes for 6M-scale entity recognition. Looking forward to present GERALD at CVPR24!

Mathilde Caron@mcaron31

Happy to introduce GERALD - our new VLM that recognizes 6M+ entities, an exciting step towards Web-scale visual entity recognition! Predictions are simply made by auto-regressively decoding a code representing the entity name. Check out our CVPR24 paper: arxiv.org/abs/2403.02041

English

0

17

1.9K

Ahmet Iscen@ahmetius·7 Şub

We will be organizing the 1st Tool-Augmented VIsion (TAVI) Workshop at #CVPR2024. We are looking forward to having an exciting list of keynote speakers covering various topics about tool-use and retrieval augmented models. More details at: sites.google.com/view/tavi-cvpr…

#CVPR2026@CVPR

The list of #CVPR2024 workshops is now available! cvpr.thecvf.com/Conferences/20…

English

1

9

34

13.7K

Ahmet Iscen@ahmetius·17 Oca

Looking forward to present RECO at #ICLR2024 !

Ahmet Iscen@ahmetius

Happy to share our recent preprint! Models like CLIP tend to struggle on fine-grained tasks. We equip these models with the ability to retrieve and refine additional data at inference, which substantially improves the performance. With @mcaron31 , @alirezafathi , @CordeliaSchmid

English

0

1

13

1.3K

Ahmet Iscen retweetledi

Ziniu Hu@acbuller·12 Ara

Interested in LLM + Tool-Use, via Tree-Search? This afternoon in #NeurIPS2023, #215, I'll present "AVIS: Autonomous Visual Information Seeking with Large Language Model Agent" (blog.research.google/2023/08/autono…) Feel free to drop by and chat.

English

2

25

146

17.3K

Ahmet Iscen retweetledi

Sundar Pichai@sundarpichai·6 Ara

Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it’s the first step in our Gemini-era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks. With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU. blog.google/technology/ai/…

English

930

3.6K

22.5K

5M

Ahmet Iscen retweetledi

Alireza Fathi@alirezafathi·19 Ağu

Here is our Google AI blog post on AVIS, a Large Language Model Agent that achieves state-of-the-art results on visual information seeking tasks. @acbuller @ahmetius @jesu9 @CordeliaSchmid

Google AI@GoogleAI

Today on the blog, read all about AVIS — Autonomous Visual Information Seeking with Large Language Models — a novel method that iteratively employs a planner and reasoner to achieve state-of-the-art results on visual information seeking tasks → goo.gle/3P2y2mY

English

0

3

17

5.7K

Ahmet Iscen@ahmetius·16 Haz

How do we find information on the web? We try to address this question in AVIS, by coupling #LLM-based reasoner and planner with external tools, e.g. search. This results in a significant performance increase in challenging fine-grained VQA datasets, where SOTA VLMs struggle.

AK@_akhaliq

AVIS: Autonomous Visual Information Seeking with Large Language Models paper page: huggingface.co/papers/2306.08… In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as "What event is commemorated by the building depicted in this image?", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.

English

0

2

10

1.1K

Ahmet Iscen retweetledi

Alireza Fathi@alirezafathi·16 Haz

🚀Introducing AVIS: a groundbreaking system that couples #LLM powered planning & reasoning with external tools, resulting in #StateOfTheArt performance on VQA datasets that demand external knowledge! 🧠🔍

AK@_akhaliq

AVIS: Autonomous Visual Information Seeking with Large Language Models paper page: huggingface.co/papers/2306.08… In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as "What event is commemorated by the building depicted in this image?", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.

English

0

3

10

1.9K

Ahmet Iscen

Keşfet