Last week, we made Gemini Embedding 2, our first natively multimodal embedding model, available to the general public. Since then, developers have used it to build video analysis tools, visual shopping assistants, and more.
But you might be wondering... what is an embedding model? 🤔 Let’s break it down!
1. What is it?
Think of an embedding model as a "universal translator." It takes text, images, video, and audio and turns each one into a long list of numbers called a vector, like a unique digital fingerprint of its meaning.
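In code, getting that fingerprint looks roughly like this. A minimal sketch using the Gemini API's Python SDK; the model id "gemini-embedding-2" is an assumption for illustration, so check the docs for the exact identifier:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Hypothetical model id, used here for illustration only
result = client.models.embed_content(
    model="gemini-embedding-2",
    contents="a video of a soccer goal",
)

vector = result.embeddings[0].values  # the "digital fingerprint"
print(len(vector), vector[:5])        # thousands of numbers; first few shown
```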
2. How does it work?
Historically, search has been text-only. Now, instead of just matching data by keyword, Gemini Embedding 2 maps text, images, video, and audio into the same vector space based on meaning. It "feels" the connection between a video of a soccer goal and the words "game-winning shot" without needing any tags.
In that space, "ocean" and "waves" sit close together, while "ocean" and "toaster" are miles apart.
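"Close" has a precise meaning here: the angle between two vectors, usually measured as cosine similarity. A toy sketch with made-up 3-dimensional vectors (real embeddings have thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means pointing the same way (same meaning); near 0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up stand-ins for real embedding vectors
ocean   = np.array([0.8, 0.1, 0.1])
waves   = np.array([0.7, 0.2, 0.1])
toaster = np.array([0.1, 0.1, 0.9])

print(cosine_similarity(ocean, waves))    # ~0.99, close together
print(cosine_similarity(ocean, toaster))  # ~0.24, miles apart
```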
3. How can you use it?
Developers have been using it to build smarter search into their apps. That means tools where you can snap a photo of a product and type "find this in yellow," or search thousands of hours of video by describing what happens in a scene.
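Once everything shares one vector space, that "find this in yellow" flow is just a nearest-neighbor search. A sketch under two assumptions: catalog embeddings are precomputed and unit-length, and a hypothetical embed() helper wraps the multimodal embedding call shown earlier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in catalog: 1,000 products with precomputed unit-length embeddings
# (in production these would come from the embedding model)
catalog_vecs = rng.normal(size=(1000, 64))
catalog_vecs /= np.linalg.norm(catalog_vecs, axis=1, keepdims=True)

def find_best_matches(query_vec: np.ndarray, top_k: int = 3):
    # With unit-length vectors, cosine similarity is just a dot product
    scores = catalog_vecs @ query_vec
    best = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in best]

# In production: query_vec = embed([product_photo, "find this in yellow"]),
# where embed() is a hypothetical wrapper around the multimodal model.
# Here we fake a query that should match catalog item 42.
query_vec = catalog_vecs[42] + rng.normal(scale=0.05, size=64)
query_vec /= np.linalg.norm(query_vec)

print(find_best_matches(query_vec))  # item 42 should rank first
```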
4. Ready to try it out for yourself?
You can start using it today via the Gemini API or the Gemini Enterprise Agent Platform.