Google releases Gemini Embedding 2, its first native multi-modal embedding model
On March 10, Google DeepMind launched Gemini Embedding 2, its first native multi-modal embedding model. It maps text, images, videos, audio, and documents into a single shared embedding space, marking a new stage of full-modality integration in AI embedding technology.
Gemini Embedding 2 supports semantic understanding in over 100 languages and outperforms existing mainstream models in benchmarks for text, image, and video tasks. It also introduces speech processing capabilities previously lacking in embedding models.
The model is now available for public preview through the Gemini API and Vertex AI, allowing developers to access it immediately.
For enterprise users, this release lowers the technical barriers to building multi-modal retrieval-augmented generation (RAG), semantic search, and data classification systems, potentially simplifying complex data pipelines that previously required separate processing across different modalities.
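To make that concrete, the sketch below shows the retrieval half of such a pipeline: a cosine-similarity semantic search over precomputed embedding vectors. The vectors here are random stand-ins rather than real model output, and the 3,072 dimensionality follows the default reported later in this article.

```python
import numpy as np

def cosine_search(query_vec: np.ndarray, corpus: np.ndarray, top_k: int = 3):
    """Return indices and scores of the top_k corpus rows most similar to query_vec."""
    # Normalize both sides so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    idx = np.argsort(-scores)[:top_k]
    return idx, scores[idx]

# Stand-in data: 5 "documents" embedded into 3,072-dim vectors
# (the model's default dimensionality, per this article).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 3072))
query = rng.normal(size=3072)

indices, scores = cosine_search(query, corpus)
print(indices, scores)
```

In a multi-modal setup, the same index can hold vectors derived from text, images, or audio, which is what collapses the separate per-modality pipelines mentioned above.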
Unified Multi-Modal: Expanding from Text to Five Media Types
Built on the Gemini architecture, Gemini Embedding 2 extends embedding capabilities from pure text to five input types: text, images, videos, audio, and documents.
Unlike traditional methods that handle each modality separately, this model supports interleaved inputs, meaning multiple modalities such as images and text can be submitted simultaneously in a single request. This enables the model to capture complex and subtle semantic relationships across different media types.
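The article does not specify the request format, but a sketch of what an interleaved request could look like via the google-genai Python SDK is shown below. The model identifier `gemini-embedding-2` and the acceptance of mixed image-and-text content by `embed_content` are assumptions based on the article's description, not confirmed API details.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Hypothetical: interleave an image and a caption in one embedding request.
# Model name and mixed-content support are assumptions from the article.
with open("product_photo.jpg", "rb") as f:
    image_bytes = f.read()

result = client.models.embed_content(
    model="gemini-embedding-2",  # assumed identifier
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        types.Part.from_text(text="Red trail-running shoe, size 42"),
    ],
)
print(len(result.embeddings[0].values))  # vector dimensionality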
Gemini Embedding 2 continues to use Google's previously developed Matryoshka Representation Learning (MRL). MRL trains the embedding so that leading prefixes of the full vector are themselves usable, lower-dimensional embeddings ("nesting"); the output dimension can therefore be flexibly truncated from the default 3,072, helping developers balance model performance against storage costs.
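Under MRL, downscaling amounts to truncation plus renormalization. A minimal sketch, assuming the common convention of keeping the leading coordinates and re-normalizing for cosine similarity:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Shrink an MRL-style embedding by keeping its leading `dim` coordinates.

    Matryoshka-trained vectors pack the most information into the prefix,
    so truncation preserves most of the semantic signal. Re-normalizing
    keeps cosine similarities well scaled.
    """
    prefix = vec[:dim]
    return prefix / np.linalg.norm(prefix)

full = np.random.default_rng(1).normal(size=3072)  # full-size embedding
for d in (3072, 1536, 768):                        # the tiers discussed below
    print(d, truncate_embedding(full, d).shape)
```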
Benchmark Performance and New Speech Capabilities
Google states that Gemini Embedding 2 outperforms current leading models in benchmarks for text, image, and video tasks, establishing a new performance standard in multi-modal embedding.
Google recommends developers choose among 3,072, 1,536, or 768 dimensions to optimize embedding quality based on application needs. This flexible design is especially important for enterprises deploying large-scale embedding vectors, enabling effective cost control without significantly sacrificing accuracy.
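To put the cost lever in numbers, here is a back-of-the-envelope index-size calculation. The float32 storage format and the 10-million-vector corpus are illustrative assumptions, not figures from the article:

```python
# Back-of-the-envelope index size for 10M float32 vectors (illustrative figures).
N_VECTORS = 10_000_000
BYTES_PER_FLOAT = 4

for dim in (3072, 1536, 768):
    gb = N_VECTORS * dim * BYTES_PER_FLOAT / 1e9
    print(f"{dim:>5} dims -> {gb:,.1f} GB")
# 3072 dims -> 122.9 GB, 1536 dims -> 61.4 GB, 768 dims -> 30.7 GB
```

Dropping from 3,072 to 768 dimensions cuts raw vector storage (and similarity-search compute) by a factor of four, which is why the dimension choice matters at enterprise scale.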
In terms of capabilities, the model introduces native speech embedding, a feature often missing in similar models. It can process audio data directly without relying on speech-to-text conversion.
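As with the interleaved example above, the exact request shape is an assumption; a hedged sketch of embedding raw audio directly, with no speech-to-text stage, might look like:

```python
from google import genai
from google.genai import types

client = genai.Client()

# Hypothetical: embed raw audio directly. The model name and audio support
# in embed_content are assumptions based on the article, not confirmed API details.
with open("meeting_clip.wav", "rb") as f:
    audio = f.read()

result = client.models.embed_content(
    model="gemini-embedding-2",  # assumed identifier
    contents=[types.Part.from_bytes(data=audio, mime_type="audio/wav")],
)
audio_vector = result.embeddings[0].values
```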
Google notes that embedding technology is already widely used across its products, including context engineering in RAG scenarios, large-scale data management, and traditional search and analytics.
Some early-access partners are already building multi-modal applications on Gemini Embedding 2, and Google states that these use cases are demonstrating the model's practical potential in high-value scenarios.
Risk Warning and Disclaimer
Market risks exist; invest cautiously. This article does not constitute personal investment advice and does not consider individual users’ specific investment goals, financial situations, or needs. Users should consider whether any opinions, views, or conclusions herein are suitable for their particular circumstances. Invest at your own risk.