Gemini Models

This document provides an overview of the Gemini models available through the Gemini API.

Model Variants

The Gemini API offers a range of models optimized for various use cases. Here’s a summary of the available Gemini variants:

Model variant	Input(s)	Output	Optimized for
Gemini 2.5 Pro	Audio, images, videos, text, and PDF	Text	Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash	Audio, images, videos, and text	Text	Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite	Audio, images, videos, and text	Text	Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Live	Audio, video, and text	Text, audio	Low-latency bidirectional voice and video interactions
Gemini 2.5 Flash Native Audio	Audio, videos, and text	Text and audio, interleaved	High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS	Text	Audio	Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS	Text	Audio	Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash	Audio, images, videos, and text	Text	Next-generation features, speed, and real-time streaming
Gemini 2.0 Flash Preview Image Generation	Audio, images, videos, and text	Text, images	Conversational image generation and editing
Gemini 2.0 Flash-Lite	Audio, images, videos, and text	Text	Cost efficiency and low latency
Gemini 2.0 Flash Live	Audio, video, and text	Text, audio	Low-latency bidirectional voice and video interactions
Gemini 1.5 Flash	Audio, images, videos, and text	Text	Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B	Audio, images, videos, and text	Text	High-volume, lower-intelligence tasks
Gemini 1.5 Pro	Audio, images, videos, and text	Text	Complex reasoning tasks requiring more intelligence

You can find the rate limits for each model on the rate limits page.
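The model variants table can also be expressed as data. The following minimal Python sketch transcribes each row's input and output modalities and filters variants by the modalities a task needs. It is not part of any Gemini SDK. `ModelVariant`, `VARIANTS`, and `find_variants` are hypothetical names chosen for illustration, and the display names are not API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    """One row of the model variants table (hypothetical helper type)."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]


def _row(name: str, inputs: str, outputs: str) -> ModelVariant:
    # Modalities are written as space-separated lowercase words for brevity.
    return ModelVariant(name, frozenset(inputs.split()), frozenset(outputs.split()))


# Input sets transcribed by hand from the model variants table.
MULTIMODAL = "audio image video text"
LIVE = "audio video text"

VARIANTS = [
    _row("Gemini 2.5 Pro", MULTIMODAL + " pdf", "text"),
    _row("Gemini 2.5 Flash", MULTIMODAL, "text"),
    _row("Gemini 2.5 Flash-Lite", MULTIMODAL, "text"),
    _row("Gemini 2.5 Flash Live", LIVE, "text audio"),
    _row("Gemini 2.5 Flash Native Audio", LIVE, "text audio"),
    _row("Gemini 2.5 Flash Preview TTS", "text", "audio"),
    _row("Gemini 2.5 Pro Preview TTS", "text", "audio"),
    _row("Gemini 2.0 Flash", MULTIMODAL, "text"),
    _row("Gemini 2.0 Flash Preview Image Generation", MULTIMODAL, "text image"),
    _row("Gemini 2.0 Flash-Lite", MULTIMODAL, "text"),
    _row("Gemini 2.0 Flash Live", LIVE, "text audio"),
    _row("Gemini 1.5 Flash", MULTIMODAL, "text"),
    _row("Gemini 1.5 Flash-8B", MULTIMODAL, "text"),
    _row("Gemini 1.5 Pro", MULTIMODAL, "text"),
]


def find_variants(needed_inputs: set[str], needed_outputs: set[str]) -> list[str]:
    """Return variants that accept every needed input and produce every needed output."""
    return [
        v.name
        for v in VARIANTS
        if needed_inputs <= v.inputs and needed_outputs <= v.outputs
    ]


if __name__ == "__main__":
    # Which variants accept text and can return audio?
    print(find_variants({"text"}, {"audio"}))
```

Because the data is copied by hand, it will drift as models are added or retired. Treat it as a snapshot of this table, not a source of truth.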

Gemini 2.5 Pro

Gemini 2.5 Pro is a state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.

Gemini 2.5 Flash

Gemini 2.5 Flash offers strong price-performance and well-rounded capabilities. It is best suited to large-scale processing, low-latency and high-volume tasks that require thinking, and agentic use cases.

Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite is a model optimized for cost-efficiency and high throughput.

Gemini 2.5 Flash Live

The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini.

Gemini 2.5 Flash Native Audio

The Gemini 2.5 Flash native audio dialog models, available with and without thinking, are accessed through the Live API. They provide interactive, unstructured conversational experiences and support prompting for style and control.

Gemini 2.5 Flash Preview Text-to-Speech

Gemini 2.5 Flash Preview TTS is a price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more.

Gemini 2.5 Pro Preview Text-to-Speech

Gemini 2.5 Pro Preview TTS is a powerful text-to-speech model, delivering high control and transparency for structured workflows.

Gemini 2.0 Flash

Gemini 2.0 Flash delivers next-generation features and improved capabilities, including superior speed, native tool use, and a 1M token context window.

Gemini 2.0 Flash Preview Image Generation

This model delivers improved image generation features, including generating and editing images conversationally.

Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is a Gemini 2.0 Flash model optimized for cost efficiency and low latency.

Gemini 2.0 Flash Live

The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini.

Gemini 1.5 Flash

Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.

Gemini 1.5 Flash-8B

Gemini 1.5 Flash-8B is a small model designed for lower-intelligence tasks.

Gemini 1.5 Pro

Gemini 1.5 Pro is a mid-size multimodal model optimized for a wide range of reasoning tasks. It can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text.

Model Version Name Patterns

Gemini models are available in stable, preview, or experimental versions. You can use the following model name formats to specify which model and version you want to use:

Latest stable: Points to the most recent stable version released for the specified model generation and variation.
Stable: Points to a specific stable model. Stable models usually don’t change.
Preview: Points to a preview model, which may not be suitable for production use and comes with more restrictive rate limits, but may have billing enabled.
Experimental: Points to an experimental model, which may not be suitable for production use and comes with more restrictive rate limits.
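This section does not spell out the name formats themselves. As a rough illustration, the Python sketch below sorts a model name into one of the four categories. It assumes that experimental names contain an `exp` segment, preview names contain a `preview` segment, pinned stable versions end in a three-digit suffix such as `-001`, and anything else is a latest-stable alias. These conventions, the `classify_model_name` function, and the example names are all assumptions made for illustration, not taken from this document. Check them against the model names the API actually lists.

```python
import re
from enum import Enum


class Channel(Enum):
    LATEST_STABLE = "latest stable"
    STABLE = "stable"
    PREVIEW = "preview"
    EXPERIMENTAL = "experimental"


# ASSUMPTION: a pinned stable version ends in a three-digit suffix, e.g. "-001".
_PINNED_VERSION = re.compile(r"-\d{3}$")


def classify_model_name(name: str) -> Channel:
    """Guess a model name's release channel under the assumed naming conventions."""
    segments = name.lower().split("-")
    # Check the experimental and preview markers before the version suffix,
    # so a marked name is never reported as stable.
    if "exp" in segments:
        return Channel.EXPERIMENTAL
    if "preview" in segments:
        return Channel.PREVIEW
    if _PINNED_VERSION.search(name):
        return Channel.STABLE
    return Channel.LATEST_STABLE


if __name__ == "__main__":
    # Illustrative names only (assumed format); availability is not implied.
    for name in [
        "gemini-2.0-flash",
        "gemini-2.0-flash-001",
        "gemini-2.5-pro-preview-06-05",
        "gemini-2.0-pro-exp-02-05",
    ]:
        print(f"{name}: {classify_model_name(name).value}")
```

Matching whole hyphen-separated segments rather than substrings avoids misreading a name that merely contains the letters `exp` inside a longer word.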

Supported Languages

Gemini models are trained to work with a wide variety of languages, including Arabic, Chinese, English, French, German, Japanese, Russian, and Spanish.