Gemini Models

This document provides an overview of the Gemini models available through the Gemini API.

Model Variants

The Gemini API offers a range of models optimized for various use cases. Here’s a summary of the available Gemini variants:

Model variant Input(s) Output Optimized for
Gemini 2.5 Pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Text, image, video, audio Text Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Live Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions
Gemini 2.5 Flash Native Audio Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming.
Gemini 2.0 Flash Preview Image Generation Audio, images, videos, and text Text, images Conversational image generation and editing
Gemini 2.0 Flash-Lite Audio, images, videos, and text Text Cost efficiency and low latency
Gemini 2.0 Flash Live Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions
Gemini 1.5 Flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B Audio, images, videos, and text Text High volume and lower intelligence tasks
Gemini 1.5 Pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence

You can find the rate limits for each model on the rate limits page.

Gemini 2.5 Pro

Gemini 2.5 Pro is a state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.

Gemini 2.5 Flash

Gemini 2.5 Flash is a price-performance model, offering well-rounded capabilities. It is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases.

Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite is a model optimized for cost-efficiency and high throughput.

Gemini 2.5 Flash Live

The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini.

Gemini 2.5 Flash Native Audio

These native audio dialog models, with and without thinking, are available through the Live API. They provide interactive and unstructured conversational experiences, with style and control prompting.

Gemini 2.5 Flash Preview Text-to-Speech

Gemini 2.5 Flash Preview TTS is a price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more.

Gemini 2.5 Pro Preview Text-to-Speech

Gemini 2.5 Pro Preview TTS is a powerful text-to-speech model, delivering high control and transparency for structured workflows.

Gemini 2.0 Flash

Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window.

Gemini 2.0 Flash Preview Image Generation

This model delivers improved image generation features, including generating and editing images conversationally.

Gemini 2.0 Flash-Lite

A Gemini 2.0 Flash model optimized for cost efficiency and low latency.

Gemini 2.0 Flash Live

The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini.

Gemini 1.5 Flash

Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.

Gemini 1.5 Flash-8B

Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks.

Gemini 1.5 Pro

Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. It can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text.

Model Version Name Patterns

Gemini models are available in stable, preview, or experimental versions. You can use the following model name formats to specify which model and version you want to use:

  • Latest stable: Points to the most recent stable version released for the specified model generation and variation.
  • Stable: Points to a specific stable model. Stable models usually don’t change.
  • Preview: Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled.
  • Experimental: Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits.

Supported Languages

Gemini models are trained to work with a wide variety of languages, including Arabic, Chinese, English, French, German, Japanese, Russian, and Spanish.