This will help you get started with OpenRouter chat models. OpenRouter is a unified API that provides access to models from multiple providers (OpenAI, Anthropic, Google, Meta, and more) through a single endpoint. For a full list of available models, visit the OpenRouter models page.
Overview
Integration details
| Class | Package | Serializable | JS/TS Support | Downloads | Latest Version |
|---|---|---|---|---|---|
| ChatOpenRouter | langchain-openrouter | beta | ❌ |  |  |
Model features
| Tool calling | Structured output | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Setup
To access models via OpenRouter you’ll need to create an OpenRouter account, get an API key, and install the langchain-openrouter integration package.
Installation
The LangChain OpenRouter integration lives in the langchain-openrouter package:
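The package can be installed with pip (the package name is taken from this page; adjust for your package manager):

```shell
pip install -U langchain-openrouter
```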
Credentials
Head to the OpenRouter keys page to sign up and generate an API key. Once you’ve done this, set the OPENROUTER_API_KEY environment variable:
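A common pattern for setting the key interactively, using only the standard library:

```python
import getpass
import os

# Prompt for the key only if it isn't already set in the environment
if not os.environ.get("OPENROUTER_API_KEY"):
    os.environ["OPENROUTER_API_KEY"] = getpass.getpass("Enter your OpenRouter API key: ")
```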
Instantiation
Now we can instantiate our model object and generate chat completions:
Invocation
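A minimal sketch of instantiating the model and generating a completion. The model slug is one illustrative entry from the OpenRouter catalog; any slug from the models page should work:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4.5",  # any slug from the OpenRouter models page
    temperature=0,
)

response = llm.invoke("What is the capital of France?")
print(response.content)
```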
Streaming
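Streaming follows the standard LangChain chat-model interface; a sketch, assuming the constructor shown in this page:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")

# Each chunk is an AIMessageChunk carrying a partial `content`
for chunk in llm.stream("Write a haiku about rivers."):
    print(chunk.content, end="", flush=True)
```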
Tool calling
OpenRouter uses the OpenAI-compatible tool calling format. You can describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool.
Bind tools
With ChatOpenRouter.bind_tools, you can pass in Pydantic classes, dict schemas, LangChain tools, or functions as tools to the model. Under the hood these are converted to OpenAI tool schemas and passed in every model invocation.
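A sketch using a Pydantic class as the tool schema; the GetWeather tool and the model slug are illustrative:

```python
from pydantic import BaseModel, Field

from langchain_openrouter import ChatOpenRouter


class GetWeather(BaseModel):
    """Get the current weather in a given location."""

    location: str = Field(..., description="City and state, e.g. San Francisco, CA")


llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")
llm_with_tools = llm.bind_tools([GetWeather])

ai_msg = llm_with_tools.invoke("What is the weather like in Boston?")
print(ai_msg.tool_calls)
```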
Tool calls
The AIMessage has a tool_calls attribute. This contains tool calls in a standardized format that is model-provider agnostic.
Strict mode
Pass strict=True to guarantee that model output exactly matches the JSON Schema provided in the tool definition:
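A sketch of enabling strict mode when binding tools; the tool class is a hypothetical example:

```python
from pydantic import BaseModel, Field

from langchain_openrouter import ChatOpenRouter


class GetWeather(BaseModel):
    """Get the current weather in a given location."""

    location: str = Field(..., description="City and state, e.g. San Francisco, CA")


llm = ChatOpenRouter(model="openai/gpt-4o")
# strict=True asks the provider to constrain tool arguments to the exact schema
llm_strict = llm.bind_tools([GetWeather], strict=True)
```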
Structured output
ChatOpenRouter supports structured output via the with_structured_output method. Two methods are available: function_calling (default) and json_schema.
Individual model calls
Use with_structured_output to generate a structured model response. Specify method="json_schema" to use JSON Schema-based structured output; otherwise the method defaults to function calling.
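A sketch of structured output with a Pydantic schema; the Joke class is illustrative:

```python
from pydantic import BaseModel, Field

from langchain_openrouter import ChatOpenRouter


class Joke(BaseModel):
    """Joke to tell the user."""

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline of the joke")


llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")
structured_llm = llm.with_structured_output(Joke, method="json_schema")

joke = structured_llm.invoke("Tell me a joke about cats")  # returns a Joke instance
```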
Agent response format
Specify response_format with ProviderStrategy to engage structured output when generating the agent’s final response. Pass strict=True with the function_calling and json_schema methods to enforce exact schema adherence; the strict parameter is not supported with json_mode.
Reasoning output
For models that support reasoning (e.g., anthropic/claude-sonnet-4.5, deepseek/deepseek-r1), you can enable reasoning tokens via the reasoning parameter. See the OpenRouter reasoning docs for details:
The reasoning dict supports two keys:
effort: Controls the reasoning token budget. Values: "xhigh", "high", "medium", "low", "minimal", "none".
summary: Controls verbosity of the reasoning summary returned in the response. Values: "auto", "concise", "detailed".
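A configuration sketch enabling reasoning, using the keys listed above; the model slug is one reasoning-capable example:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="deepseek/deepseek-r1",
    reasoning={"effort": "medium", "summary": "auto"},
)
```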
Reasoning token counts are reported in the response’s usage_metadata.
The effort-to-budget mapping is model-dependent. For example, Google Gemini models map effort to an internal thinkingLevel rather than an exact token budget. See the OpenRouter reasoning docs for details.
Multimodal inputs
OpenRouter supports multimodal inputs for models that accept them. The available modalities depend on the model you select; check the OpenRouter models page for details.
Supported input methods
| Method | Image | Audio | Video | PDF |
|---|---|---|---|---|
| HTTP/HTTPS URLs | ✅ | ❌ | ✅ | ✅ |
| Base64 inline data | ✅ | ✅ | ✅ | ✅ |
Not all models support all modalities. Check the OpenRouter models page for model-specific support.
Image input
Provide image inputs along with text using a HumanMessage with list content format.
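A sketch using the OpenAI-style image_url content block; the image URL is a placeholder:

```python
from langchain_core.messages import HumanMessage

from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ]
)
response = llm.invoke([message])
print(response.content)
```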
Audio input
Provide audio inputs along with text. Audio is passed as base64 inline data.
Video input
Video inputs are automatically converted to OpenRouter’s video_url format.
PDF input
Provide PDF file inputs along with text.
Token usage metadata
After an invocation, token usage information is available on the usage_metadata attribute of the response:
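A sketch of reading the standard LangChain usage_metadata fields after a call:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")
response = llm.invoke("Hello!")

# Standard keys: input_tokens, output_tokens, total_tokens,
# plus input_token_details / output_token_details when available
print(response.usage_metadata)
```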
Reasoning tokens
output_token_details.reasoning reports the number of tokens the model used for internal chain-of-thought reasoning. This appears when using reasoning models (e.g., deepseek/deepseek-r1, openai/o3) or when reasoning is explicitly enabled:
Cached input tokens
input_token_details.cache_read reports the number of input tokens served from the provider’s prompt cache, and input_token_details.cache_creation reports tokens written to the cache on the first call.
Prompt caching requires explicit cache_control breakpoints in message content blocks. Pass {"cache_control": {"type": "ephemeral"}} on the content block you want cached:
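A sketch of placing a cache_control breakpoint on a content block; the document text is a placeholder:

```python
from langchain_core.messages import HumanMessage

from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(model="anthropic/claude-sonnet-4.5")

long_document = "..."  # a large, stable prefix worth caching

message = HumanMessage(
    content=[
        {
            "type": "text",
            "text": long_document,
            "cache_control": {"type": "ephemeral"},  # mark this block for caching
        },
        {"type": "text", "text": "Summarize the document above."},
    ]
)
response = llm.invoke([message])
print(response.usage_metadata["input_token_details"])
```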
Without cache_control on message content blocks, the provider will not cache the prompt and these fields will not appear.
Response metadata
After an invocation, provider and model metadata is available on the response_metadata attribute:
The native_finish_reason field, if present, contains the underlying provider’s original finish reason, which may differ from the normalized finish_reason.
Provider routing
Many models on OpenRouter are served by multiple providers. The openrouter_provider parameter gives you control over which providers handle your requests and how they’re selected.
Ordering and filtering providers
Use order to set a preferred provider sequence. OpenRouter tries each provider in order and falls back to the next if one is unavailable:
To restrict requests to a fixed set of providers, use only. To exclude certain providers, use ignore:
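A sketch combining order and ignore; the model and provider slugs are illustrative:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="meta-llama/llama-3.3-70b-instruct",
    openrouter_provider={
        "order": ["groq", "fireworks"],  # try Groq first, then fall back to Fireworks
        "ignore": ["deepinfra"],  # never route to this provider
    },
)
```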
Sorting by cost, speed, or latency
By default, OpenRouter load-balances across providers with a preference for lower cost. Use sort to change the priority:
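A sketch selecting a sorting strategy; the sort values shown are those documented by OpenRouter's provider-routing API:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="meta-llama/llama-3.3-70b-instruct",
    openrouter_provider={"sort": "throughput"},  # or "price", "latency"
)
```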
Data collection policy
If your use case requires that providers do not store or train on your data, set data_collection to "deny":
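A configuration sketch; the model slug is illustrative:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4.5",
    openrouter_provider={"data_collection": "deny"},
)
```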
Filtering by quantization
For open-weight models, you can restrict routing to specific precision levels:
Route parameter
The route parameter controls high-level routing behavior:
"fallback": enable automatic failover across providers (default behavior)."sort": route based on the sorting strategy configured inopenrouter_provider.
Combining options
Provider options can be composed together:
App attribution
OpenRouter supports app attribution via HTTP headers. You can set these through init params or environment variables:
Observability and tracing
OpenRouter can broadcast request data to configured observability destinations. ChatOpenRouter exposes two related parameters: session_id for grouping related requests under a single logical workflow, and trace for per-request trace metadata. See the OpenRouter broadcast docs for details.
Grouping requests with session_id
Pass a session_id to associate multiple requests with the same workflow (a conversation, an agent run, a batch job, a CI run, and so on). Maximum 256 characters.
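A sketch of tagging requests with a session ID; the ID string is an arbitrary example:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4.5",
    session_id="support-agent-run-001",  # up to 256 characters
)
```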
The OPENROUTER_SESSION_ID environment variable is read at instantiation when no explicit session_id is provided, which lets a process tag every request without threading the value through application code.
You can also override the value per call:
Adding trace metadata with trace
Pass trace to attach per-request metadata that OpenRouter forwards to broadcast destinations. Recognized keys are trace_id, trace_name, span_name, generation_name, and parent_span_id; additional keys are passed through as custom metadata.
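A sketch using the recognized trace keys plus one custom key; all values are illustrative:

```python
from langchain_openrouter import ChatOpenRouter

llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4.5",
    trace={
        "trace_id": "abc-123",
        "trace_name": "checkout-flow",
        "span_name": "summarize-cart",
        "team": "payments",  # unrecognized keys pass through as custom metadata
    },
)
```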
session_id and trace are independent — session_id groups requests into a logical workflow on the OpenRouter side, while trace annotates the individual request.
API reference
For detailed documentation of all ChatOpenRouter features and configurations, head to the ChatOpenRouter API reference.
For more information about OpenRouter’s platform, models, and features, see the OpenRouter documentation.

