Integration tests verify that your agent works correctly with model APIs and external services. Unlike unit tests that use fakes and mocks, integration tests make actual network calls to confirm that components work together, credentials are valid, and latency is acceptable. Because LLM responses are nondeterministic, integration tests require different strategies than traditional software tests. This guide covers how to organize, write, and run integration tests for your agents. For general test infrastructure when contributing to LangChain itself, see Contributing to code.
Separate unit and integration tests
Integration tests are slower and require API credentials, so keep them separate from unit tests. This lets you run fast unit tests on every change and reserve integration tests for CI or pre-deploy checks. Use pytest markers to tag integration tests:
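A minimal sketch of what that tagging might look like; the marker name integration and the test body are placeholders for your own setup:

```python
# test_weather_agent.py
import pytest

# Register the marker (for example in pyproject.toml under
# [tool.pytest.ini_options]) so pytest does not warn about it:
#   markers = ["integration: tests that call real model APIs"]

@pytest.mark.integration
def test_agent_answers_weather_question():
    # Makes a real model API call. Exclude it from fast local runs with:
    #   pytest -m "not integration"
    ...
```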
Manage API keys
Integration tests require real API credentials. Load them from environment variables so keys stay out of source control. Use a conftest.py fixture to validate that required keys are available:
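A minimal sketch, assuming the integration marker from above; the key names are placeholders for whichever providers your agent uses:

```python
# conftest.py
import os

import pytest

# Placeholder list: name whichever credentials your agent actually needs.
REQUIRED_KEYS = ["GOOGLE_API_KEY"]

@pytest.fixture(autouse=True)
def require_api_keys(request):
    # Only enforce credentials for tests marked as integration tests.
    if request.node.get_closest_marker("integration") is None:
        return
    missing = [key for key in REQUIRED_KEYS if not os.environ.get(key)]
    if missing:
        pytest.skip(f"Missing required API keys: {', '.join(missing)}")
```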
Alternatively, you can store keys in a .env file and load them with python-dotenv:
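The two snippets below are a minimal sketch; the key names are placeholders, and the .env file should be listed in .gitignore so it never reaches source control.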
.env
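```bash
# Placeholder key names; use whichever providers your agent needs.
GOOGLE_API_KEY=your-key-here
TAVILY_API_KEY=your-key-here
```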
conftest.py
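```python
from dotenv import load_dotenv

# Load variables from .env before any test imports code that reads them.
load_dotenv()
```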
Assert on structure, not content
LLM responses vary between runs. Instead of asserting on exact output strings, verify the structural properties of the response: message types, tool call names, argument shapes, and message count.
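For example, a test along these lines checks the shape of an agent turn rather than its wording. Here build_agent and the get_weather tool name are placeholders, and the {"messages": [...]} state shape assumes a LangGraph-style agent:

```python
from langchain_core.messages import AIMessage, HumanMessage

def test_agent_uses_weather_tool():
    agent = build_agent()  # placeholder for however you construct your agent
    result = agent.invoke({"messages": [HumanMessage("What's the weather in Paris?")]})
    messages = result["messages"]

    # The run should end with a model message, not a tool or user message.
    assert isinstance(messages[-1], AIMessage)

    # The model should have requested the expected tool with a city argument,
    # regardless of how it phrases the final answer.
    tool_calls = [tc for m in messages if isinstance(m, AIMessage) for tc in m.tool_calls]
    assert any(tc["name"] == "get_weather" for tc in tool_calls)
    assert all("city" in tc["args"] for tc in tool_calls if tc["name"] == "get_weather")
```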
Reduce cost and latency
Integration tests that call LLM APIs incur real costs. A few practices help keep test suites fast and affordable:
- Use smaller models: gemini-3.1-flash-lite-preview or equivalent for tests that only need to verify tool calling and response structure.
- Set max_tokens: Cap response length to avoid long, expensive completions (see the sketch after this list).
- Limit test scope: Test one behavior per test. Avoid end-to-end scenarios that chain many LLM calls when a single-turn test suffices.
- Run selectively: Use the test separation described above to run integration tests only in CI or before deploy, not on every file save.
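As a sketch of the first two points, a shared fixture can pin tests to a small, capped model. The model name is a placeholder, and parameter pass-through varies by provider:

```python
import pytest
from langchain.chat_models import init_chat_model

@pytest.fixture
def small_model():
    # Small and capped: cheap enough to call in every integration test.
    return init_chat_model(
        "gpt-4o-mini",    # placeholder: any small, inexpensive model works
        temperature=0,
        max_tokens=64,    # cap completion length to control cost
    )
```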
Record and replay HTTP calls
For tests that run frequently in CI, you can record HTTP interactions on the first run and replay them on subsequent runs without making real API calls. This eliminates cost and latency after the initial recording.vcrpy records HTTP request/response pairs into YAML “cassette” files. The pytest-recording plugin integrates this with pytest.
Set up your conftest.py to filter sensitive information from cassettes:
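The sketch below uses pytest-recording's vcr_config fixture; extend the header and query-parameter lists with whatever auth material your providers send.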
conftest.py
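```python
import pytest

@pytest.fixture(scope="session")
def vcr_config():
    # pytest-recording reads this fixture and passes it to vcrpy.
    # Strip auth material so recorded cassettes can be committed safely.
    return {
        "filter_headers": ["authorization", "x-api-key", "x-goog-api-key"],
        "filter_query_parameters": ["key", "api_key"],
    }
```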
Mark tests that make real HTTP calls with the vcr marker:
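For example (the agent construction and assertion are placeholders):

```python
import pytest

@pytest.mark.vcr
def test_agent_single_turn():
    # First run: real HTTP calls are recorded to a cassette.
    # Later runs: responses are replayed from the cassette, no network needed.
    agent = build_agent()  # placeholder for your agent factory
    result = agent.invoke({"messages": [("user", "What's the weather in Paris?")]})
    assert result["messages"][-1].content
```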
The --record-mode=once option records HTTP interactions on the first run and replays them on subsequent runs.
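A typical invocation, with the test path illustrative:

```bash
pytest tests/ -m integration --record-mode=once
```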
The first run writes cassettes to tests/cassettes/. Subsequent runs replay the recorded responses.
Next steps
Learn how to evaluate agent trajectories with deterministic matching or LLM-as-judge evaluators in Evals.

