Semantic caching middleware for FastAPI.

A thin async caching layer that compares incoming requests to prior ones, reuses responses when the match is strong enough, and falls through to your handler on a miss. Built for FastAPI teams that want a concrete control point for repeated LLM work.

View on GitHub Read the Docs

Why

Built for FastAPI teams that want clearer behavior.

asgi

Middleware-native

Intercepts at the ASGI layer. No per-route changes, no manual wrapping.

pg

No extra infra

Uses your existing Postgres with pgvector. No separate vector database.

io

Truly async

Python coordinates I/O only. Heavy work runs in pgvector or your embedding provider's API.

cost

Pay for what you use

Four core dependencies. Embedding providers and Redis are optional extras.

Embedder options

OpenAI Voyage AI Ollama HuggingFace (dev only)

Install

bash

pip install fastapi-semcache

Quickstart

python

from fastapi import FastAPI
from semanticcache import SemanticCache, SemanticCacheMiddleware

app = FastAPI()
cache = SemanticCache()
app.add_middleware(SemanticCacheMiddleware, cache=cache)

When not to use it

Semantic caching is not the right fit for every endpoint. If exact freshness matters more than semantic reuse, or your traffic has very little prompt repetition, start with plain caching and measure before adding embeddings to the request path.

Read when not to use semcache