Middleware-native
Intercepts at the ASGI layer. No per-route changes, no manual wrapping.
Current product
A thin async caching layer that compares incoming requests to prior ones, reuses responses when the match is strong enough, and falls through to your handler on a miss. Built for FastAPI teams that want a concrete control point for repeated LLM work.
Why
Intercepts at the ASGI layer. No per-route changes, no manual wrapping.
Uses your existing Postgres with pgvector. No separate vector database.
Python coordinates I/O only. Heavy work runs in pgvector or your embedding provider's API.
Four core dependencies. Embedding providers and Redis are optional extras.
Embedder options
Install
pip install fastapi-semcache Quickstart
from fastapi import FastAPI
from semanticcache import SemanticCache, SemanticCacheMiddleware
app = FastAPI()
cache = SemanticCache()
app.add_middleware(SemanticCacheMiddleware, cache=cache) When not to use it
Semantic caching is not the right fit for every endpoint. If exact freshness matters more than semantic reuse, or your traffic has very little prompt repetition, start with plain caching and measure before adding embeddings to the request path.
Read when not to use semcache