Cache
LLM Foundry caches all responses by default.
If a request matches a previous request on all of the attributes below, the cached response to that previous request is returned.
- URL: e.g. /open/v1/chat/completions
- HTTP Method: e.g. POST
- Body: e.g. {"messages": [{"role": "user", "content": "Hello"}]}
- HTTP Headers: If you send a different set of HTTP headers, the request is treated as a new request and is not served from the cache. These headers are ignored when comparing: Authorization, Cache-Control, Cookie, Host, Origin, User-Agent, Set-Cookie, Connection, Accept, Accept-Encoding, Accept-Language, Content-Length, Vary, Sec-*, X-*.
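The matching rule above can be pictured as a cache key built from the method, URL, body, and the headers that are not ignored. The sketch below is only an illustration of that idea, not LLM Foundry's actual implementation: the function name cache_key, the use of SHA-256, and the wildcard matching for Sec-* and X-* are all assumptions.

```python
import hashlib
import json
from fnmatch import fnmatchcase

# Header names (lowercased) that the documentation says are ignored when comparing.
IGNORED_HEADERS = [
    "authorization", "cache-control", "cookie", "host", "origin",
    "user-agent", "set-cookie", "connection", "accept", "accept-encoding",
    "accept-language", "content-length", "vary", "sec-*", "x-*",
]


def cache_key(method, url, headers, body):
    """Hypothetical cache key: hash of method, URL, body and non-ignored headers."""
    kept = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if not any(fnmatchcase(name.lower(), pattern) for pattern in IGNORED_HEADERS)
    )
    payload = json.dumps([method.upper(), url, kept, body])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this sketch, two requests that differ only in Authorization or in an X-* header produce the same key, while a change to any other header, such as Content-Type, produces a different key.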
The X-Cache: HIT response header is set if the request was served from a cached response.
Send a Cache-Control: no-cache header to bust the cache, i.e. bypass previously cached responses.
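As a rough client-side illustration of both headers, the sketch below builds a request that can optionally bypass the cache and checks a response for X-Cache: HIT. It uses only the Python standard library. The helper names, the base URL, and the bearer-token Authorization format are placeholders assumed for the example, not values taken from LLM Foundry.

```python
import json
import urllib.request


def build_request(base_url, token, payload, bypass_cache=False):
    """Build a POST to the chat completions path; optionally bypass the cache."""
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme, adjust as needed
        "Content-Type": "application/json",
    }
    if bypass_cache:
        # Ask the server to skip previously cached responses.
        headers["Cache-Control"] = "no-cache"
    return urllib.request.Request(
        base_url + "/open/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def served_from_cache(response_headers):
    """Return True if the response carries X-Cache: HIT."""
    return (response_headers.get("X-Cache") or "").strip().upper() == "HIT"


# Example usage (placeholder URL and token; performs a real network call):
# req = build_request("https://llmfoundry.example.com", "MY_TOKEN",
#                     {"messages": [{"role": "user", "content": "Hello"}]},
#                     bypass_cache=True)
# with urllib.request.urlopen(req) as resp:
#     print(served_from_cache(resp.headers))
```

Since Cache-Control is among the headers ignored when comparing requests, sending it should not by itself make the request count as a different one.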