Cache

LLM Foundry caches all responses by default.

If a request matches a previous request on all of the attributes below, the previous response is returned.

URL: e.g. /open/v1/chat/completions
HTTP Method: e.g. POST
Body: e.g. {"messages": [{"role": "user", "content": "Hello"}]}
HTTP Headers: If you send a different set of HTTP headers, the request is treated as a new request and is not served from the cache. However, these headers are ignored when comparing: Authorization, Cache-Control, Cookie, Host, Origin, User-Agent, Set-Cookie, Connection, Accept, Accept-Encoding, Accept-Language, Content-Length, Vary, Sec-*, X-*.
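
The comparison rule above can be pictured as computing a key from the request. The sketch below is only an illustration, not LLM Foundry's actual implementation: the function name cache_key, the use of SHA-256, the case-insensitive header matching, and the byte-for-byte body comparison are all assumptions.

```python
import hashlib
import json

# Headers the text lists as ignored when comparing requests (lowercased).
IGNORED_HEADERS = {
    "authorization", "cache-control", "cookie", "host", "origin",
    "user-agent", "set-cookie", "connection", "accept", "accept-encoding",
    "accept-language", "content-length", "vary",
}
# Sec-* and X-* are ignored as whole families.
IGNORED_PREFIXES = ("sec-", "x-")


def cache_key(method: str, url: str, headers: dict, body: bytes) -> str:
    """Hypothetical cache key: hash of method, URL, body, and compared headers."""
    kept = sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() not in IGNORED_HEADERS
        and not name.lower().startswith(IGNORED_PREFIXES)
    )
    # Assumption: the body is compared as sent, without JSON normalization.
    payload = json.dumps([method.upper(), url, kept, body.decode("utf-8")])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests that differ only in Authorization or in an X-* header produce the same key, while a change to the body or to a compared header such as Content-Type produces a different one.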

The X-Cache: HIT response header is set if the request was served from a cached response.
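
A client can inspect this header to tell cached responses from fresh ones. The helper below is a minimal sketch: the name served_from_cache is hypothetical, and treating a missing header or any value other than HIT as "not cached" is an assumption, since the text only describes the HIT case.

```python
def served_from_cache(response_headers) -> bool:
    """Return True if the response headers contain X-Cache: HIT.

    Accepts any mapping with .items(), e.g. a dict or http.client.HTTPMessage.
    """
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-cache":
            # Assumption: compare the value case-insensitively.
            return value.strip().upper() == "HIT"
    return False
```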

Send a Cache-Control: no-cache header to bust the cache, i.e. bypass previously cached responses.
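
As a rough sketch, a request that bypasses the cache could be built with Python's standard library as follows. The host in BASE_URL, the Bearer token scheme, and the function name build_request are placeholders assumed for illustration. Only the path, the method, and the Cache-Control header come from the text above.

```python
import json
import urllib.request

BASE_URL = "https://llmfoundry.example.com"  # hypothetical host, replace with yours


def build_request(body: dict, token: str, bypass_cache: bool = False) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    headers = {
        "Authorization": f"Bearer {token}",  # assumption: Bearer token auth
        "Content-Type": "application/json",
    }
    if bypass_cache:
        # Ask the server to skip any previously cached response.
        headers["Cache-Control"] = "no-cache"
    return urllib.request.Request(
        BASE_URL + "/open/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. Because Cache-Control is one of the headers ignored when comparing, adding it does not by itself make the request look different from earlier ones.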