Cache

LLM Foundry caches all responses by default.

If a request matches a previous request on all of the attributes below, the previous response is returned.

  • URL: e.g. /open/v1/chat/completions
  • HTTP Method: e.g. POST
  • Body: e.g. {"messages": [{"role": "user", "content": "Hello"}]}
  • HTTP Headers: A request with a different set of HTTP headers is treated as a new request and is not served from the cache. These headers are ignored when comparing: Authorization, Cache-Control, Cookie, Host, Origin, User-Agent, Set-Cookie, Connection, Accept, Accept-Encoding, Accept-Language, Content-Length, Vary, Sec-*, X-*.
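
The matching rule above can be pictured as a cache key built from the four attributes. The following is a minimal sketch of that idea, not LLM Foundry's actual implementation: the function names, the use of SHA-256, and the byte-for-byte comparison of the body are all assumptions made for illustration.

```python
import hashlib
import json
from fnmatch import fnmatchcase

# Headers listed above as ignored when comparing requests (lowercased).
IGNORED_HEADERS = [
    "authorization", "cache-control", "cookie", "host", "origin",
    "user-agent", "set-cookie", "connection", "accept", "accept-encoding",
    "accept-language", "content-length", "vary", "sec-*", "x-*",
]


def is_ignored(header_name: str) -> bool:
    """Return True if the header takes no part in the comparison."""
    name = header_name.lower()
    return any(fnmatchcase(name, pattern) for pattern in IGNORED_HEADERS)


def cache_key(method: str, url: str, headers: dict, body: bytes) -> str:
    """Hypothetical cache key: hash of method, URL, compared headers, body.

    Assumption: the body is compared byte for byte, so two JSON bodies that
    differ only in whitespace or key order yield different keys here.
    """
    kept = sorted((k.lower(), v) for k, v in headers.items() if not is_ignored(k))
    payload = json.dumps(
        [method.upper(), url, kept, hashlib.sha256(body).hexdigest()]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this sketch, two requests that differ only in Authorization or an X-* header produce the same key, while a change to Content-Type, the URL, the method, or the body produces a different one.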

The X-Cache: HIT response header is set if the request was served from a cached response.

Send a Cache-Control: no-cache header to bust the cache, i.e. bypass previously cached responses.
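
As an illustration of the two headers above, the sketch below builds a request that optionally bypasses the cache and checks a response for X-Cache: HIT. It uses only the Python standard library. The base URL, the Bearer token scheme, and the helper names are assumptions; the path and body are the examples from this page.

```python
import json
import urllib.request


def build_request(base_url: str, token: str, body: dict,
                  bypass_cache: bool = False) -> urllib.request.Request:
    """Build a POST request; base_url and the Bearer scheme are assumptions."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if bypass_cache:
        # Ask the server to skip previously cached responses.
        headers["Cache-Control"] = "no-cache"
    return urllib.request.Request(
        base_url + "/open/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def was_cache_hit(response_headers) -> bool:
    """Return True if the response carries X-Cache: HIT."""
    return (response_headers.get("X-Cache") or "").upper() == "HIT"
```

To use it, pass the request to urllib.request.urlopen and call was_cache_hit on the response's headers attribute. Sending the same request twice without bypass_cache should report a hit on the second call if the first response was cached.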