Handling errors

Here are the different kinds of errors we see people face in LLM Foundry -- and how to handle them.

Timeout errors

This typically has an HTTP code of 599 and means LLM Foundry (not the underlying model) is overloaded. You can:

  1. Slow down your requests
  2. Contact s.anand@gramener.com to intervene (may take a few hours)
  3. Use the underlying API (e.g. Google, Azure) directly, bypassing LLM Foundry
{
  "error": {
    "code": "application_error",
    "message": "HTTP 599: Timeout in request queue",
    "type": "application_error"
  }
}
{ "code": "application_error", "message": "Timeout while connecting", "type": "application_error" }
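
If you retry at all, back off between attempts rather than hammering the queue. A minimal sketch; `request_fn` stands in for whatever call you make to LLM Foundry, and surfacing the 599 as a Python `TimeoutError` is an assumption about your HTTP client:

```python
import time

def call_with_backoff(request_fn, max_retries=3, base_delay=1.0):
    """Retry request_fn when it raises TimeoutError (e.g. an HTTP 599
    from the proxy), doubling the delay after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: let the caller handle it
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```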

Rate limit errors

This typically has an HTTP code of 429 and means you made requests faster than the model allows. You can:

  1. Slow down your requests
  2. Contact s.anand@gramener.com to increase rate limits (may take days to weeks, depending on the provider)
  3. Get access to the model directly with a higher rate limit (e.g. via your client)
{
  "message": "Rate limit reached for model `...` in organization `...` on requests per minute (RPM): Limit 30, Used 30, Requested 1. Please try again in ... Visit https://console.groq.com/docs/rate-limits for more information.",
  "type": "requests",
  "code": "rate_limit_exceeded"
}
{
  "type": "rate_limit_error",
  "message": "Number of concurrent connections has exceeded your rate limit. Please try again later or contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."
}
{
  "code": 429,
  "message": "Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.",
  "status": "RESOURCE_EXHAUSTED"
}
{ "code": 429, "message": "Resource has been exhausted (e.g. check quota).", "status": "RESOURCE_EXHAUSTED" }
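
Some providers include a `Retry-After` header on 429 responses; honouring it when present beats guessing. A sketch (the header name is standard HTTP, but whether your provider sends it, and the fallback delay, are assumptions):

```python
def retry_delay(status_code, headers, attempt, base_delay=2.0):
    """Pick a wait time after a failed request: honour Retry-After on a
    429 when the server sends it, else fall back to exponential backoff."""
    if status_code == 429 and "Retry-After" in headers:
        return float(headers["Retry-After"])  # server-suggested wait, in seconds
    return base_delay * 2 ** attempt
```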

Invalid request errors

This typically has an HTTP code of 400 and means there is a mistake in your request. Read the error message, fix your request, and retry.

{ "type": "invalid_request_error", "message": "messages: text content blocks must be non-empty" }
{
  "message": "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.",
  "type": "invalid_request_error",
  "param": "messages"
}
{
  "message": "Failed to generate JSON. Please adjust your prompt. See 'failed_generation' for more details.",
  "type": "invalid_request_error",
  "code": "json_validate_failed",
  "failed_generation": "The provided text does not include any message or data to extract, so I am unable to extract the requested information."
}
{ "code": "BadRequest", "message": "Invalid image data.", "param": null, "type": null }
{
  "code": 400,
  "message": "Invalid JSON payload received. Expected , or } after key:value pair.\n ...",
  "status": "INVALID_ARGUMENT"
}
{ "message": "Only POST requests are accepted.", "type": "invalid_request_error", "param": null, "code": "method_not_supported" }
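
Some 400s can be caught before sending. For example, OpenAI-style JSON mode (the second error above) requires the word "json" to appear somewhere in the messages; a hypothetical pre-flight check:

```python
def check_json_mode(payload):
    """Guard against one common 400: response_format json_object
    requires the word 'json' to appear in the messages."""
    if payload.get("response_format", {}).get("type") != "json_object":
        return payload  # not using JSON mode; nothing to check
    text = " ".join(m.get("content", "") for m in payload.get("messages", [])
                    if isinstance(m.get("content"), str))
    if "json" not in text.lower():
        raise ValueError("Add the word 'JSON' to your prompt when using "
                         "response_format={'type': 'json_object'}")
    return payload
```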

Wrong URL or model

This means the URL or model you requested is wrong: perhaps the model name was misspelt, or the path was incorrect. Fix the URL or model name and retry.

{ "code": "404", "message": "Resource not found" }
{
  "type": "invalid_request_error",
  "code": "unknown_url",
  "message": "Unknown request URL: POST /chat/completions. Please check the URL for typos, or see the docs at https://platform.openai.com/docs/api-reference/.",
  "param": null
}
{
  "message": "The model `claude-3-haiku-20240307` does not exist or you do not have access to it.",
  "type": "invalid_request_error",
  "param": null,
  "code": "model_not_found"
}
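
To recover from `model_not_found`, compare the requested name against the models the endpoint actually serves (many OpenAI-compatible servers expose a GET /models route, but check your deployment — that route is an assumption here). A sketch using stdlib fuzzy matching:

```python
import difflib

def suggest_model(requested, available):
    """Given a model name that returned model_not_found, suggest up to
    three close matches from the list of models the endpoint serves."""
    return difflib.get_close_matches(requested, available, n=3, cutoff=0.6)
```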

Context length exceeded

You sent too much text to the model. You should reduce the input or split it into smaller parts.

{
  "message": "This model's maximum context length is 128000 tokens. However, you requested ... tokens. Please reduce the length of the messages or completion.",
  "type": "invalid_request_error",
  "param": "messages",
  "code": "context_length_exceeded"
}
{
  "message": "Please reduce the length of the messages or completion.",
  "type": "invalid_request_error",
  "param": "messages",
  "code": "context_length_exceeded"
}
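
A rough way to split the input is to chunk on paragraph boundaries under a token budget. The 4-characters-per-token ratio below is a common rule of thumb, not exact; use the model's own tokenizer for a precise count:

```python
def split_text(text, max_tokens=100_000, chars_per_token=4):
    """Split text on paragraph boundaries so each chunk stays under an
    approximate token budget (chars_per_token is a rough heuristic)."""
    budget = max_tokens * chars_per_token
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > budget:
            chunks.append(current)  # current chunk is full; start a new one
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```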

Content filtering errors

This means the response was filtered by the provider's content policy (e.g. no jailbreaking or adult content). You can:

  1. Modify your prompt to avoid the content filter
  2. Use a different model
{
  "inner_error": { "code": "ResponsibleAIPolicyViolation", "content_filter_results": { "jailbreak": { "filtered": true, "detected": true } } },
  "code": "content_filter",
  "message": "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: \r\nhttps://go.microsoft.com/fwlink/?linkid=2198766.",
  "param": "prompt",
  "type": null
}
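
A client can detect this case and fall back to a reworded prompt or a different model. The field names below come from the Azure example above; other providers use different error shapes:

```python
def is_content_filtered(error):
    """Return True when an error dict looks like a content-filter
    rejection (field names per the Azure OpenAI example)."""
    return (error.get("code") == "content_filter"
            or error.get("inner_error", {}).get("code") == "ResponsibleAIPolicyViolation")
```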

Authentication errors

This typically has an HTTP code of 401 or 403 and means your LLMPROXY_TOKEN (or API key) was not sent properly. Check the token and the header format carefully, then retry.
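
Before debugging further, confirm the `Authorization` header is actually attached in the standard `Bearer` format. A minimal sketch (the URL and payload are placeholders; only the header shape matters):

```python
import json
import urllib.request

def build_request(url, token, payload):
    """Build a POST request with the bearer token attached; a missing or
    malformed Authorization header is what triggers 401/403 responses."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```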

{
  "code": "401",
  "message": "Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource."
}
{ "message": "Invalid API Key", "type": "invalid_request_error", "code": "invalid_api_key" }
{
  "message": "Incorrect API key provided: .... You can find your API key at https://platform.openai.com/account/api-keys.",
  "type": "invalid_request_error",
  "param": null,
  "code": "invalid_api_key"
}
{ "message": "Your authentication token is not from a valid issuer.", "type": "invalid_request_error", "param": null, "code": "invalid_issuer" }
{ "code": "no_auth_header", "message": "Missing header: Authorization: Bearer ..." }
{ "code": "invalid_auth_header", "message": "Bearer ..." }
{ "code": "invalid_jwt_token", "message": "Bearer ..." }

Billing or credit errors

This means LLM Foundry's account with the cloud provider is out of credit, or billing is not enabled. Contact s.anand@gramener.com.

{
  "code": 403,
  "message": "This API method requires billing to be enabled. Please enable billing on project ...",
  "status": "PERMISSION_DENIED"
}
{
  "type": "invalid_request_error",
  "message": "Your credit balance is too low to access the Claude API. Please go to Plans & Billing to upgrade or purchase credits."
}
{
  "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
  "type": "insufficient_quota",
  "param": null,
  "code": "insufficient_quota"
}

Internal server errors

These typically have an HTTP code of 500 and mean that the server (not your request) failed. You should retry after some time.

{ "code": "InternalServerError", "message": "The service is temporarily unable to process your request. Please try again later." }
{ "message": "The model `llama3-8b-8192` is not active.", "type": "internal_server_error", "code": "model_not_active" }
{
  "message": "Failed to create completion as the model generated invalid Unicode output. Unfortunately, this can happen in rare situations. Consider reviewing your prompt or reducing the temperature of your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID req_947d9e183bd9d3a0195ad53eb36383bb in your message.)",
  "type": "server_error",
  "param": null,
  "code": "invalid_model_output"
}
{
  "code": 500,
  "message": "An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting",
  "status": "INTERNAL"
}
{ "code": 503, "message": "The model is overloaded. Please try again later.", "status": "UNAVAILABLE" }
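
Putting the sections above together, a client can pick its handling strategy from the status code alone. A rough sketch (the action names are just labels for the advice in this document):

```python
def error_action(status):
    """Map an HTTP status code from LLM Foundry to a handling strategy."""
    if status in (401, 403):
        return "fix-credentials"    # authentication: check the token first
    if status == 404:
        return "fix-url-or-model"   # wrong URL or model name
    if status == 429:
        return "backoff-and-retry"  # rate limit: slow down
    if status == 400:
        return "fix-request"        # invalid request: retrying unchanged won't help
    if status >= 500:               # includes 503 and the proxy's 599 timeout
        return "retry-later"
    return "inspect-response"
```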