LLM Foundry

LLM Foundry lets you use multiple large language models (LLMs), such as GPT-4 and Llama 3, in one place.

The easiest way to use it is the Playground: type a prompt and see the model's response.

LLM Foundry Playground: Tutorial

Tutorials

User Guides

There are other tools you can use directly:

  • Playground: Run prompts, attach documents, search the web, and more
    • Logprobs: See the probability of each token in the response
    • RAG: Use RAG to answer questions about long documents
    • Diagrams: Create diagrams using Mermaid
  • Apps: Create and deploy single-page web-apps
  • Classify: Find topics and classify a bunch of text
  • Draw: Create or modify an image
  • Speak: Convert text to speech
  • Extract: Extract specific terms from text
  • Google Sheets: Use LLMs in Google Sheets
  • Rewrite: Rewrite text in a diff
  • Templates: Create and share prompts from the playground
  • Transcribe: Get speech as text from an audio file

You can track how you use these tools:

  • Usage shows the overall usage of LLM Foundry
  • History shows your recent usage

Developer Guides

See the developer guide page at /code.

  • Providers:
    • OpenAI: Use the GPT family of models. GPT 4o mini is currently the best affordable all-purpose model.
    • Anthropic: Use the Claude family of models. Claude 3.5 Sonnet is currently the best code generation model.
    • Gemini: Gemini models from Straive's Google Cloud tenant. Gemini Flash is fast, cheap, and fully multi-modal.
    • Vertex AI: Claude, Gemini models from Straive's Google Cloud tenant. Use for confidential / client data.
    • Azure OpenAI: GPT models from Straive's Microsoft Azure tenant. Use for confidential / client data.
    • Azure AI: Llama, Phi-3 models from Straive's Microsoft Azure tenant. Use to experiment before self-hosting.
    • Azure Form Recognizer: Best models to extract structured text and images from PDFs with layout.
    • Bedrock: Titan models from Straive's Amazon Web Services tenant. Pretty poor quality. Fairly high price.
    • Groq: Llama models served fast (~500 tokens/second) for free, but rate-limited. Use for experimenting.
    • Cerebras: Llama models served faster (~2,000 tokens/second) for free. Rate-limited. Use for experimenting.
    • OpenRouter: Proxies models from many providers. New open source models are usually found here.
    • Deepseek: Deepseek models. Moderate quality at a moderate price. No reason to use.
    • Cloudflare: Hosts some open-source models like SQLCoder. No reason to use.
    • Voyage AI: Voyage AI models. Use for embeddings.
  • APIs
    • Token API: Get a user's LLM Foundry token for use in your application
    • Speech API: Convert text to speech
    • Similarity API: Find text or image similarity using embeddings
    • Cluster API: Find groups of similar text or images using embeddings
    • PDF API: Convert PDFs to Markdown for LLMs to read
    • Markdown API: Convert websites to Markdown for LLMs to read
    • Template API: Directly call a saved prompt template as an independent API
    • Proxy API: Fetch external URLs as-is (to avoid CORS issues)
  • Guides
    • Production guide: Using LLM Foundry in projects and production
    • Cache: Generate a new response or use the cached response
    • CORS: Access LLM Foundry from your front-end application
    • Batch requests: Make bulk requests at lower cost but with a delay
    • Errors: Lists possible error codes and how to handle them
    • Registered Apps: Generate your own tokens for your app, tracking users in your app
    • Local Setup: Run LLM Foundry in your system
  • Updates
    • Change Log: Timeline of features and model updates. Typically 2-3 features are added every week.
    • Source Code: Explore how LLM Foundry was written. Re-use the libraries and code in your apps.