LLM Foundry
LLM Foundry lets you use multiple large language models (LLMs), such as GPT 4 and Llama 3, in one place.
The easiest way to use it is the Playground: type in a prompt and see the response.
Tutorials
- Prompting: How to write better prompts. Also see the prompting guide audio presentation
- Cost: How to manage costs
- Diagrams: How to create diagrams with LLMs
- Reliability: Why LLMs don't give the same answer and what you can do about it
- JSON: Structured output with JSON Schema
- Formatting: Formatting the output
- Cursor: Using Cursor with LLM Foundry
User Guides
There are other tools you can use directly:
- Playground: Run prompts, attach documents, search the web, and more
- Apps: Create and deploy single-page web-apps
- Classify: Find topics and classify large sets of text
- Draw: Create or modify an image
- Speak: Convert text to speech
- Extract: Extract specific terms from text
- Google Sheets: Use LLMs in Google Sheets
- Rewrite: Rewrite text in a diff
- Templates: Create and share prompts from the playground
- Transcribe: Get speech as text from an audio file
You can explore the usage of these tools.
Developer Guides
See the developer guide at /code.
- Providers:
- OpenAI: Use the GPT family of models. GPT 4o mini is currently the best affordable all-purpose model.
- Anthropic: Use the Claude family of models. Claude 3.5 Sonnet is currently the best code generation model.
- Gemini: Gemini models from Straive's Google Cloud tenant. Gemini Flash is fast, cheap, and fully multi-modal.
- Vertex AI: Claude, Gemini models from Straive's Google Cloud tenant. Use for confidential / client data.
- Azure OpenAI: GPT models from Straive's Microsoft Azure tenant. Use for confidential / client data.
- Azure AI: Llama, Phi-3 models from Straive's Microsoft Azure tenant. Use to experiment before self-hosting.
- Azure Form Recognizer: Best models to extract structured text and images from PDFs with layout.
- Bedrock: Titan models from Straive's Amazon Web Services tenant. Pretty poor quality. Fairly high price.
- Groq: Llama models served fast (~500 tokens/second) for free, but rate-limited. Use for experimenting.
- Cerebras: Llama models served faster (~2,000 tokens/second) for free. Rate-limited. Use for experimenting.
- OpenRouter: Proxies models from many providers. New open source models are usually found here.
- Deepseek: Deepseek models. Moderate quality at a moderate price. No reason to use.
- Cloudflare: Hosts some open-source models like SQLCoder. No reason to use.
- Voyage AI: Voyage AI models. Use for embeddings.
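One way to picture the provider list above is as a single gateway with a route per provider. In the sketch below the provider names come from this page, but the path prefixes are invented placeholders, not LLM Foundry's real routes.

```python
# Illustrative routing table. Provider names are from the list above;
# the path prefixes are placeholders, not LLM Foundry's actual routes.
PROVIDER_PREFIX = {
    "openai": "openai/v1",
    "anthropic": "anthropic/v1",
    "gemini": "gemini/v1beta",
    "groq": "groq/openai/v1",
    "openrouter": "openrouter/v1",
}

def endpoint_for(provider: str, resource: str = "chat/completions") -> str:
    """Join a provider prefix with a resource path (placeholder scheme)."""
    try:
        prefix = PROVIDER_PREFIX[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}")
    return f"/{prefix}/{resource}"

print(endpoint_for("groq"))  # -> /groq/openai/v1/chat/completions
```

This is why trying a faster or cheaper provider (say, Groq or Cerebras for experiments) is usually a one-line change in client code.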
- APIs
- Token API: Get a user's LLM Foundry token for use in your application
- Speech API: Convert text to speech
- Similarity API: Find text or image similarity using embeddings
- Cluster API: Find groups of similar text or images using embeddings
- PDF API: Convert PDFs to Markdown for LLMs to read
- Markdown API: Convert websites to Markdown for LLMs to read
- Template API: Call a saved prompt template directly as an independent API
- Proxy API: Fetch external URLs as-is (to avoid CORS issues)
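As one concrete example from the list above, the Proxy API's CORS use case amounts to a URL rewrite: the front end asks the proxy for the external URL instead of fetching it cross-origin. The proxy path below is a guess for illustration; only the URL-encoding step is general.

```python
from urllib.parse import quote

# Hypothetical proxy path -- the real Proxy API path is documented at /code.
PROXY = "https://llmfoundry.example/-/proxy"

def proxied(url: str) -> str:
    """Rewrite an external URL so the browser fetches it same-origin."""
    return f"{PROXY}/{quote(url, safe='')}"

print(proxied("https://example.com/data.json"))
# -> https://llmfoundry.example/-/proxy/https%3A%2F%2Fexample.com%2Fdata.json
```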
- Guides
- Production guide: Using LLM Foundry in projects and production
- Cache: Generate a new response or use the cached response
- CORS: Access LLM Foundry from your front-end application
- Batch requests: Make bulk requests at lower cost but with a delay
- Errors: Lists possible error codes and how to handle them
- Registered Apps: Generate your own tokens for your app and track its users
- Local Setup: Run LLM Foundry on your own system
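The cache guide above is about choosing between a fresh generation and a cached response. A minimal client-side version of that idea, with invented function names standing in for the real API calls, looks like:

```python
import hashlib

_cache: dict[str, str] = {}
n_calls = 0  # counts actual generations, to show when the cache is hit

def call_llm(prompt: str) -> str:
    """Placeholder for a real request to the LLM Foundry API."""
    global n_calls
    n_calls += 1
    return f"response to: {prompt}"

def ask(prompt: str, fresh: bool = False) -> str:
    """Return the cached answer unless `fresh` forces a new generation."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if not fresh and key in _cache:
        return _cache[key]
    _cache[key] = call_llm(prompt)
    return _cache[key]

ask("hello")              # generates (n_calls == 1)
ask("hello")              # cache hit (n_calls still 1)
ask("hello", fresh=True)  # regenerates (n_calls == 2)
```

Cached responses are free and instant but identical; a `fresh` flag like the one sketched here is how you trade that determinism for a new answer when you need one.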
- Updates
- Change Log: Timeline of features and model updates. Typically 2-3 features are added every week.
- Source Code: Explore how LLM Foundry was written. Re-use the libraries and code in your apps.