LLM Foundry
LLM Foundry lets you use multiple large language models (LLMs), such as GPT 4 and Llama 3, in one place.
The easiest way to use it is the Playground: type in a prompt and see the response.
Tutorials
- Prompting: How to write better prompts. Also see the prompting guide audio presentation
- Cost: How to manage costs
- Diagrams: How to create diagrams with LLMs
- Reliability: Why LLMs don't give the same answer and what you can do about it
- JSON: Structured output with JSON Schema
- Formatting: Formatting the output
- Cursor: Using Cursor with LLM Foundry
User Guides
There are other tools you can use directly:
- Playground: Run prompts, attach documents, search the web, and more
- Apps: Create and deploy single-page web-apps
- Classify: Find topics and classify large sets of text
- Draw: Create or modify an image
- Speak: Convert text to speech
- Extract: Extract specific terms from text
- Google Sheets: Use LLMs in Google Sheets
- Rewrite: Rewrite text in a diff
- Templates: Create and share prompts from the playground
- Transcribe: Get speech as text from an audio file
You can explore the usage of these tools.
Developer Guides
See the developer guide at /code.
- Providers:
- OpenAI: Use the GPT family of models. GPT 4o mini is currently the best affordable all-purpose model.
- Anthropic: Use the Claude family of models. Claude 3.5 Sonnet is currently the best code generation model.
- Gemini: Gemini models from Straive's Google Cloud tenant. Gemini Flash is fast, cheap, and fully multi-modal.
- Vertex AI: Claude, Gemini models from Straive's Google Cloud tenant. Use for confidential / client data.
- Azure OpenAI: GPT models from Straive's Microsoft Azure tenant. Use for confidential / client data.
- Azure AI: Llama, Phi-3 models from Straive's Microsoft Azure tenant. Use to experiment before self-hosting.
- Azure Form Recognizer: Best models to extract structured text and images from PDFs with layout.
- Bedrock: Titan models from Straive's Amazon Web Services tenant. Pretty poor quality. Fairly high price.
- Groq: Llama models served fast (~500 tokens/second) for free, but rate-limited. Use for experimenting.
- Cerebras: Llama models served faster (~2,000 tokens/second) for free. Rate-limited. Use for experimenting.
- OpenRouter: Proxies models from many providers. New open source models are usually found here.
- Deepseek: Deepseek models. Moderate quality at a moderate price. No reason to use.
- Cloudflare: Hosts some open-source models like SQLCoder. No reason to use.
- Voyage AI: Voyage AI models. Use for embeddings.
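One way to picture the provider list above is as a single gateway with a route per provider. In the sketch below the provider names come from this page, but the path prefixes are invented placeholders, not LLM Foundry's real routes.

```python
# Illustrative routing table. Provider names are from the list above;
# the path prefixes are placeholders, not LLM Foundry's actual routes.
PROVIDER_PREFIX = {
    "openai": "openai/v1",
    "anthropic": "anthropic/v1",
    "gemini": "gemini/v1beta",
    "groq": "groq/openai/v1",
    "openrouter": "openrouter/v1",
}

def endpoint_for(provider: str, resource: str = "chat/completions") -> str:
    """Join a provider prefix with a resource path (placeholder scheme)."""
    try:
        prefix = PROVIDER_PREFIX[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}")
    return f"/{prefix}/{resource}"

print(endpoint_for("groq"))  # -> /groq/openai/v1/chat/completions
```

This is why trying a faster or cheaper provider (say, Groq or Cerebras for experiments) is usually a one-line change in client code.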
- APIs
- Token API: Get a user's LLM Foundry token for use in your application
- Speech API: Convert text to speech
- Similarity API: Find text or image similarity using embeddings
- Cluster API: Find groups of similar text or images using embeddings
- PDF API: Convert PDFs to Markdown for LLMs to read
- Markdown API: Convert websites to Markdown for LLMs to read
- Template API: Call a saved prompt template directly as an independent API
- Proxy API: Fetch external URLs as-is (to avoid CORS issues)
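As one concrete example from the list above, the Proxy API's CORS use case amounts to a URL rewrite: the front end asks the proxy for the external URL instead of fetching it cross-origin. The proxy path below is a guess for illustration; only the URL-encoding step is general.

```python
from urllib.parse import quote

# Hypothetical proxy path -- the real Proxy API path is documented at /code.
PROXY = "https://llmfoundry.example/-/proxy"

def proxied(url: str) -> str:
    """Rewrite an external URL so the browser fetches it same-origin."""
    return f"{PROXY}/{quote(url, safe='')}"

print(proxied("https://example.com/data.json"))
# -> https://llmfoundry.example/-/proxy/https%3A%2F%2Fexample.com%2Fdata.json
```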
- Guides
- Production guide: Using LLM Foundry in projects and production
- Cache: Generate a new response or use the cached response
- CORS: Access LLM Foundry from your front-end application
- Batch requests: Make bulk requests at lower cost but with a delay
- Errors: Lists possible error codes and how to handle them
- Registered Apps: Generate your own tokens for your app and track its users
- Local Setup: Run LLM Foundry on your own system
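The cache guide above is about choosing between a fresh generation and a cached response. A minimal client-side version of that idea, with invented function names standing in for the real API calls, looks like:

```python
import hashlib

_cache: dict[str, str] = {}
n_calls = 0  # counts actual generations, to show when the cache is hit

def call_llm(prompt: str) -> str:
    """Placeholder for a real request to the LLM Foundry API."""
    global n_calls
    n_calls += 1
    return f"response to: {prompt}"

def ask(prompt: str, fresh: bool = False) -> str:
    """Return the cached answer unless `fresh` forces a new generation."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if not fresh and key in _cache:
        return _cache[key]
    _cache[key] = call_llm(prompt)
    return _cache[key]

ask("hello")              # generates (n_calls == 1)
ask("hello")              # cache hit (n_calls still 1)
ask("hello", fresh=True)  # regenerates (n_calls == 2)
```

Cached responses are free and instant but identical; a `fresh` flag like the one sketched here is how you trade that determinism for a new answer when you need one.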
- Updates
- Change Log: Timeline of features and model updates. Typically 2-3 features are added every week.
- Source Code: Explore how LLM Foundry was written. Re-use the libraries and code in your apps.