Gemini

Replace https://generativelanguage.googleapis.com/ with https://llmfoundry.straive.com/gemini/.
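
For example, the native endpoint https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent becomes https://llmfoundry.straive.com/gemini/v1beta/models/gemini-1.5-flash-latest:generateContent; everything after the base URL stays the same.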

All Gemini models and APIs are supported, including:

  • gemini-1.5-flash-8b
  • gemini-1.5-flash-latest
  • gemini-1.5-pro-latest
  • text-embedding-004

Curl

curl -X POST https://llmfoundry.straive.com/gemini/v1beta/openai/chat/completions \
  -H "Authorization: Bearer $LLMFOUNDRY_TOKEN:my-test-project" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini-1.5-flash-8b", "messages": [{"role": "user", "content": "What is 2 + 2"}]}'

curl -X POST https://llmfoundry.straive.com/gemini/v1beta/openai/embeddings \
  -H "Authorization: Bearer $LLMFOUNDRY_TOKEN:my-test-project" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-004", "input": "Hello world"}'

Python requests

import os
import requests  # Or replace requests with httpx

response = requests.post(
    "https://llmfoundry.straive.com/gemini/v1beta/openai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['LLMFOUNDRY_TOKEN']}:my-test-project"},
    json={"model": "gemini-1.5-flash-8b", "messages": [{"role": "user", "content": "What is 2 + 2"}]}
)
print(response.json())
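
The embeddings endpoint follows the same pattern. A minimal sketch, reusing the token and the placeholder project name from above:

embedding_response = requests.post(
    "https://llmfoundry.straive.com/gemini/v1beta/openai/embeddings",
    headers={"Authorization": f"Bearer {os.environ['LLMFOUNDRY_TOKEN']}:my-test-project"},
    json={"model": "text-embedding-004", "input": "Hello world"},
)
# Each vector is under data[i].embedding in the OpenAI-style response
print(len(embedding_response.json()["data"][0]["embedding"]))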

JavaScript

const token = process.env.LLMFOUNDRY_TOKEN;
const response = await fetch("https://llmfoundry.straive.com/gemini/v1beta/openai/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}:my-test-project` },
  // If the user is already logged into LLM Foundry, `credentials: "include"` sends their own
  // API token with the request, and the `Authorization` header above can be dropped.
  credentials: "include",
  body: JSON.stringify({ model: "gemini-1.5-flash-8b", messages: [{ role: "user", content: "What is 2 + 2" }] }),
});
console.log(await response.json());

LangChain

import os
from langchain_openai import ChatOpenAI

# Chat
chat_model = ChatOpenAI(
    openai_api_base="https://llmfoundry.straive.com/gemini/v1beta/openai/",
    openai_api_key=f'{os.environ["LLMFOUNDRY_TOKEN"]}:my-test-project',
    model="gemini-1.5-flash-8b",
)
print(chat_model.invoke("What is 2 + 2?").content)

# Embeddings
from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(
    openai_api_base="https://llmfoundry.straive.com/gemini/v1beta/openai/",
    openai_api_key=f'{os.environ["LLMFOUNDRY_TOKEN"]}:my-test-project',
    model="text-embedding-004",
)
embeddings = embeddings_model.embed_documents(["Alpha", "Beta", "Gamma"])
print(len(embeddings), len(embeddings[0]))
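
To sanity-check the vectors, you can compare two of them with cosine similarity. A minimal sketch, assuming numpy is installed (it is not required by LangChain itself):

import numpy as np

a, b = np.array(embeddings[0]), np.array(embeddings[1])
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine similarity of "Alpha" vs "Beta"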

Gemini v1beta API

Besides the OpenAI-compatible endpoints above, LLM Foundry also proxies Gemini's native v1beta API.

curl -X POST https://llmfoundry.straive.com/gemini/v1beta/models/gemini-1.5-flash-latest:generateContent \
  -H "Authorization: Bearer $LLMFOUNDRY_TOKEN:my-test-project" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"parts":[{"text":"What is 2 + 2"}]}]}'