No one wants to see output = client.chat.completions.create(...).choices[0].message.content.
AgentHeaven routes LLM calls through LiteLLM, so it supports any provider LiteLLM covers. The default preset uses OpenRouter, which gives you access to most major models through a single API key. You are free to choose any provider you like and configure it in the AgentHeaven config. More aggressively, we believe model and provider choices should never appear in your code at all. With the LLM configs persisted in the database (see ConfigManager), your code just calls llm = LLM(preset="chat"/"sys"/"reason"/"embedder"/"translator"/...) and uses it with a simple output = llm(query). Only logical roles and task-specific presets should be referenced in code; the actual model and provider can be swapped freely in the config without changing a single line of code.
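The decoupling can be sketched in plain Python: code references only a logical preset name, and a config table resolves it to a concrete model and provider at call time. The dict below is a stand-in for the database-backed ConfigManager; the names are illustrative, not AgentHeaven internals.

```python
# Illustrative sketch: presets decouple code from model/provider choices.
# In AgentHeaven this table lives in the ConfigManager database, not in code.
PRESETS = {
    "chat": {"provider": "openrouter", "model": "deepseek/deepseek-v3.2"},
    "sys": {"provider": "openrouter", "model": "google/gemini-3-flash-preview"},
}

def resolve(preset: str) -> dict:
    """Look up the concrete model/provider for a logical preset name."""
    return PRESETS[preset]

# Application code only ever mentions the logical role:
cfg = resolve("chat")
print(cfg["model"])  # the concrete model is a config detail, not a code detail
```

Swapping models then means editing the config table; the calling code never changes.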
We plan to evolve the current LiteLLM-centric LLM system into a Portkey-centric, multi-gateway architecture (supporting Portkey, Bifrost, LiteLLM, and native OpenAI-formatted calls) in the near future. The goal is to let you choose the best gateway for your latency, throughput, feature, and data-safety requirements without being locked in.

1. API Key Setup

AgentHeaven ships with certain providers preconfigured, and these recognize environment variables, so a direct LLM call may work out of the box if the right env vars are already set. You can still explicitly set the API key in the config via ahvn cfg set to avoid relying on env vars and to ensure the value is stored in the database.
OpenRouter is the default provider. No additional config is needed: the sys and chat presets and most models route through it automatically. The default api_key is <OPENROUTER_API_KEY>, which automatically resolves to your env var. To set it manually:
ahvn cfg set llm.providers.openrouter.api_key "sk-or-v1-..."
ahvn cfg set llm.default_provider openrouter

ahvn cfg set llm.presets.sys.provider openrouter
ahvn cfg set llm.presets.sys.model gemini-flash     # google/gemini-3-flash-preview

ahvn cfg set llm.presets.chat.provider openrouter
ahvn cfg set llm.presets.chat.model dsv3            # deepseek/deepseek-v3.2

ahvn cfg set llm.presets.reason.provider openrouter
ahvn cfg set llm.presets.reason.model gpt           # openai/gpt-5.4
In principle, we can set up any provider supported by LiteLLM. If you want to use a different provider, check the LiteLLM Providers Doc for the provider-specific setup and then mirror that config in AgentHeaven.
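For example, switching the chat preset to Anthropic might look like the fragment below. The provider key name and model alias here are assumptions following the same config pattern as above, not verified defaults; check the LiteLLM Providers Doc for the canonical identifiers.

```shell
# Hypothetical example: mirror LiteLLM's Anthropic setup in AgentHeaven config.
ahvn cfg set llm.providers.anthropic.api_key "sk-ant-..."
ahvn cfg set llm.presets.chat.provider anthropic
ahvn cfg set llm.presets.chat.model claude-sonnet
```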
For full LLM configuration options, see the LLM feature page.

2. First CLI Message

ahvn chat sends a one-shot message and prints the response. The -v flag shows which model and provider are being used (as well as inference kwargs and your API key, masked):
ahvn chat -v "Hello! Who are you?"
Using deepseek as an example:
HTTP  Proxy: None
HTTPS Proxy: None
Request: {'seed': 42, 'timeout': 120, 'enforce_non_stream_structured': False, 'api_key': 'sk-1******11', 'api_base': 'https://api.deepseek.com/beta', 'model': 'deepseek/deepseek-chat', 'messages': [{'role': 'user', 'content': 'Hello! Who are you?'}], 'stream': True}
Hello! I'm DeepSeek, an AI assistant...
For the full list of CLI flags (--stream, --no-stream, -s, -p, -m, etc.), see the LLM CLI reference.

Two other commands extend the same underlying LLM interface:
  • ahvn embed "<text>" — returns a vector embedding for the given text, using the embedder preset (Ollama by default). Useful for testing your embedding setup before wiring it into a knowledge pipeline.
  • ahvn session — starts an interactive multi-turn chat session in the terminal. Supports slash commands (/save, /load, /clear, /regen, /back) and history search. Press Ctrl+C or type /bye to exit.

3. Configure the Embedding Provider

The embedder preset controls which model ahvn embed uses. By default it runs EmbeddingGemma (a lightweight 300M-parameter Google model) on Ollama locally. To switch providers:
ahvn cfg set llm.presets.embedder.provider openai
ahvn cfg set llm.presets.embedder.model text-embedding-3-small  # text-embedding-3-small (1536)
Again, you can in principle set up any provider supported by LiteLLM. If you want to use a different provider, check the LiteLLM Providers Doc for the provider-specific setup and then mirror that config in AgentHeaven.

4. Python API

The LLM class in ahvn.utils.llm is the same engine the CLI uses. Construct one with no arguments to use the active preset:
from ahvn.utils.llm import LLM


llm = LLM()  # uses the configured default preset
oracle collects the full response and returns it as a string:
answer = llm.oracle("What programming language is AgentHeaven written in?")
print(answer)
# Python
Pass a conversation history as a list of message dicts when you need multi-turn context:
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Name three Python web frameworks."},
]
answer = llm.oracle(messages)
print(answer)
# Flask, Django, FastAPI
stream yields text deltas as they arrive, which lets you print output incrementally:
for chunk in llm.stream("Explain recursion in two sentences."):
    print(chunk, end="", flush=True)
print()
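The same consumption pattern works whenever you also need the full text afterward: accumulate the deltas while printing them. The sketch below simulates a stream with a plain generator so it runs without any provider configured; llm.stream would slot in where fake_stream appears.

```python
def fake_stream():
    """Stand-in for llm.stream(...): yields text deltas in arrival order."""
    yield from ["Recursion is when ", "a function calls itself ", "until a base case stops it."]

chunks = []
for chunk in fake_stream():  # replace fake_stream() with llm.stream(prompt)
    print(chunk, end="", flush=True)  # show output incrementally
    chunks.append(chunk)
print()

full_text = "".join(chunks)  # the complete response, assembled from deltas
```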
embed returns a vector embedding for the given text:
vector = llm.embed("AgentHeaven is a framework for building agentic applications.")
print(vector[:100])
print(len(vector))
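A common next step is comparing embeddings. Since embed returns a plain vector, a small cosine-similarity helper is enough; this sketch uses hand-written vectors so it runs standalone, with llm.embed noted where real embeddings would go.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real embeddings: cosine_similarity(llm.embed(text1), llm.embed(text2))
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]  # same direction, scaled
print(cosine_similarity(v1, v2))  # parallel vectors score ~1.0
```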
You can pass a specific preset, model, provider, or backend at construction time to override the defaults, and add any inference parameters (e.g., temperature, seed, max_tokens) at either construction or call time. For example:
fast_llm = LLM(preset="tiny")
local_llm = LLM(preset="local")
specific_llm = LLM(model="sonnet", provider="anthropic")
highly_customized_llm = LLM(
    preset="chat", model="gemini-flash", provider="google", backend="google",
    temperature=0.7, seed=42, max_tokens=4096,  # ... add more inference kwargs as needed
)
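A natural question is how construction-time and call-time parameters interact. A common convention for this kind of API, assumed here rather than a verified AgentHeaven guarantee, is that call-time kwargs override construction-time defaults, which in Python is a simple dict merge:

```python
# Illustrative merge semantics: call-time kwargs win over construction defaults.
construction_kwargs = {"temperature": 0.7, "seed": 42, "max_tokens": 4096}
call_kwargs = {"temperature": 0.2}  # override just for this call

effective = {**construction_kwargs, **call_kwargs}
print(effective)  # {'temperature': 0.2, 'seed': 42, 'max_tokens': 4096}
```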
For full LLM features, see the LLM feature page.

5. What’s Next — Prompt Management

AgentHeaven treats prompts as callable, versioned functions — not string templates. With PromptSpec, you can register, persist, translate, and retrieve prompts globally without hard-coding strings.
See Prompts for a full walkthrough of PromptSpec, PM_AHVN, translation, and template-style prompts.

Further Exploration

LLM references:
  • LLM — model, provider, and session workflows
  • Prompts — PromptSpec, PM_AHVN, translation
  • CLI Reference — complete command reference