LLM Overview - HeavenBase

Local and hosted providers, six gateways, one client. The resolution model does the remembering for you.

hb.LLM is the HeavenBase Python client for chat, streaming, embeddings, image generation, mocks, and gateway routing. It resolves a preset, model, provider, and gateway from the shared heavenbase.llm config, then materializes the request format selected by the gateway. The defaults are preset="system", default_provider="openrouter", and gateway="openai", so a single OpenRouter API key covers most major models out of the box.

export OPENROUTER_API_KEY="..."

import heavenbase as hb

llm = hb.LLM()
text = llm.chat("Reply with exactly: hb-ok")

1. Resolution Model

HeavenBase never asks you to hardcode a provider SDK call. Instead, four config-backed concepts resolve every request:

preset: a named shortcut such as system, chat, reason, coder, embed, imagen, mock, or custom.
model: a canonical model key or alias such as ds-flash, sonnet, gpt, or gpt-image-mini.
provider: where the model is served. Normal presets inherit heavenbase.llm.default_provider; explicit provider= and preset-level provider pins override it.
gateway: how the request is transported. The default openai gateway uses the OpenAI Python SDK against OpenAI-compatible endpoints; anthropic uses the official Anthropic SDK and native Messages payloads.

The resolution path is preset -> model -> provider -> gateway. The final URL comes from provider.base_url, gateway.base_url, or a call-time base_url=... override.

hb.LLM()                                      # preset="system" -> ds-flash on openrouter via openai
hb.LLM(model="ds-flash")                      # explicit model, provider inherited
hb.LLM(model="ds-flash", provider="deepseek") # pin the provider
hb.LLM(preset="reason")                       # reasoning preset
hb.LLM(preset="mock")                         # offline deterministic
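The resolution path can be pictured with a small standalone sketch. This is not HeavenBase's implementation: the CONFIG dictionary, the Resolved dataclass, the resolve function, and the "mock" provider key are all hypothetical, and the sketch only assumes that explicit arguments win over preset pins, which in turn win over the root defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, trimmed-down config; the real heavenbase.llm tree may differ.
CONFIG = {
    "default_provider": "openrouter",
    "default_gateway": "openai",
    "presets": {
        "system": {"model": "ds-flash"},
        "reason": {"model": "ds-pro"},
        # Assumed pins for the offline preset (provider key is a guess).
        "mock": {"model": "mock", "provider": "mock", "gateway": "mock"},
    },
}


@dataclass(frozen=True)
class Resolved:
    preset: str
    model: str
    provider: str
    gateway: str


def resolve(
    preset: str = "system",
    model: Optional[str] = None,
    provider: Optional[str] = None,
    gateway: Optional[str] = None,
) -> Resolved:
    """Walk preset -> model -> provider -> gateway, explicit arguments first."""
    entry = CONFIG["presets"][preset]
    model = model or entry["model"]
    provider = provider or entry.get("provider") or CONFIG["default_provider"]
    gateway = gateway or entry.get("gateway") or CONFIG["default_gateway"]
    return Resolved(preset, model, provider, gateway)


print(resolve())                                        # all defaults
print(resolve(model="ds-flash", provider="deepseek"))   # provider pinned by the caller
print(resolve(preset="mock"))                           # preset-level pins
```

Each step falls through to the next source only when nothing more specific is set, which is why the one-argument calls above still produce a complete specification.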

Unknown keyword arguments become provider request defaults, so tuning lives in the call, not in a wrapper:

llm = hb.LLM(model="ds-flash", max_tokens=256, temperature=0)
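One way to picture this behavior is a constructor that separates the arguments it understands from everything else. The sketch below is an illustration only: KNOWN_ARGS, split_kwargs, and merge_request are hypothetical names, and the rule that per-call values override constructor-level defaults is an assumption, not documented behavior.

```python
from typing import Any

# Hypothetical set of arguments the client itself consumes during resolution.
KNOWN_ARGS = {"preset", "model", "provider", "gateway", "base_url"}


def split_kwargs(**kwargs: Any) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate resolution arguments from provider request defaults."""
    resolution = {k: v for k, v in kwargs.items() if k in KNOWN_ARGS}
    request_defaults = {k: v for k, v in kwargs.items() if k not in KNOWN_ARGS}
    return resolution, request_defaults


def merge_request(defaults: dict[str, Any], **per_call: Any) -> dict[str, Any]:
    """Assumed precedence: values passed at call time override the defaults."""
    return {**defaults, **per_call}


resolution, defaults = split_kwargs(model="ds-flash", max_tokens=256, temperature=0)
print(resolution)                               # {'model': 'ds-flash'}
print(merge_request(defaults, max_tokens=64))   # {'max_tokens': 64, 'temperature': 0}
```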

2. Presets

Presets use persistable model aliases so user configs stay compact and readable. Most production presets do not pin a provider; changing heavenbase.llm.default_provider switches them together. Local and special presets such as local, worker-local, embed-local, imagen-local, ocr-local, mock, and custom keep their own provider because they require a dedicated setup.

Preset	Model alias	Thinking default	Description
`system`	`ds-flash`	disabled	Default lightweight system LLM for short orchestration calls.
`tiny`	`gemma`	disabled	Tiny chat model for low-latency offline work.
`chat`	`ds-flash`	disabled	General chat preset for fast non-thinking answers.
`chat-pro`	`ds-pro`	disabled	Stronger chat preset that still avoids reasoning by default.
`reason`	`ds-pro`	enabled	Reasoning preset for harder tasks that should expose thinking when supported.
`reason-pro`	`gpt`	enabled	Higher-capability reasoning preset backed by the default GPT alias.
`worker`	`ds-flash`	disabled	Background worker preset for deterministic non-thinking utility calls.
`local`	`qwen3.6-flash`	disabled	Local LM Studio chat preset.
`worker-local`	`qwen3.6-flash`	disabled	Local LM Studio worker preset.
`coder`	`sonnet`	enabled	Coding preset with thinking enabled for multi-step implementation work.
`coder-pro`	`opus`	enabled	Highest-end coding preset for deep implementation and review work.
`embed`	`gpt-embedding-small`	n/a	Default embedding preset.
`embed-local`	`embeddinggemma`	n/a	Local embedding preset.
`imagen`	`gpt-image-mini`	n/a	Default image-generation preset.
`imagen-local`	`z-image-turbo`	n/a	Local Ollama image-generation preset.
`ocr-local`	`glm-ocr`	disabled	Local Ollama OCR preset.
`mock`	`mock`	offline	Non-LLM deterministic mock preset for tests and demos.
`custom`	`custom`	caller supplied	Caller-supplied OpenAI-compatible provider preset.
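The difference between inheriting and pinned presets can be sketched as follows. The table and function names are hypothetical, and the provider keys "lmstudio" and "mock" are illustrative guesses rather than confirmed HeavenBase identifiers.

```python
from typing import Optional

# Hypothetical preset table: None means "inherit default_provider".
PRESET_PROVIDER_PINS: dict[str, Optional[str]] = {
    "system": None,
    "chat": None,
    "coder": None,
    "local": "lmstudio",  # assumed provider key for the LM Studio preset
    "mock": "mock",       # assumed provider key for the offline preset
}


def effective_providers(default_provider: str) -> dict[str, str]:
    """Return the provider each preset would use for a given root default."""
    return {
        preset: pin or default_provider
        for preset, pin in PRESET_PROVIDER_PINS.items()
    }


before = effective_providers("openrouter")
after = effective_providers("deepseek")
changed = sorted(p for p in before if before[p] != after[p])
print(changed)  # ['chat', 'coder', 'system']
```

Only the presets without a pin follow the new default; the pinned ones keep their dedicated provider.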

Preset thinking defaults use the canonical think option. HeavenBase applies gateway-level control through extra_body.reasoning for both think=True and think=False on the OpenAI-compatible gateways (openai, portkey, bifrost, and litellm). The anthropic gateway maps think=True to Claude Messages adaptive thinking with summarized display, maps reasoning_effort to Anthropic effort, and normalizes native thinking blocks back to the think include field. See LLM Chat for the per-call thinking controls.
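As a rough sketch of the OpenAI-compatible side, the think option could be turned into request keyword arguments like this. The function name and the "enabled" key under extra_body.reasoning are assumptions made for illustration; the actual payload HeavenBase sends may be shaped differently, and the anthropic mapping is deliberately left out.

```python
from typing import Any, Optional

# Gateways the text lists as OpenAI-compatible.
OPENAI_COMPATIBLE_GATEWAYS = {"openai", "portkey", "bifrost", "litellm"}


def reasoning_kwargs(gateway: str, think: Optional[bool]) -> dict[str, Any]:
    """Build request kwargs that carry explicit thinking control.

    Both True and False produce a payload, so "thinking off" is sent
    explicitly rather than implied by omission.
    """
    if gateway not in OPENAI_COMPATIBLE_GATEWAYS or think is None:
        return {}
    # Assumed payload shape: a boolean flag under extra_body.reasoning.
    return {"extra_body": {"reasoning": {"enabled": think}}}


print(reasoning_kwargs("openai", True))
print(reasoning_kwargs("litellm", False))
print(reasoning_kwargs("anthropic", True))  # handled natively, so nothing here
```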

3. Curated Model Catalog

Online bundled models include OpenRouter identifiers and direct-provider identifiers where available. The root default_provider chooses which identifier is used unless a call or preset pins another provider. Local-only entries, such as embeddinggemma, z-image-turbo, and glm-ocr, list only local providers.

Canonical model	Aliases
`deepseek-v4-flash`	`ds-flash`, `deepseek-flash`, `deepseek-chat`
`deepseek-reasoner`	`ds-flash-thinking`, `deepseek-flash-thinking`
`deepseek-v4-pro`	`ds`, `ds-pro`, `dsv4`, `dsv4-pro`, `deepseek`, `deepseek-v4`, `deepseek-pro`
`gpt-5.4-nano`	`gpt-nano`, `5.4-nano`
`gpt-5.4-mini`	`gpt-mini`, `5.4-mini`
`gpt-5.5`	`gpt`, `5.5`
`gpt-5.5-pro`	`gpt-pro`, `5.5-pro`
`claude-haiku-4-5`	`haiku`, `haiku-4.5`
`claude-sonnet-5`	`sonnet`, `sonnet-5`
`claude-sonnet-4-6`	`sonnet-4.6`
`claude-opus-4-8`	`opus`, `opus-4.8`
`claude-fable-5`	`fable`, `fable-5`
`gemini-3.1-flash-lite`	`gemini-flash-lite`, `gemini-lite`
`gemini-3.5-flash`	`gemini-flash`
`gemini-3.1-pro-preview`	`gemini-pro`
`kimi-k2.6`	`kimi`, `k2.6`
`glm-5.2`	`glm`
`glm-4.7-flash`	`glm-flash`, `glm-4.7`
`gemma-4-26b-a4b-it`	`gemma4`, `gemma4-26b`, `gemma4-26b-a4b`, `gemma4-26b-a4b-it`, `gemma`, `gemma-26b`, `gemma-26b-a4b`, `gemma-26b-a4b-it`
`qwen3.6-flash`	`qwen3.6`, `qwen3.6-35b`, `qwen3.6-35b-a3b`, `qwen`, `qwen-flash`, `qwen-35b`, `qwen-35b-a3b`
`embeddinggemma`	`embeddinggemma-300m`
`text-embedding-3-small`	`gpt-embedding`, `gpt-embedding-small`, `text-embedding-small`
`embed-v4.0`	`cohere`, `cohere-embedding`, `cohere-embedding-v4`, `embed-v4`
`voyage-4-lite`	`voyage`, `voyage-lite`
`gpt-5-image-mini`	`gpt-image-mini`, `image-mini`
`gpt-5.4-image-2`	`gpt-image`, `gpt-image-2`, `image-2`
`z-image-turbo`	`z-image`, `image-local`, `imagen-local`
`glm-ocr`	`ocr`

mock and custom are utility model entries for offline tests and caller-supplied OpenAI-compatible providers, respectively. embed-v4.0 and voyage-4-lite are embedding-only catalog entries served by their own providers (not OpenRouter). z-image-turbo is an Ollama-only local entry, while glm-ocr can resolve through Ollama or oMLX. For the full provider list and per-provider setup, see LLM providers.
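Alias resolution of this kind is commonly implemented as a reverse index from every accepted name to its canonical key. The sketch below uses a few rows copied from the catalog table; the CATALOG structure and the function names are hypothetical and do not describe HeavenBase internals.

```python
# A few rows from the catalog table: canonical model -> aliases.
CATALOG = {
    "deepseek-v4-flash": ["ds-flash", "deepseek-flash", "deepseek-chat"],
    "claude-sonnet-5": ["sonnet", "sonnet-5"],
    "gpt-5.5": ["gpt", "5.5"],
    "gpt-5-image-mini": ["gpt-image-mini", "image-mini"],
}


def build_alias_index(catalog: dict[str, list[str]]) -> dict[str, str]:
    """Map every canonical name and alias to its canonical model key."""
    index: dict[str, str] = {}
    for canonical, aliases in catalog.items():
        for name in (canonical, *aliases):
            # setdefault keeps the first owner, so a clash is detectable.
            if index.setdefault(name, canonical) != canonical:
                raise ValueError(f"alias {name!r} maps to two models")
    return index


ALIASES = build_alias_index(CATALOG)


def canonical_model(name: str) -> str:
    """Resolve an alias or canonical key, failing loudly on unknown names."""
    try:
        return ALIASES[name]
    except KeyError:
        raise KeyError(f"unknown model: {name!r}") from None


print(canonical_model("sonnet"))     # claude-sonnet-5
print(canonical_model("ds-flash"))   # deepseek-v4-flash
```

Building the index once and rejecting duplicate aliases up front keeps lookups unambiguous as the catalog grows.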

4. Gateways at a Glance

A gateway is the transport adapter HeavenBase uses after it resolves the preset, model, and provider. The provider decides where the model is served; the gateway decides how the request is sent. heavenbase.llm.default_gateway defaults to openai and is used only when the call, preset, or provider does not pin a gateway.

Gateway	Use when
`openai`	You want the OpenAI Python SDK against an OpenAI-compatible endpoint. This is the default.
`anthropic`	You want the official Anthropic Python SDK and native Messages payloads.
`portkey`	You want Portkey routing, policy, observability, or gateway-side controls.
`bifrost`	You run a Bifrost-compatible gateway and want provider-prefixed model routing.
`litellm`	You already standardize provider routing through LiteLLM’s Python gateway.
`mock`	You need deterministic offline behavior for tests and demos.

If a non-default gateway cannot be imported in the active environment, hb.LLM falls back to the openai gateway, so a missing optional dependency never breaks resolution. See Advanced LLM for gateway materialization, thinking control, client exports, and the temporary upstream limitations per gateway.
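Gateway selection and the import fallback can be sketched together. Everything here is illustrative: pick_gateway and usable_gateway are hypothetical helpers, the package mapping is an assumption, and the relative order of call, preset, and provider pins is a guess, since the text only states that default_gateway applies when none of them pins a gateway.

```python
import importlib.util
from typing import Optional

# Assumed mapping from gateway name to the Python package it imports.
GATEWAY_PACKAGES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "litellm": "litellm",
}


def pick_gateway(
    call: Optional[str] = None,
    preset: Optional[str] = None,
    provider: Optional[str] = None,
    default: str = "openai",
) -> str:
    """Return the first pinned gateway, else the configured default."""
    for candidate in (call, preset, provider):  # assumed precedence order
        if candidate:
            return candidate
    return default


def usable_gateway(
    name: str,
    packages: Optional[dict[str, str]] = None,
    fallback: str = "openai",
) -> str:
    """Fall back when the gateway's optional package is not installed."""
    package = (packages or GATEWAY_PACKAGES).get(name)
    if package is None:
        return name  # no optional dependency known for this gateway
    if importlib.util.find_spec(package) is None:
        return fallback
    return name


chosen = pick_gateway(preset="anthropic")
print(usable_gateway(chosen))  # "anthropic" if installed, otherwise "openai"
```

Checking with importlib.util.find_spec avoids importing the package just to test for its presence.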

5. What Lives Where

The LLM section is organized so each page adds one layer of capability:

LLM Chat - chat, stream, message inputs, multimodal images, response projection, and reasoning controls.
Embeddings - embed, batching, deduplication, cache, and local embedding providers.
Tool Use - schema-only and executable tools, MCP Toolkits, structured output, and CLI tool loops.
Sessions - the stateless design, LLMSession, CLI sessions, and resolved-spec inspection.
Advanced LLM - gateways, response caching, client exports, image generation, LLMImage, tool-call repair, and async APIs.

Further Exploration

Related resources:

First LLM - install, set an API key, and run your first chat, embed, and session from the CLI.
LLM providers - the full provider catalog and per-provider configuration.
Configuration - the heavenbase.llm config tree behind presets, providers, and gateways.
