hb.LLM is the HeavenBase Python client for chat, streaming, embeddings, image generation, mocks, and gateway routing. It resolves a preset, model, provider, and gateway from the shared heavenbase.llm config, then materializes the request format selected by the gateway.
The defaults are preset="system", default_provider="openrouter", and gateway="openai".
Resolution model
- preset: a named shortcut such as system, chat, reason, coder, embed, imagen, mock, or custom.
- model: a canonical model key or alias such as ds-flash, sonnet, gpt, or gpt-image-mini.
- provider: where the model is served. Normal presets inherit heavenbase.llm.default_provider; an explicit provider= argument or a preset-level provider pin overrides it.
- gateway: how the request is transported. The default openai gateway uses the OpenAI Python SDK against OpenAI-compatible endpoints; anthropic uses the official Anthropic SDK and native Messages payloads.
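The four-step resolution can be pictured as a chain of lookups. The sketch below is a minimal model under stated assumptions, not HeavenBase's implementation: the CONFIG dictionary is a hypothetical stand-in for the heavenbase.llm config, resolve and Resolved are hypothetical names, the ollama pin for embed-local is assumed, and the sketch assumes an explicit provider= argument wins over a preset-level pin.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the heavenbase.llm config (a small subset of the documented tables).
CONFIG: dict = {
    "default_provider": "openrouter",
    "default_gateway": "openai",
    "presets": {
        "system": {"model": "ds-flash"},
        "coder": {"model": "sonnet"},
        # Local preset keeps its own provider; "ollama" is an assumed value.
        "embed-local": {"model": "embeddinggemma", "provider": "ollama"},
    },
    "aliases": {
        "ds-flash": "deepseek-v4-flash",
        "sonnet": "claude-sonnet-4-6",
        "embeddinggemma-300m": "embeddinggemma",
    },
}


@dataclass(frozen=True)
class Resolved:
    preset: str
    model: str
    provider: str
    gateway: str


def resolve(preset: str = "system", model: Optional[str] = None,
            provider: Optional[str] = None, gateway: Optional[str] = None,
            config: dict = CONFIG) -> Resolved:
    spec = config["presets"][preset]
    # model: explicit argument, else the preset's alias; aliases map to canonical keys.
    alias = model or spec["model"]
    canonical = config["aliases"].get(alias, alias)
    # provider: explicit argument, else a preset pin, else the root default (assumed order).
    chosen_provider = provider or spec.get("provider") or config["default_provider"]
    # gateway: explicit argument, else the default transport.
    chosen_gateway = gateway or config["default_gateway"]
    return Resolved(preset, canonical, chosen_provider, chosen_gateway)


if __name__ == "__main__":
    print(resolve("coder", provider="anthropic", gateway="anthropic"))
    # Resolved(preset='coder', model='claude-sonnet-4-6', provider='anthropic', gateway='anthropic')
```

Because only presets without a pin fall through to default_provider, changing that one root value moves every unpinned preset at once while embed-local stays on its local runtime.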
Presets
Presets use persistable model aliases so user configs stay compact and readable. Most production presets do not pin a provider, so changing heavenbase.llm.default_provider switches all of them at once. Local and special presets such as embed-local, mock, and custom keep their own provider because they require a specific runtime.
| Preset | Model alias | Thinking default | Description |
|---|---|---|---|
| system | ds-flash | disabled | Default lightweight system LLM for short orchestration calls. |
| tiny | gemma | disabled | Tiny chat model for low-latency offline work. |
| chat | ds-flash | disabled | General chat preset for fast non-thinking answers. |
| chat-pro | ds-pro | disabled | Stronger chat preset that still avoids reasoning by default. |
| reason | ds-pro | enabled | Reasoning preset for harder tasks that should expose thinking when supported. |
| reason-pro | gpt | enabled | Higher-capability reasoning preset backed by the default GPT alias. |
| worker | ds-flash | disabled | Background worker preset for deterministic non-thinking utility calls. |
| coder | sonnet | enabled | Coding preset with thinking enabled for multi-step implementation work. |
| coder-pro | opus | enabled | Highest-end coding preset for deep implementation and review work. |
| embed | gpt-embedding-small | n/a | Default embedding preset. |
| embed-local | embeddinggemma | n/a | Local embedding preset. |
| imagen | gpt-image-mini | n/a | Default image-generation preset. |
| mock | mock | offline | Non-LLM deterministic mock preset for tests and demos. |
| custom | custom | runtime supplied | Runtime-supplied OpenAI-compatible provider preset. |
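The Thinking default column can be read as a per-preset fallback for think. Below is a minimal sketch that relies on two assumptions this page does not confirm: an explicit think= argument overrides the preset default, and presets marked n/a, offline, or runtime supplied carry no default, which is represented here as None. effective_think is a hypothetical helper.

```python
from typing import Optional

# Thinking defaults copied from the preset table; None marks n/a, offline, and runtime-supplied rows.
THINK_DEFAULTS: dict[str, Optional[bool]] = {
    "system": False, "tiny": False, "chat": False, "chat-pro": False,
    "reason": True, "reason-pro": True, "worker": False,
    "coder": True, "coder-pro": True,
    "embed": None, "embed-local": None, "imagen": None,
    "mock": None, "custom": None,
}


def effective_think(preset: str, think: Optional[bool] = None) -> Optional[bool]:
    """Explicit argument first, then the preset default (assumed precedence)."""
    if preset not in THINK_DEFAULTS:
        raise KeyError(f"unknown preset: {preset!r}")
    return think if think is not None else THINK_DEFAULTS[preset]
```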
The think option. On the OpenAI-compatible gateways (openai, portkey, bifrost, and litellm), HeavenBase sends gateway-level reasoning control through extra_body.reasoning for both think=True and think=False. The anthropic gateway maps think=True to Claude Messages adaptive thinking with summarized display, maps reasoning_effort to Anthropic effort, and normalizes native thinking blocks back into the think include field. Provider-specific local-server options can still be passed explicitly through extra_body.
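The way one think flag fans out per gateway can be sketched as a payload builder plus a response normalizer. This is an illustration only. The page does not document the exact bodies HeavenBase sends, so the following are all assumptions: the {"enabled": ...} shape of extra_body["reasoning"] (modeled on OpenRouter's reasoning parameter), the Anthropic "thinking" and "output_config" field names, the think/text output keys, and the helper names build_reasoning_fields and normalize_thinking. The summarized-display setting is left out because its field name is not given here.

```python
from typing import Any, Optional

OPENAI_COMPATIBLE = {"openai", "portkey", "bifrost", "litellm"}


def build_reasoning_fields(gateway: str, think: bool,
                           reasoning_effort: Optional[str] = None) -> dict[str, Any]:
    """Return extra request fields for one call (assumed payload shapes)."""
    if gateway in OPENAI_COMPATIBLE:
        # Sent for think=True and think=False alike; reasoning_effort is ignored in this sketch.
        return {"extra_body": {"reasoning": {"enabled": think}}}
    if gateway == "anthropic":
        fields: dict[str, Any] = {}
        if think:
            fields["thinking"] = {"type": "adaptive"}  # assumed native field
        if reasoning_effort:
            fields["output_config"] = {"effort": reasoning_effort}  # assumed native field
        return fields
    raise ValueError(f"unknown gateway: {gateway!r}")


def normalize_thinking(blocks: list[dict[str, Any]]) -> dict[str, str]:
    """Fold native Messages content blocks into hypothetical think/text fields."""
    think = "".join(b.get("thinking", "") for b in blocks if b.get("type") == "thinking")
    text = "".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    return {"think": think, "text": text}
```

Sending the reasoning object even for think=False mirrors the text above: it lets the gateway switch reasoning off explicitly instead of relying on a provider default.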
Gateway and endpoint decision
Endpoint selection stays inside provider and gateway config. The resolution path is preset -> model -> provider -> gateway; the final URL comes from provider.base_url, gateway.base_url, or a runtime base_url=... override.
Do not add a separate endpoint layer unless a real workload needs runtime endpoint policies. Use provider="anthropic", gateway="openai" for quick Claude compatibility checks, provider="anthropic", gateway="anthropic" for the native Claude Messages format, and gateway="portkey" when you want routing, observability, or policy controls.
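The endpoint rule fits in one function. The sketch below is minimal and assumes an order of precedence that the page does not state (it lists the three sources but not their order): a runtime base_url= override beats gateway.base_url, which in turn beats provider.base_url. pick_base_url and ROUTES are hypothetical names, and the URLs are placeholders rather than HeavenBase's shipped values.

```python
from typing import Optional


def pick_base_url(provider_url: Optional[str], gateway_url: Optional[str] = None,
                  override: Optional[str] = None) -> str:
    """Choose the request URL: runtime override, then gateway, then provider (assumed order)."""
    for candidate in (override, gateway_url, provider_url):
        if candidate:
            return candidate
    raise ValueError("no base_url configured for this provider/gateway pair")


# The provider/gateway pairings recommended above, written as keyword-argument sets.
ROUTES = {
    "claude-compat-check": {"provider": "anthropic", "gateway": "openai"},
    "claude-native": {"provider": "anthropic", "gateway": "anthropic"},
    "routed": {"gateway": "portkey"},
}
```

Keeping the choice to three optional strings means no separate endpoint layer is needed. A workload that truly needs runtime endpoint policies would be the point at which to revisit that.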
For GLM tool-call validation, keep native OpenAI JSON tools as the default. Start with glm-flash through OpenRouter via Portkey when you want a broadly compatible route.
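For reference, a native OpenAI JSON tool is a {"type": "function", "function": {...}} object, and its parameters field is a JSON Schema. The sketch below builds one such tool plus the keyword set for the suggested GLM route. The get_weather tool, GLM_ROUTE, and check_tool are hypothetical, check_tool covers only the fields used here and is not full JSON Schema validation, and this page does not show how the tool list is passed to hb.LLM.

```python
import json
from typing import Any

# Hypothetical tool in the OpenAI Chat Completions "tools" format.
WEATHER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

# Broadly compatible GLM route from the text: glm-flash, served by OpenRouter, transported via Portkey.
GLM_ROUTE = {"model": "glm-flash", "provider": "openrouter", "gateway": "portkey"}


def check_tool(tool: dict[str, Any]) -> list[str]:
    """Return a list of structural problems (empty when the tool looks well formed)."""
    fn = tool.get("function", {})
    params = fn.get("parameters", {})
    problems = []
    if tool.get("type") != "function":
        problems.append("type must be 'function'")
    if not fn.get("name"):
        problems.append("function.name is required")
    if params.get("type") != "object":
        problems.append("parameters must be a JSON Schema object")
    missing = set(params.get("required", [])) - set(params.get("properties", {}))
    if missing:
        problems.append(f"required without schema: {sorted(missing)}")
    json.dumps(tool)  # raises TypeError if the tool is not JSON-serializable
    return problems
```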
Local live checks can read provider API keys, base URLs, and proxy settings (HTTP_PROXY, HTTPS_PROXY, and NO_PROXY) exported from ~/.bashrc. Adjust the proxy exports when a VPN or TUN mode changes whether GLM, Anthropic, or OpenRouter endpoints are reachable.
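Before a live check, it can help to confirm which proxy exports the process actually sees. This is a minimal standard-library sketch. proxy_settings and bypasses_proxy are hypothetical helpers, and the NO_PROXY matching is a simplified exact-or-domain-suffix rule that does not reproduce every HTTP client's semantics.

```python
import os
from typing import Mapping

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect proxy exports, accepting upper- or lower-case variable names."""
    found = {}
    for name in PROXY_VARS:
        value = env.get(name) or env.get(name.lower())
        if value:
            found[name] = value
    return found


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Simplified NO_PROXY check: '*', an exact host, or a domain-suffix match."""
    host = host.lower()
    for raw in no_proxy.split(","):
        entry = raw.strip().lstrip(".").lower()
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False


if __name__ == "__main__":
    settings = proxy_settings()
    for host in ("openrouter.ai", "api.anthropic.com"):
        direct = bypasses_proxy(host, settings.get("NO_PROXY", ""))
        print(host, "->", "direct" if direct else settings.get("HTTPS_PROXY", "direct"))
```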
Curated model catalog
Bundled online models carry an OpenRouter identifier and, where available, a direct-provider identifier. The root default_provider chooses which identifier is used unless a call or preset pins another provider. Local-only entries, such as embeddinggemma, list only local providers.
| Canonical model | Aliases |
|---|---|
| deepseek-v4-flash | ds-flash, deepseek-flash, deepseek-chat |
| deepseek-reasoner | ds-flash-thinking, deepseek-flash-thinking |
| deepseek-v4-pro | ds, ds-pro, dsv4, dsv4-pro, deepseek, deepseek-v4, deepseek-pro |
| gpt-5.4-nano | gpt-nano, 5.4-nano |
| gpt-5.5 | gpt, 5.5 |
| gpt-5.5-pro | gpt-pro, 5.5-pro |
| claude-haiku-4-5 | haiku, haiku-4.5 |
| claude-sonnet-4-6 | sonnet, sonnet-4.6 |
| claude-opus-4-7 | opus, opus-4.7 |
| gemini-3.1-flash-lite | gemini-flash-lite, gemini-lite |
| gemini-3-flash-preview | gemini-flash |
| gemini-3.1-pro-preview | gemini-pro |
| kimi-k2.6 | kimi, k2.6 |
| glm-5.1 | glm |
| glm-4.7-flash | glm-flash, glm-4.7 |
| gemma-4-26b-a4b-it | gemma4, gemma4-26b, gemma4-26b-a4b, gemma4-26b-a4b-it, gemma, gemma-26b, gemma-26b-a4b, gemma-26b-a4b-it |
| qwen3.6-flash | qwen3.6, qwen3.6-flash, qwen3.6-35b, qwen3.6-35b-a3b, qwen, qwen-flash, qwen-35b, qwen-35b-a3b |
| embeddinggemma | embeddinggemma-300m |
| text-embedding-3-small | gpt-embedding, gpt-embedding-small, text-embedding-small |
| embed-v4.0 | cohere, cohere-embedding, cohere-embedding-v4, embed-v4 |
| voyage-4-lite | voyage, voyage-lite |
| gpt-5-image-mini | gpt-image-mini, image-mini |
| gpt-5.4-image-2 | gpt-image, gpt-image-2, image-2 |
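The alias column can be turned into a lookup table that maps every alias, and every canonical key, to its canonical model. The sketch below works over a subset of the table's rows and aliases. build_alias_index is hypothetical, and its collision check rests on an assumption: each alias should point to exactly one model.

```python
# A subset of rows (and aliases) from the catalog table: canonical key -> aliases.
CATALOG_ALIASES: dict[str, list[str]] = {
    "deepseek-v4-flash": ["ds-flash", "deepseek-flash", "deepseek-chat"],
    "deepseek-v4-pro": ["ds", "ds-pro", "dsv4", "dsv4-pro", "deepseek", "deepseek-v4", "deepseek-pro"],
    "claude-sonnet-4-6": ["sonnet", "sonnet-4.6"],
    "qwen3.6-flash": ["qwen3.6", "qwen3.6-flash", "qwen"],
    "text-embedding-3-small": ["gpt-embedding", "gpt-embedding-small", "text-embedding-small"],
}


def build_alias_index(catalog: dict[str, list[str]]) -> dict[str, str]:
    """Map canonical keys and aliases to canonical keys, rejecting ambiguous aliases."""
    index: dict[str, str] = {}
    for canonical, aliases in catalog.items():
        for name in (canonical, *aliases):
            owner = index.setdefault(name, canonical)
            if owner != canonical:
                raise ValueError(f"alias {name!r} maps to both {owner!r} and {canonical!r}")
    return index
```

An alias that repeats its own canonical key, such as qwen3.6-flash, is harmless under this rule because it still points to the same model.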
mock and custom are utility model entries for offline tests and runtime-supplied OpenAI-compatible providers. Built-in provider configs include OpenRouter, OpenAI, Anthropic, Gemini, Grok, DeepSeek, Moonshot, Z.ai, MiniMax, Cohere, Voyage, DashScope, Ollama, LM Studio, vLLM, mock, and custom. embed-v4.0 and voyage-4-lite are embedding-only catalog entries served by their own providers (not OpenRouter).
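Each online catalog entry carries one identifier per provider that serves it, and the root default_provider picks among them unless a call or preset pins another provider. The sketch below shows that selection. The identifier strings are illustrative placeholders (OpenRouter-style vendor/model slugs), not values read from the bundled catalog, the ollama entry for embeddinggemma is assumed, and model_id_for is hypothetical.

```python
from typing import Optional

# Illustrative provider -> identifier maps; the strings are placeholders, not the bundled catalog.
PROVIDER_IDS: dict[str, dict[str, str]] = {
    "deepseek-v4-flash": {"openrouter": "deepseek/deepseek-v4-flash", "deepseek": "deepseek-v4-flash"},
    "embeddinggemma": {"ollama": "embeddinggemma"},  # local-only entry (assumed local provider)
    "embed-v4.0": {"cohere": "embed-v4.0"},          # embedding-only, served by its own provider
}


def model_id_for(canonical: str, default_provider: str,
                 pinned: Optional[str] = None) -> tuple[str, str]:
    """Return (provider, identifier): a pinned provider wins, else the root default."""
    served_by = PROVIDER_IDS[canonical]
    provider = pinned or default_provider
    if provider not in served_by:
        raise LookupError(
            f"{canonical!r} is not served by {provider!r}; available: {sorted(served_by)}")
    return provider, served_by[provider]
```

The sketch raises an error instead of silently falling back. That choice is deliberate: a local-only or provider-specific model routed to the wrong provider, such as embed-v4.0 under an OpenRouter default, is surfaced immediately.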

