Defaults get you started. This page is for when you need to see the wiring.
Most workflows never leave hb.LLM(preset="chat") and the default OpenAI-compatible gateway. When you route through Portkey, cache deterministic calls, export SDK clients, or generate images, the same resolution model still applies — preset, model, provider, gateway.

1. Gateways in Depth

The default gateway is openai. It uses the OpenAI Python SDK against the provider’s OpenAI-compatible endpoint, so no Portkey, LiteLLM, Bifrost, or native Anthropic SDK process is required.
| Gateway | Use when |
| --- | --- |
| openai | You want the OpenAI Python SDK against an OpenAI-compatible endpoint. This is the default. |
| anthropic | You want the official Anthropic Python SDK and native Messages payloads. |
| portkey | You want gateway-level routing, observability, caching, or policy controls. |
| bifrost | You run a Bifrost-compatible gateway and want provider-prefixed model routing. |
| litellm | You already standardize provider routing through LiteLLM’s Python gateway. |
| mock | You need deterministic offline behavior for tests and demos. |
To route through Portkey, export the API key in your shell, then select the gateway in Python:
export PORTKEY_API_KEY="..."
import heavenbase as hb

llm = hb.LLM(model="ds-flash", provider="deepseek", gateway="portkey")
Other gateway notes:
  • litellm sends provider-prefixed model IDs such as deepseek/deepseek-v4-flash.
  • bifrost sends provider-prefixed model IDs and uses BIFROST_BASE_URL, defaulting to http://localhost:8080/v1.
  • mock stays offline and uses the built-in mock adapter.
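The gateway notes above can be sketched in a few lines of plain Python. This is a minimal illustration, not HeavenBase's implementation: the helper names provider_prefixed_model and bifrost_base_url are hypothetical, and only the provider/model format, the BIFROST_BASE_URL variable, and its default come from the notes.

```python
import os


def provider_prefixed_model(provider: str, model: str) -> str:
    """Return `provider/model`, leaving an already-prefixed ID unchanged."""
    prefix = f"{provider}/"
    return model if model.startswith(prefix) else prefix + model


def bifrost_base_url(env=None) -> str:
    """Read BIFROST_BASE_URL, falling back to the documented local default."""
    env = os.environ if env is None else env  # injectable mapping for tests
    return env.get("BIFROST_BASE_URL") or "http://localhost:8080/v1"


print(provider_prefixed_model("deepseek", "deepseek-v4-flash"))
print(bifrost_base_url({}))
```

Passing the environment as an argument keeps the helper easy to test without mutating os.environ.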
Portkey, LiteLLM, and Bifrost can route Anthropic traffic today. Prefer switching endpoints through the provider, gateway, and base_url settings; add a separate payload-family policy only when a workload needs one that is independent of those layers. Temporary upstream limitations raise explicit errors:
  • gateway="portkey" with provider="openrouter" for embeddings is blocked until Portkey Gateway support lands.
  • Bifrost image generation is blocked while upstream image support is unresolved.
  • The native anthropic gateway raises for embeddings and image generation because Claude Messages does not provide those operations.
If a non-default gateway cannot be imported by the active environment, hb.LLM falls back to the openai gateway. For step-by-step gateway setup, see First LLM §5.2 and LLM providers.
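The import-based fallback can be pictured with the standard library alone. The sketch below is an assumption about the general shape of such a check, not the library's code: resolve_gateway and the GATEWAY_MODULES mapping are hypothetical names, and the module names in the mapping are illustrative.

```python
import importlib.util

# Hypothetical mapping from gateway name to the top-level module it needs.
# Gateways without an entry are assumed to need no extra import.
GATEWAY_MODULES = {
    "openai": "openai",
    "anthropic": "anthropic",
    "portkey": "portkey_ai",
    "litellm": "litellm",
}


def resolve_gateway(requested: str, modules=None, default: str = "openai") -> str:
    """Return `requested` if its module is importable, otherwise `default`."""
    modules = GATEWAY_MODULES if modules is None else modules
    module_name = modules.get(requested)
    if module_name is None:
        return requested
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return default
    return requested


print(resolve_gateway("portkey", modules={"portkey": "hb_missing_module_xyz"}))
```

Checking with find_spec avoids importing a heavy SDK just to learn whether it is installed.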

2. Response Caching

HeavenBase caches normalized LLM responses in a dedicated llm-cache workspace backed by SQLite entities. Three namespaces exist: text for chat completions, embedding for vectors, and image for generated images. Caching is enabled by default (heavenbase.llm.cache.enabled: true). Disable it when you need a fresh provider call:
import heavenbase as hb

llm = hb.LLM(preset="chat", cache=False)
text = llm.chat("Reply with exactly: hb-ok")
Per-call overrides accept cache=False, cache=True, or a custom cache config dict. Chat caching skips tool loops automatically: executable tools disable the text cache for that call.
The default policy is deterministic. Text and image cache writes require deterministic request args: temperature=0 (or unset with a fixed seed), default top_p/top_k, and for images a set seed. Stochastic calls without a seed bypass cache reads and writes.
Configure namespaces under heavenbase.llm.cache.namespaces:
hb cfg set heavenbase.llm.cache.namespaces.text.ttl_seconds 86400
hb cfg set heavenbase.llm.cache.namespaces.embedding.enabled true
Embedding cache deduplicates by input hash inside the batching path described in Embeddings.
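One way to read the deterministic policy is as a small predicate over the request arguments plus a stable hash for the cache key. The sketch below is an interpretation under stated assumptions, not the library's logic: is_cacheable and cache_key are hypothetical names, embeddings are assumed to be always cacheable, and "default top_p/top_k" is read as "not set".

```python
import hashlib
import json


def is_cacheable(namespace: str, args: dict) -> bool:
    """Decide whether a request is deterministic enough to cache."""
    if namespace == "embedding":
        return True  # assumption: embedding requests carry no sampling args
    if args.get("top_p") is not None or args.get("top_k") is not None:
        return False  # non-default sampling parameters
    seed = args.get("seed")
    if namespace == "image":
        return seed is not None  # images need an explicit seed
    temperature = args.get("temperature")
    if temperature == 0:
        return True
    # Unset temperature is acceptable only with a fixed seed.
    return temperature is None and seed is not None


def cache_key(namespace: str, model: str, args: dict) -> str:
    """Hash a normalized request so argument order does not matter."""
    blob = json.dumps(
        {"ns": namespace, "model": model, "args": args},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


print(is_cacheable("text", {"temperature": 0}))
print(is_cacheable("text", {"temperature": 0.7}))
```

Sorting keys before hashing means two calls with the same arguments in a different order map to the same entry.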

3. Client Reuse and SDK Exports

Every resolved LLMSpec can produce deterministic hash keys:
import heavenbase as hb

llm = hb.LLM(model="ds-flash", provider="deepseek")

spec_key = llm.spec.hash_key()
client_key = llm.spec.client_key()
hash_key() includes the resolved model, provider, gateway mode, request defaults, and materialized resolved values. client_key() includes only gateway client construction fields: gateway, API key, base URL, headers, timeout, and retries.
SDK adapters for openai, portkey, bifrost, and anthropic keep an in-memory cache keyed by client_key(), so duplicated LLM instances reuse the same SDK client.
An OpenAI-compatible LLM instance can export raw SDK clients when an external library owns the call loop:
llm = hb.LLM(preset="chat")
client = llm.to_client()       # openai.OpenAI
aclient = llm.to_aclient()     # openai.AsyncOpenAI

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    **llm.to_args(),
)
to_client(), to_aclient(), and to_args() work for OpenAI-compatible gateways: openai, portkey, or bifrost. They raise ValueError for litellm, anthropic, and mock. For the native Anthropic gateway:
llm = hb.LLM(model="sonnet", provider="anthropic", gateway="anthropic")
client = llm.to_anthropic_client()
aclient = llm.to_anthropic_aclient()
See First LLM §5.3 for OpenAI Agents SDK integration patterns.
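The client reuse described earlier in this section amounts to memoizing on the construction fields. Here is a minimal sketch of that idea, assuming a SHA-256 digest over the fields; make_client_key, get_client, and the factory argument are hypothetical stand-ins, not the library's internals.

```python
import hashlib
import json

_CLIENTS = {}  # in-memory cache: client key -> constructed client


def make_client_key(gateway, api_key, base_url, headers=None, timeout=None, retries=None):
    """Hash only the fields that affect how an SDK client is constructed."""
    payload = {
        "gateway": gateway,
        "api_key": api_key,
        "base_url": base_url,
        "headers": headers or {},
        "timeout": timeout,
        "retries": retries,
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def get_client(factory, **fields):
    """Build a client once per distinct key; later calls reuse it."""
    key = make_client_key(**fields)
    if key not in _CLIENTS:
        _CLIENTS[key] = factory(**fields)
    return _CLIENTS[key]


# `dict` stands in for a real SDK client constructor.
a = get_client(dict, gateway="openai", api_key="k", base_url="https://example.com/v1")
b = get_client(dict, gateway="openai", api_key="k", base_url="https://example.com/v1")
print(a is b)
```

Because request defaults such as the model are left out of the key, instances that differ only in those defaults would share one client in this sketch.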

4. Image Generation

Use the imagen method to generate images:
import heavenbase as hb

image = hb.LLM(preset="imagen").imagen("A clean product render of a white ceramic mug")
image.save("mug.png")
image.to_pil().show()
Built-in image models are gpt-5-image-mini and gpt-5.4-image-2:
image = hb.LLM(model="gpt-image-2").imagen("A small product icon for HB")
Image responses normalize into LLMImage objects when possible; raw provider payloads remain available through include="raw". imagen accepts the same images= input formats as chat for reference images:
reference = hb.LLMImage.from_any("./style-reference.png")
image = hb.LLM(preset="imagen").imagen("Apply this style to an HB mark", images=reference)

5. LLMImage API

LLMImage is the shared image type for chat inputs, reference images, and generation output. Factory methods normalize common sources:
from heavenbase.utils import LLMImage

img = LLMImage.from_any("./photo.png")
img = LLMImage.from_bytes(raw_bytes, format="png")
img = LLMImage.from_b64("...")
img = LLMImage.from_url("https://example.com/image.png")
img = LLMImage.from_provider_item(provider_response_item)
Conversion helpers:
img.to_bytes()
img.to_b64()
img.to_data_url()
img.to_dict()      # OpenAI-compatible image_url content part
img.to_pil()
img.save("out.png")
URL-backed LLMImage values fetch lazily only when converted to bytes, base64, a data URL, or saved. The fetch timeout is configured by heavenbase.llm.image_url_timeout.
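Lazy fetching can be illustrated with a small stand-in class. This is a sketch of the pattern only: LazyURLImage is a hypothetical name, the 10-second default timeout is made up, caching the fetched bytes is an assumption, and to_dict is assumed to pass the URL through without fetching.

```python
import base64
import urllib.request


def _http_fetch(url: str, timeout: float) -> bytes:
    """Default fetcher; only called when bytes are actually needed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


class LazyURLImage:
    """URL-backed image that defers the download until conversion."""

    def __init__(self, url, mime="image/png", timeout=10.0, fetch=_http_fetch):
        self.url = url
        self.mime = mime
        self.timeout = timeout
        self._fetch = fetch
        self._data = None  # filled on first conversion

    def to_bytes(self) -> bytes:
        if self._data is None:
            self._data = self._fetch(self.url, self.timeout)
        return self._data

    def to_b64(self) -> str:
        return base64.b64encode(self.to_bytes()).decode("ascii")

    def to_data_url(self) -> str:
        return f"data:{self.mime};base64,{self.to_b64()}"

    def to_dict(self) -> dict:
        # OpenAI-style image_url content part; no download required.
        return {"type": "image_url", "image_url": {"url": self.url}}


img = LazyURLImage("https://example.com/image.png")
print(img.to_dict())  # no network access happens here
```

Injecting the fetch callable keeps the class testable offline and makes the timeout an explicit argument rather than a hidden global.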

6. Tool-Call Repair

LLMToolCallRepair fixes malformed OpenAI-style tool-call argument strings before execution. It strips markdown fences, balances JSON brackets, fills missing required schema fields, and re-serializes compact JSON. Global config lives under heavenbase.llm.tool_call_repair:
hb cfg set heavenbase.llm.tool_call_repair.enabled true
hb cfg set heavenbase.llm.tool_call_repair.strict false
Pass tool_call_repair={...} to hb.LLM(...), or repair_tool_calls=True on a single chat call. When repair is enabled on the instance, repair_tool_calls=True is applied by default. With strict: true, repair raises ValueError when arguments cannot be parsed instead of returning the original string.
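The repair steps listed above (strip fences, balance brackets, fill required fields, re-serialize) can be sketched with the standard library. This is an illustrative approximation, not LLMToolCallRepair itself: repair_tool_args is a hypothetical function, and filling missing fields from the schema's default (or None) is an assumption.

```python
import json
import re


def _balance(text: str) -> str:
    """Append closers for any brackets or strings left open."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    if in_string:
        text += '"'
    return text + "".join(reversed(stack))


def repair_tool_args(raw: str, schema=None, strict: bool = False) -> str:
    """Return compact JSON arguments, or the original string if unparseable."""
    text = raw.strip()
    fenced = re.match(r"^```[\w-]*\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    try:
        data = json.loads(_balance(text))
    except json.JSONDecodeError as exc:
        if strict:
            raise ValueError("tool-call arguments could not be parsed") from exc
        return raw
    if isinstance(data, dict) and schema:
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            # Assumption: use the schema default when present, else None.
            data.setdefault(name, props.get(name, {}).get("default"))
    return json.dumps(data, separators=(",", ":"))


print(repair_tool_args('```json\n{"city": "Oslo", "days": [1, 2\n```'))
```

The non-strict path returns the untouched input, which mirrors the behavior described for strict: false.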

7. Custom OpenAI-Compatible Providers

Use the custom preset for a provider that speaks the OpenAI API but is not in the bundled model catalog:
import heavenbase as hb

llm = hb.LLM(
    preset="custom",
    base_url="http://localhost:9999/v1",
    model="third-party-model",
    api_key="optional-key",
)
The custom provider requires a call-time base_url and a concrete model.

8. Async APIs

Every sync method has an async counterpart:
import asyncio
import heavenbase as hb

llm = hb.LLM(preset="chat")

async def main():
    text = await llm.achat("hello")
    async for chunk in llm.astream("Count to three."):
        print(chunk, end="")
    vec = await llm.aembed("semantic text")
    img = await llm.aimagen("a blue square")

asyncio.run(main())
Async executable tools require achat — sync chat raises when a tool callable is async.
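The sync/async tool rule can be sketched with inspect. The helpers below are hypothetical and only illustrate the check; raising TypeError is an assumption, since the exception type is not specified here.

```python
import asyncio
import inspect


def call_tool_sync(tool, **kwargs):
    """Run a tool from a sync path; refuse coroutine functions."""
    if inspect.iscoroutinefunction(tool):
        raise TypeError(f"{tool.__name__} is async; use the async chat path")
    return tool(**kwargs)


async def call_tool_async(tool, **kwargs):
    """Run a tool from an async path; accepts sync and async callables."""
    result = tool(**kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


async def fetch_double(x: int) -> int:
    await asyncio.sleep(0)  # stands in for real async I/O
    return x * 2


def add_one(x: int) -> int:
    return x + 1


print(asyncio.run(call_tool_async(fetch_double, x=2)))
print(call_tool_sync(add_one, x=1))
```

The async path handles both kinds of callable, which is why an async entry point is the safe choice when a tool list is mixed.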

Further Exploration

Related resources:
  • LLM Overview — presets, model catalog, and resolution model.
  • First LLM — gateway setup and client export walkthrough.
  • LLM providers — per-provider configuration and route checks.