# Advanced LLM

> Gateways, response caching, client exports, image generation, and async APIs.

<Note>
  *Defaults get you started. This page is for when you need to see the wiring.*
</Note>

Most workflows never leave `hb.LLM(preset="chat")` and the default OpenAI-compatible gateway. When you route through Portkey, cache deterministic calls, export SDK clients, or generate images, the same resolution model still applies — preset, model, provider, gateway.

<br />

## 1. Gateways in Depth

The default gateway is `openai`. It uses the OpenAI Python SDK against the provider's OpenAI-compatible endpoint, so no Portkey, LiteLLM, Bifrost, or native Anthropic SDK process is required.

| Gateway     | Use when                                                                                   |
| ----------- | ------------------------------------------------------------------------------------------ |
| `openai`    | You want the OpenAI Python SDK against an OpenAI-compatible endpoint. This is the default. |
| `anthropic` | You want the official Anthropic Python SDK and native Messages payloads.                   |
| `portkey`   | You want gateway-level routing, observability, caching, or policy controls.                |
| `bifrost`   | You run a Bifrost-compatible gateway and want provider-prefixed model routing.             |
| `litellm`   | You already standardize provider routing through LiteLLM's Python gateway.                 |
| `mock`      | You need deterministic offline behavior for tests and demos.                               |

To route through Portkey, export its API key and select the gateway:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
export PORTKEY_API_KEY="..."
```

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

llm = hb.LLM(model="ds-flash", provider="deepseek", gateway="portkey")
```

Other gateway notes:

* `litellm` sends provider-prefixed model IDs such as `deepseek/deepseek-v4-flash`.
* `bifrost` sends provider-prefixed model IDs and uses `BIFROST_BASE_URL`, defaulting to `http://localhost:8080/v1`.
* `mock` stays offline and uses the built-in mock adapter.
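The provider-prefixed form and the Bifrost URL default can be pictured with a small standalone sketch. The helper names `prefixed_model_id` and `bifrost_base_url` are hypothetical and not part of HeavenBase; only the `provider/model` shape, the `BIFROST_BASE_URL` variable, and its default come from the notes above.

```python
import os

DEFAULT_BIFROST_BASE_URL = "http://localhost:8080/v1"


def prefixed_model_id(provider: str, model: str) -> str:
    """Join provider and model into the provider-prefixed form (hypothetical helper)."""
    return f"{provider}/{model}"


def bifrost_base_url(env=None) -> str:
    """Read BIFROST_BASE_URL, falling back to the documented default.

    Assumption: an empty value is treated the same as an unset one.
    """
    env = os.environ if env is None else env
    return env.get("BIFROST_BASE_URL") or DEFAULT_BIFROST_BASE_URL


print(prefixed_model_id("deepseek", "deepseek-v4-flash"))
print(bifrost_base_url({}))
```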

Portkey, LiteLLM, and Bifrost can route Anthropic traffic today. Keep endpoint switching in provider, gateway, and `base_url` config unless a workload needs payload-family policy independent from those layers.

Temporary upstream limitations are raised explicitly:

* `gateway="portkey"` with `provider="openrouter"` for embeddings is blocked until Portkey Gateway support lands.
* Bifrost image generation is blocked while upstream image support is unresolved.
* The native `anthropic` gateway raises for embeddings and image generation because Claude Messages does not provide those operations.

If a non-default gateway cannot be imported by the active environment, `hb.LLM` falls back to the `openai` gateway.
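One way to picture this fallback is an importability check before the gateway is selected. This is a minimal sketch, not HeavenBase's code: `resolve_gateway` and the module mapping are hypothetical, and the package names each gateway actually needs are not stated on this page.

```python
import importlib.util

# Hypothetical mapping from gateway name to the top-level module it needs.
EXAMPLE_GATEWAY_MODULES = {
    "portkey": "hb_sketch_missing_module",  # deliberately not installed
    "litellm": "json",                      # stand-in for an installed module
}


def resolve_gateway(requested: str, modules: dict) -> str:
    """Return the requested gateway, or "openai" when its module cannot be imported."""
    module = modules.get(requested)
    if module is None:
        # No extra dependency recorded for this gateway (for example "openai").
        return requested
    if importlib.util.find_spec(module) is None:
        return "openai"
    return requested


for name in ("portkey", "litellm", "openai"):
    print(name, "->", resolve_gateway(name, EXAMPLE_GATEWAY_MODULES))
```

Because the fallback is silent in this sketch, checking the resolved gateway after construction is a cheap way to confirm which route is in use.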

For step-by-step gateway setup, see [First LLM](/quickstart/first-llm) §5.2 and [LLM providers](/integrations/llm-providers).

<br />

## 2. Response Caching

HeavenBase caches normalized LLM responses in a dedicated `llm-cache` workspace backed by SQLite entities. Three namespaces exist: `text` for chat completions, `embedding` for vectors, and `image` for generated images.

Caching is enabled by default (`heavenbase.llm.cache.enabled: true`). Disable it when you need a fresh provider call:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

llm = hb.LLM(preset="chat", cache=False)
text = llm.chat("Reply with exactly: hb-ok")
```

Per-call overrides accept `cache=False`, `cache=True`, or a custom cache config dict. The chat cache skips tool loops automatically: when a call passes executable tools, the text cache is disabled for that call.

The default policy is `deterministic`. Text and image cache writes require deterministic request args: `temperature=0` (or `temperature` unset with a fixed `seed`), default `top_p`/`top_k`, and, for images, a set `seed`. Stochastic calls without a seed bypass both cache reads and cache writes.
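The deterministic policy can be read as a predicate over request args. The sketch below is one conservative interpretation for the `text` and `image` namespaces, not the library's implementation: `is_cacheable` is a hypothetical name, and treating an unset `top_p`/`top_k` as "default" is an assumption.

```python
def is_cacheable(args: dict, namespace: str = "text") -> bool:
    """Return True when request args look deterministic enough to cache."""
    seed = args.get("seed")
    temperature = args.get("temperature")

    # Assumption: leaving top_p / top_k unset means they are at their defaults.
    if args.get("top_p") is not None or args.get("top_k") is not None:
        return False

    # temperature=0, or temperature unset together with a fixed seed.
    deterministic = temperature == 0 or (temperature is None and seed is not None)

    if namespace == "image":
        # Images additionally need an explicit seed.
        return deterministic and seed is not None
    return deterministic


print(is_cacheable({"temperature": 0}))
print(is_cacheable({"temperature": 0.7}))
print(is_cacheable({"seed": 7}, namespace="image"))
```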

Configure namespaces under `heavenbase.llm.cache.namespaces`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
hb cfg set heavenbase.llm.cache.namespaces.text.ttl_seconds 86400
hb cfg set heavenbase.llm.cache.namespaces.embedding.enabled true
```

Embedding cache deduplicates by input hash inside the batching path described in [Embeddings](/features/llm/embeddings).
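Deduplicating by input hash means each distinct input reaches the provider at most once. The following is a minimal standalone sketch of that idea; `embed_with_cache`, the plain-dict cache, and the `embed_batch` callable are hypothetical, and a real cache key would likely also cover the model and request args.

```python
import hashlib


def embed_with_cache(texts, embed_batch, cache):
    """Embed texts, sending only unseen, de-duplicated inputs to embed_batch."""
    keys = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]

    # Collect unique cache misses in first-seen order.
    missing = {}
    for key, text in zip(keys, texts):
        if key not in cache and key not in missing:
            missing[key] = text

    if missing:
        vectors = embed_batch(list(missing.values()))
        for key, vector in zip(missing, vectors):
            cache[key] = vector

    # Results follow the original input order, duplicates included.
    return [cache[key] for key in keys]


def fake_embed(batch):
    """Stand-in for a provider call: one tiny vector per input."""
    return [[float(len(text))] for text in batch]


store = {}
print(embed_with_cache(["a", "bb", "a"], fake_embed, store))
```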

<br />

## 3. Client Reuse and SDK Exports

Every resolved `LLMSpec` can produce deterministic hash keys:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

llm = hb.LLM(model="ds-flash", provider="deepseek")

spec_key = llm.spec.hash_key()
client_key = llm.spec.client_key()
```

`hash_key()` includes the resolved model, provider, gateway mode, request defaults, and materialized resolved values. `client_key()` includes only gateway client construction fields: gateway, API key, base URL, headers, timeout, and retries.

SDK adapters for `openai`, `portkey`, `bifrost`, and `anthropic` keep an in-memory cache keyed by `client_key()`, so `LLM` instances that resolve to the same client settings reuse one SDK client.
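The difference between the two keys can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the hashing scheme (sorted JSON plus SHA-256), the dict-based spec, and `get_client` are all hypothetical, and only the field split between the two keys follows the description above.

```python
import hashlib
import json

# Fields that, per the text above, affect SDK client construction.
CLIENT_FIELDS = ("gateway", "api_key", "base_url", "headers", "timeout", "retries")


def _digest(fields: dict) -> str:
    """Hash a dict deterministically via sorted, compact JSON."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def client_key(spec: dict) -> str:
    return _digest({name: spec.get(name) for name in CLIENT_FIELDS})


def hash_key(spec: dict) -> str:
    return _digest(spec)


_clients = {}


def get_client(spec: dict, factory):
    """Build a client once per client_key and reuse it afterwards."""
    key = client_key(spec)
    if key not in _clients:
        _clients[key] = factory(spec)
    return _clients[key]


base = {"gateway": "openai", "api_key": "sk-demo", "base_url": "https://api.example.com/v1",
        "headers": {}, "timeout": 30, "retries": 2, "model": "demo-model", "temperature": 0}
warm = {**base, "temperature": 0.7}

print(client_key(base) == client_key(warm))  # same client settings
print(hash_key(base) == hash_key(warm))      # different request defaults
```

Two specs that differ only in request defaults share a client in this sketch, which is the reuse behavior the paragraph describes.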

An OpenAI-compatible `LLM` instance can export raw SDK clients when an external library owns the call loop:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
llm = hb.LLM(preset="chat")
client = llm.to_client()       # openai.OpenAI
aclient = llm.to_aclient()     # openai.AsyncOpenAI

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello"}],
    **llm.to_args(),
)
```

`to_client()`, `to_aclient()`, and `to_args()` work for OpenAI-compatible gateways: `openai`, `portkey`, or `bifrost`. They raise `ValueError` for `litellm`, `anthropic`, and `mock`.

For the native Anthropic gateway:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
llm = hb.LLM(model="sonnet", provider="anthropic", gateway="anthropic")
client = llm.to_anthropic_client()
aclient = llm.to_anthropic_aclient()
```

See [First LLM](/quickstart/first-llm) §5.3 for OpenAI Agents SDK integration patterns.

<br />

## 4. Image Generation

Use `imagen` to generate images:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

image = hb.LLM(preset="imagen").imagen("A clean product render of a white ceramic mug")
image.save("mug.png")
image.to_pil().show()
```

Built-in image models are `gpt-5-image-mini` and `gpt-5.4-image-2`:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
image = hb.LLM(model="gpt-image-2").imagen("A small product icon for HB")
```

Image responses normalize into `LLMImage` objects when possible; raw provider payloads remain available through `include="raw"`.

`imagen` accepts the same `images=` input formats as chat for reference images:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
reference = hb.LLMImage.from_any("./style-reference.png")
image = hb.LLM(preset="imagen").imagen("Apply this style to an HB mark", images=reference)
```

<br />

## 5. LLMImage API

`LLMImage` is the shared image type for chat inputs, reference images, and generation output.

Factory methods normalize common sources:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from heavenbase.utils import LLMImage

img = LLMImage.from_any("./photo.png")
img = LLMImage.from_bytes(raw_bytes, format="png")
img = LLMImage.from_b64("...")
img = LLMImage.from_url("https://example.com/image.png")
img = LLMImage.from_provider_item(provider_response_item)
```

Conversion helpers:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
img.to_bytes()
img.to_b64()
img.to_data_url()
img.to_dict()      # OpenAI-compatible image_url content part
img.to_pil()
img.save("out.png")
```

URL-backed `LLMImage` values fetch lazily only when converted to bytes, base64, a data URL, or saved. The fetch timeout is configured by `heavenbase.llm.image_url_timeout`.
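Lazy fetching can be sketched as a wrapper that stores only the URL until bytes are first needed. `LazyImage` and the injected `fetch` callable are hypothetical stand-ins, not the `LLMImage` implementation, and caching the bytes after the first fetch is an assumption.

```python
import base64


class LazyImage:
    """Hold a URL and download it only when the bytes are requested."""

    def __init__(self, url, fetch, timeout=10.0):
        self.url = url
        self._fetch = fetch      # callable(url, timeout) -> bytes
        self._timeout = timeout
        self._data = None        # nothing is fetched at construction time

    def to_bytes(self) -> bytes:
        if self._data is None:
            self._data = self._fetch(self.url, self._timeout)
        return self._data

    def to_b64(self) -> str:
        return base64.b64encode(self.to_bytes()).decode("ascii")


def fake_fetch(url, timeout):
    """Stand-in for an HTTP GET so the sketch runs offline."""
    print(f"fetching {url} (timeout={timeout})")
    return b"png-bytes"


img = LazyImage("https://example.com/image.png", fake_fetch, timeout=5.0)
img.to_b64()
img.to_bytes()  # served from memory in this sketch
```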

<br />

## 6. Tool-Call Repair

`LLMToolCallRepair` fixes malformed OpenAI-style tool-call argument strings before execution. It strips markdown fences, balances JSON brackets, fills missing required schema fields, and re-serializes compact JSON.

Global config lives under `heavenbase.llm.tool_call_repair`:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
hb cfg set heavenbase.llm.tool_call_repair.enabled true
hb cfg set heavenbase.llm.tool_call_repair.strict false
```

Pass `tool_call_repair={...}` to `hb.LLM(...)` to configure repair for an instance, or pass `repair_tool_calls=True` on a single `chat` call. When repair is enabled on the instance, its `chat` calls behave as if `repair_tool_calls=True` were passed.

With `strict: true`, repair raises `ValueError` when arguments cannot be parsed. Without it, unparseable arguments are returned as the original string.
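The repair steps can be approximated in a few lines. This is a rough sketch of the technique, not `LLMToolCallRepair` itself: `repair_arguments` is a hypothetical function, and filling a missing required field with the schema's `default` (or `None`) is an assumption, since this page does not say which value is used.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker


def _balance(text: str) -> str:
    """Append closers for brackets or strings left open at the end."""
    stack, in_string, escaped = [], False, False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'
    return text + "".join(reversed(stack))


def repair_arguments(raw: str, schema=None, strict=False) -> str:
    # 1. Strip a surrounding markdown fence, with or without a language tag.
    text = re.sub(rf"^{FENCE}[a-zA-Z]*\s*|\s*{FENCE}$", "", raw.strip()).strip()
    # 2. Balance brackets, then parse.
    try:
        args = json.loads(_balance(text))
    except json.JSONDecodeError as exc:
        if strict:
            raise ValueError(f"cannot repair tool-call arguments: {raw!r}") from exc
        return raw
    # 3. Fill missing required fields (assumed: schema default, else None).
    if isinstance(args, dict) and schema:
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            args.setdefault(name, props.get(name, {}).get("default"))
    # 4. Re-serialize as compact JSON.
    return json.dumps(args, separators=(",", ":"))


schema = {"required": ["city", "unit"], "properties": {"unit": {"default": "celsius"}}}
broken = FENCE + 'json\n{"city": "Paris", "days": [1, 2\n' + FENCE
print(repair_arguments(broken, schema))
```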

<br />

## 7. Custom OpenAI-Compatible Providers

Use the `custom` preset for a provider that speaks the OpenAI API but is not in the bundled model catalog:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

llm = hb.LLM(
    preset="custom",
    base_url="http://localhost:9999/v1",
    model="third-party-model",
    api_key="optional-key",
)
```

The custom provider requires a call-time `base_url` and a concrete `model`.

<br />

## 8. Async APIs

Every sync method has an async counterpart:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import asyncio
import heavenbase as hb

llm = hb.LLM(preset="chat")

async def main():
    text = await llm.achat("hello")
    async for chunk in llm.astream("Count to three."):
        print(chunk, end="")
    vec = await llm.aembed("semantic text")
    img = await llm.aimagen("a blue square")

asyncio.run(main())
```

Async executable tools require `achat` — sync `chat` raises when a tool callable is async.
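The sync/async split for tools comes down to whether a callable is a coroutine function. The sketch below shows that check in isolation; `check_sync_tools` and `run_tool` are hypothetical helpers, and `TypeError` is an assumed exception type, since this page only says that sync `chat` raises.

```python
import asyncio
import inspect


def check_sync_tools(tools):
    """Reject coroutine functions on a synchronous call path."""
    for tool in tools:
        if inspect.iscoroutinefunction(tool):
            raise TypeError(f"tool {tool.__name__!r} is async; use the async path")


async def run_tool(tool, **kwargs):
    """Call a tool from async code, awaiting the result only when needed."""
    result = tool(**kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


def add(a, b):
    return a + b


async def slow_add(a, b):
    await asyncio.sleep(0)
    return a + b


check_sync_tools([add])                        # passes: sync tool only
print(asyncio.run(run_tool(slow_add, a=1, b=2)))
```

An async path can run both kinds of tool, which is why the async entry point is the safe choice when tool sets are mixed.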

<br />

## Further Exploration

<Tip>
  **Related resources:**

  * [LLM Overview](/features/llm/overview) — presets, model catalog, and resolution model.
  * [First LLM](/quickstart/first-llm) — gateway setup and client export walkthrough.
  * [LLM providers](/integrations/llm-providers) — per-provider configuration and route checks.
</Tip>

<br />
