
# LLM Chat

> Chat completions, streaming, and response projection with hb.LLM.

<Note>
  *`llm("hello")` and `llm.chat("hello")` are the same call. Shorter is the whole point.*
</Note>

`chat` is the main text entry point of `hb.LLM`. It takes a prompt, a message, or a message list, runs a chat completion, and returns only the fields you ask for through the `include` projection.

<br />

## 1. Chat and Call

`LLM` is callable, so `llm("hello")` and `llm.chat("hello")` are equivalent. Pass a system prompt with `system=...`, and set provider arguments at construction time or call time:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
import heavenbase as hb

llm = hb.LLM(model="ds-flash")

text = llm.chat("Summarize HeavenBase in one sentence.")
same = llm("Summarize HeavenBase in one sentence.")

answer = llm.chat("What should I do next?", system="You are concise and practical.")

drafter = hb.LLM(model="ds-flash", temperature=0)
title = drafter.chat("Draft a title", max_tokens=24)
```

<br />

## 2. Message Inputs

`chat` and `stream` accept a single string, one OpenAI-style message dictionary, or a list of messages:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
llm.chat("Hello")

llm.chat({"role": "user", "content": "Hello"})

llm.chat([
    {"role": "system", "content": "Be direct."},
    {"role": "user", "content": "Explain vector search."},
])
```

The formatter also accepts objects with `model_dump()`, `to_dict()`, `dict()`, `json()`, or `role` and `content` attributes, so Pydantic models and SDK message objects pass through without manual conversion.
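
As a rough illustration of that normalization, the sketch below coerces each accepted input shape into a list of OpenAI-style message dictionaries. It is not HeavenBase's formatter: `to_message`, `normalize_messages`, and `SdkMessage` are hypothetical names, and the lookup order and the "bare string becomes a user message" rule are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Any


def to_message(item: Any) -> dict:
    """Coerce one input into an OpenAI-style message dict (illustrative only)."""
    if isinstance(item, str):
        # Assumption: a bare string is treated as a user prompt.
        return {"role": "user", "content": item}
    if isinstance(item, dict):
        return dict(item)
    # Try the common "dump to dict" methods in the order the text lists them.
    for method in ("model_dump", "to_dict", "dict"):
        dump = getattr(item, method, None)
        if callable(dump):
            return dict(dump())
    # A json() method is assumed to return a JSON string.
    dump_json = getattr(item, "json", None)
    if callable(dump_json):
        return json.loads(dump_json())
    # Fall back to plain role/content attributes.
    if hasattr(item, "role") and hasattr(item, "content"):
        return {"role": item.role, "content": item.content}
    raise TypeError(f"cannot convert {type(item).__name__} to a message")


def normalize_messages(value: Any) -> list[dict]:
    """Accept a string, one message, or a list of messages."""
    if isinstance(value, (list, tuple)):
        return [to_message(item) for item in value]
    return [to_message(value)]


@dataclass
class SdkMessage:
    """Hypothetical SDK object exposing only role and content attributes."""
    role: str
    content: str


mixed = normalize_messages([
    SdkMessage("system", "Be direct."),
    {"role": "user", "content": "Explain vector search."},
])
```

Every branch returns a plain dictionary, so the rest of a pipeline only ever has to handle one message shape.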

<br />

## 3. Multimodal Images

For multimodal models, pass image inputs with `images=`. HeavenBase normalizes each input to `LLMImage` and appends OpenAI-compatible `image_url` content parts to the last user message.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
answer = hb.LLM(model="qwen3.6", provider="openrouter").chat(
    "What is in this image?",
    images=["./sample.png"],
)

for chunk in hb.LLM(model="gpt-nano").stream("Describe this image.", images=b"...png bytes..."):
    print(chunk, end="")
```

`images=` accepts `LLMImage`, URLs, data URLs, base64 strings, local paths, bytes-like objects, binary file objects, provider-style dictionaries, Pillow images, and numpy-compatible ndarrays. A single value or an iterable both work. See [Advanced LLM](/features/llm/advanced) for the full `LLMImage` API.
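
To make the "append `image_url` parts to the last user message" step concrete, here is a minimal sketch that works on plain message dictionaries. The helper names `to_data_url` and `attach_images` are hypothetical, and the behavior when no user message exists is an assumption; only the `image_url` content-part shape follows the OpenAI chat format.

```python
import base64
import copy


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def attach_images(messages: list[dict], urls: list[str]) -> list[dict]:
    """Return a copy of messages with image parts on the last user message."""
    out = copy.deepcopy(messages)  # leave the caller's list untouched
    for message in reversed(out):
        if message.get("role") != "user":
            continue
        content = message.get("content")
        if isinstance(content, str):
            # Plain text becomes a single text part so images can follow it.
            parts = [{"type": "text", "text": content}]
        else:
            parts = list(content or [])
        parts += [{"type": "image_url", "image_url": {"url": url}} for url in urls]
        message["content"] = parts
        return out
    # Assumption: without a user message there is nothing to attach to.
    raise ValueError("no user message to attach images to")


history = [
    {"role": "system", "content": "Be direct."},
    {"role": "user", "content": "What is in this image?"},
]
with_image = attach_images(history, [to_data_url(b"abc")])
```

The sketch copies the history before editing it, so the same message list can be reused across calls with different images.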

<br />

## 4. Streaming

Use `stream` when you want deltas as they arrive:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
for chunk in llm.stream("Write a short haiku"):
    print(chunk, end="")
```

`chat` uses the same streaming path internally and gathers the final response. This keeps regular chat, reasoning streams, usage accounting, structured outputs, and tool calls on one response pipeline, so the include fields mean the same thing whether you gather or iterate.
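
The relationship between the two entry points can be pictured with a small stand-in: one generator produces deltas, and a gather step folds them into a final response. `fake_stream` and `gather` are hypothetical, and the chunk shape (dictionaries with optional `text` and `think` keys) is an assumption made for illustration.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[dict]:
    """Stand-in for a provider stream: yields reasoning, then answer deltas."""
    yield {"think": "17 * 20 = 340; "}
    yield {"think": "17 * 3 = 51."}
    yield {"text": "39"}
    yield {"text": "1"}


def gather(chunks: Iterable[dict]) -> dict:
    """Fold streamed deltas into one response, field by field."""
    response = {"text": "", "think": ""}
    for chunk in chunks:
        for field in response:
            response[field] += chunk.get(field, "")
    return response


# Iterating and gathering consume the same chunks, so fields agree.
streamed_text = "".join(chunk.get("text", "") for chunk in fake_stream())
final = gather(fake_stream())
```

Because both paths read the same chunks, the `text` you assemble while iterating matches the `text` of the gathered response.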

<br />

## 5. Include Projection

The `include` argument selects response fields. Pass `None` for the default `text` value, a string for one field, or a list for several. Unknown field names raise a contextual error.

* `text`: final assistant text, excluding separate reasoning chunks.
* `think`: reasoning or thinking content, when a provider streams it separately.
* `content`: `think` wrapped in `<think>` tags followed by `text`.
* `message`: OpenAI-format assistant response dictionary with `role`, `content`, and optional `tool_calls`; it is not the full history.
* `delta`: new OpenAI-format messages produced by this inference call. With executable tools, this includes the assistant tool-call message, one or more `role="tool"` result messages, and the final assistant response.
* `messages`: full conversation history: normalized input messages plus `delta`.
* `tool_calls`: normalized OpenAI `tool_calls` from the assistant response.
* `usage`: provider usage counters for this call. Common keys are `prompt_tokens`, `completion_tokens`, and `total_tokens`; streamed usage chunks are merged by summing numeric counters and keeping the first non-numeric value per key.
* `raw`: raw provider payloads.
* `elapsed`: request elapsed seconds.
* `created_at`: local response creation timestamp.
* `structured`: parsed structured output.

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
detail = llm.chat(
    "Reply with exactly: hb-ok",
    include=["text", "usage", "elapsed"],
    reduce=False,
    max_tokens=8,
)
```
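
The merge rule described for `usage` (sum numeric counters, keep the first non-numeric value per key) can be sketched as follows. `merge_usage` is a hypothetical helper, the `service_tier` key is invented for the example, and nested counter dictionaries are deliberately left out.

```python
from numbers import Number


def _is_number(value: object) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Number) and not isinstance(value, bool)


def merge_usage(chunks: list[dict]) -> dict:
    """Merge streamed usage chunks into one usage dictionary."""
    merged: dict = {}
    for chunk in chunks:
        for key, value in chunk.items():
            if key not in merged:
                # The first value seen for a key is always kept.
                merged[key] = value
            elif _is_number(value) and _is_number(merged[key]):
                # Numeric counters accumulate across chunks.
                merged[key] += value
            # Otherwise the first (non-numeric) value wins.
    return merged


usage = merge_usage([
    {"prompt_tokens": 12, "completion_tokens": 3, "service_tier": "default"},
    {"completion_tokens": 5, "total_tokens": 20, "service_tier": "priority"},
])
```

Here `completion_tokens` adds up to 8 across the two chunks, while `service_tier` stays at the first value, `"default"`.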

When `stream` includes `delta` or `messages`, `text` and `think` deltas still arrive progressively. HeavenBase then emits one final metadata chunk whose `text` and `think` are empty and which carries the completed message delta.

With a single-field `include` and `reduce=True` (the default), the value is returned directly instead of a one-key dict.
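
The interaction between `include` and `reduce` can be sketched over a plain response dictionary. `project` is a hypothetical helper rather than HeavenBase's implementation; the error type, and treating a one-element list the same as a single string, are assumptions.

```python
from typing import Any, Optional, Sequence, Union


def project(
    response: dict,
    include: Optional[Union[str, Sequence[str]]] = None,
    reduce: bool = True,
) -> Any:
    """Select response fields; unwrap a single field when reduce is on."""
    if include is None:
        names = ["text"]  # default projection
    elif isinstance(include, str):
        names = [include]
    else:
        names = list(include)

    unknown = [name for name in names if name not in response]
    if unknown:
        # Name both the bad fields and the valid ones for context.
        raise KeyError(f"unknown include field(s) {unknown}; known: {sorted(response)}")

    selected = {name: response[name] for name in names}
    if reduce and len(names) == 1:
        return selected[names[0]]
    return selected


response = {"text": "hb-ok", "think": "", "usage": {"total_tokens": 9}, "elapsed": 0.42}

plain = project(response)                                   # "hb-ok"
one_key = project(response, include="text", reduce=False)   # {"text": "hb-ok"}
several = project(response, include=["text", "elapsed"])    # dict with two keys
```

Passing `reduce=False` keeps the return type a dictionary regardless of how many fields are requested, which is convenient when the `include` list is built dynamically.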

<br />

## 6. Thinking and Reasoning

Reasoning presets enable the canonical `think` option by default. You can override it per call, and HeavenBase converts both `think=True` and `think=False` to gateway-level `extra_body.reasoning` for OpenAI-compatible gateways. Pair `think` with `reasoning_effort` and an optional reasoning budget when the model supports it:

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
result = hb.LLM(preset="reason").chat(
    "Solve 17 * 23.",
    think=True,
    reasoning_effort="medium",
    include=["think", "text"],
    reduce=False,
)
```

CLI output wraps visible thinking chunks in `<think>` and `</think>` before printing the normal answer text. The `anthropic` gateway maps `think` to native Claude thinking and normalizes thinking blocks back to the `think` include field. Pass `think=False` to omit reasoning from both the response and its message history.
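
The tag wrapping in CLI output can be approximated with a small renderer over tagged chunks. This is a sketch under assumptions: `render` is hypothetical, chunks are modeled as `(kind, piece)` tuples, and the newline after `</think>` is a guess at the layout rather than the CLI's exact output.

```python
from typing import Iterable, Tuple


def render(chunks: Iterable[Tuple[str, str]]) -> str:
    """Wrap consecutive think chunks in <think> tags, then append answer text."""
    out = []
    thinking = False
    for kind, piece in chunks:
        if kind == "think" and not thinking:
            out.append("<think>")       # open the block on the first think chunk
            thinking = True
        elif kind != "think" and thinking:
            out.append("</think>\n")    # close it once answer text starts
            thinking = False
        out.append(piece)
    if thinking:
        out.append("</think>")          # stream ended while still thinking
    return "".join(out)


rendered = render([
    ("think", "17 * 20 = 340; "),
    ("think", "17 * 3 = 51."),
    ("text", "391"),
])
```

Tracking a single `thinking` flag is enough here because reasoning chunks are assumed to arrive before the answer text.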

<br />

## 7. CLI Chat and MCP Tools

`hb llm chat` sends a single message with the `chat` preset. Override the preset, model, or provider with `--preset`, `--model`, and `--provider`; inspect the resolved spec with `--verbose`; and read the prompt from a file with `--input`.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
hb llm chat "What is a data gateway?"
hb llm chat --preset system "Name three data backends"
hb llm chat --model sonnet --provider anthropic "Draft a release note"
```

Add `--mcp` to attach MCP tools for a single-turn agentic call. HeavenBase imports each source as a Toolkit, lets the model call tools until it produces a final assistant response, then prints tool calls and tool results before the final answer. An MCP source is either a URL or a canonical `namespace.toolkit:version` ref. Negative versions are offsets from the latest: `-1` is the latest version and `-2` is the second-latest. Tool loops are capped by `--max-steps`, which defaults to 20 assistant steps.
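
The negative-offset rule for versions can be sketched in a few lines. Both helpers are hypothetical, and the sketch assumes integer versions and that a version list is already available; how HeavenBase actually stores and resolves Toolkit versions is not covered here.

```python
def parse_ref(ref: str) -> tuple[str, str, int]:
    """Split 'namespace.toolkit:version' into its three parts."""
    name, _, version = ref.rpartition(":")
    namespace, _, toolkit = name.partition(".")
    return namespace, toolkit, int(version)


def resolve_version(requested: int, available: list[int]) -> int:
    """Map a requested version, possibly a negative offset, to a real one."""
    ordered = sorted(available)
    if requested < 0:
        if -requested > len(ordered):
            raise LookupError(f"offset {requested} is out of range")
        # -1 is the latest, -2 the second-latest, and so on.
        return ordered[requested]
    if requested not in ordered:
        raise LookupError(f"version {requested} not found")
    return requested


namespace, toolkit, version = parse_ref("quickstart.math-tools:-1")
latest = resolve_version(version, [1, 2, 3])
```

Sorting before indexing means the offsets stay correct even if the version list arrives unordered.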

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
hb llm chat --mcp quickstart.math-tools:-1 "What's 42 * 73?"
hb llm chat --mcp quickstart.math-tools:-1 --max-steps 20 "What's 42 * 73?"
hb llm chat --mcp http://127.0.0.1:7001/mcp "List the available workspace entities."
```

Use `--copy` / `-cp` to copy the final response to the clipboard, and `--json` to emit the JSON payload instead of plain text.

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
hb llm chat "Summarize HeavenBase in one sentence." --copy
hb llm chat "Summarize HeavenBase in one sentence." --json --copy
```

<Tip>
  For interactive, multi-turn tool use, start an `hb llm session` instead. See [Sessions](/features/llm/sessions) and [Tool Use](/features/llm/tool-use) for the full executable-tool contract.
</Tip>

<br />

## Further Exploration

<Tip>
  **Related resources:**

  * [LLM Overview](/features/llm/overview) - presets, the model catalog, and the resolution model.
  * [Tool Use](/features/llm/tool-use) - schema-only and executable tools, MCP Toolkits, and structured output.
  * [First LLM](/quickstart/first-llm) - the quickstart tour of `hb llm` and `hb.LLM`.
</Tip>

<br />
