llm("hello") and llm.chat("hello") are the same call. Shorter is the whole point.
chat is the main text entry point of hb.LLM. It takes a prompt, a message, or a message list, runs a chat completion, and returns only the fields you ask for through the include projection.
1. Chat and Call
LLM is callable, so llm("hello") and llm.chat("hello") are equivalent. Pass a system prompt with system=..., and set provider arguments at construction time or call time:
import heavenbase as hb
llm = hb.LLM(model="ds-flash")
text = llm.chat("Summarize HeavenBase in one sentence.")
same = llm("Summarize HeavenBase in one sentence.")
answer = llm.chat("What should I do next?", system="You are concise and practical.")
drafter = hb.LLM(model="ds-flash", temperature=0)
title = drafter.chat("Draft a title", max_tokens=24)
2. Message Inputs
chat and stream accept a single string, one OpenAI-style message dictionary, or a list of messages:
llm.chat("Hello")
llm.chat({"role": "user", "content": "Hello"})
llm.chat([
    {"role": "system", "content": "Be direct."},
    {"role": "user", "content": "Explain vector search."},
])
The formatter also accepts objects with model_dump(), to_dict(), dict(), json(), or role and content attributes, so Pydantic models and SDK message objects pass through without manual conversion.
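To make the accepted input shapes concrete, here is a minimal sketch of how such a formatter could coerce each one into a list of OpenAI-style dictionaries. It is not HeavenBase's implementation: normalize_messages and to_message are hypothetical names, and the order in which the conversion methods are tried is an assumption.

```python
import json
from types import SimpleNamespace
from typing import Any


def to_message(item: Any) -> dict:
    """Coerce one message-like value into a {"role", "content"} dictionary."""
    if isinstance(item, str):
        # A bare string is treated as a user message.
        return {"role": "user", "content": item}
    if isinstance(item, dict):
        return dict(item)
    # Assumed order: try the dictionary-returning methods first.
    for name in ("model_dump", "to_dict", "dict"):
        method = getattr(item, name, None)
        if callable(method):
            return dict(method())
    # json() returns a string, so it has to be parsed.
    if callable(getattr(item, "json", None)):
        return dict(json.loads(item.json()))
    if hasattr(item, "role") and hasattr(item, "content"):
        return {"role": item.role, "content": item.content}
    raise TypeError(f"cannot convert {type(item).__name__} to a message")


def normalize_messages(prompt: Any) -> list[dict]:
    """Accept a string, one message, or a list of messages."""
    if isinstance(prompt, (list, tuple)):
        return [to_message(item) for item in prompt]
    return [to_message(prompt)]


# An object that only exposes role and content attributes.
sdk_message = SimpleNamespace(role="user", content="Explain vector search.")
normalized = normalize_messages([{"role": "system", "content": "Be direct."}, sdk_message])
```

The attribute check comes last so that objects offering a full dump method keep any extra fields they carry.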
3. Multimodal Images
For multimodal models, pass image inputs with images=. HeavenBase normalizes each input to LLMImage and appends OpenAI-compatible image_url content parts to the last user message.
answer = hb.LLM(model="qwen3.6", provider="openrouter").chat(
    "What is in this image?",
    images=["./sample.png"],
)
for chunk in hb.LLM(model="gpt-nano").stream("Describe this image.", images=b"...png bytes..."):
    print(chunk, end="")
images= accepts LLMImage, URLs, data URLs, base64 strings, local paths, bytes-like objects, binary file objects, provider-style dictionaries, Pillow images, and numpy-compatible ndarrays. A single value or an iterable both work. See Advanced LLM for the full LLMImage API.
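The normalization step can be pictured with a small sketch that turns raw bytes or a URL into an OpenAI-style image_url content part and appends it to the last user message. This is an illustration under assumptions, not the LLMImage code: to_image_part and attach_images are hypothetical helpers, only bytes and URL inputs are handled, and the MIME type defaults to image/png.

```python
import base64


def to_image_part(image, mime="image/png"):
    """Build an OpenAI-style image_url content part from bytes or a URL."""
    if isinstance(image, (bytes, bytearray)):
        # Raw bytes are embedded as a base64 data URL.
        encoded = base64.b64encode(bytes(image)).decode("ascii")
        url = f"data:{mime};base64,{encoded}"
    elif isinstance(image, str) and image.startswith(("http://", "https://", "data:")):
        url = image
    else:
        raise TypeError("this sketch only handles URLs, data URLs, and bytes")
    return {"type": "image_url", "image_url": {"url": url}}


def attach_images(messages, images):
    """Return a copy of messages with image parts on the last user message."""
    # A single value and an iterable are both accepted.
    if isinstance(images, (str, bytes, bytearray)):
        images = [images]
    parts = [to_image_part(image) for image in images]
    out = [dict(message) for message in messages]
    for message in reversed(out):
        if message.get("role") == "user":
            content = message["content"]
            if isinstance(content, str):
                # Promote plain text to the list-of-parts form.
                content = [{"type": "text", "text": content}]
            message["content"] = list(content) + parts
            return out
    raise ValueError("no user message to attach images to")


original = [{"role": "user", "content": "What is in this image?"}]
with_image = attach_images(original, b"\x89PNG")
```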
4. Streaming
Use stream when you want deltas as they arrive:
for chunk in llm.stream("Write a short haiku"):
    print(chunk, end="")
chat uses the same streaming path internally and gathers the final response. This keeps regular chat, reasoning streams, usage accounting, structured outputs, and tool calls on one response pipeline, so the include fields mean the same thing whether you gather or iterate.
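The relationship between the two entry points can be sketched as a gather step over a chunk iterator. The chunk shape used here (dictionaries with optional text and think keys) and the fake_stream stand-in are assumptions made for illustration; they do not describe HeavenBase's internal types.

```python
from typing import Iterable, Iterator


def fake_stream(prompt: str) -> Iterator[dict]:
    """Stand-in for a provider stream: yields think and text deltas."""
    yield {"think": "Counting syllables. "}
    yield {"text": "An old silent pond"}
    yield {"text": ", a frog jumps in."}


def gather(chunks: Iterable[dict]) -> dict:
    """Consume a stream and join its deltas into one final response."""
    text, think = [], []
    for chunk in chunks:
        text.append(chunk.get("text", ""))
        think.append(chunk.get("think", ""))
    return {"text": "".join(text), "think": "".join(think)}


# Gathering the stream yields what a non-streaming call would return.
response = gather(fake_stream("Write a short haiku"))
```

Because the gathered response is built from the same chunks a caller would iterate over, both paths see identical field values.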
5. Include Projection
The include argument selects response fields. Pass None for the default text value, a string for one field, or a list for several. Unknown field names raise a contextual error.
text: final assistant text, excluding separate reasoning chunks.
think: reasoning or thinking content, when a provider streams it separately.
content: think wrapped in <think> tags followed by text.
message: OpenAI-format assistant response dictionary with role, content, and optional tool_calls; it is not the full history.
delta: new OpenAI-format messages produced by this inference call. With executable tools, this includes the assistant tool-call message, one or more role="tool" result messages, and the final assistant response.
messages: full conversation history: normalized input messages plus delta.
tool_calls: normalized OpenAI tool_calls from the assistant response.
usage: provider usage counters for this call. Common keys are prompt_tokens, completion_tokens, and total_tokens; streamed usage chunks are merged by summing numeric counters and keeping the first non-numeric value per key.
raw: raw provider payloads.
elapsed: request elapsed seconds.
created_at: local response creation timestamp.
structured: parsed structured output.
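The merge rule described for usage (sum numeric counters, keep the first non-numeric value per key) can be sketched as follows. merge_usage is a hypothetical helper, and the service_tier key is invented purely to show a non-numeric value.

```python
def is_number(value) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def merge_usage(chunks):
    """Merge streamed usage dictionaries into one set of counters."""
    merged = {}
    for chunk in chunks:
        for key, value in chunk.items():
            if key not in merged:
                # The first value seen for a key is always kept.
                merged[key] = value
            elif is_number(value) and is_number(merged[key]):
                # Numeric counters accumulate across chunks.
                merged[key] += value
            # Later non-numeric values are ignored.
    return merged


usage = merge_usage([
    {"prompt_tokens": 12, "completion_tokens": 3, "service_tier": "default"},
    {"completion_tokens": 5, "total_tokens": 20, "service_tier": "priority"},
])
```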
detail = llm.chat(
    "Reply with exactly: hb-ok",
    include=["text", "usage", "elapsed"],
    reduce=False,
    max_tokens=8,
)
When the include passed to stream contains delta or messages, progressive content still arrives as usual, and HeavenBase then emits one final metadata chunk with empty text/think and the completed message delta. With a single-field include and reduce=True (the default), the value is returned directly instead of a one-key dict.
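The selection and reduce rules can be sketched as a small projection function. This is a simplified model rather than the library's code: project is a hypothetical name, the response is represented as a plain dictionary, and raising ValueError for unknown fields is an assumption about the error type.

```python
KNOWN_FIELDS = {
    "text", "think", "content", "message", "delta", "messages",
    "tool_calls", "usage", "raw", "elapsed", "created_at", "structured",
}


def project(response, include=None, reduce=True):
    """Select fields from a response according to include and reduce."""
    if include is None:
        fields = ["text"]  # default projection
    elif isinstance(include, str):
        fields = [include]
    else:
        fields = list(include)
    unknown = [name for name in fields if name not in KNOWN_FIELDS]
    if unknown:
        raise ValueError(f"unknown include field(s) {unknown}; expected one of {sorted(KNOWN_FIELDS)}")
    selected = {name: response.get(name) for name in fields}
    if reduce and len(fields) == 1:
        # A single field collapses to its bare value.
        return selected[fields[0]]
    return selected


response = {"text": "hb-ok", "usage": {"total_tokens": 9}, "elapsed": 0.4}
bare = project(response)
kept = project(response, ["text"], reduce=False)
several = project(response, ["text", "elapsed"])
```

Passing reduce=False is useful when calling code always expects a dictionary, regardless of how many fields were requested.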
6. Thinking and Reasoning
Reasoning presets enable the canonical think option by default. You can override it per call, and HeavenBase converts both think=True and think=False to gateway-level extra_body.reasoning for OpenAI-compatible gateways. Pair think with reasoning_effort and an optional reasoning budget when the model supports it:
result = hb.LLM(preset="reason").chat(
    "Solve 17 * 23.",
    think=True,
    reasoning_effort="medium",
    include=["think", "text"],
    reduce=False,
)
CLI output wraps visible thinking chunks in <think> and </think> before printing the normal answer text. The anthropic gateway maps think to native Claude thinking and normalizes thinking blocks back to the think include field. Pass think=False to suppress reasoning entirely from the response and its message history.
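The way think and text combine into a single tagged string, as in the content include field and the CLI output, can be sketched in a few lines. render_content is a hypothetical helper, and the exact whitespace around the tags is an assumption.

```python
def render_content(text: str, think: str = "") -> str:
    """Prefix the answer with its reasoning wrapped in <think> tags."""
    if not think:
        # No separate reasoning was produced or it was suppressed.
        return text
    return f"<think>{think}</think>{text}"


with_reasoning = render_content("391", think="17 * 20 + 17 * 3 = 340 + 51")
without_reasoning = render_content("391")
```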
7. CLI
hb llm chat sends a single message with the chat preset. Override the preset, model, or provider with --preset, --model, and --provider; inspect the resolved spec with --verbose; and read the prompt from a file with --input.
$ hb llm chat "What is a data gateway?"
$ hb llm chat --preset system "Name three data backends"
$ hb llm chat --model sonnet --provider anthropic "Draft a release note"
Add --mcp to attach MCP tools for a single-turn agentic call. HeavenBase imports each source as a Toolkit, lets the model call tools until it produces a final assistant response, then prints tool calls and tool results before the final answer. MCP sources accept URLs or canonical namespace.toolkit:version refs. Negative versions are offsets from the latest: -1 is the latest version and -2 is the one before it. Tool loops are capped by --max-steps, which defaults to 20 assistant steps.
$ hb llm chat --mcp quickstart.math-tools:-1 "What's 42 * 73?"
$ hb llm chat --mcp quickstart.math-tools:-1 --max-steps 20 "What's 42 * 73?"
$ hb llm chat --mcp http://127.0.0.1:7001/mcp "List the available workspace entities."
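The negative-version convention for canonical refs can be illustrated with a short sketch. parse_ref and resolve_version are hypothetical helpers, and modeling versions as positive integers sorted oldest to newest is an assumption made for the example.

```python
def parse_ref(ref: str):
    """Split a namespace.toolkit:version ref into its three parts."""
    name, _, version = ref.rpartition(":")
    namespace, _, toolkit = name.partition(".")
    return namespace, toolkit, int(version)


def resolve_version(available, requested):
    """Resolve an exact version or a negative offset from the latest.

    available must be sorted from oldest to newest.
    """
    if requested < 0:
        if -requested > len(available):
            raise LookupError(f"offset {requested} exceeds {len(available)} versions")
        # -1 is the latest entry, -2 the one before it.
        return available[requested]
    if requested not in available:
        raise LookupError(f"version {requested} not found")
    return requested


namespace, toolkit, version = parse_ref("quickstart.math-tools:-1")
latest = resolve_version([1, 2, 3], version)
previous = resolve_version([1, 2, 3], -2)
```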
Use --copy / -cp to copy the final response to the clipboard, and --json to emit the JSON payload instead of plain text.
$ hb llm chat "Summarize HeavenBase in one sentence." --copy
$ hb llm chat "Summarize HeavenBase in one sentence." --json --copy
For interactive, multi-turn tool use, start an hb llm session instead. See Sessions and Tool Use for the full executable-tool contract.
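For orientation, the single-turn tool loop used by --mcp, with its step cap and the resulting delta messages, can be sketched as below. Everything here is illustrative: run_tool_loop and scripted_model are hypothetical, the model is a scripted stand-in rather than a real provider, and raising an error when the cap is reached is an assumption about behavior the text does not specify.

```python
import json


def run_tool_loop(model, tools, messages, max_steps=20):
    """Call the model until it answers without tool calls; return the delta."""
    delta = []
    for _ in range(max_steps):
        assistant = model(messages + delta)
        delta.append(assistant)
        calls = assistant.get("tool_calls") or []
        if not calls:
            # A message without tool calls is the final response.
            return delta
        for call in calls:
            function = call["function"]
            result = tools[function["name"]](**json.loads(function["arguments"]))
            delta.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError(f"no final response after {max_steps} assistant steps")


def scripted_model(messages):
    """Hypothetical model: requests one multiplication, then answers."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"42 * 73 = {messages[-1]['content']}"}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "multiply", "arguments": '{"a": 42, "b": 73}'},
        }],
    }


delta = run_tool_loop(
    scripted_model,
    {"multiply": lambda a, b: a * b},
    [{"role": "user", "content": "What's 42 * 73?"}],
)
```

The returned list has the shape described for the delta include field: the assistant tool-call message, the role="tool" result, and the final assistant response.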
Further Exploration
Related resources:
- LLM Overview - presets, the model catalog, and the resolution model.
- Tool Use - schema-only and executable tools, MCP Toolkits, and structured output.
- First LLM - the quickstart tour of hb llm and hb.LLM.