The LLM remembers nothing. That’s a feature — until you want it to.
hb.LLM stays stateless on purpose. Conversation history is just a message list you pass to chat or stream. When you need multi-turn chat without rebuilding that list every time, LLMSession holds the history and optional tools for you.
1. Manual History Without a Session
You can manage history yourself when you only need one or two turns:2. Session Tools
LLMSession can hold session tools and use them on every turn. Tools follow the same tools=[...] contract as LLM.chat: Python callables, Tool, Toolkit, and schema dictionaries are accepted.
role="tool" result, and the final assistant response in messages. See Tool Use for the full executable-tool contract.
3. Session Lifecycle
LLMSession exposes small helpers for interactive workflows:
to_dict() and from_dict() return JSON-safe payloads with the same message-only contract.
4. CLI Interactive Sessions
Start a multi-turn CLI session with thechat preset:
>>> prompt. Slash commands:
| Command | Action |
|---|---|
/help | Show available commands |
/save <path> | Save message history to JSON |
/load <path> | Load message history from JSON |
/clear | Clear the session |
/regen <seed> | Regenerate the last response with an optional seed |
/back | Remove the latest user turn |
/tools | List attached tools |
/mcp SOURCE | Attach an MCP Toolkit mid-session |
/bye, /exit | Quit |
--mcp values:
/mcp:
namespace.toolkit:version. Negative versions count back from latest: -1 is latest, -2 is second-most latest.
When a provider emits separate thinking content, the CLI session prints it inside <think> and </think> before the visible assistant text. Tool iterations print step-by-step with STEPS: 001 / 020, followed by tool calls and tool results.
5. Inspect Resolved State
Usespec when you need to see how a client resolved:
spec.to_dict() omits secrets. spec.to_dict(secrets=True) includes the materialized resolved dictionary and should only be used in trusted debugging contexts.
Resolved specs also produce stable hash keys for deduplication and cache lookup:
client_key() includes only gateway client construction fields, so duplicate LLM instances can reuse the same in-memory OpenAI-compatible SDK client.

