Use embed to embed a single string or a batch of strings:
import heavenbase as hb

llm = hb.LLM(preset="embed")

vector = llm.embed("hello")
vectors = llm.embed(["hello", "world"])
dim = llm.embed("hello", include="dim")
same_dim = llm.dim
The default embed preset uses the persistable alias text-embedding-small, which resolves to text-embedding-3-small. It inherits heavenbase.llm.default_provider unless you pin heavenbase.llm.presets.embed.provider. If your chat default provider does not serve embeddings, configure the embedding preset separately:
hb cfg set heavenbase.llm.presets.embed.provider openai
hb cfg set heavenbase.llm.presets.embed.model text-embedding-3-small
Known dimensions are stored in config: embeddinggemma is 768, text-embedding-3-small is 1536, embed-v4.0 is 1536, and voyage-4-lite is 1024. llm.dim reads config first and falls back to one test embedding call when a custom embedding model has no configured dimension.

Cohere and Voyage are dedicated embedding providers (no OpenRouter route). Pin the preset provider and use the bundled model keys:
hb cfg set heavenbase.llm.presets.embed.provider cohere
hb cfg set heavenbase.llm.presets.embed.model cohere

hb cfg set heavenbase.llm.presets.embed.provider voyage
hb cfg set heavenbase.llm.presets.embed.model voyage
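The dimension lookup described earlier (config first, then a single test embedding call) can be sketched roughly as follows. This is an illustration only, not heavenbase's implementation: `KNOWN_DIMS`, `resolve_dim`, and the `embed_one` callable are hypothetical names, and the table simply mirrors the dimensions listed above.

```python
# Illustrative table mirroring the configured dimensions listed above.
KNOWN_DIMS = {
    "embeddinggemma": 768,
    "text-embedding-3-small": 1536,
    "embed-v4.0": 1536,
    "voyage-4-lite": 1024,
}


def resolve_dim(model, embed_one, known_dims=KNOWN_DIMS):
    """Return the embedding dimension for `model`.

    `embed_one` is a hypothetical callable that embeds one string and
    returns a list of floats; it is only invoked when the model has no
    configured dimension.
    """
    dim = known_dims.get(model)
    if dim is not None:
        return dim  # configured: no provider call needed
    # Fallback: one test embedding call, then measure the vector length.
    return len(embed_one("test"))
```

With this shape, a known model never triggers a provider call, while a custom model costs exactly one.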
Provider backends follow LiteLLM names (cohere, voyage). Gateway-specific base_url values live in heavenbase.llm.providers (for example, Cohere uses COHERE_BASE_URL on the OpenAI-compatible gateway and COHERE_LITELLM_BASE_URL on LiteLLM). With gateway="portkey" or gateway="bifrost", model IDs are prefixed with the provider name, as in cohere/embed-v4.0 and voyage/voyage-4-lite.
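As a rough illustration of the prefixing rule, the sketch below builds the model ID for a given gateway. The function name `gateway_model_id` and the set `PREFIXING_GATEWAYS` are hypothetical. The text does not say what other gateways do, so returning the ID unchanged for them is an assumption of this sketch.

```python
# Gateways that the text says prefix model IDs with the provider name.
PREFIXING_GATEWAYS = {"portkey", "bifrost"}


def gateway_model_id(gateway, provider, model):
    """Return the model ID as a gateway would receive it (illustrative)."""
    if gateway in PREFIXING_GATEWAYS:
        return f"{provider}/{model}"
    # Assumption: any other gateway gets the bare model ID.
    return model
```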

Include projection

Embedding responses support the same include style as chat:
result = llm.embed(
    ["hello", "world"],
    include=["embeddings", "dim", "usage"],
    reduce=False,
)
Available fields are embeddings, usage, raw, elapsed, created_at, and dim.

Local embeddings

Use embed-local for LM Studio or another OpenAI-compatible local server:
local = hb.LLM(preset="embed-local", provider="lmstudio")
vector = local.embed("hello")
embeddinggemma is local-only in the bundled catalog and has identifiers for LM Studio and Ollama.

gateway="portkey" with provider="openrouter" is temporarily blocked for embeddings because Portkey Gateway does not yet support that route. Use the default OpenAI-compatible gateway or LiteLLM for OpenRouter embeddings until upstream support lands.

Batching and cache

embed deduplicates repeated inputs before provider calls, splits cache misses into batches of at most embedding_batch_size, runs those batches concurrently (bounded by embedding_max_workers), and broadcasts cached and fresh vectors back to the original input order. The root defaults are embedding_batch_size=256 and embedding_max_workers=8; provider defaults or call kwargs can override them. These controls are excluded from provider payloads and cache keys.
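The pipeline above (deduplicate, batch the cache misses, run batches with bounded concurrency, broadcast back to input order) can be sketched in plain Python. This is not heavenbase's code: `embed_all`, `fake_provider_embed`, and the dict-based cache are hypothetical stand-ins, and using a thread pool for the bounded workers is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_provider_embed(batch):
    # Hypothetical stand-in for a provider call: one tiny vector per text.
    return [[float(len(text)), float(sum(map(ord, text)) % 97)] for text in batch]


def embed_all(texts, cache, batch_size=256, max_workers=8,
              provider=fake_provider_embed):
    """Embed `texts`, calling `provider` only for unique cache misses."""
    # 1. Deduplicate while keeping first-seen order.
    unique = list(dict.fromkeys(texts))
    # 2. Only cache misses are sent to the provider.
    misses = [text for text in unique if text not in cache]
    # 3. Split the misses into batches of at most `batch_size`.
    batches = [misses[i:i + batch_size]
               for i in range(0, len(misses), batch_size)]
    # 4. Run the batches on a bounded pool (threads are an assumption here).
    if batches:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for batch, vectors in zip(batches, pool.map(provider, batches)):
                cache.update(zip(batch, vectors))
    # 5. Broadcast cached and fresh vectors back to the original order.
    return [cache[text] for text in texts]
```

Because batch size and worker count only change how the work is scheduled, not the resulting vectors, it is reasonable for such controls to stay out of cache keys, as the text describes.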