Use embed for single strings or batches of strings.
The embed preset uses the persistable alias text-embedding-small, which resolves to text-embedding-3-small. It inherits heavenbase.llm.default_provider unless you pin heavenbase.llm.presets.embed.provider.
If your chat default provider does not serve embeddings, configure the embedding preset separately by pinning heavenbase.llm.presets.embed.provider to a provider that does.
The bundled models have configured dimensions: embeddinggemma is 768, text-embedding-3-small is 1536, embed-v4.0 is 1536, and voyage-4-lite is 1024. llm.dim reads the configured dimension first and, when a custom embedding model has none, falls back to a single test embedding call to measure it.
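The config-first lookup with a probe fallback can be sketched as follows. This is a minimal illustration and not the library's implementation: resolve_dim, CONFIGURED_DIMS, and the embed_one callable are hypothetical names, and only the four dimensions come from the text above.

```python
from typing import Callable, List, Mapping

# Dimensions quoted in the text; the mapping name itself is hypothetical.
CONFIGURED_DIMS = {
    "embeddinggemma": 768,
    "text-embedding-3-small": 1536,
    "embed-v4.0": 1536,
    "voyage-4-lite": 1024,
}


def resolve_dim(
    model: str,
    embed_one: Callable[[str], List[float]],
    configured: Mapping[str, int] = CONFIGURED_DIMS,
) -> int:
    """Return the configured dimension, or measure it with one test call."""
    dim = configured.get(model)
    if dim is not None:
        # Configured models never trigger a provider call.
        return dim
    # Assumed fallback: embed a short probe string and count the components.
    return len(embed_one("dimension probe"))
```

The probe path costs one provider call, so recording a dimension in config for a custom model avoids it.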
Cohere and Voyage are dedicated embedding providers with no OpenRouter route. Pin the preset provider (cohere or voyage) and use the bundled model keys, embed-v4.0 and voyage-4-lite.
Gateway-specific base_url values live in heavenbase.llm.providers; for example, Cohere uses COHERE_BASE_URL on the OpenAI-compatible gateway and COHERE_LITELLM_BASE_URL on LiteLLM. With gateway="portkey" or gateway="bifrost", model IDs are prefixed with the provider, as in cohere/embed-v4.0 and voyage/voyage-4-lite.
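The prefixing rule for Portkey and Bifrost can be illustrated with a small helper. The function and constant names are hypothetical, and passing the model ID through unchanged on other gateways is an assumption; the text above only states what the two prefixing gateways do.

```python
# Gateways that, per the text above, prefix model IDs with the provider key.
PREFIXING_GATEWAYS = {"portkey", "bifrost"}


def gateway_model_id(provider: str, model: str, gateway: str) -> str:
    """Build the model ID sent to a gateway (illustrative sketch only)."""
    if gateway in PREFIXING_GATEWAYS:
        return f"{provider}/{model}"
    # Assumption: other gateways receive the bare model key.
    return model
```

For example, gateway_model_id("voyage", "voyage-4-lite", "bifrost") yields the voyage/voyage-4-lite form quoted above.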
Include projection
Embedding responses support the same include style as chat: embeddings, usage, raw, elapsed, created_at, and dim.
Local embeddings
Use embed-local for LM Studio or another OpenAI-compatible local server.
embeddinggemma is local-only in the bundled catalog and has identifiers for LM Studio and Ollama.
gateway="portkey" with provider="openrouter" is temporarily blocked for embeddings because Portkey Gateway does not yet support that route. Use the default OpenAI-compatible gateway or LiteLLM for OpenRouter embeddings until upstream support lands.
Batching and cache
embed deduplicates repeated inputs before provider calls, splits cache misses with embedding_batch_size, runs split batches with bounded embedding_max_workers, and broadcasts cached and fresh vectors back to the original input order.
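The pipeline described above (deduplicate, look up the cache, split misses into batches, run batches on a bounded pool, broadcast back) can be sketched in plain Python. This is an assumed reconstruction for illustration: embed_all, embed_batch, and the dict-based cache are hypothetical stand-ins, not the library's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[Vector]],
    cache: Dict[str, Vector],
    batch_size: int = 256,
    max_workers: int = 8,
) -> List[Vector]:
    """Embed texts with dedup, caching, batching, and bounded concurrency."""
    # 1. Deduplicate while keeping first-seen order.
    unique = list(dict.fromkeys(texts))
    # 2. Only cache misses go to the provider.
    misses = [t for t in unique if t not in cache]
    # 3. Split misses into batches of at most batch_size items.
    batches = [misses[i:i + batch_size] for i in range(0, len(misses), batch_size)]
    # 4. Run batches on a bounded thread pool; map() preserves batch order.
    if batches:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for batch, vectors in zip(batches, pool.map(embed_batch, batches)):
                for text, vector in zip(batch, vectors):
                    cache[text] = vector
    # 5. Broadcast cached and fresh vectors back to the original input order.
    return [cache[t] for t in texts]
```

Because step 5 indexes by text, a repeated input receives the same vector at every position without a second provider call.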
The root defaults are embedding_batch_size=256 and embedding_max_workers=8; provider defaults or call kwargs can override them. These controls are excluded from provider payloads and cache keys.
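One way to keep the batching controls out of provider payloads and cache keys is to strip them before either is built. The sketch below assumes a JSON-plus-SHA-256 key scheme purely for illustration; the real key format is not described here, and split_controls and cache_key are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

# The two controls named in the text above.
CONTROL_KWARGS = {"embedding_batch_size", "embedding_max_workers"}


def split_controls(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate local batching controls from kwargs meant for the provider."""
    controls = {k: v for k, v in kwargs.items() if k in CONTROL_KWARGS}
    payload = {k: v for k, v in kwargs.items() if k not in CONTROL_KWARGS}
    return controls, payload


def cache_key(model: str, text: str, payload: Dict[str, Any]) -> str:
    """Hash only what affects the vector (assumed key scheme)."""
    blob = json.dumps({"model": model, "input": text, "params": payload}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

With this split, changing embedding_batch_size between two calls leaves the key for a given input unchanged, so earlier cache entries stay valid.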

