AgentHeaven is organized into layered modules that communicate through well-defined interfaces. This page gives a high-level overview of how the components fit together.

1. System Architecture

The AgentHeaven system architecture diagram shows the data flow from left to right:
  1. Inputs — domain knowledge (documents, schemas, rules) and functions (existing systems, human workflows) enter the system
  2. Information extraction and cache — raw inputs are parsed and monitored; function calls are recorded by the Cache component
  3. UKF layer — all information is normalized into Unified Knowledge Format (UKF) instances (Knowledge, Document, Experience, etc.)
  4. KLStore — UKF instances are persisted to pluggable storage backends (in-memory, file system, databases, remote, cascading)
  5. KLEngine — retrieval and utilization engines query the stored knowledge (string matching, faceted search, vector search, etc.)
  6. Application layer — agents, workflows, imitators, and tools consume knowledge through the KLBase orchestration layer
  7. Interface — CLI and GUI provide user-facing access to the system
The bottom row shows the foundational services that all layers depend on: utilities, LLMs, databases, data structures and search algorithms, and tool protocols.

2. Core Components

2.1. LLMs

AgentHeaven uses LiteLLM to provide a unified interface for various LLM providers. LLMs are configured as presets (e.g., sys, chat, embedder, coder) in the config system, making them an integral part of the framework rather than external services.
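As an illustration of the preset idea, the sketch below resolves a named preset to keyword arguments for a LiteLLM call. The preset names (`sys`, `chat`, `embedder`, `coder`) come from this page; the config shape, the specific model strings, and the `resolve_preset` helper are illustrative assumptions, not the framework's actual config API.

```python
# Hypothetical sketch: resolving a named LLM preset to call arguments.
# The dict shape and resolve_preset are assumptions for illustration.
LLM_PRESETS = {
    "chat": {"model": "openai/gpt-4o-mini", "temperature": 0.7},
    "coder": {"model": "anthropic/claude-sonnet-4", "temperature": 0.0},
    "embedder": {"model": "openai/text-embedding-3-small"},
}

def resolve_preset(name: str, **overrides) -> dict:
    """Merge a named preset with per-call overrides into LiteLLM kwargs."""
    if name not in LLM_PRESETS:
        raise KeyError(f"unknown LLM preset: {name}")
    return {**LLM_PRESETS[name], **overrides}

# A call site would pass these kwargs on to litellm.completion(...):
kwargs = resolve_preset("chat", temperature=0.2)
```

Centralizing model choice in presets means application code asks for a role ("chat", "coder") rather than hard-coding a provider or model name.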

2.2. Prompts

Prompts use a function-based model centered on PromptSpec: a prompt is a callable, versioned unit that can be persisted in a database, retrieved through a prompt manager, and localized through small translation dictionaries stored in the same runtime ecosystem. This keeps prompt logic close to Python while still treating prompts as persistent knowledge assets. Prompt execution, prompt persistence, and prompt translation are first-class runtime services. Template-style authoring is still possible where useful, but function-based prompt specs are the primary interface.
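A minimal sketch of the function-based model, assuming a simplified `PromptSpec`: a callable, versioned unit whose localization is a small translation dictionary. The real class's fields and methods may differ; everything here is illustrative.

```python
# Illustrative-only sketch of a callable, versioned prompt spec with
# dictionary-based translation. Field names are assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PromptSpec:
    name: str
    version: str
    render: Callable[..., str]           # prompt logic stays in Python
    translations: dict = field(default_factory=dict)

    def __call__(self, lang: str = "en", **kwargs) -> str:
        text = self.render(**kwargs)
        # Apply the small per-language substitution table, if any.
        for src, dst in self.translations.get(lang, {}).items():
            text = text.replace(src, dst)
        return text

greet = PromptSpec(
    name="greet",
    version="1.0",
    render=lambda user: f"Summarize the request from {user}.",
    translations={"de": {"Summarize": "Fasse zusammen"}},
)

english = greet(user="Alice")            # plain rendering
german = greet(lang="de", user="Alice")  # localized via the translation dict
```

Because the spec is a plain callable, it can be invoked like any function while still carrying the name/version metadata needed to persist it as a knowledge asset.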

2.3. Cache

The Cache component monitors and records function calls and their results. It can be used for logging queries, LLM inputs/outputs, agent trajectories, or collecting data from a running system. Cached entries can be converted to UKF instances (e.g., ExperienceUKFT) for long-term storage.
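The core mechanic can be sketched as a recording decorator plus a converter to a UKF-style record. The real Cache API is richer; `record_calls`, `CALL_LOG`, and `to_experience` are illustrative assumptions.

```python
# Minimal sketch of the Cache idea: record each function call and its
# result, then shape an entry as an ExperienceUKFT-like record.
import functools
import time

CALL_LOG: list[dict] = []

def record_calls(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        CALL_LOG.append({
            "function": fn.__name__,
            "args": args,
            "kwargs": kwargs,
            "result": result,
            "ts": time.time(),
        })
        return result
    return wrapper

def to_experience(entry: dict) -> dict:
    """Convert a cached call into a UKF-style record (shape assumed)."""
    return {"type": "ExperienceUKFT", "content": entry}

@record_calls
def add(a, b):
    return a + b

add(2, 3)  # the call and its result are now in CALL_LOG
```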

2.4. Unified Knowledge Format (UKF)

The UKF protocol separates the extraction, storage, management, retrieval, and utilization of knowledge. As a semantic layer, BaseUKF unifies all components required in agentic workflows as structured data. Users define domain-specific UKF variants to represent their knowledge types.
For details on defining and working with UKF, see Knowledge.

2.5. KLStore, KLEngine, and KLBase

  • KLStore — the storage layer for long-term management of UKF instances. Supports in-memory, file system, database, remote, cascading, and routing backends.
  • KLEngine — the utilization layer. Retrieval engines can use string matching, faceted search, vector search, or graph walks. Other engines support fine-tuning and knowledge distillation.
  • KLBase — the core orchestrator that integrates one or more KLStore and KLEngine instances, providing a unified interface for agentic workflows built on top.
For storage and retrieval configuration, see Storage.
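The three-layer split can be sketched with the simplest backends named above: an in-memory store and a string-matching engine, composed by an orchestrator. The real interfaces are richer; these class shapes are illustrative assumptions.

```python
# Sketch of the store / engine / orchestrator split, assuming
# dict-shaped UKF records. Class interfaces are illustrative.

class InMemoryKLStore:
    """Storage layer: holds UKF-like records keyed by id."""
    def __init__(self):
        self.records: dict[str, dict] = {}

    def put(self, ukf: dict):
        self.records[ukf["id"]] = ukf

class StringMatchKLEngine:
    """Utilization layer: naive substring retrieval over a store."""
    def retrieve(self, store, query: str) -> list[dict]:
        q = query.lower()
        return [r for r in store.records.values() if q in r["content"].lower()]

class KLBase:
    """Orchestrator: one unified interface over stores and engines."""
    def __init__(self, store, engine):
        self.store, self.engine = store, engine

    def add(self, ukf: dict):
        self.store.put(ukf)

    def query(self, text: str) -> list[dict]:
        return self.engine.retrieve(self.store, text)

kb = KLBase(InMemoryKLStore(), StringMatchKLEngine())
kb.add({"id": "k1", "content": "Invoices are archived monthly."})
kb.add({"id": "k2", "content": "Refund policy covers 30 days."})
hits = kb.query("refund")
```

Swapping the in-memory store for a database backend, or the string matcher for vector search, changes only what is passed to the orchestrator, which is the point of keeping the three layers separate.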

2.6. ToolSpec

ToolSpec is a structured representation of tools (functions, APIs, etc.). Built around FastMCP 3.x, it supports conversion to and from Python functions, code strings, MCP tools, FastMCP tools, function-call JSON schemas, and UKF instances.
For tool integration, see Tools.
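One of those conversions, Python function to function-call JSON schema, can be sketched with the standard library. `function_to_schema` and the minimal type map are assumptions for illustration; ToolSpec and FastMCP handle this far more completely.

```python
# Sketch: derive a function-call JSON schema from a Python signature.
# The helper and type map are illustrative, not the ToolSpec API.
import inspect

PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def function_to_schema(fn) -> dict:
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": props,
            # Parameters without defaults are required.
            "required": [n for n, p in sig.parameters.items()
                         if p.default is p.empty],
        },
    }

def get_weather(city: str, units: str = "metric") -> str:
    """Look up current weather for a city."""
    return f"Weather for {city} in {units}"

schema = function_to_schema(get_weather)
```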

3. Integration Points

AgentHeaven connects to external systems through well-defined integration layers:
| Integration | Provider | Purpose |
| --- | --- | --- |
| LLM inference | LiteLLM | Unified access to OpenAI, Anthropic, Google, Ollama, and more |
| Databases | SQLAlchemy | Relational database storage for UKF data |
| Vector databases | LlamaIndex | Semantic search and vector storage |
| Tool protocols | FastMCP 3.x | Standardized tool definition and communication |
| Prompt runtime | ahvn.utils.prompt | Function-based prompt specs, prompt persistence, and translation lookup |
For a complete list of supported providers and backends, see Integrations.

Further Exploration

Design and principles:
Deep dives: