Random utilities address two kinds of stability: the same seed should reproduce the same sequence, and changing the dataset size should not reshuffle every existing split.
Use Random utilities when generated rows, samples, or vectors should stay stable across runs.

1. Core Idea

StableRNG is for tests, demos, benchmarks, and LLM-application fixtures where “random” data still needs to be debuggable. If a failure happens with seed 42, you should be able to regenerate the same rows, vectors, and samples later. There are two stability patterns:
  • Sequence stability: the same seed and the same draw sequence produce the same output. Wrap the draws in a with StableRNG(seed=...) block when they should advance together as one sequence.
  • Membership stability: adding more candidate items should not change the outcome for the items that were already there. Use hash_sample(...) and hash_split(...) for stable samples and partitions.
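Sequence stability can be illustrated with nothing but the Python standard library. The sketch below uses random.Random rather than StableRNG, and draw_sequence is a hypothetical helper introduced only for this illustration; it is not part of heavenbase.

```python
import random


def draw_sequence(seed: int, count: int) -> list[int]:
    """Draw `count` integers from a fresh generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(0, 10) for _ in range(count)]


# Same seed and same draw order: the two runs match value for value.
first = draw_sequence(seed=7, count=5)
second = draw_sequence(seed=7, count=5)
assert first == second

# A longer run with the same seed begins with the same values.
longer = draw_sequence(seed=7, count=8)
assert longer[:5] == first
```

The second assertion shows why draw order matters: each value depends on how many draws came before it, so inserting a draw earlier in the sequence shifts everything after it.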

2. Generate a Reproducible Sequence

Create a generator with a seed and use it as a context when multiple draws belong to one sequence.
from heavenbase.utils import StableRNG

with StableRNG(seed=42) as rng:
    row_id = rng.rnd_str(8)
    score = rng.rnd_float(0.0, 1.0)
    label = rng.choice(["open", "closed", "review"])
The same seed and draw order produce the same sequence:
from heavenbase.utils import StableRNG

with StableRNG(seed=7) as left:
    left_values = [left.rnd_str(6), left.rnd_int(0, 10)]

with StableRNG(seed=7) as right:
    right_values = [right.rnd_str(6), right.rnd_int(0, 10)]

assert left_values == right_values
Outside a context, a single method call is stable for the current seed. That is useful when one value should be reproducible without coupling it to previous draws.

3. Derive Child Streams

Use step(...) to split one base seed into named streams without mutating the parent generator.
from heavenbase.utils import StableRNG

base = StableRNG(seed=42)
rows_rng = base.step("rows")
sample_rng = base.step("samples")

rows = rows_rng.rnd_str(6, n=3)
sample = sample_rng.choice(["p1", "p2", "p3", "p4"], k=2, replace=False)
This keeps unrelated parts of a test from affecting each other: adding a random draw to one workflow leaves the other streams unchanged.
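One common way to get named streams is to hash the base seed together with the stream name and seed an independent generator from the result. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not a description of how step(...) is implemented, and derive_seed is a hypothetical helper.

```python
import hashlib
import random


def derive_seed(base_seed: int, name: str) -> int:
    """Hypothetical helper: mix a base seed and a stream name into a child seed."""
    digest = hashlib.sha256(f"{base_seed}:{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# First run: only the "rows" stream is used.
rows_rng = random.Random(derive_seed(42, "rows"))
rows_before = [rows_rng.randint(0, 99) for _ in range(3)]

# Second run: the "samples" stream takes an extra draw before "rows" is used.
rows_rng = random.Random(derive_seed(42, "rows"))
samples_rng = random.Random(derive_seed(42, "samples"))
_ = samples_rng.random()
rows_after = [rows_rng.randint(0, 99) for _ in range(3)]

# The extra draw on "samples" did not disturb the "rows" stream.
assert rows_before == rows_after
```

Because each stream has its own state, the values a stream produces depend only on the base seed, the stream name, and that stream's own draw count.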

4. Generate Batches and Vectors

Most generators accept n for a count or shape. Use this for fixtures that need many values at once.
from heavenbase.utils import StableRNG

rng = StableRNG(seed=42)

ids = rng.rnd_str(4, n=5)
grid = rng.rnd_int(0, 10, n=(2, 3))
vecs = rng.rnd_vec(dim=8, n=3)
rnd_vec(...) returns unit-length vectors. That makes it useful for LLM and vector-search development, where you often need embedding-shaped data before wiring a real embedding provider into the example.
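To make "unit-length" concrete, the following sketch builds seeded unit vectors with the standard library: draw Gaussian components, then divide by the Euclidean norm. It illustrates the property only; random_unit_vector is a hypothetical helper, and the source does not say how rnd_vec(...) generates its values.

```python
import math
import random


def random_unit_vector(rng: random.Random, dim: int) -> list[float]:
    """Draw Gaussian components, then scale the vector to length 1."""
    components = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in components))
    return [c / norm for c in components]


rng = random.Random(42)
vecs = [random_unit_vector(rng, dim=8) for _ in range(3)]

# Every vector has Euclidean length 1, up to floating-point rounding.
for vec in vecs:
    length = math.sqrt(sum(c * c for c in vec))
    assert math.isclose(length, 1.0)
```

For unit-length vectors, cosine similarity equals the plain dot product, which is one reason normalized fixtures are convenient for vector-search tests.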

5. Sample Without Rewriting the World

Use hash_sample(...) and hash_split(...) when a stable sample should not depend on input order or on unrelated new records.
from heavenbase.utils import StableRNG

items = ["p1", "p2", "p3", "p4"]

sample = StableRNG(seed=11).hash_sample(items, k=2)
selected, remaining = StableRNG(seed=11).hash_split(items, r=0.5)
If you later add "p5", the decision for "p1" through "p4" is still based on each item’s hash and the seed, not on the original list position.
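The general idea behind hash-based membership can be sketched with hashlib: give each item a score derived from the seed and the item itself, then select items whose score falls below a threshold. This is a minimal illustration under stated assumptions, not the algorithm behind hash_sample(...) or hash_split(...), which the text above does not spell out. hash_score and threshold_split are hypothetical names.

```python
import hashlib


def hash_score(item: str, seed: int) -> float:
    """Map (seed, item) to a stable value in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result stays below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def threshold_split(items: list[str], seed: int, ratio: float):
    """Select items whose score is below `ratio`; the rest remain."""
    selected = [x for x in items if hash_score(x, seed) < ratio]
    remaining = [x for x in items if hash_score(x, seed) >= ratio]
    return selected, remaining


items = ["p1", "p2", "p3", "p4"]
selected, remaining = threshold_split(items, seed=11, ratio=0.5)

# Adding "p5" leaves the decision for "p1" through "p4" untouched.
selected_more, _ = threshold_split(items + ["p5"], seed=11, ratio=0.5)
assert [x for x in selected_more if x != "p5"] == selected

# Input order does not change which items are selected.
selected_rev, _ = threshold_split(list(reversed(items)), seed=11, ratio=0.5)
assert set(selected_rev) == set(selected)
```

A threshold like this gives each item an independent decision, so the selected fraction is only approximately equal to the ratio. That is the trade-off for membership that never depends on the other items in the list.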
HeavenBase benchmarks use seeded generation so row content, samples, and vectors can be reproduced.

Further Exploration

Related resources:
  • Hash - deterministic hashes used by stable sampling.
  • File System - write generated fixtures to local artifacts.
  • Query - where synthetic vector fixtures are often used.