How the breeding simulator works

What's real science, what's an estimate, and exactly what the model does — with its sources and limits laid out.

Short version: the simulator is a transparent, simplified teaching model. It is not trained on a strain database, it is not peer-reviewed, and its outputs are illustrative — a well-reasoned starting point, not lab results. What follows is exactly what it does and where the line between real science and estimate sits, so you can judge it for yourself.

The two real ideas it's built on

1. Polygenic traits follow the "midparent" rule. For traits controlled by many genes (overall potency, flowering time, yield, indica/sativa lean), the average offspring sits roughly halfway between the two parents. The spread of offspring around that average widens the more the parents differ, and widens again in later generations. This is the additive model of quantitative genetics (Falconer & Mackay, Introduction to Quantitative Genetics, 4th ed.).

2. The THC-vs-CBD chemotype is a single-gene, near-Mendelian trait. The ratio of THC to CBD is governed mainly by one locus (called "B") with two co-dominant alleles — one makes the enzyme that produces THC, the other the enzyme that produces CBD. That gives three chemotypes: THC-dominant, roughly balanced (1:1), and CBD-dominant. This is the best-characterised piece of cannabis genetics, established in de Meijer et al. (2003), "The inheritance of chemical phenotype in Cannabis sativa L.", Genetics 163(1):335–346.
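
As a rough illustration of these two ideas, here is a minimal Python sketch. It is not the simulator's code: the function names, the shape of the spread formula and its weights (`base_sd`, `distance_weight`) are assumptions made up for this example, and "T" and "D" are shorthand for the THC-producing and CBD-producing alleles at the B locus.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product

# --- Idea 1: polygenic traits scatter around the midparent value ---

def offspring_sd(parent_a, parent_b, generation=1, base_sd=1.0, distance_weight=0.25):
    """Spread of offspring: wider for distant parents and for later generations.

    The functional form and both weights are illustrative assumptions.
    """
    return (base_sd + distance_weight * abs(parent_a - parent_b)) * generation

def polygenic_offspring(parent_a, parent_b, rng, generation=1):
    """Draw one offspring value from a normal curve centred on the midparent."""
    midparent = (parent_a + parent_b) / 2
    return rng.gauss(midparent, offspring_sd(parent_a, parent_b, generation))

# --- Idea 2: chemotype follows one locus with two co-dominant alleles ---
# Genotypes are two-letter strings, stored in sorted order: "TT", "DT", "DD".
CHEMOTYPE = {"TT": "THC-dominant", "DT": "balanced", "DD": "CBD-dominant"}

def chemotype_odds(parent_a, parent_b):
    """Exact chemotype probabilities, found by enumerating the Punnett square."""
    counts = Counter(
        CHEMOTYPE["".join(sorted(a + b))] for a, b in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {chemotype: Fraction(n, total) for chemotype, n in counts.items()}

rng = random.Random(7)  # fixed seed so the example is repeatable
seedlings = [polygenic_offspring(18.0, 26.0, rng) for _ in range(300)]
print(sum(seedlings) / len(seedlings))  # lands close to the midparent, 22.0
print(chemotype_odds("DT", "DT"))       # two balanced parents
```

The second function uses exact fractions rather than random draws, so it shows the expected odds for a cross before any sampling noise is added.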

What actually happens when you press go

  1. It reads the two parents' values and infers each parent's chemotype (THC / balanced / CBD) from its CBD:THC ratio.
  2. It simulates one offspring: polygenic traits are drawn around the midparent value with random noise; the THC:CBD split is decided by a Mendelian coin-flip of the parents' chemotype alleles, then applied to an overall-potency figure that is itself inherited midparent-style.
  3. It repeats that hundreds of times — 300 runs on this page — because each "seed" comes out a little different. This is a Monte Carlo simulation: many random trials, then averaged.
  4. It averages the runs into the predicted offspring you see, and measures how much the runs varied to give the stability score and the chemotype breakdown.
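
The four steps above can be sketched in a few lines of Python. Treat this as a simplified stand-in, not the simulator's actual code: the ratio cut-offs in `infer_genotype`, the `noise_sd` value, the rule that splits potency by allele count, and the stability formula are all assumptions chosen for illustration. Only the 300-run count comes from this page.

```python
import random
import statistics
from collections import Counter

RUNS = 300  # number of simulated offspring, as on this page

def infer_genotype(thc, cbd):
    """Step 1: guess the chemotype genotype from the CBD:THC ratio.

    The 0.2 and 5 cut-offs are illustrative, not the simulator's thresholds.
    """
    ratio = cbd / thc if thc > 0 else float("inf")
    if ratio < 0.2:
        return "TT"  # THC-dominant
    if ratio > 5:
        return "DD"  # CBD-dominant
    return "TD"      # balanced

def simulate_one(parent_a, parent_b, rng, noise_sd=2.0):
    """Step 2: one offspring, potency near the midparent plus a Mendelian draw."""
    potency_a = parent_a["thc"] + parent_a["cbd"]
    potency_b = parent_b["thc"] + parent_b["cbd"]
    potency = max(0.0, rng.gauss((potency_a + potency_b) / 2, noise_sd))
    # Coin-flip: each parent passes on one of its two alleles at random.
    alleles = rng.choice(infer_genotype(**parent_a)) + rng.choice(infer_genotype(**parent_b))
    thc_share = alleles.count("T") / 2  # simplification: TT all THC, TD half, DD none
    return {
        "thc": potency * thc_share,
        "cbd": potency * (1 - thc_share),
        "genotype": "".join(sorted(alleles)),
    }

def simulate_cross(parent_a, parent_b, seed=0, runs=RUNS):
    """Steps 3 and 4: repeat many times, then average and measure the variation."""
    rng = random.Random(seed)
    offspring = [simulate_one(parent_a, parent_b, rng) for _ in range(runs)]
    potencies = [o["thc"] + o["cbd"] for o in offspring]
    mean_potency = statistics.mean(potencies)
    # Hypothetical stability score: 1.0 if every run agreed, lower as runs scatter.
    stability = 1 / (1 + statistics.stdev(potencies) / mean_potency) if mean_potency > 0 else 0.0
    return {
        "mean_thc": statistics.mean(o["thc"] for o in offspring),
        "mean_cbd": statistics.mean(o["cbd"] for o in offspring),
        "stability": stability,
        "chemotypes": Counter(o["genotype"] for o in offspring),
    }

result = simulate_cross({"thc": 22.0, "cbd": 0.5}, {"thc": 1.0, "cbd": 15.0})
print(result)
```

Passing a fixed `seed` means the random trials repeat exactly, so the same parents give the same summary every time.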

F1 vs F2 — why the toggle matters

Cross two stable strains and the first generation (F1) is fairly uniform: most plants look alike. The wide variation growers "pheno-hunt" through shows up in the F2, when F1 plants are crossed with each other and the genes re-shuffle. The simulator models both: switch to F2 and you'll see stability drop and new chemotypes appear. For example, crossing a true-breeding THC strain with a true-breeding CBD strain gives an all-balanced F1, but the F2 segregates roughly 1 THC-dominant : 2 balanced : 1 CBD-dominant, the classic single-gene ratio.
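
To see the F1-to-F2 shift in numbers, the sketch below samples seedlings at random instead of computing exact odds. It is an illustration under simple assumptions (one locus, two alleles written "T" and "D", parents picked at random from the previous generation); the `cross` helper and the seed are invented for this example.

```python
import random
from collections import Counter

def cross(mother, father, rng):
    """One seedling: a random allele from each parent, stored in sorted order."""
    return "".join(sorted(rng.choice(mother) + rng.choice(father)))

rng = random.Random(2024)  # fixed seed so the example is repeatable

# F1: a true-breeding THC parent ("TT") crossed with a true-breeding CBD parent ("DD").
f1 = [cross("TT", "DD", rng) for _ in range(300)]

# F2: F1 plants crossed with each other.
f2 = [cross(rng.choice(f1), rng.choice(f1), rng) for _ in range(300)]

print(Counter(f1))  # every plant is "DT" (balanced)
print(Counter(f2))  # roughly one quarter "TT", half "DT", one quarter "DD"
```

Because the F2 is sampled, the counts only approximate 1:2:1; a small seed batch can drift noticeably from the textbook ratio.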

Environment, heritability, and why you see a range

A real plant isn't a single number; it's a range. The same genetics grown twice gives different results, because the final plant is its genetics plus the environment it grew in (light, nutrients, training, drying, harvest timing). How much each matters is captured by the trait's heritability, the share of the visible variation that comes from genetics: high-heritability traits land close to their genetic potential in almost any grow, while low-heritability traits are pushed around by the grow.

We use published cannabis heritability figures to set how wide each trait's range is. Flowering time is highly heritable (broad-sense heritability around 0.94), so it barely moves and its band is narrow. Yield is the least heritable trait measured (broad-sense heritability around 0.21), so the environment dominates it and its band is wide. THC and CBD sit in between (heritability roughly 0.34–0.37). That's why the Grow-environment selector shifts potency and yield a lot but flowering hardly at all, and why every prediction shows a typical low–high range rather than false precision.
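
One common way to turn a heritability into a band width is the textbook variance split: total variation is genetic plus environmental, and broad-sense heritability is the genetic share. The sketch below applies that split to the figures quoted above. The function names are hypothetical, 0.35 is a rounded stand-in for the 0.34–0.37 range, and the simulator may well scale its bands differently.

```python
import math

# Broad-sense heritabilities quoted above (0.35 stands in for the 0.34-0.37 range).
HERITABILITY = {"flowering_time": 0.94, "thc": 0.35, "yield": 0.21}

def environmental_sd(phenotypic_sd, h2):
    """Environmental spread, assuming V_P = V_G + V_E and H2 = V_G / V_P."""
    return phenotypic_sd * math.sqrt(1 - h2)

def typical_range(genetic_value, phenotypic_sd, h2, z=1.0):
    """Low-high band around the genetic value; z sets how many SDs wide it is."""
    half_width = z * environmental_sd(phenotypic_sd, h2)
    return genetic_value - half_width, genetic_value + half_width

for trait, h2 in HERITABILITY.items():
    # Environmental SD as a fraction of that trait's total (phenotypic) SD.
    print(trait, round(environmental_sd(1.0, h2), 2))
```

Under this split the environmental share of the spread is about 0.24 for flowering time, 0.81 for THC and 0.89 for yield, which matches the ordering described above: narrow for flowering, wide for yield.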

Dominance and hybrid vigour

Beyond simple averaging, the model adds a touch of directional dominance: an F1 hybrid from two distant parents yields a little more than the midparent ("hybrid vigour"), and that boost largely collapses in the F2 as heterozygosity falls. This follows how dedicated breeding simulators such as AlphaSimR handle non-additive effects. Gene linkage and the full sweep of epistasis (gene-gene interactions) are still simplified away, because they'd need a marker-level genome the library doesn't carry.
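
A minimal sketch of that boost, assuming a 0-to-1 genetic distance scale and an illustrative 5% maximum vigour (both invented for this example). The halving in the F2 is the textbook single-locus expectation under random mating, used here as a stand-in; the simulator's own decay factor is not stated on this page.

```python
# Share of the F1 vigour boost kept in each generation.
# 0.5 for the F2 is the textbook expectation, not a documented simulator value.
RETAINED_VIGOUR = {1: 1.0, 2: 0.5}

def expected_yield(parent_a, parent_b, distance, generation=1, max_vigour=0.05):
    """Midparent yield plus a dominance boost that scales with parental distance.

    distance: hypothetical 0..1 measure of how unrelated the parents are.
    max_vigour: fractional F1 boost at distance 1.0 (illustrative).
    """
    if generation not in RETAINED_VIGOUR:
        raise ValueError("this sketch only covers F1 (1) and F2 (2)")
    midparent = (parent_a + parent_b) / 2
    boost = max_vigour * distance * RETAINED_VIGOUR[generation]
    return midparent * (1 + boost)

f1_yield = expected_yield(400, 500, distance=1.0, generation=1)
f2_yield = expected_yield(400, 500, distance=1.0, generation=2)
print(f1_yield, f2_yield)  # F1 sits above F2, and both sit above the midparent of 450
```

With closely related parents (`distance` near 0) the boost vanishes and the result falls back to the plain midparent value.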

What's real and what's an estimate

  • Real (genetics rules): midparent inheritance, increasing variance with parental distance and generation, the single-locus 1:2:1 chemotype segregation, hybrid vigour in the F1, and environment/heritability setting each trait's spread.
  • Real (calibration numbers): the relative trait heritabilities come from published cannabis/hemp studies (see sources below).
  • Estimated: every per-strain value in the library is our hand-curated estimate of that strain's genetic CENTRE — there is no open, per-seed-lot lab assay for every named strain, and the same name varies a lot between growers anyway, which is exactly why we model a range around it.
  • Deliberately cautious: terpene and effect predictions are heuristic blends. The simulator makes no medical claims and takes no position on the "entourage effect", which remains unsettled.

Sources

  • de Meijer et al. (2003), The inheritance of chemical phenotype in Cannabis sativa L., Genetics 163(1):335–346. Source of the single-locus chemotype model.
  • Soorni et al. (2025), population genomics of a Cannabis sativa collection, BMC Plant Biology. SNP-based heritabilities for THC, CBD, flowering time and yield proxies.
  • Petit et al. (2020), Genetic variability in morphological, flowering and biomass traits in hemp, Frontiers in Plant Science. Broad-sense heritabilities (yield lowest, flowering highest).
  • Falconer & Mackay, Introduction to Quantitative Genetics, 4th ed. The additive + environment + dominance framework.
  • Gaynor, Gorjanc & Hickey (2021), AlphaSimR, G3. The additive/dominance/epistasis/G×E (ADEG) modelling approach this borrows from.

So how much should you trust it?

Use it for direction, not precision. It's good at telling you which pairings push toward your goal, roughly how predictable a cross will be, and how much the grow itself will swing the result — the comparisons between crosses are more meaningful than any single absolute number. It is not a substitute for actually growing, measuring, and selecting. Everything it reports is reproducible (the same parents, goal, generation and environment always give the same result) precisely because it's a defined model, not a measurement of the real world.
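
A random simulation can still be reproducible if its random number generator is seeded from the inputs. The sketch below shows one way to do that; it is an assumption about how such determinism could be achieved, not a description of the simulator's internals, and the function names and example inputs are hypothetical.

```python
import hashlib
import random

def seed_from_inputs(parent_a, parent_b, goal, generation, environment):
    """Derive a stable integer seed from the settings that define a run."""
    key = "|".join([parent_a, parent_b, goal, str(generation), environment])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def noise_draws(parent_a, parent_b, goal, generation, environment, n=3):
    """Random noise values that depend only on the inputs."""
    rng = random.Random(seed_from_inputs(parent_a, parent_b, goal, generation, environment))
    return [rng.gauss(0, 1) for _ in range(n)]

first = noise_draws("Strain A", "Strain B", "high-cbd", 2, "indoor")
again = noise_draws("Strain A", "Strain B", "high-cbd", 2, "indoor")
other = noise_draws("Strain A", "Strain B", "high-cbd", 1, "indoor")
print(first == again)  # same inputs, same draws
print(first == other)  # changing the generation changes the draws
```

A cryptographic hash is used here instead of Python's built-in `hash()`, because string hashes from `hash()` are randomised per process by default and would not give the same seed on every run.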

Educational content for adults where cannabis is legal. Know and follow your local laws.