
Design small molecules

Generate novel small molecules against a protein target, monitor progress, fetch scored results, and stop early if needed.

Small molecule design generates novel molecules against a protein target, scored by binding confidence (confidence that binding occurs), optimization score (relative binding-strength ranking, for lead optimization), and structure confidence.

run() submits the design, waits while molecules are generated, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.

import os
from boltz_api import Boltz
client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])
target = {
"entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
"pocket_residues": {"A": [2, 3, 4, 7, 8, 9]},
} # see Input for chemical_space, molecule_filters, constraints, …
# One call: submit, wait, and download results to a run directory.
run_dir = client.small_molecule.design.run(target=target, num_molecules=100, name="my-design")
# ...or submit now and download later:
design = client.small_molecule.design.start(target=target, num_molecules=100)
run_dir = client.experiments.download_results(id=design.id, name="my-design") # rerun to resume an interrupted download

A design run takes a target to design against and the number of molecules to generate, plus optional control over the chemical space and over which molecules pass through to your results. The example below is a complete, valid input you can copy.

{
  "target": {
    "entities": [
      # protein chains only; at least one
      { "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] }
    ],
    "pocket_residues": { "A": [2, 3, 4, 7, 8, 9] }, # optional: keyed by chain ID; pocket residues (0-indexed); omit to auto-detect
    "reference_ligands": ["CC(=O)Oc1ccccc1C(=O)O"], # optional: known-binder SMILES that help locate the pocket
    "constraints": [ # optional: guide the geometry
      {
        "type": "pocket", # keep the binder near a set of receptor residues
        "binder_chain_id": "L", # chain ID the pipeline assigns to the designed molecule
        "contact_residues": { "A": [2, 3, 4, 7, 8, 9] },
        "max_distance_angstrom": 6.0
      }
    ]
  },
  "chemical_space": "enamine_real", # optional: building-block library (currently only enamine_real)
  "molecule_filters": {
    "boltz_smarts_catalog_filter_level": "recommended", # recommended | extra | aggressive | disabled
    "custom_filters": [
      # any combination; a molecule must pass all of them (AND logic). one of each type shown:
      {
        "type": "lipinski_filter", # Rule of Five
        "max_mw": 500,
        "max_logp": 5,
        "max_hbd": 5,
        "max_hba": 10,
        "allow_single_violation": False # optional: allow one rule to fail
      },
      {
        "type": "rdkit_descriptor_filter", # min/max on RDKit descriptors; include only the ones you want to bound
        "mol_wt": { "min": 150, "max": 500 },
        "mol_logp": { "max": 5 },
        "tpsa": { "max": 140 },
        "num_h_donors": { "max": 5 },
        "num_h_acceptors": { "max": 10 },
        "num_rotatable_bonds": { "max": 10 },
        "num_heteroatoms": { "max": 12 },
        "num_aromatic_rings": { "min": 1, "max": 4 },
        "num_rings": { "max": 6 },
        "fraction_csp3": { "min": 0.2 }
      },
      {
        "type": "smarts_custom_filter", # reject molecules matching any of these SMARTS
        "patterns": ["[N+](=O)[O-]", "C(=O)Cl"]
      },
      {
        "type": "smarts_catalog_filter", # reject by a named alert catalog
        "catalog": "PAINS" # PAINS | PAINS_A | PAINS_B | PAINS_C | BRENK | CHEMBL | CHEMBL_BMS | CHEMBL_Dundee | CHEMBL_Glaxo | CHEMBL_Inpharmatica | CHEMBL_LINT | CHEMBL_MLSMR | CHEMBL_SureChEMBL | NIH
      },
      {
        "type": "smiles_regex_filter", # reject molecules whose SMILES matches any of these regexes
        "patterns": ["P", "S\\(=O\\)\\(=O\\)Cl"] # escape literal parentheses; unescaped, they form regex groups
      }
    ]
  },
  "num_molecules": 100 # how many molecules to generate (10 to 1,000,000)
}
| Field | Required | What it is | Link |
|---|---|---|---|
| target | Yes | The protein and binding pocket you're designing against. | Target |
| chemical_space | No | The building-block library that molecules are generated from. | Chemical space |
| molecule_filters | No | Rules deciding which generated molecules reach your results. | Molecular filters |
| num_molecules | Yes | How many molecules to generate. | |
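Before submitting, you may want to catch obvious input mistakes locally. The sketch below is a hypothetical helper (`check_design_input` is not part of the Boltz SDK) that checks only the rules stated on this page: `target` and `num_molecules` are present, entities are protein chains with at least one entity, and `num_molecules` is between 10 and 1,000,000. The API performs its own validation, so treat this as a convenience, not a substitute.

```python
from typing import Any


def check_design_input(payload: dict[str, Any]) -> list[str]:
    """Return the problems found; an empty list means nothing was detected.

    Hypothetical local pre-check covering only the rules documented on this page.
    """
    problems: list[str] = []

    target = payload.get("target")
    if target is None:
        problems.append("target is required")
    else:
        entities = target.get("entities", [])
        if not entities:
            problems.append("target.entities needs at least one protein chain")
        for i, entity in enumerate(entities):
            if entity.get("type") != "protein":
                problems.append(f"target.entities[{i}] must be a protein chain")

    n = payload.get("num_molecules")
    if n is None:
        problems.append("num_molecules is required")
    elif not 10 <= n <= 1_000_000:
        problems.append("num_molecules must be between 10 and 1,000,000")

    return problems


ok = {
    "target": {"entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}]},
    "num_molecules": 100,
}
print(check_design_input(ok))  # → []
print(check_design_input({"target": {"entities": []}, "num_molecules": 5}))
```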

The target is the protein you're designing against. List its entities (protein chains only), then optionally point the pipeline at the binding pocket:

  • Pocket residues: pocket_residues maps chain ID to the 0-indexed residues that line the pocket. Omit it and the pipeline auto-detects the pocket.
  • Reference ligands: reference_ligands are SMILES of known binders that help the pipeline locate the right pocket. When omitted, a set of drug-like default ligands is used for pocket detection.
  • Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) add finer geometric control. See Core Concepts.

You can provide pocket residues, reference ligands, both, or neither. Providing both gives the pipeline the strongest signal for finding the right pocket.
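Because pocket_residues is 0-indexed, residue numbers taken from papers or sequence viewers (usually 1-indexed) need shifting by one. A minimal sketch, assuming residue 1 is the first character of the chain's `value` sequence; structure-file numbering can start elsewhere or contain gaps, so check that before converting. `to_pocket_residues` is a hypothetical helper, not an SDK function.

```python
def to_pocket_residues(chain_id: str, one_based: list[int], sequence: str) -> dict[str, list[int]]:
    """Convert 1-indexed residue numbers into the 0-indexed form pocket_residues expects.

    Hypothetical helper: assumes residue 1 is the first character of `sequence`.
    """
    bad = sorted(r for r in set(one_based) if not 1 <= r <= len(sequence))
    if bad:
        raise ValueError(f"residues {bad} are outside the {len(sequence)}-residue chain")
    return {chain_id: sorted({r - 1 for r in one_based})}


seq = "MKTIIALSYIFCLVFA"
pocket = to_pocket_residues("A", [3, 4, 5, 8, 9, 10], seq)
print(pocket)  # → {'A': [2, 3, 4, 7, 8, 9]}
target = {
    "entities": [{"type": "protein", "value": seq, "chain_ids": ["A"]}],
    "pocket_residues": pocket,
}
```

The range check also catches pocket residues that point past the end of the chain, which would otherwise only surface when the API rejects the input.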

chemical_space controls the building blocks available for generation. It defaults to enamine_real (the Enamine REAL space), which constrains molecules to commercially available, synthetically accessible building blocks. That keeps generated molecules makeable in the lab and avoids computationally promising hits that turn out to be impossible or prohibitively expensive to synthesize. Contact contact@boltz.bio for access to other chemical spaces.

Filters control which generated molecules reach your results. They combine with AND logic: a molecule must pass every filter.
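To make the AND semantics concrete, here is a purely local illustration (not how the pipeline is implemented) that mimics a smiles_regex_filter like the one in the input example above, combined with a molecular-weight range. The descriptor values are supplied by hand rather than computed, and the regex is assumed to match anywhere in the SMILES (`re.search`). A literal SMILES fragment such as S(=O)(=O)Cl needs its parentheses escaped when used as a regex (unescaped, they form groups), so the sketch uses `re.escape`. All function names are hypothetical.

```python
import re
from typing import Callable, Optional

Molecule = dict  # {"smiles": ..., plus precomputed descriptors such as "mol_wt"}
Filter = Callable[[Molecule], bool]  # returns True if the molecule passes


def regex_reject(patterns: list[str]) -> Filter:
    """Pass only molecules whose SMILES matches none of the patterns (searched anywhere)."""
    compiled = [re.compile(p) for p in patterns]
    return lambda m: not any(c.search(m["smiles"]) for c in compiled)


def descriptor_range(name: str, lo: Optional[float] = None, hi: Optional[float] = None) -> Filter:
    """Pass molecules whose descriptor lies in [lo, hi]; a missing bound is unconstrained."""
    def check(m: Molecule) -> bool:
        v = m[name]
        return (lo is None or v >= lo) and (hi is None or v <= hi)
    return check


def passes_all(m: Molecule, filters: list[Filter]) -> bool:
    """AND logic: a molecule survives only if every filter passes."""
    return all(f(m) for f in filters)


filters = [
    regex_reject(["P", re.escape("S(=O)(=O)Cl")]),  # any 'P' character; sulfonyl chloride
    descriptor_range("mol_wt", lo=150, hi=500),
]
molecules = [  # approximate mol_wt values supplied by hand, not computed here
    {"smiles": "Cc1ccc(cc1)C(=O)Nc1ccc2[nH]ncc2c1", "mol_wt": 251.3},
    {"smiles": "Cc1ccc(cc1)S(=O)(=O)Cl", "mol_wt": 190.6},  # tosyl chloride
    {"smiles": "CCO", "mol_wt": 46.1},                         # ethanol
]
print([passes_all(m, filters) for m in molecules])  # → [True, False, False]
```

Tosyl chloride is within the weight range but still fails, because one failing filter is enough to drop a molecule.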

Built-in alerts (boltz_smarts_catalog_filter_level) set how strictly Boltz applies its curated structural alerts, which encode substructures known to cause toxicity, reactivity, or poor pharmacokinetics:

| Level | Behavior |
|---|---|
| recommended (default) | Balanced filtering that catches the most common problematic substructures. |
| extra | Stricter filtering with additional alerts. |
| aggressive | Most conservative; rejects anything with a known structural concern. |
| disabled | No built-in filtering; only custom_filters apply. |

Custom filters (custom_filters) are any combination of these:

| Filter type | What it does |
|---|---|
| lipinski_filter | Lipinski's Rule of Five: set max_mw, max_logp, max_hbd, max_hba. Optional allow_single_violation. |
| rdkit_descriptor_filter | Min/max ranges on RDKit descriptors (mol_wt, mol_logp, tpsa, num_h_donors, num_h_acceptors, num_rotatable_bonds, num_heteroatoms, num_aromatic_rings, num_rings, fraction_csp3). Each accepts {min, max}; omitted descriptors are unconstrained. |
| smarts_custom_filter | Reject molecules matching any of the provided SMARTS patterns. |
| smarts_catalog_filter | Reject molecules matching a named catalog: PAINS, BRENK, the CHEMBL family, NIH, and more. |
| smiles_regex_filter | Reject molecules whose SMILES matches any of the provided regex patterns. |
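To preview how lipinski_filter thresholds would treat molecules you already have, here is a sketch of the Rule-of-Five check on precomputed descriptors. It assumes each max_* is an inclusive upper bound and that allow_single_violation lets exactly one of the four rules fail; confirm those semantics against the API before relying on them. `RuleOfFiveInputs` and `lipinski_pass` are hypothetical names, and computing the descriptors themselves (for example with RDKit) is out of scope here.

```python
from dataclasses import dataclass


@dataclass
class RuleOfFiveInputs:
    mw: float    # molecular weight
    logp: float  # calculated logP
    hbd: int     # hydrogen-bond donors
    hba: int     # hydrogen-bond acceptors


def lipinski_pass(d: RuleOfFiveInputs, max_mw: float = 500, max_logp: float = 5,
                  max_hbd: int = 5, max_hba: int = 10,
                  allow_single_violation: bool = False) -> bool:
    """Local Rule-of-Five check; assumes each max_* is an inclusive upper bound."""
    violations = sum([d.mw > max_mw, d.logp > max_logp, d.hbd > max_hbd, d.hba > max_hba])
    return violations <= (1 if allow_single_violation else 0)


heavy = RuleOfFiveInputs(mw=540.0, logp=4.2, hbd=2, hba=7)  # illustrative: only MW is over
print(lipinski_pass(heavy))                               # → False
print(lipinski_pass(heavy, allow_single_violation=True))  # → True
```

The defaults mirror the thresholds in the input example (500, 5, 5, 10).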

run() and start() + client.experiments.download_results() (or the CLI's download-results) poll on your behalf, append each result as it's generated, and download its files into a self-contained run directory. Rerun with the same name to resume an interrupted download.

boltz-experiments/<name>/
boltz-experiments/
└── my-run/                                  # the name you chose (or an auto-generated one)
    ├── .boltz-run.json                      # run + resume state, managed for you (don't edit)
    ├── run.json                             # the run object: status, progress, pipeline (download URLs stripped)
    └── results/
        ├── index.jsonl                      # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz               # the downloaded result archive
            ├── metadata.json                # this result's fields (metrics, sequence/SMILES, …)
            └── files/                       # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif    # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result (including the molecule `smiles`, all `metrics`, and Tier-1 `adme` properties when available) minus its artifacts, which are short-lived download URLs, and adds a paths map pointing at the files downloaded for that result.
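A sketch of triaging a run from the manifest with only the standard library. It assumes the directory layout shown above and, based on the example record below, that each record's `paths` are relative to the run directory. If you used run(), you could likely pass the run_dir it returned instead of building the path by hand (an assumption: that it is a filesystem path). `load_index` and `top_by` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Union


def load_index(run_dir: Union[str, Path]) -> list[dict]:
    """Read results/index.jsonl (one JSON record per line), skipping blank lines."""
    index = Path(run_dir) / "results" / "index.jsonl"
    with index.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def top_by(records: list[dict], metric: str, n: int = 5) -> list[dict]:
    """Return the n records with the highest `metric`; records without it are skipped."""
    scored = [r for r in records if r.get("metrics", {}).get(metric) is not None]
    return sorted(scored, key=lambda r: r["metrics"][metric], reverse=True)[:n]


run_dir = Path("boltz-experiments/my-design")  # the name passed to run() or download_results()
if (run_dir / "results" / "index.jsonl").exists():
    for rec in top_by(load_index(run_dir), "binding_confidence"):
        structure = rec.get("paths", {}).get("structure")  # assumed relative to the run directory
        print(rec["id"], rec["metrics"]["binding_confidence"], rec.get("smiles"),
              run_dir / structure if structure else "(not downloaded)")
```

Because the file is appended as results arrive, you can rerun this while the run is still going to see the current leaders.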

one results/index.jsonl record (pretty-printed; the file stores one per line)
{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only".

Each result is a scored molecule. This is what list_results() streams (and what each index.jsonl record mirrors):

{
  "data": [
    {
      "id": "sm_des_result_8f3a2b", # unique result ID
      "created_at": "2026-02-25T13:03:40Z",
      "smiles": "Cc1ccc(cc1)C(=O)Nc1ccc2[nH]ncc2c1", # the generated molecule
      "metrics": {
        "binding_confidence": 0.94, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
        "optimization_score": 0.53, # 0–1; ranks relative binding strength for lead optimization (normalized); higher is better
        "structure_confidence": 0.95, # 0–1; confidence in the predicted structure
        "iptm": 0.91, # 0–1; interface predicted TM-score
        "ptm": 0.92, # 0–1; global predicted TM-score
        "complex_plddt": 0.95, # 0–1; pLDDT across the full complex
        "complex_iplddt": 0.88 # 0–1; interface pLDDT
      },
      "adme": { # optional: Tier-1 ADME properties, when available
        "lipophilicity": 2.7,
        "permeability": 0.61,
        "solubility": "medium-confidence" # high-confidence | medium-confidence | high-risk
      },
      "artifacts": {
        # short-lived presigned download URLs; check url_expires_at and download promptly
        "structure": { # predicted bound structure (.cif); may be null until ready
          "url": "https://.../structure.cif",
          "url_expires_at": "2026-02-25T14:03:40Z"
        },
        "archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
          "url": "https://.../archive.tar.gz",
          "url_expires_at": "2026-02-25T14:03:40Z"
        }
      },
      "warnings": [] # optional quality warnings for this result, if any
    }
    # ...more results on this page
  ],
  "has_more": True, # true if more pages remain
  "first_id": "sm_des_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
  "last_id": "sm_des_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}
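Because artifact URLs are presigned and short-lived, it's worth checking url_expires_at before starting a download. A minimal standard-library sketch; `url_is_fresh` and its 60-second safety margin are hypothetical choices, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def url_is_fresh(artifact: Optional[dict], margin_s: float = 60,
                 now: Optional[datetime] = None) -> bool:
    """True if the artifact exists and its URL won't expire within `margin_s` seconds."""
    if not artifact:  # e.g. "structure" may be null until ready
        return False
    # url_expires_at ends in "Z"; fromisoformat only accepts that from Python 3.11,
    # so swap it for an explicit UTC offset to support older versions too.
    expires = datetime.fromisoformat(artifact["url_expires_at"].replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return expires - now > timedelta(seconds=margin_s)


archive = {"url": "https://.../archive.tar.gz", "url_expires_at": "2026-02-25T14:03:40Z"}
check_time = datetime(2026, 2, 25, 13, 30, tzinfo=timezone.utc)
print(url_is_fresh(archive, now=check_time))  # → True (about 33 minutes left)
print(url_is_fresh(None))                     # → False
```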

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:

{
  "id": "sm_des_run_8f3a2b",
  "status": "running", # pending | running | succeeded | failed | stopped
  "progress": {
    "total_molecules_to_generate": 100,
    "num_molecules_generated": 37, # generated and available to download so far
    "latest_result_id": "sm_des_result_8f3a2b"
  },
  "error": None, # { code, message } once status is "failed"
  "pipeline": "boltzmol",
  "pipeline_version": "1.0",
  "livemode": True, # false for runs created with a test key
  "workspace_id": "ws_3a2b",
  "created_at": "2026-02-25T12:00:00Z",
  "started_at": "2026-02-25T12:00:05Z",
  "completed_at": None, # set when the run finishes
  "stopped_at": None, # set if you stop the run early
  "data_deleted_at": None # set once the run's data is deleted
  # "input" echoes the request you submitted (null after data deletion)
}
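Since run.json mirrors this object, you can check a downloaded run's progress from disk without an API call. A sketch assuming the field names shown above, with the terminal statuses taken from the status comment in the object; `run_summary` is a hypothetical helper.

```python
import json
from pathlib import Path

TERMINAL = {"succeeded", "failed", "stopped"}


def run_summary(run: dict) -> str:
    """Summarize status and progress from a run object (e.g. a parsed run.json)."""
    p = run.get("progress") or {}
    done = p.get("num_molecules_generated", 0)
    total = p.get("total_molecules_to_generate", 0)
    pct = 100 * done / total if total else 0.0
    state = "finished" if run["status"] in TERMINAL else "in progress"
    line = f"{run['status']} ({state}): {done}/{total} molecules ({pct:.0f}%)"
    if run["status"] == "failed" and run.get("error"):
        line += f" - {run['error']['code']}: {run['error']['message']}"
    return line


run_json = Path("boltz-experiments/my-design/run.json")
if run_json.exists():  # only if you've downloaded a run named "my-design"
    print(run_summary(json.loads(run_json.read_text(encoding="utf-8"))))
```

For the object above, run_summary would report `running (in progress): 37/100 molecules (37%)`.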

For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status, page through results as they're generated (cursor-paginated, so you can read them before the run finishes), and stop early. See Output format for the object shapes.

import time
design = client.small_molecule.design.start(target=target, num_molecules=100)
enough = 50  # hypothetical early-stop threshold; set your own or drop the check
stop_requested = False
# Poll the run for status and progress.
while design.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    design = client.small_molecule.design.retrieve(design.id)
    p = design.progress
    print(f"{design.status}: {p.num_molecules_generated}/{p.total_molecules_to_generate}")
    # Stop early once you've collected enough; results already produced stay available.
    if not stop_requested and p.num_molecules_generated >= enough:
        client.small_molecule.design.stop(design.id)
        stop_requested = True
# Page through scored molecules; sort by binding_confidence (most confident binders)
# or optimization_score (lead optimization).
results = list(client.small_molecule.design.list_results(design.id))
results.sort(key=lambda r: r.metrics.binding_confidence, reverse=True)
for r in results[:5]:
    print(f"{r.id} bind={r.metrics.binding_confidence:.2f} opt={r.metrics.optimization_score:.2f} {r.smiles}")
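In the code above, list_results() is consumed as an iterator. If you call the HTTP API directly instead, you follow the cursors yourself: request a page, then pass its last_id as after_id until has_more is false. A sketch of that loop, where `fetch_page` is a hypothetical stand-in for your own HTTP call (the endpoint and request parameters aren't shown on this page), exercised here against a fake two-page source.

```python
from typing import Callable, Iterator, Optional

Page = dict  # {"data": [...], "has_more": bool, "last_id": str}
FetchPage = Callable[[Optional[str]], Page]  # hypothetical: after_id -> one page


def iter_results(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every result by following after_id cursors until has_more is false."""
    after_id: Optional[str] = None
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page["has_more"] or not page["data"]:
            break
        after_id = page["last_id"]


# A fake two-page source to show the cursor hand-off.
pages = {
    None: {"data": [{"id": "r1"}, {"id": "r2"}], "has_more": True, "last_id": "r2"},
    "r2": {"data": [{"id": "r3"}], "has_more": False, "last_id": "r3"},
}
print([r["id"] for r in iter_results(lambda after_id: pages[after_id])])  # → ['r1', 'r2', 'r3']
```

While a run is still going, has_more can be false simply because no newer results exist yet; keeping the last cursor you saw should let you resume from it later, though that is an assumption about the cursor semantics rather than documented behavior.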
| Metric | Range | What it measures |
|---|---|---|
| binding_confidence | 0–1 | Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range. |
| optimization_score | 0–1 | Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold. |
| structure_confidence | 0–1 | Confidence in the predicted structure (0 = low, 1 = high). |
| iptm | 0–1 | Interface predicted TM-score. |
| ptm | 0–1 | Global predicted TM-score. |
| complex_plddt | 0–1 | pLDDT across the full complex. |
| complex_iplddt | 0–1 | Interface pLDDT for the complex. |
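The guidance in this table suggests a two-stage triage: keep molecules in the high-confidence binding range, then rank the survivors by optimization_score within the run. A sketch over plain result dicts (such as parsed index.jsonl records; for SDK objects, use attribute access like r.metrics.binding_confidence). The 0.7 cut-off is the table's rule of thumb, not a hard threshold, and `triage` is a hypothetical helper.

```python
def triage(results: list[dict], min_binding: float = 0.7, top_n: int = 10) -> list[dict]:
    """Keep likely binders, then rank them by relative binding strength within this run."""
    binders = [r for r in results if r["metrics"]["binding_confidence"] >= min_binding]
    return sorted(binders, key=lambda r: r["metrics"]["optimization_score"], reverse=True)[:top_n]


results = [  # illustrative metric values
    {"id": "a", "metrics": {"binding_confidence": 0.94, "optimization_score": 0.53}},
    {"id": "b", "metrics": {"binding_confidence": 0.81, "optimization_score": 0.77}},
    {"id": "c", "metrics": {"binding_confidence": 0.42, "optimization_score": 0.91}},
]
print([r["id"] for r in triage(results)])  # → ['b', 'a']
```

Molecule c has the highest optimization_score but is dropped first, because ranking binding strength only makes sense among molecules likely to bind at all.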
| Status | Meaning |
|---|---|
| pending | The run is queued and has not started yet. |
| running | The run is actively generating molecules. Results may already be available. |
| succeeded | The run completed all requested molecules. |
| failed | The run encountered an error. Check the error field. |
| stopped | The run was stopped early. Partial results are available. |