Design small molecules

Generate novel small molecules against a protein target, monitor progress, fetch scored results, and stop early if needed.

Each generated molecule is scored on binding confidence (confidence that binding occurs), optimization score (a relative binding-strength ranking for lead optimization), and structure confidence (confidence in the predicted structure).

Run

run() submits the design, waits while molecules are generated, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.

import os
from boltz_api import Boltz

client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])

target = {
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
    "pocket_residues": {"A": [2, 3, 4, 7, 8, 9]},  # 0-indexed; must fall within the 16-residue chain
}  # see Input format for chemical_space, molecule_filters, constraints, …

# One call: submit, wait, and download results to a run directory.
run_dir = client.small_molecule.design.run(target=target, num_molecules=100, name="my-design")

# ...or submit now and download later:
design = client.small_molecule.design.start(target=target, num_molecules=100)
run_dir = client.experiments.download_results(id=design.id, name="my-design")  # rerun to resume an interrupted download

Write your input to small-molecule-design.yaml (see Input format), then:

RUN_ID=$(
  boltz-api --format raw small-molecule:design start \
    --input @yaml://./small-molecule-design.yaml | jq -r '.id'
)

# download-results polls and downloads on your behalf; rerun with the same --name to resume.
boltz-api download-results --id "$RUN_ID" --name my-design

The TypeScript client drives the REST API directly. Submit with start(), then poll and read results yourself (see Use the API directly).

import Boltz from "boltz-api";

const client = new Boltz({ apiKey: process.env["BOLTZ_API_KEY"] });

const target = {
  entities: [{ type: "protein", value: "MKTIIALSYIFCLVFA", chain_ids: ["A"] }],
  pocket_residues: { A: [2, 3, 4, 7, 8, 9] }, // 0-indexed; must fall within the 16-residue chain
}; // see Input format for chemical_space, molecule_filters, …

const design = await client.smallMolecule.design.start({ target, num_molecules: 100 });

Input format

A design run takes a target to design against, plus optional control over the chemical space and which molecules pass through. The example below is a complete, valid input you can copy.

{
  "target": {
    "entities": [
      # protein chains only; at least one
      { "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] }
    ],
    "pocket_residues": { "A": [2, 3, 4, 7, 8, 9] }, # optional: keyed by chain ID; pocket residues (0-indexed); omit to auto-detect
    "reference_ligands": ["CC(=O)Oc1ccccc1C(=O)O"], # optional: known-binder SMILES that help locate the pocket
    "constraints": [ # optional: guide the geometry
      {
        "type": "pocket", # keep the binder near a set of receptor residues
        "binder_chain_id": "L", # chain ID the pipeline assigns to the designed molecule
        "contact_residues": { "A": [2, 3, 4, 7, 8, 9] },
        "max_distance_angstrom": 6.0
      }
    ]
  },
  "chemical_space": "enamine_real", # optional: building-block library (currently only enamine_real)
  "molecule_filters": {
    "boltz_smarts_catalog_filter_level": "recommended", # recommended | extra | aggressive | disabled
    "custom_filters": [
      # any combination; a molecule must pass all of them (AND logic). one of each type shown:
      {
        "type": "lipinski_filter", # Rule of Five
        "max_mw": 500,
        "max_logp": 5,
        "max_hbd": 5,
        "max_hba": 10,
        "allow_single_violation": False # optional: allow one rule to fail
      },
      {
        "type": "rdkit_descriptor_filter", # min/max on RDKit descriptors; include only the ones you want to bound
        "mol_wt": { "min": 150, "max": 500 },
        "mol_logp": { "max": 5 },
        "tpsa": { "max": 140 },
        "num_h_donors": { "max": 5 },
        "num_h_acceptors": { "max": 10 },
        "num_rotatable_bonds": { "max": 10 },
        "num_heteroatoms": { "max": 12 },
        "num_aromatic_rings": { "min": 1, "max": 4 },
        "num_rings": { "max": 6 },
        "fraction_csp3": { "min": 0.2 }
      },
      {
        "type": "smarts_custom_filter", # reject molecules matching any of these SMARTS
        "patterns": ["[N+](=O)[O-]", "C(=O)Cl"]
      },
      {
        "type": "smarts_catalog_filter", # reject by a named alert catalog
        "catalog": "PAINS" # PAINS | PAINS_A | PAINS_B | PAINS_C | BRENK | CHEMBL | CHEMBL_BMS | CHEMBL_Dundee | CHEMBL_Glaxo | CHEMBL_Inpharmatica | CHEMBL_LINT | CHEMBL_MLSMR | CHEMBL_SureChEMBL | NIH
      },
      {
        "type": "smiles_regex_filter", # reject molecules whose SMILES matches any of these regexes
        "patterns": ["P", "S(=O)(=O)Cl"]
      }
    ]
  },
  "num_molecules": 100 # how many molecules to generate (10 to 1,000,000)
}

Field	Required	What it is	Link
`target`	Yes	The protein and binding pocket you're designing against.	Target
`chemical_space`	No	The building-block library molecules are generated from.	Chemical space
`molecule_filters`	No	Which generated molecules pass through to results.	Molecular filters
`num_molecules`	Yes	How many molecules to generate.	—
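
A quick local check before submitting can catch a missing required field or a num_molecules value outside the documented range. The sketch below is illustrative only: validate_design_input is a hypothetical helper, not part of boltz_api, and the server remains the authority on what counts as valid input.

```python
def validate_design_input(payload: dict) -> list[str]:
    """Return a list of problems found in a design input (empty if none)."""
    problems = []

    # `target` and `num_molecules` are the two required top-level fields.
    for field in ("target", "num_molecules"):
        if field not in payload:
            problems.append(f"missing required field: {field}")

    # The documented range is 10 to 1,000,000 molecules.
    n = payload.get("num_molecules")
    if n is not None and not (isinstance(n, int) and 10 <= n <= 1_000_000):
        problems.append("num_molecules must be an integer from 10 to 1,000,000")

    # Entities must be protein chains, and at least one is needed.
    if "target" in payload:
        entities = payload["target"].get("entities", [])
        if not entities:
            problems.append("target.entities needs at least one protein chain")
        for entity in entities:
            if entity.get("type") != "protein":
                problems.append(f"unsupported entity type: {entity.get('type')}")

    return problems
```

An empty list means the payload passed these local checks; it does not guarantee the API will accept it.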

Target (`target`)

The target is the protein you're designing against. List its entities (protein chains only), then optionally point the pipeline at the binding pocket:

Pocket residues: pocket_residues maps chain ID to the 0-indexed residues that line the pocket. Omit it and the pipeline auto-detects the pocket.
Reference ligands: reference_ligands are SMILES of known binders that help the pipeline locate the right pocket. When omitted, a set of drug-like default ligands is used for pocket detection.
Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) add finer geometric control. See Core Concepts.

You can provide pocket residues, reference ligands, both, or neither. Providing both gives the pipeline the strongest signal for finding the right pocket.
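
Because pocket_residues is 0-indexed, every index must be smaller than the length of the chain it refers to. The following sketch checks that locally; out_of_range_pocket_residues is a hypothetical helper written for this guide, not a boltz_api function.

```python
def out_of_range_pocket_residues(target: dict) -> dict[str, list[int]]:
    """Map chain ID -> pocket indices that fall outside that chain's sequence."""
    # Build chain ID -> sequence length from the protein entities.
    lengths = {}
    for entity in target.get("entities", []):
        for chain_id in entity.get("chain_ids", []):
            lengths[chain_id] = len(entity["value"])

    bad = {}
    for chain_id, residues in target.get("pocket_residues", {}).items():
        n = lengths.get(chain_id, 0)  # unknown chain: every index counts as invalid
        invalid = [r for r in residues if not 0 <= r < n]
        if invalid:
            bad[chain_id] = invalid
    return bad
```

For a 16-residue chain, valid indices run from 0 to 15, so an index such as 35 would be reported.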

Chemical space (`chemical_space`)

chemical_space controls the building blocks available for generation. It defaults to enamine_real (the Enamine REAL space), which constrains molecules to commercially available, synthetically accessible building blocks. That keeps generated molecules makeable in the lab and avoids computationally promising hits that turn out to be impossible or prohibitively expensive to synthesize. Contact contact@boltz.bio for access to other chemical spaces.

Molecular filters (`molecule_filters`)

Filters control which generated molecules reach your results. They combine with AND logic: a molecule must pass every filter.

Built-in alerts (boltz_smarts_catalog_filter_level) set how strictly Boltz's curated structural alerts are applied. These alerts flag substructures known to cause toxicity, reactivity, or poor pharmacokinetics:

Level	Behavior
`recommended` (default)	Balanced filtering that catches the most common problematic substructures.
`extra`	Stricter filtering with additional alerts.
`aggressive`	Most conservative; rejects anything with a known structural concern.
`disabled`	No built-in filtering; only `custom_filters` apply.

Custom filters (custom_filters) are any combination of these:

Filter type	What it does
`lipinski_filter`	Lipinski's Rule of Five: set `max_mw`, `max_logp`, `max_hbd`, `max_hba`. Optional `allow_single_violation`.
`rdkit_descriptor_filter`	Min/max ranges on RDKit descriptors (`mol_wt`, `mol_logp`, `tpsa`, `num_h_donors`, `num_h_acceptors`, `num_rotatable_bonds`, `num_heteroatoms`, `num_aromatic_rings`, `num_rings`, `fraction_csp3`). Each accepts `{min, max}`; omitted descriptors are unconstrained.
`smarts_custom_filter`	Reject molecules matching any of the provided SMARTS `patterns`.
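`smarts_catalog_filter`	Reject molecules matching a named `catalog`: `PAINS` (also `PAINS_A`, `PAINS_B`, `PAINS_C`), `BRENK`, `NIH`, or one of the `CHEMBL` family (`CHEMBL`, `CHEMBL_BMS`, `CHEMBL_Dundee`, `CHEMBL_Glaxo`, `CHEMBL_Inpharmatica`, `CHEMBL_LINT`, `CHEMBL_MLSMR`, `CHEMBL_SureChEMBL`).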
`smiles_regex_filter`	Reject molecules whose SMILES matches any of the provided regex `patterns`.
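
One thing worth checking when writing smiles_regex_filter patterns: SMILES uses characters such as parentheses and brackets that are also regex metacharacters. The sketch below uses Python's re module to show the difference between an unescaped and an escaped pattern. It assumes search-style matching with Python-like regex syntax, which this guide does not specify, and rejected_by_regex is a hypothetical local helper, so treat it as a way to reason about patterns rather than a description of the service.

```python
import re

def rejected_by_regex(smiles: str, patterns: list[str]) -> bool:
    """True if any pattern matches anywhere in the SMILES (search semantics assumed)."""
    return any(re.search(p, smiles) for p in patterns)

sulfonyl_chloride = "CS(=O)(=O)Cl"  # methanesulfonyl chloride

unescaped = "S(=O)(=O)Cl"        # parentheses act as regex groups here
escaped = re.escape(unescaped)   # matches the substring literally

print(rejected_by_regex(sulfonyl_chloride, [unescaped]))  # False
print(rejected_by_regex(sulfonyl_chloride, [escaped]))    # True
```

Under these assumptions, a pattern meant to match a literal SMILES fragment needs its metacharacters escaped. For substructure matching, smarts_custom_filter is usually the more direct tool.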

Output format

run() and start() + client.experiments.download_results() poll on your behalf, append each result to the manifest as it's generated, and download its files into a self-contained run directory. Rerun with the same name to resume.

boltz-experiments/
└── my-run/                       # the name you chose (or an auto-generated one)
    ├── .boltz-run.json           # run + resume state, managed for you (don't edit)
    ├── run.json                  # the run object: status, progress, engine (download URLs stripped)
    └── results/
        ├── index.jsonl           # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz             # the downloaded result archive
            ├── metadata.json              # this result's fields (metrics, sequence/SMILES, …)
            └── files/                     # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif   # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result minus its artifacts (those are short-lived download URLs), and adds a paths map pointing at the files downloaded for that result. Each record also carries the molecule `smiles`, all `metrics`, and Tier-1 `adme` properties when available.

{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only".
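
Since index.jsonl is plain JSON Lines, triaging a run needs nothing beyond the standard library. This sketch reads the manifest from a run directory and returns the top records by a chosen metric; load_index and top_by_metric are hypothetical helper names, and the record fields follow the example above.

```python
import json
from pathlib import Path

def load_index(run_dir) -> list[dict]:
    """Read results/index.jsonl from a run directory into a list of records."""
    index_path = Path(run_dir) / "results" / "index.jsonl"
    records = []
    with index_path.open() as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

def top_by_metric(records, metric="binding_confidence", n=5) -> list[dict]:
    """Return the n records with the highest value for `metric`."""
    # Ignore records that do not carry the requested metric.
    scored = [r for r in records if metric in r.get("metrics", {})]
    return sorted(scored, key=lambda r: r["metrics"][metric], reverse=True)[:n]
```

For example, top_by_metric(load_index(run_dir)) gives the five records with the highest binding_confidence, each with a paths map pointing at its downloaded files.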

The CLI's download-results command behaves the same way: it polls on your behalf, appends each result as it's generated, and downloads its files into a run directory with the layout and index.jsonl record format shown above. Rerun with the same --name to resume.

Everything is downloaded by default. To keep just the manifest and skip the archives, pass --download-mode metadata_only.

The TypeScript client doesn't download for you; you read the API objects directly (shown below). The result and run-status shapes are identical to what index.jsonl and run.json mirror on disk.

Each result is a scored molecule. This is what list_results() streams (and what each index.jsonl record mirrors):

{
  "data": [
    {
      "id": "sm_des_result_8f3a2b", # unique result ID
      "created_at": "2026-02-25T13:03:40Z",
      "smiles": "Cc1ccc(cc1)C(=O)Nc1ccc2[nH]ncc2c1", # the generated molecule
      "metrics": {
        "binding_confidence": 0.94, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
        "optimization_score": 0.53, # 0–1; ranks relative binding strength for lead optimization (normalized); higher is better
        "structure_confidence": 0.95, # 0–1; confidence in the predicted structure
        "iptm": 0.91, # 0–1; interface predicted TM-score
        "ptm": 0.92, # 0–1; global predicted TM-score
        "complex_plddt": 0.95, # 0–1; pLDDT across the full complex
        "complex_iplddt": 0.88 # 0–1; interface pLDDT
      },
      "adme": { # optional: Tier-1 ADME properties, when available
        "lipophilicity": 2.7,
        "permeability": 0.61,
        "solubility": "medium-confidence" # high-confidence | medium-confidence | high-risk
      },
      "artifacts": {
        # short-lived presigned download URLs; check url_expires_at and download promptly
        "structure": { # predicted bound structure (.cif); may be null until ready
          "url": "https://.../structure.cif",
          "url_expires_at": "2026-02-25T14:03:40Z"
        },
        "archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
          "url": "https://.../archive.tar.gz",
          "url_expires_at": "2026-02-25T14:03:40Z"
        }
      },
      "warnings": [] # optional quality warnings for this result, if any
    }
    # ...more results on this page
  ],
  "has_more": True, # true if more pages remain
  "first_id": "sm_des_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
  "last_id": "sm_des_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}
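
If you page through results by hand rather than relying on a client's iterator, the cursor fields above are enough to drive the loop. The sketch below assumes a hypothetical fetch_page(after_id) callable that returns one page shaped like the example (data, has_more, last_id); how you implement that callable against the API is up to you.

```python
from typing import Callable, Iterator, Optional

def iter_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every result across pages, following the last_id cursor."""
    after_id = None  # None requests the first page
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page.get("has_more"):
            break
        # Pass the last item's ID as after_id to get the next page.
        after_id = page["last_id"]
```

Because this is a generator, results can be processed as each page arrives instead of waiting for the full list.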

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:

{
  "id": "sm_des_run_8f3a2b",
  "status": "running", # pending | running | succeeded | failed | stopped
  "progress": {
    "total_molecules_to_generate": 100,
    "num_molecules_generated": 37, # generated and available to download so far
    "latest_result_id": "sm_des_result_8f3a2b"
  },
  "error": None, # { code, message } once status is "failed"
  "pipeline": "boltzmol",
  "pipeline_version": "1.0",
  "livemode": True, # false for runs created with a test key
  "workspace_id": "ws_3a2b",
  "created_at": "2026-02-25T12:00:00Z",
  "started_at": "2026-02-25T12:00:05Z",
  "completed_at": None, # set when the run finishes
  "stopped_at": None, # set if you stop the run early
  "data_deleted_at": None # set once the run's data is deleted
  # "input" echoes the request you submitted (null after data deletion)
}
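
The progress and error fields are enough to build a one-line status summary, for example from the run.json written to the run directory. A minimal sketch, assuming the run has been loaded as a plain dict with the fields shown above; summarize_run is a hypothetical helper.

```python
def summarize_run(run: dict) -> str:
    """Format status, progress, and any error from a run object as one line."""
    progress = run.get("progress") or {}
    done = progress.get("num_molecules_generated", 0)
    total = progress.get("total_molecules_to_generate", 0)
    pct = 100 * done / total if total else 0.0  # avoid dividing by zero
    line = f"{run['status']}: {done}/{total} ({pct:.0f}%)"
    # `error` is only populated once the run has failed.
    if run["status"] == "failed" and run.get("error"):
        line += f" error={run['error']['code']}: {run['error']['message']}"
    return line
```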

Use the API directly

For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status, page through results as they're generated (cursor-paginated, so you can read them before the run finishes), and stop early. See Output format for the object shapes.

import time

design = client.small_molecule.design.start(target=target, num_molecules=100)

# Poll the run for status and progress.
while design.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    design = client.small_molecule.design.retrieve(design.id)
    p = design.progress
    print(f"{design.status}: {p.num_molecules_generated}/{p.total_molecules_to_generate}")

# Page through scored molecules; sort by binding_confidence (most confident binders)
# or optimization_score (lead optimization).
results = list(client.small_molecule.design.list_results(design.id))
results.sort(key=lambda r: r.metrics.binding_confidence, reverse=True)
for r in results[:5]:
    print(f"{r.id}  bind={r.metrics.binding_confidence:.2f}  opt={r.metrics.optimization_score:.2f}  {r.smiles}")

# To end a run that is still in progress, call stop(); results already produced stay available.
client.small_molecule.design.stop(design.id)

boltz-api small-molecule:design retrieve --id "$RUN_ID"      # run status + progress
boltz-api small-molecule:design list-results --id "$RUN_ID"  # scored molecules (paginated)
boltz-api small-molecule:design stop --id "$RUN_ID"          # stop early; partial results stay available

// Reuses the client and target from the Run section above.
let design = await client.smallMolecule.design.start({ target, num_molecules: 100 });

// Poll until the run finishes.
while (!["succeeded", "failed", "stopped"].includes(design.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  design = await client.smallMolecule.design.retrieve(design.id);
}

// Stream scored molecules as they arrive (cursor-paginated).
for await (const result of client.smallMolecule.design.listResults(design.id)) {
  console.log(`${result.id}  bind=${result.metrics.binding_confidence}  ${result.smiles}`);
}

// To end a run that is still in progress, call stop(); results already produced stay available.
await client.smallMolecule.design.stop(design.id);

Metrics

Metric	Range	What it measures
`binding_confidence`	0–1	Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range.
`optimization_score`	0–1	Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold.
`structure_confidence`	0–1	Confidence in the predicted structure (0 = low, 1 = high).
`iptm`	0–1	Interface predicted TM-score.
`ptm`	0–1	Global predicted TM-score.
`complex_plddt`	0–1	pLDDT across the full complex.
`complex_iplddt`	0–1	Interface pLDDT for the complex.
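
One way to combine these metrics, sketched under the guidance in the table: keep molecules at or above the 0.7 binding_confidence range, then rank the survivors by optimization_score within the same run. The triage function and its defaults are illustrative choices, not a recommended protocol, and results are assumed to be plain dicts such as index.jsonl records.

```python
def triage(results, min_binding=0.7, top_n=10):
    """Filter by binding_confidence, then rank by optimization_score (descending)."""
    confident = [
        r for r in results
        if r["metrics"].get("binding_confidence", 0.0) >= min_binding
    ]
    # optimization_score is a within-run ranking signal, so it is used only for ordering.
    ranked = sorted(
        confident,
        key=lambda r: r["metrics"].get("optimization_score", 0.0),
        reverse=True,
    )
    return ranked[:top_n]
```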

Status values

Status	Meaning
`pending`	The run is queued and has not started yet.
`running`	The run is actively generating molecules. Results may already be available.
`succeeded`	The run completed all requested molecules.
`failed`	The run encountered an error. Check the `error` field.
`stopped`	The run was stopped early. Partial results are available.
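
The three terminal statuses (succeeded, failed, stopped) are what a polling loop should wait for. The sketch below wraps that loop with a timeout; get_status is a hypothetical callable you supply (for example, one that retrieves the run and returns its status), and the interval and timeout defaults are arbitrary.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}

def wait_until_terminal(get_status, interval_s=10.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s} seconds")
        sleep(interval_s)  # injectable so tests can skip real waiting
```

Raising on timeout rather than returning keeps a stuck pending or running run from being mistaken for a finished one.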