Design small molecules
Generate novel small molecules against a protein target, monitor progress, fetch scored results, and stop early if needed.
Small molecule design generates novel molecules against a protein target, scored by binding confidence (confidence that binding occurs), optimization score (relative binding-strength ranking, for lead optimization), and structure confidence.
run() submits the design, waits while molecules are generated, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.
import os
from boltz_api import Boltz
client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])
target = {
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
    "pocket_residues": {"A": [2, 3, 4, 7, 8, 9]},  # 0-indexed positions within the 16-residue chain
}  # see Input format for chemical_space, molecule_filters, constraints, …
# One call: submit, wait, and download results to a run directory.
run_dir = client.small_molecule.design.run(target=target, num_molecules=100, name="my-design")

# ...or submit now and download later:
design = client.small_molecule.design.start(target=target, num_molecules=100)
run_dir = client.experiments.download_results(id=design.id, name="my-design")  # rerun to resume an interrupted download

Write your input to small-molecule-design.yaml (see Input format), then:

RUN_ID=$(
  boltz-api --format raw small-molecule:design start \
    --input @yaml://./small-molecule-design.yaml | jq -r '.id'
)

# download-results polls and downloads on your behalf; rerun with the same --name to resume.
boltz-api download-results --id "$RUN_ID" --name my-design

The TypeScript client drives the REST API directly. Submit with start(), then poll and read results yourself (see Use the API directly).
import Boltz from "boltz-api";
const client = new Boltz({ apiKey: process.env["BOLTZ_API_KEY"] });
const target = {
  entities: [{ type: "protein", value: "MKTIIALSYIFCLVFA", chain_ids: ["A"] }],
  pocket_residues: { A: [2, 3, 4, 7, 8, 9] }, // 0-indexed positions within the 16-residue chain
}; // see Input format for chemical_space, molecule_filters, …
const design = await client.smallMolecule.design.start({ target, num_molecules: 100 });

Input format
A design run takes a target to design against, plus optional control over the chemical space and which molecules pass through. The example below is a complete, valid input you can copy.
{
"target": {
"entities": [
# protein chains only; at least one
{ "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] }
],
"pocket_residues": { "A": [2, 3, 4, 7, 8, 9] }, # optional: keyed by chain ID; pocket residues (0-indexed); omit to auto-detect
"reference_ligands": ["CC(=O)Oc1ccccc1C(=O)O"], # optional: known-binder SMILES that help locate the pocket
"constraints": [ # optional: guide the geometry
{
"type": "pocket", # keep the binder near a set of receptor residues
"binder_chain_id": "L", # chain ID the pipeline assigns to the designed molecule
"contact_residues": { "A": [2, 3, 4, 7, 8, 9] },
"max_distance_angstrom": 6.0
}
]
},
"chemical_space": "enamine_real", # optional: building-block library (currently only enamine_real)
"molecule_filters": {
"boltz_smarts_catalog_filter_level": "recommended", # recommended | extra | aggressive | disabled
"custom_filters": [
# any combination; a molecule must pass all of them (AND logic). one of each type shown:
{
"type": "lipinski_filter", # Rule of Five
"max_mw": 500,
"max_logp": 5,
"max_hbd": 5,
"max_hba": 10,
"allow_single_violation": False # optional: allow one rule to fail
},
{
"type": "rdkit_descriptor_filter", # min/max on RDKit descriptors; include only the ones you want to bound
"mol_wt": { "min": 150, "max": 500 },
"mol_logp": { "max": 5 },
"tpsa": { "max": 140 },
"num_h_donors": { "max": 5 },
"num_h_acceptors": { "max": 10 },
"num_rotatable_bonds": { "max": 10 },
"num_heteroatoms": { "max": 12 },
"num_aromatic_rings": { "min": 1, "max": 4 },
"num_rings": { "max": 6 },
"fraction_csp3": { "min": 0.2 }
},
{
"type": "smarts_custom_filter", # reject molecules matching any of these SMARTS
"patterns": ["[N+](=O)[O-]", "C(=O)Cl"]
},
{
"type": "smarts_catalog_filter", # reject by a named alert catalog
"catalog": "PAINS" # PAINS | PAINS_A | PAINS_B | PAINS_C | BRENK | CHEMBL | CHEMBL_BMS | CHEMBL_Dundee | CHEMBL_Glaxo | CHEMBL_Inpharmatica | CHEMBL_LINT | CHEMBL_MLSMR | CHEMBL_SureChEMBL | NIH
},
{
"type": "smiles_regex_filter", # reject molecules whose SMILES matches any of these regexes
"patterns": ["P", "S\\(=O\\)\\(=O\\)Cl"] # parentheses are escaped so they match literally
}
]
},
"num_molecules": 100 # how many molecules to generate (10 to 1,000,000)
}

| Field | Required | What it is | Link |
|---|---|---|---|
| target | Yes | The protein and binding pocket you're designing against. | Target |
| chemical_space | No | The building-block library molecules are generated from. | Chemical space |
| molecule_filters | No | Which generated molecules pass through to results. | Molecular filters |
| num_molecules | Yes | How many molecules to generate. | - |
Target (target)
The target is the protein you're designing against. List its entities (protein chains only), then optionally point the pipeline at the binding pocket:
- Pocket residues: pocket_residues maps chain ID to the 0-indexed residues that line the pocket. Omit it and the pipeline auto-detects the pocket.
- Reference ligands: reference_ligands are SMILES of known binders that help the pipeline locate the right pocket. When omitted, a set of drug-like default ligands is used for pocket detection.
- Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) add finer geometric control. See Core Concepts.
You can provide pocket residues, reference ligands, both, or neither. Providing both gives the pipeline the strongest signal for finding the right pocket.
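Because pocket_residues is keyed by chain ID and 0-indexed, an off-by-one or out-of-range index is easy to introduce. The sketch below is a local sanity check you could run before submitting. check_pocket_residues is a hypothetical helper, not part of the boltz_api client, and the API may apply its own validation.

```python
def check_pocket_residues(target: dict) -> list[str]:
    """Return a list of problems found in target["pocket_residues"] (empty if none).

    Hypothetical local helper: it only checks that each chain ID exists among
    the target's entities and that every index is a valid 0-based position.
    """
    # Map each chain ID to the length of the sequence it belongs to.
    chain_lengths = {}
    for entity in target.get("entities", []):
        for chain_id in entity.get("chain_ids", []):
            chain_lengths[chain_id] = len(entity["value"])

    problems = []
    for chain_id, residues in target.get("pocket_residues", {}).items():
        if chain_id not in chain_lengths:
            problems.append(f"chain {chain_id!r} is not defined in entities")
            continue
        length = chain_lengths[chain_id]
        for index in residues:
            if not 0 <= index < length:
                problems.append(
                    f"residue {index} is out of range for chain {chain_id!r} "
                    f"(valid: 0 to {length - 1})"
                )
    return problems


target = {
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
    "pocket_residues": {"A": [2, 3, 4, 7, 8, 9]},
}
print(check_pocket_residues(target))  # prints [] for this 16-residue chain
```

An empty list means every index falls inside its chain; anything else is worth fixing before you spend a run on it.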
Chemical space (chemical_space)
chemical_space controls the building blocks available for generation. It defaults to enamine_real (the Enamine REAL space), which constrains molecules to commercially available, synthetically accessible building blocks. That keeps generated molecules makeable in the lab and avoids computationally promising hits that turn out to be impossible or prohibitively expensive to synthesize. Contact contact@boltz.bio for access to other chemical spaces.
Molecular filters (molecule_filters)
Filters control which generated molecules reach your results. They combine with AND logic: a molecule must pass every filter.
Built-in alerts (boltz_smarts_catalog_filter_level) set how strictly Boltz's curated structural alerts are applied. These alerts encode substructures known to cause toxicity, reactivity, or poor pharmacokinetics:
| Level | Behavior |
|---|---|
| recommended (default) | Balanced filtering that catches the most common problematic substructures. |
| extra | Stricter filtering with additional alerts. |
| aggressive | Most conservative; rejects anything with a known structural concern. |
| disabled | No built-in filtering; only custom_filters apply. |
Custom filters (custom_filters) are any combination of these:
| Filter type | What it does |
|---|---|
| lipinski_filter | Lipinski's Rule of Five: set max_mw, max_logp, max_hbd, max_hba. Optional allow_single_violation. |
| rdkit_descriptor_filter | Min/max ranges on RDKit descriptors (mol_wt, mol_logp, tpsa, num_h_donors, num_h_acceptors, num_rotatable_bonds, num_heteroatoms, num_aromatic_rings, num_rings, fraction_csp3). Each accepts {min, max}; omitted descriptors are unconstrained. |
| smarts_custom_filter | Reject molecules matching any of the provided SMARTS patterns. |
| smarts_catalog_filter | Reject molecules matching a named catalog: PAINS, BRENK, the CHEMBL family, NIH, and more. |
| smiles_regex_filter | Reject molecules whose SMILES matches any of the provided regex patterns. |
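To make the AND logic concrete, here is a minimal local sketch of two of the filter types applied to precomputed properties. It is illustrative only: the functions and the property dictionary are hypothetical, it does not reproduce Boltz's implementation, and it assumes regex patterns are searched anywhere in the SMILES string.

```python
import re


def passes_lipinski(props: dict, cfg: dict) -> bool:
    """Rule of Five check on precomputed properties (mw, logp, hbd, hba)."""
    violations = sum(
        [
            props["mw"] > cfg["max_mw"],
            props["logp"] > cfg["max_logp"],
            props["hbd"] > cfg["max_hbd"],
            props["hba"] > cfg["max_hba"],
        ]
    )
    # With allow_single_violation, one failed rule is tolerated.
    allowed = 1 if cfg.get("allow_single_violation", False) else 0
    return violations <= allowed


def passes_smiles_regex(smiles: str, cfg: dict) -> bool:
    """Reject when any pattern matches; assumes search-anywhere semantics."""
    return not any(re.search(pattern, smiles) for pattern in cfg["patterns"])


def passes_all(smiles: str, props: dict, filters: list[dict]) -> bool:
    """AND logic: the molecule must pass every configured filter."""
    for cfg in filters:
        if cfg["type"] == "lipinski_filter" and not passes_lipinski(props, cfg):
            return False
        if cfg["type"] == "smiles_regex_filter" and not passes_smiles_regex(smiles, cfg):
            return False
    return True


filters = [
    {"type": "lipinski_filter", "max_mw": 500, "max_logp": 5, "max_hbd": 5, "max_hba": 10},
    # Parentheses are escaped so they match literal SMILES branches.
    {"type": "smiles_regex_filter", "patterns": ["P", r"S\(=O\)\(=O\)Cl"]},
]
# Aspirin with approximate, hand-entered properties (not computed here).
aspirin = {"mw": 180.2, "logp": 1.3, "hbd": 1, "hba": 4}
print(passes_all("CC(=O)Oc1ccccc1C(=O)O", aspirin, filters))  # prints True
```

A molecule is dropped as soon as one filter rejects it, so adding filters can only shrink the result set.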
Output format
When you download with run(), with start() + client.experiments.download_results(), or with the CLI's download-results, the client polls on your behalf, appends each result as it's generated, and downloads its files into a self-contained run directory. Rerun with the same name (--name on the CLI) to resume.

boltz-experiments/
└── my-run/                              # the name you chose (or an auto-generated one)
    ├── .boltz-run.json                  # run + resume state, managed for you (don't edit)
    ├── run.json                         # the run object: status, progress, engine (download URLs stripped)
    └── results/
        ├── index.jsonl                  # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz           # the downloaded result archive
            ├── metadata.json            # this result's fields (metrics, sequence/SMILES, …)
            └── files/                   # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif   # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result minus its artifacts (those are short-lived download URLs), and adds a paths map pointing at the files downloaded for that result. Each record also carries the molecule `smiles`, all `metrics`, and Tier-1 `adme` properties when available.

{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only" in Python or --download-mode metadata_only on the CLI.
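Since index.jsonl holds one JSON record per line, triage needs nothing beyond the standard library. The sketch below uses two hypothetical helpers (load_manifest and top_by_metric are not part of boltz_api) and assumes the directory layout and record fields described in this section.

```python
import json
from pathlib import Path
from typing import Union


def load_manifest(run_dir: Union[str, Path]) -> list[dict]:
    """Read results/index.jsonl from a run directory into a list of records."""
    manifest = Path(run_dir) / "results" / "index.jsonl"
    records = []
    with manifest.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def top_by_metric(records: list[dict], metric: str, n: int = 5) -> list[dict]:
    """Return the n records with the highest value of metric; skip records missing it."""
    scored = [r for r in records if metric in r.get("metrics", {})]
    return sorted(scored, key=lambda r: r["metrics"][metric], reverse=True)[:n]
```

For example, top_by_metric(load_manifest("boltz-experiments/my-run"), "binding_confidence") would return the five most confident binders. The paths entries look relative to the run directory, so joining a record's paths["structure"] onto that directory should locate its .cif file, though that is an inference from the layout rather than a documented guarantee.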
The TypeScript client doesn't download for you; you read the API objects directly (shown below). The result and run-status shapes are identical to what index.jsonl and run.json mirror on disk.
Each result is a scored molecule. This is what list_results() streams (and what each index.jsonl record mirrors):
{
"data": [
{
"id": "sm_des_result_8f3a2b", # unique result ID
"created_at": "2026-02-25T13:03:40Z",
"smiles": "Cc1ccc(cc1)C(=O)Nc1ccc2[nH]ncc2c1", # the generated molecule
"metrics": {
"binding_confidence": 0.94, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
"optimization_score": 0.53, # 0–1; ranks relative binding strength for lead optimization (normalized); higher is better
"structure_confidence": 0.95, # 0–1; confidence in the predicted structure
"iptm": 0.91, # 0–1; interface predicted TM-score
"ptm": 0.92, # 0–1; global predicted TM-score
"complex_plddt": 0.95, # 0–1; pLDDT across the full complex
"complex_iplddt": 0.88 # 0–1; interface pLDDT
},
"adme": { # optional: Tier-1 ADME properties, when available
"lipophilicity": 2.7,
"permeability": 0.61,
"solubility": "medium-confidence" # high-confidence | medium-confidence | high-risk
},
"artifacts": {
# short-lived presigned download URLs; check url_expires_at and download promptly
"structure": { # predicted bound structure (.cif); may be null until ready
"url": "https://.../structure.cif",
"url_expires_at": "2026-02-25T14:03:40Z"
},
"archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
"url": "https://.../archive.tar.gz",
"url_expires_at": "2026-02-25T14:03:40Z"
}
},
"warnings": [] # optional quality warnings for this result, if any
}
# ...more results on this page
],
"has_more": True, # true if more pages remain
"first_id": "sm_des_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
"last_id": "sm_des_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:
{
"id": "sm_des_run_8f3a2b",
"status": "running", # pending | running | succeeded | failed | stopped
"progress": {
"total_molecules_to_generate": 100,
"num_molecules_generated": 37, # generated and available to download so far
"latest_result_id": "sm_des_result_8f3a2b"
},
"error": None, # { code, message } once status is "failed"
"pipeline": "boltzmol",
"pipeline_version": "1.0",
"livemode": True, # false for runs created with a test key
"workspace_id": "ws_3a2b",
"created_at": "2026-02-25T12:00:00Z",
"started_at": "2026-02-25T12:00:05Z",
"completed_at": None, # set when the run finishes
"stopped_at": None, # set if you stop the run early
"data_deleted_at": None # set once the run's data is deleted
# "input" echoes the request you submitted (null after data deletion)
}

Use the API directly
For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status, page through results as they're generated (cursor-paginated, so you can read them before the run finishes), and stop early. See Output format for the object shapes.
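The client examples in this section iterate over list_results() directly, but if you call the REST endpoints yourself, cursor pagination works roughly as sketched here. fetch_page is a hypothetical stand-in for whatever HTTP call you use. The sketch assumes only the page shape shown in Output format (data, has_more, last_id) and that the cursor is passed as after_id.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every result across pages, following the last_id cursor.

    fetch_page(after_id) is a hypothetical callable that returns one page
    shaped like {"data": [...], "has_more": bool, "last_id": str}.
    """
    after_id = None
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page.get("has_more"):
            break
        after_id = page["last_id"]  # cursor for the next page


# In-memory stand-in for the API, two results per page.
_ALL = [{"id": f"res_{i}"} for i in range(5)]


def fake_fetch(after_id: Optional[str], size: int = 2) -> dict:
    ids = [r["id"] for r in _ALL]
    start = ids.index(after_id) + 1 if after_id else 0
    chunk = _ALL[start:start + size]
    return {
        "data": chunk,
        "has_more": start + size < len(_ALL),
        "last_id": chunk[-1]["id"] if chunk else None,
    }


print([r["id"] for r in iter_results(fake_fetch)])
# prints ['res_0', 'res_1', 'res_2', 'res_3', 'res_4']
```

Because the cursor is an ID rather than an offset, results appended while you are paging are picked up on later pages instead of shifting the ones you have already read.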
import time
design = client.small_molecule.design.start(target=target, num_molecules=100)
# Poll the run for status and progress.
while design.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    design = client.small_molecule.design.retrieve(design.id)
    p = design.progress
    print(f"{design.status}: {p.num_molecules_generated}/{p.total_molecules_to_generate}")

# Page through scored molecules; sort by binding_confidence (most confident binders)
# or optimization_score (lead optimization).
results = list(client.small_molecule.design.list_results(design.id))
results.sort(key=lambda r: r.metrics.binding_confidence, reverse=True)
for r in results[:5]:
    print(f"{r.id} bind={r.metrics.binding_confidence:.2f} opt={r.metrics.optimization_score:.2f} {r.smiles}")

# Stop early once you've collected enough; results already produced stay available.
client.small_molecule.design.stop(design.id)

boltz-api small-molecule:design retrieve --id "$RUN_ID"      # run status + progress
boltz-api small-molecule:design list-results --id "$RUN_ID"  # scored molecules (paginated)
boltz-api small-molecule:design stop --id "$RUN_ID"          # stop early; partial results stay available

// Reuses the client and target from the Run section above.
let design = await client.smallMolecule.design.start({ target, num_molecules: 100 });

// Poll until the run finishes.
while (!["succeeded", "failed", "stopped"].includes(design.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  design = await client.smallMolecule.design.retrieve(design.id);
}

// Stream scored molecules as they arrive (cursor-paginated).
for await (const result of client.smallMolecule.design.listResults(design.id)) {
  console.log(`${result.id} bind=${result.metrics.binding_confidence} ${result.smiles}`);
}

// Stop early once you've collected enough; results already produced stay available.
await client.smallMolecule.design.stop(design.id);

Metrics
| Metric | Range | What it measures |
|---|---|---|
| binding_confidence | 0–1 | Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range. |
| optimization_score | 0–1 | Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold. |
| structure_confidence | 0–1 | Measures the confidence of the predicted structure (0 = low, 1 = high). |
| iptm | 0–1 | Interface predicted TM-score. |
| ptm | 0–1 | Global predicted TM-score. |
| complex_plddt | 0–1 | pLDDT across the full complex. |
| complex_iplddt | 0–1 | Interface pLDDT for the complex. |
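One way to combine the two headline metrics, following the guidance in the table: keep molecules at or above the 0.7 binding_confidence mark, then rank the survivors by optimization_score within the run. The function name, its parameters, and the sample records are illustrative, not part of the API.

```python
def shortlist(results: list[dict], min_binding: float = 0.7, top_n: int = 10) -> list[dict]:
    """Filter by binding_confidence, then rank by optimization_score (descending)."""
    confident = [
        r for r in results
        if r["metrics"].get("binding_confidence", 0.0) >= min_binding
    ]
    ranked = sorted(
        confident,
        key=lambda r: r["metrics"].get("optimization_score", 0.0),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up records in the shape of index.jsonl entries.
results = [
    {"id": "r1", "metrics": {"binding_confidence": 0.94, "optimization_score": 0.53}},
    {"id": "r2", "metrics": {"binding_confidence": 0.65, "optimization_score": 0.90}},
    {"id": "r3", "metrics": {"binding_confidence": 0.81, "optimization_score": 0.72}},
]
print([r["id"] for r in shortlist(results)])  # prints ['r3', 'r1']
```

Here r2 has the best optimization_score but is dropped first because its binding_confidence is below the cutoff. Treat the cutoff as a starting point to tune per target rather than a fixed rule.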
Status values
| Status | Meaning |
|---|---|
| pending | The run is queued and has not started yet. |
| running | The run is actively generating molecules. Results may already be available. |
| succeeded | The run completed all requested molecules. |
| failed | The run encountered an error. Check the error field. |
| stopped | The run was stopped early. Partial results are available. |
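If you write your own polling loop, it can help to centralize the terminal-state check and add a timeout. A minimal sketch: wait_for_run and its retrieve parameter are hypothetical (retrieve stands in for a call such as client.small_molecule.design.retrieve), and it assumes the returned run object exposes a status attribute.

```python
import time
from types import SimpleNamespace

# The three statuses after which a run no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}


def wait_for_run(retrieve, run_id, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll retrieve(run_id) until the run reaches a terminal status.

    retrieve is a hypothetical callable returning an object with a .status
    attribute. Raises TimeoutError if the run is still pending or running
    once the timeout has elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve(run_id)
        if run.status in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still {run.status} after {timeout}s")
        sleep(interval)


# Stand-in for the API: reports pending, running, then succeeded.
_states = iter(["pending", "running", "succeeded"])
final = wait_for_run(
    lambda run_id: SimpleNamespace(status=next(_states)),
    "run_123",
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(final.status)  # prints succeeded
```

Passing sleep as a parameter keeps the loop testable without real delays. With a real client you would leave it at the default and check the error field when the returned status is failed.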