Screen small molecule libraries
Score your own small molecules against a protein target, fetch results as they arrive, and stop early if needed.
A small molecule library screen scores molecules you provide against a protein target, evaluating each for binding confidence, optimization score, and structure confidence. Results stream in as molecules are scored, so you can fetch them before the screen finishes and stop early.
run() submits the screen, waits while molecules are scored, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.
import os
from boltz_api import Boltz
client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])
target = {
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
}  # see Input format for pocket_residues, reference_ligands, molecule_filters, …
molecules = [
    {"smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "id": "aspirin"},
    {"smiles": "C1=CC=C(C=C1)O", "id": "phenol"},
    {"smiles": "CC1=CC=CC=C1"},
]  # see Input format for ids and filtering
# One call: submit, wait, and download results to a run directory.
run_dir = client.small_molecule.library_screen.run(target=target, molecules=molecules, name="my-screen")

# ...or submit now and download later:
screen = client.small_molecule.library_screen.start(target=target, molecules=molecules)
run_dir = client.experiments.download_results(id=screen.id, name="my-screen")  # rerun to resume an interrupted download

Write your input to small-molecule-screen.yaml (see Input format), then:
SCREEN_ID=$(
  boltz-api --format raw small-molecule:library-screen start \
    --input @yaml://./small-molecule-screen.yaml | jq -r '.id'
)

# download-results polls and downloads on your behalf; rerun with the same --name to resume.
boltz-api download-results --id "$SCREEN_ID" --name my-screen

The TypeScript client drives the REST API directly. Submit with start(), then poll and read results yourself (see Use the API directly).
import Boltz from "boltz-api";
const client = new Boltz({ apiKey: process.env["BOLTZ_API_KEY"] });
const target = {
  entities: [{ type: "protein", value: "MKTIIALSYIFCLVFA", chain_ids: ["A"] }],
}; // see Input format for pocket_residues, reference_ligands, molecule_filters, …

const molecules = [
  { smiles: "CC(=O)OC1=CC=CC=C1C(=O)O", id: "aspirin" },
  { smiles: "C1=CC=C(C=C1)O", id: "phenol" },
  { smiles: "CC1=CC=CC=C1" },
];

const screen = await client.smallMolecule.libraryScreen.start({ target, molecules });

Input format
A screen takes a target to score against, the molecules to screen, and optional filtering.
The example below is a complete, valid input you can copy. The deep-dive sections explain each part, and Run shows the run() / start() calls that consume it.
{
"target": {
"entities": [
# protein chains only; at least one
{ "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] }
],
"pocket_residues": { "A": [2, 3, 4, 7, 8, 9] }, # optional: keyed by chain ID; pocket residues (0-indexed); omit to auto-detect
"reference_ligands": ["CC(=O)Oc1ccccc1C(=O)O"], # optional: known-binder SMILES that help locate the pocket
"constraints": [ # optional: guide the geometry
{
"type": "pocket", # keep the binder near a set of receptor residues
"binder_chain_id": "L", # chain ID the pipeline assigns to the screened molecule
"contact_residues": { "A": [2, 3, 4, 7, 8, 9] },
"max_distance_angstrom": 6.0
}
]
},
"molecules": [
# your library to screen; each needs "smiles". optional "id" is returned as
# "external_id" on the matching result so you can correlate back to your library
{ "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "id": "aspirin" },
{ "smiles": "C1=CC=C(C=C1)O", "id": "phenol" },
{ "smiles": "CC1=CC=CC=C1" }
],
"molecule_filters": {
"boltz_smarts_catalog_filter_level": "recommended", # recommended | extra | aggressive | disabled
"custom_filters": [
# any combination; a molecule must pass all of them (AND logic). one of each type shown:
{
"type": "lipinski_filter", # Rule of Five
"max_mw": 500,
"max_logp": 5,
"max_hbd": 5,
"max_hba": 10,
"allow_single_violation": False # optional: allow one rule to fail
},
{
"type": "rdkit_descriptor_filter", # min/max on RDKit descriptors; include only the ones you want to bound
"mol_wt": { "min": 150, "max": 500 },
"mol_logp": { "max": 5 },
"tpsa": { "max": 140 },
"num_h_donors": { "max": 5 },
"num_h_acceptors": { "max": 10 },
"num_rotatable_bonds": { "max": 10 },
"num_heteroatoms": { "max": 12 },
"num_aromatic_rings": { "min": 1, "max": 4 },
"num_rings": { "max": 6 },
"fraction_csp3": { "min": 0.2 }
},
{
"type": "smarts_custom_filter", # reject molecules matching any of these SMARTS
"patterns": ["[N+](=O)[O-]", "C(=O)Cl"]
},
{
"type": "smarts_catalog_filter", # reject by a named alert catalog
"catalog": "PAINS" # PAINS | PAINS_A | PAINS_B | PAINS_C | BRENK | CHEMBL | CHEMBL_BMS | CHEMBL_Dundee | CHEMBL_Glaxo | CHEMBL_Inpharmatica | CHEMBL_LINT | CHEMBL_MLSMR | CHEMBL_SureChEMBL | NIH
},
{
"type": "smiles_regex_filter", # reject molecules whose SMILES matches any of these regexes
"patterns": ["P", "S(=O)(=O)Cl"]
}
]
}
}

| Field | Required | What it is | Link |
|---|---|---|---|
| target | Yes | The protein and binding pocket to score against. | Target |
| molecules | Yes | The molecules you want to screen. | Molecules |
| molecule_filters | No | Which molecules pass through to results. | Molecular filters |
Target (target)
The target is the protein you're screening against. List its entities (protein chains only), then optionally point the pipeline at the binding pocket:
- Pocket residues: pocket_residues maps chain ID to the 0-indexed residues that line the pocket. Omit it and the pipeline auto-detects the pocket.
- Reference ligands: reference_ligands are SMILES of known binders that help the pipeline locate the right pocket. When omitted, a set of drug-like default ligands is used for pocket detection.
- Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) add finer geometric control. See Core Concepts.
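Because pocket_residues is 0-indexed, residue positions counted from 1 need shifting before you submit. The sketch below shows one way to build a single-chain target with that conversion. `build_target` is a hypothetical helper, not part of the client, and it assumes your residue numbers are plain 1-based positions in the sequence you pass (not author numbering with offsets or gaps).

```python
from typing import Any, Dict, List, Optional


def build_target(
    sequence: str,
    chain_id: str = "A",
    pocket_residues_1_indexed: Optional[List[int]] = None,
    reference_ligands: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a single-chain target dict in the shape shown in Input format."""
    target: Dict[str, Any] = {
        "entities": [{"type": "protein", "value": sequence, "chain_ids": [chain_id]}],
    }
    if pocket_residues_1_indexed:
        # Shift 1-based positions to the 0-based indices the input expects.
        zero_indexed = sorted({n - 1 for n in pocket_residues_1_indexed})
        out_of_range = [i for i in zero_indexed if not 0 <= i < len(sequence)]
        if out_of_range:
            raise ValueError(f"pocket residues outside the sequence: {out_of_range}")
        target["pocket_residues"] = {chain_id: zero_indexed}
    if reference_ligands:
        target["reference_ligands"] = list(reference_ligands)
    return target


# Positions 3, 4, 5, 8, 9, 10 counted from 1 become indices 2, 3, 4, 7, 8, 9.
target = build_target("MKTIIALSYIFCLVFA", pocket_residues_1_indexed=[3, 4, 5, 8, 9, 10])
```

Leaving both optional arguments out produces a target with only entities, which lets the pipeline auto-detect the pocket.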
Molecules (molecules)
molecules is the library you want to score, given inline as an array. Each entry needs a smiles string and may carry an optional id. When you provide an id, it comes back as external_id on the matching result, so you can correlate results to your input library. Molecules that fail filtering are skipped and don't appear in results.
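As a rough illustration of the id / external_id round trip, the sketch below builds the molecules array from (smiles, id) pairs and indexes results by external_id. Both helpers are hypothetical and operate on plain dicts. The duplicate-id check is a local choice to keep the correlation unambiguous, not a documented API requirement.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def build_molecules(rows: Iterable[Tuple[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Turn (smiles, id) pairs into molecule entries; "id" is omitted when empty."""
    molecules: List[Dict[str, str]] = []
    seen_ids = set()
    for smiles, mol_id in rows:
        smiles = smiles.strip()
        if not smiles:
            raise ValueError("every molecule needs a non-empty smiles string")
        entry = {"smiles": smiles}
        if mol_id:
            if mol_id in seen_ids:
                raise ValueError(f"duplicate id: {mol_id!r}")
            seen_ids.add(mol_id)
            entry["id"] = mol_id
        molecules.append(entry)
    return molecules


def index_by_external_id(results: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map external_id -> result, skipping results that have no external_id."""
    return {r["external_id"]: r for r in results if r.get("external_id")}


molecules = build_molecules([
    ("CC(=O)OC1=CC=CC=C1C(=O)O", "aspirin"),
    ("C1=CC=C(C=C1)O", "phenol"),
    ("CC1=CC=CC=C1", None),
])
```

Because filtered molecules never appear in results, any input id missing from the index after a finished screen was either filtered out or failed during scoring.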
Molecular filters (molecule_filters)
Filters control which molecules reach your results. They combine with AND logic: a molecule must pass every filter.
Built-in alerts (boltz_smarts_catalog_filter_level) tune Boltz's curated structural-alert filtering, encoding substructures known to cause toxicity, reactivity, or poor pharmacokinetics:
| Level | Behavior |
|---|---|
| recommended (default) | Balanced filtering that catches the most common problematic substructures. |
| extra | Stricter filtering with additional alerts. |
| aggressive | Most conservative; rejects anything with a known structural concern. |
| disabled | No built-in filtering; only custom_filters apply. |
Custom filters (custom_filters) are any combination of these:
| Filter type | What it does |
|---|---|
| lipinski_filter | Lipinski's Rule of Five: set max_mw, max_logp, max_hbd, max_hba. Optional allow_single_violation. |
| rdkit_descriptor_filter | Min/max ranges on RDKit descriptors (mol_wt, mol_logp, tpsa, num_h_donors, num_h_acceptors, num_rotatable_bonds, num_heteroatoms, num_aromatic_rings, num_rings, fraction_csp3). Each accepts {min, max}; omitted descriptors are unconstrained. |
| smarts_custom_filter | Reject molecules matching any of the provided SMARTS patterns. |
| smarts_catalog_filter | Reject molecules matching a named catalog: PAINS, BRENK, the CHEMBL family, NIH, and more. |
| smiles_regex_filter | Reject molecules whose SMILES matches any of the provided regex patterns. |
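A typo in a level or filter type is easy to make when assembling this block by hand. The sketch below assembles a molecule_filters value and checks both against the names listed in the tables above before you submit. `build_molecule_filters` is a hypothetical helper; it validates names only and does not check the per-filter fields.

```python
from typing import Any, Dict, List, Optional

# Names copied from the tables above.
CATALOG_LEVELS = {"recommended", "extra", "aggressive", "disabled"}
CUSTOM_FILTER_TYPES = {
    "lipinski_filter",
    "rdkit_descriptor_filter",
    "smarts_custom_filter",
    "smarts_catalog_filter",
    "smiles_regex_filter",
}


def build_molecule_filters(
    level: str = "recommended",
    custom_filters: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return a molecule_filters dict after checking level and filter type names."""
    if level not in CATALOG_LEVELS:
        raise ValueError(f"unknown filter level: {level!r}")
    filters = list(custom_filters or [])
    for f in filters:
        if f.get("type") not in CUSTOM_FILTER_TYPES:
            raise ValueError(f"unknown custom filter type: {f.get('type')!r}")
    return {"boltz_smarts_catalog_filter_level": level, "custom_filters": filters}


molecule_filters = build_molecule_filters(
    level="extra",
    custom_filters=[
        {"type": "lipinski_filter", "max_mw": 500, "max_logp": 5, "max_hbd": 5, "max_hba": 10},
        {"type": "rdkit_descriptor_filter", "tpsa": {"max": 140}},  # only bound what you need
    ],
)
```

The returned dict is the value for the top-level molecule_filters field shown in the input example.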
Output format
When you download with run() or start() + client.experiments.download_results() (or the CLI's download-results), polling happens on your behalf: each result is appended as it's generated, and its files are downloaded into a self-contained run directory. Rerun with the same name to resume.
boltz-experiments/
└── my-run/                      # the name you chose (or an auto-generated one)
    ├── .boltz-run.json          # run + resume state, managed for you (don't edit)
    ├── run.json                 # the run object: status, progress, engine (download URLs stripped)
    └── results/
        ├── index.jsonl          # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz   # the downloaded result archive
            ├── metadata.json    # this result's fields (metrics, sequence/SMILES, …)
            └── files/           # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif   # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result minus its artifacts (those are short-lived download URLs), and adds a paths map pointing at the files downloaded for that result. Each record also carries the molecule `smiles`, your `external_id`, all `metrics`, and Tier-1 `adme` properties when available.
{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only".
With the CLI, download-results polls on your behalf and produces the same run directory and index.jsonl records. Rerun with the same --name to resume. Everything is downloaded by default; to keep just the manifest and skip the archives, pass --download-mode metadata_only.
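Since index.jsonl holds one JSON record per line, triaging a run locally needs only the standard library. The sketch below loads the manifest from a run directory and returns the top records by a chosen metric. `load_index` and `top_results` are hypothetical helpers, and the run directory path in the usage comment is only an example.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_index(run_dir) -> List[Dict[str, Any]]:
    """Read <run_dir>/results/index.jsonl into a list of records."""
    index_path = Path(run_dir) / "results" / "index.jsonl"
    records: List[Dict[str, Any]] = []
    with index_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def top_results(records, metric: str = "binding_confidence", n: int = 5):
    """Return the n records with the highest value for `metric`, skipping records without it."""
    scored = [r for r in records if (r.get("metrics") or {}).get(metric) is not None]
    return sorted(scored, key=lambda r: r["metrics"][metric], reverse=True)[:n]


# Example usage, assuming a run directory named "my-run" exists:
# for rec in top_results(load_index("boltz-experiments/my-run")):
#     print(rec["id"], rec["metrics"]["binding_confidence"], rec["paths"]["structure"])
```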
The TypeScript client doesn't download for you; you read the API objects directly (shown below). The result and run-status shapes are identical to what index.jsonl and run.json mirror on disk.
Each result is a scored molecule. This is what list_results() streams (and what each index.jsonl record mirrors):
{
"data": [
{
"id": "sm_scr_result_8f3a2b", # unique result ID
"external_id": "aspirin", # the "id" you gave this molecule in the input, if any
"created_at": "2026-02-25T13:03:40Z",
"smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", # the screened molecule
"metrics": {
"binding_confidence": 0.94, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
"optimization_score": 0.53, # 0–1; ranks relative binding strength for lead optimization (normalized); higher is better
"structure_confidence": 0.95, # 0–1; confidence in the predicted structure
"iptm": 0.91, # 0–1; interface predicted TM-score
"ptm": 0.92, # 0–1; global predicted TM-score
"complex_plddt": 0.95, # 0–1; pLDDT across the full complex
"complex_iplddt": 0.88 # 0–1; interface pLDDT
},
"adme": { # optional: Tier-1 ADME properties, when available
"lipophilicity": 2.7,
"permeability": 0.61,
"solubility": "medium-confidence" # high-confidence | medium-confidence | high-risk
},
"artifacts": {
# short-lived presigned download URLs; check url_expires_at and download promptly
"structure": { # predicted bound structure (.cif); may be null until ready
"url": "https://.../structure.cif",
"url_expires_at": "2026-02-25T14:03:40Z"
},
"archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
"url": "https://.../archive.tar.gz",
"url_expires_at": "2026-02-25T14:03:40Z"
}
},
"warnings": [] # optional quality warnings for this result, if any
}
# ...more results on this page
],
"has_more": True, # true if more pages remain
"first_id": "sm_scr_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
"last_id": "sm_scr_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:
{
"id": "sm_scr_run_8f3a2b",
"status": "running", # pending | running | succeeded | failed | stopped
"progress": {
"total_molecules_to_screen": 3,
"num_molecules_screened": 1, # scored and available to download so far
"num_molecules_failed": 0, # molecules that errored during scoring
"latest_result_id": "sm_scr_result_8f3a2b"
},
"error": None, # { code, message } once status is "failed"
"pipeline": "boltzmol",
"pipeline_version": "1.0",
"livemode": True, # false for runs created with a test key
"workspace_id": "ws_3a2b",
"created_at": "2026-02-25T12:00:00Z",
"started_at": "2026-02-25T12:00:05Z",
"completed_at": None, # set when the screen finishes
"stopped_at": None, # set if you stop the screen early
"data_deleted_at": None # set once the run's data is deleted
# "input" echoes the request you submitted (null after data deletion)
}

Use the API directly
For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status and progress, page through results as they're scored (cursor-paginated, so you can read them before the screen finishes), and stop early. See Output format for the object shapes.
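To make the cursor fields concrete, here is a minimal sketch of walking pages by hand using has_more and last_id from the result list shape in Output format. `iter_results` and `fetch_page` are hypothetical: `fetch_page(after_id)` stands in for whatever HTTP call you make and must return one page dict. The client examples in this section iterate pages for you, so this only matters when you call the REST API without a client.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every result across pages, following last_id as the after_id cursor."""
    after_id: Optional[str] = None
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        # Stop when the server reports no more pages (or returns an empty page).
        if not page.get("has_more") or not page["data"]:
            break
        after_id = page["last_id"]


# Stand-in for real HTTP calls: two canned pages keyed by the cursor.
def fake_fetch(after_id: Optional[str]) -> Page:
    pages = {
        None: {"data": [{"id": "r1"}, {"id": "r2"}], "has_more": True, "last_id": "r2"},
        "r2": {"data": [{"id": "r3"}], "has_more": False, "last_id": "r3"},
    }
    return pages[after_id]


ids = [r["id"] for r in iter_results(fake_fetch)]
```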
import time
screen = client.small_molecule.library_screen.start(target=target, molecules=molecules)
# Poll the run for status and progress.
while screen.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    screen = client.small_molecule.library_screen.retrieve(screen.id)
    p = screen.progress
    print(f"{screen.status}: {p.num_molecules_screened}/{p.total_molecules_to_screen}")

# Page through scored molecules; use external_id to correlate back to your library.
# Sort by binding_confidence (most confident binders) or optimization_score (lead optimization).
results = list(client.small_molecule.library_screen.list_results(screen.id))
results.sort(key=lambda r: r.metrics.binding_confidence, reverse=True)
for r in results[:5]:
    print(f"{r.id} ext={r.external_id} bind={r.metrics.binding_confidence:.2f} opt={r.metrics.optimization_score:.2f} {r.smiles}")

# Stop early once you've collected enough; results already produced stay available.
client.small_molecule.library_screen.stop(screen.id)

boltz-api small-molecule:library-screen retrieve --id "$SCREEN_ID"      # run status + progress
boltz-api small-molecule:library-screen list-results --id "$SCREEN_ID"  # scored molecules (use external_id to correlate)
boltz-api small-molecule:library-screen stop --id "$SCREEN_ID"          # stop early; partial results stay available

// Reuses the client, target, and molecules from the Run section above.
let screen = await client.smallMolecule.libraryScreen.start({ target, molecules });

// Poll until the screen finishes.
while (!["succeeded", "failed", "stopped"].includes(screen.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  screen = await client.smallMolecule.libraryScreen.retrieve(screen.id);
  const p = screen.progress;
  console.log(`${screen.status}: ${p.num_molecules_screened}/${p.total_molecules_to_screen}`);
}

// Stream scored molecules as they arrive; use external_id to correlate back to your library.
for await (const result of client.smallMolecule.libraryScreen.listResults(screen.id)) {
  console.log(`${result.id} ext=${result.external_id} bind=${result.metrics.binding_confidence} ${result.smiles}`);
}

// Stop early once you've collected enough; results already produced stay available.
await client.smallMolecule.libraryScreen.stop(screen.id);

Metrics
| Metric | Range | What it measures |
|---|---|---|
| binding_confidence | 0–1 | Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range. |
| optimization_score | 0–1 | Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold. |
| structure_confidence | 0–1 | Measures the confidence of the predicted structure (0 = low, 1 = high). |
| iptm | 0–1 | Interface predicted TM-score. |
| ptm | 0–1 | Global predicted TM-score. |
| complex_plddt | 0–1 | pLDDT across the full complex. |
| complex_iplddt | 0–1 | Interface pLDDT for the complex. |
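One way to combine the two headline metrics is to gate on binding_confidence first and then rank the survivors by optimization_score within the same run. The sketch below does that over plain result dicts. `triage` is a hypothetical helper; the 0.7 default follows the table's "typically high-confidence" guidance and should be tuned to your target, and the sample scores are made up for illustration.

```python
from typing import Any, Dict, List


def triage(
    results: List[Dict[str, Any]],
    min_binding_confidence: float = 0.7,
    top_n: int = 10,
) -> List[Dict[str, Any]]:
    """Keep likely binders, then rank them by optimization_score (higher is better)."""
    confident = [
        r for r in results
        if r["metrics"]["binding_confidence"] >= min_binding_confidence
    ]
    ranked = sorted(confident, key=lambda r: r["metrics"]["optimization_score"], reverse=True)
    return ranked[:top_n]


# Illustrative values only.
results = [
    {"external_id": "a", "metrics": {"binding_confidence": 0.94, "optimization_score": 0.53}},
    {"external_id": "b", "metrics": {"binding_confidence": 0.81, "optimization_score": 0.77}},
    {"external_id": "c", "metrics": {"binding_confidence": 0.42, "optimization_score": 0.90}},
]
shortlist = [r["external_id"] for r in triage(results)]
```

Here "c" has the highest optimization_score but is dropped by the binding_confidence gate, which is the point of applying the gate first.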
Status values
| Status | Meaning |
|---|---|
| pending | The screen is queued and has not started yet. |
| running | The screen is actively scoring molecules. Results may already be available. |
| succeeded | All molecules have been screened. |
| failed | The screen encountered an error. Check the error field. |
| stopped | The screen was stopped early. Partial results are available. |
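The three terminal statuses in this table (succeeded, failed, stopped) are what a polling loop should wait for. The sketch below wraps that wait with a timeout so a stuck screen does not block forever. `wait_for_terminal` is a hypothetical helper, and the interval and timeout defaults are arbitrary assumptions; `get_status` is any callable that returns the current status string, for example `lambda: client.small_molecule.library_screen.retrieve(screen.id).status`.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}


def wait_for_terminal(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"screen still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)


# Stand-in for real retrieve() calls: a canned sequence of statuses.
statuses = iter(["pending", "running", "running", "succeeded"])
final_status = wait_for_terminal(lambda: next(statuses), interval_s=0)
```

A final status of failed is returned rather than raised, so the caller can inspect the run's error field before deciding what to do.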