Screen small molecule libraries

Score your own small molecules against a protein target, fetch results as they arrive, and stop early if needed.

A small molecule library screen scores molecules you provide against a protein target, evaluating each for binding confidence, optimization score, and structure confidence. Results stream in as molecules are scored, so you can fetch them before the screen finishes and stop the screen early once you have enough.

run() submits the screen, waits while molecules are scored, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.

import os

from boltz_api import Boltz

client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])

target = {
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
}  # see Input format for pocket_residues, reference_ligands, molecule_filters, …

molecules = [
    {"smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "id": "aspirin"},
    {"smiles": "C1=CC=C(C=C1)O", "id": "phenol"},
    {"smiles": "CC1=CC=CC=C1"},
]  # see Input format for ids and filtering

# One call: submit, wait, and download results to a run directory.
run_dir = client.small_molecule.library_screen.run(target=target, molecules=molecules, name="my-screen")

# ...or submit now and download later:
screen = client.small_molecule.library_screen.start(target=target, molecules=molecules)
run_dir = client.experiments.download_results(id=screen.id, name="my-screen")  # rerun to resume an interrupted download

A screen takes a target to score against, the molecules to screen, and optional filtering.

The example below is a complete, valid input you can copy. The deep-dive sections explain each part, and Run shows the run() / start() calls that consume it.

{
  "target": {
    "entities": [
      # protein chains only; at least one
      { "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] }
    ],
    "pocket_residues": { "A": [2, 3, 4, 7, 8, 9] }, # optional: keyed by chain ID; pocket residues (0-indexed); omit to auto-detect
    "reference_ligands": ["CC(=O)Oc1ccccc1C(=O)O"], # optional: known-binder SMILES that help locate the pocket
    "constraints": [ # optional: guide the geometry
      {
        "type": "pocket", # keep the binder near a set of receptor residues
        "binder_chain_id": "L", # chain ID the pipeline assigns to the screened molecule
        "contact_residues": { "A": [2, 3, 4, 7, 8, 9] },
        "max_distance_angstrom": 6.0
      }
    ]
  },
  "molecules": [
    # your library to screen; each needs "smiles". optional "id" is returned as
    # "external_id" on the matching result so you can correlate back to your library
    { "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", "id": "aspirin" },
    { "smiles": "C1=CC=C(C=C1)O", "id": "phenol" },
    { "smiles": "CC1=CC=CC=C1" }
  ],
  "molecule_filters": {
    "boltz_smarts_catalog_filter_level": "recommended", # recommended | extra | aggressive | disabled
    "custom_filters": [
      # any combination; a molecule must pass all of them (AND logic). one of each type shown:
      {
        "type": "lipinski_filter", # Rule of Five
        "max_mw": 500,
        "max_logp": 5,
        "max_hbd": 5,
        "max_hba": 10,
        "allow_single_violation": False # optional: allow one rule to fail
      },
      {
        "type": "rdkit_descriptor_filter", # min/max on RDKit descriptors; include only the ones you want to bound
        "mol_wt": { "min": 150, "max": 500 },
        "mol_logp": { "max": 5 },
        "tpsa": { "max": 140 },
        "num_h_donors": { "max": 5 },
        "num_h_acceptors": { "max": 10 },
        "num_rotatable_bonds": { "max": 10 },
        "num_heteroatoms": { "max": 12 },
        "num_aromatic_rings": { "min": 1, "max": 4 },
        "num_rings": { "max": 6 },
        "fraction_csp3": { "min": 0.2 }
      },
      {
        "type": "smarts_custom_filter", # reject molecules matching any of these SMARTS
        "patterns": ["[N+](=O)[O-]", "C(=O)Cl"]
      },
      {
        "type": "smarts_catalog_filter", # reject by a named alert catalog
        "catalog": "PAINS" # PAINS | PAINS_A | PAINS_B | PAINS_C | BRENK | CHEMBL | CHEMBL_BMS | CHEMBL_Dundee | CHEMBL_Glaxo | CHEMBL_Inpharmatica | CHEMBL_LINT | CHEMBL_MLSMR | CHEMBL_SureChEMBL | NIH
      },
      {
        "type": "smiles_regex_filter", # reject molecules whose SMILES matches any of these regexes
        "patterns": ["P", "S(=O)(=O)Cl"]
      }
    ]
  }
}

| Field | Required | What it is | Link |
| --- | --- | --- | --- |
| target | Yes | The protein and binding pocket to score against. | Target |
| molecules | Yes | The molecules you want to screen. | Molecules |
| molecule_filters | No | Which molecules pass through to results. | Molecular filters |

The target is the protein you're screening against. List its entities (protein chains only), then optionally point the pipeline at the binding pocket:

  • Pocket residues: pocket_residues maps chain ID to the 0-indexed residues that line the pocket. Omit it and the pipeline auto-detects the pocket.
  • Reference ligands: reference_ligands are SMILES of known binders that help the pipeline locate the right pocket. When omitted, a set of drug-like default ligands is used for pocket detection.
  • Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) add finer geometric control. See Core Concepts.
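
Because pocket_residues is 0-indexed, residue numbers taken from a 1-based numbering need to be shifted by one before you submit them. The following is a minimal sketch, assuming your source numbers are 1-based positions in the same sequence you pass as the entity value. The helper to_zero_indexed is hypothetical and not part of the Boltz client.

```python
def to_zero_indexed(one_based, sequences):
    """Convert {chain_id: [1-based positions]} to the 0-indexed form.

    `sequences` maps chain ID to the protein sequence submitted for that chain.
    It is used only to check that every position falls inside the chain.
    """
    converted = {}
    for chain_id, positions in one_based.items():
        length = len(sequences[chain_id])
        for pos in positions:
            if not 1 <= pos <= length:
                raise ValueError(f"position {pos} is outside chain {chain_id} (length {length})")
        # Shift by one, drop duplicates, and keep a stable sorted order.
        converted[chain_id] = sorted({pos - 1 for pos in positions})
    return converted


sequences = {"A": "MKTIIALSYIFCLVFA"}
pocket_residues = to_zero_indexed({"A": [3, 4, 5, 8, 9, 10]}, sequences)
print(pocket_residues)  # {'A': [2, 3, 4, 7, 8, 9]}
```

The range check catches off-by-one mistakes at the end of a chain before the request is sent.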

molecules is the library you want to score, given inline as an array. Each entry needs a smiles string and may carry an optional id. When you provide an id, it comes back as external_id on the matching result, so you can correlate results to your input library. Molecules that fail filtering are skipped and don't appear in results.
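
Since filtered molecules are simply absent from the results, comparing external_id values against your library shows which entries have no result. The sketch below is illustrative only: it treats results as plain dicts (as in index.jsonl records), and the helper missing_ids and the sample results are hypothetical.

```python
library = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "phenol": "C1=CC=C(C=C1)O",
}

# Build the inline "molecules" array, using each library key as the "id".
molecules = [{"smiles": smiles, "id": mol_id} for mol_id, smiles in library.items()]


def missing_ids(library, results):
    """Return library IDs with no matching result (filtered out, failed, or not yet scored)."""
    seen = {r.get("external_id") for r in results}
    return sorted(set(library) - seen)


# Hypothetical results, reduced to two of the documented fields.
results = [{"external_id": "aspirin", "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"}]
print(missing_ids(library, results))  # ['phenol']
```

An absent ID does not say why the molecule is missing, so check the run's progress counters before concluding it was filtered.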

Filters control which molecules reach your results. They combine with AND logic: a molecule must pass every filter.

Built-in alerts (boltz_smarts_catalog_filter_level) set the strictness of Boltz's curated structural-alert filtering, which flags substructures known to cause toxicity, reactivity, or poor pharmacokinetics:

| Level | Behavior |
| --- | --- |
| recommended (default) | Balanced filtering that catches the most common problematic substructures. |
| extra | Stricter filtering with additional alerts. |
| aggressive | Most conservative; rejects anything with a known structural concern. |
| disabled | No built-in filtering; only custom_filters apply. |

Custom filters (custom_filters) are any combination of these:

| Filter type | What it does |
| --- | --- |
| lipinski_filter | Lipinski's Rule of Five: set max_mw, max_logp, max_hbd, max_hba. Optional allow_single_violation. |
| rdkit_descriptor_filter | Min/max ranges on RDKit descriptors (mol_wt, mol_logp, tpsa, num_h_donors, num_h_acceptors, num_rotatable_bonds, num_heteroatoms, num_aromatic_rings, num_rings, fraction_csp3). Each accepts {min, max}; omitted descriptors are unconstrained. |
| smarts_custom_filter | Reject molecules matching any of the provided SMARTS patterns. |
| smarts_catalog_filter | Reject molecules matching a named catalog: PAINS, BRENK, the CHEMBL family, NIH, and more. |
| smiles_regex_filter | Reject molecules whose SMILES matches any of the provided regex patterns. |
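
A quick local check of a molecule_filters dict can catch typos before a screen is submitted. This is a rough sketch that uses only the level names, filter types, and {min, max} convention listed above. The function check_molecule_filters is hypothetical, and it does not reproduce whatever validation the API itself performs.

```python
ALERT_LEVELS = {"recommended", "extra", "aggressive", "disabled"}
CUSTOM_FILTER_TYPES = {
    "lipinski_filter",
    "rdkit_descriptor_filter",
    "smarts_custom_filter",
    "smarts_catalog_filter",
    "smiles_regex_filter",
}


def check_molecule_filters(filters):
    """Return a list of problems found in a molecule_filters dict (empty if none)."""
    problems = []
    level = filters.get("boltz_smarts_catalog_filter_level", "recommended")
    if level not in ALERT_LEVELS:
        problems.append(f"unknown alert level: {level!r}")
    for i, custom in enumerate(filters.get("custom_filters", [])):
        kind = custom.get("type")
        if kind not in CUSTOM_FILTER_TYPES:
            problems.append(f"custom_filters[{i}]: unknown type {kind!r}")
        if kind == "rdkit_descriptor_filter":
            # Every key other than "type" is a descriptor with optional min/max bounds.
            for name, bounds in custom.items():
                if name == "type":
                    continue
                lo, hi = bounds.get("min"), bounds.get("max")
                if lo is not None and hi is not None and lo > hi:
                    problems.append(f"custom_filters[{i}].{name}: min {lo} exceeds max {hi}")
    return problems


filters = {
    "boltz_smarts_catalog_filter_level": "extra",
    "custom_filters": [
        {"type": "rdkit_descriptor_filter", "mol_wt": {"min": 500, "max": 150}},
        {"type": "lipinsky_filter", "max_mw": 500},  # deliberate typo in the type name
    ],
}
for problem in check_molecule_filters(filters):
    print(problem)
```

Here the check reports the inverted mol_wt range and the misspelled filter type; a well-formed dict returns an empty list.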

When you download with run() or start() + client.experiments.download_results() (or the CLI's download-results), the client polls on your behalf, appends each result as it's generated, and downloads its files into a self-contained run directory. Rerun with the same name to resume.

boltz-experiments/<name>/
boltz-experiments/
└── my-run/                                  # the name you chose (or an auto-generated one)
    ├── .boltz-run.json                      # run + resume state, managed for you (don't edit)
    ├── run.json                             # the run object: status, progress, engine (download URLs stripped)
    └── results/
        ├── index.jsonl                      # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz               # the downloaded result archive
            ├── metadata.json                # this result's fields (metrics, sequence/SMILES, …)
            └── files/                       # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif  # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result minus its artifacts (those are short-lived download URLs), and adds a paths map pointing at the files downloaded for that result. Each record also carries the molecule smiles, your external_id, all metrics, and Tier-1 adme properties when available.

one results/index.jsonl record (pretty-printed; the file stores one per line)
{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}
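
Reading the manifest needs nothing beyond the standard library. The sketch below assumes a run directory laid out as described above, with records shaped like the one shown. The helpers load_index and top_binders are hypothetical names, not part of the Boltz client.

```python
import json
from pathlib import Path


def load_index(run_dir):
    """Read results/index.jsonl from a run directory into a list of dicts."""
    index_path = Path(run_dir) / "results" / "index.jsonl"
    records = []
    with index_path.open() as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


def top_binders(records, n=5):
    """Return the n records with the highest binding_confidence.

    Records that lack the metric sort last instead of raising.
    """
    def key(record):
        value = (record.get("metrics") or {}).get("binding_confidence")
        return -1.0 if value is None else value

    return sorted(records, key=key, reverse=True)[:n]
```

For example, top_binders(load_index("boltz-experiments/my-run")) would return the five most confident binders downloaded so far. Because the file is appended to as results arrive, rereading it later picks up new records.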

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only".

Each result is a scored molecule. This is what list_results() streams (and what each index.jsonl record mirrors):

{
  "data": [
    {
      "id": "sm_scr_result_8f3a2b", # unique result ID
      "external_id": "aspirin", # the "id" you gave this molecule in the input, if any
      "created_at": "2026-02-25T13:03:40Z",
      "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O", # the screened molecule
      "metrics": {
        "binding_confidence": 0.94, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
        "optimization_score": 0.53, # 0–1; ranks relative binding strength for lead optimization (normalized); higher is better
        "structure_confidence": 0.95, # 0–1; confidence in the predicted structure
        "iptm": 0.91, # 0–1; interface predicted TM-score
        "ptm": 0.92, # 0–1; global predicted TM-score
        "complex_plddt": 0.95, # 0–1; pLDDT across the full complex
        "complex_iplddt": 0.88 # 0–1; interface pLDDT
      },
      "adme": { # optional: Tier-1 ADME properties, when available
        "lipophilicity": 2.7,
        "permeability": 0.61,
        "solubility": "medium-confidence" # high-confidence | medium-confidence | high-risk
      },
      "artifacts": {
        # short-lived presigned download URLs; check url_expires_at and download promptly
        "structure": { # predicted bound structure (.cif); may be null until ready
          "url": "https://.../structure.cif",
          "url_expires_at": "2026-02-25T14:03:40Z"
        },
        "archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
          "url": "https://.../archive.tar.gz",
          "url_expires_at": "2026-02-25T14:03:40Z"
        }
      },
      "warnings": [] # optional quality warnings for this result, if any
    }
    # ...more results on this page
  ],
  "has_more": True, # true if more pages remain
  "first_id": "sm_scr_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
  "last_id": "sm_scr_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}
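
If you page through results over REST rather than through the client, the cursor fields above are enough to walk every page. This is a minimal sketch of that loop: fetch_page is a hypothetical callable standing in for your HTTP request, and fake_fetch simulates the API so the example runs on its own.

```python
def iter_results(fetch_page):
    """Yield every result by following the cursor fields of each page.

    `fetch_page(after_id)` is a hypothetical callable that returns one page
    shaped like the example above; it receives None for the first page.
    """
    after_id = None
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page["has_more"]:
            break
        after_id = page["last_id"]  # cursor for the next page


# A stand-in for the API: three results served two per page.
_ALL = [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]


def fake_fetch(after_id, page_size=2):
    start = 0 if after_id is None else [r["id"] for r in _ALL].index(after_id) + 1
    data = _ALL[start:start + page_size]
    return {
        "data": data,
        "has_more": start + page_size < len(_ALL),
        "first_id": data[0]["id"] if data else None,
        "last_id": data[-1]["id"] if data else None,
    }


print([r["id"] for r in iter_results(fake_fetch)])  # ['r1', 'r2', 'r3']
```

Writing the walk as a generator lets you stop consuming as soon as you have enough results, without fetching the remaining pages.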

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:

{
  "id": "sm_scr_run_8f3a2b",
  "status": "running", # pending | running | succeeded | failed | stopped
  "progress": {
    "total_molecules_to_screen": 3,
    "num_molecules_screened": 1, # scored and available to download so far
    "num_molecules_failed": 0, # molecules that errored during scoring
    "latest_result_id": "sm_scr_result_8f3a2b"
  },
  "error": None, # { code, message } once status is "failed"
  "pipeline": "boltzmol",
  "pipeline_version": "1.0",
  "livemode": True, # false for runs created with a test key
  "workspace_id": "ws_3a2b",
  "created_at": "2026-02-25T12:00:00Z",
  "started_at": "2026-02-25T12:00:05Z",
  "completed_at": None, # set when the screen finishes
  "stopped_at": None, # set if you stop the screen early
  "data_deleted_at": None # set once the run's data is deleted
  # "input" echoes the request you submitted (null after data deletion)
}
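
The progress counters can be condensed into a one-line status for logs. The following sketch reads a run object given as a plain dict (for example, loaded from run.json). The function summarize is a hypothetical helper, and the percentage is simply screened over total.

```python
def summarize(run):
    """Return a one-line summary of a run object given as a dict."""
    progress = run["progress"]
    total = progress["total_molecules_to_screen"]
    screened = progress["num_molecules_screened"]
    failed = progress["num_molecules_failed"]
    percent = 100 * screened / total if total else 0.0
    line = f"{run['id']}: {run['status']}, {screened}/{total} screened ({percent:.0f}%), {failed} failed"
    if run.get("error"):
        # error is { code, message } once status is "failed"
        line += f", error {run['error']['code']}: {run['error']['message']}"
    return line


run = {
    "id": "sm_scr_run_8f3a2b",
    "status": "running",
    "progress": {
        "total_molecules_to_screen": 3,
        "num_molecules_screened": 1,
        "num_molecules_failed": 0,
    },
    "error": None,
}
print(summarize(run))  # sm_scr_run_8f3a2b: running, 1/3 screened (33%), 0 failed
```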

For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status and progress, page through results as they're scored (cursor-paginated, so you can read them before the screen finishes), and stop early. See Output format for the object shapes.

import time

screen = client.small_molecule.library_screen.start(target=target, molecules=molecules)

# Poll the run for status and progress.
while screen.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    screen = client.small_molecule.library_screen.retrieve(screen.id)
    p = screen.progress
    print(f"{screen.status}: {p.num_molecules_screened}/{p.total_molecules_to_screen}")

# Page through scored molecules; use external_id to correlate back to your library.
# Sort by binding_confidence (most confident binders) or optimization_score (lead optimization).
results = list(client.small_molecule.library_screen.list_results(screen.id))
results.sort(key=lambda r: r.metrics.binding_confidence, reverse=True)
for r in results[:5]:
    print(f"{r.id} ext={r.external_id} bind={r.metrics.binding_confidence:.2f} opt={r.metrics.optimization_score:.2f} {r.smiles}")

# Stop early once you've collected enough; results already produced stay available.
client.small_molecule.library_screen.stop(screen.id)

| Metric | Range | What it measures |
| --- | --- | --- |
| binding_confidence | 0–1 | Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range. |
| optimization_score | 0–1 | Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold. |
| structure_confidence | 0–1 | Confidence in the predicted structure (0 = low, 1 = high). |
| iptm | 0–1 | Interface predicted TM-score. |
| ptm | 0–1 | Global predicted TM-score. |
| complex_plddt | 0–1 | pLDDT across the full complex. |
| complex_iplddt | 0–1 | Interface pLDDT for the complex. |
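
One way to combine the two headline metrics is to gate on binding_confidence and then rank the survivors by optimization_score within the run. This is only a sketch of that idea: the 0.7 cutoff comes from the triage guidance above, while the function triage and the sample results are hypothetical.

```python
def triage(results, min_binding=0.7):
    """Keep results at or above `min_binding`, ranked by optimization_score (highest first)."""
    confident = [r for r in results if r["metrics"]["binding_confidence"] >= min_binding]
    return sorted(confident, key=lambda r: r["metrics"]["optimization_score"], reverse=True)


# Hypothetical results from one run, reduced to the fields used here.
results = [
    {"external_id": "a", "metrics": {"binding_confidence": 0.94, "optimization_score": 0.53}},
    {"external_id": "b", "metrics": {"binding_confidence": 0.55, "optimization_score": 0.80}},
    {"external_id": "c", "metrics": {"binding_confidence": 0.81, "optimization_score": 0.67}},
]
print([r["external_id"] for r in triage(results)])  # ['c', 'a']
```

Molecule "b" has the best optimization_score but is dropped because its binding_confidence falls below the cutoff. Treat the cutoff as a starting point to tune for your target.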

| Status | Meaning |
| --- | --- |
| pending | The screen is queued and has not started yet. |
| running | The screen is actively scoring molecules. Results may already be available. |
| succeeded | All molecules have been screened. |
| failed | The screen encountered an error. Check the error field. |
| stopped | The screen was stopped early. Partial results are available. |
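
Since succeeded, failed, and stopped are the statuses a run ends in, a polling loop can exit on any of them and give up after a deadline. Below is a minimal sketch with a timeout: retrieve is a hypothetical zero-argument callable returning the run as a dict, and the interval and timeout defaults are arbitrary choices, not documented values.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}


def wait_until_terminal(retrieve, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Call `retrieve()` until the run reaches a terminal status or `timeout` seconds pass.

    `sleep` is injectable so the loop can be exercised without real delays.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run['status']} after {timeout} seconds")
        sleep(interval)


# Stand-in for the API: pending, then running, then succeeded.
statuses = iter(["pending", "running", "succeeded"])
final = wait_until_terminal(lambda: {"status": next(statuses)}, sleep=lambda _: None)
print(final["status"])  # succeeded
```

A timeout here only stops the waiting; the screen itself keeps running until it finishes or you stop it.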