Screen protein libraries
Score your own protein sequences against a target, fetch results as they arrive, and stop early if needed.
A protein library screen scores protein complexes you provide against a target, evaluating each for binding confidence, structure confidence, and secondary-structure composition. Results stream in as proteins are scored, so you can fetch them before the screen finishes and stop early.
run() submits the screen, waits while proteins are scored, and downloads scored results to a local directory. Use start() + client.experiments.download_results() to submit now and download later; download_results() resumes if the download is interrupted.
import os
from boltz_api import Boltz

client = Boltz(api_key=os.environ["BOLTZ_API_KEY"])

target = {
    "type": "no_template",
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
    "epitope_residues": {"A": [10, 11, 12]},
}  # see Input format for structure templates, non-binding residues, constraints, …

proteins = [
    {"entities": [{"type": "protein", "value": "MKTAYIVKSHFSRQ", "chain_ids": ["B"]}], "id": "binder-001"},
    {"entities": [{"type": "protein", "value": "ACDEFGHIKLMNPQRSTVWY", "chain_ids": ["B"]}], "id": "binder-002"},
]

# One call: submit, wait, and download results to a run directory.
run_dir = client.protein.library_screen.run(target=target, proteins=proteins, name="my-screen")

# ...or submit now and download later:
screen = client.protein.library_screen.start(target=target, proteins=proteins)
run_dir = client.experiments.download_results(id=screen.id, name="my-screen")  # rerun to resume an interrupted download

Write your input to protein-screen.yaml (see Input format), then:
SCREEN_ID=$(
  boltz-api --format raw protein:library-screen start \
    --input @yaml://./protein-screen.yaml | jq -r '.id'
)

# download-results polls and downloads on your behalf; rerun with the same --name to resume.
boltz-api download-results --id "$SCREEN_ID" --name my-screen

The TypeScript client drives the REST API directly. Submit with start(), then poll and read results yourself (see Use the API directly).
import Boltz from "boltz-api";
const client = new Boltz({ apiKey: process.env["BOLTZ_API_KEY"] });
const target = {
  type: "no_template",
  entities: [{ type: "protein", value: "MKTIIALSYIFCLVFA", chain_ids: ["A"] }],
  epitope_residues: { A: [10, 11, 12] },
}; // see Input format for structure templates, non-binding residues, …

const proteins = [
  { entities: [{ type: "protein", value: "MKTAYIVKSHFSRQ", chain_ids: ["B"] }], id: "binder-001" },
  { entities: [{ type: "protein", value: "ACDEFGHIKLMNPQRSTVWY", chain_ids: ["B"] }], id: "binder-002" },
];

const screen = await client.protein.libraryScreen.start({ target, proteins });

Input format
A screen pairs a target to score against with the proteins you want to screen. Use No template if you only have a sequence, or Structure template if you have a 3D structure to score against.
The example below uses the no-template form of the target. The deep-dive sections explain each part, and Run shows the run() / start() calls that consume it.
{
"target": {
"type": "no_template",
"entities": [
# entity types: protein | rna | dna | ligand_smiles | ligand_ccd (at least one)
{ "type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"] },
{ "type": "ligand_ccd", "value": "ATP", "chain_ids": ["L1"] }
],
"epitope_residues": { "A": [10, 11, 12] }, # optional: keyed by chain ID; residues the binders should contact (0-indexed)
"non_binding_residues": { "A": [0, 1, 2] }, # optional: keyed by chain ID; residues to keep binders away from
"epitope_ligand_chains": ["L1"] # optional: ligand chain IDs that form part of the epitope
# also optional: "constraints" and "bonds" (see Core Concepts)
},
"proteins": [
# your library to screen; each is a complex of "entities" with an optional "id".
# the "id" is returned as "external_id" so you can correlate results to your library
{ "entities": [ { "type": "protein", "value": "MKTAYIVKSHFSRQ", "chain_ids": ["B"] } ], "id": "binder-001" },
{ "entities": [ { "type": "protein", "value": "ACDEFGHIKLMNPQRSTVWY", "chain_ids": ["B"] } ], "id": "binder-002" }
]
}

| Field | Required | What it is | Link |
|---|---|---|---|
| target | Yes | The molecule or complex you're screening against. | Target |
| proteins | Yes | The protein complexes you want to score. | Proteins |
Target (target)
The target is the molecule or complex you're screening against: one or more proteins, optionally with nucleic acids or ligands. The type field picks how you provide it.
No template (type: "no_template")
Use this when you only have sequences. List the target's entities (proteins, RNA, DNA, or ligands by SMILES or CCD code) and the pipeline assembles the complex without a reference structure.
You then shape where the binders engage:
- Epitope: epitope_residues marks the residues you want binders to contact, keyed by chain ID (0-indexed). To put a whole ligand in the epitope, list its chain in epitope_ligand_chains.
- Non-binding residues: non_binding_residues marks residues to steer binders away from. They can't overlap the epitope on the same chain.
- Constraints and bonds: constraints (pocket and contact) and bonds (covalent links) give finer geometric control. See Core Concepts.
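The rules in the list above (0-indexed positions, no overlap between epitope and non-binding residues on the same chain) can be checked locally before submitting. The following is a minimal sketch, not part of the client library: `check_residue_selection` and `POLYMER_TYPES` are hypothetical names, and the sketch assumes that each polymer entity's `value` holds one character per residue.

```python
POLYMER_TYPES = {"protein", "rna", "dna"}  # assumption: "value" is a sequence for these types


def check_residue_selection(target: dict) -> list[str]:
    """Return a list of problems in a no_template target (empty if none found)."""
    # Map each polymer chain ID to its sequence length.
    lengths = {}
    for entity in target["entities"]:
        if entity["type"] in POLYMER_TYPES:
            for chain_id in entity["chain_ids"]:
                lengths[chain_id] = len(entity["value"])

    epitope = target.get("epitope_residues", {})
    non_binding = target.get("non_binding_residues", {})
    problems = []

    # Every index must be a valid 0-indexed position on a known chain.
    for field, selection in (("epitope_residues", epitope),
                             ("non_binding_residues", non_binding)):
        for chain_id, residues in selection.items():
            if chain_id not in lengths:
                problems.append(f"{field}: unknown chain {chain_id!r}")
                continue
            bad = [i for i in residues if not 0 <= i < lengths[chain_id]]
            if bad:
                problems.append(f"{field}: chain {chain_id} out of range: {bad}")

    # Epitope and non-binding residues must not overlap on the same chain.
    for chain_id in sorted(epitope.keys() & non_binding.keys()):
        overlap = sorted(set(epitope[chain_id]) & set(non_binding[chain_id]))
        if overlap:
            problems.append(f"chain {chain_id}: in both selections: {overlap}")
    return problems


target = {
    "type": "no_template",
    "entities": [{"type": "protein", "value": "MKTIIALSYIFCLVFA", "chain_ids": ["A"]}],
    "epitope_residues": {"A": [10, 11, 12]},
    "non_binding_residues": {"A": [0, 1, 2]},
}
print(check_residue_selection(target))  # → []
```

The service may apply further validation of its own; this check only covers the two rules stated above.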
Structure template (type: "structure_template")
Use this when you have a 3D structure. Provide the CIF file as structure (base64-encoded or a URL) and choose the chains to use in chain_selection. Only the chains you list are included; anything else in the file is ignored.
For each polymer chain you select:
- Crop: crop_residues chooses which residues to keep ("all", or a list of 0-indexed positions).
- Epitope, non-binding, flexible: the same epitope and non-binding concepts as above, plus flexible_residues, the residues allowed to move during scoring. Every index must fall within the cropped set.
Ligand chains are given as { "chain_type": "ligand" } and are always kept whole.
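The requirement that every epitope, non-binding, and flexible index falls within the cropped set can be verified with plain set arithmetic. This is a sketch under assumptions: `check_within_crop` is a hypothetical helper, and the keyword names passed to it are only labels for the report, not a statement about the exact field names inside chain_selection.

```python
def check_within_crop(crop_residues, **selections):
    """Return {selection_name: [indices outside the crop]} for offending selections."""
    if crop_residues == "all":
        # Nothing is cropped away, so the crop itself cannot exclude any index.
        return {}
    kept = set(crop_residues)
    outside = {}
    for name, indices in selections.items():
        missing = sorted(set(indices) - kept)
        if missing:
            outside[name] = missing
    return outside


# Keep residues 20..59 of a chain, then check two selections against that crop.
report = check_within_crop(
    list(range(20, 60)),
    epitope_residues=[25, 26],
    flexible_residues=[58, 59, 60],
)
print(report)  # → {'flexible_residues': [60]}
```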
Proteins (proteins)
proteins is the library you want to score, given inline as an array. Each entry is a complex defined by an entities array (the chains that make up that candidate) plus an optional id. When you provide an id, it comes back as external_id on the matching result, so you can correlate results to your input library.
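If your library lives in a mapping of identifiers to sequences, the proteins array can be generated rather than written by hand. The sketch below is illustrative only: `build_library` and `index_by_external_id` are hypothetical helpers, single-chain binders on chain "B" are an assumption carried over from the examples on this page, and results are treated as plain dicts with the fields shown under Output format.

```python
def build_library(sequences: dict[str, str], chain_id: str = "B") -> list[dict]:
    """Turn {id: sequence} into the "proteins" array, one single-chain complex each."""
    proteins = []
    for protein_id, sequence in sequences.items():
        cleaned = "".join(sequence.split()).upper()  # drop whitespace and line breaks
        if not cleaned:
            raise ValueError(f"empty sequence for {protein_id!r}")
        proteins.append({
            "entities": [{"type": "protein", "value": cleaned, "chain_ids": [chain_id]}],
            "id": protein_id,  # comes back as "external_id" on the matching result
        })
    return proteins


def index_by_external_id(results: list[dict]) -> dict[str, dict]:
    """Map external_id -> result so scores can be joined back to the input library."""
    return {r["external_id"]: r for r in results if r.get("external_id")}


proteins = build_library({
    "binder-001": "MKTAYIVKSHFSRQ",
    "binder-002": "ACDEFGHIKL MNPQRSTVWY",
})
print(proteins[1]["entities"][0]["value"])  # → ACDEFGHIKLMNPQRSTVWY
```

Using stable, unique ids makes the join in `index_by_external_id` unambiguous; duplicate ids would overwrite each other in the returned mapping.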
Output format
When you download with run() / start() + client.experiments.download_results() (or the CLI's download-results), results land in a self-contained run directory:
run() and start() + client.experiments.download_results() poll on your behalf, append each result as it's generated, and download its files into a self-contained run directory. Rerun with the same name to resume.
boltz-experiments/
└── my-run/                                   # the name you chose (or an auto-generated one)
    ├── .boltz-run.json                       # run + resume state, managed for you (don't edit)
    ├── run.json                              # the run object: status, progress, engine (download URLs stripped)
    └── results/
        ├── index.jsonl                       # the manifest: one JSON record per result
        └── <result-id>/
            ├── archive.tar.gz                # the downloaded result archive
            ├── metadata.json                 # this result's fields (metrics, sequence/SMILES, …)
            └── files/                        # extracted from the archive
                ├── metrics.json
                ├── <result-id>_predicted.cif # predicted structure
                └── pae.npz

results/index.jsonl is what you read to triage a run: one compact JSON record per result, appended as results arrive. Each record mirrors the API result minus its artifacts (those are short-lived download URLs), and adds a paths map pointing at the files downloaded for that result. Each record also carries the scored complex `entities` (target chain(s) plus the screened protein), your `external_id`, and all `metrics`.

{
  "id": "<result-id>",
  "created_at": "2026-02-25T13:03:40Z",
  "metrics": { "binding_confidence": 0.94, "structure_confidence": 0.95 },
  "paths": {
    "archive": "results/<result-id>/archive.tar.gz",
    "files": "results/<result-id>/files",
    "metrics": "results/<result-id>/files/metrics.json",
    "structure": "results/<result-id>/files/<result-id>_predicted.cif",
    "pae": "results/<result-id>/files/pae.npz"
  }
}

Everything is downloaded by default. To keep just the manifest and skip the archives, pass download_mode="metadata_only".
With the CLI, download-results polls on your behalf, appends each result as it's generated, and downloads its files into the same self-contained run directory, with the same layout and index.jsonl records as shown above. Rerun with the same --name to resume.

Everything is downloaded by default. To keep just the manifest and skip the archives, pass --download-mode metadata_only.
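Because index.jsonl holds one JSON record per line, it can be read with the standard library alone. The sketch below is a hedged example rather than an official reader: `parse_manifest`, `load_manifest`, and `best_first` are hypothetical helpers, and it assumes that the entries in `paths` are relative to the run directory, as the example record suggests.

```python
import json
from pathlib import Path


def parse_manifest(lines, run_dir):
    """Parse index.jsonl lines into records, resolving "paths" against run_dir."""
    run_dir = Path(run_dir)
    records = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: "paths" values are relative to the run directory.
        record["paths"] = {k: run_dir / v for k, v in record.get("paths", {}).items()}
        records.append(record)
    return records


def load_manifest(run_dir):
    """Read <run_dir>/results/index.jsonl from disk."""
    with open(Path(run_dir) / "results" / "index.jsonl", encoding="utf-8") as handle:
        return parse_manifest(handle, run_dir)


def best_first(records):
    """Sort records by binding_confidence, highest first."""
    return sorted(records, key=lambda r: r["metrics"]["binding_confidence"], reverse=True)


# Typical use against a downloaded run (directory name is an example):
# for record in best_first(load_manifest("boltz-experiments/my-screen"))[:5]:
#     print(record["id"], record["metrics"]["binding_confidence"], record["paths"].get("structure"))
```

Records without a `paths` map get an empty one here, so the same code reads whatever subset of files was downloaded.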
The TypeScript client doesn't download for you; you read the API objects directly (shown below). The result and run-status shapes are identical to what index.jsonl and run.json mirror on disk.
Each result is a scored protein. This is what list_results() streams (and what each index.jsonl record mirrors):
{
"data": [
{
"id": "prot_scr_result_8f3a2b", # unique result ID
"external_id": "binder-001", # the "id" you gave this protein in the input, if any
"created_at": "2026-02-25T13:03:40Z",
"entities": [
# the scored complex: the target chain(s) plus the screened protein
{ "type": "protein", "chain_ids": ["A"], "value": "MKTIIALSYIFCLVFA" },
{ "type": "protein", "chain_ids": ["B"], "value": "MKTAYIVKSHFSRQ" }
],
"metrics": {
"binding_confidence": 0.88, # 0–1; confidence protein binding occurs (affinity probability + structural quality); 0.7+ high-confidence
"structure_confidence": 0.91, # 0–1; confidence in the predicted structure
"iptm": 0.86, # 0–1; interface predicted TM-score
"min_interaction_pae": 4.9, # Ångström; interface error, lower is better
"helix_fraction": 0.74, # 0–1; fraction of the screened protein in alpha helices
"sheet_fraction": 0.0, # 0–1; fraction in beta sheets
"loop_fraction": 0.26 # 0–1; fraction in coil/loop regions
},
"artifacts": {
# short-lived presigned download URLs; check url_expires_at and download promptly
"structure": { # predicted bound structure (.cif); may be null until ready
"url": "https://.../structure.cif",
"url_expires_at": "2026-02-25T14:03:40Z"
},
"archive": { # full result archive (.tar.gz): structure, metrics.json, and pae.npz
"url": "https://.../archive.tar.gz",
"url_expires_at": "2026-02-25T14:03:40Z"
}
},
"warnings": [] # optional quality warnings for this result, if any
}
# ...more results on this page
],
"has_more": True, # true if more pages remain
"first_id": "prot_scr_result_8f3a2b", # ID of the first item; pass as before_id for the previous page
"last_id": "prot_scr_result_4ab7e0" # ID of the last item; pass as after_id for the next page
}

The run object tracks status and progress. It's what retrieve() returns, and what run.json mirrors:
{
"id": "prot_scr_run_8f3a2b",
"status": "running", # pending | running | succeeded | failed | stopped
"progress": {
"total_proteins_to_screen": 2,
"num_proteins_screened": 1, # scored and available to download so far
"num_proteins_failed": 0, # proteins that errored during scoring
"latest_result_id": "prot_scr_result_8f3a2b"
},
"error": None, # { code, message } once status is "failed"
"pipeline": "boltzprot",
"pipeline_version": "1.0",
"livemode": True, # false for runs created with a test key
"workspace_id": "ws_3a2b",
"created_at": "2026-02-25T12:00:00Z",
"started_at": "2026-02-25T12:00:05Z",
"completed_at": None, # set when the screen finishes
"stopped_at": None, # set if you stop the screen early
"data_deleted_at": None # set once the run's data is deleted
# "input" echoes the request you submitted (null after data deletion)
}

Use the API directly
For full control (and the only option in TypeScript, which has no managed download), drive the REST API yourself: poll the run for status, page through results as they're scored (cursor-paginated, so you can read them before the screen finishes), and stop early. See Output format for the object shapes.
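If you page through results by hand instead of relying on a client's iterator, the cursor fields from Output format are enough: read `data`, stop when `has_more` is false, otherwise pass `last_id` as `after_id` for the next page. The sketch below shows that loop in isolation. `fetch_page` is a hypothetical callable standing in for whatever request returns one page; it is not a function of the client library, and the in-memory `PAGES` table exists only to make the example runnable.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every result by following the has_more / last_id cursor."""
    after_id = None  # None requests the first page
    while True:
        page = fetch_page(after_id)
        yield from page["data"]
        if not page.get("has_more"):
            return
        after_id = page["last_id"]  # pass as after_id for the next page


# Stand-in for the API: two pages keyed by the after_id that requests them.
PAGES = {
    None: {"data": [{"id": "r1"}, {"id": "r2"}], "has_more": True,
           "first_id": "r1", "last_id": "r2"},
    "r2": {"data": [{"id": "r3"}], "has_more": False,
           "first_id": "r3", "last_id": "r3"},
}

ids = [result["id"] for result in iter_results(lambda after_id: PAGES[after_id])]
print(ids)  # → ['r1', 'r2', 'r3']
```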
import time
screen = client.protein.library_screen.start(target=target, proteins=proteins)
# Poll the run for status and progress.
while screen.status not in ("succeeded", "failed", "stopped"):
    time.sleep(10)
    screen = client.protein.library_screen.retrieve(screen.id)
    p = screen.progress
    print(f"{screen.status}: {p.num_proteins_screened}/{p.total_proteins_to_screen}")
BINDER_CHAIN = "B" # the screened protein's chain
def binder_sequence(result):
    return next(e.value for e in result.entities if BINDER_CHAIN in e.chain_ids)

# Best first: highest binding confidence, then lowest interface error.
# Use external_id to correlate each result back to the protein you submitted.
results = list(client.protein.library_screen.list_results(screen.id))
results.sort(key=lambda r: (-r.metrics.binding_confidence, r.metrics.min_interaction_pae))
for r in results[:5]:
    print(
        f"{r.id} "
        f"ext={r.external_id} "
        f"bind={r.metrics.binding_confidence:.2f} "
        f"iPAE={r.metrics.min_interaction_pae:.1f}Å "
        f"{binder_sequence(r)}"
    )
# Stop early once you've collected enough; results already produced stay available.
client.protein.library_screen.stop(screen.id)

boltz-api protein:library-screen retrieve --id "$SCREEN_ID"      # run status + progress
boltz-api protein:library-screen list-results --id "$SCREEN_ID"  # scored proteins (paginated)
boltz-api protein:library-screen stop --id "$SCREEN_ID"          # stop early; partial results stay available

// Reuses the client, target, and proteins from the Run section above.
let screen = await client.protein.libraryScreen.start({ target, proteins });

// Poll until the screen finishes.
while (!["succeeded", "failed", "stopped"].includes(screen.status)) {
  await new Promise((r) => setTimeout(r, 10000));
  screen = await client.protein.libraryScreen.retrieve(screen.id);
  const p = screen.progress;
  console.log(`${screen.status}: ${p.num_proteins_screened}/${p.total_proteins_to_screen}`);
}

// Stream scored proteins as they arrive (cursor-paginated).
// Use external_id to correlate each result back to your library.
for await (const result of client.protein.libraryScreen.listResults(screen.id)) {
  console.log(
    `${result.id} ext=${result.external_id} bind=${result.metrics.binding_confidence} iPAE=${result.metrics.min_interaction_pae}`,
  );
}

// Stop early once you've collected enough; results already produced stay available.
await client.protein.libraryScreen.stop(screen.id);

Metrics
| Metric | Range | What it measures |
|---|---|---|
| binding_confidence | 0–1 | Confidence that protein binding occurs, combining affinity probability with structural quality. For triage, 0.7+ is typically the high-confidence range. |
| structure_confidence | 0–1 | Measures the confidence of the predicted structure (0 = low, 1 = high). |
| iptm | 0–1 | Interface predicted TM-score. |
| min_interaction_pae | Ångström | Minimum predicted aligned error at the interface. Lower is better. |
| helix_fraction | 0–1 | Fraction of the screened protein in alpha helices. |
| sheet_fraction | 0–1 | Fraction in beta sheets. |
| loop_fraction | 0–1 | Fraction in coil/loop regions. |
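The metrics in the table combine naturally into a triage pass: keep results at or above the 0.7 binding-confidence mark mentioned above, order them best first, and label each by its dominant secondary structure. The sketch below works on plain dicts shaped like index.jsonl records. `triage` and `dominant_structure` are hypothetical helpers, and the optional `max_interface_pae` cutoff is an illustrative parameter, not a recommended value.

```python
def triage(results, min_binding=0.7, max_interface_pae=None):
    """Keep confident binders and sort them best first."""
    kept = []
    for result in results:
        metrics = result["metrics"]
        if metrics["binding_confidence"] < min_binding:
            continue
        if max_interface_pae is not None and metrics["min_interaction_pae"] > max_interface_pae:
            continue
        kept.append(result)
    # Highest binding confidence first, then lowest interface error.
    kept.sort(key=lambda r: (-r["metrics"]["binding_confidence"],
                             r["metrics"]["min_interaction_pae"]))
    return kept


def dominant_structure(metrics):
    """Return "helix", "sheet", or "loop", whichever fraction is largest."""
    fractions = {
        "helix": metrics["helix_fraction"],
        "sheet": metrics["sheet_fraction"],
        "loop": metrics["loop_fraction"],
    }
    return max(fractions, key=fractions.get)


results = [
    {"id": "r1", "metrics": {"binding_confidence": 0.88, "min_interaction_pae": 4.9,
                             "helix_fraction": 0.74, "sheet_fraction": 0.0, "loop_fraction": 0.26}},
    {"id": "r2", "metrics": {"binding_confidence": 0.41, "min_interaction_pae": 9.2,
                             "helix_fraction": 0.10, "sheet_fraction": 0.55, "loop_fraction": 0.35}},
    {"id": "r3", "metrics": {"binding_confidence": 0.88, "min_interaction_pae": 3.1,
                             "helix_fraction": 0.20, "sheet_fraction": 0.20, "loop_fraction": 0.60}},
]
for result in triage(results):
    print(result["id"], dominant_structure(result["metrics"]))
# prints:
# r3 loop
# r1 helix
```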
Status values
| Status | Meaning |
|---|---|
| pending | The screen is queued and has not started yet. |
| running | The screen is actively scoring proteins. Results may already be available. |
| succeeded | All proteins have been screened. |
| failed | The screen encountered an error. Check the error field. |
| stopped | The screen was stopped early. Partial results are available. |
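The table implies three terminal statuses (succeeded, failed, stopped) and two non-terminal ones (pending, running). A polling loop can use that split and add a timeout so it never waits indefinitely. This is a minimal sketch under assumptions: `wait_for_terminal` is a hypothetical helper, `retrieve` is any zero-argument callable you supply that returns the current run as a dict with a "status" key, and the default interval and timeout are illustrative.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "stopped"}


def wait_for_terminal(retrieve, poll_seconds=10.0, timeout_seconds=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll retrieve() until the run reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        run = retrieve()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"run still {run['status']!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Stand-in for the API: a run that moves through three statuses.
statuses = iter(["pending", "running", "succeeded"])
final = wait_for_terminal(lambda: {"status": next(statuses)}, sleep=lambda _: None)
print(final["status"])  # → succeeded
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; with a client object instead of a dict, the same loop would read the status attribute instead of indexing.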