Core Concepts
Key concepts for working with the Boltz API — entities, metrics, binding, file inputs, constraints, modifications, and bonds.
Entities and chain IDs
Molecular systems are described as a list of entities, each with a `type` and `value` field, along with one or more chain IDs. The supported entity types are:
- Protein: amino acid sequence (single-letter codes) in the `value` field. Supports `modifications` and `cyclic` options.
- RNA: ribonucleotide sequence in the `value` field.
- DNA: deoxyribonucleotide sequence in the `value` field.
- Ligand (SMILES): small molecule defined by a SMILES string in the `value` field.
- Ligand (CCD): small molecule defined by a CCD code in the `value` field.
Chain IDs are used throughout the API to reference specific chains in constraints, bonds, binding configuration, and results.
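The sketch below shows one way to assemble an entity list client-side. It assumes hypothetical `type` strings (`protein`, `rna`, `dna`, `ligand_smiles`, `ligand_ccd`) and a hypothetical `chain_ids` key; check the API reference for the exact names. Because chain IDs are used as references elsewhere, it also assumes they must be unique across the whole entity list.

```python
from typing import Any

# Hypothetical type strings: the docs name the five entity kinds, not the exact values.
ENTITY_TYPES = {"protein", "rna", "dna", "ligand_smiles", "ligand_ccd"}


def make_entity(entity_type: str, value: str, chain_ids: list[str]) -> dict[str, Any]:
    """Build one entity with a type, a value, and one or more chain IDs.

    The `chain_ids` key name is an assumption for illustration.
    """
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    if not chain_ids:
        raise ValueError("each entity needs at least one chain ID")
    return {"type": entity_type, "value": value, "chain_ids": list(chain_ids)}


def check_unique_chain_ids(entities: list[dict[str, Any]]) -> None:
    """Reject chain IDs reused across entities (assumed to be an error)."""
    seen: set[str] = set()
    for entity in entities:
        for chain_id in entity["chain_ids"]:
            if chain_id in seen:
                raise ValueError(f"duplicate chain ID: {chain_id}")
            seen.add(chain_id)


entities = [
    # A homodimer: one protein entity with two copies, on chains A and B.
    make_entity("protein", "MKTAYIAKQRQISFVKSHFSRQ", ["A", "B"]),
    # A ligand given as a SMILES string (ethanol), one copy on chain C.
    make_entity("ligand_smiles", "CCO", ["C"]),
    # A ligand given as a CCD code (ATP), one copy on chain D.
    make_entity("ligand_ccd", "ATP", ["D"]),
]
check_unique_chain_ids(entities)
```

Listing two chain IDs on one entity is how the sketch expresses multiple copies of the same molecule; each copy can then be referenced individually by its chain ID.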
Metrics
The API returns several confidence and quality metrics with prediction and screening results:
| Metric | What it measures |
|---|---|
| pTM (predicted TM-score) | Global predicted fold quality for the complex (0–1, higher is better). For single-chain inputs, pTM (not ipTM) drives the confidence ranking. |
| ipTM (interface predicted TM-score) | Confidence in the relative positioning of chains across interfaces (0–1, higher is better). The variants protein_iptm and ligand_iptm restrict the score to protein–protein and protein–ligand interfaces, respectively. |
| pLDDT (predicted Local Distance Difference Test) | Per-residue confidence in the local structure, reported as normalized 0–1 floats (higher is better): complex_plddt is averaged over the whole complex, and complex_iplddt up-weights interface residues. |
| PAE (Predicted Aligned Error) | Expected positional error between residue pairs, in ångström (lower is better), when the structure is aligned on one residue's frame. For protein-binder designs the minimum at the binder–target interface is surfaced as min_interaction_pae. |
| PDE (Predicted Distance Error) | Expected error in the distance between residue pairs, in ångström (lower is better). Reported as complex_pde across the complex and complex_ipde at the interface. |
| Structure confidence | Confidence in the predicted structure as a whole (0 = low, 1 = high). It is a composite (≈ 0.8 × complex_plddt + 0.2 × ipTM, with pTM in place of ipTM for single chains) and also determines the order of the returned samples. Because it is usually high, use it as a quality/sanity filter rather than as the primary ranking key. |
| Binding confidence | Confidence that protein binding occurs, combining affinity probability with structural quality (0–1). For triage, 0.7+ is typically the high-confidence range (computed when binding is requested). |
| Optimization score | Ranks relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within the same run rather than as a universal pass/fail threshold (computed when binding is requested). |
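The following sketch illustrates how the table's guidance might translate into a triage step: approximate the structure-confidence composite, use it only as a sanity filter, keep candidates in the 0.7+ binding-confidence range, and rank the survivors of one run by optimization score. The 0.8/0.2 weights are the approximate ones given above, the 0.5 structure cutoff is a hypothetical choice, and the candidate values are made up for illustration.

```python
from dataclasses import dataclass


def approx_structure_confidence(complex_plddt: float, iptm: float, ptm: float, single_chain: bool) -> float:
    """Approximate the composite: about 0.8 * complex_plddt + 0.2 * ipTM,
    using pTM instead of ipTM for single-chain inputs. The API's exact formula
    may differ, so prefer the returned value when you have it."""
    global_term = ptm if single_chain else iptm
    return 0.8 * complex_plddt + 0.2 * global_term


@dataclass
class Candidate:
    name: str
    structure_confidence: float
    binding_confidence: float
    optimization_score: float


def triage(candidates: list[Candidate], min_structure: float = 0.5, min_binding: float = 0.7) -> list[Candidate]:
    """Drop candidates failing the structure sanity filter (hypothetical 0.5 cutoff)
    or below the 0.7 binding-confidence range, then rank the rest of the same run
    by optimization score, highest first."""
    kept = [
        c for c in candidates
        if c.structure_confidence >= min_structure and c.binding_confidence >= min_binding
    ]
    return sorted(kept, key=lambda c: c.optimization_score, reverse=True)


# Made-up metric values for illustration only.
run = [
    Candidate("cand_1", 0.88, 0.82, 0.61),
    Candidate("cand_2", 0.91, 0.75, 0.74),
    Candidate("cand_3", 0.85, 0.55, 0.90),  # filtered out: binding confidence below 0.7
    Candidate("cand_4", 0.40, 0.80, 0.95),  # filtered out: fails the structure sanity filter
]
ranked = triage(run)
for c in ranked:
    print(c.name, c.optimization_score)
```

Note that cand_3 and cand_4 have the highest optimization scores but are excluded before ranking; optimization scores are only compared among candidates that pass the filters in the same run.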
Binding
Binding configuration tells the model to compute binding metrics for the prediction. There are two binding types:
- Ligand-protein binding (`ligand_protein_binding`): specify a `binder_chain_id` pointing to a ligand chain. The ligand must have exactly one copy (a single chain ID), and the complex must contain only ligands and proteins.
- Protein-protein binding (`protein_protein_binding`): specify `binder_chain_ids` pointing to one or more protein chains.
When binding is provided, the prediction output includes binding metrics (binding_confidence and optimization_score) in addition to structural results.
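As a minimal sketch, the binding rules above can be checked client-side before a request is sent. The `binder_chain_id` and `binder_chain_ids` fields and the two type names come from this page; the `type` key wrapping them, the entity dict shape, and the ligand type strings are assumptions for illustration.

```python
from typing import Any

LIGAND_TYPES = {"ligand_smiles", "ligand_ccd"}  # hypothetical type strings


def ligand_protein_binding(binder_chain_id: str, entities: list[dict[str, Any]]) -> dict[str, Any]:
    """Validate and build a ligand-protein binding config."""
    # The complex may contain only ligands and proteins.
    for entity in entities:
        if entity["type"] not in LIGAND_TYPES | {"protein"}:
            raise ValueError("ligand-protein binding requires only ligands and proteins")
    binder = next((e for e in entities if binder_chain_id in e["chain_ids"]), None)
    if binder is None or binder["type"] not in LIGAND_TYPES:
        raise ValueError(f"{binder_chain_id} is not a ligand chain")
    # The ligand must have exactly one copy, i.e. a single chain ID.
    if len(binder["chain_ids"]) != 1:
        raise ValueError("the binder ligand must have exactly one copy")
    return {"type": "ligand_protein_binding", "binder_chain_id": binder_chain_id}


def protein_protein_binding(binder_chain_ids: list[str], entities: list[dict[str, Any]]) -> dict[str, Any]:
    """Validate and build a protein-protein binding config for one or more binder chains."""
    if not binder_chain_ids:
        raise ValueError("at least one binder chain ID is required")
    protein_chains = {c for e in entities if e["type"] == "protein" for c in e["chain_ids"]}
    not_protein = [c for c in binder_chain_ids if c not in protein_chains]
    if not_protein:
        raise ValueError(f"not protein chains: {not_protein}")
    return {"type": "protein_protein_binding", "binder_chain_ids": list(binder_chain_ids)}


protein_ligand = [
    {"type": "protein", "value": "MKTAYIAKQR", "chain_ids": ["A"]},
    {"type": "ligand_ccd", "value": "ATP", "chain_ids": ["B"]},
]
ligand_config = ligand_protein_binding("B", protein_ligand)

protein_protein = [
    {"type": "protein", "value": "MKTAYIAKQR", "chain_ids": ["A"]},
    {"type": "protein", "value": "GSHMLEDP", "chain_ids": ["B", "C"]},
]
pp_config = protein_protein_binding(["B", "C"], protein_protein)
```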
File inputs
The API accepts file inputs in two formats:
- URL: provide a publicly accessible URL to the file.
- Base64: provide the file contents as a Base64-encoded string, along with a `media_type` (e.g., `chemical/x-cif`).
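A small sketch of preparing both kinds of file input with the standard library. Only `media_type` is named on this page; the `type`, `url`, and `value` keys are hypothetical placeholders for whatever the API reference specifies, and the URL is an example address.

```python
import base64
from pathlib import Path
from typing import Any


def url_input(url: str) -> dict[str, Any]:
    """File input given as a publicly accessible URL (key names are hypothetical)."""
    return {"type": "url", "url": url}


def base64_input(data: bytes, media_type: str) -> dict[str, Any]:
    """File input given as Base64-encoded contents plus a media type."""
    return {
        "type": "base64",  # hypothetical discriminator
        "value": base64.b64encode(data).decode("ascii"),  # hypothetical key
        "media_type": media_type,
    }


def base64_input_from_path(path: Path, media_type: str = "chemical/x-cif") -> dict[str, Any]:
    """Read a local file (for example an mmCIF structure) and wrap it as a Base64 input."""
    return base64_input(path.read_bytes(), media_type)


remote = url_input("https://example.com/structures/target.cif")
inline = base64_input(b"data_target\n#\n", "chemical/x-cif")
```

Encoding the raw bytes (rather than decoded text) keeps the round trip exact regardless of the file's character encoding.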
Constraints
Constraints guide predictions by specifying spatial relationships. There are two constraint types:
- Pocket constraints: define a binding pocket by specifying a `binder_chain_id` and `contact_residues` (a mapping of chain IDs to arrays of 0-based residue indices), plus a `max_distance_angstrom` parameter that sets the distance cutoff in ångström.
- Contact constraints: require two tokens to be within a maximum distance. Tokens can be:
  - `polymer_contact`: identifies a residue on a polymer chain (chain ID + residue index).
  - `ligand_contact`: identifies an atom on a ligand chain (chain ID + atom name).
All residue indices in constraints are 0-indexed.
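The sketch below builds both constraint types and converts residue positions counted from 1 into the 0-based indices constraints require, which is a common source of off-by-one errors. `binder_chain_id`, `contact_residues`, `max_distance_angstrom`, `polymer_contact`, and `ligand_contact` come from this page; the `type`, `token1`/`token2`, and inner `chain_id`/`residue_index`/`atom_name` keys, and reusing `max_distance_angstrom` for contact constraints, are assumptions.

```python
from typing import Any


def to_zero_based(residue_numbers: list[int]) -> list[int]:
    """Convert positions counted from 1 along the submitted sequence to 0-based indices.
    Not valid for PDB author numbering, which may not start at 1 or may have gaps."""
    if any(n < 1 for n in residue_numbers):
        raise ValueError("1-based residue numbers must be >= 1")
    return [n - 1 for n in residue_numbers]


def pocket_constraint(binder_chain_id: str, contact_residues: dict[str, list[int]],
                      max_distance_angstrom: float) -> dict[str, Any]:
    """Pocket constraint: the binder chain should sit near these 0-based residues."""
    for chain_id, indices in contact_residues.items():
        if any(i < 0 for i in indices):
            raise ValueError(f"negative residue index on chain {chain_id}")
    return {
        "type": "pocket",  # hypothetical discriminator key
        "binder_chain_id": binder_chain_id,
        "contact_residues": {c: list(ix) for c, ix in contact_residues.items()},
        "max_distance_angstrom": max_distance_angstrom,
    }


def polymer_contact(chain_id: str, residue_index: int) -> dict[str, Any]:
    """Token for a residue on a polymer chain (0-based residue index)."""
    return {"polymer_contact": {"chain_id": chain_id, "residue_index": residue_index}}


def ligand_contact(chain_id: str, atom_name: str) -> dict[str, Any]:
    """Token for a named atom on a ligand chain."""
    return {"ligand_contact": {"chain_id": chain_id, "atom_name": atom_name}}


def contact_constraint(token1: dict[str, Any], token2: dict[str, Any],
                       max_distance_angstrom: float) -> dict[str, Any]:
    """Contact constraint: the two tokens should lie within max_distance_angstrom."""
    return {"type": "contact", "token1": token1, "token2": token2,
            "max_distance_angstrom": max_distance_angstrom}


# Ligand on chain B should bind near residues 45, 48 and 102 (counted from 1) of chain A.
pocket = pocket_constraint("B", {"A": to_zero_based([45, 48, 102])}, 6.0)
# Residue 45 of chain A (0-based index 44) should be near ligand atom "C1" on chain B.
contact = contact_constraint(polymer_contact("A", 44), ligand_contact("B", "C1"), 5.0)
```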
Modifications
Modifications can be applied to residues in protein, RNA, and DNA entities:
- CCD modifications: reference a modification by its CCD code at a specific residue index. SMILES-based custom residue modifications are not currently supported.
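As a sketch, a CCD modification can be attached to a polymer entity's `modifications` list (the option named for proteins above). The inner keys (`type`, `residue_index`, `value`), the entity dict shape, and the assumption that modification residue indices are 0-based like constraint indices are all hypothetical; SEP is the CCD code for phosphoserine.

```python
from typing import Any

POLYMER_TYPES = {"protein", "rna", "dna"}  # hypothetical type strings


def add_ccd_modification(entity: dict[str, Any], residue_index: int, ccd_code: str) -> dict[str, Any]:
    """Append a CCD-coded modification at a residue index (assumed 0-based); returns the entity."""
    if entity["type"] not in POLYMER_TYPES:
        raise ValueError("modifications apply only to protein, RNA, and DNA entities")
    if not 0 <= residue_index < len(entity["value"]):
        raise IndexError(f"residue index {residue_index} is outside the sequence")
    entity.setdefault("modifications", []).append(
        {"type": "ccd", "residue_index": residue_index, "value": ccd_code}  # hypothetical keys
    )
    return entity


protein = {"type": "protein", "value": "MKTSPLQ", "chain_ids": ["A"]}
# Phosphoserine (CCD code SEP) on the serine at 0-based index 3.
add_ccd_modification(protein, 3, "SEP")
```

Checking that the sequence actually has the expected residue at that index (here `protein["value"][3] == "S"`) is a cheap way to catch 0-based versus 1-based mistakes.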
Bonds
Bonds are separate from constraints and define covalent bonds between specific atoms. Each bond specifies two atoms, `atom1` and `atom2`, and each atom reference includes a `chain_id`, `residue_index`, and `atom_name`.
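A minimal sketch of a bond entry using the `atom1`/`atom2` and `chain_id`/`residue_index`/`atom_name` fields described above. The example links a ligand to a cysteine sulfur (SG); the residue index 0 for a single-residue ligand and the ligand atom name `C1` are assumptions, since atom naming depends on how the ligand is defined.

```python
from typing import Any


def atom_ref(chain_id: str, residue_index: int, atom_name: str) -> dict[str, Any]:
    """One end of a bond: chain, residue index, and atom name."""
    return {"chain_id": chain_id, "residue_index": residue_index, "atom_name": atom_name}


def bond(atom1: dict[str, Any], atom2: dict[str, Any]) -> dict[str, Any]:
    """A covalent bond between two distinct atoms."""
    if atom1 == atom2:
        raise ValueError("a bond must connect two different atoms")
    return {"atom1": atom1, "atom2": atom2}


# Covalent attachment of a ligand (chain B) to the SG atom of a cysteine at index 24 on chain A.
# Residue index 0 for the ligand and the atom name "C1" are assumptions.
covalent = bond(atom_ref("A", 24, "SG"), atom_ref("B", 0, "C1"))
```

Unlike a contact constraint, which only asks for proximity, a bond entry fixes a specific covalent connection between the two named atoms.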