Concepts

Key concepts for working with the Boltz API — entities, metrics, binding, file inputs, constraints, modifications, and bonds.

Entities and chain IDs

Molecular systems are described as a list of entities, each with a type and value field, along with one or more chain IDs. The supported entity types are:

Protein — Amino acid sequence (single-letter codes) in the value field. Supports modifications and cyclic options.
RNA — Ribonucleotide sequence in the value field.
DNA — Deoxyribonucleotide sequence in the value field.
Ligand (SMILES) — Small molecule defined by a SMILES string in the value field.
Ligand (CCD) — Small molecule defined by a CCD code in the value field.

Chain IDs are used throughout the API to reference specific chains in constraints, bonds, binding configuration, and results.
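As a rough illustration, the entity list described above might be written as plain Python dictionaries. Only the `type` and `value` field names come from the text; the `chain_ids` key, the type strings, and the uniqueness check in the hypothetical `all_chain_ids` helper are assumptions to verify against the API reference.

```python
# Sketch of an entity list. "type" and "value" are named in the text;
# the "chain_ids" key and the type strings below are assumptions.
entities = [
    # One protein sequence present as two copies (chains A and B).
    {"type": "protein", "value": "MKTAYIAKQRQISFVKSHFSRQ", "chain_ids": ["A", "B"]},
    # One small molecule given as SMILES (aspirin), single copy.
    {"type": "ligand_smiles", "value": "CC(=O)OC1=CC=CC=C1C(=O)O", "chain_ids": ["L"]},
]


def all_chain_ids(entities):
    """Collect chain IDs in order, rejecting duplicates.

    Assumption: chain IDs must be unique across the whole system,
    since other parts of a request refer to chains by ID.
    """
    seen = []
    for entity in entities:
        for chain_id in entity["chain_ids"]:
            if chain_id in seen:
                raise ValueError(f"duplicate chain ID: {chain_id}")
            seen.append(chain_id)
    return seen


chain_ids = all_chain_ids(entities)
```

Listing two chain IDs on one entity is how this sketch expresses two copies of the same sequence.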

Metrics

The API returns several confidence and quality metrics with prediction and screening results:

Metric	What it measures
pTM (predicted TM-score)	Overall predicted structural similarity to the true structure. Higher is better.
ipTM (interface predicted TM-score)	Confidence in the predicted interface between chains. Variants include `protein_iptm` and `ligand_iptm` for specific interaction types.
pLDDT (predicted Local Distance Difference Test)	Per-residue confidence in the predicted structure. Reported as normalized 0–1 floats via `complex_plddt` across the complex and `complex_iplddt` for the interface.
PDE (Predicted Distance Error)	Expected positional error between residue pairs. Reported as `complex_pde` and `complex_ipde` for the interface.
Structure confidence	Overall confidence score for the predicted structure.
Binding confidence	Confidence that binding occurs. Reported as `binding_confidence` when binding is requested.
Optimization score	Score for ranking binding strength, useful for lead optimization. Reported as `optimization_score` when binding is requested.
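Because pLDDT is reported here as a normalized 0–1 float, comparing it with tools that use a 0–100 scale needs a conversion. The sketch below shows one way to do that; the `result_metrics` dictionary and its values are made up for illustration, and only the key names `complex_plddt` and `complex_iplddt` come from the table above.

```python
# Illustrative values only; this is not real API output.
result_metrics = {"complex_plddt": 0.87, "complex_iplddt": 0.79}


def plddt_to_percent(value):
    """Convert a normalized 0-1 pLDDT value to the 0-100 scale."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"expected a value in [0, 1], got {value}")
    return round(value * 100.0, 1)


complex_percent = plddt_to_percent(result_metrics["complex_plddt"])
interface_percent = plddt_to_percent(result_metrics["complex_iplddt"])
```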

Binding

Binding configuration tells the model to compute binding metrics for the prediction. There are two binding types:

Ligand-protein binding (ligand_protein_binding) — Specify a binder_chain_id pointing to a ligand chain. The ligand must have exactly one copy (single chain ID) and the complex must contain only ligands and proteins.
Protein-protein binding (protein_protein_binding) — Specify binder_chain_ids pointing to one or more protein chains.

When binding is provided, the prediction output includes binding metrics (binding_confidence and optimization_score) in addition to structural results.
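The ligand-protein rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the API's own validation: `ligand_protein_binding` and `binder_chain_id` are taken from the text, while the `chain_ids` key, the entity type strings, the `type` discriminator, and the helper function itself are assumptions.

```python
LIGAND_TYPES = {"ligand_smiles", "ligand_ccd"}  # assumed type strings


def ligand_protein_binding(entities, binder_chain_id):
    """Build a ligand-protein binding config after checking the stated rules."""
    binder = None
    for entity in entities:
        # Rule: the complex must contain only ligands and proteins.
        if entity["type"] not in LIGAND_TYPES | {"protein"}:
            raise ValueError(f"unsupported entity type: {entity['type']}")
        if binder_chain_id in entity["chain_ids"]:
            binder = entity
    if binder is None:
        raise ValueError(f"no entity has chain ID {binder_chain_id}")
    # Rule: the binder must be a ligand chain.
    if binder["type"] not in LIGAND_TYPES:
        raise ValueError("binder_chain_id must point to a ligand chain")
    # Rule: the ligand must have exactly one copy (single chain ID).
    if len(binder["chain_ids"]) != 1:
        raise ValueError("binder ligand must have exactly one chain ID")
    return {"type": "ligand_protein_binding", "binder_chain_id": binder_chain_id}


example_entities = [
    {"type": "protein", "value": "MKTAYIAKQRQISFVKSHFSRQ", "chain_ids": ["A"]},
    {"type": "ligand_smiles", "value": "CC(=O)OC1=CC=CC=C1C(=O)O", "chain_ids": ["L"]},
]
binding = ligand_protein_binding(example_entities, "L")
```

A protein-protein configuration would instead carry `binder_chain_ids` as a list of protein chain IDs.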

File inputs

The API accepts file inputs in two formats:

URL — Provide a publicly accessible URL to the file.
Base64 — Provide the file contents as a Base64-encoded string, along with a media_type (e.g., chemical/x-cif).
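For the Base64 format, the file bytes need to be encoded before they are placed in the request. The sketch below uses Python's standard `base64` module; `media_type` is the field named above, whereas the `type`, `data`, and `url` keys and both helper functions are hypothetical names chosen for illustration.

```python
import base64


def base64_file_input(data: bytes, media_type: str) -> dict:
    """Wrap raw file bytes as a Base64 file input."""
    return {
        "type": "base64",  # assumed discriminator
        "data": base64.b64encode(data).decode("ascii"),  # assumed key name
        "media_type": media_type,  # field named in the text
    }


def url_file_input(url: str) -> dict:
    """Wrap a publicly accessible URL as a file input (assumed key names)."""
    return {"type": "url", "url": url}


# Placeholder bytes standing in for a real mmCIF file.
cif_bytes = b"data_example\n#\n"
file_input = base64_file_input(cif_bytes, "chemical/x-cif")
```

Decoding `file_input["data"]` with `base64.b64decode` gives back the original bytes, which is a quick way to confirm the encoding step.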

Constraints

Constraints guide predictions by specifying spatial relationships. There are two constraint types:

Pocket constraints — Define a binding pocket by specifying a binder_chain_id and contact_residues (a mapping of chain IDs to arrays of 0-indexed residue indices). The max_distance_angstrom parameter sets the distance cutoff in angstroms.
Contact constraints — Require two tokens to be within a maximum distance. Tokens can be:
- polymer_contact — Identifies a residue on a polymer chain (chain ID + residue index).
- ligand_contact — Identifies an atom on a ligand chain (chain ID + atom name).

All residue indices in constraints are 0-indexed.
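Residue numbers taken from a structure viewer or a paper are often 1-based, so the 0-indexing rule is an easy place to be off by one. The sketch below converts 1-based numbers while building a pocket constraint. The names `binder_chain_id`, `contact_residues`, `max_distance_angstrom`, `polymer_contact`, and `ligand_contact` come from the text; the `type` keys, the `token1` and `token2` keys, the reuse of `max_distance_angstrom` for contact constraints, and the helper function are assumptions.

```python
def pocket_constraint(binder_chain_id, contacts_one_based, max_distance_angstrom):
    """Build a pocket constraint from 1-based residue numbers."""
    contact_residues = {}
    for chain_id, numbers in contacts_one_based.items():
        if any(n < 1 for n in numbers):
            raise ValueError("1-based residue numbers start at 1")
        # Shift to the 0-indexed positions the constraints expect.
        contact_residues[chain_id] = [n - 1 for n in numbers]
    return {
        "type": "pocket",  # assumed discriminator
        "binder_chain_id": binder_chain_id,
        "contact_residues": contact_residues,
        "max_distance_angstrom": max_distance_angstrom,
    }


pocket = pocket_constraint("L", {"A": [45, 46, 102]}, 6.0)

# A contact constraint between a polymer residue and a ligand atom.
# The overall shape and every key except the token kinds is assumed.
contact = {
    "type": "contact",
    "token1": {"type": "polymer_contact", "chain_id": "A", "residue_index": 44},
    "token2": {"type": "ligand_contact", "chain_id": "L", "atom_name": "C1"},
    "max_distance_angstrom": 5.0,
}
```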

Modifications

Modifications can be applied to residues in protein, RNA, and DNA entities:

CCD modifications — Reference a modification by its CCD code at a specific residue index.
SMILES modifications — Define a custom modification using a SMILES string at a specific residue index.
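As an illustration, a CCD modification could be attached to a protein entity as shown below, using SEP (the CCD code for phosphoserine). The text does not give the field names, so the `modifications`, `type`, `residue_index`, and `value` keys, the `chain_ids` key, and the helper are all hypothetical. The sketch also assumes modification indices are 0-indexed like constraint indices, which the text does not confirm.

```python
def ccd_modification(sequence, residue_index, ccd_code):
    """Describe a CCD modification at one position of a polymer sequence."""
    # Assumption: residue_index is 0-indexed, as in constraints.
    if not 0 <= residue_index < len(sequence):
        raise IndexError(f"residue_index {residue_index} is outside the sequence")
    return {"type": "ccd", "residue_index": residue_index, "value": ccd_code}


sequence = "MKTSAYIA"
# Index 3 is the serine (M=0, K=1, T=2, S=3).
modification = ccd_modification(sequence, 3, "SEP")

protein_entity = {
    "type": "protein",
    "value": sequence,
    "chain_ids": ["A"],
    "modifications": [modification],
}
```

Checking the index against the sequence length on the client catches off-by-one mistakes before a job is submitted.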

Bonds

Bonds are separate from constraints and define covalent bonds between specific atoms. Each bond specifies two atoms via atom1 and atom2, where each atom reference includes a chain_id, residue_index, and atom_name.
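A bond between a cysteine side chain and a covalently attached ligand might be sketched as follows. The `atom1`, `atom2`, `chain_id`, `residue_index`, and `atom_name` fields are the ones named above; the `atom_ref` helper, the chain IDs, the ligand atom name, and the use of `residue_index` 0 for a ligand are assumptions made for the example.

```python
def atom_ref(chain_id, residue_index, atom_name):
    """Build one atom reference for a bond."""
    return {
        "chain_id": chain_id,
        "residue_index": residue_index,
        "atom_name": atom_name,
    }


# Covalent bond between the SG atom of a cysteine on chain A and an
# atom named C1 on ligand chain L. Indices and names are illustrative.
bond = {
    "atom1": atom_ref("A", 24, "SG"),
    "atom2": atom_ref("L", 0, "C1"),
}
```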