Core Concepts

Key concepts for working with the Boltz API — entities, metrics, binding, file inputs, constraints, modifications, and bonds.

Molecular systems are described as a list of entities. Each entity has a type field, a value field, and one or more chain IDs. The supported entity types are:

  • Protein — Amino acid sequence (single-letter codes) in the value field. Supports modifications and cyclic options.
  • RNA — Ribonucleotide sequence in the value field.
  • DNA — Deoxyribonucleotide sequence in the value field.
  • Ligand (SMILES) — Small molecule defined by a SMILES string in the value field.
  • Ligand (CCD) — Small molecule defined by a CCD code in the value field.

Chain IDs are used throughout the API to reference specific chains in constraints, bonds, binding configuration, and results.
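A minimal sketch of an entity list as plain Python dicts follows. The type strings ("protein", "ligand_smiles", and so on) and the "chain_ids" key are assumptions made for illustration, not confirmed API identifiers. The uniqueness check reflects the fact that chain IDs are used as references elsewhere in a request.

```python
from typing import Any

# Hypothetical type strings; the real API identifiers may differ.
ENTITY_TYPES = {"protein", "rna", "dna", "ligand_smiles", "ligand_ccd"}


def make_entity(entity_type: str, value: str, chain_ids: list[str]) -> dict[str, Any]:
    """Build one entity with a type, a value, and its chain IDs."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    if not chain_ids:
        raise ValueError("each entity needs at least one chain ID")
    # "chain_ids" is an assumed key name.
    return {"type": entity_type, "value": value, "chain_ids": list(chain_ids)}


def check_unique_chain_ids(entities: list[dict[str, Any]]) -> None:
    """Chain IDs are referenced by constraints, bonds, and binding, so each should appear once."""
    seen: set[str] = set()
    for entity in entities:
        for chain_id in entity["chain_ids"]:
            if chain_id in seen:
                raise ValueError(f"duplicate chain ID: {chain_id}")
            seen.add(chain_id)


entities = [
    make_entity("protein", "MKTAYIAKQRQISFVKSHFSRQ", ["A", "B"]),  # two copies of one protein
    make_entity("ligand_smiles", "CC(=O)OC1=CC=CC=C1C(=O)O", ["C"]),  # aspirin as SMILES
    make_entity("ligand_ccd", "ATP", ["D"]),  # ATP by CCD code
]
check_unique_chain_ids(entities)
```

Listing several chain IDs on one entity (chains A and B above) is a compact way to request multiple copies of the same molecule.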

The API returns several confidence and quality metrics with prediction and screening results:

  • pTM (predicted TM-score): Global predicted fold quality for the complex (0–1, higher is better). For single-chain inputs, pTM (not ipTM) drives the confidence ranking.
  • ipTM (interface predicted TM-score): Confidence in the relative positioning of chains across interfaces (0–1, higher is better). The variants protein_iptm and ligand_iptm restrict the score to protein–protein and protein–ligand interfaces, respectively.
  • pLDDT (predicted Local Distance Difference Test): Per-residue confidence in the local structure, reported as normalized 0–1 floats (higher is better). complex_plddt is the average over the complex; complex_iplddt up-weights interface residues.
  • PAE (Predicted Aligned Error): Expected positional error between residue pairs, in ångström (lower is better), when the structure is aligned on one residue's frame. For protein-binder designs, the minimum PAE at the binder–target interface is reported as min_interaction_pae.
  • PDE (Predicted Distance Error): Expected error in the distance between residue pairs, in ångström (lower is better). Reported as complex_pde across the complex and complex_ipde at the interface.
  • Structure confidence: Confidence in the predicted structure as a whole (0 = low, 1 = high). It is a composite score (≈ 0.8 × complex_plddt + 0.2 × ipTM, with pTM in place of ipTM for single chains) that also determines the order of the returned samples. Because it is usually high, use it as a quality or sanity filter rather than as the primary ranking key.
  • Binding confidence: Confidence that binding occurs, combining affinity probability with structural quality (0–1). For triage, scores of 0.7 or above typically fall in the high-confidence range. Computed only when binding is requested.
  • Optimization score: Relative binding strength for lead optimization, normalized 0–1 (higher is better). Use it to prioritize the top-scoring candidates within a single run rather than as a universal pass/fail threshold. Computed only when binding is requested.
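The sketch below recomputes the approximate structure-confidence composite and applies the triage rule described above: a loose structure filter first, then a binding-confidence cutoff of 0.7. The Sample class and its field layout are hypothetical stand-ins for however results are actually returned, and the 0.5 structure cutoff is an illustrative assumption, not a documented value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Field names mirror the metric names above; the response layout is an assumption.
    complex_plddt: float
    ptm: float
    iptm: Optional[float]  # assumed to be None for single-chain inputs
    binding_confidence: Optional[float] = None  # present only when binding is requested


def structure_confidence(sample: Sample) -> float:
    """Approximate composite: 0.8 * complex_plddt + 0.2 * ipTM, using pTM for single chains."""
    interface_term = sample.ptm if sample.iptm is None else sample.iptm
    return 0.8 * sample.complex_plddt + 0.2 * interface_term


def triage(samples: list[Sample], min_structure: float = 0.5, min_binding: float = 0.7) -> list[Sample]:
    """Use structure confidence as a sanity filter, then rank by binding confidence."""
    hits = [
        s for s in samples
        if structure_confidence(s) >= min_structure
        and s.binding_confidence is not None
        and s.binding_confidence >= min_binding
    ]
    return sorted(hits, key=lambda s: s.binding_confidence, reverse=True)


samples = [
    Sample(complex_plddt=0.90, ptm=0.85, iptm=0.80, binding_confidence=0.75),
    Sample(complex_plddt=0.88, ptm=0.80, iptm=0.60, binding_confidence=0.55),
    Sample(complex_plddt=0.92, ptm=0.90, iptm=None),  # single chain, no binding metrics
]
print([round(structure_confidence(s), 3) for s in samples])
print(len(triage(samples)))
```

Note that the third sample has the highest structure confidence but drops out of triage because it has no binding metrics, which is why structure confidence works better as a filter than as the ranking key.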

Binding configuration tells the model to compute binding metrics for the prediction. There are two binding types:

  • Ligand-protein binding (ligand_protein_binding) — Specify a binder_chain_id pointing to a ligand chain. The ligand must have exactly one copy (single chain ID) and the complex must contain only ligands and proteins.
  • Protein-protein binding (protein_protein_binding) — Specify binder_chain_ids pointing to one or more protein chains.

When a binding configuration is provided, the prediction output includes the binding metrics (binding_confidence and optimization_score) in addition to the structural results.
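Here is a sketch that builds both binding configurations and enforces the rules listed above before a request is sent. The binder_chain_id and binder_chain_ids fields come from this page; the "type" discriminator key, the entity type strings, and the "chain_ids" key are assumptions for illustration.

```python
from typing import Any

# Hypothetical type strings for ligand entities; the real API identifiers may differ.
LIGAND_TYPES = {"ligand_smiles", "ligand_ccd"}


def ligand_protein_binding(entities: list[dict[str, Any]], binder_chain_id: str) -> dict[str, Any]:
    """Build a ligand_protein_binding config after checking the documented rules."""
    # The complex must contain only ligands and proteins.
    for entity in entities:
        if entity["type"] not in LIGAND_TYPES | {"protein"}:
            raise ValueError("ligand-protein binding requires a complex of only ligands and proteins")
    # binder_chain_id must point to a ligand chain...
    matches = [e for e in entities if binder_chain_id in e["chain_ids"]]
    if not matches or matches[0]["type"] not in LIGAND_TYPES:
        raise ValueError(f"chain {binder_chain_id} is not a ligand chain")
    # ...and that ligand must have exactly one copy (a single chain ID).
    if len(matches[0]["chain_ids"]) != 1:
        raise ValueError("the binder ligand must have exactly one copy")
    return {"type": "ligand_protein_binding", "binder_chain_id": binder_chain_id}


def protein_protein_binding(entities: list[dict[str, Any]], binder_chain_ids: list[str]) -> dict[str, Any]:
    """Build a protein_protein_binding config; every binder chain must be a protein chain."""
    protein_chains = {c for e in entities if e["type"] == "protein" for c in e["chain_ids"]}
    if not binder_chain_ids or not set(binder_chain_ids) <= protein_chains:
        raise ValueError("binder_chain_ids must name one or more protein chains")
    return {"type": "protein_protein_binding", "binder_chain_ids": list(binder_chain_ids)}


entities = [
    {"type": "protein", "value": "MKTAYIAKQRQISFVKSHFSRQ", "chain_ids": ["A"]},
    {"type": "ligand_ccd", "value": "ATP", "chain_ids": ["B"]},
]
binding = ligand_protein_binding(entities, "B")
```

Checking these rules on the client side surfaces a mistake, such as a ligand listed with two chain IDs, before the request is submitted.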

The API accepts file inputs in two formats:

  • URL — Provide a publicly accessible URL to the file.
  • Base64 — Provide the file contents as a Base64-encoded string, along with a media_type (e.g., chemical/x-cif).
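A small sketch of the two file-input forms is shown below. Only media_type is named on this page; the "type", "url", and "data" keys are assumptions. The encoding itself uses the standard library's base64.b64encode.

```python
import base64
from pathlib import Path
from typing import Any


def url_input(url: str) -> dict[str, Any]:
    """Reference a publicly accessible file by URL (key names are assumptions)."""
    return {"type": "url", "url": url}


def base64_input(data: bytes, media_type: str = "chemical/x-cif") -> dict[str, Any]:
    """Encode raw file bytes as a Base64 string and attach its media_type."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "base64", "data": encoded, "media_type": media_type}


def base64_input_from_file(path: str, media_type: str = "chemical/x-cif") -> dict[str, Any]:
    """Read a local file (for example an mmCIF structure) and encode it."""
    return base64_input(Path(path).read_bytes(), media_type)


payload = base64_input(b"data_example\n")
```

Base64 is the option to use for local or private files; a URL avoids inflating the request body but must be reachable without authentication.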

Constraints guide predictions by specifying spatial relationships. There are two constraint types:

  • Pocket constraints — Define a binding pocket by specifying a binder_chain_id and contact_residues (a mapping of chain IDs to arrays of 0-based residue indices). The max_distance_angstrom parameter sets the maximum allowed distance, in ångström.
  • Contact constraints — Require two tokens to be within a maximum distance. Tokens can be:
    • polymer_contact — Identifies a residue on a polymer chain (chain ID + residue index).
    • ligand_contact — Identifies an atom on a ligand chain (chain ID + atom name).

All residue indices in constraints are 0-indexed.
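The sketch below builds a pocket constraint and a contact constraint and converts 1-based residue numbers (the usual convention in sequence databases) to the 0-based indices the API expects. It assumes indices count positions in the submitted sequence. binder_chain_id, contact_residues, max_distance_angstrom, polymer_contact, and ligand_contact come from this page; the "type", "token1"/"token2", "chain_id", "residue_index", and "atom_name" keys, and the reuse of max_distance_angstrom for contact constraints, are assumptions.

```python
from typing import Any


def from_one_based(residue_numbers: list[int]) -> list[int]:
    """Convert 1-based residue numbers to 0-based indices."""
    if any(n < 1 for n in residue_numbers):
        raise ValueError("1-based residue numbers start at 1")
    return [n - 1 for n in residue_numbers]


def pocket_constraint(binder_chain_id: str, contact_residues: dict[str, list[int]],
                      max_distance_angstrom: float) -> dict[str, Any]:
    return {
        "type": "pocket",  # assumed discriminator
        "binder_chain_id": binder_chain_id,
        "contact_residues": contact_residues,  # chain ID -> 0-based residue indices
        "max_distance_angstrom": max_distance_angstrom,
    }


def polymer_contact(chain_id: str, residue_index: int) -> dict[str, Any]:
    """A residue on a polymer chain (0-based index)."""
    return {"type": "polymer_contact", "chain_id": chain_id, "residue_index": residue_index}


def ligand_contact(chain_id: str, atom_name: str) -> dict[str, Any]:
    """An atom on a ligand chain, identified by atom name."""
    return {"type": "ligand_contact", "chain_id": chain_id, "atom_name": atom_name}


def contact_constraint(token1: dict[str, Any], token2: dict[str, Any],
                       max_distance_angstrom: float) -> dict[str, Any]:
    # Key names here are assumptions.
    return {"type": "contact", "token1": token1, "token2": token2,
            "max_distance_angstrom": max_distance_angstrom}


# Residues 45, 46, and 90 as numbered in a database entry become indices 44, 45, and 89.
pocket = pocket_constraint("B", {"A": from_one_based([45, 46, 90])}, max_distance_angstrom=6.0)
contact = contact_constraint(polymer_contact("A", 44), ligand_contact("B", "C1"), max_distance_angstrom=5.0)
```

Doing the 1-based to 0-based conversion in one helper makes the off-by-one step explicit instead of scattering "- 1" through request-building code.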

Modifications can be applied to residues in protein, RNA, and DNA entities:

  • CCD modifications — Reference a modification by its CCD code at a specific residue index. SMILES-based custom residue modifications are not currently supported.
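As an illustration, this sketch attaches a CCD modification to one residue and checks that the 0-based index actually lands on the intended residue. The "residue_index" and "ccd" key names and the ccd_modification helper are hypothetical; SEP is the CCD code for phosphoserine.

```python
from typing import Any, Optional


def ccd_modification(sequence: str, residue_index: int, ccd: str,
                     expected_residue: Optional[str] = None) -> dict[str, Any]:
    """Attach a CCD-coded modification to one residue (0-based index)."""
    if not 0 <= residue_index < len(sequence):
        raise IndexError(f"residue index {residue_index} is outside a sequence of length {len(sequence)}")
    if expected_residue is not None and sequence[residue_index] != expected_residue:
        raise ValueError(f"expected {expected_residue} at index {residue_index}, "
                         f"found {sequence[residue_index]}")
    # Key names are assumptions.
    return {"residue_index": residue_index, "ccd": ccd}


sequence = "MKTAYSAKQR"
# Phosphoserine (CCD code SEP) on the serine at 0-based index 5.
mod = ccd_modification(sequence, 5, "SEP", expected_residue="S")
```

The optional expected_residue check guards against applying a modification to the wrong residue after an off-by-one indexing mistake.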

Bonds are separate from constraints and define covalent bonds between specific atoms. Each bond specifies two atoms via atom1 and atom2, where each atom reference includes a chain_id, residue_index, and atom_name.
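A short sketch of a bond follows, using the atom1/atom2 and chain_id, residue_index, atom_name fields described above. The helper functions are hypothetical, the residue positions are illustrative, and the assumption that bond residue indices are 0-based (as they are for constraints) is not confirmed here.

```python
from typing import Any


def atom_ref(chain_id: str, residue_index: int, atom_name: str) -> dict[str, Any]:
    """One end of a bond; residue_index is assumed to be 0-based."""
    return {"chain_id": chain_id, "residue_index": residue_index, "atom_name": atom_name}


def bond(atom1: dict[str, Any], atom2: dict[str, Any]) -> dict[str, Any]:
    """A covalent bond between two distinct atoms."""
    if atom1 == atom2:
        raise ValueError("a bond needs two distinct atoms")
    return {"atom1": atom1, "atom2": atom2}


# Interchain disulfide between the SG atoms of two cysteines (hypothetical positions).
disulfide = bond(atom_ref("A", 21, "SG"), atom_ref("B", 6, "SG"))
```

Because a bond names exact atoms, it fixes connectivity; a contact constraint only bounds a distance between tokens.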