Core Concepts

Key concepts for working with the Boltz API — entities, metrics, binding, file inputs, constraints, modifications, and bonds.

Molecular systems are described as a list of entities. Each entity has a type field, a value field, and one or more chain IDs. The supported entity types are:

  • Protein — Amino acid sequence (single-letter codes) in the value field. Supports modifications and cyclic options.
  • RNA — Ribonucleotide sequence in the value field.
  • DNA — Deoxyribonucleotide sequence in the value field.
  • Ligand (SMILES) — Small molecule defined by a SMILES string in the value field.
  • Ligand (CCD) — Small molecule defined by a CCD code in the value field.

Chain IDs are used throughout the API to reference specific chains in constraints, bonds, binding configuration, and results.
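
As an illustration, an entity list might be assembled as below. This is a minimal sketch, not the API's confirmed schema: the key names type, value, and chain_ids, the type strings, and the chain_index helper are all assumptions made for this example.

```python
# Hypothetical entity list. Key names and type strings are assumptions,
# not confirmed field names from the Boltz API reference.
entities = [
    {"type": "protein", "value": "MKTAYIAKQR", "chain_ids": ["A", "B"]},
    {"type": "ligand_smiles", "value": "CC(=O)OC1=CC=CC=C1C(=O)O", "chain_ids": ["L"]},
    {"type": "ligand_ccd", "value": "ATP", "chain_ids": ["M"]},
]


def chain_index(entities):
    """Map each chain ID to the entity that owns it, rejecting duplicates."""
    index = {}
    for entity in entities:
        for chain_id in entity["chain_ids"]:
            if chain_id in index:
                raise ValueError(f"duplicate chain ID: {chain_id}")
            index[chain_id] = entity
    return index


chains = chain_index(entities)
print(sorted(chains))
```

Building a lookup like this up front makes it easy to check that the chain IDs used later in constraints, bonds, and binding configuration actually exist.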

The API returns several confidence and quality metrics with prediction and screening results:

| Metric | What it measures |
| --- | --- |
| pTM (predicted TM-score) | Overall predicted structural similarity to the true structure. Higher is better. |
| ipTM (interface predicted TM-score) | Confidence in the predicted interface between chains. Variants include protein_iptm and ligand_iptm for specific interaction types. |
| pLDDT (predicted Local Distance Difference Test) | Per-residue confidence in the predicted structure. Reported as normalized 0–1 floats via complex_plddt across the complex and complex_iplddt for the interface. |
| PDE (Predicted Distance Error) | Expected positional error between residue pairs. Reported as complex_pde and complex_ipde for the interface. |
| Structure confidence | Overall confidence score for the predicted structure. |
| Binding confidence | Confidence that binding occurs (when binding is requested). |
| Optimization score | Binding strength ranking score, useful for lead optimization (when binding is requested). |
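
One way these metrics might be used together is to drop low-confidence structures and then rank what remains. The sketch below assumes each result is a flat dictionary keyed by metric name. The records, the id field, the 0.7 cutoff, and the shortlist helper are invented for illustration.

```python
# Made-up result records. A flat dict per result is an assumption about
# the response shape, and the numbers are not real predictions.
results = [
    {"id": "cand-1", "complex_plddt": 0.91, "binding_confidence": 0.62},
    {"id": "cand-2", "complex_plddt": 0.55, "binding_confidence": 0.88},
    {"id": "cand-3", "complex_plddt": 0.84, "binding_confidence": 0.79},
]


def shortlist(results, min_plddt=0.7):
    """Keep results whose complex_plddt meets the cutoff, then sort by
    binding_confidence from highest to lowest."""
    kept = [r for r in results if r["complex_plddt"] >= min_plddt]
    return sorted(kept, key=lambda r: r["binding_confidence"], reverse=True)


ranked = shortlist(results)
print([r["id"] for r in ranked])
```

The cutoff is arbitrary. Because complex_plddt is reported on a 0–1 scale, a threshold taken from tools that report pLDDT on a 0–100 scale would need to be divided by 100 first.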

Binding configuration tells the model to compute binding metrics for the prediction. There are two binding types:

  • Ligand-protein binding (ligand_protein_binding) — Specify a binder_chain_id pointing to a ligand chain. The ligand must have exactly one copy (single chain ID) and the complex must contain only ligands and proteins.
  • Protein-protein binding (protein_protein_binding) — Specify binder_chain_ids pointing to one or more protein chains.

When binding is provided, the prediction output includes binding metrics (binding_confidence and optimization_score) in addition to structural results.
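
The ligand-protein rules above can be checked on the client before a request is sent. This is a sketch under stated assumptions: ligand_protein_binding and binder_chain_id come from the text, while the entity key names, the type strings, the type key on the binding object, and the helper itself are hypothetical.

```python
def check_ligand_protein_binding(entities, binder_chain_id):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    allowed = {"protein", "ligand_smiles", "ligand_ccd"}  # assumed type strings
    if any(e["type"] not in allowed for e in entities):
        problems.append("complex must contain only ligands and proteins")
    # Find the entity that owns the binder chain, if any.
    binder = next((e for e in entities if binder_chain_id in e["chain_ids"]), None)
    if binder is None:
        problems.append(f"no entity has chain ID {binder_chain_id!r}")
    elif not binder["type"].startswith("ligand"):
        problems.append("binder_chain_id must point to a ligand chain")
    elif len(binder["chain_ids"]) != 1:
        problems.append("binder ligand must have exactly one chain ID")
    return problems


entities = [
    {"type": "protein", "value": "MKTAYIAKQR", "chain_ids": ["A"]},
    {"type": "ligand_ccd", "value": "ATP", "chain_ids": ["L"]},
]
# Hypothetical binding object; the "type" key is an assumption.
binding = {"type": "ligand_protein_binding", "binder_chain_id": "L"}

print(check_ligand_protein_binding(entities, binding["binder_chain_id"]))
```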

The API accepts file inputs in two formats:

  • URL — Provide a publicly accessible URL to the file.
  • Base64 — Provide the file contents as a Base64-encoded string, along with a media_type (e.g., chemical/x-cif).
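
The two input formats could be built with small helpers like the ones below. Only media_type and the chemical/x-cif example come from the text. The type, url, and data keys, the helper names, and the example URL are assumptions for illustration.

```python
import base64


def file_from_url(url):
    """Reference a file by a publicly accessible URL."""
    return {"type": "url", "url": url}  # key names are assumptions


def file_from_bytes(data, media_type="chemical/x-cif"):
    """Embed raw file contents as a Base64-encoded ASCII string."""
    return {
        "type": "base64",  # assumed discriminator value
        "data": base64.b64encode(data).decode("ascii"),
        "media_type": media_type,
    }


cif_bytes = b"data_example\n_cell.length_a 10.0\n"
payload = file_from_bytes(cif_bytes)
link = file_from_url("https://example.com/structure.cif")
print(payload["media_type"], len(payload["data"]))
```

Decoding to ASCII matters because b64encode returns bytes, which the standard json module cannot serialize directly.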

Constraints guide predictions by specifying spatial relationships. There are two constraint types:

  • Pocket constraints — Define a binding pocket by specifying a binder_chain_id and contact_residues (a mapping of chain IDs to arrays of 0-based residue indices). Includes a max_distance_angstrom parameter.
  • Contact constraints — Require two tokens to be within a maximum distance. Tokens can be:
    • polymer_contact — Identifies a residue on a polymer chain (chain ID + residue index).
    • ligand_contact — Identifies an atom on a ligand chain (chain ID + atom name).

All residue indices in constraints are 0-indexed.
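
Because residue numbers in sequence viewers and structure files usually start at 1, an off-by-one error is easy to make here. The sketch below converts 1-based numbers before building a pocket and a contact constraint. binder_chain_id, contact_residues, max_distance_angstrom, polymer_contact, and ligand_contact come from the text. The other key names (including the distance key on the contact constraint), the atom name, and the helper are assumptions.

```python
def to_zero_based(residue_numbers):
    """Convert 1-based residue numbers to the 0-based indices used in constraints."""
    if any(n < 1 for n in residue_numbers):
        raise ValueError("residue numbers must be >= 1")
    return [n - 1 for n in residue_numbers]


# Hypothetical pocket constraint: ligand chain L contacts three residues on chain A.
pocket = {
    "type": "pocket",  # assumed discriminator value
    "binder_chain_id": "L",
    "contact_residues": {"A": to_zero_based([45, 46, 102])},
    "max_distance_angstrom": 6.0,
}

# Hypothetical contact constraint between a polymer residue and a ligand atom.
contact = {
    "type": "contact",  # assumed discriminator value
    "token1": {"type": "polymer_contact", "chain_id": "A", "residue_index": 44},
    "token2": {"type": "ligand_contact", "chain_id": "L", "atom_name": "C1"},
    "max_distance_angstrom": 4.0,  # assumed key name for the maximum distance
}

print(pocket["contact_residues"])
```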

Modifications can be applied to residues in protein, RNA, and DNA entities:

  • CCD modifications — Reference a modification by its CCD code at a specific residue index.
  • SMILES modifications — Define a custom modification using a SMILES string at a specific residue index.
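
A modification list might look like the sketch below, which also checks that each index falls inside the sequence. The key names, the type strings, and the use of 0-based indices for modifications are assumptions. SEP is the CCD code for phosphoserine, and the SMILES string is phosphoserine without stereochemistry, shown only as a placeholder.

```python
sequence = "MKTSYIAKQR"

# Hypothetical modification records; key names and type strings are assumptions.
modifications = [
    # Phosphoserine at the serine, assuming 0-based indexing (index 3).
    {"type": "ccd", "residue_index": 3, "value": "SEP"},
    # The same residue expressed as a SMILES string instead of a CCD code.
    {"type": "smiles", "residue_index": 3, "value": "NC(COP(=O)(O)O)C(=O)O"},
]


def check_modifications(sequence, modifications):
    """Return a list of problems found in the modification records."""
    errors = []
    for mod in modifications:
        idx = mod["residue_index"]
        if not 0 <= idx < len(sequence):
            errors.append(f"residue_index {idx} is outside the sequence")
        if mod["type"] not in ("ccd", "smiles"):
            errors.append(f"unknown modification type: {mod['type']}")
    return errors


# Two records on one residue are shown only to compare the two forms;
# a real request would presumably use one of them.
print(check_modifications(sequence, modifications))
```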

Bonds are separate from constraints and define covalent bonds between specific atoms. Each bond specifies two atoms via atom1 and atom2, where each atom reference includes a chain_id, residue_index, and atom_name.
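
For example, a disulfide bond between two cysteines could be expressed as follows. atom1, atom2, chain_id, residue_index, and atom_name come from the text. The atom_ref helper, the sequence, and the assumption that bond indices are 0-based like constraint indices are illustrative only.

```python
def atom_ref(chain_id, residue_index, atom_name):
    """Build one atom reference for a bond."""
    return {
        "chain_id": chain_id,
        "residue_index": residue_index,
        "atom_name": atom_name,
    }


sequence = "MKCAYIAKQRCG"  # cysteines at positions 2 and 10, counting from 0

# Disulfide between the sulfur atoms (SG) of the two cysteines on chain A.
# Assumes residue_index is 0-based here, as it is for constraints.
bond = {
    "atom1": atom_ref("A", 2, "SG"),
    "atom2": atom_ref("A", 10, "SG"),
}

print(bond)
```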