File Download Services
The PDB archive
Searches and reports performed on this RCSB PDB website utilize data from the PDB archive. The PDB archive is maintained by the wwPDB at the main archive, files.wwpdb.org (data download details) and the versioned archive, files-versioned.wwpdb.org (versioning details).
Protocols: HTTPS and FTP
The HTTPS protocol offers more flexibility and less overhead, and is strongly recommended for use in scripts and third-party software. HTTPS links for this service are described below. The FTP protocol will be gradually phased out.
RCSB PDB additionally hosts the archive as part of the Registry of Open Data on Amazon Web Services (AWS) at https://s3.rcsb.org following the same directory structure.
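As a minimal sketch of scripted HTTPS access, the Python below fetches one gzip-compressed file and decompresses it in memory. The function names and the 30-second timeout are illustrative assumptions and are not part of any RCSB PDB or wwPDB tooling.

```python
import gzip
import urllib.request


def decode_gzip_payload(payload: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip bytes and decode them to text."""
    return gzip.decompress(payload).decode(encoding)


def fetch_gzipped_text(url: str, timeout: float = 30.0) -> str:
    """Download a .gz file over HTTPS and return its decompressed text.

    The timeout value is an arbitrary choice for this sketch.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_gzip_payload(response.read())
```

For example, `fetch_gzipped_text("https://files.wwpdb.org/pub/pdb/data/structures/divided/pdb/00/pdb100d.ent.gz")` would be expected to return the text of entry 100d, assuming the archive path described on this page is served under that host.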
Automated download of data
The RCSB PDB also provides rsync capabilities, which are useful when maintaining full copies of the archive. Two example scripts assist with automated rsync downloads:
- ftp://snapshots.rcsb.org/rsyncSnapshots.sh makes a local copy of an annual snapshot or of sections of a snapshot. The script is annotated to assist in downloading only sections of the archive.
- https://files.wwpdb.org/pub/pdb/software/rsyncPDB.sh copies the current contents of the entire archive.
Additional information on obtaining and maintaining copies of the entire PDB archive or certain portions of it is available at https://www.wwpdb.org/downloads.html.
Major directories in the PDB archive
The directory pub/pdb is the entry directory for the FTP site.
Some general notes:
- Entry files are date-stamped to show the date they were released
- Entries are grouped by the middle two characters of the 4-character PDB identifier. For example, entry file pdb100d.ent can be found in pub/pdb/data/structures/divided/pdb/00/pdb100d.ent.gz
- The two-letter naming convention for structure holdings is retained for the directories within /pub/pdb/data/structures/divided and /pub/pdb/data/structures/divided/obsolete, but not for the directories within /pub/pdb/data/structures/all, which contain the structure holdings in an undivided layout.
- PDB entries are available in PDB, mmCIF, and PDBML/XML format.
- Only UNIX compressed files are supported for coordinates, structure factors, and restraints.
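The grouping rule above can be sketched in Python as follows. The helper names `divided_pdb_path` and `archive_url` are hypothetical, and the sketch covers only the legacy PDB-format files whose naming is shown in the example above; joining the path to the files.wwpdb.org host over HTTPS is an assumption based on the archive address given on this page.

```python
ARCHIVE_ROOT = "pub/pdb/data/structures"


def divided_pdb_path(pdb_id: str) -> str:
    """Return the divided-layout path of a PDB-format entry file.

    The two-letter directory is the middle two characters of the
    4-character identifier, e.g. "00" for 100d.
    """
    entry = pdb_id.strip().lower()
    if len(entry) != 4 or not entry.isalnum():
        raise ValueError(f"expected a 4-character PDB identifier, got {pdb_id!r}")
    middle = entry[1:3]
    return f"{ARCHIVE_ROOT}/divided/pdb/{middle}/pdb{entry}.ent.gz"


def archive_url(path: str, host: str = "files.wwpdb.org") -> str:
    """Join an archive path to a host (the default host is an assumption)."""
    return f"https://{host}/{path.lstrip('/')}"
```

Passing a different `host` value is one way to point the same path at a mirror that follows the same directory structure.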
For information about large structures that cannot be represented in the legacy PDB file format see here.
|Directory||Contents|
|/pub/pdb/data/assemblies/mmCIF||Biological assembly coordinate files in mmCIF format|
|/pub/pdb/data/biounit/PDB||Biological assembly coordinate files in PDB format|
|/pub/pdb/data/monomers||PDB Chemical Component Dictionary and other info on monomers|
|/pub/pdb/data/status||Details of entries on hold and in processing|
|/pub/pdb/data/structures/all||Analogous to the divided directory, containing pdb, mmCIF, nmr_restraint, and structure_factors directories, with symbolic links to files in the divided subdirectories. In the ./all directory, files are not divided into two-letter directories, however.|
|/pub/pdb/data/structures/divided||This is the entry point for a user looking for a structure. This directory contains the current PDB holdings in pdb, mmCIF, XML, nmr_restraint, and structure_factors directories, with the files divided according to a two-letter organization. Entries are grouped by the middle two characters of the PDB identifier. For example, entry file pdb1abc.ent can be found in pub/pdb/data/structures/divided/pdb/ab.|
|/pub/pdb/data/structures/models||Theoretical model files that are maintained separately from the main archive|
|/pub/pdb/data/structures/obsolete||Structures and associated data files no longer part of the archive|
|/pub/pdb/derived_data||Plain text files that list information derived from all PDB entries, such as all PDB sequences in FASTA format.|
|/pub/pdb/doc||Documentation, including file format descriptions and RCSB PDB Newsletters|
Other downloads offered by RCSB PDB
Some of the HTTP links above are also available in a short style (e.g. /download/4hhb.cif.gz). Each short-style link comes in two variants:
- view: The HTTP/HTTPS response headers to the client are set with: Content-Type: text/plain
- download: The HTTP/HTTPS response headers to the client are set with: Content-Type: application/octet-stream and Content-Transfer-Encoding: binary
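The two variants can be sketched as a small URL builder. The host https://files.rcsb.org is taken from the example URLs in the tables on this page; applying it to every short-style link is an assumption, and the helper names are hypothetical.

```python
import urllib.request

FILES_HOST = "https://files.rcsb.org"  # host as seen in this page's example URLs
ACTIONS = ("view", "download")


def short_url(filename: str, action: str = "download") -> str:
    """Build a short-style URL for the given file name and action."""
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {ACTIONS}, got {action!r}")
    return f"{FILES_HOST}/{action}/{filename}"


def content_type(url: str, timeout: float = 30.0):
    """Return the Content-Type header reported for a URL via a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

Calling `content_type(short_url("108d_mr.str", "view"))` is one way to check which header a given variant actually returns.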
PDB entry files
PDB entry files are available in several file formats (PDB, PDBx/mmCIF, XML, BinaryCIF), compressed or uncompressed, and with an option to download a file containing only "header" information (summary data, no coordinates).
Small molecule files
Small molecule files, including the ligands/chemical components maintained in the Chemical Component Dictionary and the Biologically Interesting Molecule Reference Dictionary (BIRD), are available in multiple formats.
Experimental data files and 3DEM maps
This table includes structure factors, NMR restraints, chemical shifts, electron density maps, and map coefficient files.
|File Format||Action||Storage Compression||Example URL|
|NMR Restraints v2||Download||Compressed||https://files.rcsb.org/download/108d_mr.str.gz|
|NMR Restraints v2||Download||Uncompressed||https://files.rcsb.org/download/108d_mr.str|
|NMR Restraints v2||View||Uncompressed||https://files.rcsb.org/view/108d_mr.str|
|Electron Density 2Fo-Fc Map (with fixed grid spacing at "high resolution/3") - DSN6 format||Download||Uncompressed||https://edmaps.rcsb.org/maps/6dil_2fofc.dsn6|
|Electron Density Fo-Fc Map (with variable grid spacing from "high resolution/3" to "high resolution/2") - DSN6 format||Download||Uncompressed||https://edmaps.rcsb.org/maps/6dil_fofc.dsn6|
|Electron Density Map Coefficients (MTZ format)||Download||Uncompressed||https://edmaps.rcsb.org/coefficients/6dil.mtz|
|Electron Density 2Fo-Fc & Fo-Fc Map (might be downsampled) - BinaryCIF format||Download||Uncompressed||https://maps.rcsb.org/x-ray/6dil/cell/|
Sequence data in FASTA format (full deposited sequence as in SEQRES records).
Please note that the FASTA download service at the URL /pdb/download/downloadFastaFiles.do?structureIdList=4hhb&compressionType=uncompressed has been discontinued. Users will need to migrate to the new endpoints below. Note that the output of the new endpoints is per entity (with chain identifiers provided in the header) instead of per chain.
|FASTA sequences per PDB entry||Download||Uncompressed||/fasta/entry/4HHB/download|
|FASTA sequence per polymer entity (identified by ||Download||Uncompressed||/fasta/entity/4HHB_1/download|
|FASTA sequence per polymer entity instance (chain) (identified by ||Download||Uncompressed||/fasta/chain/4HHB.A/download|
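A minimal sketch for working with these endpoints in Python follows. The table gives only relative paths, so the host https://www.rcsb.org is an assumption, and `fasta_url` and `parse_fasta` are hypothetical helper names. The parser handles generic FASTA text and makes no assumption about the layout of the header line.

```python
RCSB_HOST = "https://www.rcsb.org"  # assumption: the page lists relative paths only
FASTA_KINDS = ("entry", "entity", "chain")


def fasta_url(kind: str, identifier: str) -> str:
    """Build a FASTA endpoint URL, e.g. kind="entity", identifier="4HHB_1"."""
    if kind not in FASTA_KINDS:
        raise ValueError(f"kind must be one of {FASTA_KINDS}, got {kind!r}")
    return f"{RCSB_HOST}/fasta/{kind}/{identifier}/download"


def parse_fasta(text: str):
    """Split FASTA text into (header, sequence) pairs.

    Sequence lines that belong to one record are concatenated.
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```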
Sequence clusters data
Results of the weekly clustering of protein sequences in the PDB by MMseqs2 at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. Note that these files use polymer entity identifiers instead of chain identifiers to avoid redundancy. The files are plain text with one cluster per line, sorted from largest cluster to smallest.
|Sequence clusters at <identity> % sequence identity clustering||Download||Uncompressed||https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-<identity>.txt|
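Because the cluster files are plain text with one cluster per line, they can be read with a few lines of Python. The sketch below assumes that the entity identifiers on a line are separated by whitespace, which is not stated on this page; the helper names are hypothetical.

```python
IDENTITY_LEVELS = (30, 40, 50, 70, 90, 95, 100)
CLUSTER_BASE = "https://cdn.rcsb.org/resources/sequence/clusters"


def cluster_url(identity: int) -> str:
    """Build the cluster file URL for one of the listed identity levels."""
    if identity not in IDENTITY_LEVELS:
        raise ValueError(f"identity must be one of {IDENTITY_LEVELS}, got {identity!r}")
    return f"{CLUSTER_BASE}/clusters-by-entity-{identity}.txt"


def parse_clusters(text: str):
    """Return one list of entity identifiers per non-empty line.

    Assumes whitespace-separated identifiers on each line.
    """
    return [line.split() for line in text.splitlines() if line.strip()]


def cluster_index(clusters):
    """Map each entity identifier to the zero-based rank of its cluster."""
    return {
        entity_id: rank
        for rank, members in enumerate(clusters)
        for entity_id in members
    }
```

Since the files are sorted from largest cluster to smallest, rank 0 in `cluster_index` corresponds to the largest cluster.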
PDB ID holdings data in JSON format. For more information, see the data API documentation.
|All current PDB ids||Download||Uncompressed||https://data.rcsb.org/rest/v1/holdings/current/entry_ids|
|All unreleased PDB ids||Download||Uncompressed||https://data.rcsb.org/rest/v1/holdings/unreleased/entry_ids|
|All removed PDB ids (obsoleted entries or theoretical models)||Download||Uncompressed||https://data.rcsb.org/rest/v1/holdings/removed/entry_ids|
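The holdings endpoints can be consumed with the Python standard library, as in the sketch below. It assumes that each endpoint returns a JSON array of identifier strings; that response shape is an assumption and should be checked against the data API documentation. The helper names are hypothetical.

```python
import json
import urllib.request

HOLDINGS_BASE = "https://data.rcsb.org/rest/v1/holdings"
STATUSES = ("current", "unreleased", "removed")


def holdings_url(status: str) -> str:
    """Build the holdings URL for one of the three listed statuses."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}, got {status!r}")
    return f"{HOLDINGS_BASE}/{status}/entry_ids"


def parse_entry_ids(payload: str) -> set:
    """Parse a JSON array of identifiers into a set of upper-case strings.

    Raises ValueError if the payload is not a JSON array (assumed shape).
    """
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of PDB identifiers")
    return {str(item).upper() for item in data}


def fetch_entry_ids(status: str, timeout: float = 60.0) -> set:
    """Download and parse the identifier list for the given status."""
    with urllib.request.urlopen(holdings_url(status), timeout=timeout) as response:
        return parse_entry_ids(response.read().decode("utf-8"))
```

A set makes membership checks cheap, for instance `"4HHB" in fetch_entry_ids("current")`.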
Chemical Component Dictionary (CCD) Data
A subset of properties is provided for all components from the Chemical Component Dictionary (CCD) which describes chemical properties of all molecules in the PDB archive. The atom file (cca.bcif) provides the following CIF columns:
pdbx_stereo_config. The bond file (ccb.bcif) provides the following CIF columns:
This data can be used by the Mol* ModelServer.
|Chemical Component Atom Data||BinaryCIF||Download||https://models.rcsb.org/cca.bcif|
|Chemical Component Bond Data||BinaryCIF||Download||https://models.rcsb.org/ccb.bcif|