News

Better Management of PDB Archive with File Versioning and Revision History

08/01 wwPDB News

As announced on May 17, 2017, wwPDB will introduce a file versioning system to retain depositor-initiated updates of previously released coordinate entries. A new FTP repository will host the versioned files. Updates will be classified as major or minor. Changes to the atomic coordinates, polymer sequence, or chemical description in a PDB coordinate file will trigger a major version increment. All other changes will be classified as minor. The latest file of every major version of each PDB structure will be retained in the new FTP archive.
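
The major/minor rule described above can be sketched in a few lines of Python. This is only an illustration: the category names in `MAJOR_CHANGES`, the `Version` type, and the `next_version` function are hypothetical and are not part of any wwPDB software. The reset of the minor number to zero on a major update follows the naming rules given later in this announcement.

```python
from typing import NamedTuple

# Hypothetical labels for the three kinds of change that the announcement
# says trigger a major version increment.
MAJOR_CHANGES = {"atomic_coordinates", "polymer_sequence", "chemical_description"}


class Version(NamedTuple):
    major: int
    minor: int


def next_version(current: Version, changed: set[str]) -> Version:
    """Return the version that follows an update touching `changed` categories."""
    if changed & MAJOR_CHANGES:
        # Major update: bump the major number and reset the minor number.
        return Version(current.major + 1, 0)
    # Anything else is treated as a minor update.
    return Version(current.major, current.minor + 1)
```

For instance, `next_version(Version(1, 2), {"polymer_sequence"})` gives `Version(2, 0)`, while an update that touches only some other category gives `Version(1, 3)`.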

wwPDB will deliver versioned files in two phases:

  • Phase 1 (October 2017): we will release the new versioned FTP archive at ftp://ftp-version.wwpdb.org for structural model files in PDBx/mmCIF and PDBML formats.
  • Phase 2 (2018): we will add support for depositor-initiated updates of coordinates in PDBx/mmCIF and PDBML formats.

File names in the versioned FTP archive will conform to a new naming scheme that lets users see the major and minor version numbers at a glance:

<PDB_ID>_<content_type>_v<major_version>-<minor_version>.<file_format_type>.<file_compression_type>

The familiar 4-character PDB accession code will be extended to 8 characters and will include the prefix “pdb”. Thus the PDB accession code for entry 1abc would become pdb_00001abc.
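
As a rough sketch of this conversion, the helper below left-pads a 4-character code with zeros to 8 characters and adds the prefix. The function name `extended_accession` is hypothetical, and lowercasing the input is an assumption based on the lowercase examples in this announcement.

```python
def extended_accession(pdb_id: str) -> str:
    """Convert a classic 4-character PDB code to the extended form."""
    code = pdb_id.strip().lower()  # assumption: extended codes are lowercase
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a 4-character PDB code, got {pdb_id!r}")
    # Left-pad with zeros to 8 characters, then add the "pdb_" prefix.
    return "pdb_" + code.rjust(8, "0")
```

With this sketch, `extended_accession("1abc")` and `extended_accession("1ABC")` both return `pdb_00001abc`.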

For example, the initial release of PDB entry 1abc would have the following name under the new file-naming scheme: pdb_00001abc_xyz_v1-0.cif.gz

where xyz stands for coordinate content; cif indicates the file format; and gz indicates gzip compression.

The first minor revision of PDB entry 1abc would then have the following name:

pdb_00001abc_xyz_v1-1.cif.gz

If PDB entry 1abc then had a major update, it would have the following name: pdb_00001abc_xyz_v2-0.cif.gz (N.B.: The minor update number will be reset to zero every time a new major update is made.)
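
For readers who want to handle these names programmatically, the following is a minimal sketch of parsing and formatting them. The `VersionedName` type, the two functions, and the regular expression are all hypothetical. In particular, the character classes in the pattern are an assumption drawn from the lowercase examples above, not a published specification.

```python
import re
from typing import NamedTuple


class VersionedName(NamedTuple):
    accession: str      # e.g. "pdb_00001abc"
    content_type: str   # e.g. "xyz"
    major: int
    minor: int
    file_format: str    # e.g. "cif"
    compression: str    # e.g. "gz"


# Assumed pattern: lowercase letters and digits in every field.
_NAME_RE = re.compile(
    r"(?P<accession>pdb_[0-9a-z]{8})_(?P<content_type>[0-9a-z]+)"
    r"_v(?P<major>\d+)-(?P<minor>\d+)"
    r"\.(?P<file_format>[0-9a-z]+)\.(?P<compression>[0-9a-z]+)"
)


def parse_versioned_name(file_name: str) -> VersionedName:
    """Split a versioned file name into its components."""
    match = _NAME_RE.fullmatch(file_name)
    if match is None:
        raise ValueError(f"not a versioned file name: {file_name!r}")
    fields = match.groupdict()
    return VersionedName(
        fields["accession"], fields["content_type"],
        int(fields["major"]), int(fields["minor"]),
        fields["file_format"], fields["compression"],
    )


def format_versioned_name(name: VersionedName) -> str:
    """Rebuild the file name from its components."""
    return (f"{name.accession}_{name.content_type}"
            f"_v{name.major}-{name.minor}.{name.file_format}.{name.compression}")
```

Because the version numbers are parsed as integers, picking the newest of several parsed names is a matter of `max(names, key=lambda n: (n.major, n.minor))`, which ranks v2-0 above v1-1 as the scheme intends.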

The versioned data files for a particular entry will be stored in a single directory, placed under a two-character hash taken from the two characters that precede the final character of the PDB code:

../pub/pdb_versioned/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>

For example, the file for major version 1, minor version 2 of entry 1ABC would have the following path:

../pub/pdb_versioned/data/entries/ab/pdb_00001abc/pdb_00001abc_xyz_v1-2.cif.gz
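
The directory layout can be reproduced with a short helper. This is a sketch only: the function name `entry_path` is hypothetical, and the default root simply copies the relative path shown above, including its leading `..`, without assuming what precedes it.

```python
import posixpath


def entry_path(accession: str, file_name: str,
               root: str = "../pub/pdb_versioned/data/entries") -> str:
    """Build the archive path for one versioned file of an entry."""
    accession = accession.lower()
    # The hash is the two characters before the last one,
    # e.g. "ab" for "pdb_00001abc".
    two_letter_hash = accession[-3:-1]
    return posixpath.join(root, two_letter_hash, accession, file_name)
```

Calling `entry_path("pdb_00001abc", "pdb_00001abc_xyz_v1-2.cif.gz")` reproduces the example path above.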

Different views of the repository will be provided for the most common use cases as a convenience for repository users. For Phase 1 in 2017, views by content type and format, similar to the layout of the current repository, will be introduced. These views include the latest file of every major version.

../pub/pdb_versioned/views/<content_type>/<file_format_type>/<two-letter-hash>/<pdb_accession_code>/<entry_data_File_names>

For example, the coordinate files in mmCIF format for entry 1ABC will be made available at:

../pub/pdb_versioned/views/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v1-2.cif.gz

../pub/pdb_versioned/views/coordinates/mmcif/ab/pdb_00001abc/pdb_00001abc_xyz_v2-0.cif.gz
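
The view layout differs from the entry layout only in the two extra directory levels for content type and format. A minimal sketch follows, with a hypothetical function name `view_path`; the root again copies the relative path from the example.

```python
import posixpath


def view_path(content_type: str, file_format: str, accession: str,
              file_name: str, root: str = "../pub/pdb_versioned/views") -> str:
    """Build the path of a versioned file inside a content-type/format view."""
    accession = accession.lower()
    two_letter_hash = accession[-3:-1]  # e.g. "ab" for "pdb_00001abc"
    return posixpath.join(root, content_type, file_format,
                          two_letter_hash, accession, file_name)
```

For example, `view_path("coordinates", "mmcif", "pdb_00001abc", "pdb_00001abc_xyz_v2-0.cif.gz")` yields the second path listed above.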

Data files in the current archive location ftp://ftp.wwpdb.org/pub/pdb/data/structures/ will continue to use the familiar naming style and will contain only the latest version for every entry.
