PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 50%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the total number of unique protein sequences (number of polymeric entities) released up to and including each year. The yearly bars (dark blue) show how many new protein sequences were added in that year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  12                               12
1977  11                               23
1978  3                                26
1979  2                                28
1980  2                                30
1981  7                                37
1982  18                               55
1983  5                                60
1984  11                               71
1985  8                                79
1986  8                                87
1987  9                                96
1988  19                               115
1989  33                               148
1990  30                               178
1991  37                               215
1992  51                               266
1993  154                              420
1994  327                              747
1995  264                              1,011
1996  298                              1,309
1997  457                              1,766
1998  538                              2,304
1999  717                              3,021
2000  819                              3,840
2001  849                              4,689
2002  912                              5,601
2003  1,292                            6,893
2004  1,825                            8,718
2005  2,068                            10,786
2006  2,323                            13,109
2007  2,505                            15,614
2008  2,338                            17,952
2009  2,386                            20,338
2010  2,392                            22,730
2011  2,135                            24,865
2012  2,273                            27,138
2013  2,370                            29,508
2014  2,767                            32,275
2015  2,457                            34,732
2016  2,680                            37,412
2017  2,810                            40,222
2018  2,789                            43,011
2019  3,008                            46,019
2020  3,603                            49,622
2021  2,863                            52,485
2022  3,211                            55,696
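
The total column appears to be the running sum of the yearly column. As a minimal sketch of that relationship, the Python below recomputes the totals for the first five years of the table. The function name `cumulative_totals` and the list `yearly` are illustrative and not part of any PDB tool or API.

```python
from itertools import accumulate

# (year, number of new protein sequences), copied from the first five table rows
yearly = [(1976, 12), (1977, 11), (1978, 3), (1979, 2), (1980, 2)]


def cumulative_totals(rows):
    """Return (year, new, total) tuples, where total is the running sum of new."""
    totals = accumulate(new for _, new in rows)
    return [(year, new, total) for (year, new), total in zip(rows, totals)]


for year, new, total in cumulative_totals(yearly):
    # Thousands separators match the formatting used in the totals column
    print(f"{year}  {new:>6,}  {total:>7,}")
```

Extending `yearly` with the remaining rows gives a quick consistency check of the totals column against the yearly counts.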