PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures since the beginning of the PDB archive. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in a given year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the numbers are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 30% sequence identity

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  11                               11
1977  11                               22
1978  3                                25
1979  1                                26
1980  2                                28
1981  6                                34
1982  16                               50
1983  4                                54
1984  9                                63
1985  5                                68
1986  7                                75
1987  7                                82
1988  15                               97
1989  23                               120
1990  26                               146
1991  32                               178
1992  41                               219
1993  124                              343
1994  253                              596
1995  205                              801
1996  237                              1,038
1997  336                              1,374
1998  392                              1,766
1999  530                              2,296
2000  631                              2,927
2001  624                              3,551
2002  653                              4,204
2003  861                              5,065
2004  1,267                            6,332
2005  1,278                            7,610
2006  1,444                            9,054
2007  1,579                            10,633
2008  1,513                            12,146
2009  1,512                            13,658
2010  1,464                            15,122
2011  1,261                            16,383
2012  1,387                            17,770
2013  1,426                            19,196
2014  1,700                            20,896
2015  1,513                            22,409
2016  1,646                            24,055
2017  1,768                            25,823
2018  1,786                            27,609
2019  1,858                            29,467
2020  2,264                            31,731
2021  1,839                            33,570
2022  2,296                            35,866
2023  2,089                            37,955
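
In the table above, each total is the running sum of the yearly counts up to and including that year. The following is a minimal Python sketch of that relationship, using only the first five rows (1976 to 1980) as sample input. The names `yearly_new` and `cumulative_totals` are illustrative assumptions and are not part of any RCSB PDB tool or API.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the first five
# table rows (1976-1980). The variable name is hypothetical.
yearly_new = {1976: 11, 1977: 11, 1978: 3, 1979: 1, 1980: 2}


def cumulative_totals(new_by_year):
    """Return {year: running total}, processing years in ascending order."""
    years = sorted(new_by_year)
    totals = accumulate(new_by_year[year] for year in years)
    return dict(zip(years, totals))


totals = cumulative_totals(yearly_new)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, yearly_new[year], total)
```

Extending `yearly_new` with the remaining rows should reproduce the rest of the total column in the same way.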