PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 30%

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures, from the beginning of the PDB archive. The chart can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (the number of polymeric entities) over the history of the archive. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the counts are calculated with a default precision threshold, so the statistics count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Sequence cluster level: 30% identity

Year    Number of New Protein Sequences    Total Number of Protein Sequences
1976    11                                 11
1977    11                                 22
1978    3                                  25
1979    1                                  26
1980    2                                  28
1981    7                                  35
1982    17                                 52
1983    5                                  57
1984    9                                  66
1985    7                                  73
1986    7                                  80
1987    7                                  87
1988    16                                 103
1989    27                                 130
1990    27                                 157
1991    35                                 192
1992    46                                 238
1993    131                                369
1994    281                                650
1995    227                                877
1996    261                                1,138
1997    389                                1,527
1998    454                                1,981
1999    610                                2,591
2000    725                                3,316
2001    724                                4,040
2002    771                                4,811
2003    1045                               5,856
2004    1483                               7,339
2005    1565                               8,904
2006    1801                               10,705
2007    1956                               12,661
2008    1875                               14,536
2009    1870                               16,406
2010    1872                               18,278
2011    1664                               19,942
2012    1760                               21,702
2013    1836                               23,538
2014    2184                               25,722
2015    1981                               27,703
2016    2120                               29,823
2017    2271                               32,094
2018    2288                               34,382
2019    2399                               36,781
2020    2875                               39,656
2021    2335                               41,991
2022    2575                               44,566
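
The two count columns in the table above are related by a running sum: each year's total equals the previous total plus that year's new sequences. The short Python sketch below illustrates this using the first six rows of the table (1976 to 1981). The names `new_per_year` and `cumulative_totals` are hypothetical and chosen for illustration only; this is not how the statistics page itself computes its numbers.

```python
from itertools import accumulate

# Yearly counts of new protein sequences, copied from the
# first six rows of the table (1976-1981).
new_per_year = {1976: 11, 1977: 11, 1978: 3, 1979: 1, 1980: 2, 1981: 7}


def cumulative_totals(yearly):
    """Return {year: running total} for a {year: new count} mapping."""
    years = sorted(yearly)
    running = accumulate(yearly[y] for y in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)
for year, total in totals.items():
    # Columns: year, new sequences that year, cumulative total
    print(year, new_per_year[year], total)
```

The totals produced this way for 1976 to 1981 (11, 22, 25, 26, 28, 35) match the cumulative column of the table.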