PDB Statistics: Number of Unique Protein Sequence Clusters within Released PDB Structures (Annual) at 30%, 70%, 95%, and 100% Sequence Identity

This chart shows the number of unique protein sequence clusters among the PDB structures released in each year. Clusters are defined by a sequence identity (SI) cutoff. A cutoff of 95% SI is used by default because, for practical purposes, proteins this similar differ by only a few residues and can be treated as the same sequence. The chart can also be redrawn with the stricter 100% SI criterion, or with lower cutoffs that identify unique protein families (70% SI) or unique protein folds (30% SI).
The annual cluster counts can be displayed for each SI level, either in separate charts or together in a single chart.
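
The page does not describe how the PDB builds its clusters. The sketch below only illustrates what "clustering at an SI cutoff" means. It uses a simplified greedy, representative-based scheme, similar in spirit to greedy incremental tools such as CD-HIT. It assumes the sequences are already aligned to equal length, that '-' marks a gap, and that identity is computed over gap-free columns. The function names and toy sequences are hypothetical.

```python
# Minimal sketch of greedy, representative-based clustering at an SI cutoff.
# Assumptions (hypothetical, not the PDB's pipeline): sequences are pre-aligned
# to equal length, '-' marks a gap, and identity is computed over gap-free columns.


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


def greedy_cluster(seqs: list[str], cutoff: float) -> list[list[str]]:
    """Put each sequence in the first cluster whose representative (first member)
    is at least `cutoff` percent identical; otherwise start a new cluster."""
    clusters: list[list[str]] = []
    for s in seqs:
        for cluster in clusters:
            if percent_identity(s, cluster[0]) >= cutoff:
                cluster.append(s)
                break
        else:  # no existing cluster was close enough
            clusters.append([s])
    return clusters


if __name__ == "__main__":
    # Toy, made-up sequences of length 10 (illustration only)
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIAKQR", "MSTAWIGKNR", "GLVEWCPHFD"]
    for cutoff in (100, 95, 70, 30):
        print(f"{cutoff}% SI: {len(greedy_cluster(seqs, cutoff))} clusters")
```

With these toy sequences, the 100% and 95% cutoffs each give four clusters, 70% gives three, and 30% gives two. The count shrinks as the cutoff is lowered, much as the counts in the table below do.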

Unique protein sequence clusters by release year and SI cutoff:

Year   30% SI   70% SI   95% SI  100% SI
1976       11       13       13       13
1977       14       15       15       15
1978        3        3        3        3
1979        4        5        7        7
1980        8        8        8        8
1981       10       10       11       11
1982       21       23       23       23
1983       14       16       19       19
1984       13       14       14       15
1985       16       18       18       18
1986       14       15       15       15
1987       17       17       17       17
1988       28       34       35       50
1989       46       53       58       58
1990       63       73       75      103
1991       75       83       87      137
1992       86       94       99      125
1993      243      279      326      481
1994      471      574      658      896
1995      433      505      556      701
1996      505      585      635      779
1997      710      843      910     1130
1998      895     1071     1205     1557
1999     1184     1400     1545     1899
2000     1332     1545     1677     2049
2001     1448     1673     1806     2192
2002     1609     1837     1977     2330
2003     2132     2489     2666     3143
2004     2710     3155     3369     3991
2005     2888     3485     3704     4247
2006     3317     3944     4195     4930
2007     3639     4354     4685     5559
2008     3648     4267     4591     5384
2009     3737     4421     4776     5618
2010     3826     4512     4883     5836
2011     3740     4399     4787     5790
2012     3905     4655     5118     6284
2013     4082     4954     5548     6871
2014     4735     5931     6753     8141
2015     4532     5522     6340     7659
2016     4965     6054     6926     8489
2017     5308     6578     7683     9122
2018     5465     6633     7663     9305
2019     5853     7187     8433    10239
2020     6639     8169     9759    11728
2021     4780     5713     7055     8492
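
The table can also be analysed in code. The minimal Python sketch below assumes each row is whitespace-separated text: the year followed by the counts at 30%, 70%, 95%, and 100% SI. It parses a few rows copied from the table and checks that the counts never decrease as the SI cutoff rises, which catches misparsed rows. It also computes a "redundancy ratio", defined here as 100%-SI clusters per 30%-SI cluster. The names `ROWS`, `parse_rows`, `is_monotone`, and `redundancy_ratio` are hypothetical, and the ratio is an illustrative metric, not an RCSB statistic.

```python
# Minimal sketch: parse rows of the annual cluster table and derive simple statistics.
# Assumption: each row is whitespace-separated: year, then counts at 30/70/95/100% SI.

ROWS = """\
2018 5465 6633 7663 9305
2019 5853 7187 8433 10239
2020 6639 8169 9759 11728
2021 4780 5713 7055 8492
"""

LEVELS = (30, 70, 95, 100)  # SI cutoffs, in column order


def parse_rows(text: str) -> dict[int, dict[int, int]]:
    """Map each year to {SI cutoff: cluster count}."""
    table: dict[int, dict[int, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        year, *counts = (int(tok) for tok in line.split())
        if len(counts) != len(LEVELS):
            raise ValueError(f"expected {len(LEVELS)} counts in row: {line!r}")
        table[year] = dict(zip(LEVELS, counts))
    return table


def is_monotone(counts: dict[int, int]) -> bool:
    """True if cluster counts never decrease as the SI cutoff rises."""
    values = [counts[level] for level in LEVELS]
    return all(a <= b for a, b in zip(values, values[1:]))


def redundancy_ratio(counts: dict[int, int]) -> float:
    """Hypothetical metric: number of 100%-SI clusters per 30%-SI cluster."""
    return counts[100] / counts[30]


if __name__ == "__main__":
    table = parse_rows(ROWS)
    for year, counts in sorted(table.items()):
        print(year, is_monotone(counts), round(redundancy_ratio(counts), 2))
```

For these four rows, every year passes the monotonicity check, and the ratio falls between about 1.70 (2018) and 1.78 (2021).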