
Superfast structural search added using Foldseek!

We just added Foldseek search buttons to our Herpesfolds database.

Just click the button labeled Foldseek: a new page opens, and Foldseek searches structural databases such as the PDB and the AlphaFold Protein Structure Database for similar structures!

A big thanks to Milot Mirdita (https://twitter.com/milot_mirdita) as well as the Söding lab and the Steinegger lab for help in implementing it!

Alphafold predictions of Herpesvirus genes

This table gives interactive access to all Alphafold structures of HSV-1 strain 17+ (Uniprot proteome UP000009294), HCMV strain Merlin (Uniprot proteome UP000000938) and KSHV strain GK18 (Uniprot proteome UP000000942), visualized with 3Dmol.js.

Please note that the largest tegument protein for each virus (HSV-1 UL36, HCMV UL48, and KSHV ORF64) was not run due to GPU memory restrictions.

For help with evaluating the predictions, please see this quick tutorial.

 

HSV-1 UL37 model

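| Protein category | HSV-1 Uniprot ID | HSV-1 gene name | HSV-1 protein name | HCMV Uniprot ID | HCMV gene name | HCMV protein name | KSHV Uniprot ID | KSHV gene name | KSHV protein name |
|---|---|---|---|---|---|---|---|---|---|
| AN | P04294 | UL12 | | F5HF49 | UL98 | | Q2HR95 | ORF37 | |
| CEP1 | P10191 | UL7 | | F5HA10 | UL103 | | F5HAI6 | ORF42 | |
| CEP2 | P10200 | UL16 | | F5HAC7 | UL94 | | F5HEF2 | ORF33 | |
| CEP3 | P04289 | UL11 | | F5HI87 | UL99 | | F5HHY1 | ORF38 | |
| CVC1 | P10201 | UL17 | | Q6SW50 | UL93 | | F5HB39 | ORF32 | |
| CVC2 | P10209 | UL25 | | Q6SW65 | UL77 | | Q2HRB3 | ORF19 | |
| DNBI | P04296 | UL29 | ICP8 | F5HDQ6 | UL57 | DBP | Q2HRD3 | ORF6 | DBP |
| DPOL | P04293 | UL30 | | Q6SW77 | UL54 | | Q2HRD0 | ORF9 | |
| DUT | P10234 | UL50 | | Q6SW70 | UL72 | | Q2HR78 | ORF54 | |
| EV45 | P10229 | UL45 | | | | | | | |
| gB | P10211 | UL27 | gB | F5HB53 | UL55 | gB | F5HB81 | ORF8 | gB |
| gC | P10228 | UL44 | gC | | | | | | |
| gD | Q69091 | US6 | gD | | | | | | |
| gE | P04488 | US8 | gE | | | | | | |
| gG | P06484 | US4 | gG | | | | | | |
| gH | P06477 | UL22 | gH | Q6SW67 | UL75 | gH | F5HAK9 | ORF22 | gH |
| gI | P06487 | US7 | gI | | | | | | |
| gJ | P06480 | US5 | gJ | | | | | | |
| gK | P68331 | UL53 | gK | | | | | | |
| gL | P10185 | UL1 | gL | F5HCH8 | UL115 | gL | F5HDB7 | ORF47 | gL |
| gM | P04288 | UL10 | gM | Q6SW43 | UL100 | gM | F5HDD0 | ORF39 | gM |
| gN | O09800 | UL49.5 | gN | F5HHQ0 | UL73 | gN | F5HFQ0 | ORF53 | gN |
| gO | | | | F5HGP1 | UL74 | | | | |
| HELI | P10189 | UL5 | | F5HEN8 | UL105 | | Q2HR89 | ORF44 | |
| HEPA | P10192 | UL8 | | F5HIG1 | UL102 | | Q2HR92 | ORF40 | |
| ITP | P10221 | UL37 | | Q6SW85 | UL47 | | F5HEU7 | ORF63 | |
| KITH | P0DTH5 | UL23 | TK | | | | F5HB62 | ORF21 | TK |
| LTP | P10220 | UL36 | | Q6SW84 | UL48 | | Q2HR64 | ORF64 | |
| MB43 | P10227 | UL43 | | | | | | | |
| MCP | P06491 | UL19 | | F5HGT1 | UL86 | | Q2HRA7 | ORF25 | |
| NEC1 | P10215 | UL31 | | F5HFZ4 | UL53 | | F5H982 | ORF69 | |
| NEC2 | P10218 | UL34 | | Q6SW81 | UL50 | | F5HA27 | ORF67 | |
| NP03 | P10187 | UL3 | | | | | | | |
| NP04 | P10188 | UL4 | | | | | | | |
| OBP | P10193 | UL9 | | | | | | | |
| PAP | P10226 | UL42 | | F5HC97 | UL44 | | F5HID2 | ORF59 | |
| PORTL | P10190 | UL6 | | F5HBR4 | UL104 | | F5HGK9 | ORF43 | |
| PRIM | P10236 | UL52 | | F5HG51 | UL70 | | F5HIN0 | ORF56 | |
| RIR1 | P08543 | UL39 | ICP6 | Q6SW87 | UL45 | | Q2HR67 | ORF61 | |
| RIR2 | P10224 | UL40 | | | | | F5HAW0 | ORF60 | |
| RNB | P04487 | US11 | | | | | | | |
| SCAF | P10210 | UL26 | | Q6SW62 | UL80 | | Q2HRB6 | ORF17 | |
| SCP | P10219 | UL35 | | F5HEN7 | UL48A | | Q2HR63 | ORF65 | |
| SHUT | P10225 | UL41 | | | | | | | |
| TEG1 | P10230 | UL46 | | | | | | | |
| TEG3 | P04291 | UL14 | | | | | | | |
| TEG4 | P10205 | UL21 | | | | | | | |
| TEG5 | P10231 | UL47 | | | | | | | |
| TEG6 | P10239 | UL55 | | | | | | | |
| TEG7 | P10235 | UL51 | | F5HEA3 | UL71 | | F5H9W9 | ORF55 | |
| TRM1 | P10212 | UL28 | | F5HC79 | UL56 | | F5H9W4 | ORF7 | |
| TRM2 | P10217 | UL33 | | F5HGI9 | UL51 | | F5HGB6 | ORF67.5 | |
| TRM3 | P04295 | UL15 | | F5HCU8 | UL89 | | F5HGB6 | ORF29 | |
| TRX1 | P32888 | UL38 | | F5HA93 | UL46 | | F5H8Y5 | ORF62 | |
| TRX2 | P10202 | UL18 | | F5HIN9 | UL85 | | F5HGN8 | ORF26 | |
| UNG | P10186 | UL2 | | F5HI85 | UL114 | | F5HFA1 | ORF46 | |
| | | | | | | | Q2HRD5 | K1 | |
| | | | | | | | Q77Q38 | K12 | |
| | | | | | | | P0C788 | K14 | OX2V |
| | | | | | | | Q9QR69 | K15 | |
| | | | | | | | Q2HRC7 | K2 | VIL6 |
| | | | | | | | P90495 | K3 | MIR1 |
| | | | | | | | Q98157 | K4.1 | VMI2 |
| | | | | | | | F5HCJ2 | K4.2 | VCCL3 |
| | | | | | | | F5HF36 | K4 | |
| | | | | | | | P90489 | K5 | MIR2 |
| | | | | | | | F5HET8 | K6 | |
| | | | | | | | F5HDA4 | K7 | |
| | | | | | | | Q2HR82 | K8 | KBZIP |
| | | | | | | | F5HB98 | K8.1 | |
| | | | | | | | Q2HRC9 | ORF10 | |
| | | | | | | | Q2HRC8 | ORF11 | |
| | | | | | | | F5HGJ3 | ORF16 | vBCL2 |
| | | | | | | | Q2HRB4 | ORF18 | |
| | | | | | | | Q2HRC6 | ORF2 | DYR |
| | | | | | | | Q2HRB2 | ORF20 | |
| | | | | | | | F5HIM6 | ORF23 | |
| | | | | | | | F5HFD2 | ORF24 | |
| | | | | | | | F5HDY6 | ORF27 | |
| | | | | | | | F5HI25 | ORF28 | |
| | | | | | | | F5HES7 | ORF30 | |
| | | | | | | | Q2HRA1 | ORF31 | |
| | | | | | | | Q2HR98 | ORF34 | |
| | | | | | | | F5HCD4 | ORF35 | |
| | | | | | | | F5HGH5 | ORF36 | vPK |
| | | | | | | | Q2HRD4 | ORF4 | |
| | | | | | | | F5HDE4 | ORF45 | |
| | | | | | | | Q2HR85 | ORF48 | |
| | | | | | | | Q2HR83 | ORF49 | |
| | | | | | | | F5HCV3 | ORF50 | |
| | | | | | | | Q2HR80 | ORF52 | |
| | | | | | | | Q2HR75 | ORF57 | ICP27 |
| | | | | | | | F5HAD1 | ORF58 | |
| | | | | | | | F5HG20 | ORF66 | |
| | | | | | | | F5HF47 | ORF69 | |
| | | | | | | | P90463 | ORF70 | TYSY |
| | | | | | | | F5HEZ4 | ORF71 | VFLIP |
| | | | | | | | Q77Q36 | ORF72 | VCYCL |
| | | | | | | | Q9QR71 | ORF73 | LANA1 |
| | | | | | | | Q98146 | ORF74 | VGPCR |
| | | | | | | | Q9QR70 | ORF75 | |
| | | | | | | | F5HF68 | vIRF-1 | |
| | | | | | | | Q2HR71 | vIRF-3 | |
| | | | | | | | F5HIC6 | vIRF-2 | |
| | | | | | | | Q2HR73 | vIRF-4 | |
| | | | | Q6SW04 | IRS1 | | | | |
| | | | | Q6SWD5 | RL1 | | | | |
| | | | | F5HI32 | RL10 | | | | |
| | | | | Q6SWD1 | RL11 | | | | |
| | | | | Q6SWD0 | RL12 | | | | |
| | | | | Q6SWC9 | RL13 | | | | |
| | | | | F5HF23 | RL5A | | | | |
| | | | | Q6SWD3 | RL6 | | | | |
| | | | | F7V995 | RL8A | | | | |
| | | | | F7V996 | RL9A | | | | |
| | | | | Q6SVX2 | TRS1 | | | | |
| | | | | Q6SWC8 | UL1 | | | | |
| | | | | Q6SWC0 | UL10 | | | | |
| | | | | Q6SWB9 | UL11 | UL11P | | | |
| | | | | F5HC71 | UL111A | IL10H | | | |
| | | | | Q6SW37 | UL112-113 | EP84 | | | |
| | | | | Q6SW34 | UL116 | | | | |
| | | | | F5HFA5 | UL117 | | | | |
| | | | | F5HC14 | UL119 | | | | |
| | | | | Q6SW31 | UL120 | | | | |
| | | | | F5HD27 | UL121 | | | | |
| | | | | Q6SW29 | UL122 | VIE2 | | | |
| | | | | F5HCM1 | UL123 | VIE1 | | | |
| | | | | F5HHS3 | UL124 | | | | |
| | | | | Q6SWB8 | UL13 | | | | |
| | | | | F5HCP3 | UL130 | | | | |
| | | | | F5HET4 | UL131A | | | | |
| | | | | F5HGU6 | UL132 | | | | |
| | | | | Q6SW10 | UL133 | | | | |
| | | | | F5HAQ7 | UL135 | | | | |
| | | | | F5HF35 | UL136 | | | | |
| | | | | F5HGQ8 | UL138 | | | | |
| | | | | Q6SW14 | UL139 | | | | |
| | | | | Q6SWB7 | UL14 | | | | |
| | | | | F5HCK7 | UL140 | | | | |
| | | | | Q6RJQ3 | UL141 | | | | |
| | | | | F5HHH2 | UL142 | | | | |
| | | | | F5HAM0 | UL144 | | | | |
| | | | | F5HF44 | UL145 | | | | |
| | | | | F5HBX1 | UL146 | CXCL1 | | | |
| | | | | F5HA06 | UL147 | | | | |
| | | | | F5H8R0 | UL147A | | | | |
| | | | | F5H8Q3 | UL148 | | | | |
| | | | | F5HE74 | UL148A | | | | |
| | | | | F5HAK6 | UL148B | | | | |
| | | | | F5HDE7 | UL148C | | | | |
| | | | | F5HHL7 | UL148D | | | | |
| | | | | Q6SW05 | UL150 | | | | |
| | | | | F7V998 | UL150A | | | | |
| | | | | F5HAE6 | UL15A | | | | |
| | | | | F5HG68 | UL16 | UL16P | | | |
| | | | | F5HHT4 | UL17 | | | | |
| | | | | F5HFB4 | UL18 | | | | |
| | | | | F5HI68 | UL19 | | | | |
| | | | | Q6SWC7 | UL2 | | | | |
| | | | | F5H9Z4 | UL20 | | | | |
| | | | | F5HH39 | UL21A | | | | |
| | | | | F5HF90 | UL22A | | | | |
| | | | | F5HDM3 | UL23 | | | | |
| | | | | F5H9N4 | UL24 | VP22 | | | |
| | | | | F5HGJ4 | UL25 | PP85 | | | |
| | | | | F5HGG3 | UL26 | | | | |
| | | | | Q6SWA4 | UL27 | | | | |
| | | | | Q6SWA3 | UL29 | | | | |
| | | | | F5HGC2 | UL30 | | | | |
| | | | | U3KRG9 | UL30A | | | | |
| | | | | Q6SWA0 | UL31 | | | | |
| | | | | Q6SW99 | UL32 | PP150 | | | |
| | | | | Q6SW98 | UL33 | | | | |
| | | | | F5HC16 | UL34 | | | | |
| | | | | F5HE12 | UL35 | | | | |
| | | | | F5HAY6 | UL36 | VICA | | | |
| | | | | Q6SW94 | UL37 | VGLI | | | |
| | | | | F5HG98 | UL38 | | | | |
| | | | | Q6SWC6 | UL4 | | | | |
| | | | | Q6SW92 | UL40 | | | | |
| | | | | F5HFG3 | UL41A | | | | |
| | | | | F5HHZ3 | UL42 | | | | |
| | | | | Q6SW89 | UL43 | | | | |
| | | | | Q6SW82 | UL49 | | | | |
| | | | | Q6SWC5 | UL5 | | | | |
| | | | | Q6SW79 | UL52 | | | | |
| | | | | Q6SWC4 | UL6 | | | | |
| | | | | Q6SW73 | UL69 | ICP27 | | | |
| | | | | Q6SWC3 | UL7 | | | | |
| | | | | C1BEG3 | UL74A | | | | |
| | | | | Q6SW66 | UL76 | | | | |
| | | | | F5HET1 | UL78 | | | | |
| | | | | Q6SW63 | UL79 | | | | |
| | | | | A0A1P7U1 | UL8 | | | | |
| | | | | F5HBC6 | UL82 | PP71 | | | |
| | | | | Q6SW59 | UL83 | PP65 | | | |
| | | | | F5HB40 | UL84 | | | | |
| | | | | Q6SW55 | UL87 | | | | |
| | | | | F5H9F9 | UL88 | | | | |
| | | | | F5H9T4 | UL9 | | | | |
| | | | | F5HFJ8 | UL91 | | | | |
| | | | | F5HAS7 | UL92 | | | | |
| | | | | Q6SW48 | UL95 | | | | |
| | | | | F5H8R6 | UL96 | | | | |
| | | | | Q6SW46 | UL97 | | | | |
| | | | | Q6SW03 | US1 | | | | |
| | | | | F5HFJ7 | US10 | | | | |
| | | | | Q6SVZ5 | US11 | | | | |
| | | | | F5HE44 | US12 | | | | |
| | | | | F5H9I4 | US13 | | | | |
| | | | | F5HD92 | US14 | | | | |
| | | | | F5HFH0 | US15 | | | | |
| | | | | Q6SVZ0 | US16 | | | | |
| | | | | F5H9N9 | US17 | | | | |
| | | | | F5HE69 | US18 | | | | |
| | | | | F5HAR3 | US19 | | | | |
| | | | | F5HE05 | US2 | | | | |
| | | | | F5HGH8 | US20 | | | | |
| | | | | F5HHT6 | US21 | | | | |
| | | | | F5HDC7 | US22 | | | | |
| | | | | F5HAZ3 | US23 | | | | |
| | | | | F5H8S6 | US24 | | | | |
| | | | | F5H991 | US26 | | | | |
| | | | | F5HDK1 | US27 | | | | |
| | | | | F5HF62 | US28 | | | | |
| | | | | F5HG95 | US29 | | | | |
| | | | | F5HEU0 | US3 | | | | |
| | | | | F5HB41 | US30 | | | | |
| | | | | F5HAM4 | US31 | | | | |
| | | | | F5HD03 | US32 | | | | |
| | | | | F7V999 | US33A | | | | |
| | | | | F5HEF3 | US34 | | | | |
| | | | | Q6SVX3 | US34A | | | | |
| | | | | Q6SW00 | US6 | | | | |
| | | | | F5HDD3 | US7 | | | | |
| | | | | F5HB52 | US8 | | | | |
| | | | | F5HC33 | US9 | | | | |
| | P36313 | RL1 | ICP34.5 | | | | | | |
| | P08393 | RL2 | ICP0 | | | | | | |
| | P08392 | RS1 | ICP4 | | | | | | |
| | P04290 | UL13 | | | | | | | |
| | P10204 | UL20 | | | | | | | |
| | P10208 | UL24 | | | | | | | |
| | P10216 | UL32 | | | | | | | |
| | P06492 | UL48 | VP16 | | | | | | |
| | P10233 | UL49 | VP22 | | | | | | |
| | P10238 | UL54 | ICP27 | | | | | | |
| | P10240 | UL56 | | | | | | | |
| | P04413 | US3 | | | | | | | |
| | P04485 | US1 | ICP22 | | | | | | |
| | P06486 | US10 | | | | | | | |
| | P03170 | US12 | ICP47 | | | | | | |
| | P06485 | US2 | | | | | | | |
| | O09802 | US8.5 | | | | | | | |
| | P06481 | US9 | | | | | | | |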

This is a project in cooperation with the Topf lab at CSSB.

Predictions were run with ColabFold, developed by the Steinegger lab:

  • Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425

Alphafold2 itself was developed by DeepMind.

Alphafold predictions of HCMV strain Merlin

Alphafold2 prediction of HCMV VP22, colored by pLDDT (red = better)

This file contains Alphafold2 predictions, run with ColabFold, of all HCMV strain Merlin proteins in Uniprot except UL48, which is too large to run on a 16 GB GPU.

We thank the Steinegger lab:

  • Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425

We also thank DeepMind for Alphafold2.

Alphafold KSHV strain GK18 predictions

ORF75 prediction colored by the pLDDT confidence value (red=better)

In cooperation with the Topf lab at CSSB, we are currently running Alphafold2 on representative herpesvirus proteomes.

This file contains Alphafold2 predictions, run with ColabFold, of all KSHV strain GK18 proteins in Uniprot except ORF64, which is too large to run on a 16 GB GPU.

We thank the Steinegger lab:

  • Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425

We also thank DeepMind for Alphafold2.

Evaluating Alphafold predictions

Alphafold (AF) reports confidence scores for each prediction. A great FAQ can be found at https://alphafold.ebi.ac.uk/faq.

Another great resource is this YouTube video by EMBL-EBI:

Also, have a look at this lecture by John Jumper, the research lead of AlphaFold:

Here are just some examples from our HSV-1 predictions:

The first thing to check is a plot of the multiple sequence alignment (MSA) that the network uses as input.

HSV-1 UL55 MSA

The MSA of HSV-1 UL55 above shows reasonable coverage by both closely and more distantly related sequences, and the C-terminus is well covered even by the more distant ones.

HSV-1 UL56 MSA

Compare this with the MSA of UL56: it contains far fewer sequences, and few of them are distantly related.
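The MSA plots shown here are ColabFold output. As a rough sketch of what such a plot summarises (the idea, not ColabFold's own code), the Python below computes the two quantities involved: how many sequences cover each query position, and how similar each sequence is to the query. It assumes the alignment is already a list of equal-length strings with the query first, `-` for gaps, and A3M lowercase insertions removed; the function name is hypothetical.

```python
from typing import List, Tuple


def msa_coverage_and_identity(msa: List[str]) -> Tuple[List[int], List[float]]:
    """Per-column coverage and per-sequence identity to the query.

    Assumes msa[0] is the query and every row is already aligned to the
    query's length, with '-' for gaps (A3M lowercase insertions removed).
    """
    query = msa[0]
    length = len(query)
    if any(len(seq) != length for seq in msa):
        raise ValueError("all rows must be aligned to the query length")

    # Coverage: how many sequences have a residue (not a gap) in each column.
    coverage = [sum(seq[i] != "-" for seq in msa) for i in range(length)]

    # Identity: identical residues divided by the query length,
    # so gaps count as mismatches.
    identity = [
        sum(q == s and s != "-" for q, s in zip(query, seq)) / length
        for seq in msa
    ]
    return coverage, identity


if __name__ == "__main__":
    toy_msa = [
        "MKVLA",  # query
        "MRVL-",  # close homolog, missing the last column
        "--VL-",  # distant fragment covering only the middle
    ]
    cov, ident = msa_coverage_and_identity(toy_msa)
    print(cov)    # [2, 2, 3, 3, 1]
    print(ident)  # [1.0, 0.6, 0.4]
```

A well-populated alignment like the one for UL55 gives high coverage along the whole query, including from low-identity sequences. A sparse one like UL56's gives low coverage, mostly from close homologs.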

Now, let’s look at the resulting structure predictions:

UL55 prediction
UL56 prediction

In both cases, the predictions are colored by pLDDT, a per-residue measure of how confident Alphafold is in its prediction.

Here is an excerpt from the EBI FAQ:

AlphaFold produces a per-residue estimate of its confidence on a scale from 0 – 100. This confidence measure is called pLDDT and corresponds to the model’s predicted score on the lDDT-Cα metric. It is stored in the B-factor fields of the mmCIF and PDB files available for download (although unlike a B-factor, higher pLDDT is better). pLDDT is also used to colour-code the residues of the model in the 3D structure viewer. The following rules of thumb provide guidance on the expected reliability of a given region:

  • Regions with pLDDT > 90 are expected to be modelled to high accuracy. These should be suitable for any application that benefits from high accuracy (e.g. characterising binding sites). 
  • Regions with pLDDT between 70 and 90 are expected to be modelled well (a generally good backbone prediction). 
  • Regions with pLDDT between 50 and 70 are low confidence and should be treated with caution. 
  • The 3D coordinates of regions with pLDDT < 50 often have a ribbon-like appearance and should not be interpreted. We show in our paper that pLDDT < 50 is a reasonably strong predictor of disorder, i.e. it suggests such a region is either unstructured in physiological conditions or only structured as part of a complex. 
  • Structured domains with many inter-residue contacts are likely to be more reliable than extended linkers or isolated long helices. 
  • Unphysical bond lengths and clashes do not usually appear in confident regions. Any part of a structure with several of these should be disregarded.

Note that the PDB and mmCIF files contain coordinates for all regions, regardless of their pLDDT score. It is up to the user to interpret the model judiciously, in accordance with the guidance above.
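Because pLDDT is stored in the B-factor field, the bands above can be read straight from a downloaded model. Here is a minimal Python sketch. It assumes a single-chain PDB-format file with standard fixed-column ATOM records (B-factor in columns 61-66). The function names and band labels are illustrative, not part of Alphafold or ColabFold.

```python
from collections import Counter
from typing import Dict, List, Tuple


def plddt_per_residue(pdb_text: str) -> List[Tuple[int, float]]:
    """(residue number, pLDDT) for every C-alpha atom in a PDB-format model.

    Relies on pLDDT being stored in the B-factor field (columns 61-66 of
    ATOM records), as the EBI FAQ quoted above describes.
    """
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append((int(line[22:26]), float(line[60:66])))
    return values


def plddt_band(score: float) -> str:
    """Map a pLDDT value onto the rule-of-thumb bands quoted above.

    Counting exactly 90 and 70 as 'confident' and 50 as 'low' is a choice
    made here; the FAQ does not specify the boundaries.
    """
    if score > 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def band_fractions(pdb_text: str) -> Dict[str, float]:
    """Fraction of residues falling into each pLDDT band."""
    residues = plddt_per_residue(pdb_text)
    if not residues:
        return {}
    counts = Counter(plddt_band(score) for _, score in residues)
    return {band: n / len(residues) for band, n in counts.items()}
```

For example, `band_fractions(pathlib.Path(path).read_text())` summarises how much of a model falls into each band. In multi-chain files, residue numbers repeat across chains, so the chain ID (column 22) would also need to be recorded.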

The pLDDT per position is also plotted for each of the five models made in every run, which gives a simpler overview:

UL55 pLDDT plot, note the higher score at the C-terminus for models 1-3
UL56 pLDDT plot

Note the high overall scores for UL55 and the low ones for UL56. The consistently low scores of the UL56 prediction should make us cautious.

Finally, the Predicted Aligned Error (PAE) indicates how confidently the relative positions of domains are predicted. Again, an excerpt from the EBI FAQ:

Independent of the 3D structure, AlphaFold produces an output called “Predicted Aligned Error”. This is shown at the bottom of structure pages as an interactive 2D plot.

  • The colour at (x, y) indicates AlphaFold’s expected position error at residue x if the predicted and true structures were aligned on residue y. 
  • If the predicted aligned error is generally low for residue pairs x, y from two different domains, it indicates that AlphaFold predicts well-defined relative positions for them. 
  • If the predicted aligned error is generally high for residue pairs x, y from two different domains, then the relative positions of these domains in the 3D structure is uncertain and should not be interpreted. 
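The domain-level reading of PAE described above amounts to averaging the matrix over the block of residue pairs that links two domains. Below is a minimal sketch in Python. It assumes the PAE has already been loaded into a square matrix (a list of lists, in Å) and that the domain boundaries are known. How the matrix is stored on disk depends on the Alphafold/ColabFold version and is not covered here. The function names are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def block_mean(pae: Matrix, rows: range, cols: range) -> float:
    """Mean of pae[i][j] over i in rows and j in cols (0-based residue indices)."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def interdomain_pae(pae: Matrix, dom_a: range, dom_b: range) -> float:
    """Average PAE between two domains, taken over both off-diagonal blocks.

    PAE is not symmetric: the error at x when aligned on y can differ from
    the error at y when aligned on x. Averaging both blocks means the result
    does not depend on which matrix axis holds the aligned residue.
    """
    return (block_mean(pae, dom_a, dom_b) + block_mean(pae, dom_b, dom_a)) / 2


if __name__ == "__main__":
    dom_a, dom_b = range(0, 3), range(3, 6)
    # Toy 6x6 matrix: 2 Å within each domain, 25 Å between the domains.
    toy = [[2.0 if (i < 3) == (j < 3) else 25.0 for j in range(6)] for i in range(6)]
    print(block_mean(toy, dom_a, dom_a))       # 2.0
    print(interdomain_pae(toy, dom_a, dom_b))  # 25.0
```

What counts as a "generally low" inter-domain PAE is a judgment call. The FAQ excerpt gives no fixed cutoff, so compare the inter-domain value with the within-domain values of the same model.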

Let’s look at the PAE plots for both UL55 and UL56:

UL55 PAE scores for 5 models. Blue is better

In general, the PAE plots for UL55 look good. In models 4 and 5, the position of the C-terminus relative to the rest of the protein is uncertain, whereas it is much better defined in models 1 to 3.

Now let’s look at UL56 PAEs:

UL56 PAE scores for 5 models

You can see immediately that the relative positions of most residues are uncertain in all five models. The relative positions of the predicted alpha-helices should therefore be taken with more than a grain of salt.

 

Visualizing Alphafold predictions

To visualise the structure predictions, unzip the main file and then unzip the file corresponding to your gene of interest. Inside, you will find five .pdb files, one for each of the five AF2 predictions, ranked by their score. Rank 1 is generally the one to look at first.
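You can also script the step of picking out the rank 1 model. The Python sketch below assumes a hypothetical archive layout: one inner archive per gene named `<gene>.zip`, containing model files whose names include `rank_1`, `rank_001` or `rank1`. Check the file names in your download and adjust the name filter or the regular expression if they differ.

```python
import io
import re
import zipfile
from typing import Optional

# Matches "rank_1", "rank_001" or "rank1", but not "rank_10" (naming is an assumption).
RANK1 = re.compile(r"rank_?0*1(?!\d)", re.IGNORECASE)


def read_rank1_pdb(outer_zip_path: str, gene: str) -> Optional[str]:
    """Return the text of the rank 1 model for `gene`, or None if not found."""
    with zipfile.ZipFile(outer_zip_path) as outer:
        # Hypothetical layout: one inner archive per gene, named "<gene>.zip".
        # Exact matching keeps "UL5" from also picking up "UL55.zip".
        inner_names = [n for n in outer.namelist()
                       if n.rsplit("/", 1)[-1] == f"{gene}.zip"]
        if not inner_names:
            return None
        inner_bytes = outer.read(inner_names[0])
    # Open the nested archive from memory instead of unpacking it to disk.
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        for name in sorted(inner.namelist()):
            if name.lower().endswith(".pdb") and RANK1.search(name):
                return inner.read(name).decode("utf-8")
    return None
```

Save the returned text as a .pdb file and open it in 3D View, PyMOL or Chimera as described below. If an archive holds both relaxed and unrelaxed rank 1 files, this sketch returns whichever name sorts first.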

You can upload the .pdb file to 3D View to explore the 3D structure online or use standalone software like PyMOL or Chimera.

Alphafold HSV-1 strain 17+ predictions

UL55 prediction colored by the pLDDT confidence value (red=better)

In cooperation with the Topf lab at CSSB, we are currently running Alphafold2 on representative herpesvirus proteomes.

This file contains Alphafold2 predictions, run with ColabFold, of all HSV-1 strain 17+ proteins in Uniprot except UL36, which is too large to run on a 16 GB GPU.

We thank the Steinegger lab:

  • Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425

We also thank DeepMind for Alphafold2.