We have just updated our HerpesFolds database to include proteome-wide structural predictions for all 9 human herpesviruses. We added HSV-2 (strain HG52, UP000001874), VZV (strain Dumas, UP000002602), HHV-6A (isolate U1102, NC_001664), HHV-6B (strain Z29, NC_000898) and HHV-7 (strain RK, NC_001716). Many thanks to Ben Kaufer for helping us with HHV-6A, HHV-6B and HHV-7!
Herpesfolds updated!
Can’t remember what the homolog of your favorite HSV-1 protein is in HCMV? We just updated HerpesFolds with much better grouping of related proteins, so the table now doubles as a herpesvirus “cheat sheet”. Also check out the ultrafast integrated search box.
Updated AlphaFold predictions for four Herpesviruses
We have just updated our HerpesFolds interactive table. The table, which previously contained predictions for HSV-1 strain 17 (UP000009294), HCMV strain Merlin (UP000000938), and KSHV strain GK18 (UP000000942), now also provides predictions for EBV strain B95-8 (UP000153037). This new version also has better annotation overall.
Interactive Structure viewers are available again
The interactive structure viewers on HerpesFolds are working again. Thanks to David Koes and his awesome 3Dmol.js!
Superfast structural search added using Foldseek!
We just added Foldseek search buttons to our Herpesfolds database.
Just click the button labeled Foldseek: a new page opens in Foldseek and searches structural databases such as the PDB and the AlphaFold Protein Structure Database for similar structures!
A big thanks to Milot Mirdita (https://twitter.com/milot_mirdita) as well as the Söding lab and the Steinegger lab for help in implementing it!
Large tegument protein predictions online
Oscar Charles from UCL has run AlphaFold2 on the large tegument proteins, which were missing from our database due to their size. Thank you Oscar!
Alphafold predictions of Herpesvirus genes
This table gives interactive access to all AlphaFold structures of HSV-1 strain 17+ (Uniprot proteome UP000009294), HCMV strain Merlin (Uniprot proteome UP000000938), and KSHV strain GK18 (Uniprot proteome UP000000942), visualized with 3Dmol.js.
Please note that the largest tegument protein for each virus (HSV-1 UL36, HCMV UL48, and KSHV ORF64) was not run due to GPU memory restrictions.
For help on evaluating the predictions please check this quick tutorial
HSV-UL37 model
| Protein category | HSV-1 Uniprot ID | HSV-1 gene name | HSV-1 protein name | HCMV Uniprot ID | HCMV gene name | HCMV protein name | KSHV Uniprot ID | KSHV gene name | KSHV protein name |
|---|---|---|---|---|---|---|---|---|---|
| AN | P04294 | UL12 | | F5HF49 | UL98 | | Q2HR95 | ORF37 | |
| CEP1 | P10191 | UL7 | | F5HA10 | UL103 | | F5HAI6 | ORF42 | |
| CEP2 | P10200 | UL16 | | F5HAC7 | UL94 | | F5HEF2 | ORF33 | |
| CEP3 | P04289 | UL11 | | F5HI87 | UL99 | | F5HHY1 | ORF38 | |
| CVC1 | P10201 | UL17 | | Q6SW50 | UL93 | | F5HB39 | ORF32 | |
| CVC2 | P10209 | UL25 | | Q6SW65 | UL77 | | Q2HRB3 | ORF19 | |
| DNBI | P04296 | UL29 | ICP8 | F5HDQ6 | UL57 | DBP | Q2HRD3 | ORF6 | DBP |
| DPOL | P04293 | UL30 | | Q6SW77 | UL54 | | Q2HRD0 | ORF9 | |
| DUT | P10234 | UL50 | | Q6SW70 | UL72 | | Q2HR78 | ORF54 | |
| EV45 | P10229 | UL45 | | | | | | | |
| gB | P10211 | UL27 | gB | F5HB53 | UL55 | gB | F5HB81 | ORF8 | gB |
| gC | P10228 | UL44 | gC | | | | | | |
| gD | Q69091 | US6 | gD | | | | | | |
| gE | P04488 | US8 | gE | | | | | | |
| gG | P06484 | US4 | gG | | | | | | |
| gH | P06477 | UL22 | gH | Q6SW67 | UL75 | gH | F5HAK9 | ORF22 | gH |
| gI | P06487 | US7 | gI | | | | | | |
| gJ | P06480 | US5 | gJ | | | | | | |
| gK | P68331 | UL53 | gK | | | | | | |
| gL | P10185 | UL1 | gL | F5HCH8 | UL115 | gL | F5HDB7 | ORF47 | gL |
| gM | P04288 | UL10 | gM | Q6SW43 | UL100 | gM | F5HDD0 | ORF39 | gM |
| gN | O09800 | UL49.5 | gN | F5HHQ0 | UL73 | gN | F5HFQ0 | ORF53 | gN |
| gO | | | | F5HGP1 | UL74 | | | | |
| HELI | P10189 | UL5 | | F5HEN8 | UL105 | | Q2HR89 | ORF44 | |
| HEPA | P10192 | UL8 | | F5HIG1 | UL102 | | Q2HR92 | ORF40 | |
| ITP | P10221 | UL37 | | Q6SW85 | UL47 | | F5HEU7 | ORF63 | |
| KITH | P0DTH5 | UL23 | TK | | | | F5HB62 | ORF21 | TK |
| LTP | P10220 | UL36 | | Q6SW84 | UL48 | | Q2HR64 | ORF64 | |
| MB43 | P10227 | UL43 | | | | | | | |
| MCP | P06491 | UL19 | | F5HGT1 | UL86 | | Q2HRA7 | ORF25 | |
| NEC1 | P10215 | UL31 | | F5HFZ4 | UL53 | | F5H982 | ORF69 | |
| NEC2 | P10218 | UL34 | | Q6SW81 | UL50 | | F5HA27 | ORF67 | |
| NP03 | P10187 | UL3 | | | | | | | |
| NP04 | P10188 | UL4 | | | | | | | |
| OBP | P10193 | UL9 | | | | | | | |
| PAP | P10226 | UL42 | | F5HC97 | UL44 | | F5HID2 | ORF59 | |
| PORTL | P10190 | UL6 | | F5HBR4 | UL104 | | F5HGK9 | ORF43 | |
| PRIM | P10236 | UL52 | | F5HG51 | UL70 | | F5HIN0 | ORF56 | |
| RIR1 | P08543 | UL39 | ICP6 | Q6SW87 | UL45 | | Q2HR67 | ORF61 | |
| RIR2 | P10224 | UL40 | | | | | F5HAW0 | ORF60 | |
| RNB | P04487 | US11 | | | | | | | |
| SCAF | P10210 | UL26 | | Q6SW62 | UL80 | | Q2HRB6 | ORF17 | |
| SCP | P10219 | UL35 | | F5HEN7 | UL48A | | Q2HR63 | ORF65 | |
| SHUT | P10225 | UL41 | | | | | | | |
| TEG1 | P10230 | UL46 | | | | | | | |
| TEG3 | P04291 | UL14 | | | | | | | |
| TEG4 | P10205 | UL21 | | | | | | | |
| TEG5 | P10231 | UL47 | | | | | | | |
| TEG6 | P10239 | UL55 | | | | | | | |
| TEG7 | P10235 | UL51 | | F5HEA3 | UL71 | | F5H9W9 | ORF55 | |
| TRM1 | P10212 | UL28 | | F5HC79 | UL56 | | F5H9W4 | ORF7 | |
| TRM2 | P10217 | UL33 | | F5HGI9 | UL51 | | F5HGB6 | ORF67.5 | |
| TRM3 | P04295 | UL15 | | F5HCU8 | UL89 | | F5HGB6 | ORF29 | |
| TRX1 | P32888 | UL38 | | F5HA93 | UL46 | | F5H8Y5 | ORF62 | |
| TRX2 | P10202 | UL18 | | F5HIN9 | UL85 | | F5HGN8 | ORF26 | |
| UNG | P10186 | UL2 | | F5HI85 | UL114 | | F5HFA1 | ORF46 | |
| | | | | | | | Q2HRD5 | K1 | |
| | | | | | | | Q77Q38 | K12 | |
| | | | | | | | P0C788 | K14 | OX2V |
| | | | | | | | Q9QR69 | K15 | |
| | | | | | | | Q2HRC7 | K2 | VIL6 |
| | | | | | | | P90495 | K3 | MIR1 |
| | | | | | | | Q98157 | K4.1 | VMI2 |
| | | | | | | | F5HCJ2 | K4.2 | VCCL3 |
| | | | | | | | F5HF36 | K4 | |
| | | | | | | | P90489 | K5 | MIR2 |
| | | | | | | | F5HET8 | K6 | |
| | | | | | | | F5HDA4 | K7 | |
| | | | | | | | Q2HR82 | K8 | KBZIP |
| | | | | | | | F5HB98 | K8.1 | |
| | | | | | | | Q2HRC9 | ORF10 | |
| | | | | | | | Q2HRC8 | ORF11 | |
| | | | | | | | F5HGJ3 | ORF16 | vBCL2 |
| | | | | | | | Q2HRB4 | ORF18 | |
| | | | | | | | Q2HRC6 | ORF2 | DYR |
| | | | | | | | Q2HRB2 | ORF20 | |
| | | | | | | | F5HIM6 | ORF23 | |
| | | | | | | | F5HFD2 | ORF24 | |
| | | | | | | | F5HDY6 | ORF27 | |
| | | | | | | | F5HI25 | ORF28 | |
| | | | | | | | F5HES7 | ORF30 | |
| | | | | | | | Q2HRA1 | ORF31 | |
| | | | | | | | Q2HR98 | ORF34 | |
| | | | | | | | F5HCD4 | ORF35 | |
| | | | | | | | F5HGH5 | ORF36 | vPK |
| | | | | | | | Q2HRD4 | ORF4 | |
| | | | | | | | F5HDE4 | ORF45 | |
| | | | | | | | Q2HR85 | ORF48 | |
| | | | | | | | Q2HR83 | ORF49 | |
| | | | | | | | F5HCV3 | ORF50 | |
| | | | | | | | Q2HR80 | ORF52 | |
| | | | | | | | Q2HR75 | ORF57 | ICP27 |
| | | | | | | | F5HAD1 | ORF58 | |
| | | | | | | | F5HG20 | ORF66 | |
| | | | | | | | F5HF47 | ORF69 | |
| | | | | | | | P90463 | ORF70 | TYSY |
| | | | | | | | F5HEZ4 | ORF71 | VFLIP |
| | | | | | | | Q77Q36 | ORF72 | VCYCL |
| | | | | | | | Q9QR71 | ORF73 | LANA1 |
| | | | | | | | Q98146 | ORF74 | VGPCR |
| | | | | | | | Q9QR70 | ORF75 | |
| | | | | | | | F5HF68 | vIRF-1 | |
| | | | | | | | Q2HR71 | vIRF-3 | |
| | | | | | | | F5HIC6 | vIRF-2 | |
| | | | | | | | Q2HR73 | vIRF-4 | |
| | | | | Q6SW04 | IRS1 | | | | |
| | | | | Q6SWD5 | RL1 | | | | |
| | | | | F5HI32 | RL10 | | | | |
| | | | | Q6SWD1 | RL11 | | | | |
| | | | | Q6SWD0 | RL12 | | | | |
| | | | | Q6SWC9 | RL13 | | | | |
| | | | | F5HF23 | RL5A | | | | |
| | | | | Q6SWD3 | RL6 | | | | |
| | | | | F7V995 | RL8A | | | | |
| | | | | F7V996 | RL9A | | | | |
| | | | | Q6SVX2 | TRS1 | | | | |
| | | | | Q6SWC8 | UL1 | | | | |
| | | | | Q6SWC0 | UL10 | | | | |
| | | | | Q6SWB9 | UL11 | UL11P | | | |
| | | | | F5HC71 | UL111A | IL10H | | | |
| | | | | Q6SW37 | UL112-113 | EP84 | | | |
| | | | | Q6SW34 | UL116 | | | | |
| | | | | F5HFA5 | UL117 | | | | |
| | | | | F5HC14 | UL119 | | | | |
| | | | | Q6SW31 | UL120 | | | | |
| | | | | F5HD27 | UL121 | | | | |
| | | | | Q6SW29 | UL122 | VIE2 | | | |
| | | | | F5HCM1 | UL123 | VIE1 | | | |
| | | | | F5HHS3 | UL124 | | | | |
| | | | | Q6SWB8 | UL13 | | | | |
| | | | | F5HCP3 | UL130 | | | | |
| | | | | F5HET4 | UL131A | | | | |
| | | | | F5HGU6 | UL132 | | | | |
| | | | | Q6SW10 | UL133 | | | | |
| | | | | F5HAQ7 | UL135 | | | | |
| | | | | F5HF35 | UL136 | | | | |
| | | | | F5HGQ8 | UL138 | | | | |
| | | | | Q6SW14 | UL139 | | | | |
| | | | | Q6SWB7 | UL14 | | | | |
| | | | | F5HCK7 | UL140 | | | | |
| | | | | Q6RJQ3 | UL141 | | | | |
| | | | | F5HHH2 | UL142 | | | | |
| | | | | F5HAM0 | UL144 | | | | |
| | | | | F5HF44 | UL145 | | | | |
| | | | | F5HBX1 | UL146 | CXCL1 | | | |
| | | | | F5HA06 | UL147 | | | | |
| | | | | F5H8R0 | UL147A | | | | |
| | | | | F5H8Q3 | UL148 | | | | |
| | | | | F5HE74 | UL148A | | | | |
| | | | | F5HAK6 | UL148B | | | | |
| | | | | F5HDE7 | UL148C | | | | |
| | | | | F5HHL7 | UL148D | | | | |
| | | | | Q6SW05 | UL150 | | | | |
| | | | | F7V998 | UL150A | | | | |
| | | | | F5HAE6 | UL15A | | | | |
| | | | | F5HG68 | UL16 | UL16P | | | |
| | | | | F5HHT4 | UL17 | | | | |
| | | | | F5HFB4 | UL18 | | | | |
| | | | | F5HI68 | UL19 | | | | |
| | | | | Q6SWC7 | UL2 | | | | |
| | | | | F5H9Z4 | UL20 | | | | |
| | | | | F5HH39 | UL21A | | | | |
| | | | | F5HF90 | UL22A | | | | |
| | | | | F5HDM3 | UL23 | | | | |
| | | | | F5H9N4 | UL24 | VP22 | | | |
| | | | | F5HGJ4 | UL25 | PP85 | | | |
| | | | | F5HGG3 | UL26 | | | | |
| | | | | Q6SWA4 | UL27 | | | | |
| | | | | Q6SWA3 | UL29 | | | | |
| | | | | F5HGC2 | UL30 | | | | |
| | | | | U3KRG9 | UL30A | | | | |
| | | | | Q6SWA0 | UL31 | | | | |
| | | | | Q6SW99 | UL32 | PP150 | | | |
| | | | | Q6SW98 | UL33 | | | | |
| | | | | F5HC16 | UL34 | | | | |
| | | | | F5HE12 | UL35 | | | | |
| | | | | F5HAY6 | UL36 | VICA | | | |
| | | | | Q6SW94 | UL37 | VGLI | | | |
| | | | | F5HG98 | UL38 | | | | |
| | | | | Q6SWC6 | UL4 | | | | |
| | | | | Q6SW92 | UL40 | | | | |
| | | | | F5HFG3 | UL41A | | | | |
| | | | | F5HHZ3 | UL42 | | | | |
| | | | | Q6SW89 | UL43 | | | | |
| | | | | Q6SW82 | UL49 | | | | |
| | | | | Q6SWC5 | UL5 | | | | |
| | | | | Q6SW79 | UL52 | | | | |
| | | | | Q6SWC4 | UL6 | | | | |
| | | | | Q6SW73 | UL69 | ICP27 | | | |
| | | | | Q6SWC3 | UL7 | | | | |
| | | | | C1BEG3 | UL74A | | | | |
| | | | | Q6SW66 | UL76 | | | | |
| | | | | F5HET1 | UL78 | | | | |
| | | | | Q6SW63 | UL79 | | | | |
| | | | | A0A1P7U1 | UL8 | | | | |
| | | | | F5HBC6 | UL82 | PP71 | | | |
| | | | | Q6SW59 | UL83 | PP65 | | | |
| | | | | F5HB40 | UL84 | | | | |
| | | | | Q6SW55 | UL87 | | | | |
| | | | | F5H9F9 | UL88 | | | | |
| | | | | F5H9T4 | UL9 | | | | |
| | | | | F5HFJ8 | UL91 | | | | |
| | | | | F5HAS7 | UL92 | | | | |
| | | | | Q6SW48 | UL95 | | | | |
| | | | | F5H8R6 | UL96 | | | | |
| | | | | Q6SW46 | UL97 | | | | |
| | | | | Q6SW03 | US1 | | | | |
| | | | | F5HFJ7 | US10 | | | | |
| | | | | Q6SVZ5 | US11 | | | | |
| | | | | F5HE44 | US12 | | | | |
| | | | | F5H9I4 | US13 | | | | |
| | | | | F5HD92 | US14 | | | | |
| | | | | F5HFH0 | US15 | | | | |
| | | | | Q6SVZ0 | US16 | | | | |
| | | | | F5H9N9 | US17 | | | | |
| | | | | F5HE69 | US18 | | | | |
| | | | | F5HAR3 | US19 | | | | |
| | | | | F5HE05 | US2 | | | | |
| | | | | F5HGH8 | US20 | | | | |
| | | | | F5HHT6 | US21 | | | | |
| | | | | F5HDC7 | US22 | | | | |
| | | | | F5HAZ3 | US23 | | | | |
| | | | | F5H8S6 | US24 | | | | |
| | | | | F5H991 | US26 | | | | |
| | | | | F5HDK1 | US27 | | | | |
| | | | | F5HF62 | US28 | | | | |
| | | | | F5HG95 | US29 | | | | |
| | | | | F5HEU0 | US3 | | | | |
| | | | | F5HB41 | US30 | | | | |
| | | | | F5HAM4 | US31 | | | | |
| | | | | F5HD03 | US32 | | | | |
| | | | | F7V999 | US33A | | | | |
| | | | | F5HEF3 | US34 | | | | |
| | | | | Q6SVX3 | US34A | | | | |
| | | | | Q6SW00 | US6 | | | | |
| | | | | F5HDD3 | US7 | | | | |
| | | | | F5HB52 | US8 | | | | |
| | | | | F5HC33 | US9 | | | | |
| | P36313 | RL1 | ICP34.5 | | | | | | |
| | P08393 | RL2 | ICP0 | | | | | | |
| | P08392 | RS1 | ICP4 | | | | | | |
| | P04290 | UL13 | | | | | | | |
| | P10204 | UL20 | | | | | | | |
| | P10208 | UL24 | | | | | | | |
| | P10216 | UL32 | | | | | | | |
| | P06492 | UL48 | VP16 | | | | | | |
| | P10233 | UL49 | VP22 | | | | | | |
| | P10238 | UL54 | ICP27 | | | | | | |
| | P10240 | UL56 | | | | | | | |
| | P04413 | US3 | | | | | | | |
| | P04485 | US1 | ICP22 | | | | | | |
| | P06486 | US10 | | | | | | | |
| | P03170 | US12 | ICP47 | | | | | | |
| | P06485 | US2 | | | | | | | |
| | O09802 | US8.5 | | | | | | | |
| | P06481 | US9 | | | | | | | |
This is a project in cooperation with the Topf lab at CSSB.
Predictions were run with Colabfold by the Steinegger lab:
- Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425
We also thank DeepMind for AlphaFold2:
- Jumper et al. “Highly accurate protein structure prediction with AlphaFold.” Nature (2021) doi: 10.1038/s41586-021-03819-2
Alphafold predictions of HCMV strain Merlin
This file contains AlphaFold2 predictions, run with ColabFold, of all HCMV strain Merlin proteins in Uniprot except UL48, which is too big to run on a 16 GB GPU.
We thank the Steinegger lab:
- Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425
As well as Deepmind for Alphafold2:
- Jumper et al. “Highly accurate protein structure prediction with AlphaFold.” Nature (2021) doi: 10.1038/s41586-021-03819-2
Alphafold KSHV strain GK18 predictions
In cooperation with the Topf lab at CSSB, we are currently running AlphaFold2 on representative herpesvirus proteomes.
This file contains AlphaFold2 predictions, run with ColabFold, of all KSHV strain GK18 proteins in Uniprot except ORF64, which is too big to run on a 16 GB GPU.
We thank the Steinegger lab:
- Mirdita M, Ovchinnikov S and Steinegger M. ColabFold – Making protein folding accessible to all. bioRxiv (2021) doi: 10.1101/2021.08.15.456425
As well as Deepmind for Alphafold2:
- Jumper et al. “Highly accurate protein structure prediction with AlphaFold.” Nature (2021) doi: 10.1038/s41586-021-03819-2
Evaluating Alphafold predictions
AlphaFold (AF) gives quality scores for each prediction. A great FAQ can be found at https://alphafold.ebi.ac.uk/faq.
Another great resource is this youtube video by EMBL-EBI:
Also, have a look at this lecture by John Jumper, the research lead of AlphaFold:
Here are just some examples from our HSV-1 predictions:
The first measure is a depiction of the Multiple Sequence Alignment (MSA) that is used as input for the network.
The MSA above, from HSV-1 UL55, shows reasonable coverage by both similar and less similar sequences, and the C-terminus is well covered, including by less similar sequences.
Compare the MSA of UL56: it is much more sparsely populated and includes few less similar sequences.
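To make "coverage" and "similar versus less similar sequences" a bit more concrete, here is a minimal sketch that counts, for each query position, how many aligned sequences have a residue there, and how identical each aligned sequence is to the query. It assumes the sequences are already aligned to the query at equal length with `-` for gaps, which is a simplification of the a3m files ColabFold works with. The function name `msa_coverage` and the toy sequences are made up for illustration and are not real UL55 or UL56 data.

```python
def msa_coverage(query, hits):
    """Return per-position coverage and each hit's identity to the query.

    Assumption: every hit is pre-aligned to the query (same length,
    '-' marks a gap). Real a3m files need extra handling for insertions.
    """
    length = len(query)
    for hit in hits:
        if len(hit) != length:
            raise ValueError("all aligned sequences must match the query length")

    # Coverage: number of hits that have a residue (not a gap) in each column.
    coverage = [sum(1 for hit in hits if hit[i] != "-") for i in range(length)]

    # Identity: fraction of query positions where the hit has the same residue.
    identity = [
        sum(1 for q, h in zip(query, hit) if q == h) / length
        for hit in hits
    ]
    return coverage, identity


# Toy alignment with made-up sequences.
query = "MKTAYIAK"
hits = ["MKTAYIAK", "MKSAY---", "--TAYLAK"]
coverage, identity = msa_coverage(query, hits)
print("coverage per position:", coverage)
print("identity to query:", identity)
```

A well-populated MSA gives high counts along the whole sequence, with hits spread over a wide range of identities. A sparse one gives low counts and mostly near-identical hits.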
Now, let’s look at the resulting structure predictions:
In both cases, the predictions are colored by pLDDT, a measure of how confident AlphaFold is in its own prediction.
Here is an excerpt from the EBI FAQ:
AlphaFold produces a per-residue estimate of its confidence on a scale from 0 – 100. This confidence measure is called pLDDT and corresponds to the model’s predicted score on the lDDT-Cα metric. It is stored in the B-factor fields of the mmCIF and PDB files available for download (although unlike a B-factor, higher pLDDT is better). pLDDT is also used to colour-code the residues of the model in the 3D structure viewer. The following rules of thumb provide guidance on the expected reliability of a given region:
- Regions with pLDDT > 90 are expected to be modelled to high accuracy. These should be suitable for any application that benefits from high accuracy (e.g. characterising binding sites).
- Regions with pLDDT between 70 and 90 are expected to be modelled well (a generally good backbone prediction).
- Regions with pLDDT between 50 and 70 are low confidence and should be treated with caution.
- The 3D coordinates of regions with pLDDT < 50 often have a ribbon-like appearance and should not be interpreted. We show in our paper that pLDDT < 50 is a reasonably strong predictor of disorder, i.e. it suggests such a region is either unstructured in physiological conditions or only structured as part of a complex.
- Structured domains with many inter-residue contacts are likely to be more reliable than extended linkers or isolated long helices.
- Unphysical bond lengths and clashes do not usually appear in confident regions. Any part of a structure with several of these should be disregarded.
Note that the PDB and mmCIF files contain coordinates for all regions, regardless of their pLDDT score. It is up to the user to interpret the model judiciously, in accordance with the guidance above.
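Since the excerpt says pLDDT is stored in the B-factor field, the rules of thumb can be applied directly to a downloaded PDB file. The following is a minimal sketch that uses only the standard fixed-width PDB columns and reads one value per residue from the C-alpha atom. The helper names are hypothetical, and the treatment of values exactly on a boundary (70 or 90) is our assumption, since the excerpt does not specify it. The toy ATOM records are generated with dummy coordinates purely to keep the example self-contained.

```python
def plddt_per_residue(pdb_text):
    """Read one pLDDT value per residue from the B-factor column of C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB fixed-width columns (1-based): atom name 13-16,
        # residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Map a pLDDT value to the rule-of-thumb bands quoted above.

    Boundary handling (>= 70, >= 50) is an assumption for illustration.
    """
    if plddt > 90:
        return "high accuracy"
    if plddt >= 70:
        return "modelled well"
    if plddt >= 50:
        return "low confidence"
    return "do not interpret"


def toy_atom_line(serial, res_num, plddt):
    """Build a fixed-width C-alpha ATOM record with dummy coordinates."""
    return (
        f"ATOM  {serial:>5}  CA  ALA A{res_num:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{plddt:>6.2f}"
    )


# Four made-up residues, one per confidence band.
pdb_text = "\n".join(
    toy_atom_line(i, i, score)
    for i, score in enumerate([95.2, 82.0, 61.5, 34.9], start=1)
)
scores = plddt_per_residue(pdb_text)
bands = {res: confidence_band(value) for res, value in scores.items()}
for res, band in bands.items():
    print(res, scores[res], band)
```

For a real prediction, replace `pdb_text` with the contents of the downloaded PDB file. Multi-chain files would need the chain ID added to the dictionary key.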
The pLDDT per position is also given as a plot for the five models made in every run and gives a simpler overview:
Note the high overall scores for UL55 and low ones for UL56. The overall low scores for the UL56 prediction should make us cautious.
Finally, the Predicted Aligned Error (PAE) indicates how confident AlphaFold is in the relative position of domains. Again an excerpt from the EBI FAQ:
Independent of the 3D structure, AlphaFold produces an output called “Predicted Aligned Error”. This is shown at the bottom of structure pages as an interactive 2D plot.
- The colour at (x, y) indicates AlphaFold’s expected position error at residue x if the predicted and true structures were aligned on residue y.
- If the predicted aligned error is generally low for residue pairs x, y from two different domains, it indicates that AlphaFold predicts well-defined relative positions for them.
- If the predicted aligned error is generally high for residue pairs x, y from two different domains, then the relative positions of these domains in the 3D structure is uncertain and should not be interpreted.
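The two rules above come down to looking at the off-diagonal blocks of the PAE matrix. As a minimal sketch, the function below averages the PAE over all residue pairs that span two domains. The function name, the toy 4×4 matrix, and the domain boundaries are invented for illustration. In practice the matrix comes from the prediction output and the domain boundaries have to be chosen by the user.

```python
def mean_interdomain_pae(pae, domain_a, domain_b):
    """Average PAE over all residue pairs with one residue in each domain.

    `pae` is a square matrix (list of lists) of expected position errors;
    the domains are iterables of 0-based residue indices.
    """
    values = []
    for i in domain_a:
        for j in domain_b:
            # PAE is not symmetric, so use both (i, j) and (j, i).
            values.append(pae[i][j])
            values.append(pae[j][i])
    return sum(values) / len(values)


# Toy matrix: residues 0-1 and 2-3 form two "domains" with low error
# inside each domain and high error between them.
pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.0],
    [22.0, 24.0, 1.0, 0.5],
]
inter = mean_interdomain_pae(pae, range(0, 2), range(2, 4))
print("mean inter-domain PAE:", inter)
```

A low value suggests the relative placement of the two domains is well defined, while a high value like the one in this toy case means the placement should not be interpreted.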
Let’s look at the PAE plots for both UL55 and UL56:
In general, the PAE plot for UL55 looks good. You can see that in models 4 and 5 the position of the C-terminus relative to most of the protein is uncertain, while it is much better defined in models 1 to 3.
Now let’s look at UL56 PAEs:
You can immediately see that the position of most amino acids relative to each other is unclear in all predictions. The relative position of the predicted alpha-helices should therefore be taken with more than a grain of salt.