Double-Stranded RNA as a Not-Self Alarm Signal: 

To Evade, Most Viruses Purine-Load their RNAs, but Some (HTLV-1, Epstein-Barr) Pyrimidine-Load

A. D. CRISTILLO, J. R. MORTIMER, I. H. BARRETTE, T. P. LILLICRAP AND D. R. FORSDYKE

(2001) Journal of Theoretical Biology 208, 475-489

Copyright Academic Press under conditions set out on Forsdyke homepage

Anthony Cristillo


1. Introduction

2. Chargaff Difference Analysis

3. Extremes of Positive and Negative Purine-Loading

4. Extreme Purine-Loading Indices in Retroviruses

5. Extreme Purine-Loading Indices in Herpes Viruses

6. Gene Encoding the Major Latency Transcript Obeys the Rule

7. Simple Sequence Repeats Reinforce Compliance with Szybalski's Rule

8. Disruption of Host Traffic

9. The Role of Simple-Sequence Repeats in Latency-Associated Gene Products

10. Messenger RNAs as "Antibodies"

11. Charge cluster domains decrease immunogenicity of other domains?

12. Low Stem-Loop Potential of the Gly-Ala Repeat-Encoding Region

13. A Role for Non-Genic DNA?

End Note 2001 on low-complexity regions in the Malaria parasite

End Note May 2002 on large scale non-genic transcription

End Note Jan 2006 on autoimmunity

End Note June 2006 on trinucleotide expansions

End Note Feb 2008 on Samuel Karlin

End Note July 2008. Support for our hypothesis

End Note Nov 2010. Nature's experiment

End Note Nov 2010. Low GC% and MHC Immune Surveillance

End Note Oct 2012. Mature Proteins needed for MHC-peptide Presentation

End Note Dec 2012. An EBNA-1 Analog in HTLV-1?

End Note Dec 2012. Nature's Experiment Formalized

End Note Feb 2013. GC-rich Genes more Immunogenic

End Note Nov 2013. Again, Mature Proteins needed for MHC-peptide Presentation

End Note Mar 2014. G-Quadruplexes in R-Loaded EBNA-1 mRNA Loops

End Note Mar 2015. R-Loaded Viral mRNA Loops

End Note July 2017. If G-Quadruplexes, Why So Many Adenines

End Note Aug 2019. Huntington Initiation not at Protein Level

End Note Aug 2020. Amino acids as placeholders


Abstract

For double stranded RNA (dsRNA) to signal the presence of foreign (non-self) nucleic acid, self RNA-self RNA interactions should be minimized. Indeed, self RNAs appear to have been fine-tuned over evolutionary time by the introduction of purines in clusters in the loop regions of stem-loop structures. This adaptation should militate against the "kissing" interactions which initiate formation of dsRNA.

    Our analyses of virus base compositions suggest that, to avoid triggering the host cell's dsRNA surveillance mechanism, most viruses purine-load their RNAs to resemble host RNAs ("stealth" strategy). However, some GC-rich latent viruses (HTLV-1, EBV) pyrimidine-load their RNAs. It is suggested that when virus production begins, these RNAs suddenly increase in concentration and impair host mRNA function by virtue of an excess of complementary "kissing" interactions ("surprise" strategy).

    Remarkably, the only mRNA expressed in the most fundamental form of EBV latency (the "EBNA-1 program") is purine-loaded. This apparent stealth strategy is reinforced by a simple sequence repeat which prefers purine-rich codons. During latent infection the EBNA-1 protein may evade recognition by cytotoxic T-cells, not by virtue of containing a simple sequence amino acid repeat as has been proposed, but by virtue of the encoding mRNA being purine-loaded to prevent interactions with host RNAs of either genic or non-genic origin.

 

1. Introduction

Double-stranded RNA (dsRNA) produces sequence-specific gene silencing in a wide variety of organisms. Although the mechanism is not understood, often silencing appears to operate at the post-transcriptional level resulting in inactivation or degradation of a specific mRNA species, leaving other mRNA species intact (Fire, 1999; Hamilton & Baulcombe, 1999). The response to dsRNA seems likely to have arisen as part of an intracellular mechanism for self/not-self discrimination (Sharp, 1999). Consistent with this, dsRNA has long been known as a powerful inducer of the interferons, which relay alarm signals to other cells, thus inducing a general anti-viral state (Kumar & Carmichael, 1998).

    Protein synthesis is inhibited by very low concentrations of dsRNA (Ehrenfeld & Hunt 1971; Hunter et al., 1975). This involves activation of dsRNA-dependent protein kinase (PKR), which inhibits a protein involved in the initiation of protein synthesis. Potential evasive viral strategies would include the acceptance of mutations to avoid formation of dsRNA, and inhibition of cell components required for the formation of, or the response to, dsRNA (Elia et al., 1996; Mittelstein Scheid, 1999).

    One molecule of dsRNA can trigger interferon induction (Marcus, 1983). Yet dsRNA is formed transiently in large quantities during normal protein synthesis. This involves base pairing between anticodons at the tips of stem-loops in tRNAs with cognate codons in mRNAs. However, the latter pairing involves at the most only five contiguous base pairs (Bossi & Roth, 1980), whereas more than twenty base pairs are required to activate PKR in vitro (Robertson & Mathews, 1996; Tian et al., 2000). While tRNA-mRNA interactions obviously do not trigger intracellular alarms, the fact that tRNA-mRNA interactions occur so efficiently in the cytosol suggests that mRNA-mRNA interactions might occur with equal efficiency (Izant & Weintraub, 1984; Melton, 1985; Bull et al., 1998). Among the RNA species of a potential virus host cell there might be two whose members, by chance, happened to have enough base complementarity for formation of a mutual duplex of a length sufficient to trigger alarms. Thus, there would have been an evolutionary selection pressure favouring mutations in host RNAs which decrease the possibility of their interaction with other "self" RNAs in the same cell.

    Indeed, this appears to have been assisted by "purine-loading" the loop regions of RNAs, thus avoiding the initial loop-loop "kissing" reactions which precede more complete formation of dsRNA (Eguchi et al., 1991). The excess of purines, observed both at RNA and at DNA levels (in mRNA-synonymous DNA strands), is sufficient to be detected as deviations from Chargaff's second parity rule (%A – %T and %G – %C for single strands; Forsdyke & Mortimer, 2001). These local deviations, or "skews," are found in a wide variety of organisms and their viruses, and can facilitate the identification of potential DNA open reading frames (ORFs), their transcription direction (Smithies et al., 1981; Dang et al., 1998; Bell et al., 1998; Bell & Forsdyke, 1999; Lao & Forsdyke, 2000), and origins of replication (Rocha et al. 1999).

    Major questions remain.

  • How does a virus trigger the dsRNA alarm resulting in a virus-specific hostile host response (Sharp, 1999)?

  • While the majority of RNAs of many organisms and their viruses have purine-loading, why do certain viruses pyrimidine-load (Cristillo 1998; Cristillo et al. 1998; Bell & Forsdyke 1999)?

We here present an analysis of the base composition of the genomes of various retroviruses and herpes viruses, which casts some light on these problems. Surprisingly, our results suggest adaptive roles for the simple sequence elements of viruses, and for the repetitive elements and non-genic DNA of their hosts.

 

2. Chargaff Difference Analysis

Transcribed duplex DNA has an mRNA-synonymous strand and an mRNA-template strand. If transcription is to the right of the site of RNA polymerase initiation (promoter), the "top" strand (i.e. the sequence recorded in GenBank) is the mRNA-synonymous strand. Szybalski and coworkers (1966) showed that mRNA-synonymous strands (and hence the corresponding mRNAs) have purine-rich clusters. Combining this observation with Chargaff's second parity rule, it follows for mRNA-synonymous strands that purines in the clusters might be balanced by an equal number of dispersed pyrimidines, or that there might be small deviations from the second rule ("Chargaff differences") in favour of purines, as is indeed found.

    Chargaff differences are simply the differences between the numbers of the classical Watson-Crick pairing bases in a nucleic acid segment ("AT-skew", "GC-skew"). The sign of the differences depends on the direction of subtraction, which in some previous work was determined alphabetically. For some purposes Chargaff differences are best expressed as positive or negative base excesses, which may be combined to provide an index of the degree of purine-loading, with purine excesses scoring positive and pyrimidine excesses scoring negative (Lao & Forsdyke, 2000). If the ORFs (open reading frames) in a sequence are known, purine-loading indices may be calculated either directly from ORF base composition, or from codon-usage tables (Nakamura et al., 1999). The indices are then 1000[(A-T)/N] for the W bases (A and T) and 1000[(G-C)/N] for the S bases (G and C), where N is the total number of bases (i.e. N = W + S). These two values are then summed to obtain a value for the overall purine-loading index (bases per kb).
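The index defined above can be sketched in a few lines. This is a minimal illustration rather than the software used for the analyses; the function name `purine_loading_index` is hypothetical, and the base counts are those listed for HIV-1 and HTLV-1 in Table 1.

```python
def purine_loading_index(a, t, g, c):
    """Return (W-base excess, S-base excess, overall index) in bases per kb."""
    n = a + t + g + c  # N = W + S, the total number of bases
    if n == 0:
        raise ValueError("empty base counts")
    w_excess = 1000 * (a - t) / n  # 1000[(A-T)/N]
    s_excess = 1000 * (g - c) / n  # 1000[(G-C)/N]
    return w_excess, s_excess, w_excess + s_excess


# Whole-genome base counts as listed in Table 1.
hiv1 = purine_loading_index(a=3411, t=2163, g=2373, c=1772)
htlv1 = purine_loading_index(a=1983, t=1951, g=1534, c=2932)

print([round(v, 1) for v in hiv1])   # [128.4, 61.8, 190.2]
print([round(v, 1) for v in htlv1])  # [3.8, -166.4, -162.6]
```

Rounded to whole bases per kb, the two overall values agree with the indices of 190 and -163 given in Table 1.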

    Chargaff differences (absolute or %) may be calculated as A-T [or as (A-T)/W], and as G-C [or as (G-C)/S]. Here, A, T, G and C can be the frequency of each base in 1 kb sequence windows. This approach makes no assumption about the disposition of ORFs, and can be applied to uncharted DNA. When an ORF is located, values for windows whose centres overlap the ORF can be averaged to obtain an approximate value for that ORF. For the importance of 1 kb window sizes and other details see Dang et al. (1998) and Bell and Forsdyke (1999).
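The window procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original program: the function names are hypothetical, the 0.1 kb step is taken from the footnote to Table 2, and the test sequence is a toy construct, not a viral genome.

```python
def chargaff_windows(seq, window=1000, step=100):
    """Yield (window centre, A-T, G-C) for each full window along seq."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        yield (start + window // 2,
               w.count("A") - w.count("T"),
               w.count("G") - w.count("C"))


def orf_average(windows, orf_start, orf_end):
    """Average the differences of windows whose centres fall inside an ORF."""
    hits = [(at, gc) for centre, at, gc in windows
            if orf_start <= centre < orf_end]
    if not hits:
        return None  # ORF too short or too close to a sequence end
    n = len(hits)
    return sum(at for at, _ in hits) / n, sum(gc for _, gc in hits) / n


# Toy sequence: 2 kb of purines followed by 2 kb of pyrimidines.
toy = "AAGG" * 500 + "TTCC" * 500
windows = list(chargaff_windows(toy))
print(windows[0])                       # (500, 500, 500)
print(windows[-1])                      # (3500, -500, -500)
print(orf_average(windows, 0, 1000))    # (500.0, 500.0)
```

With 1 kb windows the raw differences are already in bases per kb, so no further scaling is needed before averaging over an ORF.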

    For AT-rich genomes, purine-loading is usually with respect to adenine, whereas for GC-rich genomes, purine-loading is usually with respect to guanine. In thermophilic bacteria, whatever the (G+C)%, purine-loading involves both purines (Lao & Forsdyke, 2000). An organism in which one or both Chargaff differences reflect the purine-loading of a significant excess of mRNAs is held to comply with "Szybalski's transcription direction rule" (Bell & Forsdyke, 1999).


FIG. 1. Distribution of purine-loading among species. The purine-loading of coding regions was calculated from codon usage tables for all species (2958) represented in the August 1999 release of the GenBank database by more than three genes and more than 2500 bases. The purine-loading index (bases/kb) for a particular species was calculated as the sum of 1000[(G-C)/N] and 1000[(A-T)/N], where G, C, A, and T correspond to the numbers of individual bases, and N corresponds to the total number of bases, in the codon usage table. This measure of the purine-loading of RNAs disregards 5' and 3' non-coding sequences, including poly(A) tails.

 

3. Extremes of Positive and Negative Purine-Loading

Indexes of purine-loading calculated from codon usage tables, although not taking into account 5' and 3' non-coding sequences, demonstrate the universality of the purine-loading phenomenon. Fig. 1 shows how purine-loading is distributed across all species for which sequences of more than 3 ORFs and 2500 bases were available. The value for all human genes (excluding mitochondria) is 42 bases/kb, meaning that, on average, there are 42 more purines than pyrimidines for every kilobase of coding sequence. The shoulder with negative purine-loading values (pyrimidine-loading) corresponds mainly to mitochondrial genes, which are disproportionately abundant in GenBank.

FIG. 2. Distribution of purine-loading indices among 494 viruses of eukaryotes, excluding plants. Viruses at the extremes of the distribution are listed in the figure. For example, Brazilian caprine lentivirus (325) and Hepatitis D virus (321) fall in the same interval, and provide a frequency value of 2 at the extreme right of the distribution. Positive indices were observed for murine polyoma virus (32), RSV (54), Vaccinia virus (64), SFV-1 (73), and SV40 (103).

     Fig. 2 shows a subset of these data (494 species), which corresponds to all eukaryotic viruses, with the exclusion of plant viruses. Most viruses have positive purine-loading indices, which are often greater than the average for human ORFs (vertical dashed line). There is a tendency for members of certain viral groups to be either extremely pyrimidine-loaded or extremely purine-loaded. Among retroviruses, at one extreme (highly pyrimidine-loaded) are Human T-cell Leukaemia (Lymphotropic) Virus-1 (HTLV-1) and some similar retroviral species, whereas at the other extreme are Human Immunodeficiency Virus-1 (HIV-1) and some similar retroviral species (highly purine-loaded).

    While the value for the agent of classical acute infectious hepatitis (Hepatitis A virus; 37) is close to the human average (42), that for the agent of classical serum hepatitis (Hepatitis B virus), which produces a chronic infection and requires reverse transcriptase for replication, is highly negative (-127). Hepatitis virus C, which usually produces chronic infections, is negative (-27). Hepatitis virus D, which requires coinfection by Hepatitis B virus in order to be packaged, is very positive (+321). Hepatitis virus E, which causes acute infections in humans and emerges periodically from an unknown source (and hence may be chronic in that source), is very negative (-140).

    Members of the Herpes virus group show less extreme Chargaff differences. The shoulder in Fig. 2 from -50 to -20 includes Herpes simplex virus-1 (HSV-1), Epstein-Barr virus (EBV) and human cytomegalovirus (HCMV), all pyrimidine-loaded. Many other herpes-related viruses show overall purine-loading. The main ascending limb in the range +10 to +40 includes Varicella-Zoster virus (VZV), human herpesvirus 6 (HHV6), Herpes saimiri (HVS) and Ictalurid herpesvirus 1. Human herpesvirus 8 (HHV8; Kaposi sarcoma-associated virus) has a value of -1.

    Several species with negative purine-loading indices produce chronic infections (see Section 8) and have high (G+C)% in ORFs. In general, there is an inverse relationship between species (G+C)% and purine-loading index. Thus, a linear regression plot of average (G+C) percentages calculated from codon usage tables for each viral species used in Fig. 2, relative to the corresponding purine-loading indices, has a downward slope which is significantly different from zero (P<0.0001; data not shown). A more detailed analysis was made of the retroviral and herpes groups.

 

4. Extreme Purine-Loading Indices in Retroviruses

In many genomes transcription directions vary so that, while total base composition of the "top" strand of DNA reflects Chargaff's second parity rule (e.g. G = C), Chargaff differences for leftward transcribing ORFs (e.g. C>G) tend to cancel out the differences for rightward transcribing ORFs (e.g. G>C). Thus, with genome-sized sequence windows, compliance with Szybalski's transcription-direction rule, assessed as Chargaff differences, is not usually evident. However, retroviral genomes are transcribed entirely in one direction (considered by convention to be to the right), and mere examination of total base composition can illustrate some major points.
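The cancellation described above follows from base pairing: an ORF transcribed to the left appears in the top strand as its reverse complement, so both of its Chargaff differences change sign. A minimal sketch, using a hypothetical purine-rich ORF invented for illustration:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def chargaff_differences(seq):
    """Return (A-T, G-C) as absolute base counts."""
    seq = seq.upper()
    return seq.count("A") - seq.count("T"), seq.count("G") - seq.count("C")


# Hypothetical purine-loaded ORF, transcribed to the right.
rightward_orf = "ATGGGAGGAGCAGGAGAAGGATAA"
# The same ORF, if transcribed to the left, as it would read in the top strand.
leftward_in_top = reverse_complement(rightward_orf)

print(chargaff_differences(rightward_orf))                    # (8, 10)
print(chargaff_differences(leftward_in_top))                  # (-8, -10)
print(chargaff_differences(rightward_orf + leftward_in_top))  # (0, 0)
```

A genome-sized window containing similar numbers of rightward and leftward ORFs therefore shows little net skew, even when every mRNA is purine-loaded.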

Table 1. Chargaff differences of retroviral genomes

| Virus [a] | A (W base) | T (W base) | C (S base) | G (S base) | (C+G)% | Chargaff difference [b], W bases (A-T) | Chargaff difference [b], S bases (G-C) | Purine-loading index [c] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SFV-1 | 4195 | 3696 | 2480 | 2601 | 39.2 | A>T, 38.5 | G>C, 9.3 | 48 |
| HIV-1 | 3411 | 2163 | 1772 | 2373 | 42.6 | A>T, 128.4 | G>C, 61.8 | 190 |
| HTLV-1 | 1983 | 1951 | 2932 | 1534 | 53.2 | A>T, 3.8 | C>G, -166.4 | -163 |
| RSV | 2216 | 2035 | 2362 | 2704 | 54.4 | A>T, 19.4 | G>C, 36.7 | 56 |

[a] SFV-1, simian foamy virus type 1 (GenBank accession number X54482); HIV-1, human immunodeficiency virus 1 (K03455); HTLV-1, human T cell leukaemia (lymphotropic) virus 1 (D13784); RSV, Rous sarcoma virus (D10652).

[b] Chargaff differences ("base skews" from equifrequency) are expressed as bases/kb. Values were calculated for each entire genome, which was not split into sub-windows. A, T, C and G refer to the numbers of bases.

[c] The sum of the purine excesses for the W bases and for the S bases, expressed as bases/kb. In the case of HTLV-1 there is a net pyrimidine excess, so the value is negative.

 

    Table 1 compares the "top" strands of four retroviruses whose (G+C) percentages vary from 39.2% (Simian foamy virus-1; SFV-1; Kupiec et al., 1991), to 54.4% (Rous sarcoma virus; RSV; Schwartz et al., 1983). Three of the genomes obey Szybalski's transcription direction rule for rightward transcription (purines>pyrimidines). For AT-rich genomes (SFV-1, HIV-1; Ratner et al., 1985), A>T and G>C with the W bases providing the largest deviation from the parity rule. Similarly, for the GC-rich RSV genome, A>T and G>C, with the S bases providing the largest deviation.

    Calculations for different parts of the sequences (1 kb windows) show that these purine-rich patterns are sustained throughout, particularly in the case of the base pairs with the largest deviations from the parity rule (data not shown). A similar compliance with Szybalski's rule has been noted in the case of some other AT-rich viruses (SV40, polyoma, vaccinia; Smithies et al., 1981; Bell & Forsdyke, 1999), again with the W bases providing the largest deviation. In contrast, HTLV-1 (Malik et al., 1988) is GC-rich, yet C>G. There is only a weak tendency for compliance with Szybalski's rule with respect to the W bases. The C-rich pattern is sustained throughout the genome, affecting all ORFs.

 

TABLE 2  Distribution of pyrimidine- and purine-loading among leftward- and rightward-transcribing ORFs of various herpes-related viruses [a]

| Virus [b] | C+G (%) | Transcription to left, W bases | Transcription to left, S bases | Transcription to right, W bases | Transcription to right, S bases |
| --- | --- | --- | --- | --- | --- |
| HVS | 35.0 | T>A, 22:15 (P=0.020) | C>G, 28:9 (P=0.00016) | A>T, 35:5 (P<0.00003) | C>G, 28:12 (P=0.0035) |
| VZV | 46.1 | T>A, 19:15 (P=0.15) | C>G, 27:7 (P=0.001) | A>T, 24:13 (P=0.02) | C>G, 25:12 (P=0.04) |
| EBV | 60.1 | T>A, 23:21 (P=0.24) | G>C, 30:14 (P=0.00007) | T>A, 49:37 (P=0.08) | C>G, 64:22 (P<0.00001) |
| HSV-1 | 68.3 | A>T, 20:17 (P=0.15) | G>C, 29:8 (P=0.0007) | T>A, 24:17 (P=0.037) | C>G, 32:9 (P=0.00019) |

[a] Each sequence was examined using 1 kb windows moving in steps of 0.1 kb. Values for windows whose centres overlapped an ORF were averaged to obtain a value for that ORF. Each data set shows the relative proportion of ORFs with positive or negative Chargaff differences (i.e. skew such that there is either purine-loading, or pyrimidine-loading of the corresponding mRNAs), and the probability (P) that the asymmetry in numbers of positive and negative ORFs is not significant. The Wilcoxon signed-ranks test performed with Minitab software (Meyer & Krueger, 1994) takes into account the magnitudes of Chargaff differences.

[b] HVS, Herpes saimiri virus (X64346); VZV, Varicella-Zoster virus (X04370); EBV, Epstein-Barr virus (V01555); HSV-1, Herpes simplex virus-1 (X14112).

 

5. Extreme Purine-Loading Indices in Herpes Viruses

Table 2 shows that, like HTLV-1, GC-rich members of the herpes virus family (EBV, HSV-1) do not follow Szybalski's transcription direction rule (i.e. the majority of mRNAs are pyrimidine-loaded). This applies strongly to the S bases (e.g. C>G when transcription is to the right, and G>C when transcription is to the left), but less so to the W bases.

    For most ORFs of AT-rich members of the Herpes virus family (HVS, VZV), A>T when transcription is to the right (i.e. they follow the transcription-direction rule), but the S bases do not follow the rule (C>G when transcription is to the right). When transcription is to the left, both the W and the S bases follow Szybalski's rule (pyrimidines>purines), but this is most evident in the case of the S bases. On balance, like AT-rich retroviruses (Table 1), the AT-rich herpes viruses seem to follow the rule. 

TABLE 3  Distribution of Chargaff differences [a] among ORFs of Herpes-related viruses

| Virus [b] | ORFs | Left: all | Left: A>T | Left: T>A | Left: P [c] | Left: Rule [d] | Right: all | Right: A>T | Right: T>A | Right: P | Right: Rule |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HVS 35% | All | 37 | 15 | 22 | 0.02 | + | 40 | 35 | 5 | <0.0003 | + |
| | G>C | 9 | 1 GA | 8 GT | 0.006 | + | 12 | 9 GA | 3 GT | 0.054 | + |
| | C>G | 28 | 14 CA | 14 CT | 0.21 | - | 28 | 26 CA | 2 CT | <0.001 | + |
| | P | 0.0002 | 0.001 | 0.037 | | | 0.003 | 0.001 | 0.39 | | |
| | Rule | + | + | + | | | - | - | + | | |
| HHV6 42% | All | 70 | 38 | 32 | 0.23 | - | 49 | 26 | 23 | 0.45 | + |
| | G>C | 28 | 3 GA | 25 GT | <0.001 | + | 28 | 8 GA | 20 GT | 0.045 | - |
| | C>G | 42 | 35 CA | 7 CT | <0.001 | - | 21 | 18 CA | 3 CT | <0.001 | + |
| | P | 0.015 | <0.001 | <0.001 | | | 0.47 | 0.008 | 0.008 | | |
| | Rule | + | + | - | | | + | - | + | | |
| VZV 46% | All | 34 | 15 | 19 | 0.151 | + | 37 | 24 | 13 | 0.022 | + |
| | G>C | 7 | 2 GA | 5 GT | 0.045 | + | 12 | 9 GA | 3 GT | 0.063 | + |
| | C>G | 27 | 13 CA | 14 CT | 0.447 | + | 25 | 15 CA | 10 CT | 0.089 | + |
| | P | 0.001 | 0.002 | 0.071 | | | 0.038 | 0.124 | 0.054 | | |
| | Rule | + | + | + | | | - | - | - | | |
| HCMV 57% | All | 126 | 62 | 64 | 0.104 | + | 82 | 44 | 38 | 0.405 | + |
| | G>C | 80 | 35 GA | 45 GT | 0.381 | + | 40 | 15 GA | 25 GT | 0.015 | - |
| | C>G | 46 | 27 CA | 19 CT | 0.037 | - | 42 | 29 CA | 13 CT | 0.002 | + |
| | P | 0.0008 | 0.22 | <0.001 | | | 0.111 | 0.001 | 0.015 | | |
| | Rule | - | - | - | | | - | - | + | | |
| EBV 60% | All | 44 | 21 | 23 | 0.24 | + | 86 | 37 | 49 | 0.081 | - |
| | G>C | 30 | 10 GA | 20 GT | 0.015 | + | 22 | 10 GA | 12 GT | 0.425 | - |
| | C>G | 14 | 11 CA | 3 CT | 0.008 | - | 64 | 27 CA | 37 CT | 0.066 | - |
| | P | 0.00007 | 0.105 | <0.001 | | | <0.00001 | <0.001 | <0.001 | | |
| | Rule | - | + | - | | | - | - | - | | |
| HSV-1 68% | All | 37 | 20 | 17 | 0.15 | - | 41 | 17 | 24 | 0.037 | - |
| | G>C | 29 | 15 GA | 14 GT | 0.80 | - | 9 | 5 GA | 4 GT | 0.594 | + |
| | C>G | 8 | 5 CA | 3 CT | 0.69 | - | 32 | 12 CA | 20 CT | 0.014 | - |
| | P | 0.0007 | 0.002 | 0.008 | | | 0.0002 | 0.017 | 0.002 | | |
| | Rule | - | - | - | | | - | - | - | | |

[a] Chargaff differences were calculated as in Table 2. Numbers followed by a group label (GA, GT, CA or CT) are ORFs in that category. For example, of 37 HVS ORFs transcribed to the left, a minority (15) have A>T. Of this 15, one has G>C and 14 have C>G. Thus, of the 37 ORFs, 1 is in the GA group, 8 are in the GT group, 14 are in the CA group, and 14 are in the CT group.

[b] Abbreviated names of viruses with their (G+C)%. HHV6, human herpesvirus 6 (X83413); HCMV, human cytomegalovirus (X17403).

[c] Probability (P) values at the bottom of columns refer to the proportions of ORFs with G-excess over C, relative to ORFs with C-excess over G. Probability (P) values at the right of the rows refer to the proportions of ORFs with A-excess over T, relative to ORFs with T-excess over A. The significance of departures from equifrequency was calculated using the Wilcoxon signed-ranks test.

[d] "+" refers to compliance with Szybalski's rule (e.g. excess purines when transcription is to the right). "-" refers to a deviation from Szybalski's rule (e.g. excess pyrimidines when transcription is to the right). Designation of "+" or "-" is based on the sum of Chargaff difference values, which closely corresponds to the relative numbers of "+" and "-" ORFs.

    From Table 2 it is evident that an individual ORF may be enriched for one of the W bases and one of the S bases, with four possible combinations (GA, GT, CA, CT). Table 3 shows that the ORFs of herpes viruses are distributed over all four groups, but with significant biases (number quartets labelled by group). Rightward-transcribed ORFs which do not follow Szybalski's rule with respect to both the W and the S bases would be in the CT group. This group dominates in the cases of the two viruses of highest (G+C)% (EBV and HSV-1). Rightward-transcribed ORFs which follow Szybalski's rule with respect to both the W and the S bases would be in the GA group. This group is poorly represented in EBV and HSV-1. For EBV there are 37 rightward ORFs in the CT group, and only 10 in the GA group. Since assignment of functions to ORFs is not yet complete, whether the group biases relate to function remains for future study. That there can be a relationship is suggested by an ORF in the GA group, which encodes the EBNA-1 latency protein.
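The four-group assignment can be sketched as below. The function names are hypothetical, and the handling of an exact tie (a difference of zero is scored as a pyrimidine excess) is an arbitrary assumption made only to keep the sketch short. The EBV counts are the rightward-transcribed quartet from Table 3.

```python
def orf_group(a_minus_t, g_minus_c):
    """Name the group by the enriched S base, then the enriched W base."""
    s_base = "G" if g_minus_c > 0 else "C"  # assumption: ties count as C
    w_base = "A" if a_minus_t > 0 else "T"  # assumption: ties count as T
    return s_base + w_base


def follows_rule(group, direction):
    """Return (S bases comply, W bases comply) for top-strand differences.

    Rightward ORFs comply when purines are in excess in the top strand;
    leftward ORFs comply when pyrimidines are in excess in the top strand.
    """
    wanted_s, wanted_w = {"right": ("G", "A"), "left": ("C", "T")}[direction]
    return group[0] == wanted_s, group[1] == wanted_w


print(orf_group(a_minus_t=5, g_minus_c=12))  # GA
print(follows_rule("GA", "right"))           # (True, True)
print(follows_rule("CT", "right"))           # (False, False)

# EBV rightward-transcribed ORFs by group, as listed in Table 3.
ebv_right = {"GA": 10, "GT": 12, "CA": 27, "CT": 37}
non_compliant = sum(n for g, n in ebv_right.items()
                    if follows_rule(g, "right") == (False, False))
print(non_compliant, "of", sum(ebv_right.values()))  # 37 of 86
```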


FIG. 3. Chargaff difference analysis of a section of the Epstein-Barr virus genome. ORFs are shown as open boxes with arrows indicating transcription direction. A, T, G, and C are the number of bases counted in 1 kb windows which were moved along the sequence in 0.1 kb steps. Each data point corresponds to the middle of a window. Chargaff differences (%) are expressed either as G - C (red squares), or as A - T (yellow circles). The major ORF encoding Epstein-Barr nuclear antigen-1 (EBNA-1) is labelled.

 


6. Gene Encoding the Major Latency Transcript Obeys the Rule

Fig. 3 shows Chargaff difference analysis of the section of the GC-rich Epstein-Barr virus (EBV) genome from which a major latency-associated transcript (encoding EBNA-1 protein) is derived. Whereas, like the majority of EBV genes, most neighbouring genes are pyrimidine-loaded (C>G when transcription is to the right; G>C when transcription is to the left), the rightward-transcribed gene encoding EBNA-1 protein follows Szybalski's rule (G>C; A>T), and very dramatically so.

    The EBNA-1-encoding gene is exceptional. It is the only viral gene expressed in the most fundamental type of EBV latency (the "EBNA-1 only program"; Thorley-Lawson et al., 1996). Among the other EBV latency-associated genes, those encoding EBNAs 2-6 purine-load their mRNAs only with respect to the W bases, and much less dramatically than the gene encoding EBNA-1; those encoding LMP1 and LMP2 pyrimidine-load their mRNAs with respect to both the W bases and the S bases (data not shown).

 

7. Simple Sequence Repeats Reinforce Compliance with Szybalski's Rule

It appears that, unlike most other EBV genes, the rightward-transcribed gene encoding EBNA-1 has been under pressure to accept mutations which increase the purine content of the top (mRNA-synonymous) strand. If this were not possible without disrupting protein functional domains, the gene might have locally increased its content of purine-rich codons in inter-domain regions. Indeed, the EBNA-1 gene has a long "simple sequence" region (Karlin et al., 1988) containing exclusively either glycine codons (GGN), or alanine codons (GCN). Table 4 shows that choices of third bases (N) in these codons are almost exclusively purines (Karlin et al., 1990). Although the EBNA-1 gene (BKRF1) without the simple sequence is already slightly purine-loaded, the additional purines in the simple sequence greatly increase Chargaff difference values in favour of Szybalski's rule (Table 5).
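The third-base preference, and its effect on the Chargaff differences, can be checked from the codon counts alone. The sketch below uses the counts listed in Table 4 for the EBNA-1 simple sequence and the complete ORF length of 642 codons from Table 5. Scaling by the length of the complete ORF, rather than by the length of the repeat, is our reading of how the "simple sequence alone" values in Table 5 are expressed, and should be treated as an assumption; the function names are hypothetical.

```python
# Codon counts for the EBNA-1 (BKRF1) simple sequence alone, from Table 4.
ebna1_simple = {"GGG": 52, "GGA": 101, "GGT": 1, "GGC": 0,
                "GCG": 1, "GCA": 83, "GCT": 0, "GCC": 0}


def third_base_purine_fraction(codon_counts):
    """Fraction of codons whose third base is A or G."""
    total = sum(codon_counts.values())
    purine = sum(n for codon, n in codon_counts.items() if codon[2] in "AG")
    return purine / total


def base_excesses(codon_counts):
    """Return (A-T, G-C) summed over every base of every codon."""
    bases = "".join(codon * n for codon, n in codon_counts.items())
    return (bases.count("A") - bases.count("T"),
            bases.count("G") - bases.count("C"))


orf_bases = 3 * 642  # complete BKRF1 ORF length in bases (Table 5)
a_minus_t, g_minus_c = base_excesses(ebna1_simple)

print(sum(ebna1_simple.values()))                          # 238
print(round(third_base_purine_fraction(ebna1_simple), 3))  # 0.996
print(a_minus_t, g_minus_c)                                # 183 361
print(round(1000 * a_minus_t / orf_bases, 1))              # 95.0
print(round(1000 * g_minus_c / orf_bases, 1))              # 187.4
```

Under that assumption the two scaled values match the A - T and G - C entries for the BKRF1 simple sequence alone in Table 5.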

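TABLE 4 Codon usages [a] of genes containing long simple sequences in Herpes simplex-related viruses

| Virus | Gene | Amino acid [b] | Codon | Complete protein | Less the simple sequence | Simple sequence alone | Human average [c] (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| EBV | BKRF1 (EBNA-1) | Gly | GGG | 63 | 11 | 52 | 24 |
| | | | GGA | 144 | 43 | 101 | 25 |
| | | | GGT | 25 | 24 | 1 | 16 |
| | | | GGC | 19 | 19 | 0 | 34 |
| | | Ala | GCG | 4 | 3 | 1 | 10 |
| | | | GCA | 85 | 2 | 83 | 23 |
| | | | GCT | 6 | 6 | 0 | 26 |
| | | | GCC | 8 | 8 | 0 | 40 |
| HVS | ORF 48 | Gly | GGG | 34 | 2 | 32 | 24 |
| | | | GGA | 55 | 7 | 48 | 25 |
| | | | GGT | 1 | 1 | 0 | 16 |
| | | | GGC | 6 | 6 | 0 | 34 |
| | | Glu | GAG | 92 | 10 | 82 | 58 |
| | | | GAA | 75 | 29 | 46 | 42 |
| | ORF 73 | Gly | GGG | 2 | 2 | 0 | 24 |
| | | | GGA | 15 | 7 | 8 | 25 |
| | | | GGT | 1 | 1 | 0 | 16 |
| | | | GGC | 1 | 1 | 0 | 34 |
| | | Glu | GAG | 2 | 1 | 1 | 58 |
| | | | GAA | 128 | 3 | 125 | 42 |
| | | Arg | AGG | 2 | 2 | 0 | 20 |
| | | | AGA | 17 | 11 | 6 | 20 |
| | | | CGG | 1 | 1 | 0 | 21 |
| | | | CGA | 1 | 1 | 0 | 11 |
| | | | CGT | 4 | 3 | 1 | 8 |
| | | | CGC | 1 | 1 | 0 | 19 |
| | | Ala | GCG | 2 | 2 | 0 | 10 |
| | | | GCA | 8 | 6 | 2 | 23 |
| | | | GCT | 43 | 6 | 37 [d] | 26 |
| | | | GCC | 0 | 0 | 0 | 40 |
| VZV | ORF 11 | Gly | GGG | 6 | 6 | 0 | 24 |
| | | | GGA | 28 | 16 | 12 | 25 |
| | | | GGT | 11 | 11 | 0 | 16 |
| | | | GGC | 5 | 5 | 0 | 34 |
| | | Glu | GAG | 58 | 14 | 44 | 58 |
| | | | GAA | 30 | 30 | 0 | 42 |
| | | Asp | GAT | 20 | 19 | 1 | 47 |
| | | | GAC | 35 | 16 | 19 | 53 |
| | | Ala | GCG | 31 | 11 | 20 | 10 |
| | | | GCA | 19 | 19 | 0 | 23 |
| | | | GCT | 13 | 13 | 0 | 26 |
| | | | GCC | 13 | 13 | 0 | 40 |

[a] Values are the absolute number of codons in each protein segment.

[b] The main amino acids contributing to each simple sequence are listed (e.g. the EBNA-1 protein simple sequence has alternating glycines and alanines).

[c] Data are from 17625 human genes (Nakamura et al., 1999). Percentage distributions within each codon family are shown.

[d] Pyrimidine-loading by virtue of this alanine codon is more than offset by the large excess of glutamate codons in ORF 73.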

 

TABLE 5 Contribution of long simple sequences to Chargaff differences [a] of ORFs of Herpes simplex-related viruses

| Virus | Gene | CpG island (local) [b] | ORF length in codons: complete ORF | ORF length in codons: simple sequence | A - T: complete ORF | A - T: less the simple sequence | A - T: simple sequence alone | G - C: complete ORF | G - C: less the simple sequence | G - C: simple sequence alone |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EBV | BKRF1 | - | 642 | 238 | A>T (141.2) | A>T (46.2) | A>T (95.0) | G>C (270.0) | G>C (86.2) | G>C (187.4) |
| HVS | ORF 48 | - | 797 | 301 | A>T (135.9) | A>T (31.8) | A>T (104.1) | G>C (225.0) | G>C (25.9) | G>C (199.1) |
| | ORF 73 | - | 407 | 183 | A>T (270.3) | A>T (73.7) | A>T (196.6) | G>C (110.5) | C>G (-11.5) | G>C (122.0) |
| VZV | ORF 11 | + | 819 | 102 | A>T (11.0) | T>A (-18.7) | A>T (29.7) | G>C (57.8) | G>C (5.3) | G>C (52.5) |
| HHV6 | LJ1 | + | 321 | 117 | T>A (-140.2) | T>A (-77.9) | T>A (-62.3) | G>C (145.4) | C>G (-36.3) | G>C (181.7) |

[a] Chargaff differences (bases/kb) were calculated directly from the ORFs (and not by summing windows within the ORF).

[b] CpG islands are arbitrarily defined as >80 CpG dinucleotides/1 kb sequence window. The presence of a CpG island in regulatory regions suggests activity during viral latency. Absence of a local CpG island may mean that the promoter operating during latency is distant from the ORF (Tao et al., 1998). HSV-1 is not generally CpG depleted and has no ORF with a simple sequence >70 codons.

   In several other members of the Herpes virus family there are similar purine-biases in long (>100 amino acids) simple sequence-encoding regions within genes which may be latency-associated (Tables 4, 5). These include ORF 48 of HVS (T cell tropic), which is located in the HVS genome in a similar position to the EBNA-1-encoding gene in the genome of EBV (B cell tropic).

    The ORF encoding the latency-associated nuclear antigen (LANA) of HHV8 also contains a long simple sequence repeat (Rainbow et al., 1997). Fig. 4 shows Chargaff differences in the region of the ORF. Being leftward-transcribed, it obeys Szybalski's rule in having an excess of pyrimidines in the top strand (C>G; T>A), whereas most neighbouring genes, whatever their transcription direction, disobey the rule. Thus, there is purine-loading of LANA mRNA which is reflected in codon choice (data not shown).

    Human herpes virus 6 (GenBank accession number X83413) has a 117 codon simple sequence in ORF LJ1, which displays purine-loading with respect to the S bases, and has a CpG island; such association with CpG islands (Table 5) is an expected feature of the promoter regions of latency-associated genes with hypomethylated CpGs (Honess et al., 1989; Tao et al., 1998).

    Intriguingly, there is a long region of repetitive DNA in an ORF of unknown function (ORF 50) in Ictalurid Herpesvirus 1 (channel catfish virus; GenBank M75136); here again, codon-choice (with respect to glycine, valine and alanine) suggests purine-loading (data not shown).


FIG. 4. Chargaff difference analysis of a section of the genome of Kaposi's sarcoma associated herpesvirus (HHV8). ORFs are numbered. Those with the prefix "K" may be unique to this virus. The major latency-associated nuclear antigen (LANA) is encoded by ORF 73. Note that, since transcription is to the left, the upper (template) strand of ORF 73 is pyrimidine-loaded, so that the lower (mRNA-synonymous) strand, and hence the corresponding mRNA, would be purine-loaded. Other details are as in Figure 3.

 

8. Disruption of Host Traffic

Viruses can have acute or chronic (persistent) patterns of host infection (Villarreal et al., 2000). Certain acutely lytic viruses (e.g. Hepatitis A, Vaccinia) purine-load their RNAs in compliance with Szybalski's rule, whereas viruses causing chronic (sometimes sub-clinical) infections tend to pyrimidine-load their RNAs (e.g. Hepatitis B; Bell & Forsdyke, 1999). Prolonged and profound clinical latency is a characteristic of some viruses that pyrimidine-load (Tables 1-3; Figure 3). In contrast to individuals latently infected with HIV-1, most individuals infected with HTLV-1 remain asymptomatic and live normal lives. Cytotoxic T cells appear able to target only peptides from the Tax protein (Gould & Bangham, 1998). Furthermore, HTLV-1 is likely to transfer between individuals when integrated in host DNA within intact cells. Virions alone show low infectivity. Similarly, Herpes simplex-related viruses permanently infect many individuals in their host species, who are often asymptomatic (Baer et al., 1984; Davison & Scott, 1986; McGeoch et al., 1988; Albrecht et al., 1992).

    Thus, certain persistent GC-rich viruses appear to risk interactions with host RNAs, which would be initiated through complementary base pairing between loops. In vivo, C-rich loops of virus RNAs would interact with G-rich loops of host RNAs (just as, in vitro, poly(rC) interacts rapidly at low temperatures with mRNA-synonymous DNA strands; Szybalski et al., 1966). When in a functionally latent state most virus mRNAs would not be transcribed; thus the risk would be minimized. When triggered to move from the latent state to one of rapid productive cytolysis, the viruses would transcribe RNAs which, when released from the nucleus, might suddenly flood the cytosol with RNAs "driving on the wrong side of the road". The multiplicity of distracting loop-loop RNA interactions might slow host cell "traffic" and impair defence responses, including those triggered by any dsRNA which was formed. This "surprise" strategy might be of adaptive value to the virus.

 

9. The Role of Simple-Sequence Repeats in Latency-Associated Gene Products

In the functionally latent state most viral mRNAs would not be expressed, and so would not be available to interact with host mRNAs. However, although HTLV-1 has no latency-specific transcripts, most herpesviruses do. Remarkably, these herpesvirus transcripts often include purine-biased simple sequence elements (see Section 7).

    Like those of other members of the herpes virus family, the EBV genome is very compact with little intergenic DNA; this suggests an evolutionary selection pressure to eliminate non-functional sequences. The long simple sequence (encoding Gly-Ala repeats) in the EBNA-1 gene has been explained by Karlin et al. (1988, 1990; Karlin, 1995) as an adaptation operative at the protein level. However, the Gly-Ala region can be removed experimentally without affecting the known functions of EBNA-1 (Yates & Camiolo, 1988; Summers et al., 1997).

    The paradox appeared to be resolved by evidence that the Gly-Ala region functions in cis at the protein level to inhibit antigen processing for MHC presentation (Levitskaya et al., 1995, 1997; Mukherjee et al., 1998). However, in order to express the protein these authors used a vector which first had to express the corresponding mRNA; this was then translated into the protein. Their evidence is consistent with the Gly-Ala region being simply a device for purine-loading a foreign mRNA ("non-self") to make it appear like host mRNA ("self"); this might subvert intracellular self/not-self discrimination (Forsdyke, 1994; 1995a,b; 1999).

    If selection had been acting at the protein level to conserve the Gly-Ala region there should not have been such extreme codon bias (Table 4). On the other hand, the bias might have arisen by the amplification of an initially small Gly-Ala-coding segment, which just happened to have the purine bias. However, we note that in several other members of the Herpes virus family there are similar purine-biases in long (>100 amino acids) simple sequence-encoding regions within genes likely to be latency-associated (see Section 7). These biases might also be a consequence of a selection pressure for purine-loading of mRNA to assist maintenance of latency. For example, the LANA protein of HHV8 stabilizes latency by preventing p53-mediated apoptosis (Friborg et al., 1999).
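    The codon-bias argument can be made concrete with a small sketch. In the standard genetic code glycine is encoded by GGN and alanine by GCN, so the same Gly-Ala repeat can be written with very different purine contents. The two encodings below are chosen purely for illustration and are not taken from Table 4; the function names are hypothetical.

```python
PURINES = set("AG")

# Subset of the standard genetic code needed for this illustration
CODON_TABLE = {"GGA": "G", "GGC": "G", "GCA": "A", "GCC": "A"}


def translate(mrna: str) -> str:
    """Translate an mRNA-synonymous sequence built from the codons above."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna), 3))


def purine_excess(mrna: str) -> int:
    """Number of purines minus number of pyrimidines in the sequence."""
    purines = sum(base in PURINES for base in mrna.upper())
    return purines - (len(mrna) - purines)


# Two synonymous encodings of ten Gly-Ala units (illustrative only)
purine_rich = "GGAGCA" * 10
purine_poor = "GGCGCC" * 10

print(translate(purine_rich) == translate(purine_poor))  # same peptide
print(purine_excess(purine_rich), purine_excess(purine_poor))
```

    Both sequences specify the same peptide, so selection acting only on the protein would not be expected to favour one over the other; a consistent preference for the purine-rich form is what points to selection at the nucleic acid level.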

 

10. Messenger RNAs as "Antibodies"

So why is EBNA-1 mRNA purine-loaded? Distracted by the messenger role of mRNA molecules, we may fail to note that the diverse spectrum of cell mRNA species, like the diverse spectrum of antibodies in serum, constitutes a repertoire of specificities with the potential to react with complementary sections of non-self RNA "antigens".

    Just as interactions between antibody and foreign antigen provoke extracellular inflammatory responses to the antigen, so interactions between host RNA and foreign RNA might provoke intracellular responses to foreign RNA, which could include gene silencing. If EBNA-1 mRNA ("sense") in latent EBV-infected cells were not purine-loaded to avoid "kissing" interactions, then it is possible that a self RNA species would have a sufficient degree of complementarity ("antisense") to progress beyond kissing interactions. Molecules of dsRNA of a length sufficient to alert host defence systems might then be formed (Suzuki et al., 1999). The alarm might serve to increase MHC protein expression since only newly synthesized MHC proteins bind peptides efficiently for presentation to T cells (Townsend et al., 1990). Thus, through its purine-loading of EBNA-1 mRNA ("stealth" strategy; Cristillo et al., 1996), EBV would fail to provoke gene silencing or increase MHC expression. This would impede the MHC-dependent cytotoxic T cell response (Callan et al., 1998), and so assist maintenance of the latent state.

    There are reports of natural antisense RNAs derived from overlapping genes with different transcriptional orientations (Vanhee-Brossollet & Vaquero, 1998). In the light of the present thesis, such transcripts should not normally coexist in the same cell or intracellular compartment, or should be special cases for which there are adaptations to prevent the inadvertent firing, or response to, dsRNA alarms.

 

11. Charge cluster domains decrease immunogenicity of other domains?

The use of simple sequence to purine-load mRNAs means that, at the protein level, the simple sequences often contain runs of charged amino acids (e.g. Glu, Asp). Karlin (1995) refers to such regions as "hyper-charge runs", and notes that "for most of the hypercharge runs [in proteins] there is considerable variation in codon usage, which suggests an important function for these charge runs" (our parentheses and italics). However, our studies show that the variation is restricted to purine-rich codons (Table 4), which is more consistent with selection acting at the nucleic acid level.

    Many of the codons characteristic of the triplet expansion diseases, some of which generate charge runs, are also purine-rich (Green & Wang, 1994; Hancock & Santibanez-Koref, 1998). We suggest that charge runs themselves may not have an important function with respect to the function of the end-product (protein) of the ORF in which they locate (although they may affect protein solubility). When attempting to relate a protein's sequence to its biological function, the possibility that the major selective pressure has been at the nucleic acid level must be considered (Ball, 1973; Rocha et al., 1999; Lao & Forsdyke, 2000). As a result, a protein of less than optimum function may be synthesized, or the protein sequence may have to counter-adapt to improve function in the face of a primary selection pressure operating at the level of the corresponding nucleic acid (Forsdyke, 1996; Forsdyke & Mortimer, 2000).

    For example, to counter a tendency of its protein product to provoke autoimmune attack by cytotoxic T cells, there would be a selection pressure for a gene to purine-load its mRNAs, thus generating long charge-rich alpha-helices which might be irrelevant to the function of the protein itself (Dohlman et al., 1993). In this respect, we note the prevalence of charge clusters in antigens implicated in various autoimmune diseases. The clusters do not coincide with major autoantigenic epitopes (Brendel et al., 1991). This suggests that charge cluster domains may not be the primary cause of the diseases, as has been supposed, but may have evolved in response to the disease-provoking characteristics of other domains (i.e. the domains corresponding to autoantigenic epitopes).

    Unlike EBV, HSV-1 does not show CpG suppression (indicating no general methylation of CpGs), and there are no long simple sequence regions. However, HSV-1 would be expected to have pyrimidine-rich loops in most mRNAs (Tables 2, 3). Intriguingly, the main HSV-1 RNAs transcribed during latency correspond to the "antisense" strand (Croen et al., 1987; Stevens et al., 1987). We predict that any parts of these latency-associated transcripts which persist in the cell would be relatively purine-rich (Goldenberg et al., 1997; Arthur et al., 1998). A similar prediction applies to certain antisense transcripts found in EBV latency (Karran et al., 1992; Brooks et al., 1993).


FIG. 5. Fold analysis of a segment of the Epstein-Barr virus genome containing the EBNA-1-encoding gene (labelled box with horizontal arrow indicating transcription to the right). The region of the Gly-Ala repeat is marked by shading and two vertical dashed lines. FONS, values for the folding of the natural sequence. High negative FONS values (e.g. –40 kcal/mol) correspond to high folding potential (stem-loop potential; Forsdyke, 1998). FORS-M, values for the base composition-dependent component of the FONS values. FORS-D (upper plot with standard errors of the mean), values for the base order-dependent component of the FONS values (such that FONS = FORS-M + FORS-D).

 

12. Low Stem-Loop Potential of the Gly-Ala Repeat-Encoding Region

Simple sequence elements in proteins, as found in the trinucleotide expansion diseases (e.g. polyglutamine tracts encoded by poly(CAG)), sometimes cause intracellular protein aggregation. It is possible that such aggregates are responsible for the underlying pathology (Hancock & Santibanez-Koref, 1998; Forsdyke, 2000). However, trinucleotide repeats are often capable of adopting stem-loop conformations, which at the RNA level can activate PKR. This may contribute to the disease mechanism (Tian et al., 2000). In the case of the gene encoding EBNA-1, the region of the repeat has very low stem-loop potential, as revealed by a sustained low negative value for the folding of the natural sequence (FONS value for the mRNA-synonymous strand; Fig. 5). This is contributed to both by the base composition of the repeat (FORS-M value), and by its base order (FORS-D value; Forsdyke, 1998). In the corresponding mRNA, most parts encoding known functions of the protein may adopt compact secondary structures, whereas the part encoding the simple sequence repeat and the region on its immediate 3' side may have a structure more available for intermolecular interactions.
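    The decomposition FONS = FORS-M + FORS-D described in the legend to Fig. 5 can be sketched as follows. The sketch assumes that FORS-M is estimated as the mean folding value of shuffled versions of a window (same base composition, randomized order) and that FORS-D is the remainder. The energy function here is a deliberately crude placeholder that scores a single hairpin; it is not a thermodynamic model, and a real analysis would substitute an RNA folding program. All names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Tuple

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def toy_fold_energy(seq: str) -> float:
    """Placeholder 'energy': -1 for each Watson-Crick pair formed when the
    window folds back on itself as one hairpin. NOT a real folding model."""
    half = len(seq) // 2
    return -float(sum((seq[i], seq[-1 - i]) in WATSON_CRICK for i in range(half)))


def fold_components(
    seq: str,
    fold_energy: Callable[[str], float] = toy_fold_energy,
    n_shuffles: int = 10,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (FONS, FORS-M, FORS-D) for one sequence window."""
    rng = random.Random(seed)
    fons = fold_energy(seq)  # folding of the natural sequence
    shuffled_values = []
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)  # keeps base composition, destroys base order
        shuffled_values.append(fold_energy("".join(bases)))
    fors_m = mean(shuffled_values)  # base composition-dependent component
    fors_d = fons - fors_m          # base order-dependent component
    return fons, fors_m, fors_d


fons, fors_m, fors_d = fold_components("GGGGAAAACCCC")
print(fons, fors_m, fors_d)
```

    A window whose natural order folds better than its shuffled versions gives a negative FORS-D; a region such as the Gly-Ala repeat, with low stem-loop potential from both composition and order, would be expected to show FONS, FORS-M and FORS-D values all close to zero or positive.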

 

13. A Role for Non-Genic DNA?

The threshold for binding to PKR is approximately 15 trinucleotide repeats (45 bases), which corresponds to a dsRNA segment of approximately 22 bases (Tian et al., 2000). There are 4^22 possible combinations in the universe of 22 base sequences, of which half (4^22/2) complements the other half. A virus encoding 10 mRNAs of average length 1021 bases, would have 10000 (approx. 4^7) "windows" of 22 bases, any one of which could potentially act as an RNA "antigenic determinant" in the host cell. Assuming 10000 different host mRNA species in a cell, there would be 10000 x 1000 (approx. 4^12) potential complementary RNA "windows" in host mRNAs. With a much higher mutation rate, a virus species might appear capable of adapting to ensure that its 4^7 specificities did not complement the host's 4^12 specificities. Various factors militate against this.
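    The window arithmetic in this paragraph can be checked with a few lines. The sketch assumes that a transcript of length L contains L - 22 + 1 overlapping windows of 22 bases, which reproduces the figure of 1000 windows for a 1021-base mRNA, and that host mRNAs are given the same 1000 windows each. Variable and function names are hypothetical.

```python
import math

WINDOW = 22  # approximate dsRNA length needed to bind PKR (bases)


def windows_per_transcript(length: int, window: int = WINDOW) -> int:
    """Number of overlapping windows of the given size in one transcript."""
    return max(length - window + 1, 0)


def power_of_four(n: int) -> float:
    """Exponent x such that 4**x equals n."""
    return math.log(n, 4)


all_22mers = 4 ** WINDOW                             # universe of 22-base sequences
virus_windows = 10 * windows_per_transcript(1021)    # 10 viral mRNAs
host_windows = 10_000 * 1000                         # 10000 host mRNA species
genome_limit = 4 ** 16                               # approx. human genome size

print(all_22mers)
print(virus_windows, round(power_of_four(virus_windows), 2))
print(host_windows, round(power_of_four(host_windows), 2))
print(genome_limit)
```

    The exponents come out slightly below 7 and 12, so the powers of four quoted in the text are round-number approximations; either way the host repertoire is a vanishingly small fraction of the 4^22 possible 22-base sequences.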

    First, a high degree of polymorphism among host transcripts (Sunyaev et al., 2000) would make it likely that what a member of a virus species "learned" (through mutation) on one member of its host species, would not be applicable to the next member of the host species which it encountered (Forsdyke, 1991; Forsdyke, 2000, 2001). Second, due to a low level of read-through transcription (failure of transcription termination) of host mRNAs, a low level of transcription of extragenic DNA occurs (Heximer et al., 1998). Thus, the maximum potential repertoire of "RNA antibodies" would be limited only by genome size (approx. 4^16 potential specificities in humans). Indeed, one function of the promoters of repetitive DNA elements (e.g. human Alu elements) might be to provide such read-through transcription, as has been observed (Manley & Colozzo, 1982; Feuchter et al., 1992). It would be of adaptive advantage to the host to activate such promoters under conditions of cell stress (heat shock or viral infection); again, this is observed (transcription by RNA polymerase III; Jang & Latchman, 1989; Liu et al., 1995).

    Thus, non-genic "junk" DNA (Dang et al., 1998) can be viewed in much the same way as we view the diverse genes encoding the variable regions of immunoglobulin antibodies. Just as B-cells capable of synthesizing a unique anti-self antibody would be eliminated during somatic time to prevent self-reactivity, so junk DNA would be screened over evolutionary time (by positive selection of individuals in which favourable mutations had been collected together by recombination) to decrease the probability of two complementary "self" transcripts interacting to form dsRNA segments of more than 21 bases. High polymorphism of non-genic DNA (Beck et al., 1996; Nickerson et al., 1998) would make it difficult for viruses to anticipate the RNA "antibody" repertoire of future hosts (Forsdyke, 1999, 2000, 2001).
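    The screening idea can be sketched as a search for windows in one RNA whose reverse complement occurs in another. This is only an illustration of the concept, assuming perfect Watson-Crick complementarity over a fixed window (22 bases by default) and ignoring G-U pairs, mismatches and secondary structure. The function names and the short test sequences are hypothetical.

```python
# RNA complement table (upper-case bases only)
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary RNA, read 5' to 3'."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def complementary_windows(virus_rna: str, host_rna: str, k: int = 22) -> list:
    """Start positions in virus_rna of k-base windows that could pair
    perfectly with some k-base window of host_rna."""
    host_kmers = {host_rna[i:i + k] for i in range(len(host_rna) - k + 1)}
    return [
        i
        for i in range(len(virus_rna) - k + 1)
        if reverse_complement(virus_rna[i:i + k]) in host_kmers
    ]


# Toy sequences with a short window so that the result is easy to follow
virus = "ACGGCUUCAG"
host = "UUGAAGCCUU"
print(complementary_windows(virus, host, k=6))   # windows able to pair
print(complementary_windows(virus, host))        # none at the 22-base threshold
```

    Applied between "self" transcripts, an empty result at the chosen window size corresponds to the outcome that evolutionary screening of junk DNA is proposed to favour; applied between virus and host transcripts, any hit marks a potential RNA "antigenic determinant".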

   We thank J. Gerlach for assistance with computer configuration, J. T. Smith for statistical advice, and G. McPherson for assistance with the Silicon Graphics Computer maintained by Base4 BioInformatics Inc., Mississauga. The National Research Council of Canada, Academic Press, Cold Spring Harbor Laboratory Press, and Elsevier Publishing Corporation gave permission for the display of full-text versions of some of the cited references at our internet site.

REFERENCES

ALBRECHT, J. C., NICHOLAS, J., BILLER, D., CAMERON, K. R., BIESINGER, B., NEWMAN, C., WITTMANN, S., CRAXTON, M. A., COLEMAN, H., FLECKENSTEIN, B. & HONESS, R. W. (1992). Primary structure of the Herpes saimiri genome. J. Virol. 66, 5047-5058.

ARTHUR, J. L., EVERETT, R., BRIERLEY, I. & EFSTATHIOU, S. (1998). Disruption of the 5' and 3' splice sites flanking the major latency-associated transcripts of herpes simplex virus type 1: evidence for alternative splicing in lytic and latent infections. J. Gen. Virol. 79, 107-116.

BAER, R. et al. (1984). DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310, 207-211.

BALL, L. A. (1973). Secondary structure and coding potential of the coat protein gene of bacteriophage MS2. Nature New Biol. 242, 44-45.

BECK, S., ABDULLA, S., ALDERTON, R. P., GLYNNE, R. J., GUT, I. G., HOSKING, L. K., JACKSON, A., KELLY, A., NEWELL, W. R., SANSEAU, P., RADLEY, E., THORPE, K. L. & TROWSDALE, J. (1996). Evolutionary dynamics of non-coding sequences within the class II region of the human MHC. J. Mol. Biol. 255, 1-13.

BELL, S. J., CHOW, Y. C., HO, J. Y. K. & FORSDYKE, D. R. (1998). Correlation of CHI orientation with transcription indicates a fundamental relationship between recombination and transcription. Gene 216, 285-292.

BELL, S. J. & FORSDYKE, D. R. (1999). Deviations from Chargaff's second parity rule correlate with direction of transcription. J. theor. Biol. 197, 63-76.

BOSSI, L. & ROTH, J. R. (1980). The influence of codon context on genetic code translation. Nature 286, 123-127.

BRENDEL, V., DOHLMAN, J., BLAISDELL, B. E. & KARLIN, S. (1991). Very long charge runs in systemic lupus erythematosus-associated autoantigens. Proc. Natl. Acad. Sci. USA 88, 1536-1540.

BROOKS, L. A., LEAR, A. L., YOUNG, L. S. & RICKINSON, A. B. (1993). Transcripts from the Epstein-Barr virus BamH1 A fragment are detectable in all three forms of virus latency. J. Virol. 67, 3182-3190.

BULL, J. J., JACOBSON, A., BADGETT, M. R. & MOLINEUX, I. J. (1998). Viral escape from antisense RNA. Mol. Microbiol. 28, 835-846.

CALLAN, M. F. C., TAN, L., ANNELS, N., OGG, G. S., WILSON, J. D. K., O'CALLAGHAN, C. A., STEVEN, N., MCMICHAEL, A. J. & RICKINSON, A. B. (1998). Direct visualization of antigen-specific CD8+ T cells during the primary immune response to Epstein-Barr virus in vivo. J. Exp. Med. 187, 1395-1402.

CROEN, K. D., OSTROVE, J. M., DRAGOVIC, L. J., SMIALEK, J. E. & STRAUS, S. E. (1987). Latent Herpes simplex virus in human trigeminal ganglia. Detection of an immediate early gene "anti-sense" transcript by in situ hybridization. New Eng. J. Med. 317, 1427-1432.

CRISTILLO, A. D. (1998). Characterization of G0/G1 switch genes in cultured T lymphocytes. Ph.D. Thesis. Queen's University, Kingston, Ontario.

CRISTILLO, A. D., HEXIMER, S. P. & FORSDYKE, D. R. (1996). A "stealth" approach to inhibition of lymphocyte activation by oligonucleotide complementary to the putative G0/G1 switch regulatory gene G0S30/EGR1/NGFI-A. DNA Cell Biol. 15, 561-570.

CRISTILLO, A. D., LILLICRAP, T. P. & FORSDYKE, D. R. (1998). Purine-loading of EBNA-1 mRNA avoids sense-antisense "collisions". FASEB. J. 12, A1453. Abstract #828. [Added note Feb 2008: In 1997 Tim Lillicrap assisted these studies, which were mainly performed by Tony Cristillo. The manuscript was rejected by several journals before finding a home. However, the delay was useful in that relevant studies of James Mortimer and Isabelle Barrette could be added.]

DANG, K. D., DUTT, P. B. & FORSDYKE, D. R. (1998). Chargaff differences correlate with transcription direction in the bithorax complex of Drosophila. Biochem. Cell Biol. 76, 129-137.

DAVISON, A. J. & SCOTT, J. E. (1986). The complete DNA sequence of Varicella-Zoster virus. J. Gen. Virol. 67, 1759-1815.

DOHLMAN, J. G., LUPAS, A. & CARSON, M. (1993). Long charge-rich alpha-helices in systemic autoantigens. Biochem. Biophys. Res. Comm. 195, 686-696.

EGUCHI, Y., ITOH, T. & TOMIZAWA, J. (1991). Antisense RNA. Annu. Rev. Biochem. 60, 631-652.

EHRENFELD, E. & HUNT, T. (1971). Double-stranded poliovirus RNA inhibits initiation of protein synthesis by reticulocyte lysates. Proc. Natl. Acad. Sci. USA 68, 1075-1078.

ELIA, A., LAING, K. G., SCHOFIELD, A., TILLERAY, V. J. & CLEMENS, M. J. (1996). Regulation of the double-stranded RNA-dependent protein kinase PKR by RNAs encoded by a repeated sequence of the Epstein-Barr virus genome. Nucleic Acids Res. 24, 4471-4478.

FEUCHTER, A. E., FREEMAN, J. D. & MAGER, D. L. (1992). Strategy for detecting cellular transcripts promoted by human endogenous long terminal repeats: identification of a novel gene (CDC4L) with homology to yeast CDC4. Genomics 13, 1237-1246.

FIRE, A. (1999). RNA-triggered gene silencing. Trends Genet. 15, 358-363.

FORSDYKE, D. R. (1991). Early evolution of MHC polymorphism. J. theor. Biol. 150, 451-456.

FORSDYKE, D. R. (1994). Relationship of X chromosome dosage compensation to intracellular self/not-self discrimination: a resolution of Muller's paradox? J. theor. Biol. 167, 7-12.

FORSDYKE, D. R. (1995a). Entropy-driven protein self-aggregation as the basis for self/not-self discrimination in the crowded cytosol. J. Biol. Sys. 3, 273-287.

FORSDYKE, D. R. (1995b). Fine tuning of intracellular protein concentrations, a collective protein function involved in aneuploid lethality, sex determination and speciation? J. theor. Biol. 172, 335-345.

FORSDYKE, D. R. (1996). Different biological species "broadcast" their DNAs at different (G+C)% "wavelengths". J. theor. Biol. 178, 405-417.

FORSDYKE, D. R. (1998). An alternative way of thinking about stem-loops in DNA. A case study of the human G0S2 gene. J. theor. Biol. 192, 489-504.

FORSDYKE, D. R. (1999). Heat shock proteins as mediators of "danger" signals: implications of the slow evolutionary fine-tuning of sequences for the antigenicity of cancer cells. Cell Stress Chaperones 4, 205-210.

FORSDYKE, D. R. (2000). Double-stranded RNA and/or heat-shock as initiators of chaperone mode switches in diseases associated with protein aggregation. Cell Stress Chaperones 5, 375-376.

FORSDYKE, D. R. (2001). The Origin of Species, Revisited. Montreal: McGill-Queens University Press.

FORSDYKE, D. R. & MORTIMER, J. R. (2000). Chargaff's legacy. Gene 261, 127-137.

FRIBORG, J., KONG, W-P., HOTTINGER, M. O. & NABEL, G. J. (1999). P53 inhibition by the LANA protein of KSHV protects against cell death. Nature 402, 889-894.

GOLDENBERG, D., MADOR, N., BALL, M. J., PANET, A. & STEINER, I. (1997). The abundant latency-associated transcripts of Herpes simplex virus type 1 are bound to polyribosomes in cultured neuronal cells and during latent infection in mouse trigeminal ganglia. J. Virol. 71, 2897-2904.

GOULD, K. G. & BANGHAM, C. R. M. (1998). Virus variation, escape from cytotoxic T lymphocytes and human retroviral persistence. Sem. Cell Devel. Biol. 9, 321-328.

GREEN, H. & WANG, N. (1994). Codon reiteration and the evolution of proteins. Proc. Natl. Acad. Sci. USA 91, 4298-4302.

HAMILTON, A. J. & BAULECOMBE, D. C. (1999). Role of a species of small antisense RNA in post-transcriptional gene silencing in plants. Science 286, 950-951.

HANCOCK, J. M. & SANTIBANEZ-KOREF, M. F. (1998). Trinucleotide expansion diseases in the context of micro- and mini-satellite evolution. EMBO. J. 17, 5521-5524.

HEXIMER, S. P., CRISTILLO, A. D., RUSSELL, L. & FORSDYKE, D. R. (1998). Expression and processing of G0/G1 Switch Gene 24 (G0S24/TIS11/TTP/NUP475) RNA in cultured human blood mononuclear cells. DNA Cell Biol. 17, 249-263.

HONESS, R. W., GOMPELS, U. A., BARRELL, B. G., CRAXTON, M., CAMERON, K. R., STADEN, R., CHANG, Y.-N. & HAYWARD, G. S. (1989). Deviations of expected frequencies of CpG dinucleotides in Herpesvirus DNAs may be diagnostic of differences in the states of their latent genomes. J. Gen. Virol. 70, 837-855.

HUNTER, T., HUNT, T., JACKSON, R. J. & ROBERTSON, H. D. (1975). The characteristics of inhibition of protein synthesis by double-stranded ribonucleic acid in reticulocyte lysates. J. Biol. Chem. 250, 409-417.

IZANT, J. G. & WEINTRAUB, H. (1984). Inhibition of thymidine kinase gene expression by anti-sense RNA: a molecular approach to genetic analysis. Cell 36, 1007-1015.

JANG, K. L. & LATCHMAN, D. S. (1989). HSV infection induces increased transcription of Alu repeated sequences by RNA polymerase III. FEBS. Lett. 258, 255-258.

KARLIN, S. (1995). Statistical significance of sequence patterns in proteins. Curr. Opin. Struct. Biol. 5, 360-371.

KARLIN, S., BLAISDELL, B. E., MOCARSKI, E. S. & BRENDEL, V. (1988). A method to identify distinctive charge configurations in protein sequences, with application to human herpesvirus polypeptides. J. Mol. Biol. 205, 165-177.

KARLIN, S., BLAISDELL, B. E. & SCHACHTEL, G. A. (1990). Contrasts in codon usage of latent versus productive genes of Epstein-Barr virus: data and hypothesis. J. Virol. 64, 4264-4273.

KARRAN, L., GAO, Y., SMITH, P. R. & GRIFFIN, B. E. (1992). Expression of a family of complementary-strand transcripts in Epstein-Barr virus-infected cells. Proc. Natl. Acad. Sci. USA 89, 8058-8062.

KUMAR, M. & CARMICHAEL, G. G. (1998). Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes. Microbiol. Mol. Biol. Rev. 62, 1415-1434.

KUPIEC, J. J., KAY, A., HAYAT, M., RAVIER, R., PERIES, J. & GALIBERT, F. (1991). Sequence analysis of the simian foamy virus type 1 genome. Gene 101, 185-194.

LAO, P. J. & FORSDYKE, D. R. (2000). Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res. 10, 228-236.

LEVITSKAYA, J., CORAM, M., LEVITSKY, V., IMREH, S., STEIGERWALD-MULLEN, P. M., KLEIN, G., KURILLA, M. G. & MASUCCI, M. G. (1995). Inhibition of antigen processing by the internal repeat region of the Epstein-Barr virus nuclear antigen-1. Nature 375, 685-688.

LEVITSKAYA, J., SHARIPO, A., LEONCHIKS, A., CIECHANOVER, A. & MASUCCI, M. G. (1997). Inhibition of ubiquitin/proteasome-dependent protein degradation by the Gly-Ala repeat domain of the Epstein-Barr virus nuclear antigen 1. Proc. Natl. Acad. Sci. USA 94, 12616-12621.

LIU, W-M., CHU, W-M., CHOUDARY, P. V. & SCHMID, C. W. (1995). Cell stress and translational inhibitors transiently increase the abundance of mammalian SINE transcripts. Nucleic Acids Res. 23, 1758-1765.

MALIK, K. T., EVEN, J. & KARPAS, A. (1988). Molecular cloning and complete nucleotide sequence of an adult T cell leukaemia virus/Human T cell leukaemia virus type I isolate of Caribbean origin. J. Gen. Virol. 69, 1695-1710.

MANLEY, J. L. & COLOZZO, M. T. (1982). Synthesis in vitro of an exceptionally long RNA transcript promoted by an AluI sequence. Nature 300, 376-379.

MARCUS, P. (1983). Interferon induction by viruses: one molecule of dsRNA as the threshold for induction. Interferon 5, 115-180.

MCGEOCH, D. J., DALRYMPLE, M. A., DAVISON, A. J., DOLAN, A., FRAME, M. C., MCNAB, D., PERRY, L. J., SCOTT, J. E. & TAYLOR, P. (1988). The complete DNA sequence of the long unique region in the genome of Herpes simplex virus type 1. J. Gen. Virol. 69, 1531-1574.

MELTON, D. A. (1985). Injected antisense RNAs specifically block messenger RNA translation in vivo. Proc. Natl. Acad. Sci. USA 82, 144-148.

MEYER, R. K. & KRUEGER, D. D. (1994). Minitab Computer Supplement. Macmillan College Publishing, New York.

MITTELSTEN SCHEID, O. (1999). New tool for Swiss army knife. Nature 397, 25.

MUKHERJEE, S., TRIVEDI, P., DORFMAN, D. M., KLEIN, G. & TOWNSEND, A. (1998). Murine cytotoxic T lymphocytes recognize an epitope in an EBNA-1 fragment, but fail to lyse EBNA-1-expressing mouse cells. J. Exp. Med. 187, 445-450.

NAKAMURA, Y., GOJOBORI, T. & IKEMURA, T. (1999). Codon usage tabulated from the international DNA sequence databases: its status 1999. Nucleic Acids Res. 27, 292.

NICKERSON, D. A., TAYLOR, S. L., WEISS, K. M., CLARK, A. G., HUTCHINSON, R. G., STENGARD, J., SALOMAA, V., VARTIAINEN, E., BOERWINKLE, E. & SING, C. (1998). DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nature Genet. 19, 233-240.

RAINBOW, L., PLATT, G. M., SIMPSON, G. R., SARID, R., GAO, S-J., STOIBER, H., HERRINGTON, C. S., MOORE, P. S. & SCHULZ, T. F. (1997). The 222- to 234-kilodalton latent nuclear protein (LNA) of Kaposi's sarcoma-associated herpesvirus (human herpes virus 8) is encoded by orf73 and is a component of the latency-associated nuclear antigen. J. Virol. 71, 5915-5921.

RATNER, L. et al. (1985). Complete nucleotide sequence of the AIDS virus. Nature 313, 277-284.

ROBERTSON, H. D. & MATHEWS, M. B. (1996). The regulation of the protein kinase PKR by RNA. Biochimie 78, 909-914.

ROCHA, E. P. C., DANCHIN, A. & VIARI, A. (1999). Universal replication biases in bacteria. Mol. Microbiol. 32, 11-16.

SCHWARTZ, D. E., TIZARD, R. & GILBERT, W. (1983). Nucleotide sequence of Rous sarcoma virus. Cell 32, 853-869.

SHARP, P. (1999). RNAi and double-strand RNA. Genes Devel. 13, 139-141.

SMITHIES, O., ENGELS, W. R., DEVEREUX, J. R., SLIGHTOM, J. L. & SHEN, S. (1981). Base substitutions, length differences and DNA strand asymmetries in the human Gγ and Aγ fetal globin gene region. Cell 26, 345-353.

STEVENS, J. G., WAGNER, E. K., DEVI-RAO, G. B., COOK, M. L. & FELDMAN, L. T. (1987). RNA complementary to a herpesvirus α gene mRNA is prominent in latently infected neurons. Science 235, 1056-1059.

SUMMERS, H., FLEMING, A. & FRAPPIER, L. (1997). Requirements for Epstein-Barr nuclear antigen 1 (EBNA-1)-induced permanganate sensitivity of the Epstein-Barr latent origin of DNA replication. J. Biol. Chem. 272, 26434-26440.

SUNYAEV, S. R., LATHE, W. C., RAMENSKY, V. E. & BORK, P. (2000). SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet. 16, 335-337.

SUZUKI, K., MORI, A., ISHII, K. J., SAITO, J., SINGER, D. S., KLINMAN, D. M., KRAUSE, P. R. & KOHN, L. D. (1999). Activation of target-tissue immune-recognition molecules by double-strand polynucleotides. Proc. Natl. Acad. Sci. USA 96, 2285-2290.

SZYBALSKI, W., KUBINSKI, H. & SHELDRICK, P. (1966). Pyrimidine clusters on the transcribing strands of DNA and their possible role in the initiation of RNA synthesis. Cold Spring Harbor Symp. Quant. Biol. 31, 123-127.

TAO, Q., ROBERTSON, K. D., MANNS, A., HILDESHEIM, A. & AMBINDER, R. F. (1998). The Epstein-Barr virus major latent promoter Qp is constitutively active, hypomethylated, and methylation sensitive. J. Virol. 72, 7075-7083.

THORLEY-LAWSON, D. A., MIYASHITA, E. M. & KAHN, G. (1996). Epstein-Barr virus and the B cell: That's all it takes. Trends Microbiol. 4, 204-207.

TIAN, B., WHITE, R. J., XIA, T., WELLE, S., TURNER, D. H., MATHEWS, M. B. & THORNTON, C. A. (2000). Expanded CUG repeat RNAs form hairpins that activate the double-stranded RNA-dependent protein kinase PKR. RNA 6, 79-87.

TOWNSEND, A., ELLIOTT, T., CERUNDOLO, V., FOSTER, L., BARBER, B. & TSE, A. (1990). Assembly of MHC class I molecules analysed in vitro. Cell 62, 285-295.

VANHEE-BROSSOLLET, C. & VAQUERO, C. (1998). Do natural antisense transcripts make sense in eukaryotes? Gene 211, 1-9.

VILLARREAL, L. P., DEFILIPPIS, V. R. & GOTTLIEB, K. A. (2000). Acute and persistent viral life strategies and their relationship to emerging diseases. Virology 272, 1-6.

YATES, J. L. & CAMIOLO, S. M. (1988). Dissection of DNA replication and enhancer activation functions of Epstein-Barr virus nuclear antigen 1. Cancer Cells 6, 197-205.

 

End Note 2001 (not in original paper): A key observation in this paper is that the EBNA-1 protein is longer than necessary because of the need to contain a low complexity Gly-Ala repeat region between functional domains. This allows purine-loading at the nucleic acid level. Essentially the same observation was reported by Pizzi and Frontali in Genome Research (2001) 11, 218-229. They noted:

 

"Proteins from Plasmodium falciparum, the etiological agent of the most severe form of human malaria, are often larger than homologous proteins from other organisms. When multiple alignment is possible, the size difference can be seen to be due to the presence of long insertions separating well-conserved blocks that are adjacent in the homologous proteins.... The insertions are characterized by highly recurrent amino acid usage" and correspond to "low complexity regions ... believed to encode non-globular domains of unknown function that are extruded from the protein core and do not impair the functional folding of the protein."... The recurrent amino acids "correlate with A-richness in codons."
 

End Note May 2002 on large scale non-genic transcription

Supporting Section 13, Kapranov et al. (2002; Science 296, 916-919) report large scale, not necessarily genic, transcriptional activity by human chromosomes 21 and 22.

"It is noted that as much as an order of magnitude more of the genomic sequence is transcribed than accounted for by the predicted and characterized exons."

 

End Note Jan 2006 on autoimmunity

Supporting Section 11, Carl, Temple and Cohen (2005; Arthritis Research & Therapy 7, R1360-74) note that "most nuclear systemic autoantigens are extremely disordered proteins." The disordered regions have a distinctive composition of amino acids, many of which correspond to purine-rich codons, but the disordered regions themselves are not particularly antigenic. If such regions arose as products of purine-loaded insertions into parts of genes corresponding to inter-domain regions of protein, then they would be expected to be disordered relative to the functional regions of the protein.

End Note June 2006 on trinucleotide expansions

In Section 12 the possibility was raised that the underlying pathology in various trinucleotide expansion diseases reflects a gain-of-function (purine-loading) mutation at the RNA level. When this idea was presented at the 1998 EMBO Workshop on Trinucleotide Expansion Diseases (see below), a member of the audience pointed out that, although many trinucleotide expansions would result in an increased purine content of the corresponding mRNA, some expansions involve pyrimidines (e.g. repeats of CTG). To explain this, the ad hoc postulate was made that in such cases there would exist an antisense transcript (e.g. repeats of CAG). Recently, evidence for this has been found in two independent cases.

1. Cho, D. H. et al. (2005) Mol. Cell 20, 483-489. Antisense transcription and heterochromatin at the DM1 CTG repeats are constrained by CTCF.

2. Moseley, M. L. et al. (2006) Nature Genet. 38, 758-769. Bidirectional expression of CUG and CAG expansion transcripts and intranuclear polyglutamine inclusions in spinocerebellar ataxia type 8.
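
The complementarity behind the antisense postulate can be checked with a toy calculation. The sketch below is only illustrative: the repeat length of five units and the helper names `reverse_complement` and `purine_fraction` are assumptions made for the example, not part of the original analysis. It shows that the transcript from the opposite strand of a CTG tract is a CAG tract, and that the purine content rises accordingly.

```python
# Toy illustration: the antisense counterpart of a CTG repeat is a CAG repeat.
# The tract length (5 units) is hypothetical and chosen only for display.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def purine_fraction(seq: str) -> float:
    """Fraction of bases in the sequence that are purines (A or G)."""
    return sum(base in "AG" for base in seq) / len(seq)


sense = "CTG" * 5                       # pyrimidine-rich repeat tract
antisense = reverse_complement(sense)   # sequence read from the other strand

print(antisense)
print(purine_fraction(sense), purine_fraction(antisense))
```

Each CTG unit carries one purine in three bases, whereas each CAG unit carries two, so only the antisense transcript would be purine-loaded.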

End Note Feb 2008 on Samuel Karlin

Papers of Samuel Karlin (1924-2007) and his "biometric" associates are cited frequently above. The New York Times published his obituary (see Globe & Mail Toronto 25 Feb 2008). Having made many contributions to pure mathematics, in the 1960s he turned to population genetics in the mode of Ronald Fisher and Sewall Wright. There were productive interactions with one of Fisher's students, Walter Bodmer, who had visited Stanford to work with Joshua Lederberg. At age 65 (circa 1989) Karlin contributed to the development of Stephen Altschul's "Blast" method for aligning sequences ("basic local alignment search tool").

"In his later years, Dr. Karlin was more apt to call himself a biologist than a mathematician, although all his biological work was mathematical. He relished poring through genetic data looking for anomalies. ... Dr. Karlin guided more than 70 students ... to their doctorates. He was known for the emotional force, not to mention volume, with which he argued scientific points. 'I would not say he was intellectually gentle,' Dr. Altman said."
 
In 1999 it was not easy to reply directly to criticisms, but when he got carried away in a piece in Trends in Microbiology (7, 305-308), Rocha, Danchin and Viari took him to task.

End Note July 2008. Support for our hypothesis

Our observation of the extreme codon bias in favour of purines in the Glycine-Alanine repeat domain showed that the domain exerted a major influence at the nucleic acid level. The work (Cristillo, Lillicrap & Forsdyke 1997) was stonewalled by anonymous reviewers before eventual acceptance (see above note in reference list). A decade later our work was supported by Tellam et al. (2008) who showed that decreasing the purine-loading without changing the encoded protein enhanced the expression of fragments of the EBNA1 protein complexed with MHC protein at the cell surface - thus triggering a T-cell response.

Tellam, J. et al. (2008) Regulation of protein translation through mRNA structure influences MHC class I loading and T cell recognition. Proceedings of the National Academy of Sciences, USA 105, 9319-9324.

Starck, S. R., Cardinaud, S. & Shastri, N. (2008) Immune surveillance obstructed by viral mRNA. Proceedings of the National Academy of Sciences, USA 105, 9135-9136.

Cristillo, A. D., Lillicrap, T. P. & Forsdyke, D. R. (1997) Proceedings of the National Academy of Sciences, USA (rejected for publication) Determination of transcription direction simply by counting bases in DNA: analysis of retroviral and herpes virus families.


End Note Nov 2010. Nature's experiment

A much simpler supportive experiment, the converse of that of Tellam et al. (above), would be to change the protein without changing the nucleic acid. Under our hypothesis, this would be expected not to enhance the expression of fragments of the EBNA1 protein at the cell surface. Thus, a T-cell response would still be prevented. By inserting or subtracting a single base at an appropriate position in a coding sequence one can change a reading frame, thus drastically changing the amino acid composition of the protein, but only marginally (one base) changing the nucleic acid sequence. Remarkably, Nature has already performed this experiment, as was discovered by Zaldumbide et al. (2007).
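
The effect of a reading-frame shift can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the sequence is a hypothetical Gly-Ala-encoding repeat (GGA GCA), not the actual EBNA1 sequence, and the names `translate` and `purine_fraction` are invented for the sketch. The standard genetic code is used. Reading the same bases from a different starting position stands in for the insertion or subtraction of a single base upstream.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons of a DNA sequence starting at offset `frame`."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def purine_fraction(seq: str) -> float:
    """Fraction of bases that are purines (A or G)."""
    return sum(base in "AG" for base in seq) / len(seq)


# Hypothetical repeat encoding Gly-Ala; NOT the real viral sequence.
repeat = "GGAGCA" * 3

for frame in range(3):
    print(frame, translate(repeat, frame))

# The nucleic acid composition is the same whichever frame is read.
print(purine_fraction(repeat))
```

In this toy case the three frames encode Gly-Ala, Glu-Gln and Ser-Arg repeats respectively, while the purine content of the nucleic acid stays at five bases in six.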

    DNA sequence similarities indicate that EBV (human herpes virus 4) and KSHV (human herpes virus 8) are homologous (evolved from a common ancestor). The function of the repeat in the EBNA1 protein of EBV is, according to the conventional wisdom, similar to the repeat in the LANA1 protein of KSHV. So they would be expected to have similar protein sequences. But they do not. Nevertheless, they retain homologous nucleic acid sequences in the repeat region. Zaldumbide et al. (2007) showed that the difference in protein sequences can be accounted for by a reading frame shift. This suggests that the nature of the encoded protein is largely irrelevant for the inhibition of the display of peptides for T cell activation. It is the nucleic acid sequence that is preserved because selection has acted at the nucleic acid level, not at the protein level. This is not to exclude the possibility that the protein may autoregulate its own synthesis, or even that the mRNA may have a hand in this (see below). But what protein is made may still be sufficient for peptide display. A purine-loaded mRNA sequence may be necessary to prevent this display. While still thinking at the level of protein synthesis itself, rather than of the subsequent display of that protein which manages to get synthesized, Zaldumbide et al. wrote:

 

 It is interesting to note that although the Gly-Ala repeat of EBNA-1 has no sequence similarity to the central repeat unit of LANA-1, a frame-shift in this repeat can generate a new repeat that has more than 65% of similarity with the central region of LANA-1 (Fig. 6). This suggests that these repeats may have a common ancestry. Moreover in a recent study (A. Z. and M. O. unpublished data) we demonstrate that this new alternative sequence GZ (glycine, glutamic/aspartic acid) is able to prevent epitope presentation to specific CTLs as efficiently as the original Gly-Ala repeat.

 

Citing Zaldumbide et al., it was noted (Kwun et al. 2007) that "While EBNA1 and LANA1 have no amino acid similarity, the gene sequences of the two viral repeat sequences are highly homologous to each other but are frameshifted for translation. One appealing possibility is that these two proteins have similar mRNA secondary structures with common effects on protein synthesis retardation." Kwun et al. presented data purporting to exclude nucleic acid structure (insertion of a stop codon to prevent protein generation, but not impair mRNA structure, and testing this in an in vitro translation system). They concluded that "specific peptide sequences rather than RNA structures are responsible for synthesis retardation." In a later review Blake (2010) cited our above paper, and considered that "Their predicted model is that the lack of any real secondary structure across the GAr domain causes an inhibition of elongation rate" (which is not our model).

In his review Blake cited with approval Apcher et al. (2009) who had presented evidence that "the nascent GAr peptide delays assembly of the initiation complex on its own mRNA." Thus, most workers in the field remain focused on function at the level of protein synthesis. On the other hand, that there might be a mechanism for decreasing synthesis of the EBNA1 protein -- be it exerted at the RNA level (Tellam) or at the protein level (Apcher) -- still leaves us with the fact that some EBNA1 protein (in full or part) will be synthesized. Thus, if the MHC presentation machinery were in an appropriately receptive state, peptides from that protein might still be MHC-presented in sufficient quantities to trigger T cells. We suggest that the GAr-encoding sequence has been evolutionarily selected primarily because of a role in influencing the acquisition of a receptive state, not because of a role at the protein synthesis level.

 
Apcher S, Komarova A, Daskalogianni C, Yin Y, Malbert-Colas L & Fahraeus R (2009) mRNA translation regulation by the Gly-Ala repeat of Epstein-Barr virus nuclear antigen 1. Journal of Virology 83, 1289-1298.

Blake N (2010) Immune evasion by gammaherpesvirus genome maintenance proteins. Journal of General Virology 91, 829-846.

Kwun HJ, Silva SR da, Shah IM, Blake N, Moore PS & Chang Y (2007) Kaposi's sarcoma-associated herpesvirus latency-associated nuclear antigen 1 mimics Epstein-Barr virus EBNA1 immune evasion through central repeat domain effects on protein processing. Journal of Virology 81, 8225-8235.

Zaldumbide A, Ossevoort M, Wiertz EJHJ, Hoeben RC (2007) In cis inhibition of antigen processing by the latency-associated nuclear antigen 1 of Kaposi sarcoma Herpes virus. Molecular Immunology  44, 1352-1360.


End Note Nov 2010. Low GC% and MHC Immune Surveillance

Since purine-loading is seen as a strategy for evading MHC surveillance (pathogenicity), it is important to note that there is an inverse relationship between purine-loading (AG%) and GC% (Saccone et al. 2000; Lao & Forsdyke 2000; Mortimer & Forsdyke 2003). This predicts an association between low GC% and evasion of MHC surveillance. Indeed, such an association was reported by Calis et al. (2010) for an important class of human MHC molecules (HLA-A). Although Calis et al. made a good case for association of pathogenicity with the presence of the FINKY (FLYMINK) amino acids (corresponding to low GC% codons), their display of amino acid frequencies (their Figure 3) suggested association of pathogenicity with certain amino acids corresponding to AG-rich codons (EKND). But there was a low, but not too low, level of G. Also arginine (R) did not correlate, but Calis et al. made no distinction between AG-rich and non-AG-rich arginine codons. In addition they reported that knocking out lysine (K) had the largest effect on peptide presentation (their Figure S3). Lysine belongs both to the low GC% codon set and the high AG% codon set.
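
The dual membership of lysine can be made concrete by tabulating the base composition of each amino acid's codon set. The following is a rough sketch, assuming the standard genetic code and weighting all synonymous codons equally (real genomes have biased codon usage, which this ignores). The helper names `percent` and `codon_set_composition` are invented for the example and are not taken from Calis et al.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to its list of codons.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))


def percent(codons, letters: str) -> float:
    """Percentage of bases, pooled over the codons, that fall in `letters`."""
    pooled = "".join(codons)
    return 100.0 * sum(base in letters for base in pooled) / len(pooled)


def codon_set_composition(aa: str):
    """Return (GC%, AG%) for the codon set of one amino acid.

    Assumption: every synonymous codon is weighted equally.
    """
    codons = CODONS_FOR[aa]
    return percent(codons, "GC"), percent(codons, "AG")


# Amino acids mentioned in the text: FINKY (low GC%) and EKND (AG-rich).
for aa in "FINKYED":
    gc, ag = codon_set_composition(aa)
    print(f"{aa}: GC% = {gc:5.1f}, AG% = {ag:5.1f}")
```

Lysine (AAA, AAG) comes out with one G or C in six bases and purines at every position, so it sits in both the low GC% set and the high AG% set.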

 
Calis JJA, Sanchez-Perez GF, Kesmir C (2010) MHC class I molecules exploit the low G + C content of pathogen genomes for enhanced presentation. European Journal of Immunology 40, 2699-2709.

Lao PJ, Forsdyke DR (2000) Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Research 10, 228-236.

Mortimer JR, Forsdyke DR (2003) Comparison of responses by bacteriophage and bacteria to pressures on the base composition of open reading frames. Applied Bioinformatics 2, 47-62.

Saccone C, Gissi C, Lanave C, Larizza A, Pesole G, Reyes A (2000) Evolution of the mitochondrial genetic system: an overview. Gene 261, 153-159.

 

End Note Oct 2012. Mature Proteins needed for MHC-peptide Presentation

A major problem with the differential aggregation hypothesis for selective peptide presentation was that the Yewdell and Fahraeus laboratories were churning out elegant papers advancing the view that the peptides presented were generated during protein synthesis, rather than after mature proteins were liberated from polysomes. A new paper (Farfan-Arribas, Stern & Rock 2012) now presents evidence that mature proteins are just as efficient, if not more efficient, sources of MHC-peptide complexes, as nascent proteins. Their conclusions are, however, cautious: "Our results do not argue against either the mere existence of DRiPs [defective ribosomal products] or their potential utilization as a source for presented peptides. However, they do strongly argue that DRiPs are not the major source of presented peptides in any of the systems examined here."

 
Farfan-Arribas DJ, Stern LJ, Rock KL (2012) Using intein catalysis to probe the origin of major histocompatibility complex class I-presented peptides. Proceedings of the National Academy of Sciences, USA doi/10.1073/pnas.1210271109

 

End Note Dec 2012. An EBNA-1 Analog in HTLV-1?

EBV and HTLV-1 are both deeply latent, GC-rich viruses, which have the ability to persist in their human hosts for long periods without any obvious detrimental effects. As discussed above, EBV maintains latency with the help of the EBNA-1 protein, which consequently it is obliged to express. Unlike most EBV genes its mRNA is R-loaded, not Y-loaded. Is there something equivalent in the much smaller HTLV-1 genome, or is HTLV-1 constrained by virtue of this smallness to adopt some other strategy? The HTLV-1 provirus within the host genome encodes many Y-rich mRNAs in its "top" "sense" strand. But Larocca et al. (1989) reported a "bottom" strand "antisense" transcript. This is heavily R-loaded. Recent work shows that the mRNA is translated into a basic zipper protein (HBZ) which, in contrast to the Tax protein that is encoded by a Y-rich mRNA, is poorly immunogenic. HBZ is increasingly seen as playing a major role in evasion of host immunosurveillance (Cook et al. 2013).

Larocca D, Chao LA, Seto MH, Brunck TK (1989) Human T-cell leukaemia virus minus strand transcription in infected T-cells. Biochemical and Biophysical Research Communications 163, 1006-1013. 
Cook LB, Elemans M, Rowan AG, Asquith B. (2013) HTLV-1: Persistence and pathogenesis. Virology 435, 131-140.


End Note Dec 2012. Nature's Experiment Formalized

Zaldumbide et al. (2007) had attributed the immunosuppressive power of the repeat sequence in the LANA-1 encoding gene of KSHV to function at the protein level. But I pointed out (End Note, Nov 2010) that their actual data implied RNA-level function in this respect. Kwun et al. (2007) had noted the similarities in the EBNA-1 and LANA-1 repeats at the mRNA level, indicating a common evolutionary origin, but still maintained that function would be at the protein level. Tellam et al. (2012) have now formally repeated "Nature's experiment" and changed the reading frame of the EBNA-1 repeat so that the protein sequence of the repeat was drastically altered, while the mRNA sequence remained essentially unchanged (still heavily purine-loaded). Since this did not change the host cell's ability to decrease immune recognition, they "conclude that the cis-inhibitory effect of the internal repeat sequences of gammaherpesviruses operates at the nucleotide level and is unlikely to be mediated through the direct action of the GAr [Gly and Ala rich] polypeptide." While not excluding "a more general immune evasive strategy", Tellam et al. (2012) propose that "an unusual RNA secondary structure within the repeat region may interfere with translation of the EBNA1 mRNA by inhibiting ribosome transit through the purine-rich sequence, thereby leading to a reduction in the levels of EBNA1 such that the infected cell evades the normal host immune surveillance mechanisms."

Tellam JT, Lekieffre L, Zhong J, Lynn DJ, Khanna R (2012) Messenger RNA sequence rather than protein sequence determines the level of self-synthesis and antigen presentation of the EBV-encoded antigen, EBNA-1. PLOS Pathogens 8, issue 12, e1003112.

 

End Note Feb 2013. GC-rich Genes more Immunogenic

The interpretation of Calis et al. (2010; see note above) was questioned in an accompanying paper (Levasseur & Pontarotti 2010). There were other studies (e.g. Khrustalev & Barkovsky 2011) that also focused on GC% (rather than the inversely related AG%). Using various programs that reveal the parts of alphaherpesvirus proteins (epitopes) that stimulate B cells to make antibodies, these authors found a correlation of high GC% of a gene (i.e. low AG%) with the immunogenicity of its protein product. Thus, high AG% (low GC%) correlates with low immunogenicity. Of course, within the host cell a virus potentially displays both its nucleic acids and its proteins. Although it may be parts of the proteins that are eventually displayed to the host's extracellular immune system, we argue (above) that it is viral mRNA that, by purine-loading, avoids triggering the intracellular defences that first lead to peptide presentation by MHC complexes.

 
Levasseur A, Pontarotti P (2010) Was the ancestral MHC involved in innate immunity? European Journal of Immunology 40, 2682-2685.
Khrustalev VV, Barkovsky EV (2011) Percentage of highly immunogenic amino acid residues forming B-cell epitopes is higher in homologous proteins encoded by GC-rich genes. Journal of Theoretical Biology  282, 71-79

End Note Nov 2013. Again, Mature Proteins needed for MHC-peptide Presentation

The elegant Yewdell and Fahraeus papers have again been brought into question by work from the Rock laboratory (see 2012 above). Their conclusions (Colbert et al. 2013) are now less restrained: "For the constructs we have analysed, mature functional proteins, rather than defective ribosomal products, are the predominant source of MHC class I presented peptides."

 
Colbert JD et al. (2013) Substrate-induced protein stabilization reveals a predominant contribution from mature proteins to peptides presented on MHC class I. Journal of Immunology 191, 5410-5419.
 


End Note Feb 2014. Yet again, Mature Proteins needed for MHC-peptide Presentation

And the story continues: "The proposal that MHC class I molecules preferentially present peptides that come from errors in protein biosynthesis or failures of successful folding (DRiPs), is an attractive idea, and one that has influenced general conceptions of how the immune system monitors for non-native gene products. However, on close examination, the data that actually support this proposal are still limited, often indirect, and based on questionable assumptions. ...  Additionally, recent data indicate that for many antigens, most of the MHC class I-presented peptides probably originate from functional mature molecules, and not defective, newly synthesized proteins. The immune system thus appears to monitor the peptides that are generated through proteasomal breakdown of both short-lived and stable cellular proteins. However, further research is necessary to quantify the relative importance of the newly synthesized and mature proteins in generating the bulk of MHC class I-presented peptides and whether DRiP-like mechanisms may function in certain cases."

 
Rock KL et al. (2014) Re-examining Class I presentation and the DRIP hypothesis. Trends in Immunology  in press.

End Note Mar 2014. G-Quadruplexes in R-Loaded EBNA-1 mRNA Loops

Tellam, Khanna and their coworkers report that the number and arrangement of G-residues in the purine-loaded loops of the Gly-Ala repeat regions of EBNA-1 mRNA are such as to facilitate G-quadruplex formation, thus slowing down mRNA translation, which is likely to decrease the concentration of the EBNA-1 protein: "We identified putative G-quadruplex sequences within the ORF of EBV-encoded EBNA1 mRNA and confirmed that these motifs form intramolecular parallel G-quadruplex structures in the context of long transcripts. Remarkably, similar G-quadruplex motifs were also mapped within the conserved purine-rich repetitive sequences of viral ORFs encoding numerous other gammaherpesviral maintenance proteins, suggesting that this class of virus may exploit RNA G-quadruplex structure to downregulate their maintenance protein expression levels to escape immune recognition. ... In the present study, we define the role of RNA G-quadruplex motifs within purine-rich ORFs and demonstrate their ability to impede ribosomal activity during translational elongation. We have shown in this study that G-quadruplex structures can alter the association of ribosomes with mRNAs by inducing premature termination and/or ribosome stalling and dissociation events."

    These observations are consistent with the idea that the intracellular self/not-self discrimination process that precedes MHC-peptide presentation avoids presentation of "self" by decreasing both (i) dsRNA (achieved by R-loading self mRNAs), and (ii) specific aggregation (achieved by synthesising self protein at a rate insufficient to allow specific protein aggregates). EBV adopts this mechanism for the key EBNA-1 mRNA, the G-quadruplexes of which slow protein synthesis. Thus, to the hypothesis that failure to form dsRNA would prevent a "receptive state" in MHC for peptide acceptance, should be added effects at the translational level that might impede peptide formation.
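
For readers who wish to scan a sequence for candidate quadruplex-forming stretches, a simple pattern search gives a first approximation. The sketch below uses a widely applied heuristic (four runs of at least three Gs separated by loops of one to seven bases); it is not the method of Murat et al., and the run length and loop limits are assumptions that may not suit the EBNA-1 repeats. The function name `putative_g4_motifs` and both test sequences are hypothetical.

```python
import re

# Heuristic pattern: four G-runs (>= 3 Gs each) separated by loops of 1-7 bases.
# The thresholds are conventional assumptions, not values from the cited work.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGU]{1,7}){3}G{3,}")


def putative_g4_motifs(rna: str):
    """Return (start, matched_sequence) for each non-overlapping motif hit."""
    rna = rna.upper().replace("T", "U")   # accept DNA-style input as well
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(rna)]


# Hypothetical purine-rich sequences, for illustration only.
with_g_runs = "AAGGGAGGGCAGGGAGGGAA"      # contains four GGG runs
without_g_runs = "GGAGCA" * 4             # purine-rich, but no GGG run

print(putative_g4_motifs(with_g_runs))
print(putative_g4_motifs(without_g_runs))
```

The second sequence is as purine-rich as many repeat regions yet yields no hit under these thresholds, which illustrates that purine-loading and quadruplex potential, as defined by this heuristic, are separable properties.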

 
Murat P et al. (2014) G-Quadruplexes Regulate EBV-Encoded Nuclear Antigen 1 mRNA Translation. Nature Chemical Biology  10, 358-364.

End Note Mar 2015. R-Loaded Viral mRNA Loops

Murat and Tellam (2015) have updated their work on the postulated inhibition of protein synthesis by G-residues in the purine-loaded loops of the Gly-Ala repeat regions of EBNA-1 mRNA. Because the Gs would form G-quadruplexes, they would slow down mRNA translation and thus lower protein concentration. The authors postulate a "threshold" level of a protein, below which it would not be immunogenic. But they do not speculate on why there should be such a threshold. While G-quadruplex formation might explain why high GC% mRNAs that are R-loaded (i.e. high G content) might impede specific MHC-peptide formation, it would not seem to explain how low GC% mRNAs that are R-loaded (i.e. high A content) should impede specific MHC-peptide formation (if indeed they do). In other words, what is driving R-loading in low GC% species?

Murat P & Tellam J (2015) Effects of mRNA structure and other translational control mechanisms on major histocompatibility complex-I mediated antigen presentation. WIREs RNA 6, 157-171.


End Note July 2017. If G-Quadruplexes, Why So Many Adenines?

PubMed Comment on "Nucleolin directly mediates Epstein-Barr virus immune evasion through binding to G-quadruplexes of EBNA-1 mRNA."

It is good to see the problem of EBV immune evasion focused, not on the translation product of EBNA1 mRNA (1), but on the mRNA itself (2). However, it is puzzling that the sequence encoding the glycine-alanine repeats is enriched not only in guanines (Gs), but also in adenines (As). In such a GC-rich genome (60% GC), there is a scarcity of As, yet they are concentrated in the glycine-alanine repeat-encoding region. In other words, codons have been selected for their general purine-richness, not just for their G-richness (3). While it is conceivable that the As somehow assist the formation of G-quadruplexes by consecutive Gs, consideration might have been given to the hypothesis that the G-quadruplexes may be a helpful by-product of the fundamental need to purine-load the mRNA.

EBV is not alone in this respect. EBV and HTLV-1 share common characters. Both are deeply latent, GC-rich viruses. They persist in their human hosts for long periods often with no obvious detrimental effects. Most of their proteins are encoded by pyrimidine-rich mRNAs. The HTLV-1 provirus encodes its pyrimidine-rich mRNAs in its "top" sense strand. But there is a "bottom" strand transcript. This is heavily R-loaded and is translated into a basic zipper protein (HBZ) which is poorly immunogenic and is increasingly seen, like EBNA-1, as playing a major role in immune evasion (4-5).  

1.      Levitskaya, J. et al. (1995) Inhibition of antigen processing by the internal repeat region of the Epstein-Barr virus nuclear antigen-1. Nature 375:685-688. <PMID:7540727>

2. Lista MJ et al. (2017) Nucleolin directly mediates Epstein-Barr virus immune evasion through binding to G-quadruplexes of EBNA-1 mRNA. Nature Commun 8:16043. <PMID:28685753>

3. Cristillo AD et al. (2001) Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load. J Theor Biol 208:475-489. <PMID:11222051>

4. Cook LB et al. (2013) HTLV-1: Persistence and pathogenesis. Virology 435:131-140. <PMID:23217623>

5. Shiohama et al. (2016) Absolute quantification of HTLV-1 basic leucine zipper factor (HBZ) protein and its plasma antibody in HTLV-1 infected individuals with different clinical status. Retrovirology 13:29 <PMID:27117327>


End Note Aug 2019. Huntington Initiation not at Protein Level

That prior signalling at the nucleic acid level may be necessary for the emergence of pathology in the polyglutamine expansion diseases (see Section 12 of paper) is suggested by Gusella and the Huntington's Disease Consortium. For detailed discussion see Chapter 14 of Evolutionary Bioinformatics (Forsdyke 2016).

 
Gusella JF et al. (2019) CAG repeat, not polyglutamine length, determines timing of Huntington's disease onset. Cell 178, 887-900.

 

End Note Aug 2020. Amino acids as placeholders

 

The LANA protein's acidic domains that can "read" the basic non-acetylated domains of other proteins, may include p53 among their targets (1). Viral long-term persistence (latency) may depend on this mechanism: "LANA's acidic domain reader is critical for viral latency," and thus "this LANA sequence may exert its effects on viral persistence through p53." Other proteins that, like p53, have one basic amino acid domain, may be targeted by one or more of the LANA protein's three dispersed acidic amino acid domains. However, "the basis for LANA's extended acidic domain reader sequence is unclear," so it could act "as a multivalent receptor for one or more unacetylated partners."


This mechanism could explain the domain's ability to "inhibit cis MHC class I peptide presentation," which would protect latent virus from host immune attack (1). Here the authors invoke "central repeat domain effects on protein processing," so interfering with the generation of LANA peptides that would complex with host MHC proteins (2,3). However, the LANA protein domain's acidic amino acids may be merely passive "placeholders" that exist by default because the domain, in fact, operates at the level of the mRNA that encodes the protein (4-7).


Evidence for this hypothesis was obtained by experimentally changing the reading frame (by addition or subtraction of a single base). Thus, the protein sequence of the domain was drastically altered while the mRNA sequence remained essentially unchanged. Since this did not affect viral ability to decrease host immune recognition, it was concluded "that the cis-inhibitory effect of the internal repeat sequences of gamma-herpesviruses operates at the nucleotide level and is unlikely to be mediated through the direct action of the GAr polypeptide" (6). While the latter considered the EBNA1 protein of Epstein-Barr virus (EBV), 'Nature' herself has carried out the analogous experiment with the evolutionarily related KSHV. Thus, citing Zaldumbide et al. (8), it was noted that: "While EBNA1 and LANA1 have no amino acid similarity, the gene sequences of the two viral repeat sequences are highly homologous to each other but are frameshifted for translation" (2).


Since the acidic amino acid-encoding domains are "extended" throughout the length of the LANA mRNA (1,7), some general function is indicated. This would be consistent with the hypothesis that the corresponding codons - rich in both purines (A and G) - "purine-load" the mRNA to decrease the probability of chance interactions with host mRNAs that are generally purine-rich. Since purines pair poorly with purines, this would decrease the probability of generating double-stranded RNA regions to activate intracellular alarms that can influence mRNA translation (4). Thus, decreasing the purine-loading, without changing the encoded protein, enhances cell surface expression of peptides complexed with MHC protein - so triggering a T-cell response (4-7). Evolutionarily, this would be the primary role of the acidic amino acid-encoding domains. Naming the LANA sequence "reader" presumes it actually "reads" rather than acts as a default space filler - the actual role being at the mRNA level. But other functions, perhaps involving p53, could have arisen secondarily.
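
The statement that purines pair poorly with purines can be illustrated with a deliberately crude model. The sketch below is a toy under several assumptions: only Watson-Crick pairs (A-U, G-C) are counted, G-U wobble pairs and all structural or energetic considerations are ignored, the two RNAs are aligned antiparallel without gaps, and the sequences are hypothetical. The function names are invented for the example.

```python
# Watson-Crick pairs only; G-U wobble pairing is deliberately ignored here.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna: str) -> str:
    """Return the reverse complement of an upper-case RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def pairing_fraction(rna_a: str, rna_b: str) -> float:
    """Fraction of positions that pair when rna_b lies antiparallel to rna_a.

    Assumes equal lengths and an ungapped, full-length alignment.
    """
    if len(rna_a) != len(rna_b):
        raise ValueError("sequences must have equal length")
    paired = sum((x, y) in PAIRS for x, y in zip(rna_a, reversed(rna_b)))
    return paired / len(rna_a)


# Hypothetical sequences, for illustration only.
viral_purine_loaded = "GAGGAAGAGGAA"
host_purine_loaded = "AGGAGAAGGAGA"
pyrimidine_rich = reverse_complement_rna(viral_purine_loaded)

print(pairing_fraction(viral_purine_loaded, host_purine_loaded))
print(pairing_fraction(viral_purine_loaded, pyrimidine_rich))
```

Two purine-only sequences cannot form any Watson-Crick pair in this model, whereas a purine-loaded sequence and its pyrimidine-rich complement pair at every position. Real mRNAs lie between these extremes.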

For more details please see endnotes at: http://wayback.archive-it.org/7641/20161129204729/http://www.queensu.ca/academia/forsdyke/EBV.htm

 
1. Juillard F et al. (2020) KSHV LANA acetylation-selective acidic domain reader sequence mediates virus persistence. Proc Natl Acad Sci USA www.pnas.org/cgi/doi/10.1073/pnas.2004809117

2. Kwun HJ et al. (2007) Kaposi's sarcoma-associated herpesvirus latency-associated nuclear antigen 1 mimics Epstein-Barr virus EBNA1 immune evasion through central repeat domain effects on protein processing. J Virol 81, 8225-8235.

3. Kwun HJ et al. (2011) The central repeat domain 1 of Kaposi's sarcoma-associated herpesvirus (KSHV) latency associated-nuclear antigen 1 (LANA1) prevents cis MHC class I peptide presentation. Virology 412, 357-365.

4. Cristillo AD et al. (2001) Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1 and Epstein-Barr) pyrimidine-load. J Theor Biol 208, 475-489.

5. Tellam JT et al. (2008) Regulation of protein translation through mRNA structure influences MHC class I loading and T cell recognition. Proc Natl Acad Sci USA 105, 9319-9324.

6. Tellam JT et al. (2012) Messenger RNA sequence rather than protein sequence determines the level of self-synthesis and antigen presentation of the EBV-encoded antigen, EBNA-1. PLOS Pathogens 8, e1003112.

7. Sorel O et al. (2017) Macavirus latency-associated protein evades immune detection through regulation of protein synthesis in cis depending upon its glycine/glutamate-rich domain. PLoS Pathogens 13, e1006691.

8. Zaldumbide A et al. (2007) In cis inhibition of antigen processing by the latency-associated nuclear antigen I of Kaposi sarcoma Herpes virus. Mol Immunol 44, 1352-1360.

 



Go to: Abstract of the above from 1998 Triplet Repeat conference (Click Here)

Go to: Abstract of the above from 1998 FASEB Meeting (Click Here) 

This page was established circa 2001 and was last edited on 19 Sep 2020 by Donald Forsdyke