Homogenization of sub-genome secretome gene expression patterns in the allodiploid fungus Verticillium longisporum

Allopolyploidization, genome duplication through interspecific hybridization, is an important evolutionary mechanism that can enable organisms to adapt to environmental changes or stresses. The increased adaptive potential of allopolyploids can be particularly relevant for plant pathogens in their ongoing quest for host immune response evasion. To this end, plant pathogens secrete a plethora of molecules that enable host colonization. Allodiploidization has resulted in the new plant pathogen Verticillium longisporum that infects different hosts than haploid Verticillium species. To reveal the impact of allodiploidization on plant pathogen evolution, we studied the genome and transcriptome dynamics of V. longisporum using next-generation sequencing. V. longisporum genome evolution is characterized by extensive chromosomal rearrangements, between as well as within parental chromosome sets, leading to a mosaic genome structure. In comparison to haploid Verticillium species, V. longisporum genes display stronger signs of positive selection. The expression patterns of the two sub-genomes show remarkable resemblance, suggesting that the parental gene expression patterns homogenized upon hybridization. Moreover, whereas V. longisporum genes encoding secreted proteins frequently display differential expression between the parental sub-genomes in culture medium, expression patterns homogenize upon plant colonization. Collectively, our results illustrate the adaptive potential of allodiploidy mediated by the interplay of two sub-genomes.

Author summary

Hybridization followed by whole-genome duplication, so-called allopolyploidization, provides genomic flexibility that is beneficial for survival under stressful conditions or invasiveness into new habitats. Allopolyploidization has mainly been studied in plants, but also occurs in other organisms, including fungi.
Verticillium longisporum, an emerging fungal pathogen on brassicaceous plants, arose by allodiploidization between two Verticillium spp. We used comparative genomics to reveal the plastic nature of the V. longisporum genomes, showing that parental chromosome sets recombined extensively, resulting in a mosaic genome pattern. Furthermore, we show that non-synonymous substitutions frequently occurred in V. longisporum. Moreover, we reveal that expression patterns of genes encoding secreted proteins homogenized between the V. longisporum sub-genomes upon plant colonization. In conclusion, our results illustrate the large adaptive potential of genome hybridization in fungi, mediated by genomic plasticity and interaction between sub-genomes.


In allodiploid organisms, determining parental origin is essential to investigate genome evolution in the aftermath of hybridization. Because species D1 is phylogenetically more closely related to V. dahliae than species A1 is, and consequently has a higher sequence identity to it, V. longisporum genomic regions were previously provisionally assigned to either species D1 or A1 [39]. Here, we determined the parental origin of V. longisporum genomic regions more precisely. Owing to the difference in phylogenetic distance of species A1 and D1 to V. dahliae, alignments of the V. longisporum genomes to V. dahliae displayed a bimodal distribution of sequence identity, with one peak at 93.1% and another at 98.4%, representing the two parents, separated by a dip at 96.0% (S1 Fig). To separate the two sub-genomes, regions with an average sequence identity to V. dahliae of <96% were assigned to species A1, whereas regions with an identity of ≥96% were assigned to species D1 (Fig 1). In this manner, 36.2 Mb of V. longisporum strain VLB2 was assigned to species A1 and 35.7 Mb to species D1. For V. longisporum strain VL20, 36.3 Mb was assigned to species A1 and 35.2 Mb to species D1. Only 1.0 and 0.8 Mb of strains VLB2 and VL20, respectively, could not be aligned to V. dahliae and thus remained unassigned.
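The threshold-based assignment described above can be illustrated with a short sketch. It assumes each region is already summarized as a hypothetical `(contig, length_bp, identity)` tuple, with `identity` set to `None` for regions that did not align to V. dahliae; the function names and the toy regions are illustrative and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical region record: (contig, length in bp, mean identity to V. dahliae in %),
# with identity None for regions that did not align to V. dahliae.
Region = Tuple[str, int, Optional[float]]

DIP_THRESHOLD = 96.0  # identity (%) at the dip between the A1 and D1 peaks


def assign_subgenome(identity: Optional[float], threshold: float = DIP_THRESHOLD) -> str:
    """Label a region 'A1' below the threshold, 'D1' at or above it, 'unassigned' if unaligned."""
    if identity is None:
        return "unassigned"
    return "A1" if identity < threshold else "D1"


def tally_mb(regions: Iterable[Region], threshold: float = DIP_THRESHOLD) -> Dict[str, float]:
    """Sum region lengths in megabases per sub-genome label."""
    totals: Dict[str, float] = defaultdict(float)
    for _contig, length_bp, identity in regions:
        totals[assign_subgenome(identity, threshold)] += length_bp / 1e6
    return dict(totals)


# Toy regions (not real data): two aligned regions and one unaligned region.
toy = [("ctg1", 2_000_000, 93.1), ("ctg1", 1_500_000, 98.4), ("ctg2", 500_000, None)]
print(tally_mb(toy))  # {'A1': 2.0, 'D1': 1.5, 'unassigned': 0.5}
```

Using `<` for A1 and `>=` for D1 mirrors the "<96%" and "≥96%" split, so a region sitting exactly at the dip goes to D1.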

To trace the chromosome sets of the original parents of the hybrid, the parental origin of individual contigs was determined. In total, 8 of the 10 largest contigs of V. longisporum strain VLB2 as well as strain VL20 consist of regions originating from both species A1 and species D1 (Fig 1). Thus, parental chromosome sets cannot be separated from one another, as V. longisporum apparently evolved a mosaic genome structure in the aftermath of hybridization.
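The contig-level check described above, whether a contig carries segments of both parental origins, can be sketched as follows. This assumes segments have already been labeled, for example by a threshold-based assignment, and are given as hypothetical `(contig, label)` pairs. The function names and toy data are illustrative only.

```python
from typing import Dict, Iterable, Set, Tuple


def mosaic_contigs(segments: Iterable[Tuple[str, str]]) -> Dict[str, bool]:
    """Map each contig to True if it carries segments of both parental origins.

    `segments` holds (contig, label) pairs with labels such as 'A1', 'D1'
    or 'unassigned'.
    """
    labels: Dict[str, Set[str]] = {}
    for contig, label in segments:
        labels.setdefault(contig, set()).add(label)
    return {contig: {"A1", "D1"} <= found for contig, found in labels.items()}


def count_mosaic_among_largest(contig_lengths: Dict[str, int],
                               segments: Iterable[Tuple[str, str]],
                               n: int = 10) -> int:
    """Count how many of the n largest contigs are A1/D1 mosaics."""
    largest = sorted(contig_lengths, key=contig_lengths.get, reverse=True)[:n]
    flags = mosaic_contigs(segments)
    return sum(flags.get(contig, False) for contig in largest)


# Toy example (not real data).
lengths = {"ctg1": 5_000_000, "ctg2": 4_000_000, "ctg3": 3_000_000}
segs = [("ctg1", "A1"), ("ctg1", "D1"), ("ctg2", "A1"), ("ctg3", "D1"), ("ctg3", "A1")]
print(count_mosaic_among_largest(lengths, segs, n=3))  # 2
```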
also in planta there is no expression dominance of one of the V. longisporum sub-genomes.

To elucidate a putative association of gene expression differences between V. longisporum and V. dahliae with their distinct host ranges, the fraction of differentially expressed genes that encode secreted proteins was determined. For Verticillium grown in culture medium, 16.5% of the differentially expressed genes between V. longisporum and V. dahliae encode secreted proteins, whereas this is only 5.3% for genes without differential expression (Fig 6A). This enrichment of genes encoding secreted proteins was also found for  Table). Similarly, genes encoding secreted proteins were also significantly enriched among differentially expressed genes between V. longisporum and V. dahliae in planta (Fig 6B). However, the fraction of 9.7% was significantly less than the 16.5% for Verticillium grown in culture medium (Fig 6B) (Fig 7). Thus, similar to V. longisporum and V. dahliae orthologs, differentially expressed homeologs are enriched for genes that encode secreted proteins. Intriguingly, 7.9% of the genes with differential homeolog expression in planta encode secreted proteins, which is a similar fraction as for homeologs without differential expression (8.4%; Fig 7). Thus, upon plant colonization, there is no enrichment of genes that encode secreted proteins among differentially expressed homeologs. This lack of enrichment may be due to increased expression differences in planta between homeologs that encode non-secreted proteins. Alternatively, expression levels of homeologs encoding secreted proteins may homogenize in planta (Fig 7). In total, 11.2% and 9.5% of the genes that are differentially regulated in planta compared with culture medium encode secreted proteins in sub-genomes A1 and D1, respectively (Fig 7). This is a significantly larger fraction than for genes without differential expression: 7.2% and 7.9% for sub-genomes A1 and D1, respectively. In
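The enrichment comparisons above contrast the fraction of genes encoding secreted proteins among differentially expressed (DE) genes with the fraction among genes without differential expression. The statistical test is not named in this section. A one-sided Fisher's exact test on a 2x2 table is one common choice and is used here only as an assumption, implemented with `math.comb` so that no third-party library is needed. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment of secreted proteins.

    2x2 contingency table:
                          secreted   not secreted
        DE genes              a            b
        non-DE genes          c            d

    Returns P(X >= a), where X (secreted genes among the DE genes) follows
    the hypergeometric distribution implied by the fixed table margins.
    """
    n_de = a + b          # genes drawn: differentially expressed
    n_secreted = a + c    # "successes" in the population: secreted genes
    n_total = a + b + c + d
    upper = min(n_de, n_secreted)
    tail = sum(comb(n_secreted, k) * comb(n_total - n_secreted, n_de - k)
               for k in range(a, upper + 1))
    return tail / comb(n_total, n_de)


def fraction_secreted(secreted: int, not_secreted: int) -> float:
    """Fraction of genes in one row of the table that encode secreted proteins."""
    return secreted / (secreted + not_secreted)


# Hypothetical counts (not from the study): 30 of 200 DE genes vs 50 of 1000 non-DE genes.
print(fraction_secreted(30, 170), fraction_secreted(50, 950))
print(fisher_enrichment_p(30, 170, 50, 950))
```

Conditioning on both margins makes the tail an exact hypergeometric sum, so the p-value needs no large-sample approximation even when few genes encode secreted proteins.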

Parental origin determination
Sub-genomes were divided based on the differences in sequence identity of species A1 and D1 to V. dahliae. The V. longisporum genomes of VLB2 and VL20 were aligned to the complete genome assembly of V. dahliae JR2 using NUCmer, which is part of the MUMmer package v3.23 [46,74]. Only 1-to-1 alignments longer than 10 kb and with a minimum identity of 80% were retained. Consecutive alignments were concatenated if they aligned to the same contig with the same orientation and order as in the reference genome. The average nucleotide identity was determined for every concatenated alignment and used to divide the genomes into sub-genomes.
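The filtering and concatenation steps can be sketched in Python. This is a minimal sketch, assuming the alignments have already been parsed (for example from NUCmer coordinate output) into a hypothetical `Alignment` record and reduced to 1-to-1 alignments upstream (in MUMmer this is typically done with `delta-filter -1`). The length-weighted averaging of identity is also an assumption, because the text does not state how the average was computed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alignment:
    """One 1-to-1 NUCmer alignment (hypothetical parsed record)."""
    ref_chrom: str     # V. dahliae JR2 chromosome
    ref_start: int     # 1-based, inclusive
    ref_end: int
    query_contig: str  # V. longisporum contig
    query_start: int   # 1-based, inclusive, query_start <= query_end
    query_end: int
    strand: str        # '+' or '-' relative to the reference
    identity: float    # percent identity

    @property
    def length(self) -> int:
        return self.query_end - self.query_start + 1


@dataclass
class Block:
    """A run of consecutive, collinear alignments concatenated into one region."""
    query_contig: str
    ref_chrom: str
    strand: str
    members: List[Alignment] = field(default_factory=list)

    @property
    def length(self) -> int:
        return sum(a.length for a in self.members)

    @property
    def mean_identity(self) -> float:
        # Length-weighted mean identity (the weighting is an assumption).
        return sum(a.identity * a.length for a in self.members) / self.length


def filter_alignments(alns: List[Alignment], min_len: int = 10_000,
                      min_identity: float = 80.0) -> List[Alignment]:
    """Keep alignments longer than min_len bp with at least min_identity % identity."""
    return [a for a in alns if a.length > min_len and a.identity >= min_identity]


def concatenate(alns: List[Alignment]) -> List[Block]:
    """Merge consecutive alignments on a query contig that hit the same reference
    chromosome with the same orientation and in the same order."""
    blocks: List[Block] = []
    for a in sorted(alns, key=lambda x: (x.query_contig, x.query_start)):
        if blocks:
            last = blocks[-1]
            prev = last.members[-1]
            same_target = (last.query_contig == a.query_contig
                           and last.ref_chrom == a.ref_chrom
                           and last.strand == a.strand)
            # Walking along the query, the reference position must increase on
            # the '+' strand and decrease on the '-' strand.
            in_order = (a.ref_start > prev.ref_start if a.strand == "+"
                        else a.ref_start < prev.ref_start)
            if same_target and in_order:
                last.members.append(a)
                continue
        blocks.append(Block(a.query_contig, a.ref_chrom, a.strand, [a]))
    return blocks


# Toy alignments (not real data); the last one is shorter than 10 kb and is dropped.
toy = [
    Alignment("chr1", 1, 20_000, "ctg1", 1, 20_000, "+", 98.0),
    Alignment("chr1", 30_001, 45_000, "ctg1", 25_001, 40_000, "+", 99.0),
    Alignment("chr2", 1, 12_000, "ctg1", 50_001, 62_000, "+", 93.0),
    Alignment("chr1", 50_001, 55_000, "ctg1", 70_001, 75_000, "+", 97.0),
]
for b in concatenate(filter_alignments(toy)):
    print(b.query_contig, b.ref_chrom, b.length, round(b.mean_identity, 2))
```

Each block's `mean_identity` could then be compared against the 96% dip to label the block as A1 or D1.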

The parental origin determination based on sequence identities of the exonic regions of

Genes occurring in multiple copies were identified using nucleotide BLAST (v2.6.0+), and the sequence identity between these genes was determined. Here, hits with a minimum subject and query coverage of 80% were used.
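The coverage filter on BLAST hits can be sketched as follows. This assumes tabular output with a custom column list, for example `-outfmt "6 qseqid sseqid pident qstart qend sstart send qlen slen"`. The exact command used in the study is not given. Coverage is computed per HSP here, and self-hits are excluded. Both are assumptions, as are the helper names.

```python
import csv
from typing import Iterable, Iterator, List, NamedTuple

# Assumed column order (BLAST+ tabular output with a custom column list):
#   qseqid sseqid pident qstart qend sstart send qlen slen


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qcov: float  # fraction of the query covered by this HSP
    scov: float  # fraction of the subject covered by this HSP


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST lines into Hit records with per-HSP coverage."""
    for row in csv.reader(lines, delimiter="\t"):
        q, s, pident, qs, qe, ss, se, qlen, slen = row
        # abs() handles minus-strand hits, where sstart > send.
        qcov = (abs(int(qe) - int(qs)) + 1) / int(qlen)
        scov = (abs(int(se) - int(ss)) + 1) / int(slen)
        yield Hit(q, s, float(pident), qcov, scov)


def multicopy_hits(hits: Iterable[Hit], min_cov: float = 0.8) -> List[Hit]:
    """Keep non-self hits that cover at least min_cov of both query and subject."""
    return [h for h in hits
            if h.qseqid != h.sseqid and h.qcov >= min_cov and h.scov >= min_cov]
```

The retained hits give the gene pairs whose `pident` values would then describe the sequence identity between copies.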

The VLB2 genome assembly was aligned to VL20 to identify synteny breaks using NUCmer, which is part of the MUMmer package v3.23 [74]. Consecutive alignments were concatenated if they aligned to the same contig with the same orientation and order as in the reference genome. To confirm synteny breaks, filtered V. longisporum long sequencing reads of VLB2 [39] were aligned to the V. longisporum VL20 genome with the Burrows-Wheeler Aligner (BWA) [75] and further processed with the samtools package (v1.3.1) [76]. Synteny breaks were visualized using the R package Sushi [77] and the Integrative Genomics Viewer [78]. The association between breaks and repeats was tested through permutation. First, the fraction of synteny breaks flanked by repeats was determined. Here, synteny breaks were assigned to reside in a "repeat-rich" region if a 1 kb window around the break consisted of more than 10% repeats. The V. longisporum VL20 genome

longisporum strain VLB2 was used. Only genes with one copy in V. dahliae and two orthologs in the V. longisporum strains VLB2 and VL20 (one from sub-genome A1 and one from sub-genome D1) were considered.

longisporum D1 sub-genome.
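The repeat-rich criterion for synteny breaks (a 1 kb window containing more than 10% repeats) and a permutation test around it can be sketched as follows. Only the first step of the permutation test is described in the text. The null model below, which draws random positions with contigs weighted by their length, is an assumption. So are the hypothetical function names and the requirement that repeat intervals are sorted and non-overlapping.

```python
import random
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based
Site = Tuple[str, int]      # (contig, position)


def repeat_fraction(repeats: Sequence[Interval], pos: int, window: int = 1000) -> float:
    """Fraction of a window centred on pos that is covered by repeats.

    Assumes `repeats` is sorted by start and non-overlapping.
    """
    lo, hi = pos - window // 2, pos + window // 2
    covered = 0
    for start, end in repeats:
        if end <= lo:
            continue
        if start >= hi:
            break
        covered += min(end, hi) - max(start, lo)
    return covered / window


def fraction_repeat_rich(sites: Sequence[Site],
                         repeats: Dict[str, Sequence[Interval]],
                         window: int = 1000, min_frac: float = 0.10) -> float:
    """Fraction of sites whose window consists of more than min_frac repeats."""
    rich = sum(repeat_fraction(repeats.get(c, []), p, window) > min_frac
               for c, p in sites)
    return rich / len(sites)


def permutation_test(breaks: Sequence[Site],
                     repeats: Dict[str, Sequence[Interval]],
                     contig_lengths: Dict[str, int],
                     n_perm: int = 1000, seed: int = 0,
                     window: int = 1000, min_frac: float = 0.10) -> Tuple[float, float]:
    """Compare the observed repeat-rich fraction with that of random positions."""
    rng = random.Random(seed)
    observed = fraction_repeat_rich(breaks, repeats, window, min_frac)
    contigs = list(contig_lengths)
    weights = [contig_lengths[c] for c in contigs]  # sample contigs by length
    at_least_as_high = 0
    for _ in range(n_perm):
        picked = rng.choices(contigs, weights=weights, k=len(breaks))
        sites = [(c, rng.randrange(contig_lengths[c])) for c in picked]
        if fraction_repeat_rich(sites, repeats, window, min_frac) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_high + 1) / (n_perm + 1)
```

Windows near contig ends are not clipped in this sketch, which slightly underestimates repeat content there.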