Most of the yeast strains used in fermented beverages and foods are classified as Saccharomyces cerevisiae. However, different strains are suitable for different fermentation processes. The purpose of this work is to propose a standardized methodology for the molecular genotyping of S. cerevisiae strains based on polymorphisms at microsatellite loci and/or single nucleotide polymorphisms (SNPs). Single nucleotide variants in the coding region of FLO8, a key regulator of flocculation and pseudohyphae formation, were analyzed in a subset of Uruguayan wine strains. Polymorphism analysis at nine microsatellite loci (selected from 33 loci tested) was performed in a collection of 120 strains, mostly wine strains, from different origins. From a total of 184 different alleles scored, 50 were exclusive alleles that could identify 29 strains. Four selected microsatellite loci are located within or near genes of putative enological interest. The Uruguayan strains are highly diverse and evenly distributed in the phylogenetic reconstructions, suggesting an evolutionary history that predates human use. The Saccharomyces cerevisiae Microsatellites and SNPs Genotyping Database is presented (http://www.pasteur.edu.uy/yeast). Comparison of standardized results from strains coming from different settings (industrial, clinical, environmental) will provide a reliable and growing source of information on the molecular biodiversity of S. cerevisiae strains.
In his early studies on alcoholic fermentation, Louis Pasteur already reported that each type of fermentation requires a certain type of microorganism (Pasteur, 1866). Different strains of the budding yeast Saccharomyces cerevisiae have been used for centuries in baking, distilling, brewing and wine making. Phenotypic variation among wine yeast isolates was recognized by winemakers long before being appreciated by geneticists. Under the same enological conditions, a grape juice can result in a good or bad-quality wine, depending on the S. cerevisiae strain that completes the fermentation. Therefore, most modern winemakers inoculate grape must with a pure culture of a selected S. cerevisiae strain to ensure a reliable and predictable fermentation process. Among many attributes, selected strains must tolerate high ethanol concentrations, flocculate at the end of the fermentation, confer desirable aromas and color, and must not produce undesirable by-products like hydrogen sulfide.
In a completely different application, live Saccharomyces boulardii, now considered a strain of S. cerevisiae (McCullough et al., 1998), has been used as a nutritional supplement and also as a probiotic agent for reequilibration of the intestinal flora (McFarland & Bernasconi, 1993).
Although S. cerevisiae is not considered a pathogen in healthy individuals, it is increasingly isolated from immunocompromised patients. The use of live Saccharomyces in the treatment of diarrhea in Europe has been linked to yeast sepsis (Piarroux et al., 1999). Clinical yeast isolates are able to grow at higher temperatures (41 °C) than laboratory strains, and this characteristic has been correlated with their survival in mice (McCusker et al., 1994). In S. cerevisiae, the physiological response to nutrient deprivation (carbon, nitrogen or amino acid deprivation) results in changes in morphology and cell surface characteristics, switching from spherical or ovoid cells to filaments of invasive pseudohyphae. For fungal pathogens, such as Candida albicans, this dimorphic transition is strongly correlated with invasion of host tissues and virulence (Sudbery et al., 2004).
Recently, the budding yeast S. cerevisiae has attracted interest as an emerging model in ecological and evolutionary genetics. Saccharomyces cerevisiae occupies and flourishes in numerous habitats that are not necessarily associated with human activities. Insects and birds are considered important agents for the dispersal of yeasts. It is well recognized that natural S. cerevisiae isolates, which are generally prototrophs, exhibit very large genotypic and phenotypic diversity, resulting in wide variations in their secondary metabolite production. Metabolic parameters generally vary in a complex, continuous way that can be attributed to a typical polygenic determinism associated with quantitative trait loci (QTL) (for a review see Landry et al., 2006). Three QTLs that control S. cerevisiae sporulation efficiency were mapped to single-nucleotide resolution, showing that the interaction of a few genetic variants (SNPs) can have a profound phenotypic effect (Deutschbauer & Davis, 2005). Many traits of industrial strains (ethanol production and tolerance, flocculation, production of desirable or undesirable aromas, etc.) or clinical isolates (invasive growth) depend on the expression of multiple loci of variable phenotypic contribution.
Traditional morphological and biochemical tests are of limited value in revealing the genetic diversity of S. cerevisiae strains. In 2001, polymorphism analysis of selected microsatellite loci was proposed as a very powerful and unique method to discriminate S. cerevisiae at the strain level (González Techera et al., 2001; Hennequin et al., 2001). Microsatellites or simple sequence repeats (SSRs) consist of direct tandem repeats of a short DNA motif, usually <10 bp, that are hypervariable in length as a result of DNA-replication errors (Strand et al., 1993). Thus, microsatellites show a substantial level of polymorphism between individuals of the same species and are extensively used for paternity exclusion tests (Helminen et al., 1988), forensic medicine (Hagelberg et al., 1991) and the molecular typing of different eukaryotic organisms, including cultivars of Vitis vinifera (This et al., 2004) and the pathogenic yeast C. albicans (Botterel et al., 2001). Methods for the analysis of either microsatellites or SNPs require prior knowledge of the sequence under study. The choice of a suitable set of polymorphic loci is also crucial for both methods. Microsatellites are particularly suitable for the detection of polyploids and have a higher discrimination power than nucleotide sequence-based methods such as multilocus sequence typing (MLST), particularly when closely related strains are compared (Ayoub et al., 2006). In contrast to microsatellites, SNPs have a low rate of recurrent mutation, making them stable indicators of evolutionary history. SNPs are increasingly used for linkage and biodiversity studies in all kinds of organisms (Wang et al., 1998; Pearson et al., 2004; Clark et al., 2007), including yeasts (Ben-Ari et al., 2005; Aa et al., 2006).
After the entire S. cerevisiae genome was publicly available (Goffeau et al., 1996), different computer searches for short tandem repeats were conducted (Field & Wills, 1998; Katti et al., 2001; Aishwarya et al., 2007). Recently, several high-throughput microsatellite polymorphism analyses have been performed (Legras et al., 2005, 2007; Schuller & Casal, 2007). Although these studies served to confirm the level of polymorphism of several microsatellite loci, they represent stand-alone efforts, and it is still not possible to compare results from different groups or calculate allelic frequencies. The genetic diversity of strains isolated from very different settings, e.g. clinical cases, environmental studies or technological applications, cannot be compared. In this work we report the molecular diversity of S. cerevisiae strains using two powerful and complementary tools for discriminating individuals within any eukaryotic species: microsatellites and SNPs. We propose a standardization method to report data from microsatellite polymorphism analysis, and we present a database that aims to collect and standardize data from different laboratories.
Materials and methods
Yeast strains and media
Saccharomyces cerevisiae AB972 and S288C, two strains used in the sequencing project, are haploid, MAT alpha strains, and were used as standard DNAs of known sequence. The diploid strain BY4743, used for the S. cerevisiae genome deletion project, was also included. Several of the hydrogen sulfide (H2S)-producing strains have been studied in detail by Linderholm et al. (2006). The Argentine native yeast strains have been described by Mercado (2007). The complete list of strains used in this work is included as supplementary material (supplementary Table S1). All yeast strains were grown on YPD [1% (w/v) yeast extract, 2% (w/v) peptone, 2% (w/v) glucose] medium for DNA isolation. All yeast strains were routinely grown at 30 °C, except when tested for growth at 41 °C. Synthetic low-ammonia dextrose (SLAD) plates were prepared as described (Gimeno et al., 1992).
Quick preparation of DNA template for PCR
The pellet corresponding to around 10⁹ cells (early stationary phase) was washed with sterile water and resuspended in 0.4 mL breaking buffer (2% Triton X-100, 1% sodium dodecylsulphate, 10 mM NaCl, 10 mM Tris, pH 8 and 1 mM EDTA, pH 8). The cells were homogenized by vortexing at high speed for 3 min with 0.3 g glass beads (Sigma G9268) in the presence of 0.4 mL phenol pH 8. Then, 0.4 mL TE (10 mM Tris pH 8 and 1 mM EDTA pH 8) was added and the mixture was vortexed briefly. After centrifugation at 4 °C, the aqueous phase was carefully removed. DNA was ethanol-precipitated, centrifuged and resuspended in TE. DNA concentration and quality were estimated in 0.7% agarose gels, and the DNA was diluted to c. 10–20 ng in 5 μL for PCR reactions.
DNA was isolated from early stationary phase cultures started either from one isolated colony or from a streak. In all cases, these duplicated DNAs showed exactly the same PCR profile for all SSRs tested.
Analysis of SSR loci
Description of the 33 microsatellite loci analyzed is included as supplementary material (supplementary Table S2). The specific pairs of primers used for the nine selected polymorphic loci are shown in Table 1.
Nucleotide sequences of the primer pairs used to amplify the nine selected SSR loci (columns: forward primer 5′–3′; reverse primer 5′–3′)
PCR amplifications were performed in a Thermo PXE 0.2 Thermal Cycler. YPL009C, YOR267C, TTA_XIII, YGL013C, YGL028C and AT_X SSR loci were amplified in 20 μL reactions consisting of 10–20 ng DNA, 200 μM of each dNTP, 2 μL of 10X PCR buffer minus Mg, 1 U Taq DNA polymerase, 2.5 mM MgCl2 and 10 pmol of forward and reverse primers. Amplification was performed as follows: 5 min at 94 °C, 30 cycles of (30 s at 94 °C, 30 s at 55 °C, 1 min at 72 °C) and 5 min at 72 °C. To avoid excessive stuttering, TG_VI, GT_X and C4_XV SSR loci were amplified in 20 μL reactions consisting of 10–20 ng DNA, 200 μM of each dNTP, 2 μL of 10X PCR buffer minus Mg, 0.5 U Pfu DNA polymerase (Fermentas), 4 mM MgSO4 and 10 pmol of forward and reverse primers. Amplification was performed as follows: 2 min at 95 °C, 25 cycles of (30 s at 95 °C, 30 s at 55 °C, 1 min at 70 °C) and 5 min at 72 °C. Amplification was confirmed by running an aliquot of the PCR reaction product in 2% agarose gels. DNA concentration was then adjusted, and 1/3 volume of denaturing dye solution (10 mM NaOH, 95% formamide, 0.05% bromophenol blue, 0.05% xylene cyanol) was added. One to four microlitres of this mixture were denatured and electrophoresed in a sequencing gel [6% polyacrylamide gel electrophoresis-Plus acrylamide (Fermentas) plus 7 M urea] and then silver stained according to the Promega Silver Staining Kit. Product sizes were determined by comparison with the known size of the amplified DNA from the sequenced reference strains, the 10-bp DNA marker (Invitrogen), and by mixing several of the different alleles observed. In some cases, the intrinsic stuttering of the microsatellite served as an internal allele ladder.
Absolute values for DNA microsatellite markers were calculated from GenBank database information and are the following: 239 bp for locus YOR267C, 207 bp for locus YPL009C, 238 bp for locus TTA_XIII, 229 bp for locus C4_XV, 274 bp for locus GT_X, 198 bp for locus TG_VI, 245 bp for locus YGL028C, 206 bp for locus AT_X and 228 bp for locus YGL013C. Both AB972 and S288C strains gave the same amplification product size for all SSRs tested, except for locus TG_VI, where one repeat difference was observed between the two strains. In this case, the absolute value of 198 bp was assigned to the amplification product of strain AB972 because this was the strain used to sequence chromosome VI (see ‘FAQs about S. cerevisiae’ in http://www.yeastgenome.org).
SNPs in FLO8
Primers used to amplify part of the coding region of FLO8 were as follows: YER109C-fwd, 5′-GCATGGCAACGAATAGTGA-3′; FLO8-fwd, 5′-CAGCAGCCTTTGCTCAAGATG-3′; Flo8-rev, 5′-GTTCTGCATCGTGTTGTAGCCTTG-3′.
A region of YER109C was amplified with two pairs of primers: FLO8-fwd and Flo8-rev or YER109C-fwd and Flo8-rev.
DNAs were amplified in 20 μL reactions consisting of 10–20 ng DNA, 200 μM of each dNTP, 2 μL of 10X PCR buffer minus Mg, 0.5 U Pfu DNA polymerase (Fermentas), 4 mM MgSO4 and 10 pmol of forward and reverse primers. Amplification was performed as follows: 2 min at 95 °C, 35 cycles of (30 s at 95 °C, 30 s at 55 °C, 1 min at 70 °C) and 5 min at 72 °C. Amplification was confirmed by running an aliquot of the PCR reaction product in 1.5% agarose gels. PCR products were purified and both strands were sequenced at Macrogen (http://dna.macrogen.com). Only SNPs confirmed by sequencing of both strands are reported. Heterozygosity at a given SNP was confirmed by visual inspection of two superimposed peaks of different colors at that position in the reads from both strands.
Only strains with different genotypes were included in the phylogenetic reconstructions. In order to compare strains with different ploidy, we considered alleles at different loci as independent characters and scored them in a presence/absence (0/1) manner. We estimated the Jaccard coefficient between pairs of strains and grouped them using the Neighbor-Joining and unweighted pair-group method with arithmetic average (UPGMA) clustering algorithms in the PAST 1.74 software (Hammer et al., 2001), rooting the resulting tree by the midpoint method. This method has at least three main drawbacks: it does not give the same weight to all loci, it leaves out the correlation between loci and their alleles, and it does not reflect the processes that bring about the differences between strains.
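The presence/absence scoring described above can be illustrated with a short Python sketch. It is not the PAST implementation; the strain names and allele values are hypothetical, and the distance is taken here as one minus the Jaccard coefficient, which is an assumption about how the coefficient is turned into a distance for clustering.

```python
from itertools import combinations

def allele_characters(genotype):
    """Flatten {locus: [alleles]} into a set of (locus, allele) characters scored as present."""
    return {(locus, allele) for locus, alleles in genotype.items() for allele in alleles}

def jaccard_distance(a, b):
    """Return 1 - |A n B| / |A u B| for two sets of present characters."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Hypothetical genotypes; alleles are repeat differences relative to the reference strain.
genotypes = {
    "strain_A": {"YPL009C": [0], "TG_VI": [-1, 2]},
    "strain_B": {"YPL009C": [0, 3], "TG_VI": [2]},
    "strain_C": {"YPL009C": [3, 4, 5], "TG_VI": [-1, 2, 6]},  # more than two alleles per locus
}

characters = {name: allele_characters(g) for name, g in genotypes.items()}
distances = {
    (s, t): jaccard_distance(characters[s], characters[t])
    for s, t in combinations(sorted(characters), 2)
}
```

Because each (locus, allele) pair is an independent character, strains with any number of alleles per locus can be compared, which is the reason this scoring was chosen for mixed-ploidy collections.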
Restricting the analysis to the 82 different diploid strains, mainly wine strains, we computed the Cavalli-Sforza chord distance (Cavalli-Sforza & Edwards, 1967), which assumes that all differences between strains arise only from genetic drift. We applied the Neighbor-Joining algorithm to 1000 bootstrap pseudoreplicates with the PHYLIP 3.67 Phylogeny Inference Package (http://evolution.gs.washington.edu/phylip.html), considering the sake strain as the outgroup. In all cases, we visualized the resulting trees with the MEGA 4 software (Tamura et al., 2007).
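As a rough illustration of the chord distance between two diploid strains, the sketch below uses one commonly quoted formulation, D = (2/(πr)) Σ_loci sqrt(2(1 − Σ_alleles sqrt(x·y))), where r is the number of loci and x, y are within-strain allele frequencies (1.0 for a homozygote, 0.5 and 0.5 for a heterozygote). This is an assumption for illustration only: the scaling used by PHYLIP may differ, and the genotypes shown are hypothetical.

```python
import math

def within_strain_frequencies(alleles):
    """Allele frequencies inside one strain at one locus, e.g. (0, 3) -> {0: 0.5, 3: 0.5}."""
    freqs = {}
    for allele in alleles:
        freqs[allele] = freqs.get(allele, 0.0) + 1.0 / len(alleles)
    return freqs

def chord_distance(g1, g2):
    """Chord distance averaged over the loci typed in both strains."""
    loci = sorted(set(g1) & set(g2))
    if not loci:
        raise ValueError("the two strains share no typed loci")
    total = 0.0
    for locus in loci:
        x = within_strain_frequencies(g1[locus])
        y = within_strain_frequencies(g2[locus])
        overlap = sum(math.sqrt(x[a] * y[a]) for a in set(x) & set(y))
        # max() guards against tiny negative values caused by rounding
        total += math.sqrt(2.0 * max(0.0, 1.0 - overlap))
    return 2.0 * total / (math.pi * len(loci))

# Hypothetical diploid genotypes (repeat differences relative to the reference strain).
strain_1 = {"TG_VI": (0, 0), "AT_X": (1, 1)}
strain_2 = {"TG_VI": (0, 3), "AT_X": (1, 1)}
strain_3 = {"TG_VI": (5, 5), "AT_X": (-2, -2)}

d_12 = chord_distance(strain_1, strain_2)
d_13 = chord_distance(strain_1, strain_3)  # no shared alleles: maximal distance
```

With this scaling the distance is 0 for identical genotypes and reaches 2·sqrt(2)/π when no allele is shared at any locus.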
This molecular-typing work was performed with a collection of 120 S. cerevisiae strains: 48 native wine strains [15 from Uruguay (URU), 28 from Argentina (ARG), five from UCDavis collection (UCD)], 50 commercial wine strains (COM), 12 strains producing different levels of H2S (H2S), one commercial strain for Sake (SAKE), six commercial bread strains (PAN) and three laboratory strains (LAB) (see supplementary material for more details).
At present, almost all commercial wine strains available require the addition of ammonium salts to avoid the production of undesirable aromas. However, there is a growing tendency to avoid this practice because the negative effects of excess nitrogen (residual undesirable compounds, wine contamination during ageing, etc) have been well documented (Bell & Henschke, 2005). Several of the Uruguayan native strains included in this work were selected as low nitrogen demanding strains (González Techera et al., 2001).
Screening for new polymorphic SSR loci
Based on the computer search for SSRs performed by Katti et al. (2001) (publicly available at http://www.ncl-india.org/ssr/) on the completely sequenced genome of S. cerevisiae, we chose and tested the polymorphism of several loci. We performed this initial screening with a representative subset of 23 strains used for different applications and representing different geographical origins (strains in bold and underlined in supplementary Table S1). The criteria for the selection of SSRs to be tested were the following: (1) at least one SSR per chromosome, (2) perfect motifs and long SSRs were preferred, (3) motifs rich in As or Ts that tend to give stuttering were avoided, (4) SSRs present within or near genes of putative enological interest were chosen. Loci previously reported as polymorphic were also included. Primers around 20 bp long were designed with the online Primer3 software so as to generate PCR product sizes within 200–250 bp and a unique annealing temperature for all PCR reactions of 55 °C. High-throughput PCR amplification and gel electrophoresis analysis were thus simplified. Criteria for naming the SSR loci were the following: (1) the normalized name of the ORF (according to http://www.yeastgenome.org) if the repeat was located in a coding region and (2) the motif and the chromosome number if the repeat was a perfect repeat in a noncoding region. C4_XV is a compound repeat (containing different triplet repeats) in chromosome XV and therefore we kept the name proposed by Legras et al. (2005). The precise position in base pairs, indicating the chromosome and strand, unambiguously identifies the SSR. Details of the 33 tested loci together with the number of alleles and heterozygosity observed in this subset of 23 strains are included as supplementary material (Supplementary Table S2). Only nine loci were polymorphic out of the 33 tested, within this reference strain population.
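The two naming criteria given above can be sketched as a small Python helper. The function name and its parameters are hypothetical and only illustrate the rule; compound repeats such as C4_XV, which keep a previously published name, are not handled.

```python
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
         "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"]

def ssr_locus_name(chromosome, motif=None, orf=None):
    """Name an SSR locus: ORF systematic name if coding, else motif plus chromosome numeral."""
    if orf is not None:
        return orf  # criterion (1): repeat located in a coding region
    if motif is None:
        raise ValueError("a noncoding SSR needs its repeated motif")
    if not 1 <= chromosome <= len(ROMAN):
        raise ValueError("S. cerevisiae chromosomes are numbered 1 to 16")
    return f"{motif}_{ROMAN[chromosome - 1]}"  # criterion (2): perfect noncoding repeat

names = [
    ssr_locus_name(13, motif="TTA"),
    ssr_locus_name(6, motif="TG"),
    ssr_locus_name(16, orf="YPL009C"),
]
```

Under these assumptions the three calls reproduce the locus names TTA_XIII, TG_VI and YPL009C used in this work.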
Polymorphism analysis at nine selected SSRs
Details of the nine selected SSRs that were analyzed with the collection of 120 strains are included in Table 2. Microsatellites that have been previously used by us or others (González Techera et al., 2001; Legras et al., 2005) were renamed following the standardized naming criteria described above. After obtaining the raw data in bp for the alleles present in the 120 strains for each SSR, we initially thought it would be easier to exchange results among different laboratories if alleles were expressed as number of repeats. In this way, the results would be independent of the primers used. However, when we started to carefully define the number of repeats present in the reference sequenced strain, in some cases the definition was clear cut, but in others it was uncertain. For instance, we assigned 30 triplets for the compound SSR C4_XV, but if this SSR is searched for using another search engine, like the one recently provided in EuMicroSatdb (Aishwarya et al., 2007) (http://www.veenuash.info/web/intro.htm), the recognized repeats turn out to be different: (TAA)9(TAG)7. In the case of SSR TTA_XIII, the number of repeats is 35 if the repeated motif is TTA or 36 if the repeated motif is considered to be TAT. When the SSR was located within a coding region, we counted all the consecutive motifs that coded for the same amino acid.
In an effort to standardize the reporting of results independently of either the primers or the definition of the number of repeats present in the sequenced reference strain, we realized that the best option would be to report the difference in the number of repeats relative to the reference sequenced strain. Thus, the allele sizes were expressed as the difference in the number of repeats present in the locus under study relative to the sequenced reference strain, e.g. +4 if the strain under study showed four repeats more than the sequenced strain analyzed at that locus or −7 if seven repeats less than the reference strain were observed (see Fig. 1). The results for all the strains and loci analyzed can be found in the database (http://www.pasteur.edu.uy/yeast). To confirm the correct assignment of alleles, we constructed allele ladders for each locus by mixing together different amplified DNAs representing all or almost all of the alleles observed. Running these allele ladders in parallel with the samples under study allowed a precise estimation of the difference in the number of repeats relative to the sequenced reference DNA. An example of a typical allele ladder is shown in Fig. 1.
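The conversion from a raw fragment size to the reported repeat difference can be sketched as follows. The reference sizes are those listed in Materials and methods and are valid only for the primers used in this work; the motif lengths are inferred from the locus names or from the text (12 bp for YGL028C, triplets for C4_XV) and should be treated as assumptions. Loci named after ORFs whose motif length is not stated in the text are left out.

```python
# Reference product sizes (bp) for the sequenced strain, from Materials and methods.
REFERENCE_SIZE_BP = {
    "TTA_XIII": 238, "C4_XV": 229, "GT_X": 274,
    "TG_VI": 198, "YGL028C": 245, "AT_X": 206,
}
# Motif lengths (bp): assumptions inferred from locus names and the text.
MOTIF_LENGTH_BP = {
    "TTA_XIII": 3, "C4_XV": 3, "GT_X": 2,
    "TG_VI": 2, "YGL028C": 12, "AT_X": 2,
}

def repeat_difference(locus, observed_bp):
    """Number of repeats gained (+) or lost (-) relative to the sequenced reference strain."""
    delta = observed_bp - REFERENCE_SIZE_BP[locus]
    motif = MOTIF_LENGTH_BP[locus]
    if delta % motif != 0:
        raise ValueError(f"{observed_bp} bp is not a whole number of {motif}-bp repeats from the reference")
    return delta // motif

def format_allele(locus, observed_bp):
    """Render the allele with an explicit sign, e.g. '+4' or '-7'."""
    return f"{repeat_difference(locus, observed_bp):+d}"

examples = [format_allele("TTA_XIII", 250), format_allele("TTA_XIII", 217)]
```

Rejecting sizes that are not a whole number of motifs away from the reference is a design choice of this sketch; it flags sizing errors or indels outside the repeat instead of silently rounding them.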
Allelic diversity at microsatellite locus YPL009C. M stands for the 10-bp DNA molecular marker. (1) S288C PCR product for this locus (seq), (2) partial allele ladder for locus YPL009C, (3) amplification products representing all the alleles observed at locus YPL009C, reported as the difference in the number of repeats relative to the sequenced reference DNA.
A total of 184 different alleles were scored, and 11 strains turned out to be aneuploids. All the commercial bread strains and the strain used for the production of sake showed three to four alleles for most loci. Most of the wine strains were diploids, and a few were aneuploid. A summary of the results is presented in Table 3. These nine SSR loci allowed the discrimination of 93 strains out of the 120 analyzed. There were 50 exclusive alleles that could identify 29 strains. The strain showing the highest number of exclusive alleles (6) was the sake strain, followed by URU12, with four exclusive alleles. A detailed list of exclusive alleles and strains identified is included as supplementary material (Supplementary Table S3). All data obtained and the calculated allelic frequencies for the nine SSRs are included in the database. DNAs used for the allele ladders are available on request.
Summary of results obtained analyzing 120 strains with nine selected SSRs
# alleles stands for the total number of alleles observed, #excl. allele is the total number of alleles present exclusively in a single strain, Exc/total is the fraction of exclusive alleles from the total observed alleles. Max and Min alleles are reported as the difference in the number of repeats relative to the sequenced reference strain. Ho is the observed heterozygosity and PIC the polymorphism information content.
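The per-locus statistics summarized above can be sketched in Python as follows. The sketch assumes diploid strains with two scored alleles each (aneuploid or polyploid strains would need a different counting rule) and uses the usual definition PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The genotypes are hypothetical and are not data from this work.

```python
from collections import Counter
from itertools import combinations

def allele_frequencies(genotypes):
    """Allele frequencies at one locus from {strain: (allele, allele)} diploid genotypes."""
    counts = Counter(a for pair in genotypes.values() for a in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def exclusive_alleles(genotypes):
    """Map each allele carried by exactly one strain to that strain."""
    carriers = {}
    for strain, pair in genotypes.items():
        for allele in set(pair):
            carriers.setdefault(allele, []).append(strain)
    return {a: s[0] for a, s in carriers.items() if len(s) == 1}

def observed_heterozygosity(genotypes):
    """Fraction of strains showing two different alleles at the locus (Ho)."""
    het = sum(1 for pair in genotypes.values() if len(set(pair)) > 1)
    return het / len(genotypes)

def pic(freqs):
    """Polymorphism information content from allele frequencies."""
    p = list(freqs.values())
    homozygosity = sum(x * x for x in p)
    correction = sum(2 * x * x * y * y for x, y in combinations(p, 2))
    return 1.0 - homozygosity - correction

# Hypothetical genotypes at one locus (repeat differences relative to the reference).
locus_genotypes = {"s1": (0, 0), "s2": (0, 2), "s3": (2, 2), "s4": (0, -3)}
freqs = allele_frequencies(locus_genotypes)
summary = {
    "n_alleles": len(freqs),
    "exclusive": exclusive_alleles(locus_genotypes),
    "Ho": observed_heterozygosity(locus_genotypes),
    "PIC": pic(freqs),
}
```

In this toy example allele −3 is exclusive to strain s4, so that single allele would be enough to identify the strain.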
Tree topologies obtained from UPGMA and Neighbor Joining algorithms, which compared all strains, gave similar results. The characteristic pattern of microsatellite data was evident, with deep branches and low definition at the basal nodes (see supplementary Fig. S1). The most differentiated strains are those used for bread, followed by sake and laboratory strains. However, in both cases, one wine strain was also differentiated from the rest. Besides, none of the categories of strains have a unique origin, except for a subgroup of H2S producers.
Restricting the analysis to the 82 diploid strains reproduced the above pattern but also showed that two groups of Argentine strains are monophyletic (Fig. 2).
Dendrogram of diploid Saccharomyces cerevisiae strains. Neighbour-joining tree constructed from the chord distance between yeast strains based on the polymorphism at nine microsatellite loci and rooted considering Sake strain as outgroup. Numbers above nodes are times of occurrence of each node after 1000 pseudoreplicates of bootstrap.
Four polymorphic SSRs are within or near genes of putative enological interest
The four polymorphic SSRs YGL013C, YGL028C, AT_X and TG_VI are located within or near genes of putative enological interest. We defined ‘near’ as <3 kb because on average 1 cM corresponds to 3 kb for S. cerevisiae.
YGL013C is next to ERG4/YGL012W, but on the opposite strand. ERG4 encodes the enzyme sterol C-24(28) reductase, which catalyzes the final step in ergosterol biosynthesis (Zweytick et al., 2000). Ergosterol is an essential component of yeast cells that maintains the integrity of the membrane. High concentrations of ergosterol have been correlated to high ethanol tolerance (Wu et al., 2006). Testing the subgroup of Uruguayan native strains, which include representatives of high (>14%) and low (<12%) ethanol tolerance, we could not demonstrate a possible association of this molecular marker to ethanol tolerance (results not shown).
The repeat present within SCW11/YGL028C is considered a minisatellite, rather than a microsatellite, because the repeated motif has 12 bases. SCW11 codes for a cell wall protein with similarity to glucanases. A null mutant in SCW11 is viable but exhibits defects in separation after division and displays flocculant growth (Cappellaro et al., 1998). The observed alleles in the population are shown in Fig. 3. We sequenced homozygote DNAs with three motifs less and two motifs more than the sequenced reference DNA and confirmed that the variation corresponded to changes in four codons coding for the amino acids SerSerSerThr; therefore, the repeated motif is best defined as SSST. Furthermore, we verified that in all sequenced DNAs, there were four different groups of codons coding for the same amino acid motif SSST:
(1) TCG TCC TCT ACG
(2) TCG TCC TCT ACT
(3) TCT TCT TCT ACT
(4) TCT TCC TCT ACT
Motif types (1) and (2) remained the same for all sequenced DNAs, whereas variations in the number of motif types (3) and (4) accounted for the size differences observed.
Allelic diversity at microsatellite locus YGL028C. M stands for the 10-bp DNA molecular marker. (3) Amplification products representing all the alleles observed at locus YGL028C, reported as the difference in the number of repeats relative to the sequenced reference DNA (seq).
The largest allele with 12 motifs (SSST) was not observed in homozygosis. Flocculation assays of strains carrying the different alleles observed in homozygosis did not result in evident phenotypic differences (results not shown).
The production of H2S is an undesirable sensory characteristic in the wine, beer and sake industries. Yeast strain background has a strong influence on H2S production in wine strains of S. cerevisiae (Spiropoulos et al., 2000).
The TG_VI SSR (named as C5 by Legras et al., 2005) is located around 3 kb from MET10/YFR030W. MET10 codes for the alpha subunit of assimilatory sulfite reductase, which is responsible for the conversion of sulfite into sulfide (Hansen & Kielland-Brandt, 1996). The AT_X SSR is located 135 bp from MET3/YJR010W. MET3 codes for ATP sulfurylase, the enzyme that catalyzes the primary step of intracellular sulfate activation; ATP sulfurylase is essential for the assimilatory reduction of sulfate to sulfide (Cherest et al., 1985). All of the H2S-producing strains (H2S in the dendrograms) show the same allele in homozygosis for the AT_X SSR. There is also one predominant allele present in homozygosis for TG_VI in this subset of strains. However, the alleles present in these 12 strains for these two loci next to MET genes are not exclusive to this subgroup and are also present in other strains of our collection.
SNPs in FLO8, a key regulator of flocculation and pseudohyphal growth
FLO8 is a key transcription factor required for flocculation and filamentous growth. The genome reference strain (AB972 or S288C) and most laboratory strains have a null mutation in this gene and a nonflocculant phenotype (Liu et al., 1996). If the first nucleotide of the coding sequence is given number 1, this null mutation consists of the presence of an A in position 425 of FLO8, generating a stop codon at the amino acid level. We designed primers to amplify a 300-bp region encompassing this known null mutation site and sequenced both strands in a subset of Uruguayan native wine strains, AB972 (reference haploid strain), BY4743 (diploid strain used for the S. cerevisiae deletion project) and the very flocculant commercial strain AWRI 350. The polymorphisms observed are shown in Table 4.
The position in bp (number 1 is given to the first nucleotide of the coding sequence), the nucleotide present and the resulting codon/amino acid are indicated. Both alleles are indicated for heterozygous DNAs.
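The numbering convention used for the FLO8 SNPs (position 1 is the first nucleotide of the coding sequence) can be made concrete with a short Python sketch that locates the affected codon and translates it before and after a substitution. The nine-base sequence used below is a toy example, not the FLO8 sequence, and the function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codon_coordinates(position):
    """Return (codon number, base within codon), both 1-based, for a 1-based CDS position."""
    if position < 1:
        raise ValueError("positions are counted from 1")
    return (position - 1) // 3 + 1, (position - 1) % 3 + 1

def snp_effect(cds, position, new_base):
    """Return (old codon, old amino acid, new codon, new amino acid); '*' marks a stop."""
    codon_number, base_in_codon = codon_coordinates(position)
    start = (codon_number - 1) * 3
    old_codon = cds[start:start + 3]
    new_codon = old_codon[:base_in_codon - 1] + new_base + old_codon[base_in_codon:]
    return old_codon, CODON_TABLE[old_codon], new_codon, CODON_TABLE[new_codon]

flo8_site = codon_coordinates(425)          # codon and base hit by position 425
toy = snp_effect("ATGTGGAAA", 5, "A")       # toy CDS: G to A at position 5
```

With this convention, position 425 falls on the second base of codon 142, and the toy substitution shows how a single base change can turn a sense codon into a stop codon.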
Filamentous growth was tested on SLAD plates (see Fig. 4). Although pseudohyphae formation and agar invasion were evident in some cases, none of this subset of strains could grow on YPD at 41 °C, as has been reported for clinical isolates.
Pseudohyphal growth on synthetic low-ammonia dextrose plates. Strains were streaked on SLAD media poured onto sterile microscopic slides and incubated at 30 °C for 3–4 days. Different levels of pseudohyphal growth are indicated: from no pseudohyphae (−) to formation of pseudohyphal mats (++++). Photographs were taken from representative colonies (left column, × 20). A closer look at the colony borders is shown (right column, × 200).
Saccharomyces cerevisiae microsatellites and SNPs genotyping database
The S. cerevisiae Microsatellites and SNPs Genotyping Database was conceived to gather and search information on microsatellite (SSR) and SNP alleles from different strains. The aim of this interactive site is to provide a reliable and growing source of information on the biodiversity of S. cerevisiae strains isolated from different geographical origins (for applications as diverse as the production of wine, beer, bread, sake, bioethanol, probiotics, biofertilizers, etc.), from clinical cases or from environmental samples. As more and more strains are characterized world-wide, it is imperative to provide a means to exchange and combine results from different laboratories. A convenient spreadsheet to transform raw data to differences in the number of repeats relative to the sequenced strain is provided. The inclusion of internal controls with reference DNAs will serve to validate results from different laboratories. The DNAs used to construct the allele ladders for each locus are available on request. If researchers add their results to this site, the database would serve the following purposes:
identification of new strains of S. cerevisiae,
identification of possible synonyms in strain collections,
identification of possible synonyms in probiotic and clinical isolates,
calculation of allelic frequencies,
genotypic combination of alleles,
assessment of geographical biodiversity and
association of molecular markers to traits of interest.
The database is organized with a user-friendly interface. Three main search entries are provided: by strain, by SSR or by SNP. Strains can be searched by their given names, geographical origin or application. In the SSR search, allele(s) observed for different loci can be entered, and all the matching strains will be displayed. If entered data are not found in the database, the user is prompted to submit them as new data subject to validation. For the moment, only SNPs in FLO8 (this work), CYS4 and MET6 (Linderholm et al., 2006) are included, for a small subset of strains. Links to information in published literature or other websites are provided. The present database (Beta version) includes data from nine different SSRs and three different SNP loci. Data for these loci from as many different strains as possible are welcomed. In the future, we look forward to including data from other polymorphic SSRs and SNPs as well. The database will be continuously updated and improved, responding to the input and suggestions of the yeast community.
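The logic of the SSR search described above can be illustrated with a minimal Python sketch over an in-memory dictionary. This is not the code behind the website; the data layout, the strain names and the function name are hypothetical.

```python
def matching_strains(database, query):
    """Return strains whose genotype contains every queried allele at every queried locus."""
    return sorted(
        name
        for name, genotype in database.items()
        if all(set(alleles) <= set(genotype.get(locus, ())) for locus, alleles in query.items())
    )

# Hypothetical records: {strain: {locus: alleles as repeat differences}}.
database = {
    "strain_A": {"AT_X": [0], "TG_VI": [-1, 2]},
    "strain_B": {"AT_X": [0, 3], "TG_VI": [2]},
    "strain_C": {"AT_X": [3], "TG_VI": [2, 6]},
}

hits = matching_strains(database, {"AT_X": [0], "TG_VI": [2]})
no_hits = matching_strains(database, {"AT_X": [9]})  # unknown allele: candidate new submission
```

An empty result corresponds to the case in which the user would be prompted to submit the genotype as new data.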
Molecular evidence for the presence of S. cerevisiae in wine fermentation dates back to 3150 BC, from pottery jars found in the tomb of King Scorpion I of Egypt. However, several lines of evidence suggest that the evolutionary history of S. cerevisiae strains predates human use and spans millions of years (Blair et al., 2005; Landry et al., 2006).
Although microsatellites have become extremely popular molecular markers, little is known about the role of microsatellites in genome organization, gene regulation, quantitative genetic variation and evolution of genes. Some SSRs are highly polymorphic while others show very little or no variation among individuals. In our initial screening, we discarded three SSRs (YDR289C, YKL172W and YLR177W) because of their low discriminative power in our analyzed population, although they have been reported as useful to discriminate probiotic from clinical isolates (Hennequin et al., 2001; Malgoire et al., 2005). In agreement with Legras et al. (2007), we have found that TTA_XIII, TG_VI, GT_X, C4_XV, YPL009C and YOR267C SSRs show the highest discriminative power. Polymorphisms in YGL013C, YGL028C and AT_X SSRs have not been reported previously. Our microsatellite polymorphism analysis also indicates that sake and bread yeast strains are highly differentiated from most wine strains, in accordance with their technological origins. Nevertheless, this conclusion is drawn from a distance measure that has several drawbacks, as pointed out in the Materials and methods section of this article. To be conclusive, a suitable distance to compare organisms with different ploidy would need to be developed.
On the other hand, phylogenetic trees from diploid strains suggest that, in general, native strains – Uruguayan (URU), Californian (UCD) and some Argentine (ARG) strains – have multiple origins. In fact, these origins must have predated any possible migration of these strains to America. The American native strains may derive from European strains or may have been recruited in situ. This second possibility would contradict the hypothesis proposed by Legras (2007). However, our analysis does not include the geographic representation of strains needed to demonstrate in situ recruitment. If we considered the American native strains as strains of ‘European origin’, our results would be compatible with the proposal of Legras (2007).
The opposite situation is observed with a subset of H2S producers and two groups of Argentine strains, between which a close relationship is suggested, with high bootstrap values for all of them (see Fig. 2). Different methods confirmed this relationship for the subgroup of H2S producers. Interestingly, this result suggests a differentiation in situ of some Argentine strains.
In the S. cerevisiae genome, the majority of the genes containing intragenic minisatellites encode cell wall proteins. Variations in the number of intragenic repeats could provide the functional diversity of cell surface antigens that, in fungi and other pathogens, allows rapid adaptation to the environment and/or evasion of the host immune system. Verstrepen (2005) demonstrated that, in the same genetic background, size variations in an intragenic minisatellite in FLO1 create quantitative alterations in phenotypes like adhesion, flocculation or biofilm formation. Size variations for the minisatellite in SCW11/YGL028C were restricted to only seven alleles in this population, and the largest allele was not found in homozygosis. This limited variation may reflect a functional restriction on the possible protein variants. Variations in minisatellites have been proposed as a method to characterize wine yeast strains (Marinangeli et al., 2004). Although the discrimination power of these loci is not high enough on its own, they can be useful markers because amplification is robust and, in most cases, the allele size differences can be checked on agarose gels.
Analyzing DNAs from eight Uruguayan native strains, we found five SNPs in a 300 bp coding region of FLO8. For comparison, Deutschbauer & Davis (2005) found, on average, one polymorphism every 173 bp between DNAs from two strains with highly divergent sporulation efficiencies. The eight URU native strains studied are very diverse, as evidenced by their different ethanol tolerance, flocculation or pseudohyphae formation, and as confirmed by microsatellite and SNP molecular analysis.
DNAs from all the wine yeast strains tested (eight Uruguayan native and two commercial strains) showed the same two SNPs: a G at position 425 (resulting in a Trp codon) and an A or heterozygous A/G at position 334 (resulting in a Val or Ile codon). DNAs with SNPs at the other YER109C positions (306, 315 and 449) coded for either different codons of the same amino acid or different amino acids of the same chemical group. Val and Ile are both nonpolar amino acids, while Asn and Ser are amino acids with uncharged polar side chains.
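To relate the SNP positions listed above to codons, a short sketch can map each nucleotide position to its codon number and to its place within that codon. It assumes that positions are numbered from the A of the start codon as position 1; this numbering convention is an assumption, since it is not stated here, and the function name is hypothetical.

```python
from typing import Tuple


def codon_location(nt_position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Assumes position 1 is the A of the ATG start codon (an assumption,
    not a convention confirmed by the text).
    """
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    index = nt_position - 1
    return index // 3 + 1, index % 3 + 1


# SNP positions reported for the analyzed FLO8 (YER109C) region.
for pos in (306, 315, 334, 425, 449):
    codon, place = codon_location(pos)
    print(f"position {pos}: codon {codon}, codon position {place}")
```

Under this assumption, positions 306 and 315 fall on third codon positions, where synonymous changes are most frequent, and position 334 falls on a first codon position, which fits a change between a Val codon (GTN) and an Ile codon (ATT, ATC or ATA).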
No straightforward correlation could be established between pseudohyphal growth, flocculation and SNPs in FLO8. CP 882 is a highly flocculant strain when compared with CP 881 (González Techera et al., 2001), but both show very few pseudohyphae in SLAD plates. AWRI350 is a highly flocculant strain compared with FQU 02/16, and both show mats of pseudohyphae invading the agar. The three strains (CP KU1, FQU 99/5 and M522) with intermediate levels of pseudohyphal growth showed heterozygous DNAs in the FLO8 region analyzed. Cell–cell adhesion (flocculation) and adhesion to abiotic surfaces or to tissues are properties of medical and industrial relevance. At least three different signaling pathways (some requiring the positive regulator Flo8) are activated in response to stress, nutrient limitation or small signaling molecules. Adhesion is a complex response controlled by integrated pathways working together (Verstrepen & Klis, 2006). A comparative genomic analysis of S. cerevisiae individuals at many loci could eventually serve to pinpoint the crucial genotypic differences underlying complex phenotypes.
Nowadays, one of the greatest challenges for geneticists is the dissection of complex quantitative genetic variation into genes at the molecular level. Most traits of biotechnological interest in S. cerevisiae strains are complex traits that depend on multiple genes and their allelic variants. Codominant molecular markers like SSRs and SNPs are widely used for the molecular discrimination of individuals within eukaryotic species, for biodiversity studies, QTL mapping and linkage studies. In plants, molecular markers are used for marker-assisted introgression of favorable alleles in breeding programs (Andersen & Lübberstedt, 2003). In plant breeding, introgression is a common and effective practice to improve specific traits in an already good accession called ‘elite’. Recently, this approach has been used to construct industrial yeast strains (Marullo et al., 2007). Also, 260 SSR markers across the 16 yeast chromosomes were recently developed to discriminate two strains with extreme phenotypes and to genetically dissect the QTL regions responsible for ethanol tolerance in S. cerevisiae (Hu et al., 2007). Saccharomyces cerevisiae provides an ideal framework for QTL analysis due to its high recombination rate, its richly annotated genome, and the fact that genes can be directly manipulated in their genomic context.
Sequencing projects of entire genomes of Saccharomyces yeast strains are in progress and will certainly add very valuable information about SNPs; they will probably also help in choosing a set of suitable genes for MLST. However, this ‘brute force’ effort will neither replace nor invalidate microsatellite typing. SSR typing is a cheap and accessible method that has the following unique features compared with SNPs (or MLST):
SSRs give clear-cut information on ploidy levels. Many industrial strains are aneuploid or polyploid, and this has been associated with an adaptation mechanism (Querol et al., 2003).
SSRs can be easily adapted to a simple method (using agarose gels) to monitor S. cerevisiae strains during alcoholic fermentation (Howell et al., 2004) and to detect the presence of S. cerevisiae–Saccharomyces bayanus hybrids (Masneuf-Pomarède et al., 2007).
The reason why some SSRs are highly polymorphic while others are invariable is still an open question. Variation in the efficiency of DNA mismatch repair at different sites in the yeast genome has been proposed as a possible explanation (Hawk et al., 2005). An assessment of SSR instability (an important phenomenon in cancer development) could also be a by-product of this database.
For closely related S. cerevisiae strains, MLST has proven to be less discriminatory than SSRs (Ayoub et al., 2006).
Precise estimation and comparison of genetic variation among populations requires a large number of SNPs relative to microsatellites because microsatellite loci typically have many alleles (more than 30 for S. cerevisiae), whereas two is the norm for SNP loci. Ascertainment bias in SNP identification can also be a serious issue for studies of population structure, since it has the potential to introduce systematic bias in estimates of variation within and among populations (Morin et al., 2004).
There is an increasing need for standardization in the reporting of results from different laboratories as more S. cerevisiae strains and SSR markers are being tested. The discrimination power of the selected SSRs depends on the population of strains analyzed; it would therefore be very valuable to be able to calculate allelic frequencies from strains coming from industrial, clinical or environmental settings. At present, it is not possible to directly compare microsatellite data from different laboratories. Sizing with ladders containing many or all of the observed alleles for a given SSR locus is a common practice when analyzing human microsatellites, and it certainly allows comparison of data after careful validation procedures (see ‘Genetic Identity’ at http://www.promega.com). The standard in humans is to report alleles as the absolute number of repeats. Only a small core set of loci has been selected, and commercial kits providing premixed primers and allelic ladders are available. Because all users work with the same primers, these allelic ladders can be used to calibrate PCR product sizes to SSR repeat number for genotyping purposes (Butler, 2007). However, in some cases, there is still the need to reach a consensus on the definition of the core repeat structure to prevent confusion and allow a comparison of results between laboratories (see ‘Comment on nomenclature for STR alleles and repeat structure’ at http://www.cstl.nist.gov/biotech/strbase/intro.htm).
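The dependence of discrimination power on the population analyzed can be illustrated with a minimal sketch that computes allele frequencies and a discriminatory index for one locus. The index used is the Hunter-Gaston index, a common choice for typing methods and not necessarily the measure used in this work; the genotypes are invented, and counting two allele copies per strain assumes diploid strains.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, ...]  # alleles observed in one strain at one locus


def allele_frequencies(genotypes: List[Genotype]) -> Dict[int, float]:
    """Relative frequency of each allele, counting every allele copy listed."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def discriminatory_power(genotypes: List[Genotype]) -> float:
    """Hunter-Gaston index: probability that two strains drawn at random
    (without replacement) show different genotypes at this locus."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two strains are needed")
    groups = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    same_pairs = sum(size * (size - 1) for size in groups.values())
    return 1 - same_pairs / (n * (n - 1))


# Invented diploid genotypes at a single locus (repeat differences vs. reference).
population = [(0, 0), (0, 2), (0, 2), (3, 3)]
print(allele_frequencies(population))
print(round(discriminatory_power(population), 3))
```

Running the same functions on strain sets from different settings would show how the ranking of loci can change with the population sampled.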
The comparability of microsatellite profiles obtained in different laboratories was exhaustively studied by This (2004) in an effort to develop a standardized method for the identification of grape cultivars. Ten laboratories in seven countries analyzed 46 grape cultivars at six SSR loci, and no effort was made to standardize equipment or protocols. All the participants used the same DNAs and the same primers but different enzymes, PCR programs and manual or automatic sequencing. The comparison of absolute allele sizes was impossible because there were discrepancies of up to 3 bp. The data coded as size differences between the smallest allele observed and the allele of the cultivar under study were more consistent but still not satisfactory. Even the automatic scoring of peak sizes can produce artificial shifts due to the simple algorithms used for automatically rounding up fragment sizes. Highest data consistency was obtained when every laboratory used the same selected cultivar-specific fragments as internal size standards, comprising a relatively complete allelic ladder for each of the six microsatellite loci.
We propose a similar standardization method based on the inclusion of DNA from sequenced S. cerevisiae strains (AB972 and/or S288C) and the use of internal allelic ladders to unequivocally express the results as the difference in the number of repeats relative to the sequenced reference strain. We offer the distribution of selected DNAs, rather than allele ladders, to allow for the use of different platforms (manual or automatic), different primers and/or multiplexing. Every laboratory can create its own allele ladder; the only prerequisite to ensure validation of results is that the same sequenced reference DNA is included, plus three to four selected DNAs (which should be the same for all laboratories) that cover the whole range of allele sizes. The exchange of DNAs prevents confusion associated with the names given to strains. The comparison of data from many strains and different SSRs will allow the selection of a minimal set of robust and highly polymorphic markers with clear fragment patterns. If researchers used the same pairs of primers giving robust amplification products, it would be possible to exchange allele ladders for manual or automatic platforms.
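The proposed way of reporting alleles, as a difference in repeat number relative to the sequenced reference strain, can be sketched as follows. The function name, the tolerance and the fragment sizes are hypothetical; the sketch assumes that the reference DNA was amplified with the same primers and sized in the same run, so that platform-dependent shifts cancel out in the difference.

```python
def repeats_relative_to_reference(size_bp: float, ref_size_bp: float,
                                  motif_len: int, tolerance_bp: float = 0.5) -> int:
    """Express an allele as the repeat-number difference from the reference allele.

    size_bp and ref_size_bp are fragment sizes measured in the same run;
    motif_len is the repeat unit length in bp (e.g. 3 for a trinucleotide).
    """
    if motif_len < 1:
        raise ValueError("motif_len must be a positive integer")
    diff = size_bp - ref_size_bp
    repeats = round(diff / motif_len)
    # Reject sizes that do not sit close to a whole number of repeats:
    # they may indicate an indel in the flanking region or a sizing error.
    if abs(diff - repeats * motif_len) > tolerance_bp:
        raise ValueError(f"size difference of {diff:.1f} bp is not a whole number of repeats")
    return repeats


# Invented example: trinucleotide locus, reference allele sized at 245.0 bp.
print(repeats_relative_to_reference(251.2, 245.0, motif_len=3))
print(repeats_relative_to_reference(239.0, 245.0, motif_len=3))
```

Reporting the result as a signed integer keeps the value independent of the primers chosen, since primer position changes the absolute fragment size but not the difference from the reference.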
Although DNAs showing different microsatellite patterns surely correspond to different strains, if two or more DNAs show the same pattern, we can only say that we are unable to discriminate them with the analyzed microsatellite loci. Analysis of a higher number of polymorphic loci might be needed to discriminate closely related individuals. However, in some cases, analysis of the strain history reveals that the same strain was transported from one collection to another and different names were assigned to the same strain.
Confirmation of the presence of exclusive alleles for certain strains could eventually serve for identification purposes. A bar-coding method based on microsatellites could be used to identify industrial strains.
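As an illustration of how exclusive alleles could be extracted from a genotype table for identification purposes, a minimal sketch follows. The strain names and allele values are invented and the function name is hypothetical; an allele counts as exclusive only with respect to the strains included in the table.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def exclusive_alleles(genotypes: Dict[str, Dict[str, Set[int]]]) -> Dict[str, List[Tuple[str, int]]]:
    """Return, for each strain, the (locus, allele) pairs seen in no other strain."""
    carriers = defaultdict(set)
    for strain, loci in genotypes.items():
        for locus, alleles in loci.items():
            for allele in alleles:
                carriers[(locus, allele)].add(strain)

    result = defaultdict(list)
    for (locus, allele), strains in carriers.items():
        if len(strains) == 1:  # allele carried by exactly one strain
            result[next(iter(strains))].append((locus, allele))
    return {strain: sorted(pairs) for strain, pairs in result.items()}


# Invented genotypes, expressed as repeat differences relative to a reference.
table = {
    "strain_A": {"TTA_XIII": {0, 3}, "YPL009C": {-2}},
    "strain_B": {"TTA_XIII": {3}, "YPL009C": {-2, 1}},
    "strain_C": {"TTA_XIII": {5}, "YPL009C": {1}},
}
print(exclusive_alleles(table))
```

Strains absent from the output, such as strain_B here, carry no exclusive allele and would need a combination of loci to be identified.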
The future usefulness of the database presented here as a Beta version will depend on the scale of validated results incorporated by all researchers working with strains of S. cerevisiae from different settings. If the database is welcomed by the S. cerevisiae community, regular updates and improvements will certainly be needed. Comparison of several hundreds or thousands of strains with a large number of polymorphic SSRs and/or SNPs in many genes will eventually allow an association of molecular markers to complex phenotypic traits.
This research work was funded by the Program for Technological Development PDT32/06, Dinacyt, Uruguay and Pedeciba Química, Uruguay. We acknowledge Linda Riles for sending the AB972 yeast strain. We are very grateful to Lucy Joseph, Perrine Languet and Mariana Combina for sending us strains and information. We thank Mario Lalinde for visual art services.
(1998) Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc Natl Acad Sci USA 95: 1647–1652.
(2007) Diversity of Saccharomyces strains on grapes and winery surfaces: analysis of their contribution to fermentative flora of Malbec wine from Mendoza (Argentina) during two consecutive years. Food Microbiol 24: 403–412.