
The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis

Massoud Ramezani-Rad, Cornelis P. Hollenberg, Juergen Lauber, Holger Wedler, Eike Griess, Christian Wagner, Kaj Albermann, Jean Hani, Michael Piontek, Ulrike Dahlems, Gerd Gellissen
DOI: http://dx.doi.org/10.1016/S1567-1356(03)00125-9; pp. 207-215; first published online: 1 November 2003


The methylotrophic yeast Hansenula polymorpha is a recognised model system for the investigation of peroxisomal function, of special metabolic pathways such as methanol metabolism, and of nitrate assimilation and thermostability. Strain RB11, an odc1 derivative of the H. polymorpha isolate CBS4732 (synonymous with ATCC34438, NRRL-Y-5445, CCY38-22-2), has been developed as a platform for heterologous gene expression. The scientific and industrial significance of this organism is now being met by the characterisation of its entire genome. The H. polymorpha RB11 genome comprises approximately 9.5 Mb and is organised as six chromosomes ranging in size from 0.9 to 2.2 Mb. Over 90% of the genome was sequenced with high accuracy and assembled into 48 contigs organised on eight scaffolds (supercontigs). After manual annotation, 5933 open reading frames (ORFs) were predicted, of which 4767 showed significant homology to entries in a non-redundant protein database. The remaining 1166 ORFs showed no significant similarity to known proteins. The number of ORFs is comparable to that of other sequenced budding yeasts of similar genome size.

  • Yeast genomics
  • Sequence analysis
  • Hansenula polymorpha

1 Introduction

Yeasts constitute an important group of industrial microorganisms. Its long tradition of human use and the extensive knowledge of its genetics and physiology have made the baker's yeast Saccharomyces cerevisiae a eukaryotic model organism for basic research and industrial applications [1]. In 1996, it became the first eukaryotic organism for which the complete genome sequence was established [2]. The initial focus on S. cerevisiae has since been extended by investigations of a range of alternative yeast species. As a consequence, the number of fully or partially sequenced budding yeast genomes has continued to grow. Among others, a comparative genomic exploration of 13 species selected from the hemiascomycetous yeasts was conducted [3].

The methylotrophic yeast Hansenula polymorpha (syn. Pichia angusta) is one of the most important industrially applied non-conventional yeasts [4,5]. H. polymorpha is a ubiquitous yeast species occurring naturally in spoiled orange juice, maize meal, the gut of various insect species and soil. It grows as white to cream, butyrous colonies and does not form filaments [6]. H. polymorpha isolates are homothallic, and reproduction occurs vegetatively by budding. H. polymorpha belongs to the fungal family Saccharomycetaceae, subfamily Saccharomycetoideae [6,7]. Most research has been performed with three basic strains designated H. polymorpha DL-1, CBS4732 and NCYC495. These strains are of independent origin and unclear relationship and exhibit different features, including different chromosome numbers. Depending on strain and separation conditions, between two and seven chromosomes can be distinguished [8,9]. Strain CBS4732 (syn. ATCC34438, NRRL-Y-5445, CCY38-22-2) was originally isolated from soil irrigated with waste water from a distillery in Pernambuco, Brazil [10]. Its odc1 derivatives LR9 [11] and RB11 [12] have been developed as hosts for heterologous gene expression [12]. Recombinant compounds produced in these hosts include enzymes such as the feed additive phytase [13,14], anticoagulants such as hirudin and saratin [15–17] and an efficient vaccine against hepatitis B infection [18–20]. The significance of H. polymorpha in basic research stems largely from studies focussed on peroxisome homeostasis [21] and nitrate assimilation [22]. Although much is known about the physiology, biochemistry and ultrastructure of this yeast (for a review see the monograph on H. polymorpha [4]), little information is available about its genomic structure and function [23]. Several groups worldwide initiated studies on its genome several years ago. As part of the comparative genome analysis of 13 hemiascomycetous yeasts mentioned above, part of the H. polymorpha (P. angusta) genome sequence was established using a partial random sequencing strategy with a coverage of 0.3 genome equivalents. Using this approach, about 3 Mb of raw sequencing data of the H. polymorpha genome was obtained [3]. We performed a genome analysis aimed at higher coverage, using a BAC-to-BAC approach. This work has now culminated in the comprehensive genome analysis of this organism. A first description of the data generated is provided in this study. Access to the genome data can be granted upon request (G.G.) and after signing a Material Transfer Agreement. Access has already been granted to six academic groups working on various aspects of functional genomics of H. polymorpha.

The present paper describes the results of the sequencing and characterisation of 8.733 Mb assembled into 48 contigs. The sequence covers over 90% of the estimated total genome content of 9.5 Mb located on six chromosomes ranging in size between 0.9 and 2.2 Mb [23]. The established sequence contains 5933 ORFs.

2 Materials and methods

2.1 Construction of the genomic BAC library

For sequencing, H. polymorpha strain RB11, an odc1 derivative of wild-type strain CBS4732, was selected [12]. For the construction of the genomic BAC library of H. polymorpha, the vector pBACe3.6 was used and prepared according to Osoegawa et al. [24]. H. polymorpha cells from a 50 ml YPD (1% yeast extract, 2% peptone, 2% glucose) culture were washed twice with TSE buffer (25 mM Tris–HCl, 300 mM sucrose, 25 mM EDTA, pH 8) and resuspended in TSE buffer. Agarose plugs were then prepared from these cells according to the Bio-Rad manual of the Chef DR II pulsed-field gel electrophoresis system (PFGE system) using 1.5% low melting point agarose. Pre-electrophoresis was carried out on a Bio-Rad PFGE system. Partial digestion of genomic DNA was carried out according to Osoegawa et al. [25] using Sau3AI for restriction. Gel electrophoresis was carried out on a Bio-Rad PFGE system according to the conditions given at Rod Wing's homepage (Clemson University, Genomics Institute, construction of BAC libraries protocol: 6 V cm−1, 90 s pulse, 13°C, 18 h). Agarose digestion with gelase, ligation and transformation were carried out using the same protocol. Subsequent electroporation of DH10B cells (Invitrogen) was again carried out according to Osoegawa et al. [25], and bacteria were plated onto 2×YT plates supplemented with chloramphenicol as the selecting agent. Clones obtained from this procedure were picked and used to inoculate 1.2 ml of 2×YT supplemented with chloramphenicol. These bacterial cultures were used to prepare glycerol stocks in 96-well microtitre plate format as the resource for all subsequent work.

2.2 Construction of shotgun libraries from BAC DNA

Large-scale preparations of BAC DNA were carried out using the Large-Construct kit from Qiagen (Qiagen GmbH, Hilden, Germany; cat. no. 12462). After sonication and enzymatic repair of the ends, fragments of the desired size (usually 1.2–1.5 kb) were isolated from a 1% preparative agarose gel using the MinElute Gel Extraction kit (Qiagen, cat. no. 28604) and inserted into a SmaI-digested and alkaline phosphatase-treated pUC19 vector [26]. Ligation was carried out with the Rapid Ligation kit (Roche) according to the manufacturer's protocol. The ligation mixture was then desalted using a QIAquick kit (Qiagen, cat. no. 28304) according to the instructions of the supplier, with the exception of the elution step, which was carried out with ddH2O. One-tenth volume of the eluted DNA was used for transformation of competent Escherichia coli DH10B cells using a Genepulser II device (Bio-Rad). Then 1 ml Luria–Bertani (LB) medium [26] was added and the cells were incubated for 1 h at 37°C. 1/200 and 1/20 volumes of the transformed cells were plated onto Petri dishes containing LB agar, ampicillin, X-Gal and isopropyl β-D-thiogalactopyranoside (IPTG) [26] and grown overnight at 37°C to determine the yield of recombinant clones. Usually the transformation rate was greater than 10⁸ transformants per μg vector DNA and the white:blue ratio was approximately 10:1 or better.

2.3 Plasmid preparation of shotgun clones

For subsequent DNA sequencing, plasmid DNA from white colonies was isolated after growth in 1.2 ml 2×YT cultures containing ampicillin for 24 h at 37°C and shaking at 220 rpm. Plasmid purification of shotgun clones was carried out using the REAL Prep 96 kit (Qiagen, cat. no. 26173).

2.4 DNA sequencing

DNA sequencing reactions were set up using BigDye Terminator v 2.0 cycle sequencing chemistry (Applied Biosystems, cat. no. 4314416) and purified using the DyeEx 96 (Qiagen, cat. no. 63183). Sequencing data were generated using ABI Prism 3700 sequence analyzers.

2.5 Sequence assembly

Base calling and quality checks were carried out using Phred [27]. Sequences were assembled with Phrap and editing was performed after import into gap4. BAC assemblies and raw data were visualised and edited using the STADEN package (version 4.5; developed by Roger Staden et al.; http://www.mrc-lmb.cam.ac.uk/pubseq/staden_home.html).
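As a brief illustration of the quality values assigned during base calling, the following sketch converts between a Phred score and the estimated base-calling error probability, using the standard definition Q = −10·log10(P). The function names and the example quality values are hypothetical and are not taken from the project's pipeline.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def count_bases_at_or_above(qualities: Sequence[int], threshold: int = 20) -> int:
    """Number of bases in a read whose Phred score is >= threshold."""
    return sum(1 for q in qualities if q >= threshold)


# Hypothetical per-base quality values for a very short read.
example_qualities = [8, 15, 22, 35, 40, 41, 19, 20]

if __name__ == "__main__":
    print(phred_to_error_prob(20))                      # error probability at Q20
    print(count_bases_at_or_above(example_qualities))   # bases with Q >= 20
```

Under this definition, a threshold of 20 (as in "phred20") corresponds to an estimated error probability of 1% per base.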

2.6 Automated bioinformatic annotation

Fully automated annotation was carried out using the ConSequence™ software system provided by Qiagen (based on Pedant-Pro™ from Biomax Informatics AG) [28].

3 Results and discussion

3.1 Genome sequencing

A BAC library with more than 17× coverage was constructed in pBACe3.6 and characterised by end-sequencing and restriction digestion. Insert sizes of BAC clones ranged from below 50 to over 100 kb per clone. A total of 2880 BAC clones were generated with an average insert size of 65 kb. A total of 4892 BAC-end sequences were generated with an average read length of 483 bases (phred20). The BAC-end sequencing success rate was 85.5%. In total, 213 BAC clones were selected for analysis, of which 188 BACs representing the minimal tiling path were selected for shotgun sequencing, BAC by BAC. Sequencing coverage of BACs was 8.27-fold on average (Fig. 1). Of these 188 BACs, 162 assembled into a single contig, 15 into two contigs, 9 into three contigs and 2 into four contigs.

Figure 1

Summary of sequencing statistics.

3.2 Genome assembly

The BAC library constructed covers the genome 18-fold. The 4892 BAC-end sequences from those clones yielded approximately 2.4 Mb of raw data, covering 25% of the genome (at 1×). On average, one BAC-end sequence is located every 2 kb on the genome, suggesting an estimated genome size of about 9.78 Mb. Pulsed-field gel electrophoresis of H. polymorpha RB11 chromosomes revealed six bands, and the sum of the molecular masses of the chromosomal DNA bands suggested a genome size of about 9–10 Mb [5] (Table 1). Mapping the end sequences onto the growing and eventually final genomic sequence showed a very even distribution of those end sequences with no local clustering, underlining the good random cloning of large genomic sub-fragments into this BAC library. The only exceptions were clones and end sequences falling into the rDNA region of the genome. No further large repetitive regions were noticed. Smaller repeat regions have all been resolved for each individual BAC. Further, no repeats were found within BAC/BAC overlapping regions that could confound a correct BAC-to-BAC assembly. In addition to the BAC-to-BAC assembly based on overlapping regions, all BAC-end sequences with their forward/reverse constraints per clone, as well as sizing information for individual BAC clones, were used to layer a BAC map on top of the resulting assemblies. The consistency of the assembly was checked against that BAC map for each BAC/BAC overlap and assembly. No discrepancies were detected between any single BAC/BAC overlap assembly and the BAC map backbone.
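The arithmetic behind the BAC-end figures can be retraced with a few lines of code. The sketch below uses only numbers stated in the text (4892 end sequences, the 483-base average read length given in Section 3.1, a mean spacing of 2 kb and the estimated 9.5 Mb genome); the variable names are illustrative.

```python
# Figures quoted in the text.
BAC_END_READS = 4892
MEAN_READ_LENGTH_BP = 483          # average phred20 read length, Section 3.1
MEAN_SPACING_BP = 2_000            # one BAC-end sequence every 2 kb
ESTIMATED_GENOME_BP = 9_500_000    # approximately 9.5 Mb

# Raw data contributed by the BAC-end reads.
raw_bp = BAC_END_READS * MEAN_READ_LENGTH_BP

# Fraction of the genome touched by the end reads at 1x.
fraction_covered = raw_bp / ESTIMATED_GENOME_BP

# Genome size implied by the mean spacing of the end reads.
size_from_spacing_bp = BAC_END_READS * MEAN_SPACING_BP

if __name__ == "__main__":
    print(f"raw data: {raw_bp / 1e6:.2f} Mb")
    print(f"fraction of genome at 1x: {fraction_covered:.1%}")
    print(f"size implied by spacing: {size_from_spacing_bp / 1e6:.2f} Mb")
```

The results (about 2.4 Mb, about 25% and about 9.78 Mb) agree with the values reported above.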

Table 1

Overview of genome organisation and assembled sequences in supercontigs

Chromosome karyotype | Size (Mb) | Chromosome marker | Sequencing supercontig | Size (bp)
I | 0.95 | URA3; CPY (PRC1); GAP | 5 | 968 770
II | 1.25 | rDNA (5.8S, 18S, 26S) | 6 | 983 699
III | 1.5 | HARS1 | 1 | 1 220 583
IV | 1.7 | PEP4 (PRA1); TPS1 | 8 | 1 290 524
V | 1.9 | MOX | 4 | 1 306 376
VI | 2.2 | FMD | 3 | 1 494 936
 |  |  | 2 | 218 529
 |  |  | 7 | 1 250 065
6 | 9.5 |  | 8 | 8 733 482

The genome was assembled into 48 contigs. Using clones that physically bridge known gaps, these contigs could be joined into eight supercontigs with a unique total size of 8.733 Mb, which were assigned to the six known, electrophoretically separated chromosomes by means of gene markers [5] (Table 1 and Fig. 2). Sequence overlaps between individual BACs, with a total size of 1.521 Mb (approximately 15% of the total sequence generated), were used to measure the sequencing accuracy. It was determined to be 99.998%, or fewer than 1.75 errors per 100 kb. As the same technologies, expertise and work scheme were applied for all sequencing work, we conclude from this analysis that more than 90% of the total genome was sequenced with this high accuracy of 99.998%. The estimated 10% of the genome not yet sequenced includes telomeric regions, approximately 45–50 additional rDNA repeats (with a total of approximately 0.3 Mb only) and small gaps, some of which are indicated as boxes in Fig. 2. These results indicate that using end sequencing to map the BAC clones allowed high accuracy and, eventually, direct alignment onto the assembled genomic contigs, as well as sequence comparisons between all sequences obtained during the course of the project (BACs, but also shotgun sequences from three different shotgun libraries with inserts in the 1, 3 and 6–8 kb range).
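The relation between a percentage accuracy and an error rate per 100 kb can be made explicit with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and reading "total sequence generated" as the unique sequence plus the BAC overlaps is an assumption made here, not a statement from the text.

```python
def errors_per_100kb(accuracy_percent: float) -> float:
    """Expected number of erroneous bases per 100 kb at a given accuracy."""
    return (100.0 - accuracy_percent) / 100.0 * 100_000


def accuracy_percent_from_errors(errors: float, window_bp: int = 100_000) -> float:
    """Percentage accuracy implied by a number of errors within a window."""
    return 100.0 * (1.0 - errors / window_bp)


# Figures quoted in the text (Mb).
UNIQUE_SEQUENCE_MB = 8.733
OVERLAP_MB = 1.521

# Assumption: total sequence generated = unique sequence + BAC/BAC overlaps.
overlap_fraction = OVERLAP_MB / (UNIQUE_SEQUENCE_MB + OVERLAP_MB)

if __name__ == "__main__":
    print(errors_per_100kb(99.998))            # about 2 errors per 100 kb
    print(accuracy_percent_from_errors(1.75))  # about 99.99825 %
    print(f"{overlap_fraction:.1%}")           # about 14.8 %
```

An accuracy of exactly 99.998% corresponds to 2 errors per 100 kb, whereas 1.75 errors per 100 kb corresponds to 99.99825%, which rounds to the reported 99.998%. Under the stated assumption, the overlaps amount to roughly 15% of the total sequence, in line with the text.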

Figure 2

Overview of supercontigs. The framed numbers within a stretch of BACs representing the respective supercontigs indicate the approximated size of a particular gap between neighbouring ends.

3.3 Genome organisation

The Pedant-Pro™ Sequence Analysis Suite was used for gene identification. From the 8.73 Mb of sequence, 5933 ORFs encoding proteins longer than 80 amino acids were extracted. ORFs whose sequence is entirely contained within another reading frame were excluded from the analysis. Seventy shorter ORFs (<80 amino acids) with significant BLAST similarities were extracted manually. Of the 5933 ORFs, 4767 show significant similarity to entries in a non-redundant protein database, and 4109 of these show significant similarity to ORFs from S. cerevisiae. The remaining 1166 ORFs have no significant similarity to known sequences. 410 ORFs are shorter than 100 amino acids. These numbers are not directly comparable with those reported for other genomes, owing to different automatic gene-prediction methods and to differences between the genomes. Only an in-depth analysis will allow the number of questionable ORFs to be evaluated, and it may reduce the number of ORFs shorter than 100 amino acids. Calculation of the gene density and protein length from the gene numbers gave an average ORF distance of 1472 bp and an average protein length of 437 amino acids. No experiments have been performed so far to evaluate these predicted numbers.
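The ORF counts quoted in this paragraph can be cross-checked against each other. The following sketch only restates the numbers from the text and expresses them as fractions of the predicted gene set; the variable names are illustrative.

```python
# ORF counts quoted in the text.
TOTAL_ORFS = 5933
WITH_SIMILARITY = 4767        # significant hit in the non-redundant database
WITHOUT_SIMILARITY = 1166     # no significant similarity
SIMILAR_TO_S_CEREVISIAE = 4109
SHORTER_THAN_100_AA = 410

# The two similarity classes should partition the full set.
partition_is_consistent = WITH_SIMILARITY + WITHOUT_SIMILARITY == TOTAL_ORFS

fractions = {
    "with similarity": WITH_SIMILARITY / TOTAL_ORFS,
    "without similarity": WITHOUT_SIMILARITY / TOTAL_ORFS,
    "similar to S. cerevisiae": SIMILAR_TO_S_CEREVISIAE / TOTAL_ORFS,
    "shorter than 100 aa": SHORTER_THAN_100_AA / TOTAL_ORFS,
}

if __name__ == "__main__":
    print("partition consistent:", partition_is_consistent)
    for label, value in fractions.items():
        print(f"{label}: {value:.1%}")
```

Roughly 80% of the predicted ORFs have a database match, and about 69% have a match in S. cerevisiae.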

Introns have been identified by homology to known proteins and confirmed using GeneWise [29]. In a preliminary analysis, 91 intron-containing genes were identified in this way. These include all genes identified previously [3] as intron-containing genes. Eighty tRNAs were identified, corresponding to all 20 amino acids. Of approximately 50 rRNA clusters [5], seven have been fully sequenced. All seven are completely identical, each with a length of exactly 5033 bp, although they represent only about 10% of the estimated total number of rDNA repeats present in H. polymorpha.

The main functional categories and their distribution in the gene set were predicted automatically as follows: transposable elements, 1%; energy, 5%; cellular communication, signal transduction mechanism, 6%; protein synthesis, 6%; cell rescue, defense and virulence, 9%; cellular transport and transport mechanisms, 12%; cell cycle and DNA processing, 12%; protein fate (folding, modification, destination), 12%; transcription, 14%; and metabolism, 23% (Fig. 3). Localisation was assigned to 2858 ORFs.

Figure 3

Functional comparison of S. cerevisiae and H. polymorpha gene content (general functional categories).

3.4 Comparison with S. cerevisiae sequences

The comparative genomic analysis of related organisms allowed us to identify species-specific genes and to estimate the rates of sequence divergence of the derived proteins. Comparing the genomic organisation of S. cerevisiae with that of H. polymorpha reveals differences and similarities at different levels (Table 2 and Figs. 3 and 4). The overall H. polymorpha genome exhibits a GC content of 47.9%, compared to 38.1% for the S. cerevisiae genome. The amino acid composition properties are essentially driven by GC content. The size of the S. cerevisiae genome is 13.5 Mb (sequenced non-redundant genome length 12 156 kb), in comparison to 9.5 Mb (sequenced non-redundant genome length 8733 kb) for H. polymorpha. For the comparison of H. polymorpha with S. cerevisiae we used the MIPS comprehensive yeast genome database CYGD [30]. It includes 6449 genes, of which 471 are marked as questionable. As the exact gene number of S. cerevisiae is still under debate in the literature [3], we have taken all MIPS genes into account for the comparisons. S. cerevisiae contains 6449 ORFs with an average distance of 1885 bp, in comparison to 5933 ORFs in H. polymorpha with an average distance of 1472 bp. The gene density in H. polymorpha thus appears higher than that in S. cerevisiae when the number of ORFs in the two organisms is related to the size of the respective genomes. An exhaustive synteny analysis has been performed between H. polymorpha and S. cerevisiae. It revealed clusters of up to eight syntenic proteins in the two organisms. Six clusters were found to contain six syntenic proteins, two clusters contain seven syntenic proteins and one cluster contains eight syntenic proteins (Table 3).
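The gene-density comparison can be reproduced from the sequenced genome lengths and ORF counts listed in Table 2. The sketch below is a plain recalculation; the dictionary layout and function names are illustrative.

```python
# Sequenced non-redundant genome lengths (bp) and ORF counts from Table 2.
GENOMES = {
    "S. cerevisiae": {"sequenced_bp": 12_156_307, "orfs": 6449},
    "H. polymorpha": {"sequenced_bp": 8_733_442, "orfs": 5933},
}


def mean_orf_distance_bp(sequenced_bp: int, orfs: int) -> float:
    """Average amount of sequence per ORF (bp)."""
    return sequenced_bp / orfs


def orfs_per_mb(sequenced_bp: int, orfs: int) -> float:
    """Gene density expressed as ORFs per megabase of sequence."""
    return orfs / (sequenced_bp / 1_000_000)


if __name__ == "__main__":
    for name, g in GENOMES.items():
        distance = mean_orf_distance_bp(g["sequenced_bp"], g["orfs"])
        density = orfs_per_mb(g["sequenced_bp"], g["orfs"])
        print(f"{name}: {distance:.0f} bp per ORF, {density:.0f} ORFs per Mb")
```

Both reported average distances (1885 bp and 1472 bp) are recovered, and the ORFs-per-Mb figure is higher for H. polymorpha, consistent with the statement about gene density.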

Table 2

Comparisons of the S. cerevisiae and H. polymorpha genomes

 | S. cerevisiae | H. polymorpha
Genome size (Mb) | 13.5 | ∼9.5
Sequenced non-redundant genome length (bp) | 12 156 307 | 8 733 442
GC content (%) | 38.1 | 47.9
Number of ORFs (with similarities) | 6449 (5978) | 5933 (4767)
Average ORF distance (bp) | 1885 | 1472
Average protein length (aa) | 471 | 437
Number of tRNAs | 278 | 80

Figure 4

Functional comparison of S. cerevisiae and H. polymorpha gene content (functional categories of metabolism).

Table 3

Synteny analysis between H. polymorpha and S. cerevisiae

H.p. BAC | H.p. ORF | BLAST E value | S.c. ORF | S.c. Description | S.c. Chr.
cqbh_00 | orf129 | 7.00E−24 | ypr185w | APG13 – protein required for the autophagic process | 16
cqbh_00 | orf158 | 4.00E−57 | ypr186c | PZF1 – TFIIIA (transcription initiation factor) | 16
cqbh_00 | orf155 | 2.00E−39 | ypr187w | RPO26 – DNA-directed RNA polymerase I, II, III 18 kDa subunit | 16
cqbh_00 | orf135 | 0.0 | ypr189w | SKI3 – antiviral protein | 16
cqbh_00 | orf121 | 4.00E−69 | ypr190c | RPC82 – DNA-directed RNA polymerase III, 82 kDa subunit | 16
cqbh_00 | orf117 | 6.00E−50 | ypr191w | QCR2 – ubiquinol-cytochrome-c reductase 40 kDa chain II | 16
cqgr.00 | orf129 | 1.00E−42 | ylr403w | SFP1 – zinc finger protein | 12
cqgs.00 | orf143 | 1.00E−101 | ylr405w | similarity to Azospirillum brasilense nifR3 protein | 12
cqag_00 | orf148 | 3.00E−44 | ylr406c | RPL31B – 60S large subunit ribosomal protein L31.e.c12 | 12
cqhn.00 | orf161 | 4.00E−10 | ylr407w | hypothetical protein | 12
cqgr.00 | orf168 | 0.0 | ylr409c | strong similarity to Schizosaccharomyces pombe β-transducin | 12
cqhm.00 | orf177 | 0.0 | ylr410w | VIP1 – strong similarity to S. pombe protein Asp1p | 12
cqan_00 | orf362 | 2.00E−12 | yjr086w | STE18 – GTP-binding protein γ subunit of the pheromone pathway | 10
cqan_00 | orf357 | 5.00E−21 | yjr088c | weak similarity to S. pombe hypothetical protein SPBC14C8.18c | 10
cqan_00 | orf324 | 1.00E−123 | yjr090c | GRR1 – required for glucose repression and for glucose and cation transport | 10
cqan_00 | orf304 | 3.00E−77 | yjr091c | JSN1 – suppresses the high-temperature lethality of tub2-150 | 10
cqan_00 | orf248 | 1.00E−32 | yjr092w | BUD4 – budding protein | 10
cqan_00 | orf231 | 1.00E−14 | yjr093c | FIP1 – component of pre-mRNA polyadenylation factor PF I | 10
cqan_00 | orf230 | 1.00E−24 | yjr094w-a | RPL43B – 60S large subunit ribosomal protein | 10
cqga.00 | orf27 | 5.00E−38 | ygr091w | PRP31 – pre-mRNA splicing protein | 7
cqga.00 | orf19 | 1.00E−167 | ygr092w | DBF2 – ser/thr protein kinase related to Dbf20p | 7
cqga.00 | orf15 | 4.00E−46 | ygr093w | similarity to hypothetical S. pombe protein | 7
cqga.00 | orf30 | 0.0 | ygr094w | VAS1 – valyl-tRNA synthetase | 7
cqga.00 | orf42 | 3.00E−28 | ygr095c | RRP46 – involved in rRNA processing | 7
cqga.00 | orf56 | 7.00E−45 | ygr096w | similarity to bovine Graves disease carrier protein | 7
cqfq.00 | orf75 | 4.00E−30 | ygl191w | COX13 – cytochrome-c oxidase chain VIa | 7
cqfq.00 | orf72 | 1.00E−145 | ygl190c | CDC55 – ser/thr phosphatase 2A regulatory subunit B | 7
cqfq.00 | orf70 | 8.00E−33 | ygl189c | RPS26A – 40S small subunit ribosomal protein S26e.c7 | 7
cqfq.00 | orf68 | 4.00E−40 | ygl187c | COX4 – cytochrome-c oxidase chain IV | 7
cqfq.00 | orf64 | 5.00E−36 | ygl185c | weak similarity to dehydrogenases | 7
cqfq.00 | orf60 | 3.00E−87 | ygl184c | STR3 – strong similarity to Emericella nidulans and similarity to other cystathionine β-lyase and Cys3p | 7
cqav_00 | orf272 | 8.00E−24 | ygl111w | weak similarity to hypothetical protein S. pombe | 7
cqav_00 | orf276 | 2.00E−55 | ygl110c | similarity to hypothetical protein SPCC1906.02c S. pombe | 7
cqav_00 | orf330 | 1.00E−25 | ygl106w | MLC1 – Myo2p light chain | 7
cqav_00 | orf315 | 8.00E−69 | ygl105w | ARC1 – protein with specific affinity for G4 quadruplex nucleic acids | 7
cqav_00 | orf294 | 2.00E−62 | ygl103w | RPL28 – 60S large subunit ribosomal protein L27a.e | 7
cqav_00 | orf292 | 2.00E−14 | ygl102c | questionable ORF | 7
cqav_00 | orf263 | 1.00E−100 | ygl100w | SEH1 – nuclear pore protein | 7
cqbp_00 | orf17 | 2.00E−46 | ydr447c | RPS17B – ribosomal protein S17.e.B | 4
cqbp_00 | orf21 | 1.00E−125 | ydr448w | ADA2 – general transcriptional adapter or co-activator | 4
cqbp_00 | orf26 | 5.00E−67 | ydr449c | similarity to hypothetical protein S. pombe | 4
cqbp_00 | orf75 | 2.00E−63 | ydr450w | RPS18A – ribosomal protein S18.e.c4 | 4
cqbp_00 | orf68 | 5.00E−15 | ydr451c | YHP1 – strong similarity to Yox1p | 4
cqbp_00 | orf189 | 1.00E−135 | ydr452w | PHM5 – similarity to human sphingomyelin phosphodiesterase (PIR:S06957) | 4
cqaq_00 | orf216 | 2.00E−22 | ydr362c | TFC6 – TFIIIC (transcription initiation factor) subunit, 91 kDa | 4
cqaq_00 | orf202 | 2.00E−75 | ydr365c | weak similarity to Streptococcus M protein | 4
cqaq_00 | orf191 | 3.00E−23 | ydr367w | similarity to hypothetical protein SPAC26H5.13c S. pombe | 4
cqaq_00 | orf165 | 2.00E−96 | ydr372c | similarity to hypothetical S. pombe protein | 4
cqaq_00 | orf180 | 1.00E−139 | ydr375c | BCS1 – mitochondrial protein of the CDC48/PAS1/SEC18 (AAA) family of ATPases | 4
cqaq_00 | orf251 | 1.00E−104 | ydr380w | ARO10 – similarity to Pdc6p, Thi3p and to pyruvate decarboxylases | 4
cqaq_00 | orf245 | 1.00E−20 | ydr381w | YRA1 – RNA annealing protein | 4
cqaq_00 | orf236 | 2.00E−20 | ydr382w | RPP2B – 60S large subunit acidic ribosomal protein | 4
cqdw.p1 | orf217 | 7.00E−82 | ydr061w | similarity to E. coli modF and photorepair protein phrA | 4
cqdw.p1 | orf208 | 0.0 | ydr062w | LCB2 – serine C-palmitoyltransferase subunit | 4
cqdw.p1 | orf271 | 1.00E−28 | ydr067c | similarity to YNL099c | 4
cqdw.p1 | orf228 | 4.00E−90 | ydr069c | DOA4 – ubiquitin-specific protease | 4
cqdw.p1 | orf257 | 5.00E−34 | ydr071c | similarity to Ovis aries arylalkylamine N-acetyltransferase | 4
cqdw.p1 | orf250 | 7.00E−57 | ydr072c | IPT1 – mannosyl diphosphorylinositol ceramide synthase | 4
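To illustrate how the entries of Table 3 fall into clusters, the following sketch groups the S. cerevisiae systematic ORF names from the table by chromosome arm and splits them wherever consecutive ORF indices lie far apart. It is a simplified illustration and not the synteny method used in this study: it looks only at the S. cerevisiae side, the `max_gap` threshold of 10 is an arbitrary assumption, and the function names are hypothetical. It relies on the standard systematic naming scheme, in which the second letter encodes the chromosome (a = 1 ... p = 16) and the third letter the arm.

```python
import re
from collections import defaultdict

# S. cerevisiae ORF names as listed in Table 3.
SC_ORFS = """
ypr185w ypr186c ypr187w ypr189w ypr190c ypr191w
ylr403w ylr405w ylr406c ylr407w ylr409c ylr410w
yjr086w yjr088c yjr090c yjr091c yjr092w yjr093c yjr094w-a
ygr091w ygr092w ygr093w ygr094w ygr095c ygr096w
ygl191w ygl190c ygl189c ygl187c ygl185c ygl184c
ygl111w ygl110c ygl106w ygl105w ygl103w ygl102c ygl100w
ydr447c ydr448w ydr449c ydr450w ydr451c ydr452w
ydr362c ydr365c ydr367w ydr372c ydr375c ydr380w ydr381w ydr382w
ydr061w ydr062w ydr067c ydr069c ydr071c ydr072c
""".split()

# y + chromosome letter + arm (l/r) + three-digit index + strand (w/c).
NAME_PATTERN = re.compile(r"^y([a-p])([lr])(\d{3})[wc]")


def parse_orf(name):
    """Return (chromosome number, arm, index) for a systematic ORF name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a systematic ORF name: {name}")
    chromosome = ord(match.group(1)) - ord("a") + 1
    return chromosome, match.group(2), int(match.group(3))


def cluster_sizes(names, max_gap=10):
    """Sorted sizes of runs of ORFs lying close together on one arm."""
    by_arm = defaultdict(list)
    for name in names:
        chromosome, arm, index = parse_orf(name)
        by_arm[(chromosome, arm)].append(index)
    sizes = []
    for indices in by_arm.values():
        indices.sort()
        run = 1
        for previous, current in zip(indices, indices[1:]):
            if current - previous <= max_gap:
                run += 1
            else:
                sizes.append(run)  # gap too large: close the current run
                run = 1
        sizes.append(run)
    return sorted(sizes)


if __name__ == "__main__":
    print(cluster_sizes(SC_ORFS))
```

With this threshold the table separates into six runs of six ORFs, two runs of seven and one run of eight, matching the cluster counts given in the text.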

Overall, 80 nuclear tRNA genes were identified in the H. polymorpha genome sequence (Table 4), compared to 278 tRNA genes found in S. cerevisiae. Despite this difference, both yeasts have nearly the same number of different tRNA species (40 in H. polymorpha, 41 in S. cerevisiae). The lower number of tRNA genes in H. polymorpha is consistent with the tRNA analysis of RST sequences from Pichia sorbitophila [3], a close relative of H. polymorpha. One-third of the P. sorbitophila genome was found to contain only 23 nuclear tRNA genes. The estimated number for the complete P. sorbitophila genome (∼70) is thus comparably low.

Table 4

Nuclear tRNA genes identified in the H. polymorpha genome

tRNA species | Anticodon | H. polymorpha | S. cerevisiae
Different tRNAs |  | 40 | 41

Relevant genes of the mating system and the pheromone signal transduction pathway that were identified are shown in Table 5. Data analyses indicate that H. polymorpha contains several genes attributed to the regulation of mating, such as STE3, STE6, GPA1, STE18, CDC42, STE50 and STE11. These data suggest that a conserved mitogen-activated protein kinase pathway might regulate mating in H. polymorpha. In addition, the data analyses indicate that H. polymorpha contains a gene that corresponds to the mating type regulatory protein gene at the HMR locus of Kluyveromyces lactis (HMRa1). Cryptic mating type loci such as HMRa1 in S. cerevisiae and K. lactis act as reservoirs of mating type information for mating type switching in homothallic yeast strains. The function of this homologue in H. polymorpha remains unknown.

Table 5

Mating-specific genes in H. polymorpha

Hp_ORF | AA length | BLAST hit | AA length | BLASTP score | Function
BJ_37 | 215 | Kl_YCR097w | 126 | 154 | mating-type regulatory protein, silenced copy at HMR locus
BO_26 | 433 | Sc_STE3 | 470 | 509 | pheromone a-factor receptor
CA_130 | 1227 | Sc_STE6 | 1290 | 1167 | ATP-binding cassette transporter protein
BI_65 | 700 | Sc_STE11 | 738 | 690 | pheromone response
AG_50 | 398 | Sc_STE50 | 364 | 223 | pheromone response
AN_362 | 127 | Sc_STE18 | 110 | 126 | G protein γ subunit
AY_145 | 295 | Sc_GPA1 | 472 | 130 | G protein α subunit
AL_42 | 197 | Sc_CDC42 | 192 | 248 | G protein


Erika Wedler, Kathleen Balke, Nicole Lokmer, and Dörte Möstl are acknowledged for their excellent technical work during the entire DNA sequencing phase of the project.


  1. [1].
  2. [2].
  3. [3].
  4. [4].
  5. [5].
  6. [6].
  7. [7].
  8. [8].
  9. [9].
  10. [10].
  11. [11].
  12. [12].
  13. [13].
  14. [14].
  15. [15].
  16. [16].
  17. [17].
  18. [18].
  19. [19].
  20. [20].
  21. [21].
  22. [22].
  23. [23].
  24. [24].
  25. [25].
  26. [26].
  27. [27].
  28. [28].
  29. [29].
  30. [30].