Concordant gene regulation related to perturbations of three GDP-mannose-related genes

Anssi Törmä, Juha-Pekka Pitkänen, Laura Huopaniemi, Pirkko Mattila, Risto Renkonen
DOI: http://dx.doi.org/10.1111/j.1567-1364.2008.00461.x 63-72 First published online: 1 February 2009


Glycosylation of proteins is one of the most crucial post-translational modifications. In order to access system-level and state-dependent data related to the regulation of glycosylation events, we cultivated yeast cell strains each harboring a selected conditional knockdown construct for a gene (either SEC53, VRG4 or DPM1) related to GDP-mannose synthesis or its utilization in glycan biosynthesis. In order to carry this out efficiently, we developed automated sampling from bioreactor cultivations, a collection of in silico workflows for data analysis as well as their integration into a large data warehouse. Using the above-mentioned approaches, we could show that conditional knocking down of transcripts related to GDP-mannose synthesis or transportation led to altered levels of over 300 transcripts. These transcripts and their corresponding proteins were characterized by their gene ontology (GO) annotations, and their putative transcriptional regulation was analyzed. Furthermore, novel pathways were generated indicating interactions between GO categories with common proteins, putative transcriptional regulators of such induced GO categories, and the large protein–protein interaction network among the proteins whose transcripts indicated altered expression levels. When these results are always added to an ever-expanding data warehouse as annotations, they will incrementally increase the knowledge of biological systems.

  • gene regulation
  • conditional knockdown
  • perturbation
  • gdp-mannose
  • yeast


Glycosylation of proteins is one of the most crucial post-translational modifications within a cell (Lowe, 2003; von Andrian, 2003; Ley & Kansas, 2004; Rudd, 2004; Raman, 2005). In order to access system-level and state-dependent data related to the regulation of glycosylation events, we initiated an effort to cultivate yeast cell strains, each harboring a selected conditional knockdown construct for a gene related to GDP-mannose synthesis or its utilization in glycan biosynthesis (Fig. 1). The genes were (1) SEC53 (alias ALG4 or YFL045C), coding for phosphomannomutase (PMM_YEAST), which is involved in the synthesis of GDP-mannose and dolichol-phosphate-mannose, (2) VRG4 (alias VAN2, GOG5, LDB3, VIG4 or YGL225W), coding for Golgi GDP-mannose transporter (GOG5_YEAST), and (3) DPM1 (alias SED3 or YPR183W), coding for dolichol-phosphate mannosyltransferase (DPM1_YEAST), which transfers mannose from GDP-Man to Dol-P.


Fig. 1. Pathway from glucose to GDP-mannose, and the genes that were tagged with a conditionally repressible promoter (marked with red X).

The focus of biological research is rapidly shifting from individual genes and proteins to entire biological systems, and this requires large-scale experimentation both in the wet laboratory and in silico (Ihmels, 2002, 2004; Rives & Galitski, 2003; Barabasi & Oltvai, 2004; Tanay, 2004; Tong, 2004; Cusick, 2005). To carry out large-scale experimentation within chemostat bioreactor environments that allow well-controlled time-series sampling after perturbations, we first developed an automated sample-taking robot, which was crucial for the collection of large sets of time-series specimens related to perturbation from each cultivation vessel (http://www.medicel.com/products/explorer.html).

After analysis of a small fraction of the specimens with only one data domain, i.e. transcriptomics, we used a new data integration software platform to make use of wet lab data generated by others and integrated our data with it. The cornerstone in this is the fully integrated data warehouse currently harboring over 20 individual databases, such as BIND, EMBL, ENSEMBL, GO, InterPro, KEGG, LIGAND, SGD, SWISS-PRO, etc. Using these integrated data, we then built a ‘mother’ pathway where genes, mRNAs, proteins and metabolites are all linked together automatically and various levels of interactions such as protein–protein, DNA–protein and protein–metabolite can be visualized together with the more classical definitions of pathways.

This integrated approach allowed us to make use of the already existing data in various databanks and the fact that all our own state-dependent data were also stored in the same data warehouse. Using in silico workflows, we could rapidly start analysis of the generated wet lab data in relation to the already known information in the data warehouse (http://www.medicel.com/products/integrator.html).

Here we show for the first time the use of such a combination of an effective semi-automated wet lab effort with an efficient in silico analysis environment. Using this integrated software platform, we could rapidly perform the traditional transcriptomics data analysis, such as normalizations, identification of changes in transcript levels, gene ontology (GO) categories, etc. Beyond these steps, the platform also allowed us to connect the reacting GO categories into pathways through the proteins they have in common, to find putative transcription factors responsible for the response observed within the reactive GO categories or the reacting genes within these categories, to connect transcriptionally regulated genes on a pathway, etc. Importantly, all our in silico observations are also imported to the data warehouse and thus also contribute to the existing knowledge.

Materials and methods

Construction of yeast strains

The tetracycline-regulated activator/repressor dual system used in this work has been described (Belli, 1998a, b). First, the vector pCM244 was integrated into the LEU2 locus of the W303 strain of Saccharomyces cerevisiae. This yielded the control strain W303-24a expressing the tetracycline-activated tetR′-Ssn6 repressor. The conditional knockout (cKO) strains W303–tetR–SEC53, W303–tetR–VRG4 and W303–tetR–DPM1 were all of the genotype MATa ura3-1 ade2-1 his3-11,15 trp1-1 leu2-3,112 can1-100 CMVp(tetR-SSN6)::LEU2 tetO2(SEC53, VRG4 or DPM1, respectively, kanMX4). They were all constructed by replacing the native SEC53, VRG4 or DPM1 promoter, respectively, in the W303-24a strain with a hybrid promoter cassette containing the KanMX4 gene conferring G418 resistance, the tetracycline-inactivated transactivator (tTA) gene and the tetO2-CYC1TATA promoter sequence. Integration of the promoter cassette was carried out using the one-step PCR targeting method.

Cultivation protocol

The experiments were performed in an aerobic and glucose-limited chemostat system. The cultivation medium was a modification of the mineral medium (Verduyn, 1992; Pitkanen, 2004). The concentration of glucose was 15 g L−1 and the main sources of nitrogen, phosphorus and sulfur were 10 g L−1 (NH4)2SO4, 6 g L−1 KH2PO4 and 2 g L−1 MgSO4·7H2O, respectively. In addition, histidine, tryptophan, adenine, uracil, trace elements and vitamins were included. Fresh medium was fed from a bottle placed on a scale using a peristaltic pump. The temperature was set at 30 °C, stirring at 1000 r.p.m., aeration at 1.0 L min−1 and the feed rate at 106 g h−1. pH was held at 5.0±0.2 with NaOH (2.5 N) and H3PO4 (2 N). The volume of the broth was held at c. 1.9 L by pumping out liquid through an overflow tube connected to a peristaltic pump. The dilution rate (feed rate/liquid volume) was 0.056 h−1 and the residence time (liquid volume/feed rate) was 18 h. After five residence times, a steady state was assumed attained and the system was ready for perturbation.
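The operating point quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the feed has roughly the density of water (1000 g L−1) so that the mass feed rate can be converted to a volumetric rate; the function names are illustrative only.

```python
# Chemostat operating point from the values quoted in the text.
FEED_RATE_G_PER_H = 106.0         # feed rate (text)
BROTH_VOLUME_L = 1.9              # approximate broth volume (text)
MEDIUM_DENSITY_G_PER_L = 1000.0   # assumption: feed is about as dense as water


def dilution_rate(feed_g_per_h, volume_l, density_g_per_l=MEDIUM_DENSITY_G_PER_L):
    """Dilution rate (1/h) = volumetric feed rate / liquid volume."""
    return (feed_g_per_h / density_g_per_l) / volume_l


def residence_time(feed_g_per_h, volume_l, density_g_per_l=MEDIUM_DENSITY_G_PER_L):
    """Residence time (h) = liquid volume / volumetric feed rate."""
    return 1.0 / dilution_rate(feed_g_per_h, volume_l, density_g_per_l)


D = dilution_rate(FEED_RATE_G_PER_H, BROTH_VOLUME_L)
tau = residence_time(FEED_RATE_G_PER_H, BROTH_VOLUME_L)
print(f"D = {D:.3f} 1/h, residence time = {tau:.1f} h, "
      f"five residence times = {5 * tau:.0f} h")
```

Under this assumption the sketch gives a dilution rate of about 0.056 h−1 and a residence time of about 18 h, in line with the values stated in the text.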

In order to track the dynamics of the cKO strains, samples for the determination of transcript levels were taken automatically at frequent, regular intervals before and after the steady state was attained. After that, a doxycycline pulse was added to the fermenter. This repressed the SEC53, VRG4 and DPM1 genes, respectively, in the cKO strain. The initial concentration of doxycycline was 1.1 mg L−1, given as 2.1 mg in 10 mL of sterile water. As feed was not stopped, the concentration of doxycycline in the fermenter was diluted exponentially as a function of time.
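The exponential dilution of the pulse can be sketched as follows. This is an idealized washout model, assuming perfect mixing, no degradation or cellular uptake of doxycycline, and a negligible contribution of the 10 mL pulse to the broth volume; none of these assumptions are stated in the text.

```python
from math import exp, log

PULSE_MG = 2.1               # doxycycline added (text)
BROTH_VOLUME_L = 1.9         # approximate broth volume (text)
DILUTION_RATE_PER_H = 0.056  # dilution rate (text)


def initial_concentration(pulse_mg=PULSE_MG, volume_l=BROTH_VOLUME_L):
    """Concentration (mg/L) immediately after the pulse."""
    return pulse_mg / volume_l


def doxycycline_concentration(t_h, c0=None, dilution_rate=DILUTION_RATE_PER_H):
    """Ideal washout: c(t) = c0 * exp(-D * t), with t in hours after the pulse."""
    if c0 is None:
        c0 = initial_concentration()
    return c0 * exp(-dilution_rate * t_h)


def washout_half_life(dilution_rate=DILUTION_RATE_PER_H):
    """Time (h) for the concentration to halve under ideal washout."""
    return log(2) / dilution_rate


for t in (0, 12, 24, 48):
    print(f"t = {t:2d} h: {doxycycline_concentration(t):.3f} mg/L")
```

Under these assumptions the concentration halves roughly every 12 h and falls to about 7% of its initial value by 48 h.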

Sampling protocol

Daily sampling consisted of measuring the OD and the cell dry weight of the culture. In order to track the dynamics of the cKO of SEC53, VRG4 and DPM1, samples for the determination of transcript levels were taken automatically at frequent, regular intervals before and after the perturbation (−12, −6, −2, 0, 0.5, 1, 2, 4, 6, 10, 20, 30, 40, 44, 50 and 60 h). Samples for mRNA analysis were sprayed into precooled (−38 °C) test tubes containing 30 mL of a 70% methanol–water solution. The purpose of this was to stop the chemical reactions in the cells by low temperature and solvent denaturation of enzymes.

A computer-controlled culturing and sampling device, Medicel Explorer, was built and used for automated growth, environmental control and sampling (http://www.medicel.com/products/explorer.html). The sampling carousel was in a −38 °C ethanol bath and 15-mL samples were withdrawn into 30 mL of 70% methanol. For measurement of transcript levels, 2 mL of a thoroughly mixed methanol/sample suspension was transferred into separate centrifuge tubes, the liquid was removed by centrifuging for 5 min at 16 100 r.c.f. at 4 °C and the cell pellets were stored at −80 °C.

Measurement of gene expression levels

The total RNA extraction was performed and the gene expression levels were measured using quantitative real-time PCR assays as described previously (Huopaniemi, 2004; Pitkanen, 2004). Global gene expression levels were measured using Affymetrix GeneChip Yeast Genome S98 oligonucleotide microarrays. cRNA targets were prepared according to Affymetrix's instructions (http://www.affymetrix.com). Scanning and calculation of probe signals were performed using an Affymetrix GeneArray 2500 Scanner and genechip operating software version 1.0, respectively. The microarray data are freely available at http://www.medicel.com.

Processing of microarray data

Preprocessing data files from the genechip operating software were imported into the medicel integrator platform essentially as shown previously (Pitkanen, 2004). For each microarray, the probe-level signals were mapped into mRNAs, rRNAs and snRNAs (6140 transcripts altogether). The signals were averaged in case of multiple probes mapping into the same transcript. If possible, only ‘_at’ types of probes were used and other types were discarded. For transcripts with no ‘_at’ probes, signals from all probe types were averaged. Each averaged microarray data set was then normalized to its median signal. Two kinds of data sets were created from each microarray time-series – a signal-to-noise ratio series and a log 2 fold-change series.
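The probe averaging and median normalization described above can be sketched as follows. This is an illustration rather than the Medicel Integrator implementation: it assumes that "other" probe types are those whose identifiers carry an extra suffix such as `_s_at` or `_x_at`, and that "normalized to its median signal" means division by the array median. The probe identifiers and the mapping in the test data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Assumption: probe sets of other types are recognized by these suffixes.
OTHER_TYPE_SUFFIXES = ("_s_at", "_x_at", "_g_at", "_f_at", "_i_at", "_r_at")


def is_plain_at(probe_id):
    """True for plain '_at' probe sets, False for the suffixed types."""
    return probe_id.endswith("_at") and not probe_id.endswith(OTHER_TYPE_SUFFIXES)


def transcript_signals(probe_signals, probe_to_transcript):
    """Average probe signals per transcript, preferring plain '_at' probes."""
    grouped = defaultdict(list)
    for probe_id, signal in probe_signals.items():
        grouped[probe_to_transcript[probe_id]].append((probe_id, signal))

    averaged = {}
    for transcript, probes in grouped.items():
        plain = [signal for probe_id, signal in probes if is_plain_at(probe_id)]
        # Fall back to all probe types only when no plain '_at' probe exists.
        chosen = plain if plain else [signal for _, signal in probes]
        averaged[transcript] = mean(chosen)
    return averaged


def median_normalize(signals):
    """Divide every transcript signal by the median signal of the array."""
    array_median = median(signals.values())
    return {transcript: s / array_median for transcript, s in signals.items()}
```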

The signal-to-noise series were calculated transcript-wise with the formula

$$\mathrm{sn}(t) = \frac{y(t) - \bar{y}(t \le 0)}{s_{\mathrm{comb}}(t \le 0)}$$

where $y(t)$ is the signal of a transcript at time $t$, $\bar{y}(t \le 0)$ is the average signal of the transcript measured before perturbation and $s_{\mathrm{comb}}(t \le 0)$ is the combined SD of signals before perturbations. The combined SD was calculated using

$$s_{\mathrm{comb}}(t \le 0) = \sqrt{\frac{\sum_i \left(n_i(t \le 0) - 1\right) s_i^2(t \le 0)}{\sum_i \left(n_i(t \le 0) - 1\right)}}$$

where $s_i(t \le 0)$ is the SD of the signals of a transcript before the perturbation in experiment $i$ and $n_i(t \le 0)$ is the number of such data points in experiment $i$.
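A minimal sketch of this calculation is given here. It assumes that the signal-to-noise ratio is the deviation from the pre-perturbation mean divided by the combined SD, and that the combined SD is the usual pooled SD weighted by degrees of freedom; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def combined_sd(pre_signals_by_experiment):
    """Pooled SD of one transcript over several experiments.

    pre_signals_by_experiment: one list of pre-perturbation (t <= 0)
    signals per experiment, each with at least two data points.
    """
    numerator = sum((len(x) - 1) * stdev(x) ** 2 for x in pre_signals_by_experiment)
    denominator = sum(len(x) - 1 for x in pre_signals_by_experiment)
    return sqrt(numerator / denominator)


def signal_to_noise(series, times, s_comb):
    """sn(t) = (y(t) - mean of y at t <= 0) / s_comb for one experiment."""
    baseline = mean([y for t, y in zip(times, series) if t <= 0])
    return [(y - baseline) / s_comb for y in series]
```

For example, with pre-perturbation signals `[10, 12, 14]` and `[20, 22, 24]` in two experiments, each experiment has an SD of 2, so the pooled SD is also 2.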

The log 2 fold-change series were calculated transcript-wise with the formula

$$\mathrm{fc}(t) = \log_2 \frac{y(t)}{\bar{y}(t \le 0)}$$

Identification of reacting transcripts

A transcript was classified as highly reacting when it fulfilled the following three criteria:

  1. In any cKO experiment at t>0: min (sn(t))<−4 or max (sn(t))>4, where avg and std refer to the averages and SDs across all the microarray experiments.

  2. In any cKO experiment at t>0: max (fc(t)−fcctrl(t))>1 or min (fc(t)−fcctrl(t))<−1, depending on which of the criteria in (1) was fulfilled.

  3. For the same transcript in the control experiment at all time points: −3<snctrl(t)<3.
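The three criteria can be combined into a single test per transcript, as sketched here. Two points are assumptions rather than statements from the text: all experiments are taken to share one time grid, and criterion 2 is evaluated in the same cKO experiment and in the same direction (induction or repression) as criterion 1. The inputs are precomputed sn(t) and fc(t) series.

```python
def is_highly_reacting(times, sn_cko, fc_cko, sn_ctrl, fc_ctrl,
                       sn_limit=4.0, fc_limit=1.0, ctrl_limit=3.0):
    """Classify one transcript.

    times:   sampling times (h), shared by all experiments (assumption)
    sn_cko:  dict of cKO experiment name -> sn(t) values aligned with times
    fc_cko:  dict of cKO experiment name -> fc(t) values aligned with times
    sn_ctrl: sn(t) values of the control experiment
    fc_ctrl: fc(t) values of the control experiment
    """
    # Criterion 3: the control stays within +/- ctrl_limit at all time points.
    if not all(-ctrl_limit < value < ctrl_limit for value in sn_ctrl):
        return False

    after = [i for i, t in enumerate(times) if t > 0]
    for name in sn_cko:
        sn = [sn_cko[name][i] for i in after]
        diff = [fc_cko[name][i] - fc_ctrl[i] for i in after]
        # Criteria 1 and 2, matched by direction within the same experiment.
        induced = max(sn) > sn_limit and max(diff) > fc_limit
        repressed = min(sn) < -sn_limit and min(diff) < -fc_limit
        if induced or repressed:
            return True
    return False
```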


All protein–protein interactions were imported from BIND (Alfarano, 2005a, b) into the Medicel database. Protein–DNA interactions were imported from two high-throughput chromatin-immunoprecipitation data sets (Lee, 2002; Harbison, 2004).

Interaction density

The interaction density D was defined as the fraction of observed edges E signifying protein–protein interactions out of the maximum in a graph of N proteins as nodes:

$$D = \frac{2E}{N(N-1)}$$
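As a small worked illustration of this definition: an undirected graph of N nodes has at most N(N−1)/2 edges, so the density is the observed edge count divided by that maximum. The function name is illustrative.

```python
def interaction_density(n_proteins, n_edges):
    """Fraction of observed edges out of the N(N-1)/2 possible undirected edges."""
    if n_proteins < 2:
        return 0.0  # no pair of proteins, so no edge is possible
    max_edges = n_proteins * (n_proteins - 1) / 2
    return n_edges / max_edges
```

For instance, four proteins joined by three interactions give a density of 3/6 = 0.5.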

Enrichment calculations

The enrichment of a particular set of biological objects in a set of objects with a certain property was characterized by three variables: set coverage, property responsiveness and fold-enrichment. The variables are defined in Fig. 2.
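Because the figure itself is not reproduced in the text, the sketch below uses assumed definitions of the three variables: set coverage is the fraction of the reacting set that has the property, property responsiveness is the fraction of objects with the property that are in the reacting set, and fold enrichment is the responsiveness divided by the overall fraction of reacting objects in the background. These definitions are an interpretation, not a quotation of Fig. 2.

```python
def enrichment(reacting, with_property, background):
    """Return set coverage, property responsiveness and fold enrichment.

    All three arguments are collections of object identifiers; objects
    outside the background are ignored.
    """
    background = set(background)
    reacting = set(reacting) & background
    with_property = set(with_property) & background
    both = reacting & with_property

    set_coverage = len(both) / len(reacting)
    responsiveness = len(both) / len(with_property)
    overall_responsiveness = len(reacting) / len(background)
    return {
        "set_coverage": set_coverage,
        "responsiveness": responsiveness,
        "fold_enrichment": responsiveness / overall_responsiveness,
    }
```

With a background of 100 objects, 10 reacting objects and 20 objects carrying the property, 4 of which react, this gives a set coverage of 0.4, a responsiveness of 0.2 and a fold enrichment of 2.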


Fig. 2. Definitions of the variables property responsiveness and set coverage.


We used the new medicel integrator software platform for all the data analysis and interpretation (http://www.medicel.com/products/integrator.html).

Results and discussion

Identification of transcripts regulated in response to dynamic knockouts

The idea for the whole experimentation was to generate well-controlled perturbations in yeast cells cultured in a chemostat i.e. under static conditions where the transcript levels can be assumed to be as constant as possible before the system was perturbed. A time-series of specimens for mRNA transcriptome analysis was collected before and after the perturbation. To verify that the cKO constructs worked properly i.e. that the doxycycline pulse downregulated the modified genes, we first analyzed mRNA specimens with quantitative reverse transcriptase-PCR analysis. All the corresponding transcripts, SEC53, VRG4 and DPM1 mRNA, rapidly decreased below 10% of the initial values within 30–60 min after the doxycycline pulse (Fig. 3). As the doxycycline was gradually washed out from the media inside the bioreactor, we could also detect the recovery phase of these transcripts after 48 h.


Fig. 3. Quantitative reverse transcriptase-PCR results of the downregulation of the conditional knockdown genes after the doxycycline pulse. As the doxycycline concentration decreased with time, the recovery phase began around 48 h after the pulse. The control strain had no cKO constructs and thus did not show a systematic decrease of the transcript levels of SEC53, VRG4 or DPM1 (lower right panel).

To identify transcripts induced or repressed in response to dynamic knockout of transcripts in the glycosylation pathway (either SEC53, DPM1 or VRG4), we collected mRNA specimens from these cultivations and hybridized them on whole yeast genome chips. The averages and SDs of steady-state mRNA expression levels could be interpreted as baseline signal and noise, respectively. We collected a time series of 15 specimens from each conditional knockdown strain, i.e. starting 8 h before the doxycycline-induced downregulation of the given gene and 64 h after the perturbation. A transcript was classified as reacting if it fulfilled certain criteria for detection ability and magnitude of maximum or minimum deviation from the chemostat steady state and if the deviation was greater than that in the control experiment. Altogether, 392 transcripts were found to react to the doxycycline-induced shutting down of one or more of the three glycosylation-related transcripts.

In order to shed some light on the putative biological functions of these reacting genes, their GO enrichment analysis was performed. The GO project is part of a larger classification effort, the Open Biomedical Ontologies (OBO, 2008). GO provides a controlled vocabulary to describe gene and gene product attributes in any organism, here in S. cerevisiae. The distribution of the group of perturbed genes (n=392), modified even at a single time point, is compared with the distribution of all yeast genes and the most enriched GO categories are depicted (Fig. 4). Only those proteins that represented the strongest GO enrichment categories are listed. Not all genes had a GO classification and thus the total number of genes in Table 1 is less than the number of altered genes (n=392).


Fig. 4. The enrichment of GO Slim categories of protein products of the reacting transcripts after the doxycycline-induced repression of the tagged genes compared against the proteins produced by all the measured transcripts. The number of genes in each colored lobe of the Venn diagram is displayed. The three most enriched GO Slim categories and their fold enrichment are shown separately for the 41 genes that displayed altered regulation after all three perturbations and for all genes that were regulated by two independent perturbations.

Table 1. Enriched GO Slim categories of the set of proteins with changing transcript levels

GO Slim category | Fold enrichment | Category responsiveness (%) | Number of responding proteins
Pseudohyphal growth | 3.05 | 19.0 | 8
Cell wall | 2.70 | 16.8 | 16
Vitamin metabolism | 2.10 | 13.1 | 8
Carbohydrate metabolism | 1.99 | 12.4 | 8

Furthermore, we calculated the values of three variables describing the enrichment of the proteins into Yeast GO Slim categories (GO, 2008; SGD, 2008) against a background consisting of all yeast proteins with cognate probes in Affymetrix YG-S98 chips (Table 1). The set of reacting transcripts overlapped with others by 66% in the SEC53 cKO experiment, 56% in the DPM1 cKO experiment and 67% in the VRG4 experiment.

We also computed the enrichments of the proteins into protein families with certain sequence features (Table 2). These features also contained, among others, stress-induced protein SRP1/TIP1, amino acid/polyamine transporters I and lipid moiety-binding region. This is another way to classify the transcripts and their corresponding proteins, which display altered expression levels related to the perturbations.

Table 2. Enriched categories of UniProt sequence features of the set of proteins with changing transcript levels

Sequence feature | Fold enrichment | Feature responsiveness (%) | Set coverage (%)
InterPro/IPR000992=Stress-induced protein SRP1/TIP1 | 3.70 | 23.1 | 1.6
InterPro/IPR002293=Amino acid/polyamine transporter I | 3.21 | 20.0 | 1.3

The enriched GO categories lipid metabolism, oxidoreductase activity and peroxisome may be associated with an unfolded protein response (UPR). Glycosylation affects protein folding and cells respond to defects in protein folding with UPR, as reviewed, for example in Schroder & Kaufman (2005). Activities in UPR include, among others, acceleration of phospholipid synthesis to increase the size of the ER and upregulation of the antioxidant capacity of the cell to counteract the increased formation of reactive oxygen species (e.g. H2O2) as a result of increased protein disulfide bonding. The high responsiveness of a property suggests that the property (e.g. GO/peroxisome) could be an important part of the system that reacting transcripts might encode.

A number of developmental processes, i.e. pseudohyphal growth, sporulation, conjugation and morphogenesis, were also enriched within the pool of reacting transcripts. Cullen (2000) previously made similar observations with mutants defective in glycosylation (including e.g. dpm1). They suggested that the signalling pathway Sho1→Ste20/Ste50→Ste7→Kss1→Ste12 is responsible for the activation of FUS1. This pathway includes components from MAPK pathways regulating osmotic balance, pheromone response and filament growth, which is in agreement with our observations. Cullen (2004) later found that a heavily glycosylated plasma membrane signalling protein called Msb2 is at least partially responsible for the activation of filament growth pathways. Underglycosylation may disturb the interactions of Msb2 with MAPK pathway components and it is conceivable that glycosylation defects can affect other signalling proteins in the same manner.

Time-series behavior of over-represented GO categories

In the previous step, we identified a set of GO categories that were well represented among the transcripts reacting to knockdown of glycosylation genes. To distinguish between primary and secondary effects and to compare the responses to different perturbations, we calculated the average expression signal-to-noise ratios for transcripts in the categories in Table 1 as functions of time and perturbed gene (Supporting Information, Fig. S1).

Within SEC53 and DPM1 knockdown experiments, a set of 11 reacting transcripts encoding conjugation proteins stood out as being the first to be induced (mean expression consistently increasing) and the first to reach its maximum. The second wave of a set of reacting transcripts to reach maximum expression consisted of transcripts encoding proteins located in the cell wall or involved in pseudohyphal growth or morphogenesis. The target cKO genes were driven down by the doxycycline perturbation, as shown by a rapid decrease in mRNA (Fig. 3), and it then takes time to remove the corresponding proteins from the cell, after which, alterations in other gene expression profiles can be observed (Fig. S1).

Further in silico analysis

Research of complex systems is iterative. For this reason, it is important to integrate the information obtained from various analyses. However, lists of differentially expressed transcripts and sets of numeric enrichment data are certainly not ideal for conducting biological research and attempting to provide an understanding of the data. As we had generated the integrated data warehouse with a structured information model (http://www.medicel.com), it allowed us to begin to build pathways representing biology at multiple levels of organization [e.g. genes–mRNA–proteins (enzymes)–metabolites, etc.] and degrees of knowledge (e.g. physical interactions, precise reactions with kinetic laws, hypothetical connections; Fig. S2).

The transcripts showing altered expression levels after perturbations were enriched to GO Slim categories. Here we show how these GO categories are connected to each other with common proteins. Thus, we built a hierarchical pathway containing the enriched GO Slim categories and their protein members, which reacted to any of the three dynamic glycosylation knockouts (Fig. 5). The category members were placed in their own subpathways (accessible from the main pathway), which were connected to the categories by special connections, signifying that the pathway is inside an element. Furthermore, proteins that were members of two or more enriched categories were also copied to the main pathway and connected to the corresponding category pathways, with connections signifying that an element is common to two pathways (Fig. 5). This way it is much easier to study the overlap between enriched categories than with lists and Venn diagrams. The pathway was built using automated and reusable workflows in the medicel integrator (http://www.medicel.com/products/integrator.html).


Fig. 5. The transcripts showing altered expression levels after perturbations were enriched to GO Slim categories. Here we show how these GO categories are connected to each other with common proteins. The common reactive transcripts enriched into GO categories such as vitamin metabolism, peroxisome and cell wall.

Transcriptional regulators

Partial evidence for a transcription factor being active is that many of its possible targets are induced or repressed and that the expression levels of these transcripts correlate. We defined the full set of genes regulated by a transcription factor as the set of genes bound by the factor at a permissive threshold of P<0.05 in the high-throughput chromatin immunoprecipitation data by Lee (2002) and Harbison (2004). Next we computed the enrichments between the target sets and the set of reacting transcripts, and identified transcription factors with over 1.33-fold enrichment. Finally, we computed the Pearson correlations over the three time-series generated in the cKO experiments for all pairs of reacting transcripts and compared the correlations in the whole set of reacting transcripts with those in each of its intersections with the target set of a transcription factor identified by enrichment. A transcription factor was classified as active if the mean absolute correlation in its reacting target subset was higher than that of the whole set of reacting transcripts and if the difference was statistically significant by the Kruskal–Wallis test (a nonparametric test for the difference of medians of two distributions). Table 3 lists the transcription factors classified as active, along with the corresponding medians of absolute Pearson's correlations. The median absolute correlation of all pairs of reacting transcripts was 0.226.
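The correlation part of this procedure can be sketched as follows. This is a simplified illustration under stated assumptions: each transcript profile is the concatenation of its three cKO time-series, no profile is constant, the comparison uses medians of absolute correlations, and the significance test is left out (in practice a routine such as `scipy.stats.kruskal` could be applied to the two lists of absolute correlations). The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def abs_pair_correlations(profiles, names):
    """Absolute Pearson correlations for all pairs among the given transcripts."""
    return [abs(pearson(profiles[a], profiles[b]))
            for a, b in combinations(sorted(names), 2)]


def target_subset_more_correlated(profiles, targets):
    """Compare reacting targets of one factor with all reacting transcripts.

    profiles: dict of reacting transcript -> concatenated expression profile
    targets:  identifiers of the factor's target transcripts
    """
    reacting = set(profiles)
    subset = reacting & set(targets)
    if len(subset) < 2:
        return False  # no transcript pair to correlate
    return (median(abs_pair_correlations(profiles, subset))
            > median(abs_pair_correlations(profiles, reacting)))
```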

Table 3. Transcription factors classified as activated in response to dynamic knockout of SEC53, DPM1, or VRG4, by enrichment between their target transcript sets (ChIP P<0.05) (Lee, 2002; Harbison, 2004) and the set of reacting transcripts and by increased median absolute Pearson's correlation of reacting target transcript pairs with respect to all reacting transcript pairs

Transcription factor | Process regulated | Fold enrichment | Set coverage (%)
SUT1 | Sterol uptake | 1.73 | 10.2
NRG1 | Glucose repression | 1.56 | 21.2
TEC1 | Filament growth | 1.53 | 11.3
INO4 | Phospholipid synthesis | 1.52 | 14.6
YIK1 | Stress response | 1.52 | 22.3
  • All the differences were statistically significant (P<0.000012).

These results identified several transcription factors that were suggested to be active, such as TEC1 and STE12, which are involved in filament growth and mating, i.e. belong to the transcription factors regulating the GO Slim category conjugation. This analysis could not pinpoint any single transcriptional regulator as being responsible for the changes seen in the transcriptome after perturbations.

This was expected, because transcriptional regulation is combinatorial and adaptation to new environments is fine-tuned by the concerted action of several transcription factors. Indeed, only 9–12% of all the targets of the transcription factors suggested to be active were identified as transcriptionally reacting (Table 3). To investigate how well the regulation of transcripts in the enriched GO categories in Table 3 could be explained by a few transcription factors, we simply repeated the transcription factor enrichment analysis for each of the categories. This time, however, we set a strict threshold for enrichment, fold enrichment >3 and set coverage >60%, in order to identify transcription factors with a high explanatory power (Table 4). We also added the transcription factors as hypothetical regulators to the reacting GO category pathway. An example of a pathway depicting transcription factors TEC1_YEAST, DIG1_YEAST and STE12_YEAST as putative regulators of GO-category conjugation is shown in Fig. 6. In this network, the transcription factor proteins in the center of the graph regulate (dotted arrows) an interaction together with genes (boxes), which yield (solid arrow) a transcript (strips). This in turn regulates other interactions out of which proteins are formed. The other inputs of this reaction (i.e. amino acids) are left out for clarity.

Table 4. Transcription factors whose target sets are highly enriched with transcripts annotated to the indicated GO category and reacting in response to knockout of SEC53, DPM1, or VRG4

GO category | Transcription factor | Set coverage (%) | Fold enrichment | Responsiveness (%)
Pseudohyphal growth | YAP6 | 62.5 | 5.0 | 0.6
Signal transducer activity | SKN7 | 71.4 | 3.8 | 0.4
Membrane fraction | GAT1 | 83.3 | 3.8 | 0.4

Fig. 6. Transcription factors as putative regulators of the reacting GO conjugation category. In this network, the transcription factor proteins in the center of the graph regulate (dotted arrows) an interaction together with genes (boxes), which yield (solid arrow) a transcript (strips). This in turn regulates other interactions out of which proteins are formed. The other inputs of this reaction (i.e. amino acids) are left out for clarity.

It must be noted, however, that there is more to the regulation of these transcript sets than Table 4 was able to explain. For example, the transcripts encoding peroxisomal proteins and transcripts reacting to glycosylation knockouts comprise only 0.9% of the targets of the protein UME6_YEAST. Other factors must promote the regulation of this transcript set and/or prevent the regulation of other UME6_YEAST targets.

Protein–protein interactions

Our hypothesis was that the transcripts that reacted to perturbations in glycosylation might form machineries involved in protein glycosylation, machineries that glycosylated proteins are parts of or machineries generally involved in stress reactions. We utilized all the interactions between S. cerevisiae proteins available in the BIND database (Alfarano, 2005a, b), which were imported to the Medicel database. These interactions include both affinity-purified protein complexes and yeast two-hybrid interactions. We converted affinity-purified complexes to pairwise associations between all the proteins found in such a complex. The resulting interactions are not necessarily physical contacts between proteins. Multiple observations of the same protein pair were reduced to one association to avoid bias from often-studied proteins. We then mapped transcripts detected in our perturbed cKO experiments into proteins within the Medicel data warehouse and studied them in a network of pairwise protein–protein interactions.
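The conversion of complexes into pairwise associations and the removal of repeated observations can be sketched as follows. The input layout (lists of complex members and lists of two-hybrid pairs) is hypothetical, and dropping self-interactions is an assumption not stated in the text.

```python
from itertools import combinations


def pairwise_associations(complexes, binary_interactions):
    """Build a non-redundant set of unordered protein pairs.

    complexes:           iterable of member lists (affinity-purified complexes)
    binary_interactions: iterable of (protein, protein) pairs (e.g. two-hybrid)
    """
    pairs = set()
    # Every two members of the same complex become one association.
    for members in complexes:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    # Sorting each pair makes (A, B) and (B, A) collapse into one entry.
    for a, b in binary_interactions:
        if a != b:  # assumption: self-interactions are ignored
            pairs.add(tuple(sorted((a, b))))
    return pairs
```

Storing each pair in sorted order inside a set means that a protein pair reported many times, or in both orientations, counts only once.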

We then searched for all the connected networks formed solely by the products of the reacting transcripts, i.e. networks where each protein is a product of a reacting transcript and can be traced to any other protein by a chain of protein–protein interactions. In other words, we asked which of the proteins corresponding to the 392 reacting transcripts could bind directly to each other according to database information. As a result, we found one connected network consisting of 36 transcriptionally regulated proteins (Fig. 7), and a few other networks with a much smaller number of proteins. These kinds of network analyses are greatly facilitated by the integrated data warehouse allowing automated searches (Fig. 7).
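Finding such connected networks amounts to computing the connected components of the interaction graph restricted to the reacting proteins. The sketch below uses a breadth-first search; it illustrates the idea and is not the workflow used in the Medicel platform.

```python
from collections import defaultdict, deque


def connected_networks(pairs, reacting):
    """Connected components among reacting proteins, largest first.

    pairs:    iterable of (protein, protein) associations
    reacting: identifiers of the products of reacting transcripts
    """
    reacting = set(reacting)
    neighbours = defaultdict(set)
    # Keep only interactions where both partners are reacting proteins.
    for a, b in pairs:
        if a in reacting and b in reacting:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    networks = []
    for start in list(neighbours):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first walk along interaction chains
            node = queue.popleft()
            for partner in neighbours[node]:
                if partner not in seen:
                    seen.add(partner)
                    component.add(partner)
                    queue.append(partner)
        networks.append(component)
    return sorted(networks, key=len, reverse=True)
```

Proteins without any interaction partner among the reacting set do not appear in the result, and an interaction with a non-reacting protein cannot bridge two networks.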


Fig. 7. The direct protein–protein-binding network within the 392 reacting transcripts and their corresponding proteins was assayed based on database information. As a result, we found the largest connected network consisting of 36 transcriptionally regulated proteins.

Taken together, we provide evidence that when the results of wet lab and in silico data are always added to an ever-expanding data warehouse as annotations, they will increase the knowledge of biological systems incrementally. This systems-level approach depends on the now-existing component and systems-level databanks and needs to expand to state-dependent data warehouses.

Supporting Information

Fig. S1. The time-series behavior of over-represented GO categories in the cKO experiments.

Fig. S2. The schema of the Medicel data warehouse reference pathway. Here genes, mRNAs, proteins (enzymes) and metabolites are integrated from various source data banks to form a common reference pathway within one data warehouse (www.medicel.com/products/integrator.html).


The financial support of the Academy of Finland, Technology Development Center (TEKES), Helsinki, Sigrid Juselius Foundation, and the Helsinki University Central Hospital Fund, Helsinki, Finland, is gratefully acknowledged. The authors thank Satu Bruun, Marika Hedberg, Sirkka-Liisa Holm and Tuula Kallioinen for their indispensable help in the laboratory.


  • Editor: Hyun Ah Kang

