The environmental consequences associated with the use of fossil-sourced fuels and chemicals have brought with them a realization that future development must move in a more sustainable direction. Currently available biofuels and renewably produced chemicals, such as bioethanol or biodiesel, are produced by microbial fermentation of sugar-rich crops or by chemical conversion of natural oils or fats. However, these strategies are not sustainable in the long run, as fuel and chemical production competes with food supply and arable land usage. Instead of relying on photosynthetic feedstocks that require further conversion, one can engineer photosynthetic cyanobacteria to produce a product of interest directly from CO2 and sunlight. The first part of this thesis aimed to develop new synthetic biology tools for the model cyanobacterium Synechocystis sp. PCC 6803. The second part of the thesis focused on evaluating the regulation of fatty acid synthesis in cyanobacteria, and the production of fatty acid-derived chemicals in Synechocystis.

In paper I, fusion of small affinity proteins (Affibodies) to the major type IV pili protein was shown to mediate successful surface display of the affibody. This surface display strategy was further shown to allow inter-species binding between Synechocystis and Escherichia coli or Staphylococcus carnosus displaying complementary polymerizing affibodies.

In paper II, a CRISPR-interference tool was successfully implemented in Synechocystis for inducible gene repression. Further, its multiplexing ability was proven by simultaneous repression of up to four aldehyde reductase/dehydrogenase genes. In paper III, this established CRISPRi tool was used to target and repress native pathways competing with heterologous fatty alcohol production in Synechocystis. Repressing the gene encoding the PlsX phosphate acyltransferase allowed re-direction of carbon-flux from membrane lipids to fatty alcohol production, with a final best strain producing 10.4 mg g-1 DCW octadecanol and hexadecanol.

In paper IV, the transcriptional response towards perturbations within the fatty acid synthesis pathway was evaluated for the two model cyanobacteria Synechocystis and Synechococcus elongatus PCC 7942. Preliminary results indicate that blocking fatty acid synthesis initiation/elongation causes a transcriptional response of the involved pathway genes only in S. elongatus PCC 7942, indicating differential transcriptional responses in these two strains.

In paper V, metagenomically sourced aldehyde deformylating oxygenase (Ado) variants were evaluated for their alka(e)ne synthesizing ability. Several of these novel Ado enzymes outperformed the generally well-performing Ado from S. elongatus when relating alka(e)ne production to the soluble enzyme amount.

BACKGROUND: Activating mutations in KRAS frequently occur in colorectal cancer (CRC) patients, leading to resistance to EGFR-targeted therapies. METHODS: To better understand the cellular reprogramming which occurs in mutant KRAS cells, we have undertaken a systems-level analysis of four CRC cell lines which express either wild type (wt) KRAS or the oncogenic KRAS(G13D) allele (mtKRAS). RESULTS: RNAseq revealed that genes involved in ribosome biogenesis, mRNA translation and metabolism were significantly upregulated in mtKRAS cells. Consistent with the transcriptional data, protein synthesis and cell proliferation were significantly higher in the mtKRAS cells. Targeted metabolomics analysis also confirmed the metabolic reprogramming in mtKRAS cells. Interestingly, mtKRAS cells were highly transcriptionally responsive to EGFR activation by TGF alpha stimulation, which was associated with an unexpected downregulation of genes involved in a range of anabolic processes. While TGF alpha treatment strongly activated protein synthesis in wtKRAS cells, protein synthesis was not activated above basal levels in the TGF alpha-treated mtKRAS cells. This was likely due to the defective activation of the mTORC1 and other pathways by TGF alpha in mtKRAS cells, which was associated with impaired activation of PKB signalling and a transient induction of AMPK signalling. CONCLUSIONS: We have found that mtKRAS cells are substantially rewired at the transcriptional, translational and metabolic levels and that this rewiring may reveal new vulnerabilities in oncogenic KRAS CRC cells that could be exploited in the future.

The availability of proteomics resources hosting protein and peptide standards, as well as the data describing their analytical performances, will continue to enhance our current capabilities to develop targeted proteomics methods for quantitative biology. This study describes the analysis of a resource of 26,840 individually purified recombinant protein fragments corresponding to more than 16,000 human protein-coding genes. The resource was screened to identify proteotypic peptides suitable for targeted proteomics efforts, and we report LC-MS/MS assay coordinates for more than 25,000 proteotypic peptides, corresponding to more than 10,000 unique proteins. Additionally, peptide formation and digestion kinetics were, for a subset of the standards, monitored using a time-course protocol involving parallel digestion of isotope-labeled recombinant protein standards and endogenous human plasma proteins. We show that the strategy of adding isotope-labeled recombinant proteins before trypsin digestion enables short digestion protocols (<= 60 min) with robust quantitative precision. In a proof-of-concept study, we quantified 23 proteins in human plasma using assay parameters defined in our study and used the standards to describe distinct clusters of individuals linked to different levels of LPA, APOE, SERPINAS, and TFRC. In summary, we describe the use and utility of a resource of recombinant proteins to identify proteotypic peptides useful for targeted proteomics assay development.

Meta-analyses of datasets available in public repositories are used to gather and summarise experiments performed across laboratories, as well as to explore consistency of scientific findings. As data quality and biological equivalency across samples may obscure such analyses and consequently their conclusions, we investigated the comparability of 85 public RNA-seq cell line datasets. Thousands of pairwise comparisons of single nucleotide variants in 139 samples revealed variable genetic heterogeneity of the eight cell line populations analysed as well as variable data quality. The H9 and HCT116 cell lines were found to be remarkably stable across laboratories (with median concordances of 99.2% and 98.5%, respectively), in contrast to the highly variable HeLa cells (89.3%). We show that the genetic heterogeneity encountered greatly affects gene expression between same-cell comparisons, highlighting the importance of interrogating the biological equivalency of samples when comparing experimental datasets. Both the number of differentially expressed genes and the expression levels negatively correlate with the genetic heterogeneity. Finally, we demonstrate how comparing genetically heterogeneous datasets affects gene expression analyses and that high dissimilarity between same-cell datasets alters the expression of more than 300 cancer-related genes, which are often the focus of studies using cell lines.
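The pairwise comparison described above can be illustrated with a minimal sketch. The abstract does not give the exact concordance definition, so this example assumes concordance is the fraction of sites genotyped in both samples at which the calls agree; the sample names, sites, and genotypes are hypothetical toy data.

```python
from itertools import combinations
from statistics import median


def concordance(calls_a, calls_b):
    """Fraction of sites called in both samples where the genotypes agree.

    Assumed definition for illustration; returns None if no sites overlap.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    matches = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return matches / len(shared)


def median_pairwise_concordance(samples):
    """Median concordance over all sample pairs with overlapping sites."""
    values = [concordance(a, b) for a, b in combinations(samples.values(), 2)]
    values = [v for v in values if v is not None]
    return median(values) if values else None


# Hypothetical toy data: {sample: {(chromosome, position): genotype}}
samples = {
    "lab1": {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"},
    "lab2": {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr2", 50): "T/T"},
    "lab3": {("chr1", 100): "A/G", ("chr1", 200): "C/C"},
}

print(median_pairwise_concordance(samples))
```

Restricting the comparison to sites covered in both samples matters for RNA-seq data, where a variant can only be called in genes that are expressed at sufficient depth.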

Inter- and intra-tumour heterogeneity is caused by genetic and non-genetic factors, leading to severe clinical implications. High-throughput sequencing technologies provide unprecedented tools to analyse DNA and RNA in single cells and explore both genetic heterogeneity and phenotypic variation between cells in tissues and tumours. Simultaneous analysis of both DNA and RNA in the same cell is, however, still in its infancy. We have thus developed a method to extract and analyse information regarding genetic heterogeneity that affects cellular biology from single-cell RNA-seq data. The method enables both comparisons and clustering of cells based on genetic variation in single nucleotide variants, revealing cellular subpopulations corroborated by gene expression-based methods. Furthermore, the results show that lymph node metastases have lower levels of genetic heterogeneity compared to their original tumours with respect to variants affecting protein function. The analysis also revealed three previously unknown variants common across cancer cells in glioblastoma patients. These results demonstrate the power and versatility of scRNA-seq variant analysis and highlight it as a useful complement to already existing methods, enabling simultaneous investigations of both gene expression and genetic variation.

The plasma proteome offers a clinically useful window into human health. Recent advances from highly multiplexed assays now call for appropriate pipelines to validate individual candidates. Here, a workflow is developed to build dual binder sandwich immunoassays (SIA) for proteins predicted to be secreted into plasma. Utilizing suspension bead arrays, ≈1800 unique antibody pairs are first screened against 209 proteins with recombinant proteins as well as EDTA plasma. Employing 624 unique antibodies, dilution-dependent curves in plasma and concentration-dependent curves of full-length proteins for 102 (49%) of the targets are obtained. For 22 protein assays, the longitudinal, interindividual, and technical performance is determined in a set of plasma samples collected from 18 healthy subjects every third month over 1 year. Finally, 14 of these assays are compared with SIAs composed of other binders, proximity extension assays, and affinity-free targeted mass spectrometry. The workflow provides a multiplexed approach to screen for SIA pairs that suggests using at least three antibodies per target. This design is applicable for a wider range of targets of the plasma proteome, and the assays can be applied for discovery but also to validate emerging candidates derived from other platforms.

Cyanobacteria must balance separate demands for energy generation, carbon assimilation, and biomass synthesis. We used shotgun proteomics to investigate proteome allocation strategies in the model cyanobacterium Synechocystis sp. PCC 6803 as it adapted to light and inorganic carbon (C-i) limitation. When partitioning the proteome into seven functional sectors, we find that sector sizes change linearly with growth rate. The sector encompassing ribosomes is significantly smaller than in E. coli, which may explain the lower maximum growth rate in Synechocystis. Limitation of light dramatically affects multiple proteome sectors, whereas the effect of C-i limitation is weak. Carbon assimilation proteins respond more strongly to changes in light intensity than to C-i. A coarse-grained cell economy model generally explains proteome trends. However, deviations from model predictions suggest that the large proteome sectors for carbon and light assimilation are not optimally utilized under some growth conditions and may constrain the proteome space available to ribosomes.
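The linear relationship between sector size and growth rate can be illustrated with an ordinary least-squares fit. This is a minimal sketch and not the analysis used in the study: the growth rates and ribosomal sector fractions below are hypothetical values chosen only to show the calculation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: growth rate (per hour) and ribosomal sector mass fraction
growth_rates = [0.02, 0.04, 0.06, 0.08]
ribosome_fraction = [0.10, 0.12, 0.14, 0.16]

slope, intercept = fit_line(growth_rates, ribosome_fraction)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In this kind of fit the intercept is the sector fraction extrapolated to zero growth, and the slope is how much additional proteome fraction the sector takes up per unit increase in growth rate.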

Biological fixation of atmospheric CO2 via the Calvin-Benson-Bassham cycle has massive ecological impact and offers potential for industrial exploitation, either by improving carbon fixation in plants and autotrophic bacteria, or by installation into new hosts. A kinetic model of the Calvin-Benson-Bassham cycle embedded in the central carbon metabolism of the cyanobacterium Synechocystis sp. PCC 6803 was developed to investigate its stability and underlying control mechanisms. To reduce the uncertainty associated with a single parameter set, random sampling of the steady-state metabolite concentrations and the enzyme kinetic parameters was employed, resulting in millions of parameterized models which were analyzed for flux control and stability against perturbation. Our results show that the Calvin cycle had an overall high intrinsic stability, but a high concentration of ribulose 1,5-bisphosphate was associated with unstable states. Low substrate saturation and high product saturation of enzymes involved in highly interconnected reactions correlated with increased network stability. Flux control, that is the effect that a change in one reaction rate has on the other reactions in the network, was distributed and mostly exerted by energy supply (ATP), but also by cofactor supply (NADPH). Sedoheptulose 1,7-bisphosphatase/fructose 1,6-bisphosphatase, fructose-bisphosphate aldolase, and transketolase had a weak but positive effect on overall network flux, in agreement with published observations. The identified flux control and relationships between metabolite concentrations and system stability can guide metabolic engineering. The kinetic model structure and parameterizing framework can be expanded for analysis of metabolic systems beyond the Calvin cycle.

Cyanobacteria experience both rapid and periodic fluctuations in light and inorganic carbon (C-i) and have evolved regulatory mechanisms to respond to these, including extensive posttranscriptional gene regulation. We report the first genome-wide ribosome profiling data set for cyanobacteria, where ribosome occupancy on mRNA is quantified with codon-level precision. We measured the transcriptome and translatome of Synechocystis during autotrophic growth before (high carbon [HC] condition) and 24 h after removing CO2 from the feedgas (low carbon [LC] condition). Ribosome occupancy patterns in the 5' untranslated region suggest that ribosomes can assemble there and slide to the Shine-Dalgarno site, where they pause. At LC, total translation was reduced by 80% and ribosome pausing was increased at stop and start codons and in untranslated regions, which may be a sequestration mechanism to inactivate ribosomes in response to rapid C-i depletion. Several stress response genes, such as thioredoxin M (sll1057), a putative endonuclease (slr0915), protease HtrA (slr1204), and heat shock protein HspA (sll1514) showed marked increases in translational efficiency at LC, indicating translational control in response to Ci depletion. Ribosome pause scores within open reading frames were mostly constant, though several ribosomal proteins had significantly altered pause score distributions at LC, which might indicate translational regulation of ribosome biosynthesis in response to Ci depletion. We show that ribosome profiling is a powerful tool to decipher dynamic gene regulation strategies in cyanobacteria. IMPORTANCE Ribosome profiling accesses the translational step of gene expression via deep sequencing of ribosome-protected mRNA footprints. Pairing of ribosome profiling and transcriptomics data provides a translational efficiency for each gene. 
Here, the translatome and transcriptome of the model cyanobacterium Synechocystis were compared under carbon-replete and carbon starvation conditions. The latter may be experienced when cyanobacteria are cultivated in poorly mixed bioreactors or engineered to be product-secreting cell factories. A small fraction of genes (<200), including stress response genes, showed changes in translational efficiency during carbon starvation, indicating condition-dependent translation-level regulation. We observed ribosome occupancy in untranslated regions, possibly due to an alternative translation initiation mechanism in Synechocystis. The higher proportion of ribosomes residing in untranslated regions during carbon starvation may be a mechanism to quickly inactivate superfluous ribosomes. This work provides the first ribosome profiling data for cyanobacteria and reveals new regulation strategies for coping with nutrient limitation.
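Translational efficiency, as used above, pairs ribosome footprint abundance with mRNA abundance for each gene. The sketch below assumes the common ratio definition (normalized footprint abundance divided by normalized mRNA abundance) with a pseudocount to avoid division by zero; the normalization used in the study is not stated in the abstract, and the input values are hypothetical.

```python
import math


def translational_efficiency(footprint, mrna, pseudocount=1.0):
    """Ratio of normalized ribosome footprint to mRNA abundance.

    The pseudocount is an assumption to keep the ratio defined for
    genes with zero reads.
    """
    return (footprint + pseudocount) / (mrna + pseudocount)


def log2_te_change(high_carbon, low_carbon):
    """log2 fold change in translational efficiency from HC to LC.

    Each argument is a (footprint, mrna) pair in the same normalized units.
    """
    te_hc = translational_efficiency(*high_carbon)
    te_lc = translational_efficiency(*low_carbon)
    return math.log2(te_lc / te_hc)


# Hypothetical gene: mRNA level unchanged, footprints increase at LC
print(log2_te_change(high_carbon=(99, 99), low_carbon=(399, 99)))
```

A positive value indicates more ribosome occupancy per transcript at low carbon, which is how a gene can be flagged as translationally regulated even when its mRNA level does not change.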

Periampullary adenocarcinoma, including pancreatic cancer, is a heterogeneous group of tumors with dismal prognosis, partially due to lack of reliable targetable and predictive biomarkers. RNA-binding motif protein 3 (RBM3) has previously been shown to be an independent prognostic and predictive biomarker in several types of cancer. Herein, we examined the prognostic value of RBM3 in periampullary adenocarcinoma, as well as the effects following RBM3 suppression in pancreatic cancer cells in vitro. RBM3 mRNA levels were examined in 176 pancreatic cancer patients from The Cancer Genome Atlas. Immunohistochemical expression of RBM3 was analyzed in tissue microarrays with primary tumors and paired lymph node metastases from 175 consecutive patients with resected periampullary adenocarcinoma. Pancreatic cancer cells were transfected with anti-RBM3 siRNA in vitro and the influence on cell viability following chemotherapy, transwell migration and invasion was assessed. The results demonstrated that high mRNA-levels of RBM3 were significantly associated with a reduced overall survival (p = 0.026). RBM3 protein expression was significantly higher in lymph node metastases than in primary tumors (p = 0.005). High RBM3 protein expression was an independent predictive factor for the effect of adjuvant chemotherapy and an independent negative prognostic factor in untreated patients (p for interaction = 0.003). After siRNA suppression of RBM3 in vitro, pancreatic cancer cells displayed reduced migration and invasion compared to control, as well as a significantly increased resistance to chemotherapy. In conclusion, the strong indication of a positive response predictive effect of RBM3 expression in pancreatic cancer may be highly relevant in the clinical setting and merits further validation.

Orthocaspases are prokaryotic caspase homologs – proteases, which cleave their substrates after positively charged residues using a conserved histidine – cysteine (HC) dyad situated in a catalytic p20 domain. However, in orthocaspases pseudo-variants have been identified, which instead of the catalytic HC residues contain tyrosine and serine, respectively. The presence and distribution of these presumably proteolytically inactive p20-containing enzymes has until now escaped attention. We have performed a detailed analysis of orthocaspases in all available prokaryotic genomes, focusing on pseudo-orthocaspases. Surprisingly we identified type I metacaspase homologs in filamentous cyanobacteria. While genes encoding pseudo-orthocaspases seem to be absent in Archaea, our results show conservation of these genes in organisms performing either anoxygenic photosynthesis (orders Rhizobiales, Rhodobacterales, and Rhodospirillales in Alphaproteobacteria) or oxygenic photosynthesis (all sequenced cyanobacteria, except Gloeobacter, Prochlorococcus, and Cyanobium). Contrary to earlier reports, we were able to detect pseudo-orthocaspases in all sequenced strains of the unicellular cyanobacteria Synechococcus and Synechocystis. In silico comparisons of the primary as well as tertiary structures of pseudo-p20 domains with their presumably proteolytically active homologs suggest that differences in their amino acid sequences have no influence on the overall structures. Mutations therefore affect most likely only the proteolytic activity. Our data provide an insight into diversification of pseudo-orthocaspases in Prokaryotes, their taxa-specific distribution, and allow suggestions on their taxa-specific function.

In the era towards precision medicine, we here present the individual specific autoantibody signatures of 193 healthy individuals. The self-reactive IgG signatures are stable over time in a way that each individual profile is recognized in longitudinal sampling. The IgG autoantibody reactivity towards an antigen array comprising 335 protein fragments, representing 204 human proteins with potential relevance to autoimmune disorders, was measured in longitudinal plasma samples from 193 healthy individuals. This analysis resulted in unique autoantibody barcodes for each individual that were maintained over one year's time. The reactivity profiles, or signatures, are person specific in regards to the number of reactivities and antigen specificity. Two independent data sets were consistent in that each healthy individual displayed reactivity towards 0-16 antigens, with a median of six. Subsequently, four selected individuals were profiled on in-house produced high-density protein arrays containing 23,000 protein fragments representing 14,000 unique protein coding genes. Based on a unique, broad and deep longitudinal profiling of autoantibody reactivities, our results demonstrate a unique autoreactive profile in each analyzed healthy individual. The need and interest for broad-ranged and high-resolution molecular profiling of healthy individuals is rising. We have here generated and assessed an initial perspective on the global distribution of the self-reactive IgG repertoire in healthy individuals, by investigating 193 well-characterized healthy individuals.

Photoautotrophic production of fuels and chemicals by cyanobacteria typically gives lower volumetric productivities and titers than heterotrophic production. Cyanobacteria cultures become light limited above an optimal cell density, so that this substrate is not supplied to all cells sufficiently. Here, we investigate genetic strategies for a two-phase cultivation, where biofuel-producing Synechocystis cultures are limited to an optimal cell density through inducible CRISPR interference (CRISPRi) repression of cell growth. Fixed CO2 is diverted to ethanol or n-butanol. Among the most successful strategies was partial repression of citrate synthase gltA. Strong repression (>90%) of gltA at low culture densities increased carbon partitioning to n-butanol 5-fold relative to a nonrepression strain, but sacrificed volumetric productivity due to severe growth restriction. CO2 fixation continued for at least 3 days after growth was arrested. By targeting sgRNAs to different regions of the gltA gene, we could modulate GltA expression and carbon partitioning between growth and product to increase both specific and volumetric productivity. These growth arrest strategies can be useful for improving performance of other photoautotrophic processes.

Identifying the proteome variation in different parts of the body provides fundamental molecular details, enabling further findings and mapping of tissue specific proteins. By combining quantitative transcriptomics with qualitative antibody based proteomics, the Human Protein Atlas (HPA) project aims to protein profile each human protein-coding gene. Genes with varying expression levels in the different tissue types are categorized as tissue elevated in one tissue compared to others, thus connecting genes to potential tissue specific functions. This thesis focuses on the most complex organ in the human body, the brain. With its billions of neurons specifically organized and interconnected, and its ability not only to control the body but also to carry out higher cognitive functions, the brain is still not fully understood.

In my search for brain important proteins, genes were classified at different stages based on expression levels. In Paper I and II the transcriptome of cerebral cortex was compared with peripheral organs to classify genes with elevated expression in the brain. Brain expression information was expanded by including external data (GTEx and FANTOM5) into the analysis, in Paper III. Thereafter, in Paper IV, the three datasets (HPA, GTEx and FANTOM5) were aligned and combined, enabling a consensus classification with an improved representation of the brain complexity. The most recent classification provided whole body gene expression profiles and out of the 19,670 protein-coding genes, 2,501 were expressed at elevated levels in the brain compared to the other tissue types. Twelve individual regions represented the brain as an organ, and were further analyzed and compared for regional classification of gene expression. One thousand genes showed regional variation in expression level, thus classified as regionally elevated within the brain. Interestingly, fewer than 500 of the genes classified as brain elevated on the whole body level were also regionally elevated in the brain. Many genes with regionally variable expression within the brain showed higher expression in a peripheral organ than in the brain when comparing whole body expression. Based on elevated expression in the brain or brain regions, more than 3,000 genes were suggested to be of high importance to the brain.

In addition, this high-throughput approach to combine transcriptomics and protein profiles in tissues and cells further generated new knowledge in several different other aspects: better understanding of uncharacterized and “missing proteins” (Paper III), validation of an antibody improving classification of pituitary adenoma (Paper V) and in Paper VI the possibility to explore cancer specific expression in relation to clinical data and normal tissue expression.

There are multiple diseases of the brain that are poorly understood on both a cellular and molecular level. While my work mainly focused on identifying and understanding the molecular organization of the normal brain, the ultimate goal of mapping and studying the normal expression baseline is to understand the molecular aspects of disease and identify ways to prevent, treat and cure diseases.

A large portion of human proteins are referred to as missing proteins, defined as protein-coding genes that lack experimental data on the protein level due to factors such as temporal expression, expression in tissues that are difficult to sample, or because they do not actually encode functional proteins. In the present investigation, an integrated omics approach was used for identification and exploration of missing proteins. Transcriptomics data from three different sources (the Human Protein Atlas (HPA), the GTEx consortium, and the FANTOM5 consortium) were used as a starting point to identify genes selectively expressed in specialized tissues. Complementing the analysis with profiling on more specific tissues based on immunohistochemistry allowed for further exploration of cell-type-specific expression patterns. More detailed tissue profiling was performed for >300 genes on complementing tissues. The analysis identified tissue-specific expression of nine proteins previously listed as missing proteins (POU4F1, FRMD1, ARHGEF33, GABRG1, KRTAP2-1, BHLHE22, SPRR4, AVPR1B, and DCLK3), as well as numerous proteins with evidence of existence on the protein level that previously lacked information on spatial resolution and cell-type-specific expression pattern. We here present a comprehensive strategy for identification of missing proteins by combining transcriptomics with antibody-based proteomics. The analyzed proteins provide interesting targets for organ-specific research in health and disease.

Background Analysis of muscle biopsies has made it possible to characterize the pathophysiological changes of Duchenne and Becker muscular dystrophies (D/BMD) leading to the clinical phenotype. Muscle tissue is often investigated during interventional dose-finding studies to show in situ proof of concept and the pharmacodynamic effect of the tested drug. Less invasive readouts are needed to objectively monitor patients' health status, muscle quality, and response to treatment. The identification of serum biomarkers correlating with clinical function and able to anticipate functional scales is particularly needed for personalized patient management and to support drug development programs. Methods A large-scale proteomic approach was used to identify serum biomarkers describing pathophysiological changes (e.g. loss of muscle mass), association with clinical function, prediction of disease milestones, association with in vivo 31P magnetic resonance spectroscopy data and dystrophin levels in muscles. Cross-sectional comparisons were performed to compare DMD patients, BMD patients, and healthy controls. A group of DMD patients was followed up for a median of 4.4 years to allow monitoring of individual disease trajectories based on yearly visits. Results Cross-sectional comparison identified 10 proteins discriminating between healthy controls, DMD and BMD patients. Several proteins (285) were able to separate DMD from healthy, while 121 proteins differentiated between BMD and DMD; only 13 proteins separated BMD and healthy individuals. The concentration of specific proteins in serum was significantly associated with patients' performance (e.g. BMP6 serum levels and elbow flexion) or dystrophin levels (e.g. TIMP2) in BMD patients. Analysis of longitudinal trajectories identified 427 proteins affected over time, indicating loss of muscle mass, replacement of muscle by adipose tissue, and cardiac involvement. Over-representation analysis of longitudinal data highlighted proteins that could be used as pharmacodynamic biomarkers for drugs currently in clinical development. Conclusions Serum proteomic analysis not only discriminated among DMD, BMD, and healthy subjects, but also detected significant associations with clinical function, dystrophin levels, and disease progression.

Objective Antibodies against posttranslationally modified proteins are a hallmark of rheumatoid arthritis (RA), but the emergence and pathogenicity of these autoantibodies are still incompletely understood. The aim of this study was to analyze the antigen specificities and mutation patterns of monoclonal antibodies (mAb) derived from RA synovial plasma cells and address the question of antigen cross-reactivity. Methods IgG-secreting cells were isolated from RA synovial fluid, and the variable regions of the immunoglobulins were sequenced (n = 182) and expressed in full-length mAb (n = 93) and also as germline-reverted versions. The patterns of reactivity with 53,019 citrullinated peptides and 49,211 carbamylated peptides and the potential of the mAb to promote osteoclastogenesis were investigated. Results Four unrelated anti-citrullinated protein autoantibodies (ACPAs), of which one was clonally expanded, were identified and found to be highly somatically mutated in the synovial fluid of a patient with RA. The ACPAs recognized >3,000 unique peptides modified by either citrullination or carbamylation. This highly multireactive autoantibody feature was replicated for Ig sequences derived from B cells from the peripheral blood of other RA patients. The plasma cell-derived mAb were found to target distinct amino acid motifs and partially overlapping protein targets. They also conveyed different effector functions as revealed in an osteoclast activation assay. Conclusion These findings suggest that the high level of cross-reactivity among RA autoreactive B cells is the result of different antigen encounters, possibly at different sites and at different time points. This is consistent with the notion that RA is initiated in one context, such as in the mucosal organs, and thereafter targets other sites, such as the joints.

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, which is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
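The effect of shared peptides on protein inference can be shown with a minimal sketch. This is not one of the inference procedures benchmarked in the study; it is a deliberately naive rule (a protein is inferred if at least one detected peptide maps to it), and the peptide and protein names are hypothetical.

```python
from collections import defaultdict


def infer_proteins(detected_peptides, peptide_to_proteins, exclude_shared=True):
    """Map each inferred protein to the detected peptides supporting it.

    With exclude_shared=True, only peptides unique to one protein count
    as evidence. Naive illustrative rule, not a published method.
    """
    evidence = defaultdict(set)
    for peptide in detected_peptides:
        proteins = peptide_to_proteins.get(peptide, set())
        if exclude_shared and len(proteins) != 1:
            continue  # skip peptides that are ambiguous (or unmapped)
        for protein in proteins:
            evidence[protein].add(peptide)
    return dict(evidence)


# Hypothetical sample: P1 is present, P2 is absent but shares PEPB with P1
peptide_to_proteins = {
    "PEPA": {"P1"},
    "PEPB": {"P1", "P2"},
    "PEPC": {"P3"},
}
detected = ["PEPA", "PEPB"]

print(infer_proteins(detected, peptide_to_proteins))
print(infer_proteins(detected, peptide_to_proteins, exclude_shared=False))
```

With shared peptides excluded, only P1 is reported. With shared peptides included, the absent protein P2 is also reported, which is the kind of error that a standard containing only proteins with unique peptides cannot reveal.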

Background: Genome-scale metabolic models (GEMs) offer insights into cancer metabolism and have been used to identify potential biomarkers and drug targets. Drug repositioning is a time- and cost-effective method of drug discovery that can be applied together with GEMs for effective cancer treatment. Methods: In this study, we reconstruct a prostate cancer (PRAD)-specific GEM for exploring prostate cancer metabolism and also repurposing new therapeutic agents that can be used in development of effective cancer treatment. We integrate global gene expression profiling of cell lines with >1000 different drugs through the use of prostate cancer GEM and predict possible drug-gene interactions. Findings: We identify the key reactions with altered fluxes based on the gene expression changes and predict the potential drug effect in prostate cancer treatment. We find that sulfamethoxypyridazine, azlocillin, hydroflumethiazide, and ifenprodil can be repurposed for the treatment of prostate cancer based on an in silico cell viability assay. Finally, we validate the effect of ifenprodil using an in vitro cell assay and show its inhibitory effect on a prostate cancer cell line. Interpretation: Our approach demonstrates how GEMs can be used to predict therapeutic agents for cancer treatment based on drug repositioning. It also paves the way for applying computational models to real-world biomedical and pharmaceutical problems.

Genome-scale metabolic models (GEMs) are comprehensive descriptions of cell metabolism and have been extensively used to understand biological responses in health and disease. One such application is in determining metabolic adaptation to the absence of a gene or reaction, i.e., essentiality analysis. However, current methods do not permit efficient and accurate quantification of reaction/gene essentiality. Here, we present Essentiality Score Simulator (ESS), a tool for quantification of gene/reaction essentialities in GEMs. ESS quantifies and scores essentiality of each reaction/gene and their combinations based on the stoichiometric balance using synthetic lethal analysis. This method provides an option to weight metabolic models which currently rely mostly on topologic parameters, and is potentially useful to investigate the metabolic pathway differences between different organisms, cells, tissues, and/or diseases. We benchmarked the proposed method against multiple network topology parameters, and observed that our method displayed higher accuracy based on experimental evidence. In addition, we demonstrated its application in the wild-type and ldh knock-out E. coli core model, as well as two human cell lines, and revealed the changes of essentiality in metabolic pathways based on the reaction essentiality scores. ESS is available without any limitation at https://sourceforge.net/projects/essentiality-score-simulator.