Latest in

Using sequencing coverage statistics to identify sex chromosomes in minke whales (Feb 18 2019). The ever-increasing number of genome sequencing and resequencing projects is a central source of insights into the ecology and evolution of non-model organisms. An important aspect of genomics is the elucidation of sex determination systems and identifying ...
BOAssembler: a Bayesian Optimization Framework to Improve RNA-Seq Assembly Performance (Feb 14 2019). High throughput sequencing of RNA (RNA-Seq) can provide us with millions of short fragments of RNA transcripts from a sample. How to better recover the original RNA transcripts from those fragments (RNA-Seq assembly) is still a difficult task. For example, ...
OPENMENDEL: A Cooperative Programming Project for Statistical Genetics (Feb 14 2019). Statistical methods for genomewide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, ...
PLIT: An alignment-free computational tool for identification of long non-coding RNAs in plant transcriptomic datasets (Feb 12 2019). Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs which play a significant role in several biological processes. RNA-seq based transcriptome sequencing has been extensively used for identification of lncRNAs. However, accurate identification ...
Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm (Feb 12 2019). A large proportion of the basepairs in the long reads that third-generation sequencing technologies produce possess sequencing errors. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize ...
Achieving GWAS with Homomorphic Encryption (Feb 12 2019). One way of investigating how genes affect human traits is a genome-wide association study (GWAS). Genetic markers, known as single-nucleotide polymorphisms (SNPs), are used in GWAS. This raises privacy and security concerns as these genetic markers ...
Scalable optimal Bayesian classification of single-cell trajectories under regulatory model uncertainty (Feb 08 2019). Single-cell gene expression measurements offer opportunities in deriving mechanistic understanding of complex diseases, including cancer. However, due to the complex regulatory machinery of the cell, gene regulatory network (GRN) model inference based ...
Some Enumeration Problems in the Duplication-Loss Model of Genome Rearrangement (Feb 01 2019). Tandem-duplication-random-loss (TDRL) is an important genome rearrangement operation studied in evolutionary biology. This paper investigates some of the formal properties of TDRL operations on the symmetric group (the space of permutations over an $ ...
Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits (Feb 01 2019). Monte Carlo (MC) permutation testing is considered the gold standard for statistical hypothesis testing, especially when standard parametric assumptions are not clear or likely to fail. However, in modern data science settings where a large number of ...
Predicting Toxicity from Gene Expression with Neural Networks (Jan 31 2019). We train a neural network to predict chemical toxicity based on gene expression data. The input to the network is a full expression profile collected either in vitro from cultured cells or in vivo from live animals. The output is a set of fine-grained ...
GeNet: Deep Representations for Metagenomics (Jan 30 2019). We introduce GeNet, a method for shotgun metagenomic classification from raw DNA sequences that exploits the known hierarchical structure between labels for training. We provide a comparison with state-of-the-art methods Kraken and Centrifuge on datasets ...
Proteomic and metagenomic insights into prehistoric Spanish Levantine Rock Art (Jan 24 2019). The Iberian Mediterranean Basin is home to one of the largest groups of prehistoric rock art sites in Europe. Despite the cultural relevance of prehistoric Spanish Levantine rock art, pigment composition remains partially unknown, and the nature of the ...
Spatial clustering and common regulatory elements correlate with coordinated gene expression (Jan 18 2019). Many cellular responses to surrounding cues require temporally concerted transcriptional regulation of multiple genes. In prokaryotic cells, a single-input-module motif with one transcription factor regulating multiple target genes can generate coordinated ...
A Hybrid HMM Approach for the Dynamics of DNA Methylation (Jan 18 2019). The understanding of mechanisms that control epigenetic changes is an important research area in modern functional biology. Epigenetic modifications such as DNA methylation are in general very stable over many cell divisions. DNA methylation can however ...
The Mahalanobis kernel for heritability estimation in genome-wide association studies: fixed-effects and random-effects methods (Jan 09 2019). Linear mixed models (LMMs) are widely used for heritability estimation in genome-wide association studies (GWAS). In standard approaches to heritability estimation with LMMs, a genetic relationship matrix (GRM) must be specified. In GWAS, the GRM is frequently ...
Figure 1 Theory Meets Figure 2 Experiments in the Study of Gene Expression (Dec 30 2018). It is tempting to believe that we now own the genome. The ability to read and re-write it at will has ushered in a stunning period in the history of science. Nonetheless, there is an Achilles heel exposed by all of the genomic data that has accrued: we ...
Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures (Dec 25 2018). Motivation: With the development of droplet based systems, massive single cell transcriptome data has become available, which enables analysis of cellular and molecular processes at single cell resolution and is instrumental to understanding many biological ...
Pan-Cancer Epigenetic Biomarker Selection from Blood Samples Using SAS (Dec 21 2018). A key focus in current cancer research is the discovery of cancer biomarkers that allow earlier detection with high accuracy and lower costs for both patients and hospitals. Blood samples have long been used as a health status indicator, but DNA methylation ...
Bayesian Manifold-Constrained-Prior Model for an Experiment to Locate Xce (Dec 20 2018). We propose an analysis for a novel experiment intended to locate the genetic locus Xce (X-chromosome controlling element), which biases the stochastic process of X-inactivation in the mouse. X-inactivation bias is a phenomenon where cells in the embryo ...
α7 nicotinic acetylcholine receptor signaling modulates ovine fetal brain astrocytes transcriptome in response to endotoxin: comparison to microglia, implications for prenatal stress and development of autism spectrum disorder (Dec 17 2018; Dec 31 2018). Neuroinflammation in utero may result in lifelong neurological disabilities. Astrocytes play a pivotal role, but the mechanisms are poorly understood. No early postnatal treatment strategies exist to enhance neuroprotective potential of astrocytes. We ...
Integrating omics and MRI data with kernel-based tests and CNNs to identify rare genetic markers for Alzheimer's disease (Dec 02 2018). For precision medicine and personalized treatment, we need to identify predictive markers of disease. We focus on Alzheimer's disease (AD), where magnetic resonance imaging scans provide information about the disease status. By combining imaging with ...
Interlacing Personal and Reference Genomes for Machine Learning Disease-Variant Detection (Nov 26 2018). DNA sequencing to identify genetic variants is becoming increasingly valuable in clinical settings. Assessment of variants in such sequencing data is commonly implemented through Bayesian heuristic algorithms. Machine learning has shown great promise ...
Private Shotgun DNA Sequencing (Nov 23 2018). Current techniques in sequencing a genome allow a service provider (e.g. a sequencing company) to have full access to the genome information, and thus the privacy of individuals regarding their lifetime secret is violated. In this paper, we introduce ...
Inference of the three-dimensional chromatin structure and its temporal behavior (Nov 22 2018). Understanding the three-dimensional (3D) structure of the genome is essential for elucidating vital biological processes and their links to human disease. To determine how the genome folds within the nucleus, chromosome conformation capture methods such ...
Prediction of Alzheimer's disease-associated genes by integration of GWAS summary data and expression data (Nov 12 2018). Alzheimer's disease is the most common cause of dementia. It is the fifth-leading cause of death among elderly people. With high genetic heritability (79%), finding disease causal genes is a crucial step in finding treatments for AD. Following the International ...
The long non-coding RNA HOTAIR is transcriptionally activated by HOXA9 and is an independent prognostic marker in patients with malignant glioma (Nov 09 2018). The lncRNA HOTAIR has been implicated in several human cancers. Here, we evaluated the molecular alterations and upstream regulatory mechanisms of HOTAIR in glioma, the most common primary brain tumor, and its clinical relevance. HOTAIR gene expression, ...
TF-MoDISco v0.4.4.2-alpha: Technical Note (Oct 31 2018). TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is an algorithm for identifying motifs from basepair-level importance scores computed on genomic sequence data. This paper describes the methods behind TF-MoDISco version ...
A Comparison of Microbial Genome Web Portals (Oct 30 2018). Microbial genome web portals have a broad range of capabilities that address a number of information-finding and analysis needs for scientists. This article compares the capabilities of the major microbial genome web portals to aid researchers in determining ...
Quantum Structures in Human Decision-making: Towards Quantum Expected Utility (Oct 29 2018). Ellsberg thought experiments and empirical confirmation of Ellsberg preferences pose serious challenges to subjective expected utility theory (SEUT). We have recently elaborated a quantum-theoretic framework for human decisions under uncertainty ...
Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data (Oct 22 2018). Precision medicine aims for personalized prognosis and therapeutics by utilizing recent genome-scale high-throughput profiling techniques, including next-generation sequencing (NGS). However, translating NGS data faces several challenges. First, NGS count ...
Towards the Latent Transcriptome (Oct 08 2018; Dec 10 2018). In this work we propose a method to compute continuous embeddings for kmers from raw RNA-seq data, without the need for alignment to a reference genome. The approach uses an RNN to transform kmers of the RNA-seq reads into a 2-dimensional representation ...
Cancer classification and pathway discovery using non-negative matrix factorization (Sep 27 2018; Oct 08 2018). Extracting genetic information from a full range of sequencing data is important for understanding diseases. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type. We used multinomial ...
Extreme Scale De Novo Metagenome Assembly (Sep 19 2018). Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiomes' genomes. State-of-the-art tools require ...
A reproducible effect size is more useful than an irreproducible hypothesis test to analyze high throughput sequencing datasets (Sep 07 2018). Motivation: P values derived from the null hypothesis significance testing framework are strongly affected by sample size, and are known to be irreproducible in underpowered studies, yet no suitable replacement has been proposed. Results: Here we present ...
Whole genome resequencing reveals diagnostic markers for investigating global migration and hybridization between minke whale species (Sep 06 2018). Background: In the marine environment, where there are few absolute physical barriers, contemporary contact between previously isolated species can occur across great distances, and in some cases, may be inter-oceanic. [..] in the minke whale species ...
Gene Shaving using influence function of a kernel method (Sep 05 2018). Gene shaving, the identification of significant subsets of genes, is an essential and challenging issue for biomedical research, given the huge number of genes and the complex nature of biological networks. Since positive definite kernel based methods on genomic ...
Quantitative and functional post-translational modification proteomics reveals that TREPH1 plays a role in plant thigmomorphogenesis (Aug 13 2018). Plants can sense both intracellular and extracellular mechanical forces and can respond through morphological changes. The signaling components responsible for mechanotransduction of the touch response are largely unknown. Here, we performed a high-throughput ...
Genome-Wide Association Studies: Information Theoretic Limits of Reliable Learning (Aug 10 2018). In the problems of Genome-Wide Association Study (GWAS), the objective is to associate subsequences of individuals' genomes to the observable characteristics called phenotypes. The genome containing the biological information of an individual can be represented ...
Fast computation of the principal components of genotype matrices in Julia (Aug 09 2018). Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix ...
Deep Neural Network for Analysis of DNA Methylation Data (Aug 02 2018). Many studies have demonstrated that DNA methylation, which occurs in the context of a CpG, has a strong correlation with diseases, including cancer. There is strong interest in analyzing DNA methylation data to find how to distinguish different ...
Explaining Parochialism: A Causal Account for Political Polarization in Changing Economic Environments (Jul 28 2018). Political and social polarization are a significant cause of conflict and poor governance in many societies, thus understanding their causes is of considerable importance. Here we demonstrate that shifts in socialization strategy similar to political ...
Reconstructing Latent Orderings by Spectral Clustering (Jul 18 2018). Spectral clustering uses a graph Laplacian spectral embedding to enhance the cluster structure of some data sets. When the embedding is one dimensional, it can be used to sort the items (spectral ordering). A number of empirical results also suggest ...
The exon junction complex undergoes a compositional switch that alters mRNP structure and nonsense-mediated mRNA decay activity (Jul 02 2018). The exon junction complex (EJC) deposited upstream of mRNA exon junctions shapes structure, composition and fate of spliced mRNA ribonucleoprotein particles (mRNPs). To achieve this, the EJC core nucleates assembly of a dynamic shell of peripheral proteins ...
Deep SNP: An End-to-end Deep Neural Network with Attention-based Localization for Break-point Detection in SNP Array Genomic data (Jun 22 2018). Diagnosis and risk stratification of cancer and many other diseases require the detection of genomic breakpoints as a prerequisite of calling copy number alterations (CNA). This, however, is still challenging and requires time-consuming manual curation. ...
Innovative method for reducing uninformative calls in non-invasive prenatal testing (Jun 22 2018). Non-invasive prenatal testing, or NIPT, is currently among the most researched topics in obstetric care. While current state-of-the-art NIPT solutions achieve high sensitivity and specificity, they still struggle with a considerable ...
Towards Gene Expression Convolutions using Gene Interaction Graphs (Jun 18 2018). We study the challenges of applying deep learning to gene expression data. We find experimentally that there exists non-linear signal in the data; however, it is not discovered automatically given the noise and low numbers of samples used in most research. ...
RPM-Drive: A robust, safe, and reversible gene drive system that remains functional after 200+ generations (Jun 13 2018; Jul 31 2018). Despite the advent of several novel, synthetic gene drive mechanisms and their potential to one day control a number of devastating diseases, among other applications, practical use of these systems remains contentious and risky. In particular, there ...
Holographic Neural Architectures (Jun 04 2018). Representation learning is at the heart of what makes deep learning effective. In this work, we introduce a new framework for representation learning that we call "Holographic Neural Architectures" (HNAs). In the same way that an observer can experience ...
Information Constraints on Auto-Encoding Variational Bayes (May 22 2018; Nov 29 2018). Parameterizing the approximate posterior of a generative model with neural networks has become a common theme in recent machine learning research. While providing appealing flexibility, this approach makes it difficult to impose or assess structural constraints ...
A Markovian genomic concatenation model guided by persymmetric matrices (May 06 2018). Sobottka and Hart (2011) made use of a Markovian concatenation model to observe novel statistical symmetries in the mononucleotide and dinucleotide distributions of a collection of bacterial chromosomes. The model roughly approximates the first-order ...
SIG-DB: leveraging homomorphic encryption to Securely Interrogate privately held Genomic DataBases (Mar 26 2018). Genomic data are becoming increasingly valuable as we develop methods to utilize the information at scale and gain a greater understanding of how genetic information relates to biological function. Advances in synthetic biology and the decreased cost ...
Role of salt valency in the switch of H-NS proteins between DNA-bridging and DNA-stiffening modes (Mar 26 2018). This work investigates the interactions of H-NS proteins and bacterial genomic DNA through computer simulations performed with a coarse-grained model. The model was developed specifically to study the switch of H-NS proteins from the DNA-stiffening to ...
The bromodomain-containing protein Ibd1 links multiple chromatin related protein complexes to highly expressed genes in Tetrahymena thermophila (Mar 22 2018). Background: The chromatin remodelers of the SWI/SNF family are critical transcriptional regulators. Recognition of lysine acetylation through a bromodomain (BRD) component is key to SWI/SNF function; in most eukaryotes, this function is attributed to ...
Differential Expression Analysis of Dynamical Sequencing Count Data with a Gamma Markov Chain (Mar 07 2018). Next-generation sequencing (NGS) to profile temporal changes in living systems is gaining more attention for deriving better insights into the underlying biological mechanisms compared to traditional static sequencing experiments. Nonetheless, the majority ...
Quantum annealing versus classical machine learning applied to a simplified computational biology problem (Feb 28 2018). Transcription factors regulate gene expression, but how these proteins recognize and specifically bind to their DNA targets is still debated. Machine learning models are effective means to reveal interaction mechanisms. Here we studied the ability of ...
IntLIM: Integration using Linear Models of metabolomics and gene expression data (Feb 28 2018). Integration of transcriptomic and metabolomic data improves functional interpretation of disease-related metabolomic phenotypes, and facilitates discovery of putative metabolite biomarkers and gene targets. For this reason, these data are increasingly ...
SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis (Feb 22 2018). Background: With the proliferation of available microarray and high throughput sequencing experiments in the public domain, the use of meta-analysis methods increases. In these experiments, where the sample size is often limited, meta-analysis offers the ...
Genomics as a Service: a Joint Computing and Networking Perspective (Feb 15 2018; Sep 26 2018). This paper provides a global picture of the deployment of networked processing services for genomic data sets. Much current research makes extensive use of genomic data, which are massive and rapidly increasing over time. They are typically stored in ...
Eight-cluster structure of chloroplast genomes differs from similar one observed for bacteria (Feb 08 2018). Previously, a seven-cluster pattern claiming to be a universal one in bacterial genomes has been reported. Keeping in mind the most popular theory of chloroplast origin, we checked whether a similar pattern is observed in chloroplast genomes. Surprisingly, ...
Molecular Regulation of Histamine Synthesis (Feb 07 2018). Histamine is a critical mediator of IgE/ cell-mediated anaphylaxis, a neurotransmitter and a regulator of gastric acid secretion. Histamine is a monoamine synthesized from the amino acid histidine through a reaction catalyzed by the enzyme histidine decarboxylase ...
Differential proteomics highlights macrophage-specific responses to amorphous silica nanoparticles (Jan 25 2018). The technological and economic benefits of engineered nanomaterials may be offset by their adverse effects on living organisms. One of the highly produced nanomaterials under such scrutiny is amorphous silica nanoparticles, which are known to have an ...
Generalized Similarity U: A Non-parametric Test of Association Based on Similarity (Jan 04 2018). Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease ...
Attention based convolutional neural network for predicting RNA-protein binding sites (Dec 06 2017). RNA-binding proteins (RBPs) play crucial roles in many biological processes, e.g. gene regulation. Computational identification of RBP binding sites on RNAs is urgently needed. In particular, RBPs bind to RNAs by recognizing sequence motifs. Thus, fast ...
Extending species-area relationships (SAR) to diversity-area relationships (DAR) (Nov 16 2017). I extend the traditional SAR, which has achieved the status of an ecological law and plays a critical role in global biodiversity assessment, to the general (alpha- or beta-diversity in Hill numbers) diversity area relationship (DAR). The extension was motivated ...
A deep generative model for single-cell RNA sequencing with application to detecting differentially expressed genes (Oct 13 2017; Oct 17 2017). We propose a probabilistic model for interpreting gene expression levels that are observed through single-cell RNA sequencing. In the model, each cell has a low-dimensional latent representation. Additional latent variables account for technical effects ...
Large scale evaluation of differences between network-based and pairwise sequence-alignment-based methods of dendrogram reconstruction (Sep 26 2017). Dendrograms are a way to represent evolutionary relationships between organisms. Nowadays, these are inferred based on the comparison of genes or protein sequences by taking into account their differences and similarities. The genetic material of choice ...
Accurate Genomic Prediction Of Human Height (Sep 19 2017). We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show ...
Identifying Genetic Risk Factors via Sparse Group Lasso with Group Graph Structure (Sep 12 2017). Genome-wide association studies (GWA studies or GWAS) investigate the relationships between genetic variants such as single-nucleotide polymorphisms (SNPs) and individual traits. Recently, incorporating biological priors together with machine learning ...
A deep generative model for gene expression profiles from single-cell RNA sequencing (Sep 07 2017; Jan 16 2018). We propose a probabilistic model for interpreting gene expression levels that are observed through single-cell RNA sequencing. In the model, each cell has a low-dimensional latent representation. Additional latent variables account for technical effects ...
Phylogenetic Convolutional Neural Networks in Metagenomics (Sep 06 2017). Background: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture ...
Quantifying homologous proteins and proteoforms (Aug 05 2017). Many proteoforms - arising from alternative splicing, post-translational modifications (PTMs), or paralogous genes - have distinct biological functions, such as histone PTM proteoforms. However, their quantification by existing bottom-up mass-spectrometry ...
Training-free Measures Based on Algorithmic Probability Identify High Nucleosome Occupancy in DNA Sequences (Aug 05 2017; Oct 16 2018). We introduce and study a set of training-free methods of information-theoretic and algorithmic complexity nature applied to DNA sequences to identify their potential capabilities to determine nucleosomal binding sites. We test our measures on well-studied ...
MDA in Capillary for Whole Genome Amplification (May 30 2017). Whole genome amplification (WGA) plays an important role in sample preparation of low-input templates for high-throughput sequencing. Multiple displacement amplification (MDA), a popular isothermal WGA method, suffers from a major hurdle of highly uneven amplification. ...
Evolutionary dynamics of the cryptocurrency market (May 15 2017; Nov 21 2017). The cryptocurrency market surpassed the barrier of $100 billion market capitalization in June 2017, after months of steady growth. Despite its increasing relevance in the financial world, however, a comprehensive analysis of the whole system is still ...
ASB1 differential methylation in ischaemic cardiomyopathy. Relationship with left ventricular performance in end stage heart failure patients (Apr 04 2017). Aims: Ischaemic cardiomyopathy (ICM) leads to impaired contraction and ventricular dysfunction causing high rates of morbidity and mortality. Epigenomics allows the identification of epigenetic signatures in human diseases. We analyse the differential ...
Development and characterization of Brassica juncea fruticulosa introgression lines exhibiting resistance to mustard aphid (Mar 23 2017). Background: Mustard aphid is a major pest of Brassica oilseeds. No source for aphid resistance is presently available in Brassica juncea. A wild crucifer, Brassica fruticulosa, is known to be resistant to mustard aphid. An artificially synthesized amphiploid, ...
HSEARCH: fast and accurate protein sequence motif search and clustering (Jan 02 2017). Protein motifs are conserved fragments that occur frequently in protein sequences. They have significant functions, such as forming the active site of an enzyme. Searching and clustering protein sequence motifs is computationally intensive. Most existing methods are not ...
Large scale modeling of antimicrobial resistance with interpretable classifiers (Dec 03 2016). Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial ...
Multi-stage Clustering of Breast Cancer for Precision Medicine (Dec 02 2016). Cancer has become one of the most widespread diseases in the world. Specifically, breast cancer is diagnosed more often than any other type of cancer. However, breast cancer patients and their individual tumors are often unique. Identifying the underlying ...
A Noise-Filtering Approach for Cancer Drug Sensitivity Prediction (Dec 02 2016; Dec 05 2016). Accurately predicting drug responses to cancer is an important problem hindering oncologists' efforts to find the most effective drugs to treat cancer, which is a core goal in precision medicine. The scientific community has focused on improving this ...
Dynamical System Modeling to Simulate Donor T Cell Response to Whole Exome Sequencing-Derived Recipient Peptides: Understanding Randomness in Clinical Outcomes Following Stem Cell Transplantation (Nov 28 2016). Alloreactivity following stem cell transplantation (SCT) is difficult to predict in patients undergoing transplantation from HLA matched donors. In this study we performed whole exome sequencing of SCT donor-recipient pairs (DRP). This allowed determination ...
Specificity-determining DNA triplet code for positioning of human pre-initiation complex (Nov 23 2016). The notion that transcription factors bind DNA only through specific, consensus binding sites has been recently questioned. In a pioneering study by Pugh and Venters, no specific consensus motif for the positioning of the human pre-initiation complex (PIC) ...
Fast low-level pattern matching algorithm (Nov 18 2016). This paper focuses on pattern matching in the DNA sequence. It was inspired by a previously reported method that proposes encoding both pattern and sequence using prime numbers. Although fast, the method is limited to rather small pattern lengths, due ...
Duplication Distance to the Root for Binary Sequences (Nov 17 2016). We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form $\seq x = \seq a \seq b \seq c \to \seq y = \seq a \seq b \seq b \seq ...
Genomic Region Detection via Spatial Convex Clustering (Nov 15 2016). Several modern genomic technologies, such as DNA-Methylation arrays, measure spatially registered probes that number in the hundreds of thousands across multiple chromosomes. The measured probes are by themselves less interesting scientifically; instead ...
Polygenic score analyses of schizophrenia and bipolar disorder with cardiometabolic traits (Nov 10 2016). Cardiovascular diseases (CVD) represent a major health issue in patients with schizophrenia and bipolar disorder (BD). While many studies have shown increased CVD risks in schizophrenia and BD, the underlying mechanisms remain unclear. Psychiatric medications ...
Mouse T cell repertoires as statistical ensembles: overall characterization and age dependence (Nov 09 2016). The ability of the adaptive immune system to respond to arbitrary pathogens stems from the broad diversity of immune cell surface receptors (TCRs). This diversity originates in a stochastic DNA editing process (VDJ recombination) that acts each time a ...
Reverse vaccinology in Plasmodium falciparum 3D7 (Nov 05 2016). A timely immunization can be effective against certain diseases and can save thousands of lives. However, for some diseases it has been difficult, so far, to develop an efficient vaccine. Malaria, a tropical disease caused by a parasite of the genus Plasmodium, ...
Efficient causal inference with hidden confounders from genome-transcriptome variation dataNov 03 2016Natural genetic variation between individuals in a population leads to variations in gene expression that are informative for the inference of gene regulatory networks. Particularly, genome-wide genotype and transcriptome data from the same samples allow ... More
Efficient causal inference with hidden confounders from genome-transcriptome variation dataNov 03 2016Nov 06 2016Natural genetic variation between individuals in a population leads to variations in gene expression that are informative for the inference of gene regulatory networks. Particularly, genome-wide genotype and transcriptome data from the same samples allow ... More
Computational genomic algorithms for miRNA-based diagnosis of lung cancer: the potential of machine learningOct 28 2016The advent of large scale, high-throughput genomic screening has introduced a wide range of tests for diagnostic purposes. Prominent among them are tests using miRNA expression levels. Genomics and proteomics now provide expression levels of hundreds ... More
Aligning coding sequences with frameshift extension penaltiesOct 27 2016Frameshift translation is an important phenomenon that contributes to the appearance of novel Coding DNA Sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of genes' coding regions. Frameshift translations ... More
Stratification of patient trajectories using covariate latent variable modelsOct 27 2016Standard models assign disease progression to discrete categories or stages based on well-characterized clinical markers. However, such a system is potentially at odds with our understanding of the underlying biology, which in highly complex systems may ... More
Functional architecture and global properties of the Corynebacterium glutamicum regulatory network: novel insights from a dataset with a high genomic coverageOct 26 2016Corynebacterium glutamicum is a Gram-positive, anaerobic, rod-shaped soil bacterium able to grow on a diversity of carbon sources like sugars and organic acids. It is a biotechnologically relevant organism because of its highly efficient ability to biosynthesize ... More
A single step protein assay that is both detergent and reducer compatible: The cydex blue assayOct 24 2016Determination of protein concentration is often an absolute prerequisite in preparing samples for biochemical and proteomic analyses. However, current protein assay methods are not compatible with both reducers and detergents, which are however present ... More
Full Reconstruction of Non-Stationary Strand-Symmetric Models on Rooted PhylogeniesOct 17 2016Understanding the evolutionary relationship between species is of fundamental importance to the biological sciences. The location of the root in any phylogenetic tree is critical as it gives an order to evolutionary events. None of the popular models ... More
Full Reconstruction of Non-Stationary Strand-Symmetric Models on Rooted PhylogeniesOct 17 2016Nov 13 2016Understanding the evolutionary relationship among species is of fundamental importance to the biological sciences. The location of the root in any phylogenetic tree is critical as it gives an order to evolutionary events. None of the popular models of ... More
Pan-genome Analysis of the Genus SerratiaOct 13 2016Pan-genome analysis is a standard procedure to decipher genome heterogeneity and diversification of bacterial species. Species evolution is traced by defining and comparing the core (conserved), accessory (dispensable) and unique (strain-specific) gene ... More
A Unified Model for Differential Expression Analysis of RNA-seq Data via L1-Penalized Linear RegressionOct 11 2016The RNA-sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since the RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. ... More