Monday, March 26, 2018


As biology turns increasingly into a data-rich science, the massive amount of data generated by high-throughput technologies presents both new opportunities and serious challenges. We are interested in bioinformatics algorithm development and integrative mining of massive biological “BIG DATA”, to understand gene regulation at single-cell resolution and, ultimately, to decipher the “coded message” in the genome.

To this end, we employ cutting-edge high-performance computing, machine learning, and data visualization technology, and work at the critical interface between bioinformatics and functional genomics. Over the past decades, we have developed multiple online bioinformatic software tools and databases for efficient analysis of large-scale omics data. These tools and databases have received over 1.5 billion hits from users worldwide over the past five years, demonstrating their global significance and impact.

Taking advantage of this powerful bioinformatics infrastructure, we probe and model the regulatory network systematically, with the aim of understanding the intricate mechanisms of cellular identity and fate decision and, ultimately, of realizing a computational representation of a functioning cell, or the “cell in silico”.

In particular, we believe that a holistic understanding of cellular identity requires disentangling causal relationships from mere statistical associations, which is particularly critical in calling the bona fide “driver factors” for cell fate determination. However, conventional data-driven methods often prove inadequate when applied at a global scale. Inspired by the recent exciting achievements of Large Language Models (LLMs), we are devoted to building a Regulatory Language Model (RLM) that can learn (and model) the sophisticated interactions between genes and their regulatory elements by utilizing the wealth of large-scale single-cell multi-modality omics data and other functional genomic data, much as LLMs such as ChatGPT have done for human languages.