As biology becomes an increasingly data-rich science, the massive amount of data generated by high-throughput technologies presents both new opportunities and serious challenges. We are interested in developing bioinformatics algorithms and integratively mining massive biological “BIG DATA” to understand gene regulation at single-cell resolution and, ultimately, to decipher the “coded message” in the genome.
To this end, we employ cutting-edge high-performance computing, machine learning, and data visualization technologies, and we work at the critical interface between bioinformatics and functional genomics. Over the past decades, we have developed multiple online bioinformatics software tools and databases for the efficient analysis of large-scale omics data. These tools and databases have received over 400 million hits from users worldwide, reflecting their broad international use and impact.
Handle Biological “BIG DATA” Effectively and Efficiently. To store, manage, and analyze petascale omics data, we have designed multiple powerful computing and data integration infrastructures in close collaboration with colleagues from academia and industry. We have also made several of them freely available to the global community, both as open-source toolkits and as public web servers (such as WebLab and ABrowse).
Decipher the Function and Evolution of Gene Regulatory Systems. Taking advantage of this powerful bioinformatics infrastructure, we have been studying the functions and evolutionary dynamics of two important classes of regulators: transcription factors in plants, and long noncoding RNAs in humans and several other organisms. In addition to developing multiple algorithms and databases for identifying and functionally annotating these regulators, we found that novel (i.e., evolutionarily young) regulators can play key roles in multiple biological processes by “re-wiring” existing regulatory circuits. Moreover, our pioneering study on gene loss sheds further light on one of the essential “grand questions” in evolution: how and why a well-established gene can be lost.
Interpret Genetic Variants Accurately. Since 2016, as part of the Chinese Precision Medicine Initiative, the lab has been developing novel bioinformatics toolkits for accurately annotating and interpreting the functional effects of genetic variants in disease. These efforts resulted in the Context-Oriented Predictor for variant Effect (COPE), the first context-sensitive variant annotation tool.