Monday, March 26, 2018


Listed in reverse chronological order, with * marking (co-)corresponding authors.

Recent Peer-Reviewed Papers


Handle Biological “BIG DATA” Effectively and Efficiently

  1. Cao Z. J., Wei L., Lu S., Yang D. C., Gao G.* 2020. Searching large-scale scRNA-seq databases via unbiased cell embedding with Cell BLAST. Nat Commun 11: 3458. [PubMed][Website][Press]
  2. Ke L., Yang D. C., Wang Y., Ding Y., Gao G.* 2020. AnnoLnc2: the one-stop portal to systematically annotate novel lncRNAs for human and mouse. Nucleic Acids Res 48(W1): W230-W238. [PubMed][Website][Press]
  3. Luo X., Tu X., Ding Y., Gao G.*, Deng M.* 2020. Expectation pooling: An effective and interpretable pooling method for predicting DNA-protein binding. Bioinformatics 36(5): 1405. [PubMed][Website][Press]
  4. Xiong L., Xu K., Tian K., Shao Y., Tang L., Gao G., Zhang M., Jiang T., Zhang Q. C. 2019. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat Commun 10(1): 4576. [PubMed][Website][Press]
  5. Cheng S. J., Jiang S., Shi F. Y., Ding Y., Gao G.* 2018. Systematic identification and annotation of multiple-variant compound effects at transcription factor binding sites in human genome. J Genet Genomics 45(7): 373-379. [PubMed][Website][Press]
  6. Cheng S. J., Shi F. Y., Liu H., Ding Y., Jiang S., Liang N., Gao G.* 2017. Accurately annotate compound effects of genetic variants using a context-sensitive framework. Nucleic Acids Res 45(10): e82. [PubMed][Website]
  7. Kang Y. J., Yang D. C., Kong L., Hou M., Meng Y. Q., Wei L., Gao G.* 2017. CPC2: a fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res 45(W1): W12-W16. [PubMed][Website]
  8. Tang Z., Li C., Kang B., Gao G., Li C., Zhang Z.* 2017. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res 45(W1): W98-W102. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed]
  9. Hou M., Tian F., Jiang S., Kong L., Yang D., Gao G.* 2016. LocExpress: a web server for efficiently estimating expression of novel transcripts. BMC Genomics 17(13): 175-179. (Featured as “Best Paper” at InCoB’16) [PubMed][Website]
  10. Hou M., Tang X., Tian F., Shi F., Liu F., Gao G.* 2016. AnnoLnc: a web server for systematically annotating novel human lncRNAs. BMC Genomics 17(1): 931. [PubMed][Website]
  11. Hu B., Jin J., Guo A. Y., Zhang H., Luo J., Gao G.* 2015. GSDS 2.0: an upgraded gene feature visualization server. Bioinformatics 31(8): 1296-1297. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  12. Xiao A., Cheng Z., Kong L., Zhu Z., Lin S., Gao G.*, Zhang B.* 2014. CasOT: a genome-wide Cas9/gRNA off-target searching tool. Bioinformatics 30(8): 1180-1182. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]

Decipher the Function and Evolution of Gene Regulatory Network

  1. Tian F., Yang D. C., Meng Y. Q., Jin J.*, Gao G.* 2020. PlantRegMap: charting functional regulatory maps in plants. Nucleic Acids Res 48(D1): D1104. [PubMed][Website][Press]
  2. Jiang S., Cheng S. J., Ren L. C., Wang Q., Kang Y. J., Ding Y., Hou M., Yang X. X., Lin Y., Liang N., Gao G.* 2019. An expanded landscape of human long noncoding RNA. Nucleic Acids Res 47(15): 7842. (Featured as one of China’s top ten bioinformatics databases of 2019) [PubMed][Website][Press]
  3. Peng L., Cheng S. J., Lin Y., Cui Q., Luo Y., Chu J., Shao M., Fan W., Chen Y., Lin A., Xi Y., Sun Y., Zhang L., Zhang C., Tan W., Gao G.*, Wu C.*, Lin D. 2018. CCGD-ESCC: A Comprehensive Database for Genetic Variants Associated with Esophageal Squamous Cell Carcinoma in Chinese Population. Genomics Proteomics Bioinformatics 16(4): 262-268. [PubMed][Website][Press]
  4. Xu R., Xu Y., Huo W., Lv Z., Yuan J., Ning S., Wang Q., Hou M., Gao G., Ji J., Chen J., Guo R.*, Xu D.* 2018. Mitosis-specific MRN complex promotes a mitotic signaling cascade to regulate spindle dynamics and chromosome segregation. Proc Natl Acad Sci U S A 115(43): E10079-E10088. [PubMed]
  5. Zhong L., Mu H., Wen B., Zhang W., Wei Q., Gao G., Han J.*, Cao S.* 2018. Long non-coding RNAs involved in the regulatory network during porcine pre-implantation embryonic development and iPSC induction. Sci Rep 8(1): 6649. [PubMed]
  6. Jin J., Tian F., Yang D. C., Meng Y. Q., Kong L., Luo J.*, Gao G.* 2017. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res 45(D1): D1040-D1045. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  7. Zhou Y., Wang P., Tian F., Gao G., Huang L.*, Wei W.*, Xie X. S.* 2017. Painting a specific chromosome with CRISPR/Cas9 for live-cell imaging. Cell Res 27(2): 298-301. [PubMed]
  8. Chen Z. X.*, Oliver B., Zhang Y. E., Gao G., Long M.* 2017. Expressed Structurally-stable Inverted Duplicates in Mammalian Genomes as Functional Noncoding Elements. Genome Biol Evol 9(4): 981-992. [PubMed]
  9. Feng S., Zhao Y., Xu Y., Ning S., Huo W., Hou M., Gao G., Ji J., Guo R.*, Xu D.* 2016. Ewing Tumor-associated Antigen 1 Interacts with Replication Protein A to Promote Restart of Stalled Replication Forks. J Biol Chem 291(42): 21956-21962. (Featured as “Highlights of 2016” by J. Biol. Chem.) [PubMed]
  10. Jin J., He K., Tang X., Li Z., Lv L., Zhao Y., Luo J., Gao G.* 2015. An Arabidopsis Transcriptional Regulatory Map Reveals Distinct Functional and Evolutionary Features of Novel Transcription Factors. Mol Biol Evol 32(7): 1767-1773. [PubMed][Website]
  11. Zhao Y., Tang L., Li Z., Jin J., Luo J., Gao G.* 2015. Identification and analysis of unitary loss of long-established protein-coding genes in Poaceae shows evidences for biased gene loss and putatively functional transcription of relics. BMC Evol Biol 15: 66. (Featured as “Very Good (being of special significance in its field)” by Faculty of 1000) [PubMed][Website]
  12. Xing M., Yang M., Huo W., Feng F., Wei L., Jiang W., Ning S., Yan Z., Li W., Wang Q., Hou M., Dong C., Guo R., Gao G., Ji J., Zha S., Lan L., Liang H., Xu D.* 2015. Interactome analysis identifies a new paralogue of XRCC4 in non-homologous end joining DNA repair pathway. Nat Commun 6: 6233. [PubMed]
  13. Gao G., Vibranovski M. D., Zhang L., Li Z., Liu M., Zhang Y. E., Li X., Zhang W., Fan Q., VanKuren N. W., Long M.*, Wei L.* 2014. A long-term demasculinization of X-linked intergenic noncoding RNAs in Drosophila melanogaster. Genome Res 24(4): 629-638. [PubMed]
  14. Jin J., Zhang H., Kong L., Gao G.*, Luo J.* 2014. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res 42(1): D1182-1187. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  15. Li H., Yue R., Wei B., Gao G., Du J., Pei G.* 2014. Lysophosphatidic acid acts as a nutrient-derived developmental cue to regulate early hematopoiesis. EMBO J 33(12): 1383-1396. [PubMed]


  1. Ding Y., Wang M., He Y., Ye A. Y., Yang X., Liu F., Meng Y., Gao G.*, Wei L.* 2014. “Bioinformatics: Introduction and Methods,” a Bilingual Massive Open Online Course (MOOC) as a New Example for Global Bioinformatics Education. PLoS Comput Biol 10(12): e1003955. (The very first Education paper from mainland China in PLoS Comput Biol) [PubMed][Website]



  1. Ding Y., Li J. Y., Wang M., Tu X. M., Gao G.*. An exact transformation for CNN kernel enables accurate sequence motif identification and leads to a potentially full probabilistic interpretation of CNN. [bioRxiv][Website]
  2. Li J. Y., Jin S., Tu X. M., Ding Y.*, Gao G.*. Effective identification of sequence patterns via a new convolutional model with adaptively learned kernels. [bioRxiv][Website]
  3. Shi F. Y., Wang Y., Huang D., Liang Y., Liang N., Chen X. W., Gao G.*. Computational Assessment of the Regulation-Modulating Potential for Noncoding Variants. [bioRxiv][Website]
  4. Tian F., Zhou F., Li X., Ma W. P., Wu H. G., Yang M., Alec R. Chapman, David F. Lee, Tan L. Z., Xing D., Yin G. J., Ayjan Semayel, Wang J., Wang J., Sun W. J., He R. S., Zhang S. W., Cao Z. J., Wei L., Lu S., Yang D. C., Mao Y. N., Gao Y., Chen K. X., Zhang Y., Liu X. X., Yong J., Yan L. Y., Huang Y. Y., Qiao J.*, Tang F. C.*, Gao G.*, Xie X.*. Genomic Architecture of Cells in Tissues (GeACT): Study of Human Mid-gestation Fetus. [bioRxiv][Website]

Peer-Reviewed Papers between 2006 and 2014


  1. Tang X., Hou M., Ding Y., Li Z., Ren L., Gao G.* 2013. Systematically profiling and annotating long intergenic non-coding RNAs in human embryonic stem cell. BMC Genomics 14(Suppl 5): S3. [PubMed]
  2. Shu J., Wu C., Wu Y., Li Z., Shao S., Zhao W., Tang X., Yang H., Shen L., Zuo X., Yang W., Shi Y., Chi X., Zhang H., Gao G., Shu Y., Yuan K., He W., Tang C.*, Zhao Y., Deng H.* 2013. Induction of pluripotency in mouse somatic cells with lineage specifiers. Cell 153(5): 963-975. [PubMed]
  3. Wang J., Kong L., Gao G., Luo J.* 2013. A brief introduction to web-based genome browsers. Brief Bioinform 14(2): 131-143. [PubMed]
  4. Xiao A., Wu Y., Yang Z., Hu Y., Wang W., Zhang Y., Kong L., Gao G., Zhu Z., Lin S.*, Zhang B.* 2013. EENdb: a database and knowledge base of ZFNs and TALENs for endonuclease engineering. Nucleic Acids Res 41(Database issue): D415-422. [PubMed]
  5. Huang Y., Xie C., Ye A. Y., Li C. Y., Gao G., Wei L.* 2013. Recent adaptive events in human brain revealed by meta-analysis of positively selected genes. PLoS One 8(4): e61280. [PubMed]
  6. Kong L., Wang J., Zhao S., Gu X., Luo J.*, Gao G.* 2012. ABrowse – a customizable next-generation genome browser framework. BMC Bioinformatics 13: 2. [PubMed][Website]
  7. Yue R., Li H., Liu H., Li Y., Wei B., Gao G., Jin Y., Liu T., Wei L., Du J., Pei G.* 2012. Thrombin receptor regulates hematopoiesis and endothelial-to-hematopoietic transition. Dev Cell 22(5): 1092-1100. [PubMed]
  8. Wang J., Kong L., Zhao S., Zhang H., Tang L., Li Z., Gu X., Luo J.*, Gao G.* 2011. Rice-Map: a new-generation rice genome browser. BMC Genomics 12: 165. [PubMed][Website]
  9. Chen Z. X., Zhang Y. E., Vibranovski M., Luo J., Gao G.*, Long M.* 2011. Deficiency of X-linked inverted duplicates with male-biased expression and the underlying evolutionary mechanisms in the Drosophila genome. Mol Biol Evol 28(10): 2823-2832. [PubMed][Website]
  10. Zhang H., Jin J., Tang L., Zhao Y., Gu X., Gao G.*, Luo J.* 2011. PlantTFDB 2.0: update and improvement of the comprehensive plant transcription factor database. Nucleic Acids Res 39(Database issue): D1114-1117. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  11. Zhang J., Gao G., Chen J. J., Taylor G., Cui K. M., He X. Q.* 2011. Molecular features of secondary vascular tissue regeneration after bark girdling in Populus. New Phytol 192(4): 869-884. [PubMed]
  12. Liu M., Liu P., Zhang L., Cai Q., Gao G., Zhang W., Zhu Z., Liu D.*, Fan Q.* 2011. mir-35 is involved in intestine cell G1/S transition and germ cell proliferation in C. elegans. Cell Res 21(11): 1605-1618. [PubMed]
  13. Xie C., Mao X., Huang J., Ding Y., Wu J., Dong S., Kong L., Gao G., Li C. Y., Wei L.* 2011. KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res 39(Web Server issue): W316-322. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  14. Du P., Wu J., Zhang J., Zhao S., Zheng H., Gao G., Wei L., Li Y.* 2011. Viral infection induces expression of novel phased microRNAs from conserved cellular microRNA precursors. PLoS Pathog 7(8): e1002176. [PubMed]
  15. Shen L., Gao G.*, Zhang Y., Zhang H., Ye Z., Huang S., Huang J., Kang J.* 2010. A single amino acid substitution confers enhanced methylation activity of mammalian Dnmt3b on chromatin DNA. Nucleic Acids Res 38(18): 6054-6064. [PubMed]
  16. He K., Guo A. Y., Gao G., Zhu Q. H., Liu X. C., Zhang H., Chen X., Gu X., Luo J.* 2010. Computational identification of plant transcription factors and the construction of the PlantTFDB database. Methods Mol Biol 674: 351-368. [PubMed]
  17. Gao G., Li J. T., Kong L., Tao L., Wei L.* 2009. Human herpesvirus miRNAs statistically preferentially target host genes involved in cell signaling and adhesion/junction pathways. Cell Res 19(5): 665-667. [PubMed]
  18. Liu X., Wu J., Wang J., Liu X., Zhao S., Li Z., Kong L., Gu X., Luo J.*, Gao G.* 2009. WebLab: a data-centric, knowledge-sharing bioinformatic platform. Nucleic Acids Res 37(Web Server issue): W33-39. [PubMed][Website]
  19. Zhao S. Q., Wang J., Zhang L., Li J. T., Gu X., Gao G.*, Wei L.* 2009. BOAT: Basic Oligonucleotide Alignment Tool. BMC Genomics 10(Suppl 3): S2. [PubMed]
  20. Li Z., Zhang H., Ge S., Gu X., Gao G.*, Luo J.* 2009. Expression pattern divergence of duplicated genes in rice. BMC Bioinformatics 10(Suppl 6): S8. [PubMed]
  21. Li Z., Liu M., Zhang L., Zhang W., Gao G., Zhu Z., Wei L., Fan Q.*, Long M.* 2009. Detection of intergenic non-coding RNAs expressed in the main developmental stages in Drosophila melanogaster. Nucleic Acids Res 37(13): 4308-4314. [PubMed]
  22. Zhao M., Chen X., Gao G., Tao L., Wei L.* 2009. RLEdb: a database of rate-limiting enzymes and their regulation in human, rat, mouse, yeast and E. coli. Cell Res 19(6): 793-795. [PubMed]
  23. Guo A. Y., Chen X., Gao G., Zhang H., Zhu Q. H., Liu X. C., Zhong Y. F., Gu X., He K.*, Luo J.* 2008. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res 36(Database issue): D966-969. [PubMed][Website]
  24. Liu H., Zhu F., Yong J., Zhang P., Hou P., Li H., Jiang W., Cai J., Liu M., Cui K., Qu X., Xiang T., Lu D., Chi X., Gao G., Ji W., Ding M., Deng H.* 2008. Generation of induced pluripotent stem cells from adult rhesus monkey fibroblasts. Cell Stem Cell 3(6): 587-590. [PubMed]
  25. Kong L., Zhang Y., Ye Z. Q., Liu X. Q., Zhao S. Q., Wei L.*, Gao G.* 2007. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res 35(Web Server issue): W345-349. (Featured as ESI Highly Cited (Top 1%) Paper) [PubMed][Website]
  26. Sun Y., Zhao S., Yu H., Gao G.*, Luo J.* 2007. ABCGrid: Application for Bioinformatics Computing Grid. Bioinformatics 23(9): 1175-1177. [PubMed][Website]
  27. Ren L., Gao G., Zhao D., Ding M., Luo J., Deng H. 2007. Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biol 8(3): R35. [PubMed]
  28. Ye Z. Q., Zhao S. Q., Gao G., Liu X. Q., Langlois R. E., Lu H., Wei L.* 2007. Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP). Bioinformatics 23(12): 1444-1450. [PubMed][Website]
  29. Zhang Y., Li J., Kong L., Gao G., Liu Q. R., Wei L.* 2007. NATsDB: Natural Antisense Transcripts DataBase. Nucleic Acids Res 35(Database issue): D156-161. [PubMed][Website]
  30. Zhu Q. H., Guo A. Y., Gao G., Zhong Y. F., Xu M., Huang M., Luo J.* 2007. DPTF: a database of poplar transcription factors. Bioinformatics 23(10): 1307-1308. [PubMed][Website]
  31. Gao G., Zhong Y., Guo A., Zhu Q., Tang W., Zheng W., Gu X., Wei L.*, Luo J.* 2006. DRTF: a database of rice transcription factors. Bioinformatics 22(10): 1286-1287. [PubMed][Website]