Jiahao Li

Computer Science

mail: lijiahao9784@gmail.com
phone: (+86) 183 2038 9784

About Me

My name is Jiahao Li. I am currently a graduate student in the School of Computer Science and Technology at Harbin Institute of Technology, Shenzhen, scheduled to graduate in March 2024. Prior to this, I obtained my bachelor's degree from Shenzhen University in 2021. Through my master's research and an internship at Baidu, I have come to realize that academia is the foundation of my happiness and my lifelong philosophy. This realization is the fundamental motivation behind my application for a Ph.D. program.

During my graduate studies, I worked on Generative Adversarial Networks and their application to biological sequences. To date, I have published three academic papers: one in the journal Bioinformatics Advances and two accepted by academic conferences (the IEEE International Conference on Bioinformatics and Biomedicine, and the 38th Annual AAAI Conference on Artificial Intelligence).
I am most skilled in: Generative Adversarial Networks, Sequence Synthesis and Optimization, and Molecular Design.


Education

Harbin Institute of Technology

Master of Computer Technology

Average Score: 85.86/100.0; Ranking: 14/114 (Top 12%)

September 2021 ~ March 2024

Shenzhen University

Bachelor of Software Engineering

Average Score: 83.5/100.0

September 2017 ~ June 2021



Academic Papers

[1] Junjie Chen, Jiahao Li, Chen Song, Bin Li, Qingcai Chen, Hongchang Gao, Wendy Hui Wang, Zenglin Xu, Xinghua Shi. Discriminative Forests Improve Generative Diversity for Generative Adversarial Networks. Accepted by AAAI-2024, February 2024. (Junjie Chen is my supervisor.)

[2] Jiahao Li, Jiawei Luo, Xianliang Liu, Junjie Chen. High-Activity Enhancer Generation based on Feedback GAN with Domain Constraint and Curriculum Learning. 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkiye, 2023, pp. 2065-2070, https://doi.org/10.1109/BIBM58861.2023.10385376.

[3] Jiahao Li, Zhourun Wu, Wenhao Lin, Jiawei Luo, Jun Zhang, Qingcai Chen, Junjie Chen. iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. Bioinformatics Advances, Volume 3, Issue 1, 2023, vbad043, https://doi.org/10.1093/bioadv/vbad043.



Research Interests

  • Biological sequence generation and optimization for target properties.
  • Molecular design from scratch and its optimization.
  • I have worked in the interdisciplinary field of Artificial Intelligence and Biology, and have succeeded in enhancing Generative Adversarial Networks (GANs) with a Discriminator Forest and in employing GANs to generate biological sequences and optimize them for high activity. At present, I focus on Protein Design and Molecular Design, including synthesis and optimization. The long-term goal of my research is to design molecules that are accessible from available materials and that target specific proteins.


    Research Experience

    Discriminator Forests for Improving Generative Diversity

    May 2022 -- Sep. 2023
  • We proposed a discriminator forest-based Generative Adversarial Network and showed theoretically how it improves generative diversity. A Discriminator Forest consists of a number of independent discriminators.
  • The generalization error bound of the Discriminator Forest is determined by the strength of the individual discriminators and the correlation among them. Reducing the generalization error improves diversity.
  • The Discriminator Forest improves the FID score from 30.71 to 19.27 on STL10 (96×96), and from 9.22 to 6.87 on LSUN-Cat (256×256).
  • The Framework of Forest-GAN
    Discriminative Forest GAN (Forest-GAN) consists of a number of discriminators built upon bootstrapped datasets. The predictions of the multiple discriminators are combined by an aggregation function.
  • The upper bound of the generalization error is determined by the strength of the individual discriminators and the correlation among them.
  • The Result of Forest-GAN
    Forest-GAN improves generation performance, reaching a new record FID of 6.87 on LSUN-Cat. The generator trained with Forest-GAN can generate more realistic images of intermediate styles, while the base model fails to recover the intermediate mode.
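
The bootstrapping and aggregation idea described above can be illustrated with a minimal sketch. This is not the Forest-GAN implementation: the function names are hypothetical, discriminators are modeled as plain callables that return a realness score, and averaging is only one assumed choice of aggregation function.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_datasets(data: Sequence[T], num_discriminators: int, seed: int = 0) -> List[List[T]]:
    """Draw one bootstrap resample (sampling with replacement) per discriminator."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(num_discriminators)]


def aggregate_scores(
    discriminators: Sequence[Callable[[T], float]],
    sample: T,
    aggregate: Callable[[List[float]], float] = mean,  # assumption: mean aggregation
) -> float:
    """Combine the scores of all discriminators for one sample."""
    return aggregate([d(sample) for d in discriminators])
```

In an actual training loop each discriminator would be a neural network trained on its own resample, and the aggregated score would drive the generator update.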

    Enhancer Generation and Optimization for High Activity

    Nov. 2022 -- Sep. 2023
  • We proposed a feedback mechanism-based method for enhancer generation and optimization for high activity. The feedback mechanism guides the generator to focus on features associated with sequence activity.
  • The feedback mechanism is integrated with a Generative Adversarial Network. A domain constraint is added to alleviate the noise from the feedback loop. Curriculum learning raises the optimization goal from low-activity enhancers to high-activity enhancers to accelerate training convergence.
  • Feedback optimization improves the average activity of the generated sequences from 0.5 to 4.5, while the similarity among them remains below 80%. We also identified 10 motifs during the optimization process that can directly improve enhancer activity.
  • The Framework of Enhancer-GAN
    Enhancer-GAN is based on the feedback-loop mechanism. It combines a domain constraint and curriculum learning to alleviate the external noise and accelerate the optimization process. An external analyzer is used to make predictions for synthetic samples.
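
As a rough illustration of how curriculum learning can raise the optimization goal inside a feedback loop, the sketch below builds a rising activity threshold and keeps only generated sequences that the analyzer scores above it. The linear schedule, the function names, and the selection rule are all assumptions made for illustration, not the Enhancer-GAN code.

```python
from typing import Callable, List, Sequence


def curriculum_thresholds(start: float, end: float, rounds: int) -> List[float]:
    """Linearly raise the activity threshold from `start` to `end` (assumed schedule)."""
    if rounds < 2:
        raise ValueError("rounds must be at least 2")
    step = (end - start) / (rounds - 1)
    return [start + step * i for i in range(rounds)]


def select_feedback(
    sequences: Sequence[str],
    predict_activity: Callable[[str], float],  # hypothetical external analyzer
    threshold: float,
) -> List[str]:
    """Keep generated sequences whose predicted activity reaches the threshold."""
    return [s for s in sequences if predict_activity(s) >= threshold]
```

In each round, the selected sequences would be fed back into the training data so that the generator gradually shifts toward higher-activity enhancers.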

    The Result-1 of Enhancer-GAN
    The activity distribution of generated sequences under three strategies in the optimization process. The base model yields little improvement in the activity of the generated sequences, while the feedback mechanism significantly improves their activity and curriculum learning accelerates training convergence.

    The Result-2 of Enhancer-GAN
    The activity changes caused by swapping fine motifs and flaw motifs.
    (A) The activity of low-activity enhancers increases when the flaw motif (GGCTTATA) is replaced with 10 fine motifs.
    (B) The activity of high-activity enhancers decreases when the fine motif (GACTCACA) is replaced with 10 flaw motifs.
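
The motif-replacement experiment can be sketched as follows. The two motifs come from the description above; the helper names, the example sequence, and the activity predictor are hypothetical stand-ins for the external analyzer.

```python
from typing import Callable

FLAW_MOTIF = "GGCTTATA"  # flaw motif named above
FINE_MOTIF = "GACTCACA"  # fine motif named above


def replace_motif(sequence: str, old_motif: str, new_motif: str) -> str:
    """Swap every occurrence of one motif for another of the same length."""
    if len(old_motif) != len(new_motif):
        raise ValueError("motifs must have equal length to keep the sequence length")
    return sequence.replace(old_motif, new_motif)


def activity_change(
    sequence: str,
    old_motif: str,
    new_motif: str,
    predict_activity: Callable[[str], float],  # hypothetical analyzer
) -> float:
    """Predicted activity after the replacement minus the activity before it."""
    edited = replace_motif(sequence, old_motif, new_motif)
    return predict_activity(edited) - predict_activity(sequence)
```

A positive value indicates that the replacement raised the predicted activity, and a negative value indicates that it lowered it.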

    Enhancer Identification based on the Language Model

    Nov. 2021 -- Dec. 2022
  • We proposed a language model-based method for enhancer identification that uses the model's attention mechanism to compute feature representations of the original sequences. Adversarial training is added to mitigate overfitting.
  • Compared to benchmark methods, our method improves accuracy (ACC) by 4.0% on a balanced dataset and by 5.0% on an imbalanced dataset.
  • We employ the attention mechanism to extract motif sequences from raw sequences and identify 30 motifs, 6 of which significantly match transcription factor binding sites (TFBS) in JASPAR.
  • The Framework of iEnhancer-ELM
    We apply language models to the identification task. iEnhancer-ELM tokenizes DNA sequences with k-mers of different scales and captures the contextual information of the k-mers by incorporating pre-trained BERT-based enhancer language models.
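
Multi-scale k-mer tokenization can be illustrated with a short sketch. The overlapping stride of 1 and the default set of scales are assumptions for illustration, and the function names are hypothetical; the actual tokenizer in iEnhancer-ELM may differ.

```python
from typing import Dict, List, Sequence


def kmer_tokenize(sequence: str, k: int, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mers (overlapping when stride < k)."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def multiscale_tokens(sequence: str, scales: Sequence[int] = (3, 4, 5, 6)) -> Dict[int, List[str]]:
    """Tokenize the same sequence at several k-mer scales (assumed default scales)."""
    return {k: kmer_tokenize(sequence, k) for k in scales}
```

For example, the sequence ACGTAC yields the 3-mers ACG, CGT, GTA, and TAC; each scale's token list would then be passed to the language model trained for that scale.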

    The Attention Analysis
    We analyze the biological patterns captured by the enhancer language models by examining the weights of the attention mechanism.
  • Calculate the attention weight of each nucleotide in an enhancer sequence.
  • Identify the potential patterns.
  • Filter significant candidates.
  • Generate motifs according to the sequence alignment.
  • The Result of iEnhancer-ELM
    Motif discovery via attention mechanism in enhancer language models.
    (A) Motifs have high attention weights in the corresponding regions. The highlighted regions are the motifs captured by the attention mechanism.
    (B) The 30 motifs discovered by the 3-mer-based enhancer language model on Liu's dataset.



    Internship Experience

    Molecular Optimization based on Building Blocks

    May 2023 -- Aug. 2023
  • We explored molecular optimization methods within a discrete space, with the goal of designing synthesizable molecules with desirable properties in a bottom-up manner.
  • The discrete space consists of 180K purchasable building blocks and 91 reaction templates. A genetic algorithm serves as the base model, and an external analyzer is used as the scoring system for the target property. A surrogate model is added to accelerate convergence.
  • We selected three properties: GSK, JNK, and synthetic accessibility. The results show that our synthetic molecules exhibit higher synthesizability when they have the same score of JSK as existing methods.
  • Molecular Optimization based on Building Blocks
    We construct a generation tree from purchasable molecules and reaction templates: the leaf nodes are selected from the 180K existing building blocks, the reaction templates form the branches, and the root node is generated from the bottom up. Genetic algorithms are then used to optimize the leaf nodes in order to generate products that satisfy the desired properties.



    Academic Activities

    Oral Presentation in BIBM-2023

    December 8, 2023

    As the presenter, I gave an oral presentation at the main venue of BIBM-2023, introducing my work on Enhancer Generation and Optimization for High Activity.

    AAAI-2024 Pre-Talk Session

    December 24, 2023

    I participated in the Pre-Talk Session held by the Shenzhen Computer Federation, where I presented my work on Discriminator Forests for Improving Generative Diversity.



    A Little More About Me

    Alongside my academic interests, some of my other interests and hobbies are:

    • Sports: Running, Basketball, Badminton and others
    • Leisure Activities: Hiking, Reading