School of AI and Advanced Computing

1. Predict Epitranscriptome Targets and Regulatory Functions of N-6-Methyladenosine (m(6)A) Writers and Erasers

Author:Song, YY;Xu, QR;Wei, Z;Zhen, D;Su, JL;Chen, KQ;Meng, J


Abstract:Currently, although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N-6-methyladenosine (m(6)A) site identification, none is focused on the substrate specificity of different m(6)A-related enzymes, i.e., the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and the regulatory functions of different RNA m(6)A writers (METTL3-METTL14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with the conventional sequence features and used the machine learning approach of random forest to predict their epitranscriptome substrates. Our method achieved reasonable performance on both the writer target prediction (as high as 0.918) and the eraser target prediction (as high as 0.888) in a 5-fold cross-validation, and results of the gene ontology analysis of their preferential targets further revealed the functional relevance of different RNA methylation writers and erasers.
2. Chromosome Classification with Convolutional Neural Network based Deep Learning

Author:Zhang, WB;Song, SF;Bai, TM;Zhao, YX;Ma, F;Su, JL;Yu, LM


Abstract:Karyotyping plays a crucial role in genetic disorder diagnosis. Currently, karyotyping requires considerable manual effort, domain expertise and experience, and is very time consuming. Automating the karyotyping process has been an important and popular task. This study focuses on classification of chromosomes into 23 types, a step towards fully automatic karyotyping. This study proposes a convolutional neural network (CNN) based deep learning network to automatically classify chromosomes. The proposed method was trained and tested on a dataset containing 10304 chromosome images, and was further tested on a dataset containing 4830 chromosomes. The proposed method achieved an accuracy of 92.5%, outperforming three other methods from the literature. To investigate how applicable the proposed method is to doctors, a metric named proportion of well classified karyotype was also designed. A result of 91.3% was achieved on this metric, indicating that the proposed classification method could be used to aid doctors in genetic disorder diagnosis.
3. Bioinformatics approaches for deciphering the epitranscriptome: Recent progress and emerging topics

Author:Liu, L;Song, BW;Ma, JN;Song, Y;Zhang, SY;Tang, YJ;Wu, XY;Wei, Z;Chen, KQ;Su, JL;Rong, R;Lu, ZL;de Magalhaes, JP;Rigden, DJ;Zhang, L;Zhang, SW;Huang, YF;Lei, XJ;Liu, H;Meng, J


Abstract:Post-transcriptional RNA modification occurs on all types of RNA and plays a vital role in regulating every aspect of RNA function. Thanks to the development of high-throughput sequencing technologies, transcriptome-wide profiling of RNA modifications has been made possible. With the accumulation of a large number of high-throughput datasets, bioinformatics approaches have become increasingly critical for unraveling the epitranscriptome. We review here the recent progress in bioinformatics approaches for deciphering the epitranscriptomes, including epitranscriptome data analysis techniques, RNA modification databases, disease-association inference, general functional annotation, and studies on RNA modification site prediction. We also discuss the limitations of existing approaches and offer some future perspectives. (C) 2020 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
4. Learning Bionic Motions by Imitating Animals


Source:2020 IEEE International Conference on Mechatronics and Automation, ICMA 2020,2020,Vol.

Abstract:Motion control algorithms for quadruped robots have undergone rapid development in recent years. Interactive quadruped robots have demonstrated that they may enhance the effect of psychotherapy in the treatment of patients with cognitive impairment, which requires them to have more interactive capabilities than traditional quadruped robots. In this study, we focus on enabling interactive quadruped robots to imitate real animal motions extracted from videos, by which the design of robotic motion controllers can be simplified and the bionic degree and the interactive capabilities of the robots can be enhanced. The motion capture data, however, cannot be directly utilized by the motion controllers since the robots and the real animals differ in their respective body geometries, motion dynamics and numbers of DOF. To address these differences, we propose two strategies for imitating two different kinds of motions. For ordinary motions (head scratching, waving, etc.), we first apply a scaling method to the motion capture data and then use an inverse kinematic algorithm for imitation. Furthermore, to minimize the error of motion trajectories between the real animals and the robots, we then transform motion trajectories into a nonlinear optimization problem. For walking motions, we first analyze a classical SLIP model-based walking control algorithm for quadruped robots, and then apply the parameters extracted from the motion capture data to the walking control algorithm. Experiments based on an interactive quadruped robot we developed demonstrate that our proposed strategies have great potential in improving the capability of robots to imitate the motions of real animals.
5. An Improved Algorithm for Estimating the Distribution of RNA-related Genomic Features

Author:Wu, Jinge ; Zhang, Lihan ; Weng, Yuanzhe ; Meng, Jia ; Su, Jionglong ; Wang, Yue

Source:ACM International Conference Proceeding Series,2020,Vol.

Abstract:In this paper, we look into the correction of the ambiguities in the conversion between genome-based coordinates and RNA-based coordinates. An improved algorithm for estimating the distribution of RNA-related genomic features is proposed based on our previous article, 'Guitar: An R/Bioconductor Package for Gene Annotation Guided Transcriptomic Analysis of RNA-Related Genomic Features'. It applies the Expectation Maximization algorithm to estimate RNA-related genomic features iteratively. After each iteration, the proportion of the real distribution coordinates is increased, and the result is closer to the real distribution, demonstrating the validity and effectiveness of our proposed method. © 2020 ACM.
6. Structural Comparison of Gene Relevance Networks for Breast Cancer Tissues in Different Grades

Author:Zhang, YL;Dong, YL;Lv, KB;Zhao, QF;Su, JL


Abstract:Background: The breast is an important biological system of the human body with two distinct states, i.e. normal and tumoral. Research on breast cancer could be based on systematic modeling to contrast the system structures of these two states. Objective: We use mutual information for the construction of the gene networks of breast cancer tissues and normal tissues. These gene networks are analyzed, compared and classified. We also identify structural key genes that may play significant roles in the formation of breast cancer. Method: Gene networks are constructed using mutual information values. Four structural parameters, namely node degree, clustering coefficient, shortest path length and standard betweenness centrality, are used for analyzing the gene networks. A support vector machine is used to classify the gene networks into normal and disease states. Genes with standard betweenness centrality greater than 0.3 are identified as possibly significant in the development of breast cancer. Result: The classification of the gene networks into normal and disease states suggests that the vectors of parameters are linearly separable by any combination of these four structural parameters. In addition, the six genes BAK1, RRAD, LCN2, EGFR, ZAP70 and FOSB are identified as possibly playing significant roles in the formation of breast cancer. Conclusion: In this work, four structural parameters have been generalized to the relevance networks. These parameters are found to distinguish gene networks of normal and cancerous breast tissues at different thresholds. In addition, the six genes identified may motivate further studies and research in breast cancer.
7. Application of Features and Neural Network to Enhance the Performance of Deep Reinforcement Learning in Portfolio Management

Author:Gu, FC;Jiang, ZY;Su, JL


Abstract:Portfolio management is the decision-making process of allocating a certain amount of funds to multiple financial assets and continuously changing the distribution weights to increase returns and reduce risks. With the advance in artificial intelligence technology, it has become possible to use computers for self-learning and large-scale calculations, and to achieve optimized portfolio management. This paper mainly studies and analyzes the problem of portfolio optimization in the digital currency market, uses Poloniex's historical transaction data of digital currency to conduct experiments, and proposes a strategy based on the framework of deep reinforcement learning algorithms. The investment strategy framework uses Convolutional Neural Network and Visual Geometry Group Network. In addition to the closing price, highest price and lowest price, we also consider other internal or external features such as Network Value to Transaction Volume Ratio, Market Value to Realized Value Ratio, Return on Investment and Volatility. The results show that the return rate of our algorithm based on VGG with NVT as feature is 11.05% better than the work of Jiang et al. and at least 110% better than investment strategies such as Moving Average Reversion and Robust Median Reversion.
8. A Framework of Hierarchical Deep Q-Network for Portfolio Management

Author:Gao, Y;Gao, ZM;Hu, Y;Song, SF;Jiang, ZY;Su, JL


Abstract:Reinforcement Learning algorithms and Neural Networks have diverse applications in many domains, e.g., stock market prediction, facial recognition and automatic machine translation. The concept of modeling portfolio management through a reinforcement learning formulation is novel, and the Deep Q-Network has been successfully applied to portfolio management recently. However, the model does not take into account the commission fee for transactions. This paper introduces a framework, based on the hierarchical Deep Q-Network, that addresses the issue of zero commission fee by reducing the number of assets assigned to each Deep Q-Network and dividing the total portfolio value into smaller parts. Furthermore, this framework is flexible enough to handle an arbitrary number of assets. In our experiments, the time series of four stocks for three different time periods are used to assess the efficacy of our model. It is found that our hierarchical Deep Q-Network based strategy outperforms ten other strategies, including nine traditional strategies and one reinforcement learning strategy, in profitability as measured by the Cumulative Rate of Return. Moreover, the Sharpe ratio and Max Drawdown metrics both demonstrate that the risk of the policy associated with the hierarchical Deep Q-Network is the lowest among all ten strategies.
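The three evaluation metrics named in this abstract (Cumulative Rate of Return, Sharpe ratio and Max Drawdown) can be illustrated with a minimal sketch. It uses the standard textbook definitions, which are assumed here and not taken from the paper; the function names, the per-period (non-annualized) Sharpe ratio, the zero risk-free rate and the sample portfolio values are all hypothetical.

```python
import math

def cumulative_return(values):
    """Cumulative rate of return over a series of portfolio values."""
    return values[-1] / values[0] - 1.0

def sharpe_ratio(values, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return divided by its
    sample standard deviation (assumed definition, not annualized)."""
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

# Hypothetical portfolio values over four periods.
portfolio = [100.0, 110.0, 99.0, 120.0]
print(cumulative_return(portfolio), sharpe_ratio(portfolio), max_drawdown(portfolio))
```

A higher Sharpe ratio and a lower maximum drawdown both indicate lower risk for a given return, which is how the abstract uses the two metrics.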
9. Gibbs Sampling Based Bayesian Biclustering of Gene Expression Data


Source:Proceedings - 2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2020,2020,Vol.

Abstract:This paper proposes a rigorous Bayes model to infer biclusters of microarray data formed by gene sets and condition sets. The model employs few fine-tune threshold parameters and handles missing data by statistically inferring them in Gibbs sampling. The proposed model outperforms others on simulated data and discovered meaningful local patterns, 63% of which were corroborated by biological evidence. © 2020 IEEE.
10. An Event-Triggered Low-Cost Tactile Perception System for Social Robot's Whole Body Interaction

Author:Lin, SZ;Su, JL;Song, SF;Zhang, JM

Source:IEEE ACCESS,2021,Vol.9

Abstract:Social interaction is one of the necessary skills for social robots to better integrate into human society. However, current social robots interact mainly through audio and visual means with little reliance on haptic interaction. There still exist many obstacles for social robots to interact through touch: 1) the complex manufacturing process of the tactile sensor array is the main obstacle to lowering the cost of production; 2) the haptic interaction mode is complex and diverse. There are no social robot interaction standards and data sets for tactile interactive behavior in the public domain. In view of this, our research looks into the following aspects of the tactile perception system: 1) development of a low-cost tactile sensor array, including sensor principle, simulation, manufacture, front-end electronics and examination, which is then applied to the social robot's whole body; 2) establishment of the tactile interactive model and an event-triggered perception model in a social interactive application for the social robot, followed by the design of preprocessing and classification algorithms. In this research, we use k-nearest neighbors, tree, support vector machine and other classification algorithms to classify touch behaviors into six different classes. In particular, the cosine k-nearest neighbors and quadratic support vector machine achieve an overall mean accuracy rate of more than 68%, with an individual accuracy rate of more than 80%. In short, our research provides new directions in achieving low-cost intelligent touch interaction for social robots in a real environment. The low-cost tactile sensor array solution and interactive models are expected to be applied to social robots on a large scale.
11. Detection of m6A RNA Methylation in Nanopore Sequencing Data Using Support Vector Machine

Author:Jia, Shen ; Luo, Haochen ; Gao, Qiheng ; Guo, Jiaqi ; Su, Jionglong ; Meng, Jia ; Wu, Xiangyu

Source:Proceedings - 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2019,2019,Vol.

Abstract:N6-methyladenosine (m6A) is a prevalent internal modification in RNA which plays an important role in epitranscriptomics. The detection of m6A may be carried out by utilizing the Oxford Nanopore Technology (ONT) and machine learning. In this research, following a previous study by Liu et al., we hypothesize that the current intensity change of the modification of the RNA (N6-methyladenosine) is the result of base-calling errors (mismatch frequency, deletion frequency, per-base quality and current intensity). We apply the Curlake, EpiNano software to divide the raw data into 5-mer sequences and extract features from the RNA sequence. The SVM classifier is used to verify this assumption. Our results confirmed the finding of a previous study by Liu et al., suggesting that the base-calling 'errors' may be used to identify the N6-methyladenosine (m6A), and that the consideration of the neighbourhood nucleotides of the 5-mer will improve the accuracy of our prediction. © 2019 IEEE.
12. A Novel XGBoost Method to Identify Cancer Tissue-of-Origin Based on Copy Number Variations

Author:Zhang, YL;Feng, T;Wang, SD;Dong, RY;Yang, JL;Su, JL;Wang, B


Abstract:The discovery of cancer of unknown primary (CUP) is of great significance in designing more effective treatments and improving the diagnostic efficiency in cancer patients. In this study, we develop an appropriate machine learning model for tracing the tissue of origin of CUP with high accuracy after feature engineering and model evaluation. Based on a copy number variation dataset consisting of 4,566 training cases and 1,262 independent validation cases, an XGBoost classifier is applied to 10 types of cancer. Extremely randomized tree (Extra tree) is used for dimension reduction so that fewer variables replace the original high-dimensional variables. Features with the top 300 weights are selected and principal component analysis is applied to eliminate noise. We find that the XGBoost classifier achieves the highest overall accuracy of 0.8913 in the 10-fold cross-validation for training samples and 0.7421 on independent validation datasets for predicting tumor tissue of origin. Furthermore, by contrasting various performance indices, such as precision and recall rate, the experimental results show that the XGBoost classifier significantly improves the classification performance of various tumors with less prediction error, as compared to other classifiers, such as K-nearest neighbors (KNN), Bayes, support vector machine (SVM), and Adaboost. Our method can infer tissue of origin for the 10 cancer types with acceptable accuracy in both cross-validation and independent validation data. It may be used as an auxiliary diagnostic method to determine the actual clinicopathological status of specific cancer.
13. m6Acomet: large-scale functional prediction of individual m(6)A RNA methylation sites from an RNA co-methylation network

Author:Wu, XY;Wei, Z;Chen, KQ;Zhang, Q;Su, JL;Liu, H;Zhang, L;Meng, J


Abstract:Background: Over one hundred different types of post-transcriptional RNA modifications have been identified in humans. Researchers discovered that RNA modifications can regulate various biological processes, and RNA methylation, especially N6-methyladenosine, has become one of the most researched topics in epigenetics. Results: To date, the study of epitranscriptome layer gene regulation is mostly focused on the function of mediator proteins of RNA methylation, i.e., the readers, writers and erasers. There is limited investigation of the functional relevance of individual m(6)A RNA methylation sites. To address this, we annotated human m(6)A sites on a large scale based on the guilt-by-association principle from an RNA co-methylation network. It is constructed based on public human MeRIP-Seq datasets profiling the m(6)A epitranscriptome under 32 independent experimental conditions. By systematically examining the network characteristics obtained from the RNA methylation profiles, a total of 339,158 putative gene ontology functions associated with 1446 human m(6)A sites were identified. These are biological functions that may be regulated at the epitranscriptome layer via reversible m(6)A RNA methylation. The results were further validated on a soft benchmark by comparing to a random predictor. Conclusions: An online web server m6Acomet was constructed to support direct query for the predicted biological functions of m(6)A sites as well as the sites exhibiting co-methylated patterns at the epitranscriptome layer. The m6Acomet web server is freely available at:
14. Modeling Gene Networks in Saccharomyces cerevisiae Based on Gene Expression Profiles

Author:Zhang, YL;Lv, KB;Wang, SD;Su, JL;Meng, DZ


Abstract:Detailed and innovative analysis of gene regulatory network structures may reveal novel insights into biological mechanisms. Here we study how the gene regulatory network in Saccharomyces cerevisiae can differ under aerobic and anaerobic conditions. To achieve this, we discretized the gene expression profiles and calculated the self-entropy of down- and upregulation of gene expression as well as the joint entropy. Based on these quantities, the uncertainty coefficient was calculated for each gene triplet, following which separate gene logic networks were constructed for the aerobic and anaerobic conditions. Four structural parameters, namely average degree, average clustering coefficient, average shortest path, and average betweenness, were used to compare the structure of the corresponding aerobic and anaerobic logic networks. Five genes were identified to be putative key components of the two energy metabolisms. Furthermore, community analysis using the Newman fast algorithm revealed two significant communities for the aerobic but only one for the anaerobic network. David Gene Functional Classification suggests that, under aerobic conditions, one such community reflects the cell cycle and cell replication, while the other one is linked to the mitochondrial respiratory chain function.
15. m7GHub: deciphering the location, regulation and pathogenesis of internal mRNA N7-methylguanosine (m(7)G) sites in human

Author:Song, BW;Tang, YJ;Chen, KQ;Wei, Z;Rong, R;Lu, ZL;Su, JL;de Magalhaes, JP;Rigden, DJ;Meng, J


Abstract:Motivation: Recent progress in N7-methylguanosine (m(7)G) RNA methylation studies has focused on its internal (rather than capped) presence within mRNAs. Tens of thousands of internal mRNA m(7)G sites have been identified within mammalian transcriptomes, and a single resource to best share, annotate and analyze the massive m(7)G data generated recently is sorely needed. Results: We report here m(7)GHub, a comprehensive online platform for deciphering the location, regulation and pathogenesis of internal mRNA m(7)G. The m(7)GHub consists of four main components: the first internal mRNA m(7)G database containing 44 058 experimentally validated internal mRNA m(7)G sites, a sequence-based high-accuracy predictor, the first web server for assessing the impact of mutations on m(7)G status, and the first database recording 1218 disease-associated genetic mutations that may function through regulation of m(7)G methylation. Together, m(7)GHub will serve as a useful resource for research on internal mRNA m(7)G modification.
16. Location-aware convolutional neural networks based breast tumor detection


Source:IET Conference Publications,2018,Vol.2018

Abstract:Breast cancer is one of the most common types of cancer affecting the lives of millions. Early detection and localization of the breast cancer tissues are vital for prevention and cure. Recently, there have been a number of developments on this front, particularly in the direction of automated image analysis. Although they are instrumental in expediting the process, such approaches lack the localization information and hence still demand substantial involvement of clinicians to deliver conclusive results. In this paper, we propose a novel approach for detecting and localizing cancer tissues from mammograms. In particular, we rely on Convolutional Neural Networks for exploiting the spatial relationship of the cancer tissues for detection and localization. Our evaluations on real datasets show that the proposed method is able to classify normal and tumor tissues with a classification accuracy of 90.8%. Furthermore, our approach achieves a sensitivity of 86.1% in detection with 1.4 false positives per image on the localization. In comparison to the state-of-the-art approaches, our method offers an additional 1.1% sensitivity improvement, along with two fewer false positives per image.
17. Particle Filter Based Time Series Prediction of Daily Sales of an Online Retailer

Author:Ping, XY;Chen, QY;Liu, GQ;Su, JL;Ma, F


Abstract:Accurate prediction of sales is instrumental to successful management in the industries. It is crucial in formulating business strategies under uncertainties. In this paper, we consider time series in which observations arrive sequentially. An online time series model integrated with a particle filter is used for predicting sales of 80 products in a local online retailer over 400 days. We embed an Autoregressive model into a state space model and carry out time series prediction for all 80 products using a particular Particle Filter called the Sampling Importance Resampling Filter. Our experiment shows that the proposed model successfully predicts 27.5% of sales fluctuating within 10% of the true values. Furthermore, it outperforms the traditional Autoregressive Integrated Moving Average model by 5% for the same metric used.
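The approach summarized in this abstract, an Autoregressive model embedded in a state space model and filtered with Sampling Importance Resampling (SIR), can be sketched as follows. This is an illustrative sketch under stated assumptions and not the authors' implementation: it assumes an AR(1) state with Gaussian process and observation noise, and the parameter names (phi, q, r), the particle count and the sample data are hypothetical. The hit_rate helper is one possible reading of the "within 10% of the true values" metric.

```python
import math
import random

def sir_filter(observations, phi=0.8, q=1.0, r=1.0, n_particles=500, seed=0):
    """One-step-ahead predictions from an SIR particle filter.

    Assumed model (hypothetical, for illustration only):
        state:       x_t = phi * x_{t-1} + N(0, q^2)   (AR(1))
        observation: y_t = x_t + N(0, r^2)
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    predictions = []
    for y in observations:
        # Sampling: propagate each particle through the AR(1) transition.
        particles = [phi * x + rng.gauss(0.0, q) for x in particles]
        # Predict y_t before it is observed (mean of propagated particles).
        predictions.append(sum(particles) / n_particles)
        # Importance: weight particles by the Gaussian likelihood of y_t.
        weights = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in particles]
        if sum(weights) == 0.0:
            # Guard against numerical underflow: fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resampling: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return predictions

def hit_rate(predictions, actuals, tolerance=0.1):
    """Fraction of predictions within `tolerance` (relative) of the actual value."""
    hits = sum(1 for p, a in zip(predictions, actuals)
               if abs(p - a) <= tolerance * abs(a))
    return hits / len(actuals)

# Hypothetical daily sales figures for one product.
sales = [1.0, 1.2, 0.9, 1.1]
print(sir_filter(sales))
```

Because each prediction is made before the corresponding observation is used for weighting, the filter can run online as observations arrive sequentially.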
18. Re-Identification Based Automatic Matching and Annotation of Chromosome

Author:Wang, Chengyu ; Huang, Daiyun ; Guo, Jingwei ; Su, Jionglong ; Ma, Fei ; Yu, Limin

Source:Proceedings - 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2019,2019,Vol.

Abstract:Karyotyping of human chromosomes generally consists of three steps: pre-processing, segmentation and classification. By analyzing the number and structure of chromosomes, diseases such as cancers and genetic disorders can be diagnosed. Beyond the traditional methods, Convolutional Neural Networks have improved the computer vision area dramatically. When it comes to chromosome karyotyping, few research methods have been proposed to solve the problem of segmentation and classification. This paper proposes an innovative automatic strategy named the Chromosome-Automatic-Annotation (CAA) model, which labels the single chromosomes in microscopic images by 1) applying a joint loss consisting of softmax loss and center loss to enlarge the distance of features among the 24 classes; 2) employing the similarity matrix to annotate the single chromosome images in the Query Queue with the single chromosomes in the Gallery Queue. With a dataset of 90624 single chromosome images, after 50 epochs of training, the proposed model reached an accuracy of 98.75% for automatic annotation of the chromosome images on a test set of 644 images. © 2019 IEEE.
19. Extended ResNet and Label Feature Vector Based Chromosome Classification

Author:Wang, CY;Yu, LM;Zhu, X;Su, JL;Ma, F

Source:IEEE ACCESS,2020,Vol.8

Abstract:Human chromosome classification is essential to the clinical diagnosis of cytogenetical diseases such as genetic disorders and cancer. This process, however, is time-consuming and requires specialist knowledge. Considerable efforts have been made to automate the process. Recently, methods based on Convolutional Neural Networks achieved state-of-the-art results on the chromosome classification task. Many studies used karyotype images in performance evaluation, but few studies have reported the results of human chromosome classification on microscopical images. This paper proposes a novel method to classify single chromosome images into one of 24 types. In the proposed method, an extended ResNet was first devised to extract features of single chromosome images. A label feature vector was then extracted for each of the 24 chromosome types based on a validation dataset. The Hausdorff distance between the feature vector of an input image and each of the 24 label feature vectors was calculated, and the label feature vector with the minimum Hausdorff distance to the feature vector of the input image was selected as the potential label of the input image. To finally allocate the single chromosomes from the same microscopical image into one of 24 types, a Label Redistribution strategy was used to shrink the label space and to increase the efficiency of chromosome classification. Experiments were implemented with 90,624 single chromosome images, 644 of which were randomly picked to form a testing set in advance. The proposed method achieved a classification accuracy of 94.72% on microscopical images.
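The label selection step described in this abstract, choosing the label whose feature has the minimum Hausdorff distance to the input feature, can be illustrated with a small sketch. The abstract does not say how a feature vector is treated as a point set, so this sketch assumes each feature is a list of equal-length coordinate tuples; the function names and the toy label features are hypothetical, and the Label Redistribution strategy is not covered.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def nearest_label(feature, label_features):
    """Return the label whose feature set is closest to `feature`."""
    return min(label_features,
               key=lambda label: hausdorff(feature, label_features[label]))

# Hypothetical label features for two chromosome types.
label_features = {
    "chr1": [(0.0, 0.0), (1.0, 0.0)],
    "chr2": [(5.0, 5.0), (6.0, 5.0)],
}
print(nearest_label([(0.1, 0.0), (0.9, 0.0)], label_features))
```

The symmetric form takes the larger of the two directed distances, so two sets are close only if every point in each set has a nearby point in the other.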
20. Intention Understanding in Human-Robot Interaction Based on Visual-NLP Semantics

Author:Li, ZH;Mu, YS;Sun, ZL;Song, SF;Su, JL;Zhang, JM


Abstract:With the rapid development of robotic and AI technology in recent years, human-robot interaction has made great advances, with practical social impact. Verbal commands are one of the most direct and frequently used means for human-robot interaction. Currently, such technology can enable robots to execute pre-defined tasks based on simple, direct and explicit language instructions, e.g., certain keywords must be used and detected. However, that is not the natural way for humans to communicate. In this paper, we propose a novel task-based framework to enable the robot to comprehend human intentions using visual semantics information, such that the robot is able to satisfy human intentions based on natural language instructions (three types in total, namely clear, vague, and feeling, are defined and tested). The proposed framework includes a language semantics module to extract the keywords regardless of how explicit the command instruction is, a visual object recognition module to identify the objects in front of the robot, and a similarity computation algorithm to infer the intention based on the given task. The task is then translated into the commands for the robot accordingly. Experiments are performed and validated on a humanoid robot with a defined task: to pick the desired item out of multiple objects on the table, and hand it over to one desired user out of multiple human participants. The results show that our algorithm can interact with different types of instructions, even with unseen sentence structures.