Find Research Output



Department of Intelligent Science

1.Multi-source social media data sentiment analysis using bidirectional recurrent convolutional neural networks

Author:Abid, F;Li, C;Alam, M


Abstract:Subjectivity detection in the text is essential for sentiment analysis, which requires many techniques to perceive unanticipated means of communication. Few accomplishments adapted to capture the syntactic, semantic, and contextual sentimental information via distributed word representations (DWRs)(1). This paper, concatenating the DWRs through a weighted mechanism on Recurrent Neural Network (RNN) variants joint with Convolutional Neural network (CNN) distinctively involving weighted attentive pooling (WAP)(2). Whereas, CNNs with traditional pooling operations comprise many layers merely able to capture enough features. Our considerations empower the sentiment analysis over DWRs contains Word2vec, FastText, and GloVe to produce dense efficient concatenated representation (DECR)(3) to hold long term dependencies on a single RNN layer acquired by Parts of Speech Tagging (POS) explicitly with verbs, adverbs, and noun only. Then use these representations gained in a way, inputted to CNN contain single convolution layer engaging WAP on multi-source social media data to handle the issues of syntactic and semantic regularities as well as out of vocabulary (OOV) words. Experimentations demonstrate that DWRs together with proposed concatenation qualified in resolving the mentioned issues by moderate hyper-parameter configurations. Our architecture devoid of stacking multiple layers achieved modest accuracy of 89.67% by DECR-Bi-GRU-CNN (WAP) on IMDB as compared to random initialization 81.11% on SST.

2.Multiview video quality enhancement without depth information

Author:Jammal, S;Tillo, T;Xiao, JM


Abstract:The past decade has witnessed fast development in multiview 3D video technologies, such as Three-Dimensional Video (3DV), Virtual Reality (VR), and Free Viewpoint Video (FVV). However, large information redundancy and a vast amount of multiview video data needs to be stored or transmitted, which poses a serious problem for multiview video systems. Asymmetric multiview video compression can alleviate this problem by coding views with different qualities. Only several viewpoints are kept with high-quality and other views are highly compressed to low-quality. However, highly compressed views may incur severe quality degradation. Thus, it is necessary to enhance the visual quality of highly compressed views at the decoder side. Exploiting similarities among the multiview images is the key to efficiently reconstruct the multiview compressed views. In this paper, we propose a novel method for multiview quality enhancement, which directly learns an end-to-end mapping between the low-quality and high-quality views and recovers the details of the low-quality view. The mapping process is realized using a deep convolutional neural network (MVENet). MVENet takes a low-quality image of one view and a high-quality image of another view of the same scene as inputs and outputs an enhanced image for the low-quality view. To the best of our knowledge, this is the first work for multiview video enhancement where neither a depth map nor a projected virtual view is required in the enhancement process. Experimental results on both computer graphic and real datasets demonstrate the effectiveness of the proposed approach with a peak signal-to-noise ratio (PSNR) gain of up to 2dB over low-quality compressed views using HEVC and up to 3.7dB over low-quality compressed views using JPEG on the benchmark Cityscapes.
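The PSNR gains quoted in the abstract above are differences between two peak signal-to-noise ratio values. As a minimal sketch of how that metric is commonly computed (assuming 8-bit pixels flattened into plain Python sequences; the function name `psnr` and its parameters are illustrative and not taken from the paper):

```python
import math


def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `peak` is the maximum possible pixel value (255 for 8-bit images).
    """
    if len(reference) != len(degraded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the reference and the degraded signal.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)
```

A "gain of 2 dB" would then mean that `psnr(original, enhanced) - psnr(original, compressed)` is about 2 for the same original view.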

3.One-class kernel subspace ensemble for medical image classification (vol 2014, 17, 2014)

Author:Zhang, YG;Zhang, BL;Coenen, F;Xiao, JM;Lu, WJ


4.Effective Piecewise CNN with attention mechanism for distant supervision on relation extraction task

Author:Li, Yuming ; Ni, Pin ; Li, Gangmin ; Chang, Victor

Source:COMPLEXIS 2020 - Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk,2020,Vol.

Abstract:Relation Extraction is an important sub-task in the field of information extraction. Its goal is to identify entities from text and extract semantic relationships between entities. However, current Relation Extraction approaches based on deep learning methods generally suffer from practical problems such as an insufficient amount of manually labeled data, so training under weak supervision has become a big challenge. Distant Supervision is a novel idea that can automatically annotate a large amount of unlabeled data based on a small amount of labeled data. Based on this idea, this paper proposes a method combining Piecewise Convolutional Neural Networks and an Attention mechanism for automatically annotating the data of the Relation Extraction task. The experiments showed that the proposed method achieved a highest precision of 76.24% on the NYT-FB (New York Times-Freebase) dataset (top 100 relation categories). The results show that the proposed method performed better than CNN-based models in most cases. © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

5.Exploiting textual queries for dynamically visual disambiguation

Author:Sun, ZR;Yao, YZ;Xiao, JM;Zhang, L;Zhang, J;Tang, ZM


Abstract:Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is the problem of visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our proposed framework includes three major steps: we first discover and then dynamically select the text queries according to the keyword-based image search results, and we then employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our proposed approach can figure out the right visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of our proposed approach. (c) 2020 Elsevier Ltd. All rights reserved.

6.One-class kernel subspace ensemble for medical image classification

Author:Zhang, YG;Zhang, BL;Coenen, F;Xiao, JM;Lu, WJ


Abstract:Classification of medical images is an important issue in computer-assisted diagnosis. In this paper, a classification scheme based on a one-class kernel principal component analysis (KPCA) model ensemble has been proposed for the classification of medical images. The ensemble consists of one-class KPCA models trained using different image features from each image class, and a proposed product combining rule was used for combining the KPCA models to produce classification confidence scores for assigning an image to each class. The effectiveness of the proposed classification scheme was verified using a breast cancer biopsy image dataset and a 3D optical coherence tomography (OCT) retinal image set. The combination of different image features exploits the complementary strengths of these different feature extractors. The proposed classification scheme obtained promising results on the two medical image sets. The proposed method was also evaluated on the UCI breast cancer dataset (diagnostic), and a competitive result was obtained.
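The abstract above mentions a product combining rule for fusing per-class confidence scores from several models. The paper's exact rule is not given in this listing, so the following is only a sketch of the generic product rule, assuming each model outputs a confidence in [0, 1] for every class; the names `product_rule` and `classify` are hypothetical:

```python
from math import prod


def product_rule(scores_per_model):
    """Fuse per-class confidences from several models by multiplying them.

    scores_per_model: list of dicts, one per model, mapping class label to a
    confidence in [0, 1]. All dicts are assumed to share the same labels.
    Returns a dict of fused scores normalised to sum to 1 (when possible).
    """
    labels = list(scores_per_model[0].keys())
    fused = {c: prod(model[c] for model in scores_per_model) for c in labels}
    total = sum(fused.values())
    if total > 0:
        fused = {c: v / total for c, v in fused.items()}
    return fused


def classify(scores_per_model):
    """Return the class label with the highest fused score."""
    fused = product_rule(scores_per_model)
    return max(fused, key=fused.get)
```

With the product rule a single model that assigns a class a near-zero confidence effectively vetoes that class, which is one reason it is sensitive to poorly calibrated models.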

7.Supporting Deterministic Wireless Communications in Industrial IoT

Author:Bartolomeu, P;Alam, M;Ferreira, J;Fonseca, JA


Abstract:Wireless communication technologies have become widely adopted, appearing in heterogeneous applications ranging from tracking victims, responders, and equipment in disaster scenarios to machine health monitoring in networked manufacturing industries. These systems are said to have real-time timeliness requirements since data communication must be conducted within predefined temporal bounds, whose unfulfillment may compromise the correct behavior of the system and cause economic losses or endanger human lives. The support of real-time communications over license-free bands in open environments is a challenging task since real-time medium access is only achieved by a strict timing control of all communicating stations (real and nonreal-time). However, in open communication environments, the traffic generated by uncontrolled stations cannot be avoided by existing medium access protocols. In this paper, the definition, implementation, and assessment of a novel MAC technique named bandjacking is performed. Results demonstrate that the support of low-power deterministic communications is possible in open environments by using bandjacking.

8.A two-level stacking model for detecting abnormal users in Wechat activities

Author:Ling, Jiayuan ; Li, Gangmin

Source:Proceedings - 2019 International Conference on Information Technology and Computer Application, ITCA 2019,2019,Vol.

Abstract:Machine learning algorithms are widely employed in many classification and regression problems. In the real business world, however, they are confronted with huge and disordered data patterns. Recognizing different kinds of users on the internet accurately and quickly has become a challenge. In a Wechat online bargain activity, the staff found that some suspicious users behaved much like robots or malicious users. We therefore tried a two-level stacking model to detect them. This design achieved an accuracy of 0.98 after the training phase and an accuracy of 0.90 on a testing set from a new term. Moreover, the model is adaptable to linear and nonlinear datasets because of its diverse stacking of first-level classifiers. This paper therefore indicates the potential of the stacking classification model in the big data era. © 2019 IEEE.

9.Single image-based head pose estimation with spherical parametrization and 3D morphing

Author:Yuan, H;Li, MY;Hou, JH;Xiao, JM


Abstract:Head pose estimation plays a vital role in various applications, e.g., driver-assistance systems, human-computer interaction, virtual reality technology, and so on. We propose a novel geometry-based method for accurately estimating the head pose from a single 2D face image at a very low computational cost. Specifically, the rectangular coordinates of only four non-coplanar feature points from a predefined 3D facial model as well as the corresponding ones automatically/manually extracted from a 2D face image are first normalized to exclude the effect of external factors (i.e., scale factor and translation parameters). Then, the four normalized 3D feature points are represented in spherical coordinates with reference to the uniquely determined sphere by themselves. Due to the spherical parametrization, the coordinates of feature points can then be morphed along all the three directions in the rectangular coordinates effectively. Finally, the rotation matrix indicating the head pose is obtained by minimizing the Euclidean distance between the normalized 2D feature points and the 2D re-projections of the morphed 3D feature points. Comprehensive experimental results over two popular datasets, i.e., Pointing'04 and Biwi Kinect, demonstrate that the proposed method can estimate head poses with higher accuracy and lower run time than state-of-the-art geometry-based methods. Even compared with state-of-the-art learning-based methods or geometry-based methods with additional depth information, our method still produces comparable performance. (C) 2020 Elsevier Ltd. All rights reserved.

10.Real-time forward error correction for video transmission

Author:Xiao, Jimin ; Tillo, Tammam ; Lin, Chunyu ; Zhao, Yao

Source:2011 IEEE Visual Communications and Image Processing, VCIP 2011,2011,Vol.

Abstract:When video streams are transmitted over unreliable networks, forward error correction (FEC) codes are usually used to protect them. Reed-Solomon codes are block-based FEC codes. On one hand, enlarging the block size can enhance the performance of the Reed-Solomon codes. On the other hand, a large Reed-Solomon block size leads to long delay, which is not tolerable for real-time video applications. In this paper a novel approach is proposed to improve the performance of Reed-Solomon codes. With the proposed approach, more than one video frame is encompassed in the Reed-Solomon coding block, yet no delay is introduced. Experimental results show that the proposed approach outperforms other real-time error resilient video coding technologies. © 2011 IEEE.
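To make the idea of block-based erasure protection concrete, here is a deliberately simplified sketch. It is not Reed-Solomon and not the method of the paper: it uses a single XOR parity packet per block, which can repair exactly one lost packet, and assumes all packets in a block have the same length. The function names are hypothetical.

```python
def xor_parity(packets):
    """Return one parity packet: the byte-wise XOR of all packets in the block."""
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("all packets in a block must have the same length")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover(received, parity):
    """Rebuild a block in which exactly one packet was lost (marked as None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("single-parity code repairs exactly one lost packet")
    present = [p for p in received if p is not None]
    # XOR of the surviving packets and the parity equals the lost packet.
    rebuilt = xor_parity(present + [parity])
    block = list(received)
    block[missing[0]] = rebuilt
    return block
```

The trade-off described in the abstract shows up even here: the receiver cannot repair a loss until the whole block and its parity have arrived, so a larger block means a longer wait.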

11.Large-scale Ensemble Model for Customer Churn Prediction in Search Ads

Author:Wang, QF;Xu, M;Hussain, A


Abstract:Customer churn prediction is one of the most important issues in search ads business management, which is a multi-billion market. The aim of churn prediction is to detect customers with a high propensity to leave the ads platform, then to do analysis and increase efforts for retaining them ahead of time. An ensemble model combines multiple weak models to obtain better predictive performance; it is inspired by the human cognitive system and is widely used in various applications of machine learning. In this paper, we investigate how to use the gradient boosting decision tree (GBDT) ensemble model to predict whether a customer will be a churner in the foreseeable future based on its activities in the search ads. We extract two types of features for the GBDT: dynamic features and static features. For dynamic features, we consider a sequence of customers' activities (e.g., impressions, clicks) during a long period. For static features, we consider the information of customers' settings (e.g., creation time, customer type). We evaluated the prediction performance on a large-scale customer data set from the Bing Ads platform. The results show that the static and dynamic features are complementary, and that combining all features yields an AUC (area under the ROC curve) of 0.8410 on the test set. The proposed model is useful for predicting which customers will become churners in the near future on the ads platform, and it has been run successfully every day on the Bing Ads platform.
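The AUC figure reported above is the area under the ROC curve, which equals the probability that a randomly chosen positive example (here, a churner) receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the 0/1 label encoding are assumptions for illustration, and the quadratic loop is only suitable for small inputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (negative) or 1 (positive).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of churners above non-churners.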

12.Segmentation mask guided end-to-end person search

Author:Zheng, DY;Xiao, JM;Huang, KZ;Zhao, Y


Abstract:Person search aims to search for a target person among multiple images recorded by multiple surveillance cameras, which faces various challenges from both pedestrian detection and person re-identification. Besides the large intra-class variations owing to various illumination conditions, occlusions and varying poses, background clutters in the detected pedestrian bounding boxes further deteriorate the extracted features for each person, making them less discriminative. To tackle these problems, we develop a novel approach which guides the network with segmentation masks so that discriminative features can be learned invariant to the background clutters. We demonstrate that joint optimization of pedestrian detection, person re-identification and pedestrian segmentation produces more discriminative features for pedestrians, and consequently leads to better person search performance. Extensive experiments on two widely used benchmark datasets prove the superiority of our approach. In particular, our proposed model achieves state-of-the-art performance (86.3% mAP and 86.5% top-1 accuracy) on the CUHK-SYSU dataset.

13.Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction

Author:Ni, P;Li, YM;Li, GM;Chang, V


Abstract:Internet of Things (IoT) based voice interaction system, as a new artificial intelligence application, provides a new human-computer interaction mode. The more intelligent and efficient communication approach poses greater challenges to the semantic understanding module in the system. Faced with the complex and diverse interactive scenarios in practical applications, academia and industry urgently need more powerful Natural Language Understanding (NLU) methods as support. The Intent Detection and Slot Filling joint task, as one of the core sub-tasks in NLU, has been widely used in different human-computer interaction scenarios. In the current era of deep learning, the joint task of Intent Detection and Slot Filling has also changed from previous rule-based methods to deep learning-based methods. It is an important problem to explore how to realize the models of these tasks to be refined and targeted designed, and to make the Intent Detection task better serve the improvement of precision of Slot Filling task by connecting the before and after tasks. It has great significance for building a more humanized IoT voice interaction system. In this study, we designed two joint models to realize Intent Detection and Slot Filling joint task. For the Intent Detection type task, one is based on BiGRU-Att-CapsuleNet (hybrid-based model) and the other is based on the RCNN model. Both methods use the BiGRU-CRF model for the Slot Filling type task. The hybrid-based model can enhance the semantic capture capability of a single model. And by combining specialized models built independently for each task to achieve a complete joint task, it can be better to achieve optimal performance on each task. This study also carried out detailed comparative experiments of tasks and joint tasks on multiple datasets. Experiments show that the joint models have achieved competitive results in 7 typical datasets included in multiple scenarios in English and Chinese compared with other models.

14.Depth Map Coding Using Histogram-Based Segmentation and Depth Range Updating

Author:Lin, CY;Zhao, Y;Xiao, JM;Tillo, TM


Abstract:In texture-plus-depth format, depth map compression is an important task. Different from normal texture images, depth maps have less texture information but contain many homogeneous regions separated by sharp edges. This feature will be employed to form an efficient depth map coding scheme in this paper. Firstly, the histogram of the depth map will be analyzed to find an appropriate threshold that segments the depth map into the foreground and background regions, allowing the edge between these two kinds of regions to be obtained. Secondly, the two regions will be encoded through rate distortion optimization with a shape adaptive wavelet transform, while the edges are losslessly encoded with JBIG2. Finally, a depth-updating algorithm based on the threshold and the depth range is applied to enhance the quality of the decoded depth maps. Experimental results demonstrate the effective performance on both the depth map quality and the synthesized view quality.
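The first step in the abstract above picks a threshold from the depth histogram to split the map into two regions. The listing does not say how the threshold is chosen, so the sketch below substitutes Otsu's method, a standard histogram-based choice that maximises the between-class variance. Treating values above the threshold as foreground is also an assumption, as are the function names.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance.

    values: iterable of integer depth values in range(levels).
    Values <= t form one class and values > t form the other.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def segment(values, threshold):
    """Label each depth value: True for foreground (assumed to be > threshold)."""
    return [v > threshold for v in values]
```

The boundary between the `True` and `False` regions is what the abstract refers to as the edge to be coded separately.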

15.A Game Theoretic Reward and Punishment Unwanted Traffic Control Mechanism

Author:Liu, J;Li, MC;Alam, M;Chen, YF;Wu, T


Abstract:With the development of the Internet of Things and the pervasive use of internet service providers (ISPs), internet users and data have reached an unprecedented volume. However, the existence of malicious users seriously undermines user privacy and network security by distributing a large amount of unwanted traffic, such as spam, pop-up, and malware. This to some extent can be identified with the cooperation of individual users by installing anti-virus toolkits. However, users need to purchase such software at an additional cost. Therefore, unless built-in incentive mechanisms exist, rational users will choose not to install anti-virus software. If enough network entities behave in this way, the network will be flooded with unwanted traffic. In this paper, we propose an evolutionary game theoretic incentive mechanism to promote the cooperation of individual users to curb the expansion of unwanted traffic. We propose a combined reward and punishment mechanism to further incentivize cooperative behavior. Meanwhile, the acceptance condition of our framework is analyzed and we carry out a number of simulations to evaluate the acceptance conditions of our framework. The experimental results indicate that our reward and punishment mechanism can efficiently incentivize users to adopt cooperative behavior and reduce unwanted traffic.

16.Dynamic Redundancy Allocation for Video Streaming using Sub-GOP based FEC Code

Author:Yu, L;Xiao, JM;Tillo, T


Abstract:Reed-Solomon erasure code is one of the most studied protection methods for video streaming over unreliable networks. As a block-based error correcting code, large block size and increased number of parity packets will enhance its protection performance. However, for video applications this enhancement is sacrificed by the error propagation and the increased bitrate. So, to tackle this paradox, we propose a rate-distortion optimized redundancy allocation scheme, which takes into consideration the distortion caused by losing each slice and the propagated error. Different from other approaches, the amount of introduced redundancy and the way it is introduced are automatically selected without human intervention based on the network condition and video characteristics. The redundancy allocation problem is formulated as a constrained optimization problem, which allows more flexibility in setting the block-wise redundancy. The proposed scheme is implemented in JM14.0 for H.264, and it achieves an average gain of 1dB over the state-of-the-art approach.

17.Improving Disentanglement-Based Image-to-Image Translation with Feature Joint Block Fusion

Author:Zhang, ZJ;Zhang, R;Wang, QF;Huang, KZ


Abstract:Image-to-image translation aims to change attributes or domains of images, where the feature disentanglement based method is widely used recently due to its feasibility and effectiveness. In this method, a feature extractor is usually integrated in the encoder-decoder architecture generative adversarial network (GAN), which extracts features from domains and images, respectively. However, the two types of features are not properly combined, resulting in blurry generated images and indistinguishable translated domains. To alleviate this issue, we propose a new feature fusion approach to leverage the ability of the feature disentanglement. Instead of adding the two extracted features directly, we design a joint block fusion that contains integration, concatenation, and squeeze operations, thus allowing the generator to take full advantage of the two features and generate more photo-realistic images. We evaluate both the classification accuracy and Frechet Inception Distance (FID) of the proposed method on two benchmark datasets of Alps Seasons and CelebA. Extensive experimental results demonstrate that the proposed joint block fusion can improve both the discriminability of domains and the quality of translated images. Specifically, the classification accuracies are improved by 1.04% (FID reduced by 1.22) and 1.87% (FID reduced by 4.96) on Alps Seasons and CelebA, respectively.

18.End-to-End Distortion-Based Multiuser Bandwidth Allocation for Real-Time Video Transmission Over LTE Network

Author:Yuan, H;Fu, HY;Liu, J;Xiao, JM


Abstract:Long Term Evolution (LTE) network is widely used in video transmission because of its high-speed communication capacity. In this paper, an end-to-end distortion-based bandwidth allocation method for multiusers is proposed to enhance video transmission performance of the LTE network. For an LTE network, since throughput is observed as an independent variable which can affect the packet loss ratio, and thus affect the transmission distortion, the end-to-end distortion model is first derived by taking throughput into account. Then, based on the derived end-to-end distortion model, the bandwidth allocation problem is formulated as a convex optimization problem and solved by Karush-Kuhn-Tucker conditions. Simulation results demonstrate the effectiveness of the proposed method. The rate distortion performance of the proposed method is better than the average bandwidth allocation method under different bandwidth utilizations, and is very close to the exhaustive search-based method.
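The abstract above formulates bandwidth allocation as a convex problem solved through the Karush-Kuhn-Tucker (KKT) conditions, but the distortion model itself is not given in this listing. The sketch below therefore uses a purely hypothetical model, D_i(r_i) = a_i / r_i, to show the mechanics: minimising the sum of D_i subject to the rates summing to a budget R gives the stationarity condition a_i / r_i^2 = lambda for every user, so r_i is proportional to sqrt(a_i).

```python
import math


def allocate_bandwidth(weights, total_rate):
    """Closed-form KKT solution for a hypothetical distortion model.

    Minimises sum(a_i / r_i) subject to sum(r_i) == total_rate and r_i > 0,
    where `weights` holds the positive constants a_i (one per user).
    Stationarity gives r_i = sqrt(a_i / lambda); the budget constraint then
    fixes lambda, yielding r_i = total_rate * sqrt(a_i) / sum_j sqrt(a_j).
    """
    if total_rate <= 0 or any(a <= 0 for a in weights):
        raise ValueError("weights and total_rate must be positive")
    roots = [math.sqrt(a) for a in weights]
    norm = sum(roots)
    return [total_rate * r / norm for r in roots]


def total_distortion(weights, rates):
    """Sum of the hypothetical per-user distortions a_i / r_i."""
    return sum(a / r for a, r in zip(weights, rates))
```

Under this toy model, users whose video is more sensitive to rate (larger a_i) receive more bandwidth than they would under an equal split, and the resulting total distortion is lower.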

19.Medical Diagnosis by Complaints of Patients and Machine Learning

Author:Li, Gangmin ; Song, Haowei ; Liang, Hai-Ning ; Qu, Yuanying ; Liu, Lu ; Bai, Xuming

Source:Proceedings - 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, CISP-BMEI 2019,2019,Vol.

Abstract:Self-diagnosis has become an important research topic and a popular web application. It relies on patients' own descriptions of their conditions. Finding the relationship between patients' complaints and the possible diseases is the key. This paper reports our efforts on applying machine learning models to solve this problem. We first collected and built a dataset including 10,000 chief complaints from authoritative medical websites including, and and top Chinese hospitals. We then trained Support Vector Machine (SVM) and Bidirectional Long and Short-term Memory (BiLSTM) models using our collected dataset to verify our dataset and to test prediction models. The test shows the models trained with sample datasets have a stable performance with 75% in accuracy, 81% in precision and recall being 81%. © 2019 IEEE.

20.2D to cylindrical inverse projection of the wireless capsule endoscopy images

Author:Liu, Yina ; Tillo, Tammam ; Xiao, Jimin ; Lim, Enggee ; Wang, Zhao

Source:Proceedings - 4th International Congress on Image and Signal Processing, CISP 2011,2011,Vol.1

Abstract:In this paper, a two-dimensional to three-dimensional mapping of the wireless capsule endoscopy (WCE) images is proposed as well as the alignment of the consecutive images. The aim of this project is to reduce the diagnosis time of the doctor using the acquired data. This paper resolves a serious limitation of the inverse projection of the WCE images that appeared previously. The novelty of this paper lies in three aspects. Firstly, as the main axis of the WCE is not always aligned with the center of the small bowel, the angle needs to be adjusted to obtain a better visual effect. The inverse projected images considering the angle of the WCE are obtained from the innovative mathematical model proposed in this project. After compensating the angle, all the information in the interior wall of the small intestine will be clearly presented. Secondly, the redundancy of the images captured along a trajectory, which share a portion of overlapped information, is exploited by motion estimation to enhance the final resolution. Therefore, doctors do not need to inspect the same region of the images without any benefit and the diagnosis time is significantly decreased. Finally, the cropped segments are merged into a single image which can fully represent the interior wall of the small intestine. © 2011 IEEE.
Total 86 results found
Copyright 2006-2020 © Xi'an Jiaotong-Liverpool University 苏ICP备07016150号-1 京公网安备 11010102002019号