Research Output


Department of Intelligent Science

1. Video Streaming Adaptation Strategy for Multiview Navigation Over DASH

Author: Yao, C; Xiao, JM; Zhao, Y; Ming, AL


Abstract: Video content delivery over the Internet is receiving increasing attention from both industry and academia, especially for multiview video content, as it is the basis for various applications such as 3-D video, virtual reality, and free-viewpoint video. To cope with the dynamic nature of Internet throughput, dynamic adaptive streaming over HTTP (DASH) has been introduced to control video streaming based on network conditions. In this paper, we design a streaming framework that improves the user experience of multiview video streaming over DASH by taking into account the user's viewpoint navigation behavior during streaming. To eliminate the view switching delay, a multiple view navigation rule is introduced to pre-fetch the viewpoints the user may switch to. An optimal bitrate allocation scheme is proposed for this rule, allowing clients to maximize video quality. Moreover, we find that video quality and playback starvation probability are conflicting factors, while both are essential for the user's quality of experience (QoE). To tackle this issue, a QoE optimization solution is designed to maximize the overall performance of the proposed framework. Several experiments verify the effectiveness of the proposed framework, and the results demonstrate that it outperforms two typical DASH methods.
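For readers new to DASH, the sketch below shows a generic throughput-based rate adaptation rule: the client estimates throughput from recent segment downloads (a harmonic mean here) and requests the highest representation that fits under a safety-scaled estimate. This is a common textbook heuristic, not the bitrate allocation or QoE optimization proposed in the paper; the bitrate ladder, the 0.9 safety factor, and the function names are hypothetical.

```python
from statistics import harmonic_mean


def estimate_throughput(samples_kbps: list[float], window: int = 5) -> float:
    """Harmonic mean of the most recent throughput samples (kbit/s)."""
    return harmonic_mean(samples_kbps[-window:])


def select_bitrate(ladder_kbps: list[int], samples_kbps: list[float],
                   safety: float = 0.9) -> int:
    """Pick the highest representation not exceeding safety * estimated throughput.

    Falls back to the lowest representation when nothing fits the budget.
    """
    budget = safety * estimate_throughput(samples_kbps)
    feasible = [r for r in sorted(ladder_kbps) if r <= budget]
    return feasible[-1] if feasible else min(ladder_kbps)


# Hypothetical bitrate ladder for one view (kbit/s).
ladder = [500, 1200, 2500, 5000]
print(select_bitrate(ladder, [3000, 2800, 3200]))  # prints 2500
```

The harmonic mean is a conservative choice because a single slow download pulls the estimate down more than a fast one pushes it up.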

2. One-class kernel subspace ensemble for medical image classification (vol 2014, 17, 2014)

Author: Zhang, YG; Zhang, BL; Coenen, F; Xiao, JM; Lu, WJ


3. Effective Piecewise CNN with attention mechanism for distant supervision on relation extraction task

Author: Li, Yuming; Ni, Pin; Li, Gangmin; Chang, Victor

Source: COMPLEXIS 2020 - Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk, 2020, Vol.

Abstract: Relation extraction is an important subtask in the field of information extraction. Its goal is to identify entities in text and extract the semantic relationships between them. However, current relation extraction methods based on deep learning generally face practical problems such as an insufficient amount of manually labeled data, so training under weak supervision has become a major challenge. Distant supervision is a novel idea that can automatically annotate a large amount of unlabeled data based on a small amount of labeled data. Based on this idea, this paper proposes a method combining Piecewise Convolutional Neural Networks and an attention mechanism for automatically annotating data for the relation extraction task. Experiments show that the proposed method achieves a highest precision of 76.24% on the NYT-FB (New York Times-Freebase) dataset (top 100 relation categories). The results show that the proposed method performs better than CNN-based models in most cases. © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
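Piecewise max pooling is the core idea of a piecewise CNN: instead of one max over the whole sentence, each convolution filter's output is split into three segments by the two entity positions, and each segment is max-pooled separately, so the pooled vector keeps coarse positional structure. A minimal pure-Python sketch, assuming the feature map is given as one list of scores per filter and the entities as token indices; the exact segment boundaries are an assumed convention, and the paper's attention mechanism is not shown.

```python
def piecewise_max_pool(feature_map: list[list[float]], e1: int, e2: int) -> list[float]:
    """Piecewise max pooling over a convolution feature map.

    feature_map: one list of convolution outputs per filter, indexed by token.
    e1, e2: token positions of the two entities (e1 < e2).
    Returns 3 pooled values per filter for the segments
    [0, e1], (e1, e2] and (e2, end).
    """
    if not 0 <= e1 < e2:
        raise ValueError("expected 0 <= e1 < e2")
    pooled = []
    for scores in feature_map:
        segments = (scores[: e1 + 1], scores[e1 + 1 : e2 + 1], scores[e2 + 1 :])
        # An empty segment (e.g. the second entity ends the sentence) pools to 0.0.
        pooled.extend(max(seg) if seg else 0.0 for seg in segments)
    return pooled


# One filter over a six-token sentence, entities at tokens 1 and 3.
print(piecewise_max_pool([[0.1, 0.5, 0.2, 0.9, 0.3, 0.4]], 1, 3))  # [0.5, 0.9, 0.4]
```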

4. Multiview video quality enhancement without depth information

Author: Jammal, S; Tillo, T; Xiao, JM


Abstract: The past decade has witnessed fast development in multiview 3D video technologies, such as Three-Dimensional Video (3DV), Virtual Reality (VR), and Free Viewpoint Video (FVV). However, the large information redundancy and the vast amount of multiview video data that need to be stored or transmitted pose a serious problem for multiview video systems. Asymmetric multiview video compression can alleviate this problem by coding views at different qualities: only a few viewpoints are kept at high quality, while the other views are highly compressed to low quality. However, highly compressed views may suffer severe quality degradation, so it is necessary to enhance the visual quality of highly compressed views at the decoder side. Exploiting the similarities among multiview images is the key to efficiently reconstructing the compressed views. In this paper, we propose a novel method for multiview quality enhancement, which directly learns an end-to-end mapping between the low-quality and high-quality views and recovers the details of the low-quality view. The mapping is realized using a deep convolutional neural network (MVENet). MVENet takes as inputs a low-quality image of one view and a high-quality image of another view of the same scene, and outputs an enhanced image for the low-quality view. To the best of our knowledge, this is the first work on multiview video enhancement in which neither a depth map nor a projected virtual view is required in the enhancement process. Experimental results on both computer graphics and real datasets demonstrate the effectiveness of the proposed approach, with a peak signal-to-noise ratio (PSNR) gain of up to 2 dB over low-quality views compressed with HEVC and up to 3.7 dB over low-quality views compressed with JPEG on the Cityscapes benchmark.
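The gains above are PSNR differences in decibels. For 8-bit images, PSNR = 10 log10(255^2 / MSE). A short sketch computing it for two equally sized grayscale images given as nested lists; the peak value of 255 assumes 8-bit samples, and the function name is illustrative.

```python
import math


def psnr(reference: list[list[int]], test: list[list[int]], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(test) or any(
        len(a) != len(b) for a, b in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in reference)
    mse = sum(
        (a - b) ** 2 for ra, rb in zip(reference, test) for a, b in zip(ra, rb)
    ) / n
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak**2 / mse)


print(psnr([[0, 0], [0, 0]], [[1, 1], [1, 1]]))  # MSE = 1, about 48.13 dB
```

Because PSNR is logarithmic in MSE, a 2 dB gain corresponds to reducing the MSE by a factor of 10^0.2, roughly 1.58.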

5. New Cancer Treatment Evaluation through Big Data Analytics

Author: Li, GM; Gu, JZ; Bai, XM


Abstract: Cancer is a leading cause of morbidity and mortality worldwide, and several treatments have been developed and practiced to fight it. Totally Implantable Venous Access Port Drug Supply (TIVAPDS) treatment is a new method that uses the Totally Implantable Venous Access Port (TIVAP) delivery method, a kind of Intrathecal Drug Delivery System (IDD) with fewer side effects, to improve patients' quality of life. This paper reports our study evaluating the effectiveness of TIVAPDS treatment, with the aim of helping to generalize this treatment in China. Our data samples come from The Second Affiliated Hospital of Suzhou University, a forerunner of TIVAPDS practice in China, and were collected with the patients' agreement. Summary statistics of the data and the pairwise relationships between the identified attributes are analyzed. Based on the results, two predictive models using the C4.5 decision tree and logistic regression algorithms are adopted for prediction. The results are used as a reference for assessing individual treatment cases, so that the effectiveness of the treatment can be evaluated and, where possible, the efficiency of TIVAPDS treatment improved.
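C4.5 chooses its splitting attribute by gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution (its "split information"), which penalizes attributes with many values. A minimal sketch for categorical attributes; the toy records below are hypothetical and are not the hospital data used in the study.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy (bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values: list, labels: list) -> float:
    """C4.5 gain ratio of a categorical attribute with respect to the class labels."""
    total = len(labels)
    groups: dict = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute's value distribution
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute that perfectly separates the outcome.
print(gain_ratio(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))  # 1.0
```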

6. Exploiting textual queries for dynamically visual disambiguation

Author: Sun, ZR; Yao, YZ; Xiao, JM; Zhang, L; Zhang, J; Tang, ZM


Abstract: Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our proposed framework includes three major steps: we first discover candidate text queries, then dynamically select them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our approach can identify the right visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of our proposed approach. (c) 2020 Elsevier Ltd. All rights reserved.

7. Error-resilient video coding with end-to-end rate-distortion optimized at macroblock level

Author: Xiao, JM; Tillo, T; Lin, CY; Zhao, Y


Abstract: Intra macroblock refreshment is an effective approach to error-resilient video coding. In this paper, in addition to intra coding, we propose two further macroblock coding modes to enhance the transmission robustness of the coded bitstream: inter coding with a redundant macroblock, and intra coding with a redundant macroblock. The selection of coding modes and the parameters for coding the redundant version of each macroblock are determined by rate-distortion optimization. Notably, the optimization uses the end-to-end distortion, which takes the channel conditions into account. Extensive simulation results show that the proposed approach significantly outperforms other error-resilient approaches; for some video sequences, the average PSNR is up to 4 dB higher than that of the Optimal Intra Refreshment approach.
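Rate-distortion optimized mode decision usually takes the Lagrangian form: choose the mode minimizing J = D + λR, where here D is an expected end-to-end distortion rather than the encoder-side distortion alone. The sketch below is a deliberately simplified illustration, assuming a single packet-loss probability and known per-mode distortions for the "received" and "lost and concealed" cases; the candidate values are made up, and the paper's end-to-end distortion model is more detailed.

```python
from dataclasses import dataclass


@dataclass
class ModeCandidate:
    name: str
    rate_bits: float      # bits needed to code the macroblock in this mode
    dist_received: float  # distortion if the packet arrives
    dist_lost: float      # distortion after concealment if the packet is lost


def expected_distortion(m: ModeCandidate, loss_prob: float) -> float:
    # Simplified end-to-end distortion: a mixture of the received and lost cases.
    return (1 - loss_prob) * m.dist_received + loss_prob * m.dist_lost


def choose_mode(cands: list[ModeCandidate], loss_prob: float, lam: float) -> ModeCandidate:
    """Return the candidate minimizing J = E[D] + lambda * R."""
    return min(cands, key=lambda m: expected_distortion(m, loss_prob) + lam * m.rate_bits)


# Hypothetical numbers: redundancy costs bits but limits the damage of a loss.
candidates = [
    ModeCandidate("inter", 100, 10, 200),
    ModeCandidate("intra", 300, 8, 60),
    ModeCandidate("inter+redundant", 180, 10, 30),
]
print(choose_mode(candidates, loss_prob=0.0, lam=0.1).name)  # inter
print(choose_mode(candidates, loss_prob=0.2, lam=0.1).name)  # inter+redundant
```

On an error-free channel the cheapest mode wins; once losses are likely, paying extra rate for a redundant copy lowers the total cost.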

8. Image Captioning using Adversarial Networks and Reinforcement Learning

Author: Yan, SY; Wu, FY; Smith, JS; Lu, WJ; Zhang, BL


Abstract: Image captioning is a significant task in artificial intelligence that connects computer vision and natural language processing. With the rapid development of deep learning, the sequence-to-sequence model with attention has become one of the main approaches to image captioning. Nevertheless, a significant issue exists in the current framework: the exposure bias problem of Maximum Likelihood Estimation (MLE) in the sequence model. To address this problem, we use generative adversarial networks (GANs) for image captioning, which compensate for the exposure bias of MLE and can also generate more realistic captions. GANs, however, cannot be directly applied to discrete tasks such as language generation, because the data are discontinuous. Hence, we use a reinforcement learning (RL) technique to estimate the gradients for the network. In addition, a Monte Carlo roll-out sampling method is used to obtain intermediate rewards during language generation. Experimental results on the COCO dataset validate the improvement contributed by each component of the proposed model, and the overall effectiveness of the model is also evaluated.
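Monte Carlo roll-out, in the style popularized by SeqGAN, gives an intermediate reward to a partial caption by completing it several times with a roll-out policy and averaging the discriminator's scores on the completions. A toy sketch in which `rollout_policy` and `discriminator` are hypothetical stand-ins for the trained networks; it shows only the reward computation, not the policy-gradient update.

```python
import random
from typing import Callable, Sequence


def rollout_rewards(
    tokens: Sequence[str],
    rollout_policy: Callable[[list[str], random.Random], list[str]],
    discriminator: Callable[[list[str]], float],
    num_rollouts: int = 16,
    seed: int = 0,
) -> list[float]:
    """Intermediate reward for each prefix tokens[:t], t = 1..len(tokens).

    Incomplete prefixes are completed num_rollouts times and the discriminator
    scores are averaged; the full sequence is scored directly.
    """
    rng = random.Random(seed)
    rewards = []
    for t in range(1, len(tokens) + 1):
        prefix = list(tokens[:t])
        if t == len(tokens):
            rewards.append(discriminator(prefix))
            continue
        scores = [discriminator(rollout_policy(prefix, rng)) for _ in range(num_rollouts)]
        rewards.append(sum(scores) / num_rollouts)
    return rewards


# Toy stand-ins: the policy pads with random words up to length 4, and the
# "discriminator" scores the fraction of words that are not "<unk>".
VOCAB = ["a", "dog", "runs", "<unk>"]


def toy_policy(prefix: list[str], rng: random.Random) -> list[str]:
    return prefix + [rng.choice(VOCAB) for _ in range(4 - len(prefix))]


def toy_discriminator(seq: list[str]) -> float:
    return sum(w != "<unk>" for w in seq) / len(seq)


print(rollout_rewards(["a", "dog", "runs", "fast"], toy_policy, toy_discriminator))
```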

9. One-class kernel subspace ensemble for medical image classification

Author: Zhang, YG; Zhang, BL; Coenen, F; Xiao, JM; Lu, WJ


Abstract: Classification of medical images is an important issue in computer-assisted diagnosis. In this paper, a classification scheme based on an ensemble of one-class kernel principal component analysis (KPCA) models is proposed for medical image classification. The ensemble consists of one-class KPCA models trained on different image features from each image class, and a proposed product combining rule is used to combine the KPCA models into classification confidence scores for assigning an image to each class. The effectiveness of the proposed classification scheme was verified using a breast cancer biopsy image dataset and a 3D optical coherence tomography (OCT) retinal image set. The combination of different image features exploits the complementary strengths of the different feature extractors. The proposed classification scheme obtained promising results on the two medical image sets. The proposed method was also evaluated on the UCI breast cancer dataset (diagnostic), where a competitive result was obtained.
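A product combining rule multiplies, for each class, the confidence scores from the one-class models trained on different features, and assigns the image to the class with the largest product. The sketch below illustrates only that combination step; how each KPCA model turns its reconstruction error into a confidence is not shown, and the class and feature names are hypothetical.

```python
import math


def product_rule(scores: dict[str, dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Combine per-feature confidences with the product rule.

    scores[class_name][feature_name] is a confidence in (0, 1] produced by the
    one-class model for that class trained on that feature.
    Returns the winning class and the combined score of every class.
    """
    combined = {cls: math.prod(per_feat.values()) for cls, per_feat in scores.items()}
    return max(combined, key=combined.get), combined


# Hypothetical confidences from two feature extractors.
scores = {
    "benign": {"texture": 0.8, "shape": 0.6},
    "malignant": {"texture": 0.7, "shape": 0.9},
}
label, combined = product_rule(scores)
print(label, combined)  # malignant wins (0.63 vs 0.48)
```

With many feature models, summing log-confidences instead of multiplying avoids numerical underflow and gives the same ranking.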

10. Big Data Real Time Ingestion and Machine Learning

Author: Pal, G; Li, GM; Atkinson, K


Abstract: Data arrive in all shapes and sizes. Often, data are acquired sequentially, as an infinite, ever-growing stream. Such real-time stream data need to be processed sequentially by splitting the data source along temporal boundaries into finite chunks or windows; examples include stock market, sensor, and Twitter feed data. Rather than waiting for data to be collected as a whole at long periodic intervals, streaming analysis lets us identify patterns, and make decisions based on them, as soon as data start arriving. When data are non-stationary and patterns change over time, streaming analyses adapt. At scales where storing raw data becomes impractical, streaming analysis lets us persist only smaller, more targeted representations. This work describes machine learning approaches for analyzing data streams with an intuitive parameterization. Linear regression and K-means clustering are redefined for the streaming context.
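Both models mentioned above can be maintained in constant memory as data arrive. A minimal sketch, assuming one-dimensional regression y = a + b·x fitted from running (optionally decayed) sums, and sequential, MacQueen-style k-means in which each arriving point moves its nearest centroid by 1/n of the distance. The `decay` parameter is a hypothetical addition for non-stationary streams; this is not necessarily the parameterization used in the paper.

```python
class StreamingLinearRegression:
    """y = a + b*x fitted by least squares from (optionally decayed) running sums."""

    def __init__(self, decay: float = 1.0):
        self.decay = decay  # 1.0 keeps all history; < 1.0 down-weights old points
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x: float, y: float) -> None:
        d = self.decay
        self.n = d * self.n + 1
        self.sx = d * self.sx + x
        self.sy = d * self.sy + y
        self.sxx = d * self.sxx + x * x
        self.sxy = d * self.sxy + x * y

    def coefficients(self) -> tuple[float, float]:
        denom = self.n * self.sxx - self.sx**2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b


class SequentialKMeans:
    """Online k-means on points given as tuples of equal length."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        # Counts start at 0, so the initial centroids only seed the first assignment.
        self.counts = [0] * len(self.centroids)

    def update(self, point) -> int:
        # Assign the point to the nearest centroid (squared Euclidean distance).
        j = min(
            range(len(self.centroids)),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, self.centroids[i])),
        )
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # running-mean step size
        self.centroids[j] = [c + eta * (p - c) for p, c in zip(point, self.centroids[j])]
        return j


reg = StreamingLinearRegression()
for x in range(5):
    reg.update(x, 2 * x + 1)
print(reg.coefficients())  # (1.0, 2.0)

km = SequentialKMeans([(0.0,), (10.0,)])
for p in [(1.0,), (2.0,), (9.0,), (11.0,)]:
    km.update(p)
print(km.centroids)  # [[1.5], [10.0]]
```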

11. Supporting Deterministic Wireless Communications in Industrial IoT

Author: Bartolomeu, P; Alam, M; Ferreira, J; Fonseca, JA


Abstract: Wireless communication technologies have become widely adopted, appearing in heterogeneous applications ranging from tracking victims, responders, and equipment in disaster scenarios to machine health monitoring in networked manufacturing industries. These systems are said to have real-time requirements, since data communication must be completed within predefined temporal bounds; failing to meet these bounds may compromise the correct behavior of the system, cause economic losses, or endanger human lives. Supporting real-time communications over license-free bands in open environments is challenging, because real-time medium access can only be achieved through strict timing control of all communicating stations (real-time and non-real-time). However, in open communication environments, the traffic generated by uncontrolled stations cannot be avoided by existing medium access protocols. In this paper, a novel MAC technique named bandjacking is defined, implemented, and assessed. Results demonstrate that low-power deterministic communications can be supported in open environments by using bandjacking.

12. A two-level stacking model for detecting abnormal users in Wechat activities

Author: Ling, Jiayuan; Li, Gangmin

Source: Proceedings - 2019 International Conference on Information Technology and Computer Application, ITCA 2019, 2019, Vol.

Abstract: Machine learning algorithms are widely employed in many classification and regression problems. In the real business world, however, they are confronted with huge volumes of disordered data, and recognizing different kinds of Internet users accurately and quickly becomes a challenge. In a WeChat online bargaining activity, the staff found that some suspicious users closely resembled robots or malicious users, so we tried a two-level stacking model to detect them. This design achieved an accuracy of 0.98 after the training phase and an accuracy of 0.90 on the testing set from a new term. Moreover, the model is adaptable to both linear and nonlinear datasets because of the diversity of its stacked first-level classifiers. This paper therefore indicates the potential of stacking classification models in the big data era. © 2019 IEEE.


Author: Cheng, F; Xiao, JM; Tillo, T


Abstract: In this paper, a motion-information-based 3D video coding method is proposed for the texture-plus-depth 3D video format. The synchronized global motion information of the camcorder is sampled to help the encoder improve its rate-distortion performance. The approach works by projecting temporally previous frames into the position of the current frame using the depth and motion information. These projected frames are added to the reference buffer as virtual reference frames. Because these virtual reference frames are more similar to the current frame than conventional reference frames are, less residual information is required. Experimental results demonstrate that the proposed scheme enhances coding performance under various motion conditions, including rotational and translational motion.

14. Real-time Video Streaming Exploiting the Late-arrival Packets

Author: Xiao, JM; Tillo, T; Lin, CY; Zhao, Y


Abstract: For real-time video applications such as video telephony, the maximum allowed end-to-end transmission delay is usually fixed. Packets that arrive at the destination after the maximum end-to-end delay are treated as late-arrival packets, and traditional video transmission systems discard them. In this paper, to improve system performance, we propose to exploit these packets to update the decoder reference buffer. Two schemes are proposed for exploiting late-arrival packets: the first uses sliding-window updating, in which the updating window moves; the second uses fixed-window updating together with a systematic Reed-Solomon code. Simulation results validate the effectiveness of both schemes without adding extra delay. In both schemes, the size of the updating window is found to play an important role in system performance.

15. Automatic Generation of Electronic Medical Record Based on GPT2 Model

Author: Peng, JK; Ni, P; Zhu, JY; Dai, ZJ; Li, YM; Li, GM; Bai, XM


Abstract: Writing Electronic Medical Records (EMRs), one of doctors' major daily tasks, consumes a great deal of their time and effort. This paper reports our efforts to generate electronic medical records using a language model. Trained on massive real-world EMR data, the CMedGPT2 model we provide can achieve ideal Chinese electronic medical record generation. The experimental results show that the generated EMR text can be applied to auxiliary medical record work, reducing the writing burden and providing a fast and accurate reference for composing records.


Author: Xie, YC; Xiao, JM; Tillo, T; Wei, YC; Zhao, Y


Abstract: Large amounts of redundant information and huge data sizes have been a serious problem for multiview video systems. One popular solution is mixed resolution, where only a few viewpoints are kept at full resolution and the other views are kept at lower resolution. In this paper, we propose a super-resolution (SR) method in which the low-resolution viewpoints of the 3D video are up-sampled using a fully convolutional neural network. By simply projecting the neighboring high-resolution image to the position of the low-resolution image, we learn the relationship between high- and low-resolution patches and reconstruct the low-resolution images at high resolution using the projected image information; the fully convolutional network establishes the mapping between those images. The network is trained on only 17 pairs of multiview images and tested on other multiview images and video sequences. Our proposed method outperforms existing methods both objectively and subjectively, with an average gain of more than 1 dB. Moreover, network training is efficient, taking less than 3 hours on one Titan X GPU.

17. Low-Latency Heterogeneous Networks with Millimeter-Wave Communications

Author: Yang, G; Xiao, M; Alam, M; Huang, YM


Abstract: The heterogeneous network (HetNet) is a key enabler for greatly boosting network coverage and capacity in the forthcoming 5G and beyond. To support explosively growing mobile data volumes, wireless communications with millimeter-wave (mmWave) radios have attracted massive attention and are widely considered a promising candidate for 5G HetNets. In this article, we give an overview of the end-to-end latency of HetNets with mmWave communications. In general, it is rather challenging to formulate and optimize the delay problem with buffers in mmWave communications, since conventional graph-based network optimization techniques are not applicable when queues are considered. To this end, we develop an adaptive low-latency strategy that uses cooperative networking to reduce the end-to-end latency, and we then evaluate its performance. Results reveal the importance of proper cooperative networking in reducing end-to-end latency. In addition, we identify several challenges for future research on low-latency mmWave HetNets.

18. Single image-based head pose estimation with spherical parametrization and 3D morphing

Author: Yuan, H; Li, MY; Hou, JH; Xiao, JM


Abstract: Head pose estimation plays a vital role in various applications, e.g., driver-assistance systems, human-computer interaction, and virtual reality. We propose a novel geometry-based method for accurately estimating the head pose from a single 2D face image at a very low computational cost. Specifically, the rectangular coordinates of only four non-coplanar feature points from a predefined 3D facial model, together with the corresponding points automatically or manually extracted from a 2D face image, are first normalized to exclude the effect of external factors (i.e., the scale factor and translation parameters). Then, the four normalized 3D feature points are represented in spherical coordinates with reference to the sphere that they uniquely determine. Thanks to this spherical parametrization, the coordinates of the feature points can then be effectively morphed along all three directions of the rectangular coordinate system. Finally, the rotation matrix indicating the head pose is obtained by minimizing the Euclidean distance between the normalized 2D feature points and the 2D re-projections of the morphed 3D feature points. Comprehensive experimental results on two popular datasets, Pointing'04 and Biwi Kinect, demonstrate that the proposed method estimates head poses with higher accuracy and lower run time than state-of-the-art geometry-based methods. Even compared with state-of-the-art learning-based methods or geometry-based methods that use additional depth information, our method still achieves comparable performance. (C) 2020 Elsevier Ltd. All rights reserved.
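Four non-coplanar points determine a unique sphere, which is what makes the spherical parametrization above possible. The sketch below normalizes the points (removing translation and scale; centering on the centroid and dividing by the RMS distance is an assumed choice), finds the center of the circumscribed sphere by solving the 3x3 linear system 2(p_i - p_0)·c = |p_i|² - |p_0|² with Cramer's rule, and expresses each point in spherical coordinates about that center. The morphing and rotation-fitting steps are not shown, and the example coordinates are arbitrary.

```python
import math

Vec = tuple[float, float, float]


def normalize(points: list[Vec]) -> list[Vec]:
    """Remove translation (centroid) and scale (RMS distance to the centroid)."""
    n = len(points)
    cx, cy, cz = (sum(p[i] for p in points) / n for i in range(3))
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    rms = math.sqrt(sum(x * x + y * y + z * z for x, y, z in centered) / n)
    return [(x / rms, y / rms, z / rms) for x, y, z in centered]


def det3(m: list[list[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumcenter(p: list[Vec]) -> Vec:
    """Center of the sphere through four non-coplanar points."""
    p0 = p[0]
    # |p_i - c|^2 = |p_0 - c|^2  <=>  2 (p_i - p_0) . c = |p_i|^2 - |p_0|^2
    a = [[2 * (pi[k] - p0[k]) for k in range(3)] for pi in p[1:4]]
    b = [sum(v * v for v in pi) - sum(v * v for v in p0) for pi in p[1:4]]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("points are coplanar")
    center = []
    for col in range(3):  # Cramer's rule: replace one column with b
        m = [row[:col] + [b[i]] + row[col + 1:] for i, row in enumerate(a)]
        center.append(det3(m) / d)
    return (center[0], center[1], center[2])


def to_spherical(p: Vec, center: Vec) -> Vec:
    """(radius, polar angle from +z, azimuth in the x-y plane) about center."""
    x, y, z = (p[i] - center[i] for i in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return r, math.acos(max(-1.0, min(1.0, z / r))), math.atan2(y, x)


model_pts = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0), (1.0, 1.0, 2.0)]
norm = normalize(model_pts)
center = circumcenter(norm)
print([to_spherical(q, center) for q in norm])  # all four radii are equal
```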

19. Real-time forward error correction for video transmission

Author: Xiao, Jimin; Tillo, Tammam; Lin, Chunyu; Zhao, Yao

Source: 2011 IEEE Visual Communications and Image Processing, VCIP 2011, 2011, Vol.

Abstract: When video streams are transmitted over unreliable networks, forward error correction (FEC) codes are usually used to protect them. Reed-Solomon codes are block-based FEC codes. On the one hand, enlarging the block size can enhance the performance of Reed-Solomon codes; on the other hand, a large Reed-Solomon block size leads to long delays that are not tolerable for real-time video applications. In this paper, a novel approach is proposed to improve the performance of Reed-Solomon codes. With the proposed approach, more than one video frame is encompassed in each Reed-Solomon coding block, yet no delay is introduced. Experimental results show that the proposed approach outperforms other real-time error-resilient video coding technologies. © 2011 IEEE.
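Why a larger Reed-Solomon block helps: an (n, k) Reed-Solomon code is maximum distance separable, so a block can be recovered from any k of its n packets, i.e. it tolerates up to n - k erasures. Assuming independent packet losses with probability p (a simplification; real channels are often bursty), the probability that a block cannot be recovered is a binomial tail. The sketch compares two codes with the same 80% code rate; the block sizes and loss rate are illustrative only.

```python
from math import comb


def rs_block_failure_prob(n: int, k: int, p: float) -> float:
    """P(more than n - k of the n packets are lost), assuming i.i.d. losses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1, n + 1))


for n, k in [(10, 8), (50, 40)]:
    print(f"RS({n},{k}): {rs_block_failure_prob(n, k, 0.05):.2e}")
```

With the same redundancy, the longer block fails far less often, but the receiver must wait for the whole block before decoding, which is exactly the delay trade-off described in the abstract.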

20. Cascading One-Class Kernel Subspace Ensembles for Reliable Biopsy Image Classification

Author: Zhang, YG; Zhang, BL; Coenen, F; Lu, WJ


Abstract: Reliable classification of microscopic biopsy images is an important issue in computer-assisted breast cancer diagnosis. In this paper, a new cascade scheme with reject options is proposed for microscopic biopsy image classification. The classification system is built as a serial fusion of two different classifier ensembles with reject options to enhance classification reliability. The first ensemble consists of a set of Kernel Principal Component Analysis (KPCA) one-class classifiers trained for each image class with different image features. The second ensemble is a Random Subspace Support Vector Machine (SVM) ensemble, which focuses on the samples rejected by the first ensemble. For both ensembles, the reject option is implemented so that an ensemble abstains from classifying ambiguous samples if the consensus degree is lower than a threshold. Using a benchmark microscopic biopsy image dataset obtained from the Israel Institute of Technology, the proposed system achieved a classification reliability of 99.46% (with a rejection rate of 1.86%).
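The reject option can be sketched as a consensus test: an ensemble returns a label only if the fraction of its members voting for the majority label reaches a threshold; otherwise it abstains and the sample is passed to the next stage of the cascade, or rejected outright after the last stage. The majority-vote consensus measure, the thresholds, and the toy threshold classifiers below are hypothetical simplifications; in the paper the first stage uses one-class KPCA models and the second a random subspace SVM ensemble.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Classifier = Callable[[float], str]
Stage = tuple[Sequence[Classifier], float]  # (members, consensus threshold)


def ensemble_with_reject(members: Sequence[Classifier], x: float,
                         threshold: float) -> Optional[str]:
    """Majority vote that abstains (returns None) when consensus is below threshold."""
    votes = Counter(m(x) for m in members)
    label, count = votes.most_common(1)[0]
    return label if count / len(members) >= threshold else None


def cascade(stages: Sequence[Stage], x: float) -> Optional[str]:
    """Try each stage in turn; None means every stage abstained (sample rejected)."""
    for members, threshold in stages:
        label = ensemble_with_reject(members, x, threshold)
        if label is not None:
            return label
    return None


def make_clf(cut: float) -> Classifier:
    # Toy member: "benign" below a cut-off score, "malignant" otherwise.
    return lambda x: "benign" if x < cut else "malignant"


stage1 = ([make_clf(0.40), make_clf(0.50), make_clf(0.60)], 1.0)  # unanimous only
stage2 = ([make_clf(0.47), make_clf(0.55)], 1.0)
for x in [0.2, 0.45, 0.5, 0.9]:
    print(x, cascade([stage1, stage2], x))
```

Under this scheme, reliability is measured on the samples the cascade accepts, while the rejection rate is the share of samples for which it returns None.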
Total 134 results found
Copyright 2006-2020 © Xi'an Jiaotong-Liverpool University 苏ICP备07016150号-1 京公网安备 11010102002019号