8 November 2012
My fields of interest include intelligent image/video processing and computer graphics. More specifically, I focus on feature selection; content-based image/video management, retrieval and understanding; affective computing; motion estimation; Chinese ink painting simulation and style learning; video ink-painting stylisation; and hyperspectral image processing.
Many factors affect the success of machine learning on a given task, and the quality of the features is among the most important. Unfortunately, the feature data available for training are often noisy and unreliable. A good feature selection method reduces the number of features and removes irrelevant, redundant or noisy data. This brings immediate benefits to many applications: it speeds up data mining algorithms and improves mining performance, for example predictive accuracy and the comprehensibility of results.
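As a minimal sketch of ranking-based feature selection, the family of methods our work targets, the following Python code scores each feature independently and keeps the k best. The Fisher score used here (between-class scatter divided by within-class scatter) is one common ranking criterion chosen for illustration; it is an assumption and not necessarily one of the methods tested in the Letter, and the function names are hypothetical. Note that k has to be supplied by hand, which is the quantity the algorithm described next determines automatically.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[str]) -> List[float]:
    """Score each feature by between-class over within-class scatter (Fisher score)."""
    n_features = len(X[0])
    labels = sorted(set(y))
    scores: List[float] = []
    for j in range(n_features):
        overall = mean(row[j] for row in X)
        between = 0.0
        within = 0.0
        for c in labels:
            vals = [row[j] for row, lab in zip(X, y) if lab == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating only if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[str], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranking[:k]


# Usage: feature 0 separates the two classes well, feature 2 barely does.
X = [[0.0, 5.0, 1.0], [0.1, 3.0, 2.0], [1.0, 4.0, 1.2], [1.1, 6.0, 2.2]]
y = ["a", "a", "b", "b"]
print(select_top_k(X, y, 2))  # [0, 1]
```

Choosing k too small discards useful features, while choosing it too large wastes computation on weak ones; this trade-off is what an automatic feature-number rule is meant to resolve.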
We have proposed a fast algorithm that automatically determines the number of features to retain in ranking-based feature selection methods, greatly improving computational efficiency with only a small loss in classification accuracy. The subset of each category in the training data set is regarded as a bundle of points in a high-dimensional space, which can be enclosed and described by a convex hull. The regions where two hulls overlap are the main source of difficulty in classification. This motivated us to analyse and estimate the proportion of overlap among the convex hulls, which reveals the general discriminative relevance between one category and the others in the training data set. Two classical algorithms are employed in our Letter: the minimum bounding box algorithm, used to estimate the convex hulls, and the Monte Carlo method, used to estimate the overlapped proportion.
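The overlap estimate can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the Letter's implementation: each class is summarised by an axis-aligned bounding box (a simplification of the minimum bounding box), and the overlapped proportion of a class is assumed to be the fraction of its box covered by at least one other class's box, estimated by sampling points uniformly inside the box. The function names and this exact definition of overlap are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def bounding_box(points: Sequence[Point]) -> Box:
    """Axis-aligned bounding box (assumption: stands in for the minimum bounding box)."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return lo, hi


def inside(x: Point, box: Box) -> bool:
    """True if point x lies within the closed box."""
    lo, hi = box
    return all(l <= v <= h for v, l, h in zip(x, lo, hi))


def overlap_proportion(target: Box, others: List[Box],
                       n_samples: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of `target` covered by any box in `others`."""
    rng = random.Random(seed)
    lo, hi = target
    hits = 0
    for _ in range(n_samples):
        # Draw one point uniformly inside the target box.
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        if any(inside(x, b) for b in others):
            hits += 1
    return hits / n_samples


def per_class_overlap(data: Dict[str, List[Point]], n_samples: int = 10000) -> Dict[str, float]:
    """Overlapped proportion of each class's box with the union of all other classes' boxes."""
    boxes = {label: bounding_box(pts) for label, pts in data.items()}
    return {
        label: overlap_proportion(box, [b for other, b in boxes.items() if other != label], n_samples)
        for label, box in boxes.items()
    }


# Usage: class "a" spans [0,2]x[0,2] and class "b" spans [1,3]x[0,2],
# so each box shares half of its area with the other.
data = {"a": [(0.0, 0.0), (2.0, 2.0), (1.0, 0.5)], "b": [(1.0, 0.0), (3.0, 2.0)]}
print(per_class_overlap(data))
```

Both estimates come out close to 0.5. Sampling is used because the volume of one box covered by the union of several others becomes awkward to compute exactly as the number of classes and dimensions grows, whereas the Monte Carlo estimate only needs point-in-box tests.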
Our work could be applied to most ranking-based feature selection methods. It has been tested on three image databases with three existing feature selection methods. The results show that the average feature selection speed on individual datasets improves by up to about 90%, while the loss in accuracy is at most 5.1%.
We are currently working on improving the performance in two ways. First, using the minimum bounding box algorithm to estimate the region occupied by the feature data can introduce small error regions at the edges and corners of the bounding boxes, and the size of these errors depends on the shape of the distribution region. A more accurate geometric method is therefore needed. Second, the Monte Carlo method is not very efficient, and we believe a more efficient alternative can be found; for example, the distance and density of the data distribution could be taken into account.
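The edge and corner error mentioned above can be made concrete with a small 2D example that compares the area of a point set's convex hull (computed with Andrew's monotone chain and the shoelace formula) against the area of its axis-aligned bounding box. The restriction to 2D, the axis-aligned box and the function names are simplifying assumptions for illustration; the Letter's minimum bounding box may be oriented differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]


def polygon_area(poly: Sequence[Point]) -> float:
    """Shoelace formula for a simple polygon given in vertex order."""
    n = len(poly)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def box_area(points: Sequence[Point]) -> float:
    """Area of the axis-aligned bounding box of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Usage: a triangle-shaped point cloud with two interior points.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.2, 0.2), (0.5, 0.25)]
print(polygon_area(convex_hull(pts)), box_area(pts))  # 0.5 1.0
```

For this triangular distribution the box is twice the size of the hull, so half of the region the box treats as occupied is an empty corner; a rounder or more box-like distribution would give a much smaller error, which is why the error depends on the shape of the distribution region.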
We are currently working on several very interesting projects that require machine learning, intelligent image/video processing and content understanding, and computer graphics techniques. The first is Chinese ink painting style learning: we try to define, extract and identify the painting styles of famous ancient Chinese painters and apply a specific style to another image. The second is Chinese ink painting stylisation of video: building on our previous video object identification techniques, we extract the smooth contours and inner regions of the main objects in a video and stylise those contours and regions in a specific Chinese painting style. The last is damage evaluation for ancient Chinese frescoes: the famous Dunhuang frescoes have suffered from many natural and man-made factors, so we try to measure the damage to the frescoes by analysing hyperspectral images.
Over the past several decades, much excellent work has been contributed to this field. I hope more research with novel methods will be presented to measure the correlation between feature subsets and so improve the accuracy and efficiency of machine learning.
Browse or search all papers in the latest or past issues of Electronics Letters on the IET Digital Library.