MLBio+Laboratory Machine Learning in Biomedical Informatics



TPAMI

IEEE Computer Society
The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) is published monthly. Its Editorial Board strives to publish papers that present important research results within PAMI's scope. These include statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, and specialized architectures for such processing.

PrePrint: Fast Inference with Min-Sum Matrix Product

Mon, 09/19/2011 - 09:42
The MAP inference problem in many graphical models can be solved efficiently using a fast algorithm for computing min-sum products of $n \times n$ matrices. The class of models in question includes cyclic and skip-chain models that arise in many applications. Although the worst-case complexity of the min-sum product operation is not known to be much better than $O(n^3)$, an $O(n^{2.5})$ *expected time* algorithm was recently given, subject to some constraints on the input matrices. In this paper we give an algorithm that runs in $O(n^2 \log n)$ expected time, assuming that the entries in the input matrices are independent samples from a uniform distribution. We also show that two variants of our algorithm are quite fast for inputs that arise in several applications. This leads to significant performance gains over previous inference methods in applications within computer vision and natural language processing.
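
For readers unfamiliar with the operation, the min-sum product replaces the multiply-and-add of an ordinary matrix product with add-and-minimize: $C_{ij} = \min_k (A_{ik} + B_{kj})$. The following is a minimal sketch of the naive $O(n^3)$ computation only, not the expected-time algorithm described in the abstract; the function name `min_sum_product` and the sample matrices are illustrative assumptions.

```python
def min_sum_product(A, B):
    """Naive min-sum product: C[i][j] = min over k of A[i][k] + B[k][j].

    A is an n x m matrix and B an m x p matrix, both given as lists of rows.
    This is the cubic-time baseline, shown only to define the operation.
    """
    n, m, p = len(A), len(B), len(B[0])
    return [
        [min(A[i][k] + B[k][j] for k in range(m)) for j in range(p)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 inputs, e.g. pairwise costs between adjacent variables.
A = [[0, 3], [2, 1]]
B = [[1, 4], [0, 2]]
C = min_sum_product(A, B)
print(C)
```

Chaining such products along a sequence of pairwise cost tables yields the minimum total cost between the end variables, which is why a faster product can speed up MAP inference.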

PrePrint: Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data

Mon, 09/19/2011 - 09:42
One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of the well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.

PrePrint: Robust Visual Tracking and Vehicle Classification via Sparse Representation

Mon, 09/19/2011 - 09:42
In this paper, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an L1-regularized least squares problem. Then the candidate with the smallest projection error is taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework. Two strategies are used to further improve the tracking performance. First, target templates are dynamically updated to capture appearance changes. Second, nonnegativity constraints are enforced to filter out clutter that negatively resembles tracking targets. The proposed approach demonstrates excellent performance in comparison with previously proposed trackers. We also extend the method to simultaneous tracking and recognition by introducing a static template set, which stores target images from different classes. The recognition result at each frame is propagated to produce the final result for the whole video. The approach is validated on a vehicle tracking and classification task using outdoor infrared video sequences.

PrePrint: Sparse Algorithms are not Stable: A No-free-lunch Theorem

Mon, 09/19/2011 - 09:42
We consider two desired properties of learning algorithms: *sparsity* and *algorithmic stability*. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that $\ell_1$-regularized regression (Lasso) cannot be stable, while $\ell_2$-regularized regression is known to have strong stability properties and is therefore not sparse.
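
The sparsity contrast between the two regularizers can be seen in the simplest scalar case. The sketch below assumes a one-dimensional problem, where minimizing $\frac{1}{2}(y-w)^2 + \lambda|w|$ gives the soft-thresholding solution and minimizing $\frac{1}{2}(y-w)^2 + \frac{\lambda}{2}w^2$ gives a rescaled one. The function names are hypothetical, and the example illustrates sparsity only; it is not the stability argument of the paper.

```python
def lasso_1d(y, lam):
    """Minimizer of 0.5*(y - w)**2 + lam*abs(w): soft thresholding."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0  # inputs with |y| <= lam are set exactly to zero


def ridge_1d(y, lam):
    """Minimizer of 0.5*(y - w)**2 + 0.5*lam*w**2: uniform shrinkage."""
    return y / (1.0 + lam)


# A small observation is zeroed by the l1 penalty but only shrunk by the l2 penalty.
for y in (0.3, 2.0):
    print(y, lasso_1d(y, 0.5), ridge_1d(y, 0.5))
```

The $\ell_2$ solution varies smoothly with $y$ and is never exactly zero for nonzero $y$, whereas the $\ell_1$ solution has a threshold below which the coefficient vanishes.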

PrePrint: Vision-based Analysis of Small Groups in Pedestrian Crowds

Mon, 09/19/2011 - 09:42
Building upon state-of-the-art algorithms for pedestrian detection and multi-object tracking, and inspired by sociological models of human collective behavior, we automatically detect small groups of individuals who are traveling together. These groups are discovered by bottom-up hierarchical clustering using a generalized, symmetric Hausdorff distance defined with respect to pairwise proximity and velocity. We validate our results quantitatively and qualitatively on videos of real-world pedestrian scenes. Where human-coded ground truth is available, we find substantial statistical agreement between our results and the human-perceived small group structure of the crowd. Results from our automated crowd analysis also reveal interesting patterns governing the shape of pedestrian groups. These discoveries complement current research in crowd dynamics, and may provide insights to improve evacuation planning and real-time situation awareness during public disturbances.
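
As a point of reference, the classical symmetric Hausdorff distance between two point sets can be sketched as follows. This is an assumption-laden simplification: it uses plain Euclidean distance between 2-D positions, whereas the abstract describes a generalized distance defined over pairwise proximity and velocity. The function names and sample coordinates are illustrative.

```python
import math


def directed_hausdorff(P, Q):
    """Largest distance from a point in P to its nearest neighbour in Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)


def symmetric_hausdorff(P, Q):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(P, Q), directed_hausdorff(Q, P))


# Two hypothetical groups of pedestrian positions (metres).
group_a = [(0.0, 0.0), (1.0, 0.0)]
group_b = [(0.0, 0.0), (4.0, 0.0)]
print(symmetric_hausdorff(group_a, group_b))
```

In bottom-up hierarchical clustering, a distance of this kind between current clusters decides which pair is merged next.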

PrePrint: Holistic Context Models for Visual Recognition

Mon, 09/19/2011 - 09:42
A novel framework for context modeling, based on the probability of co-occurrence of objects and scenes, is proposed. The modeling is quite simple, and builds upon the availability of robust appearance classifiers. Images are represented by their posterior probabilities with respect to a set of contextual models, built upon the bag-of-words image representation, through two layers of probabilistic modeling. The first layer represents the image in a semantic space, where each dimension encodes an appearance-based posterior probability with respect to a concept. Due to the inherent ambiguity of classifying image patches, this representation suffers from a certain amount of contextual noise. The second layer enables robust inference in the presence of this noise, by modeling the distribution of each concept in the semantic space. A thorough and systematic experimental evaluation of the proposed context modeling is presented. It is shown that it captures the contextual "gist" of natural images. Scene classification experiments show that contextual classifiers outperform their appearance-based counterparts, irrespective of the precise choice and accuracy of the latter. The effectiveness of the proposed approach to context modeling is further demonstrated through a comparison to existing approaches on scene classification and image retrieval, on benchmark datasets. In all cases, the proposed approach achieves superior results.

PrePrint: Partially Supervised Speaker Clustering

Mon, 09/19/2011 - 09:42
In this paper, we specifically address the problem of speaker clustering. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm -- linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework.
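
The difference between the two metrics can be illustrated with a small sketch: cosine distance depends only on the direction of a vector, while Euclidean distance also depends on its length. The vectors below are hypothetical low-dimensional stand-ins for GMM mean supervectors, and the function names are assumptions made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """One minus the cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = (1.0, 0.0)
v = (2.0, 0.0)  # same direction as u, different length
w = (0.0, 1.0)  # orthogonal to u

print(euclidean_distance(u, v), cosine_distance(u, v))
print(euclidean_distance(u, w), cosine_distance(u, w))
```

Vectors that scatter mainly in direction are therefore separated by the cosine distance even when their lengths vary, which is the property the abstract advocates exploiting.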

PrePrint: Tracking Pedestrians using Local Spatio-temporal Motion Patterns in Extremely Crowded Scenes

Mon, 09/19/2011 - 09:42
Tracking pedestrians is a vital component of many computer vision applications including surveillance, scene understanding, and behavior analysis. Videos of crowded scenes present significant challenges to tracking due to the large number of pedestrians and the frequent partial occlusions that they produce. The movement of each pedestrian, however, contributes to the overall crowd motion (i.e., the collective motions of the scene's constituents over the entire video) that exhibits an underlying spatially and temporally varying structured pattern. In this paper, we present a novel Bayesian framework for tracking pedestrians in videos of crowded scenes using a space-time model of the crowd motion. We represent the crowd motion with a collection of hidden Markov models trained on local spatio-temporal motion patterns, i.e., the motion patterns exhibited by pedestrians as they move through local space-time regions of the video. Using this unique representation, we predict the next local spatio-temporal motion pattern a tracked pedestrian will exhibit based on the observed frames of the video. We then use this prediction as a prior for tracking the movement of an individual in videos of extremely crowded scenes. We show that our approach of leveraging the crowd motion enables tracking in videos of complex scenes that present unique difficulty to other approaches.

PrePrint: High Accuracy and Visibility-Consistent Dense Multi-view Stereo

Mon, 09/19/2011 - 09:42
Since the initial comparison of Seitz et al. [48], the accuracy of dense multi-view stereovision methods has been increasing steadily. A number of limitations, however, make most of these methods unsuitable for outdoor scenes taken under uncontrolled imaging conditions. The present work consists of a complete dense multi-view stereo pipeline which circumvents these limitations, being able to handle large-scale scenes without sacrificing accuracy. Highly detailed reconstructions are produced within very reasonable time thanks to two key stages in our pipeline: 1) a minimum s-t cut optimization over an adaptive domain that robustly and efficiently filters a quasi-dense point cloud from outliers and reconstructs an initial surface by integrating visibility constraints, followed by 2) a mesh-based variational refinement that captures small details, smartly handling photo-consistency, regularization and adaptive resolution. The pipeline has been tested over a wide range of scenes: from classic compact objects taken in a laboratory setting, to outdoor architectural scenes, landscapes and cultural heritage sites. The accuracy of its reconstructions has also been measured on the dense multi-view benchmark proposed by Strecha et al. [59], showing the results to compare more than favorably with the current state-of-the-art methods.

PrePrint: Active Segmentation

Mon, 09/19/2011 - 09:42
Attention is an integral part of the human visual system and has been widely studied in the visual attention literature. The human eyes fixate at important locations in the scene and every "fixation point" lies inside a particular region of arbitrary shape and size, which can either be an entire object or a part of it. Using that fixation point as an identification marker on the object, we propose a method to segment the object of interest by finding the "optimal" closed contour around the fixation point in the polar space, avoiding the perennial problem of scale in the Cartesian space. The proposed segmentation process is carried out in two separate steps: first, all visual cues are combined to generate the probabilistic boundary edge map of the scene; second, in this edge map, the "optimal" closed contour around a given fixation point is found. Having two separate steps also makes it possible to establish a simple feedback between the mid-level cue (regions) and the low-level visual cues. In fact, we propose a segmentation refinement process based on a feedback process. Finally, our experiments show the promise of the proposed method as an automatic segmentation framework for a general purpose visual system.

PrePrint: Accelerated Hypothesis Generation for Multi-Structure Data via Preference Analysis

Mon, 09/19/2011 - 09:42
Random hypothesis generation is integral to many robust geometric model fitting techniques. Unfortunately it is also computationally expensive, especially for higher-order geometric models and heavily contaminated data. We propose a fundamentally new approach to accelerate hypothesis sampling by guiding it with information derived from residual sorting. We show that residual sorting innately encodes the probability that two points arose from the same model, and is obtained without recourse to domain knowledge (e.g. keypoint matching scores) typically used in previous sampling enhancement methods. More crucially, our approach encourages sampling within coherent structures and thus can very rapidly generate all-inlier minimal subsets that maximise the robust criterion. Sampling within coherent structures also affords a natural ability to handle multi-structure data, a condition that is usually detrimental to other methods. The result is a sampling scheme that offers substantial speed-ups on common computer vision tasks such as homography and fundamental matrix estimation. We show on many computer vision datasets, especially those with multiple structures, that ours is the only method capable of retrieving satisfactory results within realistic time budgets.

PrePrint: The Light Field Camera: Extended Depth of Field, Aliasing and Super-resolution

Mon, 09/19/2011 - 09:42
Portable light field (LF) cameras have demonstrated capabilities beyond conventional cameras. In a single snapshot, they enable digital image refocusing and 3D reconstruction. We show that they obtain a larger depth of field but maintain the ability to reconstruct detail at high resolution. In fact, all depths are approximately focused, except for a thin slab where blur size is bounded; i.e. their depth of field is essentially inverted compared to regular cameras. Crucial to their success is the way they sample the LF, trading off spatial vs. angular resolution, and how aliasing affects the LF. We show that applying traditional multi-view stereo methods to the extracted low-resolution views can result in reconstruction errors due to aliasing. We address these challenges using an explicit image formation model, and incorporate Lambertian and texture preserving priors to reconstruct both scene depth and its super-resolved texture in a variational Bayesian framework, eliminating aliasing by fusing multi-view information. We demonstrate the method on synthetic and real images captured with our LF camera, and show that it can outperform other computational camera systems.

PrePrint: Mean Shift Trackers with Cross-Bin Metrics

Mon, 09/19/2011 - 09:42
Cross-bin metrics have been shown to be more suitable than bin-by-bin metrics for measuring the distance between histograms in various applications. In particular, a visual tracker that minimizes the earth mover's distance (EMD) between the candidate and reference feature histograms has been recently proposed. This tracker was shown to be more robust than the Mean Shift tracker, which employs a bin-by-bin metric. In each frame, the former tracker iteratively shifts the candidate location by one pixel in the direction opposite to the EMD's gradient until no improvement is made. This optimization process involves the clustering of the candidate feature density in feature space, as well as the computation of the EMD between the candidate and reference feature histograms after each shift of the candidate location. In this paper, alternative trackers that employ cross-bin metrics as well, but that are based on Mean Shift (MS) iterations, are derived. The proposed trackers are simpler and faster due to 1. the use of MS-based optimization, which is not restricted to single pixel shifts, 2. the abstention from any clustering of feature densities, and 3. the abstention from EMD computations in multidimensional spaces.
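
The distinction between bin-by-bin and cross-bin metrics can be made concrete with one-dimensional histograms. The sketch below assumes histograms of equal total mass with unit ground distance between adjacent bins, in which case the EMD reduces to the sum of absolute differences between cumulative histograms. It illustrates the metrics only, not the trackers proposed in the paper, and the function names are hypothetical.

```python
from itertools import accumulate


def l1_distance(h, g):
    """Bin-by-bin metric: compares only bins with the same index."""
    return sum(abs(a - b) for a, b in zip(h, g))


def emd_1d(h, g):
    """Cross-bin metric: 1-D earth mover's distance for equal-mass histograms,
    computed as the L1 distance between the cumulative histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(h), accumulate(g)))


reference = [1, 0, 0]
near = [0, 1, 0]  # mass moved by one bin
far = [0, 0, 1]   # mass moved by two bins

print(l1_distance(reference, near), l1_distance(reference, far))
print(emd_1d(reference, near), emd_1d(reference, far))
```

The bin-by-bin metric cannot tell the small shift from the large one, while the cross-bin metric grows with the distance the mass has moved.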

PrePrint: Image Restoration by Matching Gradient Distributions

Mon, 09/19/2011 - 09:42
A common image restoration method is to use a MAP estimator, which maximizes a posterior probability to reconstruct a clean image from a degraded image. A MAP estimator, when used with a sparse gradient image prior, reconstructs piecewise smooth images and typically removes textures that are important for visual realism. We present an alternative deconvolution method called iterative distribution reweighting (IDR) which imposes a global constraint on gradients so that a reconstructed image should have a gradient distribution similar to a reference distribution. In natural images, a reference distribution not only varies from one image to another, but also within an image depending on texture. We estimate a reference distribution directly from an input image for each texture segment. Our algorithm is able to restore rich mid-frequency textures. A large scale user study supports the conclusion that our algorithm improves the visual realism of reconstructed images compared to those of MAP estimators.

PrePrint: Tracking Mobile Users in Wireless Networks via Semi-Supervised Co-Localization

Mon, 09/19/2011 - 09:42
Recent years have witnessed growing popularity of sensor and sensor-network technologies, supporting important practical applications. One of the fundamental issues is how to accurately locate a user with few labelled data in a wireless sensor network, which requires knowledge about the locations of signal transmitters, or access points. To solve this problem, we have developed a novel machine-learning-based approach that combines collaborative filtering with graph-based semi-supervised learning to learn both mobile users' locations and the locations of access points. Our framework exploits both labelled and unlabelled data from mobile devices and access points. In our two-phase solution, we first build a manifold-based model from a batch of labelled and unlabelled data in an offline training phase and then use a weighted k-nearest-neighbor method to localize a mobile client in an online localization phase. We extend the two-phase co-localization to an online and incremental model that can deal with labelled and unlabelled data that come sequentially. Finally, we embed an action model into the framework such that additional kinds of sensor signals can be utilized to further boost the performance of mobile tracking. Compared to other state-of-the-art systems, our framework has been shown to be more accurate while requiring less calibration effort in our experiments performed at three different test-beds.

PrePrint: Quantifying and Transferring Contextual Information in Object Detection

Mon, 09/19/2011 - 09:42
Context modelling is challenging because there are often many different types of context co-existing with different degrees of relevance to target objects. It is therefore crucial to automatically quantify and select the most effective context for object detection. Nevertheless, the diversity of context means that learning a robust context model requires a larger training set than learning the target object appearance model, which may not be available in practice. In this work, a novel context modelling framework is proposed without the need for any prior scene segmentation or context annotation. In particular, to quantify context explicitly, we propose a new maximum margin context (MMC) model. Furthermore, to address context learning with limited data, we propose two context transfer learning models based on the observation that although two categories of objects can have very different visual appearance, there can be similarity in their context and/or the way contextual information helps to disambiguate target objects and non-target objects. Thus training samples from auxiliary classes are utilised to improve the context model for detecting the target class. Extensive experiments have been carried out to validate the effectiveness of the proposed models as compared to alternative context models.

PrePrint: Does the Cost Function Matter in Bayes Decision Rule?

Mon, 09/19/2011 - 09:42
In many tasks in pattern recognition, such as automatic speech recognition, optical character recognition, part-of-speech tagging and other string recognition tasks, we are faced with a well-known inconsistency: Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas in practice we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as the evaluation measure. The topic of this work is to analyze the relation between string (i.e. 0-1) and symbol error (i.e. metric, integer-valued) cost functions in Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived, for which Bayes decision rule with integer-valued metric cost function, and with 0-1 cost give the same decisions, or lead to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
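
A toy example shows how the two cost functions can lead to different decisions. The sketch below assumes a hypothetical posterior over three strings and restricts candidate decisions to those strings; under 0-1 cost the rule picks the most probable string, while under Levenshtein cost it picks the string with the smallest expected edit distance. The function names and the posterior values are illustrative assumptions, not data from the paper.

```python
def levenshtein(s, t):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]


def decide_zero_one(posterior):
    """Bayes decision under 0-1 (string error) cost: the posterior mode."""
    return max(posterior, key=posterior.get)


def decide_edit_distance(posterior):
    """Bayes decision under Levenshtein cost, over the same candidate set."""
    def expected_cost(candidate):
        return sum(p * levenshtein(candidate, w) for w, p in posterior.items())
    return min(posterior, key=expected_cost)


# Hypothetical posterior: the mode "aa" is far from the other two hypotheses.
posterior = {"aa": 0.4, "bb": 0.35, "bc": 0.25}
print(decide_zero_one(posterior), decide_edit_distance(posterior))
```

Here the two less probable strings are close to each other in edit distance, so choosing one of them lowers the expected symbol error even though it is not the most probable string.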

PrePrint: Robust Active Stereo Vision Using Kullback-Leibler Divergence

Mon, 09/19/2011 - 09:42
Active stereo vision is a method of 3-D surface scanning involving the projecting and capturing of a series of light patterns where depth is derived from correspondences between the observed and projected patterns. In contrast, passive stereo vision reveals depth through correspondences between textured images from two or more cameras. By employing a projector, active stereo vision systems find correspondences between two or more cameras, without ambiguity, independent of object texture. In this paper, we present a hybrid 3-D reconstruction framework that supplements projected pattern correspondence matching with texture information. The proposed scheme consists of using projected pattern data to derive initial correspondences across cameras and then using texture data to eliminate ambiguities. Pattern modulation data is then used to estimate error models from which Kullback-Leibler divergence refinement is applied to reduce mis-registration errors. Using only a small number of patterns, the presented approach reduces measurement errors versus traditional structured light and phase matching methodologies while being insensitive to gamma distortion, projector flickering, and secondary reflections. Experimental results demonstrate these advantages in terms of enhanced 3-D reconstruction performance in the presence of noise, deterministic distortions, and conditions of texture and depth contrast.

PrePrint: Recursive Segmentation and Recognition Templates for Image Parsing

Mon, 09/19/2011 - 09:42
In this paper, we propose a Hierarchical Image Model (HIM) which parses images to perform segmentation and object recognition. The HIM represents the image recursively by segmentation and recognition templates at multiple levels of the hierarchy. This has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information (similar to how natural language models represent sentence structure in terms of hierarchical representations such as verb and noun phrases). Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which yields the first polynomial time algorithm for image labeling. Thirdly, we learn the HIM efficiently using machine learning methods from a labeled dataset. We demonstrate that the HIM is comparable with the state-of-the-art methods by evaluation on the challenging

PrePrint: IrisCode Decompression Based on the Dependence between its Bit Pairs

Mon, 09/19/2011 - 09:42
IrisCode is an iris recognition algorithm developed in 1993 and continuously improved by Daugman. Understanding IrisCode's properties is extremely important because over 60 million persons have been mathematically enrolled by the algorithm. In this paper, IrisCode is proved to be a compression algorithm, which is to say, its templates are compressed iris images. In our experiments, the compression ratio of these images is 1:655. An algorithm is designed to perform this decompression by exploiting a graph composed of the bit pairs in IrisCode, prior knowledge from iris image databases, and the theoretical results. To remove artifacts, two post-processing techniques that carry out optimization in the Fourier domain are developed. Decompressed iris images obtained from two public iris image databases are evaluated by visual comparison, two objective image quality assessment metrics and eight iris recognition methods. The experimental results show that the decompressed iris images retain iris texture, that their quality is roughly equivalent to a JPEG quality factor of ten and that the iris recognition methods can match the original images with the decompressed images. This paper also discusses the impacts of these theoretical and experimental findings on privacy and security.

