Contact Me
Office: 4423 Engr Building
Office Hours: T 4:00-5:00 pm
rangwala@cs.gmu.edu
703-993-3826
Bioinformatics & Data Mining
PrePrint: Manifold Learning for Visualizing and Analyzing High-dimensional Data
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Due to the "curse of dimensionality", it is difficult to analyze high-dimensional data effectively using traditional statistical methods. Assuming that such data is generated from intrinsic variables with lower dimensions, manifold learning reveals intrinsic structures latent in high-dimensional data spaces. It has attracted research interest from many domains in statistics and artificial intelligence. In this paper, we give a tutorial survey of several key and common manifold learning algorithms. We describe their applications to tasks in high-dimensional data analysis and visualization. We discuss the pros and cons of these algorithms and how to avoid their pitfalls.
Categories: Bioinformatics & Data Mining
PrePrint: An Artificial Urban Health Care System and Applications
Intelligent Systems -
Mon, 01/25/2010 - 14:50
In recent years, reform and development of the urban health care system (HCS) in China has attracted increasing attention. An urban HCS includes Community Health Systems (CHS, e.g. community hospitals) and Medical Delivery Systems (MDS, e.g. general hospitals). More cooperation between these two systems could greatly improve overall health care by making medical service more convenient and cost-effective. The urban HCS is a complex system extensively influenced by human behavior. Therefore, it is important to study patient hospital preferences and related patient decision-making behavior. Using Agent-Based Modeling and Simulation (ABMS) as an innovative tool, we build an artificial HCS as a platform on which to study medical cooperation, such as sharing beds, sharing doctors and cost accommodation. The results show that patient access time can be greatly reduced and the overuse of resources in MDS is relieved. We also develop a referral appointment system based on linear programming, and present its benefits and capabilities. By adopting the artificial HCS platform in practice, researchers can provide valuable insights for urban health care management.
Categories: Bioinformatics & Data Mining
PrePrint: Adversarial Knowledge Discovery
Intelligent Systems -
Mon, 01/25/2010 - 14:50
In adversarial settings, there are those who wish to conceal their existence, properties and activities from data analysis. This substantially changes the knowledge discovery process -- finding a model that best 'fits' the data is unhelpful because it provides adversaries with predictable ways to hide, and ways to manipulate. We survey some of the implications for algorithms and process, and suggest some open problems.
Categories: Bioinformatics & Data Mining
PrePrint: Software Agent-based Intelligence for Code-centric RFID Systems
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Radio frequency identification (RFID) is a kind of electronic identification technology that is becoming widely deployed. Due to its intrinsic small size and low cost features, the RFID technology can be readily integrated into various systems for future smart environment applications, whereby vital information is retrieved by diverse types of communications networks. In order to launch a specific service in an existing RFID system, object identification is first performed to retrieve the corresponding service codes from a backend database. However, the critical gaps that may exist in the identification recognition and subsequent handover of service codes from a database to a service machine can make it challenging to offer a good quality of service. This paper introduces a Code-centric RFID System based on an agent intelligence scheme that can potentially achieve faster service response. In this system, we replace traditional ID numbers with codes that indicate the service that the RFID tag bearer needs for improved system response.
Categories: Bioinformatics & Data Mining
PrePrint: I-Room: a Virtual Space for Intelligent Interaction
Intelligent Systems -
Mon, 01/25/2010 - 14:50
An I-Room is a virtual environment for purposeful interaction. It is intended to provide support for a range of collaborative activities, especially those that involve deliberation and decision-making. The I-Room acts as a space in which information can be collected, arranged and maintained, and in which participants can collaborate using a variety of communication, presentation and support tools. This concept is founded on a number of complementary principled approaches for guiding purposeful behaviour, which in turn provide a basis for calls to external intelligent systems and knowledge bases. Prototype I-Rooms have been constructed using a popular virtual world platform and used for interactive work and leisure activities; several of these applications are presented here to illustrate the concept.
Categories: Bioinformatics & Data Mining
PrePrint: Context Aware Emotional Model for Group Decision Making
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Decision making is the cognitive process leading to the selection of a course of action among alternatives; indeed, decision making is said to be a psychological construct, depending on the individual or individuals. Although emotions are an important factor in individuals' everyday lives, they are often overlooked in the development of systems intended for use by people. In this paper we present a context-aware model of emotions for designing intelligent agents endowed with emotional capabilities, which can in turn be used to simulate group decision-making processes. Our experiments show that agents endowed with emotional awareness are able to reach agreements more rapidly.
Categories: Bioinformatics & Data Mining
PrePrint: Context-Aware Middleware for Multimedia Services in Heterogeneous Networks
Intelligent Systems -
Mon, 01/25/2010 - 14:50
An important challenge for supporting multimedia applications in heterogeneous networks is the heterogeneity of fixed and mobile access networks. In this work, we design a new and efficient context-aware middleware for facilitating diverse multimedia services in heterogeneous network environments. First, we present an adaptive service provisioning middleware for handling the heterogeneity of diverse networks and enabling service provisioning to mobile users and professionals anywhere, anytime. Then, a context-aware multimedia middleware framework is presented based on the proposed adaptive service provisioning framework to support diverse multimedia services, including multimedia content filtering, recommendation, adaptation, aggregation, learning, reasoning, and delivery. To the best of our knowledge, this study is the first to provide a general heterogeneous multimedia middleware by jointly considering the characteristics of context-multimedia service and heterogeneous networks.
Categories: Bioinformatics & Data Mining
PrePrint: Predicting Performance on a Repetitive Task through Automatic Analysis of Facial Feature Movements
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Accurately predicting human error remains an ongoing problem in many industrial settings. The combined complexity of motor, perceptive, and decision-making activities leads to a vast range of human errors, making it difficult to devise models capable of predicting and avoiding them. Our research proposes a novel behavior-based approach to human performance prediction using computer vision and machine learning. Using facial features automatically extracted from short video segments of experimental participants, we created models to predict participant performance over the entire task, over each phase of the task, and at any given instant within the task (i.e., individual errors). The models successfully predicted human performance with over 90% accuracy across classification categories. We discuss both theoretical and applied implications.
Categories: Bioinformatics & Data Mining
PrePrint: A Lexicon Enhanced Method for Sentiment Classification: An Experiment on Online Product Reviews
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Previous sentiment classification studies have adopted either the machine learning approach or the semantic orientation approach. For this study, we proposed a lexicon enhanced method for sentiment classification by combining these two approaches into one framework. Specifically, we used the words with semantic orientations as an additional dimension of features (referred to as "sentiment features") for the machine learning classifiers. We examined the performance of our proposed method through experiments using five different online product review data sets: digital cameras, books, DVDs, electronics, and kitchen appliances. Among them, the first data set was collected by the authors and the remaining four were publicly available. Experiments on the different data sets consistently indicated that adding sentiment features significantly improved sentiment classification performance. For the four public data sets, the best classification results were achieved when all three types of features (i.e., content-free, content-specific, and sentiment features) were combined and feature selection was conducted. For the digital camera data set, the best performance was achieved when using the combined features without feature selection.
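The idea of adding lexicon-derived "sentiment features" to ordinary text features can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tiny `POSITIVE` and `NEGATIVE` word sets, the `extract_features` helper, and the feature names are all hypothetical, and a real study would use a curated lexicon and a proper classifier.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"good", "great", "excellent", "sharp"}
NEGATIVE = {"bad", "poor", "blurry", "broken"}


def extract_features(review: str) -> dict:
    """Return bag-of-words counts plus two lexicon-based sentiment features."""
    # Crude tokenization: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,!?;:").lower() for t in review.split()]
    tokens = [t for t in tokens if t]

    # Content features: one count per distinct word.
    features = dict(Counter(f"word={t}" for t in tokens))

    # Sentiment features: how many tokens carry a known orientation.
    features["sent_pos"] = sum(t in POSITIVE for t in tokens)
    features["sent_neg"] = sum(t in NEGATIVE for t in tokens)
    return features


feats = extract_features("Great camera, sharp pictures, but poor battery.")
```

The resulting dictionary could be fed to any machine learning classifier that accepts sparse feature maps; the two `sent_*` entries are the extra dimension contributed by the lexicon.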
Categories: Bioinformatics & Data Mining
PrePrint: Will intelligent assets take off? Towards self-serving aircrafts
Intelligent Systems -
Mon, 01/25/2010 - 14:50
In this article we present the self-serving asset, developed as part of a research project at the Boeing Company and the University of Cambridge. The self-serving asset is self-aware, and has the goal of maximising its life in service by contacting, selecting and procuring service providers autonomously. The result is an open, consistent service chain where complex database transactions are eliminated, and an emergent, yet rather self-capable system starts to materialise. Among the various supporting technologies, multi-agent systems provide the backbone for the "intelligence" characteristic required from the self-serving asset. Intelligent asset agents monitor assets, contact suppliers, use multi-criteria decision making to select among proposals, and handle competition. In this paper we aim to outline the self-serving asset concept, describe the multi-agent platform designed to support the asset, and present experimental results on the preliminary agent architecture in terms of decision optimality, scalability and stability.
Categories: Bioinformatics & Data Mining
PrePrint: Reference Resolution Challenges for an Intelligent Agent: The Need for Knowledge
Intelligent Systems -
Mon, 01/25/2010 - 14:50
This paper presents a vision of how language-endowed, next-generation intelligent agents might resolve – i.e., fully interpret – references to objects and events in language input. It describes some of the more difficult reference phenomena that are not being sufficiently treated by practical systems and suggests what kinds of knowledge must be available to intelligent agents to enable them to reach human competence in reference resolution.
Categories: Bioinformatics & Data Mining
PrePrint: Converting a Historical Encyclopedia of Architecture into a Semantic Knowledge Base
Intelligent Systems -
Mon, 01/25/2010 - 14:50
The historic Encyclopedia of Architecture, written in German between 1880 and 1943, was one of the largest projects aiming at conserving all architectural knowledge available at that time. Today, its vast amount of content is mostly lost: few complete sets are available, and its complex structure does not lend itself easily to contemporary application. We show how modern semantic technologies can be applied to make these heritage documents accessible again. In particular, we demonstrate how to go beyond classical digitization projects by transforming the historical documents into a semantic knowledge base. Using techniques from natural language processing and the Semantic Web, we show how to automatically populate an ontology that can be used for various application scenarios: building historians can use it to navigate and query the encyclopedia, while architects can directly integrate it into contemporary construction tools. Additionally, all content is made accessible in a user-friendly Wiki interface that combines original text with NLP-derived metadata and adds annotation capabilities for collaborative use.
Categories: Bioinformatics & Data Mining
IEEE Intelligent Systems - November/December 2009 (Vol. 24, No. 6)
Intelligent Systems -
Mon, 01/25/2010 - 14:50
Categories: Bioinformatics & Data Mining
PrePrint: Semi-Supervised Classification via Local Spline Regression
TPAMI -
Mon, 01/25/2010 - 14:50
We present local spline regression: a new approach to semi-supervised classification. The core idea of our approach is to introduce splines developed in Sobolev space to map the data points to class labels. Specifically, in each data neighborhood, an optimal spline is estimated under the regularized least squares regression framework. With this spline, the neighboring data points are mapped, and the quadratic loss is evaluated and then formulated in terms of the class label vector. Such local losses evaluated on all of the neighborhoods are finally accumulated together to construct a learning model with global consistency. Finally, a transductive classification algorithm is developed. Comparative classification experiments on many public data sets and applications to interactive image segmentation and image matting illustrate the validity of our method.
Categories: Bioinformatics & Data Mining
PrePrint: Evaluating Stability and Comparing Output of Feature Selectors that Optimize Feature Subset Cardinality
TPAMI -
Mon, 01/25/2010 - 14:50
Stability (robustness) of feature selection methods is a topic of recent interest, yet of often neglected importance, with direct impact on the reliability of machine learning systems. We investigate the problem of evaluating the stability of feature selection processes yielding subsets of varying size. We introduce several novel feature selection stability measures and adjust some existing measures in a unifying framework that offers broad insight into the stability problem. We study in detail the properties of the considered measures and demonstrate on various examples what information about the feature selection process can be gained. We also introduce an alternative approach to feature selection evaluation in the form of measures that enable comparing the similarity of two feature selection processes. These measures enable comparing, e.g., the output of two feature selection methods or two runs of one method with different parameters. The information obtained using the considered stability and similarity measures is shown to be usable for assessing feature selection methods (or criteria) as such.
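To make the notion of comparing feature subsets of varying size concrete, here is a minimal sketch that uses the Jaccard index as a stand-in similarity. This is an assumption chosen for illustration: the abstract does not say which measures the paper defines, and the helper names `jaccard` and `average_pairwise_similarity` are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; works for subsets of different sizes."""
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def average_pairwise_similarity(subsets: list) -> float:
    """Mean Jaccard index over all pairs of selected-feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Feature indices selected in three runs of some selector on resampled data.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1}]
stability = average_pairwise_similarity(runs)
```

A value near 1 indicates that the runs select nearly the same features; a value near 0 indicates that the selections barely overlap. The same `jaccard` function can compare the outputs of two different selection methods.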
Categories: Bioinformatics & Data Mining
PrePrint: PADS: A Probabilistic Activity Detection Framework for Video Data
TPAMI -
Mon, 01/25/2010 - 14:50
There is now a growing need to identify various kinds of activities that occur in videos. In this paper, we first present a logical language called Probabilistic Activity Description Language (PADL) in which users can specify activities of interest. We then develop a probabilistic framework which assigns to any subvideo of a given video sequence a probability that the subvideo contains the given activity, and we finally develop two fast algorithms to detect activities within this framework. OffPad finds all minimal segments of a video that contain a given activity with a probability exceeding a given threshold. In contrast, the OnPad algorithm examines a video during playout (rather than afterwards as OffPad does) and computes the probability that a given activity is occurring (even if the activity is only partially complete). Our prototype Probabilistic Activity Detection System (PADS) implements the framework and the two algorithms, building on top of existing image processing algorithms. We have conducted detailed experiments and compared our approach to four different approaches presented in the literature. We show that - for complex activity definitions - our approach outperforms all the other approaches.
Categories: Bioinformatics & Data Mining
PrePrint: Stereo Matching with Mumford-Shah Regularization and Occlusion Handling
TPAMI -
Mon, 01/25/2010 - 14:50
This paper addresses the problem of correspondence establishment in binocular stereo vision. We suggest a novel spatially continuous approach for stereo matching based on the variational framework. The proposed method suggests a unique regularization term based on the Mumford-Shah functional for discontinuity preservation, combined with a new energy functional for occlusion handling. The evaluation process is based on concurrent minimization of two coupled energy functionals, one for domain segmentation (occluded vs. visible) and the other for disparity evaluation. In addition to a dense disparity map, our method also provides an estimate of the half-occlusion domain, and a discontinuity function locating the disparity/depth boundaries. Two new constraints are introduced that improve the revealed discontinuity map. The experimental tests include a wide range of real data sets from the Middlebury stereo database. The results demonstrate the capability of our method in calculating an accurate disparity function with sharp discontinuities and in recovering the occlusion map. Significant improvements are shown compared to a recently published variational stereo approach. A comparison on the Middlebury stereo benchmark with sub-pixel accuracies shows that our method is currently among the top-ranked stereo matching algorithms.
Categories: Bioinformatics & Data Mining
PrePrint: A Hierarchical Visual Model for Video Object Summarization
TPAMI -
Mon, 01/25/2010 - 14:50
We propose a novel method for removing irrelevant frames from a video given user-provided frame-level labeling for a very small number of frames. We first hypothesize a number of windows which possibly contain the object of interest, and then figure out which window(s) truly contain the object of interest. Our method enjoys several favorable properties. First, compared to approaches where a single descriptor is used to describe a whole frame, each window's feature descriptor has the chance of genuinely describing the object of interest, hence it is less affected by background clutter. Second, by considering the temporal continuity of a video instead of treating frames as independent, we can hypothesize the location of the windows more accurately. Third, by infusing prior knowledge into the patch-level model, we can precisely follow the trajectory of the object of interest. This allows us to largely reduce the number of windows and hence reduce the chance of overfitting the data during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips.
Categories: Bioinformatics & Data Mining
PrePrint: Script Recognition - A Review
TPAMI -
Mon, 01/25/2010 - 14:50
A variety of different scripts are used in writing languages throughout the world. In a multi-script, multilingual environment, it is essential to know the script used in writing a document before an appropriate character recognition and document analysis algorithm can be chosen. In view of this, several methods for automatic script identification have been developed so far. They mainly belong to two broad categories – structure-based and visual appearance-based techniques. This survey report gives an overview of the different script identification methodologies under each of these categories. Methods for script identification in online data and video-texts are also presented. It is noted that research in this field is relatively thin and that more remains to be done, particularly in the case of handwritten documents.
Categories: Bioinformatics & Data Mining
PrePrint: Tuning Support Vector Machines for Minimax and Neyman-Pearson Classification
TPAMI -
Mon, 01/25/2010 - 14:50
This paper studies the training of support vector machine (SVM) classifiers with respect to the minimax and Neyman-Pearson criteria. In principle, these criteria can be optimized in a straightforward way using a cost-sensitive SVM. In practice, however, because these criteria require especially accurate error estimation, standard techniques for tuning SVM parameters, such as cross-validation, can lead to poor classifier performance. To address this issue, we first prove that the usual cost-sensitive SVM, here called the 2C-SVM, is equivalent to another formulation called the 2ν-SVM. We then exploit a characterization of the 2ν-SVM parameter space to develop a simple yet powerful approach to error estimation based on smoothing. In an extensive experimental study we demonstrate that smoothing significantly improves the accuracy of cross-validation error estimates, leading to dramatic performance gains. Furthermore, we propose coordinate descent strategies that offer significant gains in computational efficiency, with little to no loss in performance.
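The two criteria named in the abstract can be illustrated on a fixed set of classifier scores. The minimax criterion minimizes the larger of the false-alarm and miss rates, while the Neyman-Pearson criterion minimizes the miss rate subject to a false-alarm rate of at most `alpha`. The sketch below only selects a decision threshold by exhaustive search. It is not the 2C-SVM or 2ν-SVM formulation and it does not implement the smoothing approach; the function names and the toy data are assumptions.

```python
def error_rates(scores, labels, threshold):
    """Return (false-alarm rate, miss rate) for: predict +1 if score >= threshold."""
    neg = [s for s, y in zip(scores, labels) if y == -1]
    pos = [s for s, y in zip(scores, labels) if y == +1]
    p_f = sum(s >= threshold for s in neg) / len(neg)  # negatives flagged as +1
    p_m = sum(s < threshold for s in pos) / len(pos)   # positives missed
    return p_f, p_m


def candidate_thresholds(scores):
    """Every distinct score, plus +inf (the rule that never predicts +1)."""
    return sorted(set(scores)) + [float("inf")]


def neyman_pearson_threshold(scores, labels, alpha):
    """Lowest miss rate among thresholds whose false-alarm rate is <= alpha."""
    feasible = []
    for t in candidate_thresholds(scores):
        p_f, p_m = error_rates(scores, labels, t)
        if p_f <= alpha:
            feasible.append((p_m, t))
    # +inf is always feasible for alpha >= 0, so the list is non-empty.
    return min(feasible)[1]


def minimax_threshold(scores, labels):
    """Threshold minimizing max(false-alarm rate, miss rate)."""
    return min(candidate_thresholds(scores),
               key=lambda t: max(error_rates(scores, labels, t)))


# Toy scores (assumed): four negatives followed by four positives.
scores = [0.1, 0.2, 0.4, 0.6, 0.3, 0.5, 0.7, 0.9]
labels = [-1, -1, -1, -1, +1, +1, +1, +1]
np_t = neyman_pearson_threshold(scores, labels, alpha=0.25)
mm_t = minimax_threshold(scores, labels)
```

Because both criteria depend on empirical error rates, noisy estimates of those rates translate directly into poor threshold or parameter choices, which is the difficulty the abstract attributes to standard cross-validation.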
Categories: Bioinformatics & Data Mining
News Highlights
- New paper (in press) at Journal of Bioinformatics and Computational Biology.
- Salman successfully defends Masters Thesis.
- Software released: svmPRAT, with a paper at BMC Bioinformatics.
- New funding from NIH as part of ARRA (Grand Opportunities RC2)
- Syed F to join the Lab.
- Paper Accepted at Journal of Chemical Information & Modeling
