Seminars and Events
Volgenau School of Engineering: Spring Graduate Student Orientation
Wednesday, January 16, 2013
Oral Defense of Doctoral Dissertation: Delay-Based Methods for Robust Geolocation of Internet Hosts
Tuesday, January 22, 2013
Accurate geolocation of IP addresses, is increasingly important in many applications, such as targeted delivery of localized content over Internet, prevention of Internet, detection and prevention of cyberattacks and cyberterrorism, etc. The current geolocation algorithms can be divided into several classes according to the data that is used for determining the geographic location.
Use of round trip delay measurements for geolocation has proved not very reliable in the past, because of the non-linear correlation between distances and delays generated by the network congestion, queuing delay and circuitous routes. This thesis contributes to the advancement of two classes of delay-based geolocation methods. The first contribution is a family of pure delay-based algorithms based on a general class of proximity measures. When such measures are carefully chosen to discard the data which contains little information about the geographical location of a target IP address, the resulting algorithms have improved accuracy over the existing pure-delay based schemes.
The second contribution is the development of a statistical geolocation scheme based on the application of kernel density estimation to delay measurements amongst a set of landmarks. An estimate of the target IP location is then obtained by maximizing the likelihood of the distances from the target to the landmarks, given the measured delays. This is achieved by an algorithm which combines gradient ascent and force-directed methods. We compare the proposed geolocation schemes with the previous methods by developing a measurement framework using thePlanetLab infrastructure. Our experimental results show that the proposed geolocation algorithms have superior accuracy to the prior art.
Oral Defense of Doctoral Dissertation: Detecting Polymorphic and Mutated Malicious Access in Online Ad Serving Systems
Thursday, January 31, 2013
With the emergence of full-scale electronic commerce, the World Wide Web has been quickly and aggressively realized as an effective advertising medium. Indeed, advertising itself has become an important commodity on the web. The current model allows unethical and dishonest intermediaries to defraud the system by simulating fake traffic to increase revenue.
The development of intelligent data analysis methods for fraud detection can be well motivated from an economic point of view. Additionally, as the number of fraud cases increases, the reputation of Ad Networks suffers. This dissertation focuses on two aspects of the malicious access problem largely unexplored by the research community. These include heuristic analysis and the complexity of detecting polymorphic and mutated fraudulent traffic patterns, as well as the close world assumption generally associated with such problems, where some identities of a given malicious hit are necessarily represented in the training set.
This research investigates the application of novel machine learning techniques to the detection of heterogeneous and complex malicious traffic through a novel heuristic analysis approach aimed at detecting the underlying patterns of malicious hits within a given Click-Stream span.
This thesis proposes a heuristic-based feature selection and classification technique using Artificial Neural Networks (ANN) to detect polymorphic and mutated traffic inflation attacks. A heuristic ANN based feature selector has been investigated. The novelty of the proposed feature selection approach is that it integrates the capability of the wrapper approach to find better feature subsets by combining the filter’s ranking score with the wrapper-heuristic’s score to take advantage of both filter and wrapper heuristics. In addition, hybrid training algorithms for ANN have also been developed to uncover indicators of fraudulent patterns using heuristic analysis and previous conversion data elements. The indicators are used to create a set of patterns which profile legitimate traffic and indicate outliers and anomalies.
Experiments on real life fraud data demonstrate the advantage of the proposed approaches over some of the most frequently used detection techniques. This advantage is demonstrated by a significant decrease in false positive and general stability of the results.
CS Seminar: Dynamically Heterogeneous Cores Through 3D Resource Pooling
Monday, February 11, 2013
3D die stacking is a recent technological development which makes it possible to create chip multiprocessors using multiple layers of active silicon bonded with low latency, high-bandwidth, and very dense vertical interconnects. 3D die stacking technology provides very fast communication, as low as a few picoseconds, between processing elements residing on different layers of the chip. The rapid communication network in a 3D stack design, along with the expanded geometry, provides an opportunity to dynamically share on-chip resources among different cores. This research describes an architecture for a dynamically heterogeneous processor architecture leveraging 3D stacking technology. Unlike prior work in the 2D plane, the extra dimension makes it possible to share resources at a fine granularity between vertically stacked cores. As a result, each core can grow or shrink resources, as needed by the code running on the core. This architecture, therefore, enables runtime customization of cores at a fine granularity and enables efficient execution at both high and low levels of thread parallelism. This architecture achieves performance gains of up to 2X, depending on the number of executing threads, and gains significant advantage in energy efficiency.
Speaker's BioHouman Homayoun is an Assistant Professor of the Department of Electrical and Computer Engineering at George Mason University. He also holds a Courtesy appointment with the Department of Computer Science.Prior to joining George Mason University, He spent two years at the University of California, San Diego, as National Science Foundation Computing Innovation (CI) Fellow awarded by the Computing Research Association (CRA) and the Computing Community Consortium (CCC). Houman's research is on power-temperature and reliability-aware memory and processor design optimizations and spans the areas of computer architecture, embedded systems, circuit design, and VLSI-CAD, where he has published more than 30 technical papers on the subject, including some of the earliest work in the field to address the importance of cross-layer power and temperature optimization in memory peripheral circuits. He is currently leading a number of research projects, including the design of next generation 3D heterogeneous multicores, low power hybrid SRAM-NVM memory hierarchy design, reliability-aware cache design, and power management in data centers. Houman was a recipient of the four-year University of California, Irvine Computer Science Department chair fellowship. He received his PhD degree from the Department of Computer Science at the University of California, Irvine in 2010, an MS degree in computer engineering in 2005 from University of Victoria, Canada and his BS degree in electrical engineering in 2003 from Sharif University of Technology.
GRAND Seminar: Heterogeneous Face Recognition
Tuesday, February 12, 2013
A chief benefit of face recognition technology is the extensive collection of face photographs available to populate target galleries. From sources such as driver's licenses, passports, and mug shots, a (generally) high quality gallery seed exists for a large percentage of the developed world's population. While these gallery images are visible light photographs, many face recognition scenarios exist where probe images used to be match against such galleries are only available from some alternate imaging modality. For example, in environments with adverse illumination conditions (such as nighttime), face images must be captured in the infrared spectrum. In other cases, a lack of a face image requires the use of a forensic sketch to depict a subject. The task of matching face images across image modalities is called heterogeneous face recognition. In this talk, the problem of heterogeneous face recognition will be introduced. Different approaches for performing heterogeneous face recognition will be introduced, including methods for (i) directly measuring the similarity between heterogeneous face images, and (ii) using prototype similarities to performing matching without needing a direct comparison. The follow research topics will also be discussed: (i) the effect of demographics (race, gender, and age) on face recognition performance, (ii) studies on training face recognition system for time lapse invariance, and (iii) designing facial features and matching algorithms for matching caricature sketches to photographs.
Speaker's BioBrendan Klare is a scientist at Noblis. He received the Ph.D. degree in Computer Science from Michigan State University in 2012, and received the B.S. and M.S. degrees in Computer Science and Engineering from the University of South Florida in 2007 and 2008. From 2001 to 2005 he served as an airborne ranger infantryman in the 75th Ranger Regiment, U.S. Army. Brendan has authored several papers on the topic of face recognition, and was the recipient of the Honeywell Best Student Paper Award at the 2010 IEEE Conference on Biometrics: Theory, Applications and Systems (BTAS). His other research interests include pattern recognition and computer vision.
SWE Seminar: Applying Empirical Software Engineering to Software Architecture: Challenges and Lessons Learned
Thursday, February 14, 2013
Software architecture community has developed numerous methods, techniques, and tools to support the architecture process (analysis, design, and review). Historically, most advances in software architecture have been driven by talented people and industrial experience, but there is now a growing need to systematically gather empirical evidence about the advantages or otherwise of tools and methods rather than just rely on promotional anecdotes or rhetoric.
The aim of this presentation is to promote and facilitate the application of the empirical paradigm to software architecture. To this end, Davide Falessi will describe several challenges and lessons learned when assessing software architecture research that used controlled experiments, replications, expert opinion, systematic literature reviews, observational studies, and surveys. Before presenting scientific details, Forrest Shull will introduce the Fraunhofer Institute of Experimental Software Engineering, a not-for-profit applied research and technology transfer organization.
Speaker's BioDr. Davide Falessi joined the Fraunhofer Center for Experimental Software Engineering in Maryland (CESE) in 2012 as a Research Scientist in the Measurement and Knowledge Management Division. He currently serves as a program committee member in several international conferences including ESEM, WICSA, ICSR, SEKE, PROFES, EASE, and MTD. His main research interest is in devising and empirically assessing scalable solutions for the development of complex software-intensive systems with a particular emphasis on architecture, requirements, and quality. He received the PhD and the “Laurea” degrees in Computer Engineering from the University of Rome “TorVergata”.
Dr. Forrest Shull is a senior scientist at the Fraunhofer Center for Experimental Software Engineering in Maryland (CESE), a nonprofit research and tech transfer organization, where he leads the Measurement and Knowledge Management Division. At Fraunhofer CESE, he has been a lead researcher on projects for NASA's Office of Safety and Mission Assurance, the NASA Safety Center, the U.S. Department of Defense, the National Science Foundation, the Defense Advanced Research Projects Agency (DARPA), and companies such as Motorola and Fujitsu Labs of America. He is the Editor-in-Chief of IEEE Software.
CS Distinguished Lecture Series: “A Walk in the Dark: Random Walks and Network Discovery”
Friday, February 15, 2013
We rely on a wide variety of digital networks in our daily lives, to provide a rich set of services in commerce, government, communications, and to feel connected. These networks include the Internet, the World Wide Web, on-line social networks such as Facebook, Twitter, etc., and cellular networks. They are large, complex, very richly structured, and constantly changing over time. Moreover, because of their size and complexity, very little is known about them.
In this talk we focus on the problem of how to discover the structure of these networks. Traditional methods for network discovery and exploration include crawling the network using breadth first search (BFS). We will show, however, that such methods introduce significant biases unless almost all of the network is crawled. Instead, we focus on random walks as a method for exploring the network. We will show that random walks can be used to solve a number of network discovery tasks including characterizing degree distributions, identifying important nodes, finding short paths, locating content, etc., while exploring only a very small portion of network. Last, we do this in the context of networks whose underlying graphs are either directed or undirected.
Speaker's BioProfessor Towsley's research spans a wide range of activities from stochastic analyses of queueing models of computer and telecommunications to the design and conduct of measurement studies. He has performed some of the pioneering work on the exact and approximate analyses of parallel/distributed applications and architectures. More recently, he pioneered the area of network tomography and the use of fluid models for large networks. He has published extensively, with over 150 articles in leading journals.
PhD Comptuer Science, University of Texas (1975), BA Physics, University of Texas (1971). Professor Towsley first joined the faculty at the University of Massachuseets in the Department of Electrical and Computer Engineering in 1976 and moved to the Department of Computer Science in 1986. He was named University Distinguished Professor of Computer Science in 1998. Professsor Towsley was a Visiting Scientist at the IBM T.J. Watson Research Center, (1982-83, 2003), INRIA and AT&T Labs - Research (1996-97), and Cambridge Microsoft Research Lab (2004); a Visiting Professor at the Laboratoire MASI, Paris, (1989-90).
Computer Science Colloquium: Improving Text Retrieval Applications in Software Engineering: A Case on Concept Location
Tuesday, February 26, 2013
The source code of large scale, long lived software systems is difficult to change by developers. Finding a place in the code where to start implementing a change, also known as concept location, can be particularly challenging. Recent approaches based on Text Retrieval leverage the textual information found in the identifiers and comments found in source code in order to guide developers during this task. In addition, text retrieval has been used to address many other software engineering tasks, but wide adoption in industry and education is still ahead of us. Among the factors that deter broader adoption, researchers observed the difficulties of developers to formulate good queries in unfamiliar software and the quality of identifiers present in software. In order to support developers for writing better queries we propose two query reformulation techniques. One is based on feedback provided by the developers, whereas the other one is completely automated and employs machine learning techniques to learn from past user queries. Empirical validation shows that the queries reformulated using our approaches lead to better results in concept location, compared to the original queries and to previous techniques. In order to improve the quality of identifiers in source code we define a catalog of the most common identifier problems, called lexicon bad smells, and propose a series of refactoring operations in order to correct them. We show that the refactored identifiers lead to an improvement in the results of Text Retrieval-based concept location. In addition, we also investigated how to present to developers the information retrieved from source code during concept location. We studied the use of automated text summarization techniques to
Speaker's BiographySonia Haiduc is a PhD candidate at Wayne State University, in Detroit, MI, USA, where she performs research in software engineering. Her research interests include software maintenance and evolution, program comprehension, and source code search. Her work has been published in several highly selective software engineering venues. She has also been awarded the Google Anita Borg Memorial Scholarship for her research and leadership.
Computer Science Colloquium: Solving the Search for Source Code
Thursday, February 28, 2013
Programmers frequently use keyword searches to find source code in large repositories. However, to do this effectively, programmers must specify keyword queries that capture implementation details of their desired code. I propose that code search should be about behavior, not about keywords.
In this talk, I will present an approach to code search that allows programmers to provide inputs and outputs that define the behavior of their desired code. This approach indexes source code repositories by symbolically analyzing the programs and program fragments and transforming them into constraints representing their behavior. Results are identified using an SMT solver, which, given an input/output specification and the constraint representation of a program fragment, determines if the fragment matches the desired behavior. While promoting code reuse, my approach enables reuse where it was not possible before: the constraints can be relaxed, identifying code that approximately matches the specification. Further, the solver can then guide the instantiation of the code to produce the desired behavior. I will illustrate the generality of the approach by showing its instantiation in subsets of three languages, the Java String library, Yahoo! Pipes mashups, and SQL select statements. I will conclude by sharing my vision for new research directions related to this semantic approach to code search.
Speakers BioKathryn Stolee is a Ph.D. candidate and NSF Graduate Research Fellow in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. She has been awarded an ESEM Distinguished Paper Award and two departmental outstanding research awards. Her research is in software engineering with a focus on program analysis. Extracting useful information from software artifact repositories is a broad theme of her research, most recently through semantic code search.
SANG Seminar: A Comparative Study of Android and iOS for Accessing Internet Streaming Services
Friday, March 01, 2013
Android and iOS devices are leading the mobile device market. While various user experiences have been reported from the general user community about their differences, such as battery lifetime, display, and touchpad control, few in-depth reports can be found about their comparative performance when receiving the increasingly popular Internet streaming services.
Today, video traffic starts to dominate the Internet mobile data traffic. In this talk, focusing on Internet streaming accesses, we set to analyze and compare the performance when Android and iOS devices are accessing Internet streaming services. Starting from the analysis of a server-side workload collected from a top mobile streaming service provider, we find Android and iOS use different approaches to request media content, leading to different amounts of received traffic on Android and iOS devices when a same video clip is accessed. Further studies on the client side show that different data requesting approaches(standard HTTP request vs. HTTP range request) and different buffer management methods (static vs. dynamic) are used in Android and iOS mediaplayers, and their interplay has led to our observations.
Speaker's BioYao Liu is a Ph.D. student of Computer Science Department at George Mason University.
GRAND Seminar: Discovery of Novel Patterns in Massive Time Series Data
Tuesday, March 05, 2013
Massive amounts of data are generated daily at a rapid rate. As a result, the world is faced with unprecedented challenges and opportunities on managing the ever-growing data, and much of the world's supply of data is in the form of time series. One obvious problem of handling time series databases concerns with its typically massive size—gigabytes or even terabytes are common, with more and more databases reaching the petabyte scale. Most classic data mining algorithms do not perform or scale well on time series data due to their unique structure. In particular, the high dimensionality, very high feature correlation, and the typically large amount of noise that characterize time series data present a difficult challenge. As a result, time series data mining has attracted an enormous amount of attention in the past two decades.
This presentation gives an overview of my contributions in the field of time series data mining. The first part of the presentation discusses time series data mining fundamentals - more specifically, the two aspects that hugely determine the efficiency and effectiveness of most time series data mining algorithms: data representation and similarity measure. The second part of the presentation will focus on the discovery of novel and non-trivial patterns in time series data, including frequently encountered (or repeated) patterns, rare (or anomalous) patterns, or latent structure.
Speaker's BioDr. Jessica Lin is an Associate Professor in the Department of Computer Science at George Mason University. She received her PhD degree from University of California, Riverside in June, 2005. Her research interests encompass broad areas of data mining, especially data mining for large temporal and spatiotemporal databases, text, and images. Over the years, she has collaborated with researchers from various domains including medicine, earth sciences, manufacturing, national defense, and astronomy. Her research is partially funded by NSF, US Army and Intel Corporation.
Empirical Evaluation of the Statement Deletion Mutation Operator
Thursday, March 07, 2013
Mutation analysis is widely considered to be an exceptionally effective criterion for designing tests. It is also widely considered to be expensive in terms of the number of test requirements and in the amount of execution needed to create a good test suite. This paper posits that simply deleting statements, implemented with the statement deletion (SDL) mutation operators in Mothra, is enough to get very good tests. A version of the SDL operator for Java was designed and implemented inside the muJava mutation system. The SDL operator was applied to 40 separate Java classes, tests were designed to kill the non-equivalent SDL mutants, and then run against all mutants.
Speaker's BioLin Deng is a first year PhD student at George Mason University. He received the BS degree in Computer Science from Renmin University of China, and the MS degree in Computer and Information Science from Gannon University. He worked as a computer engineer at State Intellectual Property Office of China for four years before joining the MS program of Gannon University. His research interests are in software testing and security.
CS Distinguished Lecture Series: Automatic Annotation of Protein Function
Wednesday, March 20, 2013
Speaker's BioLydia E. Kavraki is the Noah Harding Professor of Computer Science and Bioengineering at Rice University. She also holds an appointment at the Department of Structural and Computational Biology and Molecular Biophysics at the Baylor College of Medicine in Houston. Kavraki received her B.A. in Computer Science from the University of Crete in Greece and her Ph.D. in Computer Science from Stanford University working with Jean-Claude Latombe. Her research contributions are in physical algorithms and their applications in robotics (robot motion planning, hybrid systems, formal methods in robotics, assembly planning, micromanipulation, and flexible object manipulation), as well as in computational structural biology, translational bioinformatics, and biomedical informatics (modeling of proteins and biomolecular interactions, large-scale functional annotation of proteins, computer-assisted drug design, and systems biology). Kavraki has authored more than 180 peer-reviewed journal and conference publications and is one of the authors of a robotics textbook titled "Principles of Robot Motion" published by MIT Press. She is heavily involved in the development of The Open Motion Planning Library (OMPL), which is used in industry and in academic research in robotics and bioinformatics. Kavraki is currently on the editorial board of the International Journal of Robotics Research, the ACM/IEEE Transactions on Computational Biology and Bioinformatics, the Computer Science Review, and Big Data. She is also a member of the editorial advisory board of the Springer Tracts in Advanced Robotics. Kavraki is the recipient of the Association for Computing Machinery (ACM) Grace Murray Hopper Award for her technical contributions. She has also received an NSF CAREER award, a Sloan Fellowship, the Early Academic Career Award from the IEEE Society on Robotics and Automation, a recognition as a top young investigator from the MIT Technology Review Magazine, and the Duncan Award for excellence in research and teaching from Rice University. Kavraki is a Fellow of the Association of Computing Machinery (ACM), a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), a Fellow of the American Association for the Advancement of Science (AAAS), and a Fellow of the World Technology Network (WTN). Kavraki was elected a member of the Institute of Medicine (IOM) of the National Academies in 2012. She is also a member of the Academy of Medicine, Engineering and Science of Texas (TAMEST) since 2012. Current projects at Kavraki's laboratory are described under http://www.kavrakilab.org and http://www.cs.rice.edu/~kavraki.
Oral Defense of Doctoral Dissertation: Decision Guidance for Sustainable Manufacturing
Thursday, March 21, 2013
Sustainable manufacturing has significant impacts on a company’s business performance and competitiveness in today’s world. A growing number of manufacturing industries are initiating efforts to address sustainability issues; however, to achieve a higher level of sustainability, manufacturers need methodologies for formally describing, analyzing, evaluating, and optimizing sustainability performance metrics for manufacturing processes and systems. Currently, such methodologies are missing.
This dissertation developed the Sustainable Process Description and Analytics (SPDA) formalism and a systematic decision guidance methodology to fill the research gaps. The methodology provides step-by-step guidance for sustainability performance analysis and decision optimization using the SPDA formalism. The SPDA formalism provides unified syntax and semantics for querying, what-if analysis, and decision optimization; is modular, extensible, and reusable; supports built-in process and sustainability metrics modeling that enable users using data from production, energy management, life cycle assessment reference for modeling and analysis; is easy to use by manufacturing and business users; and also provides a reduction procedure that enables the translations of the SPDA query into specialized models such as optimization or simulation model for decision guidance. Two real world sustainable manufacturing case studies have been performed to demonstrate the use of formalism and the methodology.
CS Colloquium-NEW TIME: Automatic Program Repair Using Genetic Programming
Monday, March 25, 2013
"Everyday, almost 300 bugs appear...far too many for only the Mozilla programmers to handle" --Mozilla developer, 2005 Software quality is a pernicious problem. Although 40 years of software engineering research has provided developers considerable debugging support, actual bug repair remains a predominantly manual, and thus expensive and time-consuming, process. I will describe GenProg, a technique that uses evolutionary computation to automatically fix software bugs. My empirical evidence demonstrates that GenProg can quickly and cheaply fix a large proportion of real-world bugs in open-source C programs. I will also briefly discuss the atypical evolutionary search space of the automatic program repair problem, and the ways it has challenged assumptions about software defects.
Speaker's BioClaire Le Goues is a Ph.D. candidate in Computer Science at the University of Virginia. Her research interests lie in the intersection of software engineering and programming languages, with a particular focus on software quality and automated error repair. Her work on automatic program repair has been recognized with Gold and Bronze designations at the 2009 and 2012 ACM SIGEVO "Humies" awards for Human-Competitive Results Produced by Genetic and Evolutionary Computation and several distinguished and featured paper awards.
CS Colloquium: Searching for Relevant Functions and Their Usages in Millions of Lines of Code
Thursday, March 28, 2013
Different studies show that programmers are more interested in finding definitions of functions and their uses than variables, statements, or ordinary code fragments. Therefore, developers require support in finding relevant functions and determining how those functions are used. Unfortunately, existing code search engines do not provide enough of this support to developers, thus reducing the effectiveness of code reuse. We provide this support to programmers in a code search system called Portfolio that retrieves and visualizes relevant functions and their usages. We have built Portfolio using a combination of models that address surfing behavior of programmers and sharing related concepts among functions. Currently, Portfolio is instantiated on two large source code repositories with thousands of projects spanning 270 Million C/C++ and 440 Million Java lines of code. In order to evaluate Portfolio, we conducted two experiments. First, an experiment with 49 professional C/C++ programmers to compare Portfolio to Google Code Search and Koders using a standard methodology for evaluating Information Retrieval-based engines. And second, an experiment with 19 Java programmers to compare Portfolio to Koders. The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders. We also demonstrate that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
Speaker's BioDr. Denys Poshyvanyk is an Assistant Professor at the College of William and Mary in Virginia. He received his Ph.D. degree in Computer Science from Wayne State University in 2008. He also obtained his M.S. and M.A. degrees in Computer Science from the National University of Kyiv-Mohyla Academy, Ukraine and Wayne State University in 2003 and 2006, respectively. Since 2010, he has been serving on the steering committee of the International Conference on Program Comprehension (ICPC). He has been elected as a chair of the ICPC steering committee in 2012. He serves as a program co-chair for 21st IEEE International Conference on Program Comprehension (ICPC'13). He also served as a program co-chair for the 18th and 19th International Working Conference on Reverse Engineering (WCRE 2011 and WCRE 2012). Dr. Poshyvanyk received NSF CAREER award and several Best Paper Awards, including ICPC’06, ICPC'07, ICSM’10, and SCAM’10. His research interests are in software engineering, software maintenance and evolution, program comprehension, reverse engineering, software repository mining, source code analysis, and metrics.
GRAND Seminar: Multi-target Tracking by Rank-1 Tensor Approximation
Friday, March 29, 2013
Multi-target tracking (MTT) is an important problem in computer vision and has many applications. We introduce a novel framework for MTT using the rank-1 tensor approximation and propose an L1 norm tensor power iteration solution. In particular, a high order tensor is constructed based on trajectories in the time window, with each tensor element as the affinity of the corresponding trajectory. The assignment variables are the L1 normalized vectors, which are used to approximate the rank-1 tensor. Our approach provides a flexible and effective formulation where both pairwise and high-order association energy can be used expediently. We also show the close relation between our formulation and the multi-dimensional assignment (MDA) model. To solve the optimization in the rank-1 tensor approximation, we propose an algorithm that iteratively powers the intermediate solution followed by an L1 tensor normalization. Aside from effectively capturing high-order motion information, the proposed solver runs efficiently with proved convergence. The experimental validations are conducted on two challenging datasets and our method demonstrates promising performances on both of them.
Speaker's BioHaibin Ling received the B.S. degree in mathematics and the MS degree in computer science from Peking University, China, in 1997 and 2000, respectively, and the PhD degree from the University of Maryland, College Park, in Computer Science in 2006. From 2000 to 2001, he was an assistant researcher at Microsoft Research Asia, Beijing, China. From 2006 to 2007, he worked as a postdoctoral scientist at the University of California Los Angeles. After that, he joined Siemens Corporate Research, Princeton, NJ, as a research scientist. Since fall 2008, he has been an Assistant Professor at Temple University. Dr. Ling's research interests include computer vision, medical image analysis, human computer interaction, and machine learning. He received the Best Student Paper Award at the ACM Symposium on User Interface Software and Technology (UIST) in 2003.
Inter-Disciplinary Computing Seminar: Using GIS to Optimize Liver Transplant Locations in the US and Other Health GIS Projects
Wednesday, April 17, 2013
The first part of the talk will discuss the main topics covered by health GIS research. This will be illustrated with case studies from my own research. The second half will describe my work with Naoru Koizumi and others at GMU on optimizing the geography of liver transplantation in the US. Specifically this research discusses geographic disparities in access to and outcomes in transplantation that have been a persistent widely discussed problem among transplant researchers and members of the transplant community. One of the alleged causes of these disparities in the United States is the administratively determined organ allocation boundaries that limit organ sharing across regions. This talk will describe the work of our research team in applying mathematical programming models to construct alternative liver allocation boundaries that achieve more geographic equity in access to transplants than the current system. The performance of the optimal boundaries was evaluated and compared to that of current allocation system using discrete event simulation.
CS Distinguished Lecture Series: Machine Learning Approaches to Network and Social Media
Friday, April 19, 2013
Across the sciences, a fundamental setting for representing and interpreting information about entities, the structure and organization of communities, and changes in these over time, is a stochastic network that is topologically rewiring and semantically evolving over time and space. While there is a rich literature in modeling invariant networks, until recently, little has been done toward modeling the dynamic processes underlying rewiring networks, and on recovering such networks when they are not observable.
In this talk, I will present some recent developments in analyzing what we refer to as the dynamic tomography of evolving networks. I will present new algorithms for estimating the topological structures of latent evolving networks underlying nonstationary time-series of nodal attributes, along with theoretical results on the asymptotic sparsistency of the proposed methods; and I will present a family of new Bayesian model and scalable inference algorithms for estimating the trajectories of latent multi-functionality of nodal states, and for learning community structures, in both static and evolving social and biological networks. Case studies of a breast cancer network, and the Facebook network will be presented to highlight current capability and future challenge.
Speaker's BioDr. Eric Xing is an associate professor in the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology; especially for solving problems involving automated learning, reasoning, and decision-making in high-dimensional and dynamic possible worlds; and for building quantitative models and predictive understandings of biological systems. Professor Xing received a Ph.D. in Molecular Biology from Rutgers University, and another Ph.D. in Computer Science from UC Berkeley.
His current work involves, 1) foundations of statistical learning, including theory and algorithms for estimating time/space varying-coefficient models, sparse structured input/output models, and nonparametric Bayesian models; 2) computational and statistical analysis of gene regulation, genetic variation, and disease associations; and 3) application of statistical learning in social networks, data mining, vision. Professor Xing has published over 150 peer-reviewed papers, and is an associate editor of the Journal of the American Statistical Association, Annals of Applied Statistics, the IEEE Transactions of Pattern Analysis and Machine Intelligence, the PLoS Journal of Computational Biology, and an Action Editor of the Machine Learning journal. He is a recipient of the NSF Career Award, the Alfred P. Sloan Research Fellowship in Computer Science, the United States Air Force Young Investigator Award, and the IBM Open Collaborative Research Faculty Award.
Inter-Disciplinary Computing Seminar: Motor Primitives with Spindle-like Properties Predict the Ability to Learn Different Types of Motor Adaptations
Wednesday, April 24, 2013
The mammalian brain generates motor commands to initiate movement. Through interactions with our environment this motor output is adapted in order to reduce the error between the planned and actual movement. One readily learned motor adaptation involves the predictive compensation for changes in the physical dynamics of the environment. These dynamics result in time-varying physical perturbations that are a function of the motion state, such as velocity. Early in compensating for velocity-dependent perturbations, subjects exhibit a force profile with a shape resembling a muscle spindle firing pattern (a transient, velocity-dependent peak followed by a static, position-dependent offset). As adaptation progresses, this force profile transforms into one that is almost entirely velocity-dependent (task goal specific), as the static ‘tail’ fades and the transient peak increases. Analysis of the initial and final force profiles suggests that these high-dimensional force-profiles can be represented as simple combinations of position and velocity signals linearly modified with stiffness and viscosity gains. We therefore constructed a simple neural network model with spindle-like primitives (having positive position and velocity dependence) and a simple gradient descent learning rule to test whether such a model would show primitive-dependent early learning but task-dependent late learning. This model accurately predicts that (1) the direction of initial learning is generally biased towards the center of the primitive distribution (2) the asymptotic learning state mostly reflects the task goal and (3) the adaptation rate for learning goals in the directions parallel or orthogonal to the center of the primitive distribution is enhanced or slowed, respectively, compared to other directions. These results suggest that adaptation to predictable force perturbations takes place in a spindle-like coordinate system that dictates the degree of difficulty for learning different types of perturbations and predicts the evolution of adaptive responses.
Speaker's BioDr. Wilsaan Joiner is an Assistant Professor in the Department of Bioengineering. He received his PhD in Biomedical Engineering from the Johns Hopkins University School of Medicine in 2007. From 2007-2012, he was a postdoctoral fellow at Harvard University and the National Eye Institute (The Laboratory of Sensorimotor Research). Dr. Joiner’s Sensorimotor Integration Laboratory at the Volgenau School of Engineering conducts translational research investigating human sensory integration, motor learning and control using computational and experimental approaches. Ongoing projects include the influence of eye movements and internal monitoring signals in guiding goal-directed movements and the neural processes underlying motor adaptation and memory consolidation.
Oral Defense of Doctoral Dissertation: Towards Power-Efficient Internet Streaming to Mobile Devices
Wednesday, April 24, 2013
Internet streaming services are very popular today. As a fact, video traffic now accounts for more than 51 percent total Internet traffic. With the pervasive adoption of various mobile devices in practice in the past several years, today Internet streaming services are receiving a rapidly growing number of requests from various mobile devices. As a result, more than 50 percent of the data consumed by mobile devices is video streaming traffic. However, streaming delivery to mobile devices is more challenging than to its desktop counterpart.
In this dissertation, we first empirically investigate Internet mobile streaming practices. We investigate mobile streaming from various perspectives, including hardware and software heterogeneity, different characteristics of mobile videos, and different user access patterns. The results provide us in-depth understanding on the current Internet mobile streaming services. A critical constraint on mobile devices for receiving Internet streaming services is that they have limited battery capacities. Among different power-consuming sources, transmission power consumption is very significant: for a mobile device receiving streaming services, about 30 percent to 40 percent of the power is consumed by the WNIC for streaming data transmission. So in order to prolong the battery lifetime, it is important to save the battery power consumed by the WNIC. For this purpose, we design and implement new schemes that can effectively save battery power consumption while maintaining good streaming quality to mobile devices. In particular, we focus on P2P streaming and client-server streaming as they are widely used. We aim to save battery power consumption from two aspects: (1) how the data is received, and (2) how much data is received. Our techniques have been implemented and experimental results show they are effective in reducing the battery power consumption on mobile devices without degrading the streaming quality.
Oral Defense of Doctoral Dissertation: A Cost-Effective Distributed Architecture for Content Delivery and Exchange Over Emerging Wireless Technologies
Thursday, April 25, 2013
Opportunities in education are lacking in many parts of the developed nations and are missing in most parts of the developing nations. This is, in significant part, due to shortages of classroom instructional resources such as quality teaching staff, hardware and software. Distance education (DE) has proved to be a successful teaching approach and overcomes some of the barriers imposed by classroom instruction, primarily due to the shortage of teachers. Many DE software tools have been developed and are in use today. Most require high network capacity, not supported by common long distance wireless network infrastructures in many places of the world. To address obstacles related to network infrastructures of developing countries for content delivery and exchange, this research develops the design of a cost-effective distributed architecture for content delivery and exchange over emerging limited capacity wireless technologies. The design of the proposed target architecture includes an overlay network with distributed peer-to-peer systems. Simulation is used to explore parameters and metrics in order to validate the effectiveness and scalability of this architecture. An n-tier hierarchical training model to train local instructors by experienced instructors, using DE resources supported by this architecture, is discussed as a way to mitigate the teacher shortages in developing countries. The situation in Bangladesh is used to provide examples, based on the author’s familiarity with it.
Oral Defense of Doctoral Dissertation: Data Mining Framework for Metagenome Analysis
Tuesday, April 30, 2013
Advances in biotechnology have dramatically changed the manner of characterizing large populations of microbial communities that are ubiquitous across several environments. The process of “metagenomics” involves sequencing of the genetic material of organisms, co-existing within ecosystems ranging from ocean, soil and human body. Researchers are trying to determine the collective microbial community or population of microbes that coexist across different environmental and clinical samples. Several researchers and clinicians have embarked on studying the pathogenic role played by the microbiome (i.e., the collection of microbial organisms within the human body) with respect to human health and disease conditions.
There is a critical need to develop new methods that can analyze metagenomes and correlate heterogeneous microbiome data to clinical metadata. Lack of such methods is an impediment for the identification of the function and presence of microbial organism within different samples, reducing our ability to elucidate the microbial-host interactions and discover novel therapeutics. From another perspective, comparing metagenomes across different ecological samples allows for the characterization of biodiversity across the planet. The goals of this dissertation are to develop novel data mining algorithms that allow for the accurate and efficient analysis of metagenome data obtained from different environments.
Specific contributions have included the development of a suite of clustering algorithms for handling large-scale targeted and whole metagenome sequences. We develop a novel locality sensitive hashing (LSH) based method for clustering metagenome sequence reads. Our method achieves efficiency by approximating the pairwise sequence comparison operations by using a randomized hashing technique. We incorporate this clustering approach within a computational pipeline (LSH-Div) to estimate the species diversity within an ecological sample. We also developed an algorithm called MC-MinH that uses the min-wise hashing approach, along with a greedy clustering algorithm to group 16S and whole metagenome sequences. We represent unequal length sequences using contiguous subsequences or k-mers, and then approximate the computation of pairwise similarity using independent min-wise hashing. Further, MC-MinH was extended as a distributed algorithm implemented within the Map-Reduce based Hadoop platform. The distributed clustering algorithm can perform a greedy iterative clustering as well as an agglomerative hierarchical clustering and can handle large volumes of input sequences.
We also developed a novel sequence composition-based taxonomic classifier using extreme learning machines referred to as TAC-ELM. This algorithm uses the framework of extreme learning machines to quickly and accurately learn the weights for a neural network model. TAC-ELM when combined with BLAST (Basic Local Alignment Search Tool) has shown improved taxonomy classification results.
In order to make these developed computational tools accessible to a broad group of researchers, we have also developed a web-based analysis portal. The portal implements a LIMS database using the open source Drupal content management system to store and retrieve the multi-modal microbiome data. For analysis and development of workflows, we use the Galaxy platform. This provides a web-based user friendly platform for integrating the tools developed to create user-customized pipelines and a batch-based job submission system.
To summarize, this dissertation has contributions in the area of metagenome sequence clustering and classification which can be easily integrated within computational workflows for species diversity estimation and large-scale microbiome analysis.
Oral Defense of Doctoral Dissertation: An Autonomic Framework for Integrating Security and Quality of Service Support in Databases
Thursday, May 02, 2013
The back-end databases of multi-tiered applications are a major data security concern for enterprises. The abundance of these systems and the emergence of new and different threats require multiple and overlapping security mechanisms. Therefore, providing multiple and diverse database intrusion detection and prevention systems (IDPS) is a critical component of the defense-in-depth strategy for DB information systems. At the same time, an e-business application is expected to process requests with a certain service quality to maintain current customers and attract new ones. To meet both objectives it is necessary to use a combination of IDPSs that best meets the security and QoS requirements of the system stakeholders for each workload intensity level. Due to the dynamic variability of the workload intensity, it is not feasible for human beings to continuously reconfigure the system. It is therefore important that current systems be built with adaptive capabilities that can dynamically respond to changes in it is surroundings. This research presents a self-optimizing and self-protecting database system environment that captures dynamic and fine-grained tradeoffs between security and QoS by using a multi-objective utility function. The utility functions considers the performance impact of IDPSs on the overall system under a certain workload, the detection and false detection rates of the IDPSs, and high level stakeholder preferences and constraints. The model was validated in a simulated environment. The feasibility of the approach is also demonstrated in an experimental environment.
Oral Defense of Doctoral Dissertation: Emergency Communications via Handheld Devices
Thursday, May 02, 2013
Ensuring effective communications during emergencies is an important issue for any functional government. One way to address this issue is to ensure the availability of the emergency responders capable of making the appropriate decisions and taking timely actions with sufficient resources. Many XML-based languages such as the Emergency Data Exchange Language (EDXL) and associated Common Alert Protocol (CAP) have been de- signed to provide a basis for such communications.
To ensure that messages are delivered in a timely manner, I propose some role- and task-based ontological enhancements for these languages. I address this availability problem further by proposing a Role-based model Availability Emergency Responder Framework (AERF). This AERF ensures that a list of personnel for a particular role in an organization is always reachable to handle an emergency call. I develop a working prototype of the AERF framework for a local hospital that provides emergency cases. The prototype demonstrates the feasibility and security of the AERF framework and addresses the availability of emergency responders based on their assigned roles.
In order to inform the general public of nearby emergencies, the Department of Homeland Security initiated the Commercial Mobile Alert System (CMAS), which utilized existing commercial telecommunication infrastructures to broadcast emergency alert text messages to all mobile users in an area affected by an emergency. One of the limitations of the Cell Broadcast Service (CBS) is that the smallest area that CMAS can broadcast its message is a cell site, which is, in most cases, quite large for small-scale emergencies.
I propose an enhancement to CMAS by using CMAS as a transport protocol to distribute small-scale emergency alerts to areas that are smaller than a cell site. I also suggest a proper enhancement to the CAP 1.2 message structure for CMAS emergency alerts. Another limitation of CMAS messages is the maximum message size of 90 characters of clear text. I propose an enhancement to CMAS by using a combination of different encoding techniques and emergency protocol standard including the Common Alerting Protocol (CAP 1.2) to provide alert messages with meaningful and rich content. I show the viability of our solution using a prototype implementation that can generate and broadcast CMAS emergency alerts through Emergency Response Alert System (ERAlert) to Android phones where an Emergency Response Application (ERApp) will intercept, decode, and display meaningful alerts to users. Lastly, I propose a Navigation Assistance Framework (NAF) that allows emergency organizations to provide emergency information that can be filtered through the traffic patterns in order to assist victims navigate out of the emergency and reach their intended destinations in a reasonable amount of time. I develop a ERSimMon to simulate this capability in a small scale to show the effectiveness of my solution.
SWE Seminar: GuideArch: Guiding the Exploration of Architectural Solution Space Under Uncertainty
Thursday, May 02, 2013
A system's early architectural decisions impact its properties (e.g., scalability, dependability) as well as stakeholder concerns (e.g., cost, time to delivery). Choices made early on are both difficult and costly to change, and thus it is paramount that the engineer gets them "right". This leads to a paradox, as in early design, the engineer is often forced to make these decisions under uncertainty, i.e., not knowing the precise impact of those decisions on the various concerns. How could the engineer make the "right" choices in such circumstances? This is precisely the question we have tackled in this talk. We present GuideArch, a framework aimed at quantitative exploration of the architectural solution space under uncertainty. It provides techniques founded on fuzzy math that help the engineer with making informed decisions.
Speaker's BioNaeem Esfahani is a Ph.D. candidate in Computer Science Department, Volgenau School of Engineering. He got his B.Sc. degrees on Electrical and Computer Engineering from University of Tehran in 2005. He also received a M.Sc. degree in Computer Engineering from Sharif University of Technology in 2008. His current research mainly focuses on Software Architecture, Self-Adaptive Software Systems, and Software Quality of Service Analysis & Improvement.
Project Presentation for Engineer Degree: A TeC Implementation on the Android Platform
Friday, May 03, 2013
This thesis describes a software system developed based upon the Team Computing (TeC) model. TeC belongs to Ubiquitous computing, which emphasizes the coordination of teams comprised of software components, devices and human operation.
This particular system is implemented on the popular Android platform using Java. It allows the teams designed by end users to be deployed and operating on the Android devices. The system also implemented a decentralized, lightweight protocol for self- healing.
Volgenau School of Engineering: Convocation
Thursday, May 16, 2013
Event InfoIN CASE OF RAIN students should report directly to the Patriot Center floor, via the lower entrance on the "East" side of the Patriot Center (i.e. loading dock door), to be directed to their seats. There will be no procession in the event of rain.
Graduates traditionally assemble at 1:15 p.m. in the Patriot Center parking lot at your department's designated area (indicated by department signs). It is very important that you be at the appropriate location by 1:15 p.m. so we can line up all graduates by degree program for the processional.
Oral Defense of Doctoral Dissertation: An Extensible Framework for Generating Ontology from Various Data Models
Thursday, May 30, 2013
In the Information Technology field, Ontology is concerned with the use of formal representation to describe concepts and relationships in a domain of knowledge. Using ontologies, organizations can facilitate processes such as integrating heterogeneous systems, assessing data quality, validating business rules, and discovering hidden facts. Ontology engineering, however, is not a trivial process. Developing ontologies is highly dependent on the availability and knowledge of ontology modelers and domain experts. Moreover, the development process is often lengthy and error-prone. In this dissertation, I developed an extensible framework for generating ontologies from data models. For this dissertation, the framework is limited to generating ontology from two types of data models: the Relational Database (RDB) and Object-Relational Database (ORDB) models. The framework, however, is extensible to support the generation of ontologies from other types of data models (e.g. XML). The derived ontology is expressed in the OWL Web Ontology Language, a W3C recommendation.
The proposed framework has been validated by implementing it as a prototype, and by examining the ontologies it generates from a syntactic and semantic perspective. For the semantic examination, domain requirements were used to compute the recall and precision for the ontologies generated by my framework and that of a similar tool. Moreover, the relative amount of terminological content (which I call the relative explicitness) of these ontologies was measured as well using a methodology that I developed in my research. The results showed the ability of my framework to generate ontologies that are closely aligned with the domain.
SWE Seminar: Adaptive Program Repair via Program Equivalence: A Duality With Mutation Testing
Wednesday, June 05, 2013
Software bugs remain a compelling problem. Automated program repair is a promising approach for reducing cost, and many methods have recently demonstrated positive results. However, success on any particular bug is variable, as is the cost to find a repair.
This talk focuses on generate-and-validate repair methods that enumerate candidate repairs and use test cases to define correct behavior. We formalize repair cost in terms of test executions, which dominate most test-based repair algorithms. Insights from this model lead to a novel deterministic repair algorithm that computes a patch quotient space with respect to an approximate semantic equivalence relation. Generate-and-validate program repair is shown to be a dual of mutation testing, directly suggesting possible cross-fertilization. Evaluating on 105 real-world bugs in programs totaling 5MLOC and involving 10,000 tests, our new algorithm requires an order-of-magnitude fewer test evaluations than the previous state-of-the-art and is over five times more efficient monetarily. This talk presents work that is currently under submission.
Speaker's BioWestley Weimer is an Associate Professor of Computer Science at the University of Virginia. His main research interest lies in advancing software quality by using both static and dynamic programming language approaches. http://www.cs.virginia.edu/~weimer/
Oral Defense of Doctoral Dissertation: Scheduling Algorithms Optimizing Throughput and Energy for Networked Systems
Wedensday, June 26, 2013
Scheduling problems consider allocating limited resources under constraints among competing requests in order to fulfill their obligations. Practical resource management algorithms with provable performance guarantees are of great importance. In this dissertation, we study scheduling algorithms for resource management in networked systems. Mainly, we design, analyze, and implement two types of scheduling algorithms: (1)throughput-aware scheduling algorithms, and(2)energy-aware scheduling algorithms.
Throughput is a main metric that scheduling algorithms are designed to optimize. We study algorithms for network routers to schedule weighted packets with time constraints over a wireless fading channel. We design both offline and online algorithms to maximize weighted throughput, which is defined as the total value of the packets successfully sent before their respective deadlines.
Energy consumption has become an important performance metric in designing scheduling algorithms for computing systems and networked systems. For network routers, we first study a problem of scheduling jobs with values and deadlines to maximize net profit, which is defined as the difference between the revenue obtained from the jobs sent before their respective deadlines and the cost of total energy consumption during this course. Another model for network routers that we study is the trade-off between energy consumption and jobs’ flow time or stretch in an online setting. We design bi-criteria power-down strategies optimizing both and analyze their performance using competitive ratio. For Point-of-Presence (PoP) design, current IP core networks operate at a nearly constant power rate independent of the traffic load. Thus, the gap between the available network capacity and the temporal traffic demand presents opportunities for reducing network power consumption by deactivating network components without noticeably affecting network performance. We study a theoretical model for PoP design. The objective is to find out an assignment between traffic links and chassis within a PoP, such that the total energy cost of the PoP is minimized. We analyze the hardness of this model and design several approximation algorithms with provable near-optimal performance. For a given processor, each speed change involves time and energy overhead, as well as a negative impact on its lifetime reliability. Motivated by this observation, we study theoretical energy-aware scheduling problems by considering the number and the cost of processor’s speed changes. Four related problems based on this framework are studied.
Oral Defense of Doctoral Dissertation: Money Laundering Evolution Detection, Transaction Scoring, and Prevention Framework
Wednesday, July 03, 2013
Money laundering is a major and ongoing global issue that has not been addressed with a dynamic approach by authorities using multiple systems. Made powerful by modern tools and resources available to them, money launderers are adopting more sophisticated schemes, spanning across many countries, to avoid being detected by anti-money laundering systems. Consequently, money laundering detection and prevention techniques must be multi-layered, multi-method, and multi-component to be ahead of the evolving laundering schemes. Handling such a multifaceted problem involves a large amount of unstructured, semi-structured and transactional data that stream at speeds requiring a high level of analytical processing to discover unraveling business-complexities, and discover deliberately concealed relationships. Therefore, I developed the money laundering evolution detection framework (MLEDF) to capture the trail of the dynamic and evolving schemes. My framework uses sequence matching, case-based analysis, social network analysis, and more importantly, complex event processing to link the fraud trails. My system capture a single scheme as an event in a trail in “real-time”, and then using detection algorithms, associate the captured event with other ongoing events.
A comprehensive Anti Money Laundering system must incorporate a risk modeling that calculates the dynamic attributes of transactional relationships and the potential social relationships among seemingly unrelated entities from a financial perspective. Therefore, I developed an industry-wide system to assign a risk score for any transaction being a part of a larger money laundering scheme. This score should be valid across every financial domain, continuously updated, and it is not specific to the evaluating financial institution.
Additionally, I developed a transaction scoring exchange and money laundering prevention framework that uses a transaction messaging system and assigns scores to the transactions, where the score is derived from the dynamics risk of the transaction and the statically computed risk score. The transaction score is correlated to the static and dynamic risk scores, in order to identify transactions score pertaining to money laundering, and to prevent transaction sequences from being executed. The transaction score uses, dynamic risk score obtained from the analytics of results of the real-time detection algorithms, to produce valid results.
The recommended money laundering prevention system relies upon the finding of an accurate detection system, supported by dynamic risk modeling systems for transaction scoring. My prevention framework includes a protocol to exchange the information among the framework participants, and it incorporates two levels of cooperation and information sharing.
The developed three level systems in this study consist of multi-levels and multi-components, and they can be easily incorporated within existing structure financial institutions. My system allows financial investigators to overcome the long processes and time-consuming characteristics of their investigations, to prevent money laundering schemes, or at least be aware of such schemes in their early stages.
I validated the accuracy of calculating the money laundering evolution detection framework, dynamic risk scoring, and transactions scoring framework using a multi-phase test methodology. My test used data generated from real-life cases, and extrapolated to generate more varying scenarios of money laundering evolution and risk data from real-life schemes and patterns generator that I implemented.
Oral Defense of Doctoral Dissertation: On Using Meta-Modeling and Multi-Modeling to Address Complex Problems
Wednesday, July 03, 2013
Models, created using different modeling techniques, usually serve different purposes and provide unique insights. While each modeling technique might be capable of answering specific questions, complex problems require multiple models interoperating to complement/supplement each other; we call this Multi-Modeling. To address the syntactic and semantic challenges of this multi-modeling approach for solving complex problems, a systematic methodology for developing multi-modeling workflows is presented. The approach is domain specific: Identification of the domain and the supporting modeling techniques is the first step. Then a Domain Specific Multi-Modeling Workflow Language (DSMWL), supported by a Domain Ontology, is developed and then used to construct workflows that capture interoperations between various models. The domain ontology provides semantic guidance to effect valid model interoperation.
The approach is illustrated using a case study from the Drug Interdiction and Intelligence domain. The Joint Inter-Agency Task Force (JIATF) - South, an agency well known for interagency cooperation and intelligence fusion, receives large amounts of disparate data regarding drug smuggling efforts. Analysis of such data using various modeling techniques is essential in identifying best Courses of Action (COAs). The proposed methodology is applied to the Drug Interdiction domain by performing domain analysis, developing a Domain Specific Multi-Modeling Workflow Language (DSMWL) and a Domain Ontology, and then using the DSMWL and the Domain Ontology to create workflows of model interoperations involving Social Networks, Timed Influence Nets, and Geospatial models.
Oral Defense of Doctoral Dissertation: Evolving Local Minima in the Protein Energy Surface
Wednesday, July 24, 2013
Proteins are the molecular tools of the living cell and the path to unraveling their function is through modeling and understanding their structure. Many diseases occur when a protein loses its intended function due to its inability to form the appropriate structure with which it binds to other molecules in the cell. A holistic approach to protein modeling would be to characterize all possible structural states accessible by a protein under native, physiologic conditions. However, this task is infeasible. The question then becomes, how can we model the subset of these structural states most relevant to the function or disfunction of a protein?
This thesis proposes a novel computational framework to obtain an expansive view of the protein conformational space relevant for function while controlling computational cost. The framework complements experimental and high-resolution computational methods which limit their focus to a single region of the conformational space. The framework employs the knowledge that functionally-relevant conformations are those low in energy and the framework incorporates the latest understanding of protein structure and energy from biophysics. Specifically, this thesis proposes a novel stochastic search framework for exploring a diverse ensemble of conformations which capture low-energy basins in the protein energy surface.
The proposed search framework employs a hybrid or memetic approach for explicit sampling of local minima in the protein energy surface. This hybrid search framework combines a global evolutionary search approach with a local search component to take advantage of the latest advances from the computational biology community. Specifically, the following questions are addressed to effectively model the protein conformational space: (1) How to balance limited computational resources between exploration of the conformational space in global search with exploitation of local minima in local search? The hybrid search framework combines a global evolutionary search to explore the breadth of the conformational space with a local search for efficiently exploiting local minima in the underlying energy surface. (2) How to sample new conformations at the global level? Two complementary approaches are investigated. One approach proposes an enhanced fragment selection method for sampling a new conformation based on an existing structure. The other approach employs a genetic algorithm to combine features from multiple existing structures to sample a new conformation. (3) How to employ energy to better discriminate between interesting conformations and noise in the conformational search space? A multi-objective decomposition of the energy function is employed to guide the search towards more biologically relevant, low-energy conformations by focusing on the energy terms with the most discriminatory power.
Work in this thesis shows that, by combining advanced algorithmic components with the latest understanding of protein biophysics, the proposed search framework is able to more effectively model functionally-relevant conformational states. A direct comparison between the proposed framework and a state-of-the-art coarsegrained sampling algorithm shows that the enhanced sampling strategies lead to a more comprehensive picture of the underlying protein energy surface. By taking this more comprehensive view, the framework is able to capture the protein native state as well as or better than methods relying primarily on protein-specific sampling strategies.
Oral Defense of Doctoral Dissertation: Latent Variable Models of Sequence Data for Classification and Discovery
Tuesday, August 20, 2013
The need to operate on sequence data is prevalent across a range of real world applications including protein/DNA classification, speech recognition, intrusion detection, and text classification. Sequence data can be distinguished from the more-typical vector representation in that the length of sequences within a dataset can vary and that the order of symbols within a sequence carries meaning. Although it has become increasingly easy to collect large amounts of sequence data, our ability to infer useful information from these sequences has not kept pace. For instance, in the domain of biological sequences, experimentally determining the order of amino acids in a protein is far easier than determining the protein's physical structure or its role within a living organism. This asymmetry holds over a number of sequence data domains, and, as a result, researchers increasingly rely on computational techniques to infer properties of sequences that are either difficult or costly to collect through direct measurement.
This work explores a number of latent variable models over sequence data. These models were designed to produce alternate representations of sequences that distill relevant information, making them both easier to process with traditional machine-learning tools and potentially improving on benchmarks over standard inference tasks such as classification and motif finding.
In this presentation, I will discuss two latent variable models that incorporate structure from the Profile Hidden Markov Model (HMM), a model commonly used to represent biological sequences. These methods both simplify and enrich the mechanisms by which standard Profile HMMs operate.
The first model relaxes the discrete Profile HMM hidden state space to a continuous one. Placing a regularizer that encourages sparsity on this new continuous space produces a new model that shares many characteristics with a set of successful techniques known as Sparse Dictionary Learning. This relaxation is the basis of our Relevant Subsequence Sparse Dictionary Learning (RS-DL) model. Applied to continuous sequences, RS-DL is effective at extracting human-recognizable motifs. In addition, subsequences extracted using RS-DL can improve on classification performance over standard nearest neighbor and dynamic time warping techniques.
The next model I discuss involves incorporating Profile HMM structure into a family of purely discriminative models. These models, which we call Subsequence Networks, are similar to convolutional neural networks, which have garnered state-of-the-art results in a number of tasks in computer vision. Subsequence Networks compare favorably to state-of-the-art sequence Kernel approaches on protein sequence classification problems while using a significantly different mode of operation.
CS Seminar: Knowledge Discovery for Computational Intelligence Research (KDCIR)
Wednesday, August 28, 2013
Data is crucial for computational modeling to solve complex real world problems. Data acquisition through sensors, encoding/representation as data, extraction of patterns/information/knowledge all are crucial steps in modeling. The evolving field of ‘Big Data’ combines artificial intelligence, signal processing, data representation, data analytics, knowledge discovery from big data for analytics and predictive modeling. It includes representing and analyzing structured and unstructured data sourced from various sources. Distributed computational paradigms such as multi-agents are argued as efficient approaches to distributed knowledge discovery and modeling. The talk will summarize the KDCIR research motivated by problems from optimization, planning, training-performance modeling in elite sports (cycling), obesity/diabetes modeling, renal transplantation, stress/depression modeling and education analytics. Knowledge discovery from brain-computer interaction has also been studied to decode and learn from brain signals to provide a multichannel interaction for human-centered applications. The research and innovation from KDCIR projects will be summarized.
Speaker's BioProfessor Dharmendra Sharma has been the Dean of the Faculty of Information Sciences and Engineering at the University of Canberra from 2004-2012. He has assumed various senior leadership roles in universities for over eighteen years. He has been awarded the position of University Distinguished Professor by the University of Canberra in 2012. Prof Sharma’s research background is in the Artificial Intelligence areas of Planning, Data Analytics and Knowledge Discovery, Predictive Modeling, Constraint Processing, Fuzzy Reasoning, Brain-Computer Interaction, Hybrid Systems and their applications to health, education, security, digital forensics and sports. He has published over 210 research papers and has supervised to completion 23 higher degrees research students. He has received several competitive research awards and recognitions for his research leadership initiatives. He is a Fellow of the Australian Computer Society, a Fellow of the South Pacific Computer Society, and a Senior Member of IEEE. Prof Sharma has served on several industry, academic, and research bodies including government advisory committees. He had completed his PhD from the Australian National University and has been an academic for over 30 years. He may be contacted at Dharmendra.Sharma@canberra.edu.au.
Oral Defense of Doctoral Dissertation: Path Planning in Similar Environments
Friday, September 06, 2013
Path planning aims at navigating a robot from an initial configuration to a goal configuration without violating various constraints. The problem of path planning is theoretically intractable (PSPACE hard), but in everyday life we (as human beings) navigate in our environment without much difficulty. This is partially due to the fact that most objects we encounter today are similar or identical to the objects we encountered yesterday or even years ago.
Environments with similar objects are quite common. For example, desks and chairs in a classroom or in an office may be moved around from one place to another frequently, but unfamiliar items are seldom introduced. A dynamic environment where obstacles are allowed to move can be considered as a continuous sequence of similar static environments due to motion coherence. We term “discrete similar-workspace problem” for static environments and “continuous similar-workspace problem” for dynamic environments. In this thesis, I designed path planners that address both problems. These planners significantly improve not only the efficiency but also robustness over existing planners.
More specifically, I have developed a path planner which exploits similarity across different static environments. This planner can remember and reuse the computation for every obstacle encountered. To address the “continuous similar-workspace problem”, existing methods have explored the temporal coherence (i.e. similarity). However, all these methods repair blindly and periodically at fixed time intervals with little attempt to analyze the similarity across different time instances. This results in either redundant updates or failure to detect invalid edges and nodes. To address these issues, I designed two path planners for dynamic environments which can detect critical events such as the topological changes in configuration space for known environments or predicted collisions amidst obstacles with unknown motion. The experimental results show that our planners which explore similarity across different environments not only provide significant time efficiency, but also improves the chances of finding a solution.
SWE Seminar: Combinatorial Testing-Based Fault Localization
Wednesday, September 25, 2013
Combinatorial testing has been shown to be a very effective testing strategy. After a failure is detected by testing, the next task is fault localization, i.e., how to locate the fault that causes the failure. In this talk, I will discuss a fault localization approach we have developed that leverages the result of combinatorial testing.
Our approach consists of two major steps. At the first step, we identify failure-inducing combinations in a combinatorial test set. A combination is failure-inducing if its existence causes a test to fail. Based on the execution result of a combinatorial test set, we produce a ranking of suspicious combinations in terms of their likelihood to be inducing. At the second step, we create a small group of tests from a given failure-inducing combination. In the group, one test is referred to as the core member, and it produces a failed execution. The other tests are referred to as the derived members which are similar to the core member but produce passed executions. The traces of these test executions are then analyzed to locate the faults.
Experimental results show that our approach is very effective in that only a small number of additional tests are needed to locate the faults. To the best of our knowledge, our approach is the first effort to perform code-based fault localization based on combinatorial testing.
Speaker's BioYu Lei is currently an associate professor of computer science at the University of Texas at Arlington. He was a Member of Technical Staff in Fujistu Network Communications, Inc. from 1998 to 2001. He obtained his PhD degree from North Carolina State University in 2002. His research interests are in the area of automated software analysis, testing and verification, with a special focus on combinatorial testing and concurrency testing. His current research is supported by the US National Institute of Standards and Technology.
SANG Seminar: Hardware in Cybersecurity: from the Weakest Link to Great Promises
Friday, October 04, 2013
It is well-known that hardware implementation can outperform the software implementation of the same application, including security primitives such as encryption, by up to several magnitudes. However, hardware implementation may also make security primitives vulnerable despite their mathematical soundness. In this talk, we will discuss the role of hardware in cybersecurity.
First, we will use the finite state machine (FSM) model to demonstrate that the systems built with today's design flow and tools are vulnerable against a simple random walk attack. We further show that a malicious designer can embed Hardware Trojan Horse (HTH) into the system to gain unauthorized control of the system. We then describe a practical circuit level technique to guarantee the trustworthiness of the circuit implementation of a given FSM.
Second, we describe our recent work on physical unclonable function (PUF), a unique feature embedded in the chip during fabrication process. PUF has many promising applications in security and trust such as device authentication and secret key generation and storage. We will focus on the following questions: how to push the amount of the PUF information we can extract to the theoretical upper bound; how to ensure that the PUF information is random (and thus secure against attacks); how coding, including error correction coding, can impact the hardware efficiency when implementing a PUF.
Finally, we will show very briefly a couple of projects on hardware-software co-design in building security and trust to demonstrate the great promise that hardware can bring to cybersecurity.
Speaker's BioDr. Gang Qu is an Associate Professor in the Department of Electrical and Computer Engineering and the Institute for Systems Research at the University of Maryland, College Park. He is the co-director of the embedded system research lab and the wireless sensor lab. His current research interests are on VLSI design automation and wireless sensor networks, with special focus on security and energy efficiency. He has published more than 100 journal articles and conference papers in these fields with best paper awards in MobiCom (2001) and ASAP (2006). Dr. Qu is currently on the editorial boards of IEEE Transactions on Computers, IEEE Embedded Systems Letters, and Integration, the VLSI Journal.
VSE Seminar: Machine Learning Approaches for Annotating Biological Data.
Wednesday, October 09, 2013
Biological systems are complex and not completely understood. New generation of high-throughput (“Big Data”) technologies capture large volumes of complex, multi-modal data associated with these systems. Scientific discovery and advancement requires extracting useful information from these datasets, which presents unique and challenging computing problems. Complexity within biological data arises due to heterogeneity, incompleteness, missing information, noisy nature and inter-dependencies between the input and output domains.
In this talk, I will provide an overview of my contributions related to the development of accurate and efficient mining approaches for annotating these biological datasets. I will present a multi-task learning approach that seeks to leverage the hierarchical structure present within multiple biological archives for classification. I will also describe an approach for modeling of sequential data. I will provide a highlight of how these developed approaches are integrated within computational pipelines to solve biological problems as they relate to the fields of metagenomics (or community genomics), protein function prediction and drug discovery.
Speaker's BioHuzefa Rangwala is an Assistant Professor at the department of Computer Science & Engineering, George Mason University. He holds an affiliate appointment with the Bioengineering Department and the School of Systems Biology, George Mason University. He received his Ph.D. in Computer Science from the University of Minnesota in the year 2008. His research interests include machine learning, bioinformatics and high performance computing. He is the recipient of the NSF Early Faculty Career Award in 2013, the 2013 Volgenau Outstanding Teaching Faculty Award, 2012 Computer Science Department Outstanding Teaching Faculty Award and 2011 Computer Science Department Outstanding Junior Researcher Award. He is Mason's 2014 SCHEV Outstanding Faculty Award, Rising Star nominee. His research is funded by NSF, NIH, DARPA, USDA and nVidia Corporation.
CS Interdisciplinary Seminar: Harvesting Geospatial Intelligence from Social Media Feeds
Wednesday, October 16, 2013
The remarkable success of online social media sites marks a shift in the way people connect and share information. Much of this information now contains some form of geographical content due to the proliferation of location-aware devices, thus fostering the emergence of geosocial media - a new type of user-generated geospatial information. Through geosocial media we are able, for the first time, to observe human activities in scales and resolutions that were so far unavailable. Furthermore, the wide spectrum of social media data and service types provides a multitude of perspectives on real-world activities and happenings, thus opening new frontiers in geosocial knowledge discovery. However, gleaning knowledge from geosocial media is a challenging task, as they tend to be unstructured and thematically diverse. In this presentation I will present work performed by Mason's geosocial team (http://geosocial.gmu.edu) on harvesting and analyzing information from such content, offering representative application examples and an overview of our system prototype.
VSE Seminar: Advancing Biomolecular Modeling and Simulation: A Probabilistic Approach for Characterizing Complex Systems in the Presence of Constraints
Thursday, October 17, 2013
A fundamental issue in our understanding of biology and treatment of disease concerns elucidating the underlying determinants of biological function in biomolecules. The main biomolecules at the center of most chemical reactions in the living and diseased cell, DNA, RNA, and proteins, are complex modular systems composed of many heterogeneous and often highly-coupled building blocks operating under physics-based constraints. In DNA and RNA, building blocks at the sequence level combine in non-trivial ways to give rise to complex functional signals. Modeling proteins adds additional algorithmic challenges due to the need for spatial reasoning to capture the dynamic nature of these systems as they flex their structures to modulate function. Yet, modeling is a central tool in understanding the molecular basis of many proteinopathies, such as cancer and neurodegenerative disorders. In this talk, I will provide an overview of algorithmic challenges and our contributions in physical algorithms for modeling states and state transitions in complex systems in the presence of constraints. The main focus of the talk will be on the novel probabilistic approach we have proposed for structure and motion computation. The underpinnings of this approach are in sampling-based robot motion planning and evolutionary computation to efficiently search high-dimensional configuration (solution) spaces with non-linear cost (objective) functions. The approach makes several innovations, including (1) novel use of connectivity and embeddings of the search space for an adaptive search of high exploration capability, (2) exploitation of the interplay between global and local search for better coverage, and (3) incorporation of multi-objective optimization to attenuate reliance on noisy cost functions. Comprehensive evaluations show that this approach, when informed by a representation of molecular geometry grounded in biophysics, is highly effective. The talk will conclude with selected applications demonstrating the ability of this work to formulate hypotheses and guide biological research, followed by an outline of my future research agenda.
Speaker's BioAmarda Shehu is an Assistant Professor in the Department of Computer Science with affiliated appointments in the Department of Bioengineering and the School of Systems Biology. Shehu received her Ph.D. in Computer Science from Rice University in Houston, TX, where she was an NIH fellow of the Nanobiology Training Program of the Gulf Coast Consortia. Shehu's general research interests are in the field of Artificial Intelligence. Her research contributions to date are in computational structural biology, biophysics, and bioinformatics with a focus on issues concerning the relationship between sequence, structure, dynamics, and function in biological molecules. Shehu's research is currently supported by the NSF, the Jeffress Trust Program in Interdisciplinary Research, and the Virginia Youth Tobacco Program. Shehu is also the recent recipient of an NSF CAREER award. She is a member of ACM, IEEE, Biophysical Society, International Society for Computational Biology, the American Chemical Society, and the Council on Undergraduate Research. Research and educational materials resulting from Shehu's work, including images, videos, publications, and software, can be found at: http://cs.gmu.edu/~ashehu.
SANG Seminar: There's Always Room for Improvement: Dissecting Bad Codes with Chatter and AVMeter
Friday, October 18, 2013
Malware classification and family identification are not new problems. However, the rapid evolution of the malware attack/defense ecosystem has enabled much more fruitful research. In this talk, our contributions to the domain will be summarized by presenting two systems that we at Verisign have named Chatter and AVMeter.
Chatter is a system for representing and leveraging the sequence of events in a malware execution. Whereas calculating and exposing low-level feature values might have ill scalability or gamesmanship effects, Chatter tersely and efficiently captures execution patterns. By creating an alphabet/language to represent runtime behavior, techniques from n-gram processing are used to train a binary classifier that is capable of distinguishing different malware samples with high accuracy.
AVMeter is a system for evaluating the performance of antivirus scans and labels. Researchers rely heavily on these outputs in establishing ground-truth for their methods and companies use them to guide mitigation and disinfection efforts. However, there is a lack of research that validates the performance of these antivirus vendors. Utilizing malware samples that have been manually labeled by expert analysts, we reveal dramatic errors in the correctness, coverage, and consistency of current antivirus offerings. We invite the community to challenge assumptions about relying on AV scans and labels as a ground truth for malware analysis and classification.
Speaker's BioAziz Mohaisen is a research scientist at VeriSign Labs. His research interests are broadly focused on the security, privacy, measurement, and analysis of complex and emerging network systems. His recent work has emphasized data-driven security and its applications in malware analysis, network routing, information sharing, and Internet-scale reputation. He obtained his Ph.D. in computer science from the University of Minnesota in 2012 where he wrote his dissertation on trustworthy social computing systems.
CS Distinguished Lecture Series: Understanding Global Climate Change: A Data Driven Approach
Wednesday, October 23, 2013
Climate change is the defining environmental challenge facing our planet, yet there is considerable uncertainty regarding the social and environmental impact due to the limited capabilities of existing physics-based models of the Earth system. This talk will present an overview of research being done in a large interdisciplinary project on the development of novel data driven approaches that take advantage of the wealth of climate and ecosystem data now available from satellite and ground-based sensors, the observational record for atmospheric, oceanic, and terrestrial processes, and physics-based climate model simulations. These information-rich datasets offer huge potential for monitoring, understanding, and predicting the behavior of the Earth's ecosystem and for advancing the science of climate change. This talk will discuss some of the challenges in analyzing such data sets and our early research results.
Speaker's BioVipin Kumar is currently William Norris Professor and Head of Computer Science and Engineering at the University of Minnesota. His research interests include High Performance computing and data mining, and he is currently leading an NSF Expedition project on understanding climate change using data driven approaches. He has authored over 250 research articles, and co-edited or coauthored 10 books including the widely used text book ``Introduction to Parallel Computing", and "Introduction to Data Mining" both published by Addison-Wesley. Kumar co-founded SIAM International Conference on Data Mining and served as a founding co-editor-in-chief of Journal of Statistical Analysis and Data Mining (an official journal of the American Statistical Association). Kumar is a Fellow of the AAAS, ACM and IEEE. He received the 2009 Distinguished Alumnus Award from the Computer Science Department, University of Maryland College Park, and 2005 IEEE Computer Society's Technical Achievement Award for his contributions to the design and analysis of parallel algorithms, graph-partitioning, and data mining. Kumar's foundational research in data mining and its applications to scientific data was honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD).
SWE Seminar: Architectural Decay in Software Systems: Symptoms, Causes, and Remedies
Tuesday, October 29, 2013
Engineers frequently neglect to carefully consider the impact of their changes to a software system. As a result, the software system's architectural design eventually deviates from the original designers' intent and degrades through unplanned introduction of new and/or invalidation of existing design decisions. Architectural decay increases the cost of making new modifications and decreases a system's reliability, until engineers are no longer able to effectively evolve the system. At that point, the system's actual architecture may have to be recovered from the implementation artifacts, but this is a time-consuming and error-prone process, and leaves critical issues unresolved: the problems caused by architectural decay will likely be obfuscated by the system's many elements and their interrelationships, thus risking further decay. In this talk, I will focus on pinpointing locations in a software system's architecture that reflect architectural decay. I will discuss the reasons why that decay occurs.
Specifically, I will present an emerging catalog of commonly occurring symptoms of decay -- architectural "smells". I will illustrate the occurrence of smells identified in the process of recovering the architectures of several real-world systems. Finally, I will provide a comparative analysis of a number of automated techniques that aim to recover a system's architectural design from the system's implementation.
Speaker's BioJoshua Garcia is a Ph.D. candidate in Computer Science at the University of Southern California (USC). He is a member of the Software Architecture Research Group at the Center for Systems and Software Engineering at USC. His research interests include architectural decay, software-architecture recovery, distributed event-based systems, and self-adaptive systems. He has worked for a variety of organizations including the NASA Jet Propulsion Laboratory, the Southern California Earthquake Center at USC, and Xerox Special Information Systems.
SANG Seminar: Wireless Security and Privacy: From Crypto-based Protection to Cross-Layer Security Enhancement
Friday, November 01, 2013
The past decade has witnessed an explosive deployment of wireless technologies. The vast expansion of connectivity by wireless networks combined with the rapid evolution of highly-programmable mobile devices, have had strong impacts on our life. Security has arisen as a major concern in wireless networks. Many crypto-based solutions have been developed to provide the first line of defense, ranging from fundamental security services such as authentication and privacy, to the protection of the infrastructure and the various network components. Recently, cross-layer security enhancement has emerged as a promising approach to the unique challenge facing mobile wireless networks, i.e. mobile devices may be compromised (stolen, reverse engineered, or forged). The idea is to extract unique and unforgeable information from wireless communications or mobile devices and supply such information to other layers for enhanced security. In this talk, we will discuss this transition and several promising research directions.
Speaker's BioWenjing Lou received her Ph.D. in Electrical and Computer Engineering from University of Florida. She is currently an associate professor in the Computer Science department at Virginia Polytechnic Institute and State University, and a co-director of the Complex Networks and Security Research (CNSR) lab. Prior to joining Virginia Tech in 2011, she was on the faculty of Worcester Polytechnic Institute from 2003 to 2011.
Her current research interests include wireless network security and data security and privacy in cloud computing. She is currently serving on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Wireless Communications Letters. She has served as TPC co-chair for the security symposium of several IEEE conferences, including IEEE Globecom 2007, IEEE ICCCN 2009, IEEE ICC 2010, IEEE PIMRC 2011, and IEEE Globecom 2012. She is the mini-conference co-chair for IEEE INFOCOM 2013, and the panel co-chair for IEEE INFOCOM 2014. She serves as TPC Area Chair for IEEE INFOCOM 2013 & 2014, IEEE ICNP 2013, and IEEE SECON 2014. She is the lead co-founder and TPC co-chair for the First IEEE Conference on Communications and Network Security (IEEE CNS), held in Washington DC in October 2013.
GRAND Seminar: From AND/OR Search to AND/OR Sampling
Monday, November 11, 2013
Sampling is one of the main approaches for approximate reasoning in graphical models. In this work we show that while sampling can be structure-blind, exploiting the graph-structure can reduce sampling variance and hence speed-up convergence. Specifically, I will show how “AND/OR Importance sampling” can exploit problem decomposition, yielding significantly improved estimators. Moreover, combining AND/OR sampling with cutset-sampling to reduce the effective dimensionality yields further variance reduction. Extensive empirical evaluation demonstrates the power of the new estimators, often showing an order of magnitude improvements over previous schemes. In particular, these schemes were part of a solver that won first place in the recent UAI 2010 competition and first place in the Pascal 2011 approximate reasoning challenge.
Speaker's BioRina Dechter is a professor of Computer Science at the University of California, Irvine. She received her PhD in Computer Science at UCLA in 1985, an MS degree in Applied Mathematics from the Weizmann Institute and a B.S in Mathematics and Statistics from the Hebrew University, Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation including search, constraint processing and probabilistic reasoning.
Professor Dechter is an author of Constraint Processing published by Morgan Kaufmann, 2003, has authored over 150 research papers, and has served on the editorial boards of: Artificial Intelligence, the Constraint Journal, Journal of Artificial Intelligence Research (JAIR) and Journal of Machine Learning Research (JMLR). She was awarded the Presidential Young Investigator Award in 1991, is a fellow of the American Association of Artificial Intelligence since 1994, was a Radcliffe Fellow 2005-2006 and received the 2007 Association of Constraint Programming (ACP) Research Excellence Award. She has been Co-Editor-in-Chief of Artificial Intelligence, since 2011.
SANG Seminar: Defeat Information Leakage from Browser Extensions Via Data Obfuscation
Friday, November 15, 2013
Today web browsers have become the de facto platform for Internet users. This makes browsers the target of a lot of attacks. With the security considerations from the very beginning, Chrome offers more protection against exploits via benign-but-buggy extensions. However, more and more attacks have been launched via malicious extensions while there is no effective solution to defeat such malicious extensions. As user's sensitive information is often the target of such attacks, in this talk, I will discuss how to proactively defeat information leakage with our iObfus framework. A proto-type has been implemented and iObfus works seamlessly with the Chromium 25. Evaluation against malicious extensions shows the effectiveness of iObfus, while it only introduces trivial overhead to benign extensions.
Speaker's BioWentao Chang is currently a PhD candidate in Computer Science at George Mason University. He received his B.S. degree (Hons.) in Computer Science and Technology from Nanjing University, China in 2006, M.S. degree in Computer Science from George Mason University in 2010. His research interests include web browsing security, malware analysis and mobile security.
CS Seminar: Guiding Evolutionary Learning by Behavioral Evaluation
Monday, November 18, 2013
An intelligent agent can display behavior that is not directly related to the task it learns. Depending on the adopted AI framework and task formulation, such behavior is sometimes attributed to environment exploration, or ignored as irrelevant, or even penalized as undesired. We postulate here that virtually every interaction of an agent with its learning environment can result in outcomes that carry information which can be potentially exploited to solve the task. To support this claim, we present Pattern Guided Evolutionary Algorithm (PANGEA), an extension of genetic programming (GP), a genre of evolutionary computation that aims at synthesizing programs that display the desired input-output behavior. PANGEA uses machine learning to search for regularities in intermediate outcomes of program execution (which are ignored in standard GP), more specifically for relationships between these outcomes and the desired program output. The information elicited in this way is used to guide the evolutionary learning process by appropriately adjusting program fitness. An experiment conducted on a suite of benchmarks demonstrates that this architecture makes agent learning more effective than in conventional GP. In the paper, we discuss the possible generalizations and extensions of this architecture and its relationships with other contemporary paradigms like novelty search and deep learning. In conclusion, we extrapolate PANGEA to postulate a dynamic and behavioral learning framework for intelligent agents.
Speaker's BioKrzysztof (Chris) Krawiec is an associate professor in the Institute of Computing Science, Poznan University of Technology, Poland, and currently Fulbright Visiting Scholar at Computer Science and Artificial Intelligence Laboratory at MIT. His publications in evolutionary computation concern genetic programming, semantic genetetic programming, coevolution and modularity, as well as applications in image analysis, medical decision support, games, and climate science. He cochaired European Conference on Genetic Programming (EuroGP) in 2013 and 2014, and is a member of the Editorial Board of Genetic Programming and Evolvable Machines journal, and the president of Polish Chapter of IEEE Computational Intelligence Society.
CS Distinguished Lecture Series: Designing Software Systems that Comply with Privacy and Security Regulations
Wednesday, November 20, 2013
Properly protecting information is in all our best interests, but it is a complex undertaking. The fact that regulation is often written by non-technologists, introduces additional challenges and obstacles. Moreover, those who design systems that collect, store, and maintain sensitive information have an obligation to design systems holistically within this broader context of regulatory and legal compliance.
There are questions that should be asked when developing new requirements for information systems. For example ..... How do we build systems to handle data that must be kept secure and private when relevant regulations tie your hands? When building a system that maintains health or financial records for a large number of people, what do we need to do to protect the information against theft and abuse, keep the information private, AND at the same time, satisfy all governing privacy/security laws and restrictions? Moreover, how do we know that we've satisfied those laws? How do we monitor for compliance while ensuring that we're monitoring the right things? And, how do you accomplish all this in a way that can be expressed clearly to end-users and legislators (or auditors) so they can be confident you are doing the right things?
We've been working on technologies to make these tasks simpler, and in some senses, automatic. In this talk, I will describe some of the research that we have been conducting to address these problems.
BioDr. Annie I. Antón is a Professor in and Chair of the School of Interactive Computing at the Georgia Institute of Technology in Atlanta. She has served the national defense and intelligence communities in a number of roles since being selected for the IDA/DARPA Defense Science Study Group in 2005-2006. Her current research focuses on the specification of complete, correct behavior of software systems that must comply with federal privacy and security regulations. She is founder and director of ThePrivacyPlace.org. Antón currently serves on various boards, including: the U.S. DHS Data Privacy and Integrity Advisory Committee, an Intel Corporation Advisory Board, and the Future of Privacy Forum Advisory Board. She is a former member of the CRA Board of Directors, the NSF Computer & Information Science & Engineering Directorate Advisory Council, the Distinguished External Advisory Board for the TRUST Research Center at U.C. Berkeley, the DARPA ISAT Study Group, the USACM Public Council, the Advisory Board for the Electronic Privacy Information Center in Washington, DC, the Georgia Tech Alumni Association Board of Trustees, the Microsoft Research University Relations Faculty Advisory Board, the CRA-W, and the Georgia Tech Advisory Board (GTAB). Prior to joining the faculty at Georgia Tech, she was a Professor of Computer Science in the College of Engineering at the North Carolina State University. Antón is a three-time graduate of the College of Computing at the Georgia Institute of Technology, receiving a Ph.D. in 1997 with a minor in Management & Public Policy, an M.S. in 1992, and a B.S. in 1990 with a minor in Technical and Business Communication.
Oral Defense of Doctoral Dissertation: Spam, Phishing, and Fraud Detection Using Random Projections, Adversarial Learning, and Semi-Supervised Learning
Friday, November 22, 2013
Spam, phishing, and fraud detection are security applications that impact most people. Challenges for building spam, phishing, and fraud detection models include difficulty in obtaining annotated data, increased computational complexity for robust detection methods, annotation errors, and changes in the underlying data distribution. We address the above challenges as follows. Clustering and active learning are combined to make efficient use of annotated data, yielding state of the art spam detection performance using only 10 percent of the annotated data employed by previously published methods. Social Network Analysis (SNA) reputation features for mail transfer agents are introduced to evaluate paths from sender to receiver, increasing the detection rate by 70 percent (with the same false positive rate) for state of the art spam detection. Random projections with boosting achieve state of the art spam detection with a 75 percent reduction in computational cost for message classification. The Randomized Hough Transform - Support Vector Machine updates training set annotations, increasing the (precision, recall) F measure by 9.3 percent compared to a state of the art method for handling adversarial noise. Spectral clustering of URL n-grams and transductive semi-supervised learning are used to increase the detection rate by 100 percent (doubling the detection rate with the same false positive rate) for state of the art phishing detection under adversarial modification of message text. Reputation and similarity features are used to enhance the ability to withstand changes in underlying data distributions, producing a 13.5 percent increase in cost savings for state of the art fraud detection. Future research possibilities include the application of these methods to identify deception in social media channels.
Oral Defense of Doctoral Dissertation: Mitigating Denial-of-Service Attacks in Contested Network Environments
Monday, January 06, 2014
In an increasingly connected world, ensuring availability of the services and applications carried by computer networks is critical. As the most far-reaching computer network, the Internet is the home to millions of services that have profound influences on people’s life and work. Additionally, mobile ad hoc networks (MANETs), often used to support critical military and civilian projects, rely on the availability of all participating nodes to fulfill their missions. However, the availability of these networks and services are constantly threatened by denial-of-service (DoS) attacks with growing intensity and sophistication. In this thesis, we study different DoS combating mechanisms to protect services and hosts in the contested Internet and MANET environments.
To mitigate distributed denial-of-service (DDoS) attacks bombarded by powerful botnets on the Internet, we propose a moving target mechanism that progressively separates benign clients from the mingled attackers. This is achieved by endowing mobility to the defense system to evade naive attackers, and smartly shuffling clients to quarantine advanced attackers. We present two mobile defense architectures tailored for different threat and application models.
One architecture, named MOTAG, is built on secret moving network proxies that act as the intermediate layer between authenticated clients and the protected services. By only disclosing the active proxies to the authenticated clients and quickly replacing the attacked ones, this intermediate layer becomes a moving target that continues to escape from network flooding attacks.
Another architecture, enabled by the resource elasticity and network space of cloud computing, replicates open web servers to partition incoming clients’ workload. By dynamically instantiating new replica servers scattered in the cloud and re-shuffling clients’ assignments, we are able to quarantine flooding attacks targeting both network and computational resources. Under both architectures, advanced attackers may follow the moving targets to persist their attacks. To isolate the following attackers from benign clients, we perform elaborate shuffling operations that intelligently redistribute clients among different proxies or replica servers. For guiding the shuffling operations, we design an optimal dynamic programming algorithm that expectedly saves the maximum number of benign clients from each shuffle. We also introduce a much faster greedy algorithm that can generate near-optimal shuffling plans in realtime. Furthermore, a maximum-likelihood algorithm is employed to accurately estimate the number of following attackers. Results from extensive simulations show that our defense mechanism can save a vast majority of benign clients from persistent and intense DDoS attacks in a few rounds of shuffling. Experiments that study the overhead of the shuffling operation demonstrate that clients can be re-assigned among different proxies or replica servers in several seconds.
In addition, this thesis introduces a capability-based mechanism that inhibits MANET DoS attacks in the context of multi-path routing. Existing solutions are limited because they assume a single path is used to route traffic for each flow. To prevent attacks that multiply their throughput by employing multiple different routes, we present CapMan, an enhanced capability-based mechanism that enforces per-flow limits across all employed routes. To ensure overall capability compliance, CapMan not only informs all intermediate nodes of a flow about the assigned capability, but also provides them with a global throughput perspective via periodical summary exchange. Results show that CapMan is able to maintain flow-wide capability constraints consistently and distributedly, even when multiple colluding insiders attempt to exacerbate the attack. In the meantime, the impact of CapMan on well-behaved flows is shown to be small.
Test-Out Exam: INFS 501
Wednesday, January 15, 2014
RegistrationPlease register by sending an email to email@example.com with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*If spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are are taking the test out exams now.
Test-Out Exam: INFS 515
Wednesday, January 15, 2014
RegistrationPlease register by sending an email to firstname.lastname@example.org with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*If spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are are taking the test out exams now.
Test-Out Exam: INFS 519
Wednesday, January 15, 2014
RegistrationPlease register by sending an email to email@example.com with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
*If spring 2014 will be your first semester as a student. If this is not your first semester as an MS student, please explain why you are are taking the test out exams now.
Test-Out Exam: SWE 510
Wednesday, January 15, 2014
RegistrationPlease register by sending an email to firstname.lastname@example.org with the following REQUIRED information:
*Your full name and G number.
*Your program and academic advisor.
*Which exams you wish to register for.
Oral Defense of Doctoral Dissertation: A Dynamic Dialog System Using Semantic Web Technologies
Thursday, January 16, 2014
A dialog system or a conversational agent provides a means for a human to interact with a computer system. Dialog systems use text, voice and other means to carry out conversations with humans in order to achieve some objective. Most dialog systems are created with specific objectives in mind and consist of pre-programmed conversations.
The primary objective of this dissertation is to show that dialogs can be dynamically generated using semantic technologies. I show the feasibility by constructing a dialog system that can be specific to control physical entry points, and that can withstand an attempt to misuse the system by playing back previously recorded conversations. As a solution my system randomly generates questions with a pre-specified difficulty level and relevance, thereby not repeating conversations. In order to do so, my system uses policies to govern the entrance rules, and Item Response Theory to select questions derived from ontologies relevant to those policies. Furthermore, by using contextual reasoning to derive facts from a chosen context, my dialog system can be directed to generate questions that are randomized within a topic, yet relevant to the task at hand. My system design has been prototyped using a voice interface. The prototype demonstrates acceptable performance on benchmark data.
Computer Science Department: GTA Orientation
Thursday, January 16, 2014
C4I Seminar Series: Efficient User Assets Management by Trade-based Asset Blocks and Dynamic Junction Tree for Combo Prediction Markets
Friday, January 24, 2014
We have often heard that the collective wisdom of an informed and diverse group usually out-performs individual experts on forecasting and estimation tasks. The key question is how best to aggregate those diverse judgments. In 2010, it wasn't known whether any system could reliably beat the simple average. Mason is among the teams that reliably did so for two years in IARPA's ACE forecasting challenge. In June 2013 we closed down our geopolitical prediction market to create SciCast, a new and improved science & technology market. This talk discusses ongoing improvements to the SciCast forecasting engine. Unlike other prediction markets, SciCast allows forecasters make conditional forecasts: the chance that China's lunar rover would deploy can be made to depend on a successful soft lunar landing. To avoid a combinatorial explosion, SciCast uses Bayesian networks as the underlying probability model. But tracking the joint probability structure is not enough: markets also must track assets for each user, awarding users for correct forecasts and ensuring there is no possible world where they go negative. Previously, we tracked assets using the same junction tree structure as the joint probability model. This approach provides fast computation of the minimum value and expected value. However, it wastes a lot of space: the majority of users trade sparsely relative to the total number of questions, and even more sparsely compared to the whole joint probability space. Therefore most of the asset junction tree remains untouched. Worse, every time a question is added or resolved, we have to update the asset tree for all users, just in case. We think a trade-based method can overcome this problem and be computationally efficient as well. It turned out that we can build asset blocks involving the questions being traded only, then collect them in an organized manner such as merging sub-block to its super set. Further, when computing user score and cash, we can construct an asset junction tree dynamically, based on the collection of asset blocks then use the asset junction tree for efficient computations. When questions are resolved, it is straightforward to update user's asset blocks accordingly. Basically, for any asset block which contains the resolved questions, we realize the resolving state and truncate the block. In this presentation, I will explain in detail how the trade-based asset blocks are built and how to construct the corresponding asset junction tree dynamically. Computational examples will be demonstrated and compared with other alternative methods. For general questions about prediction markets or other background knowledge, please visit https://scicast.org/.
Speaker's BioDr. Wei Sun is a research assistant professor of the Sensor Fusion Lab and the C4I Center at George Mason University, where he works on stochastic modeling, probabilistic reasoning, optimization, decision support systems, data fusion and general operations research. Dr. Sun focuses his research on inference algorithm for hybrid Bayesian networks, nonlinear filtering, and information fusion. He is an expert in Bayesian inference and developer of several efficient inference algorithms. He is also a contributor/committer of the open-source Matlab BN toolbox. Applications of his research include tracking, fusion, bioinformatics, classification, diagnosis, etc. Prior to joining GMU, Dr. Wei Sun was a Senior Analyst with United Airlines, Inc. and a professional Electrical Engineer in China. He is the recipient of the GMU’s Academic Excellence Award in 2003 and PhD Fellowship during 2003-2007.
GRAND Seminar: Re-Projection of Terabyte-sized 3D Images for Interactive Visualization
Tuesday, February 18, 2014
This talk will present the computational challenges and approaches to knowledge discovery from terabyte-sized images. The motivation comes from experimental systems for imaging and analyzing human pluripotent stem cell cultures and material science specimens at the spatial and temporal coverage that lead to terabyte-sized image data. The objective of such an unprecedented cell study and material imaging is to characterize specimens at high statistical significance in order to guide a repeatable growth of high quality stem cell colonies and to understand metallurgical processes. To pursue this objective, multiple computer and computational science problems have to be overcome including image correction (flat-field, dark current and background), stitching, segmentation, tracking, re-projection, feature extraction and then representation of large images for interactive visualization and sampling in a web browser.
In this presentation, we will focus on the problem of re-projecting terabyte-sized 3D images for interactive visualization from multiple orthogonal viewpoints. The current solutions are limited to gigabyte-sized images using specialized hardware to achieve interactivity and are lacking the ability to share data for collaborative research. We overcome these limitations by pre-computing re-projected views of terabyte-sized images and by using the Deep Zoom framework for accessing multiple orthogonal views. Our approach is based on researching extensions to Amdahl’s law for Map-Reduce computations, establishing benchmarks for image processing on a Hadoop platform, and introducing a computer cluster node utilization coefficient for re-projection computations running on a computer cluster/cloud. The theoretical models of algorithmic complexity and cluster utilization at terabyte scale are applied to selecting an optimal computer cluster configuration. Additional interactive measurement capabilities are added as plugins to the open source OpenSeadragon project with the Deep Zoom capabilities. This presentation will conclude with illustrations of enabled scientific discoveries, as well as with several collaboration opportunities to create reference resources for scientific discoveries from terabyte-sized images.
Speaker's BioPeter Bajcsy received his Ph.D. in Electrical and Computer Engineering in 1997 from the University of Illinois at Urbana-Champaign (UIUC) and a M.S. in Electrical and Computer Engineering in 1994 from the University of Pennsylvania (UPENN). He worked for machine vision, government contracting, and research and educational institutions before joining the National Institute of Standards and Technology (NIST) in 2011. At NIST, he has been leading a project focusing on the application of computational science in biological metrology, and specifically stem cell characterization at very large scales. Peter’s area of research is large-scale image-based analyses and syntheses using mathematical, statistical and computational models while leveraging computer science fields such as image processing, machine learning, computer vision, and pattern recognition. He has co-authored more than more than 24 journal papers, eight book chapters and close to 100 conference papers
CS Interdisciplinary Seminar: Combining Genetic Algorithms and Simulation to Search for Failure Scenarios in System Models
Wednesday, February 19, 2014
Large infrastructures, such as clouds, can exhibit substantial outages, sometimes due to failure scenarios that were not considered during system design. We define a method that uses a genetic algorithm (GA) to search system simulations for parameter combinations that result in system failures, so that designers can take mitigation steps before deployment. We apply the method to study an existing infrastructure-as-a-service cloud simulator. We characterize the dynamics, quality, effectiveness and cost of GA search, when applied to seek a known failure scenario. Further, we iterate the GA search to reveal previously unknown failure scenarios. We find that, when schedule permits and failure costs are high, combining GA search with simulation proves useful for exploring and improving system designs.
Speaker's BioKevin Mills is a senior scientist with the Information Technology Laboratory at the US National Institute of Standards & Technology. In 1996, he received a PhD in Information Technology from George Mason University, where he served on the adjunct faculty through 2006. From 1996 to 1999, he served as a program manager at DARPA. His current research interests include understanding and predicting behavior in complex information systems.
SWE Seminar: What's Wrong With the Program I Haven't Written Yet?
Friday, February 21, 2014
Software developers primarily rely on experience and intuition to make development decisions. I will describe speculative analysis, a new technique that helps developers make better decisions by informing them of the consequences of their likely actions. As a concrete example, I will consider collaborative development and the conflicts that arise when developers make changes in parallel. This is a serious problem. In industry, some companies hire developers solely to resolve conflicts. In open-source development, my historical analysis of over 140,000 versions of nine systems revealed that textual, compilation, and behavioral conflicts are frequent and persistent, posing a significant challenge to collaborative development. Speculative analysis can help solve this problem by informing developers early about potential and existing conflicts. Armed with this information, developers can prevent or more easily resolve the conflicts. I will demonstrate Crystal, a publicly available tool that detects such conflicts early and precisely. Crystal has inspired a collaboration with Microsoft and some Microsoft teams now use a version of the tool in their everyday work
Speaker's BioYuriy Brun is an assistant professor at the University of Massachusetts, Amherst. Yuriy's research interests are in software system modeling, design, and development. Previously, he served as a CI Fellow at the University of Washington, received his PhD degree in 2008 from the University of Southern California, as an Andrew Viterbi Fellow, and received his MEng degree in 2003 from MIT. He was recognized with the IEEE TCSC Young Achiever in Scalable Computing Award in 2013, and his doctoral research was a finalist in the ACM Doctoral Dissertation Competition in 2008. His work on speculative analysis, the subject of his talk, won a 2011 ACM SIGSOFT Distinguished Paper Award and was the spotlight paper in the October 2013 issue of IEEE Transactions on Software Engineering. http://cs.umass.edu/~brun
GRAND Seminar: 3D Shape Retrieval Benchmarking and Contests
Tuesday, March 18, 2014
Benchmarking of 3D Shape retrieval allows developers and researchers to compare the strengths of different methodologies on a standard dataset. Here we describe the procedures involved in developing a benchmark and issues involved. We then discuss some of the current 3D shape retrieval benchmarking efforts of our group and others. We also review the different performance evaluation measures that are developed and used by researchers in the community. After that we give an overview of the 3D shape retrieval contest (SHREC) tracks run under the EuroGraphics Workshop on 3D Object Retrieval and give details of the tracks that we have organized for SHREC.Then we present some of the results based on the different SHREC contest tracks and the other NIST shape benchmark. Finally, I will describe some recent non-rigid 3D shape retrieval algorithms developed by our group.
Speaker's BioAfzal Godil is a project leader in the Information Technology Laboratory at National Institute of Standards and Technology (NIST) where he has been for over seventeen years. Prior to that he has worked at the NASA Langley and Lewis Research centers. His main research interests are in 3D shape analysis and retrieval, digital human modeling, shape metrology, computer vision and computational methods. He has organized twelve Shape Retrieval tracks at the different Eurographics Workshop on 3D Object Retrieval. He is also active in a variety of standards efforts, such as Web graphics, Anthropometry and Medical extension of X3D.
CS Interdisciplinary Seminar: Computer Vision and Robotic Methods for Building and Bridge Inspection
Wednesday, March 19, 2014
Bridges represent a critical component of infrastructure systems, and therefore condition monitoring via periodic inspection has long been a key part of bridge operations and maintenance practice. There are more than 578,000 bridges in the US alone, most of which must be inspected every two years, and so hundreds of millions of dollars per year are spent on the inspection of trillions of dollars in assets. In current practice, images captured during an inspection are not considered quantitative sources of information and are therefore underutilized.
This seminar will present a fundamental rethink of how we capture and handle inspection images, which has resulted in the utilization of computer vision to connect visually observable structural damage to changes in the underlying mechanical performance of structures. Included will be a discussion of the challenges of structural image segmentation and feature extraction, as well as current efforts to represent images as dynamic sources of information.
Speaker's BioDavid Lattanzi, Ph.D., P.E. is an Assistant Professor in the Sid and Reva Dewberry Department of Civil, Environmental, and Infrastructure Engineering. Dr. Lattanzi studies how to develop new methods of structural inspection through the use of artificial intelligence, computer vision, and robotics. A registered professional engineer, he has participated in post-disaster inspection and assessment work both domestically and abroad. He received his doctorate in structural engineering, as well as a concurrent M.S. in mechanical engineering, from the University of Washington, where he developed robotic bridge inspection techniques.
CS Seminar: The Building Blocks of Data Science
Wednesday, March 19, 2014
In this talk I will make an attempt to flesh out the core components of what is being called Data Science. The umbrella term ``Data Science’’ incorporates elements of Computer Science, Information theory and Statistics expressed in the language of optimization theory. The identification of these core elements will help towards arriving at a declarative framework for Data Science and decouple its use from implementation. This in turn may lead to overcome the ``Data Science Crunch’’ – where organizations own and have access to large quantities of data and appreciate its potential value, but lack human talent and a support framework to exploit it to its fullest.
Speaker's BioProf Sanjay Chawla, University of Sydney Sanjay Chawla is a Professor in the School of Information Technologies, University of Sydney, Australia. His main area of research is data mining and machine learning. More specifically he has been working on three problems of contemporary interest: outlier detection, imbalanced classification and adversarial learning. His research has been published in leading conferences and journals and has been recognized by several best-paper awards. He serves on the editorial board of IEEE TKDE and Data Mining and Knowledge Discovery. Sanjay served as the Head of School from 2008-2011 and was an academic visitor at Yahoo! Labs, Bangalore in 2012. He received his PhD in 1995 from the University of Tennessee, USA.
SWE Seminar: Generating Test Data to Distinguish Conjunctive Queries without Comparisons
Thursday, March 20, 2014
The widespread use of databases in software systems has increased the importance of unit testing the queries that form the interface to these databases. Mutation analysis is a powerful testing technique that has been adapted to test database queries. But each of the existing mutation approaches to testing database queries has one or more of the following shortcomings: inability to recognize equivalent mutants, inability to generate test databases automatically, or inability to apply to mutate all aspects of a query.
In this paper we address all three of these challenges by adapting results from the rich literature on query rewriting. We restrict attention to the class of conjunctive queries without comparisons. In return for this restriction, we give an algorithm that recognizes equivalent mutants, generates a test database that distinguishes each nonequivalent mutant, and applies to arbitrary mutations, as long at the mutation is also a conjunctive query without comparisons. The paper presents the test database generation algorithm and proves that it is sound and complete for conjunctive queries without comparisons. We then illustrate the algorithm on a sample query. We evaluate mutations of the query both with the new technique and compare the results to existing mutation techniques for databases.
Speaker's BioPreetham Vemasani is a first year PhD student at George Mason University. He received the BS degree in Computer Science and Information Technology from JNTU Hyderabad, India and the MS degree in Software Engineering from George Mason University. His research interests are in software testing and databases. He is currently working on 'Generating efficient test data for database queries' with Dr. Paul Ammann and Dr. Alexander Brodsky.
SWE Seminar: Mutation Subsumption Graphs
Thursday, March 20, 2014
Mutation testing researchers have long known that many generated mutants are not needed. This paper develops a graph model to describe redundancy among mutations. We define “true” subsumption, a relation that practicing test engineers would like to have, but cannot due to issues of computability. We also define dynamic subsumption and static subsumption as approximations of “true” subsumption. We explore the properties of the approximate subsumption relations in the context of a small example. We suggest possible uses for subsumption graphs.
Speaker's BioBob Kurtz is a Senior Principal Engineer with Raytheon. Bob has more than 25 years of experience in developing software systems, with a focus on real-time embedded systems. Bob has a B.A. in Computer Science from the State University of New York Empire State College, and an M.S. in Software Engineering from George Mason University. Bob is currently pursuing a Ph.D. in Software Engineering at George Mason University.
CS Seminar: Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility
Friday, March 21, 2014
We propose Migratory Compression (MC), a coarse-grained data transformation, to improve the effectiveness of traditional compressors in modern storage systems. In MC, similar data chunks are re-located together, to improve compression factors. After decompression, migrated chunks return to their previous locations. We evaluate the compression effectiveness and overhead of MC, explore reorganization approaches on a variety of datasets, and present a prototype implementation of MC in a commercial deduplicating file system. We also compare MC to the more established technique of delta compression, which is significantly more complex to implement within file systems.
We find that Migratory Compression improves compression effectiveness compared to traditional compressors, by 11 percent to 105 percent, with relatively low impact on runtime performance. Frequently, adding MC to a relatively fast compressor like gzip results in compression that is more effective in both space and runtime than slower alternatives. In archival migration, MC improves gzip compression by 44–157 percent. Most importantly, MC can be implemented in broadly used, modern file systems.
Joint work with Xing Lin (Univ. of Utah) and Guanlin Lu, Philip Shilane, and Grant Wallace (EMC Corporation—Data Protection and Availability Division) This will be an expanded version of the talk presented by Xing at FAST'14.
Speaker's BioFred Douglis holds a Ph.D. in computer science from U.C. Berkeley. He has worked in industrial applied research throughout his career, including Matsushita, AT&T, IBM, and currently EMC. He also has been a visiting professor at VU Amsterdam and Princeton University. He received an IBM Outstanding Technical Achievement award for his contributions to System S, productized as Infosphere Streams. His research interests include storage, distributed systems, and Internet tools and performance.
He served as EIC of IEEE Internet Computing from 2007-2010 and has been on its editorial board since 1999. He has published one book, 40 workshop or conference papers, 7 journal or magazine articles, and over 50 patents and patent applications.
GRAND Seminar: Some Expeditions in Predictive Modeling to Enable Systems Biology
Friday, March 28, 2014
With the data explosion being witnessed in biology, immense emphasis is being placed on developing systematic approaches to integrate the various types and sources of data to build models of complex biological processes and diseases.
In this talk, I will discuss our efforts to model complex biomedical phenotypes using predictive modeling approaches applied to large genome-wide data sets. The first part presents experiences and results from a collaborative-competitive effort to model and predict survival rates for breast cancer patients using a recently published set of gene expression, copy number aberration and clinical features.
In the second part, I will present our analysis of heterogeneous ensemble predictive methods that generally produce the best performance for complex biomedical prediction problems. These methods leverage the consensus and diversity among hundreds or even thousands of heterogeneous base predictors, and thus generally outperform even the best homogeneous ensemble methods, like boosting and random forests.
Speaker's BioGaurav Pandey is an Assistant Professor in the Department of Genetics and Genomic Sciences at the Mount Sinai School of Medicine (New York) and is part of the newly formed Institute for Genomics and Multiscale Biology. He completed his Ph.D. in computer science and engineering from the University of Minnesota, Twin Cities in 2010, and subsequently completed a post-doctoral fellowship at the University of California, Berkeley. His primary fields of interest are computational biology, genomics and large-scale data analysis and mining, and he has published extensively in these areas.
CS Interdisciplinary Seminar: Computational Learning Sciences
Wednesday, April 09, 2014
The field of Learning Sciences conducts research on how people learn across a range of domains with and through the use of artifacts. This understanding is then used to design more productive learning environments. How people learn, our understanding of how people learn, as well as our ability to design learning environments is undergoing a tremendous transformation with increasing digitization of artifacts and our practices. Increased digitization, in addition to other affordances, implies computational capabilities embedded in artifacts. How does this impact learning? This question gives rise to a new area of research I call Computational Learning Sciences (CLS). In this talk I start an exploration of this area and work towards a problem definition for it by presenting findings from a study of newcomer participation in a Java programming community. I emphasize the inherently socio-technical nature of learning and outline three relevant and useful avenues for CLS research: 1. Content Curation – through aggregation, recommendation, and crowdsourcing; 2. Collaboration Configuration – through analytics and modeling of learner and teacher activity; and, 3. Competency Certification – through formative, dynamic and summative assessment.
Speaker's BioAditya Johri studies the use of information technologies for learning and knowledge sharing, with a focus on cognition in informal environments. His research is funded through several NSF grants including an Early Career Award. He is a co-editor of the Cambridge Handbook of Engineering Education Research (CHEER), Cambridge University Press (2014). He received his Ph.D. in Learning Sciences and Technology Design from Stanford University. He can be reached at email@example.com. More information at: http://mason.gmu.edu/~ajohri3
Oral Defense of Doctoral Dissertation: Unsupervised Bayesian Musical Key and Chord Recognition
Wednesday, April 09, 2014
Many tasks in Music Information Retrieval can be approached using indirection in terms of data abstraction. Raw music signals can be abstracted and represented by using a combination of melody, harmony, or rhythm for musical structural analysis, emotion or mood projection, as well as efficient search of large collections of music. In this dissertation, we focus on two tasks: analyzing tonality and harmony of music signals. Our approach concentrates on transcribing western popular music into its tonal and harmonic content directly from the audio signals. While the majority of the proposed methods adopt the supervised approach which requires scarce manually-transcribed training data, our approach is unsupervised where model parameters for tonality and harmony are directly estimated from the target audio data. First, raw audio signals in the time domain are transformed using undecimated wavelet transform as a basis to build an enhanced 12-dimensional pitch class profile (PCP) in the frequency domain as features of the target music piece. Second, a bag of local keys are extracted from the frame-by-frame PCPs using an infinite Gaussian mixture which allows the audio data to “speak-for-itself” without pre-setting the number of Gaussian components to model the local keys. Third, the bag of local keys is applied to adjust the energy levels in the PCPs for chord extraction.
From experimental results, we demonstrate that our approach – a much simpler one compared to most of the existing methods – performs just as well or outperforms many of the much more complex models for the two tasks without using any training data.
CS Distinguished Lecture Series: Tor and Censorship: Lessons Learned
Friday, April 11, 2014
Tor is a free-software anonymizing network that helps people around the world use the Internet in safety. Tor's 5500 volunteer relays carry traffic for around a million daily users, including ordinary citizens who want protection from identity theft and prying corporations, corporations who want to look at a competitor's website in private, people around the world whose Internet connections are censored, and even governments and law enforcement.
The last year has included major cryptographic upgrades in the Tor software, dozens of research papers on attacking and improving the Tor design, mainstream press about government attempts to attack the Tor network, discussions about funding, FBI/NSA exploitation of Tor Browser users, botnet related load on the Tor network, and other important topics.
In this talk I'll aim to strike a balance between explaining Tor's "intellectual merit" side (all the neat research problems that Tor raises, and how we've positioned ourselves to get so much attention from academics) and Tor's "broader impact" side (the many ways that Tor has changed lives around the world).
Speaker's BioRoger Dingledine is project leader for The Tor Project, a US non-profit working on anonymity research and development. While at MIT he developed Free Haven, one of the early peer-to-peer systems that emphasized resource management while maintaining anonymity for its users. He works with the Electronic Frontier Foundation, the US Navy, Voice of America, the National Science Foundation, and other organizations to design and develop systems for anonymity and traffic analysis resistance. He organizes academic conferences on anonymity, speaks at such events as Blackhat, Defcon, Toorcon, and the CCC congresses, and also does tutorials on anonymity for national and foreign law enforcement. Roger was honored in 2006 as one of the top 35 innovators under the age of 35 by Technology Review magazine.
SANG Seminar: Tackling the Challenges of I/O Virtualization in Data Centers
Friday, April 25, 2014
Large-scale data centers leverage virtualization to achieve high resource utilization, scalability, and availability. While ideally the performance of an application running inside a virtual machine (VM) shall be independent of co-located applications and VMs that share the physical resources, current systems are yet to achieve this goal.
In this talk, I will describe our efforts in addressing a number of challenges in order to achieve optimal I/O performance in such virtualized systems. Specifically, TRACON constructs mathematical models and scheduling algorithms to mitigate the VM interference; Matrix leverages machine learning and optimization techniques to allocate VM resource in a way that minimizes the cost while achieving good application performance; and Mortar enhances the hypervisor to pool together spare memory on each machine and expose it as a volatile data cache to improve virtual I/O performance.
Speaker's BioHowie Huang is an Assistant Professor in Department of Electrical and Computer Engineering at the George Washington University. His research interests are in the areas of computer systems and architecture, including cloud computing, big data computing, high-performance computing and storage systems. His projects won the Best Poster Award at PACT'11, ACM Undergraduate Student Research Competition at SC'12, and a finalist for the Best Student Paper Award at SC'11. He was a recipient of the NSF CAREER Award in 2014, NVIDIA Academic Partnership Award in 2011, IBM Real Time Innovation Faculty Award in 2008, and School of Engineering and Applied Science Outstanding Young Researcher Award in 2014. He received his Ph.D. in Computer Science from the University of Virginia in 2008.
Oral Defense of Doctoral Dissertation: Quantitative Framework to Design Services with Intrusion Tolerant QoS
Friday, April 25th, 2014
Large software systems can be designed as a set of loosely coupled services interacting with each other; simple services can be composed to form more complex services. But, for services to be usable in production, they must satisfy non-functional requirements, especially security-related quality of service in order to ensure confidentiality, integrity, and availability. Unfortunately, software vulnerabilities expose these services to malicious actors, and make them susceptible to attacks. Due to the distributed and decentralized nature of services, publishing and guaranteeing security quality of service are crucial so that potential applications and clients can make use of the provided services. Moreover, intrusion prevention and detection are not perfect in securing services, due to the increased sophistication of malicious attacks. This has motivated the addition of the Intrusion Tolerant component to complement the line of defense for applications and services. Given the need of making services intrusion-tolerant, my research focuses on providing a Quantitative Framework for Intrusion Tolerant Services, a systematic approach to model, design and implement services with Intrusion Tolerant Quality of Service (IT-QoS). The approach relies on the foundation of the recovery-based architecture of Self Cleansing Intrusion Tolerance, and a correlation component, which is based on Semi-Markov Process for computing IT-QoS metrics, and discovering a mathematical dependency between those metrics and the intrusion tolerance control parameters such as the service exposure window. To system architects of service providers, the framework would constitute as the basis for ensuring differentiated levels of certain IT-QoS metrics such as Secure Availability, and Mean Time To Security Failure, which are indicators of the reliability of a service operating in the presence of cyber security threats.
Oral Defense of Doctoral Dissertation: An Analysis of a Model-based Evolutionary Algorithm: Learnable Evolution Model
Monday, April 28, 2014
An evolutionary algorithm (EA) is a biologically inspired metaheuristic that uses mutation, crossover, reproduction, and selection operators to evolve solutions for a given problem. Learnable Evolution Model (LEM) is an EA that has an evolutionary algorithm component that works in tandem with a machine learner to collaboratively create populations of individuals. The machine learner infers rules from best and least fit individuals, and then this knowledge is exploited to improve the quality of offspring. Unfortunately, most of the extant work on LEM has been ad hoc, and so there does not exist a deep understanding of how LEM works. And this lack of understanding, in turn, means that there is no set of best practices for implementing LEM. For example, most LEM implementations use rules that describe value ranges corresponding to areas of higher fitness in which offspring should be created. However, we do not know the efficacy of different approaches for sampling those intervals. Also, we do not have sufficient guidance for assembling training sets of positive and negative examples from populations from which the ML component can learn.
This research addresses those open issues by exploring three different rule interval sampling approaches as well as three different training set configurations on a number of test problems that are representative of the types of problems that practitioners may encounter. Using the machine learner to create offspring induces a unique emergent selection pressure separate from the selection pressure that manifests from parent and survivor selection; an outcome of this research is a partially ordered set of the impact that these rule interval sampling approaches and training set configurations have on this selection pressure that practitioners can use for implementation guidance. That is, a practitioner can modulate selection pressure by traversing a set of design configurations within a Hasse graph defined by partially ordered selection pressure.
Oral Defense of Doctoral Dissertation: Management of Uncertainty in Self-Adaptive Software
Tuesday, May 20, 2014
The ever-growing complexity of software systems coupled with the need to maintain their quality of service (QoS) characteristics, even under adverse conditions and highly uncertain environments, have instigated the emergence of self-adaptive software systems. A self-adaptive software system has the mechanisms that automate and simplify the management and modification of software systems after they are deployed, (i.e., during run-time) to achieve certain functional or QoS goals.
While the benefits of such systems are plenty, their development has shown to be significantly more challenging than static and predictable software systems. One key culprit is that self-adaptation is subject to uncertainty. Uncertainty can be observed in every facet of adaptation, albeit at varying degrees. It follows from the fact that the system's user, adaptation logic, and business logic are loosely coupled, introducing numerous sources of uncertainty. This challenges the confidence with which the adaptation decisions are made. A key observation is that while the level of uncertainty could vary, no self-adaptive software system is ever completely free of it.
This is precisely the challenge I addressed in this research. I have presented a general quantitative approach for tackling the complexity of automatically making adaptation decisions under uncertainty. I redefined the conventional definition of optimal solution to one that has the best range of behavior. In turn, the selected solution has the highest likelihood of satisfying the system's quality objectives, even if due to uncertainty, properties expected of the system are not borne out in practice.
In this dissertation, I begin with describing the problem and providing an overview of the solutions. Then, I define what I mean by uncertainty and position the approach with regard to the related work. Next, I provide the theoretical detail of the approach, which includes the formal specification of the problem and two (combinatorial and evolutionary) optimization techniques. I also describe some techniques for quantifying uncertainty and required step for effecting adaptation decisions at run-time. Finally, I discuss the implementation details of the framework. My experience with the approach, including experimental evaluation and porting the framework to a new problem domain, has been very positive. The evaluation results have strongly corroborated my hypotheses.
CS Interdisciplinary Seminar: Smart Cars for Safe Pedestrians
Wednesday, June 18, 2014
One of the most significant large-scale deployments of intelligent systems in our daily life nowadays involves driver assistance in smart cars.
Accident statistics show that roughly one quarter of all traffic fatalities world-wide involve vulnerable road users (pedestrians, bicyclists); most accidents occur in an urban setting. Devising an effective driver assistance system for vulnerable road users has long been impeded, however, by the "perception bottleneck", i.e. not being able to detect and localize vulnerable road users sufficiently accurate. The problem is challenging due to the large variation in object appearance, the dynamic and cluttered urban backgrounds, and the potentially irregular object motion. Topping these off are stringent performance criteria and real-time constraints. I give an overview of the remarkable computer vision progress that has been achieved in this area and discuss the main enablers: the algorithms, the data, the hardware and the tests.
Daimler has recently introduced an advanced set of driver assistance functions in its Mercedes-Benz 2013-2014 S-, E-, and C-Class models, termed “Intelligent Drive”, using stereo vision. It includes a pedestrian safety component which facilitates fully automatic emergency braking - the system works day and night. I discuss “Intelligent Drive” and future research directions, on the road towards accident-free driving.
Speaker's BioDariu M. Gavrila received the PhD degree in computer science from the University of Maryland at College Park, USA, in 1996. Since 1997, he has been with Daimler R&D in Ulm, Germany, where he is currently a Principal Scientist. In 2003, he was further appointed professor at the University of Amsterdam, chairing the area of Intelligent Perception Systems (part time). Over the past 15 years, Prof. Gavrila has focused on visual systems for detecting humans and their activity, with application to intelligent vehicles, smart surveillance and social robotics. He led the multi-year pedestrian detection research effort at Daimler, which materialized in the Mercedes-Benz S-, E-, and C-Class models (2013-2014). He is frequently cited in the scientific literature and he received the I/O 2007 Award from the Netherlands Organization for Scientific Research (NWO) as well as several conference paper awards. His personal Web site is www.gavrila.net.
Oral Defense of Doctoral Dissertation: Profiling, Tracking, and Monetizing: An Analysis of Internet and Online Social Network Concerns
Wednesday, July 02, 2014
This dissertation explores concerns facing Internet and specifically Online Social Network users. The attacks we discuss can lead to identity theft, biased and tailored website content delivery, geolocation threats, monetization, and an overall lack of privacy. We introduce a profiling and tracking attack that correlates a user’s online persona that is captured from seemingly innocuous website traffic (e.g., operating system, search engine, browser, time spent on website, etc.) with that of the same user’s real Facebook profile through analytics captured from a custom Facebook Fan Page. We show how an adversary might identify the personally identifiable information of the user given only their online persona.
The protection of one’s identity is paramount especially for user’s working in the intelligence community. As a result, these organizations are currently employing privacy preserving technologies as part of their standard network defenses to anonymize their outbound traffic. Our results show that while network-level anonymity systems are better at protecting end-user privacy than having no privacy preserving technology in place, they are unable to thwart de-anonymization attacks aimed at applications and private data of end-users. We demonstrate and substantiate our claims using a targeted experiment against actual operational scenarios of real-world users who are relying on an implementation of a privacy preserving technology. To this end, we execute multiple attacks associated with network monitoring, phishing, and Online Social Networks.
We also discuss how a user can be monetized through an attack vector such as spam. Spam is a profit-fueled enterprise, and cybercriminals are focusing more of their efforts at growing Online Social Networks. One of the common methods of monetizing Online Social Network spam is to entice users to click on links promising free gift cards and iPads. However, these links actually lead to ad networks that bombard users with surveys in an attempt to collect personal and contact information that they will sell to other marketers. To date, we lack a solid understanding of this enterprise’s full structure. We examined the survey scam process to determine the affiliates that are behind this lucrative scam by performing an analysis of five months of Facebook spam data. We provide the first empirical study and analysis of survey scams and demonstrate how to determine which ad networks are sponsoring this spam.
Lastly, we focus on why people act in an insecure way when handling their personal information especially passwords and personal images. This is a major problem as seen in revenge porn and sextortion related cases. Often the victim’s life has been negatively altered in direct relationship to these types of cases. Using a combination of well-known human-computer interaction methods such as surveys and exit interviews combined with custom software (e.g., Cloudsweeper and Gmail Image Extractor) we show that users act differently if they visually see the threat associated with their security behavior. We analyze the results of Cloudsweeper which is designed to scan Google Mail accounts and report any cleartext passwords, their associated monetary value, and provides the option for passwords to be encrypted and redacted. Additionally, we introduce for the first time the Google Image Extractor which is designed to extract images from the user’s Google Mail account and provides the opportunity for users to delete their images seamlessly. Our contributions will help determine if there is a need for such applications as well as determine if convenience or privacy prevails.
The main takeaway is users were not educated about quite prevalent attack vectors for compromising client systems and violating user privacy. We show the extent to which information made freely available on the Internet, can negatively impact the organization and users. Upon completion of the experiments, we compiled the results and presented it as security awareness briefings. The security awareness briefings in combination with our software tools will help mitigate some of the concerns we present and discuss in this dissertation.
Oral Defense of Doctoral Dissertation: An Approach to Analyzing and Recognizing Human Gait
Monday, July 21, 2014
Gait analysis has been an active area of research in computer vision for a long time. It is also important for rehabilitation science where clinicians explore innovative ways helping to analyze gait of different people. The traditional ways to study gait rely on 3D optical motion capture systems which involve the use of cumbersome active/passive markers to be placed on a subject’s body.
The attachment of markers to the segments hinder natural patterns of movement and may lead to altered gait information. Automated gait analysis has been proposed as a solution to this problem. The aim of automated gait analysis is to provide information about the gait parameters and gait determinants from video without using markers. Gait is a repetitive, highly constrained and periodic activity. Different gait determinants are active in different phases of the gait cycle to minimize the excursion of the body’s center of gravity and help produce forward progression with the least expenditure of energy. The motion of limb segments encode information about different phases of gait cycle. However, estimating the motion of limbs from the videos is challenging since limbs are self occluding and only apparent motion can be observed using the images. To add to the issue,the quality of the recorded video (color contrast, cluttered background) and clothing worn by the subject can play a significant role in the computation of that apparent motion.
In this thesis, we present novel methods using image flow to identify different phases (double support, mid swing, toe off and heel strike) of a gait cycle. We use the torso excursion information and lower legs rotational velocities to identify these phases. The top 30 percent of the subject’s body is used to estimate the torso instantaneous velocity. The zero values of the vertical component of the velocity identifies the double support and mid swing phases. Utilizing these phases, we present approaches to approximate the lower leg motions using translation and rotational motion models. The zero values of the instantaneous rotational velocity of the lower legs determine the toe off and heel stike events. We also apply a modified version of color tracking algorithm to track the hand position during the gait cycle. Some of the limb segments such as upper legs and upper arms get occluded by other segments during majority of the gait cycle. We have presented a method to model the motion synergies between pairs of segments (upper leg–lower leg, foot–lower leg and upper arm–lower arm) using data from a 3D motion capture database. The estimated parameters of the non-linear dynamic models depend on the phases of the gait cycle being considered. We implement an Unscented Kalman Filter that estimates the angular position of the unobserved limb segments (upper leg, foot, upper arm and forearm) based on these models and the motion data obtained for the observed limb segments (lower leg, hand). We compare our results with those obtained from a 3D motion capture system and by manually labeling the images. The small error in the results demonstrates the sensitivity and specificity of our techniques. Finally, we use histograms of normal flow to represent the motion patterns of different regions of the body. We measure the motion similarity between two image frames using the cosine similarity measure for comparing two histograms. Computing this measure between all the pairs of image frames in the two gait sequences gives a similarity matrix as a feature. These features are used in Support Vector Machines and Dynamic Programming together with the information about the phases of gait to compare two gait sequence. We demonstrate our approach on a publicly available gait dataset and present the analysis.
In summary, we establish that we can capture segmental data using a markerless gait analysis system. These data are sensitive, reliable and provide recognizable clinically relevant information about motion through all phases of gait.
CS Seminar: Computer Agents that Negotiate Proficiently with People
Tuesday, July 22, 2014
Negotiation and persuasion are tools for social influence that are endemic to human interaction, from personal relationships and business partnerships to political debate. The inclusion of people presents novel problems for the design of automated agents’ negotiation and persuasion strategies. People do not adhere to the optimal, monolithic strategies that can be derived analytically. Their negotiation behavior is affected by a multitude of social and psychological factors. In this talk I will show how combining machine learning techniques for opponent modeling with human behavioral models, formal decision-making and heuristics approaches enable agents to interact well with people. Applications include intelligent agents that help drivers reduce energy consumption, agents that support rehabilitation and employer-employee negotiation.
Speaker's BioSarit Kraus (Ph.D. Computer Science, Hebrew University, 1989) is a Professor of Computer Science at Bar-Ilan University and an Adjunct Professor at the University of Maryland. Her research is focused on intelligent agents and multi-agent systems (including people). Kraus was awarded the IJCAI Computers and Thought Award, ACM SIGART Agents Research award, the EMET prize and her paper with Prof. Barbara Grosz was a winner of the IFAAMAS influential paper award (joint winner). She is AAAI and ECCAI fellow and a recipient of the advanced ERC grant. http://u.cs.biu.ac.il/~sarit/index.html
Oral Defense of Doctoral Dissertation: Decision Guidance Query Language (DGQL), Algorithms and System
Friday, July 25, 2014
Decision optimization is widely used in many Decision Support and Guidance Systems (DSGS) to support business decisions such as procurement, scheduling and planning. In spite of rapid changes in customer requirements, the implementation of DSGS is typically rigid, expensive and not easily extensible, in stark contrast to the agile implementation of information systems based on the DBMS and SQL technologies. This dissertation introduces the Decision Guidance Query Language (DGQL) designed to (re-)use SQL programs for decision optimization with the goals of making DSGS implementation agile and intuitive, and leveraging existing investment in SQL-implemented systems. This dissertation addresses several related technical issues with DGQL: (1) how to annotate existing queries to precisely express the optimization semantics, (2) how to translate the annotated queries into equivalent mathematical programming (MP) formulations that can be solved efficiently using existing industrial solvers, and (3) how to develop specialized optimization algorithms for a class of multi-stage production problems modeled in DGQL.
The algorithms for the multi-stage production network utilize the fact that only part of the problem is dynamic, e.g., the demand for output products in a manufacturing process, whereas the rest of the problem is static, e.g., the connectivity graph of the assembly processes and the cost functions of machine assemblies. An online decomposition algorithm (ODA) is developed based on offline preprocessing of static assembly components to create an approximated cost function, which is used to decompose the original problem into smaller problems and significantly improve solution quality and time complexity. The preprocessing of each static assembly component involves discretizing assembly output, finding the corresponding optimal machine configuration, and constructing a piece-wise linear approximation of the assembly cost function. An adaptive preprocessing algorithm (APA) is introduced that considers only a small part of the discretized points by classifying outputs based on their predicted machine configuration. An initial experimental evaluation suggests that (1) machine generated MP models introduce little or no degradation in performance as compared with expertly crafted models, (2) ODA, using offline preprocessing, leads to an order of magnitude improvement in quality of solutions and optimization time as compared to MILP, and (3) ADA shows significant improvement in preprocessing time with no reduction in the quality of the online solution.
Oral Defense of Doctoral Dissertation: A Highly Recoverable Filesystem for Solid State Drives
Thursday, July 31, 2014, 1:30 pm
Recovering deleted information from storage drives is a long-standing problem. Prior research has approached information recovery by developing file-carving techniques. However, two issues present significant challenges to on-going efforts. 1) Prior knowledge of file types is required to construct file carvers, including file headers and footers, and 2) fragmentation prevents file carvers from achieving successful recovery. More recently, solid-state drives (SSDs) have become more popular. SSDs provide several advantages over mechanical hard drives such as smaller sizes, the lack of moving parts, and provide better performance. However, due to problems such as wear leveling and write amplification in SSDs, files are severely fragmented and thus exacerbate the data recovery problem. In addition, SSDs use TRIM and garbage collection schemes to enhance their performance, which can permanently remove data immediately after executing a delete operation.
In this dissertation, I developed a framework for recovering deleted files without the knowledge of the file type amidst significant fragmentation. I developed the Recovery Filesystem by modifying an existing implementation of the exFat filesystem running on top of FUSE. The central idea underlying the Recovery Filesystem is a special identifier embedded in each data block. The identifier monitors each block by mapping the data block to a single file regardless of the file status, existing or deleted. The block sequence number and creation timestamp are also maintained to facilitate the recovery process. In addition, I developed a garbage collection scheme for SSDs that maximizes data retention without sacrificing SSD performance.
The experiments conducted in this dissertation demonstrate that the Recovery Filesystem yields acceptable read/write performance results. In addition, file recovery experiments used to compare the Recovery Filesystem with open source recovery techniques demonstrate that the Recovery Filesystem provides significant advantages in the case of fragmented data.
Volgenau School of Engineering: New Graduate Student Orientation
Monday, August 18, 2014
Event InfoVolgenau School of Engineering is welcoming its newly admitted graduate students to a special orientation event.
It is highly recommended that both domestic and international students plan to attend. Essential information regarding university services for graduate students and program information from academic departments will be provided. Also, this is your opportunity to meet your peers, the administrative and academic staff members who will assist you during the pursuit of your graduate course work and degree.
Department of Computer Science: GTA Teaching Workshop
Wednesday, September 03, 2014
This workshop is mandatory for all CS department graduate teaching assistants.
SWE Seminar: An Evaluation of the Effectiveness of the Atomic Section Model
Wednesday, September 17, 2014
Society increasingly depends on web applications for business, work, and pleasure. As the use of web applications continues to increase, the number of failures, some minor and some major, continues to grow. A significant problem is that we still have relatively weak abilities to test web applications, and often rely on informal, ad-hoc, and ineffective techniques. A key driver to this problem is that web applications use novel technologies, including control structures that are not available in traditional software, and new state handling techniques, including new variable scopes. Traditional testing techniques do not adequately model or test these novel technologies. The atomic section model (ASM), which was introduced in a previous paper, models these novel technologies to support design, analysis, and testing. In this talk, we will present the results of an empirical study that was used to evaluate the effectiveness of the ASM to design tests. We will also discuss WASP (Web Atomic Section Project), a tool which extracts the ASM from the implementation and supports various test criteria.
Speaker's BioSunitha Thummala is a Ph.D. student in the IT Department of Volgenau School of Engineering. She received her Master's degree in Software Engineering from George Mason University in 2013. Her research interests are in the areas of model based testing and web applications.
Distinguished Lecture Series: Probabilistic Topic Models and User Behavior
Friday, September 26, 2014
Probabilistic topic models provide a suite of tools for analyzing large document collections. Topic modeling algorithms discover the latent themes that underlie the documents and identify how each document exhibits those themes. Topic modeling can be used to help explore, summarize, and form predictions about documents. Topic modeling ideas have been adapted to many domains, including images, music, networks, genomics, and neuroscience.
Traditional topic modeling algorithms analyze a document collection and estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users and for understanding how a collection is organized.
I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. > We studied collaborative topic models on scientists' libraries (from Mendeley) and scientists' click data (from the arXiv). Collaborative topic models enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, they help organize the articles according to the discovered patterns of readership. For example, we can identify articles that are important within a field, and articles that transcend disciplinary boundaries.
More broadly, topic modeling is a case study in the large field of applied probabilistic modeling. I will survey some recent advances in this field. I will show how modern probabilistic modeling gives data scientists a rich language for expressing statistical assumptions and scalable algorithms for uncovering hidden patterns in massive data.
Biography:David Blei is a Professor of Statistics and Computer Science at Columbia University. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data.
David earned his Bachelor's degree in Computer Science and Mathematics from Brown University (1997) and his PhD in Computer Science from the University of California, Berkeley (2004). Before arriving to Columbia, he was an Associate Professor of Computer Science at Princeton University. He has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013).
GRAND Seminar: Galaxy: A Web-based Platform for High-throughput Genomics
Tuesday, October 07, 2014
High-throughput DNA sequencing has transformed biomedical research into a data-driven science. Today, the majority of time and cost in genomics experiments arise from analyzing the data, not producing it. Galaxy (http://galaxyproject.org) is a popular Web-based platform for analyzing large biological datasets and for genomic data in particular. Galaxy makes computational biology analyses accessible, reproducible, and collaborative. Galaxy has been cited more than 1800 times in scientific publications, there are 63 active public servers, and our main public server (http://usegalaxy.org) processes ~130,000 analysis jobs each month. In this talk, I will describe (a) the Galaxy platform and (b) new algorithms and visual analytics applications for Galaxy to analyze cancer genomic data.
Speaker's BioJeremy Goecks is an Assistant Professor of Computational Biology at George Washington University. His research agenda centers on (a) developing methods for analyzing high-throughput biomedical data, with a focus on cancer genomics (b) creating visual analytics systems for large genomic datasets. He is a lead investigator for the Galaxy project (http://galaxyproject.org); Galaxy is a popular Web-based platform for doing accessible, reproducible, and collaborative analysis of biological datasets. Jeremy received his Ph.D. from Georgia Tech and his B.S. (with Honors) from the University of Wisconsin-Madison.
Department of Computer Science: GTA Teaching Workshop
Wednesday, October 08, 2014
This workshop is mandatory for all CS department graduate teaching assistants.
Oral Defense of Doctoral Dissertation: Visual Scene Understanding Through Semantic Segmentation
Thursday, October 09, 2014
The problem of visual scene understanding entails recognizing the semantic constituents of a scene and the complex interactions that occur between them. Development of algorithms for semantic segmentation, which requires the simultaneous segmentation of an image into regions and the classification of these regions into semantic categories, is at the heart of this problem. This dissertation presents methods that provide improvements to the state of the art in semantic segmentation of images and investigates the use of the obtained semantic segmentation output for related image retrieval and classification tasks. We present a method for non-parametric semantic segmentation of images which can effectively work on image datasets with a large number of categories. The method exploits query time feature channel relevance and also introduces the semantic label descriptor for improving the semantic segmentation output by retrieving images which share semantically similar spatial layouts. We further demonstrate how to associate accurate confidences with the resulting semantic segmentation through the use of the strangeness measure. We show how this measure can be applied for confidence ranking of unlabeled images and associate high uncertainty scores with images containing unfamiliar semantic categories. We then demonstrate the use of semantic segmentation output for additional tasks such as scene categorization, learning related semantic concepts and content based image retrieval.
SWE Seminar: Automated Detection and Mitigation of Inter-Application Security Vulnerabilities in Android
Monday, October 27, 2014
Android is the most popular platform for mobile devices. It facilitates sharing of data and services among applications using a rich inter-application communication system. While access to resources can be controlled by the Android permission system, enforcing permissions is not sufficient to prevent security violations, as permissions may be mismanaged, intentionally or unintentionally. Android’s enforcement of the permissions is at the level of individual apps, allowing multiple malicious apps to collude and combine their permissions or to trick vulnerable apps to perform actions on their behalf that are beyond their individual privileges. In this talk, we will present our ongoing research which explores a proactive scheme for automated detection and mitigation of inter-application vulnerabilities. Our approach leverages concepts from the domains of formal methods, model-driven development, and programming languages, and allows the end-users to safeguard a given bundle of apps installed on their device from such complex, inter-app vulnerabilities. We will illustrate the ideas in the context of practical applications, discuss its potential to put the field forward, and pose important areas of research in the coming era.
Speaker's BioHamid Bagheri is a Postdoctoral researcher in the Department of Computer Science at George Mason University. He received his PhD in Computer Science from University of Virginia in 2013. Hamid works in the crossroads of software engineering, program synthesis, and formal methods. His research career has focused on the development of techniques and tools that aid with the analysis and synthesis of software systems. He has been prolific in his early career, developing several novel techniques, including new methods and tools for compositional analysis of android inter-app vulnerabilities, synthesis of partial code frameworks from application architectures, and synthesis of object-relational mapping tradeoff spaces for database-centric applications. The results of his research have been published in some of the most prestigious software engineering venues, such as ICSE, ASE, and MoDELS, among others.
Alireza Sadeghi received the B.Sc. degree in computer (software) engineering and M.Sc. degree in information technology from Sharif University of Technology in 2008 and 2010, respectively. He is a Ph.D. student in the Department of Computer Science at George Mason University. His research interests focus on software engineering, specifically, static program analysis and mobile app security Inspection.
Oral Defense of Doctoral Dissertation: Enforcing Careflows an Treatment Consents in Electronic Medical Record Systems
Wednesday, October 29, 2014
Procedure-oriented medical treatment specifies processes, which embody standards that are based on years of experience to avoid known pitfalls in care procedures, that need to be followed by members of care teams. These processes are referred to as medical treatment workflow or careflow, which are not currently embedded in and enforced by the existing Electronic Medical Record systems (EMRs). This dissertation demonstrates a method to incorporate medical workflows, with sufficient flexibility to handle unanticipated exceptions, into existing EMRs. This dissertation also shows the utility and flexibility of the proposed method by developing prototypes for surgical procedures and hemodialysis procedures. Secondly, receiving medical treatment, choosing an alternative treatment, or terminating a treatment requires the patient’s explicit or derived informed consent regardless of the electronic nature of a healthcare system. Failure to obtain informed consent is one of the top ten reasons for medical malpractice claims in the United States. Consequently, e-healthcare systems need to have an effective and accurate consent management system. Obtaining and enforcing consent in EMRs is complex due to the fact it's governed by state laws, institutional policies, and treatment types. However, it becomes even more complicated if requesting an appropriate consent involving a minor. This dissertation provides a method to incorporate medical treatment consent into existing EMRs. The utility and flexibility of this method are also shown by prototyping a system that enforces treatment consent for treating adolescent patients across all fifty states. The method provided shows how both the current state-specific wide-variety regulations, and any future changes of the regulations can be incorporated and enforced by a single system.
VSE Seminar: Virtual Humans
Thursday, October 30, 2014
Virtual humans have application to a number of domains, including entertainment, training, evacuation analysis, architectural and urban design, epidemiology, public policy, and more. Research in virtual humans is at the crossroads of animation, artificial intelligence, psychology and game design. The goal is the production of 3D animated characters that reasonably represent the behaviors of real humans. These virtual humans need to be able to navigate through the virtual world and have meaningful interactions with objects and other agents found in the world. They should appear to have a rich set of complex behaviors that typify real human behaviors. Furthermore, the behavior of a population of these characters needs to be feasible to author and modify, preferably by non-programmers. Subject Matter Experts (SMEs) need to be able to tailor behaviors to achieve the objectives of their simulations without being required to specify every detail of every agent’s behavior. In other words, there must be a balance between control and autonomy. To reflect real human populations there also needs to be variation in the behaviors of the virtual humans to reflect individual differences (e.g. personalities, emotion, roles, and relationships). Finally, all of these features must come together in a framework that simulates virtual human behaviors in real-time in complex virtual environments.
Speaker's BioJan M. Allbeck joined the Department of Computer Science in August of 2009 as an assistant professor and faculty advisor for the undergraduate concentration in Computer Game Design. She is also director of the Games and Intelligent Animation Laboratory (GAIA). Prior to joining Mason, she received a Ph.D. in Computer and Information Science from the University of Pennsylvania, where she also received a Master’s degree and served as associate director for the Center for Human Modeling and Simulation for several years. Her Ph.D. advisor was Dr. Norman I. Badler. She also hold a B.A. in Mathematics and B.S. in Computer Science from Bloomsburg University of Pennsylvania.
SWE Seminar: EvoDroid: Segmented Evolutionary Testing of Android Apps
Wednesday, November 05, 2014
Proliferation of Android devices and apps has created a demand for applicable automated software testing techniques. Prior research has primarily focused on either unit or GUI testing of Android apps, but not their end-to-end system testing in a systematic manner. In this talk, I will present EvoDroid, an evolutionary approach for system testing of Android apps. EvoDroid overcomes a key shortcoming of using evolutionary techniques for system testing, i.e., the inability to pass on genetic makeup of good individuals in the search. To that end, EvoDroid combines two novel techniques: (1) an Android-specific program analysis technique that identifies the segments of the code amenable to be searched independently, and (2) an evolutionary algorithm that given information of such segments performs a step-wise search for test cases reaching deep into the code. Our experiments have corroborated EvoDroid's ability to achieve significantly higher code coverage than existing Android testing tools.
Speaker's BioRiyadh Mahmood is a PhD candidate in the Computer Science department at George Mason University. Riyadh has been working in the software engineering / IT consulting field for over a decade. He holds a Bachelor’s degree in Computer Engineering and a Master’s degree in Information Technology from Virginia Tech. Riyadh is currently working on his dissertation proposal, focusing on testing of Android applications on the cloud.
VSE Seminar: Performance Engineering for the Internet of Things
Thursday, November 06, 2014
Originally coined to describe networked RFID tags, the Internet of Things (IoT) now commonly refers to global systems of communicating IP-friendly objects. Even after taking into account the hype surrounding the label, proposed IoT systems are expected to require tens of billions of devices, dwarfing current Internet deployments. For these systems to reach their true potential a careful balance must be struck between producing extremely inexpensive and resource constrained devices while simultaneously designing protocols and system software that provide high levels of predictable performance. In this talk I describe recent work in IoT technologies that address this balance, including routing protocols, control plane management and energy harvesting methods. I also discuss future applications in areas such as smart spaces, smart grids and semi-autonomous systems.
GRAND Seminar: Robotic Motion Planning, Re-planning, and Parallelization
Tuesday, November 11, 2014
Reliable sensors, open-source software, and the steady march of Moore’s law have finally placed robotic technologies within the reach of the average citizen. However, we are still searching for motion planning and coordination algorithms that allow robots to move and interact with the same speed, grace, and harmony as humans.
My goal is to make the latter a reality, and this talk will highlight some of my work on re-planning in dynamic environments and parallelization of algorithms. In particular, I will talk about RRT-X and C-FOREST.
RRT-X is a new Re-planning algorithm that allows real-time navigation in environments that change unexpectedly. Theoretical results derived in the creation of RRT-x lend insight into the two other popular algorithms RRT* (Karaman and Frizzoli) and RRT# (Arslan and Tsiotras). C-FOREST is a parallelization framework for sampling-based motion planning algorithms that achieves significant super-linear speedup in practice (for example, using 64 CPUs yields a speedup over 300X). It has recently been included in the popular The Open Motion Planning Library (OMPL) maintained by the Kavraki Lab at Rice University.
Speaker's BioMichael Otte is a Researcher at the University of Colorado at Boulder in residence at the Aerospace Systems Directorate at the Air Force Research Laboratory, prior to that he was a Postdoctoral Associate at the MIT Laboratory for Information and Decision Systems. He received his PhD in 2011 from the University of Colorado at Boulder. He currently works on the application of artificial intelligence to robotics, with a focus on path planning and multi-agent systems.
Oral Defense of Doctoral Dissertation: Computation Offloading and Storage Augmentation for Mobile Devices
Wednesday, November 12, 2014
The increasing popularity of mobile devices calls for effective execution of mobile applications. Due to the relatively slower CPU speed and smaller memory size, plenty of research has been conducted on properly splitting and offloading computing intensive tasks in mobile applications to external resources (e.g., public clouds). Prior research on mobile computation offloading has mainly focused on how to offload. Moreover, existing solutions require the application developers to specify the computation intensive segment of the applications beforehand. In addition, these solutions do not work transparently with existing applications. In this dissertation, we aim to design and implement a mobile application offloading framework by addressing these issues. For this purpose, we first design and implement a transparent offloading mechanism called FaST: a Framework for Smart and Transparent mobile computation offloading, to allow mobile applications to automatically offload resource-intensive methods to more powerful computing resources without requiring any special compilation or modification to the applications' source code or binary. We also find the key features that can dynamically impact the response time and the energy consumption of the offloaded methods and thus design Lamp, a runtime Learning model for application profiler, based on which we profile the local on-device and the remote on-server executions of the applications. To address the problem of what to offload, we design and implement an application partitioner, called Elicit, to Efficiently identify computation-intensive tasks in mobile applications for offloading in a dynamically changing environment by estimating the applications' on-device and on-server performance with Lamp.
In addition to the offloading demand for mobile applications, the increasing popularity of mobile devices has also caused the demand surge of pervasive and quick accessing files across different personal devices owned by a user. Most existing solutions, such as DropBox and SkyDrive, rely on centralized infrastructure (e.g., cloud storage) to synchronize files across different devices. Therefore, these solutions come with potential risks of user privacy and data secrecy. In addition, the consistency, synchronization, and data location policies of these solutions are not suitable for resource-constrained mobile devices. Furthermore, these solutions are not transparent to the mobile applications. Therefore, in this dissertation, we design and implement a system Virtually Unify Personal Storage (vUPS) for fast and pervasive accesses of personal data across different devices.
Distinguished Lecture Series: Learning and Multiagent Reasoning for Autonomous Robots
Wednesday, November 12, 2014
Over the past half-century, we have transitioned from a world with just a handful of mainframe computers owned by large corporations, to a world in which private individuals have multiple computers in their homes, in their cars, in their pockets, and even on their bodies. This transition was enabled by computer science research in multiple areas such as systems, networking, programming languages, human computer interaction, and artificial intelligence. We are now in the midst of a similar transition in the area of robotics. Today, most robots are still found in controlled, industrial settings. However, robots are starting to emerge in the consumer market, and we are rapidly transitioning towards a time when private individuals will have useful robots in their homes, cars, and workplaces. For robots to operate robustly in such dynamic, uncertain environments, we are still in need of multidisciplinary research advances in many areas such as computer vision, tactile sensing, compliant motion, manipulation, locomotion, high-level decision-making, and many others. This talk will focus on two essential capabilities for robust autonomous intelligent robots, namely online learning from experience, and the ability to interact with other robots and with people. Examples of theoretically grounded research in these areas will be highlighted, as well as concrete applications in domains including robot soccer and autonomous driving.
Biography:Dr. Peter Stone is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, Fulbright Scholar, and University Distinguished Teaching Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs - Research. Peter's research interests include machine learning, multiagent systems, robotics, and e-commerce. In 2003, he won a CAREER award from the National Science Foundation for his research on learning agents in dynamic, collaborative, and adversarial multiagent environments. In 2004, he was named an ONR Young Investigator for his research on machine learning on physical robots. In 2007, he was awarded the prestigious IJCAI 2007 Computers and Thought award, given once every two years to the top AI researcher under the age of 35. In 2013 he was awarded the University of Texas System Regents' Outstanding Teaching Award and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers.
Oral Defense of Doctoral Dissertation: Learning in Relational Networks
Thursday, November 13, 2014
Graphs, a.k.a. relational networks, have emerged as a powerful mechanism for representing complex structured data captured from the omnipresent social and information networks. The past decade has seen a tremendous growth in web services and applications that allow users to communicate with each other, to collaborate in creating media, discuss and rate purchased entities and items. Relational networks are also found in a variety of domains ranging from gene-regulatory networks, author-citation networks, and inter-country trade networks. The objective of this dissertation is to develop approaches for extracting useful information from these social and information networks.
Specifically, I present classification algorithms for predicting class labels associated with nodes of relational networks. Acquiring class labels for these nodes is an expensive process. As such, I introduce novel active learning based approaches for acquiring class labels and training relational classifiers. Our techniques apply to both single-labeled and multi-labeled networks. Our experimental evaluation on real-world data shows statistically significant results on both single- and multi-labeled networks in comparison to state-of-the-art approaches. Relational networks play an important role also for recommender systems. They capture the inter-connections between various users and items. Traditional recommender systems identify and recommend interesting items to a given user based on the user's past rating activity. These systems improve their recommendations by identifying user preferences and item related information from external sources, like reviews written by users, or concept tags about items shared by users. In this dissertation, I also seek to improve recommender systems by integrating user preferences as side information within standard neighborhood-based and latent factor based methods. We assume that a user's choice of tags provides additional information about the user's personal preference and additional features about the item. Since querying users to provide tags imposes an additional burden on the user, we demonstrate the use of relational classification algorithms for predicting tags for both items and users. Our experimental results on several real-world data show the advantages of using tag-based information within recommender systems.
GRAND Seminar: Skinning Cubic Bézier Splines and Catmull-Clark Subdivision Surfaces
Thursday, November 20, 2014
Linear blend skinning has become a vital tool for the animation and design of 2D and 3D shapes. Unfortunately, such methods cannot deform vector graphics (splines and subdivision surfaces). Vector graphics are used throughout 2D (PDF’s, fonts, the web, illustrations) and 3D (industrial design and computer-aided design) computer graphics. We propose an efficient solution to this problem that preserves the vector nature of a shape (splines stay splines and subdivision surfaces stay subdivision surfaces). This allows the entire literature of linear blend skinning approaches to be used with vector graphics for the first time.
Note: This paper is to be presented at ACM SIGRAPH Asia 2014 in December
Speaker's BioSongrun Liu received the B.Sc. degree in computer Science and Technology and M.Sc. degree in Software Engineering from Zhejiang University in 2008 and 2011, respectively. He is a Ph.D. student in the Department of Computer Science at George Mason University, working with Dr. Yotam Gingold. His current research interests focus on optimization, animation, 2D/3D modeling and editing in computer graphics.
SANG Seminar: Evaluating Cyber Moving Target Techniques
Friday, November 21, 2014
Cyber moving target (MT) has been identified as one of the game-changing themes to re-balance the cyber landscape in favor of defense. MT techniques make cyber systems less static, less homogeneous, and less deterministic in order to create uncertainty for the attackers. Although many MT techniques have been proposed in the literature, little has been done on evaluating their effectiveness, benefits, and weaknesses.
In this talk, we evaluate the wide range of MT techniques using three approaches. First, a qualitative assessment studies the potential benefits, gaps, and weaknesses for each category of MT. This step identifies major gaps in this domain which can guide future research and prototyping efforts. We also provide the findings of a qualitative assessment case study on code reuse defenses. Second, for the MT techniques that are identified as potentially more beneficial in the qualitative assessment, we perform a deeper quantitative assessment using real exploits. Third, we perform an assessment of how information leakage can impact the effectiveness of MT techniques inside a larger system.
Finally, we outline possible directions for future work in this domain.
Speaker's BioHamed Okhravi is a member of Technical Staff at the Cyber Systems and Technology group of MIT Lincoln Laboratory, researching in the area of cyber security. Dr. Okhravi received his MS and PhD in Electrical and Computer Engineering from University of Illinois at Urbana-Champaign (2006 and 2009). During his PhD study, he also interned at Network Geographics (2007) and Cisco Systems, Inc. (2008). Currently, Dr. Okhravi is working on cyber-attack survivable systems at MIT. His research interests are in cyber security, cyber trust, science of security, security metrics, and operating systems.
Oral Defense of Doctoral Dissertation: Recommending Service Repairs
Friday, November 21, 2014
An increasing amount of data is provided by information services: Software constructed with web services that return output from underlying data sources given input supplied in HTTP requests. Instead of the traditional SOAP/WSDL standards, these restricted type of services are designed in the Representative State Transfer (REST) architectural style in which service functionality is memoryless, stateless, and free of side effects. Indeed, this service-oriented approach to software construction offers many benefits, but there is one looming disadvantage. After deployment, services within the program may fail. In response, programmers may attempt to recover lost functionality with another service. Unfortunately, the repair process is a manual task where programmers must locate the failure and then repair it with a substitute; that is, another version with similar functionality. While techniques are available for identifying where the program is broken, an open problem remains: How to identify candidate repairs?
To address this problem, recent research explores methods to automate aspects of the repair process, and thus support programmers while they debug failures. As yet, most automated repair methods assume programmers have access to program source code and so they focus on internal software changes. However we consider services to be “black-boxes”; that is, implementation details (such as source code or documentation) are unavailable. Consequently, programmers cannot modify the failed service or rely on source code for insight into its functionality. Hence we focus our repair methods on the recommendations of substitute services and not on source code modifications. In other words, given a failed service, and assuming services are registered in a repository, the problem is to recommend another service (or set of services) that would be a “good” substitute for the failure. Without source code, our idea is to describe service functionality by its behavior; that is, the observable relationship between its input and output. The thesis of this dissertation is service behavior can be economically sampled and efficiently compared, and as a result, given a service we can recommend candidate substitutions from a repository of services. In this dissertation a model for representing information services is proposed, a definition for measuring similarity between two services is defined, and a language for expressing compositions of services is specified. Then, a heuristic to extract small representative samples of behavior is presented, a heuristic for improving service recommendation performance is presented, and a method for making recommendations in the context of service compositions is described.
Oral Defense of Doctoral Dissertation: Motion Comparison and Tactic Analysis in Sports Training
Monday, November 24, 2014
There are two important subjects in competitive sports training, namely the athlete’s individual mechanics training and the team’s tactics training. For the student athlete’s individual mechanics training, existing training systems seek to capture and visualize the student’s motions in 3D virtual environments. However, those systems leave the students themselves to figure out how to revise their motions to improve performance. We design a training system that is able to compare the coach’s motions to the student’s and quantify their distances. In addition, based on the motion comparison, we generate training advice to tell students where and how to improve. Besides individual mechanics training, tactics training is an important training aspect for team sports. Tactics training studies the current state of the game from the existing broadcast video footages to determine a good tactic move. Because broadcast sports videos consist of both tactic relevant shots and irrelevant shots, many efforts have been made to automatically segment videos to separate these types of shots. Some methods use domain knowledge of the target sports activity, which are not able to be applied to other sports activities. Other systems use supervised learning to improve frame classification and shot boundary detection accuracy, but often fail to maintain the integrity of the tactic relevant segments and the video structure. We introduce a novel method S-CRP to segment broadcast sports videos into high quality semantic shots. In addition, we also introduced a new performance metric to provide a more accurate measure of how well the segmentation result maintains the structure of the original video.
Current tactic analysis systems provide only low level assistances in tactics training. They are able to capture certain events in the game and help the team with statistical analysis, but they provide little help towards finding a better tactic. After sports videos segmentation, we design a novel tactics training system that is able to consider players’ attributes together with their positions to estimate each player’s offense threat or defense ability and find the defense tactic that can minimizes the offense team’s threat.
Oral Defense of Doctoral Dissertation: Robust and Reusable Methods for Shepherding and Visibility-Based Pursuit
Wednesday, December 03, 2014
Algorithms for the control and monitoring of swarms of moving agents are important in a wide variety of real and virtual applications such as crowd control, livestock herding, and decentralized robot control architecture. In the presence of obstacles, these applications can be quite difficult, especially with large swarms. This thesis presents reusable and robust motion planning algorithms for the swarm control problem of shepherding, a task involving using a small set of mobile robots interact with a larger set of swarm agents; and the swarm monitoring problem of visibility-based pursuit, a task involving using a mobile robot to follow and maintain visibility of a moving swarm. For both problems, we developed algorithms to efficiently sample reusable geometric information in the environment to enable fast online planning and replanning. For the shepherding problem, we discuss several representations and abstractions for flocks to improve scalability and robustness to uncertainty. For the visibility-based pursuit problem, we discuss several methods for space decomposition that enable fast online planning to achieve visibility objectives. The results are validated using multi-agent simulation software to understand the tradeoffs between different techniques for these problems.
GRAND Seminar: Learning to Parse Videos
Friday, December 05, 2014
In this talk, we look into the problem of segmenting, tracking, and extracting 3D time varying shape and camera poses for non-rigid objects in monocular videos. Our method segments and tracks objects and their parts using past segmentation and tracking experience from a training set, and uses the segmented point trajectories of each object to extract 3D shape assuming a low-rank shape prior. We segment using motion boundaries and learnt saliency detection, and outperform by a margin the previous state-of-the-art in challenging video scenes. We ``learn to track’’ using a novel tracking loss in a distance learning framework, and outperform color and texture histograms as well as deep feature learnt from Imagenet Classification or Detection tasks. We extract dense 3D object models from realistic monocular videos, a problem typically studied with lab acquired datasets, pre-segmented objects and oracle trajectories.
Speaker's BioKaterina Fragkiadaki is a Post Doctoral Researcher in the Computer Vision Laboratory at UC Berkeley with Jitendra Malik. She received her Ph.D. in University of Pennsylvania in 2013. She is the co-recipient of best thesis award in the Computer Science Department in UPenn. She has a BA in Electrical and Computer Engineering from National Technical University of Athens. She works on problems related to video understanding, tracking, video segmentation.
Oral Defense of Doctoral Dissertation: Error Controls for Broadcast Communication Systems
Tuesday, December 16, 2014
Traditional network protocols employ error control techniques for reliable information dissemination over noisy communication channels. In this dissertation, two main topics are investigated for e cient error controls over a broadcast channel. First, unequal error protection (UEP) coding schemes for multiuser communications are investigated, and we propose integer programming approaches to UEP coding and decoding. Second, reliable packet transmissions over a single-hop broadcast network are considered, and we propose a unied solution to use a deterministic network coding for a packet retransmission scheme and a packet-level forward error correction scheme.
For multiuser communications over a broadcast channel, integer programming approaches are introduced to the construction and the decoding of a binary linear UEP code. First, optimal UEP codes are constructed from integer programming for maximum e ciency, and lower bounds of UEP codes are derived to show the e ciency. Then, performance of the UEP coding scheme for multiuser communications are analyzed on a degraded broadcast channel. Finally, a decoding method of the binary UEP code, that uses iterative integer programming and majority logic, is proposed. By presenting numerical results, examples, and comparisons, we demonstrate that the UEP coding scheme e ectively provides e cient forward error correction for multiuser broadcast communications.
For reliable packet transmissions over a single-hop broadcast network, we propose packet-level error control schemes by using a deterministic linear network coding. We rst construct a deterministic network code based on a Reed-Solomon (RS) error correcting code. Then, we provide an adaptive way to apply the deterministic network code for both retransmissions and forward error corrections by puncturing a generator matrix of the RS code. Numerical analysis and simulations are performed to show the e ciency of the error control schemes.
Oral Defense of Doctoral Dissertation: Method and Models to Enable Optimal Automated Service Composition
Thursday, December 18, 2014
Since the development of the Service Oriented Architecture concept, business analysts and system developers have looked forward to the day when they could reconfigure applications to adapt to new business by combining services in new ways to adapt to changing business needs. Technologies such as the Web Services Description Language and Business Process Modeling Notation (BPMN) provide key building blocks but are not sufficient to enable run-time reconfiguration of services. To enable this functionality, this research develops Druid, a framework for dynamically composing web services into executable processes based on a business process model defined using the BPMN modeling language. To support this framework, this research develops a service description modeling language, extensions to the BPMN language, and a formal model for composing services based on a business model. This research also develops a Quality of Service (QoS) model used for calculating the optimal service composition.
Oral Defense of Doctoral Dissertation: Probabilistic Algorithms for Modeling Protein Structure and Dynamics
Tuesday, January 13, 2015
This thesis proposes novel probabilistic algorithms to address critical open problems in computational structural biology regarding the relationship between structure, dynamics, and function in protein molecules. The focus on protein modeling research is warranted due to the ubiquity and central role of proteins in life-critical processes in the living cell. Many disorders in the sick cell are proteinopathies, where a protein disrupts a chemical process, causing the cell to deviate from its intended biological activity. However, unlike other life-critical macromolecules, such as DNA and RNA, where significant information about activity can be extracted from knowledge of the ordering of the constitutive building blocks, proteins exhibit a more complex relationship between the order of building blocks, the structures arising from spatial arrangements of the building blocks in three-dimensional space, and the determination from such arrangements of biological activity. Since studies of proteins pose exceptional challenges in wet laboratories, the work presented in this thesis proposes novel and powerful algorithms to complement wet-laboratory research. The proposed algorithms exploit analogies between protein systems and articulated linkages and employ techniques from machine learning and stochastic optimization. The work presented in this thesis is of general utility to other domains in computer science dealing with hard optimization problems for complex systems.
Education:George Mason University, 1998 B.S. in Computer Science George Mason University. 2011 M.S. in Computer Science
CS Seminar: Automated Analysis and Testing of Mobile Software
Wednesday, February 11, 2015
Mobile app markets have created a fundamental paradigm shift in the way software is delivered to consumers. In particular, by providing a medium for reaching a large consumer base at a nominal cost, app markets have made it possible for small entrepreneurs to compete against prominent software companies. At the same time, since many of the entrepreneurs do not have the resources to employ proper software engineering practices, many apps provisioned on the markets are riddled with defects that not only inconvenience the users, but also easily exploited by attackers for nefarious purposes. In this talk, I will present a combination of static and dynamic program analysis techniques aimed at detecting such defects in Android apps. I will also describe experimental evaluation of the tools realizing these techniques using real-world apps and in collaboration with the government agencies. Finally, I will conclude the talk with an outline of future research directions.
CANCELLED: GRAND Seminar: Player Perception of Responsiveness and Naturalness for Virtual Characters in Digital Games
Wednesday, February 18, 2015
DUE TO INCLEMENT WEATHER THIS TALK HAS BEEN CANCELLED Real-time animation controllers are fundamental for animating characters in response to player input. However, the design of such controllers requires making trade-offs between the naturalness of the character's motions and the promptness of the character's response. Furthermore, lag is typically unavoidable, particularly in networked multiplayer games. In this talk, I will present the results of multiple experiments, in which we investigate the effects of response lag and trade-offs between responsiveness and naturalness on players' enjoyment, control, satisfaction, and opinion of the character in a simple platform game.
Sophie JoergSophie Joerg is an Assistant Professor of Computer Science at Clemson University. Her research is in computer graphics, centering around character animation, motion perception, and digital games. She holds a Ph.D. from Trinity College Dublin and was a visiting researcher at Carnegie Mellon University and Disney Research, Pittsburgh. Her recent work, which is supported by agencies including the National Science Foundation, focuses on creating new algorithms in data-driven character animation, investigating how we perceive virtual characters, and using digital games in education.
CANCELLED: CS Seminar: Scalable and Interactive Data Analytics
Wednesday, February 18, 2015
DUE TO INCLEMENT WEATHER THIS TALK HAS BEEN CANCELLED.
Big data has become ubiquitous. In recent years, data acquisition has become easier and data storage has become cheaper which has led to the accumulation of large volumes of data in a wide range of applications. Analytical methods that can provide critical insights from such voluminous datasets are yet to catch up with these rapid developments. This talk consists of two parts. In the first part, I will present mapreduce-based scalable ensemble machine learning algorithms that can efficiently handle large-scale data by facilitating simultaneous participation of multiple computing nodes. I will demonstrate the superior performance of the proposed algorithms in terms of speedup and scaleup while maintaining the generalizability of the corresponding original versions. Some of the related topics such as privacy-preserving data mining and multi-task learning in the context of big data will also be discussed. In the second part of the talk, I will present a novel interactive topic modeling approach for analyzing document collections. Most of the widely-used topic modeling methods based on probabilistic models, such as Latent Dirichlet Allocation (LDA), have drawbacks in terms of consistency from multiple runs, empirical convergence, and incorporating user feedback. To overcome these challenges, we developed a reliable and flexible visual analytics system for topic modeling called UTOPIAN (User-driven Topic modeling based on Interactive Nonnegative Matrix Factorization). I will describe the UTOPIAN system and how it enables users to interact with the topic modeling method and steer the results in a user-driven manner. I will end this talk with some of our ongoing works in healthcare and social media.
Speaker's BioChandan Reddy is an Associate Professor in the Department of Computer Science at Wayne State University. He received his Ph.D. from Cornell University and M.S. from Michigan State University. He is the Director of the Data Mining and Knowledge Discovery (DMKD) Laboratory and a scientific member of Karmanos Cancer Institute. His primary research interests are Data Mining and Machine Learning with applications to Healthcare Analytics, Bioinformatics and Social Network Analysis. His research is funded by the National Science Foundation, the National Institutes of Health, the Department of Transportation, and the Susan G. Komen for the Cure Foundation. He has published over 50 peer-reviewed articles in leading conferences and journals including TPAMI, TKDE, DMKD, SIGKDD, ICDM, SDM, and CIKM. He received the Best Application Paper Award in ACM SIGKDD conference in 2010, and was a finalist of the INFORMS Franz Edelman Award Competition in 2011. He is a senior member of IEEE and life member of ACM.
Oral Defense of Doctoral Dissertation: Hierarchical Multiagent Learning from Demonstration
Monday, March 02, 2015
Developing agent behaviors is often a tedious, time-consuming task consisting of repeated code, test, and debug cycles. Despite the difficulties, complex agent behaviors have been developed, but they required significant programming ability. An alternative approach is to have a human train the agents, a process called learning from demonstration. This thesis develops a learning from demonstration system called Hierarchical Training of Agent Behaviors (HiTAB) which allows rapid training of complex agent behaviors. HiTAB manually decomposes complex behaviors into small, easier to train pieces, and then reassembles the pieces in a hierarchy to form the final complex behavior. This decomposition shrinks the learning space, allowing rapid training. I used the HiTAB system to train George Mason University's humanoid robot soccer team at the competition which marked the first time a team used machine learning techniques at the competition venue. Based on this initial work, we created several algorithms to automatically correct demonstrator error.
The primary thrust of the thesis is to apply HiTAB to the multiagent case, which presents two primary difficulties. First, multiagent systems typically have large, high-dimensional learning spaces requiring significant examples to sufficiently cover. Second, a daunting inverse problem exists: the demonstrator knows which macro-level behaviors the system should perform, but not the micro-behaviors each agent should perform to accomplish the desired macro-behavior. We developed two techniques to reduce the gap between micro- and macro-level behaviors: agent hierarchies and behavior bootstrapping. Both techniques reduce the gap through simplifying the problem by reducing the number of agents being trained and reducing the number of agents directly interacting with each other. We applied agent hierarchies to a homogeneous swarm of hundreds of agents (the largest multiagent learning from demonstration system to date) and a group of robots performing a patrolling task. Additionally, we trained a heterogeneous group of robots to perform a box pushing task using agent hierarchies.
GRAND Seminar: Design Research at IISc and the Indo-US Center of Excellence for Sustainable Design
Wednesday, March 04, 2015
Design Research investigates phenomena associated with designing so as to develop knowledge with which to inform and improve research, education and practice of design. Centre for Product Design and Manufacturing at Indian Institute of Science (IISc) is has a strong focus on design research, its areas spanning from design theory and methodology,design synthesis and creativity, human factors, aesthetics, safety, sustainability, design for manufacturing, digital human modelling, product informatics, geometric modelling, CAD/CAE, Robotics, and so on. IISc has been recently awarded a Indo-US Joint Center of Excellence (CoE) in Design of Sustainable Products, Services and Manufacturing Systems, with partners from Centre for Study of Science, Technology and Policy (CSTEP), Indian Institute of Management Ahmedabad (IIMA), National Innovation Foundation India (NIF), University of California Berkeley, Syracuse University and Washington State University. The talk will outline some of the research from IISc into design, and the main objectives and deliverables of the CoE.
Speaker's BioAmaresh Chakrabarti is a professor of Engineering Design at the Centre for Product Design and Manufacturing, Indian Institute of Science (IISc), Bangalore. He has BE in Mechanical Engineering from Univ. of Calcutta (now IIEST Shibpur), India, ME in Mechanical Design from IISc, and PhD in Engineering Design from University of Cambridge, UK. After PhD, he led for 10 years the Design Synthesis team at the EPSRC Centre for Excellence Engineering Design Centre at University of Cambridge, before joining IISc. His interests are in design synthesis & creativity, eco-design & sustainability, and product informatics. He authored/edited 10 books, over 250 peer-reviewed articles, and has 6 patents granted/pending. He co-authored DRM, a methodology used widely as a framework for doing engineering design research. He is an Associate Editor, AI EDAM (Cambridge University Press), Area Editor, Research in Engg Design (Springer), Regional Editor, J for Remanufacturing (Springer), and Advisory Editor for nine International Journals incl. J of Engg Design (Taylor & Francis), Clean Technologies and Environmental Policy (Springer), and Int J of Design Creativity and Innovation (Taylor & Francis). Professor Chakrabarti has been on the Advisory Board of Design Society, UK, where he is currently a member of its Board of Management, member of the CII National Committee on Design, India, and member of the Jury for India Design Mark, India Design Council. He founded IDeASLab – the first laboratory in India for research into design creativity, sustainability and innovation. He is Programme chair for International Conferences on Research into Design (ICoRD), 22nd CIRP Design Conference (CIRP Design 2012) and vice-Chair for AI in Design (AID) and Design Computing and Cognition (DCC) Conferences. He is an Honorary Fellow of the Institution of Engineering Designers, the peer society under the UK Royal Charter in engineering design. 11 of his papers won top paper awards in various international conferences.
SANG Seminar: Content-Adaptive Display Power Saving in Internet Mobile Streaming
Friday, March 06, 2015
Backlight scaling is a technique proposed to reduce the display panel power consumption by strategically dimming the backlight. However, for Internet streaming to mobile devices, a computationally intensive luminance compensation step must be performed in combination with backlight scaling to maintain the perceived appearance of video frames. This step, if done by the CPU, could easily offset the power savings via backlight dimming. Furthermore, computing the optimal backlight scaling values requires per-frame luminance information, which is typically too energy intensive to compute on mobile devices.
In this talk, I will present Content-Adaptive Display (CAD) for Internet mobile streaming. CAD uses the mobile device's GPU rather than the CPU to perform luminance compensation at reduced power consumption. Optimal backlight scaling schedule is computed using a more efficient dynamic programming algorithm than existing work. We implement CAD within an Android app and use a Monsoon power meter to measure the real power consumption. Experiments are conducted on more than 470 randomly selected YouTube videos, and results show that CAD can effctively produce power savings.
Speaker's BioMengbai Xiao is a Ph.D. student of Computer Science Department at George Mason University. His research interests incude mobile computing, cloud storage, and security.
SWE Seminar: Defect-Based Testing
Friday, March 06, 2015
What’s a good test case? Text book, practice and intuition suggest: one that reveals a fault. But the thought experiment of a perfect program shows the deficiency of this definition - in this case, there would not be any good test cases. A more adequate definition is that a good test case reveals likely potential faults with good cost effectiveness. The model-based testing community tends to answer this question in one of two ways: good test cases are defined by coverage (because we can) and by explicit test purposes (because we sense that there must be more, but others should do the work). In this talk, we argue why coverage-based testing is inherently problematic, if not useless, and propose to complement explicit test purposes by fault and failure models. These encode what typically goes wrong in a specific domain, technology, company, or application family, and describe what can potentially go wrong, thus catering to the above definition of good test cases. We discuss the nature of potential faults, formalize them, provide examples, and discuss their operationalization for test case generation also outside the domain of model-based testing.
Speaker's BioAlexander Pretschner is a full professor of computer science at Technische Universitaet Muenchen. Research interests include software engineering, specifically testing; and information security, specifically distributed data usage control. Prior appointments include a full professorship at Karlsruhe Institute of Technology; an adjunct associate professorship at TU Kaiserslautern; a group management position at the Fraunhofer Institute for Experimental Software Engineering in Kaiserslautern; guest professorships at the universities of Rennes, Trento, and Innsbruck; and a senior researcher's position at ETH Zurich. PhD degree from Technische Universitaet Muenchen; Master's degrees from Kansas University, on a Fulbright scholarship, and from RWTH Aachen. Recent awards include two IBM faculty awards, a Google focused research award, and a Fraunhofer Attract award. Member of the editorial board of the IEEE Transactions on Secure and Dependable Computing, the Journal of Software Testing, Verification, and Reliability, and the Journal of Software Systems Modeling; membership in numerous program committees; organization of ca. 25 symposia; frequent invited speaker; frequent reviewer for national and international funding agencies as well as hiring committees.
CS Seminar: Cybersecurity Hardening at Wireless Physical Layer
Wednesday, March 18, 2015
With the rapid advancement of communication and networking technologies, ubiquitous wireless network access and Internet of Things (IoT) have become reality. Due to the "open air" nature of the wireless medium, resource constraints on wireless devices, and distributed nature of wireless networks, ensuring cybersecurity faces great challenges.
Although traditional cryptography-based mechanisms are widely used for cybersecurity, there has been an increasing interest in hardening cybersecurity by exploiting various physical layer properties. Different from traditional computational-security mechanisms, physical layer security is based on the physical principles of wireless channels and devices, thus provides a fundamentally different approach to secure wireless networks and provides another means of protection at the physical layer. It can be widely applied in various wireless networks for providing different security services. It has a great potential to simplify or avoid complex cryptographic computation and achieve information-theoretical security.
This talk provides an overview of various wireless physical layer security techniques. Our recent works on extracting shared secret keys from wireless channels and physical layer challenge-response authentication will be presented. Future directions in the physical layer security area and its applications will also be discussed.
Speaker's BioKai Zeng is an assistant professor in the Department of Electrical and Computer Engineering at George Mason University. He is also affiliated with the Department of Computer Science and the Center for Secure Information Systems. He received his Ph.D. degree in Electrical and Computer Engineering at Worcester Polytechnic Institute (WPI) in 2008. He was a postdoctoral scholar in the Department of Computer Science at University of California, Davis (UCD) from 2008 to 2011, and an assistant professor in the Department of Computer and Information Science at University of Michigan - Dearborn from 2011 to 2014.
Dr. Zeng was a recipient of the U.S. National Science Foundation Faculty Early Career Development (CAREER) award in 2012. He won Excellence in Postdoctoral Research Award at UCD in 2011 and Sigma Xi Outstanding Ph.D. Dissertation Award at WPI in 2008. He was a visiting faculty of Air Force Research Laboratory through Visiting Faculty Research Program (VFRP) in summer 2013. He is a member of IEEE and ACM, and a Sigma Xi Associate Member.
Computer Science Colloquium: Computing on Unbounded Private Data
Thursday, March 19, 2015
Today, a large amount of private data resides in the cloud in an encrypted form. There are tremendous benefits of performing computations on this data to extract useful information. Yet, security of encryption seems inimical to this goal.
This raises the following fundamental question: Can the cloud perform general computations on encrypted datasets of unbounded size (without knowing the decryption key)?
My research provides the first positive resolution of this question. In this talk I will describe these new results, and discuss their interplay with other broad questions in cryptography.
Speaker's BioOmkant Pandey is a postdoctoral researcher at the University of Illinois at Urbana Champaign (UIUC). He is interested in the broad areas of secure computation and next-generation encryption systems. He received his Ph.D. from UCLA, and has published over 20 original research papers at venues such as CRYPTO, EUROCRYPT, TCC, and ACM CCS, including an influential work on Attribute Based Encryption with over 1600 citations.
GRAND Seminar: The Success of Question Answering Communities: How Diversity Influences Ad Hoc Groups
Friday, March 20, 2015
Question & answer communities have emerged as popular and effective repository of knowledge. The growing success of this communities depends principally on the will of the users to answer questions. In this study, we analyzed the users and the posts of 50 different active communities belonging to the Stack Exchange network. In particular, we focused on the correlation between success and diversity within these communities. A framework to evaluate different aspects of the success and diversity inside the communities is proposed. The results show how the effect of diversity on success within Q&A communities depends on the nature of the topic. This means that in general, non-technical topics, diversity in users' tenure increases the ability of the community to obtain better answers. On the other hand, in technical boards, expertise prevails over diversity, so it is more important to maintain an expert community to ensure success within these communities.
Speaker's BioEmanuel Alsina is a PhD student in Computer Science at the International Doctorate School in Information and Communication Technologies organized by the Department of Engineering Enzo Ferrari of the University of Modena and Reggio Emilia in Italy. He received the bachelor and master degrees in Management Engineering from the University of Bologna respectively in 2008 and 2011. His interests focus on complex systems in industrial environments, modeling and simulation, decision-making process, and group dynamics. Currently he is working with the Center for Complexity in Business at the Robert H. Smith School of Business at the University of Maryland, studying the diversity and complexity in group dynamics and the impact that has on group performance. He has studied and worked on international teams in Italy, Argentina, and USA.
Computer Science Colloquium: Practical Methods for Computing on Encrypted Data
Monday, March 23, 2015
Traditional data encryption provides an all-or-nothing approach to protecting our data: anyone holding the secret key can recover the plaintext, while anyone without it will learn nothing at all. Recent advances in cryptography offer a new way to protect our data, allowing us to perform complex computations directly on the ciphertext while revealing nothing more or less than what the data owner intends.Perhaps the most famous example of this is fully homomorphic encryption, which was first constructed in 2009, and has since received a lot of attention and research. But there have been equally exciting advances in a whole host of related areas, including encrypted database search, secure computation, functional encryption, and provably-secure program obfuscation. Cryptographers now have a large bag of tools and techniques for computing on encrypted data. In this talk, we will explore the current landscape, describing both what is possible and what is practical.
Speaker's BioDov Gordon received his PhD in cryptography from the University of Maryland in 2010. He was a postdoc at Columbia University from 2010 to 2012 as a recipient of the Computing Innovation Fellowship, which funded him to study the practical applications of secure computation. He is currently a Research Scientist at Applied Communication Sciences, where he continues his research in cryptography, while also collaborating in a broad range of research in cyber and network security.
Distinguished Lecture Series: Greening Datacenters: Past, Present and Future
Friday, March 27, 2015
Datacenters host the server infrastructure that powers organizations of many sizes, from universities and enterprises to large Internet services. Collectively, datacenters consume a massive amount of power, representing a financial burden for datacenter operators, an infrastructure burden on power utilities, and an environmental burden on society. However, this problem could be worse if it were not for several advances made over the last decade, especially in the design of large-scale datacenters. In this talk, I will overview the architecture of these datacenters, discuss the main advances made to date, and suggest research directions for the future. I will also explore one of these directions in more detail, namely energy conservation via reducing the tail latencies of online services.
Speaker's BioDr. Ricardo Bianchini received his PhD degree in Computer Science from the University of Rochester in 1995. He is a Professor of Computer Science at Rutgers University, but is currently on leave working as Microsoft's Chief Efficiency Strategist. His main interests include cloud computing, and power/energy/thermal management of datacenters. In fact, Dr. Bianchini is a pioneer in datacenter energy management, energy-aware storage systems, energy-aware load distribution across datacenters, and leveraging renewable energy in datacenters. Dr. Bianchini has co-chaired the program committee of several conferences and workshops, and currently serves on the editorial board of five journals. He has published eight award papers, and has received the CAREER award from the National Science Foundation. He is currently an ACM Distinguished Scientist and an IEEE Fellow.
Computer Science Colloquium: Practice-Oriented Cryptography: A Definition-Centric Approach
Thursday, April 02, 2015
In cryptography, a good formalization can often lead to significant improvements for both security and efficiency. In this talk, I'll discuss two illustrative examples.
First, I'll present UCE, a security model for hash functions (a very widely used operation in cryptography) that gives rigorous justification for several important heuristic schemes. While UCE can be directly instantiated via existing hash constructions, which are already quite efficient, I'll show a novel hash design that leads to extremely fast instantiations of UCE.
Second, I'll give the first formalization of garbled circuits, a central technique in cryptography. Based on this abstraction, I discover critical bugs in several well-known applications and implementations. I also present new optimization methods that significantly improve the speed of creating and evaluating garbled circuits.
Speaker's BioViet Tung Hoang is a postdoc at Georgetown University (with Adam O'Neill) and University of Maryland College Park (with Jonathan Katz). He did his Ph.D. at UC Davis under the supervision of Phillip Rogaway, and then spent another year doing postdoc at UCSD with Mihir Bellare.
Computer Science Colloquium: Efficient Cryptographic Tools for Privacy-Preserving Applications
Monday, April 06, 2015
The benefits of electronic transactions are numerous, but they come with a price: the technologies used often overlook the privacy of their users. Achieving privacy in a digital world is a rather challenging problem, privacy-preserving solutions are harder to construct and, at the same time, are significantly more computationally intensive than privacy-intrusive ones. In this talk I show how to combine techniques from cryptography and systems security to get all the benefits of electronic transactions without sacrificing user privacy and without giving up efficiency.
I will focus on two major sub-domains of privacy-preserving technologies: anonymous credentials (the core ingredient to private authorization), and anonymous electronic payments. In the first part of my talk I will present Anonymous Credentials Light, the most efficient and provable secure anonymous credential scheme. In the second part, I will describe a privacy-preserving protocol for electronic payments and show an efficient smartphone implementation of it for the public transportation scenario.
Speaker's BioFoteini Baldimtsi is a Post-Doctoral researcher in the Department of Computer Science at Boston University. Her research interests are in the general area of cryptography, security and data privacy with a special focus on the design and implementation of provably secure private authorization and electronic payment schemes. She received her PhD and MSc in Computer Science from Brown University in 2014 and 2011, and has worked as a researcher at IBM and Microsoft Research.
Distinguished Lecture Series: Confessions of an Accidental Greenie: From Green Destiny to the Green 500
Friday, April 17, 2015
While green computing appears to be mainstream, this was not always the case. It was only a decade ago when green computing, and in particular, green supercomputing, was still being scoffed at. Back then, computing only focused on performance in terms of speed, as evidenced by the annual Gordon Bell Awards at Supercomputing (SC). Such a view is akin to purchasing an automobile based solely on its top speed rather than on other performance metrics such as energy efficiency or reliability. As a consequence, when Green Destiny debuted in April 2002 as the world's most energy-efficient (i.e., greenest) supercomputer in the world, it was resoundingly ridiculed --- so much so that one supercomputing visionary joked that "Green Destiny is so low power that it runs just as fast as when it is unplugged." At the same time, the audacity of Green Destiny, a 240-node supercomputer in five square feet and consuming as little as 3.2 kilowatts of power (or the equivalent of two hairdryers), created such a fervor as a disruptive technology that it led to international news coverage by the New York Times, CNN, the International Herald Tribune, PC World, Slashdot, and BBC News. Now 13 years later, this talk chronicles the confessions of an accidental green from how Green Destiny came about and how it has evolved.
Speakers BioWu Feng is the Elizabeth & James Turner Fellow and Professor of Computer Science at Virginia Tech (VT), where he directs the Synergy Lab and serves as a VT site co-director for the National Science Foundation Center for High-Performance Reconfigurable Computing (CHREC) and director of the Synergistic Environments for Experimental Computing (SEEC) Center. In addition, he holds appointments in the Department of Electrical & Computer Engineering, Health Sciences, and Virginia Bioinformatics Institute.
Dr. Feng has published 200+ peer-reviewed technical publications in high-performance networking and computing, high-speed systems monitoring and measurement, low-power and power-aware computing, computer science pedagogy for K-12, and bioinformatics. Of recent note is the publication of "The Green Computing Book: Tackling Energy Efficiency at Large Scale" (http://www.amazon.com/The-Green-Computing-Book-Computational/dp/1439819874) and a worldwide commercial on his research on biocomputing in the cloud, courtesy of the Microsoft Cloud (https://www.youtube.com/watch?v=GY2Bg0op-Kc).
Dr. Feng holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign, a M.S. in Computer Engineering, and B.S. degrees in Electrical & Computer Engineering and Music from Penn State University. In addition to being a Distinguished Scientist of the ACM and Senior Member of the IEEE Computer Society, Dr. Feng has been named to HPCwire's Top People to Watch twice, once in 2004 and again in 2011, and was recently recognized with an Outstanding Faculty Award bestowed by the Commonwealth of Virginia in 2013.
Computer Science Colloquium: Building Software with the Crowd
Monday, April 20, 2015
Software development work has become collaborative beyond individual teams and organizations, as developers share knowledge and code on Q&A sites and write code together in cloud IDEs. These trends suggest the potential for more fundamental shifts in the nature of software development work. For instance, we might imagine a future in which developers are hyperspecialists, in which transient teams rapidly assemble complete work, and where individual designs are the result of the participation of many. I present my work towards such a future, detailing some of the initial steps I am taking in enabling a future of crowd development. I particularly present several tools and approaches for utilizing crowds in programming and design tasks, and highlight some of the challenges and opportunities for the novel development tools that are needed in this future and the deeper questions crowdsourcing raises about the nature of software engineering.
Speaker's BioThomas LaToza is a Postdoctoral Research Associate at the University of California, Irvine. His research in software engineering focuses on human aspects of software development, with both empirical and design work on tools for programming, software design, and collaboration. He has degrees in Psychology and Computer Science from the University of Illinois and a PhD in Software Engineering from Carnegie Mellon University. He has served on a variety of program committees, was co-chair of the Fifth Workshop on the Evaluation and Usability of Programming Languages and Tools, and is currently co-chair of the Second International Workshop on Crowdsourcing in Software Engineering. His work is partially supported by the National Science Foundation with a $1.4M grant on Crowd Programming.
Oral Defense of Doctoral Dissertation: Design and Modeling of Schedulers for Multi-Task Jobs on Computer Clusters
Monday, April 20, 2015
Over the course of the last decade, a tremendous amount of information has been generated, collected, and eventually stored on modern server scale computers. This ongoing process led to the coinage of the term “Big Data,” a concept that the industry uses to describe the all-encompassing activities of storing and processing this petabyte scale data. Stakeholders, academics, and technologists are trying to extract actionable intelligence from this huge volume of data, while massive clusters of computers are now being used to process the continually expanding amount of information.
This dissertation addresses two specific challenges brought about by the need to process Big Data jobs efficiently on clusters of computers. The first challenge is the design, development, and assessment of sophisticated job schedulers capable of choosing tasks from thousands of queued jobs and executing them on one of the several hundred available machines. The second challenge is to predict a job's completion time based on its various characteristics and that of other jobs executing in the same environment. Standard makespan computations that ignore the contention on compute nodes significantly underestimate a job's completion time.
This dissertation begins by proposing a mathematically sound model, based on Queuing Networks (QN), for predicting the execution time of programs running on a cluster of computers. A special emphasis is placed on Hadoop MapReduce jobs. The model captures contention at compute nodes due to the parallel execution of multiple map tasks running on a node's available slots. In multi-core computers used for most Big Data processing today, memory access times can become a significant component of an application's execution time. This work presents a multi-level, multi-core performance model that considers the contention due to this shared memory access. The model was validated through measurements on 4, 12, and 16-core machines. We also showed that there is a significant difference in predictions when memory contention is not considered. Having captured the complexity of the tasks of MapReduce jobs, including considering the effect of contention due to shared memory elements, we then used this model to augment existing Hadoop schedulers and developed a new scheduler, called the Contention Aware Scheduler. This Hadoop scheduler is based on a novel scheduling method called Least Maximum Utilization First - Threshold (LMUF-T), which we consider a step towards an autonomic scheduler. We also developed a framework called Trace Driven Analytic Model (TDAM) to assess existing and newly designed Hadoop job schedulers through the enhancement of a popular Hadoop simulator called Mumak.
Computer Science Colloquium: Practical, Automated Analysis of Mobile Software using Relational Logic
Wednesday, April 22, 2015
The ever-increasing expansion of software into nearly every aspect of modern life, from mobile banking to healthcare systems, is making its security and reliability more important than ever. Software analysis techniques hold out the promise of improved software quality. However, the state of the art is still far from this idealized situation. In this talk, I will present my ongoing research, which explores the possibility of leveraging recent advancements in relational logic constraint solving technologies for practical software analysis. In the context of Android—the most popular platform for mobile devices—I will present a novel approach for automated detection and mitigation of vulnerabilities in the Android permission protocol. Our study of thousands of real-world Android apps revealed hundreds of security exploits, many of which were previously unknown. I will illustrate the ideas in the context of practical applications, discuss its potential for moving the field forward, and pose important areas of research in the coming era.
Speaker's BioHamid Bagheri is a postdoctoral research fellow in the Computer Science Department at George Mason University and the Computer Science and Artificial Intelligence Laboratory at Massachusetts Institute of Technology. He received his Ph.D. in Computer Science from University of Virginia in 2013. Hamid is broadly interested in software engineering, and particularly in advancing software reliability and security by developing new methods and tools relying on concepts from fields like formal methods, program analysis, model-driven development, and software architecture. He has been prolific in his early career, developing several novel techniques, including new methods and tools for compositional analysis of Android inter-app vulnerabilities, synthesis of application-specific frameworks, and synthesis of tradeoff spaces for object-relational databases. The results of his research have been published in some of the most prestigious venues and journals, such as ICSE, IEEE TSE, and MoDELS, among others. He has been a finalist at the ACM student research competition, and the recipient of an outstanding paper award at SEKE. Hamid has served on program committees for major conferences, and reviewed for multiple journals.
Oral Defense of Doctoral Dissertation: Autonomic, Optimal, And Near-Optimal Resource Allocation In Cloud Computing
Thursday, April 23, 2015
Cloud computing has changed the IT industry by providing robust technology solutions at a variety of prices. The technology used to implement a cloud computing infrastructure is based on largely distributed virtual environments that provide services to consumers allowing them to lease computing resources that scale to their needs, and gives consumers the ability to run a wide range of applications dynamically and on an on-demand basis. Resource management is a core aspect of virtualization in cloud computing and enables efficient management of a large volume of resources. Therefore, there is a need to use autonomic techniques for resource management in cloud computing in order to optimize a utility function of interest to stakeholders in a way that considers tradeoffs between competing Service Level Agreements (SLAs) contracted between consumers and cloud providers (CPs). In this research, we provide the design and validation of analytic performance models and optimization of live virtual machine migration with minimal downtime and disruption of services, and provide a non-linear optimization and experimental validation using a real cloud simulation environment. Additionally, we designed and implemented an autonomic resource provisioning method in an Infrastructure as a Service (IaaS) cloud with the goal of maximizing the cloud provider’s revenue subject to availability, capacity, and virtual machine (VM) migration constraints. Furthermore, we provide an autonomic resource provisioning method considering the hierarchical structure of an IaaS cloud of data centers, clusters within the data centers, racks within the clusters and servers within the racks. This problem assumes that cloud customers request a group of virtual machines with a communication pattern among them. Our solution minimizes communication costs and improves performance subject to VM co-location constraints. Moreover, we provide an autonomic solution for resource management in a Software as a Service (SaaS) cloud where software services are provided dynamically at different prices and categories enabling consumers to subscribe, and unsubscribe to software services. The SaaS cloud provider’s goal is to minimize the cost of its infrastructure, leased from an IaaS provider, and at the same time ensure the delivery of software services to customers while meeting response time SLAs subject to resource constraints.
Oral Defense of Doctoral Dissertation: Multiview Rank Learning For Multimedia Known Item Search
Friday, April 24, 2015
Known Item Search (KIS) is a specialized task of the general multimedia search problem. KIS describes the scenario where a user has seen a video before, must formulate a text description based on what he remembers, and knows that there is only one correct answer. The KIS task takes as input a text-only description and returns the ranked list of videos most likely to match the known item.
A KIS query is a verbose text description which is used to search a video repository consisting of metadata, audio, and visual content. The task presents a challenge in mapping the unique views of the video and query into a common feature space for search and ranking. Additionally, the queries often include key terms or phrases which are mapped into an incorrect multimedia view. The mapping problem causes the result set to drift away from the intended meaning of the original query. Supervised learning approaches to the KIS problem must overcome the imbalance of positive to negative examples that results from having a single known item.
We introduce a multiview rank learning approach to KIS, based on boosted regression trees, which provides a common feature space and overcomes the view ranking challenge. Natural language processing techniques are used to address the view drift problem by extracting key phrases from the original query which align with a specific video view. This approach allows us to activate only those views of the video which are applicable to the given query. A semi-supervised rank learning approach is used to overcome the class imbalance of having a single known item. Pseudo-positive examples are identified in a similarity graph and a K-Step Markov approach is used to estimate the importance of nodes relative to the truth root node. We evaluate our approach using benchmark datasets from the TRECVid evaluation and a large social media collection.
Oral Defense of Doctoral Dissertation: Toward Automated Forensic Analysis of Obfuscated Malware
Friday, April 24, 2015
Malware analysis, forensics, and reverse engineering reveal a deeper understanding of the inner workings of malware and the mechanics behind attack detection, which enables us to develop better defenses against increasingly sophisticated malware. Despite its inherent value, the current state of forensic analysis requires notable manual effort due to various obfuscation techniques used by malware. In this work, we investigate how to automate forensic analysis of obfuscated malware and develop novel tools that can automatically pinpoint and recover hidden, obfuscated malicious code within memory dumps and network traffic captures. Our tool also helps to identify the vulnerable data structure within the exploited binary executable.
Our novel solution combines static binary analysis, dynamic binary analysis, symbolic execution, binary instrumentation, and taint analysis. It automatically and accurately pinpoints the exact start and boundaries of attack code, even if disjointed and misaligned within random bytes. Additionally, it comprehensively handles self-modifying code and extracts the complete hidden, incremental, and transient code without using any signature or pattern, even if protected by multiple layers of sophisticated encoders. Our system consists of two automated components to achieve these overarching goals. First, an online portion uses kernel modifications to trigger exporting segments of memory when an exploited process is detected. Second, a malware code extraction framework draws on selective symbolic execution (S2E, based on KLEE and QEMU) to analyze any given byte stream offline or full binaries in an online mode. In full binary tracing mode, our method can monitor live network applications, even with complex external libraries such as OpenSSL.
This framework provides binary instrumentation for branch exploration and data-flow, or taint, analysis to find the original attack string and the data structure exploited. Furthermore, the data-flow analysis can track multiple source labels simultaneously at the byte granularity. We also address an unanswered but common condition in which tainted data becomes code. By augmenting the KLEE bitmap solver, our solution seamlessly continues propagation without loosing accuracy. Because large volumes of data result from this process, its output is serialized and shareable. Our tool provides execution, translation, data (writes), system call, and taint traces visually in memory snapshot deltas, D3 visualizations, JSON, and Elasticsearch documents. While these interactive dashboards and reporting mechanisms allow the analyst to quickly and effectively triage output, they also afford the opportunity to quickly acquire detailed information when necessary.
Computer Science Colloquium: Secure and Robust Software Through Testing and Verification
Monday, April 27, 2015
Hardly any area exists in our everyday life that is not affected by software, yet software remains buggy. These software bugs not only affect correctness and robustness but also cause severe security vulnerabilities. Software verification can provide strong guarantees about important (security) properties of a program, but full verification of software correctness is prohibitively expensive for most applications. Software testing is the predominant approach for assuring software correctness and robustness, but software testing is not complete and can only increase confidence. It is therefore important to measure the adequacy of testing techniques.
In this talk, I will present some of my work in verification and testing that aims to improve software security and robustness. First, I will present a model and type system to statically verify the absence of information-flow malware in mobile apps. I will also describe techniques I developed to significantly reduce the number of false positives and an evaluation on real-world apps. For 72 apps, totaling 570,000 lines of code, the results show that the model and type system are effective and that the developer burden is low. Second, I will present my work on improving software testing. In particular, I will focus on mutation analysis, which measures the adequacy of testing techniques using artificial faults (mutants). I will present an empirical study that involved more than 350 real faults and 230,000 mutants, in which I have shown that mutants are a valid substitute for real faults, and that mutation analysis is significantly more effective than code coverage.
Speaker's BioRené Just is a Postdoctoral Research Associate at the University of Washington, where he is a member of the Programming Languages and Software Engineering group. He received his PhD in Computer Science from the University of Ulm in 2013. His research interests are in software engineering and software security, in particular static and dynamic program analysis, type systems, mobile security, and mining software repositories. His research won two ACM SIGSOFT Distinguished Paper Awards (FSE'14 and ISSTA'14).
Oral Defense of Doctoral Dissertation: Using Hardware Isolated Execution Environments For Securing Systems
Tuesday, April 28, 2015
With the rapid proliferation of malware attacks on the Internet, malware detection and analysis play a critical role in crafting effective defenses. Advanced malware detection and analysis rely on virtualization and emulation technologies to introspect the malware in an isolated environment and analyze malicious activities by instrumenting code execution. Virtual Machine Introspection (VMI) systems have been widely adopted for malware detection and analysis. VMI systems use hypervisor technology to create an isolated execution environment for system introspection and to expose malicious activity. However, recent malware can detect the presence of virtualization or corrupt the hypervisor state and thus avoid detection and debugging.
In this thesis, I developed several systems using hardware isolated execution environments for attack detection, malware debugging, and sensitive operations. My research approach combines 1) the isolated execution concept with 2) hardware-assisted technologies. It leverages System Management Mode (SMM), a CPU mode in the x86 architecture, to transparently detect and debug armored malware and perform sensitive workloads. This research uses SMM to secure systems with a minimal Trust Computing Base (TCB) and low performance overhead. To demonstrate the effectiveness of my research, several prototypes of using SMM as the isolated execution environment are implemented. First, I use SMM to introspect all layers of system software, including applications, OSes, hypervisors, and firmware. Secondly, my research leverages SMM to transparently debug armored malware and achieve a higher level of transparency than state-of-the-art systems. Lastly, this thesis uses SMM to securely perform password-logins without trusting the operating system and prevents ring 0 keyloggers.
Oral Defense of Doctoral Dissertation: Convex Hull Problems
Wednesday, April 29, 2015
The convex hull problem is an important problem in computational geometry with such diverse applications as clustering, robot motion planning, convex relaxation, image processing, collision detection, infectious disease tracking, nuclear leak tracking, extent estimation, among many others. The convex hull is a well-studied problem with a large body of results and algorithms in a variety of contexts. In this thesis, we consider three contexts: when only an approximate convex hull is required, when the input points come from a (potentially unbounded) data stream, and when layers of concentric convex hulls are required. The first context applies when input point sets may contain errors from noise or from rounding, or when the accuracy provided by exact algorithms are simply not required. This thesis proposes a framework for examining convex hull approximation algorithms so that they can be better compared.
This framework is then used to assess a number of existing algorithms and new algorithms proposed in the thesis. This framework can help an engineer to select the most appropriate algorithm for her scenario and to analyze new algorithms for this problem. Moreover, our new algorithms exhibit better space, time and error bounds than existing ones.
The second context applies to a base station in a wireless sensor network that receives incoming input points and must maintain a running convex hull within a memory constraint. This thesis proposes a new streaming algorithm that processes each point in time O(log k) where k is the memory constraint, while maintaining very good accuracy. And finally, the last context applies when all the convex layers are sought. This has a variety of applications from robust estimation to pattern recognition. Existing algorithms for this problem either do not achieve optimal O(n log n) runtime and linear space, or are overly complex and difficult to implement and use in practice. This thesis remedies this situation by proposing a novel algorithm that is both simple and optimal. The simplicity is achieved by independently computing four sets of monotone convex layers in O(n log n) time and linear space. These are then merged together in O(n log n) time.
Oral Defense of Doctoral Dissertation: Autonomic Performance Optimization with Application to Self-Architecting Software Systems
Thursday, April 30, 2015
Service Oriented Architectures (SOA) are an emerging software engineering discipline that builds software systems and applications by connecting and integrating well-defined, distributed, reusable software service instances. SOA can speed development time and reduce costs by encouraging reuse, but this new service paradigm presents significant challenges.
Many SOA applications are dependent upon service instances maintained by vendors and/or separate organizations. Applications and composed services using disparate providers typically demonstrate limited autonomy with contemporary SOA approaches. Availability may also suffer with the proliferation of possible points of failure–restoration of functionality often depends upon intervention by human administrators. Autonomic computing is a set of technologies that enable self-management of computer systems. When applied to SOA systems, autonomic computing can provide automatic detection of faults and take restorative action. Additionally, autonomic computing techniques possess optimization capabilities that can leverage the features of SOA (e.g., loose coupling) to enable peak performance in the SOA system’s operation. This dissertation demonstrates that autonomic computing techniques can help SOA systems maintain high levels of usefulness and usability.
This dissertation presents a centralized autonomic controller framework to manage SOA systems in dynamic service environments. The centralized autonomic controller framework can be enhanced through a second meta-optimization framework that automates the selection of optimization algorithms used in the autonomic controller. A third framework for autonomic meta-controllers can study, learn, adjust, and improve the optimization procedures of the autonomic controller at run-time. A detailed set of experiments demonstrates the effectiveness and scalability of the approaches, and also reveals some of the challenges in applying meta-optimization and metacontrol to the autonomic management of SOA systems. Many of the techniques described in this dissertation have broad applicability in autonomic computing and related fields.
SANG Seminar: Delving into Internet DDoS Attacks by Botnets: Characterization and Analysis
Friday, May 08, 2015
Internet DDoS attacks are prevalent but hard to defend against, partially due to the volatility of the attacking methods and patterns used by attackers. Understanding the latest of DDoS attacks can provide new insights for effective defense. But most of existing understandings are based on indirect traffic measures (e.g., backscatters) or traffic seen locally (e.g., in an ISP or from a botnet). In this study, we present an in-depth study based on 50,704 different Internet DDoS attacks directly observed in a seven-month period. These attacks were launched by 674 botnets from 23 different botnet families with a total of 9026 victim IPs belonging to 1074 organizations in 186 countries. Our analysis reveals several interesting findings about today's Internet DDoS attacks.
Some highlights include:
(1) the geolocation analysis shows that the geospatial distribution of the attacking sources follows certain patterns, which enables very accurate source prediction of future attacks for most active botnet families; (2) from the target perspective, multiple attacks to the same target also exhibit strong patterns on inter-attack time interval, allowing accurate start time prediction of next anticipated attacks from certain botnet families; (3) there is a trend for different botnets in a family and from different families to collaborate on attacking the same target, simultaneously or in turn. These findings add to the existing literature on the understanding of today's Internet DDoS attacks, and offer new insights for designing new defense schemes at different levels.
Speaker's BioAn Wang is a 3rd year Ph.D. student of Computer Science Department at George Mason University. Her research interests include software defined networking and network/system security.
Department of Computer Science: Graduation Celebration and Awards Dinner
Wednesday, May 13, 2015
Volgenau School of Engineering: Convocation
Thursday, May 14, 2015
Timeline of Events
1:00pm Graduates Line-Up in Parking Lot A; 1:30pm Processional Preparation; 1:45pm Processional; 2:00pm Call to Order and Welcome; 2:15pm Commencement Address; 2:30pm Presentation of Diplomas; 3:30pm Reception.
Information for Graduates: Assemble at 1:00pm in Parking Lot A at the sign for your degree and major. You will be given a card to write your name on. This is the way your name will be read as you walk across the stage. Wear your cap and gown when you arrive. We would like to know if you will be attending Convocation. Reply using the Convocation RSVP Form.
Information for Guests: Doors open at 1:00pm. Seats are first come first serve. No tickets are required for this event, bring as many guests as you like. Parking for this event will be available in Parking Lot A. IN CASE OF RAIN students should report directly to the Patriot Center floor, via the lower entrance on the "East" side of the Patriot Center at the loading dock door. You will be directed to your seats. There will be no processional in the event of rain.
Department of Computer Science: Post-Convocation Reception
Thursday, May 14, 2015
Event InfoImmediately following the Volgenau School of Engineering Convocation Ceremony. All graduates from the Computer Science Department and their guests are welcome.
Oral Defense of Doctoral Dissertation: A Model-Based Testing Technique for Component-Based Real-Time Embedded Systems
Friday, June 05, 2015
The growing complexity of modern real-time embedded systems makes component-based software engineering (CBSE) technology more desirable. Although many ideas have been proposed for building component-based real-time embedded software, techniques for testing component-based real-time systems have not been well developed. A typical component-based embedded system consists of multiple user tasks, as well as hardware, middleware and software layers. Interaction problems between different components can cause system failures in field applications. The challenges not only come from the integration of multiple components through their interfaces, but also include the composition of extra-functional properties. A real-time embedded system needs to achieve its functionality under the constraints caused by its extra-functional properties. Since the time at which the system actions take place is important, correct functional behavior with regard to timing properties is essential to real-time embedded systems. Therefore, this research is intended to help detect both functional and temporal faults during the integration of component-based real-time embedded software.
This dissertation presents a test model that depicts both inter-component and intra-component relationships in component-based real-time embedded software and identifies key test elements. The test model is realized using a family of novel graph-based test models in which not only are the functional interactions and the dependence relationships illustrated, but also the time-dependent interaction among components, are illustrated. Time dependent behavior is modelled by means of timers and clocks. I use the graph-based test model to develop a novel family of test adequacy criteria that help generate effective test cases. I also present new algorithms to facilitate automate generation of the test cases. To increase the observability of system behavior, I instrument related operations to generate trace data including task id, operation, time stamp, and execution state from program execution, where the dynamic information gathered is used to check against the expected results. The experiments showed that the proposed approach effectively detected various kinds of integration faults and optimized the balance between budget and quality in an industrial product software testing.
Oral Defense of Doctoral Dissertation: Market-Based Decision Guidance Framework for Power and Alternative Energy Collaboration
Friday, June 12, 2015
Commercial & Industrial (C&I) organizations typically have diverse energy resource options which present a need for decisions that can streamline consumption and reduce costs. When alternative power resources and the need for limiting harmful emissions are also considered, the search space for optimal decisions becomes increasingly complex. Therefore, there is a need for a collaborative decision guidance system for electrical components power generation, storage, and consumption.
Proposed in this dissertation is an extensible framework to facilitate C&I entities forming a consortium to collaborate on their electric power supply and demand. The collaborative framework includes the structure of market setting, bids, and market resolution that produces a schedule of how power components are controlled as well as the resulting payment. The market resolution must satisfy a number of desirable properties (i.e., feasibility, Nash equilibrium, Pareto optimality, and equal collaboration profitability) which are formally defined in the dissertation. Additionally, peak-demand cost constitutes a significant portion of an organization’s energy budget. Units of an organization can separately run their services by choosing their peak-demand budget. Alternatively, they can collaborate to determine their peak-demand budget in lieu of monetary benefit.
Therefore, a primary market is proposed where units can reduce their peak-demand cost by collaboratively determining optimal peak-demand bounds and their respective payments. Moreover, to support the extensible framework components’ library, power components such as utility contract, back-up power generator, renewable resource, and power consuming service are formally modeled. Finally, the validity of this framework is evaluated by a case study using simulated load scenarios to examine the ability of the framework to efficiently operate at the specified time intervals with minimal overhead cost.
SANG Seminar: Cybersecurity Dynamics: A Foundation for the Science of Cyber Security
Wednesday, June 17, 2015
For decades, Computer and Information Security studies have been driven by core concepts such as Confidentiality, Integrity, and Authentication. What would be the core concepts that will drive the study of the emerging Science of Cyber Security? In this talk, I will present the novel concept/abstraction of Cybersecurity Dynamics, which we believe will formulate the ultimately-wanted foundation for the emerging Science of Cyber Security. Cybersecurity Dynamics is an extremely multidisciplinary approach to tackling some real-world problem of the highest importance. I will start with some Cyberseurity Dynamics models (with emphasis on high-level ideas) and discuss their usefulness. I will then present a conceptual generalization on Cybersecurity Dynamics and illustrate how it can benefit from mathematical techniques such as Dynamical Systems, Stochastic Processes, Control Theory, Game Theory, and Network Science. I will discuss some inherent technical barriers that must be adequately addressed but seem to go beyond the reach of existing mathematical techniques (i.e., new techniques/skills are needed in order to achieve the ultimate goal). Please refer to http://www.cs.utsa.edu/~shxu/socs/index.html for more information about this exciting research endeavor.
Speaker's BioShouhuai Xu is a Full Professor in the Department of Computer Science, University of Texas at San Antonio. He is the Director of the Laboratory for Cybersecurity Dynamics (http://www.cs.utsa.edu/~shxu/LCD/index.html) at UTSA. His research is primarily in making cyberspace more secure and trustworthy. He is especially interested in mathematical modeling and analysis of cybersecurity, and devising practical cyber defense technology that include both provably-secure cryptographic protocols and advanced cyber defense mechanisms. His research has been funded by AFOSR, ARO, NSF and ONR. He is a Program Committee co-chair of NSS'15 and was a Program Committee co-chair of Inscrypt'13. He has served on the Program Committees of 100+ international conferences/workshops. He is currently an Associate Editor of IEEE Transactions on Dependable and Secure Computing (IEEE TDSC) and IEEE Transactions on Information Forensics and Security (IEEE T-IFS). He earned his PhD in Computer Science from Fudan University.
Oral Defense of Doctoral Dissertation: Probabilistic Approaches to Protein-protein Docking
Monday, July 27, 2015
Characterizing the three-dimensional structures of protein-protein assemblies, a problem known as protein-protein docking, is central to understanding the physical and structural bases of molecular interactions in cellular processes. Doing so can also provide useful insights in structure-function studies and the design of elective drugs. Despite significant contributions from wet-laboratory techniques, the number of high-resolution structures of protein assemblies characterized in the wet laboratory cover only a small fraction of possible interactions. Research in dry laboratories is vibrant but challenged by the complexity of molecular interactions. Predominantly, methods based on stochastic optimization are employed to handle the size and complexity of the space of possible placements of units in an assembly.
Despite significant work showing that knowledge of interaction interfaces can be valuable to guide docking methods, very few methods incorporate such information. Those that do are restricted to the setting of directly incorporating wet-lab macroscopic measurements,such as chemical shifts, which are hard to obtain on a variety of systems.
Oral Defense of Doctoral Dissertation: Zero-Day Attack Detection Using Collaborative and Transduction-Based Anomaly Detectors
Tuesday, July 28, 2015
Web applications have emerged as the primary means of access to vital and sensitive services such as online payment systems and databases storing personally identifiable information. Unfortunately, the need for ubiquitous and often anonymous access exposes web servers to adversaries. Indeed, network-borne zero-day attacks pose a critical and widespread threat to web servers that cannot be mitigated by the use of signature-based intrusion detection systems.
Content-based Anomaly Detection (AD) techniques are regarded as a promising mechanism to detect `zero-day' attacks. AD sensors have also been shown to perform better than signature based systems in detecting polymorphic attacks. However, the False Positive Rates (FPRs) produced by current AD sensors have been a cause of concern.
In the first part of this work, we present a collaborative approach to quickly detect zero-day attacks. To detect previously unseen attacks, we correlate web requests containing user-submitted content across multiple web servers that is deemed abnormal by local Content Anomaly Detection (CAD) sensors. We are the first to propose the exchange of suspicious (abnormal) request content between sites, which significantly reduces false positives.
The cross-site information exchange happens in real-time leveraging privacy preserving data structures. Moreover, we automatically filter out high entropy and rarely seen legitimate requests, reducing the amount of data and time an operator has to spend sifting through alerts. Our results come from a fully working prototype using eleven weeks of real-world data from production web servers. During that period, we identify at least three application-specific attacks not belonging to an existing class of web attacks as well as a wide range of traditional classes of attacks, including SQL injection, directory traversal, and code inclusion without using human-specified knowledge or input.
In the second part of this work, we introduce and evaluate transAD, a system of payload inspection AD sensors that are based on Transductive Confidence Machines (TCM). Existing TCM based implementations have very high false positive rates and are not suitable for use as NIDS.
Our approach leverages an unsupervised machine learning algorithm to identify anomalous packets; unlike most AD sensors, ours does not require manually labeled data. Also, transAD uses an ensemble of TCM sensors to achieve better detection rates and lower FPRs than single sensor implementations. Therefore, transAD presents a hardened defense against poisoning attacks.
We evaluated our prototype implementation using two real-world data sets collected from a public university's network. Approximately 1.1 million packets containing real attacks were processed by our AD sensor. To compute the ground truth, we manually labeled 18,500 alerts. In the course of scanning millions of packets, our sensor's low false positive rate would significantly reduce the number of false alerts that need to be inspected by an operator while maintaining a high detection rate.
Department of Computer Science: GTA Fall Orientation
Tuesday, August 25, 2015
Department of Computer Science: PhD Student Orientation and Reception
Tuesday, August 25, 2015
Oral Defense of Doctoral Dissertation: Regularized Learning in Multiple Tasks with Relationships
Tuesday, August 25, 2015
Supervised classification is a sub-task of machine learning where the goal is to infer a classification function using labeled data. A vast amount of research has been conducted in this area addressing various classification problems such as binary, multi-class and multi-label classification. However, we often encounter classification problems in real world in groups of tasks with complex interactions among them. Methods that are able to take advantage of the additional information regarding task relationships and interactions are able to perform better in terms of classification accuracy. Furthermore, with the vast amount of data that is being accumulated in the recent years the real world problems that have any practical utility have exploded in terms of problem size; with respect to number of data elements, feature size and number of class labels. Therefore, there is an urgent need for scalable methods that are able to gracefully scale to web-scale problems. In my thesis, I have tried to address these issues by developing novel classification methods for large scale hierarchical classification and multi-task learning.
Test-out Exam NEW DATE: CS 530
Wednesday, August 26, 2015
Test-out Exam NEW DATE: CS 531
Wednesday, August 26, 2015
Test-out Exam NEW DATE: INFS 501
Wednesday, August 26, 2015
Test-out Exam NEW DATE: INFS 515
Wednesday, August 26, 2015
Test-out Exam NEW DATE: INFS 519
Wednesday, August 26, 2015
Test-out Exam NEW DATE: SWE 510
Wednesday, August 26, 2015
ECE/CS Seminar: Virtualized Queueing Theory
Wednesday, Sept. 02, 2015
Queueing is a fundamental problem characteristic to resource-sharing systems: whenever there is more load than available resources, some jobs/requests are queued and implicitly delayed. The classical queueing theory was conceived by Erlang more than a century ago, and has since evolved as one the most important branches of applied probability. Due to some key technical limitations in its scope, the applicability of queueing theory has been increasingly questioned in the context of complex modern systems, e.g., the Internet.
This talk introduces Virtualized Queueing Theory (VQT), a modern approach whose central idea is to increase tractability by sacrificing the exact analysis. At the apparent expense of computing queueing performance metrics in terms of bounds, VQT can address, for the first time, two fundamental problems: per-flow scheduling and multi-queue networks, for a broader class of arrival processes (e.g., Markov modulated). Moreover, by relying on martingale representations of certain transforms of the queueing systems, the obtained bounds are tight and improve existing results by orders of magnitude.
Speaker's BioFlorin Ciucu was educated at the Faculty of Mathematics, University of Bucharest (B.Sc. in Informatics, 1998), George Mason University (M.Sc. in Computer Science, 2001), and University of Virginia (Ph.D. in Computer Science, 2007). Between 2007 and 2008 he was a Postdoctoral Fellow in the Electrical and Computer Engineering Department at the University of Toronto. Between 2008 and 2013 he was a Senior Research Scientist at Telekom Innovation Laboratories (T-Labs) and TU Berlin. Currently he is an Assistant Professor in the CS department at the University of Warwick. His research interests are in the stochastic analysis of communication networks, resource allocation, and randomized algorithms. He has served on the Technical Program Committee of several conferences including IEEE Infocom, ACM Sigmetrics, IFIP Performance, ACM e-Energy, or ACM Mobihoc. Florin is a recipient of the ACM Sigmetrics 2005 Best Student Paper Award and IFIP Performance 2014 Best Paper Award.
New Student Fall Welcome 2015: BS ACS & BS CS
Tuesday, September 08, 2015
Computer Science Seminar: Towards Data-Driven Autonomics in Exascale Data Centers
Monday, September 21, 2015
Continued reliance on human operators for managing data centers is a major impediment for them from ever reaching extreme dimensions. In this talk, I will outline some new ideas towards data-driven autonomics that can enable the exascale data centers of the future. In my vision, large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing ?nuts-and-bolts? operations. I believe that we are at a technological inflection point with the confluence of ideas from distributed systems, data science, statistical machine learning, computational modeling, network science and complexity science such that an interdisciplinary effort will be able to fit together all pieces of this puzzle and solve the grand challenge. The breakthrough I envision is an operator-less exascale data center where management and control are based on predictive holistic models. These models will include not only the computer system as such, but also its geographical, physical and socio-political environment. This development will be a game changer in how large data centers are managed and controlled, enabling scales and efficiencies that are currently unimaginable.
Speaker's BioOzalp Babaoglu is Professor of Computer Science at the University of Bologna. He received a Ph.D. in 1981 from the University of California at Berkeley. Babaoglu's virtual memory extensions to AT&T Unix as a graduate student at UC Berkeley became the basis for a long line of ?BSD Unix? distributions. He is the recipient of 1982 Sakrison Memorial Award, 1989 UNIX International Recognition Award and 1993 USENIX Association Lifetime Achievement Award for his contributions to the Unix system community and to Open Industry Standards. Before moving to Bologna in 1988, Babaoglu was an Associate Professor in the Department of Computer Science at Cornell University where he conducted research on distributed systems and fault-tolerance. Since moving to Italy, he has been active in numerous European research projects in distributed computing and complex systems including BROADCAST, CABERNET, ADAPT and DELIS. In 2001 he co-founded the Bertinoro international center for informatics (BiCi). Since its inception, this Italian Dagstuhl has organized more than 150 prestigious scientific meetings/schools and has had thousands of young researchers from all over the world pass through its doors. In 2002 Babaoglu was made a Fellow of the ACM for his ?contributions to fault-tolerant distributed computing, BSD Unix, and for leadership in the European distributed systems community?. In 2007, he co-founded the IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO) conference series and has been a member of its Steering Committee since inception and has served as co-general chair for the 2007 and 2013 editions. Since 2013, he has been on the Selection Committee for the ACM Heidelberg Laureate Forum which brings together young researchers in Computer Science and Mathematics with Abel, Fields and Turing Laureates. He currently serves on the editorial board of ACM Transactions on Autonomous and Adaptive Systems. Previously, he served for two decades on the editorial boards of ACM Transactions on Computer Systems and Springer-Verlag Distributed Computing.
Oral Defense of Masters Thesis: On Maximal Layers of Random Orders
Thursday, September 24, 2015
In this thesis we investigate the maximal layers of random partial orders. Main contributions are two-fold. In the first half we investigate the expected size of different maximal layers of a random partial order. In particular when the points are in a plane, we give an enumerative formula for the distribution of the size of these maximal sets. Using Monte Carlo based simulation we extrapolate the results for higher dimensions. In the second half we explore the computational aspect of the problem. To this end we propose a randomized algorithm for computing the maximal layers and analyze its expected runtime. We show that the expected runtime of our proposed algorithm is bounded by o(kn^2) and in the worst case by O(kn^2). For non-constant k, to the best of our knowledge, this is the first such algorithm for the problem. Furthermore we extend the results for random orders with arbitrary distributions.
Department of Computer Science: GTA Workshop
Thursday, October 01, 2015
Seminar: Google Engineering Career Talk
Wednesday, October 07, 2015
An event for learning about Google and meeting Google staff.
RSVP here: https://docs.google.com/a/google.com/forms/d/1a2Woqk0ny4f3bbvvB-oxYrMUo3qz_sLmORsPgRd7mtk/viewform
Distinguished Lecture Series: 2D and 3D Model Based Visual Tracking
Friday, October 30, 2015
Speaker's BioHenrik I. Christensen is the KUKA Chair of Robotics and a Distinguished Professor of Computing. He is also the director of the Robotics & Intelligent Machines Center at Georgia Tech.
Christensen received his first degree in mechanical engineering and subsequently received M.Sc. and Ph.D. degrees from Aalborg University in Denmark. His research focuses on human centered robotics, specifically perception, human-robot interaction, and systems modeling. A large number of companies have commercialized his research and he has maintained active collaborations with researchers and companies across three continents. Christensen also serves as an advisor to numerous companies and agencies across the world.
The author of more than 300 contributions in the areas of computer vision, artificial intelligence, and computer vision, Christensen held positions at Aalborg University, Royal Institute of Technology, and the University of Pennsylvania before joining the Georgia Tech faculty.
Additionally, Christensen was the founder of the European Network of Excellence in Robotics (1999-2006), and in 2011, he founded the U.S. Robotics Virtual Organization. He was the coordinator of the effort to formulate a National Robotics roadmap, which was presented to congress in 2009 and 2013.
Christensen serves on seven editorial boards across the areas of computer vision and robotics. He is the editor in chief of Robotics and Trends in Robotics and serves as the senior technology lead for the Robotics Technology Consortium (RTC). He is also a member on the boards of the Robot Industry Association (RIA) and the College Industry Council for Material Handling Industry Education (CICHME).