RASS: Resilient Autonomic Software Systems

Award no: AFOSR FA9550-16-1-0030

Program Manager: Dr. James Lawton

PI and co-PIs: Daniel A. Menasce, Hassan Gomaa (both from George Mason University), and Sam Malek (from UC Irvine)

Contact: menasce@gmu.edu

 

Overview. USAF missions are generally supported by software systems that may become unexpectedly unavailable or degraded due to hostile activity or forces of nature. When this happens, these systems need to self-adapt dynamically so that they continue to operate in the best possible manner. The systems that support USAF missions are generally highly dynamic, distributed, very complex, and heterogeneous, and they need to be extremely resilient and dependable.

Thus, the objective of this research is to conceptualize a framework called Resilient Autonomic Software Systems (RASS) inspired by the MAPE-K (i.e., Monitor, Analyze, Plan, and Execute based on Knowledge) paradigm for autonomic systems (aka self-managing systems).

As shown in the figure above, sensor data is obtained from a variety of systems (e.g., airplanes, UAVs, cloud systems, and cell phones) by the Monitor component of RASS. The data is aggregated and sent to the Analyze Adaptation component, which determines whether self-adaptation is needed and why. The results of this analysis are passed to the Plan Adaptation component, which determines an optimal or near-optimal plan to automatically adapt the system. This plan is received by the Execution Adaptation component, which sends adaptation commands to the parts of the system that need to be adapted and/or recovered, and orchestrates the adaptation to ensure an appropriate and consistent recovery. Because the adaptation plan is driven by the system's software architecture, an Architecture Discovery module recovers the descriptive architecture (i.e., the one that reflects the actual system) or supplements the prescriptive architecture (i.e., the one used to guide the development) when one is not available or has been eroded by system evolution.
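The flow described above can be illustrated with a minimal, single-node sketch of one pass through a MAPE-K loop. This is not the RASS implementation: the names (`KnowledgeBase`, `mape_k_iteration`), the symptom labels, the recovery actions, and the latency threshold are all hypothetical, and the sketch ignores the decentralization that RASS requires.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for the Knowledge Base: symptom -> recovery action."""
    recovery_methods: dict = field(default_factory=lambda: {
        "component_down": "restart_on_other_node",
        "high_latency": "add_replica",
    })


def monitor(readings):
    """Aggregate raw (component, metric, value) readings per component."""
    summary = {}
    for component, metric, value in readings:
        summary.setdefault(component, {})[metric] = value
    return summary


def analyze(summary, latency_limit_ms=200.0):
    """Decide whether adaptation is needed and why (assumed threshold)."""
    symptoms = []
    for component, metrics in sorted(summary.items()):
        if not metrics.get("alive", True):
            symptoms.append((component, "component_down"))
        elif metrics.get("latency_ms", 0.0) > latency_limit_ms:
            symptoms.append((component, "high_latency"))
    return symptoms


def plan(symptoms, kb):
    """Look up one recovery action per symptom in the knowledge base."""
    return [(component, kb.recovery_methods[symptom])
            for component, symptom in symptoms]


def execute(adaptation_plan, send_command):
    """Send each adaptation command to the affected part of the system."""
    for component, action in adaptation_plan:
        send_command(component, action)


def mape_k_iteration(readings, kb, send_command):
    """Run Monitor, Analyze, Plan, and Execute once and return the plan."""
    adaptation_plan = plan(analyze(monitor(readings)), kb)
    execute(adaptation_plan, send_command)
    return adaptation_plan
```

In this sketch the `send_command` callback stands in for whatever channel delivers adaptation commands to the managed system; an empty plan means no adaptation was needed.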

The Knowledge Base of RASS includes recovery methods and models that capture their performance overhead, failure recovery software patterns, a configuration database, an application binary repository to assist in recovering the software architecture of mobile applications, and the descriptive and prescriptive software architectures. 

A significant challenge of our research is that all the activities described above need to be carried out in a distributed fashion without centralized control. This is essential to achieve high resiliency and high dependability.

Doctoral students involved: Emad Albassam, George Mason University, graduated with the dissertation "A Model-Based Approach for Self-Healing and Self-Configuration in Component-Based Software Systems"; Jason Porter, George Mason University, graduated with the dissertation "Decentralized Runtime Architecture Discovery and Testbed for Adaptation and Failure Recovery of Large Dynamic Distributed Systems"; Mahmoud Hammad, University of California at Irvine, graduated with the dissertation "Self-Protection of Android Systems from Inter-Component Communication Attacks"; Noor Bajunaid, George Mason University, graduated with the dissertation "Modeling and Optimization of Performance and Reliability of Distributed Autonomic Systems."

The RASS group has been working on four interrelated research areas:

Automatic generation of architectural models of mobile applications. Autonomic software systems leverage an abstract representation of the software, often in the form of an architectural model, to manage and adapt the system at runtime. Prior research assumes that these models are developed manually by software engineers. Manual construction of such models is difficult and labor-intensive, and hence error-prone. Moreover, models used for runtime adaptation of software quickly become obsolete due to changes in the software. In this research, we developed novel static program analysis and dynamic monitoring techniques for the Android system to automatically obtain and maintain a precise architectural model without human intervention. Static analysis techniques are used to extract the static architectural model of the Android app from its bytecode, whereas dynamic monitoring is used to update the runtime model and keep it synchronized with the running system. Because of its commercial success and its open-source nature, Android is increasingly adopted in mobile military systems. The techniques developed in this thrust of the RASS research facilitate the construction of autonomic Android systems that are resilient to changes in system resources (e.g., available battery) as well as unexpected dynamic conditions (e.g., security attacks).

Robust Decentralized Discovery of Software Architectures of Distributed Systems. As indicated above, autonomic systems typically rely on knowledge of a software architectural model to plan for adaptation. This thrust of the research considers highly dynamic distributed software systems for which the software architecture is not known in advance, either because it suffered erosion (i.e., the software changed over time and the prescriptive architecture documents were not updated) or because no prescriptive documents were ever produced. We designed and implemented a system called DeSARM (Decentralized Software Architecture discoveRy Mechanism) that uses a gossip-like mechanism based on propagation of aggregated message traces to uncover the structural view and elements of the behavioral view of the distributed architecture. We conducted experiments to assess the convergence of the process, its termination under stable conditions, its accuracy in discovering the architecture, and its overhead relative to a centralized approach. The experiments also tested the robustness of the method with respect to component and node failures.
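To make the idea of propagating aggregated message traces concrete, the following is a toy sketch and not DeSARM's actual protocol. It assumes each node starts with the set of (sender, receiver) component interactions it observed locally, and that nodes exchange and merge their views with the next node on a fixed ring. The ring schedule is an assumption chosen so that the run is reproducible and so that a full round with no change implies every view is identical; a real gossip protocol would typically select peers at random. The function name `discover_architecture` is hypothetical.

```python
def discover_architecture(local_traces, max_rounds=100):
    """Merge per-node trace sets until every node holds the same view.

    local_traces: dict mapping node name -> set of (sender, receiver) edges
    observed at that node. Returns (views, rounds_used), where rounds_used
    counts the rounds in which at least one view changed.
    """
    nodes = sorted(local_traces)
    views = {node: set(local_traces[node]) for node in nodes}
    rounds_used = 0
    for _ in range(max_rounds):
        changed = False
        for i, node in enumerate(nodes):
            # Assumed schedule: exchange views with the next node on a ring.
            peer = nodes[(i + 1) % len(nodes)]
            merged = views[node] | views[peer]
            if merged != views[node] or merged != views[peer]:
                changed = True
                views[node] = set(merged)
                views[peer] = set(merged)
        if not changed:
            # Every adjacent pair on the ring agrees, so all views agree.
            break
        rounds_used += 1
    return views, rounds_used
```

Each merged view corresponds to the structural part of the architecture only (which components talk to which); behavioral information and failure handling are outside the scope of this sketch.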

Decentralized Dynamic Optimization of Software Architectures. Distributed systems can be made more dependable by replacing software components with other components or composite components that implement one or more fault-tolerant mechanisms or add security to the communication between components. Examples of such fault-tolerant mechanisms include checkpointing, replication with or without replica diversity, replication with diversity and voting, and consensus-building protocols in the presence of Byzantine failures. Fault-tolerant mechanisms typically add some level of performance overhead in terms of added computational requirements and/or added communication requirements and latency. We designed a utility function that combines dependability and performance overhead. The value of the utility increases as system dependability increases and decreases as performance overhead increases. We are building a library of fault-tolerance mechanisms along with parameterized analytic models that can be used to assess their dependability and performance overhead. We are designing a totally decentralized algorithm that gradually modifies the architecture of the distributed system in a way that brings its total utility to a near-optimal value. The algorithm is based on our own decentralized versions of combinatorial search techniques such as hill-climbing and combines local node optimizations with selective propagation of local changes to peer nodes. We are also designing experiments to compare the utility obtained with the decentralized near-optimal method with the utility obtained under a centralized approach, and we are analyzing inherent properties of the method, including time to converge, complexity, overhead, and termination under stable conditions. We have built an analytic model, based on queuing networks, that predicts an application's response time and availability under different checkpointing approaches.
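The trade-off captured by such a utility function, and the flavor of a hill-climbing search over it, can be sketched as follows. Everything specific here is an assumption made for illustration: the mechanism table and its numbers, the weighted-difference form of `utility`, the treatment of system availability as the product of component availabilities, and the latency budget. The search below is also a plain sequential hill climb in which components take turns and the total utility is evaluated globally; it does not reproduce the decentralized algorithm described above.

```python
import math

# Hypothetical library: mechanism -> (component availability, added latency in ms).
MECHANISMS = {
    "none": (0.90, 0.0),
    "checkpointing": (0.97, 5.0),
    "replication": (0.995, 12.0),
    "replication_voting": (0.999, 30.0),
}


def utility(assignment, weight=0.7, latency_budget_ms=100.0):
    """Grow with dependability, shrink with performance overhead."""
    # Assumption: the system works only if every component works.
    availability = math.prod(MECHANISMS[m][0] for m in assignment.values())
    # Assumption: overhead is total added latency as a fraction of a budget.
    added_latency = sum(MECHANISMS[m][1] for m in assignment.values())
    overhead = min(added_latency / latency_budget_ms, 1.0)
    return weight * availability - (1.0 - weight) * overhead


def hill_climb(components):
    """Change one component's mechanism at a time while utility improves."""
    assignment = {c: "none" for c in components}
    best = utility(assignment)
    improved = True
    while improved:
        improved = False
        for component in components:
            for mechanism in MECHANISMS:
                candidate = {**assignment, component: mechanism}
                score = utility(candidate)
                if score > best + 1e-12:
                    assignment, best, improved = candidate, score, True
    return assignment, best
```

The loop stops at a local optimum: when it returns, no single change of one component's mechanism raises the utility, which is the usual guarantee of hill-climbing and not a guarantee of global optimality.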

Decentralized Recovery Patterns and Connectors. This research is investigating how recovery patterns, recovery connectors, and a decentralized MAPE manager (as shown in the above figure) can be designed and integrated to provide self-healing and self-configuration in distributed software architectures. A distributed software architecture is composed of several interconnected architectural structure patterns (such as client/server, hierarchical, decentralized control, hierarchical control) and distributed communication patterns (such as asynchronous, synchronous with reply, subscription/notification, brokered). A recovery pattern defines how components in an architectural pattern can be dynamically relocated and recovered to a consistent state after a component has failed. Connectors in component-based software architectures interconnect components and encapsulate a communication protocol. In this research, we extend connectors with adaptation and recovery capabilities to assist with component recovery. All communication with a component passes through one or more recovery connectors, which maintain a copy of messages sent to the component. When a component failure occurs, a decentralized MAPE manager works with the relevant recovery connectors, using the appropriate recovery pattern(s), to analyze the cause of component failure, plan for component recovery, and execute the plan by reconfiguring the system, assigning the recovered component to a different node, and re-establishing its communication with neighboring components to ensure a consistent system state.
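The role of a recovery connector can be illustrated with a small sketch under simplifying assumptions. The `Component` and `RecoveryConnector` classes below are hypothetical and are not the connectors developed in this research. The connector keeps a copy of every message until the component has handled it, and on recovery it re-instantiates the component on a different node and replays the retained messages in order. Failure detection, the MAPE manager, and restoration of state accumulated before the failure are left out.

```python
class Component:
    """Toy component that records the messages it has handled."""

    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.alive = True
        self.processed = []

    def handle(self, message):
        if not self.alive:
            raise ConnectionError(f"{self.name} on {self.node} is down")
        self.processed.append(message)


class RecoveryConnector:
    """Routes messages to a component and retains copies until handled."""

    def __init__(self, component):
        self.component = component
        self._next_id = 0
        self._unacked = {}  # message id -> retained copy, in send order

    def send(self, message):
        """Deliver a message; return False if it had to be retained."""
        msg_id = self._next_id
        self._next_id += 1
        self._unacked[msg_id] = message
        try:
            self.component.handle(message)
        except ConnectionError:
            return False
        del self._unacked[msg_id]
        return True

    def recover(self, new_node):
        """Relocate the component to new_node and replay retained messages."""
        replacement = Component(self.component.name, new_node)
        self.component = replacement
        for msg_id in list(self._unacked):
            replacement.handle(self._unacked[msg_id])
            del self._unacked[msg_id]
        return replacement
```

Because senders talk only to the connector, they do not need to know that the component moved: after `recover("node-2")`, later calls to `send` reach the replacement instance.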

Publications.

  1. W. Connell, D.A. Menasce, M. Albanese, Performance Modeling of Moving Target Defenses with Reconfiguration Limits, IEEE Tr. Dependable and Secure Computing, accepted for publication on November 15, 2018.
  2. D. Menasce and S. Bardhan, TDQN: Trace-Driven Analytic Queuing Network Modeling of Computer Systems, The Journal of Systems & Software, Elsevier, Vol. 147, January 2019, pp. 162-171.
  3. U. Tadakamalla and D.A. Menasce, FogQN: An Analytic Model for Fog/Cloud Computing, Proc. 1st Workshop on Managed Fog-to-Cloud (mF2C), joint with 11th IEEE/ACM Intl. Conf. Utility and Cloud Computing, Zurich, Switzerland, December 17-20, 2018.
  4. V. Tadakamalla and D.A. Menasce, Model-Driven Elasticity Control for Multi-Server Queues Under Traffic Surges in Cloud Environments, Proc. 2018 Intl. Conf. Autonomic Computing, Trento, Italy, September 3-7, 2018.
  5. M. Hammad, J. Garcia, and S. Malek, SALMA: Self-Protection of Android Systems from Inter-Component Communication Attacks. 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2018), Montpellier, France, September 2018.
  6. V. Tzeremes and H. Gomaa, A Software Product Line Approach to Designing End User Applications for the Internet of Things, Proc. 13th Intl. Joint Conf. Software Technologies (ICSOFT 2018), Porto, Portugal, July 2018.
  7. M. Shin, H. Gomaa, and D. Pathirage, A Software Product Line Approach for Feature Modeling and Design of Secure Connectors, Proc. 13th Intl. Joint Conf. Software Technologies (ICSOFT 2018), Porto, Portugal, July 2018 (best paper award, invited to submit an extended version to a Springer LNCS publication).
  8. V. Tadakamalla and D.A. Menasce, Analytic Model of Traffic Surges for Multi-Server Queues in Cloud Environments, Proc. Workshop: Cloud Performance and Reliability, IEEE CLOUD 2018 Conf. July 2-7, 2018, San Francisco, CA, USA.
  9. N. Bajunaid and D.A. Menasce, Efficient Modeling and Optimizing of Checkpointing in Concurrent Component-Based Software Systems, J. Systems and Software, Vol. 139, May 2018, pp. 1-13.
  10. Mahmoud Hammad, Joshua Garcia, and Sam Malek. A Large-Scale Empirical Study on the Effects of Code Obfuscations on Android Apps and Anti-Malware Products. Proc. Intl. Conf. Software Engineering (ICSE), May 2018, Gothenburg, Sweden.
  11. J. Porter, D.A. Menasce, H. Gomaa, and E. Albassam, TESS: Automated Performance Evaluation of Self-Healing and Self-Adaptive Distributed Software Systems, 9th ACM/SPEC Intl. Conf. Performance Engineering (ICPE), Berlin, Germany, April 9-13, 2018.
  12. D.A. Menasce and Noor Bajunaid, Performance Evaluation of Heterogeneous Multi-Queues with Job Replication, 2017 Computer Measurement Group Conf. (imPACt 2017), New Orleans, LA, November 6-9, 2017.
  13. Joshua Garcia, Mahmoud Hammad, and Sam Malek. Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware. ACM Transactions on Software Engineering and Methodology (TOSEM), November 2017 (accepted).
  14. W. Connell, D.A. Menasce, M. Albanese, Performance Modeling of Moving Target Defenses, 2017 Moving Target Defense (MTD) Workshop, Oct 30th - Nov 3rd, 2017, Dallas, Texas.
  15. M. Carvalho, D.A. Menasce, and F. Brasileiro, Capacity Planning for IaaS Cloud Providers Offering Multiple Service Classes, Future Generation Computer Systems, Elsevier, 77 (2017), pp. 97–111.
  16. M. Shin, H. Gomaa, and D. Pathirage, Reusable Secure Connectors for Secure Software Architectures, Proc. 4th International Workshop on Model-Driven and Component-Based Software Engineering (ModComp), MODELS Conference, Austin, Texas, September 2017.
  17. H. Gomaa, E. Albassam, and D.A. Menasce, Run-time Software Architectural Models for Adaptation, Recovery and Evolution, Proc. 12th International Workshop on Models@runtime, MODELS Conference, Austin, Texas, September 2017.
  18. E. Albassam, H. Gomaa, and D.A. Menasce, Model-Based Recovery and Adaptation Connectors: Design and Experimentation, ICSOFT 2016, CCIS 743, E. Cabello et al. (Eds.), Lecture Notes in Computer Science, Chapter 6, Springer Intl. Publishing AG, pp. 1–24, 2017, DOI: 10.1007/978-3-319-62569-0_6.
  19. Joshua Garcia, Mahmoud Hammad, Negar Ghorbani, and Sam Malek. Automatic Generation of Inter-Component Communication Exploits for Android Applications. 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2017), Paderborn, Germany, September 2017.
  20. Venkat Tadakamalla and D.A. Menasce, Analysis and Autonomic Elasticity Control for Multi-Server Queues Under Traffic Surges, 2017 IEEE Intl. Conf. Cloud and Autonomic Computing (ICCAC), Tucson, AZ, USA, September 18-22, 2017.
  21. Alireza Sadeghi, Reyhaneh Jabbarvand, and Sam Malek. PATDroid: Permission-Aware GUI Testing of Android. 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2017), Paderborn, Germany, September 2017.
  22. Reyhaneh Jabbarvand and Sam Malek. muDroid: An Energy-Aware Mutation Testing Framework for Android. 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2017), Paderborn, Germany, September 2017.
  23. E. Albassam, H. Gomaa, and D.A. Menasce, Variable Recovery and Adaptation Connectors for Dynamic Software Product Lines, 10th Intl. Workshop on Dynamic Software Product Lines (DSPL 2017), Sevilla, Spain, September 25-29, 2017.
  24. E. Albassam, J. Porter, H. Gomaa, and D.A. Menasce, DARE: A Distributed Adaptation and Failure Recovery Framework for Software Systems, 14th IEEE Intl. Conf. Autonomic Computing (ICAC 2017), Columbus, OH, July 17-21, 2017.
  25. Alireza Sadeghi, Hamid Bagheri, Joshua Garcia, and Sam Malek. A Taxonomy and Qualitative Comparison of Program Analysis Techniques for Security Assessment of Android Software. IEEE Transactions on Software Engineering (TSE), Vol. 43, No. 6, June 2017.
  26. Alireza Sadeghi, Naeem Esfahani, and Sam Malek. Ensuring the Consistency of Adaptation through Inter- and Intra-Component Dependency Analysis. ACM Transactions on Software Engineering and Methodology (TOSEM), Vol. 26, No. 1, May 2017.
  27. Mahmoud Awad and Daniel A. Menasce, Deriving Parameters for Open and Closed QN Models of Operational Systems Through Black Box Optimization, 8th ACM/SPEC Intl. Conf. Performance Engineering, L'Aquila, Italy, April 22-26, 2017.
  28. Mahmoud Hammad, Hamid Bagheri, and Sam Malek. DELDroid: Determination and Enforcement of Least-Privilege Architecture in Android. Intl. Conf. Software Architecture (ICSA), Gothenburg, Sweden, April 2017.
  29. Noor Bajunaid and Daniel A. Menasce, Analytic Models of Checkpointing for Concurrent Component-Based Software Systems, 8th ACM/SPEC Intl. Conf. Performance Engineering, L'Aquila, Italy, April 22-26, 2017.
  30. Hamid Bagheri, Joshua Garcia, Alireza Sadeghi, Sam Malek, Nenad Medvidovic, "Software Architectural Principles in Contemporary Mobile Software: from Conception to Practice," J. Systems and Software 119, 2016, pp. 31-44.
  31. Daniel A. Menasce and Noor Bajunaid, "Modeling and Optimization of Multitiered Server-based Systems," 2016 Computer Measurement Group Conference (imPACT 2016), La Jolla, CA, November 7-10, 2016.
  32. Mahmoud Awad and Daniel A. Menasce, "A Knowledge Base to Support the Automatic Derivation of Performance Models of Operational Systems," 2016 Computer Measurement Group Conference (imPACT 2016), La Jolla, CA, November 7-10, 2016.
  33. Jason Porter, Daniel A. Menasce, and Hassan Gomaa, "DeSARM: A Decentralized Software Architecture Discovery Mechanism for Distributed Systems." 11th International Workshop on Models@run.time (MODELS 2016), Saint-Malo, France, October 4, 2016.
  34. Mahmoud Awad and Daniel A. Menasce, "Performance Model Derivation of Operational Systems Through Log Analysis," IEEE Intl. Symp. Modeling, Analysis and Simulation of Computer Systems and Telecommunication Systems (MASCOTS 2016), Imperial College, London, UK, September 19-21, 2016.
  35. Bradley Schmerl, Jeff Gennari, Alireza Sadeghi, Hamid Bagheri, Sam Malek, Javier Camara, and David Garlan, "Architecture Modeling and Analysis of Security in Android Systems," Proc. 10th European Conf. Software Architecture, Istanbul, Turkey, September 2016.
  36. Sam Malek, Kyle Canavera, and Naeem Esfahani, "Automated Inference Techniques to Assist with the Construction of Self-Adaptive Software," in Managing Trade-offs in Adaptable Software Architectures, eds. Ivan Mistrik, Nour Ali, John Grundy, Rick Kazman, and Bradley Schmerl, Elsevier.
  37. Emad Albassam, Hassan Gomaa, and Daniel A. Menasce, "Model-based Recovery Connectors for Self-Adaptation and Self-Healing," Proc. 11th Intl. Joint Conf. Software Technologies (ICSOFT 2016), July 24-26, 2016, Lisbon, Portugal.
  38. Vasilios Tzeremes and Hassan Gomaa, "A Multi-Platform End User Software Product Line Meta-model for Smart Environments," Proc. 11th Intl. Joint Conf. Software Technologies (ICSOFT 2016), July 24-26, 2016, Lisbon, Portugal.
  39. Hamid Bagheri, Alireza Sadeghi, Reyhaneh Jabbarvand Behrouz, and Sam Malek. "Practical, Formal Synthesis and Autonomic Enforcement of Security Policies for Android." Proc. 46th Annual IEEE/IFIP Intl. Conf. Dependable Systems and Networks (DSN 2016), Toulouse, France, June 2016.
  40. Michael Shin, Hassan Gomaa, and Don Pathirage, "Reusable Secure Connectors for Secure Software Architectures," Proc. 15th Intl. Conf. Software Reuse, Limassol, Cyprus, June 2016.
  41. Nariman Mirzaei, Joshua Garcia, Hamid Bagheri, Alireza Sadeghi, and Sam Malek. "Reducing Combinatorics in GUI Testing of Android Applications," Proc. 38th Intl. Conf. Software Engineering (ICSE 2016), Austin, TX, May 2016.
  42. Eric Yuan and Sam Malek, "Mining Software Component Interactions to Detect Security Threats at the Architectural Level." Proc. 13th Working IEEE/IFIP Conf. Software Architecture (WICSA 2016), Venice, Italy, April 2016.

 

Last updated: April 30, 2019.