Leveraging Unstructured Information to Support Software Engineering Tasks | George Mason Department of Computer Science

When: Thursday, March 24, 2022 from 11:00 AM to 12:00 PM
Speaker: Andrian Marcus
Location: Zoom only

Abstract

During software evolution, a collection of related artifacts is created, containing different types of data: structured (e.g., analysis data), semi-structured (e.g., source code), and unstructured (e.g., natural language text). The unstructured information is embedded in documentation, source code, and various stakeholder communications. It helps developers understand much of the “why” and “what” of a software system, just as the source code helps them understand the “how”. Software artifacts written in natural language (e.g., requirements, design documents, user manuals, scenarios, bug reports, and developers’ messages), together with the comments and identifiers in the source code, encode the domain of the software and the developers’ knowledge about the system, and capture design decisions, change requests, developer information, etc. Given the large amount of unstructured information in software, tools are needed to store, retrieve, and analyze it before it is delivered to its users (i.e., developers and other stakeholders).

This talk will summarize more than a decade of research conducted with my students and collaborators, in which we leveraged text retrieval and natural language processing techniques to propose solutions to software engineering problems such as feature and bug localization, traceability link recovery, bug triage, defect prediction, change impact analysis, and refactoring.

The second part of the talk will focus on a new take on software traceability: identifying how and where in the source code certain business rules, such as data constraints, are implemented. This ongoing work includes empirical studies that inform the design of tools for identifying data constraint implementations using static code analysis and text analysis.

The talk will conclude by presenting emerging and future research on identifying the individual computational elements of algorithms and algorithm families, which help in understanding these algorithms and distinguishing them from one another.

Bio

A former Fulbright Scholar with education experience on two continents, ranging from Computer Science to European Studies, Andrian Marcus is a Professor in the Department of Computer Science at The University of Texas at Dallas. His research interests are in software engineering, with a focus on program understanding and software evolution, and he is a recipient of the NSF CAREER award.

Professionally, he is most proud of his outstanding current and past graduate students and finds mentoring to be the most rewarding part of an academic career. Over time, their joint research has earned six Best/Distinguished Paper Awards and six Most Influential Paper Awards at software engineering conferences.

His professional service includes serving on the Steering Committees of the IEEE International Conference on Software Maintenance and Evolution (ICSME) and of the IEEE Working Conference on Software Visualization (VISSOFT). He was the General Chair and the Program Co-chair of ICSME in 2011 and 2010, respectively, and Program Co-chair for other conferences (ICPC'09, VISSOFT'13, SANER'17). Currently, he is serving as Software Evolution Area Chair and as New Faculty Symposium Co-chair for ICSE'23. He also serves on the editorial board of the Journal of Software: Evolution and Process, and he served on the editorial boards of the IEEE Transactions on Software Engineering (2014-2018) and the Empirical Software Engineering Journal (2010-2021).
