Code review: Groundbreaking work on data mining version histories
23 June 2014
Making changes within a complex software system is often error-prone – even the smallest mistake can endanger the entire system. Ten years ago, a group of computer scientists from Saarbrücken led by Professor Andreas Zeller developed a technique that automatically suggests how to manage changes in software, based on the program’s version history. Their work has now been named the most influential contribution of the last ten years at the “International Conference on Software Engineering”.
In the award-winning paper, the researchers from Saarbrücken examined, for the first time, how software develops over a long time span. This development is documented in version histories, which store the changes made to the software. They applied computing methods to the version histories similar to those used by the US online retailer Amazon. On Amazon, customers are given recommendations such as “Customers who bought this item also bought...” The computer scientists translated this approach to “Programmers who changed these functions also changed the following code blocks”. In this way, their recently developed program “eROSE” can guide developers safely through essential changes to complex software.
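The “also changed” idea can be illustrated with a minimal sketch. It is not the eROSE implementation: the function names, the toy history and the thresholds below are assumptions made for illustration only. Each commit is treated as a set of changed items, and an item is suggested when it was changed together with the edited item often enough (support) and in a large enough share of that item's commits (confidence).

```python
from collections import Counter
from itertools import permutations


def mine_cochanges(transactions):
    """Count how often each item, and each ordered pair of items, was changed."""
    item_count = Counter()
    pair_count = Counter()
    for changed in transactions:
        items = set(changed)
        item_count.update(items)
        # Every ordered pair (a, b) that appears in the same commit.
        pair_count.update(permutations(sorted(items), 2))
    return item_count, pair_count


def suggest(changed_item, item_count, pair_count,
            min_support=2, min_confidence=0.5):
    """Return (item, support, confidence) tuples for likely co-changes."""
    suggestions = []
    for (a, b), support in pair_count.items():
        if a != changed_item or support < min_support:
            continue
        confidence = support / item_count[a]
        if confidence >= min_confidence:
            suggestions.append((b, support, confidence))
    # Highest confidence first, then highest support, then by name.
    return sorted(suggestions, key=lambda s: (-s[2], -s[1], s[0]))


# Hypothetical version history: one set of changed items per commit.
history = [
    {"parse()", "lex()", "README"},
    {"parse()", "lex()"},
    {"parse()", "render()"},
    {"lex()", "parse()"},
    {"render()"},
]

item_count, pair_count = mine_cochanges(history)
result = suggest("parse()", item_count, pair_count)
print(result)
```

In this toy history, "parse()" was changed in four commits and "lex()" was changed alongside it in three of them, so "lex()" is the only suggestion that passes both thresholds.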
Their paper from 2004 immediately attracted attention. For the first time, the alteration history of a program had been used to automatically issue further review suggestions. Their work led to further research into automated version history analysis, a field currently engaging around 150 researchers from all over the world. In combination with error databases, the computer scientists from Saarbrücken could predict possible error sources within the Microsoft operating system Windows Vista. At the time, they were able to trace these issues to insufficient team structures. Today, Microsoft maintains a research department of its own, where staff members are responsible for the systematic review of error and version histories and for deducing suggestions from these archives.
The computer scientists from Saarbrücken were also successful with software companies like IBM, Google and SAP. “Using data mining, we were not only able to predict errors, but also gained insights into software development from a new perspective”, says Andreas Zeller. In retrospect, it is no surprise that the analysis of version archives has become an independent research field within software engineering, Zeller says – this was long before “Big Data” became a catchphrase on the Internet.
In their most recent project, the researchers again make use of the data mining principle: they automatically extract information from huge amounts of data. With their newest software, “Chabada”, they examined 22,521 mini programs (apps) available in the Google Play store. With the aid of this software, they were able to reveal 81% of existing spy apps without having to know their behavioral patterns. Even Google took an interest in this approach: Úlfar Erlingsson, head of Google Security Research, set up a meeting with the researchers shortly after the paper was published, and invited them to visit the Google center last fall to install their automated suggestion program.
The research group led by Professor Zeller is establishing itself not only in the scientific world, but also in industry. In 2013, Zeller co-founded the software company “Testfabrik”. With their “Webmate” software, his former PhD students have developed an automated testing service for complex Web 2.0 applications. The founders estimate that there is a market potential of around 120 million euros a year for these kinds of services in Germany.
Background information about the international award
With the “Most Influential Paper Award”, the program committee of the “International Conference on Software Engineering” annually honors a research paper that has had “the most influence on theory and practice since its publication ten years ago”. This year, the award-winning publication, “Mining Version Histories to Guide Software Changes”, is by a group of computer scientists from Saarbrücken led by Professor Andreas Zeller, who presently holds the Chair for Software Engineering at Saarland University. The research paper was written by Thomas Zimmermann, Peter Weißgerber, Stephan Diehl and Andreas Zeller and was presented at the conference in 2004. This is the first time that a German research group has received the award since it was established 25 years ago.
Background information about Computer Science research at Saarland University
The Department of Computer Science is the center of Computer Science research in Saarbrücken. Seven other internationally renowned research institutes are nearby: the Max Planck Institutes for Informatics and for Software Systems, the German Research Center for Artificial Intelligence (DFKI), the Center for Bioinformatics, the Intel Visual Computing Institute, the Center for IT Security, Privacy and Accountability (CISPA), and the Cluster of Excellence “Multimodal Computing and Interaction”.