Fighting plagiarism: metrics and methods to measure and find similarities among source code of computer programs in VPL

Rodríguez-del-Pino, J.C.; Rubio-Royo, E.; Hernández-Figueroa, Z.
3rd International Conference on Education and New Learning Technologies (EDULEARN). ISBN: 978-84-615-0441-1
July 2011.

In this paper, we present a tool that uses several metrics and methods to find and display the most similar source files in a set. Similarity between files is directly related to the likelihood that they are the outcome of plagiarism. Three metrics are used: two proposed by the authors and a third in common use. Each metric is sensitive to different kinds of systematic changes in source code files, so combining them increases the capability of discovering plagiarism attempts. Searching for the most similar files requires a preprocessing stage consisting of lexical analysis, filtering, and normalization of expressions, which produces a signature for each file. These signatures are then compared using the proposed metrics. The search process is optimized to run with minimal memory and in little time. The process produces a list of the most similar pairs of files, sorted from highest to lowest similarity, together with a list of clusters of the most similar files. Both lists use a colour gradient to express similarity levels in a user-friendly manner, and the numeric results of the applied metrics are shown as well. This interface is designed to make it easier to take appropriate decisions.
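The pipeline described above (lexical analysis, filtering, normalization into a signature, pairwise comparison, ranking, and clustering) can be illustrated with a minimal Python sketch. It is not VPL's implementation: the abstract does not define the authors' two metrics or name the third, so the sketch substitutes a Jaccard similarity over token trigrams as a stand-in metric. The tokenizer for a C-like language, the keyword set, the `ID`/`LIT` placeholders, the trigram size, and the 0.8 clustering threshold are all hypothetical choices, and the naive all-pairs comparison does not reproduce the memory and time optimizations mentioned above.

```python
import re
from itertools import combinations

# Hypothetical lexer for a C-like language (not VPL's real lexical analyser).
TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"
    r"|(?P<string>\"(?:\\.|[^\"\\])*\")"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<op>[^\sA-Za-z0-9_])"
    r"|(?P<ws>\s+)",
    re.DOTALL,
)
# Hypothetical keyword set; keywords are kept, other identifiers are normalized.
KEYWORDS = {"if", "else", "for", "while", "break", "return",
            "int", "float", "char", "void"}

def signature(source):
    """Lexical analysis + filtering + normalization -> list of tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("comment", "ws"):
            continue  # filtering: comments and layout do not affect behaviour
        if kind == "ident":
            # normalization: renamed variables map to the same token
            tokens.append(text if text in KEYWORDS else "ID")
        elif kind in ("number", "string"):
            tokens.append("LIT")  # normalization: changed constants are ignored
        else:
            tokens.append(text)
    return tokens

def ngrams(tokens, n=3):
    """Set of token n-grams (trigrams by default, an assumed parameter)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Stand-in metric: |A & B| / |A | B|; empty inputs score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_pairs(files, n=3):
    """Compare every pair of signatures; highest similarity first."""
    sigs = {name: ngrams(signature(src), n) for name, src in files.items()}
    pairs = [(jaccard(sigs[a], sigs[b]), a, b)
             for a, b in combinations(sorted(sigs), 2)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs

def clusters(pairs, threshold):
    """Group files linked by pairs scoring >= threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for score, a, b in pairs:
        if score >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical submissions: bob.c renames variables, changes a constant
# and adds a comment; carol.c is unrelated.
files = {
    "alice.c": "int main() { int total = 0; for (int i = 0; i < 10; i++) total += i; return total; }",
    "bob.c": "/* mine */ int main() { int sum = 0; for (int k = 0; k < 99; k++) sum += k; return sum; }",
    "carol.c": "int main() { while (1) { if (x) break; } return 0; }",
}
ranked = rank_pairs(files)
for score, a, b in ranked:
    print(f"{score:.2f}  {a}  {b}")
print(clusters(ranked, threshold=0.8))
```

Because identifiers and literals are normalized before comparison, renaming variables or changing constants leaves the signature unchanged, so `alice.c` and `bob.c` obtain the maximum score and form a cluster. A single metric like this one is blind to other systematic changes (for example, reordering statements), which is why the abstract argues for combining metrics with different sensitivities.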

The proposed tool is part of VPL, a Virtual Programming Lab module for Moodle, a popular Learning Management System distributed under the GNU GPL license. The anti-plagiarism tool offers a user-friendly interface that allows files from VPL activities to be compared with one another or against external sources, with online response.