Title: On the importance of lexicon, structure and style for identifying source code plagiarism
Author(s): RAMIREZ DE LA CRUZ, AARON
RAMIREZ DE LA ROSA, ADRIANA GABRIELA
SANCHEZ SANCHEZ, CHRISTIAN
JIMENEZ SALAZAR, HECTOR
Subjects: Source code (Computing)
Data structures (Computers)
Plagiarism - Technological innovations
Date: 2007
Publisher: New York: Association for Computing Machinery
Citation: FIRE 2014 : post-proceedings of the 6th workshop of the Forum for Information Retrieval Evaluation
Abstract: Source code plagiarism can be identified by analyzing the similarity of several diverse aspects of a pair of source code documents. In this paper we present three types of similarity features that account for three aspects of source code documents: i) lexical, ii) structural, and iii) stylistic. For the lexical view, we used a character 3-gram model that ignores the reserved words of the programming language under review. For the structural view, we proposed two similarity metrics that take into account the function signatures within a source code document, namely the data types and the identifier names in each function signature. The third view accounts for several stylistic features, such as the number of white spaces, lines of code, uppercase letters, etc. Accordingly, we proposed 8 similarity features to represent pairs of source code documents in order to identify plagiarized pairs under a supervised approach. We used a set of more than 32000 source code documents in Java and C to perform our experiments. The results show the pertinence of our set of features for identifying plagiarism in source code documents that satisfy particular conditions, such as source code that solves difficult problems.
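As a rough illustration of the lexical and stylistic views described in the abstract, the following Python sketch compares two source strings by their character 3-grams after dropping reserved words, and counts a few stylistic features. It is not the authors' implementation: the abbreviated `RESERVED` list, the choice of Jaccard overlap as the similarity measure, and all function names are assumptions made here for illustration only.

```python
import re

# Hypothetical, abbreviated reserved-word list (assumption); a real list
# would cover every keyword of the language under review.
RESERVED = {"int", "return", "for", "while", "if", "else", "void"}


def strip_reserved(source):
    """Remove reserved words that appear as whole tokens."""
    def drop(match):
        word = match.group(0)
        return "" if word in RESERVED else word
    return re.sub(r"\b\w+\b", drop, source)


def char_trigrams(text):
    """Return the set of character 3-grams after collapsing whitespace."""
    text = " ".join(text.split())
    return {text[i:i + 3] for i in range(len(text) - 2)}


def lexical_similarity(a, b):
    """Jaccard overlap of character 3-gram sets (assumed measure)."""
    grams_a = char_trigrams(strip_reserved(a))
    grams_b = char_trigrams(strip_reserved(b))
    if not grams_a and not grams_b:
        return 1.0  # two empty documents are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def style_counts(source):
    """Count a few of the stylistic features named in the abstract."""
    return {
        "white_spaces": sum(c.isspace() for c in source),
        "lines": len(source.splitlines()),
        "upper_letters": sum(c.isupper() for c in source),
    }
```

In a supervised setting such as the one the abstract describes, values like these would be computed for each pair of documents and passed to a classifier as features; how the paper combines them is not stated in this record.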
URI: http://ilitia.cua.uam.mx:8080/jspui/handle/123456789/484
Appears in collections: Books

Files in this item:
File: On the importance.pdf
Size: 331.43 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.