Title: | Pertinence of lexical and structural features for plagiarism detection in source code |
Author(s): | RAMIREZ DE LA CRUZ, AARON; RAMIREZ DE LA ROSA, ADRIANA GABRIELA; SANCHEZ SANCHEZ, CHRISTIAN; JIMENEZ SALAZAR, HECTOR; VILLATORO TELLO, ESAU |
Subjects: | Lexical and structural features; Similarity computation; Document representation; Plagiarism detection; Natural language processing |
Date: | 2014 |
Publisher: | Mexico: Instituto Politécnico Nacional, Centro de Investigación en Computación |
Citation: | Research in Computing Science 85 (2014) |
Abstract: | Source code plagiarism can be identified by analyzing several diverse views of a pair of source code files. In this paper we present three representations drawn from lexical and structural views of a given source code file. We attempt to show that different representations provide diverse information that can be useful for identifying plagiarism. In particular, we present representations based on character 3-grams, on the data types in function signatures, and on the identifier names in function signatures. While we used only three representations, more can be added. We conducted our analysis over a collection of 79 source code files written in the C language. Our results show that the n-gram representation is very informative, but also that the representations taken from function signatures are, to some extent, complementary. |
URI: | http://ilitia.cua.uam.mx:8080/jspui/handle/123456789/762 |
Appears in collections: | Articles |
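The abstract names three views of a C source file (character 3-grams, data types in function signatures, identifier names in function signatures) but this record does not say how they are extracted or compared. The following is a minimal, hypothetical sketch of one way to build them, assuming whitespace-collapsed character 3-grams, a deliberately naive regular expression for C function definitions, and cosine similarity over raw counts. None of these choices, nor the names `SIGNATURE_RE`, `signature_views`, `cosine` and `compare`, are taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately naive pattern for C function *definitions*:
# return type, function name, parenthesised parameter list, then "{".
# It ignores macros, function pointers and K&R-style definitions.
SIGNATURE_RE = re.compile(
    r"^\s*([A-Za-z_][\w \t*]*?[ \t*])([A-Za-z_]\w*)\s*\(([^()]*)\)\s*\{",
    re.MULTILINE,
)
# Splits one parameter such as "double *v" or "char s[]" into type and name.
PARAM_RE = re.compile(r"^(.*?)([A-Za-z_]\w*)\s*(?:\[[^\]]*\])?$")
# Control keywords the pattern could otherwise mistake for function names.
NOT_FUNCTIONS = {"if", "for", "while", "switch", "return", "sizeof"}


def normalize_type(text: str) -> str:
    """Collapse whitespace and attach '*' to the type, e.g. 'double *' -> 'double*'."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\s*\*\s*", "*", text)


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Lexical view: bag of overlapping character n-grams (3-grams by default)."""
    text = re.sub(r"\s+", " ", source)  # assumption: layout is not informative
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def signature_views(source: str) -> tuple[Counter, Counter]:
    """Structural views: data types and identifier names from function signatures."""
    types, names = Counter(), Counter()
    for ret, name, params in SIGNATURE_RE.findall(source):
        if name in NOT_FUNCTIONS:
            continue
        types[normalize_type(ret)] += 1
        names[name] += 1
        for param in params.split(","):
            param = param.strip()
            if not param or param == "void":
                continue
            match = PARAM_RE.match(param)
            if match and match.group(1).strip():
                types[normalize_type(match.group(1))] += 1
                names[match.group(2)] += 1
    return types, names


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def compare(src_a: str, src_b: str) -> dict:
    """One similarity score per representation; how to combine them is left open."""
    types_a, names_a = signature_views(src_a)
    types_b, names_b = signature_views(src_b)
    return {
        "char_3grams": cosine(char_ngrams(src_a), char_ngrams(src_b)),
        "signature_types": cosine(types_a, types_b),
        "signature_identifiers": cosine(names_a, names_b),
    }


# Toy pair: the same function with every identifier renamed.
ORIGINAL = """
int add(int a, int b) {
    return a + b;
}
"""
RENAMED = """
int sum(int x, int y) {
    return x + y;
}
"""

if __name__ == "__main__":
    # signature_types is 1.0 and signature_identifiers is 0.0 for this pair.
    print(compare(ORIGINAL, RENAMED))
```

On this toy pair the signature types match exactly while the signature identifiers share nothing, and the 3-gram score falls in between. This illustrates how the views can carry different information: renaming defeats the identifier view but leaves the type view intact.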
File | Description | Size | Format |
---|---|---|---|
Pertinence of Lexical and Structural Features for Plagiarism Detection in Source Code.pdf | | 211.58 kB | Adobe PDF |