Pertinence of lexical and structural features for plagiarism detection in source code


Title:	Pertinence of lexical and structural features for plagiarism detection in source code
Author(s):	RAMIREZ DE LA CRUZ, AARON; RAMIREZ DE LA ROSA, ADRIANA GABRIELA; SANCHEZ SANCHEZ, CHRISTIAN; JIMENEZ SALAZAR, HECTOR; VILLATORO TELLO, ESAU
Subjects:	Lexical and structural features; Similarity computation; Document representation; Plagiarism detection; Natural language processing
Date:	2014
Publisher:	Mexico: Instituto Politécnico Nacional, Centro de Investigación en Computación
Citation:	Research in Computing Science 85 (2014)
Abstract:	Source code plagiarism can be identified by analyzing several diverse views of a pair of source codes. In this paper we present three representations drawn from lexical and structural views of a given source code. We attempt to show that different representations provide diverse information that can be useful for identifying plagiarism. In particular, we present representations based on character 3-grams, the data types in function signatures, and the identifier names in function signatures. Although we used only three representations, more can be added. We conducted our analysis over a collection of 79 source codes written in the C language. Our results show that the n-gram representation is very informative, but also that the representations taken from function signatures are, to some extent, complementary.
URI:	http://ilitia.cua.uam.mx:8080/jspui/handle/123456789/762
Appears in collections:	Articles
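
The character 3-gram representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is computed, so cosine similarity over 3-gram counts is an assumption here, as are the whitespace normalization step, the function names char_ngrams and cosine_similarity, and the three toy C fragments.

```python
from collections import Counter
from math import sqrt


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a piece of source code."""
    # Collapsing runs of whitespace is an assumption, not taken from the paper.
    text = " ".join(source.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty input shares nothing with anything
    return dot / (norm_a * norm_b)


# Hypothetical C fragments: the second only renames identifiers of the first.
original = "int add(int a, int b) { return a + b; }"
renamed = "int sum(int x, int y) { return x + y; }"
unrelated = 'void greet(void) { puts("hello"); }'

sim_renamed = cosine_similarity(char_ngrams(original), char_ngrams(renamed))
sim_unrelated = cosine_similarity(char_ngrams(original), char_ngrams(unrelated))

print(f"original vs renamed:   {sim_renamed:.3f}")
print(f"original vs unrelated: {sim_unrelated:.3f}")
```

In this toy case the renamed fragment scores higher than the unrelated one because most 3-grams covering keywords, types and punctuation survive identifier renaming. A signature-based representation, as described in the abstract, would compare the sequence of parameter types or parameter names instead of raw characters.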

Files in this item:

File	Description	Size	Format
Pertinence of Lexical and Structural Features for Plagiarism Detection in Source Code.pdf		211.58 kB	Adobe PDF