On the importance of lexicon, structure and style for identifying source code plagiarism

RAMIREZ DE LA CRUZ, AARON; RAMIREZ DE LA ROSA, ADRIANA GABRIELA; SANCHEZ SANCHEZ, CHRISTIAN; JIMENEZ SALAZAR, HECTOR


DC Field

Value

Language

dc.contributor.author

RAMIREZ DE LA CRUZ, AARON

dc.contributor.author

RAMIREZ DE LA ROSA, ADRIANA GABRIELA

dc.contributor.author

SANCHEZ SANCHEZ, CHRISTIAN

dc.contributor.author

JIMENEZ SALAZAR, HECTOR

dc.coverage.spatial

<dc:creator id="info:eu-repo/dai/mx/cvu/239516">ADRIANA GABRIELA RAMIREZ DE LA ROSA</dc:creator>

dc.coverage.spatial

<dc:creator id="info:eu-repo/dai/mx/cvu/170715">CHRISTIAN SANCHEZ SANCHEZ</dc:creator>

dc.coverage.spatial

<dc:creator id="info:eu-repo/dai/mx/cvu/54971">HECTOR JIMENEZ SALAZAR</dc:creator>

dc.coverage.temporal

<dc:subject>info:eu-repo/classification/cti/7</dc:subject>

dc.date.accessioned

2020-06-22T22:57:17Z

dc.date.available

2020-06-22T22:57:17Z

dc.date.issued

2007

dc.identifier.citation

FIRE 2014 : post-proceedings of the 6th workshop of the Forum for Information Retrieval Evaluation

en_US

dc.identifier.uri

http://ilitia.cua.uam.mx:8080/jspui/handle/123456789/484

dc.description.abstract

Source code plagiarism can be identified by analyzing the similarity of several diverse aspects of a pair of source code documents. In this paper we present three types of similarity features that account for three aspects of source code documents: i) lexical, ii) structural, and iii) stylistic. For the lexical view, we used a character 3-gram model that ignores the reserved words of the programming language under review. For the structural view, we proposed two similarity metrics based on the function signatures within a source code document, namely the data types and the identifier names in each signature. The stylistic view accounts for several style features, such as the number of white spaces, lines of code, and uppercase letters. Accordingly, we proposed 8 similarity features to represent pairs of source code documents so that plagiarized pairs can be identified under a supervised approach. We used a set of more than 32,000 source code documents written in Java and C to perform our experiments. The results show the pertinence of our set of features for identifying plagiarism in source code documents that satisfy particular conditions, such as source code that solves difficult problems.

en_US
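The following Python sketch illustrates how the three kinds of similarity described in the abstract might be computed for one pair of source files. It is not the authors' implementation: the reserved-word list, the regular expression used to find function signatures, cosine similarity over 3-gram frequencies, multiset Jaccard similarity for signature types and names, and the min/max ratio for stylistic counts are all assumptions made for illustration, and every helper name (`strip_reserved`, `signatures`, `pair_features`, etc.) is hypothetical. It yields six features rather than the eight reported in the paper.

```python
import math
import re
from collections import Counter

# Illustrative subset of Java/C reserved words (an assumption; not a complete list).
RESERVED = {
    "abstract", "boolean", "break", "case", "catch", "char", "class", "const",
    "continue", "do", "double", "else", "enum", "extends", "final", "float",
    "for", "if", "import", "int", "long", "new", "package", "private",
    "protected", "public", "return", "short", "static", "struct", "switch",
    "this", "throw", "try", "void", "while",
}

# Rough pattern for "<type> <name>(<params>) {" definitions; misses many real cases.
SIGNATURE = re.compile(r"\b([A-Za-z_][\w<>\[\]]*)\s+([A-Za-z_]\w*)\s*\(([^()]*)\)\s*\{")


def strip_reserved(code: str) -> str:
    """Remove whole-word reserved words so they do not dominate the n-grams."""
    return re.sub(r"\b[A-Za-z_]\w*\b",
                  lambda m: "" if m.group(0) in RESERVED else m.group(0), code)


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Frequency of every overlapping character n-gram (3-grams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexical_similarity(x: str, y: str) -> float:
    return cosine(char_ngrams(strip_reserved(x)), char_ngrams(strip_reserved(y)))


def signatures(code: str) -> tuple[list[str], list[str]]:
    """Collect data types and identifier names from function signatures."""
    types, names = [], []
    for ret, name, params in SIGNATURE.findall(code):
        # Skip control-flow constructs such as "else if (x) {".
        if name in RESERVED or ret in {"else", "new", "return"}:
            continue
        types.append(ret)
        names.append(name)
        for param in params.split(","):
            tokens = param.replace("*", " * ").split()
            if len(tokens) < 2:  # nothing to split, e.g. "" or "void"
                continue
            types.append(" ".join(tokens[:-1]))
            names.append(tokens[-1])
    return types, names


def jaccard(a: list[str], b: list[str]) -> float:
    """Multiset Jaccard similarity: shared counts over combined counts."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 0.0


def style_counts(code: str) -> list[int]:
    return [
        sum(ch.isspace() for ch in code),  # white-space characters
        len(code.splitlines()),            # lines of code
        sum(ch.isupper() for ch in code),  # uppercase letters
    ]


def ratio(a: int, b: int) -> float:
    """1.0 for equal counts, smaller as counts diverge (assumed formula)."""
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def pair_features(x: str, y: str) -> list[float]:
    """Six similarity features for one pair of source files."""
    tx, nx = signatures(x)
    ty, ny = signatures(y)
    stylistic = [ratio(p, q) for p, q in zip(style_counts(x), style_counts(y))]
    return [lexical_similarity(x, y), jaccard(tx, ty), jaccard(nx, ny)] + stylistic


if __name__ == "__main__":
    original = "int add(int a, int b) {\n    return a + b;\n}\n"
    suspect = "int sum(int x, int y) {\n  return x + y;\n}\n"
    print([round(f, 3) for f in pair_features(original, suspect)])
```

Each call produces one feature vector per pair of documents. Under the supervised approach mentioned in the abstract, a set of such vectors labeled as plagiarized or not plagiarized would be used to train a classifier, which this sketch leaves out.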

dc.description.sponsorship

FIRE 2014 : post-proceedings of the 6th workshop of the Forum for Information Retrieval Evaluation

en_US

dc.language.iso

English

en_US

dc.publisher

New York : Association for Computing Machinery

en_US

dc.relation

978-1-4503-3755-7

dc.rights

https://dl.acm.org/doi/abs/10.1145/2824864.2824879

dc.rights

https://doi.org/10.1145/2824864.2824879

dc.subject

Source code (Computing)

en_US

dc.subject

Data structures (Computers)

en_US

dc.subject

Plagiarism - Technological innovations

en_US

dc.title

On the importance of lexicon, structure and style for identifying source code plagiarism

en_US

dc.type

Book chapter

en_US

Appears in collections:

Books

File

Description

Size

Format

On the importance.pdf

331.43 kB

Adobe PDF
