DC Field | Value | Language |
dc.contributor.author | RAMIREZ DE LA CRUZ, AARON | - |
dc.contributor.author | RAMIREZ DE LA ROSA, ADRIANA GABRIELA | - |
dc.contributor.author | SANCHEZ SANCHEZ, CHRISTIAN | - |
dc.contributor.author | JIMENEZ SALAZAR, HECTOR | - |
dc.coverage.spatial | <dc:creator id="info:eu-repo/dai/mx/cvu/239516">ADRIANA GABRIELA RAMIREZ DE LA ROSA</dc:creator> | - |
dc.coverage.spatial | <dc:creator id="info:eu-repo/dai/mx/cvu/170715">CHRISTIAN SANCHEZ SANCHEZ</dc:creator> | - |
dc.coverage.spatial | <dc:creator id="info:eu-repo/dai/mx/cvu/54971">HECTOR JIMENEZ SALAZAR</dc:creator> | - |
dc.coverage.temporal | <dc:subject>info:eu-repo/classification/cti/7</dc:subject> | - |
dc.date.accessioned | 2020-06-22T22:57:17Z | - |
dc.date.available | 2020-06-22T22:57:17Z | - |
dc.date.issued | 2007 | - |
dc.identifier.citation | FIRE 2014 : post-proceedings of the 6th workshop of the Forum for Information Retrieval Evaluation | en_US |
dc.identifier.uri | http://ilitia.cua.uam.mx:8080/jspui/handle/123456789/484 | - |
dc.description.abstract | Source code plagiarism can be identified by analyzing similarities across several diverse aspects of a pair of source code documents. In this paper we present three types of similarity features that account for three aspects of source code documents: i) lexical, ii) structural, and iii) stylistic. For the lexical view, we used a character 3-gram model that ignores the reserved words of the programming language under review. For the structural view, we proposed two similarity metrics based on the function signatures within a source code document, namely the data types and the identifier names in each signature. The stylistic view accounts for several features, such as the number of white spaces, lines of code, and uppercase letters. In total, we proposed 8 similarity features to represent pairs of source code documents so that plagiarized pairs can be identified under a supervised approach. We used a set of more than 32,000 Java and C source code documents in our experiments. The results show that our feature set is pertinent for identifying plagiarism in source code documents that satisfy particular conditions, such as source code that solves difficult problems. | en_US |
dc.description.sponsorship | FIRE 2014 : post-proceedings of the 6th workshop of the Forum for Information Retrieval Evaluation | en_US |
dc.language.iso | English | en_US |
dc.publisher | New York : Association for Computing Machinery | en_US |
dc.relation | 978-1-4503-3755-7 | - |
dc.rights | https://dl.acm.org/doi/abs/10.1145/2824864.2824879 | - |
dc.rights | https://doi.org/10.1145/2824864.2824879 | - |
dc.subject | Source code (Computing) | en_US |
dc.subject | Data structures (Computers) | en_US |
dc.subject | Plagiarism - Technological innovations | en_US |
dc.title | On the importance of lexicon, structure and style for identifying source code plagiarism | en_US |
dc.type | Book chapter | en_US |
Appears in Collections: | Books |
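The lexical view described in the abstract (a character 3-gram model that ignores the language's reserved words) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity over 3-gram counts is an assumption, as are the whitespace normalization and the short, hypothetical `RESERVED` keyword list (a real system would use the full keyword set of Java or C).

```python
import math
import re
from collections import Counter

# Hypothetical, partial reserved-word list (assumption); a real system
# would load the complete keyword set of the language under review.
RESERVED = {
    "int", "char", "float", "double", "void", "return", "if", "else",
    "for", "while", "do", "switch", "case", "break", "continue", "struct",
    "public", "private", "static", "class", "new", "import", "include",
}


def char_trigrams(source: str) -> Counter:
    """Count character 3-grams after removing reserved words.

    Tokens are words/numbers (\\w+) or single punctuation characters;
    reserved words are dropped and the remaining tokens are re-joined
    with single spaces, which normalizes whitespace (an assumption).
    """
    tokens = re.findall(r"\w+|[^\w\s]", source)
    text = " ".join(t for t in tokens if t not in RESERVED)
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_similarity(src_a: str, src_b: str) -> float:
    """Lexical similarity of two source code documents in [0, 1]."""
    return cosine(char_trigrams(src_a), char_trigrams(src_b))
```

For example, `lexical_similarity("int x = 1;", "double x = 1;")` is approximately 1.0, because the two snippets differ only in a reserved word, which the model ignores. In the supervised setting the abstract describes, a score like this would be one feature among the 8 used to represent a pair of source code documents.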