Detection of Heavily Obfuscated Program Code Plagiarism

Plagiarism of copyrighted material occurs not only in natural-language texts but also in software development, when external program code is adopted. In this project we explore how the essentially legal question of how to detect plagiarism can be supported by algorithmic means. In contrast to plagiarism of natural-language texts, program code plagiarism may be heavily obfuscated, because the same program semantics can be expressed in many different ways in code. Obfuscation results either from adapting the plagiarized code to the context into which it is integrated or from deliberate obfuscation steps intended to keep automatic plagiarism detectors from finding it. The problem of automatically detecting program code plagiarism is closely related to code clone detection: detecting heavily obfuscated plagiarism requires a partial solution to the problem of detecting semantic code clones, which is undecidable in general.
Our approach to countering code obfuscation is based on partial normalizations of the program code under consideration, which abstract the code to its "semantic essence" as far as possible. For that purpose we use the program analysis framework Bauhaus and apply common compiler construction and program analysis techniques. We use program dependence graphs (PDGs) to abstract the dataflow from the many possible variations in control flow. Expressions are converted into a normalized form by algebraic transformations. Furthermore, loops and conditional statements are normalized using code optimization techniques (e.g. loop-invariant code motion). Semantic matches are detected by structural subgraph comparisons on the normalized representation of the program code. The original code and the plagiarized code may be split into procedures in different ways; therefore, the subgraphs are compared interprocedurally.
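To give an idea of what normalizing expressions means, here is a minimal sketch in Python. It is not the Bauhaus-based implementation: it uses Python's standard `ast` module as a stand-in, and the function names `normalize`, `canonical` and `flatten` are hypothetical. It covers only one algebraic transformation, namely flattening and reordering the operands of commutative, associative operators, and it assumes exact arithmetic (reordering floating-point operations can change results).

```python
import ast

# Operators whose operands may be flattened and reordered (assumes exact arithmetic).
COMMUTATIVE = (ast.Add, ast.Mult)


def flatten(node, op):
    """Yield the operands of a nested chain of the same commutative operator."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, op):
        yield from flatten(node.left, op)
        yield from flatten(node.right, op)
    else:
        yield node


def canonical(node):
    """Return a canonical string form of a simple arithmetic expression node."""
    if isinstance(node, ast.Expression):
        return canonical(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        op = type(node.op)
        if op in COMMUTATIVE:
            # Sort the operands so that their original order no longer matters.
            operands = sorted(canonical(n) for n in flatten(node, op))
            return f"{op.__name__}({', '.join(operands)})"
        # Non-commutative operators keep their operand order.
        return f"{op.__name__}({canonical(node.left)}, {canonical(node.right)})"
    raise ValueError(f"unsupported node: {type(node).__name__}")


def normalize(source):
    """Parse an expression and return its canonical form."""
    return canonical(ast.parse(source, mode="eval"))


if __name__ == "__main__":
    # Reordered operands map to the same normal form.
    print(normalize("a + b * c") == normalize("c * b + a"))
    # Subtraction is not commutative, so these stay different.
    print(normalize("a - b") == normalize("b - a"))
```

Two expressions that differ only in operand order then compare equal as strings. A real detector would combine many such normalizations and compare graphs rather than strings, as described above.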
If you are interested in a student thesis or a research collaboration, please contact Torsten Görg.