CodeOntology is a building block of the Web of Code, an attempt to leverage code in a semantic framework.
Our framework is composed of three actors: an ontology, a parser, and a dataset.
The ontology models the domain of object-oriented programming languages. It is written in OWL 2 and focuses mainly on the Java programming language, but it can easily be reused to represent other languages. The modelling process was guided by common competency questions that arise during the software development process and was inspired by a re-engineering of the Java abstract syntax. The ontology is available on Zenodo under the CC BY 4.0 license.
The parser analyzes Java code and serializes it into RDF triples. It extracts structural information common to all object-oriented programming languages, such as class hierarchies, methods, and constructors. Optionally, it can also serialize all statements and expressions into RDF triples, thereby providing a complete RDF-ization of the source code.
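To make the idea of structural triples concrete, here is a minimal sketch that emits N-Triples lines for a class, its superclass, interfaces, constructors, and methods. It is not the CodeOntology parser: it uses Java reflection on compiled classes rather than source analysis, and the `http://example.org/...` namespaces and the property names (`extends`, `implements`, `hasConstructor`, `hasMethod`) are hypothetical placeholders, not the terms defined by the ontology.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class StructuralTriples {
    // Hypothetical namespaces for illustration; NOT the CodeOntology vocabulary.
    static final String EX = "http://example.org/code/";
    static final String VOCAB = "http://example.org/vocab#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Formats one N-Triples line whose subject, predicate, and object are all IRIs.
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // Describes the structure of a (non-array) class as a list of triples.
    static List<String> describe(Class<?> type) {
        List<String> triples = new ArrayList<>();
        String subject = EX + type.getName();
        triples.add(triple(subject, RDF_TYPE, VOCAB + "Class"));

        // Class hierarchy: direct superclass and directly implemented interfaces.
        Class<?> parent = type.getSuperclass();
        if (parent != null) {
            triples.add(triple(subject, VOCAB + "extends", EX + parent.getName()));
        }
        for (Class<?> iface : type.getInterfaces()) {
            triples.add(triple(subject, VOCAB + "implements", EX + iface.getName()));
        }

        // Constructors, identified here only by their parameter count.
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            triples.add(triple(subject, VOCAB + "hasConstructor",
                    subject + "/init/" + c.getParameterCount()));
        }

        // Declared methods, identified here only by name (overloads collapse).
        for (Method m : type.getDeclaredMethods()) {
            triples.add(triple(subject, VOCAB + "hasMethod", subject + "/" + m.getName()));
        }
        return triples;
    }

    public static void main(String[] args) {
        describe(java.util.ArrayList.class).forEach(System.out::println);
    }
}
```

A real serializer would need unique identifiers for overloaded methods and constructors and would read source code, so that statements and expressions could also be covered.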
CodeOntology currently provides native support for both Maven and Gradle projects. The RDF serialization of a Java project proceeds in three steps: first, the project is analyzed to download all of its dependencies and load them onto the class path; next, an abstract syntax tree of the source code and its dependencies is built; finally, the tree is processed to extract a set of RDF triples.
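Before dependencies can be resolved, a tool has to know which build system a project uses. The sketch below shows one plausible way to decide this, by looking for the conventional build files (`pom.xml` for Maven, `build.gradle` or `build.gradle.kts` for Gradle) in the project root. How the CodeOntology parser actually makes this decision is not described here, so the class and method names are hypothetical and the approach is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class BuildSystemDetector {
    enum BuildSystem { MAVEN, GRADLE, UNKNOWN }

    // Conventional build file names looked up in the project root.
    static final String[] MARKERS = { "pom.xml", "build.gradle", "build.gradle.kts" };

    // Pure decision logic over the set of marker files found in the root.
    // Assumption of this sketch: Maven wins when both kinds of build file exist.
    static BuildSystem classify(Set<String> rootFileNames) {
        if (rootFileNames.contains("pom.xml")) {
            return BuildSystem.MAVEN;
        }
        if (rootFileNames.contains("build.gradle") || rootFileNames.contains("build.gradle.kts")) {
            return BuildSystem.GRADLE;
        }
        return BuildSystem.UNKNOWN;
    }

    // Checks which marker files exist under the given root, then classifies.
    static BuildSystem detect(Path projectRoot) {
        Set<String> present = new HashSet<>();
        for (String name : MARKERS) {
            if (Files.isRegularFile(projectRoot.resolve(name))) {
                present.add(name);
            }
        }
        return classify(present);
    }

    public static void main(String[] args) {
        // Uses the first argument as the project root, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(root.toAbsolutePath() + " -> " + detect(root));
    }
}
```

Keeping the decision in `classify` separate from the file system access makes the rule easy to check in isolation.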
The parser, along with a tutorial on how to use it to extract a knowledge base from any Java project, is available on GitHub.
We are currently applying the parser to analyze repositories from GitHub, retrieved automatically through the GitHub API. We have also applied the parser to extract RDF triples from the OpenJDK 8 source code. The resulting dataset is available for download on Zenodo and can be queried through our remote SPARQL endpoint.
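As an illustration of querying such a dataset, the sketch below builds a SPARQL query and the corresponding HTTP GET request URI, following the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter. The endpoint address `http://example.org/sparql` is a placeholder, and the `ex:` prefix and `ex:hasMethod` property are hypothetical; substitute the actual endpoint and the terms defined by the ontology. The code only constructs the request and does not send it. It assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SparqlRequestSketch {
    // Placeholder endpoint; NOT the project's real SPARQL endpoint.
    static final String ENDPOINT = "http://example.org/sparql";

    // Hypothetical vocabulary: lists up to ten (class, method) pairs.
    static final String QUERY = String.join("\n",
            "PREFIX ex: <http://example.org/vocab#>",
            "SELECT ?class ?method WHERE {",
            "  ?class ex:hasMethod ?method .",
            "} LIMIT 10");

    // Builds the GET request URI, with the query URL-encoded in the "query" parameter.
    static URI buildRequest(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(QUERY);
        System.out.println();
        System.out.println(buildRequest(ENDPOINT, QUERY));
    }
}
```

The resulting URI can be passed to any HTTP client, typically with an `Accept` header such as `application/sparql-results+json` to request JSON results.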