31 December 2014

A Round-Trip to RDF

This is my first blog post, and I would like to start by sharing an application I developed as part of my studies at École Centrale Paris. It is called RDF Usine, and it converts plain text files (e.g. CSV) into RDF.
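
To make the idea concrete, here is a minimal Java sketch of how table rows could become RDF triples in N-Triples syntax. It is not RDF Usine's source code. It mirrors options described later in this post (a subject text prefix, auto-numeric subjects, a free-text entity class), but the namespace `http://example.org/`, the method names and the use of header names as predicate local names are all hypothetical assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not RDF Usine's source): tabular rows to N-Triples lines. */
public class RowToNTriples {

    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    /** Wraps a value in quotes, escaping characters that N-Triples literals may not contain raw. */
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n")
                           .replace("\r", "\\r") + "\"";
    }

    /**
     * @param base          assumed namespace for subjects, the class and the predicates
     * @param subjectPrefix text prepended to the auto-numeric subject id
     * @param entityClass   class asserted for every row through rdf:type
     * @param headers       column names, assumed to be valid IRI local names
     * @param rows          data rows, one value per header
     */
    static List<String> convert(String base, String subjectPrefix, String entityClass,
                                List<String> headers, List<List<String>> rows) {
        List<String> triples = new ArrayList<>();
        int id = 1; // auto-numeric subjects: 1, 2, 3, ...
        for (List<String> row : rows) {
            String subject = "<" + base + subjectPrefix + id++ + ">";
            triples.add(subject + " " + RDF_TYPE + " <" + base + entityClass + "> .");
            for (int i = 0; i < headers.size() && i < row.size(); i++) {
                triples.add(subject + " <" + base + headers.get(i) + "> " + literal(row.get(i)) + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("name", "city");
        List<List<String>> rows = List.of(List.of("Ana", "Paris"), List.of("Luc", "Lyon"));
        convert("http://example.org/", "person", "Person", headers, rows)
                .forEach(System.out::println);
    }
}
```

Each row yields one rdf:type triple plus one triple per column; a real converter would also need to turn arbitrary header text into valid IRIs.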

I have published not only the executable library but also the complete source code.

Without further ado, let's see what the application looks like:

RDF Usine


Application Interface


As the screenshot above shows, the interface is made up of three main components, briefly described below:
  • The "Configuration Pane" is on the left. It is used to define the parameters specific to each file format.
  • The "Input File Preview Pane" is on the top right. It shows a raw preview of the input file.
  • The "Output RDF Files Preview Pane" is on the bottom right. It shows a tabular representation of the input file (in the "Table View" tab) and a preview of what the RDF output files will look like (in the "Turtle" and "N-TRIPLE" tabs).
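
The "Turtle" and "N-TRIPLE" tabs show the same triples in two syntaxes: N-Triples writes every triple in full on its own line, while Turtle can declare prefixes and group several predicates of one subject with ";". The following minimal Java sketch (hypothetical, not RDF Usine's code) prints one subject's statements both ways. It assumes a made-up `ex:` namespace, objects that are already-escaped literals, and at least one property per subject.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: one subject's statements in N-Triples and in Turtle. */
public class TurtleVsNTriples {

    static final String NS = "http://example.org/"; // assumed namespace, bound to "ex:" in Turtle

    /** N-Triples: one complete triple per line, with full IRIs everywhere. */
    static String toNTriples(String subject, Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        props.forEach((p, o) -> sb.append("<").append(NS).append(subject).append("> <")
                .append(NS).append(p).append("> ").append(o).append(" .\n"));
        return sb.toString();
    }

    /** Turtle: a prefix declaration, the subject written once, predicates separated by ';'. */
    static String toTurtle(String subject, Map<String, String> props) {
        // Assumes props is non-empty; otherwise the output would not be valid Turtle.
        StringJoiner body = new StringJoiner(" ;\n    ", "ex:" + subject + " ", " .\n");
        props.forEach((p, o) -> body.add("ex:" + p + " " + o));
        return "@prefix ex: <" + NS + "> .\n\n" + body;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("name", "\"Ana\"");
        props.put("city", "\"Paris\"");
        System.out.print(toNTriples("item1", props));
        System.out.print(toTurtle("item1", props));
    }
}
```

Turtle is more compact and easier to read, while N-Triples is trivial to process line by line, which is why both are useful as export formats.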

Main Features


The key features of RDF Usine are the following:
  • The application is multilingual! It supports English, French and Spanish. You can switch languages from the "Language" menu:

    Language Menu

    If you speak another language and would like to contribute a translation, your help will be much appreciated! Please contact me for more details.
  • RDF Usine was developed with JavaFX, so it runs not only on Windows but also on Linux, Mac OS X and other operating systems that support Java.
  • RDF Usine also "speaks" two RDF formats. Information can be exported as:
    • Turtle
    • N-Triples
  • The range of accepted file encodings is quite broad, including UTF-8, ISO-8859 and US-ASCII, among others.
  • Multi-line fields are supported. For example:

    Multi-line fields example

  • You can preview the complete input and output files or, depending on your needs, restrict the preview to the first 10, 100 or 1000 rows (advisable for big files). Use the "Preview" menu to do so:

    Preview Menu

  • Configurations can be saved and loaded later to be applied again. This includes all settings specified in the "General settings", "Fields" and "Prefixes" tabs:

    Configuration Menu

  • Supported field delimiters:
    • Semicolon: ;
    • Pipe: |
    • Comma: ,
    • Space
    • Tab
    • Dollar Sign: $

    Field delimiters

  • Escape characters may also be configured:

    Escape char.

  • Headers can be:
    • Read from a user-defined row number of the file.
    • Defined manually in the "2) Fields" tab.

    Headers

  • Entity classes can be defined as free text.

    Entity type

  • Do you need to specify where processing of the input files should start and end?
    That is easy with the "Boundaries" options!

    Boundaries

  • Subjects can optionally have a text prefix and they may be:
    • Read from a user-defined column number of the input file.
    • Defined as auto numeric values.

    Subjects

  • Need to work on multiple files or entire folders?
    No problem! Use the File(s)... button to select one or more input files, or the Folder... button to select a directory; input files are then searched for recursively in all of its subdirectories.
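
The delimiter, escape character and multi-line field options listed above are the kind of settings a delimited-text reader has to honour. The post does not describe how RDF Usine parses its input, so the following is only a minimal, hypothetical Java sketch: a quoted field may contain delimiters and line breaks, and the escape character makes the next character literal. The class and method names are made up, and the sketch assumes the escape character differs from the quote character (it does not handle the CSV convention of doubled quotes).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a configurable delimited-text parser (not RDF Usine's code). */
public class DelimitedParser {

    /**
     * Splits text into records and fields.
     * @param delimiter field separator, e.g. ';', '|', ',', ' ', '\t' or '$'
     * @param quote     encloses fields that may contain delimiters or line breaks
     * @param escape    makes the following character literal; assumed different from quote
     */
    static List<List<String>> parse(String text, char delimiter, char quote, char escape) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == escape && i + 1 < text.length()) {
                field.append(text.charAt(++i));       // take the next character literally
            } else if (c == quote) {
                inQuotes = !inQuotes;                 // enter or leave a quoted section
            } else if (c == delimiter && !inQuotes) {
                record.add(field.toString());         // end of field
                field.setLength(0);
            } else if ((c == '\n' || c == '\r') && !inQuotes) {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                              // treat CRLF as a single line break
                }
                record.add(field.toString());         // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);                      // ordinary char, or a line break inside quotes
            }
        }
        if (field.length() > 0 || !record.isEmpty()) { // last record without a trailing newline
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "name;comment\nAna;\"first line\nsecond line\"\nLuc;semi\\;colon\n";
        for (List<String> r : parse(input, ';', '"', '\\')) {
            System.out.println(r);
        }
    }
}
```

Because line breaks only end a record outside quotes, the second record's comment keeps both of its lines, which is what "multi-line fields" requires.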

Preview and Export

  • When working with multiple files, use the "Preview" combo box on the "General Settings" tab to choose which file is shown in the "Input File Preview Pane" and the "Output RDF Files Preview Pane":

    File selection

  • The application lets you experiment with different parameters and see the results interactively.

    In addition to the "Table View", you can see the RDF previews in Turtle and N-Triples format. Both text areas are read-only, but you can select text and copy it to the clipboard through the context menu (right-click):

    Example of Turtle Output
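
When a whole folder is selected, the input files found in it and its subdirectories become candidates for the "Preview" combo box, and the preview can be restricted to the first rows of a file. The two steps can be sketched in Java with `java.nio.file`. This is a hypothetical illustration, not the application's code: the `.csv` extension filter and the method names are assumptions, and one preview row is taken to be one physical line, which does not hold for multi-line fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: collect input files recursively and preview their first lines. */
public class InputFiles {

    /** Walks root and all its subdirectories, keeping regular files whose name ends with extension (lowercase). */
    static List<Path> findInputFiles(Path root, String extension) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().toLowerCase().endsWith(extension))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** Reads at most maxLines lines using the chosen encoding (e.g. UTF-8, ISO-8859-1, US-ASCII). */
    static List<String> preview(Path file, Charset charset, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while (lines.size() < maxLines && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : findInputFiles(root, ".csv")) {
            System.out.println(file + " -> " + preview(file, StandardCharsets.UTF_8, 10));
        }
    }
}
```

Reading only the first lines through a buffered reader avoids loading a big file entirely, which is the point of limiting the preview to 10, 100 or 1000 rows.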