MappingTool

From DBpedia Mappings
Revision as of 17:00, 3 November 2010 by Maxjakob (talk | contribs)

The DBpedia Mapping Tool helps users create template mappings within a graphical user interface. It is available here.

Working with template mappings

Create mapping

Creating template mappings for DBpedia with the mapping tool requires three main steps.

  1. Searching and loading a Wikipedia template,
  2. creating or editing a corresponding mapping and finally
  3. exporting and saving the created mapping to DBpedia.

To start the template mapping process, a user first has to load a Wikipedia template. A text field in the top menu supports this by searching for available Wikipedia templates with the aid of auto-completion. Once the user has found the requested template, it can be pulled from Wikipedia by clicking the load mapping button or pressing the ENTER key.

Subsequently, an HTTP request is made to the tool's back end, using the browser's built-in XMLHttpRequest object, to load the Wikipedia template and extract its properties. In the meantime, the back end also queries the DBpedia MediaWiki API for a potentially existing mapping, which is then loaded as well. After a few moments the user will see a browser screen similar to the one shown in figure 1, with an expanded representation of the template mapping if such a mapping exists.
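The property extraction step can be illustrated with a minimal Python sketch. It assumes that the template properties are the parameter names that appear in the template's wikitext as {{{name}}} or {{{name|default}}}; the function name extract_template_properties and the sample text are made up for illustration and do not describe the tool's actual back end.

```python
import re

# Template parameters appear in wikitext as {{{name}}} or {{{name|default}}}.
PARAM_PATTERN = re.compile(r"\{\{\{\s*([^{}|]+?)\s*(?:\||\}\}\})")


def extract_template_properties(wikitext):
    """Return the distinct parameter names in order of first appearance."""
    seen = []
    for match in PARAM_PATTERN.finditer(wikitext):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical wikitext fragment, not taken from a real template.
sample = "{{{name|}}} founded {{{foundation}}} as {{{type|{{{company_type|}}}}}}"
print(extract_template_properties(sample))
# → ['name', 'foundation', 'type', 'company_type']
```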

Figure 1: View for creating a template mapping for the unmapped Wikipedia template Infobox:train

In the upper left, a list of the extracted Wikipedia template properties is shown, organized in a tree panel widget. Below it is a widget with the available template mapping elements, which are defined in the DBpedia Mapping Language Specification. The nodes in this template widget support drag-and-drop in combination with the template mappings widget in the center of the screen. For a Wikipedia template that has not been mapped yet, the center widget is almost empty, showing only a root node with the title of the new template mapping. This title will later be the title of the MediaWiki page stored in DBpedia.

The widgets on the right side organize the DBpedia ontology data and help the user find and select ontology classes and properties. Users can browse the ontology classes in the widget on the bottom right to find a proper class representing the data in the Wikipedia template. For faster access to the ontology, the widget also provides a simple search box where a user can search classes by simple patterns. Having found a matching ontology class, the user has to load its properties: clicking on a class node loads the corresponding properties into the widget in the upper right, above the ontology class widget.

After the properties have been loaded from the back end, a user can browse through the property widget (see figure 2), which provides useful information on each property, such as the expected type of value used in the Wikipedia template or the domain of the property. Whenever an ontology class is a subclass of another ontology class, it inherits the properties of its parent class; this is why the provided properties can be defined on different domains.

Figure 2: Property widget, showing some available properties of the mean of transportation ontology class

To assemble a template mapping, the user composes the available nodes from the widgets by dragging and dropping them onto the center mapping widget, starting with a template from the template widget, e.g. TemplateMapping or ConditionalMapping. Users should keep in mind that most of the preconfigured templates cannot be dropped onto the template mapping root node, because only some of the available templates are valid starting nodes and anything else would result in an invalid template mapping. The underlying mapping language specification defines which nodes can be nested and what type a possible child node is supposed to be. In most cases the color of a node shows the user where it can be dropped.

Edit mapping

Editing a template mapping is similar to creating one. First, a user requests a template from Wikipedia with the help of the auto-completion field in the top menu to retrieve the template-specific properties. While loading the information from Wikipedia, the back end sends a request to the DBpedia MediaWiki in parallel to retrieve a corresponding template mapping. If it finds a matching template mapping on DBpedia, the tool parses the structure of the mapping (see listing below) and displays it in the template mapping widget in the center of the tool (see figure 3).

{{TemplateMapping
| mapToClass = Company
| mappings =
    {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
    {{PropertyMapping | templateProperty = type | ontologyProperty = type }}    
    {{PropertyMapping | templateProperty = genre | ontologyProperty = genre }}
    {{PropertyMapping | templateProperty = fate | ontologyProperty = fate}}
    {{PropertyMapping | templateProperty = predecessor | ontologyProperty = predecessor }}
    {{PropertyMapping | templateProperty = successor | ontologyProperty = successor}}
<!-- mapped fundation into 3 different ontologyProperties  -->
[..]

Listing: Excerpt from the Infobox company template mapping on DBpedia
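How such mapping text can be turned back into structured data is sketched below in Python. This is only an illustration under simplifying assumptions: it handles flat PropertyMapping elements with a regular expression and ignores nested templates, which the real tool has to support. The function name parse_property_mappings and the shortened excerpt are assumptions made for this example.

```python
import re

# Matches one flat element such as {{PropertyMapping | a = b | c = d }}.
PROPERTY_MAPPING = re.compile(r"\{\{\s*PropertyMapping\s*\|([^{}]*)\}\}")


def parse_property_mappings(mapping_text):
    """Return one dict of key/value pairs per flat PropertyMapping element."""
    result = []
    for match in PROPERTY_MAPPING.finditer(mapping_text):
        pairs = {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:  # skip fragments without "key = value"
                pairs[key.strip()] = value.strip()
        result.append(pairs)
    return result


# Shortened version of the listing above, closed with }} for the example.
excerpt = """{{TemplateMapping
| mapToClass = Company
| mappings =
    {{PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
    {{PropertyMapping | templateProperty = fate | ontologyProperty = fate}}
}}"""

for entry in parse_property_mappings(excerpt):
    print(entry["templateProperty"], "->", entry["ontologyProperty"])
```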

Composing and extending a template mapping can be done with the drag-and-drop operations described in the previous section.

Figure 3: Template mapping widget showing the structure of the Infobox company mapping template

Users do not necessarily need to drag and drop a node when they want to edit its value. The alternative is to select the edit node menu item in the node's context menu, which is useful for fixing a typo or simply because it is faster to change the mapping this way. A more common scenario for this feature is editing the value of the operator property, which is supported by some of the templates defined in the mapping language specification, such as Condition or CalculateMapping.

The operator property can have values like add, equals, isSet, contains or otherwise, which indicate operations on values. The operator property cannot be filled by drag-and-drop because it is independent of the ontology and Wikipedia. In this case the user has to edit the value of the node manually by opening the context menu of the node and selecting the menu item edit node (see figure 4). For a freshly created operator node, the user can rest the mouse on the node to see a description of the possible values, if these are defined in the DBpedia mapping language grammar.

Figure 4: Form to manually edit the value of a node

For template nodes in the template mapping widget, the context menu also offers an entry to add new template properties defined in the DBpedia grammar. The offered properties are contextual and differ by the type of the node. When a user chooses to add a property from the context menu, the corresponding property is cloned from the templates widget located in the bottom left corner of the tool. The appended property supports all implemented functionality, such as validation or suggestive drag-and-drop, where only matching nodes accept a drop.

Import mapping from plain text

Instead of creating or editing a mapping graphically and exporting it to plain text, users can also import plain text mapping code. This is helpful when users want to maintain existing mappings with the features provided by the tool.

It also enables the user to merge pieces of code between mappings. Consider a user who starts to create a mapping for the template Infobox:Prime minister, which has almost the same properties as the template Infobox:officeholder because PrimeMinister is a subclass of OfficeHolder. The user can copy the relevant pieces of code from the mapping Mapping:Infobox officeholder and merge them into the mapping Mapping:Infobox Prime minister. After merging the relevant parts, the user starts from a predefined mapping for the template Infobox:Prime minister and can then customize it by adding new properties or changing and deleting invalid ones.

To use the import feature, the user selects the tab Reverse in the widget in the center of the tool. A plain text mapping can then be entered in the text box in the center. When the user clicks the button apply, the tool processes the given mapping code. If the code is not a valid mapping, the tool shows a box with an error message and further information on the error. If no error message comes up, the tool loads the mapping tree in the Mapper tab, where the user can edit the mapping in the GUI.

Save mapping to DBpedia

Once the user has finished the mapping process, the tool provides a method to save the generated mapping to the DBpedia MediaWiki. This requires the following two basic steps:

  • Generating a plain text representation of the mapping and
  • sending the generated text to DBpedia.

To generate the textual representation of the mapping, the user changes the view by selecting the tab Output instead of the tab Mapper (see figure 5). The tool then automatically converts the graphical representation shown in the tab Mapper into the corresponding textual mapping and presents it. The user can copy this text and paste it into the DBpedia MediaWiki manually. Another option is to let the tool save the mapping to the DBpedia MediaWiki: the user clicks the button send to DBpedia, and an AJAX request to the tool's back end is executed.
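Generating the plain text representation amounts to walking the mapping tree and writing out each node in template syntax. The following Python sketch shows the idea under assumptions: the node structure (a dict with the keys template and properties), the function name render_node and the indentation style are hypothetical and are not taken from the tool's source.

```python
def render_node(node, indent=0):
    """Render a node as mapping text.

    A node is {'template': str, 'properties': {name: value}}, where a value
    is a string, a nested node, or a list of nested nodes.
    """
    pad = " " * indent
    lines = [pad + "{{" + node["template"]]
    for key, value in node["properties"].items():
        if isinstance(value, str):
            lines.append(f"{pad}| {key} = {value}")
        else:
            children = value if isinstance(value, list) else [value]
            lines.append(f"{pad}| {key} =")
            lines.extend(render_node(child, indent + 4) for child in children)
    lines.append(pad + "}}")
    return "\n".join(lines)


# Hypothetical tree with a single property mapping.
tree = {
    "template": "TemplateMapping",
    "properties": {
        "mapToClass": "Company",
        "mappings": [
            {
                "template": "PropertyMapping",
                "properties": {
                    "templateProperty": "name",
                    "ontologyProperty": "foaf:name",
                },
            }
        ],
    },
}
print(render_node(tree))
```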

Figure 5: Screenshot of the tool's output view

The back end can be configured to check the mapping for validity by using the built-in DBpedia validation service (http://mappings.dbpedia.org/server/mappings/en/validate/). If no error occurs, it proceeds with saving the mapping on DBpedia; otherwise a message with the errors is shown to the user.

Maintaining mappings

Maintaining template mappings is one of the most important tasks for developers creating mappings for DBpedia. Because Wikipedia is an open platform where thousands of users continually change and update the content, template definitions change from time to time.

Maintaining templates is time consuming, and the effort grows with the number of mapped templates. The tool eases this recurring task by loading the properties of a Wikipedia template together with the corresponding mapping.

Presenting and aggregating this information makes it easier to focus on the maintenance task. The tool also provides the ontology properties if the user selects the relevant ontology class in the ontology widget. With these three blocks of information (Wikipedia, Mapping and Ontology) side by side, maintenance becomes more convenient and takes less time.

Validate mappings

The DBpedia mapping tool can validate template mappings and point out possible errors with a small icon and a short help message that appears when a user hovers over an erroneous node. Validation can be done in two different ways: semantical validation checks for correct Wikipedia and ontology properties, whereas syntactical validation checks that the mapping is syntactically valid.

A semantically incorrect template mapping links non-existent Wikipedia properties to ontology properties, or vice versa, without being syntactically incorrect. Conversely, it is possible to create syntactically incorrect mappings that are semantically correct, e.g. when a DBpedia template element does not conform to the grammar rules but maps properties correctly. Both forms of validation are supported by the DBpedia mapping tool.

Semantical validation

Semantical validation, or comparing, is a feature to find unmapped or non-existent Wikipedia properties in template mappings. It is also designed to detect non-existent DBpedia ontology properties used in template mappings. Differences found are highlighted, and a help message attached to the marked node is shown when a user hovers over it with the mouse. Figure 6 shows the screen after the compare feature has been run. On the left side, in the Wikipedia property widget, all mapped properties are grayed out, so that users can identify unmapped properties by color and add them to the template mapping. In the mapping widget, unknown template or ontology properties are marked with a red exclamation mark. If the user moves the mouse pointer over such a node, a blue bubble with a help message appears to tell the user that the template property is unknown. In the ontology property widget, already mapped properties are grayed out as well.

If more than one ontology class is found in the template mapping, the properties of each class found are loaded and compared to the mapping, but only the first class found is loaded into the ontology property widget.

To compare a template mapping with the Wikipedia and ontology properties, a user clicks the button compare located in the menu bar of the center widget.
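At its core, the comparison described in this section is a set of set differences between three inputs: the Wikipedia template properties, the ontology properties, and the property pairs used in the mapping. The Python sketch below illustrates this; the function name compare_mapping, the result keys and the sample data are assumptions made for this example and do not reflect the tool's internal API.

```python
def compare_mapping(template_props, ontology_props, mapped_pairs):
    """Compare a mapping with the known template and ontology properties.

    mapped_pairs is an iterable of (templateProperty, ontologyProperty).
    """
    template_props = set(template_props)
    ontology_props = set(ontology_props)
    pairs = list(mapped_pairs)
    used_template = {t for t, _ in pairs}
    used_ontology = {o for _, o in pairs}
    return {
        # template properties not yet mapped (left widget, not grayed out)
        "unmapped_template": sorted(template_props - used_template),
        # mapping refers to properties the template does not define
        "unknown_template": sorted(used_template - template_props),
        # mapping refers to properties the ontology does not define
        "unknown_ontology": sorted(used_ontology - ontology_props),
        # ontology properties already in use (grayed out)
        "mapped_ontology": sorted(used_ontology & ontology_props),
    }


# Hypothetical data for illustration.
report = compare_mapping(
    template_props=["name", "type", "genre"],
    ontology_props=["foaf:name", "type", "fate"],
    mapped_pairs=[("name", "foaf:name"), ("founder", "foundedBy")],
)
print(report)
```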

Figure 6: Visual representation of a template mapping after running the compare feature

Syntactical validation

Syntactical validation is implemented in the front end as well as in the back end. Whereas the front end validation is meant to prevent users from making mistakes, the back end validation is intended to keep invalid mappings out of the DBpedia MediaWiki.

Users can start the validation process by clicking the validate tree button in the menu of the mapping widget in the center. This causes the mapping tool to iterate over the mapping nodes, starting with the root node, and validate each against the DBpedia Mapping Language grammar. It checks for properties without values, for correct multiplicity of properties and for correct nesting.

If users modify a subtree of a template mapping in the center widget, they can use the validate subtree menu item in the context menu of each node, or the [ALT + click] shortcut, to validate this particular node and its sub nodes.

Figure 7: Example of a syntax validation result with help message

Because syntax errors affect the template mapping as a whole, erroneous nodes invalidate their parents as well (see figure 7 for an example). If a syntax error is found, the erroneous node as well as its parents are marked with a red exclamation mark when the global validate tree option is used. In addition to the icon, the erroneous node gets a help message attached that explains which syntax error has been found; it is shown when the mouse rests on the node. In contrast to the global validate tree feature, the validate subtree method does not mark parent nodes if an error occurs.
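The marking behaviour can be sketched in Python as a recursive walk over the mapping tree. This is a simplified illustration, not the tool's implementation: it performs only one of the checks named above (properties without values), and the Node class, the validate function and the propagate flag are hypothetical names. With propagate=True the sketch mimics validate tree, where ancestors of an erroneous node are marked too; with propagate=False only the erroneous nodes themselves are marked, which is one possible reading of the validate subtree behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: str = ""
    children: list = field(default_factory=list)
    error: str = ""        # help message on the node that caused the error
    invalid: bool = False  # True if the node is marked with an exclamation mark


def validate(node, propagate):
    """Check node and its descendants; return True if no error was found."""
    node.error = ""
    subtree_ok = True
    for child in node.children:
        subtree_ok = validate(child, propagate) and subtree_ok
    # Simplified rule: a leaf node must carry a value.
    if not node.children and not node.value.strip():
        node.error = "property has no value"
    node.invalid = bool(node.error) or (propagate and not subtree_ok)
    return subtree_ok and not node.error


empty = Node("ontologyProperty")  # value missing on purpose
mapping = Node("PropertyMapping", children=[Node("templateProperty", "name"), empty])
root = Node("TemplateMapping", children=[Node("mapToClass", "Company"), mapping])

print(validate(root, propagate=True), root.invalid, empty.error)
```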

It is also possible to check the syntax of a mapping against the DBpedia remote validation service. To do so, a user selects the corresponding menu item on the Output tab. The back end sends a validation request to the DBpedia service and returns the list of errors found by the service.

Adding ontology classes

The DBpedia ontology is based on the most commonly used infobox templates within the English edition of Wikipedia [dbpedia_jws]. Creating new mappings for DBpedia therefore regularly requires adding new ontology classes. To improve the workflow, the tool provides a simple form (see figure 8) for creating new ontology classes. To open the form, a user clicks the button new in the menu of the ontology widget at the bottom right of the screen.

Figure 8: Form to add an ontology class

Adding properties

Editing or creating template mappings with the predefined set of ontology classes, properties and data types of DBpedia is sometimes not sufficient. In some cases the set of available properties in the ontology does not entirely reflect the properties defined in the Wikipedia template. This can happen due to changes in the ontology or in the Wikipedia template since the last visit. For these cases the tool offers the ability to add new properties with a simple form, which can be accessed by pressing the new button in the menu of the property widget.

A popup appears in which the user has to fill in the mandatory fields marked with a star (see figure 9): the type, the title and the label of the property. The title is the internally used name of the property, which is referenced in the template mappings; the label is a short description of the property and is displayed to the user. The label can differ from language to language, while the title remains the same for all languages. At present the tool only supports English, which is why users should use English titles, labels and comments when creating new properties. The choice between the types ObjectProperty and DatatypeProperty depends on the underlying data of a Wikipedia template property and follows the definitions in the OWL specification [ref:owlProperty]. ObjectProperties refer to individuals which are represented by an ontology class, whereas DatatypeProperties reflect data values like length or weight. The domain field supports auto-completion for all available ontology classes of the DBpedia ontology. If the domain field is left blank, the domain is automatically set to owl:Thing.

When a user tries to save the property, an XMLHttpRequest is made to the tool's back end, and the back end sends a request to the DBpedia API to check whether the property already exists, for example because it is tied to another domain and therefore not shown in the list of available properties for the currently selected ontology class. If the back end does not find a matching property, it saves the new one; otherwise it loads the property data from DBpedia and asks the user whether the saving process should be aborted or the data on DBpedia should be overwritten. Subsequently the tool reloads the property widget.

Sometimes a new property is not available immediately after the automatic reload of the property widget, due to the internal process of saving data to the ontology. In this case a user has to force a manual reload of the property widget by clicking on the ontology class in the ontology widget a few moments later.

Figure 9: Form to add an ontology property

Handling Wikipedia redirects

Redirects are alternative titles for articles. They are designed to help the users to organize a wiki by making articles accessible under different names.

#redirect [[Template:Infobox officeholder]]

Listing: Example of a Wikipedia redirect to Template:Infobox officeholder in the Template:Infobox Prime Minister template

Not all Wikipedia templates consist of a textual template definition; some redirect to other templates instead. For example, the Wikipedia template Template:Infobox Prime Minister redirects to the Wikipedia template Template:Infobox officeholder, which means that the template for a prime minister inherits all properties of the officeholder template.

Nevertheless, users can create a mapping for the Infobox Prime Minister template (http://mappings.dbpedia.org/index.php/Mapping:Infobox_Prime_Minister). This is why the tool supports redirects when a user tries to load the available template properties from Wikipedia for Prime Minister. The tool then shows a confirmation box asking whether it should load the properties from the redirect target. This enables the user to create a mapping for the Prime Minister template while having access to the properties of the officeholder template of Wikipedia.
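Detecting a redirect before asking for confirmation can be sketched as follows. MediaWiki redirect pages begin with the keyword #REDIRECT (case-insensitive) followed by a link in double square brackets; the Python function name redirect_target is hypothetical, and the sketch only covers this English keyword, not localized aliases.

```python
import re

# A redirect page starts with "#REDIRECT [[Target]]"; the keyword is
# case-insensitive and may be followed by an optional colon.
REDIRECT = re.compile(r"^\s*#redirect\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the redirect target title, or None if the page is no redirect."""
    match = REDIRECT.match(wikitext)
    return match.group(1).strip() if match else None


print(redirect_target("#redirect [[Template:Infobox officeholder]]"))
# → Template:Infobox officeholder
```

A caller could load the properties of the returned title once the user has confirmed, and fall back to the original template text when the function returns None.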

Miscellaneous/other features

While mapping is the core feature of the program, a proper workflow also needs some minor enhancements. These features include easy access to Wikipedia pages that use a particular template as well as shortcuts.

Using shortcuts

The program supports several shortcuts to improve the user's workflow. Often a user wants to add a new property at the bottom of the mapping, but dragging and dropping the appropriate element from the template widget onto the mapping widget is forbidden by the built-in nesting rules. For example, a PropertyMapping node cannot be dropped on the last PropertyMapping node because this would result in an improper nesting of templates. To spare the user from scrolling to the mappings node at the top of the mapping widget to drop the PropertyMapping node, the program offers a shortcut that copies a node and attaches it to the end of the mapping: pressing the Shift key while clicking on a node with the mouse. This works for most of the available nodes. The same result can be achieved by opening the context menu on a node and choosing the menu item copy node. Furthermore, the program has shortcuts for deleting nodes (Ctrl + click) and validating subtrees (Alt + click).

Fetching Wikipedia articles using a specific template

While mapping a Wikipedia template property to the corresponding ontology property, identifying the meaning of a label or finding the matching ontology property is sometimes difficult. The user then has to check which values are typically assigned to the template property when it is used in Wikipedia articles. This can be time consuming, especially when the article found does not use the ambiguous template property. To make it easier to find a Wikipedia article that uses the currently mapped template, the program provides the examples feature. It offers five random example Wikipedia articles that use the specific template, chosen from the first 100 returned by the Wikipedia API. The feature can be accessed by clicking the button labeled examples in the Wikipedia property widget.
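The selection logic of the examples feature can be sketched in Python. The text above does not say which API module the tool queries; this sketch assumes the MediaWiki list=embeddedin query, which lists pages that transclude a given template. The function names embeddedin_params and pick_examples are hypothetical, and the HTTP request itself is left out.

```python
import random


def embeddedin_params(template_title, limit=100):
    """Query parameters for MediaWiki's list=embeddedin module (assumed here)."""
    return {
        "action": "query",
        "list": "embeddedin",
        "eititle": template_title,  # e.g. "Template:Infobox company"
        "einamespace": 0,           # articles only
        "eilimit": limit,
        "format": "json",
    }


def pick_examples(titles, count=5, rng=random):
    """Pick up to `count` distinct random titles from the API result."""
    titles = list(titles)
    return rng.sample(titles, min(count, len(titles)))


# Placeholder titles standing in for the first 100 API results.
first_hundred = [f"Article {i}" for i in range(100)]
print(embeddedin_params("Template:Infobox company")["eilimit"])
print(pick_examples(first_hundred))
```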

Synchronizing the ontology

To synchronize the underlying ontology of the mapping tool with the dataset from DBpedia, a user can use the synchronize button on the left side of the menu bar. This feature requires a password to prevent abuse of the operation, which could at worst exceed the server and database capacity by regenerating the database representation of the ontology over and over. The most time consuming part of this operation is querying the API of the DBpedia MediaWiki for all necessary information. Depending on the configuration, the tool also supports caching for this operation, which replaces the HTTP request to DBpedia. As a consequence, users have to invalidate the cache before synchronizing the ontology.

Cache invalidation

Cache invalidation is always required after a user changes the configuration of the tool, to force a reload of the settings. It can also be necessary when the ontology has to be regenerated from the DBpedia source, since the tool can cache the DBpedia API results for a fixed time period defined in the tool's config/config.ini.
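The interplay of a fixed cache period and explicit invalidation can be illustrated with a small time-based cache in Python. This is a sketch only: the class name TimedCache, the key name and the 3600-second period are assumptions, and the actual setting names in config/config.ini are not known here.

```python
import time


class TimedCache:
    """Cache values for a fixed time period; invalidate() drops everything."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key, load):
        """Return the cached value, calling load() if it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = load()
        self._entries[key] = (now, value)
        return value

    def invalidate(self):
        """Forget all cached results so the next get() reloads from the source."""
        self._entries.clear()


# Hypothetical usage: the loader stands in for the DBpedia API request.
cache = TimedCache(ttl_seconds=3600)
ontology = cache.get("ontology", lambda: {"classes": ["Company"]})
cache.invalidate()  # e.g. before synchronizing the ontology
```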