Mapping Guide

From DBpedia Mappings
Latest revision as of 22:47, 14 March 2022

Dear fellow Mappers,

one benefit of the DBpedia ontology is that it standardises the properties used by entities and reduces redundancy among them. At the moment, however, the DBpedia ontology is starting to fill up with equivalent properties. The ontology is becoming unclear and the benefits of standardisation are getting lost. For instance, there are the ontology properties dateClosed, closingDate, closed, dateOfAbandonment and dissolved. All these properties describe the same thing, or nearly the same thing: for example the closure of a firm, the closing of a road, the decommissioning of a facility, or the abandonment of a project.

It seems that there is a need for a short guide on how to write mappings while taking care of the usefulness of the ontology. The basic introduction to writing mappings is at How_to_edit_DBpedia_Mappings.

General instructions

Create a user page for your account and insert some information about yourself. Please add your email address. Thank you!

Please try to minimize the number of edits: write the whole mapping first, then save it. That helps other people keep track of the edits.

Generally, if you find unclear or duplicated ontology properties, do not hesitate to create a discussion page for the property and note your questions or objections there. Help us to keep the ontology clean and useful.

Check the mapping statistics

If you ask yourself "Where do I start mapping?", please check the Mapping Statistics. They give you a good idea of where new mappings would make the biggest impact.

Check redirects for your infobox

If you have found an infobox that isn't mapped yet, check whether it redirects to another infobox. If it does, check whether the target infobox is already mapped. Only if it is not, create a mapping for the target infobox; do not create one for the redirecting infobox.

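  • If you don't, you may end up wondering, as in issue #296 (https://github.com/dbpedia/extraction-framework/issues/296), why Infobox_Geopolitical_organization (e.g. United_Nations) is mapped to Country.
  • Hopefully issue #3 (https://github.com/dbpedia/mappings-tracker/issues/3), "Statistics and Validator to check for redirected templates", will be resolved soon and warn against problems like this.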
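The redirect check above can be sketched in Python. This is a hypothetical helper, not part of any DBpedia tool: the `redirects` dictionary and the `mapped` set stand in for information you would collect yourself from Wikipedia and from the mapping statistics.

```python
from typing import Dict, Optional, Set


def resolve_redirect(template: str, redirects: Dict[str, str]) -> str:
    """Follow the redirect chain until reaching a template that does not redirect."""
    seen = {template}
    while template in redirects:
        template = redirects[template]
        if template in seen:
            raise ValueError(f"redirect loop at {template!r}")
        seen.add(template)
    return template


def template_to_map(template: str, redirects: Dict[str, str], mapped: Set[str]) -> Optional[str]:
    """Return the template that still needs a mapping, or None if its target is already mapped."""
    target = resolve_redirect(template, redirects)
    return None if target in mapped else target


# Hypothetical sample data, based on the redirect example used later in this guide.
redirects = {"Infobox china station": "Infobox China station"}
mapped = {"Infobox station"}

print(template_to_map("Infobox china station", redirects, mapped))
# prints: Infobox China station
```

The mapping is always created for the end of the redirect chain, never for a template in the middle of it.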

Merge redirected templates

A new service was added that shows all redirected templates: http://mappings.dbpedia.org/server/mappings/en/redirects/ (substitute your language for "en"). You can contribute by merging redirected templates; this work is tracked under issue #59 (https://github.com/dbpedia/mappings-tracker/issues/59).

If the template was merely moved (renamed) in Wikipedia:

  • Use the Move link to rename the DBpedia mapping
  • Enter reason "moved template"
  • On the next page, click the old template name (the URL ends in "redirect=no")
  • Delete this redirect page, since we don't need redirect pages in DBpedia

If the template was merged into another template in Wikipedia, you need to merge its DBpedia mapping into the target mapping. This requires intellectual effort:

  • Find discriminator field(s) that can distinguish between entity types, e.g.:
    • "type" can distinguish Geopolitical Organisation from Country
    • "lake name" can distinguish between Lake and other BodyOfWater
  • Check which fields of the redirected template are still used. Sometimes the target template's documentation doesn't mention all fields of redirected templates
  • Merge the mapping of these fields into the target template
  • Delete the redirected template
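The discriminator idea can be illustrated with a small Python function that picks an ontology class from the fields filled in on one infobox instance. This only sketches the decision logic under hypothetical rules (the rule format and sample values are made up for illustration); the merged mapping itself is still written in the wiki's mapping syntax.

```python
from typing import Dict, List, Optional, Tuple

# Each rule: (template field, required value or None meaning "field is set", class).
Rule = Tuple[str, Optional[str], str]


def choose_class(fields: Dict[str, str], rules: List[Rule], default: str) -> str:
    """Return the class of the first matching rule, else the default class."""
    for field, required, cls in rules:
        value = fields.get(field, "").strip()
        if required is None and value:
            return cls
        if required is not None and value.lower() == required.lower():
            return cls
    return default


# Hypothetical rule from the list above: a filled-in "lake name" means a Lake.
rules = [("lake name", None, "Lake")]

print(choose_class({"name": "Example Lake", "lake name": "Example Lake"}, rules, "BodyOfWater"))
# prints: Lake
print(choose_class({"name": "Example Bay"}, rules, "BodyOfWater"))
# prints: BodyOfWater
```

Rules are checked in order, so put the most specific discriminator first and let the default class cover everything else.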

Read infobox template documentation

Take the template documentation of the infobox that you want to map as your source for property definitions. It can be found on the Wikipedia page of the template. See, for instance, the template documentation of Infobox China station (http://en.wikipedia.org/wiki/Template:Infobox_China_station). Of course, not all templates have adequate documentation; if yours doesn't, the following points become even more important.

Create an empty mapping

  • Check out mappings_chrome_extension (https://github.com/dbpedia/mappings_chrome_extension), which generates blank mappings by analyzing which template properties are used (template statistics)
  • Go to chrome://extensions, enable "Developer mode", click "Load unpacked extension" and point it to the extension folder

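To test it:

  • Go to http://mappings.dbpedia.org/server/statistics/en/?show=100000 (or any other language), find an uncreated infobox (red/grey), and click edit
  • This will go to a URL like http://mappings.dbpedia.org/index.php?title=Mapping_en:Infobox_concentration_camp&action=edit
  • The extension will fill the edit box with a blank mapping template like this:
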
{{TemplateMapping | mapToClass =
| mappings =
	<!-- {{ PropertyMapping | templateProperty = location | ontologyProperty = }} -->
	<!-- {{ PropertyMapping | templateProperty = coordinates type | ontologyProperty = }} -->
	<!-- {{ PropertyMapping | templateProperty = in operation | ontologyProperty = }} -->
        ....
}}

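See the presentation Adding a DBpedia Mapping (https://github.com/VladimirAlexiev/my/blob/master/pres/20150209-dbpedia/add-mapping-long.html#sec-6) for more useful hints.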
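If you can't use Chrome, the blank mapping the extension produces is simple enough to approximate with a few lines of Python. This is a hypothetical sketch, not the extension's code: it assumes you already have the list of used template properties (for example from the template statistics), and it closes the TemplateMapping with `}}`.

```python
from typing import List


def blank_mapping(template_properties: List[str]) -> str:
    """Build a blank TemplateMapping with one commented-out PropertyMapping per property."""
    lines = ["{{TemplateMapping | mapToClass =", "| mappings ="]
    for prop in template_properties:
        # Each line stays commented out until you fill in ontologyProperty.
        lines.append(
            "\t<!-- {{ PropertyMapping | templateProperty = "
            + prop
            + " | ontologyProperty = }} -->"
        )
    lines.append("}}")  # closes the TemplateMapping
    return "\n".join(lines)


print(blank_mapping(["location", "coordinates type", "in operation"]))
```

Paste the output into the edit box, set mapToClass, and uncomment each line only once you have chosen its ontology property.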

Check for similar mappings

A helpful hint: check for already mapped infoboxes that describe similar things. For example, if you want to map "Infobox China station", the mappings for "Infobox station" or "Infobox japan station" are really helpful. You can find similar infoboxes via the Wikipedia categories; most template documentation pages link to these categories at the bottom.

DO NOT copy blindly.

  • Do not just copy and paste, but take a careful look at properties that are equal or similar to properties used in the infobox you are mapping.
  • Your infobox may be different from the infobox of the mapping you're reusing. Read the documentation of your infobox. In particular, check whether to map to object properties or datatype properties (see Object/DataProp Dichotomy, https://github.com/VladimirAlexiev/my/blob/master/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-3-2)
  • Many mappings have various errors. Do not propagate the errors by copying uncritically. If you find (or even suspect) an error in the mapping you're reusing, raise an issue (http://github.com/dbpedia/mappings-tracker/issues/new)

Map the properties

Please spend some research effort on this step.

Get an overview of the property values

Get an overview of the values of the infobox property that you want to map. Issue #327 should make this easy, but for now you need to write some (light) SPARQL.

Go to http://dbpedia.org/sparql and enter the following query:

SELECT DISTINCT * WHERE 
{
?s  <http://dbpedia.org/property/platform>  ?o.
?s  <http://dbpedia.org/property/wikiPageUsesTemplate>  <http://dbpedia.org/resource/Template:Infobox_china_station>.
}

Replace platform with the name of your infobox property. Note that spaces and underscores are removed and compound words become camelCase (e.g. "coordinates type" becomes coordinatesType). Replace Infobox_china_station with the infobox you are writing a mapping for.

  • The current DBpedia release may already be outdated, so you have to consider recent redirects. For example, "Infobox china station" now redirects to "Infobox China station".
  • If your query does not return results, try a simple property that almost every use of the infobox has, such as "name", to check whether your query is correct.
  • If that fails too, check the infobox history for redirects and try other variations of the infobox name. The results show you what kind of values the property holds.
  • Better yet, query http://live.dbpedia.org/sparql, which should be fairly up to date
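The two substitutions in the query (property name and template name) are easy to get wrong, so here is a minimal Python sketch that builds the query and a GET URL for it. It assumes the naming rule stated above, plus one assumption of ours (the first letter is lowercased); the function names are hypothetical, and the URL follows the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter.

```python
import re
from urllib.parse import urlencode


def dbpedia_property_name(infobox_property: str) -> str:
    """Approximate the naming rule: drop spaces/underscores and camelCase the words."""
    words = [w for w in re.split(r"[ _]+", infobox_property.strip()) if w]
    if not words:
        raise ValueError("empty infobox property name")
    # Assumption: the first letter is lowercased, later words are capitalized.
    first = words[0][0].lower() + words[0][1:]
    rest = "".join(w[0].upper() + w[1:] for w in words[1:])
    return first + rest


def values_query(infobox_property: str, template: str) -> str:
    """Build the overview query for one infobox property of one template."""
    prop = dbpedia_property_name(infobox_property)
    tpl = template.strip().replace(" ", "_")
    return (
        "SELECT DISTINCT * WHERE {\n"
        f"  ?s <http://dbpedia.org/property/{prop}> ?o .\n"
        "  ?s <http://dbpedia.org/property/wikiPageUsesTemplate> "
        f"<http://dbpedia.org/resource/Template:{tpl}> .\n"
        "}"
    )


def query_url(query: str, endpoint: str = "http://dbpedia.org/sparql") -> str:
    """Encode the query as the 'query' GET parameter (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


q = values_query("platform", "Infobox China station")
print(q)
print(query_url(q, "http://live.dbpedia.org/sparql"))
```

Open the printed URL in a browser. If no rows come back, work through the checks in the list above.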

Search for ontology properties

Search for ontology properties not only via the search box in the left-hand Wiki menu, but also via the Ontology Properties link in the menu. Note that searching for "date" does not list all properties that include "date" in their name or label: you only get the properties that start with "date", so the property closingDate is not in the results. The Wiki's search function is not sufficient at all, so do not rely on its results for now (by the way, do you know a good MediaWiki search extension?). If you have found a candidate ontology property for your infobox property, use the Wiki's "What links here" link and compare the infobox properties already mapped to that ontology property with the one you want to map. Do they describe the same things? Note that some of the existing mappings may be inaccurate. If you find an inconsistency, add your concerns to the discussion page of the inaccurate mapping, or fix it if it's an unambiguous error.
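The difference between the Wiki's prefix-only search and the substring search you actually want can be illustrated in Python. The list holds just the five closure-related properties named at the top of this guide; in practice you would paste in names from the Ontology Properties list. The `prefix_search` function only roughly mimics the Wiki's behaviour as described above and is not a Wiki feature.

```python
from typing import List

# The five closure-related properties named at the top of this guide.
PROPERTIES = ["dateClosed", "closingDate", "closed", "dateOfAbandonment", "dissolved"]


def prefix_search(term: str, names: List[str]) -> List[str]:
    """Roughly what the Wiki search does: only names that start with the term."""
    t = term.lower()
    return [n for n in names if n.lower().startswith(t)]


def substring_search(term: str, names: List[str]) -> List[str]:
    """What you actually want: every name containing the term anywhere, ignoring case."""
    t = term.lower()
    return [n for n in names if t in n.lower()]


print(prefix_search("date", PROPERTIES))     # ['dateClosed', 'dateOfAbandonment']
print(substring_search("date", PROPERTIES))  # ['dateClosed', 'closingDate', 'dateOfAbandonment']
```

Note that even the substring search misses closed and dissolved, so also try related words ("clos", "dissol") before concluding that no suitable property exists.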

Create new ontology properties

If you have an infobox property that definitely cannot be mapped to an existing ontology property, you can create a new one. But please stick to some simple rules:

Naming Conventions

Generally, a property name is built from more than one word.

  • All props must use lowerCaseCamelCase names, while classes use UpperCaseCamelCase.
    • Casing is especially important when you use a prop (in rdfs:subPropertyOf) or class (in rdfs:domain, rdfs:range).
    • See what happens to those who aren't careful:
     dbo:bronzeMedalist rdfs:subPropertyOf dbo:Medalist .
  • Use full US English words; this helps your fellow editors when searching. "eyeColor" and "hairColor" are OK, "colour" is not
  • The name of the new property should not just be copied from the infobox
    • Garbage prop names like "appmag_v, dist_ly, names, size_v, Dist ly, Names, Dist pc, Credit, Dec, Ra" won't be tolerated. ¡No pasarán! (They shall not pass!) (#7)
    • Instead, look at the template documentation and the property definition, if there is one.
    • If there is none, look at a few Wikipedia articles that use the infobox you want to map, or go back to the SPARQL query above, and check how the property is used.
  • Specific data
    • If the property is used for numbers, use the prefix "numberOf*" or the suffix "*Number" (unfortunately, whether to use the prefix or the suffix is not yet standardized)
    • If the property is used for dates, "date" should be part of the new prop name, or use suffix "*AsOf"
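Some of these conventions can be checked mechanically before you save a new property. The following Python sketch is a hypothetical helper, not a DBpedia validator: it only checks casing and a deliberately tiny, extendable list of British spellings, and it cannot judge whether a name is meaningful.

```python
import re
from typing import List

LOWER_CAMEL = re.compile(r"[a-z][a-zA-Z0-9]*")
UPPER_CAMEL = re.compile(r"[A-Z][a-zA-Z0-9]*")
# Hypothetical, deliberately short list of British spellings to flag; extend as needed.
BRITISH_TO_US = {"colour": "color"}


def property_name_problems(name: str) -> List[str]:
    """List violations of the property naming conventions above (empty list = OK)."""
    problems = []
    if not LOWER_CAMEL.fullmatch(name):
        problems.append("not lowerCamelCase (no spaces, underscores or leading capital)")
    for british, us in BRITISH_TO_US.items():
        if british in name.lower():
            problems.append(f"use US spelling {us!r} instead of {british!r}")
    return problems


def is_class_name(name: str) -> bool:
    """Classes use UpperCamelCase."""
    return UPPER_CAMEL.fullmatch(name) is not None


for candidate in ["eyeColor", "hairColour", "appmag_v", "Medalist"]:
    print(candidate, property_name_problems(candidate))
```

Running it over the names you put in rdfs:subPropertyOf, rdfs:domain and rdfs:range also catches the dbo:Medalist kind of casing mistake.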

Domain

Take care when defining the domain and range of properties. Do not just set them to owl:Thing because it is simple. If your property is specific to an ontology class, do not hesitate to define that class as its domain. That will prevent people from reusing the property for other classes by mistake, especially if the property name is ambiguous.

Range

The range of the property should be defined by considering both the property values and the infobox definition. Some infobox properties hold values of different data types or patterns, because the property is not clearly defined in the template documentation, so Wikipedia authors use it as they like. That makes it difficult for us to define the property's range. If a range is given in the infobox definition, generally stick to it. However, I have found infobox properties with a defined range whose values, when I checked them, mostly disagreed with that range. In such a case, choose a range that covers the actual values and leave a note in the property comment. If the infobox property has no range defined, you always have to look at the values. For example, you may have to weigh a strict object property with an ontology class as range against a datatype property with xsd:string as range. A string would capture more information, but an object property is the clearer definition. You can explain your decision in the property comment.
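To weigh an object property against a datatype property, it helps to count how many values are links to resources and how many are plain literals. A minimal Python sketch, assuming you saved the results of the overview query in the SPARQL 1.1 JSON results format (each bound value has a "type" such as "uri" or "literal"; some endpoints also report "typed-literal", which is simply counted as non-URI here). The 80% threshold is an arbitrary, hypothetical rule of thumb, not a DBpedia policy, and the sample data is made up.

```python
import json
from collections import Counter


def value_type_counts(results_json: str, var: str = "o") -> Counter:
    """Count the RDF term types ("uri", "literal", ...) bound to `var`."""
    data = json.loads(results_json)
    return Counter(b[var]["type"] for b in data["results"]["bindings"] if var in b)


def suggest_range_kind(counts: Counter, threshold: float = 0.8) -> str:
    """Suggest an object property only if most values are links to resources."""
    total = sum(counts.values())
    if total == 0:
        return "no values: check the query"
    if counts["uri"] / total >= threshold:
        return "object property"
    return "datatype property (e.g. xsd:string)"


# Hypothetical results for three pages using the infobox.
sample = """{"head": {"vars": ["s", "o"]}, "results": {"bindings": [
  {"o": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Line"}},
  {"o": {"type": "literal", "value": "2"}},
  {"o": {"type": "literal", "value": "2 island platforms"}}
]}}"""

counts = value_type_counts(sample)
print(counts, suggest_range_kind(counts))
```

Treat the suggestion as a starting point only; the values themselves, and the note you leave in the property comment, matter more than the ratio.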

Comments

YOU MUST add an English comment to the ontology property. Don't just repeat the prop name; give some useful info on usage and contrast the property with similar props. If the template documentation has a definition of the property, use it as the comment. A short description of the property or a definition of its values is really helpful for other people who have to decide whether this property can be used for their mapping.

The biggest complaint of your fellow ontology editors is that props are not documented. In the near bright future, new props and classes without comment will be deleted.
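A quick self-check for the comment rule can be sketched in Python. This is a hypothetical helper, not a Wiki feature: it flags a missing comment, a comment that only repeats the camelCase property name, and very short comments; the five-word minimum is an arbitrary threshold chosen for illustration.

```python
import re
from typing import List


def name_words(name: str) -> List[str]:
    """Split a lowerCamelCase property name into lowercase words."""
    return re.sub(r"([A-Z])", r" \1", name).lower().split()


def comment_problems(name: str, comment: str, min_words: int = 5) -> List[str]:
    """Flag missing or uninformative comments (empty list = looks OK)."""
    words = re.findall(r"[a-z0-9]+", comment.lower())
    if not words:
        return ["missing comment"]
    problems = []
    if words == name_words(name):
        problems.append("comment only repeats the property name")
    if len(words) < min_words:
        problems.append(f"comment has fewer than {min_words} words")
    return problems


print(comment_problems("dateClosed", "date closed"))
print(comment_problems("dateClosed", "Date on which a station, road or other facility was closed."))
```

Passing this check does not make a comment good; it still has to explain usage and contrast the property with similar ones.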

Some examples of good comments:

Bad examples of missing comments:

Validate the infobox mapping

Validate your mapping using the "Test this mapping" link on the mapping page. In particular, check the properties that you have created yourself.

Add, as test cases, a few Wikipedia articles that use the infobox you just mapped.