Old Main Page: Difference between revisions

From DBpedia Mappings
No edit summary
m (put old home page content here)
== DBpedia Mappings Wiki ==
'''''This was the home page content up until 2011-07-11.'''''


In this DBpedia Mappings Wiki you can help to enhance the information in DBpedia. The DBpedia Extraction Framework uses the mappings defined here to homogenize information extracted from Wikipedia before generating structured information in [http://en.wikipedia.org/wiki/Resource_Description_Framework RDF].
== About DBpedia ==
 
[http://dbpedia.org/ DBpedia] is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. The [http://wiki.dbpedia.org/Datasets DBpedia knowledge base], which has been created by extracting structured information from Wikipedia, currently describes more than 3.5 million things, including 364,000 persons, 462,000 places (including 340,000 populated places), 99,000 music albums, 54,000 films, 17,000 video games, 148,000 organizations (including 35,000 companies and 34,000 educational institutions), 169,000 species and 5,200 diseases.
 
== About mappings.dbpedia.org  ==
This wiki contains the infobox-to-ontology and the table-to-ontology mappings which are used by the DBpedia extraction framework as well as the ontology definition itself. The framework collects the templates defined in this Wiki and extracts the Wikipedia content according to them.
 
=== DBpedia Mappings ===
 
The types of Wikipedia content most valuable for DBpedia extraction are infoboxes and tables. Infoboxes display an article's most relevant facts as a table of attribute-value pairs on the top right-hand side of the Wikipedia page.
 
As Wikipedia's infobox template system has evolved in a decentralized way over time, different communities of Wikipedia editors use different templates to describe the same type of thing (e.g. infobox_city_japan, infobox_swiss_town and infobox_town_de). Different templates use different names for the same attribute (e.g. birthplace and placeofbirth). As many Wikipedia editors do not strictly follow the recommendations given on the page that describes a template, attribute values are expressed using a wide range of different formats and units of measurement.
 
In order to overcome the problems of synonymous attribute names and multiple templates being used for the same type of things, the DBpedia project maps Wikipedia templates as well as tables within an article to the [http://wiki.dbpedia.org/Ontology DBpedia ontology].
These mappings are specified using the '''DBpedia Mapping Language'''. The mapping language makes use of MediaWiki templates that define DBpedia ontology classes and properties as well as template/table to ontology mappings.
 
The following mappings map Wikipedia infoboxes and tables to this ontology:
 
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=204 Infobox Mappings]
* [http://mappings.dbpedia.org/index.php?title=Special%3APrefixIndex&prefix=Table&namespace=204 Table Mappings]
 
=== DBpedia Ontology ===
 
The DBpedia ontology is based on OWL and forms the structural backbone of DBpedia. It describes classes, e.g. person, city, country, and properties, e.g. birth place, longitude. Information in Wikipedia articles is then mapped to this ontology via the mappings described above. Most prominently, many Wikipedia pages use so-called infoboxes. For instance, the English Wikipedia article about [http://en.wikipedia.org/wiki/London London] contains a "settlement infobox". This infobox may be mapped to e.g. the class "populated place" (see [[OntologyClass:PopulatedPlace|PopulatedPlace]]) in the DBpedia ontology, and the attributes in the infobox are mapped to properties in the DBpedia ontology. <!-- Please see the [http://en.wikipedia.org/wiki/Template:Infobox_settlement/doc documentation of the settlement infobox] for details. --> This way, a unified view over all data in infoboxes can be obtained. Since this information conforms to Semantic Web standards, it can be queried and combined by a broad range of tools in a useful way. This increases the value of information entered by the Wikipedia community.
 
A listing of all classes, properties and datatypes (units of measurement) used by the DBpedia ontology is found below:
 
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=200 Ontology Classes] - OWL classes and their definitions
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=202 Ontology Properties] - OWL Object and Datatype properties
* [http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=206 Datatypes]
 
== How are the mappings and the ontology maintained?  ==
So far, a few people inside the DBpedia project have maintained the mappings and the ontology, but in the spirit of open source projects, control will be handed over to the Wikipedia and DBpedia community. The members of the DBpedia team are not able to extend the mappings to cover all Wikipedia infoboxes and tables, due to the size of the task and the knowledge required to map templates from exotic domains. Therefore, the idea of this wiki is to enable the interested public to contribute to the definition of DBpedia mappings by updating existing mappings and by adding new mappings to this wiki.
 
=== Editor rights ===
''This wiki is read-only.'' If you would like to edit the mappings or ontology schema, please '''[[Special:UserLogin|register]]'''. The DBpedia maintainers will then give you editor rights (if this does not happen within a couple of days, please ask for editor rights at [mailto:dbpedia-discussion@lists.sourceforge.net dbpedia-discussion@lists.sourceforge.net]).
Once you have editor rights, please provide some information about yourself on your user wiki page.


Anybody can help by editing:
* the [[How_to_edit_the_DBpedia_Ontology|DBpedia ontology schema]] (classes, properties, datatypes)
* the [[How_to_edit_DBpedia_Mappings|DBpedia infobox-to-ontology mappings]]


Mappings can be written for a variety of languages, connecting multilingual information to a language-independent unified ontology schema (language-specific labels can be provided [[How_to_edit_the_DBpedia_Ontology|there]]).

=== Tutorials ===
The specification of the '''DBpedia Mapping Language''' can be found [http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/ here]. Please find below step-by-step tutorials on:


* [[Ontology_Editing|How to edit the ontology schema]]
* How to write
** [[Writing_Mappings/Templates|Template mappings]]
** [[Writing_Mappings/Tables|Table mappings]]
* [[Mapping_Guide|Best practice mapping guide]]


=== Tools ===
This wiki provides several tools that help you to edit the mappings and the ontology:

* '''Ontology View.''' The [http://mappings.dbpedia.org/server/ontology/classes ontology view] gives you an overview of the current shape of the DBpedia ontology.
* '''Mapping Validator.''' When you are editing a mapping, there is a validate button at the bottom of the page. Pressing the button validates your changes for syntactic correctness and highlights inconsistencies such as missing property definitions.
* '''Extraction Tester.''' The extraction tester tests a mapping against a set of example Wikipedia pages. This gives you direct feedback about whether a mapping works and what the resulting data will look like.
* '''MappingTool.''' The [[MappingTool|DBpedia MappingTool]] is a graphical user interface that helps users create and edit mappings.


== Mappings for new languages ==
To create mappings for a new language, you first need to register (see [[#Editor rights|above]]). The DBpedia maintainers also have to create a new language namespace in the framework and on the wiki.


=== Create new mappings ===
To get an idea of how mappings are written, you can look at some examples from the English mappings.


To create a new mapping, type the following line into your web browser:
http://mappings.dbpedia.org/index.php/Mapping_LANGUAGE:INFOBOXNAME
* replace LANGUAGE with the language code you are currently working on (for example mt for Maltese)
* replace INFOBOXNAME with the name of the infobox that you want to create a mapping for (replace spaces with underscores)


e.g.
http://mappings.dbpedia.org/index.php/Mapping_mt:Infobox_album
for the Album infobox on the Maltese Wikipedia.


If there is no mapping for this box yet, you will see a page saying
"There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page."
At the top, you can click on "create" and start writing the mapping.


=== Use new mappings in the extraction ===
Once there are mappings for a language, you can run the DBpedia extraction. Several things have to be installed and configured, as documented at http://wiki.dbpedia.org/Documentation


* Section 1 describes what has to be installed to run the DBpedia extraction framework.
* Section 4.1 lists everything that must be specified before starting the extraction from a dump file. In the file "dump/config.properties" (using the file "dump/config.properties.default" as a template), you can specify the languages for which you want to extract, and which extractors should be used. For example, to run the HomepageExtractor and the MappingExtractor for Maltese, specify


languages=mt
extractors.mt=org.dbpedia.extraction.mappings.HomepageExtractor \
              org.dbpedia.extraction.mappings.MappingExtractor


* When you run the extraction (see Section 4.2), the MappingExtractor will extract the information from the infoboxes that you created a mapping for. The extracted triples will be saved in a file named "mappingbased_properties_mt.nt" (for Maltese) in the output directory you specified.


== Mapping Example ==
This is how you write a simple infobox mapping.


'''Mapping:Infobox_actor'''
<pre>
{{TemplateMapping
| mapToClass = Actor
| mappings =
  {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
  {{ PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
}}
</pre>


This mapping extracts three pieces of information:
# the type information (Actor)
# the name of the actor
# the actor's place of birth


Therefore, three RDF triples are extracted for each Infobox_actor in the English Wikipedia. For example, for [http://en.wikipedia.org/w/index.php?title=birth_place&oldid=437756176 Vince Vaughn]:
<pre>
dbpedia:Vince_Vaughn  rdf:type                dbpedia-owl:Actor .
dbpedia:Vince_Vaughn  foaf:name              "Vince Vaughn"@en .
dbpedia:Vince_Vaughn  dbpedia-owl:birthPlace  dbpedia:Minneapolis .
</pre>


<u>Further information</u>:
* Check the '''[[Mapping Guide]]''' that defines the best practices for how to write clean, efficient mappings that extract lots of high-quality data
* '''[[How_to_edit_the_DBpedia_Ontology|How to edit the ontology schema]]'''
* '''[[How_to_edit_DBpedia_Mappings|How to edit infobox and table mappings]]'''
* [[Use the DBpedia Extraction Framework]] to extract structured data


== Prerequisites ==
If you would like to edit the mappings or ontology schema, this is what you need:
* a user account on this wiki (''[[Special:UserLogin|register]]'')
* editor rights
** they will be given to you within a couple of days
** if not, please ask for editor rights at [mailto:dbpedia-discussion@lists.sourceforge.net dbpedia-discussion@lists.sourceforge.net] '''''Max: question: should we make this dbpedia-developers? I don't think the whole discussion list cares about Heinz from Königs Wusterhausen getting editor rights'''''
** once you have editor rights, please provide some information about yourself on your user wiki page
* a namespace for the language you want to write mappings for
** if the namespace does not exist already (see the left sidebar), please request it at [mailto:dbpedia-discussion@lists.sourceforge.net dbpedia-discussion@lists.sourceforge.net]


== That's it! ==
That is all you need to get started. To get more detailed information, please follow the provided links.


Your contributions will be available
* in the [http://live.dbpedia.org/ DBpedia Live] endpoint shortly after your edit (currently only for English)
* in the next [http://dbpedia.org/downloads DBpedia datasets] release


''Happy mapping!''


== About DBpedia ==
To learn more about DBpedia itself visit http://dbpedia.org/About.

Revision as of 15:49, 11 July 2011

This was the home page content up until 2011-07-11.

About DBpedia

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. The DBpedia knowledge base, which has been created by extracting structured information from Wikipedia, currently describes more than 3.5 million things, including 364,000 persons, 462,000 places (including 340,000 populated places), 99,000 music albums, 54,000 films, 17,000 video games, 148,000 organizations (including 35,000 companies and 34,000 educational institutions), 169,000 species and 5,200 diseases.

About mappings.dbpedia.org

This wiki contains the infobox-to-ontology and the table-to-ontology mappings which are used by the DBpedia extraction framework as well as the ontology definition itself. The framework collects the templates defined in this Wiki and extracts the Wikipedia content according to them.

DBpedia Mappings

The types of Wikipedia content most valuable for DBpedia extraction are infoboxes and tables. Infoboxes display an article's most relevant facts as a table of attribute-value pairs on the top right-hand side of the Wikipedia page.

As Wikipedia's infobox template system has evolved in a decentralized way over time, different communities of Wikipedia editors use different templates to describe the same type of thing (e.g. infobox_city_japan, infobox_swiss_town and infobox_town_de). Different templates use different names for the same attribute (e.g. birthplace and placeofbirth). As many Wikipedia editors do not strictly follow the recommendations given on the page that describes a template, attribute values are expressed using a wide range of different formats and units of measurement.

In order to overcome the problems of synonymous attribute names and multiple templates being used for the same type of things, the DBpedia project maps Wikipedia templates as well as tables within an article to the DBpedia ontology. These mappings are specified using the DBpedia Mapping Language. The mapping language makes use of MediaWiki templates that define DBpedia ontology classes and properties as well as template/table to ontology mappings.
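
The effect of such a mapping can be pictured with a small Python sketch. This is not the DBpedia Mapping Language and not code from the extraction framework; the names `ATTRIBUTE_TO_PROPERTY` and `normalize_infobox` are hypothetical and chosen only for illustration. It shows how synonymous infobox attribute names (birthplace and placeofbirth, as in the text above) could end up under a single ontology property.

```python
# Illustrative sketch only: the names below are hypothetical, not part of DBpedia.
# Maps synonymous infobox attribute names to one ontology property name.
ATTRIBUTE_TO_PROPERTY = {
    "birthplace": "birthPlace",
    "placeofbirth": "birthPlace",
    "birth_place": "birthPlace",
}


def normalize_infobox(attributes):
    """Return a dict keyed by ontology property; unmapped attributes are skipped."""
    normalized = {}
    for name, value in attributes.items():
        prop = ATTRIBUTE_TO_PROPERTY.get(name.strip().lower())
        if prop is not None:
            normalized[prop] = value
    return normalized


# Two infoboxes that use different attribute names for the same fact.
print(normalize_infobox({"birthplace": "Minneapolis"}))
print(normalize_infobox({"placeofbirth": "Minneapolis", "unmapped_attribute": "ignored"}))
```

Both calls yield the same single `birthPlace` entry, which is the kind of unified view the mappings are meant to provide.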

The following mappings map Wikipedia infoboxes and tables to this ontology:

  • Infobox Mappings (http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=204)
  • Table Mappings (http://mappings.dbpedia.org/index.php?title=Special%3APrefixIndex&prefix=Table&namespace=204)

DBpedia Ontology

The DBpedia ontology is based on OWL and forms the structural backbone of DBpedia. It describes classes, e.g. person, city, country, and properties, e.g. birth place, longitude. Information in Wikipedia articles is then mapped to this ontology via the mappings described above. Most prominently, many Wikipedia pages use so-called infoboxes. For instance, the English Wikipedia article about London contains a "settlement infobox". This infobox may be mapped to e.g. the class "populated place" (see PopulatedPlace) in the DBpedia ontology, and the attributes in the infobox are mapped to properties in the DBpedia ontology. This way, a unified view over all data in infoboxes can be obtained. Since this information conforms to Semantic Web standards, it can be queried and combined by a broad range of tools in a useful way. This increases the value of information entered by the Wikipedia community.

A listing of all classes, properties and datatypes (units of measurement) used by the DBpedia ontology is found below:

  • Ontology Classes (http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=200) - OWL classes and their definitions
  • Ontology Properties (http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=202) - OWL Object and Datatype properties
  • Datatypes (http://mappings.dbpedia.org/index.php?title=Special%3AAllPages&from=&to=&namespace=206)

How are the mappings and the ontology maintained?

So far, a few people inside the DBpedia project have maintained the mappings and the ontology, but in the spirit of open source projects, control will be handed over to the Wikipedia and DBpedia community. The members of the DBpedia team are not able to extend the mappings to cover all Wikipedia infoboxes and tables, due to the size of the task and the knowledge required to map templates from exotic domains. Therefore, the idea of this wiki is to enable the interested public to contribute to the definition of DBpedia mappings by updating existing mappings and by adding new mappings to this wiki.

Editor rights

This wiki is read-only. If you would like to edit the mappings or ontology schema, please register. The DBpedia maintainers will then give you editor rights (if this does not happen within a couple of days, please ask for editor rights at dbpedia-discussion@lists.sourceforge.net). Once you have editor rights, please provide some information about yourself on your user wiki page.

Tutorials

The specification of the DBpedia Mapping Language can be found at http://dbpedia.svn.sourceforge.net/viewvc/dbpedia/trunk/extraction/core/doc/mapping%20language/. Please find below step-by-step tutorials on:

  • How to edit the ontology schema
  • How to write
      • Template mappings
      • Table mappings
  • Best practice mapping guide

Tools

This wiki provides several tools that help you to edit the mappings and the ontology:

  • Ontology View. The ontology view gives you an overview of the current shape of the DBpedia ontology.
  • Mapping Validator. When you are editing a mapping, there is a validate button at the bottom of the page. Pressing the button validates your changes for syntactic correctness and highlights inconsistencies such as missing property definitions.
  • Extraction Tester. The extraction tester tests a mapping against a set of example Wikipedia pages. This gives you direct feedback about whether a mapping works and what the resulting data will look like.
  • MappingTool. The DBpedia MappingTool is a graphical user interface that helps users create and edit mappings.

Mappings for new languages

To create mappings for a new language, you first need to register (see above). The DBpedia maintainers also have to create a new language namespace in the framework and on the wiki.

Create new mappings

To get an idea of how mappings are written, you can look at some examples from the English mappings.

To create a new mapping, type the following line into your web browser:

http://mappings.dbpedia.org/index.php/Mapping_LANGUAGE:INFOBOXNAME
  • replace LANGUAGE with the language code you are currently working on (for example mt for Maltese)
  • replace INFOBOXNAME with the name of the infobox that you want to create a mapping for (replace spaces with underscores)

e.g.

http://mappings.dbpedia.org/index.php/Mapping_mt:Infobox_album

for the Album infobox on the Maltese Wikipedia.

If there is no mapping for this box yet, you will see a page saying "There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page." At the top, you can click on "create" and start writing the mapping.
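
The URL pattern described above can be sketched in a few lines of Python. The helper name `mapping_page_url` is hypothetical; only the base URL and the `Mapping_LANGUAGE:INFOBOXNAME` pattern come from the text.

```python
from urllib.parse import quote

# Base URL as given in the text above.
BASE_URL = "http://mappings.dbpedia.org/index.php/"


def mapping_page_url(language, infobox_name):
    """Build the wiki URL of a mapping page (hypothetical helper)."""
    # Spaces in the infobox name are replaced with underscores.
    title = infobox_name.strip().replace(" ", "_")
    # Percent-encode anything unsafe in a URL path; keep ':' readable.
    return BASE_URL + quote(f"Mapping_{language}:{title}", safe=":")


print(mapping_page_url("mt", "Infobox album"))
# prints http://mappings.dbpedia.org/index.php/Mapping_mt:Infobox_album
```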

Use new mappings in the extraction

Once there are mappings for a language, you can run the DBpedia extraction. Several things have to be installed and configured, as documented at http://wiki.dbpedia.org/Documentation

  • Section 1 describes what has to be installed to run the DBpedia extraction framework.
  • Section 4.1 lists everything that must be specified before starting the extraction from a dump file. In the file "dump/config.properties" (using the file "dump/config.properties.default" as a template), you can specify the languages for which you want to extract, and which extractors should be used. For example, to run the HomepageExtractor and the MappingExtractor for Maltese, specify
languages=mt
extractors.mt=org.dbpedia.extraction.mappings.HomepageExtractor \
              org.dbpedia.extraction.mappings.MappingExtractor
  • When you run the extraction (see Section 4.2), the MappingExtractor will extract the information from the infoboxes that you created a mapping for. The extracted triples will be saved in a file named "mappingbased_properties_mt.nt" (for Maltese) in the output directory you specified.
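
To get a quick impression of what ended up in such an output file, a short Python sketch like the following may help. It assumes the file contains plain N-Triples with URI subjects and predicates (blank-node subjects are not handled), and the function name `count_predicates` is hypothetical. The two sample lines are written out by hand for illustration, using the usual DBpedia and FOAF namespaces; they are not actual extraction output.

```python
import re
from collections import Counter

# Matches the simple "<subject> <predicate> object ." shape of an N-Triples line.
TRIPLE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def count_predicates(lines):
    """Count how often each predicate URI occurs in an iterable of N-Triples lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Hand-written sample lines, for illustration only.
sample = [
    '<http://dbpedia.org/resource/Vince_Vaughn> <http://xmlns.com/foaf/0.1/name> "Vince Vaughn"@en .',
    '<http://dbpedia.org/resource/Vince_Vaughn> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Minneapolis> .',
]
for predicate, n in count_predicates(sample).items():
    print(n, predicate)
```

To run it on a real extraction result, open the output file (for Maltese, "mappingbased_properties_mt.nt" in the output directory you specified) and pass the file object to `count_predicates` instead of the sample list.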