Latest revision as of 14:43, 10 July 2019
DBpedia Mappings Wiki
In this DBpedia Mappings Wiki you can help to enhance the information in DBpedia. The DBpedia Extraction Framework uses the mappings defined here to homogenize information extracted from Wikipedia before generating structured information in RDF.
Anybody can help by editing:
- the DBpedia ontology schema (classes, properties, datatypes)
- the DBpedia infobox-to-ontology mappings
Mappings can be written for a variety of languages, connecting multilingual information to a language-independent unified ontology schema (language-specific labels can be provided there).
Mapping Example
This is how you write a simple infobox mapping.
Mapping:Infobox_actor
{{TemplateMapping
| mapToClass = Actor
| mappings =
    {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }}
    {{ PropertyMapping | templateProperty = birth_place | ontologyProperty = birthPlace }}
}}
This mapping extracts three pieces of information:
- the type information (Actor)
- the name of the actor
- the actor's place of birth.
Therefore, three RDF triples are extracted for each Infobox_actor in the English Wikipedia. For example, for Vince Vaughn:
dbpedia:Vince_Vaughn rdf:type dbpedia-owl:Actor .
dbpedia:Vince_Vaughn foaf:name "Vince Vaughn"@en .
dbpedia:Vince_Vaughn dbpedia-owl:birthPlace dbpedia:Minneapolis .
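To make the relation between the mapping and the three triples concrete, here is a minimal Python sketch. It is not the DBpedia Extraction Framework: the dictionary layout, the names ACTOR_MAPPING and apply_mapping, and the already-parsed infobox values are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, simplified stand-in for Mapping:Infobox_actor.
ACTOR_MAPPING = {
    "mapToClass": "dbpedia-owl:Actor",
    "properties": {
        "name": "foaf:name",
        "birth_place": "dbpedia-owl:birthPlace",
    },
}


def apply_mapping(subject: str, infobox: Dict[str, str], mapping: dict) -> List[Triple]:
    """Return one type triple plus one triple per mapped infobox property."""
    triples: List[Triple] = [(subject, "rdf:type", mapping["mapToClass"])]
    for template_prop, ontology_prop in mapping["properties"].items():
        # Infobox properties without a PropertyMapping are simply ignored.
        if template_prop in infobox:
            triples.append((subject, ontology_prop, infobox[template_prop]))
    return triples


# Assumed pre-parsed infobox values; "occupation" has no mapping and is skipped.
infobox = {
    "name": '"Vince Vaughn"@en',
    "birth_place": "dbpedia:Minneapolis",
    "occupation": "Actor, comedian",
}
triples = apply_mapping("dbpedia:Vince_Vaughn", infobox, ACTOR_MAPPING)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

The point of the sketch is that the number of triples follows directly from the mapping: one for mapToClass and one for each PropertyMapping whose templateProperty is present in the infobox.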
Detailed Information
- Check the Mapping Guide, which defines best practices for writing clean, efficient mappings that extract lots of high-quality data.
- Take a look at the Mapping Statistics to search for relevant infoboxes to map.
- How to edit the DBpedia ontology
- How to edit infobox and table mappings
- Use the DBpedia Extraction Framework to extract structured data
Prerequisites
If you would like to edit the mappings or ontology schema this is what you need:
- a user account on this wiki (login/sign up)
- editor rights; to apply:
  - register at http://forum.dbpedia.org
  - ask for editor rights in the forum thread https://forum.dbpedia.org/t/mappings-wiki-accounts/38. Include your user name and a short introduction of yourself in the message.
- a namespace for the language you want to write mappings for
  - if the namespace does not exist yet (see the left sidebar), please request it at dbpedia-discussion@lists.sourceforge.net
- a GitHub account, if you will contribute frequently (see below)
Editorial Process
A significant quality problem until 2015 was that there was neither bug tracking nor discussion of the best approaches. A major strength of Wikipedia and Wikidata is that editors are in constant discussion and there are established editorial processes. These were missing on this mapping wiki, and it is our collective task to rectify the situation. If you find a problem:
- Post a new issue to one of the following trackers, depending on the nature of the issue:
  - Mapping: https://github.com/dbpedia/mappings-tracker/issues
  - Ontology: https://github.com/dbpedia/ontology-tracker/issues
  - Extraction framework: https://github.com/dbpedia/extraction-framework/issues
- Edit the corresponding Discussion page (of the mapping or ontology element):
  - Describe the problem in detail. The reason to do it here and not on GitHub is so that we have most of the information in one place
  - Provide a link to the issue
  - Propose a solution if you'd like
Best Practices
If you write a best practice, list it here:
- Mapping Guide (thorough)
- Adding a Mapping (shorter)
- Main Page#Editorial Process
- Main Page#Testing Best Practices
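Focused investigations of large-scale problems that require discussion, fixes to many properties/templates, or documentation of a pattern:
- What's in a Name
- Connecting Places (issue #29, https://github.com/dbpedia/mappings-tracker/issues/29)
- Agent Relations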
Testing Best Practices
Whenever we find or fix a problem, we should have some test cases for it. This serves many important purposes:
- to illustrate the problem
- as proof it works after the problem is fixed
- to provide test cases for any bugs in the extraction framework (upstream bug reporting)
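Every infobox mapping has a "test this mapping" link, e.g.
- http://mappings.dbpedia.org/server/mappings/fr/extractionSamples/Mapping_fr:Infobox_Ville_de_Serbie
Unfortunately this works mostly for the English DBpedia (see bug #289). But you can still test per resource, e.g.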
- http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples
- http://mappings.dbpedia.org/server/extraction/bg/extract?title=Лили+Иванова&revid=&format=turtle-triples
This is even better because it provides specific test cases. Also provide a link to the corresponding wiki pages in edit mode, so the markup can be seen immediately. Add these to the mapping's Discussion page.
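For example, on Mapping fr talk:Infobox Ville de Serbie we have:
- Testing:
  - page: https://fr.wikipedia.org/w/index.php?title=Požega_(Serbie)&action=edit
  - result: http://mappings.dbpedia.org/server/extraction/fr/extract?title=Požega_(Serbie)&revid=&format=turtle-triples
We've asked the developers to add UTF-8 encoding (#304), which will make it easier to inspect the output. Otherwise you need to save it to a file and open it in a proper editor.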
Custom or Default Extractor
The above URLs use the default extractor, which extracts only labels and mappings. This is probably what you need for testing, since you're debugging the mapped triples, right? If you want to see more triples, add "&extractors=custom" to the URL. This runs all available extractors. But there is a limit in the extraction samples (1000 triples?) so for big articles this may not return all expected triples.
Let's illustrate with Elvis Presley: the custom extractors return 921 triples and the default extractor returns 118 triples, so the limit is not reached in this case.
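The per-resource test URLs follow a regular pattern, so they can be assembled with a small helper. The following Python sketch only builds the string and fetches nothing. The function name extraction_url, its parameters, and the replacement of spaces by underscores are assumptions for illustration; the URL pattern itself is copied from the examples above.

```python
BASE = "http://mappings.dbpedia.org/server/extraction"


def extraction_url(lang: str, title: str, custom: bool = False,
                   fmt: str = "turtle-triples") -> str:
    """Build a per-resource test URL, keeping the title as a readable IRI."""
    # Assumption: spaces in page titles are written as underscores.
    page = title.replace(" ", "_")
    url = f"{BASE}/{lang}/extract?title={page}&revid=&format={fmt}"
    if custom:
        # Run all available extractors instead of only labels and mappings.
        url += "&extractors=custom"
    return url


default_url = extraction_url("en", "Elvis Presley")
custom_url = extraction_url("en", "Elvis Presley", custom=True)
print(default_url)
print(custom_url)
```

The title is deliberately not percent-encoded, so the result stays a readable IRI that can be pasted into a Discussion page.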
Copy IRIs not URL-encoded
The URLs above contain non-ASCII characters, so they are Internationalized Resource Identifiers (IRIs). These are readable and let a user see what they represent. But when you copy one from the browser's address bar, the IRI is URL-encoded into an unreadable form like:
- http://mappings.dbpedia.org/server/extraction/fr/extract?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples
- http://mappings.dbpedia.org/server/extraction/bg/extract?title=%D0%9B%D0%B8%D0%BB%D0%B8+%D0%98%D0%B2%D0%B0%D0%BD%D0%BE%D0%B2%D0%B0&revid=&format=turtle-triples
Browsers do that for historical reasons. Please be kind to your fellow editors and use an addon that preserves IRIs, e.g.:
- Chrome addon: Copy URL (https://chrome.google.com/webstore/detail/copy-url/mkhnbhdofgaendegcgbmndipmijhbili)
If you don't have one, you can use this trick:
- Copy everything but the first letter "m"
- Paste, then add the missing letter "m" (or "http://m").
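If you only have the URL-encoded form, it can also be turned back into a readable IRI programmatically. This is a minimal Python sketch using the standard library's urllib.parse.unquote; the helper name to_iri is an assumption. Note that unquote decodes every percent-escape, including reserved characters such as an encoded "&", so check the result when a title contains such characters.

```python
from urllib.parse import unquote


def to_iri(url: str) -> str:
    """Decode percent-escapes (as UTF-8) so non-ASCII characters are readable."""
    return unquote(url)


encoded_fr = ("http://mappings.dbpedia.org/server/extraction/fr/extract"
              "?title=Po%C5%BEega_(Serbie)&revid=&format=turtle-triples")
encoded_bg = ("http://mappings.dbpedia.org/server/extraction/bg/extract"
              "?title=%D0%9B%D0%B8%D0%BB%D0%B8+%D0%98%D0%B2%D0%B0%D0%BD%D0%BE%D0%B2%D0%B0"
              "&revid=&format=turtle-triples")

iri_fr = to_iri(encoded_fr)
iri_bg = to_iri(encoded_bg)
print(iri_fr)
print(iri_bg)
```

The "+" in the Bulgarian title is left untouched, which matches the readable form of that URL used earlier on this page.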
Domain Validation
The Domain Validation service (http://mappings.dbpedia.org/validation/index.html) generates a list of domain exceptions, updating it daily. For more information please refer to A. Dimou, D. Kontokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens, S. Hellmann, and R. Van de Walle. Assessing and refining mappings to RDF to improve dataset quality. In Proceedings of the 14th International Semantic Web Conference, Oct. 2015.
For each predicate used in a mapping, it shows the expected domain class (defined for the predicate) and existing class (corresponding to that mapping). Please filter for your language (the first column) and correct as many errors as you can:
- Make the existing class into a subclass of expected, OR
- Correct (usually raise) the domain of predicate, OR
- Correct the mapping to use the expected mapToClass
In all cases, document the property according to the changes you made! You can see some examples of such changes in this list of contributions.
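The check behind a domain exception can be sketched in a few lines of Python. This is not the validation service's implementation: the class hierarchy, the predicate domains, and the function names below are toy assumptions chosen only to show how an expected class is compared with an existing class.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical single-parent class hierarchy (child -> parent).
PARENTS: Dict[str, Optional[str]] = {
    "Actor": "Artist",
    "Artist": "Person",
    "Person": "Agent",
    "Agent": None,
    "Place": None,
}


def is_subclass_of(cls: str, ancestor: str, parents: Dict[str, Optional[str]]) -> bool:
    """True if cls equals ancestor or reaches it by following parent links."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def domain_exceptions(existing: str, predicate_domains: Dict[str, str],
                      parents: Dict[str, Optional[str]]) -> List[Tuple[str, str, str]]:
    """List (predicate, expected, existing) where existing is not within expected."""
    return [
        (predicate, expected, existing)
        for predicate, expected in predicate_domains.items()
        if not is_subclass_of(existing, expected, parents)
    ]


# Assumed domains for two predicates used in a mapping with mapToClass = Actor.
used_predicates = {"birthPlace": "Person", "elevation": "Place"}
exceptions = domain_exceptions("Actor", used_predicates, PARENTS)
print(exceptions)
```

In this toy data, birthPlace passes because Actor is a descendant of Person, while elevation is reported. Each of the three fixes listed above makes the comparison succeed in a different way: by changing the hierarchy, the predicate's domain, or the mapping's class.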
That's it!
That is all you need to get started. Your contributions will be available:
- in the DBpedia Live endpoint shortly after your edit (currently only for English)
- in the next DBpedia datasets release
Happy mapping!
About DBpedia
To learn more about DBpedia itself visit http://dbpedia.org/About.