Mapping statistics

From DBpedia Mappings

Latest revision as of 01:58, 16 May 2012

DBpedia Mapping Statistics

The statistics give you an overview of the infoboxes and properties that are already mapped. To help you spend your "mapping time" efficiently, they show which infoboxes deserve most of your attention. The statistics are calculated live from the current mappings, so you can see the effect of your changes immediately. The underlying counts are extracted from Wikipedia dumps from time to time.

Please see the statistics overview page (http://mappings.dbpedia.org/server/statistics/) for links to the statistics for each language.

For each language you'll find three percentages at the top of the page. To explain them, we look at the English mapping statistics:

3.94 % templates are mapped ( 285 of 7225 ).
80.73 % of all template occurrences in Wikipedia ( en ) are mapped ( 1695763 of 2100472 ).
49.23 % of all property occurrences in Wikipedia ( en ) are mapped ( 16090379 of 32686631 ).
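Each of the three percentages is a plain ratio of the two counts in parentheses. A minimal Python sketch that recomputes them from those counts (the helper name `percent` is our own and not part of the DBpedia tools):

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100 * part / total, 2)


# Counts copied from the English statistics shown above: (mapped, total).
HEADLINE = {
    "templates": (285, 7225),
    "template occurrences": (1695763, 2100472),
    "property occurrences": (16090379, 32686631),
}

for name, (mapped, total) in HEADLINE.items():
    print(f"{percent(mapped, total):.2f} % of {name} are mapped ({mapped} of {total})")
```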

In the first line we can see that 3.94 % of the templates in the English Wikipedia are mapped. This percentage should be interpreted with care: the English Wikipedia of course contains more than 7225 templates, but only these 7225 have multiple properties and therefore meet our criterion for a potential infobox. Because this criterion is so loose, the statistics also include irrelevant templates such as Unreferenced or Rail line. These are not classical infoboxes and should not affect the statistics, so they can be put on an ignore list. A template on the ignore list does not count toward the number of potential infoboxes. If you want templates to be ignored, send a mail with the template names to the author of this page (User:Kreis). Very active contributors to the mappings wiki can be shown how to add templates to the ignore list themselves.
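The effect of the ignore list on the first percentage can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the statistics server's code: "multiple properties" is read as two or more, the `Template` class and both function names are hypothetical, and "Infobox A" and "Infobox B" as well as all property counts and mapping flags are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    num_properties: int
    mapped: bool


def potential_infoboxes(templates, ignore_list):
    # Assumption: "multiple properties" means two or more.
    # Templates on the ignore list never count as potential infoboxes.
    return [t for t in templates if t.num_properties >= 2 and t.name not in ignore_list]


def mapped_template_percentage(templates, ignore_list):
    candidates = potential_infoboxes(templates, ignore_list)
    if not candidates:
        return 0.0
    mapped = sum(1 for t in candidates if t.mapped)
    return 100 * mapped / len(candidates)


# Hypothetical sample data: property counts and mapping flags are illustrative only.
templates = [
    Template("Infobox A", 100, True),
    Template("Infobox B", 20, False),
    Template("Unreferenced", 2, False),
    Template("Rail line", 3, False),
]

print(mapped_template_percentage(templates, ignore_list=set()))  # 25.0
print(mapped_template_percentage(templates, ignore_list={"Unreferenced", "Rail line"}))  # 50.0
```

Ignoring the two non-infobox templates shrinks the denominator, which is why the percentage rises without any new mapping being written.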

The second line shows the mapped template occurrences: 80.73 % of all template occurrences are already mapped. In other words, the 3.94 % of templates that are mapped account for 80.73 % of all template uses in the English Wikipedia. To understand this relation, take a look at Infobox settlement. It occurs 226467 times in the English Wikipedia, which is more than 10 % of all template occurrences, so writing the mapping for this one infobox was very effective.
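The Infobox settlement figures can be checked with a short calculation. The second value rests on an assumption the paragraph only implies: that all 226467 Infobox settlement occurrences are included in the 1695763 mapped template occurrences. Under that assumption it estimates what the second percentage would be without this single mapping.

```python
TOTAL_TEMPLATE_OCCURRENCES = 2100472   # second line of the English statistics
MAPPED_TEMPLATE_OCCURRENCES = 1695763  # second line of the English statistics
SETTLEMENT_OCCURRENCES = 226467        # occurrences of Infobox settlement

share = 100 * SETTLEMENT_OCCURRENCES / TOTAL_TEMPLATE_OCCURRENCES
print(f"Infobox settlement: {share:.1f} % of all template occurrences")  # 10.8 %

# Assumption: every Infobox settlement occurrence is part of the mapped count.
without = 100 * (MAPPED_TEMPLATE_OCCURRENCES - SETTLEMENT_OCCURRENCES) / TOTAL_TEMPLATE_OCCURRENCES
print(f"mapped template occurrences without it: {without:.2f} %")  # 69.95 %
```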

The third line shows that 49.23 % of all property occurrences in the English Wikipedia are mapped. This is the most interesting percentage, because it also reflects how completely each template's properties are mapped. Imagine a template mapping for Infobox person in which only the name property is mapped to the ontology: every occurrence of Infobox person would count as a mapped template occurrence, but only its name values would count as mapped property occurrences.
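A toy example makes the gap between the second and third percentages concrete. The four occurrences below are hypothetical (the property names are typical Infobox person parameters, chosen only for illustration), and the mapping covers only `name`:

```python
# Hypothetical occurrences of Infobox person: each set lists the properties
# that have a value in one article.
occurrences = [
    {"name", "birth_date", "birth_place", "occupation"},
    {"name", "birth_date", "death_date", "nationality"},
    {"name", "birth_date"},
    {"name", "occupation", "spouse", "children", "nationality", "death_date"},
]

mapped_properties = {"name"}  # a mapping that covers only the name property

# Every occurrence counts as a mapped template occurrence, since the template has a mapping.
mapped_template_occurrences = len(occurrences)
property_occurrences = sum(len(p) for p in occurrences)
mapped_property_occurrences = sum(len(p & mapped_properties) for p in occurrences)

template_pct = 100 * mapped_template_occurrences / len(occurrences)
property_pct = 100 * mapped_property_occurrences / property_occurrences
print(f"template occurrences mapped: {template_pct:.0f} %")  # 100 %
print(f"property occurrences mapped: {property_pct:.1f} %")  # 25.0 %
```

The template-occurrence figure looks perfect, while the property-occurrence figure exposes that three quarters of the filled-in values are still unmapped.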


Below these percentages you see a table of all templates, ordered by their number of occurrences. The columns are:

  1. The first column shows the number of occurrences of the template.
  2. The second column holds the name of the template, with a link to detailed property statistics.
  3. The "Edit" link in the third column leads directly to the mapping of the infobox.
  4. The fourth column (num properties) holds the number of properties of the template. Clicking a template name shows these properties, also ordered by their occurrence, so the properties at the top are the best ones to map.
  5. The fifth column (mapped properties (%)) shows the percentage of the template's properties that are mapped.
  6. The sixth column (num property occurrences) shows the number of all property occurrences of the template in Wikipedia. For a hypothetical template that has 5 properties and occurs 10 times, the number would be 50. A property only counts here if it has a value: if the template occurs 10 times but in one occurrence only 3 of its properties have a value, the number is 48.
  7. The seventh column (mapped property occurrences (%)) contains the percentage of the template's property occurrences that are mapped. This percentage represents the completeness of the mapping and therefore determines the colour of the row.
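The per-template columns can be expressed as a small Python function. This is a sketch under assumptions, not the server's implementation: the template's properties are taken to be all property names seen in any occurrence, a property only counts as occurring when its value is non-empty, and the function name `template_row` and the property names `p1` to `p5` are hypothetical. The sample data reproduces the hypothetical template from the sixth column: 5 properties, 10 occurrences, one occurrence in which only 3 properties have a value.

```python
from collections import Counter


def template_row(occurrences, mapped_properties):
    """Compute columns 4 to 7 of the statistics table for one template.

    occurrences: one dict per template occurrence, property name -> value.
    mapped_properties: set of property names covered by the mapping.
    """
    # Assumption: the template's properties are all names seen in any occurrence.
    properties = {prop for occ in occurrences for prop in occ}
    # Only properties with a non-empty value count as property occurrences.
    per_property = Counter(
        prop for occ in occurrences for prop, value in occ.items() if value.strip()
    )
    num_property_occurrences = sum(per_property.values())
    mapped_occurrences = sum(
        count for prop, count in per_property.items() if prop in mapped_properties
    )
    return {
        "num properties": len(properties),
        "mapped properties (%)": 100 * len(properties & mapped_properties) / len(properties),
        "num property occurrences": num_property_occurrences,
        "mapped property occurrences (%)": 100 * mapped_occurrences / num_property_occurrences,
        # Properties ordered by occurrence, most frequent first.
        "properties by occurrence": [prop for prop, _ in per_property.most_common()],
    }


# Hypothetical template: 5 properties, 10 occurrences, one with only 3 values.
props = ["p1", "p2", "p3", "p4", "p5"]
occurrences = [{p: "x" for p in props} for _ in range(9)]
occurrences.append({"p1": "x", "p2": "x", "p3": "x", "p4": "", "p5": ""})

row = template_row(occurrences, mapped_properties={"p1", "p2", "p3"})
for column, value in row.items():
    print(f"{column}: {value}")
```

With these inputs the sketch yields 48 property occurrences, of which 62.5 % are mapped. The row colour is left out because the page does not state how the seventh column translates into a colour.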

DBpedia Mapping Creation Sprint/Race

Check this page to see where your language stands in the race for excellence: http://mappings.dbpedia.org/sprint/