Mapping statistics

From DBpedia Mappings
Revision as of 16:58, 26 July 2011

DBpedia Mapping Statistics

The statistics give you an overview of the infoboxes that are already mapped and of their properties. To help you spend your "mapping time" efficiently, they show which infoboxes deserve most of your attention.

Statistics are available for the following languages:

  * Catalan (ca): http://mappings.dbpedia.org/server/statistics/ca/
  * German (de): http://mappings.dbpedia.org/server/statistics/de/
  * Greek (el): http://mappings.dbpedia.org/server/statistics/el/
  * English (en): http://mappings.dbpedia.org/server/statistics/en/
  * Spanish (es): http://mappings.dbpedia.org/server/statistics/es/
  * French (fr): http://mappings.dbpedia.org/server/statistics/fr/
  * Irish (ga): http://mappings.dbpedia.org/server/statistics/ga/
  * Croatian (hr): http://mappings.dbpedia.org/server/statistics/hr/
  * Hungarian (hu): http://mappings.dbpedia.org/server/statistics/hu/
  * Italian (it): http://mappings.dbpedia.org/server/statistics/it/
  * Dutch (nl): http://mappings.dbpedia.org/server/statistics/nl/
  * Polish (pl): http://mappings.dbpedia.org/server/statistics/pl/
  * Portuguese (pt): http://mappings.dbpedia.org/server/statistics/pt/
  * Russian (ru): http://mappings.dbpedia.org/server/statistics/ru/
  * Slovene (sl): http://mappings.dbpedia.org/server/statistics/sl/
  * Turkish (tr): http://mappings.dbpedia.org/server/statistics/tr/

For each language you'll find three percentages at the top of the page. To explain them, we look at the English mapping statistics:

3.94 % templates are mapped ( 285 of 7225 ).
80.73 % of all template occurrences in Wikipedia ( en ) are mapped ( 1695763 of 2100472 ).
49.23 % of all property occurrences in Wikipedia ( en ) are mapped ( 16090379 of 32686631 ).
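The three headline percentages follow directly from the raw counts shown in parentheses. A minimal Python sketch reproduces them; the `percentage` helper and the rounding to two decimals are assumptions chosen to match the displayed values, not the statistics server's actual code.

```python
def percentage(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)

# Raw counts from the English statistics (revision of 26 July 2011).
mapped_templates = percentage(285, 7225)
mapped_template_occurrences = percentage(1695763, 2100472)
mapped_property_occurrences = percentage(16090379, 32686631)

print(mapped_templates, mapped_template_occurrences, mapped_property_occurrences)
# prints: 3.94 80.73 49.23
```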

The first line shows that 3.94 % of the templates in the English Wikipedia are mapped. This percentage should be interpreted with care: the English Wikipedia of course contains more than 7225 templates, but the 7225 counted here are those that have multiple properties and therefore meet our criterion for a potential infobox. Because this criterion is so loose, the statistics also include irrelevant templates such as Unreferenced or Rail line. These are not classical infoboxes and should not be part of the statistics, so they can be ignored: a template on the ignore list does not count toward the number of potential infoboxes.
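The effect of the ignore list on the first percentage can be sketched as follows. The function name, the toy template list, and the assumption that ignored templates are simply dropped from the denominator are illustrative only, not taken from the DBpedia code.

```python
def mapped_template_percentage(candidates, mapped, ignored):
    """Percentage of potential infoboxes that have a mapping.

    Hypothetical model: templates on the ignore list are removed from
    the candidates before the percentage is computed.
    """
    counted = [t for t in candidates if t not in ignored]
    if not counted:
        return 0.0
    n_mapped = sum(1 for t in counted if t in mapped)
    return 100 * n_mapped / len(counted)

# Toy data: four templates with multiple properties, one of them mapped.
candidates = ["Infobox settlement", "Infobox person", "Unreferenced", "Rail line"]
mapped = {"Infobox settlement"}

print(mapped_template_percentage(candidates, mapped, ignored=set()))                         # 25.0
print(mapped_template_percentage(candidates, mapped, ignored={"Unreferenced", "Rail line"}))  # 50.0
```

Ignoring the two non-infobox templates shrinks the denominator, so the same single mapping counts for a larger share.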

The second line shows the mapped template occurrences: 80.73 % of all template occurrences are already mapped. In other words, the 3.94 % of templates that are mapped account for 80.73 % of all template uses in the English Wikipedia. Infobox settlement illustrates this relation: it occurs 226467 times in the English Wikipedia, which is more than 10 % of all template occurrences, so writing the mapping for this single infobox was very effective.
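The Infobox settlement figure can be checked with a little arithmetic on the counts quoted above. The second calculation assumes that all 226467 occurrences are among the 1695763 mapped template occurrences, which the text implies but does not state explicitly.

```python
settlement_occurrences = 226467   # Infobox settlement, English Wikipedia
mapped_occurrences = 1695763      # mapped template occurrences (second line)
all_occurrences = 2100472         # all template occurrences (second line)

# Share of all template occurrences that are uses of Infobox settlement.
share = 100 * settlement_occurrences / all_occurrences
# Occurrence coverage if Infobox settlement were not mapped (assumption: all its uses are counted as mapped).
without_settlement = 100 * (mapped_occurrences - settlement_occurrences) / all_occurrences

print(f"Infobox settlement: {share:.2f} % of all template occurrences")
print(f"Coverage without it: {without_settlement:.2f} %")
```

Under that assumption, this one mapping contributes roughly 10.8 percentage points to the 80.73 %.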

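The third line shows that 49.23 % of all property occurrences in the English Wikipedia are mapped. This is the most interesting percentage, because it also reflects how complete the mappings are at the property level. Imagine a template mapping for Infobox person in which only the name property is mapped to the ontology: every use of the infobox still counts as a mapped template occurrence in the second line, but only its name values count as mapped property occurrences in the third.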
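To make the difference between the second and the third percentage concrete, here is a toy calculation for such a name-only Infobox person mapping. The occurrence data is invented, and counting only properties that carry a value is an assumption borrowed from the description of the per-template columns.

```python
# Toy data: two invented uses of Infobox person (property name -> value).
occurrences = [
    {"name": "Person A", "birth_date": "1900-01-01", "occupation": "Engineer"},
    {"name": "Person B", "birth_date": "1910-05-05", "death_date": "1980-03-03", "occupation": ""},
]
mapped_properties = {"name"}  # the mapping covers only `name`

def property_occurrence_counts(occurrences, mapped_properties):
    """Return (mapped, total) property occurrences, counting only properties with a value."""
    valued = [prop for occ in occurrences for prop, value in occ.items() if value.strip()]
    mapped = sum(1 for prop in valued if prop in mapped_properties)
    return mapped, len(valued)

mapped, total = property_occurrence_counts(occurrences, mapped_properties)
# Both uses count as mapped template occurrences, yet only the name values are mapped properties.
print(f"{mapped} of {total} property occurrences mapped ({100 * mapped / total:.1f} %)")
```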


Below the summary statistics, a table lists all templates ordered by their number of occurrences. The columns are:

  1. Template occurrences: the number of times the template is used in Wikipedia.
  2. The name of the template, with a link to detailed property statistics.
  3. An "Edit" link that takes you directly to the mapping of the infobox.
  4. Num properties: the number of properties of the template. Clicking a template name shows its properties, also ordered by their occurrences; the properties at the top of that list are therefore the most worthwhile to map.
  5. Mapped properties (%): the percentage of the template's properties that are mapped.
  6. Num property occurrences: the number of template properties that occur in Wikipedia. Only properties with a value are counted. For a hypothetical template with 5 properties that occurs 10 times in Wikipedia, the number is 50. If one of those 10 occurrences has values for only 3 of its 5 properties, the number is 48.
  7. Mapped property occurrences (%): the percentage of this template's property occurrences that are mapped. This percentage represents the completeness of the mapping and therefore determines the colour of the row.
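Columns 4 to 7 can be derived from a template's property list, its set of mapped properties, and its occurrences. The sketch below is a hypothetical Python model, not the statistics server's implementation; the names `TemplateStats` and `template_stats` are invented for illustration. It reproduces the worked example of a 5-property template used 10 times where one occurrence has values for only 3 properties (9 × 5 + 3 = 48). The colour thresholds for the rows are not documented, so they are left out.

```python
from dataclasses import dataclass

@dataclass
class TemplateStats:
    num_properties: int                      # column 4
    mapped_properties_pct: float             # column 5
    num_property_occurrences: int            # column 6
    mapped_property_occurrences_pct: float   # column 7

def template_stats(properties, mapped, occurrences):
    """properties: property names of the template; mapped: set of mapped names;
    occurrences: one dict (property -> value) per use of the template."""
    # Only properties that carry a non-empty value count as property occurrences.
    valued = [p for occ in occurrences for p, v in occ.items() if v.strip()]
    mapped_valued = sum(1 for p in valued if p in mapped)
    n_mapped_props = sum(1 for p in properties if p in mapped)
    return TemplateStats(
        num_properties=len(properties),
        mapped_properties_pct=100 * n_mapped_props / len(properties) if properties else 0.0,
        num_property_occurrences=len(valued),
        mapped_property_occurrences_pct=100 * mapped_valued / len(valued) if valued else 0.0,
    )

# Hypothetical template: 5 properties, used 10 times, one use with only 3 values.
props = ["p1", "p2", "p3", "p4", "p5"]
full = {p: "x" for p in props}
partial = {"p1": "x", "p2": "x", "p3": "x", "p4": "", "p5": ""}
stats = template_stats(props, {"p1", "p2"}, [full] * 9 + [partial])
print(stats)
```

With `p1` and `p2` mapped, 2 of 5 properties (40 %) are mapped, and 20 of the 48 property occurrences are mapped.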