DBpedia Release Evaluation
Latest revision as of 16:56, 14 September 2011
The Quality Assessment Framework (QAF) (http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/raw-file/946fc9d51783/qualityAssessmentFramework/Kreis2011__Design_of_a_Quality_Assessment_Framework.pdf) was developed to document the quality of the knowledge base and the progress of DBpedia's extraction framework. Its main idea is to compare a manually created best-case dataset (the Gold Standard) with the output of DBpedia's ontology-based extraction. The QAF estimates the precision of the extraction framework and the completeness (recall) of DBpedia relative to its source, Wikipedia.
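The comparison described above can be illustrated with a minimal sketch. It assumes that triples are compared by exact match on (subject, property, value); the QAF's actual matching rules are not described on this page, and the function name and the toy triples below are hypothetical.

```python
from typing import Set, Tuple

# A triple is modelled as (subject, property, value); this is an assumption
# made for illustration, not the QAF's internal representation.
Triple = Tuple[str, str, str]


def precision_recall(gold: Set[Triple], extracted: Set[Triple]) -> Tuple[float, float]:
    """Compare extracted triples against a gold standard by exact match.

    precision = correct extracted triples / all extracted triples
    recall    = correct extracted triples / all gold triples
    A ratio with a zero denominator is returned as NaN.
    """
    correct = gold & extracted
    precision = len(correct) / len(extracted) if extracted else float("nan")
    recall = len(correct) / len(gold) if gold else float("nan")
    return precision, recall


# Hypothetical toy data, not taken from the Gold Standard.
gold = {
    ("Berlin", "country", "Germany"),
    ("Berlin", "populationTotal", "3450889"),
    ("Berlin", "leader", "Klaus Wowereit"),
    ("Berlin", "areaTotal", "891.85"),
}
extracted = {
    ("Berlin", "country", "Germany"),
    ("Berlin", "populationTotal", "3450889"),
    ("Berlin", "areaTotal", "891"),  # value differs from the gold triple
}

p, r = precision_recall(gold, extracted)
print(f"precision={p:.3f} recall={r:.3f}")
```

Here two of the three extracted triples match the gold data, and two of the four gold triples are found, so precision and recall measure different things: the first penalises wrong output, the second penalises missing output.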
Gold Standard & Mappings
For a meaningful evaluation, only potentially extractable triples are considered. A triple can be extracted only if it arises from a mapped property; such triples are called mapped triples here. The following table shows, for each category, the total number of triples in the Gold Standard, the number of mapped triples, and the percentage of mapped triples. The categories correspond to the different patterns in which information is given in Wikipedia infoboxes. The number of cases differs considerably between categories, so results for categories based on small numbers should be treated with care.
Mapped triples
Category | Gold Triples | DBpedia 3.5.1: Mapped Triples | DBpedia 3.5.1: % | DBpedia 3.6: Mapped Triples | DBpedia 3.6: % | DBpedia 3.7: Mapped Triples | DBpedia 3.7: % |
---|---|---|---|---|---|---|---|
Total | 3221 | 1504 | 46.7 | 1522 | 47.3 | 1732 | 53.8 |
Plain Property | 893 | 514 | 57.6 | 521 | 58.3 | 599 | 67.1 |
Number-Unit | 76 | 51 | 67.1 | 52 | 68.4 | 57 | 75.0 |
Coordinate | 54 | 36 | 66.7 | 39 | 72.2 | 48 | 88.9 |
Interval | 31 | 22 | 71.0 | 22 | 71.0 | 24 | 77.4 |
List | 801 | 478 | 59.7 | 482 | 60.2 | 560 | 69.9 |
One-Property-Table | 447 | 242 | 54.1 | 244 | 54.6 | 236 | 52.8 |
Multi-Property-Table | 625 | 83 | 13.3 | 83 | 13.3 | 109 | 17.4 |
Open Property | 139 | 13 | 9.4 | 13 | 9.4 | 16 | 11.5 |
Open Property Table | 26 | 0 | 0.0 | 0 | 0.0 | 0 | 0.0 |
Internal Template | 116 | 58 | 50.0 | 59 | 50.9 | 76 | 65.5 |
Merged Properties | 13 | 7 | 53.8 | 7 | 53.8 | 7 | 53.8 |
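The percentage column is the number of mapped triples divided by the number of Gold Standard triples. As a small sanity check, the sketch below recomputes it for the DBpedia 3.7 column from the counts in the table above; the variable and function names are illustrative only.

```python
# (gold triples, mapped triples) per category for DBpedia 3.7,
# copied from the table above.
counts_37 = {
    "Plain Property": (893, 599),
    "Number-Unit": (76, 57),
    "Coordinate": (54, 48),
    "Interval": (31, 24),
    "List": (801, 560),
    "One-Property-Table": (447, 236),
    "Multi-Property-Table": (625, 109),
    "Open Property": (139, 16),
    "Open Property Table": (26, 0),
    "Internal Template": (116, 76),
    "Merged Properties": (13, 7),
}


def mapped_share(gold: int, mapped: int) -> float:
    """Percentage of gold triples that stem from a mapped property."""
    return round(100.0 * mapped / gold, 1)


# The per-category counts should add up to the "Total" row.
total_gold = sum(gold for gold, _ in counts_37.values())
total_mapped = sum(mapped for _, mapped in counts_37.values())

for category, (gold, mapped) in counts_37.items():
    print(f"{category:22s} {mapped:4d} / {gold:4d} = {mapped_share(gold, mapped):5.1f} %")
print(f"{'Total':22s} {total_mapped:4d} / {total_gold:4d} = "
      f"{mapped_share(total_gold, total_mapped):5.1f} %")
```

The category counts sum to the totals given in the table (3221 gold triples, 1732 mapped triples for 3.7). The small denominators for categories such as Merged Properties (13) or Open Property Table (26) show why a single triple can shift those percentages by several points.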
Evaluation Results
The following two tables show the completeness (recall) and the precision of the different release versions. To keep the releases comparable and to remove the effect of more and better mappings, the mappings of the 3.5.1 release are held constant across all three releases.
Completeness (recall)
Category | DBpedia 3.5.1 | DBpedia 3.6 | DBpedia 3.7 |
---|---|---|---|
Total | 45.7% | 60.2% | 61.8% |
Plain Property | 80.5% | 83.7% | 86.0% |
Number-Unit | 68.6% | 68.6% | 66.7% |
Coordinate | 100.0% | 100.0% | 100.0% |
Interval | 72.7% | 68.2% | 72.7% |
List | 33.9% | 75.7% | 77.8% |
One-Property-Table | 5.4% | 5.8% | 5.8% |
Multi-Property-Table | 0.0% | 0.0% | 0.0% |
Open Property | 23.1% | 23.1% | 30.8% |
Open Property Table | na | na | na |
Internal Template | 8.6% | 10.3% | 10.3% |
Merged Properties | 57.1% | 57.1% | 71.4% |
Precision
Category | DBpedia 3.5.1 | DBpedia 3.6 | DBpedia 3.7 |
---|---|---|---|
Total | 91.2% | 92.3% | 92.4% |
Plain Property | 96.3% | 96.6% | 97.4% |
Number-Unit | 85.4% | 85.4% | 85.0% |
Coordinate | 100.0% | 100.0% | 100.0% |
Interval | 100.0% | 100.0% | 88.9% |
List | 91.5% | 92.8% | 93.2% |
One-Property-Table | 32.5% | 36.8% | 34.1% |
Multi-Property-Table | na | na | na |
Open Property | 100.0% | 100.0% | 100.0% |
Open Property Table | na | na | na |
Internal Template | 83.3% | 75.0% | 75.0% |
Merged Properties | 80.0% | 80.0% | 100.0% |