DBpedia Release Evaluation: Difference between revisions

From DBpedia Mappings
(10 intermediate revisions by the same user not shown)
Line 1: Line 1:
The [http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/raw-file/946fc9d51783/qualityAssessmentFramework/Kreis2011__Design_of_a_Quality_Assessment_Framework.pdf Quality Assessment Framework (QAF)] was developed to document the quality of the knowledge base and, beyond that, the progress of DBpedia's extraction framework. Its main idea is to compare a manually created best-case dataset (the Gold Standard) with the output of DBpedia's ontology-based extraction. From this comparison the QAF estimates the precision of the extraction framework and the completeness (recall) of DBpedia relative to its source, Wikipedia.
The [http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/dbpedia/raw-file/946fc9d51783/qualityAssessmentFramework/Kreis2011__Design_of_a_Quality_Assessment_Framework.pdf Quality Assessment Framework (QAF)] was developed to document the quality of the knowledge base and, beyond that, the progress of DBpedia's extraction framework. Its main idea is to compare a manually created best-case dataset (the Gold Standard) with the output of DBpedia's ontology-based extraction. From this comparison the QAF estimates the precision of the extraction framework and the completeness (recall) of DBpedia relative to its source, Wikipedia.


== Sample Data / Gold Standard ==
== Gold Standard & Mappings ==


For a meaningful evaluation, only potentially extractable triples are considered. A triple can be extracted only if it arises from a mapped property; such triples are called mapped triples here.
For a meaningful evaluation, only potentially extractable triples are considered. A triple can be extracted only if it arises from a mapped property; such triples are called mapped triples here.
The following table shows the number of mapped triples, the total number of triples in the Gold Standard, and the percentage of mapped triples for each category. The categories correspond to the different patterns in which information is given in Wikipedia infoboxes. The number of cases differs greatly between categories, so results for categories based on small numbers should be treated with care.
The following table shows the total number of triples in the Gold Standard, the number of mapped triples, and the percentage of mapped triples for each category. The categories correspond to the different patterns in which information is given in Wikipedia infoboxes. The number of cases differs greatly between categories, so results for categories based on small numbers should be treated with care.


{| class="wikitable"
{| class="wikitable"
|+ Mapped triples (Mappings from 3.5.1 DBpedia release)
|+ Mapped triples
|-
|-
! Category
! rowspan="3" | Category
! rowspan="3" | Gold Triples
! colspan="6" | DBpedia Release
|-
! colspan="2" | DBpedia 3.5.1
! colspan="2" | DBpedia 3.6
! colspan="2" | DBpedia 3.7
|-
! Mapped Triples
! %
! Mapped Triples
! %
! Mapped Triples
! Mapped Triples
! Triples
! %
! %
|-
|- align="right"
| Total
| Total
| 3221
| 1504
| 1504
| 3221
| 46.7
| 46.7
|-
| 1522
| 47.3
| 1732
| 53.8
|- align="right"
| Plain Property
| Plain Property
| 893
| 514
| 514
| 893
| 57.6
| 57.6
|-
| 521
| 58.3
| 599
| 67.1
|- align="right"
| Number-Unit
| Number-Unit
| 76
| 51
| 51
| 76
| 67.1
| 67.1
|-
| 52
| 68.4
| 57
| 75.0
|- align="right"
| Coordinate
| Coordinate
| 54
| 36
| 36
| 54
| 66.7
| 66.7
|-  
| 39
| 72.2
| 48
| 88.9
|- align="right"
| Interval
| Interval
| 31
| 22
| 71.0
| 22
| 22
| 31
| 71.0
| 71.0
|-
| 24
| 77.4
|- align="right"
| List
| List
| 801
| 478
| 478
| 801
| 59.7
| 59.7
|-
| 482
| 60.2
| 560
| 69.9
|- align="right"
| One-Property-Table
| One-Property-Table
| 447
| 242
| 242
| 447
| 54.1
| 54.1
|-
| 244
| 54.6
| 236
| 52.8
|- align="right"
| Multi-Property-Table
| Multi-Property-Table
| 625
| 83
| 13.3
| 83
| 83
| 625
| 13.3
| 13.3
|-
| 109
| 17.4
|- align="right"
| Open Property
| Open Property
| 139
| 13
| 9.4
| 13
| 13
| 139
| 9.4
| 9.4
|-
| 16
| 11.5
|- align="right"
| Open Property Table
| Open Property Table
| 26
| 0
| 0.0
| 0
| 0.0
| 0
| 0
| 26
| 0.0
| 0.0
|-
|- align="right"
| Internal Template
| Internal Template
| 116
| 58
| 58
| 116
| 50.0
| 50.0
|-
| 59
| 50.9
| 76
| 65.5
|- align="right"
| Merged Properties
| Merged Properties
| 13
| 7
| 53.8
| 7
| 53.8
| 7
| 7
| 13
| 53.8
| 53.8
|}
|}
Line 86: Line 144:
! DBpedia 3.6
! DBpedia 3.6
! DBpedia 3.7
! DBpedia 3.7
|-
|- align="right"
| Total
| Total
| 45,7%
| 45.7%
| 60,2%
| 60.2%
| 61,8%
| 61.8%
|-
|- align="right"
| Plain Property
| Plain Property
| 80,5%
| 80.5%
| 83,7%
| 83.7%
| 86,0%
| 86.0%
|-
|- align="right"
| Number-Unit
| Number-Unit
| 68,6%
| 68.6%
| 68,6%
| 68.6%
| 66,7%
| 66.7%
|-
|- align="right"
| Coordinate
| Coordinate
| 100,0%
| 100.0%
| 100,0%
| 100.0%
| 100,0%
| 100.0%
|-  
|- align="right"
| Interval
| Interval
| 72,7%
| 72.7%
| 68,2%
| 68.2%
| 72,7%
| 72.7%
|-
|- align="right"
| List
| List
| 33,9%
| 33.9%
| 75,7%
| 75.7%
| 77,8%
| 77.8%
|-
|- align="right"
| One-Property-Table
| One-Property-Table
| 5,4%
| 5.4%
| 5,8%
| 5.8%
| 5,8%
| 5.8%
|-
|- align="right"
| Multi-Property-Table
| Multi-Property-Table
| 0,0%
| 0.0%
| 0,0%
| 0.0%
| 0,0%
| 0.0%
|-
|- align="right"
| Open Property
| Open Property
| 23,1%
| 23.1%
| 23,1%
| 23.1%
| 30,8%
| 30.8%
|-
|- align="right"
| Open Property Table
| Open Property Table
| na
| na
| na
| na
| na
| na
|-
|- align="right"
| Internal Template
| Internal Template
| 8,6%
| 8.6%
| 10,3%
| 10.3%
| 10,3%
| 10.3%
|-
|- align="right"
| Merged Properties
| Merged Properties
| 57,1%
| 57.1%
| 57,1%
| 57.1%
| 71,4%
| 71.4%
|}
|}


Line 155: Line 213:
! DBpedia 3.6
! DBpedia 3.6
! DBpedia 3.7
! DBpedia 3.7
|-
|- align="right"
| Total
| Total
| 91,2%
| 91.2%
| 92,3%
| 92.3%
| 92,4%
| 92.4%
|-
|- align="right"
| Plain Property
| Plain Property
| 96,3%
| 96.3%
| 96,6%
| 96.6%
| 97,4%
| 97.4%
|-
|- align="right"
| Number-Unit
| Number-Unit
| 85,4%
| 85.4%
| 85,4%
| 85.4%
| 85,0%
| 85.0%
|-
|- align="right"
| Coordinate
| Coordinate
| 100,0%
| 100.0%
| 100,0%
| 100.0%
| 100,0%
| 100.0%
|-
|- align="right"
| Interval
| Interval
| 100,0%
| 100.0%
| 100,0%
| 100.0%
| 88,9%
| 88.9%
|-
|- align="right"
| List
| List
| 91,5%
| 91.5%
| 92,8%
| 92.8%
| 93,2%
| 93.2%
|-
|- align="right"
| One-Property-Table
| One-Property-Table
| 32,5%
| 32.5%
| 36,8%
| 36.8%
| 34,1%
| 34.1%
|-
|- align="right"
| Multi-Property-Table
| Multi-Property-Table
| na
| na
| na
| na
| na
| na
|-
|- align="right"
| Open Property
| Open Property
| 100,0%
| 100.0%
| 100,0%
| 100.0%
| 100,0%
| 100.0%
|-
|- align="right"
| Open Property Table
| Open Property Table
| na
| na
| na
| na
| na
| na
|-
|- align="right"
| Internal Template
| Internal Template
| 83,3%
| 83.3%
| 75,0%
| 75.0%
| 75,0%
| 75.0%
|-
|- align="right"
| Merged Properties
| Merged Properties
| 80,0%
| 80.0%
| 80,0%
| 80.0%
| 100,0%
| 100.0%
|}
|}

Latest revision as of 16:56, 14 September 2011

The Quality Assessment Framework (QAF) was developed to document the quality of the knowledge base and, beyond that, the progress of DBpedia's extraction framework. Its main idea is to compare a manually created best-case dataset (the Gold Standard) with the output of DBpedia's ontology-based extraction. From this comparison the QAF estimates the precision of the extraction framework and the completeness (recall) of DBpedia relative to its source, Wikipedia.
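
The comparison described above can be illustrated with the standard set-based definitions of precision and recall. This is only a minimal sketch: the page does not describe how the QAF matches triples, so exact equality of (subject, predicate, object) is an assumption, and the `Triple` type, the `precision_recall` function and the `ex:` sample triples are hypothetical names and made-up data, not part of the QAF or the Gold Standard.

```python
from __future__ import annotations

from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical triple representation; the QAF's own format may differ."""
    subject: str
    predicate: str
    obj: str


def precision_recall(gold: set[Triple], extracted: set[Triple]) -> tuple[float, float]:
    """Return (precision, recall) of `extracted` measured against `gold`.

    Assumes exact triple equality as the matching rule. A ratio with an
    empty denominator is undefined and is returned as NaN.
    """
    correct = gold & extracted
    precision = len(correct) / len(extracted) if extracted else float("nan")
    recall = len(correct) / len(gold) if gold else float("nan")
    return precision, recall


# Made-up triples for illustration only; they are not taken from the Gold Standard.
gold = {
    Triple("ex:City", "ex:country", "ex:Country"),
    Triple("ex:City", "ex:population", "1000"),
    Triple("ex:City", "ex:area", "12.5"),
    Triple("ex:City", "ex:mayor", "ex:Person"),
}
extracted = {
    Triple("ex:City", "ex:country", "ex:Country"),
    Triple("ex:City", "ex:population", "1000"),
    Triple("ex:City", "ex:area", "125"),  # wrong value: lowers precision
}

precision, recall = precision_recall(gold, extracted)
print(f"precision={precision:.3f} recall={recall:.3f}")
# prints: precision=0.667 recall=0.500
```

Under this definition a ratio is undefined whenever its denominator is empty. That would be one possible reading of the "na" entries in the result tables, although the page itself does not state how they arise.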

Gold Standard & Mappings

For a meaningful evaluation, only potentially extractable triples are considered. A triple can be extracted only if it arises from a mapped property; such triples are called mapped triples here. The following table shows, for each category, the total number of triples in the Gold Standard, the number of mapped triples, and the percentage of mapped triples. The categories correspond to the different patterns in which information is given in Wikipedia infoboxes. The number of cases differs greatly between categories, so results for categories based on small numbers should be treated with care.

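{| class="wikitable"
|+ Mapped triples
|-
! rowspan="3" | Category !! rowspan="3" | Gold Triples !! colspan="6" | DBpedia Release
|-
! colspan="2" | DBpedia 3.5.1 !! colspan="2" | DBpedia 3.6 !! colspan="2" | DBpedia 3.7
|-
! Mapped Triples !! % !! Mapped Triples !! % !! Mapped Triples !! %
|- align="right"
| Total || 3221 || 1504 || 46.7 || 1522 || 47.3 || 1732 || 53.8
|- align="right"
| Plain Property || 893 || 514 || 57.6 || 521 || 58.3 || 599 || 67.1
|- align="right"
| Number-Unit || 76 || 51 || 67.1 || 52 || 68.4 || 57 || 75.0
|- align="right"
| Coordinate || 54 || 36 || 66.7 || 39 || 72.2 || 48 || 88.9
|- align="right"
| Interval || 31 || 22 || 71.0 || 22 || 71.0 || 24 || 77.4
|- align="right"
| List || 801 || 478 || 59.7 || 482 || 60.2 || 560 || 69.9
|- align="right"
| One-Property-Table || 447 || 242 || 54.1 || 244 || 54.6 || 236 || 52.8
|- align="right"
| Multi-Property-Table || 625 || 83 || 13.3 || 83 || 13.3 || 109 || 17.4
|- align="right"
| Open Property || 139 || 13 || 9.4 || 13 || 9.4 || 16 || 11.5
|- align="right"
| Open Property Table || 26 || 0 || 0.0 || 0 || 0.0 || 0 || 0.0
|- align="right"
| Internal Template || 116 || 58 || 50.0 || 59 || 50.9 || 76 || 65.5
|- align="right"
| Merged Properties || 13 || 7 || 53.8 || 7 || 53.8 || 7 || 53.8
|}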
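
The percentage columns of the table above follow from dividing the mapped triples by the Gold Standard triples. The sketch below recomputes them and checks that the category rows add up to the Total row. The counts are copied from the table; the names `ROWS`, `TOTAL` and `mapped_share` are hypothetical, and rounding to one decimal place is an assumption inferred from the published figures.

```python
# Counts copied from the "Mapped triples" table.
# Tuple layout: (gold triples, mapped in 3.5.1, mapped in 3.6, mapped in 3.7)
ROWS = {
    "Plain Property": (893, 514, 521, 599),
    "Number-Unit": (76, 51, 52, 57),
    "Coordinate": (54, 36, 39, 48),
    "Interval": (31, 22, 22, 24),
    "List": (801, 478, 482, 560),
    "One-Property-Table": (447, 242, 244, 236),
    "Multi-Property-Table": (625, 83, 83, 109),
    "Open Property": (139, 13, 13, 16),
    "Open Property Table": (26, 0, 0, 0),
    "Internal Template": (116, 58, 59, 76),
    "Merged Properties": (13, 7, 7, 7),
}
TOTAL = (3221, 1504, 1522, 1732)


def mapped_share(mapped: int, gold: int) -> float:
    """Percentage of gold triples that stem from a mapped property."""
    return round(100.0 * mapped / gold, 1)


# Sum each column over all categories and compare with the Total row.
column_sums = tuple(sum(row[i] for row in ROWS.values()) for i in range(4))
print(column_sums == TOTAL)
# prints: True

# Share of mapped triples in the Total row for 3.5.1, 3.6 and 3.7.
print([mapped_share(mapped, TOTAL[0]) for mapped in TOTAL[1:]])
# prints: [46.7, 47.3, 53.8]
```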

Evaluation Results

The following two tables show the completeness (recall) and precision of the different release versions. To make the releases comparable and to remove the effect of more and better mappings from the results, the mappings of the 3.5.1 release are held constant.

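{| class="wikitable"
|+ Completeness / Recall (fixed mappings from 3.5.1)
|-
! Category !! DBpedia 3.5.1 !! DBpedia 3.6 !! DBpedia 3.7
|- align="right"
| Total || 45.7% || 60.2% || 61.8%
|- align="right"
| Plain Property || 80.5% || 83.7% || 86.0%
|- align="right"
| Number-Unit || 68.6% || 68.6% || 66.7%
|- align="right"
| Coordinate || 100.0% || 100.0% || 100.0%
|- align="right"
| Interval || 72.7% || 68.2% || 72.7%
|- align="right"
| List || 33.9% || 75.7% || 77.8%
|- align="right"
| One-Property-Table || 5.4% || 5.8% || 5.8%
|- align="right"
| Multi-Property-Table || 0.0% || 0.0% || 0.0%
|- align="right"
| Open Property || 23.1% || 23.1% || 30.8%
|- align="right"
| Open Property Table || na || na || na
|- align="right"
| Internal Template || 8.6% || 10.3% || 10.3%
|- align="right"
| Merged Properties || 57.1% || 57.1% || 71.4%
|}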
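
One way to read the completeness figures is as a change in percentage points between the 3.5.1 and 3.7 releases. The sketch below computes that difference per category from the values in the table above. The names `RECALL` and `change` are hypothetical, and representing "na" as `None` is an assumption made for this example.

```python
from typing import Optional

# Recall in percent with mappings fixed at 3.5.1, copied from the completeness table.
# Tuple layout: (3.5.1, 3.6, 3.7); None stands for the "na" entries.
RECALL = {
    "Total": (45.7, 60.2, 61.8),
    "Plain Property": (80.5, 83.7, 86.0),
    "Number-Unit": (68.6, 68.6, 66.7),
    "Coordinate": (100.0, 100.0, 100.0),
    "Interval": (72.7, 68.2, 72.7),
    "List": (33.9, 75.7, 77.8),
    "One-Property-Table": (5.4, 5.8, 5.8),
    "Multi-Property-Table": (0.0, 0.0, 0.0),
    "Open Property": (23.1, 23.1, 30.8),
    "Open Property Table": (None, None, None),
    "Internal Template": (8.6, 10.3, 10.3),
    "Merged Properties": (57.1, 57.1, 71.4),
}


def change(old: Optional[float], new: Optional[float]) -> Optional[float]:
    """Difference in percentage points, or None if either value is missing."""
    if old is None or new is None:
        return None
    return round(new - old, 1)


# Print the change from 3.5.1 to 3.7 for every category.
for category, (r351, _r36, r37) in RECALL.items():
    delta = change(r351, r37)
    label = "na" if delta is None else f"{delta:+.1f}"
    print(f"{category:<22}{label:>6}")
```

Because the mappings are held constant, such differences reflect changes in the extraction framework rather than in the mappings. As the page notes, categories with few cases should be read with care.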
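{| class="wikitable"
|+ Precision (fixed mappings from 3.5.1)
|-
! Category !! DBpedia 3.5.1 !! DBpedia 3.6 !! DBpedia 3.7
|- align="right"
| Total || 91.2% || 92.3% || 92.4%
|- align="right"
| Plain Property || 96.3% || 96.6% || 97.4%
|- align="right"
| Number-Unit || 85.4% || 85.4% || 85.0%
|- align="right"
| Coordinate || 100.0% || 100.0% || 100.0%
|- align="right"
| Interval || 100.0% || 100.0% || 88.9%
|- align="right"
| List || 91.5% || 92.8% || 93.2%
|- align="right"
| One-Property-Table || 32.5% || 36.8% || 34.1%
|- align="right"
| Multi-Property-Table || na || na || na
|- align="right"
| Open Property || 100.0% || 100.0% || 100.0%
|- align="right"
| Open Property Table || na || na || na
|- align="right"
| Internal Template || 83.3% || 75.0% || 75.0%
|- align="right"
| Merged Properties || 80.0% || 80.0% || 100.0%
|}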