Cleaning up Countries

From DBpedia Mappings

Countries on EN DBpedia

SELECT (COUNT(*) AS ?count) WHERE {
  ?country a dbo:Country
}

returns 1694 countries. That is far too many: the world has only about 200 sovereign states.

Let's analyze the reasons for this pollution and try to fix it.

Countries on EN Wikipedia

There are 638 Infobox_country instances. Most of them are international organizations (free trade zones, unions, etc.). The transclusion count, on the other hand, says 1137.

The following templates redirect to Template:Infobox country:

  • Template:Infobox Countries
  • Template:Infobox Country
  • Template:Infobox Country or territory
  • Template:Infobox Geopolitical organisation
  • Template:Infobox Geopolitical organization
  • Template:Infobox Micronation
  • Template:Infobox geopolitical organisation
  • Template:Infobox geopolitical organization
  • Template:Infobox micronation
  • Template:Infobox nation

DBpedia works only with the target template (Infobox_country), so these redirects are not a problem.
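
One way to check how many dbo:Country instances do not come from this template at all is a query along the following lines. This is only a sketch: it assumes the endpoint has the dbp:wikiPageUsesTemplate triples loaded and that templates are named under the dbt: prefix shown here, which may not hold for every DBpedia release.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbt: <http://dbpedia.org/resource/Template:>

# Count dbo:Country instances whose page is NOT recorded as using
# Infobox country (assumes dbp:wikiPageUsesTemplate data is available).
SELECT (COUNT(?country) AS ?count) WHERE {
  ?country a dbo:Country .
  FILTER NOT EXISTS { ?country dbp:wikiPageUsesTemplate dbt:Infobox_country }
}
```

If the count is large, the pollution comes from other mapped templates rather than from Infobox country itself.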

Analysis Needed

  • Why are there more dbo:Country in DBpedia than "Infobox country" in Wikipedia?
  • Why does the "transclusion count" show more than the number of uses of "Infobox country"?
  • Many pages have a "type" that allows us to map to a better class (e.g. a page that has type "Single market"). For example, #296 "Why Infobox_Geopolitical_organization (eg United_Nations) is mapped to Country" is resolved in this way.
  • But many other instances remain, e.g. United_Nations_Transitional_Authority_in_Cambodia: analyze them and see whether there is some discriminator other than "type".
  • How to filter out the sports organizations, eg Cricket_Samoa, IBA_Asia, Great_Britain_men%27s_national_basketball_team?
  • How about country-related articles, eg Radio_in_the_United_States, Human_trafficking_in_the_United_States, History_of_the_Jews_in_20th-century_Poland?
  • How about articles that are not even country-related, eg Comic_book_collecting, Record_collecting?
  • How about administrative locations (e.g. cities) that are not countries, e.g. Russian_Dalian?
  • How about non-administrative locations, eg Reñaca_beach?
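
To get a first list of suspects for the sports-team and "X in country" cases above, a name-based filter could be tried. The sketch below is a rough heuristic, not a fix: the regular expression is an illustrative guess based on the example names in this list, and it will miss cases such as Cricket_Samoa or IBA_Asia that carry no telltale word.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# List dbo:Country instances whose IRI looks like a topic article
# ("..._in_...") or a national team ("..._national_..."). Heuristic only.
SELECT ?country WHERE {
  ?country a dbo:Country .
  FILTER(REGEX(STR(?country), "_in_|_national_", "i"))
}
ORDER BY ?country
```

The resulting pages can then be inspected by hand to see which infobox template each one actually uses.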

Infobox national basketball team

Great_Britain_men%27s_national_basketball_team uses the template Infobox national basketball team, which is not mapped in the wiki. So why does it come out as dbo:Country?

The extraction sample doesn't have any type...
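
To see what the live endpoint says about this resource, its types can be listed directly. This is a minimal sketch: the exact IRI spelling (with an unencoded apostrophe instead of %27) is an assumption and may need adjusting for the endpoint in use.

```sparql
# List every rdf:type asserted for one suspicious resource.
# The IRI spelling below is an assumption, not taken from the endpoint.
SELECT ?type WHERE {
  <http://dbpedia.org/resource/Great_Britain_men's_national_basketball_team> a ?type
}
```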


From #296, we need to:

|org_type =          <!-- e.g. Trade bloc -->
|membership_type =   <!-- (default "Membership") -->
|membership =        <!-- Type/s and/or number/s of members -->
  • Map to the subclass GeopoliticalOrganisation
  • Maybe map the Template:United_Nations properties list1 and list2 to capture the sub-orgs.

Any takers?