Cleaning up Countries

From DBpedia Mappings

Countries on EN DBpedia

select (count(*) as ?n) {
  ?country a dbo:Country
}

returns 1694 countries, far more than the roughly 200 countries that exist in the world. Obviously this won't do.

Let's analyze the reasons for this pollution and try to fix it.
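One way to start the analysis is to group the dbo:Country instances by the template their page uses. The sketch below only builds the request URL for the public endpoint. It assumes the template usage is exposed as dbp:wikiPageUsesTemplate, which should be verified against the dataset being queried; the helper name query_url is made up for illustration.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"

# Assumption: template usage is published as dbp:wikiPageUsesTemplate.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?template (COUNT(?country) AS ?n) WHERE {
  ?country a dbo:Country ;
           dbp:wikiPageUsesTemplate ?template .
}
GROUP BY ?template
ORDER BY DESC(?n)
"""


def query_url(query: str, fmt: str = "text/csv") -> str:
    """Return a GET URL that runs `query` on the endpoint (hypothetical helper)."""
    return ENDPOINT + "?" + urlencode({"query": query.strip(), "format": fmt})


if __name__ == "__main__":
    print(query_url(QUERY))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should list the templates behind the polluted class, most frequent first.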

Countries on EN Wikipedia

There are 638 Infobox_country instances. Most of them are international organizations (free trade zones, unions, etc.). On the other hand, the template's transclusion count says 1137.

The following templates redirect to Template:Infobox country:

  • Template:Infobox Countries
  • Template:Infobox Country
  • Template:Infobox Country or territory
  • Template:Infobox Geopolitical organisation
  • Template:Infobox Geopolitical organization
  • Template:Infobox Micronation
  • Template:Infobox geopolitical organisation
  • Template:Infobox geopolitical organization
  • Template:Infobox micronation
  • Template:Infobox nation

DBpedia works only with the target template (Infobox_country), so these redirects are not a problem.
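To make the redirect handling concrete, here is a minimal sketch of resolving a template name to its target. It is not DBpedia's actual implementation; the function names are hypothetical and the redirect set is simply the list above.

```python
# Redirects listed on this page; all of them point to "Infobox country".
REDIRECTS_TO_INFOBOX_COUNTRY = {
    "Infobox Countries",
    "Infobox Country",
    "Infobox Country or territory",
    "Infobox Geopolitical organisation",
    "Infobox Geopolitical organization",
    "Infobox Micronation",
    "Infobox geopolitical organisation",
    "Infobox geopolitical organization",
    "Infobox micronation",
    "Infobox nation",
}


def normalize(name: str) -> str:
    """Drop the namespace, turn underscores into spaces, upper-case the first letter."""
    name = name.replace("_", " ").strip()
    if name.startswith("Template:"):
        name = name[len("Template:"):]
    # MediaWiki titles are case-insensitive in the first character only.
    return name[:1].upper() + name[1:]


def resolve_template(name: str) -> str:
    """Return the redirect target if `name` is a known redirect, else the normalized name."""
    n = normalize(name)
    return "Infobox country" if n in REDIRECTS_TO_INFOBOX_COUNTRY else n
```

For example, resolve_template("Template:Infobox_micronation") yields "Infobox country", while an unrelated template such as "Infobox national basketball team" is returned unchanged.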

Analysis Needed

  • Why are there more dbo:Country instances in DBpedia than uses of "Infobox country" in Wikipedia?
  • Why does the "transclusion count" show more than the number of uses of "Infobox country"?
  • Many pages have a "type" that allows us to map to a better class (eg https://en.wikipedia.org/wiki/Eurasian_Economic_Space has Type "Single market"). For example, #296 "Why Infobox_Geopolitical_organization (eg United_Nations) is mapped to Country" is resolved in this way.
  • But many other instances remain, eg United_Nations_Transitional_Authority_in_Cambodia: analyze them and see whether there is some discriminator other than "type"
  • How to filter out the sports organizations, eg Cricket_Samoa, IBA_Asia, Great_Britain_men%27s_national_basketball_team?
  • How about country-related articles, eg Radio_in_the_United_States, Human_trafficking_in_the_United_States, History_of_the_Jews_in_20th-century_Poland?
  • How about articles that are not even country-related, eg Comic_book_collecting, Record_collecting?
  • How about administrative locations (eg cities) that are not countries, eg Russian_Dalian?
  • How about non-administrative locations, eg Reñaca_beach?
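As a first pass over the questions above, suspicious titles could be bucketed with a few title patterns before anyone looks at the infoboxes. The patterns and labels below are hypothetical and deliberately crude; they are a triage aid, not a proposed fix.

```python
import re
from urllib.parse import unquote

# Hypothetical patterns derived only from the example titles on this page.
SPORTS_TEAM = re.compile(r"_national_.+_team$")
TOPIC_IN_PLACE = re.compile(r"_in_")


def triage(title: str) -> str:
    """Assign a rough bucket to a (possibly percent-encoded) page title."""
    t = unquote(title).replace(" ", "_")
    if SPORTS_TEAM.search(t):
        return "sports team"
    if TOPIC_IN_PLACE.search(t):
        return "topic in a place"
    return "needs manual review"


if __name__ == "__main__":
    for title in [
        "Great_Britain_men%27s_national_basketball_team",
        "Radio_in_the_United_States",
        "Comic_book_collecting",
        "Cricket_Samoa",
    ]:
        print(title, "->", triage(title))
```

Title patterns misfire easily: Cricket_Samoa falls through to manual review, and United_Nations_Transitional_Authority_in_Cambodia would be bucketed as a "topic in a place". A discriminator taken from the infobox itself is likely to be more reliable.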

Infobox national basketball team

Great_Britain_men%27s_national_basketball_team uses the template Infobox national basketball team, which is not mapped in the mappings wiki. So why does it come out as dbo:Country?

The extraction sample doesn't show any type: http://mappings.dbpedia.org/server/extraction/en/extract?title=Great_Britain_men%27s_national_basketball_team&format=turtle-triples
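The same check can be repeated for other pages with a small script. The sketch below builds the extraction URL for a title and pulls rdf:type objects out of the returned text. It assumes the turtle-triples output has one triple per line with full IRIs; the function names are made up for illustration.

```python
from urllib.parse import urlencode

# Extraction endpoint of the mappings server, as used in the URL above.
EXTRACT = "http://mappings.dbpedia.org/server/extraction/en/extract"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def extraction_url(title: str, fmt: str = "turtle-triples") -> str:
    """Build the sample-extraction URL for a page title."""
    return EXTRACT + "?" + urlencode({"title": title, "format": fmt})


def rdf_types(triples_text: str) -> list[str]:
    """Collect the objects of rdf:type triples (assumes one triple per line)."""
    types = []
    for line in triples_text.splitlines():
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] == RDF_TYPE:
            types.append(parts[2].rstrip(" ."))
    return types


if __name__ == "__main__":
    print(extraction_url("Great_Britain_men's_national_basketball_team"))
```

An empty list from rdf_types on the live extraction output, combined with a dbo:Country type in the published dataset, would suggest the type comes from an older mapping or a different extraction step.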

Tasks

From #296, we need to:

  • Take these Infobox country parameters into account:
    |org_type =          <!-- e.g. Trade bloc -->
    |membership_type =   <!-- (default "Membership") -->
    |membership =        <!-- Type/s and/or number/s of members -->
  • Map to the subclass GeopoliticalOrganisation
  • Maybe map Template:United_Nations props list1 and list2, to capture the sub-orgs.
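The class choice could hinge on whether org_type is actually filled in. The sketch below illustrates that decision in plain Python; it is not mapping-wiki syntax, and the function names and the fallback to Country are assumptions made for illustration.

```python
import re

# Infobox skeletons often carry placeholder comments such as <!-- e.g. Trade bloc -->.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.S)


def param_value(raw: str) -> str:
    """Return the parameter value with HTML comments and surrounding whitespace removed."""
    return HTML_COMMENT.sub("", raw).strip()


def choose_class(params: dict[str, str]) -> str:
    """Pick GeopoliticalOrganisation when org_type has a real value, else Country."""
    if param_value(params.get("org_type", "")):
        return "GeopoliticalOrganisation"
    return "Country"
```

With this rule a page with org_type "Trade bloc" maps to GeopoliticalOrganisation, while a page that only carries the placeholder comment still maps to Country.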

Any takers?