MediaWiki API result

{
    "batchcomplete": "",
    "continue": {
        "gapcontinue": "Volkswagen_Golf_Jokes",
        "continue": "gapcontinue||"
    },
    "warnings": {
        "main": {
            "*": "Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
        },
        "revisions": {
            "*": "Because \"rvslots\" was not specified, a legacy format has been used for the output. This format is deprecated, and in the future the new format will always be used."
        }
    },
    "query": {
        "pages": {
            "11046": {
                "pageid": 11046,
                "ns": 0,
                "title": "Rewriting templateProperty",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "'''WARNING: Work in progess. The content of this page is largely incorrect.'''\n\nTODO: Make an EN example using Infobox Politician\n--[[User:VladimirAlexiev|VladimirAlexiev]] 12:12, 25 February 2015 (UTC)\n\n== Intro ==\nThe basic way the extractor works is like this:\n\n* data is extracted from template props\n* these are emitted as language-specific '''raw''' props, eg\n** http://dbpedia.org/property/parent for EN (usual prefix [http://prefix.cc/dbp dbp:])\n** http://bg.dbpedia.org/property/\u0440\u043e\u0434\u0438\u0442\u0435\u043b for BG (usual prefix [http://prefix.cc/bgdbp bgdbp:]\n<s>* the raw data is passed through mappings templateProperty -> ontologyProperty</s>\n\n<s>You'd think that templateProperty is the same as the raw prop name. Yeah but not always.</s>\n\nThe last part (''data is passed through mappings'') is wrong. The mapping based extractor processes the Wikitext source, '''not''' the output of the InfoboxExtractor. A pipeline architecture would make a lot of sense, but that's not how DBpedia works. 
[[User:Chrisahn|Chrisahn]] 17:54, 25 February 2015 (UTC)\n\nHere's what actually happens:\n\n* Wikitext is parsed into an AST (abstract syntax tree)\n* The AST is passed to several different extractors according to the configuration\n* Each extractor processes the AST and produces triples\n* The triples are not used as input for any other extractors.\n\nHere's what the [http://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/InfoboxExtractor.scala InfoboxExtractor] does:\n\n* data is extracted from template props in the AST\n* these are emitted as language-specific '''raw''' props, eg\n** http://dbpedia.org/property/parent for EN (usual prefix [http://prefix.cc/dbp dbp:])\n** http://bg.dbpedia.org/property/\u0440\u043e\u0434\u0438\u0442\u0435\u043b for BG (usual prefix [http://prefix.cc/bgdbp bgdbp:])\n\nHere's what the [http://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/MappingExtractor.scala MappingExtractor] does:\n\n* data is extracted from template props in the AST and passed through mappings templateProperty -> ontologyProperty\n* these are emitted as generic mapping-based props, eg\n** http://dbpedia.org/ontology/parent for EN, BG and any other language (usual prefix dbo:)\n\n== Wikipedia Prop Structures ==\nMany Wikipedia templates allow creating '''several''' instances of something. \nEg [https://en.wikipedia.org/wiki/Template:Listen Listen] allows a Wikipedia editor to attach up to 11 soundRecording to the subject,\nusing \"parallel\" arrays of properties:\n* filename, filename1... filename10\n* title, title1... title10\n* description, description1... 
description10\nThe parallelism is reflected in a numeric suffix.\n\nGood maps take care of this, by grouping the \"parallel props\" in separate IntermediateNodeMappings or a similar structure that can produce an \"array\".\nEg [http://mappings.dbpedia.org/index.php?title=Mapping_en:Listen&action=edit mapping Listen] has this 11 times:\n<pre>\n  {{IntermediateNodeMapping | nodeClass = Sound | correspondingProperty = soundRecording | mappings =\n    {{ PropertyMapping | templateProperty = type          | ontologyProperty = dc:type }}\n    {{ PropertyMapping | templateProperty = filename1     | ontologyProperty = filename }}\n    {{ PropertyMapping | templateProperty = title1        | ontologyProperty = title }}\n    {{ PropertyMapping | templateProperty = description1  | ontologyProperty = description }}\n  }}\n</pre>\n\nNow consider Politicians. They may hold several Positions, each over several Mandates (they are nasty that way).\nFor each Position>Mandate (say 5*3=15), there's a bunch of props such as\nparty, predecessor, successor, colleagues (eg vicePresident, governor...), years the subject came to that position,\nyears the colleagues came to their respective positions, etc.\n\nEg see prop names of [https://bg.wikipedia.org/wiki/\u0428\u0430\u0431\u043b\u043e\u043d:\u0414\u044a\u0440\u0436\u0430\u0432\u043d\u0438\u043a_\u0438\u043d\u0444\u043e \u0414\u044a\u0440\u0436\u0430\u0432\u043d\u0438\u043a_\u0438\u043d\u0444\u043e], but that's not the complete story: there's also \u0442\u0440\u0435\u0442\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_* (\"third mandate\" fields) etc.\n* If the 2D data arrays below the photos of Rosen Plevneliev and Angela Merkel don't strike your fancy, check out one of them Socialists that ruled for 40 years: [https://bg.wikipedia.org/wiki/\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 \u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432]\nSee a full list of props and an incomplete attempt to group them 
all at [http://mappings.dbpedia.org/index.php?title=Mapping_bg:\u0414\u044a\u0440\u0436\u0430\u0432\u043d\u0438\u043a_\u0438\u043d\u0444\u043e&action=edit Mapping \u0414\u044a\u0440\u0436\u0430\u0432\u043d\u0438\u043a_\u0438\u043d\u0444\u043e].\n\nWikipedia editors were at a loss to create meaningful two-dimensional parallel arrays of names, so they created parasitic prefixes & suffixes that are not so easy to match up. Eg there are 10 props \"\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d\u041e\u0442\", all mapped to \"predecessor\" but in different groups:\n<pre>\n \u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u0442\n \u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u04422\n \u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u04423\n \u0432\u0442\u043e\u0440\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u0442\n \u0432\u0442\u043e\u0440\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u04422\n \u0432\u0442\u043e\u0440\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u04423\n \u0442\u0440\u0435\u0442\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u0442\n ...\n</pre>\n\n* The prefixes may have any form\n* The suffixes are digits, optionally followed by letters\n\n== Rewriting templateProperty ==\n\nThe parasitic prefixes/suffixes encode important info about the grouping of props,\nbut that info is not transmitted in any clear way.\n\nAssume a mapping fragment like this, extracting data for resource bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432\n<pre>\n{{IntermediateNodeMapping | nodeClass = CareerStation | correspondingProperty = careerStation | mappings = \n    {{ 
PropertyMapping | templateProperty = \u0432\u0442\u043e\u0440\u0438_\u043c\u0430\u043d\u0434\u0430\u0442_\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d \u043e\u04423 | ontologyProperty = predecessor }}\n</pre>\nWhat the extractor '''really''' does is:\n\n: No it doesn't. See above. [[User:Chrisahn|Chrisahn]] 18:06, 25 February 2015 (UTC)\n\n* Takes data from the templateProperty provided (as expected)\n* Strips parasitic prefixes & suffixes from the templateProperty (maybe unexpected) and converts to camelCase\n* Emits the data using the original subject and this '''rewritten''' templateProperty, eg:\n      bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 bgdbp:\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d\u041e\u0442 <data>\n* Makes an IntermediateNode and connects it with correspondingProperty (as expected), eg:\n      bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 dbo:careerStation bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432__1\n* Emits the data using the IntermediateNode and the ontologyProperty as provided (as expected), eg:\n      bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432__1 dbo:predecessor <data>\n\nThis achieves several goals:\n* the general semantics of the raw property is preserved, but not its grouping\n* the grouping is preserved by the creation of IntermediateNodes that use mapped properties (if the mapping is good)\n\nThis allows you to make queries such as:\n* all predecessors of \u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 lumped together (regardless of the position). 
This works even if these raw props are not mapped!\n      select * {bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 bgdbp:\u043f\u0440\u0435\u0434\u0448\u0435\u0441\u0442\u0432\u0430\u043d\u041e\u0442 ?pred}\n* all predecessors of \u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432, paired with successors, and the corresponding position name (office). (Note: you may want to throw in some OPTIONALs)\n      select * {bgdbr:\u0422\u043e\u0434\u043e\u0440_\u0416\u0438\u0432\u043a\u043e\u0432 dbo:careerStation\n        [dbo:predecessor ?pred; dbo:successor ?succ; dbo:office ?office]}\n\nNeat!\n\n'''NOTE''' Currently only purely numeric parasitic suffixes are stripped.\nPrefixes and alphanumeric suffixes would be stripped after [https://github.com/dbpedia/extraction-framework/issues/317 issue #317] is implemented"
                    }
                ]
            },
            "4109": {
                "pageid": 4109,
                "ns": 0,
                "title": "Use the DBpedia Extraction Framework",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "Once that there are infobox and/or table mappings for a language, you can run the DBpedia extraction. Several things have to be installed and configured, which is documented at\n\n'''http://dbpedia.org/documentation'''\n\n* Section 1 describes what has to be installed to run the DBpedia extraction framework.\n\n* In section 4.1., all things that must be specified before starting the extraction from a dump file are listed. In the file \"dump/config.properties\" (using the file \"dump/config.properties.default\" as a template), you can specify the languages for which you want to extract, and which extractors should be used. For example, to run the HomepageExtractor and the MappingExtractor for Maltese, specify\n\n languages=mt\n extractors.mt=org.dbpedia.extraction.mappings.HomepageExtractor \\\n               org.dbpedia.extraction.mappings.MappingExtractor\n\n* When you run the extraction (see section 4.2.), the MappingExtractor will extract the information from the infoboxes that you created a mapping for. The extracted triples will be saved in a file named \"mappingbased_properties_mt.nt\" (for Maltese) in the output directory you specified."
                    }
                ]
            }
        }
    }
}