How to add a mapping namespace: Difference between revisions

From DBpedia Mappings
No edit summary
No edit summary
Line 1: Line 1:
As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.
As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.


'''CAUTION''': some subtle code changes will be needed for '''the first language code that contains a dash "-".''' In this case, Please update the code and this guide.
'''CAUTION''': some subtle code changes will be needed for '''the first language code that contains a dash "-".''' In this case, please update the code and this guide.


=== Get language code and rank ===
=== Get language code and rank ===


Get the wiki language code and rank from http://s23.org/wikistats/wikipedias_html.php
Get the wiki language code and rank from [http://s23.org/wikistats/wikipedias_html.php the list of Wikipedias].


namespace number: multiply the rank by 2 and add 200
Namespace number: multiply the rank by 2 and add 200


talk namespace number: add 1 to the namespace number
Example: language code "xx", rank 44, namespace number 288.
 
Example: language code "xx", rank 44, namespace number 288, talk namespace number 289


'''CAUTION''': If the calculated namespace number already exists for another language (because the ranking has changed) do '''not''' change the existing namespace number. Please find a neighboring or close enough number that works.
'''CAUTION''': If the calculated namespace number already exists for another language (because the ranking has changed) do '''not''' change the existing namespace number. Please find a neighboring or close enough number that works.


If 288 is in use, we choose other numbers, let's say 298 and 299.
If 288 is in use, we choose some other number that is not used, let's say 298.
 


=== Update the extraction framework ===
=== Update the extraction framework ===
Line 25: Line 22:


<pre>
<pre>
"xx"->288
"xx"->288,
</pre>
</pre>


Line 42: Line 39:
=== Update and restart the mapping server ===
=== Update and restart the mapping server ===


Log onto the machine that's serving http://mappings.dbpedia.org/server/ URLs.
Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.


Stop the server:
Stop the server:
Line 59: Line 56:


<pre>
<pre>
cd /home/dbpedia-server/dbpedia/extraction_framework
cd extraction_framework
hg pull
hg pull
hg update
hg update
Line 66: Line 63:
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
</pre>
</pre>


=== Update mappings wiki ===
=== Update mappings wiki ===


==== Update the MediaWiki settings ====
==== Update MediaWiki settings ====


Log onto the machine that is serving http://mappings.dbpedia.org/index.php URLs.
Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.


Open htdocs/mappings/LocalSettings.php
Open htdocs/mappings/LocalSettings.php. Add the following snippet at the correct position in the code:
 
Add the following line at the right position in the code:


<pre>
<pre>
"xx" => 288
"xx"=>288,
</pre>
</pre>


Line 94: Line 88:
==== Edit [[DBpedia datasets]] ====
==== Edit [[DBpedia datasets]] ====


Edit [[DBpedia datasets]]. Add a column for the new language and update all rows. Ouch...
=== Generate and deploy statistics ===
==== Extract data from Wikipedia dump file ====
Download the latest dump for language xx.
Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. [http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/default/dump/extract.stats.properties dump/extract.stats.properties] should contain the correct settings. cd into directory dump/, copy extract.stats.properties to extract.properties, modify if necessary, and run
<pre>
dump> ../run extract
</pre>
==== Extract statistics from triples files ====
cd into directory server/ and run
<pre>
server> ../run stats
</pre>
Copy src/main/statistics/mappingstatistics_xx.txt to same folder on the mappings server.


=== Update and deploy sprint stuff ===


e) generate statistics for new language
Ask Pablo how to do that...
run RedirectExtractor, InfoboxExtractor, TemplateParameterExtractor (see dump/extraction.server.properties)
run CreateMappingStats (launcher ‘stats’ in server/pom.xml)
copy src/main/statistics/mappingstatistics_bg.txt to same folder on server
Update and deploy sprint stuff.

Revision as of 02:43, 16 May 2012

As an example, we use a fictitious language with code "xx" and Wikipedia rank 44.

CAUTION: some subtle code changes will be needed for the first language code that contains a dash "-". In this case, please update the code and this guide.

Get language code and rank

Get the wiki language code and rank from the list of Wikipedias.

Namespace number: multiply the rank by 2 and add 200.

Example: language code "xx", rank 44, namespace number 288.
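The arithmetic can be checked with a small shell sketch. The function name namespace_number is made up for this guide and is not part of the extraction framework:

```shell
#!/bin/sh
# Hypothetical helper: derive the mapping namespace number from a
# Wikipedia rank using the rule above (rank * 2 + 200).
namespace_number() {
  rank=$1
  echo $((rank * 2 + 200))
}

# Rank 44, as in the example for language "xx".
namespace_number 44   # prints 288
```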

CAUTION: If the calculated namespace number is already used by another language (because the ranking has changed), do not change the existing namespace number. Instead, pick a nearby number that is not in use.

If 288 is in use, choose some other number that is not used, for example 298.
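One possible way to look for a free number is sketched below. The function name next_free_namespace and the list of used numbers are invented for illustration. The sketch steps upwards by 2 so the result stays even, like every number the formula produces; that is an assumption, and any unused number that works is acceptable.

```shell
#!/bin/sh
# Hypothetical helper: print the first number, starting at the calculated
# one and stepping by 2 (an assumption), that is not in the list of
# namespace numbers already in use.
# Usage: next_free_namespace <calculated number> [<used number> ...]
next_free_namespace() {
  candidate=$1
  shift
  while :; do
    in_use=0
    for used in "$@"; do
      if [ "$used" -eq "$candidate" ]; then
        in_use=1
        break
      fi
    done
    [ "$in_use" -eq 0 ] && break
    candidate=$((candidate + 2))
  done
  echo "$candidate"
}

# Made-up list of used numbers: 286, 288 and 290 are taken.
next_free_namespace 288 286 288 290   # prints 292
```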

Update the extraction framework

Edit core/org.dbpedia.extraction.wikiparser.Namespace.scala

Edit core/org.dbpedia.extraction.wikiparser.Namespace.scala. Add something like this at the appropriate position:

"xx"->288,

Edit dump/extract.default.properties

Edit dump/extract.default.properties. Add something like this at the appropriate position:

extractors.xx=MappingExtractor

Commit changes

Commit and push the changes to the default branch.

Update and restart the mapping server

Log onto the machine that is running the mapping server, i.e. serving http://mappings.dbpedia.org/server/ URLs.

Stop the server:

ps axfu | grep java

Look for class ...server.Server, and then:

kill <process id>
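If picking the process id out of the ps output by eye is error-prone, a filter along these lines may help. It is only a sketch: the function name find_server_pids is made up, and it assumes the process id is the second column, as in the usual ps axfu layout on Linux.

```shell
#!/bin/sh
# Hypothetical helper: read `ps axfu`-style lines on stdin and print the
# process id (assumed to be column 2) of every line mentioning
# server.Server. The awk process itself does not match, because its own
# command line contains "server\.Server" with a backslash.
find_server_pids() {
  awk '/server\.Server/ { print $2 }'
}

# Demonstration with made-up ps output; only the second line matches.
printf '%s\n' \
  'root      101  0.0  0.1  1000  500 ?  Ss  May15  0:00 /usr/sbin/sshd' \
  'dbpedia  4242  1.0  2.0 90000 7000 ?  Sl  May15  0:42 java ...server.Server' \
  | find_server_pids   # prints 4242

# On the real machine, something like:
#   ps axfu | find_server_pids
#   kill <process id>
```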

Then update, compile and start the server:

cd extraction_framework
hg pull
hg update
mvn clean install --projects core,server
cd server
../run server &>server-<YYYY>-<MM>-<DD>.01.log &
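The dated log file name can be generated instead of typed by hand. In the sketch below, the function name server_log_name is made up, and reading the ".01" part as a per-day sequence number is an assumption, since the guide does not say what it stands for.

```shell
#!/bin/sh
# Hypothetical helper: build a log file name of the form
# server-<YYYY>-<MM>-<DD>.<NN>.log for today's date.
# The optional argument is the ".01" part (assumed to be a sequence number).
server_log_name() {
  seq=${1:-01}
  echo "server-$(date +%Y-%m-%d).${seq}.log"
}

server_log_name 01

# Possible use when starting the server (the &> redirection needs bash):
#   ../run server &>"$(server_log_name 01)" &
```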

Update mappings wiki

Update MediaWiki settings

Log onto the machine that is running this mappings wiki, i.e. serving http://mappings.dbpedia.org/index.php URLs.

Open htdocs/mappings/LocalSettings.php. Add the following snippet at the correct position in the code:

"xx"=>288,

Restart the Apache server.

Edit mappings wiki sidebar

Edit MediaWiki:Sidebar. Add a link for the new language:

** {{fullurl:Special:AllPages|namespace=288}}|Mappings (xx)

Edit DBpedia datasets

Edit DBpedia datasets. Add a column for the new language and update all rows. Ouch...

Generate and deploy statistics

Extract data from Wikipedia dump file

Download the latest dump for language xx.

Run RedirectExtractor, InfoboxExtractor and TemplateParameterExtractor. The file dump/extract.stats.properties should contain the correct settings. Change into directory dump/, copy extract.stats.properties to extract.properties, modify it if necessary, and run:

dump> ../run extract

Extract statistics from triples files

Change into directory server/ and run:

server> ../run stats

Copy src/main/statistics/mappingstatistics_xx.txt to the same folder on the mappings server.

Update and deploy sprint stuff

Ask Pablo how to do that...