WP-MIRROR
Installation
Debian GNU/Linux 6.0 (squeeze)
1. Download the .DEB package
Releases are found at http://download.savannah.gnu.org/releases/wp-mirror/. Select the most recent .DEB package.
2. Install the .DEB package
WP-MIRROR provides a patched version of one file in the MEDIAWIKI package (/usr/share/mediawiki/includes/Import.php). In other words, installing WP-MIRROR involves overwriting that file with the patched version. For this, the DPKG utility requires an extra command-line option.
(shell)# dpkg --install --force-overwrite wp-mirror-0.3-1_all.deb
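Because `--force-overwrite' replaces a file owned by the MEDIAWIKI package, it may be prudent to keep a copy of the original before running the command above. The following is a minimal sketch and not part of WP-MIRROR; the helper name `backup_once' and the `.orig' suffix are assumptions chosen for illustration.

```shell
# Hypothetical helper (not shipped with WP-MIRROR): keep a one-time copy
# of a file before another package overwrites it.
backup_once() {
    src=$1
    # Nothing to back up if the file is absent.
    [ -f "$src" ] || return 1
    # Never clobber an earlier backup: it holds the pristine original.
    [ -e "$src.orig" ] && return 0
    cp -p -- "$src" "$src.orig"
}

# Example (as root, before running dpkg):
#   backup_once /usr/share/mediawiki/includes/Import.php
```

Refusing to overwrite an existing backup means the helper can be run again after an upgrade without losing the unpatched file.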
The install process copies over a dozen files to the appropriate directories and sets permissions. Note that `/etc/wp-mirror/local.conf.template' must be manually copied to `/etc/wp-mirror/local.conf'. This is done to avoid inadvertently overwriting any modifications that you might have previously made to `/etc/wp-mirror/local.conf'.
(shell)# cd /etc/wp-mirror/
(shell)# cp local.conf.template local.conf
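If this step is scripted, the same caution can be kept by copying the template only when no `local.conf' exists. A minimal sketch follows; the helper name `init_local_conf' is hypothetical, and the default directory is the one named above.

```shell
# Hypothetical helper: create local.conf from the template only when no
# local.conf exists yet, so earlier edits are never overwritten.
init_local_conf() {
    dir=${1:-/etc/wp-mirror}
    if [ -e "$dir/local.conf" ]; then
        echo "kept existing $dir/local.conf"
    else
        cp -- "$dir/local.conf.template" "$dir/local.conf" &&
            echo "created $dir/local.conf"
    fi
}

# Example (as root):
#   init_local_conf
```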
3. Configuration
To configure `mysql', `mediawiki', `curl', and `wp-mirror', please refer to the Configuration section below.
Install from Tarball
1. Download the tarball
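Releases are found at http://download.savannah.gnu.org/releases/wp-mirror/. Select the most recent tarball (e.g. wp-mirror-0.3.tar.gz) and its detached GPG signature (e.g. wp-mirror-0.3.tar.gz.sig), and save both in the same directory. Then verify the authenticity and integrity of the download. Verification requires the signer's public key to be in your GPG keyring.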
(shell)# gpg --verify wp-mirror-0.3.tar.gz.sig
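GPG also accepts the signed file as an explicit second argument, which avoids relying on the file names matching. The wrapper below is a sketch only; the function name `verify_tarball' and its return codes are assumptions, while `gpg --verify SIGFILE DATAFILE' is the standard form for detached signatures.

```shell
# Hypothetical wrapper around gpg: verify a detached signature against
# the tarball it signs, naming both files explicitly.
verify_tarball() {
    tarball=$1
    sig=${2:-$tarball.sig}
    [ -f "$tarball" ] || { echo "missing: $tarball" >&2; return 2; }
    [ -f "$sig" ]     || { echo "missing: $sig" >&2; return 2; }
    # gpg exits non-zero if the signature is bad or if the signer's
    # public key is not in your keyring.
    gpg --verify "$sig" "$tarball"
}

# Example:
#   verify_tarball wp-mirror-0.3.tar.gz && echo "signature OK"
```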
2. Install clisp
On Debian systems (or derivatives thereof), first install the build dependencies.
(shell)# aptitude install clisp cl-asdf cl-getopt cl-md5
Check that you have CLISP 2.48 or higher.
(shell)$ clisp --version
GNU CLISP 2.48 (2009-07-28) (built 3487543663) (memory 3534965158)
...
Earlier distributions, such as Debian GNU/Linux 5.0 (lenny) and Ubuntu 10.04 LTS (lucid), provide older versions of CLISP that lack some of the functions called by WP-MIRROR.
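The version check can also be automated. The following sketch parses the first line of the banner shown above and compares it with 2.48; the helper name `clisp_version_ok' is hypothetical, and the parsing assumes the banner keeps the form `GNU CLISP <version> ...'.

```shell
# Hypothetical helper: succeed if the version banner printed by
# `clisp --version' reports at least the required version (default 2.48).
clisp_version_ok() {
    banner=$1
    required=${2:-2.48}
    # The first line looks like "GNU CLISP 2.48 (2009-07-28) ...";
    # the version is its third field.
    found=$(printf '%s\n' "$banner" | awk 'NR == 1 { print $3 }')
    [ -n "$found" ] || return 2
    # Compare major and minor numbers numerically (2.44 < 2.48 <= 2.49).
    printf '%s %s\n' "$found" "$required" | awk '{
        split($1, f, "."); split($2, r, ".")
        exit !((f[1] + 0 > r[1] + 0) ||
               (f[1] + 0 == r[1] + 0 && f[2] + 0 >= r[2] + 0))
    }'
}

# Example:
#   clisp_version_ok "$(clisp --version)" || echo "CLISP is too old" >&2
```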
3. Configure cl-asdf
All modern language systems come with libraries and packages that provide functionality greatly in excess of that needed for standards compliance. Often these packages are provided by third parties. Lisp systems are no exception. For Debian distributions, third-party libraries and packages are installed under `/usr/share/common-lisp/source/', and symbolic links to said source files are collected under `/usr/share/common-lisp/systems/'.
Another System Definition Facility (ASDF) is the link between a Lisp system and any libraries and packages that it calls. ASDF is not immediately usable upon installation. Your Lisp system (CLISP in this case) must first be made aware of its location. ASDF comes with documentation that discusses configuration.
(shell)$ less /usr/share/doc/cl-asdf/README.Debian
(shell)$ lynx /usr/share/doc/cl-asdf/asdf/index.html
Before configuring, note that WP-MIRROR will be run as `root', so the configuration file `.clisprc' must be put in root's home directory (`/root/'), rather than in an ordinary user's home directory. To configure CLISP to use ASDF, append the following lines to `/root/.clisprc'.
(load #P"/usr/share/common-lisp/source/cl-asdf/asdf.lisp")
(push #P"/usr/share/common-lisp/systems/" asdf:*central-registry*)
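When the two lines above are appended from a script, it helps to make the step idempotent so that repeated runs do not duplicate them. A minimal sketch, in which the helper name `append_once' is an assumption:

```shell
# Hypothetical helper: append a line to a file unless an identical line
# is already present, so re-running the setup does not duplicate entries.
append_once() {
    file=$1
    line=$2
    touch -- "$file" || return 1
    # -F: fixed string, -x: whole-line match, -q: quiet.
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (as root):
#   append_once /root/.clisprc '(load #P"/usr/share/common-lisp/source/cl-asdf/asdf.lisp")'
#   append_once /root/.clisprc '(push #P"/usr/share/common-lisp/systems/" asdf:*central-registry*)'
```

Single quotes keep the shell from interpreting the `#', `"' and `*' characters in the Lisp forms.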
CLISP should now be ready to use ASDF. Check this by running
(shell)# clisp -q
[1]> *features*
(:ASDF2 :ASDF ...
[2]> (asdf:asdf-version)
"2.011"
[3]> asdf:*central-registry*
(#P"/usr/share/common-lisp/systems/")
4. Unpack WP-MIRROR
(shell)# tar --extract --gzip --preserve-permissions --file wp-mirror-0.3.tar.gz
(shell)# cd wp-mirror-0.3
5. Install WP-MIRROR
(shell)$ make build
(shell)# make install
(shell)# cd /etc/wp-mirror/
(shell)# cp local.conf.template local.conf
6. Configuration
To configure `mysql', `mediawiki', `curl', and `wp-mirror', please refer to the Configuration section below.
Configuration
1. Trial Run
WP-MIRROR is designed to be launched from a command-line interface (CLI). This is so that one may set up a mirror farm on a remote server accessed via SSH (servers usually do not have a GUI installed).
Open two consoles (terminals), and execute one of the following commands in each:
(shell)# wp-mirror --mirror
(shell)# wp-mirror --gui
The first time you try this, you will get an error message. This is because WP-MIRROR first asserts hardware and software prerequisites. Hardware prerequisites include adequate memory and disk space, and internet connectivity. Software prerequisites include MySQL and MediaWiki configuration, directory and file permissions, etc. If anything is amiss, WP-MIRROR will exit with a neatly formatted error message. In most cases the error message requests that you install or configure something. So please proceed directly to the next step.
2. System planning
At this point you should pause to study the README file.
(shell)$ less /usr/share/doc/wp-mirror/README
This document contains highly valuable advice that could save you weeks or months of time. Why?
- The default configuration for MySQL is suboptimal for WP-MIRROR. With careful configuration, its performance can be improved by an order of magnitude. This is an indispensable condition for mirroring any of the top ten largest wikipedias.
- The default configuration for MediaWiki uses ImageMagick for resizing images. However, ImageMagick grabs too much system memory and will frequently hang your system. MediaWiki should instead be configured to use GraphicsMagick and RSVG. For the top ten wikipedias, which can have upwards of a million image files, this is a sine qua non.
- CURL sometimes fails to completely download a file. The default configuration lets CURL hang. CURL can instead be configured with a timeout. For the top ten wikipedias, where partial downloads can afflict hundreds or thousands of image files, correct configuration is a must.
- If your internet traffic goes through a caching web proxy, such as polipo, there are additional issues to address.
But the best reason to study the README is this: If you intend to build a mirror of any of the largest wikipedias, you will greatly benefit by first going down the learning curve with a small wikipedia.
- Building a mirror of the `en' wikipedia, which requires over 2T of disk space, presents the most demanding case, and may take weeks to complete on a server.
- Building a mirror of the `simple' wikipedia, which requires 40G of disk space, can be done in a day on a laptop.
The README treats both projects (`simple' and `en') in detail.
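Given these figures, it is worth checking free disk space before starting. The sketch below is illustrative only: the helper name `enough_space' is hypothetical, and `/var/lib/mysql' in the example is merely the usual Debian location of the MySQL data directory, so substitute whichever filesystem will actually hold your mirror.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# NEED_GB gigabytes available.
enough_space() {
    dir=$1
    need_gb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports
    # 1024-byte blocks; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Figures quoted above: 40G for `simple', over 2T (2048G) for `en'.
#   enough_space /var/lib/mysql 40 || echo "not enough room for simple" >&2
```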
3. Install mysql (if not already done)
(shell)# aptitude install hdparm mysql-client mysql-server
These need to be configured as described in the README.
4. Install mediawiki (if not already done)
(shell)# aptitude install graphicsmagick gv librsvg2-2
(shell)# aptitude install mediawiki mediawiki-extensions mediawiki-math
(shell)# aptitude install php5-suhosin texlive-latex-base tidy
The mediawiki* packages need to be configured as described in the README.
5. Install file handling packages (if not already done)
(shell)# aptitude install bzip2 curl openssl wget
CURL needs to be configured with a timeout as described in the README.
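As a rough illustration of what a timeout configuration can look like, the sketch below writes two standard CURL options, `connect-timeout' and `max-time', to a config file. The README remains the authority: the helper name `write_curl_timeouts', the values 30 and 1000, and the assumption that WP-MIRROR's CURL invocations read `/root/.curlrc' are all unverified and chosen for illustration only.

```shell
# Hypothetical helper: write timeout settings to a curl config file.
# The values are illustrative assumptions, not WP-MIRROR's recommendations.
write_curl_timeouts() {
    rc_file=$1
    # Refuse to overwrite an existing config file.
    if [ -e "$rc_file" ]; then
        echo "refusing to overwrite $rc_file" >&2
        return 1
    fi
    cat > "$rc_file" <<'EOF'
# Give up if the connection is not established within 30 seconds.
connect-timeout = 30
# Abort any single transfer that runs longer than 1000 seconds.
max-time = 1000
EOF
}

# Example (as root, since WP-MIRROR runs as root):
#   write_curl_timeouts /root/.curlrc
```

A `max-time' that is too small will abort legitimate downloads of large files, so the value should be sized to your connection.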
6. Configure WP-MIRROR
Finally, choose the wikipedias that you wish to mirror. This is done by editing `/etc/wp-mirror/local.conf'. By default, WP-MIRROR builds a mirror of `simple'. The config file contains several additional examples (all commented out). If you like the classics, try
(defparameter *mirror-languages* '("el" "la" "simple"))
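Since the config file carries several commented-out examples, it can be useful to confirm which definition is actually active after editing. A minimal sketch, assuming that commented-out lines begin with the Lisp comment character `;' and using a hypothetical helper name:

```shell
# Hypothetical helper: print the *mirror-languages* definitions that are
# not commented out (Lisp comment lines start with `;').
active_languages() {
    conf=${1:-/etc/wp-mirror/local.conf}
    grep -v '^[[:space:]]*;' "$conf" | grep -F '*mirror-languages*'
}

# Example:
#   active_languages /etc/wp-mirror/local.conf
```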
If you want to start very small, build a mirror of the `zu' wikipedia (isiZulu), which has just a few hundred articles and some nice animal photos (category:isiLwane).
(defparameter *mirror-languages* '("zu"))
Then click a few times on the "Special:Random" link to get an idea of what is there.
Enjoy!