Building and installing
On the machine from which you want to run your Dumbo programs, do:
$ wget -O ez_setup.py http://bit.ly/ezsetup
$ python ez_setup.py -z dumbo
Or, if you already have easy_install installed, simply:
$ easy_install -z dumbo
The -z option is important since Dumbo only works when it’s installed as a self-contained, zipped egg.
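Since the zipped-egg requirement is easy to miss, a small check after installing can help. The following is only a sketch: the helper name is_zipped_egg is hypothetical, and it assumes that a zipped egg shows up in site-packages as a regular file ending in .egg, while an unzipped install shows up as a directory of the same name. It does not verify that the file is a valid zip archive.

```shell
# Hypothetical helper (not part of Dumbo or setuptools).
# Succeeds only if the given path ends in .egg AND is a regular file,
# which is how a zipped egg is assumed to appear in site-packages.
# An unpacked egg is a directory and therefore fails the check.
is_zipped_egg() {
    case "$1" in
        *.egg) [ -f "$1" ] ;;
        *)     return 1 ;;
    esac
}

# Usage (the path is a placeholder; substitute the egg in your site-packages):
#   is_zipped_egg /path/to/site-packages/dumbo.egg && echo "zipped egg"
```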
Alternatively, you can install Dumbo in a virtual Python environment:
$ wget -O virtualenv.py http://bit.ly/virtualenv
$ python virtualenv.py env
$ env/bin/easy_install -z dumbo
Once you have completed the steps above, you can move on to Running programs. The recommended Hadoop distribution to run your Dumbo programs on is Cloudera’s Hadoop distribution, which supports Dumbo out of the box from version 2 (CDH2) onwards.
If you use an older version of Hadoop, you first have to apply a few patches. Download the patches for HADOOP-1722, HADOOP-5450, and MAPREDUCE-764, apply them in exactly the order shown here (the order matters), and then rebuild Hadoop:
$ cd /path/to/hadoop
$ patch -p0 < /path/to/HADOOP-1722.patch
$ patch -p0 < /path/to/HADOOP-5450.patch
$ patch -p0 < /path/to/MAPREDUCE-764.patch
$ ant package
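Because the patch order matters, it can be safer to apply the patches from a small wrapper that stops at the first failure instead of continuing with a half-patched tree. This is a minimal sketch, not part of Dumbo or Hadoop: the function name apply_in_order and the PATCH_CMD variable are hypothetical, and it assumes it is run from the Hadoop source directory.

```shell
# Command that applies one patch read from stdin. Assumption: override it
# for a trial run, e.g. PATCH_CMD="patch -p0 --dry-run" with GNU patch.
PATCH_CMD=${PATCH_CMD:-patch -p0}

# Hypothetical helper: apply the given patch files in argument order and
# stop at the first one that fails, so later patches are never applied
# on top of a failed one.
apply_in_order() {
    for p in "$@"; do
        echo "applying $p"
        # $PATCH_CMD is intentionally unquoted so it splits into words.
        $PATCH_CMD < "$p" || { echo "failed on $p" >&2; return 1; }
    done
}

# Usage (same order as the commands above), then rebuild on success:
#   cd /path/to/hadoop
#   apply_in_order /path/to/HADOOP-1722.patch \
#                  /path/to/HADOOP-5450.patch \
#                  /path/to/MAPREDUCE-764.patch && ant package
```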
Dumbo 0.21 has been around for quite a while now and can be considered very stable. If you still want to use Dumbo 0.20 or lower for some reason, follow these instructions instead.
To build Dumbo, you just have to add it to the src/contrib directory of Hadoop (version 0.18) and build Hadoop:
$ wget http://github.com/klbostee/dumbo/tarball/release-0.20.28 -O dumbo.tar.gz
$ tar zxvf dumbo.tar.gz
$ mv klbostee-dumbo* $HADOOP_HOME/src/contrib/dumbo
$ cd $HADOOP_HOME
$ ant package
$ ls build/hadoop-*/contrib/dumbo
bin examples lib
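The listing above can also be checked from a script, for instance as the last step of an automated build. This is a sketch under the assumption that a successful build produces exactly the bin, examples, and lib subdirectories shown; the function name check_contrib_layout is hypothetical.

```shell
# Hypothetical helper: verify that the directory given as $1 contains the
# bin, examples and lib subdirectories shown in the listing above.
# Reports the first missing subdirectory on stderr and returns non-zero.
check_contrib_layout() {
    for sub in bin examples lib; do
        [ -d "$1/$sub" ] || { echo "missing: $1/$sub" >&2; return 1; }
    done
}

# Usage (pass a single build directory; the path is a placeholder):
#   check_contrib_layout /path/to/build/contrib/dumbo && echo "build looks complete"
```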
You can also install Dumbo as a Python module on your system:
$ cd $HADOOP_HOME/src/contrib/dumbo
$ sudo ant install_pymod