ottomata

Puppet and CDH4

I just pushed the first bits of a complete Puppet module for Cloudera’s CDH4 distribution for Hadoop. Currently it only supports YARN (not MapReduce v1), and assumes that your NameNode also runs your ResourceManager.

puppet-cdh4

There is another CDH4 Puppet module out there, but it uses MapReduce v1 and hardcodes the HDFS data directories as Facter variables. puppet-cdh4 allows you to specify Hadoop directories using a config class. As long as your JBOD disks have been partitioned and mounted, you should be able to configure a DataNode like this:

include cdh4

class { "cdh4::hadoop::config":
    namenode_hostname => "namenode.hostname.org",
    mounts            => [
        "/var/lib/hadoop/data/a",
        "/var/lib/hadoop/data/b",
        "/var/lib/hadoop/data/c"
    ],
    dfs_name_dir      => ["/var/lib/hadoop/name", "/mnt/hadoop_name"],
}

# Installs and starts the DataNode and NodeManager services.
include cdh4::hadoop::worker
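
If you have several workers, one way to avoid repeating the config on each of them is to wrap it in a class of your own and apply it through a node definition. This is only a sketch: the `my_hadoop` class name and the worker hostname pattern are made up for illustration and are not part of puppet-cdh4; the only module classes used are the ones shown above.

```puppet
# Hypothetical wrapper class (not part of puppet-cdh4) that holds
# the shared Hadoop config in one place.
class my_hadoop {
    include cdh4

    class { "cdh4::hadoop::config":
        namenode_hostname => "namenode.hostname.org",
        mounts            => [
            "/var/lib/hadoop/data/a",
            "/var/lib/hadoop/data/b",
            "/var/lib/hadoop/data/c"
        ],
        dfs_name_dir      => ["/var/lib/hadoop/name", "/mnt/hadoop_name"],
    }
}

# Hypothetical hostnames: the regex matches worker1.hostname.org,
# worker2.hostname.org, and so on.
node /^worker\d+\.hostname\.org$/ {
    # Declare the shared config first, then the worker services.
    include my_hadoop
    include cdh4::hadoop::worker
}
```

This assumes every worker has the same set of data mounts; if they differ, the mounts list would need to become a parameter of the wrapper class.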

This is a work in progress, so leave me a comment on GitHub if you try it and run into trouble.