I just pushed the first bits of a complete Puppet module for Cloudera’s CDH4 distribution for Hadoop. Currently it only supports YARN (not MapReduce v1), and assumes that your NameNode also runs your ResourceManager.
There is another CDH4 Puppet module out there, but it uses MapReduce v1 and hardcodes the HDFS data directories as Facter variables. puppet-cdh4 instead lets you specify Hadoop directories using a config class. As long as your JBOD disks have been partitioned and mounted, you should be able to configure a DataNode by passing those mount points to the config class.
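A rough sketch of what such a DataNode configuration could look like follows. The class names (`cdh4::hadoop::config`, `cdh4::hadoop::worker`), the parameter names, the hostname, and the mount paths are all placeholders chosen for illustration, not confirmed against the module, so check the README on GitHub for the real interface.

```puppet
# Hypothetical sketch: every class and parameter name here is an assumption,
# not the module's confirmed API.
class { 'cdh4::hadoop::config':
  # Assumed parameter: the host that runs the NameNode (and, per the post,
  # the ResourceManager as well).
  namenode_hostname => 'namenode.example.org',

  # Assumed parameter: one entry per JBOD disk, each already partitioned
  # and mounted before Puppet runs.
  datanode_mounts   => [
    '/var/lib/hadoop/data/a',
    '/var/lib/hadoop/data/b',
    '/var/lib/hadoop/data/c',
  ],
}

# Assumed class that installs and starts the DataNode (and NodeManager)
# services on this node.
include cdh4::hadoop::worker
```

The point of this shape is that the directory list lives in your own manifest as a class parameter, rather than being baked into the module as Facter variables.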
This is a work in progress, so leave me a comment on GitHub if you try it and run into trouble.