Everyone in the department can access a small amount of clustered computing capacity for the purpose of teaching and learning Big Data processing on distributed/parallel computing systems. You do have to speak with us before you or your students attempt to work with these systems. Below is basic information about the systems, so you know exactly what we're talking about.

The clusters referenced in this document are unsuitable for research computing.

Everything written here is accurate as of 2 March 2017 and not a single day after.

Hardware

There are two tiers of systems, but they are not logically separated (e.g., YARN sees them as one cluster; see the sketch after this list).

12-30 miniature, desktop-class systems: 8 GB RAM, single-core 3.6 GHz CPU, about 200 GB free disk per node, sluggish I/O.

10 or more 1U, enterprise-class systems: 24 GB RAM, dual-socket 2.6-3.6 GHz CPUs, less than 100 GB free disk per node, sluggish I/O.

5 or more virtual systems with similar characteristics.
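
Because both tiers register with the same ResourceManager, you can list every node in one place. Here is a minimal sketch (Python 3, standard library only) that queries the YARN ResourceManager REST API; the hostname below is a placeholder, and we assume the web service is on its default port 8088 (ask us for the actual address).

import json
from urllib.request import urlopen

# Placeholder address; ask techstaff for the real ResourceManager host.
RM = "resourcemanager.example.edu:8088"

with urlopen(f"http://{RM}/ws/v1/cluster/nodes") as resp:
    nodes = json.load(resp)["nodes"]["node"]

# Each entry reports a node's state and resources exactly as YARN sees
# them, regardless of which hardware tier the node belongs to.
for node in nodes:
    print(node["nodeHostName"], node["state"], node["availMemoryMB"], "MB free")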

Software

Hadoop 2.7.3 (including HDFS, YARN, and MapReduce)

Spark 2.1.0

Hive 2.1.1

HBase 1.2.4

Zookeeper

Pig 0.16

Avro

Ambari

Tez

Storm

Kafka

We also make an effort to support protobuf, Thrift, FB, and many more.
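
To give a sense of what a teaching workload looks like on this stack, here is a minimal PySpark 2.1 sketch: a word count over a file in HDFS. The application name and input path are placeholders, not anything provisioned on the cluster.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()

# Placeholder path; point this at a file in your own HDFS home directory.
lines = spark.sparkContext.textFile("hdfs:///user/yourname/input.txt")

counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # pair each word with 1
               .reduceByKey(lambda a, b: a + b))     # sum counts per word

for word, n in counts.take(10):
    print(word, n)

spark.stop()

A script like this is typically launched with spark-submit --master yarn wordcount.py, so that YARN schedules the work across the nodes described above.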

Some of the above requires a conversation with us first, and much of it is inflexible with regard to customization, but don't hesitate to ask. (In particular, we will probably not Kerberize the cluster for the sake of the users.)
