Configuring Elastic MapReduce 4 applications from the AWS console

Lead Developer Phil Kendall recently blogged about getting started with Spark on EMR. In this follow up post he explains how to configure EMR 4 applications from the AWS console.

Update 12th November: Jon Fritz, a member of the Elastic MapReduce PM team, let me know that they’ve now fixed this bug in the console.

Back in July, Amazon released “v4” of their Elastic MapReduce platform, which introduced some fairly big changes to how applications are configured. While Amazon’s documentation for the new configuration format has some nice examples, those examples don’t work if you try them in the AWS console: if you copy and paste an example into the “Edit software settings” box and then try to create a cluster, you get the following error:
…which is perhaps not the world’s most informative error ever, and definitely a bit disappointing when all you’ve done is copy an AWS-supplied example. After much frustration, I finally discovered that the capitalisation of the keys is significant: if you change the supplied example to

[
  {
    "classification": "core-site",
    "properties": {
      "hadoop.security.groups.cache.secs": "250"
    }
  },
  {
    "classification": "mapred-site",
    "properties": {
      "mapred.tasktracker.map.tasks.maximum": "2",
      "mapreduce.map.sort.spill.percent": "90",
      "mapreduce.tasktracker.reduce.tasks.maximum": "5"
    }
  }
]

…then everything works just fine. Note the lower-case “c” and “p” in “classification” and “properties”, as opposed to the upper-case versions used in AWS’s example. I’ve sent feedback to the AWS team on this one, so I suspect it may end up getting fixed pretty soon, but if anyone else is suffering from the same problem then hopefully this gets you out of a hole!
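If you have several of the AWS examples to convert, a small script can do the lower-casing for you. The following is a minimal sketch, not an AWS tool: the function name `lowercase_config_keys` is hypothetical, and it assumes the console wants every structural key in each configuration entry lower-cased (including any nested “Configurations” list, which is an assumption beyond the “classification” and “properties” keys tested above). Property names such as `hadoop.security.groups.cache.secs` are deliberately left alone, since Hadoop setting names are case-sensitive.

```python
import json
from typing import Any


def lowercase_config_keys(configs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a new list of EMR 4 configuration entries whose keys
    (e.g. 'Classification', 'Properties') are lower-cased.

    Hypothetical helper: assumes the console accepts lower-case keys at
    every nesting level. The contents of each 'Properties' dict are not
    touched (and are shared with the input rather than copied).
    """
    result = []
    for entry in configs:
        fixed = {}
        for key, value in entry.items():
            lower = key.lower()
            if lower == "configurations" and isinstance(value, list):
                # Assumed: nested configuration lists use the same key style.
                value = lowercase_config_keys(value)
            fixed[lower] = value
        result.append(fixed)
    return result


if __name__ == "__main__":
    # An example in the capitalised style used by AWS's documentation.
    aws_example = """[
      {"Classification": "core-site",
       "Properties": {"hadoop.security.groups.cache.secs": "250"}}
    ]"""
    converted = lowercase_config_keys(json.loads(aws_example))
    # Paste this output into the "Edit software settings" box.
    print(json.dumps(converted, indent=2))
```

Building a new dict per entry, rather than renaming keys in place, keeps the original example unchanged in case you still need the capitalised form elsewhere.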