Set up a SASS file

Watching SASS files in dev

Need to add SASS support. Checked out sass4clj. There is a lein plugin, but it didn't
play well with figwheel. I ended up using some ideas from the plugin and the sass4clj
project to integrate with figwheel.

I started with this SASS Watcher, which was a good starting point but didn't load in
the webjars. So the next step was to replace it with sass4clj, which does reference webjars.
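
Something along these lines does the job in dev. This is a sketch rather than the exact code in the project, and the sass4clj entry point (sass4clj.core/sass-compile-to-file), the option map, and the paths are assumptions to adjust:

;; Sketch of a dev-time SCSS watcher. Assumes sass4clj exposes
;; sass4clj.core/sass-compile-to-file and that a simple polling loop is
;; good enough for development.
(ns example.sass-watcher
  (:require [sass4clj.core :as sass]   ;; assumed entry point
            [clojure.java.io :as io]))

(defn compile-scss!
  "Compile the main entry file; sass4clj resolves @imports, including
  stylesheets pulled in from webjars."
  []
  (sass/sass-compile-to-file
    "src/scss/style.scss"
    "resources/public/css/style.css"
    {:source-map true}))

(defn watch-scss!
  "Poll the scss directory and recompile when anything changes. Figwheel picks
  up the new CSS because it already watches the css output directory."
  []
  (future
    (loop [last-mod 0]
      (let [newest (->> (file-seq (io/file "src/scss"))
                        (filter #(.isFile %))
                        (map #(.lastModified %))
                        (reduce max 0))]
        (when (> newest last-mod)
          (compile-scss!))
        (Thread/sleep 500)
        (recur (max newest last-mod))))))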

Still had to do a fair amount of cleanup on the converted markdown.

Plugins

These make the structure and navigation match the old Google Sites somewhat.

Related Pages

Lots of our pages had files as downloads. I like the idea of putting downloads in a
subdirectory and having them auto-populate on the page. Some of our navigation is also
based on pages in a matching directory. This plugin populates a sub_pages collection
and a downloads collection, and the view renders those collections.
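
The generator-specific wiring isn't shown here, but the heart of the plugin is just listing directories. A rough Clojure sketch of the idea (the namespace, directory layout, and names below are illustrative, not the real plugin API):

;; Illustrative sketch: assumes a page at content/foo/bar.md keeps related
;; pages in content/foo/bar/ and downloadable files in content/foo/bar/downloads/.
(ns example.related-pages
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn- file-names [dir]
  (->> (.listFiles (io/file dir))
       (filter #(.isFile %))
       (map #(.getName %))
       sort))

(defn page-collections
  "Build the sub_pages and downloads collections the view renders."
  [page-path]
  (let [page-dir (str/replace page-path #"\.md$" "")]
    {:sub_pages (filter #(str/ends-with? % ".md") (file-names page-dir))
     :downloads (file-names (io/file page-dir "downloads"))}))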

We have some Spark jobs whose results we want stored as a CSV with headers so they can be used directly. Saving the data as CSV is pretty straightforward: just map the values into CSV lines.

The trouble starts when you want that data in one file. FileUtil.copyMerge is the key for that: it takes all the files in a directory, like those output by saveAsTextFile, and merges them into one file.

Great, now we just need a header line. My first attempt was to union an RDD containing the header with the output RDD. This works sometimes, if you get lucky. Since union just smashes the partitions together, more often than not the CSV ends up with the header row somewhere in the middle of the results.
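
For illustration only, the unlucky version looks roughly like this (using the same sparkling aliases as the full listing below; this is a sketch of the approach we abandoned, not code we shipped):

;; Naive attempt: union a one-line header RDD with the data RDD.
;; Where the header line lands in the saved output depends on how the
;; partitions get written and merged, so it is only first if you get lucky.
(defn save-csv-naive [sc url headers rdd]
  (let [header-rdd (spark/parallelize sc [(clojure.string/join "," (map name headers))])
        data-rdd   (spark/map (de/value-fn (fn [value]
                                             (clojure.string/join "," (map value headers))))
                              rdd)]
    (.saveAsTextFile (.union header-rdd data-rdd) url)))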

No problem! I’ll just prepend the header after the copyMerge. Nope: Hadoop filesystems are generally write-once. You can get append to work, but it’s still not a great option.

The solution was to write the header as a file BEFORE the copyMerge using a name that puts it first in the resulting CSV! Here’s what we ended up using:

(ns roximity.spark.output
  (:require [sparkling.conf :as conf]
            [sparkling.core :as spark]
            [sparkling.destructuring :as de]
            [clojure.data.csv :as csv]
            [clojure.java.io :as io])
  (:import [org.apache.hadoop.fs FileUtil FileSystem Path]))

(defn- csv-row [values]
  (let [writer (java.io.StringWriter.)]
    (clojure.data.csv/write-csv writer [values])
    (clojure.string/trimr (.toString writer))))

(defn save-csv
  "Convert to CSV and save at URL.csv. URL should be a directory.
  Headers should be a vector of keywords that match the map in a tuple value,
  and should be in the order you want the data written out in."
  [url headers sc rdd]
  (let [header (str (csv-row (map name headers)) "\n")
        file   url
        dest   (str file ".csv")
        conf   (org.apache.hadoop.conf.Configuration.)
        srcFs  (FileSystem/get (java.net.URI/create file) conf)]
    (FileUtil/fullyDelete (io/as-file file))
    (FileUtil/fullyDelete (io/as-file dest))
    (->> rdd
         (spark/map (de/value-fn (fn [value]
                                   (let [values (map value headers)]
                                     (csv-row values)))))
         (spark/coalesce 1 true)
         (#(.saveAsTextFile % file)))
    (with-open [out-file (io/writer (.create srcFs (Path. (str file "/_header"))))]
      (.write out-file header))
    (FileUtil/copyMerge srcFs (Path. file) srcFs (Path. dest) true conf nil)
    (.close srcFs)))

This works for local files and S3, and it should work for HDFS. Since we’re using S3 and the results are not huge, we use (coalesce 1 true) so that only one part file is written to S3; without that we had issues with too many requests. We could probably use a higher partition count and find a happy medium, but we just use 1.
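
For reference, a call ends up looking something like this (the bucket, keys, and RDD name are made up for illustration):

;; result-rdd is a pair RDD whose values are maps with :venue-id, :visits and :date keys.
(save-csv "s3n://example-bucket/reports/daily-visits" [:venue-id :visits :date] sc result-rdd)
;; => writes s3n://example-bucket/reports/daily-visits.csv with the header row first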

When we initially started development of ROXIMITY, I decided to go with MongoDB. There were three reasons for this choice: geospatial support, redundancy and scalability, and a lack of schema. If you are thinking about MongoDB, these are still all valid reasons for considering it, and our experience should aid your decision making.