Post navigation

I spent the past year writing a book called Make Art with Python. It teaches Python programming through drawing and art, instead of print statements. If that sounds interesting to you, sign up to hear when it’s released here.

Why Write a Book in Emacs?

I needed a distraction free way to write my code and prose in one place. I also needed to track my progress during the writing itself, and settled on using a paper journal and org-pomodoro.

If you’ve ever wanted to write a technical book, the past few months have taught me it’s possible for mortals with day jobs. I’ve just sent my fourth draft off to my editor, and tens of thousands of lines of code and prose have already been written. It’s the first time in my life writing a book has felt achievable.

But there have been painful false starts and mistakes along the way.

In this article I’ll share some basic rules I stole to keep myself focused, motivated, and productive while writing. I’m sharing in the hopes that they will help you too, and you’ll begin writing your book much sooner than it took me.

Most of these ideas and habits are generally applicable, and although they may seem insignificant, improved my writing tremendously. In fact, all the writing and code for my first draft was written while I was the sole backend developer at a startup which grew to millions of users.

I’m a big believer in concrete examples as the only way to learn anything, so I’ll also walk you through the mechanics of the tools I use. We’ll cover the process of writing, file and time management, visualizing the work in progress, exporting, and maintaining a positive attitude to keep the work flowing. Of all these parts, the biggest challenge is mental.

When things get difficult, it’s mostly because you don’t know what to do next. It’s during these times that you need have a routine that happens, no matter what, to get you focused back on writing and making progress.

If you’ve been thinking about writing a book, this guide will hopefully convince you it’s possible.

Last Wednesday, while joking at the end of the workday, the idea came up to make an neural network that generates Amazon reviews.

Above you can see the results for images, reviews, prices, and product names generated by neural networks trained on Amazon’s data.

All of this was possible thanks to the dataset provided by Julian McAuley, used in the SIGIR and KDD papers, along with the torch-gan and char-rnn sources. We’ll walk through adapting this dataset and adapting it to train these neural networks in this writeup.

We’ll step through the thought process behind adapting and extending a dataset, and we’ll document the road blocks as we run into them along the way. I hope leaving these in will help beginners see that very often, programming requires running into wall after wall until you finally reach the other side.

We’ll cover how to load the dataset, how to generate fake product images, reviews, prices, and product names, and then export them for presentation. I hope you’ll enjoy the ride, even if you’re not necessarily into programming. I just want to give you an idea for how writing and extending an AI bot works when working from a given data set.

The finished neural networks and code with instructions are at Github.

So, let’s begin.

Meeting Your Data

The very first thing I did once I received the data set was to take a look at it, to get an idea for its formatting. It was originally compressed with gzip, and so I needed to uncompress it on my external drive.

Uncompressed, it was 68 gigabytes. When you’re working with large files, it can get tricky to figure out what you’ve got, and to make the most basic of assumptions. So the first thing to do is take a look at what you’re working with, and how messy your data might be.

In my case, I just used the ‘less’ command, to take a look at the first few lines in my terminal. This is easy enough to do in the command line:

$ less user_dedup.json
{"reviewerID": "A00000262KYZUE4J55XGL", "asin": "B003UYU16G", "reviewerName": "Steven N Elich", "helpful": [0, 0], "reviewText": "It is and does exactly what the description said it would be and would do. Couldn't be happier with it.", "overall": 5.0, "summary": "Does what it's supposed to do", "unixReviewTime": 1353456000, "reviewTime": "11 21, 2012"}{"reviewerID": "A000008615DZQRRI946FO", "asin": "B005FYPK9C", "reviewerName": "mj waldon", "helpful": [0, 0], "reviewText": "I was sketchy at first about these but once you wear them for a couple hours they break in they fit good on my board an have little wear from skating in them. They are a little heavy but won't get eaten up as bad by your grip tape like poser dc shoes.", "overall": 5.0, "summary": "great buy", "unixReviewTime": 1357603200, "reviewTime": "01 8, 2013"}{"reviewerID": "A00000922W28P2OCH6JSE", "asin": "B000VEBG9Y", "reviewerName": "Gabriel Merrill", "helpful": [0, 0], "reviewText": "Very mobile product. Efficient. Easy to use; however product needs a varmint guard. Critters are able to gorge themselves without a guard.", "overall": 3.0, "summary": "Great product but needs a varmint guard.", "unixReviewTime": 1395619200, "reviewTime": "03 24, 2014"}{"reviewerID": "A00000922W28P2OCH6JSE", "asin": "B001EJMS6K", "reviewerName": "Gabriel Merrill", "helpful": [0, 0], "reviewText": "Easy to use a mobile. If you're taller than 4ft, be ready to tuck your legs behind you as you hang and pull.", "overall": 4.0, "summary": "Great inexpensive product. Mounts easily and transfers to the ground for multiple push up positions.", "unixReviewTime": 1395619200, "reviewTime": "03 24, 2014"}

$ less user_dedup.json
{"reviewerID": "A00000262KYZUE4J55XGL", "asin": "B003UYU16G", "reviewerName": "Steven N Elich", "helpful": [0, 0], "reviewText": "It is and does exactly what the description said it would be and would do. Couldn't be happier with it.", "overall": 5.0, "summary": "Does what it's supposed to do", "unixReviewTime": 1353456000, "reviewTime": "11 21, 2012"}
{"reviewerID": "A000008615DZQRRI946FO", "asin": "B005FYPK9C", "reviewerName": "mj waldon", "helpful": [0, 0], "reviewText": "I was sketchy at first about these but once you wear them for a couple hours they break in they fit good on my board an have little wear from skating in them. They are a little heavy but won't get eaten up as bad by your grip tape like poser dc shoes.", "overall": 5.0, "summary": "great buy", "unixReviewTime": 1357603200, "reviewTime": "01 8, 2013"}
{"reviewerID": "A00000922W28P2OCH6JSE", "asin": "B000VEBG9Y", "reviewerName": "Gabriel Merrill", "helpful": [0, 0], "reviewText": "Very mobile product. Efficient. Easy to use; however product needs a varmint guard. Critters are able to gorge themselves without a guard.", "overall": 3.0, "summary": "Great product but needs a varmint guard.", "unixReviewTime": 1395619200, "reviewTime": "03 24, 2014"}
{"reviewerID": "A00000922W28P2OCH6JSE", "asin": "B001EJMS6K", "reviewerName": "Gabriel Merrill", "helpful": [0, 0], "reviewText": "Easy to use a mobile. If you're taller than 4ft, be ready to tuck your legs behind you as you hang and pull.", "overall": 4.0, "summary": "Great inexpensive product. Mounts easily and transfers to the ground for multiple push up positions.", "unixReviewTime": 1395619200, "reviewTime": "03 24, 2014"}

We can immediately see that our JSON file isn’t really in JSON, but it’s in a kind of Python dictionary object. And indeed, when I look at the web url that I got this info from, it says specifically that each line can just be ‘eval’d in Python in order to generate one object at a time.

This makes it easier to deal with this large of a file, because it means we don’t need to read the entire list into memory just to create our object. (Which could take minutes to hours to do.) Instead, we can go line by line in our file, and each review individually into memory.