0x8107D

20120822

You might know about Buffer, a web service where you can create a stash of tweets which will be posted to your Twitter account at regular intervals. Instead of overwhelming your followers with all the interesting things you have to say, you can configure it such that it will send out a tweet at 9am and 1pm every day (for instance). The only thing you have to worry about is to keep your buffer filled.

However, the buffer size at Buffer is only 10 tweets. If you want more you need to pay (or make your friends sign up as a referral). Because I find $10/mo a bit much for just a buffer, I decided to make my own.

We'll do this with a Google Spreadsheet and a Google Apps Script. The spreadsheet is the buffer, which contains all tweets you are about to post. The script is triggered at regular intervals; on each run it fetches a single tweet from your spreadsheet, posts it to Twitter and removes the tweet from the spreadsheet.
The script is written in Google Apps Script, which is essentially JavaScript with access to Google services and convenience functions for sending email, making HTTP requests and more. We'll use this functionality below to implement the buffer application.

The spreadsheet

First we'll be creating a spreadsheet: head over to Google Drive, create a new spreadsheet and give it a descriptive name (e.g. Buffer). Next, let's fill in some example tweets in the first column:

The script

The next step is to create a Google Apps Script. In the spreadsheet we've just created, go to Tools, Script editor. We'll start with the following script, which is merely a skeleton of the final version:

The script should be fairly readable to anyone familiar with JavaScript: we fetch the spreadsheet we've just created and get our hands on its first sheet. There are two stubs, one for fetching the next tweet and one for posting it to Twitter. Both stubs return false if something went wrong. If the buffer is not depleted, the getNextTweet function returns a string containing the tweet.
If the tweet is posted to Twitter successfully, it is removed from the spreadsheet by deleting row 1 (note that spreadsheet indices start at 1 rather than at 0).
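
The flow described above can be sketched as follows. This is a minimal sketch under assumptions, not a definitive implementation: the name postNextTweet is hypothetical, and the sheet and the two helpers are passed in as parameters so the logic can also be exercised outside Apps Script. The sheet only needs a deleteRow() method, which an Apps Script Sheet provides.

```javascript
// Skeleton of one buffer run. `sheet` is expected to offer deleteRow(),
// as an Apps Script Sheet does. getNextTweet and postTweet are the two
// stubs from the text: each returns false when something went wrong.
function postNextTweet(sheet, getNextTweet, postTweet) {
  var tweet = getNextTweet(sheet);
  if (tweet === false) {
    return false; // buffer depleted, nothing to post
  }
  if (postTweet(tweet) === false) {
    return false; // posting failed, keep the tweet for the next run
  }
  sheet.deleteRow(1); // rows are 1-based, so this removes the top row
  return true;
}

// Hypothetical placeholders, to be replaced by real implementations.
function getNextTweetStub(sheet) { return false; }
function postTweetStub(tweet) { return false; }
```

Inside the script editor, the first argument would be SpreadsheetApp.getActiveSpreadsheet().getSheets()[0].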

Getting the next tweet

Let's implement the getNextTweet function first. As mentioned, it will return the value of the top left cell of the document's first sheet. This value is accessed by calling getValue() on a range.
However, before we do this we need to check whether there's a value at all. If the buffer is empty, we'd like to be notified by email. Sending an email is easy in Google Apps Script: just call MailApp.sendEmail() and it will send an email as the user who's running the script. Currently, the quota is 500 emails per day, which should be plenty if this is the only script you run.
All this functionality is implemented with the following code:
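
A minimal sketch of such a function is given below. The parameters mailApp and recipient are assumptions introduced here: the first stands in for MailApp, the second is the address to notify. The subject and body texts are placeholders as well.

```javascript
// Returns the tweet in the top-left cell, or false when the buffer is empty.
// `sheet` needs getRange(row, column).getValue(), like an Apps Script Sheet;
// `mailApp` needs sendEmail(recipient, subject, body), like MailApp.
// `recipient` is a hypothetical parameter: the address to notify.
function getNextTweet(sheet, mailApp, recipient) {
  var value = sheet.getRange(1, 1).getValue(); // row 1, column 1
  var tweet = String(value).trim();
  if (tweet === "") {
    mailApp.sendEmail(recipient, "Tweet buffer is empty",
        "There are no tweets left in the spreadsheet.");
    return false;
  }
  return tweet;
}
```

In Apps Script this could be called as getNextTweet(sheet, MailApp, Session.getActiveUser().getEmail()).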

If the buffer should remain filled at all times, you could also modify the code to send a warning email when the number of tweets drops below a certain threshold. That's up to you.
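
Such a threshold check could look like the sketch below. The function name warnIfLow and its parameters are hypothetical. It assumes one tweet per row without blank rows in between, so that getLastRow() equals the number of tweets left.

```javascript
// Sends a warning when fewer than `threshold` tweets remain.
// Assumes one tweet per row with no blank rows in between, so that
// sheet.getLastRow() equals the number of tweets left.
function warnIfLow(sheet, mailApp, recipient, threshold) {
  var remaining = sheet.getLastRow();
  if (remaining < threshold) {
    mailApp.sendEmail(recipient, "Tweet buffer is running low",
        "Only " + remaining + " tweet(s) left in the buffer.");
    return true; // a warning was sent
  }
  return false;
}
```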

Registering the Twitter application

Before we send the tweet we've just obtained, we head over to dev.twitter.com to register our application. By registering our app we'll be able to authenticate our script with Twitter, which uses OAuth. Fortunately, Google Apps Script provides a service which takes care of all the hairy implementation details behind this authentication scheme.
During registration it is important to enter a callback URL: by entering a URL we'll be redirected to the Twitter website upon authentication. The authentication will simply fail if we leave this empty. For this script, we'll just enter the script's URL (starting with https://script.google.com/). After filling in the necessary fields you'll immediately obtain a consumer key and a consumer secret, which we'll need in the next step.

After you have registered the application, make sure it is allowed to Read and Write, otherwise we cannot post any tweets. Check the application's settings as shown in the screenshot below:

Sending the tweet

Everything is in place now to write the code which authenticates with Twitter and sends the tweet to the outside world. The code below takes care of the authentication and posting the tweet we got passed as a parameter:

Google Apps Script has a UrlFetchApp class which can make HTTP(S) requests for you. It also provides access to the OAuth service which will do all the dirty work for us. We simply need to provide the consumer key and secret and the URLs. When we want to post something, we'll need to pass some additional data with the request which tells the service we're using OAuth authentication.
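
The steps just described can be sketched as follows, assuming the OAuth 1.0 helper (addOAuthService) that UrlFetchApp provided at the time of writing. The status endpoint URL is an assumption, and the fetchApp parameter is introduced here so the function can be tried with a stand-in object; in the script editor you would pass UrlFetchApp itself.

```javascript
// Posts `tweet` through the OAuth service of a UrlFetchApp-like object.
// Returns true on HTTP 200 and false otherwise.
// consumerKey and consumerSecret come from the Twitter app registration.
function postTweet(fetchApp, consumerKey, consumerSecret, tweet) {
  var oauth = fetchApp.addOAuthService("twitter");
  oauth.setRequestTokenUrl("https://api.twitter.com/oauth/request_token");
  oauth.setAuthorizationUrl("https://api.twitter.com/oauth/authorize");
  oauth.setAccessTokenUrl("https://api.twitter.com/oauth/access_token");
  oauth.setConsumerKey(consumerKey);
  oauth.setConsumerSecret(consumerSecret);

  // Extra request data telling the service to sign the request with OAuth.
  var options = {
    method: "POST",
    oAuthServiceName: "twitter",
    oAuthUseToken: "always"
  };
  // Assumed endpoint; the status text must be URL-encoded.
  var url = "https://api.twitter.com/1/statuses/update.json?status=" +
      encodeURIComponent(tweet);
  try {
    var response = fetchApp.fetch(url, options);
    return response.getResponseCode() === 200;
  } catch (e) {
    return false; // network or authentication failure
  }
}
```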

If no authentication has taken place before, the script has no access token and secret. In that case you'll see a popup when you execute this code for the first time:

If we proceed, we are redirected to the Twitter website, where you enter the credentials of the Twitter user whose timeline should receive the tweets from the buffer. If all goes well, the Twitter website closes and in the background our script has obtained an access token and secret. This access token remains valid unless the Twitter user revokes the app's permissions in his or her settings.

We're almost done: each time we run the full script, a tweet is popped from the buffer and posted as the authenticated Twitter user. There's one more thing left: automation.

Setting up the triggers

Of course we don't want to run this script manually; by adding one or more triggers we no longer have to. In the script editor, go to Resources, Current script's triggers. A dialog appears where we can add time-driven triggers to our script. The screenshot below shows an example where we post twice a day, one tweet in the morning and one tweet in the afternoon.

We're done

Now we have a fully automated script which reads tweets from your spreadsheet and posts them to your Twitter account at regular intervals. The full source can be found here. You only have to worry about creating content in the spreadsheet. You could also extend its functionality to post to Facebook or LinkedIn (which also use the OAuth authentication scheme), or write a more elaborate warning system for when the buffer is running low. At least you've saved yourself $10/month by doing the buffering yourself.

20120504

Memoization is a technique to remember the output of a pure function, in the mathematical sense. That is, for a given input it will always produce the same output. By remembering, you can serve the output right away if you have seen the input before and you may omit (costly) computation steps.
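
As a minimal sketch of the idea in JavaScript (the helper name memoize and the example function slowSquare are hypothetical, and only one-argument functions are handled):

```javascript
// Wraps a one-argument pure function so each distinct input is computed once.
function memoize(fn) {
  var cache = new Map(); // input -> previously computed output
  return function (arg) {
    if (!cache.has(arg)) {
      cache.set(arg, fn(arg)); // first time: do the costly work
    }
    return cache.get(arg); // afterwards: serve the remembered output
  };
}

var calls = 0;
var slowSquare = function (x) { calls += 1; return x * x; };
var fastSquare = memoize(slowSquare);
fastSquare(12); // computed
fastSquare(12); // served from the cache, slowSquare is not called again
```

The wrapper only makes sense because slowSquare is pure: the remembered output is guaranteed to equal what a fresh call would have produced.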

This post has an emphasis on Perl, since that is a language I use quite often at the moment, but in fact the technique is applicable to programming in general.

As you can see, this is a very inefficient approach, because the subtree of fib(3) appears twice in this call graph. Imagine what it would look like for fib(100), where fib(3) is computed billions and billions of times, again and again. So the idea with memoization is to compute fib(3) once, and serve its outcome immediately for every future call to fib(3). You could call it a cache, but since this is a specific caching technique on a subroutine level I prefer to stick with the term memoization.
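
The repeated work can be made visible by counting evaluations. The sketch below assumes the usual recursive definition with fib(0) = 0 and fib(1) = 1, and uses fib(5) as the starting point; the names fibNaive, fibMemo and the counts objects are hypothetical.

```javascript
// Naive recursive Fibonacci; `counts` records how often each n is evaluated.
function fibNaive(n, counts) {
  counts[n] = (counts[n] || 0) + 1;
  return n < 2 ? n : fibNaive(n - 1, counts) + fibNaive(n - 2, counts);
}

// Memoized variant: every n is evaluated once, later calls hit the cache.
function fibMemo(n, cache, counts) {
  if (n in cache) {
    return cache[n];
  }
  counts[n] = (counts[n] || 0) + 1;
  cache[n] = n < 2 ? n
      : fibMemo(n - 1, cache, counts) + fibMemo(n - 2, cache, counts);
  return cache[n];
}

var naiveCounts = {};
fibNaive(5, naiveCounts); // naiveCounts[3] is 2: the fib(3) subtree appears twice
var memoCounts = {};
fibMemo(5, {}, memoCounts); // memoCounts[3] is 1
```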

And I'm sure you can find an implementation for any other programming language. I found this interesting article on how to accomplish memoization in JavaScript.

To me, the Perl version excels because of its ultimate simplicity. One call to memoize() and your running time is only 2% of what it was before (this actually happened to me). A script with an exponential running time might turn into something that looks linear (or at least less exponential).

To demonstrate the effects on the fib function, consider the following numbers. n is the argument passed to fib, as defined above. The two columns show the time spent by the CPU (in user space, not wall-clock time). You can see the dramatic improvements by adding a simple line to your program:

n     No memoization (s)    Memoization (s)
10    0.00                  0.01
20    0.01                  0.01
30    1.24                  0.01
40    147                   0.01
50    18485                 0.01

The results speak for themselves. The case n=10 looks a bit strange, going from 0.00 to 0.01 seconds. This is due to the overhead of loading the Memoize module, which takes a bit of time. But for larger n this cost is totally worth it, from 5 hours down to almost nothing.

This may look impressive, but there's no such thing as a free lunch. As always with optimization there's a trade-off between time (CPU cycles) and space (memory). Memoization has little use when you hit the limits of your RAM because you chose to memoize every single function that moves. So you'd better make a wise choice for which functions you want to memoize.

Not all functions are suitable for memoization, and you can find a list of caveats in the Perl documentation. Although it's a Perl site, the caveats are actually language-independent. In summary, a function is only a candidate when:

the behavior only depends on its own parameters;

there are no side-effects;

the output is not modified by the caller;

the function is not too simple.
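
The third point is easy to overlook. The sketch below (with hypothetical names firstSquares and squaresCache) shows how a caller that modifies a returned array silently corrupts the remembered result, along with one possible remedy.

```javascript
// Memoized function returning an array, which is a mutable value.
var squaresCache = {};
function firstSquares(n) {
  if (!(n in squaresCache)) {
    var out = [];
    for (var i = 1; i <= n; i++) {
      out.push(i * i);
    }
    squaresCache[n] = out;
  }
  return squaresCache[n]; // every caller receives the same array object
}

var a = firstSquares(3); // [1, 4, 9]
a.push(999);             // the caller modifies the returned value...
var b = firstSquares(3); // ...and the cached result is now [1, 4, 9, 999]

// One possible remedy: hand out a copy, at the cost of extra work per call.
function firstSquaresSafe(n) {
  return firstSquares(n).slice();
}
```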

And even when a function passes these criteria, it doesn't automatically mean that it's suitable for memoization. You should ask the following three questions:

How often is this function called?

What is the size of the input domain for this function?

What is the output size for this function?

Remembering the output of a function which is called once during the execution of a script is not very useful, since there is exactly zero reuse. But even a function which is called multiple times might not be worth memoizing. And how do you know how often a function is called? Simple: by measurement. Run your code through a profiler. For Perl I can recommend the NYTProf profiler:

$ perl -d:NYTProf fibonacci.pl
$ nytprofhtml

From the HTML it generates you can easily check for each function how often it was executed. But the number of calls is just one figure; you also need to know the domain of your input parameters. For example, if you know that one parameter has 400 possible values and the other one 2, that makes 800 possible inputs. If you observe that a function is called 4000 times, you know that there are 5 calls per input on average. In that case it might be worthwhile to memoize. If it is called fewer than 800 times, you should investigate the distribution of your inputs. If that distribution is heavily skewed, so that a few inputs account for most of the calls, you're likely to win some time by memoizing the function. Conversely, if (almost) all inputs are unique it makes no sense to waste precious memory.
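
When the inputs of a function can be logged, this arithmetic is easy to automate. The sketch below is one way to do it; the function name reuseStats and the generated log are hypothetical, and the log simply reproduces the 400 x 2 example with 4000 calls.

```javascript
// Summarizes a log of observed inputs (one entry per call).
function reuseStats(inputs) {
  var perInput = new Map();
  inputs.forEach(function (input) {
    var key = JSON.stringify(input); // serialize so arrays compare by value
    perInput.set(key, (perInput.get(key) || 0) + 1);
  });
  var calls = inputs.length;
  var distinct = perInput.size;
  return {
    calls: calls,
    distinct: distinct,
    callsPerInput: distinct === 0 ? 0 : calls / distinct,
    // calls that a cache could have answered without recomputation
    cacheHits: calls - distinct
  };
}

// The example from the text: 400 x 2 = 800 possible inputs, 4000 calls.
var log = [];
for (var i = 0; i < 4000; i++) {
  log.push([i % 400, Math.floor(i / 400) % 2]);
}
var stats = reuseStats(log); // stats.callsPerInput is 5
```

A callsPerInput close to 1 means almost every input is unique, in which case the cache would mostly cost memory.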

Additionally, you should take the actual size of the input and output into consideration, because this is what will end up in memory. After all, a boolean occupies less RAM than a huge array of data. If the output is a large data structure, you should consider memoizing smaller parts of the solution and using these parts to assemble the final solution for each call. For example, suppose you have a function which returns a list of Fibonacci numbers, from the mth number up to the nth. It is often sufficient to store only the output of fib and build the list on every call.
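
A sketch of that last example, assuming fib(0) = 0 and fib(1) = 1 (the names fibCache and fibRange are hypothetical):

```javascript
var fibCache = { 0: 0, 1: 1 };

// Memoized fib: only single numbers are stored, never whole lists.
function fib(n) {
  if (!(n in fibCache)) {
    fibCache[n] = fib(n - 1) + fib(n - 2);
  }
  return fibCache[n];
}

// Builds the list fib(m)..fib(n) on every call from the small cached parts.
function fibRange(m, n) {
  var list = [];
  for (var k = m; k <= n; k++) {
    list.push(fib(k));
  }
  return list;
}

fibRange(5, 10); // [5, 8, 13, 21, 34, 55]
```

Memoizing fibRange itself would store one list per (m, n) pair, while this version stores one number per n and rebuilds the list cheaply.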

To conclude, applying memoization has the potential to bring you huge performance benefits with minimal effort. But it's not a silver bullet: you should know under which conditions it is applicable and be aware of the trade-offs you're about to make.