Thursday, 17 December 2015

Note: all script output is included
at the bottom of each code block and indicated with '>>>' (or
sometimes the output is summarised in a code comment). Yes, it's confusing, but
you're smart ;)

Let's imagine you want a simple array
of consecutive floating point numbers in Python: [0.1, 0.2, 0.3 ... 0.9]. You
start by trying to use Python's built-in range() function:

x = range(0.1,1,0.1) # Don't do this

You expect to get an array of numbers
from 0.1 (first argument, inclusive) to 1 (second argument, exclusive) with a
step size of 0.1 (third argument).

But the Python range function can
only deal with integer arguments, step size included, and complains:

>>> TypeError: range() integer step argument expected, got float.
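
On Python 3 the same call fails as well, just with different wording. A minimal sketch, assuming Python 3 (the name `message` is only there for illustration):

```python
# Assumes Python 3: range() accepts only integer arguments.
try:
    x = range(0.1, 1, 0.1)
except TypeError as err:
    message = str(err)

print(message)
# 'float' object cannot be interpreted as an integer
```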

OK, so you import numpy and use arange(), right?

import numpy as np

x = np.arange(0.1, 1, 0.1)

# x is [0.1, 0.2, ..., 0.9]

Exactly what you wanted. But what if
you don't have numpy? What if you care about code footprint, portability, and
all those things? What if you want someone else to be able to generate your
super interesting array, and they don't have numpy? Do you ask them to install
numpy, knowing their lives will be better in the long run? After internal
debate, you delete the numpy dependency, and try a list comprehension instead:

x = [x * 0.1 for x in range(1,10)]

Hah, now you must surely have the
best of all worlds. One line, no dependencies, and a list comprehension. This
is great. This is amazing. You print it to double-check your handiwork:

print(x)
>>> [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9]

Oh yes, computers suck at floating
point numbers. Now what? Back to numpy? Round the numbers? Write a library to
handle all of this? Re-write numpy? You try once more for a simple solution:

x = [x/10.0 for x in range(1,10)]
print(x)
>>> [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
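
A quick check of what the two comprehensions actually hold, sketched with the standard-library decimal module and assuming Python 3. The names `mult`, `div`, `literals` and `mismatches` are made up for this example:

```python
from decimal import Decimal

mult = [x * 0.1 for x in range(1, 10)]
div = [x / 10.0 for x in range(1, 10)]
literals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Decimal(float) shows the exact binary value that a float stores.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# x and 10.0 are both exact, so the quotient is rounded only once
# and lands on the same float as the literal.
print(div == literals)
# True

# Multiplying by the already-inexact 0.1 can round to a neighbouring float.
mismatches = [(a, b) for a, b in zip(mult, literals) if a != b]
print(mismatches)
# [(0.30000000000000004, 0.3), (0.6000000000000001, 0.6), (0.7000000000000001, 0.7)]
```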

Interesting. Division works where
multiplication doesn't. You try out a few variations of each to make sure the
distinction is consistent, and find out it is. You're kind of happy with the
division solution, but floating point division is slower than floating point
multiplication, right? What if someone wants to do this a million times? You
decide to see how much time you're losing for floating point precision:

import time

N = 1000000

t1 = time.time()
for j in range(N):
    foo = [x * 0.1 for x in range(1, 10)]
print("multiplication: {}".format(time.time() - t1))

t1 = time.time()
for j in range(N):
    foo = [x / 10.0 for x in range(1, 10)]
print("division: {}".format(time.time() - t1))

>>> multiplication: 4.11618614197
>>> division: 4.24211502075

That .1 second for every run of a
million hurts a bit. Division is slow. Why not just multiply the numbers as in
the first attempt, and then round them to 2 places? Last try:

# ...

t1 = time.time()
for j in range(N):
    foo = [round(x * 0.1, 2) for x in range(1, 10)]
print("round: {}".format(time.time() - t1))

No more slow division! And how long
can it take to shave off some decimal places?

Quite a while, it turns out. Is that
nearly 30 seconds? Yes, it is.

>>> multiplication: 4.11618614197
>>> division: 4.24211502075
>>> round: 29.5979890823

OK, that's slow. Slower than
sub-string manipulation, as it turns out. You convert the broken float to a
string, keep only its first three characters, and then turn it back into a
float, just to prove a point:

# ...

t1 = time.time()
for j in range(N):
    foo = [float(str(x * 0.1)[:3]) for x in range(1, 10)]
print("string: {}".format(time.time() - t1))

>>> multiplication: 4.10650587082
>>> division: 4.21635198593
>>> round: 30.1995661259
>>> string: 23.5980100632
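
What the string trick does to a single element, as a rough sketch assuming Python 3 (where str() of a float gives the full shortest repr). The names `value`, `text`, `trimmed`, `result` and `broken` are invented for this example:

```python
value = 3 * 0.1
text = str(value)        # '0.30000000000000004'
trimmed = text[:3]       # '0.3': only the first three characters are kept
result = float(trimmed)  # 0.3

# The slice is purely positional, so it only works while the number
# looks like 'd.d'. Two digits before the point and it falls apart.
broken = float(str(123 * 0.1)[:3])  # '12.' becomes 12.0, not 12.3

print(result, broken)
# 0.3 12.0
```

So it proves the point about round(), but it isn't something to rely on outside this one array.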

Now that you've put the 0.1 second
you paid for division into context, you feel OK using it. For comparison (even
though you said 'last try' a while back), you add timing for numpy as well:

# ...

t1 = time.time()
for j in range(N):
    foo = np.arange(0.1, 1, 0.1)
print("numpy: {}".format(time.time() - t1))

>>> multiplication: 4.17134094238
>>> division: 4.23171901703
>>> round: 36.1641287804
>>> string: 23.5332589149
>>> numpy: 3.10889482498

A whole second faster! Maybe you
should include that massive dependency after all. But you remind yourself what
you've read a million times in start-up blogs. The most important thing is code
readability. The compiler will do optimization better than you can ever hope
to, right? Right?? Or - maybe not.
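
If you want to repeat the experiment, the hand-rolled time.time() loops could probably be swapped for the standard-library timeit module. This is only a sketch: the repeat count is an assumption (much smaller than the million used above, to keep it quick), the names `snippets` and `results` are invented here, and the numbers will depend on your machine and Python version, so none are shown.

```python
import timeit

N = 10000  # assumption: far fewer repeats than the million used above

# The four pure-Python variants from this post, as timeit statements.
snippets = {
    "multiplication": "[x * 0.1 for x in range(1, 10)]",
    "division": "[x / 10.0 for x in range(1, 10)]",
    "round": "[round(x * 0.1, 2) for x in range(1, 10)]",
    "string": "[float(str(x * 0.1)[:3]) for x in range(1, 10)]",
}

results = {}
for name, stmt in snippets.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    results[name] = min(timeit.repeat(stmt, number=N, repeat=3))
    print("{}: {:.4f}".format(name, results[name]))
```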

Saturday, 12 December 2015

When Spritz announced their new speed-reading technology with a flashy website, an impressive demo, and some complimentary media articles, people noticed. They claimed that they would change the future of reading, and people believed them. I believed them. The premise is simple - words flash one-by-one in front of you on a fixed point, saving you the time you usually spend moving your eyes backwards and forwards while reading text in lines.

It's difficult to re-imagine ideas as popular as reading. Books have chapters, pages, paragraphs, and lines. They have many of these things not because they are inherently important to reading, but because they were necessary to print words out on paper. Some web pages still try to incorporate the idea of pagination - you get half way through a news article and then have to press "Go to page 2". Most people agree that pagination is generally A Bad Idea and no longer use it.

Two years on from the Spritz announcement, the technology built on Spritz is disappointing. There are some half-baked attempts to create speed-reader applications for most popular platforms, but nothing revolutionary. Most of these applications are just wrappers around the Spritz demo that allow users to upload the content they want to read or find it online.

The Spritz applications that I find useful, but far from complete, are:

The Spritz 'bookmarklet' (web) - allows you to Spritz most text that you come across online, simply by selecting it and pressing the bookmark in your (desktop) browser.

Pros: Free, easy to install and use, fairly versatile

Cons: Doesn't work on mobile devices, isn't designed to read books or other files.

SpeedRead (web) - a website that allows you to upload your own files (including PDFs), and read them with a very attractive modification of the standard Spritz interface.

Pros: Looks good, works well, allows user files

Cons: Not free, although the 'try it out' functionality has no time or number of use restrictions.

ReadMe! (Android) - an Android e-reader that includes Spritz technology.

Cons: Android-only, difficult to add files (you have to transfer from PC or download directly, but there are no options to link to Dropbox or equivalent).

There are a number of other applications that I have tried over the last few weeks, and they have all been very disappointing. Many have been very unstable, or are incomplete or no longer developed. Perhaps the iOS ones are better, but not owning any Apple devices (and generally finding myself unwilling to pay for digital 'things'), I haven't been able to test these.

I find Spritz very useful for reading longish articles (looking at you, Medium), and for skimming through books that I'm not sure I want to read. But when reading through Spritz, one tends to mentally 'hear' everything in a monotone, and the applications of the technology so far still make it very difficult to navigate through a book in anything but a beginning-to-end pattern. When people read, they often go back to re-read a complicated sentence, or to compensate for their mind wandering off for a bit. This is still something that is very difficult to do in all the Spritz applications I have seen so far.

About Me

I'm far away from home in this country called "Europe". I'm studying towards a Master's in Computational Linguistics (I think - this might help: https://xkcd.com/114/). I write about web applications and Python and other things that you may find interesting (considering you got this far).