Sunday, 30 October 2016

It is well known that the computer keyboard has been around for a long time and has not recently undergone any major changes. Although the basic QWERTY layout has been around since the days of typewriters, people are against any change to this design as they are 'used to it'. (They were also used to typewriters, but not many complained when they were given a backspace key that worked.)

Since Apple's recent announcements, it is obvious that the time has come to rethink the keyboard. The important points are:

Looking at the keyboard is not actually necessary to type

Having 3D keys instead of virtual keys allows people to more easily work out which keys their fingers are resting on or about to press

Different applications do different things, and therefore it would be nice if different keys could take on different behaviour based on which application was currently in use.

With the above in mind, I make the revolutionary proposal to do away with displays on the keyboard altogether. I know you're immediately wondering how we can cope with only a single display (the monitor), and with no digital display on the keyboard, but consider the following:

New keyboards would have a single row of keys above the number row. I suggest for now that this row should comprise 13 keys. Twelve of these keys would be called 'function keys', and they would be assigned to different functions in a context-sensitive manner. The F4 key (Function 4 key) would be used to close the currently open window, for example, while the F1 key would be used to open an entirely unhelpful 'help Wizard' on M$Windoze systems. If a media application is open, then instead of opening an entirely unhelpful 'help Wizard', F1 would instead immediately mute the volume. This would be useful, for example, when you bring your laptop out of hibernation in a public lecture and it resumes playing a film, or other private media that you were watching the night before, at full volume.

The final key would be at the top left of the keyboard, and this would be called the Escape (ESC) key. This key would remove the current focus (e.g. deselect an image in word processing applications and put the focus back on the last cursor position to resume typing); close JavaScript overlays; and exit insert mode in text editors such as vi (somewhat similar to the arguably easier to press Ctrl + [ combination, but one that is simpler and more beginner-friendly).

Advantages of removing the keyboard display:

No need to glance up and down between two displays

Fingers can easily work out which function key they need to press without looking down

One less thing to break -- these function keys would be integrated into the main keyboard. They would not even require their own processor, meaning fewer hardware, software, firmware, and driver issues, as well as creating more environmentally friendly laptops.

Disadvantages of removing the keyboard display:

We'll never get an "I rewrote Doom 1 using only the Touchbar" post on Medium.

Saturday, 3 September 2016

The
Internet seems to become less pleasant by the day for those of us who are here
primarily to read. Every now and again (i.e. dozens of times per day), I see a
URL that points to an article which looks like it might contain some
interesting information. I click on the URL hoping to get a nice big piece of
text for me to digest, but instead I'm presented with auto-play videos, a
JavaScript overlay asking me to subscribe to a newsletter, another JavaScript
overlay asking me to use the site's app (obligatory XKCD: App), another JavaScript
overlay telling me not to use an adblocker and still another one which thanks
me for not using an adblocker after I've told my adblocker to block the
previous one... You get the picture.

Today
I'm going to describe how you can greatly improve this experience, focusing
specifically on news articles from online media, by building a Reader
application and a browser extension. The application will transform web pages
from looking like the image on the right, to instead look like the one on the
left.

Our arsenal

To
create our Reader app, we'll use Python and Flask. The browser extension we
create is for Google Chrome, although it should be pretty trivial to adapt for
Firefox. We'll be using the Newspaper library for article extraction,
and we'll write a little bit of HTML and CSS to display our final article as we
want to read it.

I assume
that you know some basics, and that you have a working version of Python and
Pip installed on your system. I don't go into too much depth about how the
various components work, so if you have some previous knowledge of Python,
HTML, CSS, and JavaScript, you'll find everything below makes a lot more sense.
You should be able to piece everything together even without prior experience
though.

Setting up

Newspaper,
the library we use for text extraction, is primarily a Python3 library. There
is a buggy fork for Python2, but I strongly recommend that you use Python3 to
take advantage of the maintained version. I therefore assume that your system
is set up in such a way that pip invokes pip3 and python points
to the python3 interpreter. Adapt the
following as necessary if this is not the case. I'm not going to show the extra
commands needed to create a virtualenv and install the packages in that. If you
feel strongly about this, feel free to adapt as you see fit.

First we
need to install Flask and Newspaper. Run the following commands:

pip install Flask
pip install newspaper3k

For the
latter, you may have some issues with the installation of the lxml library.
GIYF.

Writing the Python code

The core
of our app will be a web server that receives a URL from the user, downloads
the content from that URL, extracts the text, reformats it, and returns it.

Create a
directory for your project and create a file within this directory called reader.py. Add
the following code to this file:

The
first few lines simply import the parts of Flask we'll be using and the Article
class from Newspaper, which is all we need to download the article from the URL
and perform text extraction on it.

The next
line initialises our Flask app. We then see a single route, which will detect
traffic going to the "/read" route, and call the function defined
directly below it.

Our
actual read() function grabs the URL of the
desired article from the arguments of the current URL. It initialises an Article object,
downloads the content from the URL, does Newspaper's magic parsing on it (text
extraction is actually a lot more difficult than one might imagine), and splits
the resulting text into paragraphs. Finally, it returns an HTML template (which
we'll write in the next section), and passes in the paragraphs of the article
as well as the article's title as arguments. We pass in a list of paragraphs
instead of the whole text chunk as Newspaper gives us text delimited with newline
characters, which browsers ignore when rendering HTML. We therefore wrap each paragraph
in <p> tags in our template (see the next section).
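To make the splitting step concrete, here is a minimal sketch that uses only the standard library. The helper name to_paragraphs and the sample text are made up for illustration; in the app, the input would be the text that Newspaper extracts from the article.

```python
def to_paragraphs(text):
    """Split newline-delimited article text into a list of paragraphs."""
    # Blank lines between paragraphs produce empty strings; drop them.
    return [line.strip() for line in text.split("\n") if line.strip()]


# Made-up sample standing in for extracted article text.
sample = "First paragraph.\n\nSecond paragraph.\nThird paragraph.\n"
print(to_paragraphs(sample))
```

Passing a list like this to the template, rather than one long string, is what lets each paragraph get its own pair of <p> tags.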

The
final part of the script starts up our web application if we are running it
locally and turns on debug mode.

Writing the HTML

Now we
need to create an HTML template which will form the skeleton of all news
articles read through our app. Create a new directory inside your project
directory called templates (this name will allow Flask to
find your templates, so don't change it). Create a new file inside this
directory called article.html. Your project should now have the
following structure:

reader
|-- templates
|   +-- article.html
+-- reader.py

In the article.html file,
add the following code:

<html>
    <head>
        <title>{{title}}</title>
        <style>
            body {
                font-family: "Helvetica";
                max-width: 900px;
                padding-left: 20px;
                padding-right: 20px;
                padding-top: 30px;
                margin: 0 auto;
                text-align: justify;
            }
        </style>
    </head>
    <body>
        <h1>{{title}}</h1>
        {% for paragraph in paragraphs %}
            <p>{{paragraph}}</p>
        {% endfor %}
    </body>
</html>

This is
a Flask template (or more specifically a Jinja2 template). It has the normal
structure of an HTML document (starting and ending with <html>, <body>, and <head> tags).
We have a few lines of internal CSS which will make our article be displayed in
a decent font, create margins on the left and right of the article on screens
that are wider than 900px, add some padding so that the text doesn't try to creep
off the screen, put the text in the middle of the screen, and stretch out the
text (fully justify) to give nice vertical lines on the left and right (which
many people do not like, so feel free to remove the justify line
if you prefer ragged right).

The
non-HTML parts of the above code are enclosed in either double braces {{ }} or
in the brace-percent combination {% %}. The former are simply placeholders
for the arguments that we pass in from our Python code (i.e. the paragraphs and
the article's title). The latter defines a control sequence -- in our case, a
simple for loop which will loop through
each of our paragraphs and add them to the page, opening and closing <p> tags
as required.
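If the template syntax is unfamiliar, the following plain-Python sketch produces roughly what the placeholders and the loop generate. It is only an analogy and not how Jinja2 works internally: the function name render_article and the sample values are made up, and html.escape merely imitates the escaping that Flask applies to values in .html templates by default.

```python
from html import escape


def render_article(title, paragraphs):
    """Build roughly the markup that the template body would produce."""
    lines = ["<h1>{}</h1>".format(escape(title))]  # mirrors <h1>{{title}}</h1>
    for paragraph in paragraphs:  # mirrors {% for paragraph in paragraphs %}
        lines.append("<p>{}</p>".format(escape(paragraph)))  # mirrors <p>{{paragraph}}</p>
    return "\n".join(lines)


print(render_article("A title", ["One & two", "Three"]))
```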

That's
our entire app. Let's test it.

Testing our web application

To see
if our app works, navigate to your project directory in terminal or command
prompt and then run the reader.py script. To do this, run
commands similar to the following (depending on where your project directory is
located):

cd git/reader
python reader.py

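You should see output similar to Running
on http://127.0.0.1:5000/ (Press CTRL+C to quit). Now fire up your web browser and find the URL of
a news article you'd like to read (e.g. this one about Mother Teresa: http://www.bbc.com/news/world-europe-37258156).
To read it through our app, pass that URL as the url argument to the "/read" route,
e.g. http://localhost:5000/read?url=http://www.bbc.com/news/world-europe-37258156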
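The address to visit is the app's "/read" route with the article's URL attached as a query argument. A small sketch of building that address safely is below; it assumes the app reads the article address from an argument named url, and the names READER_BASE and reader_url are made up for illustration.

```python
from urllib.parse import urlencode

READER_BASE = "http://localhost:5000/read"  # local Flask app, "/read" route


def reader_url(article_url):
    """Return the Reader address for a given article URL."""
    # urlencode percent-encodes the article URL so that any "?" or "&"
    # inside it is not mistaken for part of the Reader's own query string.
    return READER_BASE + "?" + urlencode({"url": article_url})


print(reader_url("http://www.bbc.com/news/world-europe-37258156"))
```

Pasting the raw article URL after ?url= usually works too, but it can break for articles whose own address contains an ampersand, which is why the sketch encodes it.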

Although
our application is already usable, it's not very user-friendly. Each time you
want to read an article, you have to copy the URL to the clipboard and then
construct the long version as shown above. Instead of this, we want to be able
to right-click on any URL that we come across while browsing the web, and to
easily send that article to our app. To do this, we'll build a Google Chrome
extension. A basic Google Chrome extension consists of two parts: a manifest
file (JSON), which describes the extension and requests the necessary
permissions, and a JavaScript file, which is where the functionality of the
extension lives.

Create a
new directory called readerExtension and inside this create a file
called manifest.json as well as one called script.js.

Inside manifest.json add
the following code:

{
    "manifest_version": 2,
    "name": "Plaintext Article Reader",
    "description": "Reformats online news to remove all the gunk",
    "version": "1.0",
    "permissions": [
        "contextMenus"
    ],
    "background": {
        "scripts": ["script.js"]
    }
}

The
first few lines simply describe our extension. In the permissions section,
we state that we need permission to fiddle with the user's context menus (i.e.
the menu that appears when you right click), and in the background section,
we point to the script.js script, which will get called
automatically by the browser.

In the script.js file,
add the following code:

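function plaintext(info, tab) {
    // Open the Reader app in a new tab, passing the clicked link as "url".
    // encodeURIComponent keeps any "?" or "&" in the link intact.
    chrome.tabs.create({
        url: "http://localhost:5000/read?url=" + encodeURIComponent(info.linkUrl)
    });
}

// Add a "View Plaintext" entry to the right-click menu for links.
chrome.contextMenus.create({
    title: "View Plaintext",
    contexts: ["link"],
    onclick: plaintext
});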

We start
off by defining a function plaintext() which will create a new tab in
the user's browser. This tab opens our local Reader app, passing along the URL of
the link that was clicked.

The
second part creates a context menu (which Chrome will automatically collapse
into the existing right-click context menu for us) and adds a "View
Plaintext" section. We use contexts to say that we only want this to
appear if the user right-clicks on a link and we use onclick to
specify that our plaintext() function should be called when
the user selects this option.

Installing the Google Chrome
extension

To
actually publish this as a proper Google Chrome extension would involve going
through a lengthy set of steps (and paying Google $5). However, it's easy
enough to set Chrome to use Developer mode and to load unpacked extensions.

In the
"omnibox" or address bar of Google Chrome, type chrome://extensions. At the top of
the page, tick the box that says "developer mode". Then choose
"Load unpacked extension" and select your readerExtension directory
from the file chooser that appears.

Now you've
written a Google Chrome extension and installed it! To try it out, simply visit
any web page (preferably an online news site, such as http://bbc.co.uk/news),
right click on one of the articles, and click "View Plaintext", which
will now appear in the context menu whenever you right click on a link.

All
that's left to do is to enjoy online reading again. Note that your local Flask
app has to be running in order for the extension to work, so you'll need to run python reader.py from
your project directory before browsing the web.

Where next?

Instead
of running the Flask application locally, you can run it permanently from a
VPS. Digital Ocean will give you a basic VPS for $5 a month (and if you sign up
with them using my referral link, I'll get some credit with them
that I can use to keep messing around with stuff like this and writing about
it). I'm not going to go into detail on how to deploy a Flask application to a
server (although I do do so in my book Flask By Example).
Another advantage of running the app remotely is that if you're on a mobile
device and have a slow Internet connection, the server can download the large
version of the page with all the attached JavaScript and CSS and serve you a
much smaller version that still contains the important parts (i.e. the text
that you want to read).

Thursday, 25 August 2016

WhatsApp recently updated their privacy policy. To prevent users from getting skittish, they also wrote a blog post explaining how wonderful everything was. I found some mistakes in their blog post, though, so I thought I'd fix it up for them. The original post can be found here: https://blog.whatsapp.com/10000627/Looking-ahead-for-WhatsApp

About those 17 billion dollars we paid for a chat app? Um, we kind of
need to make that back again

Today, we’re updating WhatsApp’s terms and privacy policy for the first
time in four years, as part of our [struck out: plans to test ways for people to
communicate with businesses] making WhatsApp profitable by
allowing businesses to contact you in the months ahead. The updated
documents also reflect that we’ve joined Facebook and that we've recently
rolled out many new features (we’d like you to focus on the new features,
instead of the changes to our privacy policy), like end-to-end encryption,
WhatsApp Calling, and messaging tools like WhatsApp for web and desktop. You
can read the full documents here.

People use our app every day to keep in touch with the friends and loved
ones who matter to them, and this isn't changing (Please go ahead and think about just
how useful WhatsApp is to you for a moment. You don’t really have a choice but
to agree to our new terms). But as we announced earlier this year, we [struck out: want to
explore ways for you to communicate with businesses that matter
to you too] may be able to finally turn a profit for us, while still
giving you an experience without third-party banner ads and spam (depending on your
definition of Spam). Whether it's hearing from your bank about a potentially fraudulent
transaction, or getting notified by an airline about a delayed flight, or maybe seeing a
text message or two that’s actually an advertisement to help us become
profitable, many of us get this information elsewhere, including in text messages
and phone calls. We want to test these features in the next several months, but
need to update our terms and privacy policy to do so (well, maybe “need”
is a strong word, but the current ones are a bit inconvenient for us).

We're also updating these documents to make clear that we've rolled out
end-to-end encryption (remember to focus on our new features please). When you and the
people you message are using the latest version of WhatsApp, your messages are
encrypted by default, which means you're the only people who can read them.
Even as we coordinate more with Facebook in the months ahead, your encrypted
messages stay private and no one else can read them. Not WhatsApp, not
Facebook, nor anyone else (History and common sense say that we’ve probably opened
up a back door for NSA, but that’s for like terrorism and stuff, so don’t worry
about it). We won’t post or share your WhatsApp number with others, including on
Facebook, and we still won't sell, share, or give your phone number to
advertisers (but we might let them contact you through WhatsApp. Even though they
can use your number in the only way that matters, please focus on the fact that
they don’t actually possess those 10 digits that you value so much).

But (remember, anything we say before the word “but” doesn’t really count) by coordinating
more with Facebook, we'll be able to do things like track basic metrics about
how often people use our services and better fight spam on WhatsApp (Please focus on the
‘fight spam’ part, and skip over the ‘tracking’ part. Also please don’t read
this piece on how much can be inferred by looking only at metadata from the
EFF: https://www.eff.org/deeplinks/2013/06/why-metadata-matters). And by connecting
your phone number with Facebook's systems, Facebook can offer better friend
suggestions and show you more relevant ads (which will help us make money) if you have an
account with them. For example, you might see an ad from a company you already
work with, rather than one from someone you've never heard of (not in a creepy
way though. Don’t worry. This is all about profit). You can learn
more, including how to control the use of your data, here.

Our belief in the value of profiting from private
communications is unshakeable, and we remain committed to giving you the
fastest, simplest, and most reliable experience on WhatsApp. As always, we look
forward to your feedback and thank you for using WhatsApp.

Friday, 19 August 2016

A common pattern among the computer science crowd is the desire to find a gap in the market. We've seen people like Mark Zuckerberg receive the same knowledge that we have, and turn that knowledge into money. Many people I know of have gone through approximately the same progression that I did in terms of becoming dissatisfied with academia for being too impractical (is anyone actually going to read that thesis?), followed by becoming dissatisfied with industry for being too uninspiring (yay, I fixed that unit test. Again). These people then start looking for gaps in the market -- waiting for that One Great Idea (tm) to come down from above and strike them between the eyes.

The first thing to realise is that ideas are worthless. As many people have noted, there is no market for ideas, and this is for good reason. They're not worth anything. You can patent an invention, but not a startup idea. Your idea might be good, but it's not going to make money on its own. Your product might be OK, but it's not going to make money unless it's polished and marketed. And as a single developer working on your weekends, you're unlikely to be able to build anything reliable that's also easy to use and which solves an actual problem. And then tell people about it.

Now that we have that out of the way, ideas are still important. And ideas are fun. I have notebooks full of ideas -- some of them I've shared with others for feedback. A select few are in the process of being transformed into code in private git repositories. I enjoy playing around with ideas, even if it's good to keep a healthy scepticism about how successful they'll become.

A good shortcut for finding more interesting ideas than those of other people is through the concept of 'meta'. A meta-thought is a thought about thoughts -- i.e. one of the things that we believe makes us better than the apes. Metadata is data that we keep about other data -- think of that "last modified" column in your file explorer. That's data. Your files are also data. So it's data which is describing data. Wow. Inception. Metaception. Mind == Blown.

But more seriously, as you listen to other people's ideas, try to see a layer behind their idea. Or if you are thinking of an idea, look for the idea behind that. Three quick examples will hopefully clarify this:

People are creating startups. Most of them fail. Some smart people avoid failure by creating startup incubators instead of startups. They buy some cheap warehouse space and offer internet, coffee, and 'mentorship' to other people who want to run a startup. Most of the startups themselves fail, but they still pay their fees to the incubator. And the few that are successful also give a percentage of their shares to the incubator. The incubator isn't hurt by the failures and makes a fortune out of the successes -- all through taking other people's ideas one layer of meta deeper.

People are playing on the stock market and buying crypto-currencies. Some of them make a lot of money and write about their successes to encourage others to try the same. Many others are losing all their money -- they tend to be a bit quieter and keep their heads down. No-one likes talking about them. The people in the game who are reliably making money are either the stock markets themselves (Wall Street is worth a bit), or the ones who are selling data, books, code, and tutorials to the people who want to gamble their money directly. Again, these people are making money on others' successes and not losing it on their failures.

In non-tech circles, people still make money by proofreading, though not very much. If you are part of the minority that has a good understanding of the grammar of your native language, it's easy enough to find clients who are a bit bewildered by exactly how commas and apostrophes work, and who have read the distinction between effect and affect several times and have given up trying to work out when to use which. However, the hourly rate for proofreading tends to be pretty miserable. I once attended a three-day proofreading course though, and paid the single instructor several thousand ZAR for the privilege. I was one of dozens of people to do so, and the instructor made more money in three days using his proofreading knowledge than many of the attendees would make in their lifetimes with the same knowledge.

Of course, once you start doing this, you might never stop. What about a startup incubator that trains other people to create startup incubators? Or someone who teaches people how to teach? Or someone who writes blog posts like this one? Be careful of the rabbit hole, Alice. People who go down do not always re-emerge.

Data Science

"Data Science" is as much of a buzzword as "The Cloud", "Big Data", and "Artificial Intelligence", and many intelligent people will make unidentifiable sounds of contempt when they hear or read it. But like the other buzzwords mentioned, "Data Science" started out as an interesting idea, which the media, recruiters, and marketing departments ran away with in order to impress various stakeholders and make lots of money.

With an increasing number of open data sets being made available (see https://en.wikipedia.org/wiki/Open_data), being able to get information from raw data is an ever-more useful skill to learn. I came across a fun and simple puzzle recently here http://priceonomics.com/the-priceonomics-data-puzzle-treefortbnb/ and decided to use it as a starting point for learning more about technologies that are useful for data analysis.

While Python is normally the first tool I'd turn towards to solve a problem like this, I recently saw some quite impressive work done with R. I was surprised by how easy it was to carry out common data manipulations and visualisations and I wanted to try it for myself. I won't go into detail on how I solved the puzzle linked above, as Priceonomics use it as part of their recruitment process. But I downloaded R, and messed around with it and the Treefort dataset for an evening and had a lot of fun.

Below is a brief write-up on my first experiences with R, and the most interesting graphs from the South Africa education data set I was using. There's also a link to the Excel spreadsheet I used instead of R.

Why not Python?

One of the main reasons I enjoy Python is the intuitiveness of its syntax. If I don't know how to do something using a Python library, I can usually fire up a shell and with a combination of dir() and guesswork work out how to do what I want faster than looking it up on Stack Overflow. However, with matplotlib, numpy, and pandas, I always find the opposite. Even when faced with a very basic problem, I often find myself trawling through documentation and examples to work out how to solve it.

While manipulating and plotting data from a .csv file in R, I very quickly got into my Python habits of using trial-and-error and the R help() command. It was very satisfying to read, manipulate, and plot data in a few lines of code.

My current impression of R (which will almost certainly change drastically as I use it more), is that it will fit somewhere between M$Excel and Python for me. If I just want to do some really basic calculations, I'll use Excel. If I want to build and maintain a 100+ line programme, that I'll need to use and change for the foreseeable future, I'll use Python. And if I need to mess around programmatically with rows and columns, but I don't need to build anything maintainable, R looks like it could be a good compromise between the two.

South African Education Data

There's no shortage of data sets to play with. Cape Town open data (https://web1.capetown.gov.za/web1/OpenDataPortal/) was the first place I looked, but it seems that that initiative was a bit of a let-down. While there's some interesting data available, most of it is hugely inconsistent in format, and looks as if it was intended for human consumption instead of for programmatic analysis.

I thought education data might be interesting, and I found that the datasets available here http://chet.org.za/data/sahe-open-data were comprehensive and fairly consistent. Unfortunately they're also presented in .xlsx format instead of .csv and are not as ideal as the Treefort data set to load directly into R. I converted them to .csv files and loaded them into R, but I need to spend some more time with R's syntax and libraries to efficiently work with data in non-ideal formats.

The pain point was the double headers in most of the data sets. For example, the dataset of enrollments by race looks like this:

I wanted to graph the data by institution, as in the picture below. Getting the specific row that represented each institution in R was straightforward enough, but I couldn't easily find a way to transform the data to use the year as the x-axis, the categories as separate series, and the numbers as the y-axis. I'm sure it'll seem trivial once I've worked out how to do it, but I decided to play around with the data in M$Excel first so I could have a clear goal in mind before diving deeply into R.
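For comparison, one way the reshaping described above could be approached in plain Python is sketched below. The layout is an assumption (years in the top header row with blank cells continuing the previous year, categories in the second row, one row per institution), and every name and number in the sample is made up; the real CHET files may well be arranged differently.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed double-header layout.
SAMPLE = """\
Institution,2012,,2013,
,Group A,Group B,Group A,Group B
Example University,100,200,110,190
Another University,50,60,55,65
"""


def series_for(institution, csv_text):
    """Return {category: [(year, value), ...]} for one institution's row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    years, categories, data = rows[0], rows[1], rows[2:]

    # Fill the blank cells of the top header with the year to their left.
    filled, current = [], ""
    for cell in years:
        current = cell or current
        filled.append(current)

    row = next(r for r in data if r[0] == institution)
    series = defaultdict(list)
    # Skip the first column (the institution name) in all three rows.
    for year, category, value in zip(filled[1:], categories[1:], row[1:]):
        series[category].append((int(year), int(value)))
    return dict(series)


print(series_for("Example University", SAMPLE))
```

Each category then maps to a list of (year, value) points, which is the shape a line chart needs: year on the x-axis, one series per category, and the numbers on the y-axis.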

UCT enrollment by race

Interestingly, of all the institutions listed, UCT is the only one to have any crossing lines.

I've used Excel pretty extensively in the past, and even taught an introductory course on it, so it was much easier to clean and manipulate the data and create pretty graphs than working out how to do everything in R. I loaded the simplest datasets from the CHET collection (Race, Gender, and Success) into separate worksheets, created some hacky VLOOKUPs to separate the time and category data by institution, and added some graphs in a separate sheet. A screenshot of the result is below - the big cell at the top is a dropdown that contains all the institutions, and the graphs update dynamically when a new institution is selected.

All data for Rhodes University

I'm not sure what happened to the success rate in 2013 - none of the other institutions showed a similar decline. Hopefully it's a mistake. (My brief lecturing attempt at Rhodes was last year, so it can't be caused by that.)

Soon, I'll attempt to replicate the graphs using R, and write a follow up post about how I do it. I'll also extend the data sets I looked at, and if there's anything interesting I'll write a post which focuses on the data instead of the technology used to analyse it. If you want to play around with the education data and see the graphs for the other institutions, you can download the Excel spreadsheet I built here: https://docs.google.com/uc?authuser=0&id=0ByEENivQuwUBSmNJUXdPbDI2cU0&export=download. The messy VLOOKUPs would probably be enough to have me expelled from any respectable computer science institution, but luckily I don't belong to any. Feel free to write me snarky comments below on how I could have done it in a cleaner way.

About Me

I'm far away from home in this country called "Europe". I'm studying towards a Master's in Computational Linguistics (I think - this might help: https://xkcd.com/114/). I write about web applications and Python and other things that you may find interesting (considering you got this far).