Alan Kent's Blog
https://alankent.me
Occasional personal opinions on things
Exploring Jupyter with Machine Learning
https://alankent.me/2017/11/06/exploring-jupyter-with-machine-learning/
Tue, 07 Nov 2017 03:54:18 +0000

Jupyter is an open source platform that lets you mix code (I will use Python here) with Markdown, allowing you to develop notebooks that combine documentation with the output of running code. This is popular with data scientists in the machine learning arena as it lets you write your notes right next to the code they describe. The purpose of this blog post is to demystify what machine learning is via a glimpse into the tools readily available.

What is Machine Learning vs What is Statistics?

Firstly, it is important to understand that machine learning borrows many concepts from other fields of study and then runs them on a computer. Linear regression, for example, came originally from statistics. Linear regression is where you have a set of points on a graph and you try to draw the straight line through the data points that best fits them. If you use a computer to work it out, then you are doing machine learning: the machine is using the available data points to “learn” a model that estimates the data.
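As a concrete illustration of that idea (made-up data points, not from the post), here is a minimal sketch of fitting a straight line with NumPy – the “learning” is just least-squares number crunching:

```python
import numpy as np

# Hypothetical data points: x values and noisy y observations (roughly y = 2x).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# "Learning" the model here is just least-squares fitting of slope and intercept.
slope, intercept = np.polyfit(x, y, 1)
print(f"y ≈ {slope:.2f} * x + {intercept:.2f}")
```

The fitted slope comes out close to 2, matching how the sample data was constructed.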

Jupyter

In this post I am going to be using Jupyter, as provided by Google Cloud as a part of Google “Datalab”. This allows you to write Python code in a page with calls off to BigQuery (to fetch data) and TensorFlow (the Google Machine Learning platform for dealing with very large data sets). I am not going to use TensorFlow in this blog post, just Jupyter. Note however that Jupyter is open source and used by a range of different projects.

The following screenshot shows the Google implementation of Jupyter.

A notebook consists of a series of blocks. There are two types of blocks – code blocks and markdown blocks. You can click on any block to select it. Once selected you can move the block up/down, delete it, or add a new block after it. When you are ready, you can then “run” the block. For code blocks this will execute the code in the block, for markdown blocks it will render the block.

Hypothesis

For the purposes of this blog, I am going to walk through my experience trying to use Jupyter to solve a real-world use case.

My hypothesis is “there is a relationship between the time since a consumer first viewed a product and whether the consumer completes a purchase of the product”. This may or may not be true, which is what I want to explore using real-world data.

Why would this be useful in real life? If we can work out the probability of purchase as a function of duration from first view, this may be a useful input into a discounting algorithm. That is, if we see someone view a product but not purchase it, after a period of time we may decide to create a special offer for the product in case we can entice the customer to then make a purchase. (We might do this for items where we have excessive stock levels.) We don’t want to do this too early however as it lowers our profit, but we don’t want to leave it too long either if it means we will lose the customer.

Data Set and Munging

My data set contains timestamped events of users viewing products, adding products to cart, and then purchasing the product (checkout). The purpose of this blog is to explore the tools rather than the outcomes, so don’t use the results of this blog post blindly in the real world – the data may be inaccurate.

My first real-life lesson was that machine learning algorithms are fussy about their inputs. They want to be given data in a very specific format. For example, a series of product view, add to cart, and purchase events with timestamps is not that easy to feed into a machine learning algorithm. I had to first massage the data.

In my case, I wrote some Python code to take the raw series of events and consolidate multiple events into single rows which have a visitor id, item id, timestamps of first and last events in the group, the duration of the group (end timestamp minus start timestamp), the number of times the product was viewed, whether the item was added to the cart (true/false), and whether the cart was then checked out (purchased). I used a timeout so if a product view was not followed up in a reasonable time by a checkout, I considered that part of a separate “session”. These consolidated events are not normal user sessions in that each product is treated as belonging to a separate session (although in real life a user will frequently have multiple items in their cart at checkout).

So, if I have an event stream for a product of “view, view, add-to-cart, checkout, view” this will generate two consolidated events:

The first event would have a view count of 2, with add-to-cart and purchase set to true.

The second event would have a view count of 1, with add-to-cart and purchase set to false.
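The consolidation logic could be sketched roughly as follows (my own reconstruction, not the actual code from the post; the session timeout handling is omitted for brevity):

```python
# Consolidate a per-product event stream into "session" rows.
# A checkout event closes the current session; trailing events form an open one.

def consolidate(events):
    sessions = []
    current = None
    for kind in events:
        if current is None:
            current = {"views": 0, "added_to_cart": False, "purchased": False}
        if kind == "view":
            current["views"] += 1
        elif kind == "add-to-cart":
            current["added_to_cart"] = True
        elif kind == "checkout":
            current["purchased"] = True
            sessions.append(current)  # checkout closes the session
            current = None
    if current is not None:
        sessions.append(current)  # leftover events become an open session
    return sessions

print(consolidate(["view", "view", "add-to-cart", "checkout", "view"]))
```

Run on the example stream above, this yields exactly the two consolidated events described: one with 2 views and purchase true, one with 1 view and purchase false.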

My second real-life lesson was that real-world data is noisy. Looking through the events by hand, there were numerous cases of an item being purchased without first being added to the cart. In the real world, things go wrong. Systems go down, data gets lost, and you need to deal with it.

Pandas and Jupyter

Above I showed the top of a simple notebook that had text I entered using Markdown formatting interspersed with Python code. Frankly, if that is all you could do, you would just use Python comments! Jupyter starts becoming interesting when it displays the output of Python code. In particular, there is a handy Python library called Pandas for manipulating tabular data, which helps when cleaning up the inputs to machine learning libraries.

Let’s start with fetching a subset of the data we have – in particular, consolidated product session events that resulted in a purchase.

What Jupyter allows you to do is type in Python code and then execute it. The output is then rendered by Jupyter into the page. If you want to change the Python code, you just click on it and start typing (e.g. to change the “True” above to “False”), then click “Run” again. Jupyter in this case is preconfigured to display the tabular data returned in an HTML table.

I am not going to explain Pandas in great depth here, but in the above example “df_sessions” is a “data frame” (table-like) data structure. Python allows you to define operators for your own data types, so in the above code the square brackets accept a vector of Boolean values and return the rows from the table where the corresponding Boolean value in the vector is true. That is, “df_sessions.purchased == True” returns a vector of true/false values (true for all rows in df_sessions where the purchased column is true). This vector is then used to choose which rows to return from the df_sessions table.
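A minimal self-contained sketch of that boolean-indexing pattern, using a tiny made-up stand-in for df_sessions:

```python
import pandas as pd

# A tiny hypothetical stand-in for the post's df_sessions frame.
df_sessions = pd.DataFrame({
    "visitor_id": [1, 2, 3, 4],
    "duration": [120, 0, 300, 45],
    "purchased": [True, False, True, False],
})

# The comparison yields a vector of True/False values, one per row;
# the square brackets then keep only the rows where the vector is True.
mask = df_sessions.purchased == True
purchases = df_sessions[mask]
print(len(purchases))  # 2 of the 4 rows survive the filter
```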

It’s a bit funky to get used to, but the Pandas library comes with all sorts of data manipulation functions that are really useful.

My third real-world lesson? As a beginner, I found it difficult to predict which Pandas expressions would complete efficiently. Simple cases were of course simple, but I gave up using Pandas to consolidate events because when I tried the “group by” functionality, sometimes the code would not return (I let it run overnight). So it’s useful, but understanding its strengths and weaknesses is an important lesson. In my case, I fell back to writing my own native Python code (not shown).

Debugging

One thing I found very useful with Jupyter was the ability to incrementally write code and embed little fragments to check my results as I went along. For example, the following shows checking the consolidated data (df_sessions) against the original data (df_data).

In this example you can see that visitor id 435495 had one consolidated “session” event formed from three original events. The duration (in seconds) was computed by subtracting the two timestamps (in milliseconds) and dividing by 1,000.

Plotting Data

Cool! So next I decided to do a scatter plot of session durations vs whether the user made a purchase or not. I was curious to see if there was some visual trend I could spot.

Note that the graph is displayed (when run) directly under the code that generated it. This also means if you view someone else’s notebook, you can see exactly what they did to create the output.

Well, the above graph is a little interesting in that there are clearly some really long sessions (one was 40,000 seconds, or 11 hours) that did not result in a purchase. (This may indicate a bug in my consolidation code, something I plan to check up on later.) But frankly the graph is not that useful. Part of the problem is that samples at the same coordinates all display one on top of the other, so you cannot tell how many values are present.

So next experiment – plot histograms (so we can see how many values are present) for purchases and non-purchases side by side.

Well, it’s a little better, but again not quite right. So, let’s reduce the x-range and plot the y-axis logarithmically.
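A sketch of what such a plot might look like in code (the data, bucket sizes, and x-range here are made up – the post's actual code lives in the notebook screenshots and is not reproduced here):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; inside Jupyter the plot appears inline
import matplotlib.pyplot as plt

# Hypothetical session durations (seconds) for purchases and non-purchases.
rng = np.random.default_rng(0)
purchase_durations = rng.exponential(200, 1000)
non_purchase_durations = rng.exponential(400, 5000)

bins = np.arange(0, 2000, 100)  # reduced x-range, 100-second buckets
plt.hist([purchase_durations, non_purchase_durations],
         bins=bins, label=["purchased", "not purchased"],
         color=["darkgreen", "lightgreen"])
plt.yscale("log")  # log y-axis so small counts stay visible next to large ones
plt.xlabel("session duration (s)")
plt.ylabel("count (log scale)")
plt.legend()
plt.savefig("histogram.png")
```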

Now the graph is getting more interesting. The first bucket (the first dark green and light green bars) might be a result of noisy data. So, let’s dig into that a bit more first to work out what is going on.

Oh, well that makes sense. There are lots of occurrences where a user viewed a single product and bounced from the site. They never went on to do a second operation, so the duration for the “session” was computed as zero. Let’s filter out those data points and keep going.

Much nicer. But what does it mean? Well, the dark green bars show that there are not many cases where users view a product then quickly purchase. The most common duration for a session is around 150 seconds (2.5 minutes). That may be interesting, but it does not help us work out if there is a duration after which the probability of purchase goes down. We really want to look at the ratio of purchases to non-purchases.

That leads us on to the following slightly mega example.

That is starting to produce more interesting data. Remember that the bucket number is the duration divided by 10, so “50” means 500 seconds. If you guess a line to draw through the data, you can see the number of purchases as a percentage of total sessions decreasing as the length of the session increases.
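The purchase-to-total ratio per bucket could be computed along these lines (tiny hypothetical frame; the column names are my own):

```python
import pandas as pd

# Hypothetical consolidated sessions: duration in seconds plus a purchase flag.
df = pd.DataFrame({
    "duration": [55, 58, 52, 505, 508, 501, 503],
    "purchased": [True, True, False, True, False, False, False],
})

# Bucket number is duration divided by 10 (so bucket 50 means ~500 seconds).
df["bucket"] = df.duration // 10

# Mean of a boolean column per group = fraction of sessions that purchased.
ratio = df.groupby("bucket").purchased.mean()
print(ratio.to_dict())
```

With this toy data, bucket 5 has a purchase ratio of 2/3 and bucket 50 only 1/4 – the kind of decline the plot above suggests.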

Let’s draw a line through the data, as estimated using Python libraries. This can be done using “linear regression”, an approach from the statistical branch of mathematics.

Note that there were a few outliers for buckets of 350 and larger (3,500 seconds, since buckets are the duration divided by 10). These figures are probably created by outlier data, so let’s drop that data as well by limiting ourselves to the first 300 buckets.
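A sketch of fitting such a line with NumPy (synthetic per-bucket ratios, since the real numbers live in the notebook), dropping buckets beyond 300 as just described:

```python
import numpy as np

# Synthetic per-bucket purchase ratios (bucket = duration / 10), drifting
# gently downwards with a little noise -- made-up stand-in data.
rng = np.random.default_rng(42)
buckets = np.arange(400)
ratios = 0.3 - 0.0005 * buckets + rng.normal(0, 0.01, buckets.size)

# Drop the outlier region by keeping only the first 300 buckets,
# then fit a straight line (linear regression) through what remains.
keep = buckets < 300
slope, intercept = np.polyfit(buckets[keep], ratios[keep], 1)
print(f"purchase ratio ≈ {intercept:.3f} {slope:+.5f} * bucket")
```

The negative slope is the estimate of how the purchase probability decays as session duration grows.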

And thus we have used machine learning to predict the probability of a product being purchased as a function of time from when the product was first viewed.

Conclusions

The purpose of this blog post was not to draw real world conclusions from the data (there may be errors in the data or code above), but rather to show how Jupyter notebooks can be a useful tool to explore a problem space. They allow quick experimentation with a data set and visualization of the results. Having code embedded in a web page (Jupyter notebook) with immediate results being displayed inline in the page makes it easier to try out ideas as you learn about the problem and debug your experiments. Having the full power of Python also allows complex logic to be used, and useful, sophisticated data manipulation libraries are available for Python.

Further, rather than discarding intermediate results, Jupyter notebooks allow you to document and capture both your thought process and the results directly on the one page. And then as new data comes along, you can easily rerun the code to redo plots based on additional data.

In this example we massaged the data until we had clean enough data to feed into a machine learning algorithm. Given the improvements in machine learning library implementations, I have heard that it is not uncommon to now spend 80% of your time getting data into the right format, with only 20% of the effort actually spent on the machine learning code.

Getting into Machine Learning
https://alankent.me/2017/10/24/getting-into-machine-learning/
Tue, 24 Oct 2017 16:48:00 +0000

I am not a deep machine learning expert. This post is intended for those who want to learn more about the practical aspects of machine learning – where does the effort really go? This series of posts describes my experiences as I explore the space.

What is Machine Learning?

So what is Machine Learning in practice? Most developers I think understand the basics: you get lots of data, throw it at the computer, and it “learns”. But how? And how clever is it when learning? My answer is “it’s not clever at all,” at least not “clever” in terms of the way humans think of being clever. Machines are number crunchers. They can do lots of computations very fast. But machine learning is not about a computer “thinking”. Machine learning is about a human coming up with a theory that they then model in a way a computer can understand; the computer can then optimize the model parameters to best fit the data by doing lots of number crunching.

For example, I have a data set from the net that has people viewing products, adding them to the cart, then purchasing. Not every viewed product is added to carts, not every cart is purchased. My question is “what should I do to get more people to purchase?” Well, that is too hard a question to ask. A more realistic question is “does viewing a product more than once indicate that the user is more likely to buy? If they view it three times, is that an indication they may want it but think it’s too expensive?” That is where machine learning can help.

Machine Learning Models

For example, you can come up with a model that predicts the probability of purchase based on the number of product views. You don’t know the numbers, but you have a rough idea of the shape of the curve and a formula that follows that general shape, with a few parameters whose values you don’t know. You then feed in all the observations so far to “train” the model – that is, to work out the probability based on past events. Training is where the machine does lots of number crunching to work out the best parameter values to use.

Then consider a formula such as y = a * x + b, a linear equation – you feed in ‘x’ and you get your answer ‘y’. But what should ‘a’ and ‘b’ be? Machine learning can work out the best values for ‘a’ and ‘b’ to fit your sample input data. But if ‘y’ is a probability, a linear model won’t do things like make sure the line does not get larger than 1. You might need a different formula to guarantee the values stay between 0 and 1.
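A quick illustration of that problem (the parameter values are made up): a linear formula can happily produce “probabilities” above 1, while a logistic (sigmoid) function – the usual fix – squashes any value into (0, 1):

```python
import math

# Linear model: y = a*x + b. Nothing stops y leaving the [0, 1] range.
a, b = 0.3, 0.1
linear = a * 5 + b
print(linear)  # 1.6 -- not a valid probability

# The sigmoid maps any real value into (0, 1), which is why
# logistic regression is the standard tool for probabilities.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(linear))  # always strictly between 0 and 1
```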

Then you could go a step further. The view data has timestamps. Maybe the length of time between views also plays a factor. So I could consider both the number of views and the length of time since the first view. Machine learning can help here again – it can handle multiple parameters and optimize across them.

Further, the machine learning library can return an estimate of reliability based on the data. It may be that the library comes back with the best ‘a’ and ‘b’ values, but the result is actually terrible because you tried to fit a straight line to data that follows a curve. Sometimes this is where the value lies – not in the answer to the question, but in knowing whether your model is working well or not. Is there a relationship between ‘x’ (number of product views) and ‘y’ (probability of purchase)? If not, then move on to something more productive!

But how do you express the model? The different machine learning packages come with a set of tools, and it is up to a human to work out how best to adapt those tools to implement the model you want. “Linear regression”, for example, is where you fit a straight line through a series of points. But linear regression might not be the right approach if you know the line is not straight. You either need to use some clever maths to transform the problem into a straight-line one, or pick another tool from the set available. Again, this needs a human and skill.

Data Munging

So far then a human has to come up with a question they want to answer, then work out a model for the question that is supported by the machine learning toolkit they are using. Great! What’s next?

Well, the next challenge I hit was that the input data I have is not in the right format to feed into the library I picked. In my reading around, a number of people have quoted figures like 70% to 80% of the effort these days actually going into massaging the data into the right format for the machine learning libraries to use. This is not “clever” work – it’s just necessary grunt work to clean up the data. Huge progress has been made in the machine learning libraries themselves having lots of clever ways to work out the optimal values for parameters efficiently, so much so that that is not where the main effort is any more.

For example, I am playing with “Datalab” on Google Cloud. It is built on Jupyter (a nice live notebook approach for writing up your experiments in). Python can be embedded directly amongst Markdown syntax, where the results of the Python code are displayed directly in the page. Python is used as a glue language to call the different libraries around (including the machine learning libraries).

One of the libraries for data massaging I have been using is called Pandas. It allows you to do some data manipulation without having to write too much code, but I have been bitten a few times as well. In practice, you can write down how you want the data massaged, but you still need to worry about performance. For example, I took the original input data (views, add-to-carts, purchases – with timestamps) and annotated the views with the purchase they were associated with (or noted that no purchase was made). I then tried to group the data by purchase to count the number of views per purchase. Well, you have to write the code “just right” or else performance kills you. I used some provided “group by” functionality, but the next morning the code had still not finished. So simple data cleanup worked well, but for fancier transformations it seems I just have to roll up my sleeves and write some serious code.
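For illustration only (a tiny made-up frame with my own column names), the kind of group-by being described – counting views per associated purchase – looks like this; it is on large frames that expressions of this shape can become unexpectedly slow:

```python
import pandas as pd

# Hypothetical annotated events: each view tagged with the purchase id it
# preceded; 0 stands in for "no associated purchase".
events = pd.DataFrame({
    "event": ["view", "view", "view", "checkout", "view"],
    "purchase_id": [1, 1, 1, 1, 0],
})

# Count views per purchase with a group-by over the view rows only.
views_per_purchase = (
    events[events.event == "view"].groupby("purchase_id").size()
)
print(views_per_purchase.to_dict())
```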

Conclusion

To wrap up, this post was not trying to go into detail, but rather give a bit of a feel of where the effort is with machine learning. Humans are still needed to come up with theories and test them. Humans need to know the different forms of models the libraries support and pick the right one for the problem. They also need to draw conclusions from the results (e.g. determining if the model is a good predictor of the question you have). The machine learning libraries help for sure – but at the number crunching level. And don’t underestimate the effort to get data into the right format for a machine learning library to use. The libraries are still fussy. They won’t clean up the data for you even for the simplest of tasks.

The tools are however getting better, and quickly. The machine learning engines are getting more and more accessible. Using Google Cloud Datalab I did not have to install anything locally – I am doing all the coding in a web browser directly in Datalab (the other platforms I am sure have the same capabilities). You may have noticed that I am also talking about pretty simple questions – nothing like speech or image recognition. But there are more and more libraries from clever people providing this functionality for you. That is what makes this field so exciting. Google, Microsoft, Amazon, etc. are in a race at present, all providing better and better tools, commoditizing what would have been a dream just a few years back.

The other aspect I have not touched on is what to do with such a model. One example of an action would be to offer a discount if the model predicts a customer is interested but not likely to purchase. Another would be to include them in follow-up email campaigns. Doing something for a specific user based on their actions is one form of personalization. But that opens up a new can of worms – experimentation. If you offer a discount, how do you get confidence that the discount helped close a sale you would have lost, rather than just reducing your profit margin on something the customer was going to purchase anyway? Running split tests across different users can help here (offer the discount to some users but not all) – but this assumes you get enough customers to be able to run the test and get meaningful results.

For me, the next step is back to the drawing board for my data cleansing step. Using Pandas saved me lots of code for the easy steps, but when things got complicated it proved not to be so useful. It hides really slow computations from me because of the abstraction, meaning I never know how fast or slow the next thing I try will be. (Remember I am a newbie here!) Once I have that done I plan to do a quick write-up on Jupyter – it’s pretty cool.

Is Machine Learning Good for Small E-Commerce Business?
https://alankent.me/2017/10/16/is-machine-learning-good-for-small-e-commerce-business/
Mon, 16 Oct 2017 21:23:13 +0000

Machine learning, a part of the broader field of artificial intelligence, uses computers to spot patterns in data. Larger companies like Amazon, Google, Walmart, and eBay use machine learning in multiple areas of their businesses to great effect. But what about small businesses? Can they use machine learning as well?

What is Machine Learning Good For?

There are numerous blog posts out on the internet listing many different use cases. You have the more advanced examples like chat bots learning from customer support responses so you can improve your customer support quality without having to hire and train more staff, or image recognition to automate categorization of products (or spot categorization errors – such as “short dress”) based on your own catalog. But there are many more mundane examples such as product recommendations: “my customers who bought X typically also buy Y, so I should offer that as a recommendation”. (Note the emphasis on *your* customers – you want to learn from your own demographics.)

I am not going to go into details of use cases in this blog post, but rather provide some considerations to think about when deciding if you should look at applying machine learning to your store.

Access to Technology

The technology required for a machine learning project is definitely more accessible than just a year ago. Hadoop and Spark clusters can be spun up with a few clicks on cloud hosting providers; Google, Amazon, and Azure all have specialized machine learning offerings. Getting the technology running is not the main problem.

The challenge is to learn how to use these technologies. Data scientists are in demand. What is your return on investment going to be on building up your own data scientist team? (I personally am wary of getting only one data scientist – single points of failure always scare me.) Are you going to get a return on such a staffing investment? The smaller the business, the less likely this is going to be true.

So smaller businesses are more likely to get benefit from one of the increasing number of vendors that use machine learning technologies within their product offerings. This can be a much more practical option for small businesses as it avoids the need to have in-house expertise. The vendor makes it cost effective by building up the expertise for you.

Volume of Data

So what area should you tackle first? Should you take on inventory forecasting, or customer-specific predictive personalization (offering a discount if a customer's behavior indicates they may be about to leave your site without making a purchase)? There are many areas you could tackle.

One key point to remember is that machine learning is primarily about learning patterns that emerge from data. If you don’t have much data, then the machine is not going to learn very well. For example, if you are trying to make personalized predictions based on an individual user’s behavior, but most customers do not return to your site often, the project is probably doomed to failure.

So don’t only look for use cases where you think you can get good ROI (which is clearly also important); think also about how much data you can collect in that area to feed into a machine learning algorithm. Do you have lots of anonymous customers, or do they log in? Do you have a large catalog? Volume of data is one of the real challenges for smaller businesses that I don’t see talked about as often. If you don’t have volume, machine learning has little to learn from.

Site Performance

Another consideration is site performance. If you have a more personalized experience you need to be careful of the performance impacts on your site. Personalized content caches less well. Is the improved experience you offer better than the performance hit customers observe?

Performance is more of a consideration of where it fits into your overall business. If you are using machine learning to optimize shipping expenses, then that will not affect the on-site experience of a user. If your site only has low traffic, again, caching may not be such a significant issue.

A/B Testing

And if you do decide to go ahead with a project, think about how you are going to test your new solution. If it affects the on-site customer experience, are you going to use an A/B testing framework to make sure the offering is an improvement? This is more important with an external vendor, as you won’t necessarily have the same access to data as you would with an in-house team. Shipping cost optimization, however, you can compare automatically by computing shipping costs using two strategies and comparing the results (there is no need to split users into different groups).

Conclusion

This blog post is not an “all machine learning is hype” bash. On the contrary, I am a big believer in machine learning. But it is important to understand its strengths and weaknesses. Large organizations can get real benefits from machine learning. Machine learning is becoming more affordable for smaller businesses. But if you don’t have data volume, don’t expect machine learning techniques to do better than what you can observe yourself manually. Machine learning comes into its own when you have a volume of data greater than humans can handle.

My general advice for smaller businesses is to find a vendor that helps you solve needs in your business. The fact that they use machine learning is actually secondary to whether they can deliver you a good ROI on your investment.

Building your own machine learning expertise however becomes more useful as your business grows and you have the data volume and potential benefit to make it a worthwhile investment.

]]>https://alankent.me/2017/10/16/is-machine-learning-good-for-small-e-commerce-business/feed/0alankentFaking a Composer Repohttps://alankent.me/2017/08/16/faking-a-composer-repo/
https://alankent.me/2017/08/16/faking-a-composer-repo/#commentsThu, 17 Aug 2017 02:57:55 +0000http://alankent.me/?p=2443This is just a quick tip. How can you have a project behave like a Composer repository, but where the packages are in subdirectories of a git repository.

First, if the package is the only package in the git repository and the composer.json file for the package is in the root directory, composer can reference that git repository directly. See https://getcomposer.org/doc/05-repositories.md#vcs. I am not going to talk about that case here.

But what if the packages are in a subdirectory? The following is a quick workaround to experiment with using the Composer “path” repository type. This is not a perfect solution, but it can be a useful temporary fix.

First, check out the git repository holding the packages alongside your main project. E.g. if your main project is in /var/www/htdocs/foo, then check out the other git repo in /var/www/htdocs/bar. What we are going to do in the foo/composer.json file is reference ../bar/ as a relative path. (You can use any path you like in practice.)

Let’s say all the packages are in a “packages” subdirectory (this might be app/code/Magento for Magento modules). The trick is to add a new “repository” entry of type “path” with a URL of “../bar/packages/*” (or “../bar/app/code/Magento/*”). Composer will look for a composer.json file in the top directory of each package directory matching the path. If you reference exactly the same name and version number as in that composer.json file, it will be a match and Composer will use that package. There is also a “symlink” option: if true, a symlink to the package is created; if false, a copy of the package is made.
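Putting that together, the foo/composer.json might contain something like the following (the package name, version, and paths are illustrative):

```json
{
    "repositories": [
        {
            "type": "path",
            "url": "../bar/packages/*",
            "options": {
                "symlink": true
            }
        }
    ],
    "require": {
        "example/some-package": "1.0.0"
    }
}
```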

]]>https://alankent.me/2017/08/16/faking-a-composer-repo/feed/2alankentWhy My Microsoft Surface Pro is not an iPadhttps://alankent.me/2017/08/12/why-my-microsoft-surface-pro-is-not-an-ipad/
https://alankent.me/2017/08/12/why-my-microsoft-surface-pro-is-not-an-ipad/#commentsSat, 12 Aug 2017 21:49:23 +0000http://alankent.me/?p=2433I got a Microsoft Surface Pro as my new laptop replacement. Why did I go with a Surface? It was light (the same weight as my 12.9” iPad Pro so is good for traveling), has good specs (I have the 16GB RAM model which can drive a high a good resolution monitor), and it can run Magento, Docker, the full Office suite, etc. (I was not interested in a Mac – this is not a Apple/Mac vs Microsoft/Windows debate.)

I find I use my iPad all the time (I have Office, VPN, Chrome, Slack, etc – I spend most of my time in these tools these days), so I was curious – would the Surface replace my iPad?

TL;DR: No.

This does not mean I don’t like the Surface as a laptop, but here are the reasons why it does not feel like an iPad replacement.

The iPad has instant start – you click the button and it’s on. The Surface hibernates at times so goes through a longer boot sequence.

The iPad has a pin code or fingerprint to unlock – the Surface (maybe just due to my corporate rules) goes through a full “enter your password” sequence which takes longer.

The iPad shows me notifications of slack messages, emails, calendar reminders even while off. I can swipe and go straight into the app to view it.

The iPad turns off fairly quickly allowing the battery to last all day.

These points combined mean I can react to events very quickly without interrupting my normal thought flow. Maybe the Surface can do them too, but not out of the box.

There are other differences that I notice.

The iPad has a bigger screen (taller).

The Surface stand is less comfortable on my lap than the iPad Apple keyboard.

The iPad has the same charger as all the iPads and iPhones in the family, so we can share charging cables much more easily. The Surface is custom.

At the airport I have to put the Surface in a separate tray for security screening (they don’t consider it a tablet, even though it’s the same weight as my tablet).

Apps like Netflix, Slack, etc generally feel nicer on the iPad. They just work and are generally minimal.

I still like the Surface over my older laptop. I like having the same files on my desktop at work and home (and on the plane) without worrying about network connectivity. I never use my iPad for authoring PowerPoint presentations – too painful. The Surface is much nicer there. I also find my iPad screen gets greasy much faster as I am always touching the screen.

So for me, I am not letting go of my iPad as a productivity device. But I also like the new Surface so far as my laptop replacement for more serious work.

Oh and which device am I writing this post on? Microsoft Word on my iPad – I like the slightly bigger screen and lap comfort. Word lets me share documents with the Surface trivially when I need to.

Windows PowerShell, Control-Z, and Kitematic
https://alankent.me/2017/07/04/windows-powershell-control-z-and-kitemati/
Tue, 04 Jul 2017 21:13:15 +0000

I quite like Kitematic for Docker. It’s marked as “legacy” so it may go away one day, but until then it provides a nice GUI to spin up containers, launch a web browser pointing at your container, and so on. (I use it on Windows.)

But there has been one thing bugging me. When I hit the “Exec” button on Windows it starts up PowerShell as the terminal emulator, and if I ever hit Ctrl-Z (suspend process in Linux) it closes the whole window. I tried searching, but could find no mention of this strange Ctrl-Z behavior anywhere.

Finally I worked out a workaround. When you right-click on the title bar of the PowerShell window, the first tab of the Properties dialog is “Options”. By selecting the “legacy” console mode there, ^Z no longer closes the window and all is good!
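As background on why Ctrl-Z is special at all: on Linux it sends the SIGTSTP signal, which suspends the foreground process, while Windows defines no such signal, so each terminal emulator is free to decide what the keystroke does. A quick probe (Python used here purely as a convenient way to inspect the platform):

```python
import signal

# SIGTSTP is the Unix "suspend" signal behind Ctrl-Z. Windows does not
# define it, which is why terminal emulators there can give the
# keystroke their own meaning (such as closing the window).
if hasattr(signal, "SIGTSTP"):
    print("Ctrl-Z sends SIGTSTP on this platform (Unix-style suspend)")
else:
    print("No SIGTSTP here; the terminal decides what Ctrl-Z does")
```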

No more disappearing windows for me!

Installing GraphIQL for use with GraphQL in PHP
https://alankent.me/2017/05/29/installing-graphiql-for-use-with-graphql-in-php/
Mon, 29 May 2017 21:26:33 +0000

I was playing with GraphQL (an alternative to REST) recently using a PHP library, and was trying to get GraphIQL (note the extra “I”), a web-based “IDE” for writing GraphQL queries, up and running. This blog post is to spare anyone else who stumbles across it some of the pain I went through.

GraphQL has been open sourced by Facebook (http://graphql.org/). It is not really a “query language for graphs”, but more of a way to do APIs. Facebook being all about social graphs, they use it for APIs to access the Facebook social graph. I think of it more as another way to do a web API (an alternative to REST, JSON RPC, etc).

There are some great articles on GraphQL, so I am not going to explain it in depth here, but some characteristics that make GraphQL interesting to me include:

Schema introspection is built into the spec, making interactive exploration and diagnostic tools possible.

Clients are forced to specify the data they want, reducing data volume transfers. It also has nice side effects around upgradability as adding fields on the server never impacts existing clients.

There is one URL endpoint. I personally prefer this to REST for writing programs (JSON RPC does this as well).

It is gaining some traction (e.g. GitHub have made a GraphQL API available), which means tools are appearing, such as GraphIQL.

But I don’t think GraphQL is anything magical or revolutionary. It’s another approach with its own pros and cons.
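To make the field-selection and introspection points above concrete, here is a hedged sketch of what GraphQL request bodies look like. Everything goes to the one endpoint as a POST; the schema names used here (product, sku, name) are invented for illustration and not from any real API:

```python
import json

# A client asks for exactly the fields it wants back - nothing more.
# Adding new server-side fields later does not affect this query.
query = """
{
  product(sku: "ABC123") {
    sku
    name
  }
}
"""

# The introspection query below is built into the GraphQL spec itself;
# it lists the schema's type names, and is the mechanism that tools
# like GraphIQL use for live documentation and autocompletion.
introspection = "{ __schema { types { name } } }"

# Both are sent to the single endpoint as a JSON body like this:
payload = json.dumps({"query": query})
print(payload)
```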

GraphIQL

GraphIQL (https://github.com/graphql/graphiql) is a web-based “IDE”, although I think “IDE” is a bit of a stretch. It is more like a web page that allows you to browse around the API with live documentation and autocompletion based on context. The autocompletion is nice, made possible because the server exposes the schema. Again, there are lots of articles out there with screen shots – such as the README file on the GitHub account above.

Installing GraphIQL

And this is where my time wasting started. All the articles I found talked about how to run GraphIQL in the same NodeJS server as your GraphQL endpoint, to simplify development and debugging of the service. That is all well and good, but I was trying out a PHP version of GraphQL. That is not NodeJS, so I was trying various ways of getting GraphIQL to point to my PHP server instead of the local server.

After several days of frustration, inching my way forwards but hitting all sorts of cross-origin access controls (CORS etc.), I finally came across several GraphIQL extensions for Chrome. I searched the Chrome Web Store for “GraphIQL”, installed one of several options, and life was good. Sigh.
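For the curious, the browser errors were classic cross-origin ones. A minimal sketch of the response headers a GraphQL endpoint (PHP or otherwise) would have to send before a GraphIQL page hosted on a different origin could call it – the values, especially the wildcard origin, are illustrative only, and a real deployment would lock the allowed origin down:

```python
# Headers a cross-origin GraphIQL page needs to see in the endpoint's
# responses (including the OPTIONS preflight) before the browser will
# let the request through. "*" is the permissive development setting.
cors_headers = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
}
for name, value in sorted(cors_headers.items()):
    print(f"{name}: {value}")
```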

Conclusion

This post was not meant to be a critique of GraphQL (and GraphIQL). But if you are playing around with one of the non-NodeJS GraphQL servers and want to use GraphIQL (and trust me, if you are exploring GraphQL you will want to!) then don’t bother trying to set up the GraphIQL software yourself. Save yourself the pain – just install a browser extension.

Amazon Echo 2?
https://alankent.me/2017/04/16/amazon-echo-2/
Mon, 17 Apr 2017 05:39:14 +0000

Last November there were rumors of an Amazon Echo 2 coming with a 7 inch touch screen. I’m not sure I want one. I want one with a 20 inch screen instead! Wall mounted.

Is 20 inches too big? For me, I want to load up my family photos as well – so it’s a personalized picture frame when not in use, and a device when I activate it. So I want it painting-on-the-wall size.

One of the things I love about voice control is I can do it from across the room. Personally, I don’t want to walk over to the device to control it. I want a nice big screen I can see across the room as well. And a touch screen? I can see the benefits, but why not voice selection of options presented on the screen?

Voice by itself has a problem. “Add tissues to shopping list” is easy to say, but there are different types and sizes of tissues. My previous order could be remembered, but I have bought different types of tissues (e.g. travel packs versus boxes for the house). What I really want is a fast way to choose between the most likely options. The advantage of a screen over voice listing off options is that it’s much faster to see the most likely options and use voice to select one. Preferably from across the kitchen, without having to get physically close to the Echo.

So, yes, I can see an Echo with a screen is a useful device. But personally, I look forward to a nice big screen, or maybe a HDMI port out of the back of the Echo.

A Personal Experience of Amazon Fresh
https://alankent.me/2017/03/19/a-personal-experience-of-amazon-fresh/
Mon, 20 Mar 2017 02:51:39 +0000

I was curious about the real-life experience of Amazon Fresh and the Amazon Dash wand, so I signed up for the 30 day free trial. This post shares my personal first impressions after my first day of use. (This post is not an “expert’s view after deep experience and analysis”.)

As background, Amazon Fresh is an additional service you can subscribe to (on top of Amazon Prime) in supported areas: an additional monthly fee gets you into the program, then orders over a minimum amount ($40 for me) get free delivery. So it is not an effective program for buying small quantities. The idea is that Fresh is more targeted at your weekly grocery shopping trip.

Set up

Enrollment in Amazon Fresh was pretty easy and painless. It was interesting, however, that it encouraged me to register my preferred delivery times immediately. I declined initially (I wanted to understand Fresh better first), causing it to nag me numerous times through my experience. I am guessing Amazon wants to collect data about delivery patterns from as many users as possible; otherwise it could have waited until checkout.

Getting stuff into my first cart, however, was rather confusing. There is Amazon Prime, Amazon Pantry, Amazon Fresh, add-on products, and more, which often confused me in terms of navigation. (More on this later.) My first order was really to get enough into my cart to purchase the Amazon Dash Wand (a voice control and bar code scanner device). Ordering without being subscribed to Amazon Fresh came up with “sorry, product not available yet”.

Once I got the Dash Wand in hand (the morning after ordering!), my next challenge was to get it working. I tried on my wife’s iPad, but the setup instructions in the box basically said to use the app and get the instructions there. That would have been fine, except the app crashed as soon as I got into the relevant area. It turned out the iPad was still on iOS 9 – upgrading to iOS 10 addressed the issue. (Glad to hear it is not only Magento having backwards compatibility problems!) But trying lots of options here wasted 30+ minutes of my time working out why the app was crashing.

Using Voice

The Amazon ads I had seen show a person looking in the fridge saying “apples”, “strawberry yogurt”, etc. to the wand. So I tried that. Nothing appeared in my cart. After a while I realized it had put them into a special review area (on my iPhone). It was asking what “apples” means – e.g. what type of apples, what quantity, etc. I have not used it long enough yet to work out if it will learn my preferences, or whether each time I use voice I have to review the list on my computer afterwards.

So my first experience of voice with the device was not particularly compelling. If the device does not learn my preferences, I will probably give up on using voice. It did not reduce the total effort for me – scanning bar codes was much more precise.

Product Coverage

Next came the question of product coverage – how many of the things I normally buy are in Amazon Fresh? (I expect Amazon Fresh to get better over time, but I was still curious.)

First, I decided to skip over fruits and vegetables. I like to see the products before buying them – I don’t like paying for bruised fruit. So I restricted myself to packaged products (boxes, tins, bottles, etc). (I did try searching for “gold kiwi fruit”, which failed – Amazon only had the normal green kiwi fruit.)

This step reminded me of relocating to the US from Australia. There were lots of similar brands in the supermarket, but more were different than the same. It took a while of product experimentation to sort the good brands from the bad. I suspect this will be the same with Amazon Fresh, just as it is true for moving to any other supermarket chain. So when I talk about products not being found below, that does not mean there are no alternative products available on Amazon. It just means they did not come up with low friction.

One interesting thing was that I found the same product (such as Ovaltine) available in different sizes. When going to my local supermarket, I often picked the size that gave me the best discount. It felt like there were fewer size options on Amazon Fresh. But given all the extra costs of home delivery etc., maybe the few cents saved really are not that significant against the benefits of home delivery.

Experiment – Scanning my Fridge and Pantry

Having got the dash wand (bar code scanner), I was curious to see how much of my typical weekly purchases were on Amazon Fresh. So I grabbed the wand and scanned my fridge and pantry contents – as many bar codes as I could find.

(In the back of my mind I immediately heard Amazon say “thank you for all that data about you Alan”…)

In total, I scanned 109 items. Here were my results:

For 7 items the scanner said “I don’t know that item – please tell me what it was”. The implication is that there were many products not available on Amazon for which Amazon still knew the bar code, so it could try to suggest alternative products based on the product description.

It found 26 Amazon Fresh products correctly; 15 were grouped under Amazon Fresh but noted as unavailable, with acceptable alternative product suggestions; and 18 were not present and the suggestions were not acceptable (I would not have bought them).

Interestingly, quite a few products were found in Amazon Prime instead. 25 were listed as Amazon Prime products (including instant noodles!), for 10 it made acceptable alternative Prime product suggestions, and for 8 bar codes it did not make reasonable suggestions.

The experience was a bit confusing really – Fresh and Prime feel like different product areas, but not all the food products were available in Fresh.

One interesting case was Ovaltine, which was available on Amazon Prime and in Amazon Pantry. The Pantry option was much cheaper, but the wand did not pick it up – it added the Amazon Prime product instead. I did not check all the other products, and I did not actually check out, but it was a concern.
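As a sanity check, the tallies above reduce to overall percentages with a few lines of arithmetic:

```python
# Tallies from scanning 109 bar codes, as reported above.
total, unknown = 109, 7
fresh_found, fresh_alt_ok, fresh_alt_bad = 26, 15, 18
prime_found, prime_alt_ok, prime_alt_bad = 25, 10, 8

# Every scanned item falls into exactly one bucket.
assert (unknown + fresh_found + fresh_alt_ok + fresh_alt_bad
        + prime_found + prime_alt_ok + prime_alt_bad) == total

exact = fresh_found + prime_found                # direct matches
with_alts = exact + fresh_alt_ok + prime_alt_ok  # plus acceptable alternatives
missing = total - with_alts                      # not easily found

print(round(100 * exact / total))      # 47 (percent found directly)
print(round(100 * with_alts / total))  # 70 (percent including alternatives)
print(round(100 * missing / total))    # 30 (percent not easily found)
```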

Safeway – Barcode Scanning in App

So what about other supermarket chains? My local supermarket is Safeway. Safeway has a home delivery app that includes a barcode scanner using the camera. So why not use that? I found that it had some problems scanning items in my cupboard, but probably the biggest issue was the extra friction of opening up the app. The Amazon Dash was so easy to grab, point, and click to scan a bar code and add to my cart. It was very quick. The app approach took more steps – unlock phone, open app, click the “scan” button, try to line up the camera with the bar code, etc. The dedicated device was more convenient.

Conclusions

All up, the experience was pretty good, but certainly not perfect – even for Amazon. It found around 47% of the products in my cupboard by scanning the bar code (although only roughly half of those in Amazon Fresh). That goes up to 70% if I include the acceptable alternative product suggestions it made. Not bad, although it implies almost a third of the products in my fridge/pantry were not easy to find on Amazon.

It was also interesting to see such a high percentage of products in Amazon Prime rather than Fresh. This was a problem in some cases, as the Prime listing might be a bulk pack of 10 that I would have had to buy.

Is this a criticism of Amazon Fresh? No, not really. I am sure they will expand inventory over time, and as I said at the start, when changing to a different supermarket chain you expect it to take a little while to work out the products available in that chain.

Will I stop using my normal supermarket in favor of Amazon Fresh? That is to be determined. I like selecting fruit and vegetables by hand, so I suspect it won’t replace the supermarket completely. And if not completely, only time will tell if the complexity of a split purchase experience will be worth it. That is, if I am going to the supermarket anyway, why not get everything in one hit?

What was also interesting was how confusing the Amazon Prime, Amazon Pantry, Amazon Fresh, etc experiences were. I often found myself lost. Going to amazon.com, I had to remind myself how to get into the “Fresh” experience.

But all up, it was interesting to see that even Amazon’s experience is not perfect. They are clearly trying and learning as they go.

Moving Magento DevBlog
https://alankent.me/2017/02/17/moving-magento-devblog/
Fri, 17 Feb 2017 20:12:21 +0000

Dear followers, in case you had not noticed, Magento has launched an official developer blog in the forums, and so most of my work-related posts will be going there now. I may still do the occasional more “out there” post here, but if you want to keep up with the latest from the Magento development team, please follow the Magento DevBlog from now on.