Microsoft Azure still lives in two worlds: the classic portal and the Resource Manager (RM) model. This is similar to AWS, except that AWS integrated and retired the old one. The new account I created in Azure has all its assets in the RM model. Microsoft is known for its UI, and the Azure portal is pretty good compared to the AWS Management Console.

Having all my Azure assets in the RM model made some things easier and others, such as setting up a VPN, slightly painful. I wanted to enable VPN access and was looking to set up a Point-to-Site (P2S) VPN. In my earlier post on “Securing Cloud assets”, I mentioned different mechanisms (P2S, S2S, ExpressRoute etc.). Here we will see the simplest one, P2S, in detail. All the Azure documentation pointed to using PowerShell rather than the portal (UI). The documentation given here in Azure is pretty complete.

I have nothing against PowerShell, but users of Microsoft products are used to a beautiful and elegant UI. I set out to do this using the portal, and I succeeded except for a couple of cmdlets.

First things first: I followed the instructions in this link to make sure PowerShell was working and had the right version of the Azure cmdlets.

For simplicity’s sake, I did not have a load balancer or front-end and back-end vnets. I had only one server instance with one vnet, and I created another vnet for the VPN gateway. Once you are done creating the vnet gateway using the portal, go to the virtual network gateway, click on the Point-to-Site configuration and add the root certificate (the one I generated using the cmdlet and exported using the certificate manager): open the .cer file, copy the content between the “-----BEGIN CERTIFICATE-----” and “-----END CERTIFICATE-----” lines (not including those two lines) and paste it in the portal. If the IP range is missing, add that as well. Finally, go to the top and download the client package.
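The copy-and-paste step for the root certificate can also be scripted. The following is a minimal Python sketch, assuming the certificate was exported in Base-64 (text) form; the function name is hypothetical, and joining the body into a single line is an assumption about what is most convenient to paste into the portal field.

```python
def certificate_body(cer_text: str) -> str:
    """Return the Base-64 payload of a text-form .cer file.

    Everything between the BEGIN and END marker lines is kept, the two
    marker lines themselves are dropped, and line breaks are removed.
    """
    body_lines = []
    inside = False
    for line in cer_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("-----BEGIN"):
            inside = True
            continue
        if stripped.startswith("-----END"):
            break
        if inside and stripped:
            body_lines.append(stripped)
    if not body_lines:
        # A binary (DER) export has no marker lines and ends up here.
        raise ValueError("no certificate body found; was the file exported as Base-64?")
    return "".join(body_lines)
```

To use it, read the exported file as text, pass the contents to `certificate_body` and paste the returned string into the root certificate field.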

Once downloaded, install the package and then go to “Change VPN settings” on your Windows 10 PC/laptop; you will see your new VPN. Click and connect, and you are all set. One more important item to remember after you connect to the VPN: use the private IP of the server to RDP, not the public IP. You can find the private IP of the server in the network interface section in the portal, or you can get it by connecting to the server and running ipconfig.

This is a demo; the secure way is to use the load balancer, define a new port for RDP and do port forwarding from the load balancer to the server. The VPN should be set to connect through the load balancer.

I have not done book reviews before; this is more like a cheat sheet or notes than an actual book review.

The author of this book is the famed Prof. Clayton Christensen of HBS. He is a pioneer in innovation management, and it is not an overstatement to say he has dedicated his life to it. He shot to fame earlier with the term “Disruptive Innovation” in the book “Innovator’s Dilemma”. He has been researching the theory of “Job to be done” as the basis of successful innovation for the past two decades, carefully building it inductively from collected data.

The core premise of this theory is that customers are not buying a product or service; they are “hiring” it to make progress in their life situation. If they don’t like it, they “fire” the product. If an organization understands this, the entire culture can be built around it, so there is less and less friction and things get done. The author points to numerous examples, but the two that stand out are GM’s OnStar and Intuit’s TurboTax. The story of TurboTax is something we all (folks in Information Technology) can easily appreciate. Historically, the software asked a whole bunch of questions in a wizard so that it could prepare the tax form. The team kept asking customers what other questions it could ask to make their lives easier, and it was getting a lot of feedback. The team was happy and set out to implement all these new questions in the wizard. Then they started thinking about what the TurboTax user really wants. He or she hates the tax preparation process (like everyone) and wants to get it done quickly and correctly. Users would be happiest if they were not asked any questions at all! So the team started working towards minimizing the number of questions, starting with automatically pulling the W-2 from payroll processors. Thus one should focus on “why do customers want to use your product?”. He quotes Ted Levitt: “they want a quarter-inch hole and not a quarter-inch drill”. If you understand that, you can build a product / service that results in a quarter-inch hole.

According to the author, the majority of start-ups begin with a laser-sharp focus on “the job to be done”, but they lose focus as they start growing. They lose sight of why the customer wanted them in the first place and start focusing on the customer (demographics etc.), the sales (geography, kind of stores, seasonality) and the many other metrics that are being collected. One very good perspective on data analytics that he brings up in this book is that you (the management) decide which data to collect, so your results are skewed: you analyze only the collected data and know nothing about the things on which no data was collected.

The book goes on to explain how to identify the jobs to be done. Here are the five broad categories:

Finding a job close to home – Look at yourself (eg. Reed Hastings of Netflix)

Competing with Nothing – Look at “non-consumption”: people choose to continue the status quo in the absence of a product / solution. Airbnb is the example: people would choose not to travel when hotel space is not available.

Workarounds and compensating behavior – Look at how people are accomplishing a task using different products. Kimberly-Clark created “Silhouettes” based on this approach.

Look for what people don’t want to do – CVS created Minute-Clinics based on the observation that people don’t want to wait to see a doctor.

Unusual Uses – Look at how people are using products for jobs they were not meant for. Arm & Hammer created many products based on observing how customers were using their “Baking Soda”.

Once you identify the “job to be done”, the book also explains how to convert that into a product that people want. Once you have created the product, the book talks about how you can stay focused on it by discussing the “Fallacies of Innovation Data”.

The fallacy of active vs. passive data

The fallacy of surface growth

The fallacy of conforming data

I want to end this entry with a couple of thoughts. I have always viewed the idea of the impulse purchase with suspicion. I’m not denying that people buy products on impulse, but most purchases are not impulse purchases; the seed is sown a while back. The book explains this vividly through a documentary-style interview about a person’s mattress purchase at Costco. Those who frequent Costco will truly appreciate this section (pages 107 – 115). These pages brought Costco to life in my mind; they are wonderfully written. It shows how you should analyze why people buy your products: you need to get to the root of it.

Having written on “Disruptive Innovation” and “Jobs to be done”, I would love to read or be involved in a research work on what is the optimal point to introduce new products without killing an existing product prematurely. In other words, what is the optimal way to cannibalize a product.

In my earlier post, we saw how we can use AWS CloudSearch to make data in our RDBMS easily searchable. I read in the documentation that AWS supports a variety of other document formats such as PDF, Word etc. I tried to figure out how to do so with the content of PDF files. This information is not readily available in any documentation offered by AWS or on any forums or blogs.

This was dead on arrival. The first thing I tried was the “upload documents” feature in the domain dashboard in CloudSearch. It uploaded the file successfully and asked me to run indexing, which I did. However, when I searched for any word that was in the PDF file, I got nothing. I repeated the steps and reviewed the data it generated before uploading. To my surprise, I found that the “content” field was in binary! No wonder it didn’t work. Then I started my research and noticed that two other people had posted the same question on Stack Overflow, but no one had answered. I posted to the AWS forum, but no answer came for two days. [I answered my own query at the end of it – nice way to score some brownie points :-)]

I started researching competing products (ElasticSearch, Solr and others), and their documentation stated that the file needs to be encoded to Base64. So I tried converting it to Base64 and uploading it to AWS, but no luck. No matter what I did, it was not working. I re-read the AWS documentation and decided to try their CLI command cs-import-documents. This is the command-line equivalent of the “upload documents” wizard, and it can upload directly to the domain. However, my attempt to upload directly gave a fatal error. Then I used the --output option to generate the SDF (Search Data Format) file for the PDF I wanted. The tool generated an SDF JSON file. The content was all text and it looked as follows

I uploaded this JSON document using the “upload documents” button on the AWS CloudSearch console. Then I used the “Run Test” interface to do a search and voila, it found my PDF and displayed the content of the content field!

Though I got it to work, the approach has some limitations. We already saw that each document cannot be more than 1 MB in size and a batch cannot be more than 5 MB. Thus you need to chop your PDFs into multiple smaller files and run them through this CLI utility, or use a content extraction tool such as Apache Tika to create a JSON document, and then upload it. (It would be nice if this CLI tool automatically split documents at 1 MB and uploaded them or generated the SDF files.) The other limitation is that a search returns the entire content, not the matching paragraph, page number or any similar additional information. However, based on the document name, you can fetch the PDF, search within it and show it the way you want.
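The splitting wished for above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text has already been extracted from the PDF (for example with Apache Tika), the `content` field name matches the one seen in the generated file, the `part` field and the id scheme are hypothetical, and the default limit leaves an arbitrary margin below 1 MB for the JSON wrapper.

```python
def split_text(text: str, max_bytes: int = 900_000) -> list:
    """Split text on whitespace into pieces of at most max_bytes (UTF-8)."""
    pieces, current, current_size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        if word_size > max_bytes:
            raise ValueError("a single word exceeds the size limit")
        # One extra byte for the joining space when the piece is not empty.
        extra = word_size + (1 if current else 0)
        if current_size + extra > max_bytes:
            pieces.append(" ".join(current))
            current, current_size = [word], word_size
        else:
            current.append(word)
            current_size += extra
    if current:
        pieces.append(" ".join(current))
    return pieces


def to_add_operations(doc_id: str, pieces: list) -> list:
    """Wrap each piece as an 'add' operation with a part-numbered id."""
    return [
        {
            "type": "add",
            "id": f"{doc_id}_part{number}",
            # "part" is a hypothetical field; it would need to exist in the domain.
            "fields": {"content": piece, "part": number},
        }
        for number, piece in enumerate(pieces, start=1)
    ]
```

Because every piece carries its own part number, a search hit at least narrows the match down to one slice of the PDF rather than the whole file.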

I have already signed up for the ElasticSearch service and hope to try this out on it. In case you have done this on your own, do share your experience and give me a pointer. I want to know your thoughts.

Google has been the pioneer in the technology field, from popularizing AJAX to cloud to Big Data, you name it. However, Amazon is the one that has been successful in monetizing these technologies. IMHO, the reason is twofold: first, Google was late to the market, and second, its documentation. If being pretty late to join the cloud party was not bad enough, preparing documentation like an IEEE journal didn’t help its cause either. In fact, it made it worse. I have been playing with AWS since 2008, but when I looked at GCP (formerly GCE) a couple of years ago at one of their events, I left pretty depressed: I couldn’t accomplish the simple task of creating a server and hosting a site!

With this background, I started my journey a couple of days back to deploy a node.js website in GCP. The webapp is a simple SPA, developed with AngularJS 2.0 on the front end and node.js on the back end. One small twist was that this app was accessing AWS CloudSearch to perform the search and return the results.

One thing I should admit is that their documentation has come a long way (I was successful in hosting the site :-)). I started with this: https://cloud.google.com/nodejs/getting-started/hello-world. Google is now also offering a generous $300 credit for the first year, and I was able to follow their document and host the hello-world sample app. Having said that, I looked at the documentation for achieving this on three other platforms: Heroku, Azure and AWS. I must say GCP is the simplest. However, it was not without its quirks. I had my webapp developed using node.js, Express and Angular.JS, and here is what you can do to get your webapp ready for deploying in GCP.

Copy the app.yaml from the hello-world sample that you downloaded from GitHub. Make sure the exclude entry for ^node_modules is not there.

Went to the command prompt and followed the documentation.

I did gcloud app deploy.

This took a long time and errored out saying “Error loading module (‘aws-sdk’)”. I searched around and, after a couple of hours of trial and error, found that aws-sdk was not listed in the dependencies in the package.json file. After I added it, the app finally deployed successfully, though it again took a very long time. I was thrilled, but my euphoria was short-lived: the app was not loading in the browser; it threw a 502 error and asked me to try again in 30 seconds.
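A module that is installed locally but not declared in package.json works on the development machine and fails in a fresh deployment, because the deployment installs only what is declared. A small pre-deploy check along these lines would have caught it; this is a minimal Python sketch with a hypothetical function name, and the list of required modules is something you would supply yourself.

```python
import json


def missing_dependencies(package_json_text: str, required_modules: list) -> list:
    """Return the required modules that package.json does not declare.

    Only the "dependencies" section is checked, since that is what a
    production install pulls in.
    """
    package = json.loads(package_json_text)
    declared = package.get("dependencies", {})
    return [name for name in required_modules if name not in declared]
```

For the app described here, the call would be something like `missing_dependencies(open("package.json").read(), ["express", "aws-sdk"])`, and a non-empty result means the deployment is likely to fail on load.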

I started researching, with no solution in sight. Everyone was pointing me to the logs and, being new to GCP, I fumbled around until I figured out where the logs were, then started going through them, which is like finding a needle in a haystack. Some errors pointed to an upstream connectivity failure on nginx, and I noticed a load balancer error about being unable to connect back to the server. Having set up a load balancer in both Azure and AWS, I suspected that the load balancer was not able to direct the traffic to the actual web server where the site is hosted (though I didn’t ask for a load balancer, GCP by default deploys the site on at least two servers to make it HA). I found an article (https://cloud.google.com/compute/docs/load-balancing/http/ ) stating that the load balancer uses port 80 or 8080 for HTTP traffic and 443 for HTTPS. My server.js file was listening on 8000. I changed it to 8080 and voila, it worked! I was ecstatic. How cool is that?
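The lesson generalizes: rather than hard-coding a port, let the platform tell the app where to listen and fall back to 8080. My app was node.js, but the idea is the same in any language; here is a minimal Python sketch, assuming the platform passes the port in a PORT environment variable (the function name is hypothetical).

```python
def resolve_port(environ: dict, default: int = 8080) -> int:
    """Pick the listening port from the environment, else use the default."""
    raw = environ.get("PORT", "")
    if not raw:
        return default
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

In a real app you would call `resolve_port(os.environ)` once at start-up and pass the result to whatever web server you use.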

I built and hosted a site on Google Firebase + Angular earlier, and now with this one, my perception of GCP is changing. Google is becoming a player as well.

In my earlier post, I talked about CloudSearch, and now I’m following up with the approach I used to make the data searchable. There are multiple components. At the core is AWS CloudSearch, which is populated with data in JSON format, indexes that data and makes it ready for search. Now we have searchable data, but we need a mechanism for a user to search it.

I chose node.js, Express and Angular.JS to create a web-based app that offers a UI where users can type in search terms and choose criteria to filter data. The app converts the search terms into a structured or simple query (similar to SQL for an RDBMS), calls the AWS CloudSearch APIs to perform the search and displays the results using a Bootstrap grid. Right up to this point, my plan was straightforward. Then I threw in a twist: how about hosting this web app on Google Cloud Platform?

I had no idea about GCP, but with prior knowledge of AWS and Azure and a liberal credit of $300 from Google, I thought it shouldn’t be a problem. I could have hosted the app on Heroku, AWS or Azure. However, after reviewing the documentation, I felt confident that I could make it work. Indeed I was right, but only after the frustrating moments one has when attempting a new technology. I will write a separate blog entry on this, as I don’t want to cloud out AWS CloudSearch here.

I created two IAM users, one authorized only to search and the other authorized to upload to and index CloudSearch. One good thing about CloudSearch is that it can automatically convert CSV to JSON. The source was data from MS SQL Server, which was extracted as CSV; a separate script uploaded the data to CloudSearch (in 5 MB chunks) and indexed it. I was able to upload data with different formats (number of fields). The webapp then builds the search terms using the “structured” syntax if it is a compound query checking against multiple fields, or a “simple” query if it is just text and all text fields are to be searched. It calls the CloudSearch APIs through the AWS SDK and displays the returned search results using a Bootstrap grid.
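The choice between the two query styles can be sketched as follows. This is a Python illustration rather than the node.js code of the actual app; the function name and the example field names are hypothetical. It builds the query string and parser name only, and makes no call to AWS.

```python
from typing import Optional


def build_search_params(text: str = "", field_filters: Optional[dict] = None) -> dict:
    """Build the query string and parser name for a CloudSearch search.

    Plain text with no field filters becomes a "simple" query; anything
    that checks specific fields becomes a "structured" query.
    """
    def render(value) -> str:
        # Numbers are written bare; strings are quoted, with backslashes
        # and single quotes escaped.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return str(value)
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"

    if not field_filters:
        return {"query": text, "queryParser": "simple"}

    clauses = [f"{field}:{render(value)}" for field, value in field_filters.items()]
    if text:
        clauses.insert(0, render(text))
    query = clauses[0] if len(clauses) == 1 else "(and " + " ".join(clauses) + ")"
    return {"query": query, "queryParser": "structured"}
```

The returned keys are named so that they could be passed to the search call of an AWS SDK client, but check the parameter names of the SDK you use before relying on that.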

The full-text search on the relational data (MS SQL Server) took up to 30 seconds, while AWS CloudSearch took less than a second! For people who want to know why CloudSearch: less than a second vs. 30 seconds is too compelling an argument to ignore.

If you have data in CSV format, I can show you a searchable demo pretty quickly; ping me.

This was a great experience of getting my hands dirty, and there is nothing like it when it works, right?

The cloud is filled with wide-ranging options to store and retrieve data, and so is on-premise. Every cloud provider, from Amazon to Azure to Google, has its own cloud search solution. In addition to these proprietary solutions, there are open source platforms such as ElasticSearch and Apache Solr.

In short, all offer similar features with little difference. I would say there are two big differences between AWS CloudSearch and the other two.

Data import is a batch process in AWS CloudSearch. If you have streaming data or need immediate data updates, then go for Elastic or Solr.

If you don’t want to worry about infrastructure, backups and patches, then go with AWS CloudSearch. Out of the box, it comes across as a true cloud product.

Elastic.co as well as AWS offer ElasticSearch as a service, where they have simplified the infrastructure part. Elastic.co, in fact, offers it as a service on the AWS cloud. However, Elastic and Solr are more popular than CloudSearch, so it is easier to find resources online for these two than for AWS CloudSearch.

Thus, I embarked on a journey to take up AWS CloudSearch, and you know what, it is not that difficult (though I went through some gnawing issues and had my own share of frustrating moments). To begin with, I took the manual route: I extracted the data out of my RDBMS (SQL Server), uploaded it to CloudSearch, indexed it and used the rudimentary UI provided by AWS, and I was able to search within an hour. The biggest advantage I see with AWS CloudSearch data upload is that it takes a CSV file and converts it to JSON by itself. You can write a batch program to upload in chunks of 5 MB. In addition to CSV, it supports multiple other types such as PDF, Excel, PPTX, DOCX etc.
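The batch program mentioned above could look roughly like this. It is a minimal Python sketch, assuming the CSV has a column that can serve as the document id and that the column names are already acceptable as index field names; the function name is hypothetical, and the actual upload call is left out.

```python
import csv
import io
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # the 5 MB batch limit


def csv_to_batches(csv_text: str, id_column: str,
                   max_batch_bytes: int = MAX_BATCH_BYTES) -> list:
    """Convert CSV rows into JSON batches of 'add' operations.

    Each returned string is a JSON array that stays under max_batch_bytes.
    """
    batches, current, current_size = [], [], 2  # 2 bytes for the enclosing []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fields = {name: value for name, value in row.items() if name != id_column}
        operation = {"type": "add", "id": row[id_column], "fields": fields}
        # +2 covers the ", " separator; slightly pessimistic on purpose.
        operation_size = len(json.dumps(operation).encode("utf-8")) + 2
        if current and current_size + operation_size > max_batch_bytes:
            batches.append(json.dumps(current))
            current, current_size = [], 2
        current.append(operation)
        current_size += operation_size
    if current:
        batches.append(json.dumps(current))
    return batches
```

Each string in the result can then be uploaded as one batch, either through the console or with the SDK or CLI of your choice.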

For both Solr and ElasticSearch, you need to provision a Linux server, then install and configure the software like anything else you download. Even if you take the service route, you still need to worry about backups, upgrades, applying patches etc. One big advantage of these is that you can have them on-premise as well, while AWS CloudSearch is available only on the AWS cloud. Beyond that, Elastic also has the data visualization tool Kibana, and it comes as a suite (ELK – Elastic, Logstash, Kibana). AWS CloudSearch offers only indexing and search; visualization is a separate product, QuickSight (I haven’t looked at this yet, but I plan to).

I will write more about the programmatic approach in my next entry. Please drop me a line, if you can’t wait and wish to see it in action!

The MES manager has been receiving calls from the QA folks regarding specific product complaints on a regular basis and spends an inordinate amount of time figuring out what went wrong. The manager wanted to find the root cause: which lot, which batch and when a particular component was produced. Historically, the audit trail served this purpose and he used it to a good extent. But the issue is that it takes time. Viewing the audit trail can get complicated when one is analyzing an issue that involves interrelated containers, especially if they have long histories.

We customized the audit trails with the above issues in mind, to make the histories run faster and be more user friendly. This is built using the Camstar history APIs, with a few variations.

We gave users a choice between an audit trail that gives the entire genealogy (cross-reference details) and a short history that gives only the details of the transactions executed directly against the container of interest.

This feature enables users to look at precisely what they are interested in rather than spending time searching through lengthy historical data. We reduced the number of mouse clicks and keystrokes needed to access the data of interest. This improved user friendliness by expanding the details and reducing the time it takes to get to the critical data. On top of this, we added features like Export to Excel so the manager could share the information with the quality folks. Users can also change the sort order, exclude or include transaction reversal info etc.

This solution is heavily used, and the MES manager is now free to focus on more significant areas. The necessary folks have access to the audit trail and can see for themselves, which brought in much-needed transparency and accountability.

If you would like to know more about it, please feel free to reach out to me.

In this post, I’m going to point out the ways in which you can access your assets (VMs, Storage, DBs) in the cloud securely. This is no different than accessing the assets in your corporate network. The goal is to make you feel comfortable about having your assets in the cloud.

To accomplish anything meaningful, you would need at least one server, a VM (if you are going IaaS) and maybe storage as well. The basic level of access restriction can be achieved with AWS IAM and Azure AD. You can restrict and control access to your storage and other assets for different users.

The first and foremost activity is to form a (virtual) network and keep your servers inside that network. A network helps you group your assets, so all the security actions you take apply equally to all the constituents rather than having to be done on individual assets. Forming a network also enables the free flow of access among its constituents (like an economic bloc). Now, this network needs to be protected from the rest of the world. Security groups help you define the rules for enabling and blocking access.

Now that you have created a network and set up rules, you first need access to it, since this whole network lives outside your corporate network location. There are many ways to accomplish this. The basic and rudimentary approach is to restrict access to this virtual network in the cloud to specific IP addresses (location 1, location 2, etc.). However, although this method limits access to specific IP addresses, the traffic still travels over the internet without the encryption a VPN provides.
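The IP restriction described above boils down to checking the source address against a list of allowed ranges. Here is a minimal Python sketch of that check; the function name is hypothetical and the addresses are reserved documentation ranges, not real locations. In practice the cloud provider evaluates such rules for you.

```python
import ipaddress


def is_allowed(source_ip: str, allowed_cidrs: list) -> bool:
    """Return True if source_ip falls inside any of the allowed ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Example rules: one office range and one single address (/32).
OFFICE_RULES = ["203.0.113.0/24", "198.51.100.7/32"]
```

With these rules, `is_allowed("203.0.113.45", OFFICE_RULES)` passes, while an address outside both ranges is rejected.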

The next level is setting up a VPN. If you are in development mode or access is limited to a small set of individuals, you can set up Point-to-Site (P2S in the Azure world). However, if you want your entire corporate network to be able to connect to the virtual network in the cloud, you can set up a Site-to-Site VPN. In both scenarios, the traffic is encrypted, but you are still going over the public internet. Thus your speed, latency and SLA are all limited by the bandwidth and SLA of your ISP. If you are not happy with that, you can try a dedicated connection: in AWS it is called Direct Connect and in Azure it is ExpressRoute. This is not the public internet but a dedicated pipe; you can call up providers like AT&T, Level3 or the cable companies, and they will be happy to provide one. In addition to this, most of the datacenter providers such as Sungard, Datapipe and IO offer direct connections between their datacenters and AWS, Azure and other cloud providers.

You can see how this is no different from working from home and connecting to your corporate network. If you have enabled your employees to work from home, then you are ready for the cloud.

This is not a big write-up exploring what SSL certificates are and why you need them. You should see it as a continuation of my previous post on browser push notifications.

This is the beginning of the year and a good excuse to review all your websites and their SSL (HTTPS) certificates and make sure they are valid. It is a best practice for your website to support HTTPS if it allows login. Without a valid certificate, your webapp will look sorry and, in some cases, be outright rejected, and you end up losing ground. Current browsers do the validation themselves, and the myriad security products, from Norton to the free Avast, may block your website or flag it as not secure.
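The yearly review can be partly automated. The following Python sketch uses only the standard library to read a site's certificate and compute how many days remain before it expires; the function names are hypothetical, and the default port and timeout are assumptions.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Days between now and the certificate's notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to the host and return the notAfter field of its certificate.

    The default context verifies the certificate, so an expired or
    mismatched one raises an ssl.SSLError here, which is itself the
    signal that something needs fixing.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

Running `days_until_expiry(fetch_not_after(host), datetime.now(timezone.utc))` over a list of your own hostnames gives a quick picture of which certificates need attention first.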

In the last post, we saw the beauty and significance of browser push notifications. In this one, I’m talking about the servers that host and send these messages. The floating pop-ups or suggestive windows, such as the chat app that shows up to assist customers, may be a totally different app. Gone are the days when the app and all its components lived on one server. Now every one of them may live on its own, and not even in the same data center or country! This is the day and age of microservices architecture (better than SOA), where each app is a service plugged in to make a complete offering. Web solutions these days are like a giant 1000-piece puzzle, and you need to orchestrate them well.

Here is an example where even a company such as Bank of America has an SSL certificate that is not valid. You could spend all your money on developing nice features, but in the end something as simple as an invalid SSL certificate (wildcard certificates are not cheap) will block them, and the messages / features / notifications will never see the light of day!

You might have heard about push notifications for apps (Android, Apple, Windows – for completeness’ sake :-)), but to bridge the gap, browsers are coming up with support for push notifications. Currently Firefox and Google Chrome support push notifications. You might have seen something like this recently when you visited a certain website. For example, one from economictimes.com:

It is a pretty cool feature. With email becoming ubiquitous and Gmail tagging emails under “promotion”, you need a way to reach out to your customers on significant events. I can state a laundry list of scenarios where this will be useful.

Let us say,

You booked a flight and checked in, and the gate changed

You booked a ticket for an event, and the organizer wants to show parking tips or a parking coupon

You have a business website where crucial information is made available and the client needs to be notified

A customer left items in the shopping cart, and you want to show a promotional or price change alert