When building a highly scalable website like amazon.com, there would be thousands of web servers and all of them would be fronted by multiple load balancers as shown below. The end user would be pointing to the load balancer which would forward the requests to the web servers. In the case of the AWS ELB (Elastic Load Balancer), the distribution of the traffic from load balancer to the servers is in a round-robin fashion and doesn’t consider the size of the server or how busy/idle the servers are. May be AWS will add this feature in the upcoming releases.

In this blog, we would be analyzing the number of users coming to a website from different ip addresses. Here are the steps at a high level which we would be exploring in a bit more detail. This is again a lengthy post where would be using a couple of AWS services (ELB, EC2, S3 and Athena) and see how they work together.

– Create two Linux EC2 instance with web servers with different content – Create an Application Load Balancer and forward the requests to the above web servers – Enable the logging on the Application Load Balancer to S3 – Analyze the logging data using Athena

To continue further, the following can be done (not covered in this article)

– Create a Lambda function to call the Athena query at regular intervals – Auto Scale the EC2 instances depending on the resource utilization – Remove the Load Balancer data from s3 after a certain duration

Step 1: Create two Linux instances and install web servers as mentioned in this blog. In the /var/www/html folder have the files as mentioned below. Ports 22 and 80 have to be opened for accessing the instance through ssh and for accessing the web pages in the browser.

server1 – index.html server2 – index.hml and img/someimage.png

Make sure that ip-server1, ip-server2 and ip-server2/img/someimage.png are accessible from the web browser. Note that the image should be present in the img folder. The index.html is for serving the web pages and also for the health check, while the image is for serving the web pages.

Step 2: Create the Target Group.

Step 3: Attach the EC2 instances to the Target Group.

Step 4: Change the Target Group’s health checks. This will make the instances healthy faster.

Step 5: Create the second Target Group. Associate server2 with the target-group2 as mentioned in the flow diagram.

Step 6: Now is the time to create the Application Load Balancer. This balancer is relatively new when compared to the Classic Load Balancer. Here is there difference between the different Load Balancers. The Application Load Balancer operates at the layer 7 of the OSI and supports host-based and path-based routing. Any web requests with ‘/img/*’ pattern would be sent to the target-group2, rest by default would be sent to target-group1 after completing the below settings.

Step 7: Associate the target-group1 with the Load Balancer, the target-group2 will be associated later.

Step 8: Enable access logs on the Load Balancer by editing the attributes. The specified S3 bucket for storing the logs will be automatically created.

Few minutes after the Load Balancer has been created, the instances should turn into a healthy state as shown below. If not, then maybe one of the above steps has been missed.

Step 9: Get the DNS name of the Load Balancer and open it in the browser to make sure that the Load Balancer is working.

Step 10: Now is the time to associate the second Target Group (target-group2). Click on View/edit rules) and add a rule.

Any requests with the path /img/* would be sent to the target-group2, rest of them would be redirected to the target-group2.

Step 11: Once the Load Balancer has been accessed from different browsers a couple of times, the log files should be generated in S3 as shown below.

Step 12: Now it’s time to create tables in Athena and then map it to the data in S3 and query the tables. The DDL and the DML commands for Athena can be found here.

We have seen how to create a Load Balancer, associate Linux web servers with them and finally check how to query the log data with Athena. Make sure that all the AWS resources which have been created are deleted to stop the billing for them.