The demand for AWS skills is high, and the number of people getting AWS certified is rising steadily. Gradually, an AWS certification will become mandatory if you want to work on AWS or are searching for an AWS job.

Those who work on AWS projects know that knowing AWS is just one part of the job. There is a lot more involved, and they have to know more than AWS to perform their role well. They need a deep understanding of the data center, the network and the application, awareness of compliance and security requirements, and much more. The job of a data center architect is not easy, and the lack of AWS tools for certain tasks makes it more difficult.

Take billing, for example. I recently taught an architect course for an MNC, and a lot of senior architects attended it. One of them asked me, “How do we make sense of the AWS bill? The bill runs into hundreds of pages. Yes, there is total transparency, but understanding what has been spent and why is a nightmare.” I had heard similar comments from architects of the now-defunct CSC. They too talked about the huge bills they had to contend with. One of their customers had more than 700 EC2 instances running, not to mention storage and databases, so the bill was humongous. How do you read such a bill and make sense of it? It definitely needs a tool. What tools are available?

Here is a post that discusses various free and paid tools for cost analysis.

If you would rather develop your own solution, here is a post on how you can do it using Google BigQuery.
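
If you do build your own solution, a useful first step is simply to group the bill’s line items by service. Here is a minimal Python sketch that uses only the standard library. It assumes a CSV export shaped like the AWS Cost and Usage Report, with the columns lineItem/ProductCode and lineItem/UnblendedCost; treat those names as an assumption and check them against the header row of your own report.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed column names; verify them against the header row of your own report.
SERVICE_COLUMN = "lineItem/ProductCode"
COST_COLUMN = "lineItem/UnblendedCost"


def cost_by_service(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, float]]:
    """Sum the cost of every line item per service, largest spend first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        # Empty cost cells are treated as zero.
        totals[row[SERVICE_COLUMN]] += float(row[COST_COLUMN] or 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # A tiny made-up sample; a real report has thousands of rows.
    sample = io.StringIO(
        "lineItem/ProductCode,lineItem/UnblendedCost\n"
        "AmazonEC2,12.50\n"
        "AmazonS3,1.25\n"
        "AmazonEC2,7.50\n"
    )
    for service, cost in cost_by_service(csv.DictReader(sample)):
        print(f"{service:<12} {cost:>10.2f}")
```

For a real bill you would open the downloaded report file instead of the in-memory sample and pass csv.DictReader over it to the same function.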

When I teach participants about VPCs, there is one question that is almost always asked: ‘Can we get a network diagram of the VPC we created?’ This is not possible in the console, and to be honest, if you have more than one VPC with subnets in each, it is a bit of a pain to follow things on the console even if you have named the VPCs and subnets very logically. What would help is a diagram of each VPC showing its subnets, route tables and NACLs.
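
The console will not draw this for you, but the raw data is available through the EC2 APIs. As a minimal sketch, assuming boto3 and read access to EC2, the function below builds a text outline of each VPC, its subnets and the route table each subnet uses. The client is passed in as a parameter so the logic can be tried without an AWS account; pagination and NACLs are left out for brevity, and a real diagramming tool would render graphics rather than text.

```python
import ipaddress
from collections import defaultdict
from typing import Any, Dict, List


def vpc_outline(ec2_client: Any) -> str:
    """Return a text outline: each VPC, its subnets, and the route table each uses.

    `ec2_client` is expected to behave like boto3.client("ec2").
    Pagination and NACLs are left out to keep the sketch short.
    """
    vpcs = ec2_client.describe_vpcs()["Vpcs"]
    subnets = ec2_client.describe_subnets()["Subnets"]
    route_tables = ec2_client.describe_route_tables()["RouteTables"]

    # Record the main route table of each VPC and every explicit subnet association.
    main_table_for_vpc: Dict[str, str] = {}
    table_for_subnet: Dict[str, str] = {}
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("Main"):
                main_table_for_vpc[table["VpcId"]] = table["RouteTableId"]
            elif "SubnetId" in assoc:
                table_for_subnet[assoc["SubnetId"]] = table["RouteTableId"]

    subnets_by_vpc: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for subnet in subnets:
        subnets_by_vpc[subnet["VpcId"]].append(subnet)

    lines: List[str] = []
    for vpc in vpcs:
        vpc_id = vpc["VpcId"]
        lines.append(f"{vpc_id} ({vpc['CidrBlock']})")
        # Sort subnets numerically by CIDR block so the outline reads naturally.
        ordered = sorted(subnets_by_vpc[vpc_id],
                         key=lambda s: ipaddress.ip_network(s["CidrBlock"]))
        for subnet in ordered:
            # Subnets with no explicit association fall back to the VPC's main table.
            table_id = table_for_subnet.get(
                subnet["SubnetId"], main_table_for_vpc.get(vpc_id, "?"))
            lines.append(f"  {subnet['SubnetId']} {subnet['CidrBlock']} "
                         f"[{subnet['AvailabilityZone']}] -> {table_id}")
    return "\n".join(lines)
```

With credentials configured, print(vpc_outline(boto3.client("ec2"))) would print the outline for the client's region.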

There are multiple tools that help both with creating a design diagram and with generating a diagram from existing infrastructure. What would be really useful is a feature that can generate a CloudFormation template from existing infrastructure in a few clicks. I browsed through a few tools, which I am listing here. Please note that I have NOT analyzed any of them and I have no idea how effective they are. I will do some testing in the coming days to find out how useful they could be. For the time being, here are some tools you can explore.

Hava is a tool that can import your existing infrastructure as a diagram. They also claim that you can get a CloudFormation template from that infrastructure in a few steps.

CloudCraft lets you draw diagrams of your infrastructure. You can also import your existing infrastructure into it.

As usual, at re:Invent 2017, AWS announced a spate of new services and new features for existing services. I will summarize some of these, concentrating on the more generic services rather than the very specialized ones, though I will mention a few of those.

Compute

AWS Fargate: This new service from AWS lets you run your Docker containers without worrying about the systems that run them. In other words, AWS takes care of setting up and maintaining a cluster of instances to run your containers, leaving you free to focus on your application. Azure already has the ability to run container instances, so in this case I think AWS is catching up with Azure.
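
To illustrate what “AWS runs the cluster” means from the API side, here is a minimal sketch, assuming boto3, that starts a container task with the FARGATE launch type through ECS’s run_task call. You never choose or name an EC2 instance; you only supply networking. The cluster name, task definition, subnet and security group IDs are hypothetical placeholders, and the ECS client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict, List


def run_on_fargate(ecs_client: Any, cluster: str, task_definition: str,
                   subnets: List[str], security_groups: List[str]) -> List[str]:
    """Start one task on Fargate and return the ARNs of the tasks ECS started.

    `ecs_client` is expected to behave like boto3.client("ecs").
    """
    response: Dict[str, Any] = ecs_client.run_task(
        cluster=cluster,
        taskDefinition=task_definition,
        launchType="FARGATE",  # AWS provisions and manages the underlying capacity
        count=1,
        networkConfiguration={
            # Fargate tasks use the awsvpc network mode, so subnets are required.
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "ENABLED",
            }
        },
    )
    return [task["taskArn"] for task in response.get("tasks", [])]
```

A call would look like run_on_fargate(boto3.client("ecs"), "demo-cluster", "web-app:1", ["subnet-0abc1234"], ["sg-0abc1234"]), with all of those identifiers being hypothetical.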

Bare Metal: IBM and a few others already had bare metal offerings. As the name indicates, you get complete control of a server and can load the hypervisor or OS of your choice on it. This helps in many ways: better performance, meeting compliance requirements, tackling licensing issues, and you can even build a cloud of your choice within AWS! Bare Metal is still in preview, but I am sure it will be generally available soon.

Hibernation of Spot Instances: Earlier, whenever the spot price rose above your bid price while your spot instance was running, AWS terminated the instance. So spot instances were suitable only for applications that could withstand sudden termination. Later, AWS started stopping spot instances instead of terminating them. Now spot instances can go into hibernation: the contents of memory are saved to disk, and when capacity becomes available again, your instance resumes from where it left off. The private IP and the Elastic IP are also preserved. This makes spot instances even more attractive to use.

Elastic Container Service for Kubernetes (EKS): Many of you will know that Kubernetes is a container orchestration system. Earlier, AWS offered only ECS (Elastic Container Service) for Docker orchestration; now we have the option of using Kubernetes as well. AWS takes care of all the infrastructure required to run Kubernetes, so we need not worry about setting up servers and installing Kubernetes. Given that Kubernetes is gaining a lot of traction, this is a good move from Amazon. EKS is currently in preview.

Databases

Amazon Aurora Multi-Master: Now you can have more than one read/write master in an Aurora cluster, and applications can read from and write to any of these masters. As you can guess, this increases the availability of the database, since each master can be in a different Availability Zone.

DynamoDB Global Tables: Your DynamoDB tables are automatically replicated across the regions of your choice. Earlier, if you wanted a replica of your DynamoDB table in another region, you had to set up the replication on your own. With Global Tables, you no longer need to worry about it. You can immediately see how useful this will be in a DR scenario.
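
As a sketch of how little setup this needs, assuming boto3 and the Global Tables API as it was launched: you first create a table with the same name in each region (empty, with DynamoDB Streams enabled for new and old images), and then link the copies with a single call. The table name and regions used in any example are hypothetical, and the client is passed in so the function can be tried without AWS.

```python
from typing import Any, Dict, List


def make_global_table(dynamodb_client: Any, table_name: str,
                      regions: List[str]) -> Dict[str, Any]:
    """Link same-named tables in several regions into one global table.

    Assumes a table called `table_name` already exists, empty, in every listed
    region, with DynamoDB Streams enabled (NEW_AND_OLD_IMAGES).
    `dynamodb_client` is expected to behave like boto3.client("dynamodb").
    """
    response = dynamodb_client.create_global_table(
        GlobalTableName=table_name,
        # One entry per region that should hold a replica.
        ReplicationGroup=[{"RegionName": region} for region in regions],
    )
    return response["GlobalTableDescription"]
```

After this, a write to the table in any of the listed regions is replicated to the others by DynamoDB itself, which is exactly the replication you previously had to build on your own.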

DynamoDB Backup and Restore: AWS now lets you back up and restore your DynamoDB tables, to help enterprises meet regulatory requirements. AWS promises that backups will be very fast irrespective of the size of the table.

Amazon Neptune: Amazon has launched a graph database called Amazon Neptune. If you have seen my webinar on NoSQL databases, you will know that a graph database is a type of NoSQL database. I will cover graph databases and Neptune’s features in a future post.

Networking

Inter-region VPC Peering: Earlier, you could peer two VPCs only if they were in the same region. Now Amazon lets you peer two VPCs even if they are in different regions, so an EC2 instance can reach another EC2 instance in a peered VPC in another region using only its private IP.
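
Setting up such a peering takes three API calls. Below is a minimal sketch, assuming boto3 with one EC2 client per region: request the peering from the first region (naming the peer region), accept it from the second, and add a route so traffic for the peer VPC’s CIDR block goes over the peering connection. All IDs are hypothetical, and a real script would also need to wait for the connection to change state between steps, add the return route on the peer side, and handle errors.

```python
from typing import Any


def peer_across_regions(requester_ec2: Any, accepter_ec2: Any,
                        vpc_id: str, peer_vpc_id: str, peer_region: str,
                        route_table_id: str, peer_cidr: str) -> str:
    """Request, accept and route an inter-region VPC peering connection.

    Both clients are expected to behave like boto3.client("ec2"), one created in
    each region. Only the requester's route table is updated here; the peer
    side needs a matching route back for traffic to flow in both directions.
    """
    # 1. Request the peering from the requester VPC's region.
    request = requester_ec2.create_vpc_peering_connection(
        VpcId=vpc_id, PeerVpcId=peer_vpc_id, PeerRegion=peer_region)
    pcx_id = request["VpcPeeringConnection"]["VpcPeeringConnectionId"]

    # 2. Accept it from the peer VPC's region.
    accepter_ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

    # 3. Send traffic for the peer VPC's private range through the peering.
    requester_ec2.create_route(RouteTableId=route_table_id,
                               DestinationCidrBlock=peer_cidr,
                               VpcPeeringConnectionId=pcx_id)
    return pcx_id
```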

Messaging

Amazon MQ: This is a managed message broker service for Apache ActiveMQ. Amazon will set up ActiveMQ and maintain it. I don’t know much about ActiveMQ, as I haven’t worked on it. From what I can gather, Amazon now has two messaging solutions: its own SQS (Simple Queue Service) and Amazon MQ. Maybe Amazon MQ has more features than SQS? I will find out and let you know.

Many more announcements were made. I have touched only on the ones that affect the AWS Solution Architect and AWS SysOps exams, and I will write about other new services and features in another post.

The Cloud brought with it cheap storage, and it also brought durability. This meant that storing your data in the cloud is cost effective and you don’t need to worry about losing data. (Amazon S3, for example, is designed for eleven 9s, or 99.999999999%, durability. This means that for all practical purposes, you will never lose your data.) It is only natural that people started using cloud storage services, and the cloud storage services of different vendors now store trillions of objects.
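
To see why eleven 9s means “practically never”, here is a quick back-of-the-envelope calculation in Python. It treats the durability figure as the probability that a given object survives a year and assumes objects are lost independently of each other, which is a simplification.

```python
def expected_objects_lost_per_year(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost in a year, assuming each object
    independently survives the year with probability 1 - 10**-nines."""
    annual_loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11 per object per year
    return object_count * annual_loss_probability


if __name__ == "__main__":
    lost = expected_objects_lost_per_year(10_000_000)
    print(f"{lost:.4f} objects per year, about one every {1 / lost:,.0f} years")
```

For ten million objects this works out to 0.0001 objects per year, or on average about one lost object every 10,000 years.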

Cloud Storage is based on object storage, which is different from the block and file storage we are used to. With an object store, we fetch objects from the cloud using REST APIs, which is quite different from reading and writing a file on your disk. There are other differences between object stores and block/file based storage devices as well.
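
The difference is easy to see side by side. In this minimal sketch, assuming boto3 for the S3 call, reading from file storage is an ordinary filesystem read of a path, while reading from an object store is a request for a bucket and key that boto3 turns into an HTTP GET against the S3 REST API. The bucket and key names used in any example are hypothetical.

```python
from pathlib import Path
from typing import Any


def read_local_file(path: Path) -> bytes:
    # Block/file storage: the OS filesystem resolves the path and reads the bytes.
    return path.read_bytes()


def read_object(s3_client: Any, bucket: str, key: str) -> bytes:
    # Object storage: a GET for bucket/key over the S3 REST API. boto3 signs and
    # sends the HTTP request; the object's content arrives as a streaming body.
    # `s3_client` is expected to behave like boto3.client("s3").
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()
```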

While Cloud Storage is cheap, users are more comfortable with a filesystem interface. Is there a way to treat Cloud Storage as if it were a filesystem, so that the user just reads and writes files and does not need to use REST APIs to fetch or store objects? If this can be done, users will find Cloud Storage easier to adopt, which in turn reduces storage costs. This is possible with Storage Gateways.

Storage Gateways use Cloud Storage as their backend but expose a filesystem to users. Users deal with files, while the Storage Gateway stores these files as objects in Cloud Storage. This background processing is transparent to the user. A Storage Gateway can expose a block device, a filesystem or a virtual tape library to the user. AWS, for example, has Storage Gateways that expose a filesystem, Storage Gateways that expose a block device (iSCSI device) and a Storage Gateway that exposes a Virtual Tape Library (VTL). All of these use S3 as their backend to store the data.

The question uppermost in your mind will be: if the storage is in the Cloud and you use it as primary storage, won’t performance suffer? It is a very pertinent question. Accessing the Cloud is definitely not as fast as accessing a disk drive in your data center. To address this, Storage Gateways have local disks on which they cache recently accessed files, which boosts the performance of the gateways. Other than AWS, Avere Systems is another company that offers cloud-based NAS filers. (http://www.averesystems.com/)
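
The caching idea can be modelled in a few lines. This is a toy sketch, not how any particular gateway is implemented: a read is served from the local cache if the file is there and otherwise fetched from the cloud, with least-recently-used eviction assumed when the cache is full. The fetch function is a hypothetical stand-in for the round trip to object storage.

```python
from collections import OrderedDict
from typing import Callable


class CachingGateway:
    """Toy model of a gateway that keeps recently read files on local disk.

    `fetch_from_cloud` stands in for the slow round trip to object storage.
    Least-recently-used eviction is an assumption made for this sketch.
    """

    def __init__(self, fetch_from_cloud: Callable[[str], bytes], capacity: int):
        self._fetch = fetch_from_cloud
        self._capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, name: str) -> bytes:
        if name in self._cache:
            self.hits += 1
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        self.misses += 1
        data = self._fetch(name)  # slow path: go to the cloud
        self._cache[name] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used file
        return data
```

With room for two files, reading a, b, a, c, b gives one hit and four misses: the repeated read of a is served locally, while b has already been evicted by the time it is read again. Real gateways size the cache so that the working set of hot files fits.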

Azure has now come out with a FUSE adapter for Blob storage (Blob storage is Azure’s object store). Once you install this FUSE adapter on a Linux system, you can mount a Blob container on it and access its contents as if they were files in your filesystem, without using the REST APIs. The advantage of this over Storage Gateways is that Storage Gateways are generally virtual appliances. For example, with AWS Storage Gateway you need VMware on premises, because AWS Storage Gateway is a virtual appliance that runs on VMware ESXi. With FUSE, you don’t need any additional device: once the driver is installed, you can start accessing the object storage as normal files.

Of course, the Azure FUSE adapter is at an early stage and has limitations; not all filesystem calls have been implemented. So you need to be careful when using it.

I am excited to announce that we will be launching CloudSiksha Academy on 30th September 2017.

We have been listening to the needs of a wide range of people: sysadmins, network admins, developers, engineering managers, senior executives and so on. Based on their feedback, we saw a need for an Academy focused on role-based courses in technical areas. Hence this initiative, CloudSiksha Academy.

Our idea is to help you perform better in your role or, if you so desire, shift to a new role. We understand that requirements differ from person to person. Some want to upgrade their skills to perform their jobs better and move up the ladder within their organization. Some find they are stuck in obsolete technology and want to move to where exciting things are happening. Managers want to update themselves on new technologies that will have an impact on their jobs; they want a holistic view rather than hands-on training. Senior executives will be looking at how newer technologies challenge them in terms of cost, people management, process change and so on. And of course, there are people looking for jobs, be they college freshers or experienced folks. It is important that we address the needs of each of these constituents in its own way. Hence you will find modules in CloudSiksha Academy tailored to various roles and not just to certification.

What we also realized while talking to people is that each person learns at their own pace and is more comfortable with a certain methodology. For example, some engineers who are already doing their jobs well don’t need in-person instruction; they are comfortable learning from videos. Others are less comfortable with that and would love to interact with an instructor and ‘attend’ a course. Still others want to watch the videos and then talk to an expert to clarify their doubts. Keeping this in mind, for each role we will have video-based at-your-own-pace learning, blended learning and hand-holding online classes. This lets you choose a course based on your role and also pick the learning methodology you are most comfortable with.

What courses will CloudSiksha Academy offer? What roles are we envisaging? What will be the duration of each course? What will the fees be?

Wait till 30th September 2017 to get all your questions answered. I can promise you some excellent deals when CloudSiksha Academy is inaugurated. We look forward to your kind support in making this venture a success.

In this post, I will talk about another question that I am often asked: “Which Cloud is better?” As with many things in life, there is no single, simple answer to this question.

When you are looking to use a public cloud, you need to consider various aspects of it. Some of them include:

What services does the Cloud provide?

What will be the performance of my VMs?

What is the cost that I will incur?

How easy is it to migrate to this Cloud?

Will I be locked in with this vendor?

These are the bare minimum questions that arise when you choose a Cloud provider. On each of them, you will find it very difficult to do an apples-to-apples comparison between Cloud providers.

Let us take performance, for example. Assume you take the same configuration (say a 2 vCPU system with 4 GB RAM and a 500 GB hard disk) from two vendors: will the performance of your VM be the same in both places? We cannot answer this with any assurance, because the performance of a VM depends on how over-provisioned the bare metal is and on noisy neighbors. Noisy neighbors are the other VMs running on the same bare metal as your VM; if any of them starts consuming more resources, it can affect the performance of your VM. We do not know how the Cloud provider places VMs on bare metal, so you cannot speak with confidence about the performance of a VM. As you would have guessed, your performance will vary depending on your neighbors and when they consume resources.

One way of getting more consistent performance is to use dedicated instances. This means only your VMs run on the bare metal and no other customer’s VM is placed on it. AWS has Dedicated Instances and SoftLayer has Virtual Private instances. As you would expect, these options give you more predictable performance, but at a higher cost.

This brings us to cost comparison. The standard question I hear is ‘Which Cloud is cheaper?’ Once again, this is not an easy question to answer; it depends on your workload and the services you use. Let us take a very simple case in AWS. If you are using, say, a t2.micro instance with a 100 GB disk as a web server, you cannot immediately calculate your monthly outlay. You need an idea of the network traffic going out of your instance: AWS doesn’t charge for incoming traffic, but outgoing traffic is charged, so the cost you incur depends on the traffic. Or take the case of S3. It is not just about the cost of storing objects; the number of GET, PUT, POST and other requests is also charged. Hence computing cost is not an exact science, and you need some usage data before you can compute it with confidence.

Most people compare instance costs and decide which cloud is cheaper. We need to understand that the instance/VM is just one part of a larger equation: there are storage costs, I/O costs, networking costs, support costs and so on. Comparing only VM costs does not give you the big picture at all.
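
To make this concrete, here is a rough Python sketch of a monthly estimate for a web server plus S3 workload like the t2.micro example. Every price in it is a made-up placeholder, not a real AWS or competitor rate; the point is the structure. Hypothetical provider A has the cheaper instance, yet once outbound traffic and requests are included, hypothetical provider B comes out cheaper overall.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Unit prices in USD. All values used below are placeholders, not real rates."""
    instance_per_hour: float
    disk_per_gb_month: float
    egress_per_gb: float              # outbound data; inbound assumed free
    object_storage_per_gb_month: float
    per_1000_requests: float          # GET/PUT/POST lumped together for simplicity


@dataclass
class Usage:
    instance_hours: float
    disk_gb: float
    egress_gb: float
    object_storage_gb: float
    requests: int


def monthly_cost(p: Prices, u: Usage) -> float:
    """Add up every cost component, not just the instance."""
    compute = p.instance_per_hour * u.instance_hours
    disk = p.disk_per_gb_month * u.disk_gb
    network = p.egress_per_gb * u.egress_gb
    objects = p.object_storage_per_gb_month * u.object_storage_gb
    requests = p.per_1000_requests * u.requests / 1000
    return round(compute + disk + network + objects + requests, 2)


# Hypothetical providers and usage, chosen only to illustrate the point.
PROVIDER_A = Prices(0.010, 0.10, 0.12, 0.023, 0.005)
PROVIDER_B = Prices(0.012, 0.10, 0.08, 0.023, 0.004)
USAGE = Usage(instance_hours=730, disk_gb=100, egress_gb=500,
              object_storage_gb=200, requests=2_000_000)

if __name__ == "__main__":
    for name, prices in (("A", PROVIDER_A), ("B", PROVIDER_B)):
        print(name, monthly_cost(prices, USAGE))
```

With these placeholder numbers, A costs about 91.90 a month and B about 71.36, even though A’s instance is cheaper per hour. The usage figures are exactly the data you need to collect before any such comparison means anything.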

Migration is a topic which requires a post of its own. I will write about it in the near future.

July 20, 2017 by admin · Will I lose my job to the Cloud? : Concern of the mid level managers

In my last blog post, I had spoken about the concern of Administrators about losing their job to the cloud. In this post I want to examine the concerns of middle level managers with respect to their job security in the era of Cloud.

I have had many conversations with mid and senior level managers who have 10 to 20 years of experience in the industry and now feel insecure about their jobs because projects are slowly moving to the Cloud. Their concern is twofold. First, the IT industry has been harsh on mid level managers, laying off a lot of them. Second, they fear that their skills, or lack of them, will not fetch them another job at the same level.

Many of them want to learn about the Cloud to keep themselves relevant in the industry, but face the question: what should I learn? The biggest challenge for mid level managers is not that they cannot learn new technologies, but deciding what the next step should be after learning them. The dilemma arises because managers have a lot of experience, the industry will hire them for that experience, and that experience is not in the new technology (the Cloud, in our case).

Many ask me if they should take up a course and get certified as an AWS Solution Architect – Associate. This can take you only so far. It will demonstrate that you are willing to learn new technologies, that you are willing to adapt to new situations and that you are aware of how the environment is changing. Along with it, you need to figure out how you can work on the Cloud and how you can bring a perspective that a person with 3 to 4 years of experience cannot bring to the table. It is very important that you think about this carefully, because companies will not hire you for your certification; they can hire someone with 3 to 4 years of experience for that. What they will hire you for is your knowledge of development processes, your knowledge of migrations and your ability to understand the Cloud in a wider enterprise context.

So what should managers do in this situation? First, choose one of the public Cloud providers and try to understand how the Cloud works. Get a certification if you can. Second, understand the challenges of migrating workloads to the Cloud (there is a lot of literature out there). How would you meet these challenges as a manager? Third, understand why moving to the Cloud would benefit your organization and what the limitations could be. Finally, try to write articles (on your own blog, on LinkedIn and so on) to display your passion for the Cloud. This will also let the world know that you are interested in the Cloud and have expertise in it. The best way, of course, is to lead a Cloud-based project within your company (either a full-fledged project or at least a proof of concept). Nothing gives you more leverage than working on a project.

Times are tough for mid level managers in many organizations but you can definitely tide over them if you consistently work hard in learning and disseminating your knowledge.

One of the questions that I get asked constantly nowadays is, “Will I lose my job to the Cloud?”. The people asking me range from system administrators to senior managers. They could be involved in Infrastructure projects or Development projects but their concern about cloud taking away their job is real.

I had written earlier that the Cloud demands a broader skill set from administrators. Earlier, you were a server admin, AD admin, storage admin, network admin and so on. Some of these tasks are so simplified on the Cloud that if you specialize in only one of them, you may not be the right fit for the Cloud. Let us take the case of storage. We have excellent admins who specialize in administering complex storage products from Dell-EMC, NetApp, Hitachi and so on. Cloud storage takes away most of that complexity. In AWS, you have EBS for block storage, EFS for file storage and S3 for object storage. All three are set up for you, and there is not much for a storage administrator to do. Similarly, networking in the Cloud is much less complex than networking in your data center: setting up a VPC is far simpler than setting up routers and switches (sometimes from different vendors). And starting an EC2 instance is a very easy job that doesn’t really require a server administrator.

In other words, the Cloud values technical knowledge over product knowledge, and it also values breadth of knowledge. Of course, some roles, such as Microsoft AD administrator or DBA, may not be impacted much, unless the company uses a PaaS offering, in which case these roles will also be affected.

So what should you do if you are an administrator? How scared should you be of losing your job? To be honest, I cannot tell you with a hundred percent certainty what the future holds, but here are a few steps you can take:

Expand your knowledge base. If you are a storage admin, start learning about networking, and vice versa.

Understand the roadmap for the product you support. Say you support a NetApp product: you need to understand the company’s roadmap for that particular product. This will tell you whether you are supporting a soon-to-be-obsolete product or an evergreen one.

Find out the roadmap of your company and whether it has a Cloud strategy. In many cases, once people land a job, they rarely ever try to find out the roadmap of their own company. You must get rid of this lethargy and find out if and when your company will move to the cloud.

Also try to understand how the external market is evolving. Is everyone going to the Cloud? Are the sales of Dell-EMC, NetApp, Hitachi etc. going up or down? Your job depends on how the market is growing and in which direction.

If you are serious about moving to the Cloud, check whether there are any Cloud projects within your company. To show your seriousness, try to get certified by one of the major Cloud vendors, based on what your company requires. Certification costs money, but it may be worthwhile if you are serious about moving to the Cloud.

As I see it, there is no need to panic: though cloud migration is happening, it is not happening at a pace at which major companies are dismantling their data centers. That will not happen soon, and may never happen. Yet the demands of the future will be different (wider knowledge of diverse topics, a good grip on the fundamentals and so on), and you must be prepared for them.

I also get questions from mid level managers on the impact of cloud on their jobs. I will write a separate post on that soon.

May 7, 2017 by admin · CloudSploit and Security in the Cloud : An Interview

Security in the cloud is beyond doubt the most important criterion for enterprises migrating to the cloud. Security in the cloud is a shared responsibility: while Cloud providers like Amazon have certain responsibilities for securing the infrastructure, users need to be vigilant and secure their data.

There are companies that help users ensure that their cloud environment is secure. One such company is CloudSploit. The founder of CloudSploit, Matthew Fuller, was kind enough to answer my questions about cloud security over email.

Matthew Fuller, Inventor and Co-Founder of CloudSploit

Matt is a DevOps Security Engineer with a wide array of security experience, ranging from web application pentesting to securing complex networks in the cloud. He began his security career, and love for open source, while working as a Web Application Security Engineer for Mozilla. He enjoys sharing his passion for technology with others and is an author of the best-selling eBook on AWS’s new service – Lambda. He lives in Brooklyn, NY, where he enjoys the fast-paced, and growing, tech scene and abundant food options.

Here is our conversation:

CloudSiksha: In your experience, what are the major security concerns of enterprises wanting to migrate to Cloud?

Matt: The biggest concern Enterprises should have with moving to the cloud is simply not understanding or having the in-house expertise to manage the available configuration options. Cloud providers like AWS do a tremendous job of securing their infrastructure and providing their users with the tools to secure their environments. However, without the proper knowledge and configuration of those tools, the settings can be mis-applied, or disabled entirely. Oftentimes, the experience that the various engineering teams may have with traditional infrastructure does not translate to the cloud equivalent, resulting in mismanaged environments. Multiply this across the hundreds of accounts and engineers a large organization may have, and the security risk becomes very concerning.

CloudSiksha: You are a security company that helps people who migrate to AWS stay secure. What do you bring over and above what Amazon provides to users?

Matt: AWS does an excellent job of allowing users to tune their environments. However, while they provide comprehensive security options for every product they offer, they do not enforce best practice usage of those options. CloudSploit helps teams quickly detect which options have not been configured properly, and provides meaningful steps to resolve the potential security risk. We do not compete with any of AWS’s tools; instead, we help ensure that AWS users are using them correctly with the most secure settings.

CloudSiksha: AWS itself has services like Inspector, CloudTrail and so on. Can users not use these services for their needs? How does CloudSploit differ from them? Or do you supplement or complement these services?

Matt: AWS currently provides several security-related services including CloudTrail, Config, Inspector, and Trusted Advisor. The CloudTrail service is essentially an audit log of every API call made within the AWS account, along with metadata of those calls. From a security perspective, CloudTrail is a must-have, especially in accounts with multiple users. If there is ever a security incident, CloudTrail provides a historical log that can be analyzed to determine exactly what led to the intrusion, what actions the malicious user took, and what resources were affected.

AWS Config is slightly different in that it records historical states of every enabled resource within the account, allowing AWS users to see how a specific piece of the infrastructure changed over time and how future updates or changes might affect that piece.

Finally, Inspector is an agent that runs on EC2 instances, tracking potential compliance violations and security risks at the server level. These are aggregated to show whether a project as a whole is compliant or not.

While these services certainly aid in auditing the infrastructure, they only scratch the surface of potential risks. Like many of AWS’s services, they cover the basics, while leaving a large opening for third party providers. CloudSploit is one such service that aims to make security and compliance incredibly simple with as little configuration as possible. It uses the AWS APIs (so it is agentless, unlike Inspector) to check the configuration of the account and its resources for potential security risks. CloudSploit is most similar to AWS Config, but provides many advantages over it. For example, it does not require any manual configuration, continually updates with new rule sets, does not charge on a per-resource-managed basis, and covers every AWS region.

CloudSploit is designed to operate alongside these AWS services as part of a complete security toolset, and helps ensure that when you do enable services like CloudTrail, that you do so in a secure fashion (by enabling log encryption and file validation, for example).
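
As an illustration of the kind of check Matt describes (and not CloudSploit’s actual code), here is a minimal Python sketch that inspects a response shaped like boto3’s CloudTrail describe_trails output and flags trails without log file validation or KMS encryption. Passing in the response rather than calling AWS keeps the logic testable; trail names in any example are hypothetical.

```python
from typing import Any, Dict, List


def cloudtrail_findings(describe_trails_response: Dict[str, Any]) -> List[str]:
    """Flag trails without log file validation or KMS encryption.

    Expects a dict shaped like boto3's cloudtrail.describe_trails() response.
    """
    findings: List[str] = []
    trails = describe_trails_response.get("trailList", [])
    if not trails:
        findings.append("No CloudTrail trail is configured")
    for trail in trails:
        name = trail.get("Name", "?")
        if not trail.get("LogFileValidationEnabled", False):
            findings.append(f"{name}: log file validation is disabled")
        # KmsKeyId is only present when the trail's logs use SSE-KMS encryption.
        if not trail.get("KmsKeyId"):
            findings.append(f"{name}: logs are not encrypted with KMS")
    return findings
```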

Matt: CloudSploit has two main components. First, it connects to your account via a cross-account IAM role and queries the AWS APIs to obtain metadata about the configuration of resources in your account. It uses that data to detect potential security risks based on best practices, industry standards, and in-house and community-provided standards. For example, CloudSploit can tell you if your account lacks a secure password policy, if your RDS databases are not encrypted, or your ELBs are using insecure cipher suites (plus over 80 other checks). These results are compiled into scan reports at predefined intervals and sent to your email or any of our third-party integrations.
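
Two of the example checks Matt mentions can be sketched directly against API metadata. This is a minimal illustration, not CloudSploit’s implementation: it takes the "PasswordPolicy" dict from boto3’s IAM get_account_password_policy call (or None when the account has no policy, where the real call raises a NoSuchEntity error) and the "DBInstances" list from RDS describe_db_instances. The 14-character minimum is an assumed threshold, borrowed from the CIS AWS Foundations Benchmark.

```python
from typing import Any, Dict, List, Optional

# Assumed threshold for illustration (the CIS AWS Foundations Benchmark uses 14).
MIN_PASSWORD_LENGTH = 14
REQUIRED_FLAGS = ("RequireSymbols", "RequireNumbers",
                  "RequireUppercaseCharacters", "RequireLowercaseCharacters")


def password_policy_findings(policy: Optional[Dict[str, Any]]) -> List[str]:
    """`policy` is the "PasswordPolicy" dict from iam.get_account_password_policy(),
    or None if the account has no password policy at all."""
    if policy is None:
        return ["Account has no IAM password policy"]
    findings: List[str] = []
    if policy.get("MinimumPasswordLength", 0) < MIN_PASSWORD_LENGTH:
        findings.append(f"Minimum password length is below {MIN_PASSWORD_LENGTH}")
    for flag in REQUIRED_FLAGS:
        if not policy.get(flag, False):
            findings.append(f"Password policy does not set {flag}")
    return findings


def rds_findings(db_instances: List[Dict[str, Any]]) -> List[str]:
    """`db_instances` is the "DBInstances" list from rds.describe_db_instances()."""
    return [f"{db['DBInstanceIdentifier']}: storage is not encrypted"
            for db in db_instances if not db.get("StorageEncrypted", False)]
```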

The second component of CloudSploit is called Events. Events is a relatively new service that we introduced to continually monitor all administrative API calls made in your AWS account for potentially malicious activity. Within 5 seconds of an event occurring, CloudSploit can make a security threat prediction and trigger an alert. The Events service is monitoring for unknown IP addresses accessing your account, activity in unused regions, high-risk API calls, modifications to security settings and over 100 other data points.
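
Here is a minimal sketch of the kind of rule evaluation Matt describes, applied to a single CloudTrail record. The field names eventName, awsRegion and sourceIPAddress are standard CloudTrail record fields; the list of high-risk calls, the regions in use and the known networks are illustrative assumptions, and this is not how CloudSploit actually implements Events.

```python
import ipaddress
from typing import Any, Dict, Iterable, List

# Illustrative rule inputs; a real service would tune these per account.
HIGH_RISK_CALLS = {"StopLogging", "DeleteTrail", "DeleteFlowLogs",
                   "AuthorizeSecurityGroupIngress", "CreateAccessKey",
                   "PutBucketPolicy"}


def event_alerts(event: Dict[str, Any], used_regions: Iterable[str],
                 known_networks: Iterable[str]) -> List[str]:
    """Return alert messages for one CloudTrail record (as parsed from JSON)."""
    alerts: List[str] = []
    name = event.get("eventName", "")
    if name in HIGH_RISK_CALLS:
        alerts.append(f"High-risk API call: {name}")
    region = event.get("awsRegion")
    if region not in set(used_regions):
        alerts.append(f"Activity in unused region {region}")
    source = event.get("sourceIPAddress", "")
    try:
        ip = ipaddress.ip_address(source)
    except ValueError:
        # Calls made by AWS services show a hostname, e.g. "ec2.amazonaws.com".
        return alerts
    if not any(ip in ipaddress.ip_network(net) for net in known_networks):
        alerts.append(f"Call from unknown IP address {source}")
    return alerts
```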

All of this information is delivered to your account to help you take action and improve the security of your AWS environment.

CloudSiksha: What are the dangers of providing you with a user account in AWS?

Matt: There is very little danger. CloudSploit uses a secure, third-party, cross-account IAM role to obtain temporary, read-only access to your AWS account. Even if this role information were compromised, an attacker would still not be able to gain access without also compromising CloudSploit’s AWS account resources. The information we obtain and store is also very limited in nature – metadata about the resources but never the contents of those resources.
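
For the curious, this is roughly what such a cross-account role looks like on the customer side. The sketch below, a minimal example rather than CloudSploit’s actual setup, builds an IAM trust policy that lets a vendor’s AWS account assume a role in yours. The account ID is a placeholder, and the sts:ExternalId condition is a common safeguard for third-party roles that is assumed here, not something Matt mentions. The read-only part comes from the permissions policy attached to the role (for example the AWS managed SecurityAudit policy), not from the trust policy.

```python
import json
from typing import Any, Dict


def third_party_trust_policy(vendor_account_id: str, external_id: str) -> Dict[str, Any]:
    """Trust policy allowing a vendor's AWS account to assume a role in yours.

    The sts:ExternalId condition is an assumed safeguard for this sketch.
    Read-only access must come from the permissions attached to the role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{vendor_account_id}:root"},
            "Action": "sts:AssumeRole",
            # The vendor must present this shared value when assuming the role.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID, not CloudSploit's.
    print(json.dumps(third_party_trust_policy("111122223333", "example-external-id"),
                     indent=2))
```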

CloudSiksha: Can you tell me something about how companies have used your software and what value they are seeing?

Matt: Companies using our product have integrated it in a number of unique ways. For example, using our APIs, a number of our users have built integrations into their Jenkins-based pipelines, allowing them to scan for security risks when making changes to their accounts, shortening the feedback loop between changes being made and security issues being detected. Other companies have made CloudSploit the central dashboard for all of their engineering teams across every business unit to ensure that security practices are being implemented across the entire company.

Individual developers and pre-revenue projects tend to use our Free option, and are happy with the value it provides. 20% of these users move on to a paid plan in order to have the scans and remediation advice occur automatically.

Medium-sized teams prefer the Plus account in order to connect CloudSploit with third-party plug-ins such as email, SNS, Slack, and OpsGenie.

Advanced users, those who like to automate everything in their CI/CD workflow, as well as larger enterprises prefer the Premium plan for its access to APIs and all of our various features and maximum retention limits.

CloudSiksha: I see you have multiple options with varying payments. Have any of your clients shifted from one tier to another? What was the reason for them upgrading to a higher tier?

Matt: Expect to see a stronger focus on compliance. Besides the 80+ plugins and tests that we currently have, we are working to expand our footprint for more compliance-based best practices. In addition, we are launching a new strategy to get information sooner and react to it faster than any competing AWS security and compliance monitoring tool. Amazon released CloudWatch Events in January and a month later we had already taken advantage of those features. We plan to continue to enhance this Events integration, delivering ever more useful results to our users.

Passing the AWS Solution Architect Professional certification exam
March 27, 2017 by admin

If someone were to ask me how they should prepare for the AWS Solution Architect Professional exam, I would advise them not to prepare the way I did. I went into the exam quite underprepared, and in the initial stages I had to spend considerable time on each question before I got the hang of them. As the test progressed, I was able to speed up my responses.

I had set myself a target of the end of March to complete this certification. My earlier Associate certification was expiring at the end of March, and instead of getting re-certified I thought I would attempt this one. Unfortunately I got involved in getting my online courses ready (you should see them in a couple of months' time) and didn't have much time to prepare. Most of my preparation happened in the last week, and I don't think that is enough.

My friend Kalyan, a certified professional himself, had sent me links to videos worth watching and to important white papers. These were helpful, though I did not watch all the videos or read all the white papers. What I did was read the developer documentation for most of the services and then rely on my logical reasoning to deduce the answers. This approach will backfire if you do not have a good grip on the AWS services.

A few points I gathered from the exam:

1. Quite a few questions involve Big Data services: Kinesis, Redshift, ElastiCache and EMR. Understand these services well; you must know when to use which service.

2. I got a few questions on SWF and Data Pipeline. Again, you need to understand which one fits which situation.

3. There were a lot of questions on hybrid cloud, so be very thorough with Direct Connect, VPN and Route 53.

4. Understand when you must use RDS and when you must use DynamoDB. Quite a few questions list both these services among the answer options.

5. Understand the difference between Layer 4 and Layer 7 in networking.

6. If you know your theory well, you can easily eliminate some of the options. This is the approach I used for most of the questions. To paraphrase Sherlock Holmes, "Remove all the impossible answers. Whatever remains, however improbable, must be true."

The major problem with this exam is that you may not have used many of the services. Many of us will never get a chance to use Direct Connect, VPN, Redshift, ElastiCache and so on, so we must rely on theory and an understanding of these services to answer the questions. It is therefore imperative that you read the documentation in detail and watch the 300 and 400 series videos to understand the theory thoroughly. A good understanding of the theory, coupled with good analytical reasoning skills, will get us across the line.

Recently I read about two outages, the AWS S3 outage being the bigger one. The other was at GitLab.com. In both cases the root cause boiled down to human error. Even with tons of automation around, we still depend on system operators to perform certain tasks, and this is where human error creeps in. Also remember that not every automation tool is foolproof. You never know which corner case it was not designed for, and that could also cause problems. For now, let us concentrate on human error.

I am sure every system administrator has his or her own horror story to relate about human error. I have heard too many. I will tell you a few of them here.

At the company I worked for in the late 80s, getting the root password was not difficult; lots of people had the root password for the systems. Once a sysadmin went to another department's lab because he wanted to copy some files from there. He had root access on the system. After copying the files, he saw some unnecessary files on the system and ran rm -rf *.* Unfortunately, he was not in the directory where those unwanted files existed but in a directory at a higher level. Before he could realize his mistake, the system went down. It was later said that whenever people from that department saw him coming their way, they would shut down all their systems until he left.

That was a minor one, as it impacted only one system. The major one I heard of was in the private cloud segment, at a provider hosting database as a service. One of the DB administrators had to manually connect a database to a client system. Unfortunately, he connected another client's database instead of the correct one, so the first client was able to see the database of another company!! All hell broke loose, and the client had to be pacified by people at the very top.

If you look at the GitLab.com case, you will see another standard horror story: people take backups but never test whether the backups are good. A friend of mine told me of a case where some major design drawings were being backed up regularly. One day their servers crashed and could not be recovered. So they tried to restore from the backups, only to find that although the backup jobs ran daily, they had been failing and the sysadmin had not noticed. There was nothing on the tapes. To add to their horror, the sysadmin had quit only a few weeks earlier. So almost 6 months of effort had to be redone!!
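Testing a backup can start with something as simple as confirming that every file actually reached the backup and that its contents match. Below is a minimal Python sketch, assuming a file-level copy of a directory tree (tape or database backups would need a real restore test instead); `verify_backup` and `sha256_of` are hypothetical helpers that compare SHA-256 digests. Run after every backup job, with an alert on any non-empty result, a check like this could flag silent failures of the kind described above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Compare every file under `source` with its copy under `backup`.

    Returns a list of problems; an empty list means every file was present
    in the backup with identical contents.
    """
    problems = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing in backup: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"content differs: {rel}")
    return problems
```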

The more complex the system, the greater the impact of any such error. Complexity also brings its own layers of error checking and consistency checks, as in the case of AWS, which means recovering from an error is rarely quick or easy.

The job of a System Administrator will become more and more stressful as systems grow more complex. Some of the best SysAdmins are chosen for such jobs, and yet there could always be an instance where, due to tiredness, a temporary lack of focus, oversight or sheer bad luck, an error is made. Unfortunately, in this cloud era, if you are a service provider, the repercussions are bound to be heavy. System Administrators must be more vigilant than ever, and organizations need to put in plenty of checks and balances and, of course, automate wherever they can.
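One concrete form of such a check is a guard that validates a destructive command before it runs. The Python sketch below is purely illustrative: `plan_removal` and `UnsafeOperation` are hypothetical names, and it assumes capacity is tracked as a simple list of server names. It rejects hosts that are not in the active pool (a common symptom of a typo) and any request that would leave fewer servers than a configured minimum.

```python
class UnsafeOperation(Exception):
    """Raised when a requested change would breach a safety limit."""


def plan_removal(active: list[str], to_remove: list[str], min_remaining: int) -> list[str]:
    """Validate a server-removal request before anything is executed.

    Returns the servers that would remain; raises UnsafeOperation instead
    of proceeding when the request looks wrong.
    """
    removing = set(to_remove)

    # Names that don't match the pool often mean a typo in the command.
    unknown = sorted(removing - set(active))
    if unknown:
        raise UnsafeOperation(f"not in active pool: {', '.join(unknown)}")

    # Never drop below the minimum capacity the service needs.
    remaining = [s for s in active if s not in removing]
    if len(remaining) < min_remaining:
        raise UnsafeOperation(
            f"would leave {len(remaining)} servers, minimum is {min_remaining}"
        )
    return remaining
```

Calling `plan_removal(["web1", "web2", "web3", "web4"], ["web4"], 2)` returns the three remaining servers, while asking to remove three of them with the same minimum raises `UnsafeOperation` before anything is touched.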