Amazon Web Services EC2 is a great IaaS platform, and many organizations use it to host SQL Server instances. Being an IaaS platform, it opens up easy migration scenarios from on-premises deployments, and for many people it is a great first step into the cloud. However, hopefully sooner rather than later, many companies realize that they are ready to take the next step and move their workloads and databases to a PaaS cloud service. This is where Microsoft Azure SQL Database shines.

Microsoft Azure SQL Database (formerly SQL Azure, Windows Azure SQL Database) is a cloud-based database service from Microsoft offering data-storage capabilities. The aim is for users to just communicate with a T-SQL endpoint rather than managing database storage, files, and high availability. Azure SQL Database obviously has a number of limitations, such as lack of support for cross-database queries, Service Broker, etc., meaning that not every database can be migrated to Microsoft Azure SQL DB without prerequisite work. However, for folks who are ready to migrate, Microsoft Data Migration Assistant can be a viable option for such a migration from AWS EC2 IaaS.

For my tutorial I have a SQL Server on Windows instance in EC2 running the venerable AdventureWorks2016 database. Here it is among my other servers in the EC2 console:

With an AWS Elastic IP assigned to that Windows machine, I can easily connect to the default SQL instance via my local SSMS:

Obviously, I first had to set up proper inbound rules for this machine both in AWS EC2 Security Groups and in Windows Firewall.

In the security group for the server I opened the inbound port for SQL Server traffic (port 1433). The smart way to do it is to restrict the rule to the client IP of the workstation where I run SSMS and will run the Data Migration Assistant tool.
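If you prefer to script that rule instead of clicking through the console, a minimal sketch with the AWS SDK for .NET could look like the following; the security group ID and CIDR below are placeholders for your own values:

using System.Collections.Generic;
using Amazon.EC2;
using Amazon.EC2.Model;

// Open inbound TCP 1433 to a single client IP (placeholder group ID and CIDR).
var ec2 = new AmazonEC2Client();
ec2.AuthorizeSecurityGroupIngress(new AuthorizeSecurityGroupIngressRequest
{
    GroupId = "sg-0123456789abcdef0",
    IpPermissions = new List<IpPermission>
    {
        new IpPermission
        {
            IpProtocol = "tcp",
            FromPort = 1433,
            ToPort = 1433,
            Ipv4Ranges = new List<IpRange> { new IpRange { CidrIp = "203.0.113.25/32" } }
        }
    }
});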

Now I can make sure that I can connect to my target Microsoft Azure SQL DB from my client machine via SSMS as well:

Now let's proceed with the migration. For that you will need to download and install Microsoft Data Migration Assistant from – https://www.microsoft.com/en-us/download/details.aspx?id=53595 . Once you have installed the tool, let's create a new project. I will start with an assessment project to make sure I don't have any critical incompatibilities that preclude my migration to the PaaS platform.

The tool will check for compatibility issues in my database, as well as feature parity, to see what needs to be done to migrate any SQL Server instance features to PaaS, if anything at all.

We can then provide credentials to our AWS EC2-based instance and pick a database after connecting to the source:

Now that we have added the source, let's start the assessment. DMA can take any version of SQL Server up to 2016 as an assessment source. Once the assessment is done, you will get a nice report where you can see all of the potential issues before you migrate. It should look something like this:

After that, the tool will do a quick assessment and present you with the schema objects you can migrate, highlighting\marking any objects that may have issues.

After you pick the objects you wish to migrate, the tool will generate a SQL DDL script for these objects. You can then save that script for analysis and changes, and apply it via SSMS on your own. Or you can go ahead and apply the script via the tool, as I did for this example, by pressing the Deploy Schema button.

After the schema deployment finishes, we can proceed with data migration by pressing the Migrate Data button.

Next, pick the tables to migrate and start the data migration. Migrating data for a smaller database like AdventureWorks takes a few minutes. Everything transferred other than two temporal tables, but those are very different from regular database tables. SQL Server 2016 introduced support for system-versioned temporal tables, a database feature that brings built-in support for providing information about data stored in the table at any point in time, rather than only the data that is correct at the current moment. There are other ways to migrate temporal tables to Azure SQL that I will cover in my next blog posts.

Well, we are pretty much done. Next, check that the data has migrated to the target by running a few queries in SSMS:
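For example, a quick row-count sanity check against one of the migrated tables; this is just a sketch, with the server name and credentials as placeholders:

using System;
using System.Data.SqlClient;

// Count rows in a migrated AdventureWorks table (placeholder server/credentials).
var connectionString =
    "Server=tcp:myserver.database.windows.net,1433;" +
    "Database=AdventureWorks2016;User ID=myadmin;Password=...;Encrypt=True;";

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand("SELECT COUNT(*) FROM Person.Person;", conn);
    Console.WriteLine("Person.Person rows: " + cmd.ExecuteScalar());
}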

This is a topic I have actually planned to pivot to for quite a while, but I either had no bandwidth or time with moving to Seattle, or some other excuse like that. The topic is interesting to me as it sits at the intersection of technologies where I used to spend quite a bit of time, including:

Data, including both traditional RDBMS databases and NoSQL designs

Cloud, including Microsoft Azure, AWS and GCP

Containers and container orchestration (an area that is new to me)

Those who have worked with me have displayed varying degrees of agreement, but more often annoyance, at me pushing this topic as a growing part of the data engineering discipline, but I truly believe in it. Before we start combining all these areas and doing something useful in code, I will spend some time here on the most basic concepts of what I am about to enter.

I will skip an introduction to the basic concepts of RDBMS, NoSQL or BigData\Hadoop technologies; unless you have been hiding under a rock for the last five years, you should be quite aware of those. That brings us straight to containers:

As the definition states – “A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.”

Containers are a solution to the problem of how to get software to run reliably when moved from one computing environment to another. This could be from a developer’s laptop to a test environment, from a staging environment into production, and perhaps from a physical machine in a data center to a virtual machine in a private or public cloud.

But why containers if I already have VMs?

VMs take up a lot of system resources. Each VM runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles. In contrast, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program.
What this means in practice is that you can put two to three times as many applications on a single server with containers as you can with a VM. In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.

On Linux, containers build on the kernel's containment features, historically exposed through LXC, a userspace interface for the Linux kernel containment features. It includes an application programming interface (API) to enable Linux users to create and manage system or application containers.

Docker is an open platform that makes it easier to create, deploy, and run applications using containers.

The heart of Docker is the Docker Engine, the part of Docker that creates and runs Docker containers. A Docker container is a live, running instance of a Docker image. Docker Engine is a client-server application with the following components:

A server which is a continuously running service called a daemon process.

A REST API, which specifies the interface programs can use to talk to the daemon and instruct it what to do.

A command line interface client.

The command line interface client uses the Docker REST API to interact with the Docker daemon through CLI commands. Many other Docker applications also use the API and CLI. The daemon process creates and manages Docker images, containers, networks, and volumes.

The Docker client is the primary way Docker users communicate with Docker. When we use a command such as docker run, the client sends it to dockerd, which carries it out.
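Because the daemon is just a REST endpoint, the CLI is not the only possible client. As a hedged illustration, here is a sketch using the community Docker.DotNet NuGet package (an assumption on my part, not part of Docker itself) to list containers the same way docker ps -a does:

using System;
using Docker.DotNet;
using Docker.DotNet.Models;

class ListContainers
{
    static void Main()
    {
        // Connect to the local daemon (named pipe on Windows;
        // use unix:///var/run/docker.sock on Linux).
        var client = new DockerClientConfiguration(
            new Uri("npipe://./pipe/docker_engine")).CreateClient();

        // The same information "docker ps -a" shows, fetched via the REST API.
        var containers = client.Containers.ListContainersAsync(
            new ContainersListParameters { All = true }).Result;

        foreach (var c in containers)
            Console.WriteLine(c.ID.Substring(0, 12) + "  " + c.Image + "  " + c.State);
    }
}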

You build Docker images using Docker and publish them to what are known as Docker registries. When we run the docker pull and docker run commands, the required images are pulled from our configured registry. Using the docker push command, an image can be uploaded to our configured registry.

Finally, by deploying instances of images from your registry, you deploy containers. We can create, run, stop, or delete a container using the Docker CLI. We can connect a container to more than one network, or even create a new image based on its current state. By default, a container is well isolated from other containers and from its host machine. A container is defined by its image and the configuration options that we provide when we create or run it.

That brings us to container orchestration engines. While the CLI meets the needs of managing one container on one host, it falls short when it comes to managing multiple containers deployed on multiple hosts. To go beyond the management of individual containers, we must turn to orchestration tools. Orchestration tools extend lifecycle management capabilities to complex, multi-container workloads deployed on a cluster of machines.

Some of the well known orchestration engines include:

Kubernetes. Almost a standard nowadays, originally developed at Google. Kubernetes’ architecture is based on a master server with multiple minions (worker nodes). The command line tool, called kubectl, connects to the API endpoint of the master to manage and orchestrate the minions. The service definition, along with the rules and constraints, is described in a JSON or YAML file. For service discovery, Kubernetes provides a stable IP address and DNS name that corresponds to a dynamic set of pods. When a container running in a Kubernetes pod connects to this address, the connection is forwarded by a local agent (called the kube-proxy) running on the source machine to one of the corresponding backend containers. Kubernetes supports user-implemented application health checks. These checks are performed by the kubelet running on each minion to ensure that the application is operating correctly.

Apache Mesos. This is an open source cluster manager that simplifies the complexity of running tasks on a shared pool of servers. A typical Mesos cluster consists of one or more servers running the mesos-master and a cluster of servers running the mesos-slave component. Each slave is registered with the master to offer resources. The master interacts with deployed frameworks to delegate tasks to slaves. Unlike other tools, Mesos ensures high availability of the master nodes using Apache ZooKeeper, which replicates the masters to form a quorum. A high availability deployment requires at least three master nodes. All nodes in the system, including masters and slaves, communicate with ZooKeeper to determine which master is the current leading master. The leader performs health checks on all the slaves and proactively deactivates any that fail.

Over the next couple of posts I will create containers running SQL Server and other data stores, both RDBMS and NoSQL, deploy these on the cloud, and finally attempt to orchestrate these as well. So let's move off theory and pictures into the world of practical data engine deployments.

In my previous post I introduced you to Memcached, an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering. This continued my interest in in-memory NoSQL cache systems like AppFabric Cache and Redis.

Both Redis and Memcached are offered by AWS as a cloud PaaS service called ElastiCache. With ElastiCache, you can quickly deploy your cache environment without having to provision hardware or install software. You can choose from Memcached or Redis protocol-compliant cache engine software, and let ElastiCache perform software upgrades and patch management for you automatically. For enhanced security, ElastiCache runs in the Amazon Virtual Private Cloud (Amazon VPC) environment, giving you complete control over network access to your cache cluster. With just a few clicks in the AWS Management Console, you can add resources to your ElastiCache environment, such as additional nodes or read replicas, to meet your business needs and application requirements.

Existing applications that use Memcached or Redis can use ElastiCache with almost no modification; your applications simply need to know the host names and port numbers of the ElastiCache nodes that you have deployed. The ElastiCache Auto Discovery feature lets your applications identify all of the nodes in a cache cluster and connect to them, rather than having to maintain a list of available host names and port numbers; in this way, your applications are effectively insulated from changes to cache node membership.

Before I show you how to set up an AWS ElastiCache cluster, let's go through the basics:

A cache node is the smallest building block of an ElastiCache deployment. Each node has its own memory, storage and processor resources, and runs a dedicated instance of cache engine software — either Memcached or Redis. ElastiCache provides a number of different cache node configurations for you to choose from, depending on your needs. You can use these cache nodes on an on-demand basis, or take advantage of reserved cache nodes at significant cost savings.

A cache cluster is a collection of one or more cache nodes, each of which runs its own instance of supported cache engine software. You can launch a new cache cluster with a single ElastiCache operation (CreateCacheCluster), specifying the number of cache nodes you want and the runtime parameters for the cache engine software on all of the nodes. Each node in a cache cluster has the same compute, storage and memory specifications, and they all run the same cache engine software (Memcached or Redis). The ElastiCache API lets you control cluster-wide attributes, such as the number of cache nodes, security settings, version upgrades, and system maintenance windows.
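A rough sketch of that CreateCacheCluster call via the AWS SDK for .NET follows; the cluster ID, node type, and node count are placeholder values:

using Amazon.ElastiCache;
using Amazon.ElastiCache.Model;

// Launch a small Memcached cluster in a single operation (placeholder values).
var elastiCache = new AmazonElastiCacheClient();
elastiCache.CreateCacheCluster(new CreateCacheClusterRequest
{
    CacheClusterId = "demo-memcached", // 1-20 alphanumeric characters, starts with a letter
    Engine = "memcached",
    CacheNodeType = "cache.t2.micro",
    NumCacheNodes = 2,
    Port = 11211
});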

Cache parameter groups are an easy way to manage runtime settings for supported cache engine software. Memcached has many parameters to control memory usage, cache eviction policies, item sizes, and more; a cache parameter group is a named collection of Memcached-specific parameters that you can apply to a cache cluster. Memcached clusters contain from 1 to 20 nodes across which you can horizontally partition your data.

To create a Memcached cluster from the AWS Management Console:

1. In Name, enter the desired cluster name. Remember, it must begin with a letter and can contain 1 to 20 alphanumeric characters; it cannot contain two consecutive hyphens nor end with a hyphen.

2. In Port, you can accept the default of 11211. If you have a reason to use a different port, type the port number.

3. For Parameter group, choose the default parameter group, choose the parameter group you want to use with this cluster, or choose Create new to create a new parameter group to use with this cluster.

4. For Number of nodes, choose the number of nodes you want for this cluster. You will partition your data across the cluster's nodes. If you need to change the number of nodes later, scaling horizontally is quite easy with Memcached.

5. Choose how you want the Availability zone(s) selected for this cluster. You have two options:

No Preference. ElastiCache selects the availability zone for each node in your cluster.

Specify availability zones. You pick the availability zone for each node in your cluster yourself.

6. For Security groups, choose the security groups you want to apply to this cluster.

7. The Maintenance window is the time, generally an hour in length, each week when ElastiCache schedules system maintenance for your cluster. You can let ElastiCache choose the day and time for your maintenance window (No preference), or you can choose the day, time, and duration yourself.

For clusters running the Memcached engine, ElastiCache supports Auto Discovery – the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes. From the application's point of view, connecting to the cluster configuration endpoint is no different from connecting directly to an individual cache node.

Process of Connecting to Cache Nodes

1. The application resolves the configuration endpoint’s DNS name. Because the configuration endpoint maintains CNAME entries for all of the cache nodes, the DNS name resolves to one of the nodes; the client can then connect to that node.

2. The client requests the configuration information for all of the other nodes. Since each node maintains configuration information for all of the nodes in the cluster, any node can pass configuration information to the client upon request.

3. The client receives the current list of cache node hostnames and IP addresses. It can then connect to all of the other nodes in the cluster.

The configuration information for Auto Discovery is stored redundantly in each cache cluster node. Client applications can query any cache node and obtain the configuration information for all of the nodes in the cluster.
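To make step 1 concrete, here is a tiny sketch (the configuration endpoint DNS name is a made-up placeholder) showing the resolution a client library performs before it fetches the node list:

using System;
using System.Net;

// Resolve the cluster configuration endpoint (placeholder DNS name).
// The CNAME rotates across cache nodes, so repeated lookups may land on
// different nodes - any node can return the full cluster configuration.
var entry = Dns.GetHostEntry("demo-memcached.abc123.cfg.use1.cache.amazonaws.com");
foreach (var address in entry.AddressList)
    Console.WriteLine(address);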

Continuing the SQL Server on Amazon Web Services RDS journey that I started in my previous post. In that post I covered the basics of RDS, creating a SQL Server instance in RDS, and connecting to it. In this post I would like to cover additional topics that are near and dear to any RDBMS DBA, including backups\restores and HADR, and talk about the limitations of SQL Server on RDS.

Limitations for SQL Server in Amazon Web Services RDS. When you are setting up SQL Server in RDS you need to be aware of the following:

You only get the basic SQL Server RDBMS Engine Services. SQL Server comes with a large number of components in addition to basic database services; these include Analysis, Reporting and Integration Services, Master Data Services, Distributed Replay, etc. As you set up SQL Server on premises, you pick the components that you need. With RDS, all you can host is the DB Engine Service. So if your application architecture involves running SQL Server instances with SSAS or SSRS, those components will have to be hosted elsewhere: this can be either an on-premises server within your network or an EC2 instance in the Amazon cloud. As an architectural best practice, it would make sense to host them in EC2.

Although you can log in from SQL Server Management Studio (SSMS), there is no Remote Desktop option available that I could find.

Very limited file access.

Size limitations. AFAIK the minimum size of a SQL Server RDS instance for Standard or Enterprise Edition is 200 GB and the max is 4 TB. Beyond 4 TB you would have to create another instance and probably implement some sort of sharding methodology.

Another limitation to be aware of is the number of SQL Server databases an RDS instance can host. It’s only 30 per instance.

There are no user-configurable high availability options: no replication, no log shipping, no AlwaysOn, and no manual configuration of database mirroring. Mirroring is enabled for all databases if you opt for a Multi-AZ rollout; the secondary replica hosts the mirrored databases.

No distributed transaction support with MSDTC

No FileStream, CDC, SQL Audit, Policy-Based Management, etc.

So if I have no AlwaysOn Availability Groups or clustering, how would HADR work?

Amazon RDS provides high availability and failover support for DB instances using Multi-AZ deployments. Multi-AZ deployments for Oracle, PostgreSQL, MySQL, and MariaDB DB instances use Amazon technology, while SQL Server DB instances use SQL Server Mirroring. In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to a standby replica to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups.

The RDS console shows the Availability Zone of the standby replica (called the secondary AZ), or you can use the command rds-describe-db-instances or the API action DescribeDBInstances to find the secondary AZ. When using the BYOL licensing model, you must have a license for both the primary instance and the standby replica.
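The same lookup is easy from the AWS SDK for .NET; in this sketch the instance identifier is a placeholder:

using System;
using Amazon.RDS;
using Amazon.RDS.Model;

// Find the primary and secondary AZ of a Multi-AZ instance (placeholder ID).
var rds = new AmazonRDSClient();
var response = rds.DescribeDBInstances(new DescribeDBInstancesRequest
{
    DBInstanceIdentifier = "my-sqlserver-instance"
});

var instance = response.DBInstances[0];
Console.WriteLine("Primary AZ:   " + instance.AvailabilityZone);
Console.WriteLine("Secondary AZ: " + instance.SecondaryAvailabilityZone);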

In the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable. Failover times are typically 60-120 seconds. However, large transactions or a lengthy recovery process can increase failover time. The failover mechanism automatically changes the DNS record of the DB instance to point to the standby DB instance.

Backups and Restores. Backup and restore has always been the easiest way of migrating SQL Server databases. Unfortunately, this option is not available for RDS, so DBAs have to fall back on the manual process of creating database schemas and importing data. You can also use the SQL Server Import and Export Wizard to move your data. Amazon has more on that procedure here – http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/SQLServer.Procedural.Importing.html

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizeable capacity for an industry-standard relational database and manages common database administration tasks.

The basic building block of Amazon RDS is the DB instance. A DB instance is an isolated database environment in the cloud. A DB instance can contain multiple user-created databases, and you can access it by using the same tools and applications that you use with a stand-alone database instance. You can create and modify a DB instance by using the Amazon RDS command line interface, the Amazon RDS API, or the AWS Management Console.

Each DB instance runs a DB engine. Amazon RDS currently supports the MySQL, PostgreSQL, Oracle, and Microsoft SQL Server DB engines. Given my background, I will illustrate running a DB instance of SQL Server in RDS here, but in the future I may venture further, touching MySQL and especially its specialized Amazon cousin known as Aurora. The computation and memory capacity of a DB instance is determined by its DB instance class. You can select the DB instance class that best meets your needs; if your needs change over time, you can change it. For each DB instance, you can select from 5 GB to 3 TB of associated storage capacity. Instance storage comes in three types: Magnetic, General Purpose (SSD), and Provisioned IOPS (SSD). They differ in performance characteristics and price, allowing you to tailor your storage performance and cost to the needs of your database.

Amazon cloud computing resources are housed in highly available data center facilities in different areas of the world (for example, North America, Europe, or Asia). Each data center location is called a region. Each region contains multiple distinct locations called Availability Zones, or AZs. Each Availability Zone is engineered to be isolated from failures in other Availability Zones, and to provide inexpensive, low-latency network connectivity to other Availability Zones in the same region. By launching instances in separate Availability Zones, you can protect your applications from the failure of a single location. You can run your DB instance in several Availability Zones, an option called a Multi-AZ deployment. When you select this option, Amazon automatically provisions and maintains a synchronous standby replica of your DB instance in a different Availability Zone. The primary DB instance is synchronously replicated across Availability Zones to the standby replica to provide data redundancy, failover support, eliminate I/O freezes, and minimize latency spikes during system backups.

Amazon RDS supports DB instances running several editions of Microsoft SQL Server 2008 R2 and SQL Server 2012. Amazon RDS currently supports Multi-AZ deployments for SQL Server using SQL Server Mirroring as a high-availability, failover solution. Amazon also supports the TDE (Transparent Data Encryption) feature in SQL Server, as well as allowing SSL connections to your SQL Server instance as necessary. In order to deliver a managed service experience, Amazon RDS does not provide shell access to DB instances, and it restricts access to certain system procedures and tables that require advanced privileges. Amazon RDS supports access to databases on a DB instance using any standard SQL client application such as Microsoft SQL Server Management Studio. Amazon RDS does not allow direct host access to a DB instance via Telnet, Secure Shell (SSH), or Windows Remote Desktop Connection. When you create a DB instance, you are assigned to the db_owner role for all databases on that instance, and you will have all database-level permissions except for those that are used for backups (Amazon RDS manages backups for you).

Obviously, before you proceed you need to sign up for an AWS account. You can do so here – https://aws.amazon.com/. Next you should create an IAM (Identity and Access Management) user.

Once you log into the AWS Console, go to IAM, as above.

In the navigation pane, choose Groups, and then choose Create New Group

For Group Name, type a name for your group, such as RDS_Admins, and then choose Next Step.

In the list of policies, select the check box next to the AdministratorAccess policy. You can use the Filter menu and the Search box to filter the list of policies.

Choose Next Step, and then choose Create Group.

Your new group is listed under Group Name after you are done. Next create a user and add user to the group. I covered creating IAM users in my previous blogs, but here it is again:

In the navigation pane, choose Users, and then choose Create New Users

In box 1, type a user name. Clear the check box next to Generate an access key for each user. Then choose Create.

In the Groups section, choose Add User to Groups

Select the check box next to your newly created admins group. Then choose Add to Groups.

Scroll down to the Security Credentials section. Under Sign-In Credentials, choose Manage Password. Select Assign a custom password. Then type a password in the Password and Confirm Password boxes. When you are finished, choose Apply.

Next, in the AWS Console, I will select RDS from the Database section.

I will pick SQL Server from available DB Instance types.

Picking SQL Server Standard for the sake of example, I am presented with a choice of Multi-Availability Zone deployment for HADR and consistent-throughput storage. For the sake of this quick post, I will do what you should not do in production and decline that option; after all, doing things properly does cost money:

After adding some more info, including Availability Zone, backup preferences, and retention, I can hit the Create Instance button.
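The console wizard boils down to a single CreateDBInstance API call. A hedged sketch with the AWS SDK for .NET, where every identifier and size is a placeholder, might look like this:

using Amazon.RDS;
using Amazon.RDS.Model;

// Provision a SQL Server Standard Edition instance (placeholder values throughout).
var rds = new AmazonRDSClient();
rds.CreateDBInstance(new CreateDBInstanceRequest
{
    DBInstanceIdentifier = "my-sqlserver-instance",
    Engine = "sqlserver-se",           // SQL Server Standard Edition
    DBInstanceClass = "db.m4.large",
    AllocatedStorage = 200,            // minimum for Standard/Enterprise Edition
    MasterUsername = "myadmin",
    MasterUserPassword = "ReplaceWithAStrongPassword1!",
    MultiAZ = false,                   // declined here, as in the walkthrough above
    BackupRetentionPeriod = 7
});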

Once Amazon RDS provisions your DB instance, you can use any standard SQL client application to connect to the instance. In order for you to connect, the DB instance must be associated with a security group containing the IP addresses and network configuration that you will use to access the DB instance. So let's connect to my new DB instance using SSMS:

On the Instances page of the AWS Management Console, select the arrow next to the DB instance to show the instance details. Note the server name and port of the DB instance, which are displayed in the Endpoint field at the top of the panel, and the master user name, which is displayed in the Username field in the Configuration Details section
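Outside of SSMS, that endpoint and port drop straight into a connection string. A minimal sketch, with the endpoint and credentials as placeholders (note the host,port syntax SqlClient uses):

using System;
using System.Data.SqlClient;

// Connect to the RDS endpoint (placeholder endpoint, user, and password).
var connectionString =
    "Server=my-sqlserver-instance.abc123xyz.us-east-1.rds.amazonaws.com,1433;" +
    "User ID=myadmin;Password=...;";

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    Console.WriteLine("Connected. Server version: " + conn.ServerVersion);
}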

Amazon DynamoDB is AWS's primary NoSQL data storage offering. Announced on January 18, 2012, it is a fully managed NoSQL database service that provides fast and predictable performance along with excellent scalability. DynamoDB differs from other Amazon services by allowing developers to purchase a service based on throughput, rather than storage. Although the database will not scale automatically, administrators can request more throughput, and DynamoDB will spread the data and traffic over a number of servers using solid-state drives, allowing predictable performance. It offers integration with Hadoop via Elastic MapReduce.

The above diagram shows how Amazon offers its various cloud services and where exactly DynamoDB is placed. AWS RDS is a relational database as a service over the Internet from Amazon, while SimpleDB and DynamoDB are NoSQL databases as a service. Both SimpleDB and DynamoDB are fully managed, non-relational services. DynamoDB is built for fast, seamless scalability and high performance. It runs on SSDs to provide faster responses and has no limits on request capacity or storage. It automatically partitions your data throughout the cluster to meet expectations, while SimpleDB has a storage limit of 10 GB and can only handle a limited number of requests per second; also, in SimpleDB we have to manage our own partitions. So, depending upon your needs, you have to choose the correct solution.

As I already went through the basics of getting started with the AWS .NET SDK in my previous post, I will not repeat them here. Instead, there are a few basics to consider:

The AWS SDK for .NET provides three programming models for communicating with DynamoDB: the low-level model, the document model, and the object persistence model

The low-level programming model wraps direct calls to the DynamoDB service. You access this model through the Amazon.DynamoDBv2 namespace. Of the three models, the low-level model requires you to write the most code.

The document programming model provides an easier way to work with data in DynamoDB. This model is specifically intended for accessing tables and items in tables. You access this model through the Amazon.DynamoDBv2.DocumentModel namespace. Compared to the low-level programming model, the document model is easier to code against DynamoDB data. However, this model doesn't provide access to as many features as the low-level programming model. For example, you can use this model to create, retrieve, update, and delete items in tables.

The object persistence programming model is specifically designed for storing, loading, and querying .NET objects in DynamoDB. You access this model through the Amazon.DynamoDBv2.DataModel namespace.
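As a quick hedged sketch of the document model's put-and-get flow (the Customer table and its attributes are assumptions for illustration):

using System;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DocumentModel;

// Put and get an item via the document model (hypothetical Customer table).
var client = new AmazonDynamoDBClient();
var table = Table.LoadTable(client, "Customer");

var item = new Document();
item["Id"] = "1001";
item["Name"] = "Contoso";
table.PutItem(item);

var fetched = table.GetItem("1001");
Console.WriteLine(fetched["Name"]);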

We will start by navigating to AWS console and creating a table.

For the sake of an easy tutorial, I will create a table Customer with a couple of fields. Here I have to think about the design of my table a little, in particular around hash and range keys. In DynamoDB, the concept of a “Hash and Range Primary Key” means that a single row in DynamoDB has a unique primary key made up of both the hash and the range key.

Hash Primary Key – The primary key is made of one attribute, a hash attribute. For example, a ProductCatalog table can have ProductID as its primary key. DynamoDB builds an unordered hash index on this primary key attribute. This means that every row is keyed off this value, and every row in DynamoDB will have a required, unique value for this attribute. An unordered hash index means what it says – the data is not ordered and you are not given any guarantees about how the data is stored. You won't be able to make queries on an unordered index such as "Get me all rows that have a ProductID greater than X". You write and fetch items based on the hash key, for example, "Get me the row from that table that has ProductID X". You are making a query against an unordered index, so your gets against it are basically key-value lookups, are very fast, and use very little throughput.

Hash and Range Primary Key – The primary key is made of two attributes. The first attribute is the hash attribute and the second is the range attribute. For example, the forum Thread table can have ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute. This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index, for example, "Get me all rows from the table with hash key X that have range keys greater than Y", or other queries to that effect. These have better performance and lower capacity usage compared to Scans and Queries against fields that are not indexed.

In my case, for my very simplistic example here, I will use a simple unique Hash Primary Key:
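For reference, the equivalent low-level CreateTable call is sketched below; the table name, key attribute, and throughput numbers are assumptions:

using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// Create a table keyed on a single hash attribute (placeholder values).
var client = new AmazonDynamoDBClient();
client.CreateTable(new CreateTableRequest
{
    TableName = "Customer",
    AttributeDefinitions = new List<AttributeDefinition>
    {
        new AttributeDefinition("Id", ScalarAttributeType.S)
    },
    KeySchema = new List<KeySchemaElement>
    {
        new KeySchemaElement("Id", KeyType.HASH)
    },
    ProvisionedThroughput = new ProvisionedThroughput(5, 5) // read, write units
});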

After finishing Create Table Wizard I can now see my table in the console.

Better yet, I can now easily see and modify my table in AWS Explorer in VS, add\remove indexes, etc.:

A few words on indexing. A quick question: while writing a query in any database, keeping the primary key field as part of the query (especially in the where condition) will return results much faster compared to the other way. Why? This is because an index is created automatically for the primary key field in most databases. This is the case with DynamoDB as well. This index is called the primary index of the table. Since no customization is possible using the primary index, it is seldom discussed. DynamoDB has two types of secondary indexes:

Local Secondary Indexes – an index that has the same hash key as the table, but a different range key. A local secondary index is “local” in the sense that every partition of a local secondary index is scoped to a table partition that has the same hash key

Global Secondary Indexes – an index with a hash and range key that can be different from those on the table. A global secondary index is considered “global” because queries on the index can span all of the data in a table, across all partitions

Local Secondary Indexes consume throughput from the table. When you query records via the local index, the operation consumes read capacity units from the table. When you perform a write operation (create, update, delete) in a table that has a local index, there will be two write operations, one for the table and another for the index; both consume write capacity units from the table. Global Secondary Indexes have their own provisioned throughput: when you query the index, the operation consumes read capacity from the index, and when you perform a write operation (create, update, delete) in a table that has a global index, there will be two write operations, one for the table and another for the index, each drawing from its own provisioned capacity.

Local Secondary Indexes can only be created when you are creating the table; there is no way to add a Local Secondary Index to an existing table, and once you create the index you cannot delete it. Global Secondary Indexes can be created when you create the table or added to an existing table; deleting an existing Global Secondary Index is also allowed.
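To make the difference concrete, here is a hedged sketch of a table definition that declares a global secondary index at creation time; the index and attribute names are assumptions:

using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// A table with a GSI on State, queryable across all partitions (placeholder names).
var request = new CreateTableRequest
{
    TableName = "Customer",
    AttributeDefinitions = new List<AttributeDefinition>
    {
        new AttributeDefinition("Id", ScalarAttributeType.S),
        new AttributeDefinition("State", ScalarAttributeType.S)
    },
    KeySchema = new List<KeySchemaElement>
    {
        new KeySchemaElement("Id", KeyType.HASH)
    },
    ProvisionedThroughput = new ProvisionedThroughput(5, 5),
    GlobalSecondaryIndexes = new List<GlobalSecondaryIndex>
    {
        new GlobalSecondaryIndex
        {
            IndexName = "StateIndex",
            KeySchema = new List<KeySchemaElement>
            {
                new KeySchemaElement("State", KeyType.HASH)
            },
            Projection = new Projection { ProjectionType = ProjectionType.ALL },
            // GSIs carry their own throughput, separate from the table's.
            ProvisionedThroughput = new ProvisionedThroughput(5, 5)
        }
    }
};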

Important Documentation Note: In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled.

Next, I will fire up Visual Studio and create a new AWS console application project:

In order to illustrate how easy it is to put and get data from a table, I created a very simple code snippet. First I created a very simplistic class Customer, a representation of some mythical company's customer where we track a unique id, name, city, and state.
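The original snippet appeared as a screenshot; a reconstruction along these lines, using the object persistence model with hypothetical names and values, would be:

using System;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

// Hypothetical reconstruction of the Customer class and a put/get round trip.
[DynamoDBTable("Customer")]
public class Customer
{
    [DynamoDBHashKey]
    public string Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

class Program
{
    static void Main()
    {
        var client = new AmazonDynamoDBClient();
        var context = new DynamoDBContext(client);

        // Put: persist a customer item.
        context.Save(new Customer { Id = "1001", Name = "Contoso", City = "Seattle", State = "WA" });

        // Get: load it back by hash key.
        var customer = context.Load<Customer>("1001");
        Console.WriteLine(customer.Name + " - " + customer.City + ", " + customer.State);
    }
}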

Switching away from my usual Azure and Google Cloud hangouts, I decided to check out AWS. Moreover, as AWS offers a nice .NET SDK and a toolkit for Visual Studio, I decided to start with .NET as my technology for AWS.

The first thing you need to do if you decide to develop with .NET for AWS is to create an AWS account. To sign up for an AWS account:

As part of sign-up, they will call you and you will enter a PIN using the phone keypad.

AWS sends you a confirmation email after the sign-up process is complete. At any time, you can view your current account activity and manage your account by going to http://aws.amazon.com and clicking My Account/Console. To use the AWS SDK for .NET, you must have a set of valid AWS credentials, which consist of an access key and a secret key. These keys are used to sign programmatic web service requests and enable AWS to verify the request comes from an authorized source. You can obtain a set of account credentials when you create your account. However, AWS docs recommend that you do not use these credentials with the AWS SDK for .NET. Instead, they want you to create one or more IAM users and use those credentials. For applications that run on EC2 instances, you can use IAM roles to provide temporary credentials.

To use the AWS SDK for .NET, you must have the following installed:

Microsoft .NET Framework 3.5 or later

Visual Studio 2010 or later

AWS SDK for .NET

AWS Toolkit for Visual Studio, a plugin that provides a user interface for managing your AWS resources from Visual Studio and includes the AWS SDK for .NET

You can install AWS SDK for .NET from NuGet. NuGet always has the most recent versions of the AWSSDK assemblies, and also enables you to install previous versions. NuGet is aware of dependencies between assemblies and installs all required assemblies automatically. Assemblies installed with NuGet are stored with your solution rather than in a central location, such as Program Files. This enables you to install assembly versions specific to a given application without creating compatibility issues for other applications.

To use NuGet from Visual Studio Solution Explorer, right-click on your project and choose Manage NuGet Packages from the context menu.

From the Package Manager Console, you can install the desired AWSSDK assemblies using the Install-Package command. For example, to install the AWSSDK.AutoScaling assembly, use the following command:

PM> Install-Package AWSSDK.AutoScaling -Version 3.0.0.1

NuGet will also install any dependencies, such as AWSSDK.Core.

Now let's start a new project in Visual Studio. Since I have installed the AWS Toolkit for Visual Studio, things are really simple here: