AWS代写 | CPSC 4910 / 5910 Cloud Computing Systems Lab RDS

Overview

From the previous lab, where you ran two instances of the MiniTwit application, you noticed that they’re both completely separate! Each one has its own completely separate set of posts, user accounts, etc. That’s because each one is using a database that is running locally on the server, so they have independent databases with different sets of data stored. If we’re going to use the cloud to scale out by creating more instances to handle more load, we want all those server instances to be running the same application, with the same posts, users, etc. What we need to do is to separate the database out of our EC2 instances so that all the web app servers connect to the same database, instead of each having their own.

We could use EC2 to run instances that have a database installed on them, so that we’d have two different types of instances: the ones running the web app itself using Flask (and no database), and the ones running the database only. However, the AWS Relational Database Service (RDS) provides us with a managed database service that runs in the cloud, which has several advantages: it easily scales, provides high availability and data resiliency, and we don’t have to manage the servers ourselves (e.g., dealing with updates and patches to the OS and database software over time).

Here is the overview page for RDS: https://aws.amazon.com/rds/

and here is the getting started documentation: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.html

In this lab, you are going to update the MiniTwit application to use RDS instead of the oninstance database, and you’re going to deploy your new MiniTwit and RDS database in AWS.

Part A: Using the Relational Database Service (RDS)

To create an RDS database, go to the RDS service in the AWS Management Console. Click on the “Create Database” button. RDS lets you choose which DB engine you want to use – we’ll use MySQL for this assignment. Select “MySQL” and also go ahead and check “Only enable options eligible for RDS Free Usage Tier” at the bottom of the page as well, then click “Next”. On the next page, choose the “db.t2.micro” (Note: t2, not t3) instance type – this is probably the default.

Leave the rest of the Instance specifications section at the default values, but scroll down to the Settings section to configure your database. Give your database an instance identifier (this is the name of your RDS instance, not the name of the database hosted on the RDS instance), and choose a username and password for connecting to the database, then click “Next”.

On the final page, choose the “Default VPC” for your Virtual Private Cloud (VPC) and “No” for public accessibility (i.e., the DB should not be publicly accessible! Only your EC2 instances will be able to connect to it.). For the Availability Zone, you can choose “No preference” and for the VPC security groups, you should choose “Select existing VPC security groups” and then the default. In the Database Options section, give the database a name (this is the actual name of your database), and leave the rest of the options as their defaults. Everything after that can also stay at its defaults. Then click the “Create database” button!

It will take a minute to start up, but once it does, take a little time to explore the RDS Console and see what it shows you about your new database.

There is also some documentation on starting new RDS instances (specifically, MySQL instances) here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingCo nnecting.MySQL.html

Warning: You are charged per hour for each RDS instance you have running. Just like your EC2 instances, you shouldn’t leave RDS instances or clusters running for hours and hours without a good reason. Be sure to shut down your RDS instances and clusters when you’re done for the day. (Yes, that does mean you lose your data, but that should be okay for class assignment purposes.)

Note: MySQL databases don’t currently support the “Stop” option. You will need to “Delete” them instead – this means all the data stored in them will be deleted. This probably isn’t a big deal for the assignment, since we’re just creating test data, not real users or tweets. However, if you do want to preserve your data, you can take a “snapshot” of your database, and then restore that snapshot when you start up new instances again.

Part B: Updating MiniTwit to use RDS

Now you need to change the MiniTwit code to access the RDS database instead of the local one on the EC2 server instance. Since we chose the “MySQL” engine for our RDS database, we write our Python code to connect to it the same way we’d write Python code to connect to any MySQL database.

And the official documentation: https://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html

If you need to know more about Flask so you can understand the MiniTwit code, there is a quickstart overview here:

http://flask.pocoo.org/docs/0.12/quickstart/

One catch for this assignment is that the MiniTwit application was initially built using the SQLite database engine, which is a lightweight single-application database that runs as a library inside the application that’s using it (rather than the database engine being its own process). SQLite is very popular for good reason – it’s lightweight and relatively easy to use. However, it’s not the best option for a database that you access over the network, which is one reason we’re switching to MySQL for our RDS database. (The other reason is that SQLite is not available as an option for RDS.)

Making this change presents two challenges:

SQLite uses a slightly idiosyncratic SQL syntax, so we need to patch a couple things back to standard SQL so that MySQL will understand (more on this shortly)

we need to use a different (though similar) library to connect to the database – the MySQL library is the one described in the documentation links above

SSH to your EC2 instance (start a new one as you did in the previous lab, if you don’t have one running already – you do not need any User Data for now). Go to the flask/examples/minitwit/minitwit directory and edit the minitwit.py file – this is the main application code for MiniTwit. Find the line:

from sqlite3 import dbapi2 as sqlite3 Comment out or delete that line and replace it with the MySQL library:

import mysql.connector

Now find the code that calls connect() with the sqlite3 library and change that to connect using the MySQL library. (See the documentation linked above for more information about how to do this.) You’ll want to use your RDS database “endpoint” for the host parameter, and the user and password parameters should be the username and password you chose for it. The database parameter is the name of your database (not the instance identifier name).

To find the name of the connection endpoint that you should connect to, go to your database’s page in the RDS Console. Scroll down to the Connectivity (Endpoint & port) section – the endpoint is what you’ll use to connect to this database. (Details on this are explained in the RDS documentation linked above.)

The other thing you’re going to have to patch up is that get_db() is not returning a Connection object instead of a Cursor object. (It’s okay if you don’t know what those are; the point is it’s a different kind of object being returned now.) This will be okay for most cases, but

you’ll notice in the code that the db.exceute() function is called in several places… However, Connections don’t have an execute() method; Cursors do. Fortunately, the Connection does have a cursor() method that you can use to retrieve the cursor, but you’ll want to use the “dictionary=True” argument so that the results are returned in the same data structure (i.e., a dict()) that the previous code used. (Otherwise it is returned as a tuple, and we’d have to chance a bunch of other code to accommodate that.) For example: db.cursor(dictionary=True).execute()

Where the fetchall() function is being called, that is also a method on Cursor objects. So instead of calling it on the return value of execute(), you’ll want to get the Cursor and save that to a variable. Then call execute() on the cursor (and disregard the return value), then call fetchall() on the cursor – that should successfully get all the results from the query you executed.

Here’s a page with more information and some examples: http://www.mysqltutorial.org/python-mysql-query/

Note: If you are experiencing an error that says _mysql_connector.MySQLInterfaceError: “Commands out of sync; you can’t run this command now” then you may need to make a change to how the init_db() function is reading the schema.sql file. The file contains multiple SQL commands separated by semicolons (;), and the provided code just reads the entire file and passes it to the execute() method. If you instead split the file on the semicolons (;) and pass each command individually to successive calls to execute(), that seems to fix this problem. See the Python split() method for this.

At this point you should be able to test your connection by running flask initdb. If it fails with an Exception _mysql_exceptions.ProgrammingError that says something like “You have an error in your SQL syntax;…” then that is because we haven’t accounted for the differences in SQL syntax yet, which we will take care of in Part D, but your code did successfully connect to the database and attempt to run the SQL query, which is all we care about for the moment. (If you see some other error, then you code may be having trouble connecting.)

Part C: Getting Secure (and maintainable)

There’s a critical problem with what you’ve (probably) just done! You probably just hard coded your database password into your application code. This is a serious security risk – if someone gets your code then they have the password to your database. Keeping the code secure might seem like a reasonable solution, but you quickly run into problems in the real world because you may do things like check the code into source control, or something else may happen that leaks the code. Industry best practice is to never hard code passwords or other security

credentials into source code – instead, they should be saved in configuration files (that are never checked into source control!) or environment variables that the code can dynamically read at runtime.

It also helps code maintainability because if you ever change the password to the database (or the database name, or any other connection info), then you need to make a code change in order to continue being able to connect. It’s more maintainable if matters of database configuration are also matters of application configuration, rather than the application code itself. So, let’s store not only the password, but the username and database endpoint in environment variables as well. (We’re using environment variables rather than a config file because they can be set in the User Data of your instance, rather than a file stored on the instance’s hard drive – you’ll see shortly why it’s easier to chance User Data than a file on the drive.)

To access environment variables from Python code, you can use the os.environ data structure. To import this, use this import:

from os import environ

Then you can access environ like a dict() (A.K.A. map) data structure. For example, to read the FOO environment variable, you can use:

environ[‘FOO’]

Here is the official documentation about os.environ: https://docs.python.org/2/library/os.html#os.environ

Now you can come up with appropriate environment variable names (DB_PASSWD, for example) and use environ to read them and use them to connect to the database.

Before you run the Flask app, you’ll need to actually set the environment variables, though. In order to do that, run commands like this one at the command line:

export DB_PASSWD=donotusethisasyourpassword

These will be “sticky” so you don’t need to rerun the export commands each time you start Flask, although you will need to rerun them if you log out and SSH in again or reboot the instance.

When you start an EC2 instance, you can add these export statement to your User Data in order for them to run automatically when the instance starts – then you don’t need to type them manually each time! This also means that if any of this info changes, you just update the User Data and don’t have to change the code or the files on the instance. It also means the password isn’t hard coded in the source code, nor in a file that’s stored on the instance! We’ve already set an environment variable in the User Data in the previous lab – here’s the User Data from that lab again, so you can spot the export command.

Part D: Fixing SQL

There are some differences between the SQL syntax and API details between SQLite and the MySQL library. We’ll need to patch these up.

1. SQLite uses a nonstandard autoincrement keyword instead of the standard auto_increment (which MySQL uses). Open the schema.sql file and change all of the instances of autoincrement to auto_increment.

At this point you can try running flask initdb again, and it should work. (The output should say “Initialized the database.”) If you get that result, then everything is working okay so far! Not only that, but we needed to run the “initdb” command successfully once after creating the RDS database to set up the database schema and get ready to store data… This should only need to be (successfully) run once ever, even if you terminate your EC2 instance or start other instances of the MiniTwit app – although you will need to run it again (just once) if you ever delete your RDS database and re-create it. (There’s no harm in running “initdb” more than once, though, with the caveat that it will delete any data stored in the database when you do…)

Note: At this point, if you get Exception _mysql_exceptions.OperationalError that reads “Lost connection to MySQL server during query”, try the command again. Don’t worry about this if it only happens once.

2. Parameterized queries have a slightly different syntax. When the execute() function is called, SQLite uses ? to denote parameters in the query, but MySQL uses %s instead. Edit minitwit.py to fix this.

This page documents the SQLite syntax (i.e., how MiniTwit is written now): https://docs.python.org/2/library/sqlite3.html#sqlite3.Cursor.execute

This page documents the MySQL library syntax that you need to change to: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorexecute.html

At this point, things should just about be working… Run flask run –host=0.0.0.0 manually and connect to your instance in the browser (remember to connect on port 5000). If you run into errors, you will see them in the Flask output on your SSH terminal, and you may get a 500 Internal Server Error on your browser. Use the output in your SSH session to go about debugging your application. Be sure to test it thoroughly in the browser – creating users, logging in, logging out, creating posts, looking at the personal timeline, looking at the public timeline (with multiple users’ posts), etc.

You’re not done yet, but your modified minitwit.py file is what you’ll upload to Canvas to submit this assignment. You may want to SCP a copy of it off your EC2 instance and onto a machine that you can access Canvas from. For example, on cs1 or your machine:

Now that you’ve made you changes, you’ll want to create a new Amazon Machine Image (AMI) that has your changes on it so that you can create new EC2 instances from that image and have your new code ready to run. Go to the EC2 console and select the running instance that has your updated code on it. From the Actions menu, go to Actions->Image->Create Image. Give it a name (and optionally, you may provide a description) and then click “Create Image”. If you then click on the AMIs section of the Console (under the IMAGES heading), you will see your new image there with the status “pending” – this means your image is in the process of being created. (Warning: Do not terminate the EC2 instance you used to create this AMI while the image is still “pending”!) Once the status changes to “available”, your AMI is created and ready for you to use it to create new EC2 instances! You can either select your AMI here and click the “Launch” button, or you can go to the EC2 Instances console, click “Launch Instance”, then select “My AMIs” and you’ll see your AMI listed there for you to choose from.

Note: There is a very small per-GB charge for storing AMIs. It doesn’t add up to much, but it’s probably best not to keep around dozens of old AMIs that you aren’t using anymore. Be sure to clean up your old, unused ones – but use caution that you don’t accidentally delete on you still need, or that has the only copy of some significant change! Keeping just a couple AMIs around for the duration of the quarter won’t be a big deal, though.

If you need to make more changes later, you can always start from this AMI, make changes to your instance, and then create another new AMI. However, you can see now why it’s easier to simply change the User Data of an instance than to change a file on the hard drive (which you would have to save into a new AMI).

Part F: Going Live!

The last part of your assignment is to set up a live instance of your MiniTwit application!

Start by making sure you’ve created your AMI successfully and everything you need is on there. (You might even want to start an instance using your new AMI and check that it works.) Optionally, you may run the flask initdb command in one of your good instances to “refresh” your RDS database (i.e., delete the old data). Then terminate all your running instances to be sure you’re starting from a clean slate.

Start 2 instances and create an Elastic Load Balancer (ELB) for them. (You don’t need to do Auto Scaling for this assignment; just the load balancer is fine.) Review the instructions from Lab 1 if you need to refresh your memory on how to do this.

Be sure your instances are set up as Spot Requests and that you have given them appropriate User Data so they start up on their own. See the User Data from Lab 1, but change the line that starts Flask to:

su -c “flask run –host=0.0.0.0” ec2-user

This change makes it run MiniTwit as the ec2-user instead of as root, which is more secure, but also needed to give it access to the MySQL library that we used. You’ll also need to add the export statements for your database connection info. (You may optionally create a Launch Template if you want it to store your User Data for you, instead of typing it again each time you start an instance – this isn’t required and won’t be graded, though.)

Test your application and be sure everything is working. Your load balancer should route requests to the two instances, but you should have a consistent view of the data (due to using the same database instance) no matter which instance you are routed to.

Submission

You should now have an AMI saved with your updated MiniTwit code on it, a running RDS database, 2 running instances that were created from your AMI and connect to the RDS database, and a load balancer connected to both instances.

When everything is running properly, submit the final version of you minitwit.py file on Canvas. Your grade will be based on a combination of that code and your running instances, database, and load balancer in the cloud. (The instructor will terminate those after grading, just as in the first lab.)