This assumes you are running the Python script script.py located in the /home/ec2-user/ directory and that you want its output forwarded to the file script.py.log in the same directory.
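
For reference, a sketch of the kind of command that produces this setup (the exact form is covered in the previous step; the paths are the ones assumed above):

nohup python /home/ec2-user/script.py > /home/ec2-user/script.py.log </dev/null 2>&1 &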

5. Monitor Script Output on the Server

This may be useful if you output a score each epoch or after each algorithm run.

This example will list the last few lines of your script log file and update the output as new lines are added to the log.

tail -f script.py.log
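
By default, tail shows the last 10 lines; if you want more context while following the file, a small variation is:

tail -n 100 -f script.py.log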

Amazon may aggressively close your terminal session if the screen does not receive new output for a while.

An alternative is to use the watch command. I have found that Amazon will keep this terminal open:

watch "tail script.py.log"

I have found that standard output (stdout) from Python scripts does not appear to be updated frequently.

I don’t know if this is an EC2 thing or a Python thing. This means you may not see the output in the log updated often. It seems to be buffered and written out when the buffer reaches a fixed size or at the end of the run.
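
If the cause is Python buffering its standard output, one workaround is to run the interpreter unbuffered with the -u flag (a sketch, assuming the same script and log file as above):

nohup python -u /home/ec2-user/script.py > /home/ec2-user/script.py.log </dev/null 2>&1 &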

Do you know more about this?
Let me know in the comments below.

6. Monitor System and Process Performance on the Server

It is a good idea to monitor EC2 system performance, especially the amount of RAM you are using and how much you have left.

You can do this using the top command, which updates every few seconds.

top -M

You can also monitor the system and just your process, if you know its process identifier (PID).

top -p PID -M
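
If you do not know the PID, one quick way to look it up (assuming your script is named script.py) is:

pgrep -f script.py

You can then pass the number it prints to top -p as shown above.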

7. Monitor GPU Performance on the Server

It is a good idea to keep an eye on your GPU performance.

Again, keep an eye on GPU utilization, on which GPUs are in use (if you plan to run multiple scripts in parallel), and on GPU RAM usage.

You can use the nvidia-smi command to keep an eye on GPU usage. I like to use the watch command, which keeps the terminal open and clears the screen for each new result.

watch "nvidia-smi"

8. Check What Scripts Are Still Running on the Server

It is also important to keep an eye on which scripts are still running.

You can do this with the ps command.

Again, I like to use the watch command to keep the terminal open.

watch "ps -ef | grep python"

9. Edit a File on the Server

I recommend not editing files on the server unless you really have to.

Nevertheless, you can edit a file in place using the vi editor.

The example below will open your script in vi.

vi ~/script.py
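
If vi is new to you: press i to start editing, then press Esc and type :wq to save and quit, or :q! to quit without saving.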

Of course, you can use your favorite command line editor, like emacs; this note is really for you if you are new to the Unix command line.

10. Download Files from the Server to Your Workstation

I recommend saving your model and any results and graphs explicitly to new and separate files as part of your script.

You can download these files from your server instance to your workstation using secure copy (scp).

The example below is run from your workstation and will copy all PNG files from your home directory on the server to your workstation.

scp -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47:~/*.png .
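
If your script writes its outputs into a directory (for example, a hypothetical ~/results directory; substitute your own), the -r flag copies it recursively:

scp -r -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47:~/results .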

Additional Tips and Tricks

This section lists some additional tips when working heavily on AWS EC2.

Run multiple scripts at a time. I recommend selecting hardware that has multiple GPUs and running multiple scripts at a time to make full use of the platform.

Write and edit scripts on your workstation only. Treat EC2 as a pseudo-production environment and only ever copy scripts and data there to run. Do all development on your workstation and write small tests of your code to ensure it will work as expected.

Save script outputs explicitly to a file. Save results, graphs, and models to files that can be downloaded later to your workstation for analysis and application.

Use the watch command. Amazon aggressively kills terminal sessions that have no activity. You can keep an eye on things using the watch command, which sends output frequently enough to keep the terminal open.

Run commands from your workstation. Any of the commands listed above that are intended to be run on the server can also be run from your workstation by prefixing the command with "ssh -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47" and quoting the command you want to run. This can be useful for checking in on processes throughout the day, as in the example below.
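
For example, checking the last lines of the script log from your workstation might look like this:

ssh -i ~/.ssh/aws-keypair.pem ec2-user@54.218.86.47 "tail script.py.log"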

Summary

In this tutorial, you discovered the 10 commands that I use every time I am training large deep learning models on AWS EC2 instances with GPUs.

Specifically, you learned:

How to copy your data to and from your EC2 instances.

How to set up your scripts to run for days, weeks, or months safely.

How to monitor processes, the system, and GPU performance.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

Hi Jason, another tool I cannot live without is tmux. It enables easy switching between programs in one terminal and keeps programs running when you exit the terminal (https://github.com/tmux/tmux/wiki)

Hey Jason,
Thanks for your post, really helpful. One service that I’m using is Amazon Batch, which automates every training run that I want to do using spot instances. Coupling it with S3 and Docker removes the hassle of handling the code and dataset. Just a couple of DevOps steps more, but I think it is worth it 🙂

Could you recommend a verbose setting for the Keras functions/methods? I’ve never understood the difference between 1, 2, and 100.

Also, I’ve noticed that if I have a long-running Jupyter notebook running Keras code on a P2 machine that it will often get disconnected. However, if I use verbose=2 for instance then the DL task will finish. Do you have any additional recommendations for using Jupyter notebooks with DL on AWS?

1) Could we run py scripts on AWS and point the data source (csv files) to my local machine?
2) Or is there a way to upload all my csv files to AWS and retrieve them?
3) What are the commands for retrieving the data?

Your books have helped me a lot, thank you and please keep up good work!