Getting started with CUDA on AWS

05 October 2019

I’m currently in the middle of preparing a presentation on CUDA for C and C++ programmers. Given that not everyone has access to an NVIDIA GPU, I was planning to include some instructions for getting started with CUDA on AWS. But instead of cramming it into an already over-crowded slide deck, I’ve decided to post the instructions here, with some additional information about EC2 instances with GPU support. Assuming that you’ve already set up an AWS account and know how to start an EC2 instance, these instructions will get you an EC2 instance that can compile and run examples from the CUDA Toolkit.


This guide assumes you have created an AWS account, and created or uploaded a Key Pair for use with EC2. We’ll be working on the command line, so the Key Pair is required to log in to our EC2 instance using SSH. These instructions also assume that you’re using some kind of UNIX environment, with ssh available.

Some notes on AWS

Recommended AMI

We can make our lives easier by starting with an AMI that has the CUDA Toolkit pre-installed.

The AMI we’re going to use is called ‘Deep Learning Base AMI (Ubuntu) Version 19.2’. This has the CUDA Toolkit pre-installed, and also contains a few CUDA-related additions that are useful for Deep Learning applications. The AMI ID is ami-03eb555c2d27cde91.

Instance types

The AWS documentation currently recommends three categories of instance types that have NVIDIA GPUs:

P3 Instances have up to 8 NVIDIA Tesla V100 GPUs
P2 Instances have up to 16 NVIDIA Tesla K80 GPUs
G3 Instances have up to 4 NVIDIA Tesla M60 GPUs

Older GPU instance types, such as G2, can still be used, but are not recommended. These previous-generation instance types use older NVIDIA GPUs, and do not support all of the features available in CUDA 10+.

The instance type that we’re going to use for this quick test is g3s.xlarge. This is the smallest G3 instance type, and at the time of writing it is priced at USD$0.75 per hour in us-east-1.

Pricing

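If you find yourself making regular use of EC2, one option for reducing the cost of your GPU experiments is to take advantage of spot pricing, which offers spare capacity at a discount in exchange for the risk that AWS interrupts the instance.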

When using spot pricing, it may also help to consider different AWS regions. Depending on popularity and time of day, some regions can be significantly cheaper than others. Note that choosing a different region may require that you find equivalent AMIs in that region, or copy any relevant AMIs to that region yourself.
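To get a feel for what a session might cost, here’s a small shell sketch that multiplies an hourly rate by a number of hours. The helper name `estimate_cost` is made up for illustration, and the rate is the g3s.xlarge on-demand price quoted above. The commented-out command shows one way recent spot prices could be looked up, assuming the AWS CLI is installed and configured.

```shell
# estimate_cost RATE HOURS: print RATE * HOURS to two decimal places.
# (hypothetical helper, not part of any AWS tooling)
estimate_cost() {
    awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Three hours of g3s.xlarge at the on-demand price quoted above.
estimate_cost 0.75 3

# To look up recent spot prices instead (needs a configured AWS CLI):
#   aws ec2 describe-spot-price-history --region us-east-1 \
#       --instance-types g3s.xlarge --product-descriptions "Linux/UNIX" \
#       --max-items 5
```

Swapping in a spot price for the first argument gives a quick comparison between regions.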

Spinning up…

Once you’ve logged in to your AWS account, you can follow these instructions to spin up an EC2 instance with GPU support:

Go to the EC2 Management Console (us-east-1).
Click ‘Launch Instance’.
On the ‘Choose an Amazon Machine Image (AMI)’ page, click ‘Community AMIs’.
In the search box, enter ‘ami-03eb555c2d27cde91’.
The only result should be the ‘Deep Learning Base AMI (Ubuntu) Version 19.2’ AMI.
Select that image.
On the ‘Choose an Instance Type’ page, choose the ‘g3s.xlarge’ instance type.
At this point, you should be able to use the defaults for the rest of the wizard, so you can click ‘Review and Launch’.
On the ‘Review Instance Launch’ page, click ‘Launch’.
Finally, you’ll be asked to choose a Key Pair, after which you can click ‘Launch’.
On the confirmation page, you can click ‘View Instances’ to see the new instance in the AWS Web Console.
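If you prefer the command line, the same launch can probably be done with the AWS CLI. The sketch below only prints the command so that it can be reviewed first. It assumes the AWS CLI is installed and configured, and `my-key-pair` is a placeholder for the name of your own Key Pair; the AMI ID, instance type and region are the ones used in this guide.

```shell
# Values from this guide; the key pair name is a placeholder.
AMI_ID="ami-03eb555c2d27cde91"
INSTANCE_TYPE="g3s.xlarge"
REGION="us-east-1"
KEY_NAME="${KEY_NAME:-my-key-pair}"   # hypothetical name, use your own

# Print the launch command rather than running it.
launch_command() {
    echo aws ec2 run-instances \
        --region "$REGION" \
        --image-id "$AMI_ID" \
        --instance-type "$INSTANCE_TYPE" \
        --key-name "$KEY_NAME" \
        --count 1
}

launch_command
```

Once the printed command looks right, it can be pasted into a terminal to launch the instance.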

Logging in…

Here are the steps to log in using SSH:

Find the public DNS address in the description tab of the EC2 Management Console. The address will look something like ec2-54-166-30-107.compute-1.amazonaws.com.
It may take a while for the instance to boot up and for status checks to complete. Status checks should read ‘2/2 checks passed’.
Once the EC2 instance is ready, you can log in. The default user is ‘ubuntu’, so you can log in using SSH:
```
 ssh ubuntu@ec2-54-166-30-107.compute-1.amazonaws.com
```
It may be necessary to provide the path to the private key for the Key Pair that you used, e.g.:
```
 ssh -i ~/path/to/my/key ubuntu@ec2-54-166-30-107.compute-1.amazonaws.com
```
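The login steps can also be scripted. Here’s a minimal sketch: `ssh_command` is a made-up helper that builds the SSH command from a public DNS name and an optional key path. The commented AWS CLI calls show how the status-check wait and the DNS lookup might be automated, assuming a configured AWS CLI; the instance ID shown is a placeholder.

```shell
# ssh_command HOST [KEY]: print the SSH command for an instance.
# 'ubuntu' is the default user on the AMI used in this guide.
ssh_command() {
    if [ -n "${2:-}" ]; then
        echo "ssh -i $2 ubuntu@$1"
    else
        echo "ssh ubuntu@$1"
    fi
}

# With a configured AWS CLI, waiting for the status checks and looking up
# the address could be done like this (i-0123456789abcdef0 is a placeholder):
#   aws ec2 wait instance-status-ok --instance-ids i-0123456789abcdef0
#   aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
#       --query 'Reservations[0].Instances[0].PublicDnsName' --output text

ssh_command ec2-54-166-30-107.compute-1.amazonaws.com
```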

Finally, before compiling any sample code, it is a good idea to run nvidia-smi to check the status of the GPU:

 ubuntu@ip-54-166-30-107:~$ nvidia-smi
 Sat Oct  5 10:44:04 2019
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |===============================+======================+======================|
 |   0  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
 | N/A   28C    P8    23W / 150W |      0MiB /  7618MiB |      0%      Default |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                       GPU Memory |
 |  GPU       PID   Type   Process name                             Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+
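The full table is handy for a quick look, but nvidia-smi can also print selected fields as CSV, which is easier to use in scripts. The sketch below queries the GPU name and total memory and reformats the result. `gpu_summary` is a made-up helper, and the fallback message is only there so the script does something sensible on a machine without the NVIDIA driver.

```shell
# Reformat one CSV line such as "Tesla M60, 7618 MiB" read from stdin.
gpu_summary() {
    awk -F', ' '{ printf "GPU: %s, memory: %s\n", $1, $2 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask for just the name and total memory, one line per GPU.
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | gpu_summary
else
    echo "nvidia-smi not found; is the NVIDIA driver installed?"
fi
```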

Compiling the Vector Addition sample

Now we can compile the vector addition sample from the CUDA SDK. The code for this example can be found at /usr/local/cuda/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd. The easiest way to compile this is to copy the example code into the home directory:

cp -R /usr/local/cuda/NVIDIA_CUDA-10.0_Samples/0_Simple/vectorAdd .
cd vectorAdd

The example includes a Makefile, but it is hard-coded to include headers from a relative path. Fixing this would be a pain, so instead we can run nvcc directly:

nvcc -o vectorAdd -I/usr/local/cuda/NVIDIA_CUDA-10.0_Samples/common/inc vectorAdd.cu

Finally, we can run the application:

./vectorAdd

This is the expected output:

Vector addition of 50000 elements
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
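The ‘196 blocks of 256 threads’ line is worth a second look. 50000 elements do not divide evenly into blocks of 256 threads, so the block count has to be rounded up. The usual integer trick for this is sketched below in shell; `blocks_needed` is a made-up name for illustration.

```shell
# blocks_needed N THREADS: ceiling of N / THREADS using integer arithmetic.
blocks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_needed 50000 256   # prints 196
```

196 blocks of 256 threads gives 50176 threads in total, so the last block has 176 threads with no element to work on. This is why CUDA kernels typically check their index against the element count before touching memory.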

Cleanup

Unfortunately, there are no GPU instances that qualify for the AWS free tier. So to avoid bill shock at the end of the month, you’ll want to stop or terminate your EC2 instance when you’re done. You can do either from the AWS Web Console. If you want to re-use the instance in the future, you can simply shut it down from the command line, which by default stops the instance rather than terminating it:

sudo shutdown -P now

Note that you may still incur fees for other resources attached to the instance, even when it is stopped.
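Stopping or terminating can also be done from your own machine with the AWS CLI. As before, this sketch only prints the command. It assumes a configured AWS CLI, `cleanup_command` is a made-up helper, and the instance ID is a placeholder for your own.

```shell
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"   # placeholder, use your own

# cleanup_command stop|terminate: print the matching AWS CLI command.
cleanup_command() {
    case "$1" in
        stop)
            echo "aws ec2 stop-instances --region us-east-1 --instance-ids $INSTANCE_ID" ;;
        terminate)
            echo "aws ec2 terminate-instances --region us-east-1 --instance-ids $INSTANCE_ID" ;;
        *)
            echo "usage: cleanup_command stop|terminate" >&2
            return 1 ;;
    esac
}

cleanup_command stop
```

A stopped instance can be started again later, while a terminated instance is gone for good, so `terminate` is the one to use when you are finished with the experiment.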

Tristan Penman
