You are viewing the documentation for an older version of boto (boto2).

Boto3, the next version of Boto, is now
stable and recommended for general use. It can be used side-by-side with
Boto in the same project, so it is easy to start using Boto3 in your existing
projects as well as new projects. Going forward, API updates and all new
feature work will be focused on Boto3.

At this point the variable conn will point to an S3Connection object. In
this example, the AWS access key and AWS secret key are passed in to the
method explicitly. Alternatively, you can set the environment variables:

AWS_ACCESS_KEY_ID - Your AWS Access Key ID

AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

and then call the constructor without any arguments, like this:

>>> conn = S3Connection()
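Before calling the constructor without arguments, it can help to check that both variables are actually set. The following is a minimal sketch using only the standard library; the helper name missing_aws_env_vars is hypothetical and not part of boto. Because boto can also read credentials from its configuration file, a non-empty result does not necessarily mean the connection will fail.

```python
import os

# Environment variables named in the text above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=None):
    """Return the names of the credential variables that are unset or empty.

    Hypothetical helper for illustration; boto does not provide it.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("Both credential variables are set.")
```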

There is also a shortcut function in the boto package, called connect_s3
that may provide a slightly easier means of creating a connection:

>>> import boto
>>> conn = boto.connect_s3()

In either case, conn will point to an S3Connection object which we will
use throughout the remainder of this tutorial.

Once you have a connection established with S3, you will probably want to
create a bucket. A bucket is a container used to store key/value pairs
in S3. A bucket can hold an unlimited amount of data so you could potentially
have just one bucket in S3 for all of your information. Or, you could create
separate buckets for different types of data. You can figure all of that out
later; first, let’s just create a bucket. That can be accomplished like this:

Whoa. What happened there? Well, the thing you have to know about
buckets is that they are kind of like domain names. It’s one flat name
space that everyone who uses S3 shares. So, someone has already created
a bucket called “mybucket” in S3 and that means no one else can grab that
bucket name. So, you have to come up with a name that hasn’t been taken yet.
For example, something that uses a unique string as a prefix. Your
AWS_ACCESS_KEY (NOT YOUR SECRET KEY!) could work but I’ll leave it to
your imagination to come up with something. I’ll just assume that you
found an acceptable name.
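One way to come up with a name that is unlikely to be taken is to append a random suffix to a prefix of your choice. The sketch below is only an illustration: the helper unique_bucket_name and the use of a uuid4 suffix are assumptions of this example, not something boto provides or requires.

```python
import uuid


def unique_bucket_name(prefix):
    """Build a bucket name that is unlikely to collide with anyone else's.

    Hypothetical helper; the uuid4 suffix is an illustrative choice.
    """
    suffix = uuid.uuid4().hex  # 32 lowercase hexadecimal characters
    name = "{0}-{1}".format(prefix.lower(), suffix)
    if len(name) > 63:  # S3 bucket names may be at most 63 characters long
        raise ValueError("prefix is too long for a bucket name")
    return name
```

The result can then be passed to the connection, for example conn.create_bucket(unique_bucket_name('mybucket')).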

The create_bucket method will create the requested bucket if it does not
exist or will return the existing bucket if it does exist.

The example above assumes that you want to create a bucket in the
standard US region. However, it is possible to create buckets in
other locations. To do so, first import the Location object from the
boto.s3.connection module, like this:

As you can see, the Location object defines a number of possible locations. By
default, the location is the empty string which is interpreted as the US
Classic Region, the original S3 region. However, by specifying another
location at the time the bucket is created, you can instruct S3 to create the
bucket in that location. For example:

>>> conn.create_bucket('mybucket', location=Location.EU)

will create the bucket in the EU region (assuming the name is available).

Once you have a bucket, presumably you will want to store some data
in it. S3 doesn’t care what kind of information you store in your objects
or what format you use to store it. All you need is a key that is unique
within your bucket.

The Key object is used in boto to keep track of data stored in S3. To store
new data in S3, start by creating a new Key object:

>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'foobar'
>>> k.set_contents_from_string('This is a test of S3')

The net effect of these statements is to create a new object in S3 with a
key of “foobar” and a value of “This is a test of S3”. To validate that
this worked, quit the interpreter, start it up again, and read the contents
of the key back from the bucket.
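A minimal sketch of such a round trip is shown here. It assumes bucket is a boto Bucket object obtained from an existing connection; the function name store_and_read_back is hypothetical. Depending on the Python version, get_contents_as_string may return bytes rather than text.

```python
def store_and_read_back(bucket, key_name, text):
    """Write text under key_name, then fetch it again from the bucket.

    `bucket` is assumed to be a boto Bucket; this helper is illustrative only.
    """
    key = bucket.new_key(key_name)
    key.set_contents_from_string(text)
    # get_key() asks S3 for the key, so a non-None result confirms it exists.
    fetched = bucket.get_key(key_name)
    if fetched is None:
        raise KeyError(key_name)
    return fetched.get_contents_as_string()
```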

There are a couple of things to note about this. When you send data to
S3 from a file or filename, boto will attempt to determine the correct
mime type for that file and send it as a Content-Type header. The boto
package uses the standard mimetypes package in Python to do the mime type
guessing. The other thing to note is that boto does stream the content
to and from S3 so you should be able to send and receive large files without
any problem.
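The kind of guess the mimetypes package makes can be tried out directly. The sketch below uses only the standard library; the helper name guess_content_type and the fallback value are assumptions of this example rather than a description of boto's internals.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Guess a Content-Type from the file name, with a fallback default.

    Illustrative helper; the default value is an assumption of this sketch.
    """
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default
```

A file name with a well-known extension such as .txt or .html yields a specific type, while a name without a recognizable extension falls back to the default.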

When fetching a key that already exists, you have two options. If you’re
uncertain whether a key exists (or if you need the metadata set on it), you can
call Bucket.get_key(key_name_here). However, if you’re sure a key already
exists within a bucket, you can skip the check for a key on the server.
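The two options can be sketched as follows. This assumes bucket is a boto Bucket object and that get_key accepts a validate keyword (available in recent boto 2.x releases); the wrapper fetch_key is hypothetical. When the check is skipped, no request is sent, so the returned Key does not carry the metadata stored on the server.

```python
def fetch_key(bucket, key_name, known_to_exist=False):
    """Return a Key for key_name, optionally skipping the server-side check.

    `bucket` is assumed to be a boto Bucket; this wrapper is illustrative only.
    """
    if known_to_exist:
        # No request is made; the Key object is built locally.
        return bucket.get_key(key_name, validate=False)
    # A request is made; the result is None if the key does not exist.
    return bucket.get_key(key_name)
```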

At times the data you may want to store will be hundreds of megabytes or
more in size. S3 allows you to split such files into smaller components.
You upload each component in turn and then S3 combines them into the final
object. While this is fairly straightforward, it requires a few extra steps
to be taken. The example below makes use of the FileChunkIO module, so
pip install FileChunkIO if it isn’t already installed.

>>> import math, os
>>> import boto
>>> from filechunkio import FileChunkIO

# Connect to S3
>>> c = boto.connect_s3()
>>> b = c.get_bucket('mybucket')

# Get file info
>>> source_path = 'path/to/your/file.ext'
>>> source_size = os.stat(source_path).st_size

# Create a multipart upload request
>>> mp = b.initiate_multipart_upload(os.path.basename(source_path))

# Use a chunk size of 50 MiB (feel free to change this)
>>> chunk_size = 52428800
>>> chunk_count = int(math.ceil(source_size / float(chunk_size)))

# Send the file parts, using FileChunkIO to create a file-like object
# that points to a certain byte range within the original file. We
# set bytes to never exceed the original file size.
>>> for i in range(chunk_count):
...     offset = chunk_size * i
...     bytes = min(chunk_size, source_size - offset)
...     with FileChunkIO(source_path, 'r', offset=offset,
...                      bytes=bytes) as fp:
...         mp.upload_part_from_file(fp, part_num=i + 1)

# Finish the upload
>>> mp.complete_upload()
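The offset arithmetic in the loop above can be checked without touching S3. The sketch below reproduces only that calculation; the function name plan_parts is hypothetical. Keep in mind that S3 requires every part except the last to be at least 5 MiB, so very small chunk sizes are useful for testing the arithmetic only.

```python
import math


def plan_parts(source_size, chunk_size=50 * 1024 * 1024):
    """Return (part_num, offset, size) tuples covering source_size bytes.

    Illustrative helper mirroring the loop in the multipart example.
    """
    if source_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    chunk_count = int(math.ceil(source_size / float(chunk_size)))
    parts = []
    for i in range(chunk_count):
        offset = chunk_size * i
        # The final part holds whatever remains, so it may be shorter.
        size = min(chunk_size, source_size - offset)
        parts.append((i + 1, offset, size))
    return parts
```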

It is also possible to upload the parts in parallel using threads. The
s3put script that ships with Boto provides an example of doing so
using a thread pool.

Note that if you forget to call either mp.complete_upload() or
mp.cancel_upload() you will be left with an incomplete upload and
charged for the storage consumed by the uploaded parts. A call to
bucket.get_all_multipart_uploads() can help to show lost multipart
upload parts.
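A cleanup pass over those incomplete uploads might look like the sketch below. It assumes bucket is a boto Bucket object and that each returned upload exposes key_name and cancel_upload(); the helper cancel_incomplete_uploads is hypothetical. It cancels uploads that are still in progress as well as abandoned ones, and the listing call may return only the first page of results, so use it with care.

```python
def cancel_incomplete_uploads(bucket, key_prefix=""):
    """Cancel unfinished multipart uploads whose key name starts with key_prefix.

    Returns the key names of the uploads that were cancelled.
    `bucket` is assumed to be a boto Bucket; this helper is illustrative only.
    """
    cancelled = []
    for upload in bucket.get_all_multipart_uploads():
        if upload.key_name.startswith(key_prefix):
            upload.cancel_upload()  # frees the storage used by uploaded parts
            cancelled.append(upload.key_name)
    return cancelled
```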

Once a bucket exists, you can access it by getting the bucket. For example:

>>> mybucket = conn.get_bucket('mybucket') # Substitute in your bucket name
>>> mybucket.list()
...listing of keys in the bucket...
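The object returned by list() can be iterated over like any other sequence of keys. As a small sketch, assuming bucket is a boto Bucket object, the hypothetical helper below counts the keys under a prefix and adds up their sizes.

```python
def summarize_keys(bucket, prefix=""):
    """Return (key_count, total_bytes) for keys whose names start with prefix.

    `bucket` is assumed to be a boto Bucket; this helper is illustrative only.
    """
    count = 0
    total = 0
    for key in bucket.list(prefix=prefix):
        count += 1
        total += key.size
    return count, total
```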

By default, this method tries to validate the bucket’s existence. You can
override this behavior by passing validate=False:

>>> nonexistent = conn.get_bucket('i-dont-exist-at-all', validate=False)

Changed in version 2.25.0.

Warning

If validate=False is passed, no request is made to the service (no
charge/communication delay). This is only safe to do if you are sure
the bucket exists.

If the default validate=True is passed, a request is made to the
service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched
a list of keys in the bucket (with the maximum set to 0, so the list was
always empty), which gave better error messages at an increased expense.
As of Boto v2.25.0, it performs a HEAD request instead, which is less
expensive but gives worse error messages.

If you were relying on parsing the error message before, you should call
something like:
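A minimal sketch of that call sequence, assuming conn is an existing S3Connection; the wrapper name get_bucket_with_detailed_errors is hypothetical. If the bucket is missing or inaccessible, the listing call is where boto raises its S3ResponseError with the detailed message.

```python
def get_bucket_with_detailed_errors(conn, bucket_name):
    """Fetch a bucket and check it with a key listing instead of a HEAD request.

    `conn` is assumed to be a boto S3Connection; this wrapper is illustrative.
    """
    # validate=False skips the HEAD request that get_bucket() would make.
    bucket = conn.get_bucket(bucket_name, validate=False)
    # Listing zero keys issues a GET request, so a failure carries the
    # full error response from S3.
    bucket.get_all_keys(maxkeys=0)
    return bucket
```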

The S3 service provides the ability to control access to buckets and keys
within S3 via the Access Control List (ACL) associated with each object in
S3. There are two ways to set the ACL for an object:

Create a custom ACL that grants specific rights to specific users. At the
moment, the users that are specified within grants have to be registered
users of Amazon Web Services so this isn’t as useful or as general as it
could be.

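Use a “canned” access control policy. There are four canned policies
defined:

private: Owner gets FULL_CONTROL. No one else has any access rights.

public-read: Owner gets FULL_CONTROL and the anonymous principal is granted
READ access.

public-read-write: Owner gets FULL_CONTROL and the anonymous principal is
granted READ and WRITE access.

authenticated-read: Owner gets FULL_CONTROL and any principal authenticated
as a registered Amazon S3 user is granted READ access.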

To set a canned ACL for a bucket, use the set_acl method of the Bucket object.
The argument passed to this method must be one of the four permissible
canned policies named in the list CannedACLStrings contained in acl.py.
For example, to make a bucket readable by anyone:

>>> b.set_acl('public-read')

You can also set the ACL for Key objects, either by passing an additional
argument to the above method:

>>> b.set_acl('public-read', 'foobar')

where ‘foobar’ is the key of some object within the bucket b, or you can
call the set_acl method of the Key object:

>>> k.set_acl('public-read')

You can also retrieve the current ACL for a Bucket or Key object using the
get_acl method. This method parses the AccessControlPolicy response sent
by S3 and creates a set of Python objects that represent the ACL.

The Python objects representing the ACL can be found in the acl.py module
of boto.
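As a sketch of how those objects can be inspected, the hypothetical helper below walks the grants of a policy returned by get_acl(). It assumes the policy exposes its grants as policy.acl.grants and that each grant carries permission, display_name, email_address, uri and id attributes.

```python
def describe_grants(policy):
    """Return (grantee, permission) pairs from a policy returned by get_acl().

    Illustrative helper; the attribute names are assumed as described above.
    """
    pairs = []
    for grant in policy.acl.grants:
        # Group grants carry a uri; user grants carry an id and possibly a
        # display name or an email address.
        grantee = (grant.display_name or grant.email_address
                   or grant.uri or grant.id)
        pairs.append((grantee, grant.permission))
    return pairs
```

For example, describe_grants(b.get_acl()) would list who has been granted which permission on the bucket b.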

Both the Bucket object and the Key object also provide shortcut
methods to simplify the process of granting individuals specific
access. For example, if you want to grant an individual user READ
access to a particular object in S3 you could do the following:
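A minimal sketch of such a grant, assuming bucket is a boto Bucket object and using the add_email_grant method of the Key object; the wrapper grant_read_by_email and the names passed to it are hypothetical.

```python
def grant_read_by_email(bucket, key_name, email_address):
    """Grant READ access on one key to the AWS user with this email address.

    `bucket` is assumed to be a boto Bucket; this wrapper is illustrative only.
    """
    key = bucket.get_key(key_name)
    if key is None:
        raise KeyError(key_name)
    key.add_email_grant('READ', email_address)
    return key
```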

The email address provided should be the one associated with the user’s
AWS account. There is a similar method called add_user_grant that accepts the
canonical id of the user rather than the email address.

S3 allows arbitrary user metadata to be assigned to objects within a bucket.
To take advantage of this S3 feature, you should use the set_metadata and
get_metadata methods of the Key object to set and retrieve metadata associated
with an S3 object. For example:

>>> k = Key(b)
>>> k.key = 'has_metadata'
>>> k.set_metadata('meta1', 'This is the first metadata value')
>>> k.set_metadata('meta2', 'This is the second metadata value')
>>> k.set_contents_from_filename('foo.txt')

This code associates two metadata key/value pairs with the Key k. To retrieve
those values later:

>>> k = b.get_key('has_metadata')
>>> k.get_metadata('meta1')
'This is the first metadata value'
>>> k.get_metadata('meta2')
'This is the second metadata value'

Cross-origin resource sharing (CORS) defines a way for client web
applications that are loaded in one domain to interact with resources
in a different domain. With CORS support in Amazon S3, you can build
rich client-side web applications with Amazon S3 and selectively allow
cross-origin access to your Amazon S3 resources.

The first rule allows cross-origin PUT, POST, and DELETE requests from
the https://www.example.com/ origin. The rule also allows all headers
in a preflight OPTIONS request through the Access-Control-Request-Headers
header. In response to any preflight OPTIONS request, Amazon S3 will
return any requested headers.
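The rule described above can be sketched as follows. The function expects a boto.s3.cors.CORSConfiguration instance and calls its add_rule method; the wrapper name add_write_rule is hypothetical, and the origin is written without a trailing slash here, which is an assumption of this sketch.

```python
def add_write_rule(cors_cfg, origin="https://www.example.com"):
    """Add a rule allowing PUT, POST and DELETE from one origin.

    `cors_cfg` is assumed to be a boto.s3.cors.CORSConfiguration instance.
    """
    cors_cfg.add_rule(
        ["PUT", "POST", "DELETE"],  # allowed methods
        origin,                     # allowed origin
        allowed_header="*",         # allow every requested header
    )
    return cors_cfg
```

The resulting configuration can then be attached to a bucket with the bucket's set_cors method.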

S3 buckets support transitioning objects to various storage classes. This is
done using lifecycle policies. You can currently transition objects to
Infrequent Access or Glacier, or simply expire them. Each of these actions
can be applied after a number of days or on a given date.
Lifecycle configurations are assigned to buckets and require these parameters:

The object prefix that identifies the objects you are targeting (or none).

The action you want S3 to perform on the identified objects.

The date or number of days when you want S3 to perform these actions.

For example, given a bucket s3-lifecycle-boto-demo, we can first retrieve the
bucket:

Then we can create a lifecycle object. In our example, we want all objects
under logs/* to transition to Standard IA 30 days after the object is created,
to Glacier 90 days after creation, and to be deleted 120 days after creation.
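To make the schedule concrete, the sketch below models it in plain Python. It is not the boto lifecycle API: the names SCHEDULE and expected_state and the label "EXPIRED" are assumptions of this example. S3 applies lifecycle actions asynchronously, so the real change may happen somewhat later than the model suggests.

```python
# Thresholds from the example above, in days since the object was created.
SCHEDULE = [(120, "EXPIRED"), (90, "GLACIER"), (30, "STANDARD_IA")]


def expected_state(age_days, schedule=SCHEDULE, initial="STANDARD"):
    """Return the state the example rule aims for at a given object age.

    Illustrative model only; it does not talk to S3.
    """
    # The schedule is ordered from the latest threshold down, so the first
    # match is the most advanced stage the object has reached.
    for threshold, state in schedule:
        if age_days >= threshold:
            return state
    return initial
```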

Once an object has been transitioned to Glacier, you can restore the object
back to S3. To do so, you can use the boto.s3.key.Key.restore()
method of the key object.
The restore method takes an integer that specifies the number of days
to keep the object in S3.
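A minimal sketch of a restore request, assuming bucket is a boto Bucket object; the wrapper request_restore is hypothetical. The restore itself runs asynchronously and can take hours, so the object is not readable as soon as the call returns.

```python
def request_restore(bucket, key_name, days):
    """Ask S3 to restore a Glacier object and keep the copy for `days` days.

    `bucket` is assumed to be a boto Bucket; this wrapper is illustrative only.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    key = bucket.get_key(key_name)
    if key is None:
        raise KeyError(key_name)
    key.restore(days=days)
    return key
```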