Eventual consistency for DELETEs and overwrite PUTs (changes may take some time to become visible). (Note: since December 2020, S3 provides strong read-after-write consistency for all operations.)

Data is stored and listed in lexicographical (alphabetical) order by key name

For performance, use randomized key names (add a random salt prefix before the filename if it's based on a timestamp)
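One common way to randomize timestamp-based keys is to prefix them with a few characters of a hash, so sequential uploads spread across the keyspace. A minimal sketch (the key names are illustrative):

```python
import hashlib

def salted_key(timestamp_key: str) -> str:
    """Prefix a timestamp-based key with a short hash so that
    sequential keys spread across S3's sorted keyspace."""
    salt = hashlib.md5(timestamp_key.encode()).hexdigest()[:4]
    return f"{salt}-{timestamp_key}"

# Sequential timestamp keys now land under different prefixes:
for key in ["2018-04-01-photo.jpg", "2018-04-02-photo.jpg"]:
    print(salted_key(key))
```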

Tiered Storage

Standard

Availability 99.99% (4 nines)

Durability 99.999999999 (11 nines)

Standard-IA (Infrequent Access)

Cheaper than S3 Standard, but a per-GB retrieval fee is charged

Availability 99.9% (3 nines)

One Zone-IA (S3 One Zone-Infrequent Access, released April 2018)

Data is stored in a single Availability Zone only; no multi-AZ redundancy

20% cheaper than Standard-IA

Use case: reproducible, infrequently accessed data, e.g. second or third backup copies kept for compliance's sake.

Reduced Redundancy Storage

Availability 99.99% (4 nines)

Durability 99.99% also (lower than the 11 nines of Standard)

Glacier class

Cheap, but standard retrieval takes 3–5 hours

Use “Bulk” retrieval (5–12 hours) for the lowest cost

Use “Expedited” retrieval (1–5 minutes) when fast access is needed

Life-cycle policies

Specify rules to transition objects across storage classes at a specified age, and eventually expire (delete) them.
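A lifecycle rule can be expressed as a plain configuration document; the dict below follows the shape boto3's `put_bucket_lifecycle_configuration` expects (bucket name, prefix, and day counts are illustrative):

```python
import json

# Transition to Standard-IA after 30 days, Glacier after 90, delete after a year.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Filter": {"Prefix": "logs/"},   # apply only to keys under logs/
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}

# To apply (requires boto3 and credentials):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=lifecycle)
print(json.dumps(lifecycle, indent=2))
```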

Versioning

Total data storage across all versions is billed

Once enabled, versioning cannot be disabled, only suspended (suspension affects future updates). To fully remove versioning you must delete and recreate the bucket. Each object version is identified by a unique version ID.

Deleting an object while versioning is on only adds a delete marker; delete the delete marker and the object comes back.
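Restoring a "deleted" object means finding the delete marker's version ID (via `list_object_versions`) and deleting it. A sketch that works on the response shape of that API (the values here are made up):

```python
from typing import Optional

def latest_delete_marker(versions_response: dict) -> Optional[str]:
    """Given a response shaped like list_object_versions output, return
    the VersionId of the delete marker flagged IsLatest, if any."""
    for marker in versions_response.get("DeleteMarkers", []):
        if marker.get("IsLatest"):
            return marker["VersionId"]
    return None

# Example response fragment (shape follows the S3 API; values are made up):
resp = {
    "DeleteMarkers": [
        {"Key": "report.csv", "VersionId": "dm-123", "IsLatest": True},
    ],
    "Versions": [
        {"Key": "report.csv", "VersionId": "v-001", "IsLatest": False},
    ],
}
vid = latest_delete_marker(resp)
# Removing the marker restores the object:
# s3.delete_object(Bucket="example-bucket", Key="report.csv", VersionId=vid)
print(vid)
```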

Access Control Lists

S3 ACLs are a legacy access control mechanism that predates IAM. However, if you already use S3 ACLs and find them sufficient, there is no need to change. As a general rule, AWS recommends using S3 bucket policies or IAM policies for access control.

An S3 ACL is a sub-resource that’s attached to every S3 bucket and object. It defines which AWS accounts or groups are granted access and the type of access. When you create a bucket or an object, Amazon S3 creates a default ACL that grants the resource owner full control over the resource.

Bucket policies

Use IAM policies if:

You need to control access to AWS services other than S3. IAM policies will be easier to manage since you can centrally manage all of your permissions in IAM, instead of spreading them between IAM and S3.

You have numerous S3 buckets each with different permissions requirements. IAM policies will be easier to manage since you don’t have to define a large number of S3 bucket policies and can instead rely on fewer, more detailed IAM policies.

You prefer to keep access control policies in the IAM environment.

Use S3 bucket policies if:

You want a simple way to grant cross-account access to your S3 environment, without using IAM roles.

Your IAM policies bump up against the size limit (2 KB for users, 5 KB for groups, and 10 KB for roles). S3 supports bucket policies of up to 20 KB.

You prefer to keep access control policies in the S3 environment.
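A minimal cross-account bucket policy, built as a plain dict and serialized to JSON (the account ID and bucket name are placeholders); this is the string you would pass to `put_bucket_policy`:

```python
import json

# Allow another AWS account to read objects from this bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# To apply (requires boto3 and credentials):
# boto3.client("s3").put_bucket_policy(
#     Bucket="example-bucket", Policy=json.dumps(policy))
print(json.dumps(policy, indent=2))
```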

Make it public

S3 is AWS's object storage service in the cloud. It stores key/value pairs: within a bucket, the object key (filename) is the key and the object's content is the value.

S3 is accessed through a global namespace, but each bucket must be created in a specific region

Encryption

Client side encryption

Server Side encryption

SSE-S3 using S3 managed Keys

SSE-KMS using KMS keys

SSE-C using client provided keys
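Each server-side encryption mode maps to extra request parameters on a PUT. The dicts below follow the parameter names boto3's `put_object` uses (the KMS key alias is a placeholder; for SSE-C the raw REST API wants the key and its MD5 base64-encoded, as sketched here):

```python
import base64
import hashlib
import os

# SSE-S3: S3 manages the keys.
sse_s3 = {"ServerSideEncryption": "AES256"}

# SSE-KMS: encrypt under a KMS key you control.
sse_kms = {"ServerSideEncryption": "aws:kms",
           "SSEKMSKeyId": "alias/my-app-key"}

# SSE-C: you supply a 256-bit key on every request; S3 never stores it.
customer_key = os.urandom(32)
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": base64.b64encode(customer_key).decode(),
    "SSECustomerKeyMD5": base64.b64encode(
        hashlib.md5(customer_key).digest()).decode(),
}

# s3.put_object(Bucket="example-bucket", Key="secret.txt",
#               Body=b"...", **sse_kms)
```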

Security

Control access to a bucket using bucket ACL or bucket policy

All buckets and objects are private by default

Two ways to stop people from accidentally deleting objects:

Enable versioning

Enable MFA delete
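Both protections are set through the bucket's versioning configuration. The dict below follows the shape boto3's `put_bucket_versioning` expects; note that MFA delete can only be toggled by the root account, passing the device serial and current code in the `MFA` argument (all values here are placeholders):

```python
# Enable versioning and MFA delete in one call.
versioning = {"Status": "Enabled", "MFADelete": "Enabled"}

# MFA argument format: "<device-serial-arn> <current-6-digit-code>"
mfa = "arn:aws:iam::111122223333:mfa/root-device 123456"

# s3.put_bucket_versioning(Bucket="example-bucket",
#                          VersioningConfiguration=versioning, MFA=mfa)
```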

Cross region replication

You need to first turn on versioning

Then go to Management and choose Cross-Region Replication

Create a rule to replicate all or some objects to a destination bucket.

You can specify a different storage class for the replication target bucket

Only new objects (not the existing ones) are replicated
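The same replication rule can be set programmatically; the dict below follows the shape boto3's `put_bucket_replication` expects. The IAM role lets S3 read the source and write the destination (all ARNs are placeholders):

```python
# Replicate every object to a bucket in another region,
# storing replicas in a cheaper class.
replication = {
    "Role": "arn:aws:iam::111122223333:role/s3-crr-role",
    "Rules": [{
        "ID": "replicate-everything",
        "Status": "Enabled",
        "Prefix": "",  # empty prefix = all objects
        "Destination": {
            "Bucket": "arn:aws:s3:::example-bucket-replica",
            "StorageClass": "STANDARD_IA",  # optional different class at target
        },
    }],
}

# s3.put_bucket_replication(Bucket="example-bucket",
#                           ReplicationConfiguration=replication)
```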

S3 transfer acceleration

Lets you upload files to a CloudFront edge location instead of directly to the S3 bucket; since the edge location is closer to you than the bucket's region, this reduces latency.
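Acceleration is a one-call bucket setting; after enabling it, transfers go through the bucket's accelerate endpoint. A sketch with the configuration shape boto3's `put_bucket_accelerate_configuration` expects (bucket name is illustrative):

```python
accelerate = {"Status": "Enabled"}
# s3.put_bucket_accelerate_configuration(
#     Bucket="example-bucket", AccelerateConfiguration=accelerate)

# Uploads then target the accelerate endpoint instead of the regional one:
bucket = "example-bucket"
accelerate_endpoint = f"https://{bucket}.s3-accelerate.amazonaws.com"
print(accelerate_endpoint)
```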

Static website hosting on S3

Create a bucket whose name matches your domain name (e.g. example.com)

Go to static website hosting and enable

Grant public read access

URL will be http://your-bucket-name.s3-website-REGION.amazonaws.com where region can be us-east-1 etc.

S3 is global but buckets reside in regions. Since bucket names are globally unique, the region does not need to appear in the ARN.
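The steps above reduce to one website configuration call plus the derived endpoint URL; the dict follows the shape boto3's `put_bucket_website` expects (bucket and region are illustrative):

```python
website = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}
# s3.put_bucket_website(Bucket="example.com", WebsiteConfiguration=website)

# The website endpoint includes the bucket's region:
bucket, region = "example.com", "us-east-1"
url = f"http://{bucket}.s3-website-{region}.amazonaws.com"
print(url)
```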

Requester Pays Option: Can be used to pass on request/transfer costs to another AWS account

Events:

The bucket owner (or others, as permitted by an IAM policy) can arrange for notifications to be issued to Amazon Simple Queue Service (SQS) or Amazon Simple Notification Service (SNS) when a new object is added to the bucket or an existing object is overwritten. Notifications can also be delivered to AWS Lambda for processing by a Lambda function.
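A notification rule is itself a small configuration document; the dict below follows the shape boto3's `put_bucket_notification_configuration` expects, firing a Lambda function on every object creation (the function ARN is a placeholder):

```python
notifications = {
    "LambdaFunctionConfigurations": [{
        "Id": "on-upload",
        "LambdaFunctionArn":
            "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
        "Events": ["s3:ObjectCreated:*"],  # new objects and overwrites
    }]
}

# s3.put_bucket_notification_configuration(
#     Bucket="example-bucket", NotificationConfiguration=notifications)
```

SQS and SNS targets use the analogous `QueueConfigurations` and `TopicConfigurations` lists.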

Every non-anonymous request to S3 must contain authentication information to establish the identity of the principal making the request. In REST, this is done by first putting the headers in a canonical format, then signing the headers using your AWS Secret Access Key.

You can also use pre-signed URLs to grant time-limited access without sharing credentials

Amazon S3 Select is a new (Apr 2018) capability

designed to pull out only the data you need from an object, which can dramatically improve the performance and reduce the cost of applications that need to access data in S3.

In the past, most applications had to retrieve the entire object and then filter out only the required data for further analysis.

Now S3 Select enables applications to offload the heavy lifting of filtering and accessing data inside objects to the Amazon S3 service.

By reducing the volume of data that has to be loaded and processed by your applications, S3 Select can improve the performance of most applications that frequently access data from S3 by up to 400%.

You can use S3 Select from the AWS SDK for Java, AWS SDK for Python, and AWS CLI.

Issue a SELECT request (the SelectObjectContent API) instead of a plain GET
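A SelectObjectContent request bundles the SQL expression with input/output serialization settings. The dict below follows the parameter names boto3's `select_object_content` uses (bucket, key, and query are illustrative):

```python
# Run SQL over one CSV object; only matching rows come back as JSON.
select_params = {
    "Bucket": "example-bucket",
    "Key": "sales/2018.csv",
    "ExpressionType": "SQL",
    "Expression": "SELECT s.region, s.total FROM S3Object s WHERE s.region = 'EU'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"JSON": {}},
}

# events = s3.select_object_content(**select_params)["Payload"]
# for event in events:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```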

Amazon Athena

is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL expressions.

Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL expressions.
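Programmatically, an Athena query is submitted with a query string, a database, and an S3 location for results. The dict below follows the parameter names of boto3's Athena `start_query_execution` (database, table, and bucket names are illustrative):

```python
query_params = {
    "QueryString": "SELECT region, SUM(total) FROM sales GROUP BY region",
    "QueryExecutionContext": {"Database": "analytics"},
    "ResultConfiguration": {
        "OutputLocation": "s3://example-bucket/athena-results/",
    },
}

# athena = boto3.client("athena")
# execution_id = athena.start_query_execution(**query_params)["QueryExecutionId"]
# Results land as CSV under the OutputLocation; poll get_query_execution
# until the state is SUCCEEDED, then read them back from S3.
```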