AWS Boto3 Example

Create IAM user

First, create an IAM user with programmatic access enabled and attach the managed policies needed for this walkthrough (S3 and DynamoDB access).

Save your access key and secret key in a secure location, then create a file called credentials.cfg with the following contents:

[AWS]
KEY=<your-access-key-for-the-iam-user>
SECRET=<your-secret-key-for-the-iam-user>

Read these credentials in Python with configparser:

import configparser
config = configparser.ConfigParser()
config.read_file(open('credentials.cfg'))

KEY = config.get('AWS','KEY')
SECRET = config.get('AWS','SECRET')

Now import boto3 and create the S3 client:

import boto3

# Generate the boto3 client for interacting with S3
s3 = boto3.client('s3', region_name='us-east-1',
                  # Set up AWS credentials
                  aws_access_key_id=KEY,
                  aws_secret_access_key=SECRET)

S3 Buckets

S3 lets us put any file in the cloud and make it accessible anywhere in the world through a URL. Managing cloud storage is a key component of a data pipeline, and many services depend on objects being uploaded to S3. The main components of S3 are Buckets and Objects. Buckets are like directories on our desktop and Objects are like files in those directories (just as directories have permissions, buckets have policies). But there is a lot of power hidden underneath a Bucket; it is not merely a directory:

  • Buckets have their own permission policies.
  • Buckets can be configured to act as directories for a static website.
  • Buckets can generate logs about their own activity and, in turn, store them in another bucket.

The most important thing that Buckets do, though, is contain Objects.

An object can be anything: a CSV file, a log file, an image, audio, video, etc. There are plenty of operations we can perform on objects.

But for now, let’s focus on what we can do with Buckets using boto3:

  • We can create buckets.
  • We can list the buckets in our account.
  • We can delete buckets.

Creating a bucket

Let’s say we wish to create a bucket named skuchkula-test-bucket:

bucket_name = 'skuchkula-test-bucket'
# Create the bucket (no CreateBucketConfiguration is needed in us-east-1)
temp_bucket = s3.create_bucket(Bucket=bucket_name)

Running the above cell will create a new bucket. Navigate to your AWS S3 console and you should see the newly created bucket.

List all the buckets

from pprint import pprint as pp

# List the buckets
buckets = s3.list_buckets()

# Print the buckets
pp(buckets)
{'Buckets': [{'CreationDate': datetime.datetime(2019, 9, 11, 18, 15, 33, tzinfo=tzutc()),
              'Name': 'aws-emr-resources-506140549518-us-east-1'},
             {'CreationDate': datetime.datetime(2019, 10, 2, 14, 54, 23, tzinfo=tzutc()),
              'Name': 'aws-emr-resources-506140549518-us-west-2'},
             {'CreationDate': datetime.datetime(2019, 9, 11, 15, 32, 18, tzinfo=tzutc()),
              'Name': 'aws-logs-506140549518-us-east-1'},
             {'CreationDate': datetime.datetime(2019, 10, 2, 14, 54, 22, tzinfo=tzutc()),
              'Name': 'aws-logs-506140549518-us-west-2'},
             {'CreationDate': datetime.datetime(2019, 8, 27, 17, 39, 52, tzinfo=tzutc()),
              'Name': 'sagemaker-us-east-1-506140549518'},
             {'CreationDate': datetime.datetime(2019, 2, 10, 16, 25, 36, tzinfo=tzutc()),
              'Name': 'skuchkula'},
             {'CreationDate': datetime.datetime(2019, 10, 3, 20, 53, 39, tzinfo=tzutc()),
              'Name': 'skuchkula-sagemaker-airbnb'},
             {'CreationDate': datetime.datetime(2019, 9, 29, 18, 20, 50, tzinfo=tzutc()),
              'Name': 'skuchkula-topsongs'},
             {'CreationDate': datetime.datetime(2019, 2, 18, 18, 59, 53, tzinfo=tzutc()),
              'Name': 'skuchkula-websitebucket'},
             {'CreationDate': datetime.datetime(2019, 8, 29, 11, 3, 30, tzinfo=tzutc()),
              'Name': 'skuchkuladata'}],
 'Owner': {'DisplayName': 'shravan.kuchkula',
           'ID': 'ab87d89045475a22fccef1b80302f1e7d4e7f5d21c547b41d86cebe9827238b7'},
 'ResponseMetadata': {'HTTPHeaders': {'content-type': 'application/xml',
                                      'date': 'Mon, 07 Oct 2019 17:54:46 GMT',
                                      'server': 'AmazonS3',
                                      'transfer-encoding': 'chunked',
                                      'x-amz-id-2': 'NCD7Kpsxj9sX0GTGESmCzrA2CeWQ0BwomdmFZrt+LTnmlNuPm8X5RSdoqLM3RHaMA0C74Uyzm9A=',
                                      'x-amz-request-id': 'C08A5E44C5D1BC55'},
                      'HTTPStatusCode': 200,
                      'HostId': 'NCD7Kpsxj9sX0GTGESmCzrA2CeWQ0BwomdmFZrt+LTnmlNuPm8X5RSdoqLM3RHaMA0C74Uyzm9A=',
                      'RequestId': 'C08A5E44C5D1BC55',
                      'RetryAttempts': 0}}

When we invoke the s3.list_buckets() method, we get back the response shown above, which is a dictionary. From this dictionary we want the Buckets key, which holds a list of dictionaries, one per Bucket in your account.

# List the buckets
buckets = s3.list_buckets()
type(buckets)
dict
type(buckets['Buckets'])
list
for bucket in buckets['Buckets']:
    print(bucket['Name'])
aws-emr-resources-506140549518-us-east-1
aws-emr-resources-506140549518-us-west-2
aws-logs-506140549518-us-east-1
aws-logs-506140549518-us-west-2
sagemaker-us-east-1-506140549518
skuchkula
skuchkula-sagemaker-airbnb
skuchkula-test-bucket
skuchkula-topsongs
skuchkula-websitebucket
skuchkuladata

We can see the bucket that we just created: skuchkula-test-bucket.

Delete the bucket

# A bucket must be empty before it can be deleted
response = s3.delete_bucket(Bucket='skuchkula-test-bucket')
pp(response)
{'ResponseMetadata': {'HTTPHeaders': {'date': 'Mon, 07 Oct 2019 18:50:18 GMT',
                                      'server': 'AmazonS3',
                                      'x-amz-id-2': '/yYSICkU5GZXwiQAmLgXDryV4il9SD2t+zUx17+g1oHc8J4bGx/ctggrbsGV7GgXLF7IzGldH2w=',
                                      'x-amz-request-id': '831CC0ED42A2BCD5'},
                      'HTTPStatusCode': 204,
                      'HostId': '/yYSICkU5GZXwiQAmLgXDryV4il9SD2t+zUx17+g1oHc8J4bGx/ctggrbsGV7GgXLF7IzGldH2w=',
                      'RequestId': '831CC0ED42A2BCD5',
                      'RetryAttempts': 0}}

Uploading and Retrieving files

It’s now time to put stuff into those buckets. Let’s first take a look at how Objects work. The files in S3 buckets are called Objects. Managing objects is a key component of many data pipelines. As mentioned earlier, Buckets and Objects are somewhat like Directories and Files on your local system.

We can perform operations on our Buckets and Objects using the s3 client object.

Upload an object into a bucket

To upload an object into a bucket, we use the client’s upload_file() method. It takes three kwargs:

  • Filename is the local file path,
  • Bucket is the name of the bucket we are uploading to,
  • Key is what we want to name the object in S3.

We don’t capture the result of this method in a variable because it doesn’t return anything; if something goes wrong, it raises an exception.
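
For instance, a minimal sketch of an upload (the local file name here is a placeholder, and the bucket is assumed to exist):

# Upload a local CSV file; the object will be named 'top_songs.csv' in S3
s3.upload_file(Filename='top_songs.csv',
               Bucket='skuchkula-test-bucket',
               Key='top_songs.csv')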

Listing objects in a bucket

Similar to listing buckets, we can list the objects inside a bucket using list_objects(). In addition to the bucket name, it takes a few optional parameters:

  • Bucket is the name of the bucket the object belongs to.
  • MaxKeys limits the response to the first n objects; by default, S3 returns up to 1000 objects.
  • Prefix is another way to limit the response: only objects whose keys start with the given prefix are returned.

The response dictionary contains a Contents key, which holds a list of dictionaries, one per object, each including the object’s Key and other metadata.
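
A minimal sketch of listing objects (the bucket name and prefix are placeholders):

# List up to 10 objects whose keys start with 'top_'
response = s3.list_objects(Bucket='skuchkula-test-bucket',
                           MaxKeys=10,
                           Prefix='top_')

# Contents is only present when at least one object matches
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])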

Checking object info

If instead we would like information about a single object, like its size, we can use the client’s head_object() method. This takes:

  • Bucket is the bucket name
  • Key is the object key

In this case, since we are working with a single object, the response does not contain a Contents list; the object’s metadata is directly in the response dictionary.
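
A minimal sketch (the bucket and key are placeholders):

# Fetch metadata for a single object
response = s3.head_object(Bucket='skuchkula-test-bucket',
                          Key='top_songs.csv')

# Size in bytes and last-modified timestamp come back directly in the response
print(response['ContentLength'], response['LastModified'])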

Download a file

To download a file, we use the client’s download_file() method. (Notice how we say “download a file” and not “download an object”; this is consistent with the upload_file() method.) It takes the same three kwargs that upload_file() takes, sketched after the list:

  • Filename is the local path to save the downloaded file to,
  • Bucket is the name of the bucket we are downloading from,
  • Key is the name of the object in S3 that we want to download.
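
A minimal sketch of a download (the file, bucket, and key names are placeholders):

# Download the object and save it locally as 'top_songs_local.csv'
s3.download_file(Filename='top_songs_local.csv',
                 Bucket='skuchkula-test-bucket',
                 Key='top_songs.csv')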

Delete an object

To delete an object, use the client’s delete_object() method. This takes two kwargs, sketched below:

  • Bucket is the name of the bucket,
  • Key is the name of the object in S3.
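
A minimal sketch (the bucket and key are placeholders):

# Delete a single object from the bucket
s3.delete_object(Bucket='skuchkula-test-bucket',
                 Key='top_songs.csv')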

Using DynamoDB API

The same pattern works for other AWS services. Create a DynamoDB client and inspect the operations it exposes with dir():

# Generate the boto3 client for interacting with DynamoDB
dynamodb = boto3.client('dynamodb', region_name='us-east-1',
                        # Set up AWS credentials
                        aws_access_key_id=KEY,
                        aws_secret_access_key=SECRET)
dir(dynamodb)
['_PY_TO_OP_NAME',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_cache',
 '_client_config',
 '_convert_to_request_dict',
 '_emit_api_params',
 '_endpoint',
 '_exceptions',
 '_exceptions_factory',
 '_get_waiter_config',
 '_load_exceptions',
 '_loader',
 '_make_api_call',
 '_make_request',
 '_register_handlers',
 '_request_signer',
 '_response_parser',
 '_serializer',
 '_service_model',
 'batch_get_item',
 'batch_write_item',
 'can_paginate',
 'create_backup',
 'create_global_table',
 'create_table',
 'delete_backup',
 'delete_item',
 'delete_table',
 'describe_backup',
 'describe_continuous_backups',
 'describe_contributor_insights',
 'describe_endpoints',
 'describe_global_table',
 'describe_global_table_settings',
 'describe_limits',
 'describe_table',
 'describe_table_replica_auto_scaling',
 'describe_time_to_live',
 'exceptions',
 'generate_presigned_url',
 'get_item',
 'get_paginator',
 'get_waiter',
 'list_backups',
 'list_contributor_insights',
 'list_global_tables',
 'list_tables',
 'list_tags_of_resource',
 'meta',
 'put_item',
 'query',
 'restore_table_from_backup',
 'restore_table_to_point_in_time',
 'scan',
 'tag_resource',
 'transact_get_items',
 'transact_write_items',
 'untag_resource',
 'update_continuous_backups',
 'update_contributor_insights',
 'update_global_table',
 'update_global_table_settings',
 'update_item',
 'update_table',
 'update_table_replica_auto_scaling',
 'update_time_to_live',
 'waiter_names']

We can, for example, list the DynamoDB tables in the account:

dynamodb.list_tables()
{'TableNames': ['Forum'],
 'ResponseMetadata': {'RequestId': 'Q81TU25QMPJUB548NBDPHD8SBJVV4KQNSO5AEMVJF66Q9ASUAAJG',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'server': 'Server',
   'date': 'Sat, 16 May 2020 23:23:17 GMT',
   'content-type': 'application/x-amz-json-1.0',
   'content-length': '24',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'Q81TU25QMPJUB548NBDPHD8SBJVV4KQNSO5AEMVJF66Q9ASUAAJG',
   'x-amz-crc32': '274869842'},
  'RetryAttempts': 0}}
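
The rest of the client’s methods work the same way. For example, a minimal sketch of inspecting the Forum table returned by list_tables():

# Describe the table: key schema, item count, size, status, etc.
table_info = dynamodb.describe_table(TableName='Forum')
print(table_info['Table']['TableStatus'], table_info['Table']['ItemCount'])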
