AWS Boto3 Example
Create IAM user
First, create an IAM user with programmatic access enabled and attach the managed policies that cover the services used in this walkthrough (for example, AmazonS3FullAccess and AmazonDynamoDBFullAccess).
Save your access key and secret key in a secure location, then create a file called credentials.cfg with the following contents:
[AWS]
KEY=<your-access-key-for-the-iam-user>
SECRET=<your-secret-key-for-the-iam-user>
import configparser

# Read the IAM user's credentials from credentials.cfg
config = configparser.ConfigParser()
config.read_file(open('credentials.cfg'))

KEY = config.get('AWS', 'KEY')
SECRET = config.get('AWS', 'SECRET')
import boto3

# Generate the boto3 client for interacting with S3
s3 = boto3.client('s3', region_name='us-east-1',
                  # Set up AWS credentials
                  aws_access_key_id=KEY,
                  aws_secret_access_key=SECRET)
S3 Buckets
S3 lets us put any file in the cloud and make it accessible anywhere in the world through a URL. Managing cloud storage is a key component of a data pipeline, and many services depend on an object being uploaded to S3. The main components of S3 are Buckets and Objects: Buckets are like directories on our desktop, and Objects are like the files in those folders. Just as directories can have permissions, buckets have policies. But there is a lot of power hidden underneath a Bucket (i.e., it is not merely a directory):
- Buckets have their own permission policies.
- Buckets can be configured to act as directories for a static website.
- Buckets can generate logs about their own activity and, in turn, store them in another bucket.
The most important thing that Buckets do is they contain Objects.
An object can be anything: a CSV file, a log file, an image, audio, video, etc. There are plenty of operations that we can perform on objects.
But for now, let’s focus on what we can do with Buckets with boto3:
- We can create buckets.
- We can list the buckets in our account.
- We can delete buckets.
Creating a bucket
Let’s say we wish to create a bucket named skuchkula-test-bucket:
bucket_name = 'skuchkula-test-bucket'
temp_bucket = s3.create_bucket(Bucket=bucket_name)
Running the above cell will create a new bucket. Navigate to your AWS S3 dashboard and you should see the newly created bucket.
List all the buckets
from pprint import pprint as pp
# List the buckets
buckets = s3.list_buckets()
# Print the buckets
pp(buckets)
{'Buckets': [{'CreationDate': datetime.datetime(2019, 9, 11, 18, 15, 33, tzinfo=tzutc()),
'Name': 'aws-emr-resources-506140549518-us-east-1'},
{'CreationDate': datetime.datetime(2019, 10, 2, 14, 54, 23, tzinfo=tzutc()),
'Name': 'aws-emr-resources-506140549518-us-west-2'},
{'CreationDate': datetime.datetime(2019, 9, 11, 15, 32, 18, tzinfo=tzutc()),
'Name': 'aws-logs-506140549518-us-east-1'},
{'CreationDate': datetime.datetime(2019, 10, 2, 14, 54, 22, tzinfo=tzutc()),
'Name': 'aws-logs-506140549518-us-west-2'},
{'CreationDate': datetime.datetime(2019, 8, 27, 17, 39, 52, tzinfo=tzutc()),
'Name': 'sagemaker-us-east-1-506140549518'},
{'CreationDate': datetime.datetime(2019, 2, 10, 16, 25, 36, tzinfo=tzutc()),
'Name': 'skuchkula'},
{'CreationDate': datetime.datetime(2019, 10, 3, 20, 53, 39, tzinfo=tzutc()),
'Name': 'skuchkula-sagemaker-airbnb'},
{'CreationDate': datetime.datetime(2019, 9, 29, 18, 20, 50, tzinfo=tzutc()),
'Name': 'skuchkula-topsongs'},
{'CreationDate': datetime.datetime(2019, 2, 18, 18, 59, 53, tzinfo=tzutc()),
'Name': 'skuchkula-websitebucket'},
{'CreationDate': datetime.datetime(2019, 8, 29, 11, 3, 30, tzinfo=tzutc()),
'Name': 'skuchkuladata'}],
'Owner': {'DisplayName': 'shravan.kuchkula',
'ID': 'ab87d89045475a22fccef1b80302f1e7d4e7f5d21c547b41d86cebe9827238b7'},
'ResponseMetadata': {'HTTPHeaders': {'content-type': 'application/xml',
'date': 'Mon, 07 Oct 2019 17:54:46 GMT',
'server': 'AmazonS3',
'transfer-encoding': 'chunked',
'x-amz-id-2': 'NCD7Kpsxj9sX0GTGESmCzrA2CeWQ0BwomdmFZrt+LTnmlNuPm8X5RSdoqLM3RHaMA0C74Uyzm9A=',
'x-amz-request-id': 'C08A5E44C5D1BC55'},
'HTTPStatusCode': 200,
'HostId': 'NCD7Kpsxj9sX0GTGESmCzrA2CeWQ0BwomdmFZrt+LTnmlNuPm8X5RSdoqLM3RHaMA0C74Uyzm9A=',
'RequestId': 'C08A5E44C5D1BC55',
'RetryAttempts': 0}}
When we invoke the s3.list_buckets() method, we get back the response shown above. The response is a dictionary. From this dictionary, we want the Buckets key, which holds a list of dictionaries; each dictionary in the list corresponds to a bucket in your account.
# List the buckets
buckets = s3.list_buckets()
type(buckets)
dict
type(buckets['Buckets'])
list
for bucket in buckets['Buckets']:
    print(bucket['Name'])
aws-emr-resources-506140549518-us-east-1
aws-emr-resources-506140549518-us-west-2
aws-logs-506140549518-us-east-1
aws-logs-506140549518-us-west-2
sagemaker-us-east-1-506140549518
skuchkula
skuchkula-sagemaker-airbnb
skuchkula-test-bucket
skuchkula-topsongs
skuchkula-websitebucket
skuchkuladata
We can see the bucket that we just created, skuchkula-test-bucket.
Delete the bucket
response = s3.delete_bucket(Bucket='skuchkula-test-bucket')
pp(response)
{'ResponseMetadata': {'HTTPHeaders': {'date': 'Mon, 07 Oct 2019 18:50:18 GMT',
'server': 'AmazonS3',
'x-amz-id-2': '/yYSICkU5GZXwiQAmLgXDryV4il9SD2t+zUx17+g1oHc8J4bGx/ctggrbsGV7GgXLF7IzGldH2w=',
'x-amz-request-id': '831CC0ED42A2BCD5'},
'HTTPStatusCode': 204,
'HostId': '/yYSICkU5GZXwiQAmLgXDryV4il9SD2t+zUx17+g1oHc8J4bGx/ctggrbsGV7GgXLF7IzGldH2w=',
'RequestId': '831CC0ED42A2BCD5',
'RetryAttempts': 0}}
Uploading and Retrieving files
It’s now time to put stuff into those buckets. Let’s first take a look at how Objects work. The files in S3 buckets are called Objects. Managing objects is a key component of many data pipelines. As mentioned earlier, Buckets and Objects are somewhat like Directories and Files on your local system.
We can perform operations on our Buckets and Objects using the s3 client object.
Upload an object into a bucket
Let’s upload an object into a bucket. We upload a file using the client’s upload_file() method.
It takes 3 kwargs:
- Filename is the local file path,
- Bucket is the name of the bucket we are uploading to,
- Key is what we want to name the object in S3.
We do not capture the result of this method in a variable because it doesn’t return anything. If there is an error, it will throw an exception.
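Here is a minimal sketch of an upload; the local file name, bucket name, and object key below are placeholders for illustration, not values from this walkthrough:

# Placeholder file name, bucket, and key -- substitute your own
s3.upload_file(Filename='report.csv',
               Bucket='my-example-bucket',
               Key='data/report.csv')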
Listing objects in a bucket
Similar to listing buckets, we can list the objects inside a bucket using list_objects(). Besides the bucket name, it takes a few optional parameters:
- Bucket is the name of the bucket the objects belong to.
- MaxKeys limits the response to n objects. By default, S3 returns up to 1000 objects in the bucket.
- Prefix is another way to limit the response: only keys that start with the prefix are returned.
The response dictionary contains a Contents key, which holds a list of object dictionaries and their info. Each of these object dictionaries is identified by its Key.
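A minimal sketch of listing objects, again with placeholder bucket and prefix names:

# List up to 10 objects whose keys start with 'data/' (placeholder names)
response = s3.list_objects(Bucket='my-example-bucket',
                           Prefix='data/',
                           MaxKeys=10)
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])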
Checking object info
If instead we would like to know information about a single object, like its size, we can use the client’s head_object() method. This takes:
- Bucket is the bucket name
- Key is the object key
In this case, since we are only working with 1 object, the response will not contain a Contents dictionary. The object’s metadata is directly in the response dictionary.
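For example, a quick sketch (placeholder bucket and key):

# Fetch metadata for a single object (placeholder names)
response = s3.head_object(Bucket='my-example-bucket',
                          Key='data/report.csv')
print(response['ContentLength'], response['LastModified'])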
Download a file
To download a file, we use the client’s download_file() method. (Notice how we say “download a file” and not “download an object”; this is consistent with the upload_file() method.) It takes the same 3 kwargs that upload_file() takes:
- Filename is the local path that the file should be downloaded to,
- Bucket is the name of the bucket we are downloading from,
- Key is the name of the object in S3 that we want to download.
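A minimal sketch of a download, with placeholder names:

# Download the object into a local file (placeholder names)
s3.download_file(Bucket='my-example-bucket',
                 Key='data/report.csv',
                 Filename='report_copy.csv')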
Delete an object
To delete an object, you can use the client’s delete_object() method. This takes 2 kwargs:
- Bucket is the name of the bucket,
- Key is the name of the object in S3.
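For example (placeholder names again):

# Delete a single object from the bucket (placeholder names)
s3.delete_object(Bucket='my-example-bucket',
                 Key='data/report.csv')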
Using the DynamoDB API
# Generate the boto3 client for interacting with DynamoDB
dynamodb = boto3.client('dynamodb', region_name='us-east-1',
                        # Set up AWS credentials
                        aws_access_key_id=KEY,
                        aws_secret_access_key=SECRET)
dir(dynamodb)
['_PY_TO_OP_NAME',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattr__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_cache',
'_client_config',
'_convert_to_request_dict',
'_emit_api_params',
'_endpoint',
'_exceptions',
'_exceptions_factory',
'_get_waiter_config',
'_load_exceptions',
'_loader',
'_make_api_call',
'_make_request',
'_register_handlers',
'_request_signer',
'_response_parser',
'_serializer',
'_service_model',
'batch_get_item',
'batch_write_item',
'can_paginate',
'create_backup',
'create_global_table',
'create_table',
'delete_backup',
'delete_item',
'delete_table',
'describe_backup',
'describe_continuous_backups',
'describe_contributor_insights',
'describe_endpoints',
'describe_global_table',
'describe_global_table_settings',
'describe_limits',
'describe_table',
'describe_table_replica_auto_scaling',
'describe_time_to_live',
'exceptions',
'generate_presigned_url',
'get_item',
'get_paginator',
'get_waiter',
'list_backups',
'list_contributor_insights',
'list_global_tables',
'list_tables',
'list_tags_of_resource',
'meta',
'put_item',
'query',
'restore_table_from_backup',
'restore_table_to_point_in_time',
'scan',
'tag_resource',
'transact_get_items',
'transact_write_items',
'untag_resource',
'update_continuous_backups',
'update_contributor_insights',
'update_global_table',
'update_global_table_settings',
'update_item',
'update_table',
'update_table_replica_auto_scaling',
'update_time_to_live',
'waiter_names']
dynamodb.list_tables()
{'TableNames': ['Forum'],
'ResponseMetadata': {'RequestId': 'Q81TU25QMPJUB548NBDPHD8SBJVV4KQNSO5AEMVJF66Q9ASUAAJG',
'HTTPStatusCode': 200,
'HTTPHeaders': {'server': 'Server',
'date': 'Sat, 16 May 2020 23:23:17 GMT',
'content-type': 'application/x-amz-json-1.0',
'content-length': '24',
'connection': 'keep-alive',
'x-amzn-requestid': 'Q81TU25QMPJUB548NBDPHD8SBJVV4KQNSO5AEMVJF66Q9ASUAAJG',
'x-amz-crc32': '274869842'},
'RetryAttempts': 0}}
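The same client can write and read items. Below is a minimal sketch against the Forum table listed above; the item attributes are placeholders, and it assumes Name is the table’s partition key, which may differ in your account:

# Write an item using DynamoDB's typed attribute-value format (placeholder attributes)
dynamodb.put_item(TableName='Forum',
                  Item={'Name': {'S': 'Amazon S3'},
                        'Category': {'S': 'AWS'}})

# Read the item back by its key (assumes 'Name' is the partition key)
response = dynamodb.get_item(TableName='Forum',
                             Key={'Name': {'S': 'Amazon S3'}})
print(response.get('Item'))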