Amazon EC2 Container Service
To understand Amazon EC2 Container Service (AWS ECS), I will approach it from an architectural context, then a computational context, and finally a software development life cycle context.
In terms of software architecture, the industry has been shifting away from developing and deploying applications as a single, large, tightly coupled monolith and toward applications developed and deployed as many small, loosely coupled services, or microservices.
This shift has been accelerated by container technologies, such as Docker and the rich ecosystem around it.
For quite some time, many people thought that the only companies that really needed to worry about massively scaling web applications were companies like Amazon, Google, and Netflix. That is no longer true. People now expect the experience they get from Amazon or Google from all their web applications.
The lesson that pioneers like Amazon learned was that monolithic applications couldn't scale beyond a certain point. Amazon was among the first companies to embrace a service-oriented architecture, which has allowed it to grow to massive scale. In terms of infrastructure architecture, service-oriented architectures such as Amazon Web Services have traditionally been built on top of virtual machines, using technologies like Xen, VMware, and Microsoft Hyper-V. Container technology improved over the years and finally took off in a big way when Docker 1.0 was released in 2014.
Enter Docker: the current trend is to move toward container technology, Docker in particular, especially for microservices-based workloads. Let's briefly define microservices here so that you have the context going into the rest of the course. The most concise definition of a microservice I know of is:
A loosely coupled service-oriented architecture with bounded contexts.
In other words, each microservice does one thing well and doesn't need to know much about the rest of the services. Microservices are accessed through clearly defined APIs, so they can be mixed and remixed into various applications as needed, and they're managed and updated independently.
I think it's worth clearly stating Docker's value proposition as we transition toward the computational and software development life cycle contexts. Docker provides a highly portable encapsulation of a specific software application, plus just the right amount of supporting infrastructure, making it fast to load and cheap to run across a variety of environments, from a developer's laptop to an Amazon EC2 instance. When Docker was first released, people asked, "Aren't there already good options for this value proposition?" There are two ends of the spectrum. On one end, you had virtual machines.
Virtual machines don't boot nearly as fast as Docker containers and generally take far more space. Virtual machines do provide portability, but at the cost of encapsulating an entire operating system and kernel. On the other end of the spectrum, you have application packages, such as JAR files. Application packages are fast and small, but they require an underlying technology (in the case of JAR files, a Java runtime) along with all the dependencies needed to run the application, either on the classpath or shipped with the application. In any case, this underspecified requirement on the underlying technology leaves room for error: the classic "it runs on my development laptop, why can't you ops guys get it to run in production?"
In the middle, there’s the sweet spot that Docker fills. Docker is fast and small, and Docker ensures that portability is not an issue.
The Docker image that is built and run on the developer's laptop is the same image that is run in production. On top of all of that, it’s well-engineered for productive use cases right out of the box, with a great set of Docker images available on Docker Hub.
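To make this idea concrete, here is a minimal, hypothetical Dockerfile for a small Python web service. The base image, file names, and port are illustrative assumptions, not something from this course:

```dockerfile
# Illustrative Dockerfile: the image encapsulates the application
# plus just enough supporting infrastructure (a Python runtime).
FROM python:3.11-slim

WORKDIR /app

# Install the application's dependencies inside the image,
# so they travel with it from laptop to EC2 instance.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Document the port the service listens on.
EXPOSE 8080

CMD ["python", "app.py"]
```

The same image built from this file on a laptop can be pushed to a registry and run unchanged on an EC2 instance, which is exactly the portability being described here.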
The next context that I'd like to explore is the computational context. At a high level, the role of EC2 Container Service, and of container orchestration services in general, is to efficiently manage a cluster, which is made up of compute nodes.
Container orchestration services have three main roles: service management, scheduling, and resource management.
- Service management focuses on availability, life cycle, and discovery, so at a high level: making sure the service is running properly and is accessible to its users.
- Scheduling focuses on job placement and scaling: scaling up during peak load and scaling down when there isn't much work to do. For new software deployments, a scheduler can also manage how much of the cluster gets the new version of the application, so that it can be tested before being rolled out to the whole cluster.
- The third role of a container orchestration service is to manage resources, for example memory, CPU, and network ports. At the lowest level, work gets done on physical hardware; the orchestration service abstracts away exactly how these low-level resources are managed, so developers and operations engineers don't have to worry about the specifics.
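In ECS, these resources are declared in a task definition. A minimal sketch of one, with illustrative names, images, and values, might look like this (cpu is in CPU units, memory in MiB):

```json
{
  "family": "web-service",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "myrepo/web-service:1.0",
      "cpu": 256,
      "memory": 512,
      "portMappings": [
        { "containerPort": 8080, "hostPort": 80 }
      ],
      "essential": true
    }
  ]
}
```

Given a definition like this, the scheduler places the task only on a cluster node with enough unreserved CPU units and memory and a free host port, which is the resource management role in action.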
As compute clusters have evolved from homogeneous jobs and scheduling algorithms to a world of heterogeneous jobs and scheduling algorithms, a few leading strategies for scheduling jobs have emerged:
- monolithic, exemplified by Kubernetes,
- two-level pessimistic concurrency, exemplified by Marathon and Apache Mesos,
- and shared-state optimistic concurrency, exemplified by EC2 Container Service.
Software Development Context
The last context worth considering is the software development life cycle context. The concept of the same code running on a developer's laptop and in production was alluded to when we talked about Docker's value proposition. To make this more explicit, let's take a quick look at how application development happened before Docker and how Docker supports a much more robust software development life cycle. Without Docker, a developer writes and tests code on his or her laptop, in an environment that is highly customized to the developer's needs and is constantly being tweaked on a per-project basis.
The storage, network, and security requirements can vary widely from project to project, and dependencies easily get lumped into the same development environment, leaving the developer to manually keep track of the settings and dependencies that must ship with the application code into production. The end result is a developer throwing the code over the wall to the ops guys and insisting that since it works on the development laptop it should work the same in production, even if a dependency was never committed.
With Docker, the developer packages his or her code into a Docker image, which includes the application's dependencies and bakes the storage, network, and security requirements right into the configuration of the image. This packaged image is shipped to operations, so that the image that was run and tested on the developer's laptop is the same image run in production, while still allowing production configuration differences to be managed by operations engineers.
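As a sketch of that hand-off, using a hypothetical image name and registry (both are illustrative assumptions), the workflow might look like this:

```
# Developer: build and test the image locally.
docker build -t web-service:1.0 .
docker run -p 8080:8080 web-service:1.0

# Tag and push the exact same image to a shared registry.
docker tag web-service:1.0 registry.example.com/web-service:1.0
docker push registry.example.com/web-service:1.0

# Operations: pull and run the identical image in production,
# supplying production-specific configuration at run time.
docker run -p 80:8080 -e ENV=production \
  registry.example.com/web-service:1.0
```

The key point is that the artifact never changes between environments; only the runtime configuration (ports, environment variables) differs, and that stays in the hands of operations.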
What a difference in development, testing, and deployment Docker makes. So let’s briefly summarize what was covered in this lecture.
First, we covered the architectural context, which traced the evolution from service-oriented architecture to microservices and from virtual machines to containers. In this context, we also covered Docker's value proposition.
Next, we covered the computational context, which focused on container orchestration: service management, scheduling, and resource management.
Finally, we covered the software development life cycle context by looking at a typical scenario without and with Docker. This concludes the introductory concepts around Docker and Amazon EC2 Container Service. In the next lecture, we'll cover a hands-on walkthrough of EC2 Container Service.