Why do we need AWS X-Ray?
To understand why we would consider deploying AWS X-Ray, we will step back and analyze some of the current trends in software patterns and infrastructure architectures, each of which provides a motivation for using a service like AWS X-Ray.
Lately there has been a strong progression away from monolithic architectures towards microservices. Software applications built as microservices have multiplied rapidly in recent years, and the trend shows no signs of slowing down. A microservices architecture is a software design pattern in which a large application is decomposed into many smaller services, each with its own simple and discrete objective. The individual microservices are designed to collaborate and communicate over a network, typically using a lightweight mechanism such as HTTP/REST with JSON. Some of this communication will be done asynchronously, which is a challenge in its own right.
Coupled with the emergence and normalization of cloud platforms such as AWS, we are now seeing applications become more and more distributed and dynamic at the infrastructure layer. A single application nowadays could well be composed of hundreds of servers deployed across multiple regions, multiple physical facilities, multiple VPCs, and multiple availability zones. These architectures are highly elastic and dynamic, adjusting to application demand in real time.
The following quote provided by Jeff Barr from AWS succinctly articulates the problem space.
AWS X-Ray is a new service that addresses the problems mentioned above.
With AWS X-Ray, you can extract operational insights across distributed systems running at scale. AWS X-Ray provides visibility into the pathways and performance of your deployed applications by providing traces of requests as they are routed through the different service touch points. It can be used not only to monitor the performance of a request but also to identify bottlenecks and errors. X-Ray is a managed service that lets you dissect an application down to its individual components.
Let’s now highlight some quick wins that AWS X-Ray provides.
It allows you to visualize complex and detailed service relationships within highly distributed applications. You can trace message pathways and call stacks at scale. It supports grouping and filtering features enabling you to find performance bottlenecks and hotspots. And finally, it allows you to drill down and pinpoint service exceptions and errors.
We now start to dive a bit deeper into how AWS X-Ray is deployed and configured within an environment and how the telemetry or trace data is collected and published to the AWS X-Ray service.
As you can see from the diagram, message trace data is collected from within your application and from each and every other application service that the message flows through.
Your application code base needs to be instrumented with the appropriate AWS X-Ray SDK. This implements trace data extraction and correlation. The corresponding trace data is batched and published over the internet to the AWS X-Ray service for aggregation and analysis. And from here, you can start to visualize the collected telemetry by viewing and navigating it within the AWS X-Ray service console.
The AWS X-Ray service is made up of the following key components.
- X-Ray SDK, the AWS X-Ray SDK is used to instrument your application code.
- X-Ray daemon, the AWS X-Ray daemon collects all the local trace data, batches it up, and periodically sends it over the internet to the AWS X-Ray service. By default, the daemon listens on port 2000 for UDP traffic.
- X-Ray API, the AWS X-Ray service has an API endpoint which is used to receive the collected telemetry as delivered from the AWS X-Ray daemon.
- Other clients, the AWS X-Ray service can be integrated with using either the AWS SDK, the AWS CLI, or other third-party clients.
- X-Ray console, the AWS X-Ray service console is where all the visualization magic comes together. You can log in to the AWS X-Ray service console and navigate all your collected traces.
AWS X-Ray creates a map of services used by your application with trace data that you can use to drill into specific services or issues.
This provides a view of the connections between services and your application, along with aggregated data for each service, including average latency and failure rates. Within the service map, the health of each node is represented by color, based on the ratio of successful calls to errors and faults. Green equals successful calls, red equals server faults, yellow equals client errors, and purple represents throttling errors.
The trace list in the AWS X-Ray console is used to find traces by URL, response code, or other data from the trace summary. The trace window allows you to analyze all the activity in your distributed application as it runs. It’s a good place to start investigating perceived performance issues in your application.
AWS X-Ray currently supports the following three SDKs, each individually available from its respective online package repository: Java from Maven, Node.js from npm, and .NET from NuGet. Expect to see AWS introduce further language-specific X-Ray SDKs over time.
AWS X-Ray daemon
The AWS X-Ray daemon is an application that listens for traffic on UDP port 2000. It gathers the raw segment data, batches it up, and periodically forwards it to the AWS X-Ray API.
By default, the X-Ray daemon listens on port 2000, but this can be reconfigured. From a security point of view, the X-Ray daemon needs to be set up with permissions that authorize it to publish trace data to the AWS X-Ray API. The credentials that the X-Ray daemon uses at run time can be established either by supplying an IAM role when the daemon is deployed on an EC2 instance, or by setting environment variables for the AWS access key ID and AWS secret access key when the daemon is deployed in a non-EC2 environment. The AWS X-Ray daemon is currently available for Linux, OS X, and Windows operating systems.
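To make the daemon's role a little more concrete, here is a minimal Python sketch of what an SDK does on your behalf when it hands a segment to a local daemon. It assumes the daemon is running on 127.0.0.1 with the default UDP port 2000, and that each datagram is a small JSON header line followed by the segment document. The helper names `build_datagram` and `send_segment` are made up for this illustration; in practice the X-Ray SDK does this for you.

```python
import json
import socket

# Header line that precedes each JSON segment document sent to the daemon.
DAEMON_HEADER = json.dumps({"format": "json", "version": 1})


def build_datagram(segment: dict) -> bytes:
    """Serialize a segment document as: header line, newline, segment JSON."""
    return (DAEMON_HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment: dict, host: str = "127.0.0.1", port: int = 2000) -> int:
    """Send one segment to a local daemon over UDP and return the bytes sent.

    UDP is fire-and-forget: the application does not wait for the daemon,
    which is why tracing adds very little latency to the request path.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(build_datagram(segment), (host, port))
```

The daemon, not your application, then takes care of batching and of signing the calls to the X-Ray API with the credentials described above.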
Becoming fluent in instrumenting your applications and optimizing your ability to monitor and troubleshoot applications through filtering requires a good appreciation of the fundamental concepts that AWS X-Ray uses. We will now briefly go through each of these concepts individually.
Segments are central to the way the AWS X-Ray service goes about collecting and constructing traces. Your application sends data about the work that it does in the form of segments. A segment provides the resource name, details about the request and response, and details about the work done. For example, when an HTTP request arrives at your application, the following segment data may be recorded.
- The host, including host name, alias and/or IP address.
- The request, including HTTP method, client address, path, and/or user agent.
- The response, including status and content.
- The work done, including start and end times and other called upon subsegments.
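As a rough illustration of the data listed above, the following Python sketch builds a segment document for a single HTTP request. The field names follow the X-Ray segment document format as we understand it (`name`, `id`, `trace_id`, `start_time`, `end_time`, `http`). The service name, URL, IP address, and trace ID are sample values chosen for this example, and `http_segment` is a hypothetical helper, not part of any SDK.

```python
import json
import os
import time


def http_segment(name, trace_id, request, response, start_time, end_time):
    """Build a segment document for one HTTP request served by `name`."""
    return {
        "name": name,                 # logical name of the service or host
        "id": os.urandom(8).hex(),    # 64-bit segment ID as 16 hex digits
        "trace_id": trace_id,         # shared by every segment in the trace
        "start_time": start_time,     # epoch seconds, fractional
        "end_time": end_time,
        "http": {"request": request, "response": response},
    }


started = time.time()
segment = http_segment(
    name="www.example.com",                          # sample value
    trace_id="1-581cf771-a006649127e371903a2de979",  # sample value
    request={
        "method": "GET",
        "url": "https://www.example.com/health",
        "client_ip": "203.0.113.10",
        "user_agent": "curl/8.0",
    },
    response={"status": 200, "content_length": 86},
    start_time=started,
    end_time=started + 0.023,
)
print(json.dumps(segment, indent=2))
```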
Subsegments are used to record downstream calls from the point of view of the service that makes them.
X-Ray uses subsegments to identify downstream services that don’t send segments and to create entries for them on the service graph.
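A subsegment can be pictured as a nested record inside the calling service's segment. The sketch below is an assumption-laden illustration: it marks a downstream HTTP call with `"namespace": "remote"`, which is how a non-AWS downstream call is distinguished in the segment document format as we understand it. The helper names and the calculator URL are hypothetical.

```python
import os


def remote_subsegment(name, method, url, status, start_time, end_time):
    """Describe a downstream HTTP call from the caller's point of view."""
    return {
        "id": os.urandom(8).hex(),   # 16 hex digits, like a segment ID
        "name": name,                # name of the downstream service
        "namespace": "remote",       # a call to a service outside the AWS SDK
        "start_time": start_time,
        "end_time": end_time,
        "http": {
            "request": {"method": method, "url": url},
            "response": {"status": status},
        },
    }


def add_subsegment(segment: dict, subsegment: dict) -> dict:
    """Attach a subsegment to its parent segment and return the parent."""
    segment.setdefault("subsegments", []).append(subsegment)
    return segment
```

Even if the downstream service is not instrumented and never sends a segment of its own, the caller's subsegment still gives X-Ray enough information to draw a node for it.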
To ensure that your tracing remains efficient while still providing a representative sample of the requests that flow through your application, AWS X-Ray allows you to configure and customize a sampling rate, which then governs the rate at which trace data is collected. This keeps your application performant and your tracing costs under control. In the example provided here, you can see that we have configured two service-specific rules and one default rule.
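The sketch below shows one way such a rule set could behave. It is a simplified model, not the SDK's implementation: the rule layout mirrors the local sampling rules document as we understand it (`fixed_target` requests per second are always traced, then `rate` is applied to the remainder), while the service names, paths, and numbers are invented for illustration. The `Sampler` class is hypothetical, and `fnmatch` is used as a stand-in for X-Ray's own wildcard matching.

```python
import fnmatch
import random
import time

# Two service-specific rules and one default rule (illustrative values only).
RULES = {
    "version": 1,
    "rules": [
        {"description": "Trace checkout more heavily",
         "service_name": "checkout.example.com", "http_method": "*",
         "url_path": "/api/checkout/*", "fixed_target": 2, "rate": 0.5},
        {"description": "Ignore health checks",
         "service_name": "*", "http_method": "GET",
         "url_path": "/health", "fixed_target": 0, "rate": 0.0},
    ],
    "default": {"fixed_target": 1, "rate": 0.05},
}


class Sampler:
    """Simplified sampling: a per-second reservoir, then a fixed percentage."""

    def __init__(self, config, clock=time.time, rand=random.random):
        self._config = config
        self._clock = clock
        self._rand = rand
        self._used = {}  # rule key -> (second, requests traced via reservoir)

    def _match(self, service, method, path):
        """Return the first matching rule, or the default rule."""
        for index, rule in enumerate(self._config["rules"]):
            if (fnmatch.fnmatchcase(service, rule["service_name"])
                    and fnmatch.fnmatchcase(method, rule["http_method"])
                    and fnmatch.fnmatchcase(path, rule["url_path"])):
                return index, rule
        return "default", self._config["default"]

    def should_trace(self, service, method, path):
        key, rule = self._match(service, method, path)
        second = int(self._clock())
        last_second, count = self._used.get(key, (second, 0))
        if last_second != second:
            count = 0  # a new second has started, so the reservoir refills
        if count < rule["fixed_target"]:
            self._used[key] = (second, count + 1)
            return True
        self._used[key] = (second, count)
        return self._rand() < rule["rate"]
```

With these numbers, the first two checkout requests in any second are always traced and half of the rest are sampled, health checks are never traced, and everything else gets one trace per second plus 5% of additional requests.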
Traces are formed by correlating segments using a unique trace ID. An X-Ray trace is a set of data points that share the same trace ID.
For example, when a client initiates a new request to your application, the request is tagged with a unique trace ID. As the request makes its way downstream through further services in your application, those services relay information regarding the request back to X-Ray using the same unique trace ID. The trace view within the AWS X-Ray service console provides a timeline view and hierarchy of each call made within the same trace. Each request within the timeline view provides details such as response codes and latency metrics. You can select any individual row and drill down into more detailed specifics for that item.
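Here is a small Python sketch of how a trace ID could be generated and relayed downstream. It assumes the trace ID layout `1-<8 hex digits of epoch seconds>-<24 random hex digits>` and the `X-Amzn-Trace-Id` tracing header with `Root`, `Parent`, and `Sampled` fields. The function names are hypothetical, and the header lookup is deliberately simplistic (a case-sensitive dictionary key).

```python
import os
import time

TRACE_HEADER = "X-Amzn-Trace-Id"


def new_trace_id(now=None):
    """Version, epoch seconds as 8 hex digits, then 96 random bits as hex."""
    epoch = int(time.time() if now is None else now)
    return "1-{:08x}-{}".format(epoch, os.urandom(12).hex())


def format_header(trace_id, parent_id=None, sampled=None):
    """Build the tracing header value from its individual fields."""
    parts = ["Root=" + trace_id]
    if parent_id is not None:
        parts.append("Parent=" + parent_id)
    if sampled is not None:
        parts.append("Sampled=" + ("1" if sampled else "0"))
    return ";".join(parts)


def parse_header(value):
    """Split 'Root=...;Parent=...;Sampled=1' into a dictionary."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:
            fields[key] = val
    return fields


def outgoing_headers(incoming_headers, my_segment_id):
    """Reuse the caller's trace ID (or start a new trace) for downstream calls.

    The Parent field is set to this service's own segment ID so that the
    downstream segment can be placed beneath it in the trace hierarchy.
    """
    incoming = parse_header(incoming_headers.get(TRACE_HEADER, ""))
    trace_id = incoming.get("Root") or new_trace_id()
    sampled = incoming.get("Sampled")
    flag = None if sampled is None else sampled == "1"
    return {TRACE_HEADER: format_header(trace_id, my_segment_id, flag)}
```

Because every hop passes along the same `Root` value, X-Ray can stitch the segments from all of the services into a single timeline.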
Annotations are used to add further business meaning to your tracing data. You add annotations by calling the appropriate method within the X-Ray SDK. Annotations are simple key-value pairs that are indexed for use with filter expressions. We go over filter expressions in the coming section. Use annotations to record extra business data that you want to use to group traces in the console.
Metadata, like annotations, is used to add further business meaning to your tracing data. You add metadata by calling the appropriate method within the X-Ray SDK. Metadata entries are key-value pairs with values of any type, including objects and lists, but unlike annotations they are not indexed. Use metadata to record data that you want stored in the trace but don’t need to use for searching traces.
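The difference between the two is easiest to see side by side. The following Python sketch mimics, in plain dictionaries, what the SDK methods do to a segment document. It assumes annotations live under an `annotations` key with scalar values and metadata lives under a `metadata` key grouped by a namespace, which is how we understand the SDKs to store them. The functions here are illustrative stand-ins, not the SDK's own methods.

```python
import re

_ANNOTATION_KEY = re.compile(r"^[A-Za-z0-9_]+$")


def put_annotation(segment, key, value):
    """Record an indexed key-value pair; values must be simple scalars."""
    if not _ANNOTATION_KEY.match(key):
        raise ValueError("annotation keys should be alphanumeric or underscore")
    if not isinstance(value, (str, int, float, bool)):
        raise TypeError("annotation values must be a string, number, or boolean")
    segment.setdefault("annotations", {})[key] = value


def put_metadata(segment, key, value, namespace="default"):
    """Record any JSON-serializable value; stored with the trace, not indexed."""
    segment.setdefault("metadata", {}).setdefault(namespace, {})[key] = value


segment = {"name": "calculator"}              # hypothetical service name
put_annotation(segment, "calcid", 1234)       # searchable in filter expressions
put_metadata(segment, "inputs", {"operands": [2, 3], "operator": "add"})
```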
The real power in using AWS X-Ray begins when you start to filter the collected traces. AWS X-Ray empowers you to quickly navigate and pinpoint hotspots within your distributed application.
Filter expressions enable you to filter out the unimportant stuff, allowing you to quickly and accurately drill into problematic areas of your application. Filter expressions allow you to specify search criteria that filter against various attributes of the collected trace data, including custom annotations as previously presented. The first filter expression finds requests where the response time was more than one second. The second filter expression finds requests that included a call to the calculator service with a fault or latency above 1.5 seconds. The third filter expression finds requests that included a call to the calculator service with a fault and where one or more segments have an annotation named calcid with value 1234.
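For reference, the three filters just described might be written roughly as shown in the Python sketch below. These strings are our reconstruction from the descriptions, not a copy of the slide, and the third one assumes the `calcid` annotation was stored as a string. The sketch also shows how such an expression could be passed to the X-Ray API through boto3's `get_trace_summaries` call. The `find_traces` helper is hypothetical, needs boto3 and valid AWS credentials, and only reads the first page of results.

```python
from datetime import datetime, timedelta, timezone

# 1. Requests where the response time was more than one second.
SLOW_REQUESTS = "responsetime > 1"

# 2. Requests that called the calculator service with a fault or high latency.
CALCULATOR_FAULT_OR_SLOW = (
    'service("calculator") { fault = true OR responsetime > 1.5 }'
)

# 3. Requests with a calculator fault and an annotation calcid of 1234.
CALCULATOR_FAULT_WITH_CALCID = (
    'service("calculator") { fault = true } AND annotation.calcid = "1234"'
)


def find_traces(filter_expression, minutes=10):
    """Return trace IDs from the last `minutes` that match the expression."""
    import boto3  # third-party; imported here so the constants work without it

    client = boto3.client("xray")
    end = datetime.now(timezone.utc)
    response = client.get_trace_summaries(
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        FilterExpression=filter_expression,
    )
    return [summary["Id"] for summary in response["TraceSummaries"]]
```

The same expression strings can be pasted directly into the search bar of the X-Ray console.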
AWS X-Ray provides filter expression syntax that can be quickly called upon within the AWS X-Ray service console. This can be consulted to guide you in authoring your own filter expressions.
We will now quickly review the AWS security-related config required for AWS X-Ray to be fully functional.
- The first security requirement is to give the X-Ray daemon the appropriate IAM permissions so that it is authorized to send its batched trace data to the X-Ray service.
This is done by authorizing access to the following AWS X-Ray API endpoints.
xray:PutTraceSegments and xray:PutTelemetryRecords. This is highlighted in the IAM policy shown on the right-hand side of the slide. This IAM policy would be attached to either the IAM credentials or the IAM role that the X-Ray daemon is configured with.
- The second security requirement is to provide the appropriate IAM permissions that allow the AWS X-Ray service console to read and filter the collected trace data. This is done by authorizing access to the following AWS X-Ray API endpoints: BatchGetTraces, GetServiceGraph, GetTraceGraph, and GetTraceSummaries. This is highlighted in the IAM policy shown on the right-hand side of the screen. This IAM policy would be attached to the IAM user account that is used to authenticate into the AWS console.
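As a sketch of what these two policies could look like, the Python snippet below builds them as dictionaries and prints them as JSON policy documents. The write policy lists xray:PutTraceSegments alongside xray:PutTelemetryRecords, since the daemon calls both when it uploads data. The `allow` helper and the wildcard resource are simplifications for this example, so treat the output as a starting point rather than the exact policy from the slide.

```python
import json


def allow(actions):
    """Build a single-statement IAM policy that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": "*"}
        ],
    }


# Attached to the role or credentials that the X-Ray daemon runs with.
DAEMON_WRITE_POLICY = allow([
    "xray:PutTraceSegments",
    "xray:PutTelemetryRecords",
])

# Attached to the IAM user who views traces in the X-Ray console.
CONSOLE_READ_POLICY = allow([
    "xray:BatchGetTraces",
    "xray:GetServiceGraph",
    "xray:GetTraceGraph",
    "xray:GetTraceSummaries",
])

print(json.dumps(DAEMON_WRITE_POLICY, indent=2))
print(json.dumps(CONSOLE_READ_POLICY, indent=2))
```

Keeping the write and read permissions in separate policies means the daemon can publish trace data without being able to read it back, and console users can browse traces without being able to inject them.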