This is the third in a series of articles on site reliability engineering (SRE) and how Traefik can help supply the monitoring and visibility that are necessary to maintain application health.
The first article discussed log analysis using tools from the Elastic stack. The second covered creating visualizations from Traefik metrics with Prometheus and Grafana. This third article explores using another open-source project, Jaeger, to perform request tracing for applications on Kubernetes.
Debugging anomalies, bottlenecks, and performance issues is a challenge in distributed architectures, such as microservices. Each user request typically involves the collaboration of many services to deliver the intended outcome. Because traditional monitoring methods like application logs and metrics tend to target monolithic applications, they can fail to capture the full performance trail for every request.
Distributed tracing, therefore, is an important profiling technique that complements log monitoring and metrics. It captures the transaction flow across various application components and services involved in processing a user request. The captured data can then be visualized to show which component malfunctioned and caused an issue, such as an error or bottleneck.
This post demonstrates how to integrate Traefik with Jaeger, an open-source tracing application that's a project of the Cloud Native Computing Foundation. The integration will capture traces for user requests across the various components of a hypothetical application running on a Kubernetes cluster.
This post will walk you through the process of integrating Traefik and Jaeger, but you'll need to have a few things set up first:
A Kubernetes cluster running at localhost. The Traefik Labs team often uses k3d for this purpose, which creates a local cluster in Docker containers. However, k3d comes bundled with the latest version of k3s, and k3s comes packaged with Traefik version 1.7, which you'll want to disable so you can use the latest version. The following command creates the cluster and exposes it on port 8081:
k3d cluster create dev -p "8081:80@loadbalancer" --k3s-server-arg --disable=traefik
The kubectl command-line tool, configured to point to your cluster. (If you created your cluster using k3d and the instructions above, this will already be done for you.)
A recent version of the Helm package manager for Kubernetes.
The set of configuration files that accompany this article, which are available on GitHub:
git clone https://github.com/traefik-tech-blog/traefik-sre-tracing/
You do not need to have Traefik 2.x preinstalled, as you'll do that along the way.
Set Up Tracing
First, you'll need to install and configure Jaeger on your Kubernetes cluster. The simplest way is to use the official Helm chart. As a first step, add the jaegertracing repository to your Helm repo list and update its contents:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
The Jaeger repository provides two charts, one of which is jaeger-operator. For the purposes of this discussion, you'll deploy the jaeger-operator chart, which makes it easy to configure a minimal installation. To learn more about the Jaeger Operator for Kubernetes, consult the official documentation.
helm install jaeger-op jaegertracing/jaeger-operator
Deploying Jaeger in all its details is a topic well beyond the scope of this article. You will deploy Jaeger with the all-in-one topology using the configuration below, which will be sufficient to demonstrate the integration:
# jaeger.yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
The above configuration will create an instance named jaeger. It will also create query, agent, and collector services, all of which are prefixed with jaeger. It will not deploy a database like Cassandra or Elasticsearch; instead, it will rely on in-memory storage.
kubectl apply -f jaeger.yaml
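The minimal manifest relies on the operator's defaults. As a sketch only, the same deployment can be written with those defaults spelled out. The spec.strategy and spec.storage.type fields belong to the Jaeger custom resource, while the file name jaeger-explicit.yaml is a hypothetical one chosen for this illustration:

```shell
# Write a Jaeger custom resource that states the defaults explicitly.
# jaeger-explicit.yaml is a hypothetical file name, not part of the article's repository.
cat > jaeger-explicit.yaml <<'EOF'
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
spec:
  strategy: allInOne   # one pod bundling the agent, collector, and query components
  storage:
    type: memory       # traces are held in memory and lost when the pod restarts
EOF

# Print the manifest for review before applying it with kubectl apply -f.
cat jaeger-explicit.yaml
```

Applying this file in place of jaeger.yaml should yield the same set of services, since both settings match the operator's defaults.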
You can confirm Jaeger is running by doing a lookup of all deployed services:
$ kubectl get services
NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                  AGE
kubernetes                          ClusterIP   10.43.0.1       <none>        443/TCP                                  76m
jaeger-op-jaeger-operator-metrics   ClusterIP   10.43.86.167    <none>        8383/TCP,8686/TCP                        82s
jaeger-collector-headless           ClusterIP   None            <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   47s
jaeger-collector                    ClusterIP   10.43.163.147   <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   47s
jaeger-query                        ClusterIP   10.43.27.251    <none>        16686/TCP                                47s
jaeger-agent                        ClusterIP   None            <none>        5775/UDP,5778/TCP,6831/UDP,6832/UDP      47s
Install and Configure Traefik
Now it's time to deploy Traefik, which you'll do using the official Helm chart. If you haven't already, add Traefik Labs to your Helm repository list using the below commands:
helm repo add traefik https://helm.traefik.io/traefik
helm repo update
Next you'll deploy the latest version of Traefik in the kube-system namespace. For this demo, however, the standard configuration of the Helm chart won't be enough. As part of the deployment, you need to ensure that Jaeger integration is enabled in Traefik. You do this by passing additionalArguments configuration flags in the values file given to Helm (traefik-values.yaml):
- "--tracing.jaeger=true"
- "--tracing.jaeger.samplingServerURL=http://jaeger-agent.default.svc:5778/sampling"
- "--tracing.jaeger.localAgentHostPort=jaeger-agent.default.svc:6831"
As shown in the above configuration, you need to provide an address for the Jaeger agent. By default, this is localhost, and if you deploy jaeger-agent as a sidecar, this works as expected. In this deployment, however, you need to provide an explicit address for jaeger-agent: the jaeger-agent.default.svc hostname, which points to the jaeger-agent service created in the default namespace in the previous section.
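Put together, the values file might look like the sketch below. It assumes the three flags sit under the chart's top-level additionalArguments key, and it writes the file with a shell heredoc purely for convenience. The repository cloned earlier should contain the author's own version.

```shell
# Write the Helm values file that enables Jaeger tracing in Traefik.
# Assumption: the Jaeger agent is reachable as the jaeger-agent service
# in the default namespace, as in the service listing shown earlier.
cat > traefik-values.yaml <<'EOF'
additionalArguments:
  - "--tracing.jaeger=true"
  - "--tracing.jaeger.samplingServerURL=http://jaeger-agent.default.svc:5778/sampling"
  - "--tracing.jaeger.localAgentHostPort=jaeger-agent.default.svc:6831"
EOF

# Show the tracing flags that will be passed to the Traefik container.
grep -- '--tracing' traefik-values.yaml
```

Port 5778 serves the sampling configuration over HTTP, and port 6831 receives spans over UDP, which matches the ports listed for jaeger-agent in the service output.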
Use the Helm chart to deploy Traefik into the kube-system namespace with the configuration options for Jaeger, like so:
helm install traefik traefik/traefik -n kube-system -f ./traefik-values.yaml
Once the pods are created, you can verify the Jaeger integration by using port forwarding to expose the Traefik dashboard:
kubectl -n kube-system port-forward $(kubectl -n kube-system get pods --selector "app.kubernetes.io/name=traefik" --output=name) 9000:9000
If you access the Traefik dashboard at http://localhost:9000/dashboard/, you will see that Jaeger tracing is enabled under the Features section.
Now is also a good time to expose the Jaeger UI, which is served on port 16686:
kubectl port-forward service/jaeger-query 16686:16686
When you access the Jaeger dashboard at http://localhost:16686/, you will see traefik in the Service pull-down, and the Traefik endpoints will be listed in the Operations pull-down.
Deploy Hot R.O.D.
Now that your integration is working, you need an application to trace. For this purpose, you should deploy Hot R.O.D. - Rides On Demand, which is an example application created by the Jaeger team. It is a demo ride-booking service that consists of three microservices, one of which is route-service. Each service also has accompanying storage, such as a MySQL database or Redis cache.
The application includes four pre-built "customer personas" who can book a ride using the application UI. When a car is booked, the application will find a driver and dispatch the car.
Throughout the process, Jaeger will capture the user request as it flows through the various services (such as route-service). The handling done by each individual service is shown as a "span," and all related spans are visualized in a graph known as the "trace."
Deploy the application, along with its Service and IngressRoute, by applying the hotrod.yaml configuration file:
$ kubectl apply -f hotrod.yaml
deployment.apps/hotrod created
service/hotrod created
ingressroute.traefik.containo.us/hotrod created
The hotrod route will match the hostname hotrod.localhost, which allows you to open the application UI. (If you used k3d to create a demo cluster at the start of this tutorial, recall that it is exposed on port 8081.)
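The contents of hotrod.yaml are not reproduced in this post, so the following is only a rough sketch of what such a file could contain, based on the three resources reported by kubectl and the hotrod.localhost route. Everything else is an assumption made for illustration: the jaegertracing/example-hotrod image with the all argument, the JAEGER_AGENT_HOST and JAEGER_AGENT_PORT environment variables (which apply only to image versions that report through the Jaeger agent), port 8080, the web entry point, and the file name hotrod-sketch.yaml. Prefer the file from the repository cloned earlier.

```shell
# Write a sketch of the Hot R.O.D. manifests to a hypothetical file name,
# so the hotrod.yaml from the article's repository is left untouched.
cat > hotrod-sketch.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hotrod
  labels:
    app: hotrod
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hotrod
  template:
    metadata:
      labels:
        app: hotrod
    spec:
      containers:
        - name: hotrod
          # Assumption: pin a tag that still reports spans through the Jaeger agent.
          image: jaegertracing/example-hotrod
          args: ["all"]          # run every Hot R.O.D. service in one container
          env:
            - name: JAEGER_AGENT_HOST
              value: jaeger-agent.default.svc
            - name: JAEGER_AGENT_PORT
              value: "6831"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hotrod
spec:
  selector:
    app: hotrod
  ports:
    - name: http
      port: 8080
      targetPort: 8080
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: hotrod
spec:
  entryPoints:
    - web                        # assumed name of Traefik's port-80 entry point
  routes:
    - match: Host(`hotrod.localhost`)
      kind: Rule
      services:
        - name: hotrod
          port: 8080
EOF

# List the top-level resource kinds defined in the sketch.
grep '^kind:' hotrod-sketch.yaml
```

The heredoc delimiter is quoted so that the backticks in the Host rule are written literally instead of being treated as command substitution by the shell.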
In the above UI you can see the four prebuilt customer personas. This UI is not required for this tracing demo, however, as you can use command-line tools.
To see Jaeger in action, send a few user requests to the application using a sample customer persona. For example, try the following commands:
curl -I "http://localhost:8081/dispatch?customer=392" -H "host:hotrod.localhost"
curl -I "http://localhost:8081/dispatch?customer=123" -H "host:hotrod.localhost"
Each command triggers a sequence of requests to produce the expected result. You can see the generated traces in the Jaeger UI when you select traefik as the Service and hotrod.localhost as the Operation, and then click Find Traces.
You can select either of the traces to explore the detailed request flow.
The display above shows the top two spans expanded to show the information forwarded by Traefik. Each span shows the request duration, along with non-mandatory sections for Tags, Process, and Logs. The Tags section contains key-value pairs that can be associated with request handling.
The Tags field of the topmost traefik span shows information related to HTTP handling, such as the status code, URL, host, and so on. The next span shows the routing information for the request, including the router name and service name.
Jaeger can also deduce an overall architecture by analyzing the request traces. This diagram is available under the System Architecture > DAG tab.
The graph shows that you made two requests, which were routed to the frontend service. The frontend service then fanned out requests to the application's other microservices.
Returning to the Search tab of the Jaeger UI, you can see that in the current cluster you have traces generated for the following three entrypoints:
traefik-dashboard, which you used for lookup
ping api, used by Kubernetes for health checks
hotrod.localhost, used by the Hot R.O.D. application
As you deploy more applications to your cluster, you will see more entries in the Operations drop-down.
This post has presented a very simple demonstration of how to integrate Traefik with Jaeger. There is much more to explore with Jaeger, and similar integrations can be done with other tracing systems, such as Zipkin and Datadog. Whichever one you choose, Traefik makes it easy to follow the progress of each request and gain insights into application flow.
We hope you've enjoyed this series of articles on how Traefik's capabilities can enable app monitoring and health analysis for SRE. If you missed the earlier installments on log aggregation and metrics, respectively, be sure to take a look. All three articles demonstrate how readily available open-source software, including Traefik, can empower practices that both increase app uptime and contribute to improving the design of distributed systems.
If you'd like to explore Traefik's monitoring and visibility features even further, check out Traefik Pilot, the SaaS monitoring and management platform from Traefik Labs.