Case Study: Rocket.Chat Deploys Traefik to Manage Unified Communications at Scale

About Rocket.Chat

Rocket.Chat’s mission is to deliver the ultimate communication hub. Its collaboration platform enables enterprises to consolidate text, video, and voice communications, replace email with secure channels and private groups, and support the transition to remote work. With more than 12 million users, its customers range from open-source communities to education, government, financial institutions, and enterprises. The Rocket.Chat open-source project, upon which the company’s offerings are based, is one of the world’s largest open-source communities, with more than 1,500 contributors.

Overview

Rocket.Chat’s core offering is an on-premises centralized communications server that integrates with a variety of popular messaging platforms, such as WhatsApp, Facebook Messenger, Twilio, Live Chat, and SMS. It also integrates with leading chatbot and machine learning providers.

Beyond its on-premises offering, Rocket.Chat has broadened its portfolio with Rocket.Chat Cloud, a version of the software that it offers as a managed service. As a SaaS offering, Rocket.Chat Cloud requires no technical knowledge to use, and customers can get up and running with Rocket.Chat clients in minutes with only a credit card.

Challenge

The decision to roll out a managed version of its software was a significant one for Rocket.Chat. Delivering a SaaS application is a different business than licensing software for use on premises. Customers of a managed service expect it to be available whenever they need it, and given a global customer base, that means 24/7.

As a service that aims to be the communications hub for enterprises large and small, it’s critical that Rocket.Chat Cloud is able to scale to meet high traffic volumes. It must not only accommodate new customers, but it must also be able to handle traffic spikes, such as might arise during important marketing campaigns, company events, or peak shopping periods.

"The goal, of course, is 100% uptime. One of our main challenges is making sure that users can actually get to our services and get fast response times." — Aaron Ogle, Lead Cloud Architect, Rocket.Chat

Faced with an ambitious project, Lead Cloud Architect Aaron Ogle and his team started by building a proof-of-concept infrastructure. While it successfully demonstrated the value of the plan, this early iteration of the project could only operate at limited scale. Critically, it was too difficult to add or remove services. Even basic scaling operations required administrators to reset the routing system before it would recognize each new service.

Solution

Before it could move into production, Rocket.Chat needed a more flexible and scalable solution. Its first answer was to move to a Kubernetes-centric infrastructure for increased agility and greater automation. The crucial step was to go all-in on Traefik Proxy for its application networking.

With Traefik deployed as a Kubernetes Ingress controller, Rocket.Chat relies on it to manage the flow of requests between the external network and the microservices running on Rocket.Chat’s Kubernetes clusters. Once Ogle and his development team realized how effortlessly Traefik integrated with the new architecture, they quickly made it a key component of their designs.

"We were able to very easily fit Traefik directly into the Kubernetes-based architecture, while knowing almost nothing about Traefik." — Aaron Ogle, Lead Cloud Architect, Rocket.Chat

With Traefik as the gatekeeper to its network, Rocket.Chat now operates four Kubernetes clusters, each with 10 to 15 nodes. Each cluster has two to four load balancer nodes running Traefik, depending on the current load, and together they act as the edge perimeter for the clusters.

"Having Traefik right there at the edge, as our entry point into our infrastructure, really simplified our operations a lot. Otherwise, it would have been a complete mess." — Aaron Ogle, Lead Cloud Architect, Rocket.Chat

What made Traefik an enabling factor for Rocket.Chat is that Traefik can automatically detect the presence of services as they are brought up and down and dynamically adjust networking routes accordingly. Load balancing between service instances is automatic, and inbound and outbound traffic routing is essentially a hands-free affair.
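To make the idea of dynamic routing concrete, here is a toy Python sketch of a routing table that adds and removes service instances at runtime and spreads requests across them round-robin. It is only an illustration of the concept described above, not Traefik's implementation or Rocket.Chat's configuration; the class name `DynamicRouter`, its methods, and the addresses are all hypothetical.

```python
from collections import defaultdict


class DynamicRouter:
    """Toy routing table (hypothetical) that updates as instances come and go."""

    def __init__(self):
        self._backends = defaultdict(list)  # service name -> instance addresses
        self._cursor = defaultdict(int)     # service name -> round-robin position

    def instance_up(self, service, address):
        """Register a new instance; no restart or reset of the router needed."""
        if address not in self._backends[service]:
            self._backends[service].append(address)

    def instance_down(self, service, address):
        """Remove an instance that has been shut down."""
        if address in self._backends[service]:
            self._backends[service].remove(address)

    def route(self, service):
        """Pick the next instance for a service in round-robin order."""
        backends = self._backends.get(service)
        if not backends:
            raise LookupError(f"no running instance for {service!r}")
        index = self._cursor[service] % len(backends)
        self._cursor[service] = index + 1
        return backends[index]


# Illustrative usage with made-up addresses.
router = DynamicRouter()
router.instance_up("chat", "10.0.0.1:3000")
router.instance_up("chat", "10.0.0.2:3000")
first, second = router.route("chat"), router.route("chat")
router.instance_down("chat", "10.0.0.1:3000")
remaining = router.route("chat")  # only 10.0.0.2:3000 is left
```

The contrast with the proof-of-concept setup is that adding or removing an instance changes the routing table immediately, without an administrator resetting anything.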

Bottom Line

Based on the experience Ogle’s team had with Traefik in the early days of the Rocket.Chat Cloud project, Traefik is now squarely in the company’s plans for all its routing needs. Rocket.Chat even recommends Traefik as part of its reference guidance for its users’ own projects, which range from community-level efforts all the way up to large enterprises.

"As soon as we were fully committed to our cloud project, Traefik was there. We've been fans ever since." — Aaron Ogle, Lead Cloud Architect, Rocket.Chat

For Rocket.Chat Cloud, Traefik has already more than met expectations. Its servers handle a constant influx of chat messages, meaning that in a typical month, Traefik handles requests numbering in the billions.

“We bet big on Traefik, and the bet has paid off over and over. Traefik has been a workhorse. It’s outperformed our expectations.” — Aaron Ogle, Lead Cloud Architect, Rocket.Chat

The volume of requests is only likely to grow, and Ogle’s team expects Traefik to remain Rocket.Chat’s choice for load balancing and edge routing for a long time to come.