Recent DNS Outage: Updates, Root Cause Analysis and Plan of Action

Early last week, our DNS servers came under a Denial of Service attack that disrupted service for several resellers. We sincerely apologize for the inconvenience and disruption and, as promised, here is a debrief. Following the incident, we conducted a complete root cause analysis. Here’s what happened and what we’re doing to deal with potential future attacks.

Updates, Root cause analysis and Plan of action:

Over the last few days, we have spent time analyzing our DNS server architecture and the distributed denial-of-service (DDOS) mitigation process and capacity at our data centers (DCs). This post covers what went wrong and what we are doing to fix it.

Managed DNS Architecture

Our DNS servers are spread out across 4 Data centers in the US. They are isolated both physically and at the network level with their own bandwidth capacity, network gear etc. They are all hosted with Softlayer, who have always provided us with the best service in all circumstances.

Each domain registered with us gets our Managed DNS service for free, with 4 name servers configured. The example below illustrates this:

domain.com, registered with us, gets 4 name servers (NS): dns1.orderbox-dns.com, dns2.orderbox-dns.com, dns3.orderbox-dns.com and dns4.orderbox-dns.com. dns1 has 4 IP addresses and is hosted at DC1 on 2 physical servers, dns2 has 4 IP addresses and is hosted at DC2 on 2 physical servers, and so on. In total, we serve our DNS traffic from 4 DCs, each with 2 physical servers, which gives us 16 Gbps of network throughput. Each of these DNS servers runs an optimized version of PowerDNS with a capacity of 50,000 qps, so the total theoretical capacity of our DNS cluster is around 400,000 qps (8 servers at 50,000 qps each).
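The capacity figures above follow from simple multiplication. The short Python sketch below reproduces that arithmetic. The per-server throughput value is an assumption (the stated total divided evenly across the 8 servers; the post does not give a per-server figure), and the function and variable names are illustrative only.

```python
# Back-of-the-envelope capacity check for the DNS cluster described above.
DATA_CENTERS = 4
SERVERS_PER_DC = 2
QPS_PER_SERVER = 50_000

# Assumption: the stated total throughput of 16 is split evenly across
# all servers (16 / 8 = 2), in the same unit as the stated total.
THROUGHPUT_PER_SERVER = 2


def cluster_capacity(dcs, servers_per_dc, qps_per_server, throughput_per_server):
    """Return the server count and the aggregate theoretical capacity."""
    servers = dcs * servers_per_dc
    return {
        "servers": servers,
        "total_qps": servers * qps_per_server,
        "total_throughput": servers * throughput_per_server,
    }


capacity = cluster_capacity(
    DATA_CENTERS, SERVERS_PER_DC, QPS_PER_SERVER, THROUGHPUT_PER_SERVER
)
print(capacity)
```

These are theoretical ceilings. They assume traffic is spread evenly across all servers, which is not how an attack concentrated on particular IP addresses behaves.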

DDOS Mitigation Capacity

As mentioned above, our DNS servers are hosted at Softlayer, whose network has been battle-tested many times during similar DDOS attacks. Each Softlayer DC is equipped with multiple 10 Gbps or 40 Gbps transit links to the internet and uses high-end networking gear. Softlayer also uses Arbor Peakflow for DDOS detection and Arbor TMS for DDOS mitigation. Each Arbor TMS system is capable of mitigating 10+ Gbps of attack traffic.

What went wrong?

Typically, we see one or a few DNS server IP addresses being attacked, and they are either null routed or mitigated on the TMS system. This is fairly common: we see two or three such incidents every week, and we have always maintained our service levels during them.

During the recent attack, we received 40+ Gbps of traffic spread across all our DNS server IP addresses. The attack traffic moved from one IP address to another in rapid succession. To prevent instability on their network, Softlayer null routed our IP addresses. A null route is a rule that drops all traffic destined for our IP address at Softlayer’s upstream ISP’s network. This means that once the null route is in place, even Softlayer has no visibility into the attack traffic.
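To make the effect of a null route concrete, here is a minimal Python sketch. It is a toy model, not how Softlayer or its upstream ISPs implement routing: the IP addresses come from the reserved documentation ranges, and the `deliver` function and the packet tuples are hypothetical. It shows that a null route on a destination drops legitimate queries along with attack traffic.

```python
import ipaddress

# Hypothetical null route covering a single name server address (/32).
NULL_ROUTES = {ipaddress.ip_network("192.0.2.10/32")}


def deliver(packets, null_routes):
    """Return only the packets whose destination is not null routed."""
    delivered = []
    for src, dst, kind in packets:
        dst_ip = ipaddress.ip_address(dst)
        if any(dst_ip in net for net in null_routes):
            # Dropped upstream: neither attack nor legitimate traffic
            # reaches the server, so nothing downstream can inspect it.
            continue
        delivered.append((src, dst, kind))
    return delivered


packets = [
    ("198.51.100.7", "192.0.2.10", "attack"),
    ("203.0.113.5", "192.0.2.10", "legitimate"),
    ("203.0.113.9", "192.0.2.11", "legitimate"),
]

delivered = deliver(packets, NULL_ROUTES)
print(delivered)
```

Only the query to the address without a null route gets through. The attack is stopped, but so is every real query to the null-routed address.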

After this, as explained in the previous post, we started removing the null routes one at a time, identifying and mitigating the attack on each IP.

What’s wrong with our setup?

Problem 1: We rely solely on one data center provider and its DDOS mitigation capabilities.
Problem 2: We are bound to the /32 static IP addresses provided by the DCs and are not using our own /24 subnets to host the DNS servers. With our own /24 subnets, we could have swung the traffic to our third-party DDOS mitigation partner.
Problem 3: All customers’ name servers point to the same IP addresses, so when an attack causes disruption, all customers are affected.
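Problem 3 can be illustrated with a small Python sketch that compares how many customers lose resolution when every customer shares one pool of name server IPs and when customers are split across separate pools. This only illustrates the blast-radius idea and does not describe the planned architecture. The customer names, the IP addresses (from documentation ranges), and the rule that a customer is affected only when all of its name server IPs are unreachable are assumptions made for the example.

```python
def affected_customers(assignments, attacked_ips):
    """List customers whose name server IPs are all under attack."""
    attacked = set(attacked_ips)
    return sorted(
        customer
        for customer, ips in assignments.items()
        if set(ips) <= attacked
    )


# Hypothetical IP pools.
POOL_A = {"192.0.2.1", "192.0.2.2"}
POOL_B = {"198.51.100.1", "198.51.100.2"}

# Shared setup: every customer uses the same pool.
shared = {"cust1": POOL_A, "cust2": POOL_A, "cust3": POOL_A, "cust4": POOL_A}

# Partitioned setup: customers are split across two pools.
partitioned = {"cust1": POOL_A, "cust2": POOL_A, "cust3": POOL_B, "cust4": POOL_B}

# The same attack on POOL_A is applied to both setups.
shared_hit = affected_customers(shared, POOL_A)
partitioned_hit = affected_customers(partitioned, POOL_A)

print(shared_hit)
print(partitioned_hit)
```

In the shared setup, an attack on one pool affects every customer. In the partitioned setup, the same attack affects only the customers assigned to that pool.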

To solve these problems, we planned a new DNS architecture last quarter and have made some progress in deploying it. We will communicate any changes required from you and your customers as and when needed, so that you can all use the new setup. We sincerely regret and apologize for all the inconvenience. We understand that you count on us, and we will continue to render our services to the best of our ability to help you build and grow your businesses to their full potential.
