Fixing reachability to 1.1.1.1, GLOBALLY!

Recently we announced our fast, privacy-centric DNS resolver 1.1.1.1, supported by our global network. As you can see 1.1.1.1 is very easy to remember, which is both a blessing and a curse. In the time leading up to the announcement of the resolver service we began testing reachability to 1.1.1.1, primarily using the RIPE Atlas probes. The RIPE Atlas project is an extensive collection of small monitoring devices hosted by the public around the world. Currently there are over 10,000 active probes hosted in over 3,000 networks, giving great vantage points for testing. We found large numbers of probes unable to query 1.1.1.1, but successfully able to query 1.0.0.1 in almost all cases. 1.0.0.1 is the secondary address we have assigned for the resolver, to allow clients who are unable to reach 1.1.1.1 to be able to make DNS queries.

This blog focuses on IPv4. We provide four IPs (two for each IP address family) in order to provide a path toward the DNS resolver independent of IPv4 or IPv6 reachability.

1.0.0.0/8 was assigned in 2010 to APNIC, before this time it was unassigned space.

% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object

inetnum:      1.0.0.0 - 1.255.255.255
organisation: APNIC
status:       ALLOCATED

whois:        whois.apnic.net

changed:      2010-01
source:       IANA

https://www.iana.org/whois?q=1.0.0.0%2F8

Unassigned, however is not the same as reserved for private use!

What we found

To put it simply, 1.1.1.1 was BROKEN! The good news is, for most users 1.1.1.1 is now reachable. We’ve worked hard to ensure that issues get resolved and continue to contact operators to resolve issues quickly. We’re confident we can get everything cleaned up, but this is a stark reminder that you shouldn’t hijack IP addresses not assigned to you. We found over 1,000 probes out of just over 10,000 were unable to make DNS queries to 1.1.1.1 successfully. Some of this was due to single large networks having reachability issues, for example a large operator in Germany has nearly 350 probes connected, all of them failing. The methodology for testing was very simple:

Run a DNS lookup measurement towards 1.1.1.1
Find probes where the lookup fails
Run a traceroute with affected probes
Analyse the result

The results were quite mixed, but fell into three main causes:

Built-in 1.1.1.1 ISP routers using 1.1.1.1 as an internal IP address, preventing queries from reaching the real 1.1.1.1
Blackholing 1.1.1.1 ISPs statically configuring a route for 1.1.1.1 inside their networks, preventing traffic leaving their network, either through routing internally, or by sending the packets to null0
Filtering 1.1.1.1 ISPs dropping packets on ingress or egress to/from their network when sourced/destined from/to 1.1.1.1

Of these three main causes, the majority of cases were either 1, 2 or both! Several ISPs even had route loops internally, where they were advertising 1.1.1.1 inside their network, but had no actually path to it, so packets loop around and around.

Time to get fixing

Once we had narrowed it down for each group of probes we began contacting ISPs for clarification on what was happening, several networks responded very quickly reporting they had removed an internal route to 1.1.1.1, which in most cases was the beginning and end of the matter. There were plenty of networks which took their time to respond, but in the end did the same, removing the internal routes left there for legacy testing reasons.

All of those fixes were great to make, but most were quite uninteresting, what was more interesting was finding cases from issue number 1, CPEs (customer premises equipment) aka home routers, gateways and wireless access points. With the help from the folks at Sonic we were quickly able to identify that the Pace 5268 a common xDSL modem deployed primarily in the United States (including wide usage on AT&T) uses 1.1.1.0/29 for internal communication. We requested comment from AT&T’s noc, but have not heard anything from them. We did however receive a response from them via social media:

DaMmfBTUQAEmc3k

Independent investigation confirms the findings:

The same finding was made with the D-Link DMG-6661, which was reported to us by a user from Brazil connected to Vivo FTTH.

Screen-Shot-2018-04-05-at-10.57.21

Another user in Argentina connected to Telefónica found the issue on the Mitrastar GPT-2541GNAC.

DaKwk7eW0AEUKL-

It appears this CPE has been deployed in many of Telefónica's networks internationally.

We noticed similar behaviour on a large portion of probes connected to Orange France, we contacted them and received a swift response that the CPE team was investigating the issue. After providing more details they came back to us with a statement.

We have escalated your alert inside ORANGE France.
Our CPE team has analyzed the issue which is now well understood. This problem only impacts a subset of our CPEs. The commitment of the fix is currently on going and a deployment on our CPEs will follow.
In case of complaints notified by our customers in the meantime we have prepared a communication to inform them that we are currently fixing the issue.

The CPE in question is the Livebox, although it's not clear which versions are affected, it should be resolved by Orange across all affected devices. Users in Poland reported the same issues as users in France, likely due to Orange deploying the same models across multiple networks.

By far my favourite response was from the friendly folks at Telenor:

I have corrected all routers in our network now that had an awful old solution that is now obsolete. Thank you Cloudflare for the help to get it done!

Obsolescence is inevitable, but the desire to speedily fix such occurrences is great to see.

These are by no means all devices that have issues, but some of the wider deployed ones. The current list we have of affected devices is:

Pace (Arris) 5268
D-Link DMG-6661
Technicolor C2100 Series
Mitrastar GPT-2541GNAC
Askey RTF3507VW-N1
Calix GigaCenter
Nomadix (model(s) unknown)
Xerox Phaser multi-function printer
See below :)

If you have a device that is affected, please let us know in the comments. A good example of this is a super-low latency with only 1 hop to 1.1.1.1:

Traceroute to 1.1.1.1 (1.1.1.1), 48 byte packets

1 1.1.1.1 1dot1dot1dot1.cloudflare-dns.com AS13335 8.301ms 1.879ms 1.836ms

Who else has been mis-using 1.1.1.1 more than others?

Using the RIPE Atlas probes gives excellent visibility into residential and business internet connections, however they’re connected via a cable, so this rules out another use-case, WiFi access points. After very little research we quickly came across Cisco mis-using 1.1.1.1, a quick search for “cisco 1.1.1.1” brought up numerous articles where Cisco are squatting on 1.1.1.1 for their Wireless LAN Controllers (WLC). It’s unclear if Cisco officially regards 1.0.0.0/8 as bogon space, but there are lots of examples that can be found on their community websites giving example bogon lists that include the /8. It mostly seems to be used for captive portal when authenticating to the wireless access point, often found in hotels, cafés and other public WiFi hotspot locations.

cisco_captive_portal_issue

Some statistics

Here are some interesting statistics from before we started contacting operators and after we have fixed many issues. It’s staggering to see how fixing some key networks increased availability by almost 20% in Europe & North America!

We began testing the availability of 1.1.1.1 on the 23rd of March, in Europe and North America it was only around 91%.

catchpoint_eu_us_march
1.1.1.1 availability from Europe and North America, 23rd of March

By the 3rd of April, our work cleaning up the space had pushed the availability up to 97%.

EU-US-3rd-of-April-2018
1.1.1.1 availability from Europe and North America, 3rd of April

For the rest of the world, excluding Europe and North America, availability to 1.1.1.1 was only 73% on the 23d of March.

world_without_eu_us_march
1.1.1.1 availability for the World (Europe and North America excluded), 23rd of March

By the 3rd of April we've made a tonne of progress and managed to clean up enough of the bad routing that availability was up to 85%.

World-without-EU-US-3rd-of-April-2018
1.1.1.1 availability for the World (Europe and North America excluded), 3rd of April

We are continuing to work with ISPs and CPE manufacturers to clean up bad routing globally. Our goal is for 1.1.1.1 to be properly routed and available for 100% of Internet users.

Above images from Catchpoint analytics

Bad traffic

The last public analysis was done in 2010 by RIPE and APNIC. At the time, 1.1.1.0/24 was 100 to 200Mb/s of traffic, most of it being audio traffic. In March, when Cloudflare announced 1.0.0.0/24 and 1.1.1.0/24, ~10Gbps of unsolicited background traffic appeared on our interfaces.

The most targeted IPs were 1.1.1.1, 1.1.1.8, 1.1.1.10, 1.0.0.19. When searching these IPs, we notice it is usually misconfiguration or hardcoded IPs.

The destination port is usually 80/443, but also other variants of HTTP ports (8000, 8080, et al.) using UDP and TCP, seemingly trying to setup a proxy connection. Some of the traffic was also DHCP/BOOTP, iperf and syslog.

Some IPs are also the target of DDoS attacks (even before we announced the new service publicly). Analysing by source port we saw NTP and memcached, usually reaching 5Gbps for a few minutes. The short duration of the attack shows that could be a hardcoded IP in a botnet before it starts sending traffic to a specific target.

We also noticed daily patterns where 4 IPs receive the same amount of traffic (±50Mbps).

All of this bad traffic is unrelated to DNS, simple, unsolicited background traffic.

Screen-Shot-2018-04-05-at-7.28.41-AM

Conclusion

It was clear from the start that we’d have our work cut out, especially with CPE vendors, where a firmware update would be required. What was impressive was the willingness of operators to collaborate with us to clean up the legacy misconfiguration. It was clear that 1.1.1.1 needed a lot of cleaning in order to be globally accessible. We decided six months before the 1st of April release date we'd commit the network and support resources to that task. Now that 1.1.1.1 is live, we’re thankful to all the networks and hardware companies who have assisted us in this effort. We’re not done, nor are others.

The RIPE Atlas project was immensely useful in testing reachability from as many networks around the world as possible. If you’d like to help the project, please consider hosting a probe. Some networks are not covered with at least one probe, you can see if your ISP has a probe here, sorted by country.

Particular thanks to the following operators who were responsive and helped clean up issues quickly.

Airtel
Beirut-IX
BHTelecom
Comcast
ITC
Fastweb
Kazakhtelecom
Level(3)
LG Telecom
Liquid Telecom
MTN
Omantel
Rostelecom
SKBB
SFR
STC
Tata
Telecom Italia
Telenor
Telus
Turk Telekom
Turkcell
Voo
XS4ALL
Ziggo

We still have work to do, contacting operators that we see issues with, in the meantime you should be able to use our second IP address of 1.0.0.1, which has far fewer issues. Don't forget, both of our IPv6 addresses too: 2606:4700:4700::1001 and 2606:4700:4700::1111.

Do you still have reachability issues to 1.1.1.1? You can find more information at our community forum. We’d also recommend reporting such issues to your ISP, they may already be aware of issues, or they may need you to report it to them to start investigating. Whichever is true, making them aware is especially helpful, operators are not always receptive to reports from external parties.

The Cloudflare Blog

Fixing reachability to 1.1.1.1, GLOBALLY!

What we found

Time to get fixing

Who else has been mis-using 1.1.1.1 more than others?

Some statistics

Bad traffic

Conclusion

Improving authoritative DNS with the official release of Foundation DNS

Advanced DNS Protection: mitigating sophisticated DNS DDoS attacks

Remediating new DNSSEC resource exhaustion vulnerabilities

DDoS threat report for 2023 Q4