Today we made a mistake. The mistake caused a number of LGBTQIA+ sites to be inadvertently blocked by the new 1.1.1.1 for Families service. I want to walk through what happened, why, and what we've done to fix it.
As has been our tradition for the last three years, we rolled out new products for the general public using the Internet on April 1. This year, one of those products was a filtered DNS service, 1.1.1.1 for Families. The service allows anyone who chooses to use it to restrict certain categories of sites.
Filtered vs Unfiltered DNS
Nothing about our new filtered DNS service changes the unfiltered nature of our original 1.1.1.1 service. However, we recognized that some people want a way to control what content reaches their home. For instance, I block social media sites from resolving while I am trying to get work done because it makes me more productive. The number one request from users of 1.1.1.1 was a version of the service for home use that blocks certain categories of sites. And so, earlier today, we launched 1.1.1.1 for Families.
Over time, we'll give users of 1.1.1.1 for Families the ability to customize exactly which categories they block (e.g., doing what I do with social media sites to stay productive). Initially, however, we created two default settings covering the types of content people most often asked to block: Malware (which you can block by setting 1.1.1.2 and 1.0.0.2 as your DNS resolvers) and Malware + Adult Content (which you can block by setting 1.1.1.3 and 1.0.0.3 as your DNS resolvers).
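To make the two defaults concrete, here is a minimal sketch, assuming a simple model in which each resolver address maps to a fixed set of blocked categories. The category labels ("malware", "adult") and the `RESOLVER_POLICIES` and `is_blocked` names are hypothetical illustrations, not Cloudflare's internal names or implementation.

```python
# Hypothetical model of the two default policies: each resolver address
# maps to the set of site categories it filters. The category labels
# ("malware", "adult") are illustrative, not Cloudflare's internal names.
RESOLVER_POLICIES: dict[str, frozenset[str]] = {
    "1.1.1.1": frozenset(),                      # original service, unfiltered
    "1.1.1.2": frozenset({"malware"}),           # Malware
    "1.0.0.2": frozenset({"malware"}),
    "1.1.1.3": frozenset({"malware", "adult"}),  # Malware + Adult Content
    "1.0.0.3": frozenset({"malware", "adult"}),
}


def is_blocked(resolver: str, site_categories: set[str]) -> bool:
    """Return True if any of the site's categories is filtered by this resolver.

    Raises KeyError for resolver addresses not modeled here.
    """
    return not RESOLVER_POLICIES[resolver].isdisjoint(site_categories)


if __name__ == "__main__":
    print(is_blocked("1.1.1.2", {"adult"}))  # Malware-only policy: not blocked
    print(is_blocked("1.1.1.3", {"adult"}))  # Malware + Adult Content: blocked
```

In this model, the secondary addresses (1.0.0.2, 1.0.0.3) carry the same policy as their primaries, so switching between them never changes what is filtered.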
Licensed Categorization Data
To get data for 1.1.1.1 for Families, we licensed feeds from multiple providers who specialize in site categorization. We spent the last several months reviewing classification providers to choose the ones with the highest accuracy and the fewest false positives.
Malware, encompassing a range of widely agreed-upon cybersecurity threats, was the easier of the two categories to define. For Adult Content, we aimed to mirror the Google SafeSearch criteria. Google has been thoughtful in this area, and their SafeSearch tool is designed to limit search results for "sexually explicit content." The definition is focused on pornography and largely follows the requirements of the US Children's Internet Protection Act (CIPA), which US schools and libraries receiving certain federal funding are required to follow.
Because it was the default for the 1.1.1.3 service, and because we planned to let individuals set their own specifications beyond the default in the future, we intended the Adult Content category to be narrow. We did not intend the Adult Content category to include LGBTQIA+ content. And yet, after launch, we were horrified to receive reports that those sites were being filtered.
Choosing the Wrong Feed
So what went wrong? The data providers we license content from use different categorizations, and those categorizations do not line up perfectly between providers. One of the providers has multiple “Adult Content” categories. One “Adult Content” category includes content that mirrors the Google SafeSearch/CIPA definition. Another “Adult Content” category includes a broader set of topics, including LGBTQIA+ sites.
While we had specifically reviewed the Adult Content category to ensure that it was narrowly tailored to mirror the Google SafeSearch/CIPA definition, when we released the production version this morning we included the wrong “Adult Content” category from the provider in the build. As a result, the first users who tried 1.1.1.3 saw a broader set of sites filtered than we intended, including LGBTQIA+ content. We immediately worked to fix the issue.
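The failure mode here is two provider categories sharing one display name. A minimal sketch, assuming the provider exposes a unique identifier per category alongside its label, shows one way a build could guard against it: select categories by reviewed ID and refuse to resolve an ambiguous label. The `Category` type, the IDs `adult-narrow` and `adult-broad`, and the function names are all hypothetical; this is not the provider's schema or Cloudflare's build code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    id: str     # hypothetical provider-specific identifier (unique)
    label: str  # display name (NOT unique across categories)


# Hypothetical taxonomy from one provider: two categories share a label.
PROVIDER_CATEGORIES = [
    Category("adult-narrow", "Adult Content"),  # mirrors SafeSearch/CIPA
    Category("adult-broad", "Adult Content"),   # broader set of topics
]


def category_by_label(label: str) -> Category:
    """Look up a category by display name, refusing to guess if it is ambiguous."""
    matches = [c for c in PROVIDER_CATEGORIES if c.label == label]
    if len(matches) != 1:
        raise ValueError(f"{label!r} matches {len(matches)} categories; select by id")
    return matches[0]


def build_blocklist(feed: list[tuple[str, str]], category_ids: set[str]) -> set[str]:
    """Collect domains whose feed entry carries one of the reviewed category ids."""
    return {domain for domain, cat_id in feed if cat_id in category_ids}
```

With a feed of `[("explicit.example", "adult-narrow"), ("community.example", "adult-broad")]`, building from `{"adult-narrow"}` yields only `explicit.example`, while `category_by_label("Adult Content")` raises instead of silently picking one of the two categories.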
Slow to Update Data Structures
To distribute the list of sites quickly to all our data centers, we use a compact data structure. The upside is that we can replicate it worldwide very efficiently. The downside is that generating a new version takes several hours. The minute we saw that we'd made a mistake, we pulled the incorrect category data and began regenerating the data structure.
While the new data structure was being generated and replicated across our network, we immediately pushed individual sites to an allow list. We compiled these lists both from user reports and from other LGBTQIA+ resources. Allow-list updates went out instantly, and we continued adding sites as they were reported or as we discovered them.
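The post does not say which compact data structure is used. As a hypothetical stand-in, this sketch stores a sorted array of 64-bit domain fingerprints (the first 8 bytes of SHA-256) serialized as one contiguous blob, which is cheap to replicate and to query with binary search. It illustrates the trade-off described above: lookups are fast, but any change means rebuilding and re-shipping the whole array. The `fingerprint` and `CompactBlocklist` names are illustrative assumptions.

```python
import bisect
import hashlib
import struct


def fingerprint(domain: str) -> int:
    """64-bit fingerprint of a normalized domain name (first 8 bytes of SHA-256)."""
    normalized = domain.lower().rstrip(".").encode()
    return struct.unpack(">Q", hashlib.sha256(normalized).digest()[:8])[0]


class CompactBlocklist:
    """Sorted array of 64-bit fingerprints, shipped as one contiguous byte blob."""

    def __init__(self, fingerprints: list[int]):
        self._fps = fingerprints  # must be sorted and deduplicated

    @classmethod
    def build(cls, domains) -> "CompactBlocklist":
        # The expensive step: every change means rebuilding the whole array.
        return cls(sorted({fingerprint(d) for d in domains}))

    def to_bytes(self) -> bytes:
        # 8 bytes per entry, big-endian, ready to replicate as a single file.
        return struct.pack(f">{len(self._fps)}Q", *self._fps)

    @classmethod
    def from_bytes(cls, blob: bytes) -> "CompactBlocklist":
        return cls(list(struct.unpack(f">{len(blob) // 8}Q", blob)))

    def __contains__(self, domain: str) -> bool:
        # Binary search for the domain's fingerprint.
        fp = fingerprint(domain)
        i = bisect.bisect_left(self._fps, fp)
        return i < len(self._fps) and self._fps[i] == fp
```

Storing fingerprints rather than full names keeps every entry at a fixed 8 bytes; the cost is a tiny (but nonzero) chance that two different domains collide, which for a blocklist would mean an unintended block.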
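The mitigation above layers a small, instantly updated allow list over the slow-to-rebuild blocklist. A minimal sketch of that lookup order follows, assuming that an entry also covers its subdomains and that the allow list always wins on conflict; both are assumptions for illustration, since the post only confirms that allow-list updates took effect immediately. The `FilterPolicy` and `parent_domains` names are hypothetical.

```python
def parent_domains(domain: str) -> list[str]:
    """Return the name and each parent: 'a.b.example' -> ['a.b.example', 'b.example', 'example']."""
    labels = domain.lower().rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


class FilterPolicy:
    """Large, slow-to-rebuild blocklist plus a small allow list that wins on conflict."""

    def __init__(self, blocklist: set[str]):
        self._blocklist = frozenset(d.lower().rstrip(".") for d in blocklist)
        self._allow: set[str] = set()

    def allow(self, domain: str) -> None:
        # Takes effect on the next lookup; no rebuild of the blocklist needed.
        self._allow.add(domain.lower().rstrip("."))

    def is_blocked(self, domain: str) -> bool:
        names = parent_domains(domain)
        # Allow-list entries override any blocklist match for the same name or a parent.
        if any(name in self._allow for name in names):
            return False
        return any(name in self._blocklist for name in names)
```

For example, with `FilterPolicy({"community.example"})`, `www.community.example` is blocked until `allow("community.example")` is called, after which it resolves normally without waiting for a regenerated blocklist.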
By 16:51 UTC, approximately two hours after we received the first report of the mistaken blocking, the data structure with the intended definition of Adult Content had been generated, and we pushed it live. The only users who would have seen over-broad blocking are those who had already switched to the 1.1.1.3 service. Users of 1.1.1.1 (which will remain unfiltered) and 1.1.1.2 would not have experienced this inadvertent blocking.
As of now, the filtering provided by the default setting of 1.1.1.3 is what we intended it to be: it should roughly match what you find with Google SafeSearch, and LGBTQIA+ sites are not being blocked. If you see sites being blocked that should not be, please report them to us here.
Protections for the Future
Going forward, we've set up checks against a list of known sites that should fall outside the intended categories, including many that we mistakenly blocked today. Before defaults are updated in the future, our build system will confirm that none of these sites are listed. We hope this will help catch mistakes like this in the future.
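A check like the one described can be small. This is a minimal sketch, assuming the candidate blocklist and the known-good sites are both available as sets of domain names at build time; the `check_canaries` function and `CanaryCheckError` exception are hypothetical names, not Cloudflare's build system.

```python
class CanaryCheckError(Exception):
    """Raised when a candidate blocklist contains a site that must never be blocked."""


def normalize(domain: str) -> str:
    return domain.lower().rstrip(".")


def check_canaries(candidate_blocklist: set[str], must_not_block: set[str]) -> None:
    """Fail the build if any known-good site appears in the candidate blocklist."""
    blocked = {normalize(d) for d in candidate_blocklist}
    hits = sorted(blocked & {normalize(d) for d in must_not_block})
    if hits:
        raise CanaryCheckError(
            f"{len(hits)} known-good site(s) would be blocked: {', '.join(hits)}"
        )
```

Called as the last step before publishing a new default, an uncaught `CanaryCheckError` makes a Python build script exit with a non-zero status, so a blocklist built from the wrong category would stop there instead of reaching users.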
I'm sorry for the error. While I understand how it happened, it should never have happened. I appreciate our team responding quickly to fix the mistake we made.