Today we made a mistake. The mistake caused a number of LGBTQIA+ sites to inadvertently be blocked by the new 22.214.171.124 for Families service. I wanted to walk through what happened, why, and what we've done to fix it.
As is our tradition for the last three years, we roll out new products for the general public that uses the Internet on April 1. This year, one of those products was a filtered DNS service, 126.96.36.199 for Families. The service allows anyone who chooses to use it to restrict certain categories of sites.
Filtered vs Unfiltered DNS
Nothing about our new filtered DNS service changes the unfiltered nature of our original 188.8.131.52 service. However, we recognized that some people want a way to control what content is in their home. For instance, I block social media sites from resolving while I am trying to get work done because it makes me more productive. The number one request from users of 184.108.40.206 was that we create a version of the service for home use to block certain categories of sites. And so, earlier today, we launched 220.127.116.11 for Families.
Over time, we'll provide the ability for users of 18.104.22.168 for Families to customize exactly what categories they block (e.g., do what I do with social media sites to stay productive). But, initially, we created two default settings that were the most requested types of content people wanted to block: Malware (which you can block by setting 22.214.171.124 and 126.96.36.199 as your DNS resolvers) and Malware + Adult Content (which you can block by setting 188.8.131.52 and 184.108.40.206 as your DNS resolvers).
Licensed Categorization Data
To get data for 220.127.116.11 for Families we licensed feeds from multiple different providers who specialize in site categorization. We spent the last several months reviewing classification providers to choose the ones that had the highest accuracy and lowest false positives.
Malware, encompassing a range of widely agreed upon cyber security threats, was the easier of the two categories to define. For Adult Content, we aimed to mirror the Google SafeSearch criteria. Google has been thoughtful in this area and their SafeSearch tool is designed to limit search results for "sexually explicit content." The definition is focused on pornography and largely follows the requirements of the US Children's Internet Protection Act (CIPA), which schools and libraries in the United States are required to follow.
Because it was the default for the 18.104.22.168 service, and because we planned in the future to allow individuals to set their own specifications beyond the default, we intended the Adult Content category to be narrow. What we did not intend to include in the Adult Content category was LGBTQIA+ content. And yet, when it launched, we were horrified to receive reports that those sites were being filtered.
Choosing the Wrong Feed
So what went wrong? The data providers that we license content from have different categorizations; those categorizations do not line up perfectly between different providers. One of the providers has multiple "Adult Content" categories. One “Adult Content” category includes content that mirrors the Google SafeSearch/CIPA definition. Another “Adult Content” content category includes a broader set of topics, including LGBTQIA+ sites.
While we had specifically reviewed the Adult Content category to ensure that it was narrowly tailored to mirror the Google SafeSearch/CIPA definition, when we released the production version this morning we included the wrong “Adult Content” category from the provider in the build. As a result, the first users who tried 22.214.171.124 saw a broader set of sites being filtered than was intended, including LGBTQIA+ content. We immediately worked to fix the issue.
Slow to Update Data Structures
In order to distribute the list of sites quickly to all our data centers we use a compact data structure. The upside is that we can replicate the data structure worldwide very efficiently. The downside is that generating a new version of the data structure takes several hours. The minute we saw that we'd made a mistake we pulled the incorrect data provider and began recreating the new data structure.
While the new data structure replicated across our network we pushed individual sites to an allow list immediately. We began compiling lists both from user reports as well as from other LGBTQIA+ resources. These updates went out instantly. We continuously added sites to the allow list as they were reported or we discovered them.
By 16:51 UTC, approximately two hours after we’d received the first report of the mistaken blocking, the data structure with the intended definition of Adult Content had been generated and we pushed it out live. The only users that would have seen over-broad blocking are those that had already switched to the 126.96.36.199 service. Users of 188.8.131.52 — which will remain unfiltered — and 184.108.40.206 would not have experienced this inadvertent blocking.
As of now, the filtering provided by the default setting of 220.127.116.11 is what we intended it to be, and should roughly match what you find if you use Google SafeSearch and LGBTQIA+ sites are not being blocked. If you see site being blocked that should not be, please report them to us here.
Protections for the Future
Going forward, we've set up a number of checks of known sites that should fall outside the intended categories, including many that we mistakenly listed today. Before defaults are updated in the future, our build system will confirm that none of these sites are listed. We hope this will help catch mistakes like this in the future.
I'm sorry for the error. While I understand how it happened, it should never have happened. I appreciate our team responding quickly to fix the mistake we made.