<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Cloudflare Blog]]></title><description><![CDATA[Cloudflare is on a mission to help build a better Internet.]]></description><link>https://blog.cloudflare.com/</link><image><url>https://blog.cloudflare.com/favicon.png</url><title>Cloudflare Blog</title><link>https://blog.cloudflare.com/</link></image><generator>Ghost 1.24</generator><lastBuildDate>Fri, 22 Jun 2018 13:00:00 GMT</lastBuildDate><atom:link href="https://blog.cloudflare.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Building a serverless Slack bot using Cloudflare Workers]]></title><description><![CDATA[Our Workers platform can be used for a ton of useful purposes: for A/B testing, storage bucket authentication, coalescing responses from multiple APIs, and more. 
But Workers can also be put to use beyond "HTTP middleware": a Worker can effectively be a web application in its own right.]]></description><link>https://blog.cloudflare.com/building-a-serverless-slack-bot-using-cloudflare-workers/</link><guid isPermaLink="false">5b2ae1257cbc6900bf7f4335</guid><category><![CDATA[Tech Talks]]></category><category><![CDATA[Serverless]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Workers]]></category><dc:creator><![CDATA[Rita Kozlov]]></dc:creator><pubDate>Fri, 22 Jun 2018 13:00:00 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/workers_slack_bot2.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/workers_slack_bot2.jpg" alt="Building a serverless Slack bot using Cloudflare Workers"><p>Our <a href="https://www.cloudflare.com/products/cloudflare-workers/">Workers platform</a> can be used for a ton of useful purposes: for A/B (multivariate) testing, storage bucket authentication, coalescing responses from multiple APIs, and more. But Workers can also be put to use beyond &quot;HTTP middleware&quot;: a Worker can effectively be a web application in its own right. Given the rise of 'chatbots', we can also build a <a href="https://api.slack.com/slack-apps">Slack app</a> using Cloudflare Workers, with no servers required (well, at least not yours!).</p>
<h3 id="whatarewebuilding">What are we Building?</h3>
<p>We're going to build a <a href="https://api.slack.com/slash-commands">Slack bot</a> (as an external webhook) for fetching the latest stock prices.</p>
<p>This Worker could also be adapted to fetch <a href="https://developer.github.com/v3/issues/">open issues</a> from GitHub's API; to <a href="https://www.themoviedb.org/documentation/api/discover">discover what movie to watch</a> after work; or anything else with a REST API you can query.</p>
<p>For now, our &quot;stock prices bot&quot;:</p>
<ul>
<li>Uses the <a href="https://www.alphavantage.co/documentation/">Alpha Vantage API</a> to fetch stock prices</li>
<li>Caches a map of the top equities to their public identifiers, so you can request <code>/stocks MSFT</code> as a shorthand.</li>
<li>Leverages Cloudflare's cache to minimize the need to hit the API on every invocation, whilst still serving recent price data.</li>
</ul>
<p>Using the cache allows you to improve your bot's response times across all invocations of your Worker. It's also polite to reduce redundant calls to an API where possible (lest you get rate limited!), so it's a win-win.</p>
<h3 id="prerequisites">Prerequisites</h3>
<p>In order to get started, you'll need:</p>
<ul>
<li>A Cloudflare account, with Workers enabled (see note).</li>
<li>Some basic programming experience.</li>
<li>An existing Slack workspace. If you don't have one set up, follow Slack's <a href="https://get.slack.help/hc/en-us/articles/206845317-Create-a-Slack-workspace">helpful guide</a> to get one started.</li>
</ul>
<blockquote>
<p><em>Note: You can enable Workers via the &quot;Workers&quot; app in the Cloudflare dashboard.</em></p>
</blockquote>
<h3 id="creatingourworker">Creating our Worker</h3>
<p>We'll get our Worker up and running first, and test it outside of Slack before wiring it up. Our Worker needs to:</p>
<ol>
<li>Handle the incoming webhook (an HTTP POST request) from Slack, including verifying that it actually came from Slack.</li>
<li>Parse the requested symbol from the user's message (the webhook body).</li>
<li>Make a request to the Alpha Vantage API, and handle any errors that arise (invalid symbol, API unreachable, etc.).</li>
<li>Build our response, and send it back to Slack within 3s (Slack's timeout).</li>
</ol>
<p>We'll step through each requirement and its associated code, deploy the Worker to a route, and then connect it to Slack.</p>
<h3 id="handlingthewebhook">Handling the Webhook</h3>
<p>Like all Cloudflare Workers, we need to add a hook for the <code>fetch</code> event and attach the entry point to our Worker. Our <code>slackWebhookHandler</code> function will then be responsible for triggering the rest of our logic and returning a <a href="https://developer.mozilla.org/en-US/docs/Web/API/Response"><code>Response</code></a> to Slack's request.</p>
<pre><code class="language-js">// SLACK_TOKEN is used to authenticate requests are from Slack.
// Keep this value secret.
const SLACK_TOKEN = &quot;SLACKTOKENGOESHERE&quot;
const BOT_NAME = &quot;Stock-bot 🤖&quot;
const ALPHA_VANTAGE_KEY = &quot;&quot;


let jsonHeaders = new Headers([[&quot;Content-Type&quot;, &quot;application/json&quot;]])

addEventListener(&quot;fetch&quot;, event =&gt; {
  event.respondWith(slackWebhookHandler(event.request))
})

/**
 * simpleResponse generates a simple JSON response
 * with the given status code and message.
 *
 * @param {Number} statusCode
 * @param {String} message
 */
function simpleResponse(statusCode, message) {
  let resp = {
    message: message,
    status: statusCode
  }

  return new Response(JSON.stringify(resp), {
    headers: jsonHeaders,
    status: statusCode
  })
}

/**
 * slackWebhookHandler handles an incoming Slack
 * webhook and generates a response.
 * @param {Request} request
 */
async function slackWebhookHandler(request) {
  // As per: https://api.slack.com/slash-commands
  // - Slash commands are outgoing webhooks (POST requests)
  // - Slack authenticates via a verification token.
  // - The webhook payload is provided as POST form data
  
  if (request.method !== &quot;POST&quot;) {
    return simpleResponse(
      200,
      `Hi, I'm ${BOT_NAME}, a Slack bot for fetching the latest stock prices`
    )
  }

  try {
    let formData = await request.formData()
    if (formData.get(&quot;token&quot;) !== SLACK_TOKEN) {
      return simpleResponse(403, &quot;invalid Slack verification token&quot;)
    }
    
    let parsed = parseMessage(formData)

  
    let reply = await stockRequest(parsed.stock)
    let line = `Current price (*${parsed.stock}*): 💵 USD $${reply.USD} (Last updated on ${reply.updated}).`

    return slackResponse(line)
  } catch (e) {
    return simpleResponse(
      200,
      `Sorry, I had an issue retrieving anything for that symbol: ${e}`
    )
  }
}


</code></pre>
<p>Our handler is fairly straightforward:</p>
<ol>
<li>If the incoming request is not a POST request (which a Slack webhook always is), we return some useful information.</li>
<li>For POST requests, we check that the token provided in the POST form data matches ours: this is how we validate the webhook is coming from Slack itself.</li>
<li>We then parse the user message, make a request to fetch the latest price, and construct our response.</li>
<li>If anything fails along the way, we return an error back to the user.</li>
</ol>
<p>While we were at it, we also built a couple of useful helper functions: <code>simpleResponse</code>, which is used for generating errors back to the client, and <code>slackResponse</code> (which we'll look at later) for generating responses in Slack's expected format.</p>
<p>The constants <code>SLACK_TOKEN</code>, <code>BOT_NAME</code>, and <code>ALPHA_VANTAGE_KEY</code> don't need to be calculated on every request, and so we've made them global, outside our request handling logic.</p>
<blockquote>
<p>Note: Caching (often called &quot;memoizing&quot;) static data outside of our request handler in a Worker allows it to be re-used across requests, should the Worker instance itself be re-used. Although the performance gain in this case is negligible, it's good practice, and doesn't hinder our Workers' readability.</p>
</blockquote>
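<p>As a purely hypothetical illustration of this pattern (none of these names appear in the bot above), a lookup table built once at module scope can be reused by every request the same Worker instance serves:</p>
<pre><code class="language-js">// Hypothetical example of memoizing static data at module scope:
// buildAliases() runs once when the Worker is instantiated, not per request.
const SYMBOL_ALIASES = buildAliases()

function buildAliases() {
  // Imagine this being derived from a large embedded dataset.
  return new Map([
    ['MICROSOFT', 'MSFT'],
    ['APPLE', 'AAPL']
  ])
}

function resolveSymbol(text) {
  const cleaned = text.trim().toUpperCase()
  // Fall back to the raw input when no alias is known.
  return SYMBOL_ALIASES.get(cleaned) || cleaned
}
</code></pre>
<p>Every request handler that calls <code>resolveSymbol</code> then shares the same map, rather than rebuilding it per invocation.</p>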
<h3 id="parsingtheusermessage">Parsing the User Message</h3>
<p>Our next step is to parse the message sent in the POST request from Slack. This is where we capture the requested equity, clean it up, and get it ready to pass to the Alpha Vantage API.</p>
<pre><code class="language-js">/**
 * parseMessage parses the selected stock from the Slack message.
 *
 * @param {FormData} message - the POST form data from Slack
 * @return {Object} - an object containing the stock name.
 */
function parseMessage(message) {
  // 1. Parse the message (trim whitespace, uppercase)
  // 2. Return stock that we are looking for
  return {
    stock: message.get(&quot;text&quot;).trim().toUpperCase()
  }
}
</code></pre>
<p>Let's step through what we're doing:</p>
<ol>
<li>We pass in our <a href="https://developer.mozilla.org/en-US/docs/Web/API/FormData"><code>FormData</code></a> containing the user's message</li>
<li>We clean it up (i.e. trim surrounding whitespace, convert to uppercase)</li>
</ol>
<p>In the future, if we want to parse more values from the bot (maybe the currency the user is interested in, or the date), we can easily add additional values to the object.</p>
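<p>For instance, a hypothetical extension accepting an optional currency (the currency handling itself is not part of this post's bot) might parse the raw text like so; it takes a plain string rather than <code>FormData</code> so it's easy to test in isolation:</p>
<pre><code class="language-js">// Hypothetical sketch: parse an optional currency after the symbol,
// e.g. 'msft eur', defaulting to USD when only a symbol is given.
function parseMessageExtended(text) {
  const parts = text.trim().toUpperCase().split(/\s+/)
  return {
    stock: parts[0],
    currency: parts[1] || 'USD'  // default when omitted
  }
}
</code></pre>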
<p>Now that we have the stock we are looking for, we can move on to making the API request!</p>
<h3 id="makingtheapirequest">Making the API Request</h3>
<p>We want to make sure our bot doesn't have to make a request to the Alpha Vantage API unnecessarily: if we had thousands of users every minute, there's no need to fetch the (same) price every time. We can fetch it once (per Cloudflare PoP), store it in the Cloudflare cache for a short period of time (say, 1 minute), and serve users that cached copy. This is a win-win: our bot responds more quickly, and we're kinder to the API we're consuming.</p>
<p>Customers on the Enterprise plan can also use the <code>cacheTtlByStatus</code> functionality, which allows you to set different TTLs based on the response status. This way, if you get an error code, you can cache it for just 1 second (or not at all), so that subsequent requests won't keep failing once the API has recovered.</p>
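<p>As a sketch of what that might look like (the TTL values below are purely illustrative, and the <code>cacheSettings</code> helper is our own invention for this example), the options passed to <code>fetch</code> could be built like so:</p>
<pre><code class="language-js">// Illustrative TTLs only: cache successes for 60s, a 404 for 1s,
// and never cache server errors. Requires the Enterprise plan.
function cacheSettings() {
  return {
    cf: {
      cacheTtlByStatus: {
        '200-299': 60,
        '404': 1,
        '500-599': 0
      }
    }
  }
}

// Used as: fetch(endpoint, cacheSettings())
</code></pre>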
<p>Given the requested stock, we'll make an HTTP request to the API, confirm we got an acceptable response (HTTP 200), and then return an object with the fields we need:</p>
<pre><code class="language-js">
    let resp = await fetch(
      endpoint,
      { cf: { cacheTtl: 60} } // Cache our responses for 60s.
    )

</code></pre>
<p>The API output provides us with two things: the metadata and a line per time interval. For the purposes of our bot, we are going to keep the latest interval provided by the API and discard the rest. On future iterations, we may also call the monthly endpoint, and provide information on monthly highs and lows for comparison, but we will keep it simple for now.</p>
<p>We are going to use the intraday time series endpoint provided by Alpha Vantage. This allows us to cache the response for each individual stock lookup, so that the next person to call our bot may receive a cached version faster (and helps us avoid getting rate limited by the API). Here, we're choosing to optimize for having the latest data, rather than caching for longer periods of time.</p>
<p>You can see that we ask the API for a 1-minute interval:</p>
<pre><code class="language-sh">curl -s &quot;https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&amp;symbol=MSFT&amp;interval=1min&amp;apikey=KEY&quot; | jq
</code></pre>
<p>A single output block looks like this:</p>
<pre><code class="language-json">{
      &quot;1. open&quot;: &quot;99.8950&quot;,
      &quot;2. high&quot;: &quot;99.8950&quot;,
      &quot;3. low&quot;: &quot;99.8300&quot;,
      &quot;4. close&quot;: &quot;99.8750&quot;,
      &quot;5. volume&quot;: &quot;34542&quot;
    },
</code></pre>
<p>To get just the most recent interval, we grab the latest completed entry the API provides, and take its open price.</p>
<pre><code class="language-js">/**
 * stockRequest makes a request to the Alpha Vantage API for the
 * given stock.
 * Docs: https://www.alphavantage.co/documentation/
 * @param {string} stock - the stock to fetch the price for
 * @returns {Object} - an Object containing the stock, price in USD, and timestamp.
 */
async function stockRequest(stock) {
  let endpoint = new URL(&quot;https://www.alphavantage.co/query&quot;)

  endpoint.search = new URLSearchParams({&quot;function&quot; : &quot;TIME_SERIES_INTRADAY&quot; ,
    &quot;interval&quot; : &quot;1min&quot;,
    &quot;apikey&quot;: ALPHA_VANTAGE_KEY,
    &quot;symbol&quot;: stock
  })


  try {
    let resp = await fetch(
      endpoint,
      { cf: { cacheTtl: 60} } // Cache our responses for 60s.
    )

    if (resp.status !== 200) {
      throw new Error(`bad status code from Alpha Vantage: HTTP ${resp.status}`)
    }
    
    let data = await resp.json()
    let timeSeries = data[&quot;Time Series (1min)&quot;]

    // The API lists intervals newest-first; index 1 skips the newest,
    // potentially still-forming interval and takes the last completed one
    let timestamp = Object.keys(timeSeries)[1]
    let usd = timeSeries[timestamp][&quot;1. open&quot;]
    
    let reply = {
      stock: stock,
      USD: usd,
      updated: timestamp
    }
    
    return reply
  } catch (e) {
    throw new Error(`could not fetch the selected symbol: ${e}`)
  }
}

</code></pre>
<p>We build an object representing our reply. We're also careful to handle any errors should we get a bad response back from the API, be it a non-200 status code or a non-JSON response body. When relying on a third-party service or API, you must account for any assumption about the format or correctness of a response that could cause an exception to be thrown if broken, such as calling <code>resp.json()</code> on an HTML body.</p>
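<p>One defensive pattern (a sketch, not part of the original Worker) is to check the <code>Content-Type</code> header before parsing, so that an unexpected HTML error page produces a clear message rather than an opaque parse failure:</p>
<pre><code class="language-js">// Sketch: fail loudly, with context, when the body is not the JSON we expect.
async function parseJsonSafely(resp) {
  const contentType = resp.headers.get('Content-Type') || ''
  if (!contentType.includes('application/json')) {
    throw new Error('expected JSON from the API, got: ' + contentType)
  }
  try {
    return await resp.json()
  } catch (e) {
    throw new Error('could not parse response body as JSON: ' + e)
  }
}
</code></pre>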
<p>Additionally, note that subrequests will respect the SSL mode you have configured for your zone. Thus, if the SSL mode is set to Flexible, Cloudflare will try to connect to the API over HTTP on port 80, and the request will fail (you will see a 525 error).</p>
<h3 id="respondingtoslack">Responding to Slack</h3>
<p>Slack expects responses in <a href="https://api.slack.com/slash-commands#responding_to_a_command">two possible formats</a>: a plain text string, or a simple JSON structure. Thus, we need to take our reply and build a response for Slack.</p>
<pre><code class="language-js">/**
 * slackResponse builds a message for Slack with the given text
 * and optional attachment text
 *
 * @param {string} text - the message text to return
 */
function slackResponse(text) {
  let content = {
    response_type: &quot;in_channel&quot;,
    text: text,
    attachments: []
  }

  return new Response(JSON.stringify(content), {
    headers: jsonHeaders,
    status: 200
  })
}

</code></pre>
<p>The corresponding part of our <code>slackWebhookHandler</code> deals with taking our reply object and passing it to <code>slackResponse</code> -</p>
<pre><code class="language-js">    let reply = await stockRequest(parsed.stock)
    let line = `Current price (*${parsed.stock}*): 💵 USD $${reply.USD} (Last updated on ${reply.updated}).`

    return slackResponse(line)
</code></pre>
<p>This returns a response to Slack that looks like this:</p>
<pre><code class="language-json">{
  &quot;response_type&quot;: &quot;in_channel&quot;,
  &quot;text&quot;: &quot;Current price (*MSFT*): 💵 USD $101.8300 (Last updated on 2018-06-20 11:52:00).&quot;,
  &quot;attachments&quot;: []
}
</code></pre>
<h3 id="configuringslacktestingourbot">Configuring Slack &amp; Testing Our Bot</h3>
<p>With our bot ready, let's configure Slack to talk to it for our chosen slash-command. First, log into Slack and head to the <a href="https://api.slack.com/apps">app management</a> dashboard.</p>
<p>You'll then want to click &quot;Create an App&quot; and fill in the fields, including nominating which  workspace to attach it to:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/slack-create-app.png" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<p>We'll then want to set it up as a Slash Command:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/slack-select-slash-command.png" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<p>Fill in the details: the request URL is the most important, and will reflect the route you've attached your Worker to. In our case, that's <code>https://bots.example.com/stockbot/stock</code>.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/create-new-command.png" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<p>Fetch your App Credentials from the <em>Basic Information</em> tab: specifically, the <em>Verification Token</em>.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/app_credentials.png" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<p>Paste that value into our Worker bot as the value of our <code>SLACK_TOKEN</code> variable:</p>
<pre><code class="language-js">// SLACK_TOKEN is used to authenticate requests are from Slack.
// Keep this value secret.
const SLACK_TOKEN = &quot;PUTYOURTOKENHERE&quot;
</code></pre>
<p>Before hooking our bot up to Slack, we can test to make sure it responds correctly. We can emulate a request from Slack by making a POST request with token and message text via <code>curl</code> -</p>
<pre><code class="language-sh"># Replace with the hostname/route your Worker is running on
➜  ~  curl -X POST -F &quot;token=SLACKTOKENGOESHERE&quot; -F &quot;text=MSFT&quot; &quot;https://bots.example.com/stockbot/stock&quot;
{&quot;response_type&quot;:&quot;in_channel&quot;,&quot;text&quot;:&quot;Current price (MSFT): 💵 USD $101.7300&quot;,&quot;attachments&quot;:[]}
</code></pre>
<p>A correct response should net us the expected reply. If we intentionally send an invalid token instead, our bot should reply accordingly:</p>
<pre><code class="language-sh">➜  ~  curl -X POST https://bots.example.com/stockbot/stock
 -F &quot;token=OBVIOUSLYINCORRECTTOKEN&quot; -F &quot;text=MSFT&quot;
{&quot;message&quot;:&quot;invalid Slack verification token&quot;,&quot;status&quot;:403}%
</code></pre>
<p>... or an invalid symbol:</p>
<pre><code class="language-sh">➜  ~  curl -X POST https://bots.example.com/stockbot/stock -F &quot;token=SLACKTOKENGOESHERE&quot; -F &quot;text=BADSYMBOL&quot;
{&quot;message&quot;:&quot;Sorry, I had an issue retrieving anything for that symbol: Error: could not fetch the selected symbol: Error: bad status code from Alpha Vantage: HTTP 404&quot;,&quot;status&quot;:200}%
</code></pre>
<p>If you're running into issues, make sure your token is correct (case-sensitive), and that the stock you're after exists on Alpha Vantage. Beyond that, however, we can now install the app to our Workspace (Slack will ask you to authorize the bot):</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/workspace.png" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<p>We can now call our bot via the slash command we assigned to it!</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/stockbot.gif" alt="Building a serverless Slack bot using Cloudflare Workers"></p>
<h4 id="wrap">Wrap</h4>
<p>With our Cloudflare Worker, we were able to put together a useful chat bot that responds quickly (within the 3 seconds Slack allows) thanks to Cloudflare's cache. We're also kind to the Alpha Vantage API, since we don't have to reach back out to it for a given symbol if we just fetched it recently.</p>
<p>We look forward to hearing what others have built using Workers!</p>
<h4 id="thankyou">Thank you</h4>
<p>Thank you to <a href="https://github.com/elithrar">Matt Silverlock</a> for his contributions to this post and to Cloudflare.</p>
</div>]]></content:encoded></item><item><title><![CDATA[DroneDeploy and Cloudflare Workers]]></title><description><![CDATA[When we launched Workers much of the focus was on use cases surrounding websites running on origins that needed extra oomph. With Workers you can easily take a site and introduce a raft of personalization capabilities around a range of services.]]></description><link>https://blog.cloudflare.com/dronedeploy-and-cloudflare-workers/</link><guid isPermaLink="false">5b2bce067cbc6900bf7f4356</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[Tech Talks]]></category><category><![CDATA[Drones]]></category><category><![CDATA[API]]></category><category><![CDATA[Serverless]]></category><category><![CDATA[Workers]]></category><dc:creator><![CDATA[Jonathan Bruce]]></dc:creator><pubDate>Thu, 21 Jun 2018 16:36:59 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/image4-1.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/image4-1.jpg" alt="DroneDeploy and Cloudflare Workers"><p><img src="https://blog.cloudflare.com/content/images/2018/06/image4.jpg" alt="DroneDeploy and Cloudflare Workers"><small>Images courtesy of <a href="https://www.dronedeploy.com/">DroneDeploy</a></small></p>
<p>When we launched <a href="https://blog.cloudflare.com/cloudflare-workers-is-now-on-open-beta/">Workers</a> a few months ago, much of the focus was on use cases surrounding websites running on origins that needed extra oomph. With Workers you can easily take a site, introduce a raft of personalization capabilities, A/B test changes, or even aggregate a set of API responses around a range of services. In short, by layering in Cloudflare Workers we can take origin websites and do transformational things.</p>
<p>One of the joys of a platform is that you never know where you are going to see the next use case. Enter <a href="https://www.dronedeploy.com/">DroneDeploy</a>.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/image3.png" alt="DroneDeploy and Cloudflare Workers"><br>
DroneDeploy is a cloud platform that makes it easy to collect and analyze drone imagery and data. Simply install DroneDeploy on your mobile device and connect to a DJI drone. DroneDeploy flies the drone, collects the imagery, then stitches the photos into maps.</p>
<p>The maps can show things like crop conditions &amp; stress, construction project progress, or even thermal temperature ranges across vast solar farms or for search and rescue situations.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/image6.png" alt="DroneDeploy and Cloudflare Workers"><br>
<small>Using plant health algorithms applied to drone-generated maps, growers can pinpoint crop stress in their fields and stomp out pests, disease, or irrigation issues.</small></p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/image1.png" alt="DroneDeploy and Cloudflare Workers"><small>With Thermal Live Map, it’s possible to inspect solar farms from the sky in minutes to detect broken photocells in solar panels that are in need of repair.</small></p>
<p>You can then upload the images to the cloud and make high res maps and 3D models. With these you can perform deeper analysis (such as volumes, distances, plant health, etc), share and collaborate with coworkers, or move the maps and models into applications like CAD or Agriculture Management Platforms.</p>
<p>Check out how we were able to draw a flight path over Cloudflare’s HQ. The drone flew around the building and captured imagery that we turned into a map and 3D model.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Screen-Shot-2018-06-21-at-10.08.50-AM.png" alt="DroneDeploy and Cloudflare Workers"></p>
<h4 id="sohowisdronedeployusingworkersandwhyisitimportanttodronedeploy">So how is DroneDeploy using Workers?  And why is it important to DroneDeploy?</h4>
<p>It’s important to understand that they want to maintain architectural freedom around the many services they use to build their product. As with many software stacks today, they use GCP, AWS, and others, but they want to maintain flexibility in their network routing and authentication layer.</p>
<p>To offer a dramatically better experience to drone users in the field, they can both push authentication out in front of a CDN and serve collected images directly from our CDN (typically hundreds or thousands of tiles used to render maps or 3D models). Many of DroneDeploy’s users operate in highly variable network conditions on job sites or in the field. Workers allows them to push their authentication to the edge, and to build a custom signed URL ensuring the correct images are surfaced to the correct consumer, in effect ensuring their multi-tenancy image storage model is safeguarded at the edge. To do this, DroneDeploy employs a URL authentication method commonly known as request signing, which uses the Web Crypto API.</p>
<p>Commenting on this, Eric Hauser, VP of Engineering at DroneDeploy, detailed the upside of Cloudflare Workers for his team:</p>
<blockquote>
<p><em>Cloudflare Workers provided us with flexibility when we ran into limitations with the shared capabilities of our primary infrastructure providers' CDNs. Unique enterprise requirements around authentication, data security, and locality require us to have flexibility at our routing layer. From just the work we’ve done around authentication to date, we see an exciting and productive relationship with Cloudflare.</em></p>
</blockquote>
<p>Let's peel back the layers and understand how they use Workers.</p>
<p>DroneDeploy uses standard JWT authentication - if you are not sure what a JSON Web Token (JWT) is, read more <a href="https://jwt.io/">here</a>. So the general flow requires the Worker to:</p>
<ol>
<li>Intercept requests for images from the DroneDeploy mobile app or website.  These requests can number in the hundreds or thousands of image tiles all of which are needed to render a typical map or 3D model and are stored on either S3 or Google Cloud Storage.</li>
<li>Ensure the correct JSON Web Token (JWT) is present.</li>
<li>Assuming the token is valid, HMAC sign the URL, set cache headers, and return the appropriate file.</li>
</ol>
<p>Let's look at each step - note we filtered out some components of the code for security reasons.</p>
<pre><code class="language-javascript">addEventListener('fetch', event =&gt; {
  event.respondWith(handleFetch(event.request))
});

/**
 * Intercept a request
 * Validate the provided JWT credentials
 * If valid: 
 *   rewrite the request to the storage backend 
 *   sign the request with our backend credentials
 *   return the response from the storage backend
 * If not valid:
 *   return a 403 Forbidden response
 */
async function handleFetch(request) {
  if (!(await isValidJwt(request))) {
    return new Response('Invalid JWT', { status: 403 })
  }
  const gsBaseUrl = createGoogleStorageUrl(request);
  const gsHeaders = new Headers();
  gsHeaders.set('Date', new Date().toUTCString());  // Required by Google for HMAC signed URLs
  const signature =  await hmacSignature(gsBaseUrl, gsHeaders);
  gsHeaders.set('Authorization', 'AWS ' + HMAC_KEY + ':' + signature);
  return fetch(new Request(gsBaseUrl, {headers: gsHeaders}))
}
</code></pre>
<br>
Now check for the JWT Token
<br>
<pre><code class="language-javascript">/**
 * Parse the JWT and validate it.
 *
 * We are just checking that the signature is valid, but you could do more than that.
 * For example, check that the payload has the expected entries, or that the token has not expired.
 */ 
async function isValidJwt(request) {
  const encodedToken = getJwt(request);
  if (encodedToken === null) {
    return false
  }
  const token = decodeJwt(encodedToken);
  return isValidJwtSignature(token)
}

/**
 * For this example, the JWT is passed in as part of the Authorization header,
 * after the Bearer scheme.
 * Parse the JWT out of the header and return it.
 */
function getJwt(request) {
  const authHeader = request.headers.get('Authorization');
  if (!authHeader || authHeader.substring(0, 6) !== 'Bearer') {
    return null
  }
  return authHeader.substring(6).trim()
}
</code></pre>
<br>
Now decode the JWT Token
<br>
<pre><code class="language-javascript">/**
 * Parse and decode a JWT.
 * A JWT is three, base64 encoded, strings concatenated with ‘.’:
 *   a header, a payload, and the signature.
 * The signature is “URL safe”, in that ‘/+’ characters have been replaced by ‘_-’
 * 
 * Steps:
 * 1. Split the token at the ‘.’ character
 * 2. Base64 decode the individual parts
 * 3. Retain the raw Base64 encoded strings to verify the signature
 */
function decodeJwt(token) {
  const parts = token.split('.');
  const header = JSON.parse(atob(parts[0]));
  const payload = JSON.parse(atob(parts[1]));
  const signature = atob(parts[2].replace(/_/g, '/').replace(/-/g, '+'));
  return {
    header: header,
    payload: payload,
    signature: signature,
    raw: { header: parts[0], payload: parts[1], signature: parts[2] }
  }
}

/**
 * Validate the JWT.
 *
 * Steps:
 * Reconstruct the signed message from the Base64 encoded strings.
 * Load the RSA public key into the crypto library.
 * Verify the signature with the message and the key.
 */
async function isValidJwtSignature(token) {
  const encoder = new TextEncoder();
  const data = encoder.encode([token.raw.header, token.raw.payload].join('.'));
  const signature = new Uint8Array(Array.from(token.signature).map(c =&gt; c.charCodeAt(0)));
  const jwk = {
    alg: 'RS256',
    e: 'AQAB',
    ext: true,
    key_ops: ['verify'],
    kty: 'RSA',
    n: RSA_PUBLIC_KEY
  };
  const key = await crypto.subtle.importKey('jwk', jwk, { name: 'RSASSA-PKCS1-v1_5', hash: 'SHA-256' }, false, ['verify']);
  return crypto.subtle.verify('RSASSA-PKCS1-v1_5', key, signature, data)
}
</code></pre>
<br>
Now HMAC sign the URL, and return the file.
<br>
<pre><code class="language-javascript">/**
 * Rewrite the URL from the original request to Google Storage API and bucket.
 */
function createGoogleStorageUrl(request) {
  const googlePrefix = 'https://storage.googleapis.com/BUCKET_NAME';
  const path = new URL(request.url).pathname;
  return new URL(googlePrefix + path)
}

/**
 * Create the HMAC signature for the Google Storage URL.
 */
async function hmacSignature(url, headers) {
  const encoder = new TextEncoder()
  const message = createMessage(url, headers)
  const key = await crypto.subtle.importKey('raw', encoder.encode(HMAC_SECRET), {name: 'HMAC', hash: 'SHA-1'}, false, ['sign'])
  const mac = await crypto.subtle.sign('HMAC', key, encoder.encode(message))
  return btoa(String.fromCharCode(...new Uint8Array(mac)))
}

/**
 * Google requires a specific format for the message that is signed.
 * More documentation can be found here:
 * https://cloud.google.com/storage/docs/migrating
 */
function createMessage(url, headers) {
  const verb = 'GET'
  return [
    verb,
    '',  // GET requests don't have Content-MD5 or Content-Type headers, so use empty strings
    '',
    headers.get('Date'),
    url.pathname
  ].join('\n')
}
</code></pre>
<p>So the upside is clear: authentication at the edge provides flexibility and scale, and it means DroneDeploy is not locked into an architecture that would prevent them from choosing the best-in-class capabilities they need from GCP, AWS, and more.</p>
<h4 id="sowheretofromhere">So where to from here?</h4>
<p>This Worker is the first of several DroneDeploy is exploring.  In the next generation of Workers, DroneDeploy is looking to deliver a range of improvements, all with a view to optimizing their customers’ experience by using Cloudflare’s cache in addition to other features Cloudflare has to offer. We’ll update the blog at that time.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Argo Tunnels: Spread the Load]]></title><description><![CDATA[We recently announced Argo Tunnel which allows you to deploy your applications anywhere, even if your webserver is sitting behind a NAT or firewall. Now, with support for load balancing, you can spread the traffic across your tunnels.]]></description><link>https://blog.cloudflare.com/argo-tunnels-spread-the-load/</link><guid isPermaLink="false">5b1e4ecd4ade3a00bf0a529e</guid><category><![CDATA[Argo]]></category><category><![CDATA[Argo Tunnel]]></category><category><![CDATA[Product News]]></category><category><![CDATA[Performance]]></category><category><![CDATA[Reliability]]></category><category><![CDATA[Load Balancing]]></category><dc:creator><![CDATA[Joaquin Madruga]]></dc:creator><pubDate>Wed, 20 Jun 2018 23:39:27 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/Salt_Cars-1.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/Salt_Cars-1.jpg" alt="Argo Tunnels: Spread the Load"><p>We recently announced <a href="https://www.cloudflare.com/products/argo-tunnel/">Argo Tunnel</a> which allows you to deploy your applications anywhere, even if your webserver is sitting behind a NAT or firewall.  Now, with support for load balancing, you can spread the traffic across your tunnels.</p>
<h3 id="aquickargotunnelrecap">A Quick Argo Tunnel Recap</h3>
<p>Argo Tunnel allows you to expose your web server to the internet without having to open routes in your firewall or set up dedicated routes.  Your servers stay safe inside your infrastructure.  All you need to do is install <em>cloudflared</em> (our open source agent) and point it to your server.  <em>cloudflared</em> will establish secure connections to our global network and securely forward requests to your service.  Since <em>cloudflared</em> initializes the connection, you don't need to open a hole in your firewall or create a complex routing policy.  Think of it as a lightweight GRE tunnel from Cloudflare to your server.</p>
<h3 id="tunnelsandloadbalancers">Tunnels and Load Balancers</h3>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Salt_Cars.jpg" alt="Argo Tunnels: Spread the Load"><small><a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">CC BY-NC-ND 2.0</a> <a href="https://commons.wikimedia.org/wiki/File:Salt_Cars.jpg">image</a> by Carey Lyons</small></p>
<p>If you are running a simple service as a proof of concept or for local development, a single Argo Tunnel can be enough. For real-world deployments though, you almost always want multiple instances of your service running on separate machines, in separate availability zones, or even in separate countries. Cloudflare’s distributed Load Balancing can now transparently balance traffic between however many Argo Tunnel instances you choose to create. Together this provides you with failure tolerance and, when combined with our geo-routing capabilities, improved performance around the world.</p>
<p>Want more performance in Australia? Just spin up more instances. Want to save money on the weekends? Just turn them off. Leave your firewalls closed and let Argo Tunnel handle the service discovery and routing for you.</p>
<p>On accounts with Load Balancing enabled, when you launch <em>cloudflared</em> to expose your web service, you can specify a load balancer you want to attach to, and we take care of the rest:</p>
<pre><code>cloudflared --lb-pool my_lb_pool --hostname myshinyservice.example.com --url http://localhost:8080
</code></pre>
<p>In the example above we'll take care of:</p>
<ul>
<li>Creating the DNS entry for your new service (myshinyservice.example.com).</li>
<li>Creating the Load Balancer (myshinyservice), if it doesn't exist.</li>
<li>Creating the Load Balancer Pool (my_lb_pool), if it doesn't exist.</li>
<li>Opening a tunnel and adding it to the pool.</li>
<li>Proxying all traffic from myshinyservice.example.com all the way to your server running on your localhost on port 8080.</li>
<li>Removing the tunnels from the pool when you shutdown <em>cloudflared</em>.</li>
</ul>
<p>If you run the same command from another machine with another server, it will automatically join the pool and start sharing the load across both.  You can run a load-balanced web service across multiple servers with a single command.  You don't even need to log in to the Cloudflare UI.</p>
<h3 id="loadbalancerfeatures">Load Balancer Features</h3>
<p>Now that you're running a resilient, scalable web service, you'll probably want to delve into the other features Cloudflare Load Balancing has to offer.  Go to the Traffic page and take a look at your newly minted Load Balancer.  From there you can specify health checks, health check policies, routing policies, and a fallback pool in case your service is down.</p>
<h3 id="tryitout">Try it Out</h3>
<p>Head over to your dashboard and make sure you have Argo (Traffic-&gt;Argo-&gt;Tiered Caching + Smart Routing) and Load Balancer (Traffic-&gt;Load Balancing) enabled.  Start with the <a href="https://developers.cloudflare.com/argo-tunnel/quickstart/">Argo Tunnel Quickstart Guide</a> and run <em>cloudflared</em> with the --lb-pool option,  just like we did in the example above. At the moment we limit our non-Enterprise customers to just a handful of origins, but expect that limitation to be removed in the near future. For now, play away!</p>
</div>]]></content:encoded></item><item><title><![CDATA[Test New Features and Iterate Quickly with Cloudflare Workers]]></title><description><![CDATA[At Cloudflare, we believe that getting new products and features into the hands of customers as soon as possible is the best way to get great feedback. The thing about releasing products early and often is that sometimes they might not be initially ready for your entire user base.]]></description><link>https://blog.cloudflare.com/iterate-quickly-with-cloudflare-workers/</link><guid isPermaLink="false">5b293c7c7cbc6900bf7f42f2</guid><category><![CDATA[JavaScript]]></category><category><![CDATA[Tech Talks]]></category><category><![CDATA[Serverless]]></category><category><![CDATA[Workers]]></category><dc:creator><![CDATA[Remy Guercio]]></dc:creator><pubDate>Tue, 19 Jun 2018 19:11:14 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/photo-1526253038957-bce54e05968e.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/photo-1526253038957-bce54e05968e.jpg" alt="Test New Features and Iterate Quickly with Cloudflare Workers"><p><img src="https://images.unsplash.com/photo-1526253038957-bce54e05968e?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=9d7b88a36bbd544c26a37f356595fc67" alt="Test New Features and Iterate Quickly with Cloudflare Workers"><br>
<small>Photo by <a href="https://unsplash.com/@nesabymakers?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">NESA by Makers</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<p>At Cloudflare, we believe that getting new products and features into the hands of customers as soon as possible is the best way to get great feedback. The thing about releasing products early and often is that sometimes they might not be initially ready for your entire user base. You might want to provide access only to particular sets of customers: power users, those who have expressed interest in participating in a beta, or customers who need a new feature the most.</p>
<p>As I have been meeting with many of the users who were in our own Workers beta program, I’ve seen (somewhat unsurprisingly) that many of our users share the same belief that they should be getting feedback from their own users early and often.</p>
<p>However, I was surprised to learn about the difficulty that many beta program members had in creating the necessary controls to quickly and securely gate new or deprecated features when testing and releasing updates.</p>
<p>Below are some ideas and recipes I’ve seen implemented inside of <a href="https://www.cloudflare.com/products/cloudflare-workers/">Cloudflare Workers</a> to ensure the appropriate customers have access to the correct features.</p>
<h3 id="howworkerswork">How Workers Work</h3>
<p>First, a brief primer on how Workers work.</p>
<p>As soon as a Worker is deployed, it is available and ready to run at every one of Cloudflare’s 155+ data centers in response to a request made to your website, application or API. Workers are able to modify anything about both the request to and response from your origin server. They also have the ability to make subrequests to other endpoints in response to the initial request.</p>
<p>Workers are able to make their own subrequests using the available fetch method. We’ll be relying on this as well as the fact that requests made via fetch are also cacheable by Cloudflare to make sure that gating of features is not just secure but also quick.</p>
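<p>As a point of reference, every snippet in this post runs inside a handler wired up like the minimal Worker skeleton below (the response body here is just a placeholder):</p>

```javascript
// Minimal Worker skeleton: the snippets in this post run inside a
// handler like handleRequest, which receives the incoming Request
// and returns a Response.
async function handleRequest(request) {
  // Subrequests made here with fetch() pass through Cloudflare's cache.
  return new Response('hello from the edge');
}

// The Workers runtime delivers each incoming request as a 'fetch' event.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', function (event) {
    event.respondWith(handleRequest(event.request));
  });
}
```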
<h3 id="howtosecurelycacheuserpermissions">How to Securely Cache User Permissions</h3>
<p>Let’s say you have an endpoint on your origin that allows you to securely pull the permissions for a particular user.</p>
<p><code>https://api.yoursite.com/user/{uid}</code></p>
<p>From a Cloudflare Worker we can securely fetch this permission information using a token and have it returned either as JSON or as part of the headers.<br></p>
<pre><code class="language-javascript">// Create Request
 var permissionRequest = new Request(permissionsURL, {
      method: 'GET', 
      headers: new Headers({
        'X-Auth-Token': 'super-secret-token'
      })
    });
// Make the request and wait for the response
var permissionResponse = await fetch(permissionRequest, { cf: { cacheTtl: 14400 } });

// Getting Permissions returned in the Headers
var newFeatureAvailable = permissionResponse.headers.get('X-YourSite-NewFeature');

// Getting Permissions returned as JSON
var jsonPermissions = await permissionResponse.json();
</code></pre>
<br>
<p>As I wrote earlier, responses to requests made via fetch are cached by Cloudflare. So, subsequent Worker invocations can grab user permissions without having to go back to the origin’s endpoint.</p>
<p>While the default cache TTL of 4 hours might work for many applications, fetch will also allow you to set an arbitrary TTL to ensure that your users are not granted permissions any longer than necessary. To set a TTL of 300 seconds (note: the free plan has a lower TTL limit of 2 hours or 7200 seconds) you would change the fetch above to be:<br>
<br></p>
<pre><code class="language-javascript">var permissionResponse = await fetch(permissionRequest, { cf: { cacheTtl: 300 } });
</code></pre>
<br>
<h4 id="anoteaboutcachingsensitiveobjects">A Note about Caching Sensitive Objects</h4>
<p>If you are storing sensitive information (like user permissions) in Cloudflare’s cache, it is always important to keep in mind that the URL should never be publicly accessible, but only reachable from within a Worker.</p>
<p>The Worker set to run in front of <code>api.yoursite.com/user/{uid}</code> should either block all requests to the path from outside of a Cloudflare Worker or check to ensure the request has a valid secret key.</p>
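<p>Such a check can be as small as comparing a shared secret on the incoming request. A minimal sketch, reusing the <code>X-Auth-Token</code> header from the example above (the constant name and 200/403 shapes are illustrative, not a Cloudflare API):</p>

```javascript
// Hypothetical guard for the permissions endpoint: only requests
// carrying the shared secret get through; everything else is a 403.
// WORKER_SECRET and the response shape are illustrative only.
const WORKER_SECRET = 'super-secret-token';

function gatePermissionsRequest(headers) {
  // headers: a plain object keyed by lower-cased header names
  if (headers['x-auth-token'] === WORKER_SECRET) {
    return { status: 200, body: 'permissions payload' };
  }
  return { status: 403, body: 'Forbidden' };
}
```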
<h4 id="anoteaboutusingsupersecrettokens">A Note about Using “Super-Secret-Tokens”</h4>
<p>Tokens should be provided in your Worker when it is uploaded to Cloudflare and verified by your origin on each request. Extremely security-conscious readers might be nervous about storing credentials in code, but note that Cloudflare strongly encourages 2FA as well as restricts Worker access to specific accounts. We are also exploring better ways of passing secrets to Workers.</p>
<h3 id="commonwaysofgatingnewfeatures">Common Ways of Gating New Features</h3>
<p>Now that you have quickly fetched the user permissions from cache, it’s time to do something with them! There are endless things you could do, but for this post I will cover some of the more common ones including: restricting paths, A/B Testing, and custom routing between origins.</p>
<h4 id="restrictingpaths">Restricting Paths</h4>
<p>Let’s say you’re releasing v2 of your current API. You want all users to still be able to send GET and POST requests to v1, but since you’re still performance tuning some new v2 features, only authorized users should be able to POST while everyone can GET. Continuing from the example before, this can be done with Cloudflare using the following code:<br>
<br></p>
<pre><code class="language-javascript">const apiV2 = jsonPermissions['apiV2'];

// Check to see if the user is allowed to test the v2 API
if (apiV2) {
    // They're allowed to test v2 so pass everything through. 
    return fetch(request);
} else {
    // If they aren't specifically allowed to test v2 then we
    // only allow GETs; everything else returns a 403 from the edge.
    if (request.method !== 'GET') {
        return new Response('Sorry, this page is not available.',
            { status: 403, statusText: 'Forbidden' });
    }
    return fetch(request);
}
</code></pre>
<br>
<h3 id="abtesting">A/B Testing</h3>
<br>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/workers-ab-test.png" alt="Test New Features and Iterate Quickly with Cloudflare Workers"><br>
When releasing a new API version you might also want to update your documentation with a new design, but before rolling out anything it’s important to run a test to make sure it improves (or doesn’t harm) your relevant metrics. A/B testing different versions of the documentation amongst users who have access to V2 of the API can be easily done with Cloudflare Workers:<br>
<br></p>
<pre><code class="language-javascript">const apiV2 = jsonPermissions['apiV2'];
const group = jsonPermissions['testingGroup'];

// Here we'll use a variable set in the JSON returned from
// the user API to determine the user's test group, but you
// could also do this randomly by assigning a cookie to send back.
// Example: https://developers.cloudflare.com/workers/recipes/a-b-testing/

// Make sure the user is allowed to see API V2
if (apiV2) {
    let url = new URL(request.url);
    
    // Append the user's test group to the forwarded request
    // Hidden from user: /docs/v2/ -&gt; /group-1/docs/v2/
    url.pathname = `/${group}${url.pathname}`;
    
    const modifiedRequest = new Request(url, {
        method: request.method,
        headers: request.headers
    });
    const response = await fetch(modifiedRequest);

    return response;
} else {
    // User shouldn't be allowed to see V2 docs
    return new Response('Sorry, this page is not yet available.',
        { status: 403, statusText: 'Forbidden' });
}
</code></pre>
<br>
<h4 id="customroutingbetweenorigins">Custom Routing Between Origins</h4>
<p>Spinning up a new version of an API or Application sometimes requires spinning up an entirely new origin server. Cloudflare Workers can easily route API calls to separate origins based on paths, headers, or anything else in the request. Here we’ll make sure the user has permission to access v2 of the API and then route the request to the dedicated origin:<br>
<br></p>
<pre><code class="language-javascript">const apiV2Allowed = jsonPermissions['apiV2Allowed'];

const v1origin = 'https://prod-v1-api.yoursite.com';
const v2origin = 'https://beta-v2-api.yoursite.com';

// Original URL: https://api.yoursite.com/v2/endpoint
const originalURL = new URL(request.url);
const originalPath = originalURL.pathname;
const apiVersion = originalPath.split('/')[1];
const endpoint = originalPath.split('/').splice(2).join('/');


if (apiVersion === 'v2') {
    if (apiV2Allowed) {
        let newUrl = new URL(v2origin);
        newUrl.pathname = endpoint;
        const modifiedRequest = new Request(newUrl, {
            method: request.method,
            headers: request.headers
        });
        return fetch(modifiedRequest);
    } else {
        return new Response('Sorry, this API version is not available.',
            { status: 403, statusText: 'Forbidden' });
    }
} else {
    let newUrl = new URL(v1origin);
    newUrl.pathname = endpoint;
    const modifiedRequest = new Request(newUrl, {
        method: request.method,
        headers: request.headers
    });
    return fetch(modifiedRequest);
}
</code></pre>
<br>
<p>Think I should have included another way of gating features? Make sure to share it on our <a href="https://community.cloudflare.com/tags/recipe-exchange">Cloudflare Community recipe exchange</a>.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Cloudflare Internet Summit - TweetStream]]></title><description><![CDATA[A collection of tweets from speakers, attendees, and staff at our UK Internet Summit. ]]></description><link>https://blog.cloudflare.com/internet-summit-tweetstream/</link><guid isPermaLink="false">5b0862971c604c00bf12b224</guid><category><![CDATA[Events]]></category><category><![CDATA[Internet Summit]]></category><dc:creator><![CDATA[Ryan Knight]]></dc:creator><pubDate>Thu, 14 Jun 2018 09:01:01 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/SummitMain.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/SummitMain.jpg" alt="Cloudflare Internet Summit - TweetStream"><p><a class="twitter-timeline" data-partner="tweetdeck" href="https://twitter.com/Yank/timelines/1000098577471815680?ref_src=twsrc%5Etfw">#InternetSummit - Curated tweets by Yank</a> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
</div>]]></content:encoded></item><item><title><![CDATA[Boston, London, & NY developers: We can't wait to meet you]]></title><description><![CDATA[Are you based in Boston, London, or New York? There's a lot going on this month from the London Internet Summit to Developer Week New York and additional meetups in Boston and New York. Drop by our events and connect with the Cloudflare community.]]></description><link>https://blog.cloudflare.com/boston-ny-developers-were-hosting-events-in-your-cities/</link><guid isPermaLink="false">5b1adedd4ade3a00bf0a528a</guid><category><![CDATA[Events]]></category><category><![CDATA[Developers]]></category><category><![CDATA[Internet Summit]]></category><dc:creator><![CDATA[Andrew Fitch]]></dc:creator><pubDate>Sat, 09 Jun 2018 17:27:59 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/photo-1523128662036-c964212cc9f0.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/photo-1523128662036-c964212cc9f0.jpg" alt="Boston, London, & NY developers: We can't wait to meet you"><p><img src="https://images.unsplash.com/photo-1523128662036-c964212cc9f0?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=23563dfbd1572df1b82db36be6456417" alt="Boston, London, & NY developers: We can't wait to meet you"><br>
<small>Photo by <a href="https://unsplash.com/@impatrickt?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Patrick Tomasso</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<p>Are you based in Boston, London, or New York? There's a lot going on this month from the <a href="https://www.cloudflare.com/internet-summit/london/">London Internet Summit</a> to <a href="http://www.developerweek.com/NYC/">Developer Week New York</a> and additional meetups in Boston and New York. Drop by our events and connect with the Cloudflare community.</p>
<h3 id="event1bostonuxintegrationsdeveloperexperienceapanelfeatdriftcloudflare">Event #1 (Boston): UX, Integrations, &amp; Developer Experience: A Panel feat. Drift &amp; Cloudflare</h3>
<p><a href="https://driftandcloudflare.eventbrite.com"><img src="https://blog.cloudflare.com/content/images/2018/06/drift.jpg" alt="Boston, London, & NY developers: We can't wait to meet you"></a><br>
<small>Photo by <a href="https://commons.wikimedia.org/w/index.php?title=Barrett_Lyon&amp;action=edit&amp;redlink=1">The Opte Project</a> / <a href="https://en.wikipedia.org/wiki/Main_Page">Originally from the English Wikipedia</a>; description page is/was <a href="https://en.wikipedia.org/wiki/File:Internet_map_1024.jpg">here</a></small></p>
<p><strong>Tuesday, June 12</strong>: 6:00 pm - 8:00 pm</p>
<p><strong>Location</strong>: <a href="https://www.drift.com/">Drift</a> - <a href="https://www.google.com/maps/place/222+Berkeley+St,+Boston,+MA+02116/@42.350665,-71.075501,17z/data=!3m1!4b1!4m5!3m4!1s0x89e37a74ad98b309:0xef6bd60d212b2bd6!8m2!3d42.3506611!4d-71.0733123">222 Berkley St, 6th Floor Boston, MA 02116</a></p>
<p>Join us at <a href="https://www.google.com/maps/place/222+Berkeley+St,+Boston,+MA+02116/@42.350665,-71.075501,17z/data=!3m1!4b1!4m5!3m4!1s0x89e37a74ad98b309:0xef6bd60d212b2bd6!8m2!3d42.3506611!4d-71.0733123">Drift HQ</a> for a panel discussion on user experience, developer experience, and integration, featuring <a href="https://twitter.com/eliast">Elias Torres</a> from Drift and <a href="https://twitter.com/conzorkingkong">Connor Peshek</a> and <a href="https://twitter.com/_Renahlee">Ollie Hsieh</a> from Cloudflare.</p>
<p>The panelists will speak about their experiences developing user-facing applications, best practices they learned in the process, the integration of the <a href="https://www.cloudflare.com/apps/drift">Drift app</a> and the <a href="https://www.cloudflare.com/apps/">Cloudflare Apps platform</a>, and future platform features.</p>
<p style="text-align: center"><a class="btn btn-warning" href="https://driftandcloudflare.eventbrite.com">View Event Details & Register Here &raquo;</a></p>
<h3 id="event2londoncloudflareinternetsummit">Event #2 (London): Cloudflare Internet Summit</h3>
<p><a href="https://www.cloudflare.com/internet-summit/london/"><img src="https://images.unsplash.com/photo-1508711046474-2f4c2d3d30ca?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=0fa85f17d882fba203e0e09669226046" alt="Boston, London, & NY developers: We can't wait to meet you"></a><br>
<small>Photo by <a href="https://unsplash.com/@lucamicheli?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Luca Micheli</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<p><strong>Thursday, June 14</strong>: 9:00 am - 6:00 pm</p>
<p><strong>Location</strong>: <a href="http://tobaccodocklondon.com/">The Tobacco Dock</a> - <a href="https://www.google.com/maps/place/Tobacco+Dock/@51.5081761,-0.0595714,15z/data=!4m2!3m1!1s0x0:0x438a8f1c8d683e45?sa=X&amp;ved=0ahUKEwj-2K6tmcXbAhXLi1QKHUuMDG4Q_BIImQEwEQ">Wapping Ln, St Katharine's &amp; Wapping, London E1W 2SF</a></p>
<p>The Internet Summit is focused on how the Internet will evolve over the next five years. The day-long event will feature a series of fireside chats, intimate panel discussions, and lively conversations from some of the brightest thought leaders, executives, entrepreneurs, researchers, and operators.</p>
<p>We don’t spend much time talking about Cloudflare at the Internet Summit but instead facilitate discussions with the people who inspire or challenge us.</p>
<p style="text-align: center"><a class="btn btn-warning" href="https://www.cloudflare.com/internet-summit/london/">Register & See Videos from Last Year Here &raquo;</a></p>
<h3 id="event3brooklyndelightingusersanddevelopers">Event #3 (Brooklyn): Delighting Users and Developers</h3>
<h4 id="lessonslearnedimprovinguxanddx">Lessons Learned Improving UX and DX</h4>
<p><a href="https://uxanddx.eventbrite.com"><img src="https://images.unsplash.com/photo-1512758017271-d7b84c2113f1?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=a743be09d9edb946069d22e97323b3d7" alt="Boston, London, & NY developers: We can't wait to meet you"></a><br>
<small>Photo by <a href="https://unsplash.com/@epicantus?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Daria Nepriakhina</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<p><strong>Tuesday, June 19</strong>: 5:45 pm - 7:45 pm</p>
<p><strong>Location</strong>: <a href="https://ramonabarnyc.com/">Ramona</a> - <a href="https://www.google.com/maps/place/113+Franklin+St,+Brooklyn,+NY+11222/data=!4m2!3m1!1s0x89c259387cdf288d:0x26afaada5042f424?sa=X&amp;ved=0ahUKEwj-ucTShMXbAhVnrVQKHYPpBjUQ8gEIJjAA">113 Franklin Street, Brooklyn, NY 11222</a></p>
<p><strong>Developer Experience &amp; User Experience: Tried &amp; True Methods for Improving Both</strong></p>
<p>Join us for a panel discussion on user experience, developer experience, and methods of improving both, featuring <a href="https://twitter.com/qiqing">Jade Wang</a>, <a href="https://twitter.com/jessperate">Jess Rosenberg</a>, <a href="https://twitter.com/conzorkingkong">Connor Peshek</a> and <a href="https://twitter.com/_Renahlee">Ollie Hsieh</a> from Cloudflare, and moderated by <a href="https://twitter.com/fitchaj">Andrew Fitch</a> from Cloudflare.</p>
<p>Our panelists will speak about their experiences developing user-facing applications, developer-facing tools, best practices they learned in the process, and future platform features.</p>
<p style="text-align: center"><a class="btn btn-warning" href="https://uxanddx.eventbrite.com">View Event Details & Register Here &raquo;</a></p>
<h3 id="event4brooklyndeveloperweekconferencetalk">Event #4 (Brooklyn): Developer Week Conference Talk</h3>
<h4 id="betterfasterstrongerwebaccelerationmobilenetworkoptimizationandaddingfeaturesontheedge">Better, Faster, Stronger: Web Acceleration, Mobile Network Optimization, and Adding Features on the Edge</h4>
<p><a href="https://developerweekny2018.sched.com/event/9bda69acea10e612b8cab017aee8abe1?iframe=no"><img src="https://blog.cloudflare.com/content/images/2018/06/Screen-Shot-2018-06-08-at-3.18.46-PM.png" alt="Boston, London, & NY developers: We can't wait to meet you"></a></p>
<p><strong>Wednesday, June 20</strong>: 10:00 am - 10:50 am</p>
<p><strong>Location</strong>: <a href="https://brooklynexpocenter.com/">Brooklyn Expo Center</a> - <a href="https://www.google.com/maps/place/72+Noble+St,+Brooklyn,+NY+11222/data=!4m2!3m1!1s0x89c25941be51c1bb:0xd7eb8487aa07833c?sa=X&amp;ved=0ahUKEwiZi-DAisXbAhVoiFQKHWrVCIIQ8gEIJjAA">72 Noble St, Brooklyn, NY 11222</a></p>
<p>If you happen to be attending <a href="http://www.developerweek.com/NYC/">Developer Week New York</a>, check out <a href="https://twitter.com/qiqing">Jade Wang</a>’s conference talk as well.</p>
<p>About 10% of all Internet requests flow through Cloudflare’s network. In addition to providing performance and security for over 7 million websites, Cloudflare exposes our entire infrastructure via a standard programmatic interface.</p>
<p>In this talk, Jade will cover:</p>
<ul>
<li>Improving mobile app performance, especially over spotty network connections (mobile SDK)</li>
<li>Access control at the edge (Cloudflare Access)</li>
<li>How to write JavaScript that runs on Cloudflare’s edge (Cloudflare Workers)</li>
<li>Write plugins that other people can install onto their websites (Cloudflare Apps)</li>
<li>If you could leverage 151+ data centers worldwide, what would you build?</li>
</ul>
<p style="text-align: center"><a class="btn btn-warning" href="https://developerweekny2018.sched.com/event/9bda69acea10e612b8cab017aee8abe1?iframe=no">View Event Details & Register Here &raquo;</a></p>
<p>We'll hope to meet you soon.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Introducing DNS Resolver for Tor]]></title><description><![CDATA[As was mentioned in the original 1.1.1.1 blog post, our policy is to never write client IP addresses to disk and wipe all logs within 24 hours. Still some folks might not want to reveal their IP address to the resolver at all. This is why we are launching a Tor hidden service for our resolver.]]></description><link>https://blog.cloudflare.com/welcome-hidden-resolver/</link><guid isPermaLink="false">5afb79b75fd79500bfb83b6a</guid><category><![CDATA[Crypto]]></category><category><![CDATA[1.1.1.1]]></category><category><![CDATA[Tor]]></category><category><![CDATA[Privacy]]></category><category><![CDATA[Security]]></category><category><![CDATA[DNS]]></category><category><![CDATA[Resolver]]></category><dc:creator><![CDATA[Mahrud Sayrafi]]></dc:creator><pubDate>Tue, 05 Jun 2018 14:46:17 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/tor-address-1.gif" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/tor-address-1.gif" alt="Introducing DNS Resolver for Tor"><p><img src="https://blog.cloudflare.com/content/images/2018/05/image_0.png" alt="Introducing DNS Resolver for Tor"></p>
<p>In case you haven’t heard yet, Cloudflare <a href="https://blog.cloudflare.com/dns-resolver-1-1-1-1/">launched</a> a privacy-first <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> resolver service on April 1st. It was no joke! The service, which was our first consumer-focused service, supports emerging DNS standards such as DNS over HTTPS:443 and TLS:853 in addition to traditional protocols over UDP:53 and TCP:53, all in one easy-to-remember address: <a href="https://1.1.1.1/">1.1.1.1</a>.</p>
<p>As mentioned in the original blog post, our policy is to never, ever write client IP addresses to disk and to wipe all logs within 24 hours. Still, exceptionally privacy-conscious folks might not want to reveal their IP address to the resolver at all, and we respect that. This is why we are launching a Tor onion service for our resolver at <a href="https://dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion/">dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion</a> and accessible via <a href="https://tor.cloudflare-dns.com/">tor.cloudflare-dns.com</a>.</p>
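<p>For example, a client can query the resolver over HTTPS using Cloudflare’s JSON endpoint. A small sketch of building such a query URL (no request is actually sent here; the request must carry the header <code>Accept: application/dns-json</code> to get a JSON answer back):</p>

```javascript
// Sketch: building a DNS-over-HTTPS JSON query URL for Cloudflare's
// resolver. The endpoint and parameters follow Cloudflare's public
// DoH JSON API; the request itself is not sent in this sketch.
function dohQueryUrl(name, type) {
  const params = new URLSearchParams({ name: name, type: type });
  return 'https://cloudflare-dns.com/dns-query?' + params.toString();
}

const url = dohQueryUrl('example.com', 'AAAA');
```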
<p><img src="https://blog.cloudflare.com/content/images/2018/06/tor.gif" alt="Introducing DNS Resolver for Tor"></p>
<p><strong>NOTE:</strong> the hidden resolver is still an experimental service and should not be used in production or for other critical uses until it is more tested.</p>
<h3 id="crashcourseontor">Crash Course on Tor</h3>
<br>
<h4 id="whatistor">What is <a href="https://www.torproject.org/">Tor</a>?</h4>
<p>Imagine an alternative Internet where, in order to connect to www.cloudflare.com, instead of delegating the task of finding a path to our servers to your internet provider, you had to go through the following steps to reach Cloudflare:</p>
<ol>
<li>
<p>You calculate a path to your destination, like this:</p>
<pre><code> You -&gt; Your ISP -&gt; X -&gt; Y -&gt; Z -&gt; www.cloudflare.com.
</code></pre>
</li>
<li>
<p>You encrypt your packet with Z’s public key, then with Y’s, and finally with X’s.</p>
</li>
<li>
<p>You submit the result to X, who decrypts with their private key;</p>
</li>
<li>
<p>X submits the result to Y, who decrypts with their private key;</p>
</li>
<li>
<p>Y submits the result to Z, who decrypts with their private key to get the original packet;</p>
</li>
<li>
<p>Z submits the packet to <a href="https://www.cloudflare.com">www.cloudflare.com</a>.</p>
</li>
</ol>
<p>If everyone plays their role correctly, it is possible to ensure that only the entry relay X knows your IP address and only the exit relay Z knows the website you’re connecting to, thereby providing you with privacy and anonymity. This is a simplified version of Tor: a collection of volunteer-run computers and servers around the world acting as relays in a huge network built on top of the Internet, where every hop from one relay to the next peels off one layer of encryption, hence its name: the onion router.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/exit-node.png" alt="Introducing DNS Resolver for Tor"></p>
<h4 id="whataretoronionservices">What are Tor onion services?</h4>
<p>Keeping Internet users anonymous is not the only function of the Tor network. In particular, one caveat of the procedure above is that the connection is still visible to the exit relay and to anyone sitting between it and the destination, including network providers. To solve this problem, and to also provide anonymity for content publishers, Tor allows for onion services. Onion services are Tor nodes that advertise their public key, encoded as an address under the .onion TLD, and establish connections entirely within the Tor network:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/image_3.png" alt="Introducing DNS Resolver for Tor"></p>
<h4 id="howdoyouresolveadomainwhileusingtor">How do you resolve a domain while using Tor?</h4>
<p>The process of returning an IP address given a domain name is called <em>DNS resolution</em>. Since Tor still uses IP addresses, you still need to do DNS resolution to browse the web over Tor. There are two common methods to resolve a domain name when using Tor:</p>
<ol>
<li>
<p>Resolve the name directly, then talk to the IP address through Tor;</p>
</li>
<li>
<p>Ask a Tor exit relay to resolve the name publicly and connect to the IP.</p>
</li>
</ol>
<p>Clearly, the first option leaks your IP to your DNS resolver and, unless your client uses DNS-over-HTTPS or DNS-over-TLS, it leaks your destination name to your ISP. What is less obvious is that the second option can open you to manipulation <a href="https://arstechnica.com/information-technology/2014/01/scientists-detect-spoiled-onions-trying-to-sabotage-tor-privacy-network/">attacks</a> such as DNS poisoning or sslstrip by <a href="https://trac.torproject.org/projects/tor/wiki/doc/ReportingBadRelays">bad relays</a>. This is where our new service comes in:</p>
<ol start="3">
<li>Ask a .onion-based resolver service!</li>
</ol>
<h3 id="howdoesthecloudflarehiddenresolverwork">How does the Cloudflare hidden resolver work?</h3>
<p>In a few words, our .onion-based resolver service is a Tor onion service that forwards all communication on DNS ports to the corresponding ports on 1.1.1.1, so the apparent client IP is an internal IP rather than yours. There is, however, more to it than meets the eye.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/image_4.png" alt="Introducing DNS Resolver for Tor"></p>
<h4 id="isthehiddenresolversecure">Is the hidden resolver secure?</h4>
<p>One glaring difference between using 1.1.1.1 and this service is that the .onion address is &quot;dns4tor&quot; plus 49 seemingly random alphanumeric characters. This 56-character string, in fact, contains a full Ed25519 public key, which is used to secure communication with the onion service. This poses a number of challenges for usable security:</p>
<ol>
<li>How can users make sure that the address is correct?</li>
</ol>
<p>We simply bought a <a href="https://crt.sh/?id=439705277">certificate</a> with tor.cloudflare-dns.com as the subject name and the .onion address as a subject alternative name. This way, if you’re in the right place, you should see this:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/image_5.png" alt="Introducing DNS Resolver for Tor"></p>
<ol start="2">
<li>How can users remember this address?</li>
</ol>
<p>We don’t think you should need to remember this address. Ideally, all you would need to do is go to <a href="https://tor.cloudflare-dns.com">https://tor.cloudflare-dns.com</a> and have the browser route your request to the .onion address. This is possible using the &quot;<a href="https://tools.ietf.org/html/rfc7838">Alt-Svc</a>&quot; HTTP header which is an optional header notifying the browser that the resources can be accessed from an alternative network location, possibly using a different protocol. Thanks to <a href="https://hacks.mozilla.org/2018/05/a-cartoon-intro-to-dns-over-https/">Mozilla</a>, using .onion addresses as alternative services is now possible in <a href="https://nightly.mozilla.org/">Firefox Nightly</a>.</p>
<p>Think of this feature like <a href="https://blog.cloudflare.com/opportunistic-encryption-bringing-http-2-to-the-unencrypted-web/">opportunistic encryption</a>: once your browser receives an Alt-Svc header indicating that a .onion address is available for tor.cloudflare-dns.com, if it knows that .onion addresses can be accessed (for instance through a SOCKS proxy), it attempts to check that the alternative service has the same or a higher level of security. This includes making sure that it is possible to connect to the onion service using the same certificate and <a href="https://tools.ietf.org/html/rfc6066#section-3">Server Name</a>. If that is the case, the browser uses the alternative service instead, therefore ensuring that your future requests do not leave the Tor network.</p>
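<p>Concretely, such a response header might look like the following (the values here are illustrative; the header actually served may use different parameters):</p>

```
Alt-Svc: h2="dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion:443"; ma=86400; persist=1
```

<p>Here <code>h2</code> names the protocol, the quoted value gives the alternative host and port, and <code>ma</code> (max-age) tells the browser how long, in seconds, it may keep using the alternative.</p>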
<h4 id="isthehiddenresolverfast">Is the hidden resolver fast?</h4>
<p>Here is a thought experiment: suppose that between every two points on Earth there is a fiber-optic cable capable of lossless transmission of packets at the speed of light.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/image_6.png" alt="Introducing DNS Resolver for Tor"></p>
<p>Using a back-of-the-envelope calculation it’s easy to see that, on average, each packet traverses a distance equivalent to a <strong>quarter</strong> of the circumference of the Earth in about <strong>33ms</strong>, while each Tor packet takes about <strong>200ms</strong> to go <strong>one and a half</strong> turns around the Earth before reaching an onion service; that’s three turns for a round trip that ensures anonymity of both parties.</p>
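<p>The back-of-the-envelope numbers above are easy to reproduce (idealized, of course: great-circle fiber and signals at vacuum light speed):</p>

```javascript
// One-way latency for a path covering some number of turns around the Earth.
const EARTH_CIRCUMFERENCE_KM = 40075;
const SPEED_OF_LIGHT_KM_S = 299792;

const msForTurns = (turns) =>
  (turns * EARTH_CIRCUMFERENCE_KM / SPEED_OF_LIGHT_KM_S) * 1000;

console.log(msForTurns(0.25).toFixed(0)); // ~33 ms: a quarter of the circumference
console.log(msForTurns(1.5).toFixed(0));  // ~200 ms: one and a half turns to an onion service
```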
<p>Cloudflare, however, does not require anonymity for its servers, which is why we can reduce the number of relays to just three by enabling an <a href="https://trac.torproject.org/projects/tor/ticket/17178">optional</a> <a href="https://gitweb.torproject.org/torspec.git/tree/proposals/260-rend-single-onion.txt">setting</a> for onion services that prioritizes lower latency over location anonymity of the service. To emphasize: this does not impact client privacy or anonymity whatsoever. Indeed, as you may have noticed, in the first onion service image the origin is three hops away from the rendezvous point, whereas our onion service is only one hop away.</p>
<p>We are actively working on developing ways to make this service faster and ensure it has as little downtime as possible.</p>
<h4 id="whyshouldiusethecloudflarehiddenresolver">Why should I use the Cloudflare hidden resolver?</h4>
<p>First and foremost, resolving DNS queries through the Tor network, for instance by connecting to Google’s 8.8.8.8 resolver over Tor, guarantees a significantly higher level of anonymity than making the requests directly. Not only does doing so prevent the resolver from ever seeing your IP address, but even your ISP won’t know that you’ve attempted to resolve a domain name.</p>
<p>Still, unless the destination is an onion service, passive attackers can capture packets exiting the Tor network and malicious Exit Nodes can poison DNS queries or downgrade encryption through <a href="https://moxie.org/software/sslstrip/">sslstripping</a>. Even if you limit your browsing to <a href="https://www.eff.org/pages/tor-and-https">only HTTPS</a> sites, passive attackers can find out which addresses you’ve connected to. Even worse, actors capable of comparing traffic both before it enters the Tor network and after it leaves the network can potentially use the metadata (size, time, etc.) to <a href="https://nymity.ch/tor-dns/">deanonymize</a> the client. The only solution, then, is to eliminate the need for Exit Nodes by using onion services instead. That is what our .onion-based resolver offers.</p>
<p>Moreover, if your client does not support encrypted DNS queries, using a .onion-based resolver can secure the connection from on-path attacks, including BGP hijacking attacks. This means DNS-over-UDP and DNS-over-TCP get the same level of security that DNS-over-HTTPS and DNS-over-TLS provide.</p>
<p>Your personal anonymity, however, is not the only reason why you should use this service. The power of Tor in ensuring everyone’s anonymity rests on the number of people who use it. If only whistleblowers, for instance, were to use the Tor network, then anyone connecting to the Tor network would automatically be suspected of being a whistleblower. Therefore the more people use Tor to browse memes or to watch cat videos on the Internet, the easier it will be for those who truly need anonymity to blend in with the traffic.</p>
<p>One barrier to using Tor for many users is that it is simply slow, so I can sympathize with those who wouldn’t sacrifice quick website load times to help keep activists and dissidents anonymous. That said, DNS requests are small, and since most browsers and operating systems cache DNS results, the total traffic is not significant. As a result, using the .onion-based resolver will only slightly slow down your initial DNS request without slowing down anything else, while still contributing to the overall anonymity of the Tor network and its users.</p>
<h3 id="whyshoulditrustthecloudflarehiddenresolver">Why should I trust the Cloudflare hidden resolver?</h3>
<p>Using a .onion-based resolver ensures that your ISP never finds out that you’re resolving a domain, the Exit Nodes don’t get a chance to manipulate DNS replies, and the resolver never finds out your IP address. However, the unique benefit of using the Cloudflare .onion-based resolver is combining the power of Tor with all privacy-preserving features of the 1.1.1.1 resolver, such as query name minimization, as well as a team of engineers working on improving it at every level, including standards like DNS-over-HTTPS and DNS-over-TLS.</p>
<p>As CEO Matthew Prince said about <a href="https://blog.cloudflare.com/the-trouble-with-tor/">two years ago</a>, anonymity online is a cause we value at Cloudflare. In addition, when we announced the 1.1.1.1 resolver we <a href="https://developers.cloudflare.com/1.1.1.1/commitment-to-privacy/">committed</a> to taking every technical step to ensure we can’t know what you do on the internet. Providing a way to use the resolver through the Tor network and making it as fast as possible is a big step in that direction.</p>
<h3 id="howtosetitup">How do you set it up?</h3>
<p>The .onion-based resolver supports every DNS protocol that 1.1.1.1 supports, only over the Tor network. However, since not every DNS client is capable of connecting to the Tor network, some hacking is required to get it to work. Here we will explain how to set up DNS-over-HTTPS through the .onion-based resolver; for all other scenarios, head to our <a href="http://developers.cloudflare.com/1.1.1.1/fun-stuff/dns-over-tor/">developers page</a> for the details of how to use it.</p>
<h4 id="remembercloudflared">Remember cloudflared?</h4>
<p>Here is how you can set up <code>cloudflared</code> to start a DNS client that uses DNS over HTTPS, routed through the Tor network:</p>
<ol>
<li>
<p>First, start with downloading <code>cloudflared</code> by following the regular guide for <a href="https://developers.cloudflare.com/1.1.1.1/dns-over-https/cloudflared-proxy/">Running a DNS over HTTPS Client</a>.</p>
</li>
<li>
<p>Start a Tor SOCKS proxy and use <code>socat</code> to listen on local port TCP:443, forwarding connections through the proxy to the onion service:</p>
<pre><code> socat TCP4-LISTEN:443,reuseaddr,fork SOCKS4A:127.0.0.1:dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion:443,socksport=9150
</code></pre>
</li>
<li>
<p>Instruct your machine to treat the .onion address as localhost:</p>
<pre><code> cat &lt;&lt; EOF &gt;&gt; /etc/hosts
 127.0.0.1 dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion
 EOF
</code></pre>
</li>
<li>
<p>Finally, start a local DNS over UDP daemon:</p>
<pre><code> cloudflared proxy-dns --upstream &quot;https://dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion/dns-query&quot;
 INFO[0000] Adding DNS upstream                           url=&quot;https://dns4torpnlfs2ifuz2s2yf3fc7rdmsbhm6rw75euj35pac6ap25zgqad.onion/dns-query&quot;
 INFO[0000] Starting DNS over HTTPS proxy server          addr=&quot;dns://localhost:53&quot;
 INFO[0000] Starting metrics server                       addr=&quot;127.0.0.1:35659&quot;
</code></pre>
</li>
<li>
<p>Profit!</p>
</li>
</ol>
</div>]]></content:encoded></item><item><title><![CDATA[Cloudflare Workers Recipe Exchange]]></title><description><![CDATA[Share your Cloudflare Workers recipes with the Cloudflare Community. We’ve created a new tag “Recipe Exchange” in the Cloudflare Community Forum. We invite you to share your work, borrow / get inspired by the work of others, and upvote useful recipes written by others in the community. ]]></description><link>https://blog.cloudflare.com/cloudflare-workers-recipe-exchange/</link><guid isPermaLink="false">5ad662fa943509002252264a</guid><category><![CDATA[Community]]></category><category><![CDATA[Add-ons]]></category><category><![CDATA[Developers]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Serverless]]></category><category><![CDATA[Workers]]></category><dc:creator><![CDATA[Jade Q. Wang]]></dc:creator><pubDate>Mon, 04 Jun 2018 21:25:55 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/developerplatform-1.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/developerplatform-1.png" alt="Cloudflare Workers Recipe Exchange"><p><img src="https://blog.cloudflare.com/content/images/2018/04/Indian_Spices-1.jpg" alt="Cloudflare Workers Recipe Exchange"><br>
<small><a href="https://commons.wikimedia.org/wiki/File:Indian_Spices.jpg">Photo of Indian Spices</a>, by Joe mon bkk. <a href="https://commons.wikimedia.org/wiki/File:Indian_Spices.jpg">Wikimedia Commons</a>, <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.</small></p>
<p>Share your <a href="https://developers.cloudflare.com/workers/about/">Cloudflare Workers</a> recipes with the <a href="https://community.cloudflare.com/c/developers/workers">Cloudflare Community</a>. Developers in Cloudflare’s community each bring a unique perspective, yielding use cases our core team could never have imagined. That is why we invite you to share Workers recipes that are useful in your own work, life, or hobby.</p>
<p>We’ve created a new tag <a href="https://community.cloudflare.com/tags/recipe-exchange">“Recipe Exchange”</a> in the Workers section of the <a href="https://community.cloudflare.com/c/developers/workers">Cloudflare Community Forum</a>. We invite you to share your work, borrow / get inspired by the work of others, and upvote useful recipes written by others in the community.</p>
<p align="center">
<a class="btn btn-warning" href="https://community.cloudflare.com/tags/recipe-exchange" target="_blank">Recipe Exchange in Cloudflare Community</a>
</p>
<p>We will be highlighting interesting and/or popular recipes (with author permission) in the coming months right here in this blog.</p>
<h3 id="whatiscloudflareworkersanyway">What is Cloudflare Workers, anyway?</h3>
<p><a href="https://developers.cloudflare.com/workers/about/">Cloudflare Workers</a> let you run JavaScript in Cloudflare’s hundreds of data centers around the world. Using a Worker, you can modify your site’s HTTP requests and responses, make parallel requests, or generate responses from the edge. Cloudflare Workers has been in open beta phase since February 1st. Read more about the launch in <a href="https://blog.cloudflare.com/cloudflare-workers-is-now-on-open-beta/">this blog post</a>.</p>
<h4 id="whatcanyoudowithcloudflareworkers">What can you do with Cloudflare Workers?</h4>
<p>Cloudflare has an incredibly powerful global network of 151 data centers where you can put compute anywhere, writing in a language you’re familiar with (JavaScript) against a standard API you’re familiar with (<a href="https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API">Service Workers</a>). You can move your compute much nearer to your end user, your database, your embedded device, or anything else you want faster round trips for, making it super low latency.</p>
<p>By running JavaScript on Cloudflare’s edge network, you can leverage the power of Cloudflare’s network for your next application, or as an add-on that optimizes the performance of your existing projects.</p>
<p>The <a href="https://blog.cloudflare.com/cloudflare-acquires-eager/">next</a> <a href="https://blog.cloudflare.com/neumob-optimizing-mobile/">product</a> that leverages Cloudflare's infrastructure to create security, performance, usability, and other optimizations for the Internet should have the lowest possible barriers to entry. Millions of customers should have access to those products on day one. That's how the Internet should work.</p>
<p><a href="https://developers.cloudflare.com/workers/writing-workers/"><img src="https://blog.cloudflare.com/content/images/2018/06/developerplatform.png" alt="Cloudflare Workers Recipe Exchange"></a><small>Cloudflare's Developer Platform. <a href="https://developers.cloudflare.com">See More≫</a></small></p>
<p>For inspiration, the Workers core team baked up <a href="https://developers.cloudflare.com/workers/recipes/">some recipes</a> to highlight <strong>a few popular use cases</strong>:</p>
<ul>
<li><a href="https://developers.cloudflare.com/workers/recipes/a-b-testing/">A/B Testing</a> You can create a Cloudflare Worker to control A/B tests.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/aggregating-multiple-requests/">Aggregating Multiple Requests</a> Here, we make multiple requests to different API endpoints, aggregate the responses, and send them back as a single response.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/conditional-routing/">Conditional Routing</a> The easiest way to deliver different content based on the device being used is to rewrite the URL of the request based on the condition you care about.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/return-403/">Custom responses that don't hit origin servers</a> You can return responses directly from the edge. No need to hit your origin.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/hotlink-protection/">Hot-link Protection</a> You can use Cloudflare Workers to protect your hot-links on your web properties.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/post-requests/">Post Requests</a> Reading content from an HTTP POST request</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/random-content-cookies/">Random Content Cookies</a> You can create random content cookies using Cloudflare Workers.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/signed-requests/">Signed Requests</a> A common URL authentication method known as request signing can be implemented in a worker with the help of the Web Crypto API.</li>
<li><a href="https://developers.cloudflare.com/workers/recipes/streaming-responses/">Streaming Responses</a> Minimize the visitor’s time-to-first-byte and the amount of buffering done in the worker script.</li>
</ul>
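<p>As a taste, the heart of the A/B testing recipe can be sketched as a pure routing decision. The cookie and path names below are made up for illustration; in a real Worker this function would choose which origin URL to <code>fetch()</code>, and the response would set the cookie so the assignment sticks across visits:</p>

```javascript
// Sketch of the A/B testing pattern (hypothetical cookie and path names).
function chooseVariant(cookieHeader) {
  const match = /ab-test=(control|test)/.exec(cookieHeader || '');
  if (match) return match[1];                       // returning visitor: keep their assignment
  return Math.random() < 0.5 ? 'control' : 'test';  // new visitor: assign 50/50
}

function variantPath(variant) {
  return variant === 'test' ? '/pages/test.html' : '/pages/control.html';
}

console.log(variantPath(chooseVariant('ab-test=test')));    // "/pages/test.html"
console.log(variantPath(chooseVariant('ab-test=control'))); // "/pages/control.html"
```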
<h4 id="whatifiwanttojustofferanideaorausecase">What if I want to just offer an idea or a use case?</h4>
<p>That’s great! I’ve created <a href="https://community.cloudflare.com/t/recipe-request-thread/16906">this thread</a> in the <a href="https://community.cloudflare.com/c/developers/workers">Cloudflare Community forum</a> so anyone can submit a recipe idea, comment on one, and anyone can volunteer to write a recipe that there is a ready audience for.</p>
<p align="center">
<a class="btn btn-warning" href="https://community.cloudflare.com/tags/recipe-exchange" target="_blank">Share your recipe in the Community Recipe Exchange</a>
</p>
</div>]]></content:encoded></item><item><title><![CDATA[We have lift off - Rocket Loader GA is mobile!]]></title><description><![CDATA[Today we’re excited to announce the official GA of Rocket Loader, our JavaScript optimisation feature that will prioritise getting your content in front of your visitors faster than ever before with improved Mobile device support. ]]></description><link>https://blog.cloudflare.com/we-have-lift-off-rocket-loader-ga-is-mobile/</link><guid isPermaLink="false">5b101ed6fbb01000bfe54a6b</guid><category><![CDATA[Rocket Loader]]></category><category><![CDATA[Product News]]></category><category><![CDATA[Performance]]></category><category><![CDATA[Optimization]]></category><dc:creator><![CDATA[Simon Moore]]></dc:creator><pubDate>Fri, 01 Jun 2018 16:31:00 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1457364887197-9150188c107b?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=29fd85889d0aabe55f1c1d0714df4afb" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://images.unsplash.com/photo-1457364887197-9150188c107b?ixlib=rb-0.3.5&q=80&fm=jpg&crop=entropy&cs=tinysrgb&w=1080&fit=max&ixid=eyJhcHBfaWQiOjExNzczfQ&s=29fd85889d0aabe55f1c1d0714df4afb" alt="We have lift off - Rocket Loader GA is mobile!"><p>Today we’re excited to announce the official GA of Rocket Loader, our JavaScript optimisation feature that prioritises getting your content in front of your visitors faster than ever before, now with improved mobile device support. In tests on www.cloudflare.com we saw a reduction of 45% (almost 1 second) in First Contentful Paint times for visitors to our pages.</p>
<p><img src="https://images.unsplash.com/photo-1457364887197-9150188c107b?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=29fd85889d0aabe55f1c1d0714df4afb" alt="We have lift off - Rocket Loader GA is mobile!"><br>
<small>Photo by <a href="https://unsplash.com/@spacex?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">SpaceX</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<p>We initially launched Rocket Loader as a beta in June 2011 to asynchronously load a website’s JavaScript and dramatically improve page load time. Since then, hundreds of thousands of our customers have benefited from a one-click option to boost the speed of their content.</p>
<p>With this release, we’ve vastly improved and streamlined Rocket Loader so that it works in conjunction with mobile &amp; desktop browsers to prioritise what matters most when loading a webpage: your content.</p>
<h3 id="visitorsdontwaitforpageload">Visitors don’t wait for page “load”</h3>
<p>To put it very simplistically - load time is a measure of when the browser has finished loading the document (HTML) and all assets referenced by that document.</p>
<p>When you clicked to visit this blog post, did you wait for the spinning wheel on your browser tab to stop before you started reading this content? You probably didn’t, and neither do your visitors. We’re conditioned to start consuming content as soon as it appears. However, the industry has focused too much on load event timing, ignoring user perception &amp; behaviour. Data from Google Analytics has shown that 53% of visits are abandoned if a mobile site takes longer than 3 seconds to load. This makes sense if you think about the last time you browsed to a website in a hurry - if nothing renders quickly on screen, you’re much more likely to go elsewhere.</p>
<p>Paint timing metrics are a closer approximation of how your users perceive speed. Put simply, paint timing measures when something is displayed on screen, and there are various stages of paint that can be measured &amp; reported which we’ll explain as we go.</p>
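<p>In the browser, these paint marks are exposed through the Performance API as entries returned by <code>performance.getEntriesByType('paint')</code>. A small sketch of reading them (the entry objects are mocked below so the snippet is self-contained):</p>

```javascript
// Collapse paint entries into a name -> milliseconds map.
function paintTimings(entries) {
  return entries.reduce((acc, entry) => {
    acc[entry.name] = Math.round(entry.startTime);
    return acc;
  }, {});
}

// In a real page you would call: paintTimings(performance.getEntriesByType('paint'))
const mockEntries = [
  { name: 'first-paint', startTime: 612.4 },
  { name: 'first-contentful-paint', startTime: 891.7 },
];
console.log(paintTimings(mockEntries));
// { 'first-paint': 612, 'first-contentful-paint': 892 }
```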
<h3 id="analysingyourperformance">Analysing your performance</h3>
<p>One of the ways in which you can learn more about your website’s performance is to use one of the many great synthetic analysis tools out there. The <a href="https://developers.google.com/web/tools/lighthouse/">Lighthouse</a> tool in Chrome can run a performance audit on your page simulating a typical mobile device &amp; connection. I ran this on Cloudflare’s homepage to illustrate the way the page loads over time:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Lighthouse-performance-filmstrip-without-Rocket-Loader.png" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<p>The First Meaningful Paint (FMP) takes 4.8 seconds on this mobile device simulation. FMP doesn’t measure the time of the first paint, but it waits for any web fonts to render and for the biggest above-the-fold layout change to happen. The red line drawn at 4.8 seconds shows our FMP in this test. So what can we do to improve?</p>
<h3 id="renderblockingscriptsareaproblemforpainttimes">Render blocking scripts are a problem for paint times</h3>
<p>Most tools will give you suggestions; Lighthouse calls these opportunities and orders them by their estimated time saving:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Lighthouse-performance-opportunities-without-Rocket-Loader.png" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<p>Try running a Lighthouse audit on your website and see what opportunities you get.</p>
<h4 id="themarchofthescripts">The march of the scripts</h4>
<p>Now is a good time to quantify the spread of JavaScript on the web. Using the excellent <a href="https://httparchive.org">HTTP Archive</a> project we can review the makeup of the Alexa top 500k websites over the years. Using <a href="https://httparchive.org/reports/state-of-javascript?start=2011_06_01&amp;end=2018_05_01">their data</a>, we can see that the median number of JavaScripts on mobile has increased nearly fourfold, from 5 at Rocket Loader’s launch in June 2011 to a whopping 19 in May 2018. So most of us have plenty of JavaScript on our websites, and there’s a good chance it will be very high on the list of performance opportunities you can seize to improve your visitors’ experience.</p>
<p>Implementing this recommendation would require you to make changes to your origin application’s code to asynchronously load, defer or inline your scripts. In some cases this might not be possible because you don’t control your application’s code or have the expertise to implement these strategies. Rocket Loader to the rescue!</p>
<h3 id="howrocketloaderworks">How Rocket Loader works</h3>
<table>
<thead>
<tr>
<th>Without Rocket Loader</th>
<th>With Rocket Loader</th>
</tr>
</thead>
<tbody>
<tr>
<td><img src="https://blog.cloudflare.com/content/images/2018/02/Rocket-before.svg" alt="We have lift off - Rocket Loader GA is mobile!"></td>
<td><img src="https://blog.cloudflare.com/content/images/2018/02/Rocket-after.svg" alt="We have lift off - Rocket Loader GA is mobile!"></td>
</tr>
</tbody>
</table>
<p>New Rocket Loader prioritises paint time by locating the JavaScripts inside your HTML page and temporarily hiding them from the browser during the page load. This allows the browser to continue parsing the rest of your HTML and begin discovering other assets, such as CSS &amp; images, that are required to render your page. Once that has completed, Rocket Loader dynamically inserts the scripts back into the page so the browser can load them.</p>
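<p>One common way to implement this kind of deferral (a sketch of the general technique, not Cloudflare’s actual implementation) is to rewrite each script to a type the browser won’t execute, then re-activate the scripts after the page has rendered:</p>

```javascript
// Hide scripts from the HTML parser by giving them a non-executable type.
// (Toy regex for illustration; real HTML rewriting needs a proper parser.)
function hideScripts(html) {
  return html.replace(/<script\b/g, '<script data-deferred type="text/deferred-js"');
}

console.log(hideScripts('<p>content</p><script src="app.js"></script>'));
// <p>content</p><script data-deferred type="text/deferred-js" src="app.js"></script>

// Later, in the browser, each hidden script would be cloned back in as a
// real script element so it finally downloads and executes, e.g.:
//
//   document.querySelectorAll('script[data-deferred]').forEach((old) => {
//     const s = document.createElement('script');
//     if (old.src) s.src = old.src; else s.textContent = old.textContent;
//     document.body.appendChild(s);
//   });
```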
<h4 id="enablingrocketloader">Enabling Rocket Loader</h4>
<p>This bit is quite labour-intensive, so watch closely:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Enabling-Rocket-Loader-animation.gif" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<h3 id="measuringtheimpact">Measuring the impact</h3>
<p>Let’s run lighthouse again on our homepage now we have Rocket Loader enabled:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Lighthouse-performance-fimstrip-with-Rocket-Loader.png" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<p>So Lighthouse has detected that First Meaningful Paint is just over 1.5 seconds faster in this test - a really impressive improvement delivered from a single click!</p>
<p>To drive this home, the opportunity lighthouse identified is now officially a “passed audit”:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Lighthouse-performance-audit-with-Rocket-Loader.png" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<h4 id="letsgetrealusermeasurement">Let’s get Real (User Measurement)</h4>
<p>To measure this with real users &amp; devices, last week we ran a simple A/B test on www.cloudflare.com where 50% of page views were optimised using Rocket Loader so we could compare performance with and without it enabled. As shown by the lighthouse audits above, our main website is a great use-case for Rocket Loader because while we do use a lot of JavaScript for some interactive aspects of our pages, most important to our visitors is reading information about Cloudflare’s network, products &amp; features. So in short, content should be prioritised over JavaScript.</p>
<p>To illustrate the changes we observed, below is a graph of the Time To First Contentful Paint (TTFCP) for www.cloudflare.com visits by real users during our test. TTFCP measures the first time something in the Document Object Model (DOM) is painted on the page. For websites that are primarily for consumption of content rather than heavy interactions, this is a closer representation of a user’s perception of your website speed than measuring load time.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/06/Distribution-of-Time-To-First-Contentful-Paint-in-Baseline-vs-Rocket-Loader.png" alt="We have lift off - Rocket Loader GA is mobile!"></p>
<p>With Rocket Loader you can see the orange bars are grouped more to the left (faster) and higher, meaning many more of our visitors were getting content on screen in under a second. In fact, the median improvement delivered by Rocket Loader during our www.cloudflare.com test lands at a 0.93 second reduction in Time To First Contentful Paint, around a 45% improvement over the baseline. Boom!</p>
<h3 id="whatsnewinrocketloader">What’s new in Rocket Loader</h3>
<p>So Rocket Loader continues to drive performance improvements on JavaScript-heavy websites, but lots has changed under the hood. Here is a summary of the key changes:</p>
<ul>
<li>Improves time to first paint speed not just load time</li>
<li>Now compatible with over 93% of mobile devices<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup></li>
<li>Tiny! Less than 10% of the size of the prior version</li>
<li>Reduced complexity &amp; better compatibility with your on-site &amp; 3rd party JavaScripts</li>
<li>Compliant with stricter Content Security Policies (CSP)</li>
</ul>
<h4 id="moremobileusersnowgetoptimised">More mobile users now get optimised</h4>
<p>We’ve already predicted that by the end of 2018 mobile usage will reach 60%. Additionally, as of July 2018 Google will begin using page speed in <a href="https://webmasters.googleblog.com/2018/01/using-page-speed-in-mobile-search.html">mobile search ranking</a>. With that in mind, providing a fast experience for mobile devices is more important than ever.</p>
<p>Rocket Loader beta was first launched back in 2012, at a time when mobile device usage on the web was around 15%. That version of Rocket Loader intercepted JavaScript on the page and executed it in a virtual sandbox: a world familiar, but with behavior changed behind the scenes. Unfortunately, the technique we chose to create this virtual sandbox didn't work well on all mobile browsers. That was considered an undesirable but acceptable trade-off 6 years ago, but today it’s vital that customers can leverage this technique on mobile. Thanks to the reduced complexity of our approach, the new Rocket Loader works on over 93%<sup class="footnote-ref"><a href="#fn1" id="fnref1:1">[1:1]</a></sup> of mobile devices in use today. For those devices that are not compatible, we simply deliver the website normally, without this optimisation.</p>
<h4 id="leanermeaner">Leaner &amp; Meaner</h4>
<p>The new Rocket Loader’s JS weighs in at a lightweight 2.3KB. We did some <a href="https://blog.cloudflare.com/making-page-load-even-faster/">extensive refactoring</a> during 2017 that reduced the size of the JS required to run Rocket Loader from 47KB to 32KB and saved a staggering 213 terabytes of transfer across the globe. Thanks to the simplicity of the way the new Rocket Loader works, rocket-loader.min.js is now less than 10% of that size, saving approximately another 417 terabytes of transfer each year when Rocket Loader does its thing.<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup></p>
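<p>The back-of-the-envelope maths behind that figure (using the footnoted assumptions of 270 million weekly responses and a 29.7KB saving per response) works out as follows:</p>

```python
# Rough yearly saving from the smaller rocket-loader.min.js
# (figures taken from the footnote; decimal units, 52 weeks/year).
responses_per_week = 270_000_000
saving_kb = 29.7

tb_per_week = responses_per_week * saving_kb / 1e9  # KB -> TB
tb_per_year = tb_per_week * 52
print(round(tb_per_year))  # ~417 TB saved per year
```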
<h4 id="compatibilitywithcontentsecuritypolicy">Compatibility with Content Security Policy</h4>
<p>New Rocket Loader does not modify the content of your JavaScript; it only changes the time at which it is loaded, which means it plays nicely with any Content Security Policy (CSP) you have defined. With Rocket Loader beta, if you wanted to set a CSP that only allowed execution of scripts hosted on your domain, you would need to disable Rocket Loader, as it combined &amp; loaded external JavaScripts through your domain. New Rocket Loader does not use this approach and instead lets the browser load &amp; cache the files normally. As we also enable HTTP/2 for all of our customers, any first party scripts will load over a single TCP connection, and 3rd party scripts are still asynchronously loaded, meaning we can optimise their loading without proxying this content. All of this means modifying your CSP to accommodate Rocket Loader is as simple as allowing <code>script-src</code> for <code>https://ajax.cloudflare.com</code> so that Rocket Loader itself can load.</p>
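<p>For example (illustrative only; keep whatever other sources your site already needs), a policy along these lines restricts scripts to your own domain while still allowing Rocket Loader itself to load:</p>

```
Content-Security-Policy: script-src 'self' https://ajax.cloudflare.com
```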
<h3 id="howcanienablenewrocketloader">How can I enable new Rocket Loader?</h3>
<p>If you already have Rocket Loader enabled as of today your site is using the new version. You can modify your settings at any time by visiting the Speed section of your Cloudflare settings.</p>
<p>If you had Rocket Loader disabled or in Manual mode, just click the button in the Speed section to turn Rocket Loader on.</p>
<h3 id="whatelsecanidowithcloudflaretooptimisemywebsite">What else can I do with Cloudflare to optimise my website?</h3>
<p>As always, achieving good performance typically takes a variety of approaches, and Rocket Loader tackles JavaScript specifically. There are some other very simple optimisations you should also ensure are enabled:</p>
<ul>
<li><strong>Caching</strong> - cache everything you can so content is served directly from any one of our 150+ data centres without waiting for your origin.</li>
<li><strong>Minify &amp; Compress</strong> - Enable minification of your HTML, CSS &amp; JS in your Speed settings to losslessly reduce the total byte size of your web pages and enable Brotli compression so browsers that support this new compression method receive smaller responses.</li>
<li><strong>Optimise your images</strong> - Polish will automatically reduce the size of images on your website with support for highly efficient formats such as WebP. You can also turn on Mirage to optimise images for mobile devices with poor connectivity.</li>
<li><strong>Use HTTP/2</strong> - You get HTTP/2 support automatically as long as your site is served over HTTPS. Move as much of your content as you can onto your Cloudflare enabled URLs so all of that content can be multiplexed down a single TCP connection.</li>
<li><strong>Use Argo &amp; Railgun</strong> - For dynamic content Argo and Railgun can help optimise the connection &amp; transfer between Cloudflare and your origin server.</li>
</ul>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>Rocket Loader utilises a browser API called <code>document.currentScript</code> which is currently supported by 93.7% of mobile devices and growing: <a href="https://caniuse.com/#feat=document-currentscript">https://caniuse.com/#feat=document-currentscript</a> <a href="#fnref1" class="footnote-backref">↩︎</a> <a href="#fnref1:1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>Back of the envelope calculations based on 270 million rocket-loader.min.js responses served per week with a 29.7KB saving per serving. <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
</div>]]></content:encoded></item><item><title><![CDATA[Today we mitigated 1.1.1.1]]></title><description><![CDATA[Cloudflare is protected from attacks by the Gatebot DDoS mitigation pipeline. Gatebot performs hundreds of mitigations a day, shielding our infrastructure and our customers from L3 and L7 attacks. ]]></description><link>https://blog.cloudflare.com/today-we-mitigated-1-1-1-1/</link><guid isPermaLink="false">5b106ca4fbb01000bfe54a78</guid><category><![CDATA[Reliability]]></category><category><![CDATA[Post Mortem]]></category><category><![CDATA[Mitigation]]></category><category><![CDATA[1.1.1.1]]></category><category><![CDATA[DNS]]></category><dc:creator><![CDATA[Marek Majkowski]]></dc:creator><pubDate>Fri, 01 Jun 2018 01:13:53 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/06/gatebot-stats.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/06/gatebot-stats.png" alt="Today we mitigated 1.1.1.1"><p>On May 31, 2018 we had a 17 minute outage on our 1.1.1.1 resolver service; this was our doing and not the result of an attack.</p>
<p>Cloudflare is protected from attacks by the Gatebot DDoS mitigation pipeline. Gatebot performs hundreds of mitigations a day, shielding our infrastructure and our customers from L3/L4 and L7 attacks. Here is a chart of a count of daily Gatebot actions this year:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/gatebot-stats.png" alt="Today we mitigated 1.1.1.1"></p>
<p>In the past, we have blogged about our systems:</p>
<ul>
<li><a href="https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-sleep/">Meet Gatebot, a bot that allows us to sleep</a></li>
</ul>
<p>Today, things didn't go as planned.</p>
<h3 id="gatebot">Gatebot</h3>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/gatebot-parts.png" alt="Today we mitigated 1.1.1.1"></p>
<p>Cloudflare’s network is large, handles many different types of traffic and mitigates different types of known and not-yet-seen attacks. The Gatebot pipeline manages this complexity in three separate stages:</p>
<ul>
<li><em>attack detection</em> - collects live traffic measurements across the globe and detects attacks</li>
<li><em>reactive automation</em> - chooses appropriate mitigations</li>
<li><em>mitigations</em> - executes mitigation logic on the edge</li>
</ul>
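<p>As a purely illustrative sketch (not Gatebot's actual code; the thresholds, field names, and functions here are invented), the three stages might be wired together like this:</p>

```python
# Purely illustrative sketch of the three-stage pipeline described above;
# thresholds, field names and functions are invented, not Gatebot's code.

def detect_attacks(traffic_samples):
    """Attack detection: flag targets whose request rate looks like an attack."""
    return [t for t in traffic_samples if t["rps"] > 1_000_000]

def choose_mitigation(attack):
    """Reactive automation: choose an appropriate mitigation for the target."""
    return "dns_mitigation" if attack["port"] == 53 else "http_mitigation"

def apply_mitigations(attacks):
    """Mitigation: hand the chosen logic to the edge (here, just report it)."""
    return {a["ip"]: choose_mitigation(a) for a in attacks}

samples = [
    {"ip": "198.51.100.7", "port": 53, "rps": 2_500_000},  # attack on a DNS IP
    {"ip": "203.0.113.9", "port": 443, "rps": 40_000},     # normal HTTPS load
]
print(apply_mitigations(detect_attacks(samples)))
# {'198.51.100.7': 'dns_mitigation'}
```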
<p>The benign-sounding &quot;reactive automation&quot; part is actually the most complicated stage in the pipeline. We expected that from the start, which is why we implemented this stage using a custom <a href="https://en.wikipedia.org/wiki/Functional_reactive_programming">Functional Reactive Programming (FRP)</a> framework. If you want to know more about it, see <a href="https://idea.popcount.org/2016-02-01-enigma---building-a-dos-mitigation-pipeline/">the talk</a> and <a href="https://speakerdeck.com/majek04/gatelogic-somewhat-functional-reactive-framework-in-python">the presentation</a>.</p>
<p>Our mitigation logic often combines multiple inputs from different internal systems, to come up with the best, most appropriate mitigation. One of the most important inputs is the metadata about our IP address allocations: we mitigate attacks hitting HTTP and DNS IP ranges differently. Our FRP framework allows us to express this in clear and readable code. For example, this is part of the code responsible for performing DNS attack mitigation:</p>
<pre><code class="language-python">def action_gk_dns(...):

    [...]

    if port != 53:
        return None

    if whitelisted_ip.get(ip):
        return None

    if ip not in ANYCAST_IPS:
        return None

    [...]
</code></pre>
<p>It's the last check in this code that we tried to improve today.</p>
<p>Clearly, the code above is a huge oversimplification of all that goes into attack mitigation, but making an early decision about whether the attacked IP serves DNS traffic or not is important. It's that check that went wrong today. If the IP does serve DNS traffic then attack mitigation is handled differently from IPs that never serve DNS.</p>
<h3 id="cloudflareisgrowingsomustgatebot">Cloudflare is growing, so must Gatebot</h3>
<p>Gatebot was created in early 2015. Three years may not sound like much time, but since then we've grown dramatically and added layers of services to our software stack. Many of the internal integration points that we rely on today didn't exist then.</p>
<p>One of them is what we call the <em>Provision API</em>. When Gatebot sees an IP address, it needs to be able to figure out whether or not it’s one of Cloudflare’s addresses. <em>Provision API</em> is a simple RESTful API used to provide this kind of information.</p>
<p>This is a relatively new API, and prior to its existence, Gatebot had to figure out which IP addresses were Cloudflare addresses by reading a list of networks from a hard-coded file. In the code snippet above, the <em>ANYCAST_IPS</em> variable is populated using this file.</p>
<h3 id="thingswentwrong">Things went wrong</h3>
<p>Today, in an effort to reclaim some technical debt, we deployed new code that introduced Gatebot to <em>Provision API</em>.</p>
<p>What we did not account for, and what <em>Provision API</em> didn’t know about, was that <a href="https://blog.cloudflare.com/dns-resolver-1-1-1-1/">1.1.1.0/24 and 1.0.0.0/24</a> are special IP ranges. Frankly speaking, almost every IP range is &quot;special&quot; for one reason or another, since our IP configuration is rather complex. But our recursive DNS resolver ranges are even more special: they are relatively new, and we're using them in a very unique way. Our hardcoded list of Cloudflare addresses contained a manual exception specifically for these ranges.</p>
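<p>A hypothetical reconstruction of what that hard-coded path might have looked like (names, networks and functions here are invented for illustration):</p>

```python
import ipaddress

# Illustrative only: a hard-coded network list carrying a manual exception
# for the resolver ranges, which are mitigated differently.
RESOLVER_RANGES = {"1.1.1.0/24", "1.0.0.0/24"}

def load_anycast_ips(hardcoded_networks):
    """Build the ANYCAST_IPS set, skipping the special resolver ranges."""
    anycast = set()
    for net in hardcoded_networks:
        if net in RESOLVER_RANGES:
            continue  # manual exception for the recursive resolver ranges
        anycast.add(ipaddress.ip_network(net))
    return anycast

def is_anycast(ip, anycast_ips):
    """Is this IP inside one of the loaded anycast networks?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in anycast_ips)

anycast = load_anycast_ips(["104.16.0.0/13", "1.1.1.0/24"])
print(is_anycast("1.1.1.1", anycast))     # False: the resolver range was excepted
print(is_anycast("104.16.1.1", anycast))  # True
```

The integration with <em>Provision API</em> dropped this file, and with it the manual exception, which is exactly the gap described below.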
<p>As you might be able to guess by now, we didn't implement this manual exception while we were doing the integration work. Remember, the whole idea of the fix was to remove the hardcoded gotchas!</p>
<h3 id="impact">Impact</h3>
<p>The effect was that, after pushing the new code release, our systems interpreted the resolver traffic as an attack. The automatic systems deployed DNS mitigations for our DNS resolver IP ranges for 17 minutes, between 17:58 and 18:13 May 31st UTC. This caused 1.1.1.1 DNS resolver to be globally inaccessible.</p>
<h3 id="lessonslearned">Lessons Learned</h3>
<p>While Gatebot, the DDoS mitigation system, has great power, we failed to test the changes thoroughly. We are using today’s incident to improve our internal systems.</p>
<p>Our team is incredibly proud of 1.1.1.1 and Gatebot, but today we fell short. We want to apologize to all of our customers. We will use today’s incident to improve. The next time we mitigate 1.1.1.1 traffic, we will make sure there is a legitimate attack hitting us.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Introducing: The Cloudflare All-Stars Fantasy League]]></title><description><![CDATA[Baseball season is well underway, and to celebrate, we're excited to introduce the Cloudflare All-Stars Fantasy League.]]></description><link>https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/</link><guid isPermaLink="false">5b0451655fd79500bfb83ba3</guid><category><![CDATA[Design]]></category><category><![CDATA[Cloudflare Team]]></category><category><![CDATA[Fun]]></category><dc:creator><![CDATA[Jessica Rosenberg]]></dc:creator><pubDate>Tue, 22 May 2018 19:09:54 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/05/All-Starts_Twitter--1--01-2.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/05/All-Starts_Twitter--1--01-2.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><p><img src="https://blog.cloudflare.com/content/images/2018/05/AllStarsAsset-12@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=CloudflareAllStars&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/YHl6mfq48l" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #CloudflareAllStars</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>Baseball season is well underway, and to celebrate, we're excited to introduce the Cloudflare All-Stars Fantasy League: a group of fictitious sports teams that revolve around some of Cloudflare’s most championed products and services. Their mission? To help build a better Internet.</p>
<p>Cloudflare HQ is located just a block away from the San Francisco Giants Stadium. Each time there's a home game, crowds of people walk past Cloudflare's large 2nd street windows and peer into the office space. The looks in their eyes scream: &quot;Cloudflare! Teach me about your products while giving me something visually stimulating to look at!&quot;</p>
<p>They asked. We listened.</p>
<p>The design team saw a creative opportunity, seized it, and hit it out of the park. Inspired by the highly stylized sports badges and emblems of some real-life sports teams, we applied this visual style to our own team badges. We had a lot of fun coming up with the team names, as well as figuring out which visuals to use for each.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-22-at-12.01.47-PM.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"></p>
<p>For the next few months, the Cloudflare All-Stars teams will be showcased within the large Cloudflare HQ windows facing 2nd street and en route to Giants Stadium. Feel free to swing by on your way to the next Giants game, snap a pic and share with your fans.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-22-at-11.57.50-AM.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"></p>
<p>You can also show the teams support by Tweeting out their hashtag, along with the images provided for each. Go Team Internet!</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/ddosAsset-4@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=DDoSDefenders&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/0slgLm46fs" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #DDoSDefenders</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The Distributed Denial of Service (DDoS) Defenders</strong> are strong and undefeated. They have a flawless record of batting away malicious DDoS attacks that target  millions of websites and APIs around the globe. (see <a href="https://www.cloudflare.com/ddos/">DDoS Protection</a>) #DDoSDefenders</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/AthenasAsset-11@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=AthenianProjectAthenas&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/JR7L5z4y9P" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #AthenianProjectAthenas</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>Team Athenas</strong> is the team for the people: they ensure that U.S. State, County, and Municipal election websites stay online for free, no matter what kind of gnarly pitches get thrown their way. (see <a href="https://www.cloudflare.com/athenian-project/">Athenian Project</a>) #AthenianProjectAthenas</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/ArgoAsset-5@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=ArgoArgonauts&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/InsNL2Dkgs" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #ArgoArgonauts</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The Argo Argonauts</strong> know how to throw the fastest pitches for routing your traffic across the Internet. (see <a href="https://www.cloudflare.com/products/argo-smart-routing/">Argo Smart Routing</a>). #ArgoArgonauts</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/WAFAsset-6@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=WAFMasons&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/Nndy3dgXsz" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #WAFMasons</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The Web Application Firewall (WAF) Masons</strong> are the firefighters of the Internet — there is no fire too big for this team to put out. (See <a href="https://www.cloudflare.com/waf/">Web Application Firewall</a>) #WAFMasons</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/WorkersAsset-8@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=WorkersBees&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/bdzEsIy7C8" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #WorkersBees</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The Workers Bees</strong> are the efficient go-getter team that can help you do the impossible on the Edge! These workers can help get anything done from detecting malicious bots to filtering logic at the Edge. (See <a href="https://www.cloudflare.com/products/cloudflare-workers/">Workers</a>) #WorkersBees</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/All-Starts_Twitter--1--07-1.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=StreamRapids&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/OGpStDmM90" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #StreamRapids</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The Stream Rapids</strong> are up to bat and ready to knock fast and speedy video hits out of the park, and across the Internet! (see <a href="https://www.cloudflare.com/products/cloudflare-stream/">Stream</a>) #StreamRapids</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/CDNAsset-9@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=CDNPackets&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/JMvsHQYHvB" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #CDNPackets</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The CDN Packets</strong> are a team of fast &amp; strong International players — with over 150 teammates (data centers) around the world, they guarantee web content gets delivered safely and as fast as possible. (See <a href="https://www.cloudflare.com/cdn/">Cloudflare CDN</a>) #CDNPackets</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/ResolversAsset-10@2x.png" alt="Introducing: The Cloudflare All-Stars Fantasy League"><a href="https://twitter.com/intent/tweet?button_hashtag=DNSResolvers&ref_src=twsrc%5Etfw" class="twitter-hashtag-button" data-size="large" data-text="pic.twitter.com/KHQFzJZIAU" data-url="https://blog.cloudflare.com/introducing-the-cloudflare-all-star-fantasy-league/" data-show-count="false">Tweet #DNSResolvers</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script><strong>The DNS Resolvers</strong> are here to lead the way to anywhere you want to go on the Internet! They're the fastest in the game and they're out to keep you (and your data) safe and private through the journey. (See <a href="https://1.1.1.1">1.1.1.1</a>) #DNSResolvers</p>
<p>Brought to you by the <a href="https://dribbble.com/Cloudflare">Cloudflare Brand Design team</a> 🎨</p>
<p><a href="http://twitter.com/jessperate">Jess</a>, <a href="http://twitter.com/kkblinder">Kari</a>, <a href="http://twitter.com/Londonzhang1">London</a>, <a href="http://twitter.com/drewmherron">Drew</a> &amp; <a href="http://twitter.com/tinycrowbar">Jenny</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Rate Limiting: Delivering more rules, and greater control]]></title><description><![CDATA[With more and more platforms taking the necessary precautions against DDoS attacks like integrating DDoS mitigation services and increasing bandwidth at weak points, Layer 3 and 4 attacks are just not as effective anymore. ]]></description><link>https://blog.cloudflare.com/rate-limiting-delivering-more-rules-and-greater-control/</link><guid isPermaLink="false">5b0300bd5fd79500bfb83b8f</guid><category><![CDATA[Rate Limiting]]></category><category><![CDATA[Product News]]></category><category><![CDATA[Reliability]]></category><category><![CDATA[Performance]]></category><category><![CDATA[Attacks]]></category><category><![CDATA[DDoS]]></category><category><![CDATA[Mitigation]]></category><dc:creator><![CDATA[Alex Cruz Farmer]]></dc:creator><pubDate>Mon, 21 May 2018 20:41:37 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.27-AM-1.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.27-AM-1.png" alt="Rate Limiting: Delivering more rules, and greater control"><p>With more and more platforms taking the necessary precautions against DDoS attacks like integrating DDoS mitigation services and increasing bandwidth at weak points, Layer 3 and 4 attacks are just not as effective anymore. For Cloudflare, we have fully automated Layer 3/4 based protections with our internal platform, <a href="https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-sleep/">Gatebot</a>.  In the last 6 months we have seen a large upward trend of Layer 7 based DDoS attacks. The key difference to these attacks is they are no longer focused on using huge payloads (volumetric attacks), but based on Requests per Second to exhaust server resources (CPU, Disk and Memory). 
On a regular basis we see attacks exceeding 1 million requests per second. The graph below shows the number of Layer 7 attacks Cloudflare has monitored, which is trending upward: on average around 160 attacks a day, with some days spiking to over 1,000 attacks.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.27-AM.png" alt="Rate Limiting: Delivering more rules, and greater control"></p>
<p>A year ago, Cloudflare released <a href="https://blog.cloudflare.com/rate-limiting/">Rate Limiting</a> and it is proving to be a hugely effective tool for customers to protect their web applications and APIs from all sorts of attacks, from “low and slow” DDoS attacks, through to bot-based attacks, such as credential stuffing and content scraping. We’re pleased about the success our customers are seeing with Rate Limiting and are excited to announce additional capabilities to give our customers further control.</p>
<h3 id="sowhatschanging">So what’s changing?</h3>
<p>There are times when you clearly know that traffic is malicious.  In cases like this, our existing Block action is proving effective for our customers.  But there are times when it is not the best option, and causes a negative user experience.  Rather than risk a false positive, customers often want to challenge a client to ensure it is what it represents itself to be, which in most situations means a human, not a bot.</p>
<p><strong>Firstly</strong>, to help customers more accurately identify the traffic, we are adding Cloudflare JavaScript Challenge and Google reCAPTCHA (Challenge) mitigation actions to the UI and API for Pro and Business plans. The existing Block and Simulate actions still exist. As a reminder, deploying any rule in Simulate mode means that you will not be charged for any requests. This is a great way to test your new rules to make sure they have been configured correctly.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.39-AM.png" alt="Rate Limiting: Delivering more rules, and greater control"><br>
<strong>Secondly</strong>, we’re making Rate Limiting more dynamically scalable. A new feature has been added which allows Rate Limiting to count on Origin Response Headers for Business and Enterprise customers. The way this feature works is by matching attributes which are returned by the Origin to Cloudflare.</p>
<h3 id="thenewcapabilitiesinaction">The new capabilities - in action!</h3>
<p>One of the things that really drives our innovation is solving the real problems we hear from customers every day.  With that, we wanted to provide some real world examples of these new capabilities in action.</p>
<p>Each of the use cases have Basic and Advanced implementation options.  After some testing, we found that tiering rate limits is an extremely effective solution against repeat offenders.</p>
<p><strong>Credential Stuffing Protection</strong> for Login Pages and APIs. The best way to build applications is to utilise the standardized status codes. For example, if I fail to authenticate against an endpoint or a website, I should receive a “401” or “403”. Generally speaking, a user of a website will often get their password wrong three times before selecting the “I forgot my password” option. Most credential stuffing bots will try thousands of times, cycling through many username and password combinations to see what works.</p>
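<p>To make the status-code convention concrete, here is a minimal, hypothetical login handler (the credential store and function are invented for illustration) whose failure responses are exactly what a rule matching on 401/403 would count:</p>

```python
# Hypothetical login handler, invented for illustration: failed authentication
# returns the standardized 401 status that a Rate Limiting rule matching on
# status codes 401/403 would count.
USERS = {"alice": "correct-horse-battery"}  # example credential store

def handle_login(username, password):
    """Return the HTTP status code for a login attempt."""
    if USERS.get(username) == password:
        return 200  # success: not counted toward the rate limit
    return 401      # failure: counted by a rule matching 401/403

print(handle_login("alice", "wrong-password"))         # 401
print(handle_login("alice", "correct-horse-battery"))  # 200
```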
<p>Here are some example rate limits which you can configure to protect your application from credential stuffing.</p>
<p><strong>Basic</strong>:<br>
Cloudflare offers a “Protect My Login” feature out of the box.  Enter the URL for your login page and Cloudflare will create a rule such that clients that attempt to log in more than 5 times in 5 minutes will be blocked for 15 minutes.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.47-AM.png" alt="Rate Limiting: Delivering more rules, and greater control"></p>
<p>With the new Challenge capabilities of Rate Limiting, you can customize the response parameters for log in to more closely match the behavior pattern for bots you see on your site through a custom built rule.</p>
<p>Logging in four times in one minute is hard; I type fast, but even I couldn’t do this.  If I’m seeing this pattern in my logs, it is likely a bot.  I can now create a Rate Limiting rule based on the following criteria:</p>
<style type="text/css">
    .table-with-last-column-right-aligned tr td:last-child {
      text-align: right;
    }
</style>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>/login</td>
            <td>4</td>
            <td>1 minute</td>
            <td>Method: POST<br>Status Code: 401,403</td>
            <td>Challenge</td>
        </tr>
</tbody></table>
<p>With this new rule, if someone tries to log in four times within a minute, they will be thrown a challenge.  My regular human users will likely never hit it, but if they do, the challenge ensures they can still access the site.</p>
<p><strong>Advanced</strong>:<br>
And sometimes bots are just super persistent in their attacks.  We can tier rules together to tackle repeat offenders. For example, instead of creating just a single rule, we can create a series of rules which can be tiered to protect against persistent threats:</p>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>/login</td>
            <td>4</td>
            <td>1 minute</td>
            <td>Method: POST<br>Status Code: 401,403</td>
            <td>JavaScript Challenge</td>
        </tr>
        <tr>
            <td>2</td>
            <td>/login</td>
            <td>10</td>
            <td>5 minutes</td>
            <td>Method: POST<br>Status Code: 401,403</td>
            <td>Challenge</td>
        </tr>
        <tr>
            <td>3</td>
            <td>/login</td>
            <td>20</td>
            <td>1 hour</td>
            <td>Method: POST<br>Status Code: 401,403</td>
            <td>Block for 1 day</td>
        </tr>
</tbody></table>
<p>With this type of tiering, any genuine users who are just having a hard time remembering their login details, whilst also being extremely fast typers, will not be fully blocked. Instead, they will first be given our automated JavaScript challenge, followed by a traditional CAPTCHA if they hit the next limit. This is a much more user-friendly approach while still securing your login endpoints.</p>
<h4 id="timebasedfirewall">Time-based Firewall</h4>
<p>Our IP Firewall is a powerful feature for blocking problematic IP addresses from accessing your app. This is particularly useful for repeated abuse, or for blocks driven by IP reputation or threat intelligence feeds integrated at the origin.</p>
<p>While the IP Firewall is powerful, maintaining and managing the list of IP addresses currently being blocked can be cumbersome. It becomes more complicated if you want blocked IP addresses to “age out” once bad behavior stops. This often requires authoring and managing a script and making multiple API calls to Cloudflare.</p>
<p>The new Rate Limiting Origin Headers feature makes this all much easier. You can now configure your origin to respond with a Header that triggers a Rate-Limit. To make this happen, we generate a Header at the Origin, which is added to the response sent back to Cloudflare. As we are matching on a static Header, we can set a severity level based on the content of the Header. For example, for a repeat offender you could respond with High as the Header value, which could Block for a longer period.</p>
<p>Create a Rate Limiting rule based on the following criteria:</p>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>*</td>
            <td>1</td>
            <td>1 second</td>
            <td>Method: _ALL_<br>Header: X-CF-Block = low</td>
            <td>Block for 5 minutes</td>
        </tr>
        <tr>
            <td>2</td>
            <td>*</td>
            <td>1</td>
            <td>1 second</td>
            <td>Method: _ALL_<br>Header: X-CF-Block = medium</td>
            <td>Block for 15 minutes</td>
        </tr>
         <tr>
            <td>3</td>
            <td>*</td>
            <td>1</td>
            <td>1 second</td>
            <td>Method: _ALL_<br>Header: X-CF-Block = high</td>
            <td>Block for 60 minutes</td>
        </tr>
</tbody></table>
<p>Once that Rate-Limit has been created, Cloudflare’s Rate-Limiting will kick in immediately when that Header is received.</p>
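<p>On the origin side, all that is needed is to attach the severity header to responses for clients you want Cloudflare to throttle. The minimal Python WSGI sketch below is purely illustrative (the <code>X-CF-Block</code> header name and low/medium/high values follow the example rules above; the in-memory offense counter and its thresholds are stand-ins for your own abuse-detection logic):</p>

```python
# Illustrative origin-side sketch: respond with an X-CF-Block severity header
# so that an edge rate-limit rule matching on that header can act on it.
OFFENSES = {}  # client IP -> count of recent bad requests (toy in-memory store)

def severity(ip):
    """Map an offense count to a severity level (thresholds are illustrative)."""
    n = OFFENSES.get(ip, 0)
    if n >= 20:
        return "high"
    if n >= 10:
        return "medium"
    if n >= 3:
        return "low"
    return None

def application(environ, start_response):
    ip = environ.get("REMOTE_ADDR", "")
    level = severity(ip)
    headers = [("Content-Type", "text/plain")]
    if level:
        headers.append(("X-CF-Block", level))  # edge rule matches on this header
        start_response("403 Forbidden", headers)
        return [b"blocked"]
    start_response("200 OK", headers)
    return [b"ok"]
```

<p>Because the header is static per severity level, the three example rules can each match one value and apply a progressively longer block.</p>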
<h4 id="enumerationattacks">Enumeration Attacks</h4>
<p>Enumeration attacks are proving to be increasingly popular and pesky to mitigate.  With enumeration attacks, attackers identify an expensive operation in your app and hammer at it to tie up resources and slow or crash your app.  For example, an app that offers the ability to look up a user profile requires a database lookup to validate whether the user exists. In an enumeration attack, attackers will send a random set of characters to that endpoint in quick succession, causing the database to grind to a halt.</p>
<p>Rate Limiting to the rescue!</p>
<p>One of our customers was hit with a huge enumeration attack on their platform earlier this year, where the aggressors were trying to do exactly what we described above, in an attempt to overload their database platform. Their Rate Limiting configuration blocked over 100,000,000 bad requests during the 6 hour attack.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-21-at-10.36.57-AM.png" alt="Rate Limiting: Delivering more rules, and greater control"></p>
<p>When a query is sent to the app and the user is not found, the app serves a 404 (page not found). A very basic approach is to set a rate limit for 404s: if a user crosses a threshold of 404s in a period of time, challenge the user to prove they are a real person.</p>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>*</td>
            <td>10</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 404</td>
            <td>Challenge</td>
        </tr>
</tbody></table>
<p>To catch repeat offenders, you can tier the Rate Limits:</p>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>/public/profile*</td>
            <td>10</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 404</td>
            <td>JavaScript Challenge</td>
        </tr>
        <tr>
            <td>2</td>
            <td>/public/profile*</td>
            <td>25</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 200</td>
            <td>Challenge</td>
        </tr>
         <tr>
            <td>3</td>
            <td>/public/profile*</td>
            <td>50</td>
            <td>10 minutes</td>
            <td>Method: GET<br>Status Code: 200, 404</td>
            <td>Block for 4 hours</td>
        </tr>
</tbody></table>
<p>With this type of tiered defense in place, you can “caution” an offender with a JavaScript challenge or Challenge (Google CAPTCHA), and then “block” them if they continue.</p>
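<p>Conceptually, each tier behaves like an independent sliding-window counter over matching requests, checked from the harshest rule down. The toy Python model below captures that tiering logic using the thresholds from the table above (Cloudflare's real implementation is distributed across its edge and is not this code):</p>

```python
import time
from collections import defaultdict, deque

# Tiers: (window in seconds, threshold, action), checked from harshest down.
TIERS = [
    (600, 50, "block_4h"),     # 50 matching hits in 10 minutes
    (60, 25, "challenge"),     # 25 matching hits in 1 minute
    (60, 10, "js_challenge"),  # 10 matching hits in 1 minute
]
MAX_WINDOW = max(w for w, _, _ in TIERS)

hits = defaultdict(deque)  # client key -> timestamps of matching requests

def action_for(key, now=None):
    """Record one matching request and return the action to apply, if any."""
    now = time.time() if now is None else now
    q = hits[key]
    q.append(now)
    while q and q[0] < now - MAX_WINDOW:  # evict hits outside the widest window
        q.popleft()
    for window, threshold, action in TIERS:
        if sum(1 for t in q if t >= now - window) >= threshold:
            return action
    return None
```

<p>Because the harshest tier is evaluated first, a client that blows through the 10-minute threshold gets blocked even if it would also match a lighter challenge tier.</p>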
<h4 id="contentscraping">Content Scraping</h4>
<p>Increasingly, content owners are wrestling with content scraping - malicious bots copying copyrighted images or assets and redistributing or reusing them. For example, we work with an eCommerce store whose copyrighted images were appearing elsewhere on the web without their consent. Rate Limiting can help!</p>
<p>In their app, each page displays 4 copyrighted images: 1 at actual size and 3 as thumbnails. By looking at logs and user patterns, they determined that most users, at a stretch, would never view more than 10-15 products in a minute, which equates to 40-60 loads from the image store.</p>
<p>They chose to tier their Rate Limiting rules to prevent end users from being unnecessarily blocked while browsing heavily. Blocking malicious attempts at content scraping can be quite simple, but it does require some forward planning: placing the rate limit on the right URL is key to ensure the rule covers exactly what you are trying to protect and not the broader content. Here’s an example set of rate limits this customer set to protect their images:</p>
<table class="table-with-last-column-right-aligned">
    <tbody>
        <tr>
            <th>RuleID
            </th><th>URL
            </th><th>Count
            </th><th>Timeframe            
            </th><th>Matching Criteria
            </th><th>Action </th></tr>
        <tr>
            <td>1</td>
            <td>/img/thumbs/*</td>
            <td>10</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 404</td>
            <td>Challenge</td>
       </tr><tr>
            <td>2</td>
            <td>/img/thumbs/*</td>
            <td>25</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 200</td>
            <td>Challenge</td>
        </tr>
        <tr>
            <td>3</td>
            <td>/img/*</td>
            <td>75</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 200</td>
            <td>Block for 4 hours</td>
        </tr>
        <tr>
            <td>4</td>
            <td>/img/*</td>
            <td>5</td>
            <td>1 minute</td>
            <td>Method: GET<br>Status Code: 403, 404</td>
            <td>Challenge</td>
        </tr>
</tbody></table>
<p>As we can see here, rules 1 and 2 count requests to each endpoint individually. Rule 3 counts all hits to the image store, and if a user exceeds 75 requests, they will be blocked for 4 hours. Finally, to catch enumeration or bots guessing image names and numbers, rule 4 counts 403s and 404s and challenges if we see unusual spikes.</p>
<h3 id="onemorethingmorerulestotallyrules">One more thing ... more rules, <em>totally rules!</em></h3>
<p>We want to ensure you have the rules you need to secure your app.   To do that, we are increasing the number of available rules for Pro and Business, for no additional charge.</p>
<ul>
<li>Pro plans increase from 3 to 10 rules</li>
<li>Business plans increase from 3 to 15 rules</li>
</ul>
<p>As always, Cloudflare only charges for good traffic - requests that are allowed through Rate Limiting, not blocked. For more information click <a href="https://support.cloudflare.com/hc/en-us/articles/115000272247-Billing-for-Cloudflare-Rate-Limiting">here</a>.</p>
<p>The Rate-Limiting feature can be enabled within the Firewall tab on the Dashboard, or by visiting: <a href="https://www.cloudflare.com/a/firewall/">cloudflare.com/a/firewall</a></p>
</div>]]></content:encoded></item><item><title><![CDATA[Why I'm Joining Cloudflare]]></title><description><![CDATA[Back in 2002 a mentor told me, “You have two rewarding but very different paths: you can prosecute one bad actor at a time, or you can try to build solutions that take away many bad actors' ability to do harm at all.” ]]></description><link>https://blog.cloudflare.com/why-im-joining-cloudflare/</link><guid isPermaLink="false">5afc79ec5fd79500bfb83b7b</guid><category><![CDATA[People]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Joe Sullivan]]></dc:creator><pubDate>Wed, 16 May 2018 18:43:42 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-16-at-11.47.12-AM-1.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-16-at-11.47.12-AM-1.png" alt="Why I'm Joining Cloudflare"><p>I love working as a Chief Security Officer because every day centers around building something that makes people safer. Back in 2002, as I considered leaving my role as a cybercrime federal prosecutor to work in tech on e-commerce trust and safety, a mentor told me, “You have two rewarding but very different paths: you can prosecute one bad actor at a time, or you can try to build solutions that take away many bad actors' ability to do harm at all.” And while each is rewarding in its own way, my best days are those where I get to see harm prevented—at Internet scale.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/IMG_8125-1.JPG" alt="Why I'm Joining Cloudflare"></p>
<p>In 2016, while traveling the United States to conduct hearings on the condition of Internet security as a member of President Obama's <a href="https://www.whitehouse.gov/the-press-office/2016/04/13/president-obama-announces-more-key-administration-posts">cyber commission</a>, my co-commissioners noticed I had fallen into a pattern of asking the same question of every panelist: “Who is responsible for building a safer online environment where small businesses can set up shop without fear?” We heard many answers that all led to the same “not a through street” conclusion: Most law enforcement agencies extend their jurisdiction online, but there are no digital equivalents to the Department of Transportation or National Highway Traffic Safety Administration and limited government technical contribution to building a safer environment.</p>
<p>I grew frustrated because I believe we need to invest as much in the upkeep of the sidewalks of the Internet as we do on making every street corner safer. The Internet may be the only context where governments spend less on preventing harm than they do on punishing misbehavior. It is certainly the only context where developing businesses are left to their own devices to fend off nation states—and then potentially chastised by regulators if they fail to do it well.</p>
<p>I've had the good fortune to serve on some of the best Internet security teams in the world at eBay, Facebook, and Uber—and have still fallen short of reaching an ideal state of security. Governments and larger companies have the resources and talent to face the daunting challenges of operating online, but good security is hard. If it is a challenge for them, small businesses and most individuals simply don't stand a chance.</p>
<p>With these conclusions weighing on me, my next step professionally had to be towards a team that pushes security out, proactively, to as much of the Internet as possible. It had to be a place whose mission is to help build a better Internet—so that people can step online confidently and launch their own businesses without fear.</p>
<p>I did not think I would find a company that matches my passion for securing the whole Internet—security is often an ancillary feature worthy of investment because strong security will drive brand loyalty and customer trust or differentiate a company from competitors. I've been lucky in the past to join companies with Internet-breadth challenges willing to work proactively and collaboratively on security, yet know those opportunities are few and far between. But when I met the leadership team at Cloudflare, I was amazed to learn how what had started with a focus on mitigating denial of service attacks had grown quickly into so much more—because their strong technology not only makes internet properties safer, it makes them faster. Their product innovation mirrors my physical-world streets analogy—helping to build a better online infrastructure creates better and safer opportunities for everyone in the community.</p>
<p>The team at Cloudflare seem to really embrace their mission of helping build a better Internet. They have certainly approached things differently—launching free versions of security products even in their earliest days to anyone operating a website, mobilizing <a href="https://www.cloudflare.com/galileo/">Project Galileo</a> to help those at risk of losing their voice online, and recently <a href="https://blog.cloudflare.com/announcing-1111/">launching 1.1.1.1</a> to help everyone with better privacy and connectivity. For such a young company, they have done a lot of good already. I am so thrilled to join and learn from them, and hopefully help them continue to expand their efforts to prevent harm—at Internet scale.</p>
<p>Joe Sullivan</p>
<p>P.S. I would be remiss if I did not mention the <a href="https://www.cloudflare.com/careers/">security team at Cloudflare is hiring</a>! If you too want to work on some of the most technically challenging and rewarding security issues the world can offer, let me know!</p>
</div>]]></content:encoded></item><item><title><![CDATA[You get TLS 1.3! You get TLS 1.3! Everyone gets TLS 1.3!]]></title><description><![CDATA[It's no secret that Cloudflare has been a big proponent of TLS 1.3, the newest edition of the TLS protocol that improves both speed and security, since we have made it available to our customers starting in 2016. ]]></description><link>https://blog.cloudflare.com/you-get-tls-1-3-you-get-tls-1-3-everyone-gets-tls-1-3/</link><guid isPermaLink="false">5afc28f05fd79500bfb83b72</guid><category><![CDATA[SSL/TLS]]></category><category><![CDATA[TLS]]></category><category><![CDATA[TLS 1.3]]></category><category><![CDATA[Security]]></category><category><![CDATA[Product News]]></category><dc:creator><![CDATA[Alessandro Ghedini]]></dc:creator><pubDate>Wed, 16 May 2018 17:28:07 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/05/you-get-tls-3-1.jpg" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/05/you-get-tls-3-1.jpg" alt="You get TLS 1.3! You get TLS 1.3! Everyone gets TLS 1.3!"><p>It's no secret that Cloudflare has been a big proponent of <a href="https://blog.cloudflare.com/introducing-tls-1-3/">TLS 1.3</a>, the newest edition of the TLS protocol that improves both speed and security, since we have made it available to our customers starting in 2016. However, for the longest time TLS 1.3 has been a work-in-progress which meant that the feature was disabled by default in our customers’ dashboards, at least until <a href="https://blog.cloudflare.com/why-tls-1-3-isnt-in-browsers-yet/">all the kinks</a> in the protocol could be resolved.</p>
<p>With the specification <a href="https://www.ietf.org/mail-archive/web/tls/current/msg25837.html">finally nearing its official publication</a>, and after several years of work (as well as 28 draft versions), we are happy to announce that the TLS 1.3 feature on Cloudflare is out of beta and will be enabled by default for all new zones.</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/Screen-Shot-2018-05-23-at-8.49.33-AM.png" alt="You get TLS 1.3! You get TLS 1.3! Everyone gets TLS 1.3!"><br>
<small>Custom image derived from <a href="https://youtu.be/8CAscBCdaQg?t=1m48s">YouTube video</a> courtesy of <a href="https://www.youtube.com/user/OWN">OWN</a></small></p>
<p>For our Free and Pro customers not much changes, they already had TLS 1.3 enabled by default from the start. We have also decided to disable the <a href="https://blog.cloudflare.com/introducing-0-rtt/">0-RTT feature</a> by default for these plans (it was previously enabled by default as well), due to <a href="https://twitter.com/grittygrease/status/991750903295164416">its inherent security properties</a>. It will still be possible to explicitly enable it from the dashboard or the API (more on 0-RTT soon-ish in another blog post).</p>
<p>Our Business and Enterprise customers will now also get TLS 1.3 enabled by default for new zones (but will continue to have 0-RTT disabled). For existing Business customers that haven't made an explicit choice (that is, they haven't turned the feature on or off manually), we are also retroactively turning TLS 1.3 on.</p>
<h3 id="whathappenedtothemiddleboxes">What happened to the middleboxes?</h3>
<p>Back in December <a href="https://blog.cloudflare.com/why-tls-1-3-isnt-in-browsers-yet/">we blogged about why TLS 1.3 still wasn't being widely adopted</a>, the main reason being non-compliant middleboxes, network appliances designed to monitor and sometimes intercept HTTPS traffic.</p>
<p>Due to the fact that the TLS protocol hasn’t been updated for a long time (TLS 1.2 came out back in 2008, with fairly minimal changes compared to TLS 1.1), wrong assumptions about the protocol made by these appliances meant that some of the more invasive changes in TLS 1.3, which broke those assumptions, caused the middleboxes to misbehave, in the worst cases causing TLS connections passing through them to break.</p>
<p>Since then, new draft versions of the protocol have been discussed and published, providing additional measures (on top of the ones already adopted, like the “supported_versions” extension) to mitigate the impact caused by middleboxes. How, you ask? The trick was to modify the TLS 1.3 protocol to look more like previous TLS versions, but without impacting the improved performance and security benefits the new version provides.</p>
<p>For example, the ChangeCipherSpec handshake message, which in previous versions of the protocol was used to notify the receiving party that subsequent records would be encrypted, was originally removed from TLS 1.3 since it had no purpose in the protocol anymore after the handshake algorithm was streamlined, but in order to avoid confusing middleboxes that expected to see the message on the wire, it was reintroduced even though the receiving endpoint will just ignore it.</p>
<p>Another point of contention was the fact that some middleboxes expect to see the Certificate messages sent by servers (usually to identify the end server, sometimes with nefarious purposes), but since TLS 1.3 moved that message to the encrypted portion of the handshake, it became invisible to the snooping boxes. The trick there was to make the TLS 1.3 handshake look like it was <a href="https://blog.cloudflare.com/tls-session-resumption-full-speed-and-secure/">resuming a previous connection</a> which means that, even in previous TLS versions, the Certificate message is omitted from plain text communication. This was achieved by populating the previously deprecated &quot;session_id&quot; field in the ClientHello message with a bogus value.</p>
<p>Adopting these changes meant that, while the protocol itself lost a bit of its original elegance (but without losing any of the security and performance), major browsers could finally enable TLS 1.3 by default for all of their users: <a href="https://www.chromestatus.com/features/5712755738804224">Chrome enabled TLS 1.3 by default in version 65</a> while <a href="https://www.mozilla.org/en-US/firefox/60.0/releasenotes/">Firefox did so in version 60</a>.</p>
<h3 id="adoption">Adoption</h3>
<p>We can now go back to our metrics and see what all of this means for general TLS 1.3 adoption.</p>
<p>Back in December, <a href="https://blog.cloudflare.com/why-tls-1-3-isnt-in-browsers-yet/">only 0.06% of TLS connections to Cloudflare websites used TLS 1.3</a>. Now, 5-6% do so, with this number steadily rising:</p>
<p><img src="https://blog.cloudflare.com/content/images/2018/05/tls13_metric.png" alt="You get TLS 1.3! You get TLS 1.3! Everyone gets TLS 1.3!"></p>
<p>It’s worth noting that the current Firefox beta (v61) switched to using draft 28, from draft 23 (which Chrome also uses). The two draft versions are incompatible due to some minor wire changes that were adopted some time after draft 23 was published, but Cloudflare can speak both versions so there won’t be a dip in adoption once Firefox 61 becomes stable. Once the final TLS 1.3 version (that is, draft 28) becomes an official RFC we will also support that alongside the previous draft versions, to avoid leaving behind slow-to-update clients.</p>
<h3 id="conclusion">Conclusion</h3>
<p>The tremendous work required to specify, implement and deploy TLS 1.3 is finally starting to bear fruit, and adoption will without a doubt keep steadily increasing for some time: at the end of 2017 <a href="https://blog.cloudflare.com/our-predictions-for-2018/">our CTO predicted</a> that by the end of 2018 more than 50% of HTTPS connections will happen over TLS 1.3, and given the recent developments we are still confident that it is a reachable target.</p>
</div>]]></content:encoded></item><item><title><![CDATA[Tracing System CPU on Debian Stretch]]></title><description><![CDATA[How an innocent OS upgrade triggered a cascade of issues and forced us into tracing Linux networking internals.]]></description><link>https://blog.cloudflare.com/tracing-system-cpu-on-debian-stretch/</link><guid isPermaLink="false">5ae28464180249002268504c</guid><category><![CDATA[Performance]]></category><category><![CDATA[Kafka]]></category><category><![CDATA[eBPF]]></category><category><![CDATA[Linux]]></category><category><![CDATA[Networking]]></category><dc:creator><![CDATA[Ivan Babrou]]></dc:creator><pubDate>Sun, 13 May 2018 16:00:00 GMT</pubDate><media:content url="https://blog.cloudflare.com/content/images/2018/04/image2017-8-17-16_43_16.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card-markdown"><img src="https://blog.cloudflare.com/content/images/2018/04/image2017-8-17-16_43_16.png" alt="Tracing System CPU on Debian Stretch"><p><em>This is a heavily truncated version of an internal blog post from August 2017. For more recent updates on Kafka, check out <a href="https://blog.cloudflare.com/squeezing-the-firehose/">another blog post on compression</a>, where we optimized throughput 4.5x for both disks and network.</em></p>
<p><img src="https://images.unsplash.com/photo-1511971523672-53e6411f62b9?ixlib=rb-0.3.5&amp;q=80&amp;fm=jpg&amp;crop=entropy&amp;cs=tinysrgb&amp;w=1080&amp;fit=max&amp;ixid=eyJhcHBfaWQiOjExNzczfQ&amp;s=79649c78f5bbe2b0beb5504eb08471b8" alt="Tracing System CPU on Debian Stretch"><br>
<small>Photo by <a href="https://unsplash.com/@alex_povolyashko?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Alex Povolyashko</a> / <a href="https://unsplash.com/?utm_source=ghost&amp;utm_medium=referral&amp;utm_campaign=api-credit">Unsplash</a></small></p>
<h3 id="upgradingoursystemstodebianstretch">Upgrading our systems to Debian Stretch</h3>
<p>For quite some time we've been rolling out Debian Stretch, to the point where we have reached ~10% adoption in our core datacenters. As part of upgrading the underlying OS, we also evaluate the higher level software stack, e.g. taking a look at our ClickHouse and Kafka clusters.</p>
<p>During our upgrade of Kafka, we successfully migrated two smaller clusters, <code>logs</code> and <code>dns</code>, but ran into issues when attempting to upgrade one of our larger clusters, <code>http</code>.</p>
<p>Thankfully, we were able to roll back the <code>http</code> cluster upgrade relatively easily, due to heavy versioning of both the OS and the higher level software stack. If there's one takeaway from this blog post, it's to take advantage of consistent versioning.</p>
<h3 id="highleveldifferences">High level differences</h3>
<p>We upgraded one Kafka <code>http</code> node, and it did not go as planned:</p>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/1.png"><img src="https://blog.cloudflare.com/content/images/2018/04/1.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>Having 5x CPU usage was definitely an unexpected outcome. For control datapoints, we compared to a node where no upgrade happened, and an intermediary node that received a software stack upgrade, but not an OS upgrade. Neither of these two nodes experienced the same CPU saturation issues, even though their setups were practically identical.</p>
<p>For debugging CPU saturation issues, we call on <code>perf</code> to fish out details:</p>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/2-3.png"><img src="https://blog.cloudflare.com/content/images/2018/04/2-3.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p><em>The command used was: <code>perf top -F 99</code>.</em></p>
<h3 id="rcustalls">RCU stalls</h3>
<p>In addition to higher system CPU usage, we found secondary slowdowns, including <a href="http://www.rdrop.com/~paulmck/RCU/whatisRCU.html">read-copy update (RCU)</a> stalls:</p>
<pre><code>[ 4909.110009] logfwdr (26887) used greatest stack depth: 11544 bytes left
[ 4909.392659] oom_reaper: reaped process 26861 (logfwdr), now anon-rss:8kB, file-rss:0kB, shmem-rss:0kB
[ 4923.462841] INFO: rcu_sched self-detected stall on CPU
[ 4923.462843]  13-...: (2 GPs behind) idle=ea7/140000000000001/0 softirq=1/2 fqs=4198
[ 4923.462845]   (t=8403 jiffies g=110722 c=110721 q=6440)
</code></pre>
<p>We've seen RCU stalls before, and our (suboptimal) solution was to reboot the machine.</p>
<p>However, one can only handle so many reboots before the problem becomes severe enough to warrant a deep dive. During our deep dive, we noticed in <code>dmesg</code> that we had issues allocating memory, while trying to write errors:</p>
<pre><code>Aug 15 21:51:35 myhost kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 15 21:51:35 myhost kernel:         26-...: (1881 ticks this GP) idle=76f/140000000000000/0 softirq=8/8 fqs=365
Aug 15 21:51:35 myhost kernel:         (detected by 0, t=2102 jiffies, g=1837293, c=1837292, q=262)
Aug 15 21:51:35 myhost kernel: Task dump for CPU 26:
Aug 15 21:51:35 myhost kernel: java            R  running task    13488  1714   1513 0x00080188
Aug 15 21:51:35 myhost kernel:  ffffc9000d1f7898 ffffffff814ee977 ffff88103f410400 000000000000000a
Aug 15 21:51:35 myhost kernel:  0000000000000041 ffffffff82203142 ffffc9000d1f78c0 ffffffff814eea10
Aug 15 21:51:35 myhost kernel:  0000000000000041 ffffffff82203142 ffff88103f410400 ffffc9000d1f7920
Aug 15 21:51:35 myhost kernel: Call Trace:
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff814ee977&gt;] ? scrup+0x147/0x160
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff814eea10&gt;] ? lf+0x80/0x90
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff814eecb5&gt;] ? vt_console_print+0x295/0x3c0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff810b1193&gt;] ? call_console_drivers.isra.22.constprop.30+0xf3/0x100
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff810b1f51&gt;] ? console_unlock+0x281/0x550
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff810b2498&gt;] ? vprintk_emit+0x278/0x430
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff810b27ef&gt;] ? vprintk_default+0x1f/0x30
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff811588df&gt;] ? printk+0x48/0x50
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff810b30ee&gt;] ? dump_stack_print_info+0x7e/0xc0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8142d41f&gt;] ? dump_stack+0x44/0x65
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff81162e64&gt;] ? warn_alloc+0x124/0x150
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff81163842&gt;] ? __alloc_pages_slowpath+0x932/0xb80
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff81163c92&gt;] ? __alloc_pages_nodemask+0x202/0x250
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff811ae9c2&gt;] ? alloc_pages_current+0x92/0x120
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff81159d2f&gt;] ? __page_cache_alloc+0xbf/0xd0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8115cdfa&gt;] ? filemap_fault+0x2ea/0x4d0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8136dc95&gt;] ? xfs_filemap_fault+0x45/0xa0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8118b3eb&gt;] ? __do_fault+0x6b/0xd0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff81190028&gt;] ? handle_mm_fault+0xe98/0x12b0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8110756b&gt;] ? __seccomp_filter+0x1db/0x290
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8104fa5c&gt;] ? __do_page_fault+0x22c/0x4c0
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff8104fd10&gt;] ? do_page_fault+0x20/0x70
Aug 15 21:51:35 myhost kernel:  [&lt;ffffffff819bea02&gt;] ? page_fault+0x22/0x30
</code></pre>
<p>This suggested that we were logging too many errors, and the actual failure may be earlier in the process. Armed with this hypothesis, we looked at the very beginning of the error chain:</p>
<pre><code>Aug 16 01:14:51 myhost systemd-journald[13812]: Missed 17171 kernel messages
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff81171754&gt;] shrink_inactive_list+0x1f4/0x4f0
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff8117234b&gt;] shrink_node_memcg+0x5bb/0x780
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff811725e2&gt;] shrink_node+0xd2/0x2f0
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff811728ef&gt;] do_try_to_free_pages+0xef/0x310
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff81172be5&gt;] try_to_free_pages+0xd5/0x180
Aug 16 01:14:51 myhost kernel:  [&lt;ffffffff811632db&gt;] __alloc_pages_slowpath+0x31b/0xb80
</code></pre>
<p>As much as <code>shrink_node</code> may scream &quot;NUMA issues&quot;, you're looking primarily at:</p>
<pre><code>Aug 16 01:14:51 myhost systemd-journald[13812]: Missed 17171 kernel messages
</code></pre>
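<p>As a quick triage aid (an illustrative Python helper, not part of our tooling), you can total up how much journald dropped across a log capture, which gives a rough sense of how much of the error chain never made it to disk:</p>

```python
import re

def missed_kernel_messages(log_text):
    """Sum the counters from systemd-journald's 'Missed N kernel messages'
    lines in a captured log."""
    return sum(int(n) for n in
               re.findall(r"Missed (\d+) kernel messages", log_text))
```
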
<p>In addition, we also found memory allocation issues:</p>
<pre><code>[78972.506644] Mem-Info:
[78972.506653] active_anon:3936889 inactive_anon:371971 isolated_anon:0
[78972.506653]  active_file:25778474 inactive_file:1214478 isolated_file:2208
[78972.506653]  unevictable:0 dirty:1760643 writeback:0 unstable:0
[78972.506653]  slab_reclaimable:1059804 slab_unreclaimable:141694
[78972.506653]  mapped:47285 shmem:535917 pagetables:10298 bounce:0
[78972.506653]  free:202928 free_pcp:3085 free_cma:0
[78972.506660] Node 0 active_anon:8333016kB inactive_anon:989808kB active_file:50622384kB inactive_file:2401416kB unevictable:0kB isolated(anon):0kB isolated(file):3072kB mapped:96624kB dirty:3422168kB writeback:0kB shmem:1261156kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:15744 all_unreclaimable? no
[78972.506666] Node 1 active_anon:7414540kB inactive_anon:498076kB active_file:52491512kB inactive_file:2456496kB unevictable:0kB isolated(anon):0kB isolated(file):5760kB mapped:92516kB dirty:3620404kB writeback:0kB shmem:882512kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:9080974 all_unreclaimable? no
[78972.506671] Node 0 DMA free:15900kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
** 9 printk messages dropped ** [78972.506716] Node 0 Normal: 15336*4kB (UMEH) 4584*8kB (MEH) 2119*16kB (UME) 775*32kB (MEH) 106*64kB (UM) 81*128kB (MH) 29*256kB (UM) 25*512kB (M) 19*1024kB (M) 7*2048kB (M) 2*4096kB (M) = 236080kB
[78972.506725] Node 1 Normal: 31740*4kB (UMEH) 3879*8kB (UMEH) 873*16kB (UME) 353*32kB (UM) 286*64kB (UMH) 62*128kB (UMH) 28*256kB (MH) 20*512kB (UMH) 15*1024kB (UM) 7*2048kB (UM) 12*4096kB (M) = 305752kB
[78972.506726] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[78972.506727] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[78972.506728] 27531091 total pagecache pages
[78972.506729] 0 pages in swap cache
[78972.506730] Swap cache stats: add 0, delete 0, find 0/0
[78972.506730] Free swap  = 0kB
[78972.506731] Total swap = 0kB
[78972.506731] 33524975 pages RAM
[78972.506732] 0 pages HighMem/MovableOnly
[78972.506732] 546255 pages reserved
[78972.620129] ntpd: page allocation stalls for 272380ms, order:0, mode:0x24000c0(GFP_KERNEL)
[78972.620132] CPU: 16 PID: 13099 Comm: ntpd Tainted: G           O    4.9.43-cloudflare-2017.8.4 #1
[78972.620133] Hardware name: Quanta Computer Inc D51B-2U (dual 1G LoM)/S2B-MB (dual 1G LoM), BIOS S2B_3A21 10/01/2015
[78972.620136]  ffffc90022f9b6f8 ffffffff8142d668 ffffffff81ca31b8 0000000000000001
[78972.620138]  ffffc90022f9b778 ffffffff81162f14 024000c022f9b740 ffffffff81ca31b8
[78972.620140]  ffffc90022f9b720 0000000000000010 ffffc90022f9b788 ffffc90022f9b738
[78972.620140] Call Trace:
[78972.620148]  [&lt;ffffffff8142d668&gt;] dump_stack+0x4d/0x65
[78972.620152]  [&lt;ffffffff81162f14&gt;] warn_alloc+0x124/0x150
[78972.620154]  [&lt;ffffffff811638f2&gt;] __alloc_pages_slowpath+0x932/0xb80
[78972.620157]  [&lt;ffffffff81163d42&gt;] __alloc_pages_nodemask+0x202/0x250
[78972.620160]  [&lt;ffffffff811aeae2&gt;] alloc_pages_current+0x92/0x120
[78972.620162]  [&lt;ffffffff8115f6ee&gt;] __get_free_pages+0xe/0x40
[78972.620165]  [&lt;ffffffff811e747a&gt;] __pollwait+0x9a/0xe0
[78972.620168]  [&lt;ffffffff817c9ec9&gt;] datagram_poll+0x29/0x100
[78972.620170]  [&lt;ffffffff817b9d48&gt;] sock_poll+0x48/0xa0
[78972.620172]  [&lt;ffffffff811e7c35&gt;] do_select+0x335/0x7b0
</code></pre>
<p>One error message in particular stood out:</p>
<pre><code>[78991.546088] systemd-network: page allocation stalls for 287000ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
</code></pre>
<p>You don't want your page allocations to stall for almost 5 minutes, especially for an order-0 allocation (the smallest possible request: a single 4 KiB page).</p>
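<p>For context, the <code>order</code> in that message determines the allocation size: the buddy allocator hands out 2<sup>order</sup> contiguous pages. A quick sketch, assuming the usual 4&nbsp;KiB page size:</p>

```python
PAGE_SIZE = 4096  # bytes; the common x86-64 page size

def alloc_size(order: int) -> int:
    """Size in bytes of a buddy-allocator request of the given order."""
    return PAGE_SIZE << order

# order-0 is the smallest possible request: one page
print(alloc_size(0))  # 4096
# higher orders must find 2**order physically contiguous pages
print(alloc_size(3))  # 32768
```

So an order-0 stall means the kernel couldn't find even a single free page fast enough.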
<p>Comparing against our control nodes, there were only two plausible explanations: the kernel upgrade, and the switch from Debian Jessie to Debian Stretch. We suspected the former, since high system CPU usage pointed at the kernel. However, just to be safe, we rolled back both: the kernel to 4.4.55, and the affected nodes to Debian Jessie. This was a reasonable compromise, since we needed to minimize downtime on production nodes.</p>
<h3 id="diggingabitdeeper">Digging a bit deeper</h3>
<p>Keeping servers on an older kernel and distribution is not a viable long-term solution. Through bisection, we found that the issue lay in the Jessie-to-Stretch upgrade, contrary to our initial hypothesis.</p>
<p>Now that we knew where the problem was, we proceeded to investigate why. With help from our existing automation around <code>perf</code> and Java, we generated the following flamegraphs:</p>
<ul>
<li>Jessie</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/9.png"><img src="https://blog.cloudflare.com/content/images/2018/04/9.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<ul>
<li>Stretch</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/10.png"><img src="https://blog.cloudflare.com/content/images/2018/04/10.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>At first it looked like Jessie was doing <code>writev</code> instead of <code>sendfile</code>, but the full flamegraphs revealed that Stretch was executing <code>sendfile</code> much more slowly.</p>
<p>If you highlight <code>sendfile</code>:</p>
<ul>
<li>Jessie</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/11.png"><img src="https://blog.cloudflare.com/content/images/2018/04/11.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<ul>
<li>Stretch</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/12.png"><img src="https://blog.cloudflare.com/content/images/2018/04/12.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>And zoomed in:</p>
<ul>
<li>Jessie</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/13.png"><img src="https://blog.cloudflare.com/content/images/2018/04/13.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<ul>
<li>Stretch</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/14.png"><img src="https://blog.cloudflare.com/content/images/2018/04/14.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>These two look very different.</p>
<p>Some colleagues suggested that the differences in the graphs may be due to TCP offload being disabled, but upon checking our NIC settings, we found that the feature flags were identical.</p>
<p>We'll dive into the differences in the next section.</p>
<h3 id="anddeeper">And deeper</h3>
<p>To trace latency distributions of <code>sendfile</code> syscalls between Jessie and Stretch, we used <a href="https://github.com/iovisor/bcc/blob/master/tools/funclatency_example.txt"><code>funclatency</code></a> from <a href="https://iovisor.github.io/bcc/">bcc-tools</a>:</p>
<ul>
<li>Jessie</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funclatency -uTi 1 do_sendfile
Tracing 1 functions for &quot;do_sendfile&quot;... Hit Ctrl-C to end.
23:27:25
     usecs               : count     distribution
         0 -&gt; 1          : 9        |                                        |
         2 -&gt; 3          : 47       |****                                    |
         4 -&gt; 7          : 53       |*****                                   |
         8 -&gt; 15         : 379      |****************************************|
        16 -&gt; 31         : 329      |**********************************      |
        32 -&gt; 63         : 101      |**********                              |
        64 -&gt; 127        : 23       |**                                      |
       128 -&gt; 255        : 50       |*****                                   |
       256 -&gt; 511        : 7        |                                        |
</code></pre>
<ul>
<li>Stretch</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funclatency -uTi 1 do_sendfile
Tracing 1 functions for &quot;do_sendfile&quot;... Hit Ctrl-C to end.
23:27:28
     usecs               : count     distribution
         0 -&gt; 1          : 1        |                                        |
         2 -&gt; 3          : 20       |***                                     |
         4 -&gt; 7          : 46       |*******                                 |
         8 -&gt; 15         : 56       |********                                |
        16 -&gt; 31         : 65       |**********                              |
        32 -&gt; 63         : 75       |***********                             |
        64 -&gt; 127        : 75       |***********                             |
       128 -&gt; 255        : 258      |****************************************|
       256 -&gt; 511        : 144      |**********************                  |
       512 -&gt; 1023       : 24       |***                                     |
      1024 -&gt; 2047       : 27       |****                                    |
      2048 -&gt; 4095       : 28       |****                                    |
      4096 -&gt; 8191       : 35       |*****                                   |
      8192 -&gt; 16383      : 1        |                                        |
</code></pre>
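<p><code>funclatency</code> builds these histograms by bucketing each call's latency into power-of-two bins. A minimal userspace sketch of that bucketing (just the idea, not the bcc implementation, which aggregates in-kernel with eBPF):</p>

```python
from collections import Counter

def log2_histogram(latencies_us):
    """Count samples into power-of-two buckets, like bcc's funclatency."""
    buckets = Counter()
    for us in latencies_us:
        # 0-1us -> slot 0, 2-3 -> slot 1, 4-7 -> slot 2, ...
        slot = max(int(us).bit_length() - 1, 0)
        buckets[slot] += 1
    return buckets

samples = [1, 2, 3, 5, 9, 200, 300]
hist = log2_histogram(samples)
for slot in sorted(hist):
    lo, hi = (0, 1) if slot == 0 else (1 << slot, (1 << (slot + 1)) - 1)
    print(f"{lo:>6} -> {hi:<6}: {hist[slot]}")
```

The shift in the Stretch histogram toward the 128&ndash;8191&micro;s buckets is what a latency regression looks like in this representation.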
<p>In the flamegraphs, you can see timers being set at the tips (the <code>mod_timer</code> function), with these timers taking locks. On Stretch we were installing roughly 3x more timers, resulting in about 10x the lock contention:</p>
<ul>
<li>Jessie</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 mod_timer
Tracing 1 functions for &quot;mod_timer&quot;... Hit Ctrl-C to end.
00:33:36
FUNC                                    COUNT
mod_timer                               60482
00:33:37
FUNC                                    COUNT
mod_timer                               58263
00:33:38
FUNC                                    COUNT
mod_timer                               54626
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 lock_timer_base
Tracing 1 functions for &quot;lock_timer_base&quot;... Hit Ctrl-C to end.
00:32:36
FUNC                                    COUNT
lock_timer_base                         15962
00:32:37
FUNC                                    COUNT
lock_timer_base                         16261
00:32:38
FUNC                                    COUNT
lock_timer_base                         15806
</code></pre>
<ul>
<li>Stretch</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 mod_timer
Tracing 1 functions for &quot;mod_timer&quot;... Hit Ctrl-C to end.
00:33:28
FUNC                                    COUNT
mod_timer                              149068
00:33:29
FUNC                                    COUNT
mod_timer                              155994
00:33:30
FUNC                                    COUNT
mod_timer                              160688
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 lock_timer_base
Tracing 1 functions for &quot;lock_timer_base&quot;... Hit Ctrl-C to end.
00:32:32
FUNC                                    COUNT
lock_timer_base                        119189
00:32:33
FUNC                                    COUNT
lock_timer_base                        196895
00:32:34
FUNC                                    COUNT
lock_timer_base                        140085
</code></pre>
<p>The Linux kernel includes debugging facilities for timers, which <a href="https://elixir.bootlin.com/linux/v4.9.43/source/kernel/time/timer.c#L1010">call</a> the <code>timer:timer_start</code> <a href="https://elixir.bootlin.com/linux/v4.9.43/source/include/trace/events/timer.h#L44">tracepoint</a> on every timer start. This allowed us to pull up timer names:</p>
<ul>
<li>Jessie</li>
</ul>
<pre><code>$ sudo perf record -e timer:timer_start -p 23485 -- sleep 10 &amp;&amp; sudo perf script | sed 's/.* function=//g' | awk '{ print $1 }' | sort | uniq -c
[ perf record: Woken up 54 times to write data ]
[ perf record: Captured and wrote 17.778 MB perf.data (173520 samples) ]
      6 blk_rq_timed_out_timer
      2 clocksource_watchdog
      5 commit_timeout
      5 cursor_timer_handler
      2 dev_watchdog
     10 garp_join_timer
      2 ixgbe_service_timer
     36 reqsk_timer_handler
   4769 tcp_delack_timer
    171 tcp_keepalive_timer
 168512 tcp_write_timer
</code></pre>
<ul>
<li>Stretch</li>
</ul>
<pre><code>$ sudo perf record -e timer:timer_start -p 3416 -- sleep 10 &amp;&amp; sudo perf script | sed 's/.* function=//g' | awk '{ print $1 }' | sort | uniq -c
[ perf record: Woken up 671 times to write data ]
[ perf record: Captured and wrote 198.273 MB perf.data (1988650 samples) ]
      6 clocksource_watchdog
      4 commit_timeout
     12 cursor_timer_handler
      2 dev_watchdog
     18 garp_join_timer
      4 ixgbe_service_timer
      1 neigh_timer_handler
      1 reqsk_timer_handler
   4622 tcp_delack_timer
      1 tcp_keepalive_timer
1983978 tcp_write_timer
      1 writeout_period
</code></pre>
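<p>The <code>sed | awk | sort | uniq -c</code> pipeline above just tallies the <code>function=</code> field of each <code>timer:timer_start</code> event. The same aggregation in a few lines of Python (the sample lines below are fabricated for illustration, in the shape of <code>perf script</code> output):</p>

```python
import re
from collections import Counter

# Fabricated lines shaped like `perf script` output for timer:timer_start
perf_script_output = """\
java  3416 [012] 1000.000001: timer:timer_start: timer=0xffff880 function=tcp_write_timer expires=4305443888
java  3416 [013] 1000.000002: timer:timer_start: timer=0xffff881 function=tcp_write_timer expires=4305443890
java  3416 [014] 1000.000003: timer:timer_start: timer=0xffff882 function=tcp_delack_timer expires=4305443891
"""

counts = Counter(
    m.group(1)
    for line in perf_script_output.splitlines()
    if (m := re.search(r"function=(\S+)", line))
)
for name, n in counts.most_common():
    print(n, name)
```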
<p>So on Stretch we were installing roughly 12x more <code>tcp_write_timer</code> timers, resulting in higher kernel CPU usage.</p>
<p>Taking specific flamegraphs of the timers revealed the differences in their operation:</p>
<ul>
<li>Jessie</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/15.png"><img src="https://blog.cloudflare.com/content/images/2018/04/15.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<ul>
<li>Stretch</li>
</ul>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/16.png"><img src="https://blog.cloudflare.com/content/images/2018/04/16.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>We then traced the functions that were different:</p>
<ul>
<li>Jessie</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_sendmsg
Tracing 1 functions for &quot;tcp_sendmsg&quot;... Hit Ctrl-C to end.
03:33:33
FUNC                                    COUNT
tcp_sendmsg                             21166
03:33:34
FUNC                                    COUNT
tcp_sendmsg                             21768
03:33:35
FUNC                                    COUNT
tcp_sendmsg                             21712
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_push_one
Tracing 1 functions for &quot;tcp_push_one&quot;... Hit Ctrl-C to end.
03:37:14
FUNC                                    COUNT
tcp_push_one                              496
03:37:15
FUNC                                    COUNT
tcp_push_one                              432
03:37:16
FUNC                                    COUNT
tcp_push_one                              495
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/trace -p 23485 'tcp_sendmsg &quot;%d&quot;, arg3' -T -M 100000 | awk '{ print $NF }' | sort | uniq -c | sort -n | tail
   1583 4
   2043 54
   3546 18
   4016 59
   4423 50
   5349 8
   6154 40
   6620 38
  17121 51
  39528 44
</code></pre>
<ul>
<li>Stretch</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_sendmsg
Tracing 1 functions for &quot;tcp_sendmsg&quot;... Hit Ctrl-C to end.
03:33:30
FUNC                                    COUNT
tcp_sendmsg                             53834
03:33:31
FUNC                                    COUNT
tcp_sendmsg                             49472
03:33:32
FUNC                                    COUNT
tcp_sendmsg                             51221
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/funccount -T -i 1 tcp_push_one
Tracing 1 functions for &quot;tcp_push_one&quot;... Hit Ctrl-C to end.
03:37:10
FUNC                                    COUNT
tcp_push_one                            64483
03:37:11
FUNC                                    COUNT
tcp_push_one                            65058
03:37:12
FUNC                                    COUNT
tcp_push_one                            72394
</code></pre>
<pre><code>$ sudo /usr/share/bcc/tools/trace -p 3416 'tcp_sendmsg &quot;%d&quot;, arg3' -T -M 100000 | awk '{ print $NF }' | sort | uniq -c | sort -n | tail
    396 46
    409 4
   1124 50
   1305 18
   1547 40
   1672 59
   1729 8
   2181 38
  19052 44
  64504 4096
</code></pre>
<p>The traces showed huge differences in <code>tcp_sendmsg</code> and <code>tcp_push_one</code> call counts within <code>sendfile</code> between the two distributions.</p>
<p>To introspect further, we leveraged a kernel feature available since 4.9: the ability to count stacks in-kernel. This let us measure what was hitting <code>tcp_push_one</code>:</p>
<ul>
<li>Jessie</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/stackcount -i 10 tcp_push_one
Tracing 1 functions for &quot;tcp_push_one&quot;... Hit Ctrl-C to end.
  tcp_push_one
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  do_iter_readv_writev
  do_readv_writev
  vfs_writev
  do_writev
  SyS_writev
  do_syscall_64
  return_from_SYSCALL_64
    1
  tcp_push_one
  inet_sendpage
  kernel_sendpage
  sock_sendpage
  pipe_to_sendpage
  __splice_from_pipe
  splice_from_pipe
  generic_splice_sendpage
  direct_splice_actor
  splice_direct_to_actor
  do_splice_direct
  do_sendfile
  sys_sendfile64
  do_syscall_64
  return_from_SYSCALL_64
    4950
</code></pre>
<ul>
<li>Stretch</li>
</ul>
<pre><code>$ sudo /usr/share/bcc/tools/stackcount -i 10 tcp_push_one
Tracing 1 functions for &quot;tcp_push_one&quot;... Hit Ctrl-C to end.
  tcp_push_one
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  do_iter_readv_writev
  do_readv_writev
  vfs_writev
  do_writev
  SyS_writev
  do_syscall_64
  return_from_SYSCALL_64
    123
  tcp_push_one
  inet_sendmsg
  sock_sendmsg
  sock_write_iter
  __vfs_write
  vfs_write
  SyS_write
  do_syscall_64
  return_from_SYSCALL_64
    172
  tcp_push_one
  inet_sendmsg
  sock_sendmsg
  kernel_sendmsg
  sock_no_sendpage
  tcp_sendpage
  inet_sendpage
  kernel_sendpage
  sock_sendpage
  pipe_to_sendpage
  __splice_from_pipe
  splice_from_pipe
  generic_splice_sendpage
  direct_splice_actor
  splice_direct_to_actor
  do_splice_direct
  do_sendfile
  sys_sendfile64
  do_syscall_64
  return_from_SYSCALL_64
    735110
</code></pre>
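<p><code>stackcount</code> aggregates identical call stacks in the kernel and reports one count per unique stack. The core idea, sketched in userspace with two fabricated, abbreviated stacks:</p>

```python
from collections import Counter

def count_stacks(events):
    """Group identical stacks (lists of frames, innermost first) and count them."""
    return Counter(tuple(stack) for stack in events)

# Abbreviated stand-ins for the two paths reaching tcp_push_one
writev_path = ["tcp_push_one", "inet_sendmsg", "sock_sendmsg", "SyS_writev"]
sendfile_path = ["tcp_push_one", "tcp_sendpage", "do_sendfile", "sys_sendfile64"]

counts = count_stacks([writev_path] + [sendfile_path] * 3)
for stack, n in counts.most_common():
    print("\n".join("  " + frame for frame in stack))
    print(f"    {n}")
```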
<p>If you diff the most popular stacks, you'll get:</p>
<pre><code>--- jessie.txt  2017-08-16 21:14:13.000000000 -0700
+++ stretch.txt 2017-08-16 21:14:20.000000000 -0700
@@ -1,4 +1,9 @@
 tcp_push_one
+inet_sendmsg
+sock_sendmsg
+kernel_sendmsg
+sock_no_sendpage
+tcp_sendpage
 inet_sendpage
 kernel_sendpage
 sock_sendpage
</code></pre>
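<p>The diff above can be reproduced with <code>difflib</code> against the two most popular stacks (frame lists abbreviated here to the interesting region):</p>

```python
import difflib

jessie = ["tcp_push_one", "inet_sendpage", "kernel_sendpage", "sock_sendpage"]
stretch = ["tcp_push_one", "inet_sendmsg", "sock_sendmsg", "kernel_sendmsg",
           "sock_no_sendpage", "tcp_sendpage",
           "inet_sendpage", "kernel_sendpage", "sock_sendpage"]

diff = list(difflib.unified_diff(jessie, stretch,
                                 fromfile="jessie.txt", tofile="stretch.txt",
                                 lineterm=""))
print("\n".join(diff))
```

The five added frames show Stretch detouring through <code>sock_no_sendpage</code> on its way out of <code>tcp_sendpage</code>.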
<p>Let's look closer at <a href="https://elixir.bootlin.com/linux/v4.9.43/source/net/ipv4/tcp.c#L1012"><code>tcp_sendpage</code></a>:</p>
<pre><code>int tcp_sendpage(struct sock *sk, struct page *page, int offset,
         size_t size, int flags)
{
    ssize_t res;

    if (!(sk-&gt;sk_route_caps &amp; NETIF_F_SG) ||
        !sk_check_csum_caps(sk))
        return sock_no_sendpage(sk-&gt;sk_socket, page, offset, size,
                    flags);

    lock_sock(sk);

    tcp_rate_check_app_limited(sk);  /* is sending application-limited? */

    res = do_tcp_sendpages(sk, page, offset, size, flags);
    release_sock(sk);
    return res;
}
</code></pre>
<p>On Stretch we clearly enter the <code>if</code> body and fall back to <code>sock_no_sendpage</code>, which means <a href="https://elixir.bootlin.com/linux/v4.9.43/source/include/linux/netdev_features.h#L115">NETIF_F_SG</a> is not set. We looked up what that flag gates: <a href="https://en.wikipedia.org/wiki/Large_send_offload">segmentation offload</a>. This difference is peculiar, since both OSes should have it enabled.</p>
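<p>The gate in <code>tcp_sendpage</code> boils down to a feature-flag check on the socket's route capabilities. A toy model of that branch (the flag value is illustrative, not the kernel's; the real bits live in <code>netdev_features.h</code>):</p>

```python
# Illustrative bit position; not the kernel's actual NETIF_F_SG value
NETIF_F_SG = 1 << 0

def tcp_sendpage_path(sk_route_caps: int, csum_ok: bool = True) -> str:
    """Mirror the branch in tcp_sendpage: without scatter-gather support
    (or usable checksum offload) we fall back to the slow copying path."""
    if not (sk_route_caps & NETIF_F_SG) or not csum_ok:
        return "sock_no_sendpage"    # copies data; what we saw on Stretch
    return "do_tcp_sendpages"        # zero-copy fast path; what Jessie took

print(tcp_sendpage_path(0))             # sock_no_sendpage
print(tcp_sendpage_path(NETIF_F_SG))    # do_tcp_sendpages
```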
<h3 id="evendeepertothecrux">Even deeper, to the crux</h3>
<p>It turned out that we had segmentation offload enabled for only a few of our NICs: <code>eth2</code>, <code>eth3</code>, and <code>bond0</code>. Our network setup can be described as follows:</p>
<pre><code>eth2 --&gt;|              |--&gt; vlan10
        |---&gt; bond0 --&gt;|
eth3 --&gt;|              |--&gt; vlan100
</code></pre>
<p><strong>The missing piece was that we were missing segmentation offload on VLAN interfaces, where the actual IPs live.</strong></p>
<p>Here's the diff from <code>ethtool -k vlan10</code>:</p>
<pre><code>$ diff -rup &lt;(ssh jessie sudo ethtool -k vlan10) &lt;(ssh stretch sudo ethtool -k vlan10)
--- /dev/fd/63  2017-08-16 21:21:12.000000000 -0700
+++ /dev/fd/62  2017-08-16 21:21:12.000000000 -0700
@@ -1,21 +1,21 @@
 Features for vlan10:
 rx-checksumming: off [fixed]
-tx-checksumming: off
+tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
-       tx-checksum-ip-generic: off
+       tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off
        tx-checksum-sctp: off
-scatter-gather: off
-       tx-scatter-gather: off
+scatter-gather: on
+       tx-scatter-gather: on
        tx-scatter-gather-fraglist: off
-tcp-segmentation-offload: off
-       tx-tcp-segmentation: off [requested on]
-       tx-tcp-ecn-segmentation: off [requested on]
-       tx-tcp-mangleid-segmentation: off [requested on]
-       tx-tcp6-segmentation: off [requested on]
-udp-fragmentation-offload: off [requested on]
-generic-segmentation-offload: off [requested on]
+tcp-segmentation-offload: on
+       tx-tcp-segmentation: on
+       tx-tcp-ecn-segmentation: on
+       tx-tcp-mangleid-segmentation: on
+       tx-tcp6-segmentation: on
+udp-fragmentation-offload: on
+generic-segmentation-offload: on
 generic-receive-offload: on
 large-receive-offload: off [fixed]
 rx-vlan-offload: off [fixed]
</code></pre>
<p>So we enthusiastically enabled segmentation offload:</p>
<pre><code>$ sudo ethtool -K vlan10 sg on
</code></pre>
<p>And it didn't help! Will the suffering ever end? Let's also enable TCP transmit checksum offload:</p>
<pre><code>$ sudo ethtool -K vlan10 tx on
Actual changes:
tx-checksumming: on
        tx-checksum-ip-generic: on
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: on
</code></pre>
<p>Nothing. The diff is essentially empty now:</p>
<pre><code>$ diff -rup &lt;(ssh jessie sudo ethtool -k vlan10) &lt;(ssh stretch sudo ethtool -k vlan10)
--- /dev/fd/63  2017-08-16 21:31:27.000000000 -0700
+++ /dev/fd/62  2017-08-16 21:31:27.000000000 -0700
@@ -4,11 +4,11 @@ tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
-       tx-checksum-fcoe-crc: off [requested on]
-       tx-checksum-sctp: off [requested on]
+       tx-checksum-fcoe-crc: off
+       tx-checksum-sctp: off
 scatter-gather: on
        tx-scatter-gather: on
-       tx-scatter-gather-fraglist: off [requested on]
+       tx-scatter-gather-fraglist: off
 tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
</code></pre>
<p>The last missing piece was that offload settings are only picked up when a connection is established, so we restarted Kafka and immediately saw a performance improvement (the green line):</p>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/17.png"><img src="https://blog.cloudflare.com/content/images/2018/04/17.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>Not enabling offload features when possible seems like a pretty bad regression, so we filed a ticket for <code>systemd</code>:</p>
<ul>
<li><a href="https://github.com/systemd/systemd/issues/6629">https://github.com/systemd/systemd/issues/6629</a></li>
</ul>
<p>In the meantime, we worked around the upstream issue by automatically enabling offload features on boot if they were disabled on VLAN interfaces.</p>
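<p>The workaround amounts to checking each VLAN interface's offload flags at boot and re-enabling any that are off. A hedged sketch of the command generation only (the feature list and interface names are assumptions for illustration; our real automation may differ):</p>

```python
def ethtool_fix_commands(iface: str, features: dict) -> list:
    """Build `ethtool -K` commands for offload features currently disabled.
    `features` maps ethtool shorthand (e.g. 'sg', 'tx') to its current state."""
    wanted = ["sg", "tx"]  # scatter-gather and tx checksumming, as above
    return [f"ethtool -K {iface} {feat} on"
            for feat in wanted if not features.get(feat, False)]

# e.g. a Stretch-like vlan10 with both offloads off:
print(ethtool_fix_commands("vlan10", {"sg": False, "tx": False}))
```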
<p>With the fix in place, we rebooted our <code>logs</code> Kafka cluster to upgrade to the latest kernel, and our 5-day CPU usage history showed the improvement:</p>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/18.png"><img src="https://blog.cloudflare.com/content/images/2018/04/18.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<p>The DNS cluster also yielded positive results, with just 2 nodes rebooted (purple line going down):</p>
<p><a href="https://blog.cloudflare.com/content/images/2018/04/19.png"><img src="https://blog.cloudflare.com/content/images/2018/04/19.png" alt="Tracing System CPU on Debian Stretch"></a></p>
<h3 id="conclusion">Conclusion</h3>
<p>It was a mistake on our part to ship a performance regression without a good regression framework in place to catch it. Luckily, thanks to our heavy use of version control, we managed to bisect the issue rather quickly, and had a temporary rollback in place while we root-caused the problem.</p>
<p>In the end, enabling offload also removed RCU stalls. It's not really clear whether it was the cause or just a catalyst, but the end result speaks for itself.</p>
<p>On the bright side, we dug pretty deep into Linux kernel internals, and although there were fleeting moments of wanting to give up and move to the woods to become a park ranger, we persevered and came out of the forest successful.</p>
<hr>
<p><em>If deep diving from high level symptoms to kernel/OS issues makes you excited, <a href="https://www.cloudflare.com/careers/">drop us a line</a>.</em></p>
<hr>
</div>]]></content:encoded></item></channel></rss>