
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 08 Apr 2026 00:50:05 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Introducing HAR Sanitizer: secure HAR sharing]]></title>
            <link>https://blog.cloudflare.com/introducing-har-sanitizer-secure-har-sharing/</link>
            <pubDate>Thu, 26 Oct 2023 13:20:05 GMT</pubDate>
            <description><![CDATA[ As a follow-up to the most recent Okta breach, we are making a HAR file sanitizer available to everyone, not just Cloudflare customers, at no cost. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SLumYs48sPRowoZrb0BCp/84d8ff0d78c5d4a8498edfc136541bbe/image2-8.png" />
            
</figure><p>On Wednesday, October 18th, 2023, Cloudflare’s Security Incident Response Team (SIRT) discovered an attack on our systems that originated from an <a href="/how-cloudflare-mitigated-yet-another-okta-compromise/">authentication token stolen from one of Okta’s support systems</a>. No Cloudflare customer information or systems were impacted by the incident, thanks to the real-time detection and rapid action of our SIRT in tandem with our <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust security posture</a> and use of hardware keys. With that said, we’d rather not repeat the experience — and so we have built a new security tool that can help organizations render this type of attack obsolete for good.</p><p>The bad actor in the Okta breach compromised user sessions by capturing session tokens from administrators at Cloudflare and other impacted organizations. They did this by infiltrating Okta’s customer support system and stealing one of the most common mechanisms for troubleshooting — an HTTP Archive (HAR) file.</p><p>HAR files contain a record of a user’s browser session, a kind of step-by-step audit that a user can share with someone like a help desk agent to diagnose an issue. However, the file can also contain sensitive information that can be used to launch an attack.</p><p>As a follow-up to the Okta breach, we are making a <a href="http://har-sanitizer.pages.dev/">HAR file sanitizer</a> available to everyone, not just Cloudflare customers, at no cost. We are publishing this tool under an <a href="https://github.com/cloudflare/har-sanitizer">open source license</a> and are making it available to any support, engineering or security team. At Cloudflare, we are committed to making the Internet a better place, and using HAR files without the threat of stolen sessions should be part of the future of the Internet.</p>
    <div>
      <h2>HAR Files - a look back in time</h2>
      <a href="#har-files-a-look-back-in-time">
        
      </a>
    </div>
    <p>Imagine being able to rewind time and revisit every single step a user took during a web session, scrutinizing each request and the responses the browser received.</p><p><a href="https://en.wikipedia.org/wiki/HAR_%28file_format%29">HAR (HTTP Archive)</a> files are JSON-formatted archives of a web browser’s interaction with a web application. HAR files provide a detailed snapshot of every request, including headers, cookies, and other types of data sent to a web server by the browser. This makes them an invaluable resource for troubleshooting web application issues, especially for complex, layered web applications.</p><p>The snapshot that a HAR file captures can contain the following information:</p><p><b>Complete Request and Response Headers:</b> Every piece of data sent and received, including method types (GET, POST, etc.), status codes, URLs, cookies, and more.</p><p><b>Payload Content:</b> Details of what was actually exchanged between the client and server, which can be essential for diagnosing issues related to data submission or retrieval.</p><p><b>Timing Information:</b> Precise timing breakdowns of each phase – from DNS lookup, connection time, and SSL handshake, to content download – giving insight into performance bottlenecks.</p><p>This information can be difficult to gather from an application’s logs due to the diverse nature of devices, browsers and networks used to access an application. Collecting it manually would take a user dozens of steps; a HAR file gives them a one-click option to share diagnostic information with another party. The file format is also standard, providing the developers, support teams, and administrators on the other side of the exchange with a consistent input to their own tooling. This minimizes the frustrating back-and-forth where teams try to recreate a user-reported problem, ensuring that everyone is, quite literally, on the same page.</p>
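<p>Since a HAR file is plain JSON, it can be inspected with a few lines of code. A minimal sketch in Python (the field names follow the HAR format; the entry itself is a made-up example, normally read from a file exported via the browser DevTools):</p>

```python
import json

# A tiny in-memory HAR document; real files come from the browser's
# "Save all as HAR" option in DevTools.
har = json.loads("""
{
  "log": {
    "entries": [
      {
        "request": {
          "method": "GET",
          "url": "https://example.com/dashboard",
          "cookies": [{"name": "session_id", "value": "abc123"}],
          "headers": [{"name": "User-Agent", "value": "Mozilla/5.0"}]
        },
        "response": {"status": 200},
        "time": 42.0
      }
    ]
  }
}
""")

# List every request plus any cookies it carried -- the data a help desk
# agent wants to see, and exactly what an attacker would look for.
for entry in har["log"]["entries"]:
    req = entry["request"]
    cookie_names = [c["name"] for c in req.get("cookies", [])]
    print(req["method"], req["url"], entry["response"]["status"], cookie_names)
```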
    <div>
      <h2>HAR files as an attack vector</h2>
      <a href="#har-files-as-an-attack-vector">
        
      </a>
    </div>
    <p>HAR files, while powerful, come with a cautionary note. Within the set of information they contain, session cookies make them a target for malicious actors.</p>
    <div>
      <h3>The Role of Session Cookies</h3>
      <a href="#the-role-of-session-cookies">
        
      </a>
    </div>
    <p>Before diving into the risks, it's crucial to understand the role of session cookies. A session cookie is sent from a server and stored on a user's browser to maintain stateful information across web sessions for that user. In simpler terms, it’s how the browser keeps you logged into an application for a period of time even if you close the page. Generally, these cookies live in local memory on a user’s browser and are not often shared. However, a HAR file is one of the most common ways that a session cookie could be inadvertently shared.</p>
    <div>
      <h3>Dangers of a stolen session cookie</h3>
      <a href="#dangers-of-a-stolen-session-cookie">
        
      </a>
    </div>
    <p>If a HAR file with a valid session cookie is shared, there are a number of potential security threats that the user, and their company, may be exposed to:</p><p><b>Unauthorized Access:</b> The biggest risk is unauthorized access. If a HAR file with a session cookie lands in the wrong hands, it grants entry to the user’s account for that application. For platforms that store personal data or financial details, the consequences of such a breach can be catastrophic, especially if the session cookie of a user with administrative or elevated permissions is stolen.</p><p><b>Session Hijacking:</b> Armed with a session cookie, attackers can impersonate legitimate users, a tactic known as session hijacking. This can lead to a range of malicious activities, from spreading misinformation to siphoning off funds.</p><p><b>Persistent Exposure:</b> Unlike other forms of data, a session cookie's exposure risk doesn't necessarily end when a user session does. Depending on the cookie's lifespan, malicious actors could gain prolonged access, repeatedly compromising a user's digital interactions.</p><p><b>Gateway to Further Attacks:</b> With access to a user's session, especially an administrator’s, attackers can probe for other vulnerabilities, exploit platform weaknesses, or jump to other applications.</p>
    <div>
      <h2>Mitigating the impact of a stolen HAR file</h2>
      <a href="#mitigating-the-impact-of-a-stolen-har-file">
        
      </a>
    </div>
    <p>Thankfully, there are ways to render a HAR file inert even if stolen by an attacker. One of the most effective methods is to “sanitize” a HAR file of any session-related information before sharing it for debugging purposes.</p><p>The <a href="http://har-sanitizer.pages.dev/">HAR sanitizer</a> we are introducing today allows a user to upload any HAR file, and the tool will strip out any session-related cookies or JSON Web Tokens (JWT). The tool is built entirely on Cloudflare Workers, and all sanitization is done client-side, which means Cloudflare never sees the full contents of the session token.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50hGJi9BlyGNoT428LPEJL/1119b4ec03eebf7de4eefa7a5561638c/image1-8.png" />
            
            </figure>
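<p>The heart of any such sanitizer is a walk over the archive’s entries that blanks out session-bearing values. A rough illustration of the idea in Python (this is not the tool’s actual code, and the cookie and header names treated as sensitive here are examples only):</p>

```python
import json

# Example names only -- a real sanitizer would use a much longer list.
SENSITIVE_COOKIES = {"session_id", "CF_Authorization"}
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}

def sanitize_har(har: dict) -> dict:
    """Redact session-related cookie and header values in place."""
    for entry in har["log"]["entries"]:
        for section in (entry.get("request", {}), entry.get("response", {})):
            for cookie in section.get("cookies", []):
                if cookie["name"] in SENSITIVE_COOKIES:
                    cookie["value"] = "REDACTED"
            for header in section.get("headers", []):
                if header["name"].lower() in SENSITIVE_HEADERS:
                    header["value"] = "REDACTED"
    return har

har = {"log": {"entries": [{"request": {
    "cookies": [{"name": "session_id", "value": "secret"}],
    "headers": [{"name": "Cookie", "value": "session_id=secret"}]}}]}}
clean = sanitize_har(har)
print(json.dumps(clean))
```

The same walk works client-side in the browser, which is what keeps the unsanitized token off any third-party server.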
    <div>
      <h3>Just enough sanitization</h3>
      <a href="#just-enough-sanitization">
        
      </a>
    </div>
    <p>By default, the sanitizer will remove all session-related cookies and tokens — but there are some cases where these are essential for troubleshooting. For these scenarios, we are implementing a way to conditionally strip “just enough” data from the HAR file to render it safe, while still giving support teams the information they need.</p><p>The first product we’ve optimized the HAR sanitizer for is <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/">Cloudflare Access</a>. Access relies on a user’s <a href="https://developers.cloudflare.com/cloudflare-one/identity/authorization-cookie/application-token/">JWT</a> — a compact token often used for secure authentication — to verify that a user should have access to the requested resource. This means a JWT plays a crucial role in troubleshooting issues with Cloudflare Access. We have tuned the HAR sanitizer to strip the cryptographic signature out of the Access JWT, rendering it inert, while still providing useful information for internal admins and Cloudflare support to debug issues.</p><p>Because HAR files can include a diverse array of data types, selectively sanitizing them is not a case of ‘one size fits all’. We will continue to expand support for other popular authentication tools to ensure we strip out “just enough” information.</p>
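<p>The signature-stripping trick works because a JWT is three base64url-encoded segments, header.payload.signature; removing the third segment keeps the claims readable for debugging while making the token cryptographically useless. A hypothetical sketch in Python (the token and claims here are invented for illustration):</p>

```python
import base64
import json

def strip_jwt_signature(token: str) -> str:
    """Keep the JWT header and payload (useful for debugging) but drop
    the signature, so the token can no longer authenticate anyone."""
    header, payload, _signature = token.split(".")
    return f"{header}.{payload}."

def b64url(obj: dict) -> str:
    """Base64url-encode a JSON object, without padding, as JWTs do."""
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# A made-up token for illustration.
token = ".".join([b64url({"alg": "RS256", "typ": "JWT"}),
                  b64url({"sub": "user@example.com", "exp": 1700000000}),
                  "fake-signature"])
print(strip_jwt_signature(token))
```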
    <div>
      <h2>What’s next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Over the coming months, we will launch additional security controls in Cloudflare Zero Trust to further mitigate attacks stemming from session tokens stolen from HAR files. This will include:</p><ul><li><p>Enhanced Data Loss Prevention (DLP) file type scanning to include HAR file and session token detections, to ensure users in your organization cannot share unsanitized files.</p></li><li><p>Expanded API CASB scanning to detect HAR files with session tokens in collaboration tools like Zendesk, Jira, Drive and O365.</p></li><li><p>Automated HAR sanitization of data in popular collaboration tools.</p></li></ul><p>As always, we continue to expand our Cloudflare One Zero Trust suite to protect organizations of all sizes against an ever-evolving array of threats. Ready to get started? <a href="https://www.cloudflare.com/products/zero-trust/">Sign up here</a> to begin using Cloudflare One at no cost for teams of up to 50 users.</p>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Open Source]]></category>
            <guid isPermaLink="false">5Le8RmeoVTzjhB1qvPodhM</guid>
            <dc:creator>Kenny Johnson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Project Crossbow: Lessons from Refactoring a Large-Scale Internal Tool]]></title>
            <link>https://blog.cloudflare.com/project-crossbow-lessons-from-refactoring-a-large-scale-internal-tool/</link>
            <pubDate>Tue, 07 Apr 2020 07:00:00 GMT</pubDate>
            <description><![CDATA[ Crossbow is a tool that now allows Cloudflare’s Technical Support Engineers to perform diagnostic activities from running commands (like traceroutes, cURL requests and DNS queries) to debugging product features and performance using bespoke tools. ]]></description>
            <content:encoded><![CDATA[ 
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4DD8CfPAPxEZXvXNS1u6cW/46988ed6b45099cd2827d63a190ee721/Crossbow-tool_2x-1.png" />
          </figure><p>Cloudflare’s <a href="https://www.cloudflare.com/network/">global network</a> currently spans 200 cities in more than 90 countries. Engineers working in product, technical support and operations often need to be able to debug network issues from particular locations or individual servers.</p><p>Crossbow is the internal tool for doing just this, allowing Cloudflare’s Technical Support Engineers to perform diagnostic activities from running commands (like traceroutes, cURL requests and DNS queries) to debugging product features and performance using bespoke tools.</p><p>In September last year, an Engineering Manager at Cloudflare asked to transition Crossbow from a Product Engineering team to the Support Operations team. The tool had been a secondary focus and had been transitioned through multiple engineering teams without any of them developing subject-matter knowledge.</p><p>The Support Operations team at Cloudflare is closely aligned with Cloudflare’s Technical Support Engineers, developing diagnostic tooling and Natural Language Processing technology to drive efficiency. Based on this alignment, it was decided that Support Operations was the best team to own this tool.</p>
    <div>
      <h3>Learning from Sisyphus</h3>
      <a href="#learning-from-sisyphus">
        
      </a>
    </div>
    <p>Whilst seeking advice on the transition process, an SRE Engineering Manager in Cloudflare suggested reading “<a href="https://landing.google.com/sre/resources/practicesandprocesses/case-study-community-driven-software-adoption/">A Case Study in Community-Driven Software Adoption</a>”. This book proved a truly invaluable read for anyone thinking of doing internal tool development or contributing to such tooling. The book describes why multiple tools are often created for the same purpose by different autonomous teams, and how this issue can be overcome. The book also describes challenges and approaches to gaining adoption of tooling, especially where this requires some behaviour change for the engineers who use such tools.</p><p>That said, there are some things we learnt along the way while taking over Crossbow and refactoring and revamping a large-scale internal tool. This blog post seeks to be an addendum to such guidance and provide some further practical advice.</p><p>In this blog post we won’t dwell too much on the work of the Cloudflare Support Operations team, but this can be found in the SRECon talk: “<a href="https://www.usenix.org/conference/srecon19emea/presentation/ali">Support Operations Engineering: Scaling Developer Products to the Millions</a>”. The software development methodology used in Cloudflare’s Support Operations Group closely resembles <a href="http://www.extremeprogramming.org/">Extreme Programming</a>.</p>
    <div>
      <h3>Cutting The Fat</h3>
      <a href="#cutting-the-fat">
        
      </a>
    </div>
    <p>There were two ways of using Crossbow: a CLI (command line interface) and a UI embedded in the internal tool used by Cloudflare’s Technical Support Engineers. Maintaining both interfaces clearly had significant overhead for improvement efforts, and we took the decision to deprecate one of them. This allowed us to focus our efforts on one platform to achieve large-scale improvements across technology, usability and functionality.</p><p>We set up a poll to allow engineering, operations, solutions engineering and technical support teams to provide their feedback on how they used the tooling. Polling was not only critical for gaining vital information on how different teams used the tool, but also ensured that, prior to deprecation, people knew their views were taken on board. We polled not only on the option people preferred, but also on which options they felt were necessary to them and the reasons why.</p><p>We found that the reasons for favouring the web UI primarily revolved around the absence of documentation and training. By contrast, we discovered those who used the CLI found it far more critical to their workflow. Product Engineering teams do not routinely have access to the support UI but some found it necessary to use Crossbow for their jobs, and users wanted to be able to automate commands with shell scripts.</p><p>Technically, the UI was in JavaScript with an <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api-gateway/">API Gateway</a> service that converted HTTP requests to gRPC, alongside some configuration to allow it to work in the support UI. The CLI directly interfaced with the gRPC API, so it was a simpler system. Given the Cloudflare Support Operations team primarily works on Systems Engineering projects and had limited UI resources, the decision to deprecate the UI was also in our own interest.</p><p>We rolled out a new internal Crossbow user group, trained up teams, created new documentation, provided advance notification of deprecation, and finally retired the source code of these services. We also dramatically improved the CLI user experience through simple improvements to the help information and easier CLI usage.</p>
    <div>
      <h3>Rearchitecting Pub/Sub with Cloudflare Access</h3>
      <a href="#rearchitecting-pub-sub-with-cloudflare-access">
        
      </a>
    </div>
    <p>One of the primary challenges we encountered was how the system architecture for Crossbow was designed many years ago. A gRPC API ran commands at Cloudflare’s edge network using a configuration management tool which the SRE team expressed a desire to deprecate (with Crossbow being the last user of it).</p><p>During a visit to the Singapore office, the local Edge SRE Engineering Manager wanted his team to understand Crossbow and how to contribute to it. During this meeting, we provided an overview of the current architecture, and the team there were forthcoming in providing potential refactoring ideas to handle global network stability and move away from the old pipeline. This provided invaluable insight into the common issues with the existing technical approach and instances where the tool would fail, requiring Technical Support Engineers to consult the SRE team.</p><p>We decided to adopt a simpler pub/sub pipeline: instead, the edge network would expose a gRPC daemon that would listen for new jobs, execute them, and then make a callback to the API service with the results (which would be relayed on to the client).</p><p>For authentication between the API service and the client, or the API service and the network edge, we implemented a <a href="https://developers.cloudflare.com/access/setting-up-access/json-web-token/">JWT authentication</a> scheme. For a CLI user, authentication was done by querying an HTTP endpoint behind Cloudflare Access <a href="https://developers.cloudflare.com/access/cli/connecting-from-cli/">using cloudflared</a>, which provided a JWT the client could use for <a href="https://grpc.io/docs/guides/auth/">authentication with gRPC</a>. 
In practice, this looks something like this:</p><ol><li><p>CLI makes request to authentication server using cloudflared</p></li><li><p>Authentication server responds with signed JWT token</p></li><li><p>CLI makes gRPC request with JWT authentication token to API service</p></li><li><p>API service validates token using a public key</p></li></ol><p>The gRPC API endpoint was placed on <a href="https://www.cloudflare.com/products/cloudflare-spectrum/">Cloudflare Spectrum</a>; as users were authenticated using Cloudflare Access, we could remove the requirement for users to be on the company VPN to use the tool. The new authentication pipeline, combined with a single user interface, also allowed us to improve the collection of metrics and usage logs of the tool.</p>
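<p>The four steps above can be sketched end to end. This is an illustration only: HMAC with a shared demo key stands in for the real scheme (which, as described, validates tokens with a public key), and the claim names are made up:</p>

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-signing-key"  # stand-in for the auth server's private key

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(user: str) -> str:
    """Step 2: the authentication server responds with a signed JWT."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user, "exp": int(time.time()) + 3600}).encode())
    sig = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def validate_token(token: str) -> bool:
    """Step 4: the API service checks the signature before running a job."""
    header, payload, sig = token.split(".")
    expected = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = issue_token("tse@example.com")        # steps 1-2: CLI obtains a token
print(validate_token(token))                  # steps 3-4: API accepts it
print(validate_token(token + "x"))            # a tampered signature is rejected
```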
    <div>
      <h3>Risk Management</h3>
      <a href="#risk-management">
        
      </a>
    </div>
    <blockquote><p>Risk is inherent in the activities undertaken by engineering professionals, meaning that members of the profession have a significant role to play in managing and limiting it.
- <a href="https://www.engc.org.uk/standards-guidance/guidance/guidance-on-risk/">Guidance on Risk</a>, Engineering Council</p></blockquote><p>As with all engineering projects, it was critical to manage risk. However, the risk to manage differs between engineering projects. Availability wasn’t the largest factor, given that Technical Support Engineers could escalate issues to the SRE team if the tool wasn’t available. The main risk was the security of the Cloudflare network and ensuring Crossbow did not affect the availability of any other services. To this end we took methodical steps to improve isolation and engaged the InfoSec team early to assist with specification and code reviews of the new pipeline. Where a risk to availability existed, we ensured the risk/reward trade-off was properly communicated to the support team and the internal Crossbow user group.</p>
    <div>
      <h3>Feedback, Build, Refactor, Measure</h3>
      <a href="#feedback-build-refactor-measure">
        
      </a>
    </div>
    <p>The Support Operations team at Cloudflare works using a methodology based on Extreme Programming. A key tenet of Extreme Programming is Test Driven Development, often described as a “red-green-green” pattern or “<a href="https://www.codecademy.com/articles/tdd-red-green-refactor">red-green-refactor</a>”. First the engineer enshrines the requirements in tests, then they make those tests pass, and then they refactor to improve code quality before pushing the software.</p><p>As we took on this project, the Cloudflare Support and SRE teams were working on Project Baton - an effort to allow Technical Support Engineers to handle more customer escalations without handover to the SRE teams.</p><p>As part of this effort, they had already created an invaluable resource in the form of a feature wish list for Crossbow. We associated JIRAs with all these items and prioritised this work to deliver such feature requests using a Test Driven Development workflow and the introduction of Continuous Integration. Critically, we measured such improvements once deployed. Adding simple functionality like support for MTR (a Linux network diagnostic tool) and exposing support for different cURL flags provided improvements in usage.</p><p>We were also able to embed Crossbow support for other tools available at the network edge created by other teams, allowing them to maintain such tools and expose features to Crossbow users. Through the creation of an improved development environment and documentation, we were able to drive Product Engineering teams to contribute functionality that was in the mutual interest of them and the customer support team.</p><p>Finally, we owned a number of tools which Technical Support Engineers used to discover what Cloudflare configuration was applied to a given URL and to perform distributed performance testing; we deprecated these tools and rolled them into Crossbow. Another tool, called Edge Worker Debug and owned by the <a href="https://workers.cloudflare.com/">Cloudflare Workers</a> team, was also rolled into Crossbow, and that team deprecated their standalone tool.</p>
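<p>The red-green-refactor loop is easy to show in miniature. A hypothetical sketch, where <code>parse_colo</code> is an invented helper used only for illustration: the test is written first (red), then the minimal implementation that makes it pass (green), then the code is cleaned up while the test keeps us honest (refactor):</p>

```python
import unittest

# Red: enshrine the requirement in a test first. This fails until
# parse_colo exists and behaves correctly.
class TestParseColo(unittest.TestCase):
    def test_parses_datacenter_number(self):
        self.assertEqual(parse_colo("192.0.2.1 reached colo 107"), 107)

# Green: the minimal implementation that makes the test pass.
# Refactor: tidy it up afterwards, re-running the test each time.
def parse_colo(line: str) -> int:
    """Extract the trailing data center number from a diagnostic line."""
    return int(line.rsplit(" ", 1)[-1])

suite = unittest.TestLoader().loadTestsFromTestCase(TestParseColo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```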
    <div>
      <h3>Results</h3>
      <a href="#results">
        
      </a>
    </div>
    <p>From implementing user analytics on the tool on 16 December 2019 to the week ending 22 January 2020, we found a usage increase of 4.5x. This growth primarily happened within a four-week period; by adding the most wanted functionality, we were able to achieve a critical saturation of usage amongst Technical Support Engineers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/48JaaCMLrwL9EWw9WtDNP5/4958cf9e7a94f0759983f475ca168d4b/image1.png" />
          </figure><p>Beyond this point, it became critical to use the number of checks being run as a metric to evaluate how useful the tool was. For example, the week starting January 27 saw no meaningful increase in unique users (a 14% usage increase over the previous week - within the normal fluctuation of stable usage). However, over the same timeframe, we saw a 2.6x increase in the number of tests being run - coinciding with the introduction of a number of new high-usage features.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ndjgHwGG3rVpuwOczuc6L/8158b2a71dbf87aa7dfeaf4650118ed8/pasted-image-0--6-.png" />
          </figure>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Through removing low-value/high-maintenance functionality and merciless refactoring, we were able to dramatically improve the quality of Crossbow and therefore the velocity of delivery. We drove up usage by adding the means to measure usage, running feature-request feedback loops with users, and practising test-driven development. Consolidation of tooling reduced the overhead of developing support tooling across the business, providing a common framework for developing and exposing functionality for Technical Support Engineers.</p><p>There are two key counterintuitive learnings from this project. The first is that cutting functionality can drive usage, provided this is done intelligently. In our case, the web UI contained no additional functionality that wasn’t in the CLI, yet caused substantial engineering overhead for maintenance. By deprecating this functionality, we were able to reduce technical debt and thereby improve the velocity of delivering more important functionality. This effort requires effective communication of the decision-making process and involvement from those who are impacted by such a decision.</p><p>Secondly, tool development efforts are often guided by user feedback but lack a means of objectively measuring such improvements. When logging is added, it is often done purely for security and audit purposes. Whilst feedback loops with users are invaluable, it is critical to have an objective measure of how successful a feature is and how it is used. 
Effective measurement drives the decision making process of future tooling and therefore, in the long run, the usage data can be more important than the original feature itself.</p><p>If you're interested in debugging interesting technical problems on a network with these tools, we're hiring for <a href="https://www.cloudflare.com/careers/jobs/?department=Customer+Support">Support Engineers</a> (including Security Operations, Technical Support and Support Operations Engineering) in San Francisco, Austin, Champaign, London, Lisbon, Munich and Singapore.</p> ]]></content:encoded>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Cloudflare Access]]></category>
            <category><![CDATA[Support]]></category>
            <category><![CDATA[Spectrum]]></category>
            <guid isPermaLink="false">17EMKPLIbfOeVwXlT4cDK8</guid>
            <dc:creator>Junade Ali</dc:creator>
            <dc:creator>Peter Weaver</dc:creator>
        </item>
        <item>
            <title><![CDATA[When Bloom filters don't bloom]]></title>
            <link>https://blog.cloudflare.com/when-bloom-filters-dont-bloom/</link>
            <pubDate>Mon, 02 Mar 2020 13:00:00 GMT</pubDate>
            <description><![CDATA[ Last month I finally had an opportunity to use Bloom filters. I became fascinated with the promise of this data structure, but I quickly realized it had some drawbacks. This blog post is the tale of my brief love affair with Bloom filters. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bQ9cbvVLCJTUntwCHSGSp/570583980831b19e4da88411fdff5eda/bloom-filter_2x.png" />
            
            </figure><p>I've known about <a href="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</a> (named after Burton Bloom) since university, but I haven't had an opportunity to use them in anger. Last month this changed - I became fascinated with the promise of this data structure, but I quickly realized it had some drawbacks. This blog post is the tale of my brief love affair with Bloom filters.</p><p>While doing research about <a href="/the-root-cause-of-large-ddos-ip-spoofing/">IP spoofing</a>, I needed to examine whether the source IP addresses extracted from packets reaching our servers were legitimate, depending on the geographical location of our data centers. For example, source IPs belonging to a legitimate Italian ISP should not arrive in a Brazilian datacenter. This problem might sound simple, but in the ever-evolving landscape of the internet this is far from easy. Suffice it to say I ended up with many large text files with data like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EyhhPLD0IymvVMr3R8pMz/2c30f18b77fdee9184c438f80e94781b/Screenshot-from-2020-03-01-23-57-10.png" />
            
            </figure><p>This reads as: the IP 192.0.2.1 was recorded reaching Cloudflare data center number 107 with a legitimate request. This data came from many sources, including our active and passive probes, logs of certain domains we own (like cloudflare.com), public sources (like the BGP table), etc. The same line would usually be repeated across multiple files.</p><p>I ended up with a gigantic collection of data of this kind. At some point I counted 1 billion lines across all the harvested sources. I usually write bash scripts to pre-process the inputs, but at this scale this approach wasn't working. For example, removing duplicates from this tiny file of a meager 600MiB and 40M lines took... about an eternity:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7lzqpR8gHJRr6XC4fbcepJ/91185ef0b6036ecd484bd791f5a72651/Screenshot-from-2020-03-01-23-25-19a.png" />
            
            </figure><p>Suffice it to say that deduplicating lines using the usual bash commands like 'sort' in various configurations (see '--parallel', '--buffer-size' and '--unique') was not optimal for such a large data set.</p>
    <div>
      <h2>Bloom filters to the rescue</h2>
      <a href="#bloom-filters-to-the-rescue">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/vR5EB9TdKmVJDjStuvpwL/fe51959f24694cc5c072f3263aa42ab0/Bloom_filter.png" />
            
            </figure><p><a href="https://en.wikipedia.org/wiki/Bloom_filter#/media/File:Bloom_filter.svg">Image</a> by <a href="https://commons.wikimedia.org/wiki/User:David_Eppstein">David Eppstein</a>, Public Domain</p><p>Then I had a brainwave - it's not necessary to sort the lines! I just need to remove duplicated lines - using some kind of "set" data structure should be much faster. Furthermore, I roughly know the cardinality of the input file (the number of unique lines), and I can live with some data points being lost - using a probabilistic data structure is fine!</p><p>Bloom filters are a perfect fit!</p><p>While you should go and read the <a href="https://en.wikipedia.org/wiki/Bloom_filter#Algorithm_description">Wikipedia page on Bloom filters</a>, here is how I look at this data structure.</p><p>How would you implement a "<a href="https://en.wikipedia.org/wiki/Set_(abstract_data_type)">set</a>"? Given a perfect hash function and infinite memory, we could just create an infinite bit array and set the bit number 'hash(item)' for each item we encounter. This would give us a perfect "set" data structure. Right? Trivial. Sadly, hash functions have collisions and infinite memory doesn't exist, so we have to compromise in our reality. But we can calculate and manage the probability of collisions. For example, imagine we have a good hash function and 128GiB of memory. We can calculate that the probability of the second item added to the bit array colliding with the first is 1 in 1,099,511,627,776 (the number of bits in 128GiB). The probability of collision worsens as we add more items and fill up the bit array.</p><p>Furthermore, we could use more than one hash function, and end up with a denser bit array. This is exactly what Bloom filters optimize for. 
A Bloom filter is a bunch of math on top of four variables:</p><ul><li><p>'n' - The number of input elements (cardinality)</p></li><li><p>'m' - Memory used by the bit-array</p></li><li><p>'k' - Number of hash functions computed for each input</p></li><li><p>'p' - Probability of a false positive match</p></li></ul><p>Given the 'n' input cardinality and the 'p' desired false positive probability, the Bloom filter math returns the required 'm' memory and 'k' number of hash functions.</p><p>Check out this excellent visualization by Thomas Hurst showing how the parameters influence each other:</p><ul><li><p><a href="https://hur.st/bloomfilter/">https://hur.st/bloomfilter/</a></p></li></ul>
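<p>The relationship between these four variables is easy to play with in code. The following sketch uses the textbook Bloom filter formulas; it is not code from 'mmuniq-bloom' itself, and the helper name is mine:</p>

```go
package main

import (
	"fmt"
	"math"
)

// bloomParams returns the optimal bit-array size m and number of
// hash functions k for n expected items and a target false-positive
// probability p, using the standard Bloom filter formulas:
//   m = -n*ln(p)/(ln 2)^2,  k = (m/n)*ln 2
func bloomParams(n int, p float64) (m, k int) {
	mf := -float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)
	return int(math.Ceil(mf)), int(math.Round(mf / float64(n) * math.Ln2))
}

func main() {
	// 40M unique lines, one false positive per 10k lines.
	m, k := bloomParams(40000000, 0.0001)
	fmt.Printf("m = %d bits (about %.0f MiB), k = %d\n",
		m, float64(m)/8/(1<<20), k)
}
```

<p>For the 40M-line scenario this lands at roughly 91 MiB and k=13; rounding the bit array up to the next power of two gives the 128MiB filter that appears later in the post.</p>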
    <div>
      <h2>mmuniq-bloom</h2>
      <a href="#mmuniq-bloom">
        
      </a>
    </div>
    <p>Guided by this intuition, I set out on a journey to add a new tool to my toolbox - 'mmuniq-bloom', a probabilistic tool that, given input on STDIN, returns only unique lines on STDOUT, hopefully much faster than the 'sort' + 'uniq' combo!</p><p>Here it is:</p><ul><li><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2020-02-mmuniq/mmuniq-bloom.c">'mmuniq-bloom.c'</a></p></li></ul><p>For simplicity and speed I designed 'mmuniq-bloom' with a couple of assumptions. First, unless otherwise instructed, it uses 8 hash functions (k=8). This seems to be a close-to-optimal number for the data sizes I'm working with, and the hash function can quickly output 8 decent hashes. Then we align 'm', the number of bits in the bit array, to be a power of two. This is to avoid the pricey % modulo operation, which compiles down to slow assembly 'div'. With power-of-two sizes we can just do a bitwise AND. (For a fun read, see <a href="https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi">how compilers can optimize some divisions by using multiplication by a magic constant</a>.)</p><p>We can now run it against the same data file we used before:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7eWA2sCxdTwbHW0JVE4dLm/ffffd6f6823305f41a15d0091d1acaef/image11.png" />
            
</figure><p>Oh, this is so much better! 12 seconds is much more manageable than the 2 minutes before. But hold on... The program is using an optimized data structure, a relatively limited memory footprint, optimized line-parsing and good output buffering... 12 seconds is still an eternity compared to the 'wc -l' tool:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10yS38GPKuBTqMMd4ZJVSe/199b9ceea61eb91700012362ef92e0ab/image5.png" />
            
</figure><p>What is going on? I understand that counting lines with 'wc' is <i>easier</i> than figuring out unique lines, but is it really worth the 26x difference? Where does all the CPU in 'mmuniq-bloom' go?</p><p>It must be my hash function. 'wc' doesn't need to spend all this CPU performing all this strange math for each of the 40M input lines. I'm using a pretty non-trivial 'siphash24' hash function, so it surely burns the CPU, right? Let's check by running the code that computes the hash function but does <i>not</i> do any Bloom filter operations:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4cmgeZbT0nJNh9bwRC4gEH/b5a71c8c573d506f20f6b6bc01173028/image2.png" />
            
</figure><p>This is strange. Computing the hash function indeed costs about 2s, but the program took 12s in the previous run. The Bloom filter alone takes 10 seconds? How is that possible? It's such a simple data structure...</p>
    <div>
      <h2>A secret weapon - a profiler</h2>
      <a href="#a-secret-weapon-a-profiler">
        
      </a>
    </div>
    <p>It was time to use a proper tool for the task - let's fire up a profiler and see where the CPU goes. First, let's run 'strace' to confirm we are not running any unexpected syscalls:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/29N25pGnOTso9WBxn4SckO/4a8bbb0dccf4dfb5fd2a248625d29bd9/image14.png" />
            
</figure><p>Everything looks good. The 10 calls to 'mmap', each taking 4ms (3971 us), are intriguing, but they're fine. We pre-populate memory up front with 'MAP_POPULATE' to save on page faults later.</p><p>What is the next step? Of course Linux's 'perf'!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/632ygHAtmgUaNH18WsUPAG/b9f3990786d96bf84b2bc28d5a55f0e3/image10.png" />
            
            </figure><p>Then we can see the results:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61GXvk0ujwFsP0vEDyrD6c/ab4ec96b0e748f5dd8e6e496d1b499b4/image6.png" />
            
</figure><p>Right, so we indeed burn 87.2% of cycles in our hot code. Let's see exactly where. Running 'perf annotate process_line --source' quickly shows something I didn't expect.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3i10E5RjlO3vvJTuA751wv/da7db1a7614e3936218206689767e752/image3.png" />
            
</figure><p>You can see 26.90% of the CPU burned in the 'mov', but that's not all of it! The compiler correctly inlined the function and unrolled the loop 8-fold. Summed up, that 'mov' (the 'uint64_t v = *p' line) accounts for a great majority of the cycles!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1sTjVPwEz6BILccJz8pdDa/2e3435805a1e092460dcb916f4d6ecb2/image4.png" />
            
</figure><p>Clearly 'perf' must be mistaken. How can such a simple line cost so much? We can repeat the benchmark with any other profiler and it will show us the same problem. For example, I like using 'google-perftools' with kcachegrind since they emit eye-candy charts:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fWWh8EZelfOMfwA5XMT16/4ded280402c936f2a6f67e2930115f5e/Screenshot-from-2020-03-02-00-08-23.png" />
            
            </figure><p>The rendered result looks like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2p5hkIX3zVAO7IFFWEVZuF/14f1f9505b91e15c904dab7309a11907/image13.png" />
            
</figure><p>Allow me to summarise what we found so far.</p><p>The generic 'wc' tool takes 0.45s of CPU time to process a 600MiB file. Our optimized 'mmuniq-bloom' tool takes 12 seconds, and the CPU is burned on one 'mov' instruction dereferencing memory...</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1eRwMPBq6BEAyWB0g10EDo/3802664ab1b47d94b96845f487338a08/6784957048_4661ea7dfc_c.jpg" />
            
            </figure><p><a href="https://flickr.com/photos/jonicdao/6784957048">Image</a> by <a href="https://flickr.com/photos/jonicdao/">Jose Nicdao</a> CC BY/2.0</p><p>Oh! I how could I have forgotten. Random memory access <i>is</i> slow! It's very, very, very slow!</p><p>According to the general rule <a href="http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html">"latency numbers every programmer should know about"</a>, one RAM fetch is about 100ns. Let's do the math: 40 million lines, 8 hashes counted for each line. Since our Bloom filter is 128MiB, on <a href="/gen-x-performance-tuning/">our older hardware</a> it doesn't fit into L3 cache! The hashes are uniformly distributed across the large memory range - each hash generates a memory miss. Adding it together that's...</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/ORPcRAqG2H2xqeEEdGbmh/ef6968c82bfe0aa44706a4f36e59bb1c/Screenshot-from-2020-03-02-00-34-29.png" />
            
            </figure><p>That suggests 32 seconds burned just on memory fetches. The real program is faster, taking only 12s. This is because, although the Bloom filter data does not completely fit into L3 cache, it still gets some benefit from caching. It's easy to see with 'perf stat -d':</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7pgVS7zKpVhueO7rAhQDRI/5145db41e7b06f5875d1da4adfa92142/image9.png" />
            
</figure><p>Right, so we should have had at least 320M LLC-load-misses, but we had only 280M. This still doesn't explain why the program was running only 12 seconds. But it doesn't really matter. What matters is that the number of cache misses is a real problem and we can only fix it by reducing the number of memory accesses. Let's try tuning the Bloom filter to use only one hash function:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30BFuINOqYvVgCCdYXCzUB/0893742e7bc41b923af6e67f53629049/image12.png" />
            
</figure><p>Ouch! That really hurt! The Bloom filter required 64 GiB of memory to get our desired false positive probability of one error per 10k lines. This is terrible!</p><p>Also, it doesn't seem like we improved much. It took the OS 22 seconds to prepare memory for us, but we still burned 11 seconds in userspace. I guess this time any benefits from hitting memory less often were offset by a lower cache-hit probability due to the drastically increased memory size. In previous runs we required only 128MiB for the Bloom filter!</p>
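<p>The 64 GiB figure checks out with back-of-the-envelope math: with one hash function the false positive probability after n inserts is roughly n/m, so the filter needs about n/p bits, which the tool then rounds up to a power of two so it can replace modulo with a bitwise AND. A sketch (helper names are mine):</p>

```go
package main

import (
	"fmt"
	"math"
)

// singleHashBits estimates the bit-array size a k=1 Bloom filter
// needs: the false-positive rate is roughly n/m, so m ~= n/p bits.
func singleHashBits(n int, p float64) float64 {
	return float64(n) / p
}

// nextPow2 rounds a bit count up to a power of two, mirroring the
// power-of-two sizing that lets 'h & (m-1)' stand in for 'h % m'.
func nextPow2(bits float64) uint64 {
	return uint64(1) << uint(math.Ceil(math.Log2(bits)))
}

func main() {
	raw := singleHashBits(40000000, 0.0001) // 4e11 bits
	m := nextPow2(raw)                      // 2^39 bits
	fmt.Printf("%.1f GiB raw, %d GiB rounded\n", raw/8/(1<<30), m/8/(1<<30))
	// prints: 46.6 GiB raw, 64 GiB rounded
}
```

<p>Four hundred billion bits is about 46.6 GiB, and the next power of two is 2^39 bits - exactly the 64 GiB the benchmark showed.</p>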
    <div>
      <h2>Dumping Bloom filters altogether</h2>
      <a href="#dumping-bloom-filters-altogether">
        
      </a>
    </div>
    <p>This is getting ridiculous. To get the same false positive guarantees we must either use many hash functions in the Bloom filter (like 8), and therefore many memory operations, or use 1 hash function with enormous memory requirements.</p><p>We aren't really constrained by available memory; instead we want to optimize for fewer memory accesses. All we need is a data structure that requires at most 1 memory miss per item and uses less than 64 GiB of RAM...</p><p>While we could think of more sophisticated data structures like a <a href="https://en.wikipedia.org/wiki/Cuckoo_filter">Cuckoo filter</a>, maybe we can be simpler. How about a good old simple hash table with linear probing?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6PVz5hd2DqyiraxgMJ7KIp/37706f6ffcc1cd52f544d626381deaeb/linear-probing.png" />
            
            </figure><p><a href="https://www.sysadmins.lv/blog-en/array-search-hash-tables-behind-the-scenes.aspx">Image</a> by <a href="https://www.sysadmins.lv/about.aspx">Vadims Podāns</a></p>
    <div>
      <h2>Welcome mmuniq-hash</h2>
      <a href="#welcome-mmuniq-hash">
        
      </a>
    </div>
    <p>Here you can find a tweaked version of mmuniq-bloom, but using a hash table:</p><ul><li><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2020-02-mmuniq/mmuniq-hash.c">'mmuniq-hash.c'</a></p></li></ul><p>Instead of storing bits as for the Bloom filter, we are now storing 64-bit hashes from the <a href="https://idea.popcount.org/2013-01-24-siphash/">'siphash24' function</a>. This gives us much stronger probability guarantees, with a probability of false positives much better than one error in 10k lines.</p><p>Let's do the math. Adding a new item to a hash table containing, say, 40M entries has a '40M/2^64' chance of hitting a hash collision. This is about one in 461 billion - a reasonably low probability. But we are not adding one item to a pre-filled set! Instead we are adding 40M lines to the initially empty set. As per the <a href="https://en.wikipedia.org/wiki/Birthday_problem">birthday paradox</a>, this has much higher chances of hitting a collision at some point. A decent approximation is 'n^2/2m', which in our case is '(40M^2)/(2*(2^64))'. This is a chance of one in 23000. In other words, assuming we are using a good hash function, one in every 23 thousand random sets of 40M items will have a hash collision. This chance of hitting a collision is non-negligible, but it's still better than a Bloom filter and totally acceptable for my use case.</p><p>The hash table code runs faster, has better memory access patterns and better false positive probability than the Bloom filter approach.</p>
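<p>The one-in-23-thousand figure follows directly from that approximation. A quick check, assuming ideally uniform 64-bit hashes (the helper is hypothetical):</p>

```go
package main

import "fmt"

// birthdayBound approximates the probability that any two of n
// uniformly random 64-bit hashes collide, using the n^2/(2m)
// birthday approximation with m = 2^64 possible hash values.
func birthdayBound(n float64) float64 {
	const m = 1 << 64
	return n * n / (2 * m)
}

func main() {
	p := birthdayBound(40000000)
	fmt.Printf("p = %.2e, about 1 in %.0f\n", p, 1/p)
	// prints: p = 4.34e-05, about 1 in 23058
}
```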
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7i3tnYhlVqJ7NEfutgywVD/dafe27be80727cd61533f548457bc4a3/image7.png" />
            
</figure><p>Don't be scared by the "hash conflicts" line; it just indicates how full the hash table was. We are using linear probing, so when a bucket is already used, we just pick the next empty bucket. In our case we had to skip over 0.7 buckets on average to find an empty slot in the table. This is fine and, since we iterate over the buckets in linear order, we can expect the memory to be nicely prefetched.</p><p>From the previous exercise we know our hash function takes about 2 seconds of this time. Therefore, it's fair to say the 40M memory hits take around 4 seconds.</p>
    <div>
      <h2>Lessons learned</h2>
      <a href="#lessons-learned">
        
      </a>
    </div>
    <p>Modern CPUs are really good at sequential memory access when it's possible to predict memory fetch patterns (see <a href="https://en.wikipedia.org/wiki/Cache_prefetching#Methods_of_hardware_prefetching">Cache prefetching</a>). Random memory access, on the other hand, is very costly.</p><p>Advanced data structures are very interesting, but beware. Modern computers require cache-optimized algorithms. When working with large datasets that don't fit in L3, prefer optimizing for a reduced number of loads over optimizing the amount of memory used.</p><p>I guess it's fair to say that Bloom filters are great, as long as they fit into the L3 cache. The moment this assumption is broken, they are terrible. This is not news: Bloom filters optimize for memory usage, not for memory access. For example, see <a href="https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf">the Cuckoo Filters paper</a>.</p><p>Another thing is the everlasting discussion about hash functions. Frankly - in most cases it doesn't matter. The cost of computing even complex hash functions like 'siphash24' is small compared to the cost of random memory access. In our case simplifying the hash function will bring only small benefits. The CPU time is simply spent somewhere else - waiting for memory!</p><p>One colleague often says: "You can assume modern CPUs are infinitely fast. They run at infinite speed until they <a href="http://www.di-srv.unisa.it/~vitsca/SC-2011/DesignPrinciplesMulticoreProcessors/Wulf1995.pdf">hit the memory wall</a>".</p><p>Finally, don't follow my mistakes - everyone should start profiling with 'perf stat -d' and look at the "Instructions per cycle" (IPC) counter. If it's below 1, it generally means the program is stuck waiting for memory. Values above 2 would be great; it would mean the workload is mostly CPU-bound. Sadly, I have yet to see high values in the workloads I'm dealing with...</p>
    <div>
      <h2>Improved mmuniq</h2>
      <a href="#improved-mmuniq">
        
      </a>
    </div>
    <p>With the help of my colleagues I've prepared a further improved version of the hash-table-based 'mmuniq' tool. See the code:</p><ul><li><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2020-02-mmuniq/mmuniq.c">'mmuniq.c'</a></p></li></ul><p>It can dynamically resize the hash table to support inputs of unknown cardinality. Then, by using batching, it can effectively use the 'prefetch' CPU hint, speeding up the program by 35-40%. Beware, sprinkling the code with 'prefetch' rarely works. Instead, I specifically changed the flow of the algorithm to take advantage of this instruction. With all the improvements I got the run time down to 2.1 seconds:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6cnwoob2q1SVG27I7zqals/87621808c31bd4707e08e816a4a58480/Screenshot-from-2020-03-01-23-52-18.png" />
            
            </figure>
    <div>
      <h2>The end</h2>
      <a href="#the-end">
        
      </a>
    </div>
    <p>Writing this basic tool, which tries to be faster than the 'sort | uniq' combo, revealed some hidden gems of modern computing. With a bit of work we were able to speed it up from more than two minutes to 2 seconds. During this journey we learned about random memory access latency and the power of cache-friendly data structures. Fancy data structures are exciting, but in practice reducing random memory loads often brings better results.</p> ]]></content:encoded>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Hardware]]></category>
            <category><![CDATA[Optimization]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Tools]]></category>
            <guid isPermaLink="false">3CPWTXjZJXbtWVNIawBWsd</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[Three little tools: mmsum, mmwatch, mmhistogram]]></title>
            <link>https://blog.cloudflare.com/three-little-tools-mmsum-mmwatch-mmhistogram/</link>
            <pubDate>Tue, 04 Jul 2017 10:32:20 GMT</pubDate>
            <description><![CDATA[ In a recent blog post, my colleague Marek talked about some SSDP-based DDoS activity we'd been seeing recently. In that blog post he used a tool called mmhistogram to output an ASCII histogram. ]]></description>
            <content:encoded><![CDATA[ <p>In a recent blog post, my colleague <a href="/author/marek-majkowski/">Marek</a> talked about some <a href="/ssdp-100gbps/">SSDP-based DDoS</a> activity we'd been seeing recently. In that blog post he used a tool called <code>mmhistogram</code> to output an ASCII histogram.</p><p>That tool is part of a small suite of command-line tools that can be handy when messing with data. Since a reader asked for them to be open sourced... here they are.</p>
    <div>
      <h3>mmhistogram</h3>
      <a href="#mmhistogram">
        
      </a>
    </div>
    <p>Suppose you have the following CSV of the ages of major Star Wars characters at the time of Episode IV:</p>
            <pre><code>Anakin Skywalker (Darth Vader),42
Boba Fett,32
C-3PO,32
Chewbacca,200
Count Dooku,102
Darth Maul,54
Han Solo,29
Jabba the Hutt,600
Jango Fett,66
Jar Jar Binks,52
Lando Calrissian,31
Leia Organa (Princess Leia),19
Luke Skywalker,19
Mace Windu,72
Obi-Wan Kenobi,57
Palpatine,82
Qui-Gon Jinn,92
R2-D2,32
Shmi Skywalker,72
Wedge Antilles,21
Yoda,896</code></pre>
            <p>You can get an ASCII histogram of the ages as follows using the <code>mmhistogram</code> tool.</p>
            <pre><code>$ cut -d, -f2 epiv | mmhistogram -t "Age"
Age min:19.00 avg:123.90 med=54.00 max:896.00 dev:211.28 count:21
Age:
 value |-------------------------------------------------- count
     0 |                                                   0
     1 |                                                   0
     2 |                                                   0
     4 |                                                   0
     8 |                                                   0
    16 |************************************************** 8
    32 |                         ************************* 4
    64 |             ************************************* 6
   128 |                                            ****** 1
   256 |                                                   0
   512 |                                      ************ 2</code></pre>
            <p>Handy for getting a quick sense of the data. (These charts are inspired by the <a href="/revenge-listening-sockets/">ASCII output from systemtap</a>).</p>
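<p>The buckets in the chart are powers of two: reading the output above, a value appears to land in bucket 'b' when b &lt; value &le; 2b. That rule is my inference from the chart, not mmhistogram's actual source; a sketch:</p>

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucket returns the power-of-two bucket label for v, assuming
// bucket b covers the range b < v <= 2b (my reading of the chart).
func bucket(v uint64) uint64 {
	if v <= 1 {
		return v
	}
	return uint64(1) << uint(bits.Len64(v-1)-1)
}

func main() {
	// The Star Wars ages from the CSV above.
	ages := []uint64{42, 32, 32, 200, 102, 54, 29, 600, 66, 52, 31,
		19, 19, 72, 57, 82, 92, 32, 72, 21, 896}
	counts := map[uint64]int{}
	for _, a := range ages {
		counts[bucket(a)]++
	}
	for _, b := range []uint64{16, 32, 64, 128, 256, 512} {
		fmt.Printf("%3d: %d\n", b, counts[b])
	}
	// Reproduces the chart's counts: 8, 4, 6, 1, 0 and 2.
}
```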
    <div>
      <h3>mmwatch</h3>
      <a href="#mmwatch">
        
      </a>
    </div>
    <p>The <code>mmwatch</code> tool is handy if you want to look at output from a command-line tool that provides a snapshot of values, but you need rates instead.</p><p>For example, here's <code>df -H</code> on my machine:</p>
            <pre><code>$ df -H
Filesystem             Size   Used  Avail Capacity  iused   ifree %iused  Mounted on
/dev/disk1             250G   222G    28G    89% 54231161 6750085   89%   /
devfs                  384k   384k     0B   100%     1298       0  100%   /dev
map -hosts             0B     0B     0B   100%        0       0  100%   /net
map auto_home          0B     0B     0B   100%        0       0  100%   /home
/dev/disk4             7.3G    50M   7.2G     1%    12105 1761461    1%   /Volumes/LANGDON</code></pre>
            <p>Now imagine you were interested in understanding the rate of change in iused and ifree. You can with <code>mmwatch</code>. It's just like <code>watch</code> but looks for changing numbers and interprets them as rates:</p>
            <pre><code>$ mmwatch 'df -H'</code></pre>
            <p>Here's a short GIF showing it working:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7n0JNaLVIvtD7cqC9kfasI/c1626e6be3fb8bf8b5086ddf7ecde187/mmwatch.gif" />
            
            </figure>
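<p>The core idea can be sketched in a few lines: take two snapshots of a command's output and, wherever a numeric token changed, report the difference divided by the sampling interval. This is a toy illustration, not mmwatch's actual implementation:</p>

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var numRe = regexp.MustCompile(`\d+`)

// rates pairs up the numeric tokens of two snapshots of the same
// command's output and converts each number's change into a
// per-second rate over the sampling interval.
func rates(prev, cur string, intervalSec float64) []float64 {
	a := numRe.FindAllString(prev, -1)
	b := numRe.FindAllString(cur, -1)
	out := make([]float64, 0, len(b))
	for i := 0; i < len(a) && i < len(b); i++ {
		x, _ := strconv.ParseFloat(a[i], 64)
		y, _ := strconv.ParseFloat(b[i], 64)
		out = append(out, (y-x)/intervalSec)
	}
	return out
}

func main() {
	// Two hypothetical df snapshots, two seconds apart.
	prev := "iused 54231161 ifree 6750085"
	cur := "iused 54231561 ifree 6749685"
	fmt.Println(rates(prev, cur, 2)) // prints: [200 -200]
}
```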
    <div>
      <h3>mmsum</h3>
      <a href="#mmsum">
        
      </a>
    </div>
    <p>And the final tool is <code>mmsum</code>, which simply sums a list of floating-point numbers (one per line).</p><p>Suppose you are downloading real-time rainfall data from the UK's Environment Agency and would like to know the total current rainfall. <code>mmsum</code> can help:</p>
            <pre><code>$ curl -s 'https://environment.data.gov.uk/flood-monitoring/id/measures?parameter=rainfall' | jq -e '.items[].latestReading.value+0' | ./mmsum
40.2</code></pre>
            <p>All these tools can be found on the Cloudflare <a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2017-06-29-ssdp/">GitHub</a>.</p> ]]></content:encoded>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[ASCII]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">4y5kxTJk106JdQKEiQDj95</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building the simplest Go static analysis tool]]></title>
            <link>https://blog.cloudflare.com/building-the-simplest-go-static-analysis-tool/</link>
            <pubDate>Wed, 27 Apr 2016 15:01:15 GMT</pubDate>
            <description><![CDATA[ Go native vendoring (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a vendor folder in your project. The compiler will then look there before searching the GOPATH. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit">Go native vendoring</a> (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a <code>vendor</code> folder in your project. The compiler will then look there before searching the GOPATH.</p><p>The only annoyance compared to using a per-project GOPATH, which is what we used to do, is that you might forget to vendor a package that you have in your GOPATH. The program will build for you, but it won't for anyone else. Back to the <a href="https://www.urbandictionary.com/define.php?term=wfm">WFM</a> times!</p><p>I decided I wanted something, a tool, to check that all my (non-stdlib) dependencies were vendored.</p><p>At first I thought of using <a href="https://golang.org/cmd/go/#hdr-List_packages"><code>go list</code></a>, which Dave Cheney appropriately called a <a href="http://dave.cheney.net/2014/09/14/go-list-your-swiss-army-knife">swiss army knife</a>, but while it can show the entire recursive dependency tree (format <code>.Deps</code>), there's no way to know from the templating engine if a dependency is in the standard library.</p><p>We could just pass each output back into <code>go list</code> to check for <code>.Standard</code>, but I thought this would be a good occasion to build a very simple static analysis tool. Go's simplicity and libraries make it a very easy task, as you will see.</p>
    <div>
      <h3>First, loading the program</h3>
      <a href="#first-loading-the-program">
        
      </a>
    </div>
    <p>We use <a href="https://godoc.org/golang.org/x/tools/go/loader"><code>golang.org/x/tools/go/loader</code></a> to load the packages passed as arguments on the command line, including the test files based on a flag.</p>
            <pre><code>var conf loader.Config
for _, p := range flag.Args() {
    if *tests {
        conf.ImportWithTests(p)
    } else {
        conf.Import(p)
    }
}
prog, err := conf.Load()
if err != nil {
    log.Fatal(err)
}
for p := range prog.AllPackages {
    fmt.Println(p.Path())
}</code></pre>
            <p>With these few lines we already replicated <code>go list -f {{ .Deps }}</code>!</p><p>The only missing loading feature here is wildcard (<code>./...</code>) support. That code <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/main.go#L365">is in the go tool source</a> and it's unexported. There's an <a href="https://github.com/golang/go/issues/8768">issue</a> about exposing it, but for now packages <a href="https://github.com/golang/lint/blob/58f662d2fc0598c6c36a92ae29af1caa6ec89d7a/golint/import.go">are just copy-pasting it</a>. We'll use a packaged version of that code, <a href="https://github.com/kisielk/gotool"><code>github.com/kisielk/gotool</code></a>:</p>
            <pre><code>for _, p := range gotool.ImportPaths(flag.Args()) {</code></pre>
            <p>Finally, since we are only interested in the dependency tree today, we instruct the parser to go only as far as the import statements, and we ignore the resulting "not used" errors:</p>
            <pre><code>conf.ParserMode = parser.ImportsOnly
conf.AllowErrors = true
conf.TypeChecker.Error = func(error) {}</code></pre>
            
    <div>
      <h3>Then, the actual logic</h3>
      <a href="#then-the-actual-logic">
        
      </a>
    </div>
    <p>We now have a <code>loader.Program</code> object, which holds references to various <code>loader.PackageInfo</code> objects, which in turn are a combination of package, AST and type information: everything you need to perform any kind of complex analysis. Not that we are going to do that today :)</p><p>We'll just replicate <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/pkg.go#L183-L194">the <code>go list</code> logic to recognize stdlib packages</a> and remove the packages passed on the command line from the list:</p>
            <pre><code>initial := make(map[*loader.PackageInfo]bool)
for _, pi := range prog.InitialPackages() {
    initial[pi] = true
}

var packages []*loader.PackageInfo
for _, pi := range prog.AllPackages {
    if initial[pi] {
        continue
    }
    if len(pi.Files) == 0 {
        continue // virtual stdlib package
    }
    filename := prog.Fset.File(pi.Files[0].Pos()).Name()
    if !strings.HasPrefix(filename, build.Default.GOROOT) ||
        !isStandardImportPath(pi.Pkg.Path()) {
        packages = append(packages, pi)
    }
}</code></pre>
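<p>The snippet above leans on <code>isStandardImportPath</code>, borrowed from the go tool source linked earlier. The heuristic is that a standard library import path has no dot in its first element (no domain name, so it is not go-gettable):</p>

```go
package main

import (
	"fmt"
	"strings"
)

// isStandardImportPath reports whether the import path belongs to
// the standard library: its first path element contains no dot, so
// it cannot be a remote, domain-qualified import path.
func isStandardImportPath(path string) bool {
	i := strings.Index(path, "/")
	if i < 0 {
		i = len(path)
	}
	return !strings.Contains(path[:i], ".")
}

func main() {
	fmt.Println(isStandardImportPath("net/http"))             // prints: true
	fmt.Println(isStandardImportPath("github.com/miekg/dns")) // prints: false
}
```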
            <p>Then we just have to print a warning if any remaining package is not in a <code>/vendor/</code> folder:</p>
            <pre><code>for _, pi := range packages {
    if strings.Index(pi.Pkg.Path(), "/vendor/") == -1 {
        fmt.Println("[!] dependency not vendored:", pi.Pkg.Path())
    }
}</code></pre>
            <p>Done! You can find the tool here: <a href="https://github.com/FiloSottile/vendorcheck">https://github.com/FiloSottile/vendorcheck</a></p>
    <div>
      <h3>Further reading</h3>
      <a href="#further-reading">
        
      </a>
    </div>
    <p><a href="https://github.com/golang/example/tree/master/gotypes#gotypes-the-go-type-checker">This document</a> maintained by Alan Donovan will tell you more than I'll ever know about the static analysis tooling.</p><p>Note that you might be tempted to use <code>go/importer</code> and <code>types.Importer[From]</code> instead of <code>x/go/loader</code>. Don't do that. That doesn't load the source but reads compiled <code>.a</code> files, which <b>can be stale or missing</b>. Static analysis tools that spit out "package not found" for existing packages or, worse, incorrect results because of this are a pet peeve of mine.</p><p><i>If you now feel the urge to write static analysis tools, know that the CloudFlare Go team </i><a href="https://www.cloudflare.com/join-our-team/"><i>is hiring in London, San Francisco and Singapore</i></a><i>!</i></p> ]]></content:encoded>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">7f5NBXh02bwJ9WyQmBdtZK</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[DNS parser, meet Go fuzzer]]></title>
            <link>https://blog.cloudflare.com/dns-parser-meet-go-fuzzer/</link>
            <pubDate>Thu, 06 Aug 2015 13:40:40 GMT</pubDate>
            <description><![CDATA[ Here at CloudFlare we are heavy users of the github.com/miekg/dns Go DNS library and we make sure to contribute to its development as much as possible. Therefore when Dmitry Vyukov published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear. ]]></description>
            <content:encoded><![CDATA[ <p>Here at CloudFlare we are heavy users of the <a href="https://github.com/miekg/dns"><code>github.com/miekg/dns</code></a> Go DNS library and we make sure to contribute to its development as much as possible. Therefore when <a href="https://github.com/dvyukov">Dmitry Vyukov</a> published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear.</p>
    <div>
      <h3>Hot Fuzz</h3>
      <a href="#hot-fuzz">
        
      </a>
    </div>
    <p>Fuzzing is the technique of <i>testing software by continuously feeding it inputs that are automatically mutated</i>. For C/C++, the wildly successful <a href="http://lcamtuf.coredump.cx/afl/">afl-fuzz</a> tool by Michał Zalewski uses instrumented source coverage to judge which mutations pushed the program into new paths, <i>eventually hitting many rarely-tested branches</i>.</p><p><a href="https://github.com/dvyukov/go-fuzz"><i>go-fuzz</i></a><i> applies the same technique to Go programs</i>, instrumenting the source by rewriting it (<a href="/go-has-a-debugger-and-its-awesome/">like godebug does</a>). An interesting difference between afl-fuzz and go-fuzz is that the former normally operates on file inputs to unmodified programs, while the latter asks you to <i>write a Go function and passes inputs to that</i>. The former usually forks a new process for each input, while the latter keeps calling the function, restarting only rarely.</p><p>There is no strong technical reason for this difference (and indeed afl recently gained the ability to behave like go-fuzz), but it's likely due to the <i>different ecosystems</i> in which they operate: Go programs often expose <i>well-documented, well-behaved APIs</i> which enable the tester to write a good wrapper that doesn't contaminate state across calls. Also, Go programs are often easier to dive into and <i>more predictable</i>, thanks obviously to GC and memory management, but also to the general community repulsion towards unexpected global states and side effects. On the other hand, many legacy C code bases are so intractable that the easy and stable file input interface is worth the performance tradeoff.</p><p>Back to our DNS library. RRDNS, our in-house DNS server, uses <code>github.com/miekg/dns</code> for all its parsing needs, and it has proved to be up to the task. However, it's a bit fragile on the edge cases and has a track record of panicking on malformed packets. 
Thankfully, this is Go, not <a href="/a-deep-look-at-cve-2015-5477-and-how-cloudflare-virtual-dns-customers-are-protected/">BIND</a> C, and we can afford to <code>recover()</code> panics without worrying about ending up with insane memory states. Here's what we are doing:</p>
            <pre><code>func ParseDNSPacketSafely(buf []byte, msg *old.Msg) (err error) {
	defer func() {
		panicked := recover()

		if panicked != nil {
			err = errors.New("ParseError")
		}
	}()

	err = msg.Unpack(buf)

	return
}</code></pre>
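            <p>To see the pattern in isolation, here is a self-contained sketch (with a hypothetical <code>parsePanicky</code> standing in for the library's <code>Unpack</code>):</p>

```go
package main

import (
	"errors"
	"fmt"
)

// parsePanicky stands in for a parser that panics on malformed input.
func parsePanicky(buf []byte) {
	if len(buf) < 4 {
		panic("short packet")
	}
}

// parseSafely mirrors the ParseDNSPacketSafely wrapper above: the deferred
// closure runs even when parsePanicky panics, and assigning to the named
// return value err turns the panic into an ordinary error for the caller.
func parseSafely(buf []byte) (err error) {
	defer func() {
		if recover() != nil {
			err = errors.New("ParseError")
		}
	}()
	parsePanicky(buf)
	return
}

func main() {
	fmt.Println(parseSafely([]byte{0x01}))       // ParseError
	fmt.Println(parseSafely([]byte{0, 0, 0, 0})) // <nil>
}
```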
            <p>We saw an opportunity to make the library more robust so we wrote this initial simple fuzzing function:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    msg := &amp;dns.Msg{}

    if unpackErr := msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if _, packErr := msg.Pack(); packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    return 1
}</code></pre>
            <p>To create a corpus of initial inputs we took our stress and regression test suites and used <code>github.com/miekg/pcap</code> to write a file per packet.</p>
            <pre><code>package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"os"
	"strconv"

	"github.com/miekg/pcap"
)

func fatalIfErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	handle, err := pcap.OpenOffline(os.Args[1])
	fatalIfErr(err)

	b := make([]byte, 4)
	_, err = rand.Read(b)
	fatalIfErr(err)
	prefix := hex.EncodeToString(b)

	i := 0
	for pkt := handle.Next(); pkt != nil; pkt = handle.Next() {
		pkt.Decode()

		f, err := os.Create("p_" + prefix + "_" + strconv.Itoa(i))
		fatalIfErr(err)
		_, err = f.Write(pkt.Payload)
		fatalIfErr(err)
		fatalIfErr(f.Close())

		i++
	}
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JJf54JXErYsYRprnQO8nk/aec6c0359fc16d482d96bea2d37e8c3c/11597106396_a1927f8c71_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/jdhancock/11597106396/in/photolist-iENcGh-5GuU8g-4G3SzJ-cybzvf-ej9ytf-5PT2gy-2wCHkp-oTNLKN-4T5TVk-pikg74-64fbtb-64fbny-6iPZrk-6WSbWA-gTwR9P-6JEbMJ-uS5Qoe-p3LoLt-8rTRPb-gzJBbc-6u4Ko7-4uXbz8-bX4rtL-6HoBT8-cybFb7-pDtnkY-doskG2-a9tSqx-3NX4E-978gS2-4iW5fs-4VhKK2-7EpKqc-7EtB6Y-7EtAiN-7EpH54-7EpGgX-poTg8-55WEef-qfzP-dt83Gq-naDJvs-aCKhDG-drR492-aTSFS6-aTSEER-aTSC2t-aTSAUR-qqFXz2-ftsXnd">image</a> by <a href="https://www.flickr.com/photos/jdhancock/">JD Hancock</a></p><p>We then compiled our <code>Fuzz</code> function with go-fuzz, and launched the fuzzer on a lab server. The first thing go-fuzz does is minimize the corpus by throwing away packets that trigger the same code paths, then it starts mutating the inputs and passing them to <code>Fuzz()</code> in a loop. The mutations that don't fail (<code>return 1</code>) and <i>expand code coverage</i> are kept and iterated over. When the program panics, a small report (input and output) is saved and the program restarted. If you want to learn more about go-fuzz watch <a href="https://www.youtube.com/watch?v=a9xrxRsIbSU">the author's GopherCon talk</a> or read <a href="https://github.com/dvyukov/go-fuzz">the README</a>.</p><p><i>Crashes, mostly "index out of bounds", started to surface.</i> go-fuzz becomes pretty slow and ineffective when the program crashes often, so while the CPUs burned I started fixing the bugs.</p><p>In some cases I just decided to change some parser patterns, for example <a href="https://github.com/miekg/dns/commit/b5133fead4c0571c20eea405a917778f011dde02">reslicing and using <code>len()</code> instead of keeping offsets</a>. 
However, these can be potentially disruptive changes—I'm far from perfect—so I adapted the Fuzz function to keep an eye on differences between the old and the new, fixed parser, and to crash if the new parser started refusing good packets or changed its behavior:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg, msgOld = &amp;dns.Msg{}, &amp;old.Msg{}
        buf, bufOld = make([]byte, 100000), make([]byte, 100000)
        res, resOld []byte

        unpackErr, unpackErrOld error
        packErr, packErrOld     error
    )

    unpackErr = msg.Unpack(rawMsg)
    unpackErrOld = ParseDNSPacketSafely(rawMsg, msgOld)

    if unpackErr != nil &amp;&amp; unpackErrOld != nil {
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: out of order NSEC block" {
        // 97b0a31 - rewrite NSEC bitmap [un]packing to account for out-of-order
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad rdlength" {
        // 3157620 - unpackStructValue: drop rdlen, reslice msg instead
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad address family" {
        // f37c7ea - Reject a bad EDNS0_SUBNET family on unpack (not only on pack)
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad netmask" {
        // 6d5de0a - EDNS0_SUBNET: refactor netmask handling
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErrOld == nil {
        println("new code fails to unpack valid packets")
        panic(unpackErr)
    }

    res, packErr = msg.PackBuffer(buf)

    if packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    if unpackErrOld == nil {

        resOld, packErrOld = msgOld.PackBuffer(bufOld)

        if packErrOld == nil &amp;&amp; !bytes.Equal(res, resOld) {
            println("new code changed behavior of valid packets:")
            println()
            println(hex.Dump(res))
            println(hex.Dump(resOld))
            os.Exit(1)
        }

    }

    return 1
}</code></pre>
            <p>I was pretty happy about the robustness gain, but since we used the <code>ParseDNSPacketSafely</code> wrapper in RRDNS I didn't expect to find security vulnerabilities. I was wrong!</p><p>DNS names are made of labels, usually shown separated by dots. In a space-saving effort, labels can be replaced by pointers to other names, so that if we know we encoded <code>example.com</code> at offset 15, <code>www.example.com</code> can be packed as <code>www.</code> + <i>PTR(15)</i>. What we found is <a href="https://github.com/FiloSottile/dns/commit/b364f94">a bug in the handling of pointers to empty names</a>: when encountering the end of a name (<code>0x00</code>), if no labels had been read, <code>"."</code> (the empty name) was returned as a special case. The problem is that this special case was unaware of pointers, and it would instruct the parser to resume reading from the end of the pointed-to empty name instead of the end of the original name.</p><p>For example, if the parser encountered a pointer to offset 15 at offset 60, and <code>msg[15] == 0x00</code>, parsing would resume from offset 16 instead of 61, causing an infinite loop. This is a potential Denial of Service vulnerability.</p>
            <pre><code>A) Parse up to position 60, where a DNS name is found

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

-------------------------------------------------&gt;     

B) Follow the pointer to position 15

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

         ^                                        |
         ------------------------------------------      

C) Return an empty name ".", special case triggers

D) Erroneously resume from position 16 instead of 61

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

                 --------------------------------&gt;   

E) Rinse and repeat</code></pre>
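            <p>To make the correct behavior concrete, here is a minimal, self-contained sketch of name decompression (illustrative only, not the <code>miekg/dns</code> code): the offset to resume from is latched when the <i>first</i> pointer is followed, so even a pointer to an empty name cannot redirect the parser.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unpackName is a toy sketch of DNS name decompression, not the miekg/dns
// implementation. The detail at the heart of the bug above: next, the offset
// to resume parsing from, is latched when the *first* pointer is followed,
// so a pointed-to empty name can't decide where the parser resumes.
func unpackName(msg []byte, off int) (name string, next int, err error) {
	var b strings.Builder
	next = -1       // unknown until the name ends or a pointer is followed
	ptrBudget := 10 // guard against pointer loops
	for {
		if off >= len(msg) {
			return "", 0, errors.New("truncated name")
		}
		c := int(msg[off])
		switch {
		case c == 0x00: // root label: end of this name
			if next < 0 {
				next = off + 1
			}
			if b.Len() == 0 {
				return ".", next, nil // empty name, but next is still correct
			}
			return b.String(), next, nil
		case c&0xC0 == 0xC0: // compression pointer: 2 bytes, 14-bit target
			if off+1 >= len(msg) {
				return "", 0, errors.New("truncated pointer")
			}
			if ptrBudget--; ptrBudget < 0 {
				return "", 0, errors.New("too many pointers")
			}
			if next < 0 {
				next = off + 2 // resume after the pointer, never after its target
			}
			off = (c&0x3F)<<8 | int(msg[off+1])
		case c&0xC0 == 0x00: // ordinary label of length c
			if off+1+c > len(msg) {
				return "", 0, errors.New("truncated label")
			}
			b.Write(msg[off+1 : off+1+c])
			b.WriteByte('.')
			off += 1 + c
		default:
			return "", 0, errors.New("reserved label type")
		}
	}
}

func main() {
	// Reproduce the scenario above: an empty name at offset 15,
	// and a pointer to it at offset 60.
	msg := make([]byte, 64)
	msg[15] = 0x00
	msg[60], msg[61] = 0xC0, 15
	name, next, _ := unpackName(msg, 60)
	fmt.Println(name, next) // ". 62": parsing resumes right after the pointer
}
```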
            <p>We sent the fixes privately to the library maintainer while we patched our servers and we <a href="https://github.com/miekg/dns/pull/237">opened a PR</a> once done. (Two bugs were independently found and fixed by Miek while we released our RRDNS updates, as it happens.)</p>
    <div>
      <h3>Not just crashes and hangs</h3>
      <a href="#not-just-crashes-and-hangs">
        
      </a>
    </div>
    <p>Thanks to its flexible fuzzing API, go-fuzz lends itself nicely not only to the search for crashing inputs, but <i>can be used to explore all scenarios where edge cases are troublesome</i>.</p><p>Useful applications range from checking output validation by adding crashing assertions to your <code>Fuzz()</code> function, to comparing the two ends of an unpack-pack chain, and even comparing the behavior of two different versions or implementations of the same functionality.</p><p>For example, while preparing our <a href="/tag/dnssec/">DNSSEC</a> engine for launch, I faced a weird bug that would happen only in production or under stress tests: <i>NSEC records that were supposed to have only a couple of bits set in their types bitmap would sometimes look like this:</i></p>
            <pre><code>deleg.filippo.io.  IN  NSEC    3600    \000.deleg.filippo.io. NS WKS HINFO TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC TLSA HIP TYPE60 TYPE61 SPF</code></pre>
            <p>The catch was that our "pack and send" code <i>pools </i><code><i>[]byte</i></code><i> buffers to reduce GC and allocation churn</i>, so buffers passed to <code>dns.msg.PackBuffer(buf []byte)</code> can be "dirty" from previous uses.</p>
            <pre><code>var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
            <p>However, <code>buf</code> not being an array of zeroes was not handled by some <code>github.com/miekg/dns</code> packers, including the NSEC rdata one, which would <i>just OR in the present bits, without clearing the ones that are supposed to be absent</i>.</p>
            <pre><code>case `dns:"nsec"`:
    lastwindow := uint16(0)
    length := uint16(0)
    for j := 0; j &lt; val.Field(i).Len(); j++ {
        t := uint16((fv.Index(j).Uint()))
        window := uint16(t / 256)
        if lastwindow != window {
            off += int(length) + 3
        }
        length = (t - window*256) / 8
        bit := t - (window * 256) - (length * 8)

        msg[off] = byte(window) // window #
        msg[off+1] = byte(length + 1) // octets length

        // Setting the bit value for the type in the right octet
---&gt;    msg[off+2+int(length)] |= byte(1 &lt;&lt; (7 - bit)) 

        lastwindow = window
    }
    off += 2 + int(length)
    off++
}</code></pre>
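            <p>The failure mode is easy to reproduce in miniature (a toy sketch, not the actual packer):</p>

```go
package main

import "fmt"

func main() {
	// A fresh buffer, and a "dirty" one recycled from a pool.
	clean := []byte{0x00}
	dirty := []byte{0xFF}

	// An OR-only packer sets the bits it wants...
	clean[0] |= 1 << 3
	dirty[0] |= 1 << 3

	// ...but never clears the bits that should be absent.
	fmt.Printf("%08b %08b\n", clean[0], dirty[0]) // 00001000 11111111
}
```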
            <p>The fix was clear and easy: we benchmarked a few different ways to zero a buffer and updated the code like this:</p>
            <pre><code>// zeroBuf is a big buffer of zero bytes, used to zero out the buffers passed
// to PackBuffer.
var zeroBuf = make([]byte, 65535)

var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)
    copy(data[0:cap(data)], zeroBuf)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
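            <p>For reference, here are the two zeroing shapes in question (an illustrative sketch; our actual benchmark variants aren't shown):</p>

```go
package main

import "fmt"

// zeroBuf mirrors the big zero-filled source buffer above.
var zeroBuf = make([]byte, 65535)

// zeroCopy clears b up to its capacity by copying zeroes over it,
// as in the fix above.
func zeroCopy(b []byte) {
	copy(b[0:cap(b)], zeroBuf)
}

// zeroRange clears b with a range loop, the form that newer compilers
// can turn into a memclr call.
func zeroRange(b []byte) {
	b = b[0:cap(b)]
	for i := range b {
		b[i] = 0
	}
}

func main() {
	b := []byte{1, 2, 3}
	zeroCopy(b)
	fmt.Println(b) // [0 0 0]
	b = []byte{4, 5, 6}
	zeroRange(b)
	fmt.Println(b) // [0 0 0]
}
```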
            <p>Note: <a href="https://github.com/golang/go/commit/f03c9202c43e0abb130669852082117ca50aa9b1">a recent optimization</a> turns zeroing range loops into <code>memclr</code> calls, so once Go 1.5 lands that will be much faster than <code>copy()</code>.</p><p>But this was a boring fix! Wouldn't it be nicer if we could trust our library to work with any buffer we pass it? Luckily, this is exactly what coverage-based fuzzing is good for: <i>making sure all code paths behave in a certain way</i>.</p><p>What I did next was write a <code>Fuzz()</code> function that would first parse a message, and then pack it to two different buffers: one filled with zeroes and one filled with ones. <i>Any difference between the two results signals a case where the underlying buffer is leaking into the output.</i></p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg         = &amp;dns.Msg{}
        buf, bufOne = make([]byte, 100000), make([]byte, 100000)
        res, resOne []byte

        unpackErr, packErr error
    )

    if unpackErr = msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if res, packErr = msg.PackBuffer(buf); packErr != nil {
        return 0
    }

    for i := range res {
        bufOne[i] = 1
    }

    resOne, packErr = msg.PackBuffer(bufOne)
    if packErr != nil {
        println("Pack failed only with a filled buffer")
        panic(packErr)
    }

    if !bytes.Equal(res, resOne) {
        println("buffer bits leaked into the packed message")
        println(hex.Dump(res))
        println(hex.Dump(resOne))
        os.Exit(1)
    }

    return 1
}</code></pre>
            <p>I wish I could show a PR fixing all the bugs here, too, but go-fuzz did its job almost too well, and we are still triaging and fixing what it finds.</p><p>Anyway, once the fixes are done and go-fuzz falls silent, we will be free to drop the buffer zeroing step without worry, with no need to audit the whole codebase!</p><p><i>Do you fancy fuzzing the libraries that serve 43 billion queries per day? We are </i><a href="https://www.cloudflare.com/join-our-team"><i>hiring</i></a><i> in London, San Francisco and Singapore!</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">7zu5Cq14O6t3QJfjOHY6b7</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Go has a debugger—and it's awesome!]]></title>
            <link>https://blog.cloudflare.com/go-has-a-debugger-and-its-awesome/</link>
            <pubDate>Thu, 18 Jun 2015 11:14:00 GMT</pubDate>
            <description><![CDATA[ Something that often, uh... bugs Go developers is the lack of a proper debugger. Builds are ridiculously fast and easy, but sometimes it would be nice to just set a breakpoint and step through that endless if chain or print a bunch of values without recompiling ten times. ]]></description>
            <content:encoded><![CDATA[ <p>Something that often, uh... <i>bugs</i><a href="#fn1">[1]</a> Go developers is the <b>lack of a proper debugger</b>. Sure, builds are ridiculously fast and easy, and <code>println(hex.Dump(b))</code> is your friend, but sometimes it would be nice to just set a breakpoint and step through that endless <code>if</code> chain or print a bunch of values without recompiling ten times.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1PlA0CeyPc2zJ6Ai6u9f9u/77f31929ad59993a59bb24367a46d852/12294903084_3a3d128ae7_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/62766743@N07/12294903084/in/photolist-jJsAkE-hiHrhB-9TNjzG-9TMKnB-9TKuyt-9TKuQx-4rHRku-9TNj1L-dCD4Ay-bbk7in-ngEQwy-q577yv-qmsFPs-qXFbRy-dCMyqk-rmqu1H-tncWw9-fzkCLf-54MZxq-9ZCivM-fdC6b-5jvVQ7-q4YkxA-2vVkpu-aY6pnx-9TNiVC-j8TKCC-9TNji3-dKjVwD-eRrMtP-dVJA3D-bwjW2u-ohnZh9-iRdXBy-dWXXKe-fdT8VT-ePmAs-ecdQqy-ieu7sA-iFi5z-j6m1Qs-ncgQ2q-7W3hJi-r17FpD-ekipUs-jYbRdy-ckWNBh-gT4VL-9TNjvC-9TNjpL">image</a> by <a href="https://www.flickr.com/photos/62766743@N07/">Carl Milner</a></p><p>You <i>could</i> try to use some dirty gdb hacks that will work if you built your binary with a certain linker and ran it on some architectures when the moon was in a waxing crescent phase, but let's be honest, it isn't an enjoyable experience.</p><p>Well, worry no more! <a href="https://github.com/mailgun/godebug">godebug</a> is here!</p><p><b>godebug is an awesome cross-platform debugger</b> created by the Mailgun team. You can read <a href="http://blog.mailgun.com/introducing-a-new-cross-platform-debugger-for-go/">their introduction</a> for some under-the-hood details, but here's the cool bit: instead of wrestling with half a dozen different ptrace interfaces that would not be portable, <b>godebug rewrites your source code</b> and injects function calls like <code>godebug.Line</code> on every line, <code>godebug.Declare</code> at every variable declaration, and <code>godebug.SetTrace</code> for breakpoints (i.e. wherever you type <code>_ = "breakpoint"</code>).</p><p>I find this solution brilliant. What you get out of it is a (possibly cross-compiled) debug-enabled binary that you can drop on a staging server just like you would with a regular binary. When a breakpoint is reached, the program will stop inline and wait for you on stdin. 
<b>It's the single-binary, zero-dependencies philosophy of Go that we love applied to debugging.</b> Builds everywhere, runs everywhere, with no need for tools or permissions on the server. It even compiles to JavaScript with gopherjs (check out the Mailgun post above—show-offs ;) ).</p><p>You might ask, "But does it get a decent runtime speed or work with big applications?" Well, the other day I was seeing RRDNS—our in-house Go DNS server—hit a weird branch, so I placed a breakpoint a couple lines above the <i>if</i> in question, <b>recompiled the whole of RRDNS with godebug instrumentation</b>, dropped the binary on a staging server, and replayed some DNS traffic.</p>
            <pre><code>filippo@staging:~$ ./rrdns -config config.json
-&gt; _ = "breakpoint"
(godebug) l

    q := r.Query.Question[0]

--&gt; _ = "breakpoint"

    if !isQtypeSupported(q.Qtype) {
        return
(godebug) n
-&gt; if !isQtypeSupported(q.Qtype) {
(godebug) q
dns.Question{Name:"filippo.io.", Qtype:0x1, Qclass:0x1}
(godebug) c</code></pre>
            <p>Boom. The request and the debug log paused (make sure to terminate any timeout you have in your tools), waiting for me to step through the code.</p><p>Sold yet? Here's how you use it: simply run <code>godebug {build|run|test}</code> instead of <code>go {build|run|test}</code>. <a href="https://github.com/mailgun/godebug/pull/32/commits">We adapted godebug</a> to resemble the go tool as much as possible. Remember to use <code>-instrument</code> if you want to be able to step into packages that are not <i>main</i>.</p><p>For example, here is part of the RRDNS Makefile:</p>
            <pre><code>bin/rrdns:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug build -instrument "${GODEBUG}" -o bin/rrdns rrdns
else
	GOPATH="${PWD}" go install rrdns
endif

test:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug test -instrument "${GODEBUG}" rrdns/...
else
	GOPATH="${PWD}" go test rrdns/...
endif</code></pre>
            <p>Debugging is just a <code>make bin/rrdns GODEBUG=rrdns/...</code> away.</p><p>This tool is still young, but in my experience, perfectly functional. The UX could use some love if you can spare some time (as you can see above it's pretty spartan), but it should be easy to build on what's there already.</p>
    <div>
      <h2>About source rewriting</h2>
      <a href="#about-source-rewriting">
        
      </a>
    </div>
    <p>Before closing, I'd like to say a few words about the technique of source rewriting in general. It powers many different Go tools, like <a href="https://blog.golang.org/cover">test coverage</a>, <a href="https://github.com/dvyukov/go-fuzz">fuzzing</a> and, indeed, debugging. It's made possible primarily by Go’s blazing-fast compiles, and it enables amazing cross-platform tools to be built easily.</p><p>However, since it's such a handy and powerful pattern, I feel like <b>there should be a standard way to apply it in the context of the build process</b>. After all, all the source rewriting tools need to implement a subset of the following features:</p><ul><li><p>Wrap the main function</p></li><li><p>Conditionally rewrite source files</p></li><li><p>Keep global state</p></li></ul><p>Why should every tool have to reinvent all the boilerplate to copy the source files, rewrite the source, make sure stale objects are not used, build the right packages, run the right tests, and interpret the CLI? Basically, all of <a href="https://github.com/mailgun/godebug/blob/f8742f647adb8ee17a1435de3b1929d36df590c8/cmd.go">godebug/cmd.go</a>. And what about <a href="http://getgb.io/">gb</a>, for example?</p><p>I think we need a framework for Go source code rewriting tools. (Spoiler, spoiler, ...)</p><p><i>If you’re interested in working on Go servers at scale and developing tools to do it better, remember </i><a href="https://www.cloudflare.com/join-our-team"><i>we’re hiring in London, San Francisco, and Singapore</i></a><i>!</i></p><hr /><ol><li><p>I'm sorry. <a href="#fnref1">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7rlszh5ZEwkE3JfCjkJkZv</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
    </channel>
</rss>