
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 11 Apr 2026 20:03:24 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Migrating billions of records: moving our active DNS database while it’s in use]]></title>
            <link>https://blog.cloudflare.com/migrating-billions-of-records-moving-our-active-dns-database-while-in-use/</link>
            <pubDate>Tue, 29 Oct 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ DNS records have moved to a new database, bringing improved performance and reliability to all customers. ]]></description>
            <content:encoded><![CDATA[ <p>According to a survey done by <a href="https://w3techs.com/technologies/overview/dns_server"><u>W3Techs</u></a>, as of October 2024, Cloudflare is used as an <a href="https://www.cloudflare.com/en-gb/learning/dns/dns-server-types/"><u>authoritative DNS</u></a> provider by 14.5% of all websites. As an authoritative DNS provider, we are responsible for managing and serving all the DNS records for our clients’ domains. This means we have an enormous responsibility to provide the best service possible, starting at the data plane. As such, we are constantly investing in our infrastructure to ensure the reliability and performance of our systems.</p><p><a href="https://www.cloudflare.com/learning/dns/what-is-dns/"><u>DNS</u></a> is often referred to as the phone book of the Internet, and is one of its key components. If you have ever used a phone book, you know that it can become extremely large depending on the size of the physical area it covers. A <a href="https://www.cloudflare.com/en-gb/learning/dns/glossary/dns-zone/#:~:text=What%20is%20a%20DNS%20zone%20file%3F"><u>zone file</u></a> in DNS is no different from a phone book. It has a list of records that provide details about a domain, usually including critical information like what IP address(es) each hostname is associated with. For example:</p>
            <pre><code>example.com      59 IN A 198.51.100.0
blog.example.com 59 IN A 198.51.100.1
ask.example.com  59 IN A 198.51.100.2</code></pre>
            <p>It is not unusual for these zone files to reach millions of records in size, just for a single domain. The biggest single zone on Cloudflare holds roughly 4 million DNS records, but the vast majority of zones hold fewer than 100 DNS records. Given our scale according to W3Techs, you can imagine how much DNS data alone Cloudflare is responsible for. Given this volume of data, and all the complexities that come at that scale, there needs to be a very good reason to move it from one database cluster to another. </p>
    <div>
      <h2>Why migrate </h2>
      <a href="#why-migrate">
        
      </a>
    </div>
    <p>When initially measured in 2022, DNS data took up approximately 40% of the storage capacity in Cloudflare’s main database cluster (<b>cfdb</b>). This database cluster, consisting of a primary system and multiple replicas, is responsible for storing DNS zones, propagated to our <a href="https://www.cloudflare.com/network/"><u>data centers in over 330 cities</u></a> via our distributed KV store <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>Quicksilver</u></a>. <b>cfdb</b> is accessed by most of Cloudflare's APIs, including the <a href="https://developers.cloudflare.com/dns/manage-dns-records/how-to/create-dns-records/"><u>DNS Records API</u></a>. Today, the DNS Records API is the API most used by our customers, with each request resulting in a query to the database. As such, it’s always been important to optimize the DNS Records API and its surrounding infrastructure to ensure we can successfully serve every request that comes in.</p><p>As Cloudflare scaled, <b>cfdb</b> was becoming increasingly strained under the pressures of several services, many unrelated to DNS. During spikes of requests to our DNS systems, other Cloudflare services experienced degradation in the database performance. It was understood that in order to properly scale, we needed to optimize our database access and improve the systems that interact with it. However, it was evident that system level improvements could only be just so useful, and the growing pains were becoming unbearable. In late 2022, the DNS team decided, along with the help of 25 other teams, to detach itself from <b>cfdb</b> and move our DNS records data to another database cluster.</p>
    <div>
      <h2>Pre-migration</h2>
      <a href="#pre-migration">
        
      </a>
    </div>
    <p>From a DNS perspective, this migration to an improved database cluster had been in the works for several years. Cloudflare initially relied on a single <a href="https://www.postgresql.org/"><u>Postgres</u></a> database cluster, <b>cfdb</b>. At Cloudflare's inception, <b>cfdb</b> was responsible for storing information about zones and accounts and the majority of services on the Cloudflare control plane depended on it. Since around 2017, as Cloudflare grew, many services moved their data out of <b>cfdb</b> to be served by a <a href="https://en.wikipedia.org/wiki/Microservices"><u>microservice</u></a>. Unfortunately, the difficulty of these migrations is directly proportional to the number of services that depend on the data being migrated, and in this case, most services require knowledge of both zones and DNS records.</p><p>Although the term “zone” was born from the DNS point of view, it has since evolved into something more. Today, zones on Cloudflare store many different types of non-DNS related settings and help link several non-DNS related products to customers' websites. Therefore, it didn’t make sense to move both zone data and DNS record data together. This separation of two historically tightly coupled DNS concepts proved to be an incredibly challenging problem, involving many engineers and systems. In addition, it was clear that if we were going to dedicate the resources to solving this problem, we should also remove some of the legacy issues that came along with the original solution. </p><p>One of the main issues with the legacy database was that the DNS team had little control over which systems accessed exactly what data and at what rate. Moving to a new database gave us the opportunity to create a more tightly controlled interface to the DNS data. 
This was manifested as an internal DNS Records <a href="https://blog.cloudflare.com/moving-k8s-communication-to-grpc/"><u>gRPC API</u></a> which allows us to make sweeping changes to our data while only requiring a single change to the API, rather than coordinating with other systems.  For example, the DNS team can alter access logic and auditing procedures under the hood. In addition, it allows us to appropriately rate-limit and cache data depending on our needs. The move to this new API itself was no small feat, and with the help of several teams, we managed to migrate over 20 services, using 5 different programming languages, from direct database access to using our managed gRPC API. Many of these services touch very important areas such as <a href="https://developers.cloudflare.com/dns/dnssec/"><u>DNSSEC</u></a>, <a href="https://developers.cloudflare.com/ssl/"><u>TLS</u></a>, <a href="https://developers.cloudflare.com/email-routing/"><u>Email</u></a>, <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>Tunnels</u></a>, <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, <a href="https://developers.cloudflare.com/spectrum/"><u>Spectrum</u></a>, and <a href="https://developers.cloudflare.com/r2/"><u>R2 storage</u></a>. Therefore, it was important to get it right. </p><p>One of the last issues to tackle was the logical decoupling of common DNS database functions from zone data. Many of these functions expect to be able to access both DNS record data and DNS zone data at the same time. For example, at record creation time, our API needs to check that the zone is not over its maximum record allowance. Originally this check occurred at the SQL level by verifying that the record count was lower than the record limit for the zone. However, once you remove access to the zone itself, you are no longer able to confirm this. 
Our DNS Records API also made use of SQL functions to audit record changes, which requires access to both DNS record and zone data. Luckily, over the past several years, we have migrated this functionality out of our monolithic API and into separate microservices. This allowed us to move the auditing and zone setting logic to the application level rather than the database level. Ultimately, we are still taking advantage of SQL functions in the new database cluster, but they are fully independent of any other legacy systems, and are able to take advantage of the latest Postgres version.</p><p>Now that Cloudflare DNS was mostly decoupled from the zones database, it was time to proceed with the data migration. For this, we built what would become our <b>Change Data Capture and Transfer Service (CDCTS).</b></p>
    <div>
      <h2>Requirements for the Change Data Capture and Transfer Service</h2>
      <a href="#requirements-for-the-change-data-capture-and-transfer-service">
        
      </a>
    </div>
    <p>The Database team is responsible for all Postgres clusters within Cloudflare, and was tasked with executing the data migration of two tables that store DNS data: <i>cf_rec</i> and <i>cf_archived_rec</i>, from the original <b>cfdb </b>cluster to a new cluster we called <b>dnsdb</b>.  We had several key requirements that drove our design:</p><ul><li><p><b>Don’t lose data. </b>This is the number one priority when handling any sort of data. Losing data means losing trust, and it is incredibly difficult to regain that trust once it’s lost.  Just as important is the ability to prove that no data was lost.  The migration process would, ideally, be easily auditable.</p></li><li><p><b>Minimize downtime</b>.  We wanted a solution with less than a minute of downtime during the migration, and ideally with just a few seconds of delay.</p></li></ul><p>These two requirements meant that we had to be able to migrate data changes in near real-time, meaning we either needed to implement logical replication, or some custom method to capture changes, migrate them, and apply them to a table in a separate Postgres cluster.</p><p>We first looked at Postgres logical replication using <a href="https://github.com/2ndQuadrant/pglogical"><u>pgLogical</u></a>, but had concerns about its performance and our ability to audit its correctness.  Then some additional requirements emerged that made a pgLogical implementation of logical replication impossible:</p><ul><li><p><b>The ability to move data must be bidirectional.</b> We had to have the ability to switch back to <b>cfdb</b> without significant downtime in case of unforeseen problems with the new implementation. 
</p></li><li><p><b>Partition the </b><b><i>cf_rec</i></b><b> table in the new database.</b> This was a long-desired improvement and since most access to <i>cf_rec</i> is by zone_id, it was decided that <b>mod(zone_id, num_partitions)</b> would be the partition key.</p></li><li><p><b>Transferred data accessible from original database.  </b>In case we had functionality that still needed access to data, a foreign table pointing to <b>dnsdb</b> would be available in <b>cfdb</b>. This could be used as emergency access to avoid needing to roll back the entire migration for a single missed process.</p></li><li><p><b>Only allow writes in one database. </b> Applications should know where the primary database is, and should be blocked from writing to both databases at the same time.</p></li></ul>
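    <p>As a rough illustration of the partition key described above, the sketch below routes rows to a partition using <code>zone_id % num_partitions</code>. The partition count of 8 and the helper name <code>partition_for</code> are assumptions made for this example; the number of partitions actually used in <b>dnsdb</b> is not stated here.</p>
            <pre><code>from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical; the real partition count is not given


def partition_for(zone_id, num_partitions=NUM_PARTITIONS):
    """Return the partition index for a zone: mod(zone_id, num_partitions)."""
    return zone_id % num_partitions


# Group a few sample (rec_id, zone_id) rows by their partition.
rows = [(13, 299), (14, 299), (15, 304), (16, 8)]
partitions = defaultdict(list)
for rec_id, zone_id in rows:
    partitions[partition_for(zone_id)].append(rec_id)

print(dict(partitions))</code></pre>
    <p>Because every record of a zone maps to the same partition, a lookup that filters by zone_id should, in principle, only need the data in that one partition.</p>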
    <div>
      <h2>Details about the tables being migrated</h2>
      <a href="#details-about-the-tables-being-migrated">
        
      </a>
    </div>
    <p>The primary table, <i>cf_rec</i>, stores DNS record information, and its rows are regularly inserted, updated, and deleted. At the time of the migration, this table had 1.7 billion records, and with several indexes took up 1.5 TB of disk. Typical daily usage would observe 3-5 million inserts, 1 million updates, and 3-5 million deletes.</p><p>The second table, <i>cf_archived_rec</i>, stores copies of <i>cf_rec</i> rows that are obsolete. This table generally only has records inserted, and its rows are never updated or deleted.  As such, it would see roughly 3-5 million inserts per day, corresponding to the records deleted from <i>cf_rec</i>. At the time of the migration, this table had roughly 4.3 billion records.</p><p>Fortunately, neither table made use of database triggers or foreign keys, which meant that we could insert/update/delete records in these tables without triggering changes or worrying about dependencies on other tables.</p><p>Ultimately, both of these tables are highly active and are the source of truth for many highly critical systems at Cloudflare.</p>
    <div>
      <h2>Designing the Change Data Capture and Transfer Service</h2>
      <a href="#designing-the-change-data-capture-and-transfer-service">
        
      </a>
    </div>
    <p>There were two main parts to this database migration:</p><ol><li><p><b>Initial copy:</b> Take all the data from <b>cfdb </b>and put it in <b>dnsdb.</b></p></li><li><p><b>Change copy:</b> Take all the changes in <b>cfdb </b>since the initial copy and update <b>dnsdb</b> to reflect them. This is the more involved part of the process.</p></li></ol><p>Normally, logical replication replays every insert, update, and delete on a copy of the data in the same transaction order, making a single-threaded pipeline.  We considered using a queue-based system but again, speed and auditability were both concerns as any queue would typically replay one change at a time.  We wanted to be able to apply large sets of changes, so that after an initial dump and restore, we could quickly catch up with the changed data. For the rest of the blog, we will only speak about <i>cf_rec</i> for simplicity, but the process for <i>cf_archived_rec</i> is the same.</p><p>What we decided on was a simple change capture table. Rows from this capture table would be loaded in real-time by a database trigger, with a transfer service that could migrate and apply thousands of changed records to <b>dnsdb</b> in each batch. Lastly, we added some auditing logic on top to ensure that we could easily verify that all data was safely transferred without downtime.</p>
    <div>
      <h3>Basic model of change data capture </h3>
      <a href="#basic-model-of-change-data-capture">
        
      </a>
    </div>
    <p>For <i>cf_rec</i> to be migrated, we would create a change logging table, along with a trigger function and a  table trigger to capture the new state of the record after any insert/update/delete.  </p><p>The change logging table named <i>log_cf_rec</i> had the same columns as <i>cf_rec</i>, as well as four new columns:</p><ul><li><p><b>change_id</b>:  a sequence generated unique identifier of the record</p></li><li><p><b>action</b>: a single character indicating whether this record represents an [i]nsert, [u]pdate, or [d]elete</p></li><li><p><b>change_timestamp</b>: the date/time when the change record was created</p></li><li><p><b>change_user:</b> the database user that made the change.  </p></li></ul><p>A trigger was placed on the <i>cf_rec</i> table so that each insert/update would copy the new values of the record into the change table, and for deletes, create a 'D' record with the primary key value. </p><p>Here is an example of the change logging where we delete, re-insert, update, and finally select from the <i>log_cf_rec</i><b> </b>table. Note that the actual <i>cf_rec</i> and <i>log_cf_rec</i> tables have many more columns, but have been edited for simplicity.</p>
            <pre><code>dns_records=# DELETE FROM cf_rec WHERE rec_id = 13;

dns_records=# SELECT * FROM log_cf_rec;
change_id | action | rec_id | zone_id | name
----------------------------------------------
1         | D      | 13     |         |

dns_records=# INSERT INTO cf_rec VALUES(13,299,'cloudflare.example.com');

dns_records=# UPDATE cf_rec SET name = 'test.example.com' WHERE rec_id = 13;

dns_records=# SELECT * FROM log_cf_rec;
change_id | action | rec_id | zone_id | name
----------------------------------------------
1         | D      | 13     |         |
2         | I      | 13     | 299     | cloudflare.example.com
3         | U      | 13     | 299     | test.example.com</code></pre>
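    <p>The trigger itself is not shown in this post. To make the idea concrete, here is a minimal, runnable sketch that uses SQLite (through Python's built-in <code>sqlite3</code> module) as a stand-in for Postgres. The trigger names and the reduced column set are assumptions for illustration, and <b>change_user</b> is left out because SQLite has no database users.</p>
            <pre><code>import sqlite3

# SQLite stands in for Postgres here; this is not the production trigger.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cf_rec (rec_id INTEGER PRIMARY KEY, zone_id INTEGER, name TEXT);
CREATE TABLE log_cf_rec (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- sequence-generated id
    action TEXT NOT NULL,                         -- 'I', 'U' or 'D'
    rec_id INTEGER NOT NULL,
    zone_id INTEGER,
    name TEXT,
    change_timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
-- A pre-existing record, created before change logging is switched on.
INSERT INTO cf_rec VALUES (13, 299, 'old.example.com');

-- Hypothetical trigger names; inserts and updates log the new row values.
CREATE TRIGGER cf_rec_ins AFTER INSERT ON cf_rec BEGIN
    INSERT INTO log_cf_rec (action, rec_id, zone_id, name)
    VALUES ('I', NEW.rec_id, NEW.zone_id, NEW.name);
END;
CREATE TRIGGER cf_rec_upd AFTER UPDATE ON cf_rec BEGIN
    INSERT INTO log_cf_rec (action, rec_id, zone_id, name)
    VALUES ('U', NEW.rec_id, NEW.zone_id, NEW.name);
END;
-- Deletes log only the primary key.
CREATE TRIGGER cf_rec_del AFTER DELETE ON cf_rec BEGIN
    INSERT INTO log_cf_rec (action, rec_id) VALUES ('D', OLD.rec_id);
END;
""")

# Same sequence as the example above: delete, re-insert, update.
db.execute("DELETE FROM cf_rec WHERE rec_id = 13")
db.execute("INSERT INTO cf_rec VALUES (13, 299, 'cloudflare.example.com')")
db.execute("UPDATE cf_rec SET name = 'test.example.com' WHERE rec_id = 13")

log = db.execute(
    "SELECT change_id, action, rec_id, zone_id, name FROM log_cf_rec ORDER BY change_id"
).fetchall()
for row in log:
    print(row)</code></pre>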
            <p>In addition to <i>log_cf_rec</i>, we also introduced 2 more tables in <b>cfdb </b>and 3 more tables in <b>dnsdb:</b></p><p><b>cfdb</b></p><ol><li><p><i>transferred_log_cf_rec</i>: Responsible for auditing the batches transferred to <b>dnsdb</b>.</p></li><li><p><i>log_change_action</i>:<i> </i>Responsible for summarizing the transfer size in order to compare with the <i>log_change_action </i>in <b>dnsdb.</b></p></li></ol><p><b>dnsdb</b></p><ol><li><p><i>migrate_log_cf_rec</i>:<i> </i>Responsible for collecting batch changes in <b>dnsdb</b>, which would later be applied to <i>cf_rec </i>in <b>dnsdb</b><i>.</i></p></li><li><p><i>applied_migrate_log_cf_rec</i>:<i> </i>Responsible for auditing the batches that had been successfully applied to cf_rec in <b>dnsdb.</b></p></li><li><p><i>log_change_action</i>:<i> </i>Responsible for summarizing the transfer size in order to compare with the <i>log_change_action </i>in <b>cfdb.</b></p></li></ol>
    <div>
      <h3>Initial copy</h3>
      <a href="#initial-copy">
        
      </a>
    </div>
    <p>With change logging in place, we were now ready to do the initial copy of the tables from <b>cfdb</b> to <b>dnsdb</b>. Because we were changing the structure of the tables in the destination database and because of network timeouts, we wanted to bring the data over in small pieces and validate that it was brought over accurately, rather than doing a single multi-hour copy or <a href="https://www.postgresql.org/docs/current/app-pgdump.html"><u>pg_dump</u></a>.  We also wanted to ensure a long-running read could not impact production and that the process could be paused and resumed at any time.  The basic transfer model was a simple psql copy statement piped into another psql copy statement.  No intermediate files were used.</p><p><code>psql_cfdb -c "COPY (SELECT * FROM cf_rec WHERE id BETWEEN n AND n+1000000) TO STDOUT" | </code></p><p><code>psql_dnsdb -c "COPY cf_rec FROM STDIN"</code></p><p>Prior to a batch being moved, the count of records to be moved was recorded in <b>cfdb</b>, and after each batch was moved, a count was recorded in <b>dnsdb</b> and compared to the count in <b>cfdb</b> to ensure that a network interruption or other unforeseen error did not cause data to be lost. The bash script to copy data looked like this; it checks for files that could be touched to pause or end the copy (if it caused load on production or there was an incident).  Once again, the code below has been heavily simplified.</p>
            <pre><code>#!/bin/bash
for i in "$@"; do
   # Allow user to control whether this is paused or not via pause_copy file
   while [ -f pause_copy ]; do
      sleep 1
   done
   # Allow user to end migration by creating end_copy file
   if [ -f end_copy ]; then
      break
   fi
   # Copy a batch of records from cfdb to dnsdb
   # Get count of records from cfdb
   # Get count of records from dnsdb
   # Compare cfdb count with dnsdb count and alert if different
done
</code></pre>
            <p><sup><i>Bash copy script</i></sup></p>
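    <p>The copy-and-verify steps that the simplified script leaves as comments could look roughly like the sketch below. It is illustrative only: two in-memory SQLite databases stand in for <b>cfdb</b> and <b>dnsdb</b>, and the helper name <code>copy_batch</code>, the reduced columns, and the batch size of 1,000 are assumptions made for the example.</p>
            <pre><code>import sqlite3

BATCH_SIZE = 1000  # hypothetical; much smaller than a real batch

SCHEMA = "CREATE TABLE cf_rec (rec_id INTEGER PRIMARY KEY, zone_id INTEGER, name TEXT)"
COUNT_SQL = "SELECT count(*) FROM cf_rec WHERE rec_id BETWEEN ? AND ?"


def copy_batch(src, dst, first_id, last_id):
    """Copy one id range from src to dst and verify that the row counts match."""
    id_range = (first_id, last_id)
    # Record the source count before the batch is moved.
    src_count = src.execute(COUNT_SQL, id_range).fetchone()[0]
    rows = src.execute(
        "SELECT rec_id, zone_id, name FROM cf_rec WHERE rec_id BETWEEN ? AND ?",
        id_range,
    ).fetchall()
    with dst:  # commit the batch, or roll it back if the insert fails
        dst.executemany("INSERT INTO cf_rec VALUES (?, ?, ?)", rows)
    # Count what arrived and compare it with the source count.
    dst_count = dst.execute(COUNT_SQL, id_range).fetchone()[0]
    if src_count != dst_count:
        raise RuntimeError(f"count mismatch for ids {first_id}-{last_id}")
    return dst_count


cfdb = sqlite3.connect(":memory:")
dnsdb = sqlite3.connect(":memory:")
cfdb.execute(SCHEMA)
dnsdb.execute(SCHEMA)
cfdb.executemany(
    "INSERT INTO cf_rec VALUES (?, ?, ?)",
    [(i, i % 7, f"host{i}.example.com") for i in range(1, 2501)],
)

copied = 0
for first_id in range(1, 2501, BATCH_SIZE):
    copied += copy_batch(cfdb, dnsdb, first_id, first_id + BATCH_SIZE - 1)
print(copied)</code></pre>
    <p>Raising on a mismatch keeps a failed batch from going unnoticed. Because each batch covers a fixed id range, a failed range can be identified and investigated on its own.</p>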
    <div>
      <h3>Change copy</h3>
      <a href="#change-copy">
        
      </a>
    </div>
    <p>Once the initial copy was completed, we needed to update <b>dnsdb</b> with any changes that had occurred in <b>cfdb</b> since the start of the initial copy. To implement this change copy, we created a function <i>fn_log_change_transfer_log_cf_rec </i>that could be passed a <i>batch_id</i> and <i>batch_size</i>, and did 5 things, all of which were executed in a single database <a href="https://www.postgresql.org/docs/current/tutorial-transactions.html"><u>transaction</u></a>:</p><ol><li><p>Select a <i>batch_size</i> of records from <i>log_cf_rec</i> in <b>cfdb</b>.</p></li><li><p>Copy the batch to <i>transferred_log_cf_rec</i> in <b>cfdb </b>to mark it as transferred.</p></li><li><p>Delete the batch from <i>log_cf_rec</i>.</p></li><li><p>Write a summary of the action to the <i>log_change_action</i> table. This is later compared against the matching <i>log_change_action</i> table in <b>dnsdb</b>.</p></li><li><p>Return the batch of records.</p></li></ol><p>We then took the returned batch of records and copied them to <i>migrate_log_cf_rec </i>in <b>dnsdb</b>. We used the same bash script as above, except this time, the copy command looked like this:</p><p><code>psql_cfdb -c "COPY (SELECT * FROM </code><code><i>fn_log_change_transfer_log_cf_rec(&lt;batch_id&gt;,&lt;batch_size&gt;</i></code><code>)) TO STDOUT" | </code></p><p><code>psql_dnsdb -c "COPY migrate_log_cf_rec FROM STDIN"</code></p>
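    <p>The body of <i>fn_log_change_transfer_log_cf_rec</i> is not shown here. The five steps can be sketched as follows, again with SQLite standing in for Postgres. The Python function <code>transfer_batch</code>, the column layout of the audit tables, and the sample rows are all assumptions for illustration.</p>
            <pre><code>import sqlite3

# isolation_level=None lets the sketch issue BEGIN/COMMIT explicitly.
cfdb = sqlite3.connect(":memory:", isolation_level=None)
cfdb.executescript("""
CREATE TABLE log_cf_rec (
    change_id INTEGER PRIMARY KEY, action TEXT, rec_id INTEGER, zone_id INTEGER, name TEXT);
CREATE TABLE transferred_log_cf_rec (
    batch_id INTEGER, change_id INTEGER, action TEXT, rec_id INTEGER, zone_id INTEGER, name TEXT);
CREATE TABLE log_change_action (batch_id INTEGER, action TEXT, change_count INTEGER);
INSERT INTO log_cf_rec VALUES
    (1, 'D', 13, NULL, NULL),
    (2, 'I', 13, 299, 'cloudflare.example.com'),
    (3, 'U', 13, 299, 'test.example.com');
""")


def transfer_batch(db, batch_id, batch_size):
    """Hypothetical stand-in for fn_log_change_transfer_log_cf_rec."""
    db.execute("BEGIN")
    try:
        # 1. Select up to batch_size of the oldest change rows.
        batch = db.execute(
            "SELECT change_id, action, rec_id, zone_id, name "
            "FROM log_cf_rec ORDER BY change_id LIMIT ?", (batch_size,)).fetchall()
        # 2. Copy the batch to the audit table to mark it as transferred.
        db.executemany(
            "INSERT INTO transferred_log_cf_rec VALUES (?, ?, ?, ?, ?, ?)",
            [(batch_id,) + row for row in batch])
        # 3. Delete the batch from the change log.
        db.executemany(
            "DELETE FROM log_cf_rec WHERE change_id = ?", [(row[0],) for row in batch])
        # 4. Summarize the batch per action for later comparison.
        db.execute(
            "INSERT INTO log_change_action "
            "SELECT batch_id, action, count(*) FROM transferred_log_cf_rec "
            "WHERE batch_id = ? GROUP BY batch_id, action", (batch_id,))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return batch  # 5. Return the batch so it can be copied to dnsdb.


first = transfer_batch(cfdb, 1, 2)
remaining = cfdb.execute("SELECT count(*) FROM log_cf_rec").fetchone()[0]
print(len(first), remaining)</code></pre>
    <p>Running all of the steps in one transaction means a change row is either still waiting in <i>log_cf_rec</i> or recorded as transferred, never both and never neither.</p>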
    <div>
      <h3>Applying changes in the destination database</h3>
      <a href="#applying-changes-in-the-destination-database">
        
      </a>
    </div>
    <p>Now, with a batch of data in the <i>migrate_log_cf_rec </i>table, we called a newly created function <i>log_change_apply</i> to apply and audit the changes. Once again, this was all executed within a single database transaction. The function did the following:</p><ol><li><p>Move a batch from the <i>migrate_log_cf_rec</i> table to a new temporary table.</p></li><li><p>Write the counts for the batch_id to the <i>log_change_action</i> table.</p></li><li><p>Delete from the temporary table all but the latest record for a unique id (last action). For example, an insert followed by 30 updates would have a single record left, the final update. There is no need to apply all the intermediate updates.</p></li><li><p>Delete any record from <i>cf_rec</i> that has any corresponding changes.</p></li><li><p>Insert any [i]nsert or [u]pdate records in <i>cf_rec</i>.</p></li><li><p>Copy the batch to <i>applied_migrate_log_cf_rec</i> for a full audit trail.</p></li></ol>
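    <p>The steps listed above can be sketched in the same way. This is an illustrative model, not the real <i>log_change_apply</i>: SQLite stands in for Postgres, the column layouts and sample rows are assumptions, and step 6 here copies the rows that remain after de-duplication (whether the original kept every intermediate row is not stated).</p>
            <pre><code>import sqlite3

dnsdb = sqlite3.connect(":memory:", isolation_level=None)
dnsdb.executescript("""
CREATE TABLE cf_rec (rec_id INTEGER PRIMARY KEY, zone_id INTEGER, name TEXT);
CREATE TABLE migrate_log_cf_rec (
    change_id INTEGER, action TEXT, rec_id INTEGER, zone_id INTEGER, name TEXT);
CREATE TABLE applied_migrate_log_cf_rec (
    change_id INTEGER, action TEXT, rec_id INTEGER, zone_id INTEGER, name TEXT);
CREATE TABLE log_change_action (batch_id INTEGER, action TEXT, change_count INTEGER);
INSERT INTO cf_rec VALUES (13, 299, 'old.example.com'), (14, 299, 'gone.example.com');
INSERT INTO migrate_log_cf_rec VALUES
    (1, 'D', 13, NULL, NULL),
    (2, 'I', 13, 299, 'cloudflare.example.com'),
    (3, 'U', 13, 299, 'test.example.com'),
    (4, 'D', 14, NULL, NULL);
""")


def apply_batch(db, batch_id):
    """Hypothetical stand-in for log_change_apply, run as one transaction."""
    db.execute("BEGIN")
    try:
        # 1. Move the batch to a temporary table.
        db.execute("CREATE TEMP TABLE tmp_batch AS SELECT * FROM migrate_log_cf_rec")
        db.execute("DELETE FROM migrate_log_cf_rec")
        # 2. Write per-action counts for this batch_id.
        db.execute("INSERT INTO log_change_action "
                   "SELECT ?, action, count(*) FROM tmp_batch GROUP BY action", (batch_id,))
        # 3. Keep only the latest change for each rec_id.
        db.execute("DELETE FROM tmp_batch WHERE change_id NOT IN "
                   "(SELECT max(change_id) FROM tmp_batch GROUP BY rec_id)")
        # 4. Delete every record that has a corresponding change.
        db.execute("DELETE FROM cf_rec WHERE rec_id IN (SELECT rec_id FROM tmp_batch)")
        # 5. Re-insert the records whose latest change is an insert or update.
        db.execute("INSERT INTO cf_rec SELECT rec_id, zone_id, name FROM tmp_batch "
                   "WHERE action IN ('I', 'U')")
        # 6. Keep an audit trail of what was applied.
        db.execute("INSERT INTO applied_migrate_log_cf_rec SELECT * FROM tmp_batch")
        db.execute("DROP TABLE tmp_batch")
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise


apply_batch(dnsdb, 1)
print(dnsdb.execute("SELECT * FROM cf_rec").fetchall())</code></pre>
    <p>Deleting first and then re-inserting treats inserts and updates identically, so the apply step never needs to know whether a record already exists in the destination.</p>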
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>There were 4 distinct phases, each of which was part of a different database transaction:</p><ol><li><p>Call <i>fn_log_change_transfer_log_cf_rec </i>in <b>cfdb </b>to get a batch of records.</p></li><li><p>Copy the batch of records to <b>dnsdb.</b></p></li><li><p>Call <i>log_change_apply </i>in <b>dnsdb </b>to apply the batch of records.</p></li><li><p>Compare the <i>log_change_action</i> table in each respective database to ensure counts match.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2REIq71tc7M4jKPLZSJzS9/11f22f700300f2ad3a5ee5ca85a75480/Applying_changes_in_the_destination_database.png" />
          </figure><p>This process was run every 3 seconds for several weeks before the migration to ensure that we could keep <b>dnsdb</b> in sync with <b>cfdb</b>.</p>
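    <p>The count comparison in the fourth phase could be as simple as the check below. The dictionary shape, keyed by <code>(batch_id, action)</code>, and the function name <code>audit_mismatches</code> are assumptions for illustration; the real <i>log_change_action</i> tables may hold more detail.</p>
            <pre><code>def audit_mismatches(cfdb_counts, dnsdb_counts):
    """Return the (batch_id, action) keys whose counts differ or are missing."""
    keys = set(cfdb_counts) | set(dnsdb_counts)
    return sorted(k for k in keys if cfdb_counts.get(k) != dnsdb_counts.get(k))


# Hypothetical summaries read from log_change_action in each database.
cfdb_counts = {(1, "D"): 2, (1, "I"): 1, (2, "U"): 5}
dnsdb_counts = {(1, "D"): 2, (1, "I"): 1, (2, "U"): 4}

print(audit_mismatches(cfdb_counts, dnsdb_counts))</code></pre>
    <p>An empty result means every batch that left <b>cfdb</b> arrived in <b>dnsdb</b> with the same number of inserts, updates, and deletes.</p>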
    <div>
      <h2>Managing which database is live</h2>
      <a href="#managing-which-database-is-live">
        
      </a>
    </div>
    <p>The last major pre-migration task was the construction of the request locking system that would be used throughout the actual migration. The aim was to let the database communicate its state to the DNS Records API, so that the API could handle HTTP connections more gracefully. If done correctly, this could reduce downtime for DNS Records API users to nearly zero.</p><p>In order to facilitate this, a new table called <i>cf_migration_manager</i> was created. The table would be periodically polled by the DNS Records API, communicating two critical pieces of information:</p><ol><li><p><b>Which database was active.</b> Here we just used a simple A or B naming convention.</p></li><li><p><b>If the database was locked for writing</b>. In the event the database was locked for writing, the DNS Records API would hold HTTP requests until the lock was released by the database.</p></li></ol><p>Both pieces of information would be controlled within a migration manager script.</p><p>The benefit of migrating the 20+ internal services from direct database access to using our internal DNS Records gRPC API was that we were able to control access to the database to ensure that no one else would be writing without going through the <i>cf_migration_manager</i>.</p>
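    <p>To illustrate how an API process might hold writes while the lock is set, here is a small model of that behavior. Everything in it is hypothetical: an in-memory dictionary stands in for the <i>cf_migration_manager</i> table, the state is checked on every request rather than on a periodic poll, and the names <code>manager</code> and <code>handle_write</code> are invented for the example.</p>
            <pre><code>import threading
import time

# Hypothetical in-memory stand-in for the cf_migration_manager table.
manager = {"active_db": "A", "write_locked": False}
written = {"A": [], "B": []}


def handle_write(record, max_polls=200, poll_interval=0.01):
    """Hold a write while the lock is set, then send it to the active database."""
    for _ in range(max_polls):
        state = dict(manager)  # stands in for reading the table
        if not state["write_locked"]:
            written[state["active_db"]].append(record)
            return state["active_db"]
        time.sleep(poll_interval)
    raise TimeoutError("write lock was not released in time")


# Simulate a cutover while one write request is in flight.
manager["write_locked"] = True  # lock database A for writes
result = []
worker = threading.Thread(target=lambda: result.append(handle_write("www.example.com")))
worker.start()                   # this request is now being held
manager["active_db"] = "B"       # switch the active database
manager["write_locked"] = False  # release the lock
worker.join()
print(result)</code></pre>
    <p>The held request is not rejected. It waits for the lock to clear and then lands in whichever database is active at that point, which is what lets a short write lock show up as added latency rather than as errors.</p>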
    <div>
      <h2>During the migration </h2>
      <a href="#during-the-migration">
        
      </a>
    </div>
    <p>Although we aimed to complete this migration in a matter of seconds, we announced a DNS maintenance window that could last a couple of hours just to be safe. Now that everything was set up, and both <b>cfdb</b> and <b>dnsdb</b> were roughly in sync, it was time to proceed with the migration. The steps were as follows:</p><ol><li><p>Lower the time between copies from 3s to 0.5s.</p></li><li><p>Lock <b>cfdb</b> for writes via <i>cf_migration_manager</i>. This would tell the DNS Records API to hold write connections.</p></li><li><p>Make <b>cfdb</b> read-only and migrate the last logged changes to <b>dnsdb</b>. </p></li><li><p>Enable writes to <b>dnsdb</b>. </p></li><li><p>Tell DNS Records API that <b>dnsdb</b> is the new primary database and that write connections can proceed via the <i>cf_migration_manager</i>.</p></li></ol><p>Because we had to wait for the last changes to be copied to <b>dnsdb</b> before enabling writes, writes were held for the duration of this process, which took no more than 2 seconds. During the migration we saw a spike of API latency as a result of the migration manager locking writes, and then dealing with a backlog of queries. However, we recovered back to normal latencies after several minutes. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6agUpD8BQVxgDupBrwtTw3/38c96f91879c6539011866821ad6f11a/image3.png" />
          </figure><p><sup><i>DNS Records API Latency and Requests during migration</i></sup></p><p>Unfortunately, due to the far-reaching impact that DNS has at Cloudflare, this was not the end of the migration. There were 3 lesser-used services that had slipped by in our scan of services accessing DNS records via <b>cfdb</b>. Fortunately, the setup of the foreign table meant that we could very quickly fix any residual issues by simply changing the table name. </p>
    <div>
      <h2>Post-migration</h2>
      <a href="#post-migration">
        
      </a>
    </div>
    <p>Almost immediately, as expected, we saw a steep drop in usage across <b>cfdb</b>. This freed up a lot of resources for other services to take advantage of.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Xfnbc9MZLwJB91ypItWsi/1eb21362893b31a1e3c846d1076a9f5b/image6.jpg" />
          </figure><p><sup><i><b>cfdb</b></i></sup><sup><i> usage dropped significantly after the migration period.</i></sup></p><p>Since the migration, the average <b>requests</b> per second to the DNS Records API has more than <b>doubled</b>. At the same time, our CPU usage across both <b>cfdb</b> and <b>dnsdb</b> has settled at below 10% as seen below, giving us room for spikes and future growth. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39su35dkb5Pl8uwYfYjHLg/0eb26ced30b44efb71abb73830e01f3a/image2.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5AdlLKXtD68QWCsMVLKnkt/9137beee9c941827eb57c53825ffe209/image4.png" />
          </figure><p><sup><i><b>cfdb</b></i></sup><sup><i> and </i></sup><sup><i><b>dnsdb</b></i></sup><sup><i> CPU usage now</i></sup></p><p>As a result of this improved capacity, our database-related incident rate dropped dramatically.</p><p>As for query latencies, our latency post-migration is slightly lower on average, with fewer sustained spikes above 500ms. However, the performance improvement is largely noticed during high load periods, when our database handles spikes without significant issues. Many of these spikes come as a result of clients making calls to collect a large amount of DNS records or making several changes to their zone in short bursts. Both of these actions are common use cases for large customers onboarding zones.</p><p>In addition to these improvements, the DNS team also has more granular control over <b>dnsdb</b> cluster-specific settings that can be tweaked for our needs rather than catering to all the other services. For example, we were able to make custom changes to replication lag limits to ensure that services using replicas were able to read with some amount of certainty that the data would exist in a consistent form. Measures like this reduce overall load on the primary because almost all read queries can now go to the replicas.</p><p>Although this migration was a resounding success, we are always working to improve our systems. As we grow, so do our customers, which means the need to scale never really ends. We have more exciting improvements on the roadmap, and we are looking forward to sharing more details in the future.</p><p>The DNS team at Cloudflare isn’t the only team solving challenging problems like the one above. If this sounds interesting to you, we have many more tech deep dives on our blog, and we are always looking for curious engineers to join our team — see open opportunities <a href="https://www.cloudflare.com/en-gb/careers/jobs/"><u>here</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Kafka]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[Tracing]]></category>
            <category><![CDATA[Quicksilver]]></category>
            <guid isPermaLink="false">24rozMdbFQ7jmUgRNMF4RU</guid>
            <dc:creator>Alex Fattouche</dc:creator>
            <dc:creator>Corey Horton</dc:creator>
        </item>
        <item>
            <title><![CDATA[Elephants in tunnels: how Hyperdrive connects to databases inside your VPC networks]]></title>
            <link>https://blog.cloudflare.com/elephants-in-tunnels-how-hyperdrive-connects-to-databases-inside-your-vpc-networks/</link>
            <pubDate>Fri, 25 Oct 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Hyperdrive (Cloudflare’s globally distributed SQL connection pooler and cache) recently added support for directing database traffic from Workers across Cloudflare Tunnels. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>With September’s <a href="https://blog.cloudflare.com/builder-day-2024-announcements/#connect-to-private-databases-from-workers"><u>announcement</u></a> of Hyperdrive’s ability to send database traffic from <a href="https://workers.cloudflare.com/"><u>Workers</u></a> over <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/"><u>Cloudflare Tunnels</u></a>, we wanted to dive into the details of what it took to make this happen.</p>
    <div>
      <h2>Hyper-who?</h2>
      <a href="#hyper-who">
        
      </a>
    </div>
    <p>Accessing your data from anywhere in Region Earth can be hard. Traditional databases are powerful, familiar, and feature-rich, but your users can be thousands of miles away from your database. This can cause slower connection startup times, slower queries, and connection exhaustion as everything takes longer to accomplish.</p><p><a href="https://developers.cloudflare.com/workers/"><u>Cloudflare Workers</u></a> is an incredibly lightweight runtime, which enables our customers to deploy their applications globally by default and renders the <a href="https://en.wikipedia.org/wiki/Cold_start_(computing)"><u>cold start</u></a> problem almost irrelevant. The trade-off for these light, ephemeral execution contexts is the lack of persistence for things like database connections. Database connections are also notoriously expensive to spin up, with many round trips required between client and server before any query or result bytes can be exchanged.</p><p><a href="https://blog.cloudflare.com/hyperdrive-making-regional-databases-feel-distributed"><u>Hyperdrive</u></a> is designed to make the centralized databases you already have feel like they’re global while keeping connections to those databases hot. We use our <a href="https://www.cloudflare.com/network/"><u>global network</u></a> to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users as possible.</p>
    <div>
      <h2>Why a Tunnel?</h2>
      <a href="#why-a-tunnel">
        
      </a>
    </div>
    <p>For something as sensitive as your database, exposing access to the public Internet can be uncomfortable. It is common to instead host your database on a private network, and allowlist known-safe IP addresses or configure <a href="https://www.cloudflare.com/learning/network-layer/what-is-gre-tunneling/"><u>GRE tunnels</u></a> to permit traffic to it. This is complex, toilsome, and error-prone. </p><p>On Cloudflare’s <a href="https://www.cloudflare.com/en-gb/developer-platform/"><u>Developer Platform</u></a>, we strive for simplicity and ease-of-use. We cannot expect all of our customers to be experts in configuring networking solutions, and so we went in search of a simpler solution. <a href="https://www.cloudflare.com/the-net/top-of-mind-security/customer-zero/"><u>Being your own customer</u></a> is rarely a bad choice, and it so happens that Cloudflare offers an excellent option for this scenario: Tunnels.</p><p><a href="https://www.cloudflare.com/products/tunnel/"><u>Cloudflare Tunnel</u></a> is a Zero Trust product that creates a secure connection between your private network and Cloudflare. Exposing services within your private network can be as simple as <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/"><u>running a </u><code><u>cloudflared</u></code><u> binary</u></a>, or deploying a Docker container running the <a href="https://hub.docker.com/r/cloudflare/cloudflared"><code><u>cloudflared</u></code><u> image we distribute</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3182f43rbwdH9krF1xhdlC/d22430cdb1efa134031f94fea691c36e/image1.png" />
          </figure>
    <div>
      <h2>A custom handler and generic streams</h2>
      <a href="#a-custom-handler-and-generic-streams">
        
      </a>
    </div>
    <p>Integrating with Tunnels to support sending Postgres directly through them was a bit of a new challenge for us. Most of the time, when we use Tunnels internally (more on that later!), we rely on the excellent job <code>cloudflared</code> does of handling all of the mechanics, and we just treat them as pipes. That wouldn’t work for Hyperdrive, though, so we had to dig into how Tunnels actually ingress traffic to build a solution.</p><p>Hyperdrive handles Postgres traffic using an entirely custom implementation of the <a href="https://www.postgresql.org/docs/current/protocol.html"><u>Postgres message protocol</u></a>. This is necessary, because we sometimes have to <a href="https://blog.cloudflare.com/postgres-named-prepared-statements-supported-hyperdrive"><u>alter the specific type or content</u></a> of messages sent from client to server, or vice versa. Handling individual bytes gives us the flexibility to implement whatever logic any new feature might need.</p><p>An additional, perhaps less obvious, benefit of handling Postgres message traffic as just bytes is that we are not bound to the transport layer choices of some <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping"><u>ORM</u></a> or library. One of the nuances of running services in Cloudflare is that we may want to egress traffic over different services or protocols, for a variety of different reasons. In this case, being able to egress traffic via a Tunnel would be pretty challenging if we were stuck with whatever raw TCP socket a library had established for us.</p><p>The way we accomplish this relies on a mainstay of Rust: <a href="https://doc.rust-lang.org/book/ch10-02-traits.html"><u>traits</u></a> (which are how Rust lets developers apply logic across generic functions and types). 
In the Rust ecosystem, there are two traits that define the behavior Hyperdrive wants out of its transport layers: <a href="https://docs.rs/tokio/latest/tokio/io/trait.AsyncRead.html"><code><u>AsyncRead</u></code></a> and <a href="https://docs.rs/tokio/latest/tokio/io/trait.AsyncWrite.html"><code><u>AsyncWrite</u></code></a>. There are a couple of others we also need, but we’re going to focus on just these two. These traits enable us to code our entire custom handler against a generic stream of data, without the handler needing to know anything about the underlying protocol used to implement the stream. So, we can pass around a WebSocket connection as a generic I/O stream, wherever it might be needed.</p><p>As an example, the code to create a generic TCP stream and send a Postgres startup message across it might look like this:</p>
            <pre><code>/// Send a startup message to a Postgres server, in the role of a PG client.
/// https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-STARTUPMESSAGE
pub async fn send_startup&lt;S&gt;(stream: &amp;mut S, user_name: &amp;str, db_name: &amp;str, app_name: &amp;str) -&gt; Result&lt;(), ConnectionError&gt;
where
    S: AsyncWrite + Unpin,
{
    let protocol_number = 196608 as i32;
    let user_str = &amp;b"user\0"[..];
    let user_bytes = user_name.as_bytes();
    let db_str = &amp;b"database\0"[..];
    let db_bytes = db_name.as_bytes();
    let app_str = &amp;b"application_name\0"[..];
    let app_bytes = app_name.as_bytes();
    let len = 4 + 4
        + user_str.len() + user_bytes.len() + 1
        + db_str.len() + db_bytes.len() + 1
        + app_str.len() + app_bytes.len() + 1 + 1;

    // Construct a BytesMut of our startup message, then send it
    let mut startup_message = BytesMut::with_capacity(len as usize);
    startup_message.put_i32(len as i32);
    startup_message.put_i32(protocol_number);
    startup_message.put(user_str);
    startup_message.put_slice(user_bytes);
    startup_message.put_u8(0);
    startup_message.put(db_str);
    startup_message.put_slice(db_bytes);
    startup_message.put_u8(0);
    startup_message.put(app_str);
    startup_message.put_slice(app_bytes);
    startup_message.put_u8(0);
    startup_message.put_u8(0);

    match stream.write_all(&amp;startup_message).await {
        Ok(_) =&gt; Ok(()),
        Err(err) =&gt; {
            error!("Error writing startup to server: {}", err.to_string());
            Err(ConnectionError::InternalError)
        }
    }
}

// Connect to a TCP socket. The stream must be mutable so we can write to it.
let mut stream = match TcpStream::connect(("localhost", 5432)).await {
    Ok(s) =&gt; s,
    Err(err) =&gt; {
        error!("Error connecting to address: {}", err.to_string());
        return Err(ConnectionError::InternalError);
    }
};
let _ = send_startup(&amp;mut stream, "db_user", "my_db", "my_app").await;</code></pre>
            <p>With this approach, if we wanted to encrypt the stream using <a href="https://www.cloudflare.com/learning/ssl/transport-layer-security-tls/#:~:text=Transport%20Layer%20Security%2C%20or%20TLS,web%20browsers%20loading%20a%20website."><u>TLS</u></a> before we write to it (upgrading our existing <code>TcpStream</code> connection in-place, to an <code>SslStream</code>), we would only have to change the code we use to create the stream, while generating and sending the traffic would remain unchanged. This is because <code>SslStream</code> also implements <code>AsyncWrite</code>!</p>
            <pre><code>// We're handwaving the SSL setup here. You're welcome.
let conn_config = new_tls_client_config()?;

// Encrypt the TcpStream, returning an SslStream
let mut ssl_stream = match tokio_boring::connect(conn_config, domain, stream).await {
    Ok(s) =&gt; s,
    Err(err) =&gt; {
        error!("Error during websocket TLS handshake: {}", err.to_string());
        return Err(ConnectionError::InternalError);
    }
};
let _ = send_startup(&amp;mut ssl_stream, "db_user", "my_db", "my_app").await;</code></pre>
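            <p>The length arithmetic in the examples above follows from the StartupMessage layout: a 4-byte length that counts itself, a 4-byte protocol version, a run of null-terminated name/value pairs, and one final null byte. As a minimal sketch of the same idea using only the standard library, the version below swaps in the synchronous <code>std::io::Write</code> trait for <code>AsyncWrite</code>. The helper names <code>encode_startup</code> and <code>send_startup</code> are illustrative assumptions here, not Hyperdrive's actual code.</p>
            <pre><code>use std::io::{self, Write};

/// Protocol version 3.0: major version 3 in the high 16 bits, minor 0 in the low 16.
const PROTOCOL_3_0: i32 = 3 &lt;&lt; 16; // 196608

/// Encode a Postgres StartupMessage from (name, value) parameter pairs.
/// Hypothetical helper for illustration only.
pub fn encode_startup(params: &amp;[(&amp;str, &amp;str)]) -&gt; Vec&lt;u8&gt; {
    let mut body = Vec::new();
    body.extend_from_slice(&amp;PROTOCOL_3_0.to_be_bytes());
    for (name, value) in params {
        body.extend_from_slice(name.as_bytes());
        body.push(0); // null terminator after the parameter name
        body.extend_from_slice(value.as_bytes());
        body.push(0); // null terminator after the parameter value
    }
    body.push(0); // a final null byte ends the parameter list

    // The length prefix counts its own 4 bytes plus everything after it.
    let len = (body.len() + 4) as i32;
    let mut msg = Vec::with_capacity(body.len() + 4);
    msg.extend_from_slice(&amp;len.to_be_bytes());
    msg.extend_from_slice(&amp;body);
    msg
}

/// Write the message to any byte sink that implements `Write`.
pub fn send_startup&lt;W: Write&gt;(stream: &amp;mut W, params: &amp;[(&amp;str, &amp;str)]) -&gt; io::Result&lt;()&gt; {
    stream.write_all(&amp;encode_startup(params))
}</code></pre>
            <p>Because <code>Vec&lt;u8&gt;</code> implements <code>Write</code>, the same function can target an in-memory buffer in a test and a socket in production, which is the property the generic handler relies on.</p>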
            
    <div>
      <h2>Whence WebSocket</h2>
      <a href="#whence-websocket">
        
      </a>
    </div>
    <p><a href="https://datatracker.ietf.org/doc/html/rfc6455"><u>WebSocket</u></a> is an application layer protocol that enables bidirectional communication between a client and server. Typically, to establish a WebSocket connection, a client initiates an HTTP request and indicates they wish to upgrade the connection to WebSocket via the “Upgrade” header. Then, once the client and server complete the handshake, both parties can send messages over the connection until one of them terminates it.</p><p>Now, it turns out that the way Cloudflare Tunnels work under the hood is that both ends of the tunnel want to speak WebSocket, and rely on a translation layer to convert all traffic to or from WebSocket. The <code>cloudflared</code> daemon you spin up within your private network handles this for us! For Hyperdrive, however, we did not have a suitable translation layer to send Postgres messages across WebSocket, and had to write one.</p><p>One of the (many) fantastic things about Rust traits is that the contract they present is very clear. To be <code>AsyncRead</code>, you just need to implement <code>poll_read</code>. To be <code>AsyncWrite</code>, you need to implement only three functions (<code>poll_write</code>, <code>poll_flush</code>, and <code>poll_shutdown</code>). Further, there is excellent support for WebSocket in Rust built on top of the <a href="https://github.com/snapview/tungstenite-rs"><u>tungstenite-rs library</u></a>.</p><p>Thus, building our custom WebSocket stream such that it can share the same machinery as all our other generic streams just means translating the existing WebSocket support into these poll functions. There are some existing OSS projects that do this, but for multiple reasons we could not use the existing options. 
The primary reason is that Hyperdrive operates across multiple threads (thanks to the <a href="https://docs.rs/tokio/latest/tokio/runtime/index.html"><u>tokio runtime</u></a>), and so we rely on our connections to also handle <a href="https://doc.rust-lang.org/std/marker/trait.Send.html"><code><u>Send</u></code></a>, <a href="https://doc.rust-lang.org/std/marker/trait.Sync.html"><code><u>Sync</u></code></a>, and <a href="https://doc.rust-lang.org/std/marker/trait.Unpin.html"><code><u>Unpin</u></code></a>. None of the available solutions had all five traits handled. It turns out that most of them went with the paradigm of <a href="https://docs.rs/futures/latest/futures/sink/trait.Sink.html"><code><u>Sink</u></code></a> and <a href="https://docs.rs/futures/latest/futures/stream/trait.Stream.html"><code><u>Stream</u></code></a>, which provide a solid base from which to translate to <code>AsyncRead</code> and <code>AsyncWrite</code>. In fact some of the functions overlap, and can be passed through almost unchanged. For example, <code>poll_flush</code> and <code>poll_shutdown</code> have 1-to-1 analogs, and require almost no engineering effort to convert from <code>Sink</code> to <code>AsyncWrite</code>.</p>
            <pre><code>/// We use this struct to implement the traits we need on top of a WebSocketStream
pub struct HyperSocket&lt;S&gt;
where
    S: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
    inner: WebSocketStream&lt;S&gt;,
    read_state: Option&lt;ReadState&gt;,
    write_err: Option&lt;Error&gt;,
}

impl&lt;S&gt; AsyncWrite for HyperSocket&lt;S&gt;
where
    S: AsyncRead + AsyncWrite + Send + Sync + Unpin,
{
    fn poll_flush(mut self: Pin&lt;&amp;mut Self&gt;, cx: &amp;mut Context&lt;'_&gt;) -&gt; Poll&lt;io::Result&lt;()&gt;&gt; {
        match ready!(Pin::new(&amp;mut self.inner).poll_flush(cx)) {
            Ok(_) =&gt; Poll::Ready(Ok(())),
            Err(err) =&gt; Poll::Ready(Err(Error::new(ErrorKind::Other, err))),
        }
    }

    fn poll_shutdown(mut self: Pin&lt;&amp;mut Self&gt;, cx: &amp;mut Context&lt;'_&gt;) -&gt; Poll&lt;io::Result&lt;()&gt;&gt; {
        match ready!(Pin::new(&amp;mut self.inner).poll_close(cx)) {
            Ok(_) =&gt; Poll::Ready(Ok(())),
            Err(err) =&gt; Poll::Ready(Err(Error::new(ErrorKind::Other, err))),
        }
    }
}
</code></pre>
            <p>With that translation done, we can use an existing WebSocket library to upgrade our <code>SslStream</code> connection to a Cloudflare Tunnel, and wrap the result in our <code>AsyncRead/AsyncWrite</code> implementation. The result can then be used anywhere that our other transport streams would work, without any changes needed to the rest of our codebase! </p><p>That would look something like this:</p>
            <pre><code>// client_async returns the WebSocket stream together with the HTTP upgrade response
let websocket = match tokio_tungstenite::client_async(request, ssl_stream).await {
    Ok((ws, _response)) =&gt; ws,
    Err(err) =&gt; {
        error!("Error during websocket conn setup: {}", err.to_string());
        return Err(ConnectionError::InternalError);
    }
};
let mut websocket_stream = HyperSocket::new(websocket);
let _ = send_startup(&amp;mut websocket_stream, "db_user", "my_db", "my_app").await;</code></pre>
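            <p>The read side of such an adapter has one extra wrinkle: WebSocket delivers whole messages, while a byte stream reader may ask for fewer bytes than a message contains, so the remainder has to be kept for the next read. The sketch below illustrates that bookkeeping with the synchronous <code>std::io::Read</code> trait standing in for <code>AsyncRead</code>. The types <code>MessageSource</code> and <code>ByteAdapter</code> are hypothetical, and this is not how <code>HyperSocket</code> is actually written.</p>
            <pre><code>use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical stand-in for a WebSocket: a queue of whole binary messages.
pub struct MessageSource {
    pub messages: VecDeque&lt;Vec&lt;u8&gt;&gt;,
}

/// Adapts a message-oriented source to a byte-oriented `Read`.
pub struct ByteAdapter {
    source: MessageSource,
    /// The most recently received message.
    leftover: Vec&lt;u8&gt;,
    /// How much of `leftover` has already been handed to the caller.
    offset: usize,
}

impl ByteAdapter {
    pub fn new(source: MessageSource) -&gt; Self {
        ByteAdapter { source, leftover: Vec::new(), offset: 0 }
    }
}

impl Read for ByteAdapter {
    fn read(&amp;mut self, buf: &amp;mut [u8]) -&gt; io::Result&lt;usize&gt; {
        // Refill from the next message once the previous one is drained.
        // The loop also skips over empty messages.
        while self.offset == self.leftover.len() {
            match self.source.messages.pop_front() {
                Some(msg) =&gt; {
                    self.leftover = msg;
                    self.offset = 0;
                }
                None =&gt; return Ok(0), // no more messages: end of stream
            }
        }
        // Copy as much as fits and remember where we stopped.
        let n = buf.len().min(self.leftover.len() - self.offset);
        buf[..n].copy_from_slice(&amp;self.leftover[self.offset..self.offset + n]);
        self.offset += n;
        Ok(n)
    }
}</code></pre>
            <p>An asynchronous version would return <code>Poll::Pending</code> instead of blocking when no message is available yet, but the leftover-buffer logic would be much the same.</p>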
            
    <div>
      <h2>Access granted</h2>
      <a href="#access-granted">
        
      </a>
    </div>
    <p>An observant reader might have noticed that in the code example above we snuck in a variable named <code>request</code> that we passed in when upgrading from an <code>SslStream</code> to a <code>WebSocketStream</code>. There are two reasons for this. The first is that Tunnels are assigned a hostname and use this hostname for routing. The second and more interesting reason is that (as mentioned above) when negotiating an upgrade from HTTP to WebSocket, a request must be sent to the server hosting the ingress side of the Tunnel to <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Protocol_upgrade_mechanism"><u>perform the upgrade</u></a>. This is pretty universal, but we also add in an extra piece here.</p><p>At Cloudflare, we believe that <a href="https://blog.cloudflare.com/secure-by-default-understanding-new-cisa-guide/"><u>secure defaults</u></a> and <a href="https://www.cloudflare.com/learning/security/glossary/what-is-defense-in-depth/"><u>defense in depth</u></a> are the correct ways to build a better Internet. This is why traffic across Tunnels is encrypted, for example. However, that does not necessarily prevent unwanted traffic from being sent into your Tunnel, and therefore egressing out to your database. While Postgres offers a robust set of <a href="https://www.postgresql.org/docs/current/user-manag.html"><u>access control</u></a> options for protecting your database, wouldn’t it be best if unwanted traffic never got into your private network in the first place? </p><p>To that end, all <a href="https://developers.cloudflare.com/hyperdrive/configuration/connect-to-private-database/"><u>Tunnels set up for use with Hyperdrive</u></a> should have a <a href="https://developers.cloudflare.com/cloudflare-one/applications/"><u>Zero Trust Access Application</u></a> configured to protect them. These applications should use a <a href="https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/"><u>Service Token</u></a> to authorize connections. 
When setting up a new Hyperdrive, you have the option to provide the token’s ID and Secret, which will be encrypted and stored alongside the rest of your configuration. These will be presented as part of the WebSocket upgrade request to authorize the connection, allowing your database traffic through while preventing unwanted access.</p><p>This can be done within the request’s headers, and might look something like this:</p>
            <pre><code>let ws_url = format!("wss://{}", host);
let mut request = match ws_url.into_client_request() {
    Ok(req) =&gt; req,
    Err(err) =&gt; {
        error!(
            "Hostname {} could not be parsed into a valid request URL: {}", 
            host,
            err.to_string()
        );
        return ConnectionError::InternalError;
    }
};
request.headers_mut().insert(
    "CF-Access-Client-Id",
    http::header::HeaderValue::from_str(&amp;client_id).unwrap(),
);
request.headers_mut().insert(
    "CF-Access-Client-Secret",
    http::header::HeaderValue::from_str(&amp;client_secret).unwrap(),
);
</code></pre>
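            <p>To make the shape of that upgrade request concrete, here is a rough sketch of what it could look like on the wire, combining the standard RFC 6455 upgrade headers with the two Access service token headers. The function name <code>upgrade_request</code>, the request path <code>/</code>, and the hostname used in the usage note are assumptions made for illustration. In practice a WebSocket library builds this request and generates the <code>Sec-WebSocket-Key</code> nonce for you.</p>
            <pre><code>/// Render an HTTP/1.1 WebSocket upgrade request that also carries the
/// Access service token headers. Hypothetical helper for illustration only.
/// `ws_key` must be a base64-encoded 16-byte nonce supplied by the caller.
pub fn upgrade_request(host: &amp;str, ws_key: &amp;str, client_id: &amp;str, client_secret: &amp;str) -&gt; String {
    let headers = [
        ("Host", host),
        ("Upgrade", "websocket"),
        ("Connection", "Upgrade"),
        ("Sec-WebSocket-Key", ws_key),
        ("Sec-WebSocket-Version", "13"),
        ("CF-Access-Client-Id", client_id),
        ("CF-Access-Client-Secret", client_secret),
    ];

    // Request line first; the path "/" is an assumption for this sketch.
    let mut request = String::from("GET / HTTP/1.1\r\n");
    for &amp;(name, value) in headers.iter() {
        request.push_str(name);
        request.push_str(": ");
        request.push_str(value);
        request.push_str("\r\n");
    }
    request.push_str("\r\n"); // a blank line ends the header block
    request
}</code></pre>
            <p>Calling <code>upgrade_request("db-tunnel.example.com", key, id, secret)</code> yields a request whose last header is <code>CF-Access-Client-Secret</code>. Since the connection is already wrapped in TLS by this point, the token is not sent in the clear.</p>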
            
    <div>
      <h2>Building for customer zero</h2>
      <a href="#building-for-customer-zero">
        
      </a>
    </div>
    <p>If you’ve been reading the blog for a long time, some of this might sound a bit familiar. This isn’t the first time that we’ve <a href="https://blog.cloudflare.com/cloudflare-tunnel-for-postgres/"><u>sent Postgres traffic across a tunnel</u></a>; it’s something most of us do from our laptops regularly. This works very well for interactive use cases with low traffic volume and a high tolerance for latency, but historically most of our products have not been able to employ the same approach.</p><p>Cloudflare operates <a href="https://www.cloudflare.com/network/"><u>many data centers</u></a> around the world, and most services run in every one of those data centers. There are some tasks, however, that make the most sense to run in a more centralized fashion. These include tasks such as managing control plane operations, or storing configuration state. Nearly every Cloudflare product houses its control plane information in <a href="https://blog.cloudflare.com/performance-isolation-in-a-multi-tenant-database-environment/"><u>Postgres clusters</u></a> run centrally in a handful of our data centers, and we use a variety of approaches for accessing that centralized data from elsewhere in our network. For example, many services currently use a push-based model to publish updates to <a href="https://blog.cloudflare.com/moving-quicksilver-into-production/"><u>Quicksilver</u></a>, and work through the complexities implied by such a model. This has been a recurring challenge for any team looking to build a new product.</p><p>Hyperdrive’s entire reason for being is to make it easy to access such central databases from our global network. When we began exploring Tunnel integrations as a feature, many internal teams spoke up immediately and strongly suggested they’d be interested in using it themselves. This was an excellent opportunity for Cloudflare to scratch its own itch, while also getting a lot of traffic on a new feature before releasing it directly to the public. 
As always, being “customer zero” means that we get fast feedback, more reliability over time, stronger connections between teams, and an overall better suite of products. We jumped at the chance.</p><p>As we rolled out early versions of Tunnel integration, we worked closely with internal teams to get them access to it, and fixed any rough spots they encountered. We’re pleased to share that this first batch of teams have found great success building new or <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactored</a> products on Hyperdrive over Tunnels. For example: if you’ve already tried out <a href="https://blog.cloudflare.com/builder-day-2024-announcements/#continuous-integration-and-delivery"><u>Workers Builds</u></a>, or recently <a href="https://www.cloudflare.com/trust-hub/reporting-abuse/"><u>submitted an abuse report</u></a>, you’re among our first users!  At the time of this writing, we have several more internal teams working to onboard, and we on the Hyperdrive team are very excited to see all the different ways in which fast and simple connections from Workers to a centralized database can help Cloudflare just as much as they’ve been helping our external customers.</p>
    <div>
      <h2>Outro</h2>
      <a href="#outro">
        
      </a>
    </div>
    <p>Cloudflare is on a mission to make the Internet faster, safer, and more reliable. Hyperdrive was built to make connecting to centralized databases from the Workers runtime as quick and consistent as possible, and this latest development is designed to help all those who want to use Hyperdrive without directly exposing resources within their virtual private clouds (VPCs) on the public web.</p><p>To this end, we chose to build a solution around our suite of industry-leading <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Zero Trust</u></a> tools, and were delighted to find how simple it was to implement in our runtime given the power and extensibility of the Rust <code>trait</code> system. </p><p>Without waiting for the ink to dry, multiple teams within Cloudflare have adopted this new feature to quickly and easily solve what have historically been complex challenges, and are happily operating it in production today.</p><p>And now, if you haven't already, try <a href="https://developers.cloudflare.com/hyperdrive/configuration/connect-to-private-database/"><u>setting up Hyperdrive across a Tunnel</u></a>, and let us know what you think in the <a href="https://discord.com/channels/595317990191398933/1150557986239021106"><u>Hyperdrive Discord channel</u></a>!</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Hyperdrive]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[SQL]]></category>
            <category><![CDATA[Rust]]></category>
            <category><![CDATA[WebSockets]]></category>
            <guid isPermaLink="false">5GK429XQHhFzVyKSXGZ2R6</guid>
            <dc:creator>Andrew Repp</dc:creator>
            <dc:creator>Emilio Assunção</dc:creator>
            <dc:creator>Abhishek Chanda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Supporting Postgres Named Prepared Statements in Hyperdrive]]></title>
            <link>https://blog.cloudflare.com/postgres-named-prepared-statements-supported-hyperdrive/</link>
            <pubDate>Fri, 28 Jun 2024 13:00:09 GMT</pubDate>
            <description><![CDATA[ Hyperdrive (Cloudflare’s globally distributed SQL connection pooler and cache) recently added support for Postgres protocol-level named prepared statements across pooled connections. We dive deep into what it took to add this feature. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Hyperdrive (Cloudflare’s globally distributed SQL connection pooler and cache) recently added support for Postgres protocol-level named prepared statements across pooled connections. Named prepared statements allow Postgres to cache query execution plans, providing potentially substantial performance improvements. Further, many popular drivers in the ecosystem use these by default, meaning that not having them is a bit of a footgun for developers. We are very excited that Hyperdrive’s users will now have access to better performance and a more seamless development experience, without needing to make any significant changes to their applications!</p><p>While we're not the first connection pooler to add this support (<a href="https://www.pgbouncer.org/">PgBouncer</a> got to it in October 2023 in <a href="https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0">version 1.21</a>, for example), there were some unique challenges in how we implemented it. To that end, we wanted to do a deep dive on what it took for us to deliver this.</p>
    <div>
      <h3>Hyper-what?</h3>
      <a href="#hyper-what">
        
      </a>
    </div>
    <p>One of the classic problems of building on the web is that your users are everywhere, but your database tends to be in one spot.  Combine that with pesky limitations like network routing, or the speed of light, and you can often run into situations where your users feel the pain of having your database so far away. This can look like slower queries, slower startup times, and connection exhaustion as everything takes longer to accomplish.</p><p><a href="/hyperdrive-making-regional-databases-feel-distributed">Hyperdrive</a> is designed to make the centralized databases you already have feel like they’re global. We use our <a href="https://www.cloudflare.com/network/">global network</a> to get faster routes to your database, keep connection pools primed, and cache your most frequently run queries as close to users as possible.</p>
    <div>
      <h3>Postgres Message Protocol</h3>
      <a href="#postgres-message-protocol">
        
      </a>
    </div>
    <p>To understand exactly what the challenge with prepared statements is, it's first necessary to dig a bit into the <a href="https://www.postgresql.org/docs/current/protocol-flow.html">Postgres Message Protocol</a>. Specifically, we are going to take a look at the protocol for an “extended” query, which uses different message types and is a bit more complex than a “simple” query, but which is more powerful and thus more widely used.</p><p>A query using Hyperdrive might be coded something like this, but a lot goes on under the hood in order for Postgres to reliably return your response.</p>
            <pre><code>import postgres from "postgres";

// with Hyperdrive, we don't have to disable prepared statements anymore!
// const sql = postgres(env.HYPERDRIVE.connectionString, {prepare: false});

// make a connection, with the default postgres.js settings (prepare is set to true)
const sql = postgres(env.HYPERDRIVE.connectionString);

// This sends the query, and while it looks like a single action it contains several 
// messages implied within it
let [{ a, b, c, id }] = await sql`SELECT a, b, c, id FROM hyper_test WHERE id = ${target_id}`;</code></pre>
            <p>To prepare a statement, a Postgres client begins by sending a <i>Parse</i> message. This includes the query string, the number of parameters to be interpolated, and the statement's name. The name is a key piece of this puzzle. If it is empty, then Postgres uses a special "unnamed" prepared statement slot that gets overwritten on each new <i>Parse</i>. These are relatively easy to support, as most drivers will keep the entirety of a message sequence for unnamed statements together, and will not try to get too aggressive about reusing the prepared statement because it is overwritten so often.</p><p>If the statement has a name, however, then it is kept prepared for the remainder of the Postgres session (unless it is explicitly removed with <i>DEALLOCATE</i>). This is convenient because parsing a query string and preparing the statement costs bytes sent on the wire and CPU cycles to process, so reusing a statement is quite a nice optimization.</p><p>Once done with <i>Parse</i>, there are a few remaining steps to (the simplest form of) an extended query:</p><ul><li><p>A <i>Bind</i> message, which provides the specific values to be passed for the parameters in the statement (if any).</p></li><li><p>An <i>Execute</i> message, which tells the Postgres server to actually perform the data retrieval and processing.</p></li><li><p>And finally a <i>Sync</i> message, which causes the server to close the implicit transaction and return results, and which provides a synchronization point for error handling.</p></li></ul><p>While that is the core pattern for accomplishing an extended protocol query, there are many more complexities possible (named <i>Portal</i>, <i>ErrorResponse</i>, etc.).</p><p>We will briefly mention one other complexity we often encounter in this protocol, which is <i>Describe</i> messages. Many drivers leverage Postgres’ built-in types to help with deserialization of the results into structs or classes. 
This is accomplished by sending a <i>Parse-Describe-Flush/Sync</i> sequence, which will send a statement to be prepared, and will expect back information about the types and data the query will return. This complicates bookkeeping around named prepared statements, as now there are two separate queries, with two separate kinds of responses, that must be kept track of. We won’t go into much depth on the tradeoffs of an additional round-trip in exchange for advanced information about the results’ format, but suffice it to say that it must be handled explicitly in order for the overall system to gracefully support prepared statements.</p><p>So the basic query from our code above looks like this from a message perspective:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5ztflSTbecT9o4QU3YLDW3/4ee2003396e5b15a15dd2cb63cdd2711/unnamed-4.png" />
            
            </figure><p>A <a href="https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY">more complete description</a> and the <a href="https://www.postgresql.org/docs/current/protocol-message-formats.html">full structure of each message type</a> are well described in the Postgres documentation.</p><p>So, what's so hard about that?</p>
    <div>
      <h3>Buffering Messages</h3>
      <a href="#buffering-messages">
        
      </a>
    </div>
    <p>The first challenge that Hyperdrive must solve (that many other connection poolers don't have) is that it's also a cache.</p><p>The happiest path for a query on Hyperdrive never travels far, and we are quite proud of the low latency of our cache hits. However, this presents a particular challenge in the case of an extended protocol query. A <i>Parse</i> by itself is insufficient as a cache key, both because the parameter values in the <i>Bind</i> messages can alter the expected results, and because it might be followed up with either a <i>Describe</i> or an <i>Execute</i> message which will invoke drastically different responses.</p><p>So Hyperdrive cannot simply pass each message to the origin database, as we must buffer them in a message log until we have enough information to reliably distinguish between cache keys. It turns out that receiving a <i>Sync</i> is quite a natural point at which to check whether you have enough information to serve a response. For most scenarios, we buffer until we receive a <i>Sync</i>, and then (assuming the scenario is cacheable) we determine whether we can serve the response from cache or we need to take a connection to the origin database.</p>
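            <p>The buffering described above can be sketched in a few lines. The example below is a deliberately simplified model: the <code>Message</code> enum, the <code>MessageLog</code> type, and the cache key format are all assumptions invented for illustration, and real protocol messages carry more fields than shown. It only demonstrates the idea of holding messages until a <i>Sync</i> arrives and then deriving a key from the query text plus the bound parameter values.</p>
            <pre><code>/// Simplified frontend messages; real messages carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Parse { name: String, query: String },
    Bind { statement: String, params: Vec&lt;String&gt; },
    Describe { statement: String },
    Execute,
    Sync,
}

/// Buffers messages until a Sync arrives, then hands back the whole batch.
#[derive(Default)]
pub struct MessageLog {
    pending: Vec&lt;Message&gt;,
}

impl MessageLog {
    /// Returns `Some(batch)` once a Sync completes the sequence, else `None`.
    pub fn push(&amp;mut self, msg: Message) -&gt; Option&lt;Vec&lt;Message&gt;&gt; {
        let is_sync = msg == Message::Sync;
        self.pending.push(msg);
        if is_sync {
            // Hand the batch to the caller and leave an empty log behind.
            Some(std::mem::take(&amp;mut self.pending))
        } else {
            None
        }
    }
}

/// Build a toy cache key. In this sketch only batches that contain both a
/// Parse and an Execute are treated as cacheable; a Describe-only batch is not.
pub fn cache_key(batch: &amp;[Message]) -&gt; Option&lt;String&gt; {
    let mut query = None;
    let mut params: &amp;[String] = &amp;[];
    let mut executes = false;
    for msg in batch {
        match msg {
            Message::Parse { query: q, .. } =&gt; query = Some(q.as_str()),
            Message::Bind { params: p, .. } =&gt; params = p.as_slice(),
            Message::Execute =&gt; executes = true,
            _ =&gt; {}
        }
    }
    if !executes {
        return None;
    }
    // The same query text with different parameter values yields a different key.
    query.map(|q| format!("{}|{}", q, params.join(",")))
}</code></pre>
            <p>Note that the key cannot be computed from the <i>Parse</i> alone: it needs the <i>Bind</i> parameters and the knowledge that an <i>Execute</i> (rather than a <i>Describe</i>) followed, which is why waiting for <i>Sync</i> is a natural decision point.</p>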
    <div>
      <h3>Taking a Connection From the Pool</h3>
      <a href="#taking-a-connection-from-the-pool">
        
      </a>
    </div>
    <p>Assuming we aren't serving a response from cache, for whatever reason, we'll need to take an origin connection from our pool. One of the key advantages any connection pooler offers is in allowing many client connections to share few database connections, so minimizing how often and for how long these connections are held is crucial to making Hyperdrive performant.</p><p>To this end, <a href="https://developers.cloudflare.com/hyperdrive/configuration/how-hyperdrive-works/#connection-pooling">Hyperdrive operates</a> in what is traditionally called “transaction mode”. This means that a connection taken from the pool for any given transaction is returned once that transaction concludes. This is in contrast to what is often called “session mode”, where once a connection is taken from the pool it is held by the client until the client disconnects.</p><p>For Hyperdrive, allowing any client to take any database connection is vital. This is because if we "pin" a client to a given database connection then we have one fewer available for every other possible client. You can run yourself out of database connections very quickly once you start down that path, especially when your clients are many small Workers spread around the world.</p><p>The challenge prepared statements present to this scenario is that they exist at the "session" scope, which is to say, at the scope of one connection. If a client prepares a statement on connection A, but tries to reuse it and gets assigned connection B, Postgres will naturally throw an error claiming the statement doesn't exist in the given session. No results will be returned, the client is unhappy, and all that's left is to retry with a <i>Parse</i> message included. This causes extra round-trips between client and server, defeating the whole purpose of what is meant to be an optimization.</p><p>One of the goals of a connection pooler is to be as transparent to the client and server as possible. 
There are limitations, as Postgres will let you do some powerful things to session state that cannot be reasonably shared across arbitrary client connections, but to the extent possible the endpoints should not have to know or care about any multiplexing happening between them.</p><p>This means that when a client sends a <i>Parse</i> message on its connection, it should expect that the statement will be available for reuse when it wants to send a <i>Bind-Execute-Sync</i> sequence later on. It also means that the server should not get <i>Bind</i> messages for statements that only exist on some other session. Maintaining this illusion is the crux of providing support for this feature.</p>
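    <p>To make the failure mode concrete, here is a minimal TypeScript sketch. It is a toy model and not Hyperdrive's code: the <code>Connection</code> type and the <code>parse</code> and <code>bind</code> helpers are hypothetical stand-ins for a pooled Postgres session and the protocol messages of the same names.</p>
            <pre><code>// Hypothetical model: each pooled connection tracks its own prepared statements.
type Connection = { id: string; prepared: string[] };

// Parse: prepare a named statement on one specific connection (session scope).
function parse(conn: Connection, name: string): void {
  if (!conn.prepared.includes(name)) conn.prepared.push(name);
}

// Bind: succeeds only if this same connection has already seen the Parse.
function bind(conn: Connection, name: string): string {
  return conn.prepared.includes(name)
    ? "BindComplete"
    : `ERROR: prepared statement "${name}" does not exist`;
}

const connA: Connection = { id: "A", prepared: [] };
const connB: Connection = { id: "B", prepared: [] };

parse(connA, "get_user"); // the first transaction lands on connection A
const sameConn = bind(connA, "get_user"); // reuse on A works
const otherConn = bind(connB, "get_user"); // a later transaction lands on B

console.log(sameConn); // BindComplete
console.log(otherConn); // ERROR: prepared statement "get_user" does not exist</code></pre>
            <p>In transaction mode the client has no say in whether it gets A or B, which is why the pooler has to paper over the difference.</p>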
    <div>
      <h3>Putting it all together</h3>
      <a href="#putting-it-all-together">
        
      </a>
    </div>
    <p>So, what does the solution look like? If a client sends <i>Parse-Bind-Execute-Sync</i> with a named prepared statement, then later sends <i>Bind-Execute-Sync</i> to reuse it, how can we make sure that everything happens as expected? The solution, it turns out, needs just a few off-the-shelf Rust data structures for efficiently capturing what we need (a <a href="https://doc.rust-lang.org/std/collections/struct.HashMap.html">HashMap</a> and a <a href="https://doc.rust-lang.org/std/collections/struct.VecDeque.html">VecDeque</a> from the standard library, plus some <a href="https://docs.rs/lru/latest/lru/struct.LruCache.html"><i>LruCaches</i></a> from the <code>lru</code> crate), and some straightforward business logic to keep track of when to intervene in the messages being passed back and forth.</p><p>Whenever a named <i>Parse</i> comes in, we store it in an in-memory <i>HashMap</i> on the server that handles message processing for that client’s connection. This persists until the client is disconnected. This means that whenever we see anything referencing the statement, we can go retrieve the complete message defining it. We'll come back to this in a moment.</p><p>Once we've buffered all the messages we can and gotten to the point where it's time to return results (let's say because the client sent a <i>Sync</i>), we need to start applying some logic. 
For the sake of brevity we're going to omit talking through error handling here, as it does add some significant complexity but is somewhat out of scope for this discussion.</p><p>There are two main questions that determine how we should proceed:</p><ol><li><p>Does our message sequence include a <i>Parse</i>, or are we trying to reuse a pre-existing statement?</p></li><li><p>Do we have a cache hit or are we serving from the origin database?</p></li></ol><p>This gives us four scenarios to consider:</p><ol><li><p><i>Parse</i> with cache hit</p></li><li><p><i>Parse</i> with cache miss</p></li><li><p>Reuse with cache hit</p></li><li><p>Reuse with cache miss</p></li></ol><p>A <i>Parse</i> with a cache hit is the easiest path to address, as we don't need to do anything special. We use the messages sent as a cache key, and serve the results back to the client. We will still keep the <i>Parse</i> in our <i>HashMap</i> in case we want it later (#2 below), but otherwise we're good to go.</p>
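    <p>The two questions above, and the per-client store of named <i>Parse</i> messages, can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions rather than the Rust implementation: the <code>Message</code> shape, the <code>ClientStatements</code> class and the <code>classify</code> function are hypothetical names, and a plain array stands in for the <i>HashMap</i>.</p>
            <pre><code>// Hypothetical, simplified message shape; real protocol messages carry more fields.
type Message = {
  kind: "Parse" | "Bind" | "Execute" | "Sync";
  statement?: string; // prepared statement name (empty or absent means unnamed)
  query?: string; // SQL text, only present on Parse
};

type Scenario = "parse-hit" | "parse-miss" | "reuse-hit" | "reuse-miss";

// Per-client store of named Parse messages, kept until the client disconnects.
class ClientStatements {
  private parses: Message[] = [];

  remember(batch: Message[]): void {
    for (const m of batch) {
      if (m.kind !== "Parse" || !m.statement) continue; // only named Parses
      this.parses = this.parses.filter((p) => p.statement !== m.statement);
      this.parses.push(m);
    }
  }

  lookup(name: string): Message | undefined {
    return this.parses.find((p) => p.statement === name);
  }
}

// Question 1: does the batch carry its own Parse? Question 2: did the cache hit?
function classify(batch: Message[], cacheHit: boolean): Scenario {
  const hasParse = batch.some((m) => m.kind === "Parse");
  if (hasParse) return cacheHit ? "parse-hit" : "parse-miss";
  return cacheHit ? "reuse-hit" : "reuse-miss";
}

const statements = new ClientStatements();
const first: Message[] = [
  { kind: "Parse", statement: "get_user", query: "SELECT * FROM users WHERE id = $1" },
  { kind: "Bind", statement: "get_user" },
  { kind: "Execute" },
  { kind: "Sync" },
];
const later: Message[] = first.slice(1); // Bind-Execute-Sync only

statements.remember(first);
console.log(classify(first, false)); // parse-miss
console.log(classify(later, true)); // reuse-hit</code></pre>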
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/33CSY4vk0u6lYLkKBRW7XH/291ea17f55196c35caee8d29d0f733a6/unnamed--1--4.png" />
            
            </figure><p>A <i>Parse</i> with a cache miss is a bit more complicated, as now we need to send these messages to the origin server. We take a connection at random from our pool and do so, passing the results back to the client. With that, we've begun to make changes to session state such that our database connections are no longer all identical to each other. To keep track of what we've done to muddy up our state, we keep an <i>LruCache</i> on each connection recording which statements it already has prepared. In the case where we need to evict from such a cache, we will also <i>DEALLOCATE</i> the statement on the connection to keep things tracked correctly.</p>
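    <p>The per-connection bookkeeping might look roughly like the sketch below. It is a hand-rolled least-recently-used list in TypeScript, standing in for the Rust <i>LruCache</i>. The <code>PreparedOnConnection</code> class is hypothetical, and emitting a SQL <code>DEALLOCATE</code> string with a double-quoted name is an assumption made to keep the example small.</p>
            <pre><code>// Tracks which statements one pooled connection has prepared,
// ordered from least to most recently used.
class PreparedOnConnection {
  private names: string[] = [];
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = Math.max(1, capacity); // assume room for at least one statement
  }

  has(name: string): boolean {
    return this.names.includes(name);
  }

  // Record that `name` is being used on this connection. Returns any SQL that
  // should run on the connection so its session state matches our record.
  use(name: string): string[] {
    const cleanup: string[] = [];
    const at = this.names.indexOf(name);
    if (at >= 0) {
      this.names.splice(at, 1); // already prepared: only refresh its recency
    } else if (this.names.length >= this.capacity) {
      const evicted = this.names.shift();
      if (evicted !== undefined) cleanup.push(`DEALLOCATE "${evicted}"`);
    }
    this.names.push(name);
    return cleanup;
  }
}

const conn = new PreparedOnConnection(2);
conn.use("a");
conn.use("b");
conn.use("a"); // "b" is now the least recently used
const cleanup = conn.use("c"); // no room left, so "b" is evicted
console.log(cleanup); // [ 'DEALLOCATE "b"' ]
console.log(conn.has("b")); // false</code></pre>
            <p>Bounding the list keeps a long-lived connection from accumulating statements forever, at the price of an occasional extra <i>Parse</i> when an evicted statement comes back.</p>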
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3NFnGuHT6uAmr2dMRdp41m/a95422a38aa44e7720cbdcca7bd73513/unnamed--2--2.png" />
            
            </figure><p>Reuse with a cache hit is yet more tricky, but still straightforward enough. In the example below, we are sent a <i>Bind</i> with the same parameters twice (#1 and #9). We must identify that we received a <i>Bind</i> without a preceding <i>Parse</i>, we must go retrieve that <i>Parse</i> (#10), and we must use the information from it to build our cache key. Once all that is accomplished, we can serve our results from cache, needing only to trim out the <i>ParseComplete</i> within the cached results before returning them to the client.</p>
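    <p>Separately from the numbered diagram that follows, here is a minimal TypeScript sketch of the same idea. Everything in it is an assumption made for illustration: replies are modelled as message names, <code>storedParses</code> and <code>resultCache</code> are hypothetical in-memory maps, and the cache key is simply the SQL text plus the bound parameters.</p>
            <pre><code>// Hypothetical stand-ins: statement name to SQL text, and cache key to cached replies.
const storedParses = new Map();
const resultCache = new Map();

// Assumed cache key: the SQL text plus the bound parameter values.
function cacheKey(query: string, params: string[]): string {
  return JSON.stringify([query, params]);
}

// Serve a Bind-Execute-Sync that arrived without a Parse, if the cache allows it.
function serveReuse(statement: string, params: string[]): string[] | null {
  const query: string | undefined = storedParses.get(statement);
  if (query === undefined) return null; // unknown statement: nothing to rebuild
  const cached: string[] | undefined = resultCache.get(cacheKey(query, params));
  if (cached === undefined) return null; // cache miss: fall through to the origin
  // The cached replies answered a Parse that the client did not send this time.
  return cached.filter((reply) => reply !== "ParseComplete");
}

// An earlier Parse-Bind-Execute-Sync populated both structures.
const sql = "SELECT name FROM users WHERE id = $1";
storedParses.set("get_user", sql);
resultCache.set(cacheKey(sql, ["42"]), [
  "ParseComplete", "BindComplete", "DataRow", "CommandComplete", "ReadyForQuery",
]);

console.log(serveReuse("get_user", ["42"])); // same parameters: served from cache
console.log(serveReuse("get_user", ["7"])); // different parameters: null</code></pre>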
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3ksUndQnXpbjzu6Ggm3veo/9fa185c8c8a26829a1c4894efd24ccaa/unnamed--3--2.png" />
            
            </figure><p>Reuse with a cache miss is the hardest scenario, as it may require us to lie in both directions. In the example below, we cache results for one set of parameters (#8), but are sent a <i>Bind</i> with different parameters (#9). As in the cache hit scenario, we must identify that we were not sent a <i>Parse</i> as part of the current message sequence, retrieve it from our <i>HashMap</i> (#10), and build our cache key to GET from cache and confirm the miss (#11). Once we take a connection from the pool, though, we then need to check if it already has the statement we want prepared. If not, we must take our saved <i>Parse</i> and prepend it to our message log to be sent along to the origin database (#13). Thus, what the server receives looks like a perfectly valid <i>Parse-Bind-Execute-Sync</i> sequence. This is where our <i>VecDeque</i> (mentioned above) comes in, as converting our message log to that structure lets us make such changes ergonomically, without needing to rebuild the whole byte sequence. Once we receive the response from the server, all that's needed is to trim out the initial <i>ParseComplete</i> response from the server, as a well-made client would likely be very confused receiving such a response to a <i>Parse</i> it didn't send. With that message trimmed out, however, the client is in the position of getting exactly what it asked for, and both sides of the conversation are happy.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lWQHcpash88r4sjmT3Thq/993c69dfe03c4a8bb0c4e472df2b4a7a/unnamed--4--1.png" />
            
            </figure>
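    <p>The "lie in both directions" can be sketched as two small functions. This is a hedged TypeScript illustration, not the production code: <code>forOrigin</code> and <code>forClient</code> are hypothetical names, messages and replies are reduced to their names, and an array with a prepended element stands in for the <i>VecDeque</i>.</p>
            <pre><code>type Msg = { kind: string; statement?: string };

// Decide what the origin should receive. `savedParse` comes from the per-client
// store; `connectionHasIt` reflects the per-connection record of prepared statements.
function forOrigin(batch: Msg[], savedParse: Msg, connectionHasIt: boolean) {
  const hasParse = batch.some((m) => m.kind === "Parse");
  if (hasParse || connectionHasIt) return { messages: batch, injected: false };
  return { messages: [savedParse, ...batch], injected: true };
}

// Hide the reply to a Parse that the client never sent.
function forClient(replies: string[], injected: boolean): string[] {
  if (!injected) return replies;
  const at = replies.indexOf("ParseComplete");
  if (at === -1) return replies;
  return [...replies.slice(0, at), ...replies.slice(at + 1)];
}

const saved: Msg = { kind: "Parse", statement: "get_user" };
const batch: Msg[] = [
  { kind: "Bind", statement: "get_user" },
  { kind: "Execute" },
  { kind: "Sync" },
];

// The chosen connection has not prepared the statement, so we sneak in the Parse.
const plan = forOrigin(batch, saved, false);
console.log(plan.messages.map((m) => m.kind).join("-")); // Parse-Bind-Execute-Sync

const fromOrigin = ["ParseComplete", "BindComplete", "DataRow", "CommandComplete", "ReadyForQuery"];
console.log(forClient(fromOrigin, plan.injected).join(", "));</code></pre>
            <p>When the connection already has the statement prepared, <code>forOrigin</code> forwards the batch untouched and <code>forClient</code> passes the replies through unchanged.</p>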
    <div>
      <h3>Dénouement</h3>
      <a href="#denouement">
        
      </a>
    </div>
    <p>Now that we've got a working solution, where all parties are functioning well, let's review! Our solution lets us share database connections across arbitrary clients with no "pinning", no custom handling on either client or server, and supports reuse of prepared statements to reduce CPU load on re-parsing queries and reduce network traffic on re-sending <i>Parse</i> messages. Engineering always involves tradeoffs, so the cost of this is that we will sometimes still need to sneak in a <i>Parse</i> because a client got assigned a different connection on reuse, and in those scenarios there is a small amount of additional memory overhead because the same statement is prepared on multiple connections.</p><p>And now, if you haven't already, go give <a href="https://developers.cloudflare.com/hyperdrive/">Hyperdrive</a> a spin, and let us know what you think in the <a href="https://discord.com/channels/595317990191398933/1150557986239021106">Hyperdrive Discord channel</a>!</p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Hyperdrive]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[SQL]]></category>
            <category><![CDATA[Message Protocol]]></category>
            <category><![CDATA[Prepared Statements]]></category>
            <guid isPermaLink="false">65jmeFCIBZN3YdPbpEwmY1</guid>
            <dc:creator>Andrew Repp</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Relational Database Connectors]]></title>
            <link>https://blog.cloudflare.com/relational-database-connectors/</link>
            <pubDate>Mon, 15 Nov 2021 13:59:29 GMT</pubDate>
            <description><![CDATA[ Customers can connect to a Postgres or MySQL database directly from their Workers using a Cloudflare Tunnel today. In the future, you can use Database Connectors to achieve this natively using a standardized Socket API. ]]></description>
            <content:encoded><![CDATA[ <p>At Cloudflare, we’re building the best compute platform in the world. We want to make it easy, seamless, and obvious to build your applications with us. But simply making the best compute platform is not enough — at the heart of your applications are the data they interact with.</p><p>Cloudflare has multiple data storage solutions available today: <a href="/introducing-workers-kv/">Workers KV</a>, <a href="/introducing-r2-object-storage/">R2</a>, and <a href="/introducing-workers-durable-objects/">Durable Objects</a>. All three follow Cloudflare’s design goals for Workers: global by default, infinitely scalable, and delightful for developers to use. We’ve partnered with third-party storage solutions like Fauna, MongoDB and Prisma, who have built data platforms that align beautifully with our design goals, and we’ve written tutorials for databases that already support HTTP connections.</p><p>The one area that’s been sorely missed: relational databases. Cloudflare itself runs on relational databases, and we’re not alone. In April, we asked <a href="https://workers.cloudflare.com/node">which Node libraries</a> you wanted us to support, and <b>four of the top five requests</b> were related to databases. For this Full Stack Week, we asked ourselves: how could we support relational databases in a way that aligned with our design goals?</p><p>Today, we’re taking a first step towards that world by announcing support for connecting to relational databases, including Postgres and MySQL, from Workers.</p><p>Connecting to a database is no simple task — if it were as easy as passing a connection string to a database driver, we would have already done it. We’ve had to overcome several hurdles to reach this point, and have several more still to conquer.</p><p>Our goal with this announcement is to work with you, our developers, to solve the unique pain points that come from accessing databases inside Workers. 
If you’d like to work with us, fill out <a href="https://www.cloudflare.com/database-connectors-early-access">this form</a> or join us <a href="https://discord.gg/rH4SsffFcc">on Discord</a> — this is just the beginning. If you’d just like to grab the code and play around, use this <a href="https://developers.cloudflare.com/workers/tutorials/query-postgres-from-workers-using-database-connectors">example</a> to get started connecting to your own database, or check out our demo.</p>
    <div>
      <h3>Why are Database Connectors so hard to build?</h3>
      <a href="#why-are-database-connectors-so-hard-to-build">
        
      </a>
    </div>
    <p>Serverless database connections are challenging to support for several reasons.</p><p>Databases are needy — they often require TCP connections, since they assume long-lived connections between an application server and the database. The Workers runtime doesn’t currently support TCP connections, so we’ve only been able to support HTTP-based databases or proxies.</p><p>Like a relationship, establishing a connection isn’t quite enough. Developers use client libraries for databases to make submitting queries and managing the responses easy. Since the Workers runtime is not entirely Node.js compatible, we need to either roll our own database library or find one that does not use unsupported built-in libraries.</p><p>Finally, databases are sensitive. It often takes external libraries to manage shared connections between an application server and a database, since these connections tend to be expensive to establish.</p>
    <div>
      <h3>Moving past these challenges</h3>
      <a href="#moving-past-these-challenges">
        
      </a>
    </div>
    <p>Our approach today gives us the foundation to address each of these challenges in creative ways going forward.</p><p>First, we’re leveraging <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup">cloudflared</a> to create a secure tunnel between Cloudflare and a private network within your existing infrastructure. Cloudflared already supports proxying HTTP to TCP over WebSockets. Our challenge is providing interfaces that look like the socket interfaces existing libraries expect, while rewiring the implementations to redirect reads and writes to our WebSocket. This method is fast, safe, and secure, but limiting in that we lack control of where to direct the final connections. This is a problem we will solve soon, but until then our approach is essential to gathering latency and performance data to see where else we need to improve.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/vVs0KefQQxbNEt3VsNHHj/41225487a689b82c04c9ef7beb7d8ae2/unnamed-10.png" />
            
            </figure><p>Next, we’ve created a shim layer that adapts the socket API from a popular runtime to connect directly to databases using a WebSocket. This allows us to bundle code as-is, without forking or otherwise making significant changes to the database library. As part of this announcement, we’ve published a <a href="https://developers.cloudflare.com/workers/tutorials/query-postgres-from-workers-using-database-connectors">tutorial</a> on how to connect to and query a Postgres database from your Workers, using existing Cloudflare technology and a driver from the growing community at Deno. We’re excited to work with the upstream maintainers on expanding support.</p><p>Finally, we’re most excited for how this approach will let us begin to manage connection pooling and connection establishment overhead. While our current tech demo requires setting up the Cloudflare Tunnel on your own infrastructure, we’re looking for customers who’d like to pilot a model where Cloudflare hosts the tunnel for you.</p>
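    <p>To give a feel for what such a shim has to do, here is a minimal TypeScript sketch. It is not the code from the tutorial: <code>WebSocketLike</code>, <code>SocketShim</code> and <code>deliver</code> are hypothetical names, and the reads are synchronous to keep the example short, whereas real socket APIs are asynchronous. The point it illustrates is that a WebSocket delivers whole messages while a database driver expects a byte stream, so the shim has to buffer frames and split them across reads.</p>
            <pre><code>// Minimal shape of the WebSocket this sketch relies on (a real WebSocket has more).
type WebSocketLike = {
  send(data: Uint8Array): void;
  onmessage: ((event: { data: Uint8Array }) => void) | null;
};

// Hypothetical shim: byte-stream reads and writes on top of message-based frames.
class SocketShim {
  private frames: Uint8Array[] = [];
  private ws: WebSocketLike;

  constructor(ws: WebSocketLike) {
    this.ws = ws;
    ws.onmessage = (event) => {
      this.frames.push(event.data);
    };
  }

  // Socket-style write: forward the bytes as one binary frame.
  write(bytes: Uint8Array): number {
    this.ws.send(bytes);
    return bytes.length;
  }

  // Socket-style read: fill `into` from buffered frames and return the byte count.
  // A frame larger than `into` is split, because callers expect a stream.
  read(into: Uint8Array): number {
    let copied = 0;
    while (into.length > copied) {
      const head = this.frames[0];
      if (head === undefined) break; // nothing buffered yet
      const n = Math.min(head.length, into.length - copied);
      into.set(head.subarray(0, n), copied);
      copied += n;
      if (n === head.length) this.frames.shift();
      else this.frames[0] = head.subarray(n);
    }
    return copied;
  }
}

// A fake WebSocket that records what was sent, for trying the shim locally.
const sent: Uint8Array[] = [];
const fake: WebSocketLike = { send: (data) => { sent.push(data); }, onmessage: null };
const socket = new SocketShim(fake);

function deliver(ws: WebSocketLike, bytes: number[]): void {
  if (ws.onmessage) ws.onmessage({ data: new Uint8Array(bytes) });
}

socket.write(new Uint8Array([9]));
deliver(fake, [1, 2, 3]);
deliver(fake, [4]);
const small = new Uint8Array(2);
const firstRead = socket.read(small); // takes 1, 2 and leaves 3 buffered
const rest = new Uint8Array(4);
const secondRead = socket.read(rest); // takes 3, 4
console.log(firstRead, Array.from(small), secondRead, Array.from(rest.subarray(0, secondRead)));</code></pre>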
    <div>
      <h3>Where we’re going</h3>
      <a href="#where-were-going">
        
      </a>
    </div>
    <p>We’re just getting started. Our goal with today’s announcement is to find customers who are looking to build new applications or migrate existing applications to Workers while working with data that’s stored in a relational database.</p><p>Just as Cloudflare started by providing security, performance, and reliability for customers’ websites, we’re excited about a future where Cloudflare manages database connections, handles replication of data across cloud providers and provides low-latency access to data globally.</p><p>First, we’re looking to add <a href="/introducing-socket-workers/">support for TCP into the runtime natively</a>. With native support for TCP we’ll not only have better support for databases, but expand the Workers runtime to work with data infrastructure more broadly.</p><p>Our position in the network layer of the stack makes providing performance, security benefits and greatly reduced egress costs to global databases all possible realities. To do so, we’ll repurpose the HTTP to TCP proxy service that we’ve currently built and run it for developers as a connection pooling service, managing connections to their databases on their behalf.</p><p>Finally, our network makes caching data and making it accessible globally at low latency possible. Once we have connections back to your data, making it globally accessible in Cloudflare’s network will unlock fundamentally new architectures for distributed data.</p>
    <div>
      <h3>Take our connectors for a spin</h3>
      <a href="#take-our-connectors-for-a-spin">
        
      </a>
    </div>
    <p>Want to check things out? There are three main steps to getting up-and-running:</p><ol><li><p>Deploying cloudflared within your infrastructure.</p></li><li><p>Deploying a database that connects to cloudflared.</p></li><li><p>Deploying a Worker with the database driver that submits queries.</p></li></ol><p>The Postgres tutorial is available <a href="https://developers.cloudflare.com/workers/tutorials/query-postgres-from-workers-using-database-connectors">here</a>.</p><p>When you’re all done, it’ll look a little something like this:</p>
            <pre><code>import { Client } from './driver/postgres/postgres'

export default {
  async fetch(request: Request, env, ctx: ExecutionContext) {
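    try {
      // Connection settings are placeholders; point them at your own setup.
      const client = new Client({
        user: 'postgres',
        database: 'postgres',
        hostname: 'https://db.example.com',
        password: '',
        port: 5432,
      })
      await client.connect()
      // queryArray returns each row as an array of column values.
      const result = await client.queryArray('SELECT * FROM users WHERE uuid=1;')
      // Close the connection after the response has been returned.
      ctx.waitUntil(client.end())
      return new Response(JSON.stringify(result.rows[0]))
    } catch (e) {
      // Report failures with a 500 instead of the default 200 status.
      return new Response((e as Error).message, { status: 500 })
    }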
  },
}</code></pre>
            <p>Hit any snags? Fill out <a href="https://www.cloudflare.com/database-connectors-early-access">this form</a>, join <a href="https://discord.gg/rH4SsffFcc">our Discord</a> or shoot us an <a>email</a> and let’s chat!</p> ]]></content:encoded>
            <category><![CDATA[Full Stack Week]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">CinJ7mVjQHrKXIimcCbNR</guid>
            <dc:creator>Kabir Sikand</dc:creator>
            <dc:creator>Greg McKeon</dc:creator>
            <dc:creator>Ben Yule</dc:creator>
        </item>
        <item>
            <title><![CDATA[Modernizing a familiar approach to REST APIs, with PostgreSQL and Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/modernizing-a-familiar-approach-to-rest-apis-with-postgresql-and-cloudflare-workers/</link>
            <pubDate>Wed, 04 Aug 2021 12:56:38 GMT</pubDate>
            <description><![CDATA[ By using PostgREST with Postgres, we can build REST API-based applications. In particular, it's an excellent fit for Cloudflare Workers, our serverless function platform. Workers is a great place to build REST APIs. ]]></description>
            <content:encoded><![CDATA[ <p><a href="http://postgresql.com/">Postgres</a> is a ubiquitous open-source database technology. It contains a vast number of features and offers rock-solid reliability. It's also one of the most popular <a href="https://www.cloudflare.com/developer-platform/products/d1/">SQL database tools</a> in the industry. As the industry builds “modern” developer experience tools—real-time and highly interactive—Postgres has also served as a great foundation. Projects like <a href="https://hasura.io/">Hasura</a>, which offers a real-time GraphQL engine, and <a href="https://supabase.io/">Supabase</a>, an open-source Firebase alternative, use Postgres under the hood. This makes Postgres a technology that every developer should know, and consider using in their applications.</p><p>For many developers, REST APIs serve as the primary way we interact with our data. Language-specific libraries like <a href="https://node-postgres.com"><code>pg</code></a> allow developers to connect with Postgres in their code, and directly interact with their databases. Yet in almost every case, developers reinvent the wheel, building the same connection logic on an app-by-app basis.</p><p>Many developers building applications with <a href="https://workers.cloudflare.com/">Cloudflare Workers</a>, our serverless functions platform, have asked how they can use Postgres in Workers functions. Today, we're releasing <a href="https://developers.cloudflare.com/workers/tutorials/postgres">a new tutorial for Workers</a> that shows how to connect to Postgres inside Workers functions. 
Using <a href="http://postgrest.com/">PostgREST</a>, you'll write a REST API that communicates directly with your database, on the edge.</p><p>This means that you can build applications entirely on Cloudflare’s edge — using Workers as a performant and globally-distributed API, and <a href="https://pages.cloudflare.com/">Cloudflare Pages</a>, our Jamstack deployment platform, as the <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">host for your frontend user interface</a>. With Workers, you can add new API endpoints and handle authentication <i>in front</i> of your database without needing to alter your Postgres configuration. With features like Workers KV and Durable Objects, Workers can provide globally-distributed caching in front of your Postgres database. <a href="/introducing-websockets-in-workers/">Features like WebSockets</a> can be used to build real-time interactions for your applications, without having to migrate from Postgres to a new database-as-a-service platform.</p><p>PostgREST is an open-source tool that generates a standards-compliant REST API for your Postgres databases. Many growing database-as-a-service startups like <a href="https://retool.com/">Retool</a> and <a href="http://supabase.com/">Supabase</a> use PostgREST under the hood. PostgREST is fast and has great defaults, allowing you to access your Postgres data using standard REST conventions.</p><p>It’s great to be able to access your database directly from Workers, but do you really want to expose your database directly to the public Internet? Luckily, Cloudflare has a solution for this, and it works great with PostgREST: <a href="https://www.cloudflare.com/products/tunnel/">Cloudflare Tunnel</a>. Cloudflare Tunnel is one of my personal favorite products at Cloudflare. It creates a secure tunnel between your local server and the Cloudflare network. We want to expose our PostgREST endpoint, without making our entire database available on the public Internet. 
Cloudflare Tunnel allows us to do that securely.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3yA7ys92hJceHCMsd04rkT/edb36c84a2ea43d56c4f3374c60a3a0b/image1-4.png" />
            
            </figure><p>By using PostgREST with Postgres, we can build REST API-based applications. In particular, it's an excellent fit for Cloudflare Workers, our serverless function platform. Workers is a great place to build REST APIs. With the open-source JavaScript library <a href="https://github.com/supabase/postgrest-js"><code>postgrest-js</code></a>, we can interact with a PostgREST endpoint from inside our Workers function, using simple JS-based primitives.</p><p><i>By the way — if you haven't built a REST API with Workers yet, </i><a href="https://egghead.io/courses/build-a-serverless-api-with-cloudflare-workers-d67ca551?af=a54gwi"><i>check out our free video course with Egghead: "Building a Serverless API with Cloudflare Workers"</i></a><i>.</i></p><p>Scaling applications built on Postgres is an incredibly common problem that developers face. Often, this means duplicating your Postgres database and distributing reads between your primary database, and a fleet of “read replicas”. With PostgREST and Workers, we can begin to explore a different approach to solving the scaling problem. <a href="https://developers.cloudflare.com/workers/learning/how-workers-works">Workers' unique architecture</a> allows us to deploy hyper-performant functions <i>in front</i> of Postgres databases. With tools like Workers KV and Durable Objects, exposed in Workers as basic JavaScript APIs, we can build intelligent caches for our databases, without sacrificing performance or developer experience.</p><p>If you'd like to learn more about building REST APIs in Cloudflare Workers using PostgREST, <a href="https://developers.cloudflare.com/workers/tutorials/postgres">check out our new tutorial</a>! We've also provided two open-source libraries to help you get started. 
<a href="https://github.com/cloudflare/postgres-postgrest-cloudflared-example"><code>cloudflare/postgres-postgrest-cloudflared-example</code></a> helps you set up a Cloudflare Tunnel-backed Postgres + PostgREST endpoint. <a href="https://github.com/cloudflare/postgrest-worker-example"><code>postgrest-worker-example</code></a> is an example of using postgrest-js inside of Cloudflare Workers, to build REST APIs with your Postgres databases.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3qDgnCYYBrkCji8WvUgM33/09114cef001625e6e56236d2a3575cc0/image2-4.png" />
            
            </figure><p>With <code>postgrest-js</code>, you can build dynamic queries and request data from your database using the JS primitives you know and love:</p>
            <pre><code>// Get all users with at least 100 followers
const { data: users, error } = await client
  .from('users')
  .select('*')
  .gte('followers', 100)</code></pre>
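            <p>Under the hood, a query like this becomes an ordinary HTTP request to PostgREST, with the column selection and the filter carried as query parameters (<code>select=*</code> and <code>followers=gte.100</code>). The TypeScript sketch below builds such a URL by hand. The <code>postgrestUrl</code> helper and the <code>postgrest.example.com</code> host are made up for illustration, and the exact request that <code>postgrest-js</code> sends may differ in its details.</p>
            <pre><code>// Hypothetical helper: build a PostgREST read URL for one table.
// Filters map a column name to an operator and value, such as "gte.100".
function postgrestUrl(
  base: string,
  table: string,
  filters: { [column: string]: string },
): string {
  const params = new URLSearchParams({ select: "*", ...filters });
  return `${base}/${table}?${params.toString()}`;
}

const url = postgrestUrl("https://postgrest.example.com", "users", {
  followers: "gte.100",
});
console.log(url);</code></pre>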
            <p>You can also join our Cloudflare Developers Discord community! Learn more about what you can build with Cloudflare Workers, and meet our wonderful community of developers from around the world. <a href="https://discord.gg/cloudflaredev">Get your invite link here.</a></p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Tunnel]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Postgres]]></category>
            <guid isPermaLink="false">8tTIUN8pM2HWOKhmrZfSG</guid>
            <dc:creator>Kristian Freeman</dc:creator>
        </item>
        <item>
            <title><![CDATA[A Byzantine failure in the real world]]></title>
            <link>https://blog.cloudflare.com/a-byzantine-failure-in-the-real-world/</link>
            <pubDate>Fri, 27 Nov 2020 12:00:00 GMT</pubDate>
            <description><![CDATA[ At Cloudflare, we are always on the lookout for Single Points of Failure. In this post, we explore the role a failure mode known as a Byzantine fault played in a real-world incident. ]]></description>
            <content:encoded><![CDATA[ <p><i>An analysis of the Cloudflare API availability incident on 2020-11-02</i></p><p>When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). Eliminating these is a necessary step in architecting a system you can be confident in. Ironically, when you’re designing a system with built-in redundancy, you spend most of your time thinking about how well it functions when that redundancy is lost.</p><p>On November 2, 2020, Cloudflare had an <a href="https://www.cloudflarestatus.com/incidents/9ggr0k6dwzwg">incident</a> that impacted the availability of the API and dashboard for six hours and 33 minutes. During this incident, the success rate for queries to our API periodically dipped as low as 75%, and the dashboard experience was as much as 80 times slower than normal. While Cloudflare’s edge is massively distributed across the world (and kept working without a hitch), Cloudflare’s control plane (API &amp; dashboard) is made up of a large number of microservices that are redundant across two regions. For most services, the databases backing those microservices are only writable in one region at a time.</p><p>Each of Cloudflare’s control plane data centers has multiple racks of servers. Each of those racks has two switches that operate as a pair—both are normally active, but either can handle the load if the other fails. Cloudflare survives rack-level failures by spreading the most critical services across racks. Every piece of hardware has two or more power supplies with different power feeds. Every server that stores critical data uses RAID 10 redundant disks or storage systems that replicate data across at least three machines in different racks, or both. Redundancy at each layer is something we review and require. 
So—how could things go wrong?</p><p>In this post we present a timeline of what happened, and how a difficult failure mode known as a Byzantine fault played a role in a cascading series of events.</p>
    <div>
      <h3>2020-11-02 14:43 UTC: Partial Switch Failure</h3>
      <a href="#2020-11-02-14-43-utc-partial-switch-failure">
        
      </a>
    </div>
    <p>At 14:43, a network switch started misbehaving. Alerts began firing about the switch being unreachable to pings. The device was in a partially operating state: network control plane protocols such as <a href="https://en.wikipedia.org/wiki/Link_aggregation#Link_Aggregation_Control_Protocol">LACP</a> and <a href="https://www.cloudflare.com/learning/security/glossary/what-is-bgp/">BGP</a> remained operational, while others, such as vPC, were not. The vPC link is used to synchronize ports across multiple switches, so that they appear as one large, aggregated switch to servers connected to them. At the same time, the data plane (or forwarding plane) was not processing and forwarding all the packets received from connected devices.</p><p>This failure scenario is completely invisible to the connected nodes, as each server only sees an issue for some of its traffic due to the load-balancing nature of LACP. Had the switch failed fully, all traffic would have failed over to the peer switch, as the connected links would've simply gone down, and the ports would've dropped out of the forwarding LACP bundles.</p><p>Six minutes later, the switch recovered without human intervention. But this odd failure mode led to further problems that lasted long after the switch had returned to normal operation.</p>
    <div>
      <h3>2020-11-02 14:44 UTC: etcd Errors begin</h3>
      <a href="#2020-11-02-14-44-utc-etcd-errors-begin">
        
      </a>
    </div>
    <p>The rack with the misbehaving switch included one server in our etcd cluster. We use <a href="https://etcd.io/">etcd</a> heavily in our core data centers whenever we need strongly consistent data storage that’s reliable across multiple nodes.</p><p>In the event that the cluster leader fails, etcd uses the <a href="https://raft.github.io/">Raft</a> protocol to maintain consistency and establish consensus to promote a new leader. In the Raft protocol, cluster members are assumed to be either available or unavailable, and to provide accurate information or none at all. This works fine when a machine crashes, but is not always able to handle situations where different members of the cluster have conflicting information.</p><p>In this particular situation:</p><ul><li><p>Network traffic between node 1 (in the affected rack) and node 3 (the leader) was being sent through the switch in the degraded state,</p></li><li><p>Network traffic between node 1 and node 2 was going through its working peer, and</p></li><li><p>Network traffic between node 2 and node 3 was unaffected.</p></li></ul><p>This caused cluster members to have conflicting views of reality, known in distributed systems theory as a <a href="https://en.wikipedia.org/wiki/Byzantine_fault">Byzantine fault</a>. As a consequence of this conflicting information, node 1 repeatedly initiated leader elections, voting for itself, while node 2 repeatedly voted for node 3, which it could still connect to. This resulted in ties that did not promote a leader node 1 could reach. Raft leader elections are disruptive, blocking all writes until they're resolved, so this made the cluster read-only until the faulty switch recovered and node 1 could once again reach node 3.</p>
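    <p>The vote arithmetic behind those ties can be sketched in a few lines of TypeScript. This is a simplified illustration and not etcd's implementation: <code>Vote</code>, <code>quorum</code> and <code>electedLeader</code> are hypothetical names, and real Raft elections also track terms and log positions.</p>
            <pre><code>type Vote = { voter: number; votedFor: number };

// A candidate needs votes from a strict majority of the cluster.
function quorum(clusterSize: number): number {
  return Math.floor(clusterSize / 2) + 1;
}

// Returns the node holding a majority among the votes seen, or null if none does.
function electedLeader(votes: Vote[], clusterSize: number): number | null {
  for (const v of votes) {
    const count = votes.filter((other) => other.votedFor === v.votedFor).length;
    if (count >= quorum(clusterSize)) return v.votedFor;
  }
  return null;
}

// What node 1 can see during the incident: its own vote for itself and
// node 2's vote for node 3. Node 3 is unreachable from node 1.
const seenByNode1: Vote[] = [
  { voter: 1, votedFor: 1 },
  { voter: 2, votedFor: 3 },
];
console.log(quorum(3)); // 2
console.log(electedLeader(seenByNode1, 3)); // null: one vote each, no majority

// With a healthy network all three votes are visible and node 3 wins.
const healthy: Vote[] = [...seenByNode1.slice(1), { voter: 1, votedFor: 3 }, { voter: 3, votedFor: 3 }];
console.log(electedLeader(healthy, 3)); // 3</code></pre>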
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/37eeMzKeltqOt105GJ4ctK/0a0a320ace579b8f28cc2aded23abc9a/image1-20.png" />
            
            </figure>
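    <p>The connectivity described above can be restated as a small Go sketch. It is a toy model for illustration only, not etcd code and not a full RAFT implementation: the <code>cluster</code> type, its link table, and the rule that a member which cannot reach the leader calls an election are simplifying assumptions.</p>
            <pre><code>package main

import "fmt"

// link is a pair of node IDs that can exchange traffic.
type link struct{ a, b int }

// cluster is a hypothetical toy model: the members, the current leader,
// and the set of links that still work.
type cluster struct {
	nodes  []int
	leader int
	up     map[link]bool
}

// connected reports whether x and y can exchange traffic.
func (c cluster) connected(x, y int) bool {
	return x == y || c.up[link{x, y}] || c.up[link{y, x}]
}

// startsElection reports whether a member has lost sight of the leader and
// would therefore time out and call a new election (simplified rule).
func (c cluster) startsElection(n int) bool {
	return !c.connected(n, c.leader)
}

// reachable counts how many members, itself included, a node can talk to.
func (c cluster) reachable(n int) int {
	count := 0
	for _, other := range c.nodes {
		if c.connected(n, other) {
			count++
		}
	}
	return count
}

// degraded builds the situation from the incident: node 3 leads, and only
// the path between node 1 and node 3 crossed the degraded switch.
func degraded() cluster {
	return cluster{
		nodes:  []int{1, 2, 3},
		leader: 3,
		up: map[link]bool{
			{1, 2}: true, // via the healthy peer switch
			{2, 3}: true, // unaffected
			// no entry for {1, 3}: that path used the degraded switch
		},
	}
}

func main() {
	c := degraded()
	for _, n := range c.nodes {
		fmt.Printf("node %d: reaches %d of %d members, starts election: %v\n",
			n, c.reachable(n), len(c.nodes), c.startsElection(n))
	}
}</code></pre>
            <p>In this model only node 1 loses sight of the leader, even though every member can still reach a majority of the cluster, which matches node 2 continuing to side with node 3.</p>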
    <div>
      <h3>2020-11-02 14:45 UTC: Database system promotes a new primary database</h3>
      <a href="#2020-11-02-14-45-utc-database-system-promotes-a-new-primary-database">
        
      </a>
    </div>
    <p>Cloudflare’s control plane services use relational databases hosted across multiple clusters within a data center. Each cluster is configured for <a href="https://www.cloudflare.com/learning/performance/glossary/application-availability/">high availability</a>. The cluster setup includes a primary database, a synchronous replica, and one or more asynchronous replicas. This setup allows redundancy within a data center. For cross-datacenter redundancy, a similar high availability secondary cluster is set up and replicated in a geographically dispersed data center for disaster recovery. The cluster management system leverages etcd for cluster member discovery and coordination.</p><p>When etcd became read-only, two clusters were unable to communicate that they had a healthy primary database. This triggered the automatic promotion of a synchronous database replica to become the new primary. The promotion completed without error or data loss.</p><p>There was a defect in our cluster management system that required a rebuild of all database replicas whenever a new primary database was promoted. So, although the new primary database was available instantly, the replicas would take considerable time to become available, depending on the size of the database. For one of the clusters, service was restored quickly. Synchronous and asynchronous database replicas were rebuilt and started replicating successfully from the primary, and the impact was minimal.</p><p>For the other cluster, however, performant operation of that database <i>required</i> a replica to be online. Because this database handles authentication for API calls and dashboard activities, it takes a lot of reads, and one replica was heavily utilized to spare the primary the load. When this failover happened and no replicas were available, the primary was overloaded, as it had to take all of the load. This is when the main impact started.</p>
    <div>
      <h3>Reduce Load, Leverage Redundancy</h3>
      <a href="#reduce-load-leverage-redundancy">
        
      </a>
    </div>
    <p>At this point we saw that our primary authentication database was overwhelmed and began shedding load from it. We dialed back the rate at which we push SSL certificates to the edge and send emails, and scaled back other features, to give it room to handle the additional load. Unfortunately, because of its size, we knew it would take several hours for a replica to be fully rebuilt.</p><p>A silver lining here is that every database cluster in our primary data center also has online replicas in our secondary data center. Those replicas are not part of the local failover process, and were online and available throughout the incident. The process of steering read queries to those replicas was not yet automated, so we manually diverted API traffic that could leverage those read replicas to the secondary data center. This substantially improved our API availability.</p>
    <div>
      <h3>The Dashboard</h3>
      <a href="#the-dashboard">
        
      </a>
    </div>
    <p>The Cloudflare dashboard, like most web applications, has the notion of a user session. When user sessions are created (each time a user logs in) we perform some database operations and keep data in a Redis cluster for the duration of that user’s session. Unlike our API calls, our user sessions cannot currently be moved across the ocean without disruption. As we took actions to improve the availability of our API calls, we were unfortunately making the user experience on the dashboard worse.</p><p>This is an area of the system that is currently designed to be able to fail over across data centers in the event of a disaster, but has not yet been designed to work in both data centers at the same time. After a first period in which users on the dashboard became increasingly frustrated, we failed the authentication calls fully back to our primary data center, and kept working on our primary database to ensure we could provide the best service levels possible in that degraded state.</p>
    <div>
      <h3>2020-11-02 21:20 UTC: Database Replica Rebuilt</h3>
      <a href="#2020-11-02-21-20-utc-database-replica-rebuilt">
        
      </a>
    </div>
    <p>The instant the first database replica was rebuilt, it put itself back into service, and performance returned to normal levels. We ramped up all of the services that had been turned down, so all asynchronous processing could catch up, and after a period of monitoring we marked the end of the incident.</p>
    <div>
      <h3>Redundant Points of Failure</h3>
      <a href="#redundant-points-of-failure">
        
      </a>
    </div>
    <p>The cascade of failures in this incident was interesting because each system, on its face, had redundancy. Moreover, no system fully failed—each entered a degraded state. That combination meant the chain of events that transpired was considerably harder to model and anticipate. It was frustrating yet reassuring that some of the possible failure modes were already being addressed.</p><p>A team was already working on fixing the limitation that requires a database replica rebuild upon promotion. Our user sessions system was inflexible in scenarios where we’d like to steer traffic around, and redesigning that was already in progress.</p><p>This incident also led us to revisit the configuration parameters we put in place for things that auto-remediate. In previous years, promoting a database replica to primary took far longer than we liked, so getting that process automated and able to trigger on a minute’s notice was a point of pride. At the same time, for at least one of our databases, the cure may be worse than the disease, and in fact we may not want to invoke the promotion process so quickly. Immediately after this incident we adjusted that configuration accordingly.</p><p>Byzantine Fault Tolerance (BFT) is a hot research topic. Solutions have been known since 1982, but have had to choose between a variety of engineering tradeoffs, including security, performance, and algorithmic simplicity. Most general-purpose cluster management systems choose to forgo BFT entirely and use protocols based on PAXOS, or simplifications of PAXOS such as RAFT, that perform better and are easier to understand than BFT consensus protocols. In many cases, a simple protocol that is known to be vulnerable to a rare failure mode is safer than a complex protocol that is difficult to implement correctly or debug.</p><p>The first uses of BFT consensus were in safety-critical systems such as aircraft and spacecraft controls. 
These systems typically have hard real-time latency constraints that require tightly coupling consensus with application logic in ways that make these implementations unsuitable for general-purpose services like etcd. Contemporary research on BFT consensus is mostly focused on applications that cross trust boundaries, which need to protect against malicious cluster members as well as malfunctioning cluster members. These designs are more suitable for implementing general-purpose services such as etcd, and we look forward to collaborating with researchers and the open source community to make them suitable for production cluster management.</p><p>We are very sorry for the difficulty the outage caused, and are continuing to improve as our systems grow. We’ve since fixed the bug in our cluster management system, and are continuing to tune each of the systems involved in this incident to be more resilient to failures of their dependencies. If you’re interested in helping solve these problems at scale, please visit <a href="https://www.cloudflare.com/careers/">cloudflare.com/careers</a>.</p>
    <div>
      <h3>Postscript</h3>
      <a href="#postscript">
        
      </a>
    </div>
    <p>The distributed systems community has pointed out that the failure we've encountered would be better characterized as an omission fault rather than a Byzantine fault. Omission faults are much more specific and can be tolerated without BFT protocols.</p><p>We’re grateful to all those who read and critiqued this post and will be following up with a detailed post about different fault types in distributed systems soon. Stay tuned.</p> ]]></content:encoded>
            <category><![CDATA[Post Mortem]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[Outage]]></category>
            <guid isPermaLink="false">7cztH7o1B3T3GmwGGNBZ6v</guid>
            <dc:creator>Tom Lianza</dc:creator>
            <dc:creator>Chris Snook</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing CFSSL 1.2]]></title>
            <link>https://blog.cloudflare.com/introducing-cfssl-1-2/</link>
            <pubDate>Thu, 31 Mar 2016 12:00:00 GMT</pubDate>
            <description><![CDATA[ Continuing our commitment to high quality open-source software, we’re happy to announce release 1.2 of CFSSL, our TLS/PKI Swiss Army knife. We haven’t written much about CFSSL here since we originally open sourced the project in 2014, so we thought we’d provide an update. ]]></description>
            <content:encoded><![CDATA[ <p>Continuing our commitment to high quality open-source software, we’re happy to announce release 1.2 of CFSSL, our TLS/PKI Swiss Army knife. We haven’t written much about CFSSL here since we <a href="/introducing-cfssl/">originally open sourced the project</a> in 2014, so we thought we’d provide an update. In the last 20 months, we have added a ton of great features, and CFSSL has attracted an active community of users and <a href="https://github.com/cloudflare/cfssl/graphs/contributors">contributors</a>. Users range from large SaaS providers (Heroku) to game companies (Riot Games) and the newest Certificate Authority (Let’s Encrypt). For them and for CloudFlare, CFSSL has become a core tool for automating certificates and TLS configurations. With added support for configuration scanning, automated provisioning via the transport package, revocation, certificate transparency and PKCS#11, CFSSL is now even more powerful.</p><p>We’re also happy to announce CFSSL’s new home: <a href="http://cfssl.org">cfssl.org</a>. From there you can try out CFSSL’s user interface, download binaries, and test some of its features.</p>
    <div>
      <h3>Motivation</h3>
      <a href="#motivation">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4dDeCSuAPv9TzI9o5urty5/d62243397e9a4bfc071af63314ab9950/image_0.jpg" />
            
            </figure><p><a href="https://commons.wikimedia.org/wiki/File:NSA_Muscular_Google_Cloud.jpg">Licensing: Public Domain</a></p><p>This 2013 National Security Agency (NSA) slide describing how data from Google’s internal network was collected by intelligence agencies was eye-opening, and shocking, to many technology companies. The idea that an attacker could read messages passed between services wasn’t technically groundbreaking, but it did reveal a security flaw in the way many distributed systems were designed. Many companies only encrypted the data to the border of their datacenter, not inside. The slide showed that private physical networks are being subverted to extract data passing through them. And just because a network has a <a href="https://www.cloudflare.com/learning/access-management/what-is-the-network-perimeter/">security perimeter</a>, it doesn’t mean that data can be safely sent between applications unencrypted inside that perimeter. In short: treat your own network as hostile.</p><p>This mentality helped shape CloudFlare’s philosophy for securing internal services and resulted in a simple rule:</p><blockquote><p>Services should only communicate with each other using encrypted and mutually authenticated protocols.</p></blockquote><p>With this in mind we started tackling the harder problem of how to manage the encryption keys for these services. To address service-to-service encryption, <a href="/how-to-build-your-own-public-key-infrastructure/">we built our own public key infrastructure</a> using CFSSL. Many of the new features we’re introducing in this post came about from our effort to make this system robust.</p><p>We have also made an effort to use standards-compliant and interoperable technology. 
Because CFSSL incorporates support for <a href="http://www.certificate-transparency.org/">certificate transparency</a>, <a href="https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol">OCSP</a>, and <a href="https://en.wikipedia.org/wiki/Revocation_list">CRL</a>, the standards used by the public Internet can now be used in your private infrastructure. Now, on to the new features.</p>
    <div>
      <h3>Scan</h3>
      <a href="#scan">
        
      </a>
    </div>
    <p>CFSSL now has a full-featured TLS endpoint scanner.</p><p>Just because a server uses encryption, it doesn’t mean that it is secure. There have been a <a href="https://www.youtube.com/watch?v=oovK9YkJ8Co">series of vulnerabilities</a> in TLS that only affect some configurations. To keep your server and its visitors protected against the nearly monthly new attacks you need to pick the right configuration. Staying secure requires testing your configuration against the latest vulnerabilities, and keeping your configuration updated against new threats.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7o6R0dw4TmHnB7cDYIwsvY/7ac9e5e632520f3ca97851f1e7fed8d4/image_1.png" />
            
            </figure><p>History of vulnerabilities in SSL/TLS</p><p>The gold standard for testing a website’s TLS configuration is Ivan Ristić’s <a href="https://www.ssllabs.com">SSL Labs</a>. It provides a simple letter grade for your site’s configuration (sites using CloudFlare get an A, by the way, and A+ if you <a href="/enforce-web-policy-with-hypertext-strict-transport-security-hsts/">enable HSTS</a>). The drawback of SSL Labs is that it only works on public websites: you can’t use it for internal services. At CloudFlare, we needed an easy way to check the configuration of our services as well as our customers’ origins (which are typically not publicly accessible).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3RPq6XPIXqld31gzAhaqVl/6b925d22874b349e657aeddb99459f78/image_2.png" />
            
            </figure><p>To solve this, CloudFlare added functionality to CFSSL that scans a TLS endpoint and evaluates how securely it’s configured. With it, we are able to check the configuration of internal services and protected customer origins for the following configuration issues:</p><ul><li><p>IPv4/IPv6 connectivity</p></li><li><p>Certificate validity (expiration, trust chain, hostnames, etc.)</p></li><li><p>Supported cipher suites and algorithms</p></li><li><p>Session resumption</p></li><li><p>Revoked certificates</p></li></ul><p>Each check in a scan receives a grade such as "Good", "Warning", or "Bad". CFSSL Scan can also be used to scan entire IP ranges or lists of hosts. It can be used either as a CLI or as an API-driven server.</p><p>Using the CLI takes a single command:</p>
            <pre><code>$ cfssl scan cloudflare.com
{
  "Connectivity": {
	"DNSLookup": {
	  "grade": "Good",
	  "output": [
		"198.41.215.162",
		"198.41.214.162",
		"2400:cb00:2048:1::c629:d6a2",
		"2400:cb00:2048:1::c629:d7a2"
	  ]
	},
	"TCPDial": {
	  "grade": "Good"
	},
	"TLSDial": {
	  "grade": "Good"
	}
  },
  "PKI": {
	"ChainExpiration": {
	  "grade": "Good",
	  "output": "2016-11-30T23:59:59Z"
	},
	"ChainValidation": {
	  "grade": "Warning",
	  "output": [
		"Certificate for COMODO ECC Extended Validation Secure Server CA is valid for too long"
	  ]
	},
	"MultipleCerts": {
	  "grade": "Good"
	}
  },
  "TLSHandshake": {
	"CertsByCiphers": {
	  "grade": "Good",
	  "output": {
		"TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384": "SHA256WithRSA",
		"TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256": "SHA256WithRSA",
		"TLS_RSA_WITH_3DES_EDE_CBC_SHA": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_128_CBC_SHA": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_128_CBC_SHA256": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_128_GCM_SHA256": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_256_CBC_SHA": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_256_CBC_SHA256": "SHA256WithRSA",
		"TLS_RSA_WITH_AES_256_GCM_SHA384": "SHA256WithRSA"
	  }
	}
  }
}</code></pre>
            <p>CFSSL Scan is also accessible as part of the new <a href="https://cfssl.org/scan">CFSSL UI</a>.</p>
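    <p>Because the CLI report is plain JSON, it is easy to post-process. The following is a minimal sketch, assuming the two-level layout shown above (scan family, then individual check, each with a <code>grade</code> field). The <code>notGood</code> helper and the abbreviated <code>sample</code> report are hypothetical and not part of CFSSL.</p>
            <pre><code>package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// check mirrors one result in the report shown above. Only the grade is
// decoded; other fields such as "output" are ignored.
type check struct {
	Grade string `json:"grade"`
}

// notGood parses a scan report and returns "Family/Check: grade" for every
// check whose grade is anything other than "Good", in sorted order.
func notGood(report []byte) ([]string, error) {
	families := new(map[string]map[string]check)
	if err := json.Unmarshal(report, families); err != nil {
		return nil, err
	}
	var findings []string
	for family, checks := range *families {
		for name, c := range checks {
			if c.Grade != "Good" {
				findings = append(findings, fmt.Sprintf("%s/%s: %s", family, name, c.Grade))
			}
		}
	}
	sort.Strings(findings)
	return findings, nil
}

// sample is an abbreviated copy of the report shown above.
const sample = `{
  "Connectivity": {
    "TCPDial": {"grade": "Good"},
    "TLSDial": {"grade": "Good"}
  },
  "PKI": {
    "ChainExpiration": {"grade": "Good", "output": "2016-11-30T23:59:59Z"},
    "ChainValidation": {"grade": "Warning", "output": ["Certificate for COMODO ECC Extended Validation Secure Server CA is valid for too long"]}
  }
}`

func main() {
	findings, err := notGood([]byte(sample))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, f := range findings {
		fmt.Println(f)
	}
}</code></pre>
            <p>For the abbreviated report this lists a single finding, <code>PKI/ChainValidation: Warning</code>. A wrapper like this could be run against a list of internal hosts to flag anything that needs attention.</p>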
    <div>
      <h3>Transport Package</h3>
      <a href="#transport-package">
        
      </a>
    </div>
    <p>An important design pattern in security engineering is secure defaults. Developers want to write secure software and aren’t always security experts, let alone crypto gurus. The two trickiest parts of deploying an application that speaks TLS are:</p><ol><li><p>Configuration</p></li><li><p>Key management</p></li></ol><p>Simplicity is the key to empowering developers to use encryption in their services. We created the CFSSL Transport package to make these two tasks easy for our Go developers.</p><p>Transport is a Go library that takes regular HTTP or TCP connections, and transparently turns them into encrypted connections. Transport handles all the sticky points so that the developer doesn’t have to. This includes creating a private key, getting a certificate for it using a CFSSL CA, renewing certificates before they expire, and choosing the correct cryptographic parameters. If you’re writing a service in Go, you no longer need to know how PKI works.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/LB668vaQzSx7sH0ziXtVI/e520c040366a84eef9f5081ec96af82f/image_3.png" />
            
            </figure><p>Certificate Issuance with CFSSL CA</p><p>Not only does Transport handle setting up and rotating certificates, it also automatically checks that the services your service connects to are using valid certificates, including checking for revocation (more on that later).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VvQlLi9gWnu1bQzcOZbPN/25fd2cc8aa99655aa67487c04f428861/image_4.png" />
            
            </figure><p>OCSP Check with CFSSL CA</p><p>Internal CAs can be used to set up coarse-grained authorization between services. For example, if you have both an API server and a database, you can set up a dedicated CA for each of them. In the example below, the API server CA is in orange and the DB CA is in red. You can then configure the DB to only trust connections from the API server and vice versa. This type of setup can provide a baseline level of authorization enforcement for your applications. The transport package lets you automate the setup of these mutually-authenticated connections. This type of setup is covered in a <a href="/how-to-build-your-own-public-key-infrastructure/">previous blog post</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2aisydnjxkRWE49YnloHJh/7e46ea56a393031cf1339f2d508f14f3/image_5-1.png" />
            
            </figure><p>Once you have a CFSSL CA (or a multi-root CA) up and running, it just takes a few lines of code to start using TLS in your Go application. Just swap your standard <code>net.Dial</code> or <code>net.Listen/Accept</code> with <code>transport.Dial</code> and <code>transport.Listen/Accept</code>.</p><p>Before:</p>
            <pre><code>conn, err := net.Dial("tcp", addr)
if err != nil {
	// handle error
}</code></pre>
            <p>After (configuration file location stored in the <code>conf</code> variable):</p>
            <pre><code>var id = new(core.Identity)
data, err := ioutil.ReadFile(conf)
if err != nil {
	// handle error
}
err = json.Unmarshal(data, id)
if err != nil {
	// handle error
}

// Renew 5 minutes before expiry
tr, err := transport.New(5 * time.Minute, id)
if err != nil {
	// handle error
}
conn, err := transport.Dial(addr, tr)
if err != nil {
	// handle error
}</code></pre>
            <p>You can start playing around with the transport package with some examples from GitHub:</p><p><a href="https://github.com/cloudflare/cfssl/tree/master/transport/example">https://github.com/cloudflare/cfssl/tree/master/transport/example</a></p>
    <div>
      <h3>Revocation and PostgreSQL support</h3>
      <a href="#revocation-and-postgresql-support">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/SozAa3OGRFEYvvIIf2AYd/39af18862a24b84f6280dd7fb5030f00/image_7-1.png" />
            
            </figure><p><a href="https://commons.wikimedia.org/wiki/File:Database-postgres.svg">CC Creative Commons Attribution-Share Alike 3.0 Unported</a></p><p>One of the nice things about CFSSL is that you can easily spin it up inside your infrastructure and have a certificate authority. One of the risks of running a PKI is infrastructure compromise. If the private key material for a certificate falls into the wrong hands, there need to be mechanisms so that the rest of the system knows to no longer trust that certificate.</p><p>The first step in knowing which certificates are trusted is knowing which certificates have been issued. To solve this, we added the ability to keep track, in a persistent database, of which certificates have been issued and the subset of those that have been revoked. We are big fans of PostgreSQL, so we built a database backend for CFSSL in PostgreSQL, but other backends like MySQL are <a href="https://github.com/cloudflare/cfssl/pull/562">in development</a>. You can now set up CFSSL to use a certificate database with very little work, and we leveraged that integration to create an automated revocation system.</p><p>The two standard mechanisms for signaling that a certificate is no longer trusted are certificate revocation lists (CRLs) and the online certificate status protocol (OCSP). CFSSL now fully supports both of these mechanisms.</p><p>A CRL is simply a list of revoked certificate serial numbers. It covers all certificates issued by a CA that have not expired, and is digitally signed by the CA’s private key. When a client obtains a certificate, it can simply look at this list to check to see if the certificate has been revoked. CRL files can grow quite a bit if a lot of certificates are revoked, and can therefore cause some scalability issues. 
We saw this <a href="/the-heartbleed-aftermath-all-cloudflare-certificates-revoked-and-reissued/">after Heartbleed</a>, when we revoked a large number of customer certificates at once.</p><p>Partly to combat these scalability issues, OCSP was introduced. OCSP provides on-demand answers about the revocation status of a given certificate. An OCSP responder is a service that returns signed answers to the question "is this certificate revoked?". The response is either "Yes" or "No". Each response is signed by the CA and has a validity period so the client knows how long to cache the response.</p><p>CFSSL now has an OCSP responder service that can be configured to run in a distributed way, without access to the CA. There are also OCSP management tools in CFSSL to automatically populate the data for the OCSP responder and keep it fresh using the certificate database.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4v0OiQafB9lqAwRrAVssdq/383b720297e414fa6a54c20fedb6db20/image_8.png" />
            
            </figure><p>In CFSSL, you can now programmatically create CRLs and OCSP responses for certificates issued by your CA. Using standards-compatible revocation mechanisms allows these certificates to be shared outside of our infrastructure and to work with most software that implements TLS.</p>
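    <p>To make the CRL description concrete, the sketch below uses only Go's standard <code>crypto/x509</code> package, not CFSSL's own revocation API. It creates a throwaway CA, signs a CRL listing made-up serial numbers, then parses the CRL and verifies its signature the way a client would. The helper names and serial numbers are illustrative assumptions.</p>
            <pre><code>package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCA creates a throwaway self-signed CA certificate and its key.
func newCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := new(x509.Certificate)
	tmpl.SerialNumber = big.NewInt(1)
	tmpl.Subject = pkix.Name{CommonName: "Example Internal CA"}
	tmpl.NotBefore = time.Now().Add(-time.Hour)
	tmpl.NotAfter = time.Now().Add(24 * time.Hour)
	tmpl.IsCA = true
	tmpl.BasicConstraintsValid = true
	// CRL signing must be an allowed use of the CA key.
	tmpl.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageCRLSign
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, key.Public(), key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// revokedSerials signs a CRL listing the given serial numbers, then parses
// it, checks the CA signature, and returns the serials found in the list.
func revokedSerials(serials ...int64) ([]string, error) {
	ca, key, err := newCA()
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := new(x509.RevocationList)
	tmpl.Number = big.NewInt(1) // sequence number of this CRL
	tmpl.ThisUpdate = now
	tmpl.NextUpdate = now.Add(24 * time.Hour)
	// RevokedCertificates is the older field name; Go 1.21 and later also
	// offer RevokedCertificateEntries.
	for _, s := range serials {
		tmpl.RevokedCertificates = append(tmpl.RevokedCertificates, pkix.RevokedCertificate{
			SerialNumber:   big.NewInt(s),
			RevocationTime: now,
		})
	}
	der, err := x509.CreateRevocationList(rand.Reader, tmpl, ca, key)
	if err != nil {
		return nil, err
	}
	crl, err := x509.ParseRevocationList(der)
	if err != nil {
		return nil, err
	}
	if err := crl.CheckSignatureFrom(ca); err != nil {
		return nil, err
	}
	var out []string
	for _, rc := range crl.RevokedCertificates {
		out = append(out, rc.SerialNumber.String())
	}
	return out, nil
}

func main() {
	serials, err := revokedSerials(42, 1337)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("revoked serial numbers:", serials)
}</code></pre>
            <p>A client that trusts the CA only needs the signed list and the serial number of the certificate it was presented with, which is also why the list grows with every revocation.</p>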
    <div>
      <h3>Certificate Transparency</h3>
      <a href="#certificate-transparency">
        
      </a>
    </div>
    <p>Another exciting new PKI standard is Certificate Transparency (CT). It brings (as the name implies) transparency to the workings of a certificate authority by providing an append-only log of issued certificates.</p><p>You can think of CT as a public ledger of all certificates issued. Any certificate on the list (even if issued for use on a private network) is made public and can be checked to see if it was issued according to the rules of the CA/Browser Forum. If you encounter a certificate that is not on the ledger, then it may have been created fraudulently. Google Chrome <a href="https://www.certificate-transparency.org/ev-ct-plan">currently requires</a> all Extended Validation certificates used by websites to be in the CT log.</p><p>CFSSL now allows you to submit a certificate to a CT log at issuance time and automatically embed the proof that it has been logged into the certificate. Running a CT log inside your internal infrastructure is a nice way to audit your CA and catch mis-issuances.</p>
    <div>
      <h3>PKCS #11</h3>
      <a href="#pkcs-11">
        
      </a>
    </div>
    <p>CFSSL is great for software deployments, as you can spin it up anywhere and run it on any platform that Go supports. You can even use our convenient Dockerfiles to deploy it in a containerized environment. However, in some situations (like running a publicly-trusted CA), keeping a private key in software is not secure enough. For these situations, hardware-based protection is needed.</p><p>The industry standard protocol for working with cryptographic hardware is called <a href="https://en.wikipedia.org/wiki/PKCS_11">PKCS#11</a>. With help from Richard Barnes of Mozilla and others we were able to add support for PKCS#11 to CFSSL. This feature can be enabled in programs that use the <a href="https://github.com/cloudflare/cfssl/tree/master/signer/local">signer/local package</a> and the <a href="https://github.com/letsencrypt/pkcs11key">pkcs11key package</a>. We also have plans to add command line support using the <a href="https://tools.ietf.org/html/rfc7512">PKCS#11 URI specification</a>. If you have a PKCS#11 interface to your HSM, certificate creation using that key is fully supported by the <code>cfssl/signer</code> package. Power users including Let’s Encrypt use CFSSL to run their publicly trusted CA while keeping the private key in a FIPS 140-2 certified HSM.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Open source is hard. Different people have different needs, and as a project maintainer you have to be respectful of these needs while honoring the spirit of the project. CloudFlare’s needs for CFSSL are not identical to the needs of its other users. We have attempted to strike a balance between building a tool for our own specific use cases and building a great general-purpose toolkit for PKI/TLS. We are grateful to the open source community for their valuable contributions to this project and are proud to be part of the tradition of free and open source software.</p><p>I’d like to thank one of the largest contributors to CFSSL over the last year: the Let’s Encrypt project. They have contributed code reviews and useful features while integrating CFSSL into Boulder, the software that manages their certificate authority. I’d also like to thank the Open Academy participants from Cornell and UCSD who worked on the project for a semester, and everyone else who helped contribute to this release. The core CFSSL team is Kyle Isom, Zi Lin, Jacob Haven, and Nick Sullivan.</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[CFSSL]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[OCSP]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">68velcvOW35if8MiypaluK</guid>
            <dc:creator>Nick Sullivan</dc:creator>
        </item>
        <item>
            <title><![CDATA[Scaling out PostgreSQL for CloudFlare Analytics using CitusDB]]></title>
            <link>https://blog.cloudflare.com/scaling-out-postgresql-for-cloudflare-analytics-using-citusdb/</link>
            <pubDate>Thu, 09 Apr 2015 17:32:05 GMT</pubDate>
            <description><![CDATA[ When I joined CloudFlare about 18 months ago, we had just started to build out our new Data Platform. At that point, the log processing and analytics pipeline built in the early days of the company had reached its limits.  ]]></description>
            <content:encoded><![CDATA[ <p>When I joined CloudFlare about 18 months ago, we had just started to build out our new Data Platform. At that point, the log processing and analytics pipeline built in the early days of the company had reached its limits. This was due to the rapidly increasing log volume from our Edge Platform where we’ve had to deal with traffic growth in excess of 400% annually.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4AxkHPDBZrwj6QJQVWuQcX/fec02af530de1ab2f8a1f516ece59057/keepcalm_scaled.png" />
            
            </figure><p>Our log processing pipeline started out like most everybody else’s: compressed log files shipped to a central location for aggregation by a motley collection of Perl scripts and C++ programs with a single PostgreSQL instance to store the aggregated data. Since then, CloudFlare has grown to serve millions of requests per second for millions of sites. Apart from the hundreds of terabytes of log data that has to be aggregated every day, we also face some unique challenges in providing detailed analytics for each of the millions of sites on CloudFlare.</p><p>For the next iteration of our Customer Analytics application, we wanted to get something up and running quickly, try out Kafka, write the aggregation application in Go, and see what could be done to scale out our trusty go-to database, PostgreSQL, from a single machine to a cluster of servers without requiring us to deal with sharding in the application.</p><p>As we were analyzing our scaling requirements for PostgreSQL, we came across <a href="https://www.citusdata.com/">Citus Data</a>, one of the companies to launch out of <a href="https://www.ycombinator.com/">Y Combinator</a> in the summer of 2011. Citus Data builds a database called CitusDB that scales out PostgreSQL for real-time workloads. Because CitusDB enables both real-time data ingest and sub-second queries across billions of rows, it has become a crucial part of our analytics infrastructure.</p>
    <div>
      <h4>Log Processing Pipeline for Analytics</h4>
      <a href="#log-processing-pipeline-for-analytics">
        
      </a>
    </div>
    <p>Before jumping into the details of our database backend, let’s review the pipeline that takes a log event from CloudFlare’s Edge to our analytics database.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4I3yKJFMKyL4M3gtS2SlTy/b34a4a58d3da74e788950e6af1699582/image01.png" />
            
            </figure><p>An HTTP access log event proceeds through the CloudFlare data pipeline as follows:</p><ol><li><p>A web browser makes a request (e.g., an HTTP GET request).</p></li><li><p>An Nginx web server running <a href="/pushing-nginx-to-its-limit-with-lua/">Lua code</a> handles the request and generates a binary log event in <a href="https://capnproto.org">Cap’n Proto format</a>.</p></li><li><p>A Go program akin to <a href="https://github.com/mozilla-services/heka">Heka</a> receives the log event from Nginx over a UNIX socket, batches it with other events, compresses the batch using a fast algorithm like <a href="https://github.com/google/snappy">Snappy</a> or <a href="https://github.com/Cyan4973/lz4">LZ4</a>, and sends it to our data center over a TLS-encrypted TCP connection.</p></li><li><p>Another Go program (the Kafka shim) receives the log event stream, decrypts it, decompresses the batches, and produces the events into a Kafka topic with partitions replicated on many servers.</p></li><li><p>Go aggregators (one process per partition) consume the topic-partitions and insert aggregates (not individual events) with 1-minute granularity into the CitusDB database. Further rollups to 1-hour and 1-day granularity occur later to reduce the amount of data to be queried and to speed up queries over intervals spanning many hours or days.</p></li></ol>
    <div>
      <h4>Why Go?</h4>
      <a href="#why-go">
        
      </a>
    </div>
    <p>Previous blog <a href="/what-weve-been-doing-with-go/">posts</a> and <a href="https://www.youtube.com/watch?v=8igk2ylk_X4">talks</a> have covered <a href="/go-at-cloudflare/">various CloudFlare projects that have been built using Go</a>. We’ve found that Go is a great language for teams to use when building the kinds of distributed systems needed at CloudFlare, and this is true regardless of an engineer’s level of experience with Go. Our Customer Analytics team is made up of engineers who have been using Go since before its 1.0 release as well as complete Go newbies. Team members who were new to Go were able to spin up quickly, and the code base has remained maintainable even as we’ve continued to build many more data processing and aggregation applications such as a new version of <a href="https://www.hakkalabs.co/articles/optimizing-go-3k-requestssec-480k-requestssec">our Layer 7 DDoS attack mitigation system</a>.</p><p>Another factor that makes Go great is the ever-expanding ecosystem of third-party libraries. We used <a href="https://github.com/glycerine/go-capnproto">go-capnproto</a> to generate Go code to handle binary log events in Cap’n Proto format from a common schema shared between Go, C++, and <a href="/introducing-lua-capnproto-better-serialization-in-lua/">Lua projects</a>. Go also has very good support for Kafka through <a href="https://godoc.org/github.com/Shopify/sarama">Shopify’s Sarama</a> library, for ZooKeeper through <a href="https://github.com/samuel/go-zookeeper">go-zookeeper</a>, and for PostgreSQL/CitusDB through <a href="http://golang.org/pkg/database/sql/">database/sql</a> and the <a href="https://github.com/lib/pq">lib/pq driver</a>.</p>
    <div>
      <h4>Why Kafka?</h4>
      <a href="#why-kafka">
        
      </a>
    </div>
    <p>As we started building our new data processing applications in Go, we had some additional requirements for the pipeline:</p><ol><li><p>Use a queue with persistence to allow short periods of downtime for downstream servers and/or consumer services.</p></li><li><p>Make the data available for processing in real time by <a href="https://github.com/mumrah/kafka-python">scripts</a> written by members of our Site Reliability Engineering team.</p></li><li><p>Allow future aggregators to be built in other languages like Java, <a href="https://github.com/edenhill/librdkafka">C or C++</a>.</p></li></ol><p>After extensive testing, we selected <a href="https://kafka.apache.org/">Kafka</a> as the first stage of the log processing pipeline.</p>
    <div>
      <h4>Why Postgres?</h4>
      <a href="#why-postgres">
        
      </a>
    </div>
    <p>As we mentioned when <a href="http://www.postgresql.org/about/press/presskit93/">PostgreSQL 9.3 was released</a>, PostgreSQL has long been an important part of our stack, and for good reason.</p><p>Foreign data wrappers and other extension mechanisms make PostgreSQL an excellent platform for storing lots of data, or for acting as a gateway to NoSQL data stores, without having to give up the power of SQL. PostgreSQL also has great performance and documentation. Lastly, PostgreSQL has a large and active community, and we've had the privilege of meeting many of the PostgreSQL contributors at meetups held at the CloudFlare office and elsewhere, organized by <a href="http://www.meetup.com/postgresql-1/">The San Francisco Bay Area PostgreSQL Meetup Group</a>.</p>
    <div>
      <h4>Why CitusDB?</h4>
      <a href="#why-citusdb">
        
      </a>
    </div>
    <p>CloudFlare has been using PostgreSQL since day one. We trust it, and we wanted to keep using it. However, CloudFlare's data has been growing rapidly, and we were running into the limitations of a single PostgreSQL instance. Our team was tasked with scaling out our analytics database in a short time, so we started by defining the criteria that are important to us:</p><ol><li><p><b>Performance</b>: Our system powers the Customer Analytics dashboard, so typical queries need to return in less than a second even when dealing with data from many customer sites over long time periods.</p></li><li><p><b>PostgreSQL</b>: We have extensive experience running PostgreSQL in production. We also find several extensions useful, e.g., Hstore enables us to store semi-structured data and HyperLogLog (HLL) makes unique count approximation queries fast.</p></li><li><p><b>Scaling</b>: We need to dynamically scale out our cluster for performance and huge data storage. That is, if we realize that our cluster is becoming overutilized, we want to solve the problem by just adding new machines.</p></li><li><p><b>High availability</b>: This cluster needs to be highly available. As such, the cluster needs to automatically recover from failures like disks dying or servers going down.</p></li><li><p><b>Business intelligence queries</b>: In addition to sub-second responses for customer queries, we need to be able to perform business intelligence queries that may need to analyze billions of rows of analytics data.</p></li></ol><p>At first, we evaluated what it would take to build an application that deals with sharding on top of stock PostgreSQL.
We investigated using the <a href="http://www.postgresql.org/docs/9.4/static/postgres-fdw.html">postgres_fdw</a> extension to provide a unified view on top of a number of independent PostgreSQL servers, but this solution did not deal well with servers going down.</p><p>Research into the major players in the PostgreSQL space indicated that CitusDB had the potential to be a great fit for us. On the performance point, they already had customers running real-time analytics with queries running in parallel across a large cluster in tens of milliseconds.</p><p>CitusDB has also maintained compatibility with PostgreSQL, not by forking the code base like other vendors, but by extending it to plan and execute distributed queries. Furthermore, CitusDB used the concept of many logical shards so that if we were to add new machines to our cluster, we could easily rebalance the shards in the cluster by calling a simple PostgreSQL user-defined function.</p><p>With CitusDB, we could replicate logical shards to independent machines in the cluster, and automatically fail over between replicas even during queries. In case of a hardware failure, we could also use the rebalance function to re-replicate shards in the cluster.</p>
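    <p>The idea behind many logical shards can be sketched in a few lines of Go. This is a toy model under our own assumptions, not CitusDB's rebalancer: <code>numShards</code>, <code>ShardFor</code>, <code>Place</code> and <code>AddNode</code> are hypothetical names, and the balancing policy shown is deliberately simple. The point is that a key's shard never changes, so adding a machine only means moving some shard replicas.</p>

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// numShards is a hypothetical, fixed number of logical shards, chosen to
// be much larger than the number of machines.
const numShards = 64

// ShardFor maps a key to a logical shard. The mapping depends only on the
// key, so it does not change when machines are added.
func ShardFor(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % numShards)
}

// Placement maps each logical shard to the nodes holding a replica of it.
type Placement map[int][]string

// Place spreads the replicas of every shard round-robin across nodes.
// It assumes replicas <= len(nodes).
func Place(nodes []string, replicas int) Placement {
	p := make(Placement)
	for s := 0; s < numShards; s++ {
		for r := 0; r < replicas; r++ {
			p[s] = append(p[s], nodes[(s+r)%len(nodes)])
		}
	}
	return p
}

// Load counts how many shard replicas each node holds.
func Load(p Placement) map[string]int {
	load := make(map[string]int)
	for _, nodes := range p {
		for _, n := range nodes {
			load[n]++
		}
	}
	return load
}

// AddNode moves replicas from the busiest nodes onto newNode until it
// holds its fair share, and returns the number of replicas moved.
// It assumes newNode is not part of the placement yet.
func AddNode(p Placement, newNode string) int {
	load := Load(p)
	total := 0
	for _, c := range load {
		total += c
	}
	target := total / (len(load) + 1)
	moved := 0
	for s := 0; s < numShards && load[newNode] < target; s++ {
		replicas := p[s]
		if len(replicas) == 0 {
			continue
		}
		// Find the busiest node holding a replica of this shard.
		busiest, idx := replicas[0], 0
		for i, n := range replicas {
			if load[n] > load[busiest] {
				busiest, idx = n, i
			}
		}
		if load[busiest] <= target {
			continue // nothing worth moving from this shard
		}
		replicas[idx] = newNode
		load[busiest]--
		load[newNode]++
		moved++
	}
	return moved
}

func main() {
	p := Place([]string{"worker1", "worker2", "worker3"}, 2)
	fmt.Println("before:", Load(p))
	moved := AddNode(p, "worker4")
	fmt.Println("after: ", Load(p), "moved:", moved)
	fmt.Println("shard for example.com:", ShardFor("example.com"))
}
```

    <p>Only the placement table changes during a rebalance. Applications keep addressing the same logical shards, which is what lets the cluster grow without sharding logic in the application.</p>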
    <div>
      <h4>CitusDB Architecture</h4>
      <a href="#citusdb-architecture">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/74Tl2hP3Tfk1HNzpF1MV4i/121b31b62700289180653b647a653edb/image00.png" />
            
            </figure><p>CitusDB follows an architecture similar to Hadoop to scale out Postgres: one primary node holds authoritative metadata about shards in the cluster and parallelizes incoming queries. The worker nodes then do all the actual work of running the queries.</p><p>In CloudFlare's case, the cluster holds about 1 million shards and each shard is replicated to multiple machines. When the application sends a query to the cluster, the primary node first prunes away unrelated shards and finds the specific shards relevant to the query. The primary node then transforms the query into many smaller queries for <a href="http://www.citusdata.com/blog/19-ozgun/114-how-to-build-your-distributed-database">parallel execution</a> and ships those smaller queries to the worker nodes.</p><p>Finally, the primary node receives intermediate results from the workers, merges them, and returns the final results to the application. This takes anywhere from 25 milliseconds to 2 seconds for queries in the CloudFlare analytics cluster, depending on whether some or all of the data is available in page cache.</p><p>From a high availability standpoint, when a worker node fails, the primary node automatically fails over to the replicas, even during a query. The primary node holds slowly changing metadata, making it a good fit for continuous backups or PostgreSQL's streaming replication feature. Citus Data is currently working on further improvements to make it easy to replicate the primary metadata to all the other nodes.</p><p>At CloudFlare, we love the CitusDB architecture because it enabled us to continue using PostgreSQL. Our analytics dashboard and BI tools connect to Citus using standard PostgreSQL connectors, and tools like <code>pg_dump</code> and <code>pg_upgrade</code> just work. Two features that stand out for us are CitusDB’s PostgreSQL extensions that power our analytics dashboards, and CitusDB’s ability to parallelize the logic in those extensions out of the box.</p>
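    <p>The query flow described above (prune, fan out, fail over, merge) can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the <code>Shard</code> and <code>Worker</code> types, the <code>Prune</code> and <code>Query</code> functions, and the choice of a simple sum as the query. Real workers each hold their own copy of the shard data and run SQL fragments; here the data lives on the <code>Shard</code> value to keep the example short.</p>

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Shard is a hypothetical stand-in for one logical shard covering the
// minute range [From, To], inclusive.
type Shard struct {
	ID       int
	From, To int64
	Requests map[int64]int64 // minute -> request count
	Replicas []string        // names of workers holding a copy
}

// Worker simulates a worker node. Down makes every query fail.
type Worker struct{ Down bool }

var errUnavailable = errors.New("no replica available")

// Sum runs the per-shard query fragment: a partial sum over [from, to].
func (w *Worker) Sum(s *Shard, from, to int64) (int64, error) {
	if w.Down {
		return 0, errors.New("worker down")
	}
	var sum int64
	for minute, n := range s.Requests {
		if minute >= from && minute <= to {
			sum += n
		}
	}
	return sum, nil
}

// Prune keeps only the shards whose range overlaps [from, to].
func Prune(shards []*Shard, from, to int64) []*Shard {
	var keep []*Shard
	for _, s := range shards {
		if s.To >= from && s.From <= to {
			keep = append(keep, s)
		}
	}
	return keep
}

// Query prunes shards, runs one fragment per shard in parallel, falls
// back to the next replica when a worker fails, and merges the results.
func Query(shards []*Shard, workers map[string]*Worker, from, to int64) (int64, error) {
	relevant := Prune(shards, from, to)
	partials := make([]int64, len(relevant))
	errs := make([]error, len(relevant))
	var wg sync.WaitGroup
	for i, s := range relevant {
		wg.Add(1)
		go func(i int, s *Shard) {
			defer wg.Done()
			errs[i] = errUnavailable
			for _, name := range s.Replicas {
				w, ok := workers[name]
				if !ok {
					continue
				}
				if sum, err := w.Sum(s, from, to); err == nil {
					partials[i], errs[i] = sum, nil
					return
				}
			}
		}(i, s)
	}
	wg.Wait()
	var total int64
	for i, err := range errs {
		if err != nil {
			return 0, fmt.Errorf("shard %d: %w", relevant[i].ID, err)
		}
		total += partials[i]
	}
	return total, nil
}

func main() {
	shards := []*Shard{
		{ID: 1, From: 0, To: 59, Requests: map[int64]int64{10: 5, 50: 7}, Replicas: []string{"w1", "w2"}},
		{ID: 2, From: 60, To: 119, Requests: map[int64]int64{70: 11}, Replicas: []string{"w2", "w3"}},
	}
	workers := map[string]*Worker{"w1": {Down: true}, "w2": {}, "w3": {}}
	total, err := Query(shards, workers, 40, 100)
	fmt.Println(total, err)
}
```

    <p>Each goroutine writes only to its own slot in <code>partials</code> and <code>errs</code>, so the merge step needs no locking once <code>wg.Wait()</code> returns.</p>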
    <div>
      <h4>Postgres Extensions on CitusDB</h4>
      <a href="#postgres-extensions-on-citusdb">
        
      </a>
    </div>
    <p>PostgreSQL extensions are pieces of software that add functionality to the core database itself. Some examples are data types, user-defined functions, operators, aggregates, and custom index types. PostgreSQL has more than 150 publicly available official extensions. We’d like to highlight two of these extensions that might be of general interest. It’s worth noting that with CitusDB all of these extensions automatically scale to many servers without any changes.</p>
    <div>
      <h4>HyperLogLog</h4>
      <a href="#hyperloglog">
        
      </a>
    </div>
    <p><a href="https://en.wikipedia.org/wiki/HyperLogLog">HyperLogLog</a> is a sophisticated algorithm developed for doing unique count approximations quickly. And since an <a href="https://github.com/aggregateknowledge/postgresql-hll">HLL implementation for PostgreSQL</a> was open sourced by the good folks at Aggregate Knowledge, we could use it with CitusDB unchanged because CitusDB is compatible with most (if not all) Postgres extensions.</p><p>HLL was important for our application because we needed to compute unique IP counts across various time intervals in real time and we didn’t want to store the unique IPs themselves. With this extension, we could, for example, count the number of unique IP addresses accessing a customer site in a minute, but still have an accurate count when further rolling up the aggregated data into a 1-hour aggregate.</p>
    <div>
      <h4>Hstore</h4>
      <a href="#hstore">
        
      </a>
    </div>
    <p>The <a href="http://www.postgresql.org/docs/9.4/static/hstore.html">hstore data type</a> stores sets of key/value pairs within a single PostgreSQL value. This can be helpful in various scenarios such as with rows with many attributes that are rarely examined, or to represent semi-structured data. We use the hstore data type to hold counters for sparse categories (e.g. country, HTTP status, data center).</p><p>With the hstore data type, we save ourselves from the burden of denormalizing our table schema into hundreds or thousands of columns. For example, we have one hstore data type that holds the number of requests coming in from different data centers per minute per CloudFlare customer. With millions of customers and hundreds of data centers, this counter data ends up being very sparse. Thanks to hstore, we can efficiently store that data, and thanks to CitusDB, we can efficiently parallelize queries of that data.</p><p>For future applications, we are also investigating other extensions such as the Postgres columnar store extension <a href="https://github.com/citusdata/cstore_fdw">cstore_fdw</a> that Citus Data has open sourced. This will allow us to compress and store even more historical analytics data in a smaller footprint.</p>
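    <p>As a rough sketch of how sparse counters map onto hstore, the Go code below keeps counters in a map, merges two sets of counters, and renders the result in hstore's text format (<code>"key"=&gt;"value"</code> pairs separated by commas). The <code>Counters</code> type and its methods are hypothetical names chosen for this example. It assumes the resulting string would be passed to PostgreSQL as a text parameter and cast to <code>hstore</code>.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Counters holds sparse per-category counts, e.g. requests per data
// center. Categories with no traffic are simply absent from the map.
type Counters map[string]int64

// Merge adds two sets of counters together, key by key.
func Merge(a, b Counters) Counters {
	out := make(Counters, len(a)+len(b))
	for k, v := range a {
		out[k] += v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

// escaper backslash-escapes the two characters that are special inside
// a double-quoted hstore key or value.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`)

func quote(s string) string {
	return `"` + escaper.Replace(s) + `"`
}

// HstoreLiteral renders the counters in hstore text format, with keys
// sorted so that the output is deterministic.
func (c Counters) HstoreLiteral() string {
	keys := make([]string, 0, len(c))
	for k := range c {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, quote(k)+"=>"+quote(strconv.FormatInt(c[k], 10)))
	}
	return strings.Join(pairs, ", ")
}

func main() {
	minute1 := Counters{"SJC": 12, "LHR": 3}
	minute2 := Counters{"SJC": 8, "NRT": 1}
	fmt.Println(Merge(minute1, minute2).HstoreLiteral())
}
```

    <p>A customer whose traffic reaches only three data centers stores three pairs, rather than a row with hundreds of mostly empty columns.</p>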
    <div>
      <h4>Conclusion</h4>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>CitusDB has been working very well for us as the new backend for our Customer Analytics system. We have also found many uses for the analytics data in a business intelligence context. The ease with which we can run distributed queries on the data allows us to quickly answer new questions about the CloudFlare network that arise from anyone in the company, from the SRE team through to Sales.</p><p>We are looking forward to features available in the recently released <a href="https://www.citusdata.com/citus-products/citusdb-software">CitusDB 4.0</a>, especially the performance improvements and the new shard rebalancer. We’re also excited about using the JSONB data type with CitusDB 4.0, along with all the other improvements that come standard as part of <a href="http://www.postgresql.org/docs/9.4/static/release-9-4.html">PostgreSQL 9.4</a>.</p><p>Finally, if you’re interested in building and operating distributed services like Kafka or CitusDB and writing Go as part of a dynamic team dealing with big (nay, gargantuan) amounts of data, <a href="https://www.cloudflare.com/join-our-team">CloudFlare is hiring</a>.</p> ]]></content:encoded>
            <category><![CDATA[Analytics]]></category>
            <category><![CDATA[SQL]]></category>
            <category><![CDATA[Postgres]]></category>
            <category><![CDATA[Kafka]]></category>
            <category><![CDATA[LUA]]></category>
            <category><![CDATA[DDoS]]></category>
            <guid isPermaLink="false">4WkjJAXrP1iZH5uthDDnAh</guid>
            <dc:creator>Albert Strasheim</dc:creator>
        </item>
    </channel>
</rss>