
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 11:28:17 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Cloudy Summarizations of Email Detections: Beta Announcement]]></title>
            <link>https://blog.cloudflare.com/cloudy-driven-email-security-summaries/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We're now leveraging our internal LLM, Cloudy, to generate automated summaries within our Email Security product, helping SOC teams better understand what's happening within flagged messages. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>Organizations face continuous threats from <a href="https://www.cloudflare.com/learning/access-management/phishing-attack/"><u>phishing</u></a>, <a href="https://www.cloudflare.com/learning/email-security/business-email-compromise-bec/"><u>business email compromise (BEC)</u></a>, and other advanced email attacks. Attackers <a href="https://www.cloudflare.com/the-net/multichannel-phishing/"><u>adapt their tactics</u></a> daily, forcing defenders to move just as quickly to keep inboxes safe.</p><p>Cloudflare’s visibility across a large portion of the Internet gives us an unparalleled view of malicious campaigns. We process billions of email threat signals every day, feeding them into multiple AI and machine learning models. This lets our detection team create and deploy new rules at high speed, blocking malicious and unwanted emails before they reach the inbox.</p><p>But rapid protection introduces a new challenge: making sure security teams understand exactly what we blocked, and why.</p>
    <div>
      <h2>The Challenge</h2>
      <a href="#the-challenge">
        
      </a>
    </div>
    <p>Cloudflare’s fast-moving detection pipeline is one of our greatest strengths, but it also creates a communication gap for customers. Every day, our detection analysts publish new rules to block phishing, BEC, and other unwanted messages. These rules often blend signals from multiple AI and machine learning models, each looking at different aspects of a message like its content, headers, links, attachments, and sender reputation.</p><p>While this layered approach catches threats early, SOC teams don’t always have insight into the specific combination of factors that triggered a detection. Instead, they see a rule name in the investigation tab with little explanation of what it means.</p><p>Take the rule <i>BEC.SentimentCM_BEC.SpoofedSender</i> as an example. Internally, we know this indicates:</p><ul><li><p>The email contained no unique links or attachments, a common BEC pattern</p></li><li><p>It was flagged as highly likely to be BEC by our Churchmouse sentiment analysis models</p></li><li><p>Spoofing indicators were found, such as anomalies in the envelope_from header</p></li></ul><p>Those details are second nature to our detection team, but without that context, SOC analysts are left to reverse-engineer the logic from opaque labels. They don’t see the nuanced ML outputs (like Churchmouse’s sentiment scoring), the subtle header anomalies, or the sender IP/domain reputation data that factored into the decision.</p><p>The result is time lost to unclear investigations or the risk of mistakenly releasing malicious emails. For teams operating under pressure, that’s more than an inconvenience; it’s a security liability.</p><p>That’s why we extended Cloudy (our AI-powered agent) to translate complex detection logic into clear explanations, giving SOC teams the context they need without slowing them down.</p>
    <div>
      <h2>Enter Cloudy Summaries</h2>
      <a href="#enter-cloudy-summaries">
        
      </a>
    </div>
    <p>Several weeks ago, we launched Cloudy within our Cloudflare One product suite to help customers understand gateway policies and their impacts (you can <a href="https://blog.cloudflare.com/introducing-ai-agent/"><u>read more about the launch</u></a>).</p><p>We began testing Cloudy's ability to explain the detections and updates we continuously deploy. Our first attempt revealed significant challenges.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/63bsCRl2hKUyECh1vJND5k/a033fce3c95a635ede07e1fd03a9edf5/image3.png" />
          </figure>
    <div>
      <h3>The Hallucination Problem</h3>
      <a href="#the-hallucination-problem">
        
      </a>
    </div>
    <p>We observed frequent LLM <a href="https://www.cloudflare.com/learning/ai/what-are-ai-hallucinations/"><u>hallucinations</u></a>, in which the model generated inaccurate information about messages. While this might be acceptable when analyzing logs, it's dangerous for email security detections. A hallucination claiming a malicious message is clean could lead SOC analysts to release it from quarantine, potentially causing a security breach.</p><p>These hallucinations occurred because email detections involve numerous and complex inputs. Our scanning process runs messages through multiple ML algorithms examining different components: body content, attachments, links, IP reputation, and more. The same complexity that makes manual detection explanation difficult also caused our initial LLM implementation to produce inconsistent and sometimes inaccurate outputs.</p>
    <div>
      <h3>Building Guardrails</h3>
      <a href="#building-guardrails">
        
      </a>
    </div>
    <p>To minimize hallucination risk while maintaining inbox security, we implemented several manual safeguards:</p><p><b>Step 1: RAG Implementation</b></p><p>We ensured Cloudy only accessed information from our detection dataset corpus, creating a <a href="https://www.cloudflare.com/learning/ai/retrieval-augmented-generation-rag/"><u>Retrieval-Augmented Generation (RAG)</u></a> system. This significantly reduced hallucinations by grounding the LLM's assessments in actual detection data.</p><p><b>Step 2: Model Context Enhancement</b></p><p>We added crucial context about our internal models. For example, the "Churchmouse" designation refers to a group of sentiment detection models, not a single algorithm. Without this context, Cloudy attempted to define "churchmouse" using the common idiom "poor as a church mouse," which refers to church mice starving because holy bread never falls to the floor. While historically interesting, this was completely irrelevant to our security context.</p>
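    <p>As a rough illustration of the two steps above, the sketch below assembles a prompt from retrieved detection facts plus a small glossary of internal terms. It is not Cloudy's actual implementation: the function name <code>build_prompt</code>, the <code>GLOSSARY</code> dictionary, and the prompt wording are all hypothetical, and the facts are simply the three bullet points listed earlier for <i>BEC.SentimentCM_BEC.SpoofedSender</i>.</p>

```python
# Hypothetical glossary of internal terms; the single entry comes from the post.
GLOSSARY = {
    "Churchmouse": "a group of sentiment detection models, not a single algorithm",
}


def build_prompt(rule_name, detection_facts, glossary=GLOSSARY):
    """Assemble a prompt limited to retrieved facts and known internal terms."""
    lines = [
        "Explain the email detection rule using only the facts below.",
        "Rule: " + rule_name,
        "Facts:",
    ]
    # Grounding step: only facts retrieved from the detection corpus are included.
    lines.extend("- " + fact for fact in detection_facts)
    # Context step: define internal names so the model does not guess at them.
    lines.append("Glossary:")
    lines.extend("- {}: {}".format(term, meaning) for term, meaning in glossary.items())
    return "\n".join(lines)


prompt = build_prompt(
    "BEC.SentimentCM_BEC.SpoofedSender",
    [
        "The email contained no unique links or attachments",
        "Churchmouse sentiment models flagged it as highly likely to be BEC",
        "Spoofing indicators were found in the envelope_from header",
    ],
)
print(prompt)
```

    <p>Keeping the glossary separate from the retrieved facts means a new internal model name can be defined once, rather than repeated in every detection record.</p>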
    <div>
      <h3>Current Results</h3>
      <a href="#current-results">
        
      </a>
    </div>
    <p>Our testing shows Cloudy now produces more stable explanations with minimal hallucinations. For example, the detection <i>SPAM.ASNReputation.IPReputation_Scuttle.Anomalous_HC</i> now generates this summary:</p><p>"This rule flags email messages as spam if they come from a sender with poor Internet reputation, have been identified as suspicious by a blocklist, and have unusual email server setup, indicating potential malicious activity."</p><p>This strikes the right balance. Customers can quickly understand what the detection found and why we classified the message accordingly.</p>
    <div>
      <h2>Beta Program</h2>
      <a href="#beta-program">
        
      </a>
    </div>
    <p>We're opening Cloudy email detection summaries to a select group of beta users. Our primary goal is ensuring our guardrails prevent hallucinations that could lead to security compromises. During this beta phase, we'll rigorously test outputs and verify their quality before expanding access to all customers.</p>
    <div>
      <h2>Ready to enhance your email security?</h2>
      <a href="#ready-to-enhance-your-email-security">
        
      </a>
    </div>
    <p>We provide all organizations (whether a Cloudflare customer or not) with free access to our Retro Scan tool, allowing them to use our predictive AI models to scan existing inbox messages. Retro Scan will detect and highlight any threats found, enabling organizations to remediate them directly in their email accounts. With these insights, organizations can implement further controls, either using <a href="https://www.cloudflare.com/zero-trust/products/email-security/"><u>Cloudflare Email Security</u></a> or their preferred solution, to prevent similar threats from reaching their inboxes in the future.</p><p>If you are interested in how Cloudflare can help secure your inboxes, sign up for a phishing risk assessment <a href="https://www.cloudflare.com/lp/email-security-self-guided-demo-request/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-modernsec-es-ge-general-ai_week_blog"><u>here</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lV6mxQTYwaS6j0n0e8arE/fd62cf8032b15780690f4ed48578d3fc/image2.png" />
          </figure> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Cloud Email Security]]></category>
            <category><![CDATA[LLM]]></category>
            <guid isPermaLink="false">hzXLKdI5wqNlvwd0JKzXS</guid>
            <dc:creator>Ayush Kumar</dc:creator>
            <dc:creator>Nick Blazier</dc:creator>
            <dc:creator>Phil Syme</dc:creator>
        </item>
        <item>
            <title><![CDATA[A quirk in the SUNBURST DGA algorithm]]></title>
            <link>https://blog.cloudflare.com/a-quirk-in-the-sunburst-dga-algorithm/</link>
            <pubDate>Fri, 18 Dec 2020 00:30:00 GMT</pubDate>
            <description><![CDATA[ On Wednesday, December 16, the RedDrip Team from QiAnXin Technology released their discoveries (tweet, github) regarding the random subdomains associated with the SUNBURST malware which was present in the SolarWinds Orion compromise. I ]]></description>
            <content:encoded><![CDATA[ <p></p><p>On Wednesday, December 16, the RedDrip Team from QiAnXin Technology <a href="https://mp.weixin.qq.com/s/v-ekPFtVNZG1W7vWjcuVug">released their discoveries</a> (<a href="https://twitter.com/RedDrip7/status/1339168187619790848?s=20">tweet</a>, <a href="https://github.com/RedDrip7/SunBurst_DGA_Decode">github</a>) regarding the random subdomains associated with the SUNBURST malware which was present in the SolarWinds Orion compromise. In studying queries performed by the malware, Cloudflare has uncovered additional details about how the Domain Generation Algorithm (DGA) encodes data and exfiltrates the compromised hostname to the command and control servers.</p>
    <div>
      <h3>Background</h3>
      <a href="#background">
        
      </a>
    </div>
    <p>The RedDrip team discovered that the DNS queries are created by combining the previously reverse-engineered unique guid (based on hashing of hostname and MAC address) with a payload that is a custom base 32 encoding of the hostname. The article they published includes screenshots of decompiled or reimplemented C# functions that are included in the compromised DLL. This background primer summarizes their work so far (which is published in Chinese).</p><p>RedDrip discovered that the DGA subdomain portion of the query is split into three parts:</p><p><code><b>&lt;encoded_guid&gt; + &lt;byte&gt; + &lt;encoded_hostname&gt;</b></code></p><p>An example malicious domain is:</p><p><code><b>7cbtailjomqle1pjvr2d32i2voe60ce2.appsync-api.us-east-1.avsvmcloud.com</b></code></p><p>Where the domain is split into the three parts as</p>
<table>
<colgroup>
<col></col>
<col></col>
<col></col>
</colgroup>
<thead>
  <tr>
    <th><span>Encoded guid (15 chars)</span></th>
    <th><span>byte</span></th>
    <th><span>Encoded hostname</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>7cbtailjomqle1p</span></td>
    <td><span>j</span></td>
    <td><span>vr2d32i2voe60ce2</span></td>
  </tr>
</tbody>
</table><p>The work from the RedDrip Team focused on the encoded hostname portion of the string; we add insights into both the encoded hostname and the encoded guid portions.</p><p>At a high level, the encoded hostnames use one of two encoding schemes. If all of the characters in the hostname are contained in the set of domain name-safe characters <code>"0123456789abcdefghijklmnopqrstuvwxyz-_."</code> then the <code>OrionImprovementBusinessLayer.CryptoHelper.Base64Decode</code> algorithm, explained in the article, is used. If there are characters outside of that set in the hostname, then <code>OrionImprovementBusinessLayer.CryptoHelper.Base64Encode</code> is used instead and ‘00’ is prepended to the encoding. This allows us to simply check whether the first two characters of the encoded hostname are ‘00’ to know how the hostname is encoded.</p><p>These function names within the compromised DLL are meant to resemble the names of legitimate functions, but in fact perform the message encoding for the malware. The DLL function Base64Decode is meant to resemble the legitimate function name base64decode, but its purpose is actually to perform the encoding of the query (which is a variant of base32 encoding).</p><p>The RedDrip Team has posted Python code for encoding and decoding the queries, including identifying random characters inserted into the queries at regular character intervals.</p><p>One potential issue we encountered with their implementation is the inclusion of a check clause looking for a ‘0’ character in the encoded hostname (line 138 of the decoding script). This line causes the decoding algorithm to ignore any encoded hostnames that do not contain a ‘0’. We believe this was included because ‘0’ is the encoded value of a ‘0’, ‘.’, ‘-’ or ‘_’. Since fully qualified hostnames are composed of multiple parts separated by ‘.’s, e.g. ‘example.com’, it makes sense to expect a ‘.’ in the unencoded hostname and therefore only consider encoded hostnames containing a ‘0’. However, this causes the decoder to ignore many of the recorded DGA domains.</p><p>As we explain below, we believe that long domains are split across multiple queries where the second half is much shorter and unlikely to include a ‘.’. For example, ‘www2.example.c’ takes up one message, meaning that in order to transmit the entire domain ‘www2.example.com’ a second message containing just ‘om’ would also need to be sent. This second message does not contain a ‘.’, so its encoded form does not contain a ‘0’ and it is ignored in the RedDrip team’s implementation.</p>
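    <p>The three-part layout and the ‘00’ prefix check described above can be sketched as follows. This is a minimal illustration rather than code from the malware or from the RedDrip team: the names <code>DgaParts</code> and <code>split_dga_subdomain</code> are hypothetical, and the sketch only slices the first DNS label without decoding anything.</p>

```python
from typing import NamedTuple


class DgaParts(NamedTuple):
    encoded_guid: str      # first 15 characters of the first DNS label
    byte: str              # the single character that follows the guid
    encoded_hostname: str  # the remaining payload
    scheme: str            # DLL function assumed to have produced the payload


def split_dga_subdomain(fqdn):
    """Split the first label of a SUNBURST DGA query into its three parts."""
    label = fqdn.split(".", 1)[0]
    payload = label[16:]
    # A leading '00' marks hostnames that contained characters outside the safe set.
    scheme = "Base64Encode" if payload.startswith("00") else "Base64Decode"
    return DgaParts(label[:15], label[15:16], payload, scheme)


parts = split_dga_subdomain(
    "7cbtailjomqle1pjvr2d32i2voe60ce2.appsync-api.us-east-1.avsvmcloud.com"
)
print(parts.encoded_guid, parts.byte, parts.encoded_hostname, parts.scheme)
```

    <p>For the example domain, this yields the same split as the table above, and the payload does not start with ‘00’, so it would be handled by the <code>Base64Decode</code> scheme.</p>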
    <div>
      <h3>The quirk: hostnames are split across multiple queries</h3>
      <a href="#the-quirk-hostnames-are-split-across-multiple-queries">
        
      </a>
    </div>
    <p>A list of observed queries performed by the malware was published publicly on <a href="https://github.com/bambenek/research/blob/main/sunburst/uniq-hostnames.txt">GitHub</a>. Applying the decoding script to this set of queries, we see some queries appear to be truncated, such as <code>grupobazar.loca</code>, but also some decoded hostnames are curiously short or incomplete, such as “com”, “.com”, or a single letter, such as “m”, or “l”.</p><p>When the hostname does not fit into the available payload section of the encoded query, it is split up across multiple queries. Queries are matched up by matching the GUID section after applying a byte-by-byte exclusive-or (xor).</p>
    <div>
      <h3>Analysis of first 15 characters</h3>
      <a href="#analysis-of-first-15-characters">
        
      </a>
    </div>
    <p>Noticing that long domains are split across multiple requests led us to believe that the first 15 characters encoded information to associate multipart messages. This would allow the receiver on the other end to correctly re-assemble the messages and get the entire domain. The RedDrip team identified the first 15 bytes as a GUID; we focused on those bytes and will refer to them subsequently as the header.</p><p>We found the following queries that we believed to be matches without knowing yet the correct pairings between message 1 and message 2 (payload has been altered):</p><p><b>Part 1 - Both decode to “www2.example.c”</b><br /><code>r1q6arhpujcf6jb6qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.com</code><br /><code>r8stkst71ebqgj66qqqb0trmuhd1r0ee.appsync-api.us-west-2.avsvmcloud.com</code></p><p><b>Part 2 - Both decode to “om”</b><br /><code>0oni12r13ficnkqb2h.appsync-api.us-west-2.avsvmcloud.com</code><br /><code>ulfmcf44qd58t9e82h.appsync-api.us-west-2.avsvmcloud.com</code></p><p>This gives us a final combined payload of <b>www2.example.com</b></p><p>This example gave us two sets of messages where we were confident the second part was associated with the first part, and allowed us to find the following relationship, where message1 is the header of the first message and message2 is the header of the second:</p><p><code><i>Base32Decode(message1) XOR KEY = Base32Decode(message2)</i></code></p><p>The KEY is a single character. That character is xor’d with each byte of the Base32Decoded first header to produce the Base32Decoded second header. We do not currently know how to infer what character is used as the key, but we can still match messages together without that information. Since A XOR B = C where we know A and C but not B, we can instead use A XOR C = B. This means that in order to pair messages together we simply need to look for messages where XOR’ing them together results in a repeating character (the key).</p><p><code><i>Base32Decode(message1) XOR Base32Decode(message2) = KEY</i></code></p><p>Looking at the examples above, this becomes:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th>Message 1</th>
    <th>Message 2</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Header</span></td>
    <td><span>r1q6arhpujcf6jb</span></td>
    <td><span>0oni12r13ficnkq</span></td>
  </tr>
  <tr>
    <td><span>Base32Decode (binary)</span></td>
    <td><span>101101000100110110111111011</span><br /><span>010010000000011001010111111</span><br /><span>01111000101001110100000101</span></td>
    <td><span>110110010010000011010010000</span><br /><span>001000110110110100111100100</span><br /><span>00100011111111000000000100</span></td>
  </tr>
</tbody>
</table><p>We’ve truncated the results slightly. Below, the first two lines show the two binary representations and the third line shows the result of the XOR.</p><p>101101000100110110111111011010010000000011001010111111011110001010011101<br />110110010010000011010010000001000110110110100111100100001000111111110000<br />011011010110110101101101011011010110110101101101011011010110110101101101</p><p>We can see the XOR result is the repeating sequence ‘01101101’, meaning the original key was 0x6D or ‘m’.</p><p>We provide the following Python code as an implementation for matching paired messages (Note: the decoding functions are those provided by the RedDrip team):</p>
            <pre><code># string1 is the first 15 characters of the first message
# string2 is the first 15 characters of the second message
# Base32Decode is the decoding function provided by the RedDrip team
def is_match(string1, string2):
    decoded1 = Base32Decode(string1)
    decoded2 = Base32Decode(string2)
    # XOR the two decoded headers byte by byte
    xor_result = [ord(a) ^ ord(b) for a, b in zip(decoded1, decoded2)]
    if not xor_result:
        return False, None
    # Paired headers XOR to one repeated byte (the key) over the first 9 bytes
    key = xor_result[0]
    if any(value != key for value in xor_result[0:9]):
        return False, None
    return True, "0x{:02X}".format(key)</code></pre>
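            <p>To make the pairing rule concrete without depending on the RedDrip decoding functions, the sketch below applies the same check to the already-decoded header bytes of the two example messages, written here in hex (the same values as the binary rows in the table above). The helper name <code>xor_key_if_paired</code> is hypothetical.</p>

```python
def xor_key_if_paired(decoded1, decoded2, length=9):
    """Return the repeated XOR byte if two decoded headers pair up, else None."""
    xored = bytes(a ^ b for a, b in zip(decoded1[:length], decoded2[:length]))
    # Paired headers XOR to a single repeated byte across the compared range.
    if len(xored) == 0 or len(set(xored)) != 1:
        return None
    return xored[0]


# Decoded headers of r1q6arhpujcf6jb... and 0oni12r13ficnkq..., as hex.
header1 = bytes.fromhex("b44dbf6900cafde29d05")
header2 = bytes.fromhex("d920d2046da7908ff004")

key = xor_key_if_paired(header1, header2)
print(hex(key), chr(key))  # 0x6d m
```

            <p>Only the first nine bytes are compared, matching the range checked by <code>is_match</code>; the tenth byte of these two headers does not XOR to the same value.</p>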
            <p>The following are additional headers that, based on the payload content, Cloudflare is confident are pairs (the payload has been redacted because it contains hostname information that is not yet publicly available):</p><p><b>Example 1:</b></p>
<table>
<thead>
  <tr>
    <th><span>vrffaikp47gnsd4a</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>aob0ceh5l8cr6mco</span></td>
  </tr>
</tbody>
</table><p>xorkey: 0x4E</p><p><b>Example 2:</b></p>
<table>
<thead>
  <tr>
    <th><span>vrffaikp47gnsd4a</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>aob0ceh5l8cr6mco</span></td>
  </tr>
</tbody>
</table><p>xorkey: 0x54</p><p><b>Example 3:</b></p>
<table>
<thead>
  <tr>
    <th><span>vvu7884g0o86pr4a</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>6gpt7s654cfn4h6h</span></td>
  </tr>
</tbody>
</table><p>xorkey: 0x2B</p><p>We hypothesize that the xorkey can be derived from the header bytes and/or padding byte of the two messages, though we have not yet determined the relationship.</p><hr />
    <div>
      <h2>Update (12/18/2020):</h2>
      <a href="#update-12-18-2020">
        
      </a>
    </div>
    <p>Erik Hjelmvik posted a blog post <a href="https://www.netresec.com/?page=Blog&amp;month=2020-12&amp;post=Reassembling-Victim-Domain-Fragments-from-SUNBURST-DNS">explaining where the xor key is located</a>. Based on his code, we provide a Python implementation for converting the header (first 16 bytes) into the decoded GUID as a string. Messages can then be paired by matching GUIDs to reconstruct the full hostname.</p>
            <pre><code>def decrypt_secure_string(header):
    decoded = Base32Decode(header[0:16])
    xor_key = ord(decoded[0])
    decrypted = ["{0:02x}".format(ord(b) ^ xor_key) for b in decoded]
    return ''.join(decrypted[1:9])</code></pre>
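            <p>As a minimal sketch of the pairing step, the code below groups hostname fragments by decoded GUID. It works on already-decoded header bytes (given in hex) so that it runs without the RedDrip decoding functions. The helper names <code>decode_guid</code> and <code>group_by_guid</code> are hypothetical, and fragments are joined in input order, which is an assumption for illustration only.</p>

```python
from collections import defaultdict


def decode_guid(decoded_header):
    """XOR bytes 1..8 with the first byte (the key) and return the GUID as hex."""
    key = decoded_header[0]
    return bytes(b ^ key for b in decoded_header[1:9]).hex()


def group_by_guid(messages):
    """Collect hostname fragments under the GUID decoded from each header."""
    groups = defaultdict(list)
    for decoded_header, fragment in messages:
        groups[decode_guid(decoded_header)].append(fragment)
    return dict(groups)


# Decoded headers and payloads of the two example messages from this post.
messages = [
    (bytes.fromhex("b44dbf6900cafde29d05"), "www2.example.c"),
    (bytes.fromhex("d920d2046da7908ff004"), "om"),
]

for guid, fragments in group_by_guid(messages).items():
    print(guid, "".join(fragments))  # f90bddb47e495629 www2.example.com
```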
            <p>Updated example:</p>
<table>
<thead>
  <tr>
    <th></th>
    <th>Message 1</th>
    <th>Message 2</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Header</span></td>
    <td><span>r1q6arhpujcf6jb</span></td>
    <td><span>0oni12r13ficnkq</span></td>
  </tr>
  <tr>
    <td><span>Base32Decode Header (hex)</span></td>
    <td><span>b44dbf6900cafde29d05</span></td>
    <td><span>d920d2046da7908ff004</span></td>
  </tr>
    <tr>
    <td><span>Base32Decode first byte (xor key)</span></td>
    <td><span>0xb4</span></td>
    <td><span>0xd9</span></td>
  </tr>
    <tr>
    <td><span>XOR result (hex)</span></td>
    <td><span>00f90bddb47e495629</span></td>
    <td><span>00f90bddb47e495629</span></td>
  </tr>
</tbody>
</table> ]]></content:encoded>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Cloudflare Gateway]]></category>
            <category><![CDATA[Deep Dive]]></category>
            <category><![CDATA[Threat Intelligence]]></category>
            <guid isPermaLink="false">7vPEi0QIhuwlb050ssMzXB</guid>
            <dc:creator>Nick Blazier</dc:creator>
            <dc:creator>Jesse Kipp</dc:creator>
        </item>
    </channel>
</rss>