The oldest trick in the ASCII book

by John Graham-Cumming.

If you're old enough (or interested enough) to have spent a lot of time messing around with the ASCII table then you might have run into a strange fact: it's possible to uppercase ASCII text using just bitwise AND.

And it turns out that in some situations this isn't just a curiosity, but actually useful. Here are the ASCII characters 0x20 (space) to 0x7E (tilde).

0x20| !"#$%&'()*+,-./0123456789:;<=>?
0x40|@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
0x60|`abcdefghijklmnopqrstuvwxyz{|}~

It's immediately obvious that each lowercase letter has an ASCII code exactly 0x20 more than the corresponding uppercase letter. For example, lowercase m is 0x6D and uppercase M is 0x4D. And since 0x20 is a single bit, it's possible to uppercase an ASCII letter by taking its code and applying AND 0xDF (masking out the 0x20 bit).
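To make the masking step concrete, here is a minimal sketch in C. The helper name upper_mask is hypothetical, chosen only for illustration, and it assumes its input is an ASCII letter.

```c
/* Hypothetical helper: uppercase an ASCII letter by clearing the 0x20 bit.
   Only meaningful for 'A'..'Z' and 'a'..'z'. */
char upper_mask(char c) {
  /* 0xDF is 1101 1111 in binary: every bit kept except 0x20 */
  return (char)(c & 0xDF);
}
```

With this, upper_mask('m') takes 0x6D to 0x4D, which is 'M', while upper_mask('M') returns 'M' unchanged because the 0x20 bit is already clear.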

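Performing AND 0xDF has no effect on the second row (0x40 to 0x5F): those characters, including the uppercase letters, are unchanged. The third row is folded onto the second. There the lowercase letters get uppercased but there's some collateral damage: ` { | } ~ change to @ [ \ ] ^. The first row isn't untouched either, because every code from 0x20 to 0x3F also has the 0x20 bit set: those characters drop to the control codes 0x00 to 0x1F (0 becomes 0x10, - becomes 0x0D). Each still maps to a distinct value that no letter maps to, so masking doesn't confuse them in a comparison, but don't use it to produce uppercased output for anything other than letters.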
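The collateral damage in the third row can be listed mechanically. The following is a small sketch, with collateral_pairs as a hypothetical helper name, that walks the codes 0x60 to 0x7E, skips the lowercase letters, and records what the mask does to the rest.

```c
/* Hypothetical helper: collect the third-row characters (0x60..0x7E) that
   are not lowercase letters, together with what AND 0xDF turns them into.
   Both buffers need room for at least 6 bytes (5 characters plus a NUL).
   Returns the number of pairs found. */
int collateral_pairs(char *from, char *to) {
  int n = 0;
  for (int c = 0x60; c <= 0x7E; c++) {
    if (c >= 'a' && c <= 'z') {
      continue; /* letters are the intended case: they become 'A'..'Z' */
    }
    from[n] = (char)c;
    to[n] = (char)(c & 0xDF);
    n++;
  }
  from[n] = '\0';
  to[n] = '\0';
  return n;
}
```

Called with two 6-byte buffers it returns 5, leaving `{|}~ in from and @[\]^ in to.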

But if you know that a string has a limited character set then this trick can come in handy. Lots of old protocols (SMTP, POP3, ...) use US-ASCII characters for their commands. DNS names are typically restricted to letters, numbers and -. Even HTTP recommends the use of US-ASCII for the HTTP header.
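If the character set isn't guaranteed by the protocol, it can be checked once up front. Here is a rough sketch of such a check for DNS-style names. The function name is_mask_safe is hypothetical, and accepting '.' as a label separator is my assumption on top of the letters, numbers and - mentioned above. Within this set, no two different characters become equal under AND 0xDF unless they are the same letter in different case.

```c
/* Hypothetical check: returns 1 if s holds only ASCII letters, digits,
   '-' or '.' (the dot is an assumed label separator), 0 otherwise. */
int is_mask_safe(const char *s) {
  for (; *s; s++) {
    char c = *s;
    int ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
             (c >= '0' && c <= '9') || c == '-' || c == '.';
    if (!ok) {
      return 0;
    }
  }
  return 1;
}
```

A check like this costs more than the comparison it protects, so it only makes sense where a string is validated once and then compared many times.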

So, if you know that your string is AND 0xDF-safe then the following C code makes a very fast case-insensitive comparison:

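// Returns 1 if x == y (case insensitive), 0 otherwise.
// Only valid when both strings are AND 0xDF-safe.
int cmp(const char *x, const char *y) {
  while (*x && *y) {
    // Clear the 0x20 bit on both sides so 'a'..'z' match 'A'..'Z'
    if ((*x++ & 0xdf) != (*y++ & 0xdf)) {
      return 0;
    }
  }
  // Equal only if both strings ended at the same point
  return (*x == 0) && (*y == 0);
}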

And that sort of low-level trickery turns out to matter when you are being hit by millions of DNS packets per second and need to make very, very fast filtering decisions.
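On the wire, DNS names are stored as length-prefixed labels rather than NUL-terminated strings, so a packet filter might prefer a length-bounded comparison. What follows is a minimal sketch under that assumption, not a description of any production filter. The name label_eq and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: compare two n-byte labels, ignoring ASCII case.
   Assumes both labels contain only AND 0xDF-safe characters. */
int label_eq(const unsigned char *a, const unsigned char *b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    /* XOR leaves only the differing bits; masking with 0xDF then
       ignores a difference that is confined to the 0x20 bit. */
    if ((a[i] ^ b[i]) & 0xDF) {
      return 0;
    }
  }
  return 1;
}
```

The XOR form gives the same answer as masking each side with 0xDF and comparing, using one AND per byte instead of two.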
