Cloudflare's journey with IPFS started in 2018 when we announced a public gateway for the distributed web. Since then, the ecosystem of infrastructure providers for the InterPlanetary File System (IPFS) has grown and matured substantially. This is a huge benefit for users and application developers, as they can choose their infrastructure providers.
Today, we’re excited to announce new secure filtering capabilities in IPFS. The Cloudflare IPFS module is a tool to protect users from threats like phishing and ransomware. We believe that other participants in the network should have the same ability. We are releasing that software as open source, for the benefit of the entire community.
Its code is available on github.com/cloudflare/go-ipfs. To understand how we built it and how to use it, read on.
A brief introduction on IPFS content retrieval
Before explaining how IPFS filtering works, we need to look more closely at how an IPFS node operates.
The InterPlanetary File System (IPFS) is a peer-to-peer network for storing content on a distributed file system. It is composed of a set of computers called nodes that store and relay content using a common addressing system.
Nodes communicate with each other over the Internet using a Peer-to-Peer (P2P) architecture, preventing one node from becoming a single point of failure. This is even more true given that anyone can operate a node with limited resources. This can be light hardware such as a Raspberry Pi, a server at a cloud provider, or even your web browser.
This creates a challenge since not all nodes may support the same protocols, and networks may block some types of connections. For instance, your web browser does not expose a TCP API and your home router likely doesn’t allow inbound connections. This is where libp2p comes to help.
"libp2p is a modular system of protocols, specifications, and libraries that enable the development of peer-to-peer network applications" - libp2p documentation
That’s exactly what IPFS nodes need to connect to the IPFS network. From a node's point of view, the architecture is the following:
Any node that we maintain a connection with is a peer. A peer that does not have a given piece of content can ask its peers, including you, for it by telling them what it WANTs. If you do have it, you will provide it to them. If you don’t have it, you can give them information about the network to help them find someone who might have it. Because each node chooses which content it stores, some content may be stored on only a limited number of nodes.
For instance, popular content is stored by many nodes, because many operators are willing to dedicate resources to it. Less popular content is provided by only a few nodes.
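The exchange described above can be sketched as a toy model. This is not how go-ipfs implements it: the `ToyNode` class, its `providers` table and the content keys are hypothetical names chosen for illustration, and real nodes discover providers through the network rather than a local dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical, simplified node: a local store plus provider hints."""
    name: str
    store: dict = field(default_factory=dict)      # content key -> bytes held locally
    providers: dict = field(default_factory=dict)  # content key -> names of peers believed to hold it

    def handle_want(self, key):
        """Answer a peer's WANT for `key`."""
        if key in self.store:
            # We hold the content, so we provide it.
            return ("block", self.store[key])
        # We do not hold it: share what we know about who might.
        return ("providers", sorted(self.providers.get(key, ())))


node = ToyNode(
    name="you",
    store={"popular-content": b"widely replicated bytes"},
    providers={"rare-content": {"peer-b"}},
)

print(node.handle_want("popular-content"))
print(node.handle_want("rare-content"))
print(node.handle_want("unknown-content"))
```

In this sketch a node that holds nothing and knows no provider simply returns an empty list, which mirrors the point that rarely stored content can be hard to find.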
This assumption does not hold for public gateways like Cloudflare's. A gateway is an HTTP interface to an IPFS node. On our gateway, we allow any Internet user to retrieve arbitrary content from IPFS. If a user asks for popular content, we provide it. If they ask for less popular content, we’ll find it for them.
Cloudflare’s IPFS gateway is simply a cache in front of IPFS. Cloudflare does not have the ability to modify or remove content from the IPFS network. However, IPFS is a decentralized and open network, so there is the possibility of users sharing threats like phishing or malware. This is content we do not want to provide to the P2P network or to our HTTP users.
In the next section, we describe how an IPFS node can protect its users from such threats.
If you would like to learn more about the inner workings of libp2p, you can go to ProtoSchool which has a great tutorial about it.
How IPFS filtering works
As we described earlier, an IPFS node provides content in two ways: to its peers through the IPFS P2P network and to its users via an HTTP gateway.
Filtering content on the HTTP interface is no different from the protection Cloudflare already has in place. If a piece of content is considered malicious and is available at cloudflare-ipfs.com/ipfs/<CID>, we can filter requests for that URL, so the end user is kept safe.
The P2P layer is different. We cannot filter URLs because that’s not how the content is requested. IPFS is content-addressed. This means that instead of asking for a specific location such as cloudflare-ipfs.com/ipfs/<CID>, peers request the content directly using its Content IDentifier (CID).
More precisely, a CID is an abstraction of the content address. A CID looks like QmXnnyufdzAWL5CqZ2RnSNgPbvCc1ALT73s6epPrRnZ1Xy (which happens to be the hash of a .txt file containing the string "I’m trying out IPFS"). A CID is a convenient way to refer to content in a cryptographically verifiable manner.
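To make "cryptographically verifiable" concrete, here is a minimal sketch of the textual shape of a CIDv0: a SHA-256 digest, prefixed with the multihash bytes 0x12 0x20, then base58-encoded. The helper names `toy_cid` and `verify` are made up for this example. It hashes raw bytes, whereas IPFS wraps files in its own block format before hashing, so the output will not match the CID quoted above.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, big-endian."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def toy_cid(block: bytes) -> str:
    """Return base58(0x12 0x20 || sha256(block)), the shape of a CIDv0."""
    digest = hashlib.sha256(block).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32-byte digest
    return base58_encode(multihash)


def verify(block: bytes, cid: str) -> bool:
    """Anyone holding the bytes can recompute the address and compare."""
    return toy_cid(block) == cid


cid = toy_cid(b"I'm trying out IPFS")
print(cid, verify(b"I'm trying out IPFS", cid), verify(b"tampered", cid))
```

Because the address is derived from the bytes, a node receiving content from an untrusted peer can check that it got what it asked for without trusting that peer.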
This is great, because it means that when peers ask for content whose CID is known to be malicious, we can prevent our node from serving it. This applies to both the P2P layer and the HTTP gateway.
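One way to picture this is a single blocklist keyed by CID that both serving paths consult. The sketch below is an assumption-laden illustration, not the Safemode implementation: `Denylist`, `serve_p2p`, `serve_http`, the sample CIDs and the HTTP status codes are all hypothetical.

```python
from typing import Optional, Tuple


class Denylist:
    """Hypothetical in-memory set of blocked CIDs, shared by both serving paths."""

    def __init__(self):
        self._blocked = set()

    def block(self, cid: str) -> None:
        self._blocked.add(cid)

    def is_blocked(self, cid: str) -> bool:
        return cid in self._blocked


def serve_p2p(denylist: Denylist, store: dict, cid: str) -> Optional[bytes]:
    """P2P path: a peer asks for a CID directly."""
    if denylist.is_blocked(cid):
        return None  # behave as if we do not provide this content
    return store.get(cid)


def serve_http(denylist: Denylist, store: dict, path: str) -> Tuple[int, bytes]:
    """Gateway path: the CID is the first path segment after /ipfs/."""
    parts = path.split("/")
    if len(parts) < 3 or parts[1] != "ipfs" or not parts[2]:
        return 400, b"bad request"
    cid = parts[2]
    if denylist.is_blocked(cid):
        return 410, b"blocked"  # status code is an arbitrary choice for this sketch
    body = store.get(cid)
    return (200, body) if body is not None else (404, b"not found")


store = {"cid-good": b"hello", "cid-bad": b"phishing page"}
denylist = Denylist()
denylist.block("cid-bad")

print(serve_p2p(denylist, store, "cid-bad"))
print(serve_http(denylist, store, "/ipfs/cid-good"))
```

Keying the check on the CID rather than on a URL is what lets the same decision apply regardless of how the request arrived.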
In addition, the way IPFS works makes it easy to reuse content. For directories, for instance, the address is a CID derived from the CIDs of the files they contain. This way, a file can be shared across multiple directories and still be referred to by the same CID. It allows IPFS nodes to store content efficiently without duplicating it. This can be used to share Docker container layers, for example.
In the filtering use case, it means that if malicious content is included in other IPFS content, our node can also prevent the content linking to it from being served. This covers content that is a mix of valid and malicious parts.
This cryptographic method of linking content together is known as a Merkle DAG. You can learn more about it on ProtoSchool, and Consensys published an article explaining the basic cryptographic construction with bananas.
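The linking and filtering behavior can be sketched with a toy DAG. Everything here is illustrative: `ToyDag`, `links_to_blocked` and the hex SHA-256 identifiers are stand-ins, and real IPFS directories use a different serialization. The sketch shows a file stored once but referenced by two directories, and a directory being refused because it links to a blocked file.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Toy content address: hex SHA-256 of the bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


class ToyDag:
    def __init__(self):
        self.blocks = {}  # content id -> serialized bytes
        self.links = {}   # content id -> ids of the children it links to

    def add_file(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data  # storing the same bytes twice is a no-op
        self.links[cid] = []
        return cid

    def add_directory(self, entries: dict) -> str:
        """`entries` maps file names to child ids; the directory id depends on them."""
        listing = "\n".join(f"{name}:{cid}" for name, cid in sorted(entries.items()))
        cid = content_id(listing.encode())
        self.blocks[cid] = listing.encode()
        self.links[cid] = list(entries.values())
        return cid


def links_to_blocked(dag: ToyDag, root: str, blocked: set) -> bool:
    """True if `root` or anything reachable from it is on the blocklist."""
    stack, seen = [root], set()
    while stack:
        cid = stack.pop()
        if cid in blocked:
            return True
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(dag.links.get(cid, []))
    return False


dag = ToyDag()
readme = dag.add_file(b"hello")
phish = dag.add_file(b"fake login page")
clean_dir = dag.add_directory({"README": readme})
mixed_dir = dag.add_directory({"README": readme, "login.html": phish})
blocked = {phish}

print(links_to_blocked(dag, clean_dir, blocked))
print(links_to_blocked(dag, mixed_dir, blocked))
```

Note that the shared file keeps the same identifier in both directories, so blocking or allowing it is a single decision wherever it appears.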
How to use IPFS secure filtering
By now, you should have an understanding of how an IPFS node retrieves and provides content, as well as how we can protect peers and users of shared nodes from accessing threats. Using this knowledge, Cloudflare went on to implement IPFS Safemode, a node protection layer on top of go-ipfs. It is up to every node operator to build their own list of threats to be blocked based on their policy.
To use it, we are going to follow the instructions available in the cloudflare/go-ipfs repository.
First, you need to clone the Git repository:
git clone https://github.com/cloudflare/go-ipfs.git
cd go-ipfs/
Then, you have to check out the commit where IPFS safemode is implemented. This version is based on v0.9.1 of go-ipfs.
git checkout v0.9.1-safemode
Now that you have the source code on your machine, you need to build the IPFS client from source.
make build
Et voilà. You are ready to use your IPFS node, with safemode capabilities.
# alias ipfs command to make it easier to use
alias ipfs='./cmd/ipfs/ipfs'
# run an ipfs daemon
ipfs daemon &
# understand how to use IPFS safemode
ipfs safemode --help
USAGE
ipfs safemode - Interact with IPFS Safemode to prevent certain CIDs from being provided.
...
Going further
IPFS nodes run in a diverse set of environments and are operated by parties at various scales. The same software has to accommodate configurations in which it is accessed by a single user, and others where it is shared by thousands of participants.
At Cloudflare, we believe that decentralization is going to be the next major step for content networks, but there is still work to be done to get these technologies in the hands of everyone. Content filtering is part of this story. If the community aims at embedding a P2P node in every computer, there need to be ways to prevent nodes from serving harmful content. Users need to be able to give consent on the content they are willing to serve, and the content they aren’t.
By providing an IPFS safemode tool, we hope to make this protection more widely available.