The internet is the most important tool in our everyday lives. It’s how we consume media, converse with friends and family, interact with colleagues, learn new skills, and handle our finances. However, the internet that we know and love has flaws. The biggest of those flaws is that the information on it is mostly centralized. This means that the information we access every day is held on servers that are under the control of a central company.
The InterPlanetary File System Mission
The InterPlanetary File System (IPFS) aims to create a distributed web: a peer-to-peer hypermedia protocol that makes the web faster, safer, and more open.
2 ‘Central’ Problems
Centralization poses two main issues. The first is availability: when one company controls a lot of data, what happens if its servers cannot be reached? A company whose servers store a lot of valuable and useful data becomes a single point of failure. This failure could be due to an attack, or something as simple as a server that is offline.
The second issue with storing data in a central location is censorship. When large amounts of data are hosted on a few main servers, it is easier for governments to block access to them. In 2017, residents of Turkey were blocked from accessing Wikipedia. The Turkish government banned the website, calling it “a national security threat”. Something similar happened in Egypt in January 2011, when the government cut off internet and cell phone access for over 95% of its citizens.
Why Do We Use a Flawed System?
The real reason we continue to accept this model is that we have become spoiled when it comes to our internet access. We want web pages to load in milliseconds, images and videos to appear without lag, and of course, this all has to be in the highest HD or 4K quality. Centralizing servers gives companies complete control over how fast they can deliver this content, and lets them charge accordingly for it. Another reason that we continue with this method is that there really has not been a good alternative.
IPFS → The Good Alternative
The InterPlanetary File System (IPFS) is an idea to make the internet completely distributed. The concept transforms the traditional HTTP-based internet into a peer-to-peer network, similar to how BitTorrent works.
Juan Benet is the creator of IPFS and the founder of Protocol Labs, a tech research and development lab that is responsible for IPFS and has also developed Filecoin and IPLD, among other projects. Benet studied Computer Science at Stanford University and is pretty much obsessed with anything to do with knowledge, science, and technology.
Juan Benet’s original goal was not necessarily to create IPFS when he crafted the concept. What he was looking for was an efficient way to move scientific data sets, meaning data that could be 10-100+ GB in size. IPFS was designed to look like what would happen if Git and BitTorrent had a baby together. BitTorrent gives you the ability to move large files across a network rather quickly, and Git gives you built-in versioning for data.
After creating this protocol, Benet soon realized that the implications were much larger than just moving large data sets. He had actually created a protocol that could replace the popular protocols we use to access information on the web today.
Where Did the Name InterPlanetary File System Come From?
The name InterPlanetary File System (IPFS) pays a bit of homage to the early history of the internet. J.C.R. Licklider, whose ideas led to the ARPANET (the predecessor to the internet), had a goal to create what he called an “Intergalactic Network”. InterPlanetary borrows a little bit of that same naming convention. Additionally, IPFS aims to be the File System (FS) for the Internet Protocol (IP). When you put them together you have IPFS, the internet file system.
I will dive into and explain how IPFS works as a filing system. But first, it is important to understand how we access files from the web today.
When you want to download an image from the internet, you tell your computer exactly where to find the image that you are requesting. This location is normally in the form of a URL that contains the domain name of the company that is storing the photo, followed by a path that names the file and usually ends in an extension that specifies its type. An example request for this blog post would look like this: https://achainofblocks.com/ipfs-simple-guide.jpg. This method for accessing resources is called “location-based addressing”: you tell the computer the location where it can access the information, and the computer retrieves the information. The one problem with this method is that if the location is not accessible (maybe the server is offline), then the user’s computer cannot retrieve the information it needs.
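To make the idea of a location concrete, here is a minimal Python sketch that splits the example URL into its parts using the standard library. The variable names are only illustrative.

```python
from urllib.parse import urlparse

url = "https://achainofblocks.com/ipfs-simple-guide.jpg"
parts = urlparse(url)

host = parts.netloc  # which server to contact
path = parts.path    # which file to ask that server for

print(host)
print(path)
```

Here host holds achainofblocks.com and path holds /ipfs-simple-guide.jpg. Nothing in the address describes the image itself, only where it lives, so if that one host is unreachable the request fails.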
Server Down, We’re All Down
With location-based addressing, when a server goes down, everything contained within that server is no longer accessible over the internet. However, there is a high probability that another user has already downloaded that image and is storing it locally on their computer. But even if another computer does have this file, your computer has no way to find that computer and fetch the file from it.
All About the Content
To help address this issue, IPFS introduces the concept of “content-based addressing”. With content-based addressing, when requesting a specific resource you do not need to specify the location; you only need to specify what you want.
Every file has a unique hash, which can be thought of as the fingerprint or identification of the file. When you want to access a specific file, you simply ask the network who has a copy of the file with the specified hash. Once the request is made, someone on the IPFS network will provide the resource that you have requested. You will download that resource, and a copy will be saved to your IPFS cache. Now when another person comes and requests the same file, you will be able to provide it to them. This creates a system that speeds up as it is used more, because the more files are shared, the more readily available they are amongst a large group of nodes.
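The request-and-cache flow described above can be sketched as a toy model in Python. This is not how the real IPFS software is implemented: the Node class, its methods, and the use of a plain SHA-256 hex digest as the address are all assumptions made for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as the content address (an assumption)."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """A hypothetical peer that caches content keyed by its hash."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache = {}  # maps content hash -> bytes

    def add(self, data: bytes) -> str:
        """Store data locally and return the hash others can request it by."""
        cid = content_id(data)
        self.cache[cid] = data
        return cid

    def fetch(self, cid: str, peers: list) -> bytes:
        """Ask peers for the content and keep a local copy once found."""
        if cid in self.cache:
            return self.cache[cid]
        for peer in peers:
            if cid in peer.cache:
                self.cache[cid] = peer.cache[cid]
                return self.cache[cid]
        raise KeyError(f"no peer holds {cid}")


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
cid = alice.add(b"a photo of a cat")

# Bob asks for the content by hash, not by location, and alice provides it.
bob.fetch(cid, peers=[alice, carol])

# Bob now holds a copy too, so there are two providers instead of one.
providers = [n.name for n in (alice, bob, carol) if cid in n.cache]
print(providers)
```

After bob fetches the file, carol could get it from bob even if alice were offline, which is the sense in which the network improves as it is used.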
Change is Good…
At this point, my guess is that you have the same question I did. How do I know that the person or node that is providing me with the file hasn’t tampered with it in some way? Because you use a hash to retrieve the file, you can verify what you have received. Altering a file without changing its hash would be as difficult as changing a transaction in a blockchain. A request is made for a file that has a specific hash, so when the file is received you check that its hash matches the one you requested. This is the same method you would use for validating your Amazon purchase. If you ordered green socks and red socks showed up, you would reject them and wait for your green socks.
Another feature of IPFS is deduplication: when multiple users post the same file, it is only stored one time on the network. This helps to make the network more efficient.
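The check itself is short. Below is a minimal sketch in Python, assuming SHA-256 as the hash function; the verify function is a hypothetical helper, not part of any IPFS library.

```python
import hashlib


def verify(requested_hash: str, received: bytes) -> bool:
    """Accept the data only if its SHA-256 digest equals the hash that was requested."""
    return hashlib.sha256(received).hexdigest() == requested_hash


original = b"green socks"
requested_hash = hashlib.sha256(original).hexdigest()

accepted = verify(requested_hash, b"green socks")  # the content we asked for
rejected = verify(requested_hash, b"red socks")    # tampered or wrong content

print(accepted, rejected)
```

The first call returns True and the second returns False, because any change to the bytes produces a different digest.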
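Deduplication falls out of content-based addressing almost for free, as this small Python sketch suggests. The store dictionary is a hypothetical stand-in for network storage, and SHA-256 is again an assumption.

```python
import hashlib

store = {}  # maps content hash -> bytes; a stand-in for network storage


def add(data: bytes) -> str:
    """Store data under its hash; identical data maps to the same key."""
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)  # keeps the existing entry if one is already there
    return key


first = add(b"same file")
second = add(b"same file")       # a second user posts identical content
third = add(b"different file")

print(first == second, len(store))
```

Both uploads of the same bytes return the same hash, so the store ends up with two entries rather than three.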
How IPFS Really Works
Now that you have the basics of how IPFS compares to today’s traditional methods, let’s dive a little deeper into how IPFS actually stores data and makes it accessible to users.
In IPFS, files are stored in IPFS objects, and each object can store up to 256 KB of data. An object can also contain links to other IPFS objects; linking is what makes it possible to store data that is larger than 256 KB. For example, if you upload just a small text file, then a single object should be adequate for your small amount of text.
However, if you are storing a picture larger than 256 KB, it would be broken up into multiple objects that each hold a maximum of 256 KB. The IPFS system will then create an empty object that links to all the objects that make up that picture.
This architecture is very simple, but it is also very powerful; it is what truly enables IPFS to be used as a file system. If you take a look at the simple file directory structure below, I’ll explain how this could be translated to an IPFS structure:
This could be translated to IPFS by creating one object for each file and each folder/directory, then linking the files to their directories. However, it gets even better when you take into account the fact that IPFS uses content-based addressing. This means that files that are added are immutable: they can never be changed, very much like data in a blockchain. You can therefore be assured that the resource you are accessing is the correct data and has never been altered.
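The chunk-and-link idea can be sketched in Python as follows. The object layout (a dict with "data" and "links"), the function names, and the way the root hash is derived are all assumptions for illustration and do not reflect the actual IPFS object format.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB per object, as described above


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def add_file(data: bytes, objects: dict) -> str:
    """Split data into chunks, store each as an object, and return the root hash."""
    links = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        chunk_hash = sha256_hex(chunk)
        objects[chunk_hash] = {"data": chunk, "links": []}
        links.append(chunk_hash)
    if len(links) == 1:
        return links[0]  # a small file fits in a single object
    # The root object carries no data of its own, only links to the chunks.
    root_hash = sha256_hex("".join(links).encode())
    objects[root_hash] = {"data": b"", "links": links}
    return root_hash


def read_file(root_hash: str, objects: dict) -> bytes:
    """Reassemble a file by following links from the root object."""
    obj = objects[root_hash]
    return obj["data"] + b"".join(read_file(h, objects) for h in obj["links"])


# 600,000 bytes of non-repeating sample data standing in for a picture.
picture = b"".join(i.to_bytes(4, "big") for i in range(150_000))
objects = {}
root = add_file(picture, objects)

print(len(objects[root]["links"]))
print(read_file(root, objects) == picture)
```

The sample data needs three chunks, so the root object ends up with three links and no data, and following those links rebuilds the original bytes.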
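Here is a rough Python sketch of that translation, using a made-up directory with one file and one subfolder. The directory contents, the JSON-based hashing, and the put and add_tree helpers are all hypothetical; real IPFS objects use a different encoding.

```python
import hashlib
import json


def put(obj: dict, objects: dict) -> str:
    """Store an object under the SHA-256 hash of its sorted-key JSON encoding."""
    encoded = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(encoded).hexdigest()
    objects[key] = obj
    return key


def add_tree(tree, objects: dict) -> str:
    """Recursively add a nested dict (a directory) or a str (file contents)."""
    if isinstance(tree, str):
        return put({"data": tree, "links": {}}, objects)
    links = {name: add_tree(child, objects) for name, child in tree.items()}
    return put({"data": "", "links": links}, objects)


# A hypothetical directory: one file plus a subfolder holding another file.
site = {
    "index.html": "<h1>Hello</h1>",
    "images": {"logo.txt": "pretend this is a logo"},
}
objects = {}
root_v1 = add_tree(site, objects)

# "Changing" a file really means adding new content, which gets a new hash.
site["index.html"] = "<h1>Hello again</h1>"
root_v2 = add_tree(site, objects)

print(root_v1 != root_v2)
print(objects[root_v1]["links"]["images"] == objects[root_v2]["links"]["images"])
```

The two roots differ because one file differs, while the untouched images folder keeps the same hash in both. The old objects are still in the store unchanged, which is what immutability means here.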
How Can I Update My Data?
IPFS supports file versioning, which works similarly to how Git works as a source code repository. For example, say you are working on a text file called ‘Important Document – v1.doc’, and you want to share this document with people using IPFS. When you add this file to IPFS, what happens behind the scenes is that IPFS creates a new Commit Object. This object is very basic: all it does is tell IPFS which commit preceded this one, and it links to the IPFS object associated with the file ‘Important Document – v1.doc’.
Now let’s imagine some time has gone by and your ‘Important Document.doc’ needs a revision. This is done by simply adding the new file, ‘Important Document – v2.doc’, to IPFS. The software will create a new Commit Object for the updated file (same as the original process). This Commit Object links to the previous Commit Object, with the first commit serving as the parent. This process can be repeated endlessly, creating a linked chain of versions of the same data, each referencing the one before it. IPFS makes sure that your file, along with its entire version history, is accessible to all other nodes on the network.
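The chain of commits described above can be modeled in a few lines of Python. This follows the article's description only; the field names "parent" and "object", the helper functions, and the JSON-based hashing are assumptions, not the real IPFS data format.

```python
import hashlib
import json

objects = {}  # maps hash -> object; a stand-in for IPFS storage


def put(obj: dict) -> str:
    """Store an object under the SHA-256 hash of its sorted-key JSON encoding."""
    encoded = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(encoded).hexdigest()
    objects[key] = obj
    return key


def commit(file_contents: str, parent):
    """Store the file, then a commit object linking to it and to the parent commit."""
    file_hash = put({"data": file_contents})
    return put({"parent": parent, "object": file_hash})


def history(commit_hash):
    """Walk parent links from the newest commit back to the first one."""
    versions = []
    while commit_hash is not None:
        entry = objects[commit_hash]
        versions.append(objects[entry["object"]]["data"])
        commit_hash = entry["parent"]
    return versions


v1 = commit("Important Document, first draft", parent=None)
v2 = commit("Important Document, revised", parent=v1)

print(history(v2))
```

Starting from the newest commit, history returns the revised text followed by the first draft, so anyone holding the latest hash can reach every earlier version.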
No System is Perfect
So far we have discussed many of the useful features and key concepts of the InterPlanetary File System. However, all protocols have limitations and drawbacks. As you might imagine, the biggest problem that IPFS currently faces is keeping files available. Every node on the network keeps a cache of the files that it has downloaded and helps to make them available as other users need them. However, in a simple situation, if a document is hosted by 4 nodes and they all go offline, that document is no longer accessible.
There are a couple of ways to tackle the problem above. One way is to incentivize nodes to stay online and keep the files available to the community. Rewarding nodes for the storage space they commit to the network would ensure that files have a high likelihood of being available when needed. The other way is to proactively distribute files throughout the network, making sure there are always enough copies online at any given time. You can think of this like redundancy on a massive scale.
This is exactly the issue that Filecoin is aiming to solve. Filecoin was created by the same group that founded IPFS. It is a blockchain built on top of IPFS with the goal of creating a decentralized market for storage. What that means is that users who have extra storage available on their hard drive can rent it out for use as IPFS storage and make some money in the process. You can think of Filecoin as a similar service to Airbnb: instead of renting out the available space in your house, you rent out the space available on your computer for storage. Filecoin creates an incentive for nodes to keep the data online and retain it for as long as possible. In addition to the incentive, which keeps the nodes online, it also replicates the data across many nodes, making it highly available and easily accessible (even if a few nodes are offline). Filecoin and IPFS share the goal of being offline-first, meaning they are constantly striving to make a better experience without needing to make a call to a server to access resources.
This is a very high-level, quick summary of Filecoin. I will go deeper into Filecoin, along with some of the other great projects from Protocol Labs, in future articles.
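A quick back-of-the-envelope calculation shows why more copies help. The Python sketch below rests on a strong simplifying assumption that is not from IPFS itself: each node is online independently with the same probability. The 50% figure is made up purely for illustration.

```python
def availability(p_online: float, replicas: int) -> float:
    """Chance that at least one of `replicas` nodes is online.

    Assumes every node is online independently with probability p_online.
    """
    return 1 - (1 - p_online) ** replicas


# Made-up numbers: each node is online half of the time.
four_copies = availability(0.5, 4)
ten_copies = availability(0.5, 10)

print(four_copies)
print(ten_copies > four_copies)
```

Under these assumed numbers, four copies give an availability of 0.9375, and every extra copy pushes the figure closer to 1. Real nodes are not independent (they share time zones, power grids, and so on), so treat this as intuition rather than a prediction.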
InterPlanetary Linked Data (IPLD)
According to https://ipld.io:
“IPLD is the data model of the content-addressable web. It allows us to treat all hash-linked data structures as subsets of a unified information space, unifying all data models that link data with hashes as instances of IPLD.”
What this means is that IPLD aims to be a data model for interoperable protocols. There are many use cases for this type of technology. This gives the ability to have Smart Contracts that run on IPFS. The point is that IPLD provides libraries that make the underlying data interoperable across tools and across protocols.
IPLD and Filecoin are both very involved projects that will require their own dedicated articles to fully understand.
I hope it is obvious from reading this article that IPFS is a very ambitious project. The majority of early projects with a focus on decentralization have been largely about currency and finance. IPFS is really about building a better way to share data. There have been challenges to HTTP before. However, IPFS is clearly the most established, and it is regarded as a system that could see mass adoption in the coming years. By no means am I insinuating that HTTP is going away; Juan Benet himself admits that HTTP is a great protocol that is still very useful. However, it is a protocol that is over 25 years old, and as IPFS continues to grow in adoption the use cases will expand along with the technology. It is very likely that we will use both protocols together until IPFS eventually takes over, very similar to how we still use FTP today in a much more specific capacity.
At the time of writing this article:
Bitcoin = $6,500
Ethereum = $220
Filecoin = $5.15
Bitcoin Dominance = 52.1%