Note: This is for anyone trying to wrap their head around the concepts of microservices, distributed systems, and application scalability.
TL;DR
Well, the cover image almost explains what a distributed system is. Let's get a formal definition:
A distributed system is a network that consists of autonomous computers that are connected using a distribution middleware. They help in sharing different resources and capabilities to provide users with a single and integrated coherent network.
In short, it's a mesh of interconnected computing nodes, where a single link or node failure does not bring down the entire system.
it’s a graph data structure 😆
The concept of a distributed system drives the internet, with protocols like BGP, DNS, OSPF, and so on. Interestingly, things like spanning tree, which we dismissed as stubborn theory, are very much used in the real world. Some network admins may know this in a bitter, odd way 😉.
What’s the use case of distributed systems for a web app?
The data explosion is the main reason applications moved from a centralized architecture to a distributed one.
Were there any other options for applications? Let the image explain the rest…
What is Decentralized?
Google bitcoin, torrents, or p2p; those first-page articles explain it far better than I can, and we are here to talk about distributed systems.
So the rising data expansion forced us to move from relational databases to non-relational key-value stores, since the goal was to scale the infrastructure by distributing data across several nodes in a cluster. The obvious questions: what happens when one of the nodes goes down? How do the nodes communicate with each other over the network? How do we manage consistency and availability? Before going too deep, let's get an overview with a.k.a Julia's drawing of what generally happens inside a system like that.
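To see how "distributing data across several nodes" works at its most basic, here is a minimal sketch in Python of hash-based partitioning. The node names and key format are made up for illustration; real key-value stores use consistent hashing instead of a plain modulo, so that adding or removing a node doesn't reshuffle almost every key.

```python
import hashlib

# Hypothetical node names -- any identifiers would do.
NODES = ["node-a", "node-b", "node-c"]

def node_for_key(key: str) -> str:
    """Map a key to one of the nodes by hashing it.

    This is the simplest possible partitioning scheme (hash mod N).
    Real stores use consistent hashing so that adding or removing
    a node does not remap most of the keyspace.
    """
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Every client computes the same mapping, so reads and writes for a
# given key always land on the same node.
print(node_for_key("user:42") == node_for_key("user:42"))  # True
```

Because the mapping is deterministic, no central lookup table is needed, which is exactly what lets the cluster scale out.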
So designing a distributed system is pretty much easy?
Wrong!!! These are the 8 Fallacies of Distributed Systems:
Designing Distributed Systems Is Hard
These fallacies were published more than 20 years ago, but they still hold true today, some of them more than others. Many developers know them, yet the code we write doesn't show it.
We must accept these as facts: the network is unreliable, insecure, and costs money. Bandwidth is limited. The network's topology will change. Its components are not configured the same way. Being aware of these limitations will help us design better distributed systems. Do we have a solution for these problems? Yup, we do.
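One classic mitigation for "the network is unreliable" is to retry failed calls with exponential backoff. Here is a small sketch, under the assumption that the operation is idempotent and that every exception is a transient network failure; a real client would retry only errors it knows are safe to retry.

```python
import random
import time

def call_with_retries(operation, max_attempts=4, base_delay=0.1):
    """Run `operation`, retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Backoff with jitter spreads out retries so failing
            # clients don't hammer the network in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

# A simulated flaky call that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

print(call_with_retries(flaky))  # ok
```

Timeouts, retries, and backoff don't make the network reliable; they just turn hard failures into latency, which is usually the better trade.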
This article is super easy to go through; we are just connecting the dots here.
Let’s get to the basic stuff.
Database transactions are tricky to implement in distributed systems as they require each node to agree on the right action to take (abort or commit). This is known as consensus and it is a fundamental problem in distributed systems.
Consider that you and your friends are going out for cake. The menu includes a chocolate cake and a strawberry cream cake; one member of the group chooses the chocolate cake, and all of the members, including you, agree on that.
One of your friends will be the candidate who initiated the proposal of having chocolate cake (he may be the alpha of the group). In reality, two friends may initiate proposals at the same time, each proposing a different cake; in such scenarios an election takes place. And what happens if it's a tie?
It's getting a bit awkward, right? Well, don't worry about that.
The agreement over a value is called consensus, and, in simple terms, this particular proposal-agreement algorithm is called Paxos.
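The cake story can be sketched in code. Below is a drastically simplified, single-round majority vote, not real Paxos, which needs numbered ballots and two phases to survive competing proposers, message loss, and node failures. The friends' names and cake choices are made up. The one real idea it captures is the majority quorum: any two majorities overlap, so two different values can never both be chosen.

```python
from collections import Counter

# Each friend (node) proposes a cake; hypothetical example values.
proposals = {
    "alice": "chocolate",
    "bob": "chocolate",
    "carol": "strawberry",
    "you": "chocolate",
    "dave": "chocolate",
}

def decide(proposals):
    """Pick a value only if a strict majority agrees on it."""
    votes = Counter(proposals.values())
    value, count = votes.most_common(1)[0]
    if count > len(proposals) // 2:
        return value
    return None  # no majority (e.g. a tie), so no decision yet

print(decide(proposals))  # chocolate
```

Note that returning `None` on a tie mirrors the awkward moment in the story: the group simply hasn't decided yet and must hold another round.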
So far we have only talked about storage; we also need to process all this data to make something out of it (recommendation systems, for example).
Distributed computing is the key to the influx of Big Data processing we've seen in recent years. It is the technique of splitting an enormous task (e.g., aggregating 100 billion records), which no single computer could practically execute on its own, into many smaller tasks, each of which fits on a single commodity machine. You split your huge task into many smaller ones, have them execute on many machines in parallel, aggregate the data appropriately, and you have solved your initial problem. This approach again enables you to scale horizontally: when you have a bigger task, simply include more nodes in the calculation.
Known Scale — Folding@Home had 160k active machines in 2012
An early innovator in this space was Google, which by the necessity of their large amounts of data had to invent a new paradigm for distributed computation — MapReduce. They published a paper on it in 2004 and the open-source community later created Apache Hadoop based on it.
So it's easy to create a distributed system, right?
If you spin up 5 Node.js servers behind a single load balancer, all connected to one database, could you call that a distributed application?
Wrong again!!!
A distributed system is a group of computers working together to appear as a single computer to the end-user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.
If you count the database as a shared state, you could argue that this can be classified as a distributed system — but you’d be wrong, as you’ve missed the “working together” part of the definition.
A system is distributed only if the nodes communicate with each other to coordinate their actions.
Well, that’s all for PART I 😎