In 1999, Bill Gates predicted that devices would be developed that connect with each other and synchronize data in a smart way.
Sixteen years later, such devices and tools are here. They include both physical devices such as mobile phones and virtual synchronization services such as Dropbox or Octerpus. We can debate whether this technology is really that smart yet, or how secure these services are. Still, they are here.
So why weren’t they developed in 1999? Is it really that hard? What lies behind this technology? And the most important question: what makes one synchronization system better than another?
Data synchronization is the process of exchanging data so that its representation ends up identical at every destination. A source sends data to a destination and vice versa. But is it really that easy?
Consider an iPhone, an iPad and a PC. If you change something on all three at the same time, what happens? Which version will be saved? Ideally, every change should be preserved. But what if you delete a file on the iPhone and, at the same time, edit that same file on the PC? What should the synchronization system do?
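To make the delete-versus-edit case concrete, here is a minimal sketch of how a system could at least detect that situation. It assumes the system remembers the last version both devices agreed on (the "base"); the function name `classify` and its return labels are made up for illustration and are not taken from any real product.

```python
from typing import Optional

def classify(base: Optional[str], local: Optional[str], remote: Optional[str]) -> str:
    """Compare two replicas of one file against their last common version.

    Each argument is the file content (or a content hash); None means
    "the file does not exist on this side".
    """
    if local == remote:
        return "in_sync"       # both sides ended up the same, nothing to do
    if local == base:
        return "take_remote"   # only the remote side changed
    if remote == base:
        return "take_local"    # only the local side changed
    return "conflict"          # both sides changed, in different ways

# The file was "v1" everywhere, then it was deleted on the phone
# and edited on the PC at the same time:
print(classify(base="v1", local=None, remote="v2"))
```

Detection is the easy half. What to do with a "conflict" result (keep both copies, ask the user, prefer the edit over the delete) is a policy decision, and that is where systems start to differ.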
Synchronization is a research topic in its own right within computer science. Several of its theoretical problems are hard to solve, and each of them has a number of competing algorithms.
As we see, synchronization is easy in simple cases such as database updates. It was originally developed mainly for databases: a thread that wants to write obtains a lock on the database and then does whatever it needs to. When the thread finishes, it releases the lock so that other threads can do their work. Operating systems, video games and many other systems are coded the same way. But when it comes to synchronizing real-time data from multiple sources, the whole thing becomes far more complex.
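The lock pattern described above can be sketched in a few lines of Python using the standard `threading` module. The class `SharedRecord` and the helper `run_writers` are hypothetical names standing in for "the database" and "the threads writing to it".

```python
import threading

class SharedRecord:
    """A stand-in for a database row that several threads write to."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def add(self, amount: int) -> None:
        # Obtain the lock, do the read-modify-write, release on exit.
        with self._lock:
            current = self.value
            self.value = current + amount

def run_writers(record: SharedRecord, n_threads: int = 8, writes_per_thread: int = 1000) -> int:
    """Start several writer threads and return the final value."""
    def worker() -> None:
        for _ in range(writes_per_thread):
            record.add(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return record.value

print(run_writers(SharedRecord()))  # 8 threads x 1000 writes each
```

This works because every writer goes through the same lock on the same machine. Devices that are far apart and sometimes offline have no such shared lock to rely on.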
Have you ever worked on a big code base with a team of 200 people, where every programmer has their own task and often changes the very file you are working on? I have. In such cases it is often a nightmare to commit your changes cleanly to the version control system.
Now we are going to look into different algorithms that some popular technology firms use.
Dropbox uses the rsync algorithm: because the original version of the file is already stored on the server, only the small difference between the old and new versions has to be transmitted. There are several variations of the algorithm, each with several implementations.
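The idea of sending only the difference can be illustrated with a deliberately simplified sketch. It compares fixed-size blocks at aligned offsets only, whereas real rsync also uses a rolling checksum so that it can match blocks at any byte offset. This is not Dropbox's code; the function names and the tiny block size are assumptions chosen for readability.

```python
import hashlib

BLOCK = 4  # unrealistically small, so the example is easy to follow

def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Receiver side: map the hash of each block of the old file to its index."""
    return {
        hashlib.sha256(old[i:i + block]).digest(): i // block
        for i in range(0, len(old), block)
    }

def make_delta(new: bytes, sigs: dict, block: int = BLOCK) -> list:
    """Sender side: reference blocks the receiver already has, send the rest."""
    delta = []
    for i in range(0, len(new), block):
        chunk = new[i:i + block]
        index = sigs.get(hashlib.sha256(chunk).digest())
        if index is not None:
            delta.append(("copy", index))   # receiver already stores this block
        else:
            delta.append(("data", chunk))   # new bytes, must go over the wire
    return delta

def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Receiver side: rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)

old = b"aaaabbbbccccdddd"
new = b"aaaaXXXXccccdddd"
delta = make_delta(new, signatures(old))
assert apply_delta(old, delta) == new
```

In this example only the four changed bytes travel as literal data; the other three blocks are sent as short references to what the server already holds.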
In addition to the basic synchronization algorithm, there is a dedicated conflict resolution SDK that takes care of conflicts such as the deleted file described earlier.
And then there is a special algorithm to merge multiple conflicts together.
Put together, all these algorithms can give horrible system performance. They need to be tuned to work with one another, and that is a job for several more algorithms!
Having the fastest, most efficient algorithm is the key to being the best at data synchronization. One example is box.com, which claims to have a new and especially effective synchronization algorithm that makes a lot of things possible. That often gives them the edge over Dropbox.
As we see, what holds the whole idea back is the development of new and more efficient synchronization logic. As long as nobody has the ideal solution, this branch of computer science is going to keep growing.
Want to be a winner? Take on the Byzantine Generals problem, which has been proven unsolvable with unsigned messages once a third or more of the generals are traitors.