Synchronizing data – how easy is it to implement?

By , last updated December 4, 2019

In 1999, Bill Gates predicted that devices would be developed that connect with each other and synchronize data in a smart way.

16 years later, such devices and tools are here. They include both physical devices like mobile phones and virtual synchronization services like Dropbox or Octerpus. We could debate whether this technology is really that smart yet, or discuss the security considerations of these services. But still, they are here.

So, why weren’t they developed in 1999? Is it really that hard? What lies behind this technology? And, most importantly, what makes one synchronization system better than another?

Synchronization process

Data synchronization is the process of exchanging data so that its representation is the same at every destination. A source sends data to the destination and vice versa. But is it really that easy?

Consider an iPhone, an iPad and a PC. If you change something on all three devices at the same time, what happens? Which version gets saved? Ideally, all changes should be taken care of. But what if you delete a file on your iPhone and, at the same time, edit the same file on your PC? What should the synchronization system do?
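
Before a system can decide anything, it has to notice that two devices changed the same file "at the same time". The article does not say how; one common technique, assumed here for illustration, is a vector clock: every version carries a counter per device, and two versions conflict when neither has seen all of the other’s edits. A minimal Python sketch, where the helper names and the device names `pc` and `iphone` are hypothetical:

```python
from typing import Dict

# Hypothetical representation: device name -> number of edits seen from that device.
VectorClock = Dict[str, int]


def bump(clock: VectorClock, device: str) -> VectorClock:
    """Return a copy of `clock` recording one more edit made on `device`."""
    new = dict(clock)
    new[device] = new.get(device, 0) + 1
    return new


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if version `b` already includes every edit in `a`, plus at least one more."""
    devices = set(a) | set(b)
    return (all(a.get(d, 0) <= b.get(d, 0) for d in devices)
            and any(a.get(d, 0) < b.get(d, 0) for d in devices))


def compare(a: VectorClock, b: VectorClock) -> str:
    if happened_before(a, b):
        return "a is older"
    if happened_before(b, a):
        return "b is older"
    if all(a.get(d, 0) == b.get(d, 0) for d in set(a) | set(b)):
        return "same"
    return "conflict"  # neither version has seen the other's edits


base = bump({}, "pc")             # file created on the PC: {"pc": 1}
on_iphone = bump(base, "iphone")  # edited on the iPhone: {"pc": 1, "iphone": 1}
on_pc = bump(base, "pc")          # edited on the PC meanwhile: {"pc": 2}

print(compare(base, on_iphone))   # a is older
print(compare(on_iphone, on_pc))  # conflict
```

The clock only detects the conflict; something else still has to decide which version survives, which is exactly the question raised above.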


Synchronization is actually a whole research topic in Computer Science. It poses several theoretical problems that are not easily solved, and the different approaches to them lead to quite different algorithms:

  • Source preserve. In this type of synchronization, one source is the answer to all conflicts. This can work for some systems, but if I delete a photo on my iPhone by mistake and the iPhone is the source, I lose that photo forever, and I absolutely don’t want that.
  • Time-based synchronization. In this system, whatever arrives last is the source of truth, and all other data is considered old. This is more practical than the first approach, but what if I delete my picture on my iPhone while I don’t have an internet connection, then go to my PC and edit the picture, and only then connect my phone to the internet? The deletion arrives last, so it wins, and my picture is gone forever.
  • Mathematical synchronization. This is probably the most reliable method of data synchronization and, of course, the most difficult one. All data is treated as mathematical objects, and the system has to work out which objects should be changed and how.
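
The first two strategies are simple enough to sketch directly. The Python below is only an illustration under assumed names: each change is a hypothetical `Change` record holding the device it came from, the moment it *arrived* at the server, and either new content or `None` for a deletion. "Source preserve" trusts one designated device; "time-based" keeps whatever arrived last. The timestamps are made up to reproduce the photo scenario.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    device: str             # where the change was made
    arrived_at: int         # when the server received it (not when it was made)
    content: Optional[str]  # new file content, or None for a deletion


def source_preserve(changes: List[Change], source: str) -> Optional[str]:
    """Resolve every conflict in favour of one designated source device."""
    candidates = [c for c in changes if c.device == source] or changes
    return max(candidates, key=lambda c: c.arrived_at).content


def time_based(changes: List[Change]) -> Optional[str]:
    """Whatever reached the server last wins; everything else is treated as old."""
    return max(changes, key=lambda c: c.arrived_at).content


# The photo was deleted on the offline iPhone first and edited on the PC later,
# but the deletion only reaches the server once the iPhone reconnects.
changes = [
    Change(device="pc", arrived_at=10, content="edited photo"),
    Change(device="iphone", arrived_at=20, content=None),  # the deletion
]

print(time_based(changes))                 # None (the photo is gone)
print(source_preserve(changes, "pc"))      # edited photo
print(source_preserve(changes, "iphone"))  # None
```

Neither rule looks at what the changes actually contain, so each can silently throw one of them away.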

As we can see, synchronization is easy in simple systems such as database updates. It was originally developed primarily for databases: a thread that wants to write to the database obtains a lock on it and then does whatever it needs to do. When the thread finishes, it releases the lock so that other threads can perform their tasks. Operating systems, video games and many other systems are coded the same way. But when it comes to synchronizing real-time data from multiple sources, the whole thing becomes far more complex.
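
The lock-based approach can be sketched with Python’s standard `threading.Lock`. A plain dictionary stands in for the database here (an assumption for illustration; real databases manage their own locks). Each writer thread takes the lock, does its read-modify-write, and releases it, so no update is lost even though four threads write concurrently.

```python
import threading

database = {"counter": 0}   # stand-in for a shared database record
db_lock = threading.Lock()  # only one writer may hold this at a time


def writer(times: int) -> None:
    for _ in range(times):
        with db_lock:                        # obtain the lock (waits if another thread holds it)
            value = database["counter"]      # read...
            database["counter"] = value + 1  # ...and write while nobody else can
        # leaving the `with` block releases the lock for the other threads


threads = [threading.Thread(target=writer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(database["counter"])  # 40000
```

This works because every writer goes through one lock in one process. A phone that is offline cannot take part in such a lock, which is one reason multi-device sync cannot rely on locking alone.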

Have you ever tried to code on a big code base with a team of 200 people, where every programmer has their own task and often changes the very file you are working on? Well, I have. In such cases, properly committing your changes to the version control system is often a nightmare.

Now let’s look at the algorithms some popular technology firms use.

Dropbox algorithm

Dropbox uses the rsync algorithm: since the original version of the file is already stored on the server, only the small difference between the old and new versions is transmitted. There are several variations of the algorithm, each with several implementations.
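
This is not Dropbox’s code, just a toy version of the rsync technique to show how "only the difference" gets found; the names `signatures`, `delta` and `patch` are made up for this sketch. The side holding the old file sends a weak rolling checksum and a strong hash (MD5 here) for each fixed-size block. The other side slides a window over its new file; where a window matches a known block it sends only the block number, otherwise raw bytes. The weak checksum follows rsync’s two-part sum, which can be rolled forward one byte in constant time.

```python
import hashlib
from typing import Dict, List, Tuple, Union

BLOCK = 4      # tiny block size so the example stays readable; rsync uses much larger blocks
MOD = 1 << 16  # both checksum components are kept modulo 2^16, as in rsync

# weak checksum -> list of (block number, strong hash) for the old copy
Table = Dict[Tuple[int, int], List[Tuple[int, str]]]
Op = Union[int, bytes]  # int: "copy block N from your copy", bytes: literal new data


def weak_checksum(data: bytes) -> Tuple[int, int]:
    """a = sum of the bytes, b = position-weighted sum; both cheap to roll forward."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, length: int) -> Tuple[int, int]:
    """Slide the window one byte to the right without rehashing it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - length * out_byte + a) % MOD
    return a, b


def signatures(old: bytes) -> Table:
    """Computed by the side that already has the old file."""
    table: Table = {}
    for start in range(0, len(old), BLOCK):
        block = old[start:start + BLOCK]
        strong = hashlib.md5(block).hexdigest()
        table.setdefault(weak_checksum(block), []).append((start // BLOCK, strong))
    return table


def delta(new: bytes, table: Table) -> List[Op]:
    """Describe `new` as references to known blocks plus the literal bytes in between."""
    ops: List[Op] = []
    literal = bytearray()
    i = 0
    ab = weak_checksum(new[:BLOCK])
    while i + BLOCK <= len(new):
        window = new[i:i + BLOCK]
        match = None
        for block_no, strong in table.get(ab, []):         # cheap weak check first...
            if hashlib.md5(window).hexdigest() == strong:  # ...strong hash confirms it
                match = block_no
                break
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal.clear()
            ops.append(match)
            i += BLOCK
            ab = weak_checksum(new[i:i + BLOCK])
        else:
            literal.append(new[i])
            if i + BLOCK < len(new):
                ab = roll(ab[0], ab[1], new[i], new[i + BLOCK], BLOCK)
            i += 1
    literal.extend(new[i:])  # a tail shorter than one block is always sent as-is
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new file from the old copy and the delta."""
    out = bytearray()
    for op in ops:
        out += old[op * BLOCK:(op + 1) * BLOCK] if isinstance(op, int) else op
    return bytes(out)


old = b"the quick brown fox jumps"  # what the server already stores
new = b"the quick red fox jumps!"   # the edited file on the client
ops = delta(new, signatures(old))
print(ops)                     # [0, 1, b'k red ', 4, 5, b's!']
print(patch(old, ops) == new)  # True
```

Only 8 of the 24 bytes travel as literal data; the rest are block numbers the server can resolve from its own copy.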

In addition to the basic synchronization algorithm, there is a special conflict resolution SDK that takes care of conflicts like my deleted picture.

And then there is a special algorithm to merge multiple conflicts together.
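
The article does not show Dropbox’s SDK or its merge algorithm, so the following is a generic three-way merge, a common technique assumed here for illustration. Each replica is a dictionary from path to content (a missing entry means the file does not exist), and each side is compared against the last common version, `base`. Only paths changed differently on both sides count as conflicts, such as the photo deleted on the iPhone but edited on the PC. The policy of keeping a surviving copy on conflict is a hypothetical choice.

```python
from typing import Dict, List, Optional, Tuple

Folder = Dict[str, Optional[str]]  # path -> content; missing or None means "no such file"


def three_way_merge(base: Folder, local: Folder, remote: Folder) -> Tuple[Folder, List[str]]:
    """Merge two replicas against their last common version.

    Returns the merged folder and the paths that still need a decision.
    """
    merged: Folder = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(path), local.get(path), remote.get(path)
        if l == r:    # both sides agree (including both deleted)
            result = l
        elif l == b:  # only the remote side changed this file
            result = r
        elif r == b:  # only the local side changed this file
            result = l
        else:         # both changed it differently: edit/edit or delete/edit
            conflicts.append(path)
            result = l if l is not None else r  # hypothetical policy: never lose a surviving copy
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"photo.jpg": "v1", "notes.txt": "a"}
iphone = {"notes.txt": "a"}                                  # photo deleted
pc = {"photo.jpg": "v2", "notes.txt": "a", "todo.txt": "x"}  # photo edited, new file added

merged, conflicts = three_way_merge(base, iphone, pc)
print(merged)     # {'notes.txt': 'a', 'photo.jpg': 'v2', 'todo.txt': 'x'}
print(conflicts)  # ['photo.jpg']
```

Comparing against the common ancestor is what lets the merge tell "deleted on this side" apart from "never existed", which a plain two-way comparison cannot do.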

Running all these algorithms together can make system performance horrible. They have to be optimized to work together, and that is a job for several more algorithms!

Custom algorithms

The fastest and most efficient algorithm is the key to being the best at data synchronization. One example is box.com: they claim to have a new and especially efficient synchronization algorithm that lets them do a lot of things, and they often win over Dropbox because of it.

As we can see, developing new and more efficient synchronization logic is what holds the whole idea back. As long as nobody has the ideal solution, this branch of Computer Science will keep growing.

Want to be a winner? Solve the Byzantine Generals Problem, which is proven to be unsolvable with ordinary (unsigned) messages once a third or more of the generals are traitors.