EE Times-Asia

Simplify device data replication, synchronization

Posted: 14 May 2007

Keywords: data replication synchronization, embedded database db4o, db4o API database

By Rick Grehan
Compuware

Anyone with a PDA is familiar with "synchronization", the act of transferring information between the handheld and the desktop to guarantee identical data on both devices.

The ability to take all or part of a database's content, move it into a separate database, modify either database, and later re-connect the two and reconcile their differences, opens all sorts of possibilities for truly intelligent mobile devices. One can imagine scenarios ranging from handheld inventory tracking appliances to scientific data collection instruments.

Synchronization is closely related to "replication"; in fact, the terms are often used synonymously. Typically, however, replication refers to creating a clone of all or part of an original database, while synchronization refers to resolving the differences between two databases, one of which holds data that was replicated from the other.

Regardless of which term one uses, the mechanism for copying a subset of a database so that the copy can be manipulated away from the original, then reconnected and reconciled in a consistent fashion, is as tricky as it is powerful. A proper implementation requires a great deal of under-the-covers bookkeeping.

In this article, we will look at some of that bookkeeping. Then, before the complexity frightens you off, we will present an open-source database that handles it for you, so you can employ replication (and synchronization) in your next mobile application with relatively little mental pain.

First, however, let's look at the travails you would otherwise have to wrestle with yourself.

Pitfalls of replication
To implement replication successfully, you must clear a number of hurdles that you might not be aware of at first. To illustrate, let's assume that we have a database on a desktop system. We want to copy a subset of that database's content to a mobile device, carry the device elsewhere and modify its database, then reconnect to the "parent" database and synchronize the two.

At this stage, we will not be concerned with the kind of databases we're using (whether relational or object-oriented) or the nature of the data stored in them. We will simply say that our databases contain "entities". In a relational database, an entity would probably be a row in a table; in an object database, an entity would likely be an object. Regardless, our focus at this point will be on the tricky issues surrounding replication.

Tricky issue number one is maintaining a logical connection between corresponding entities in both the original and the mobile databases. That is, when we replicate data from the desktop database to the mobile device, we need some way of determining that two entities, each in a different database, actually represent the same thing.

This logical tether between entities is obviously critical. When we reconnect the two databases, if an entity has been modified in the mobile database, we have to apply that modification to the corresponding entity in the original database, which means we have to know which entity in the original database corresponds to the modified entity in the mobile database.

Establishing and maintaining such a connection between two entities is not as simple as you might think. Obviously, we need a unique identifier that we can attach to entities, one that is identical for the original entity and its replicated 'twin'. And, when you ponder the matter further, you realize that it must be a universally unique identifier.

Suppose that, after we replicate the master database into the mobile device database and disconnect the mobile device, we create a new entity in the original database and a different, new entity in the mobile database. We must be guaranteed that the identifiers created for both new entities are indeed unique (there is no chance that they match).

If, by some chance, the identifiers did match, our databases would likely become corrupted when they were reconnected: the synchronization software would wrongly conclude that the two different entities are the same. Who knows what sorts of errors would result?
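To make the requirement concrete, here is a minimal sketch, assuming the identifier is simply a random java.util.UUID (version 4) generated when an entity is created. The Entity class and its replicate() method are hypothetical names used only for illustration; this is not the identifier scheme of any particular database.

```java
import java.util.UUID;

// Hypothetical sketch: every entity gets a universally unique identifier at creation,
// and a replicated "twin" carries the same identifier as its original.
class Entity {
    final UUID id;      // shared by an original entity and its replicated twin
    String payload;

    Entity(String payload) {
        this(UUID.randomUUID(), payload);   // brand-new entity: fresh random identifier
    }

    Entity(UUID id, String payload) {       // replica: reuse the original's identifier
        this.id = id;
        this.payload = payload;
    }

    Entity replicate() {
        return new Entity(id, payload);
    }

    public static void main(String[] args) {
        Entity original = new Entity("created on the desktop");
        Entity twin = original.replicate();
        Entity newOnDesktop = new Entity("added to the master after disconnecting");
        Entity newOnMobile = new Entity("added to the mobile database after disconnecting");
        System.out.println(original.id.equals(twin.id));            // true
        System.out.println(newOnDesktop.id.equals(newOnMobile.id)); // false, barring an astronomically unlikely collision
    }
}
```

Random 128-bit identifiers make an accidental match vanishingly unlikely without any coordination between the two databases, which is exactly what a disconnected device needs.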

Dealing with multiple mobile databases
Furthermore, what happens if the original database is replicated into multiple mobile databases? The classic example of this arrangement is a master database of customer information that feeds into salespeople's mobile databases. All the salespeople of the organization replicate a subset of the master database into their mobile devices, then travel out to the field to meet with customers, enter new data, modify data, and so on.

Later, each salesperson re-synchronizes his or her mobile database with the home office's master database. No matter how many mobile databases might be created from the original, and no matter how many new entities are added to each mobile database, all distinct entities must have unique identifiers.
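One common way to keep identifiers unique across any number of replicas, sketched below under stated assumptions, is to pair a per-database signature (chosen once, when the database is created) with a serial number that only that database advances. IdGenerator and GlobalId are hypothetical names, and the 64-bit random signature is an assumption of this sketch rather than a description of db4o internals.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Hypothetical scheme: identifier = (database signature, local serial number).
// Every database may hand out serials 1, 2, 3, ... and the identifiers still
// differ because the signatures differ.
class IdGenerator {
    record GlobalId(long signature, long serial) {}

    private static final SecureRandom RANDOM = new SecureRandom();
    private final long signature;   // random, chosen once per database
    private long nextSerial = 1;    // advanced only by this database

    IdGenerator() {
        this.signature = RANDOM.nextLong();
    }

    GlobalId next() {
        return new GlobalId(signature, nextSerial++);
    }

    public static void main(String[] args) {
        // The master plus three salespeople's mobile databases, each adding 1,000 entities.
        IdGenerator[] dbs = { new IdGenerator(), new IdGenerator(), new IdGenerator(), new IdGenerator() };
        Set<GlobalId> seen = new HashSet<>();
        for (IdGenerator db : dbs) {
            for (int i = 0; i < 1000; i++) {
                seen.add(db.next());
            }
        }
        System.out.println(seen.size()); // 4000 unless two random signatures happen to collide
    }
}
```

The design choice here is that no database ever needs to ask another for an identifier, so salespeople can keep adding entities while out of contact with the home office.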

Tricky issue number two is determining which entities have been modified. Put another way, we need a mechanism that allows us to ascertain which entities in the two databases have been altered since the mobile database was created from the original.

We could design the synchronization process so that it looks at ALL the entities in each database, examining each corresponding pair for differences. This technique, however, becomes more time-consuming as the number of entities rises: as the database grows, the synchronization process spends more and more time examining unmodified entities only to find that they are unchanged.

Dirty flags
One solution would be to associate a "dirty flag" with each entity. An entity's dirty flag is set whenever that entity is modified, and cleared by the synchronization process. This technique would certainly quicken synchronization. A dirty flag is, in fact, used by the Palm OS to manage modifications. Every record in a Palm database carries a dirty flag that is set whenever the record is modified, and cleared by a subsequent synchronization process.
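A minimal sketch of dirty-flag tracking, in the spirit of the Palm OS record flag described above, follows. DirtyFlagDemo, Row and collectAndClear() are hypothetical names; the sketch only shows the flag's life cycle (set on modification, cleared by synchronization).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of dirty-flag change tracking.
class DirtyFlagDemo {
    static class Row {
        private String data;
        private boolean dirty;

        Row(String data) {
            this.data = data;
            this.dirty = true;     // a new row has never been synchronized
        }

        void update(String newData) {
            data = newData;
            dirty = true;          // any modification sets the flag
        }

        boolean isDirty() { return dirty; }
        String data() { return data; }
    }

    // Only flagged rows are handed to synchronization; unflagged rows are skipped
    // without comparing their contents. Flags are cleared once collected.
    static List<Row> collectAndClear(List<Row> db) {
        List<Row> changed = new ArrayList<>();
        for (Row r : db) {
            if (r.dirty) {
                changed.add(r);
                r.dirty = false;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Row> db = List.of(new Row("a"), new Row("b"), new Row("c"));
        System.out.println(collectAndClear(db).size()); // 3: everything is new
        db.get(1).update("b2");
        System.out.println(collectAndClear(db).size()); // 1: only the modified row
    }
}
```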

But identifying modified entities is complicated by the fact that updates can occur in both the mobile and the original database. In that case, a dirty flag may not be sufficient. We may need to record the time of each modification, so we can deduce which entity is the "older" one.

Yet even that, by itself, might not be good enough. The fact that one modification occurred later than another does not necessarily mean that the later modification should prevail in a synchronization.

Ideally, our synchronization code should allow us to specify 'conflict resolution' algorithms for determining which entity is the 'winner', and therefore overwrites the other. This conflict resolution process could be informed by each object's modification time, and might even require user intervention.
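Pluggable resolution can be sketched with a simple strategy function that receives both contending versions and returns the winner. The Version record and the LATEST_WINS and MASTER_WINS strategies below are hypothetical; the second exists to show that a business rule may legitimately ignore modification time.

```java
import java.time.Instant;
import java.util.function.BinaryOperator;

// Hypothetical sketch of pluggable conflict resolution strategies.
class ConflictResolution {
    record Version(String source, String value, Instant modified) {}

    // Strategy 1: the most recent modification wins (ties go to the first argument).
    static final BinaryOperator<Version> LATEST_WINS =
        (a, b) -> b.modified().isAfter(a.modified()) ? b : a;

    // Strategy 2: a business rule that ignores time: whichever version came from the master wins.
    static final BinaryOperator<Version> MASTER_WINS =
        (a, b) -> b.source().equals("master") ? b : a;

    static Version resolve(Version a, Version b, BinaryOperator<Version> resolver) {
        return resolver.apply(a, b);
    }

    public static void main(String[] args) {
        Version master = new Version("master", "credit limit 500", Instant.parse("2007-05-01T09:00:00Z"));
        Version mobile = new Version("mobile", "credit limit 900", Instant.parse("2007-05-01T15:00:00Z"));
        System.out.println(resolve(master, mobile, LATEST_WINS).value()); // credit limit 900
        System.out.println(resolve(master, mobile, MASTER_WINS).value()); // credit limit 500
    }
}
```

A resolver that asks the user would fit the same shape: it simply displays both versions and returns the one the user picks.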

Working out a scheme for tracking entities
None of the intricacies of replication and synchronization so far described are beyond the reach of a good programmer and decent code. Concocting a technique for generating a globally unique identifier (GUID), associating that GUID with entities in the database, and working out a scheme for tracking entities that have been modified since the last synchronization: all these are well within the capabilities of a moderately good programmer.

However, even a moderately good programmer would probably rather get on with coding the actual database application than spend time designing and coding a replication/synchronization system. Luckily, there is a small-footprint, open-source database that manages virtually all the details so far described.

The database is called db4o, an embeddable object database engine available from www.db4objects.com. It is embeddable in the sense that the db4o engine is delivered as a library that you link into your application, running in the same process space as your application rather than operating in client/server fashion. (However, there is a client/server variant of db4o, should an application require that architecture.) Versions of db4o exist for Java and .NET (including Mono). The examples in this article will be in Java, but everything done here could also be done in .NET.

While db4o is an object database, it places no practical restrictions on the sorts of objects that can be persisted. It will happily handle simple objects as easily as arrays, collections, and even complex trees or networks of objects. Nor must the classes of persistent objects be specially augmented.

Some object databases require persistent classes to be descended from a persistence-aware parent, or to implement special persistence-enabling interfaces. db4o has no such requirements. The simplicity of the db4o API is possibly its most powerful characteristic.

For the following discussion, we will create a pair of classes whose objects are to be made persistent and, subsequently, replicated. We will pretend that we have a customer database, and we wish to replicate that database into a portable device so that a company employee can take that device into the field and record payments made by customers. Later, the mobile database will be re-connected to the 'parent' database, and synchronized.

An amended version of the Customer class is shown below:

We've left out the details of the accessor methods to keep things simple. As you can see, each Customer object carries a reference to an ArrayList of Payment objects. The Payment class looks like this:

Again, we've left out the access methods for simplicity's sake.
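Based on the description above, the two classes might look roughly like this. It is a minimal sketch, not the author's listing: the field names (name, payments, amount), the addPayment() helper and the use of BigDecimal for amounts are assumptions. Note that neither class extends a persistence base class or implements a persistence interface.

```java
import java.math.BigDecimal;
import java.util.ArrayList;

// Plain classes with no persistence-specific superclass or interface.
// Field names and helpers are assumptions based on the article's description.
class Customer {
    private String name;
    private ArrayList<Payment> payments = new ArrayList<Payment>();

    Customer(String name) { this.name = name; }

    String getName() { return name; }
    ArrayList<Payment> getPayments() { return payments; }

    void addPayment(Payment p) { payments.add(p); }

    @Override
    public String toString() { return name + " " + payments; }
}

class Payment {
    private BigDecimal amount;   // e.g. new BigDecimal("100.00") for a $100 payment

    Payment(BigDecimal amount) { this.amount = amount; }

    BigDecimal getAmount() { return amount; }

    @Override
    public String toString() { return "$" + amount; }
}
```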

The intent of our class structure should be clear. Once a Customer object is entered into the database, information is collected concerning that customer's payments, and stored as Payment objects in each Customer object's payments ArrayList.

So, to demonstrate replication, we'll create a 'master' database, populate it with Customer and Payment information, then replicate that database into a 'mobile' database. We will modify one of the Customer objects in the mobile database by adding a new Payment object, then synchronize the mobile database with the master database. If all goes well, we will end up with identical master and mobile databases.

The first step, then, is a program that creates and populates the master database. This code is shown below:

The first two calls in this application are to the generateUUIDs() and generateVersionNumbers() methods. Notice that these are methods in the configuration API of the Db4o object (which represents the db4o database engine). These calls are necessary because our database is going to support replication.

Recall that we said that, for replication to work correctly, we need to be able to uniquely identify each object, and keep track of each object's version number. The call to generateUUIDs() accomplishes the former, and the call to generateVersionNumbers() accomplishes the latter.

These two methods activate mechanisms internal to db4o so that unique identifiers and version numbers are automatically generated for us by the db4o database engine.
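To picture the kind of bookkeeping those two settings switch on, here is a hypothetical sketch (TrackingStore is an invented name, and this is not db4o's internal implementation): on every store, an object receives a UUID the first time it is seen and a fresh, higher version number every time.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of per-object replication metadata maintained on every store.
class TrackingStore {
    record Meta(UUID uuid, long version) {}

    // Identity-based map: application objects need not override equals()/hashCode().
    private final Map<Object, Meta> meta = new IdentityHashMap<>();
    private long clock = 0;   // database-wide counter used as the version source

    void set(Object obj) {
        Meta old = meta.get(obj);
        UUID id = (old == null) ? UUID.randomUUID() : old.uuid();  // identity never changes
        meta.put(obj, new Meta(id, ++clock));                      // each store gets a higher version
    }

    Meta metaFor(Object obj) { return meta.get(obj); }

    public static void main(String[] args) {
        TrackingStore db = new TrackingStore();
        StringBuilder bob = new StringBuilder("Bob");
        db.set(bob);
        Meta first = db.metaFor(bob);
        bob.append(" Smith");
        db.set(bob);
        Meta second = db.metaFor(bob);
        System.out.println(first.uuid().equals(second.uuid())); // true
        System.out.println(second.version() > first.version()); // true
    }
}
```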

Once we have configured the db4o engine, we create the database by first ensuring that the database file doesn't exist, then by calling the openFile() method on the Db4o object. This call simultaneously creates the database file, and provides a reference to the database's associated ObjectContainer. ("ObjectContainer" is db4o parlance for the database itself.)

Next, we create two Customer objects: "Bob" and "Bill". In addition, we attach a pair of payments to Bob: one for $100, another for $200. We store those objects into the database with a call to db.set(). Notice that we didn't have to tell db4o what those objects looked like.

There are no schema files to tell db4o that a Customer object includes a reference to an ArrayList, and the ArrayList must be stored as well as the Customer object. db4o figures it all out by itself. db4o "spiders" through an object's tree and, unless we tell it otherwise, automatically stores all objects that the 'base' object references.
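The "spidering" idea can be sketched with reflection: start at the root object and follow its reference fields and collection elements until no new objects turn up. This is an illustration only (Reachability and its nested classes are hypothetical); db4o's own traversal is internal to the engine.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.*;

// Hypothetical sketch: collect every object reachable from a root object.
class Reachability {
    static class Payment { final int amount; Payment(int amount) { this.amount = amount; } }
    static class Customer {
        final String name;
        final List<Payment> payments = new ArrayList<>();
        Customer(String name) { this.name = name; }
    }

    // Strings and boxed primitives are treated as inline values, not separate objects.
    static boolean isValue(Object o) {
        return o instanceof String || o instanceof Number
            || o instanceof Boolean || o instanceof Character;
    }

    static List<Object> reachable(Object root) {
        List<Object> found = new ArrayList<>();
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            Object o = work.pop();
            if (isValue(o) || !seen.add(o)) continue;   // skip values and already-visited objects
            found.add(o);
            if (o instanceof Collection<?> c) {          // follow collection elements
                for (Object e : c) if (e != null) work.push(e);
            } else if (!o.getClass().getName().startsWith("java.")) {
                // follow non-static reference fields of application classes
                for (Class<?> k = o.getClass(); k != Object.class; k = k.getSuperclass()) {
                    for (Field f : k.getDeclaredFields()) {
                        if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
                        f.setAccessible(true);
                        try {
                            Object v = f.get(o);
                            if (v != null) work.push(v);
                        } catch (IllegalAccessException e) {
                            throw new IllegalStateException(e);
                        }
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Customer bob = new Customer("Bob");
        bob.payments.add(new Payment(100));
        bob.payments.add(new Payment(200));
        // The Customer, its ArrayList, and two Payment objects
        System.out.println(reachable(bob).size()); // 4
    }
}
```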

Finally, we call db.commit(). Anytime an operation is performed on a db4o ObjectContainer that modifies the database, db4o invisibly starts a transaction. All we have to do is commit that transaction (which guarantees that the changes to the database will be permanent, even if the system were to somehow crash). We should note that db4o also supports rolling back (aborting) a transaction, which we could invoke with a call to db.rollback().
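The commit/rollback behaviour can be pictured with a small hypothetical sketch (TxStore is an invented name): modifications accumulate in an implicitly opened transaction, commit() makes them permanent, and rollback() discards them. Crash durability, which db4o provides on commit, is not modelled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of implicit transactions with commit and rollback.
class TxStore {
    private final Map<String, String> committed = new HashMap<>();
    private final Map<String, String> pending = new HashMap<>();   // the open transaction

    void set(String key, String value) { pending.put(key, value); } // implicitly joins the transaction

    String get(String key) {                                          // a transaction sees its own writes
        return pending.containsKey(key) ? pending.get(key) : committed.get(key);
    }

    void commit()   { committed.putAll(pending); pending.clear(); }
    void rollback() { pending.clear(); }

    public static void main(String[] args) {
        TxStore db = new TxStore();
        db.set("Bob", "payments: 100, 200");
        db.commit();
        db.set("Bob", "payments: 100, 200, 999");
        db.rollback();
        System.out.println(db.get("Bob")); // payments: 100, 200
    }
}
```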

If we need to verify that our objects have been correctly stored in the database, we can peek into the customer.YAP file with db4o's ObjectManager. The ObjectManager is a kind of database explorer with which we can examine a database's contents, and explore the relationships among objects stored within that database.

The screenshot in Figure 1 shows the ObjectManager opened on the Customer database. In the Stored Classes frame, we can see that the Customer database does include both Customer objects and Payment objects. Furthermore, in the right-hand frame, a tree view of the "Bob" Customer object shows that Bob's payments ArrayList contains two Payment objects, as it should.

Figure 1: The ObjectManager opened on the Customer database.

With our parent database created, we can now replicate the objects from it into the mobile database. The following code is all we need to accomplish this:

The code begins by creating the mobile database and opening the original 'master' database. The mechanics of replication are encapsulated in the ReplicationSession interface.

We create a ReplicationSession object using the Replication.begin() factory method, specifying the master database first, and the mobile database second. This ordering sets the direction; we are telling the replication system that objects in the original database are to be replicated into the new, mobile database.

Replication itself is driven by a kind of query. The call replication.providerA().objectsChangedSinceLastReplication() retrieves, from the first replication provider (providerA, which maps to db, the master database), all those objects that have changed since the last replication. Because no replication has yet taken place, this retrieves ALL the objects in the original database. The list of objects is made available in the ObjectSet changed.

At this point, all we need to do is iterate through the ObjectSet, calling replicate() on each item returned. It's that simple.
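How can a database answer "which objects changed since the last replication"? A hypothetical sketch follows (ChangeTracking, Db and Session are invented names, and this is not db4o's implementation): each store stamps the object with the next value of a per-database counter, and the session remembers the counter value at the end of its previous run.

```java
import java.util.*;

// Hypothetical sketch of version-based change detection for one-way replication.
class ChangeTracking {
    static final class Tracked {
        final UUID uuid;
        String data;
        long version;
        Tracked(UUID uuid, String data, long version) { this.uuid = uuid; this.data = data; this.version = version; }
    }

    static final class Db {
        final Map<UUID, Tracked> objects = new LinkedHashMap<>();
        long clock = 0;   // per-database version counter

        Tracked store(UUID uuid, String data) {
            Tracked t = objects.computeIfAbsent(uuid, id -> new Tracked(id, data, 0));
            t.data = data;
            t.version = ++clock;
            return t;
        }

        List<Tracked> changedSince(long lastReplicated) {
            List<Tracked> out = new ArrayList<>();
            for (Tracked t : objects.values()) if (t.version > lastReplicated) out.add(t);
            return out;
        }
    }

    static final class Session {
        private final Db source, target;
        private long lastReplicated = 0;   // source clock at the end of the previous run

        Session(Db source, Db target) { this.source = source; this.target = target; }

        // Copy each changed object into the target, keyed by UUID; return how many were copied.
        int run() {
            List<Tracked> changed = source.changedSince(lastReplicated);
            for (Tracked t : changed) target.store(t.uuid, t.data);
            lastReplicated = source.clock;
            return changed.size();
        }
    }

    public static void main(String[] args) {
        Db master = new Db();
        UUID bob = UUID.randomUUID(), bill = UUID.randomUUID();
        master.store(bob, "Bob: 100, 200");
        master.store(bill, "Bill");
        Session s = new Session(master, new Db());
        System.out.println(s.run()); // 2: the first replication sees everything
        System.out.println(s.run()); // 0: nothing has changed
        master.store(bill, "Bill: 50");
        System.out.println(s.run()); // 1: only Bill
    }
}
```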

With the mobile database populated, we can modify one of its objects. We do so with the following code.

New in the above code are calls to activationDepth() and updateDepth(), which are applied to the Db4o configuration API. As you've already seen, when we store an object into a db4o database, the database engine also stores any reachable objects, that is, objects referenced by the base object being stored.

However, when we fetch an object from the database, db4o does not automatically fetch all reachable objects; it only fetches reachable objects up to a given depth, called the 'activation' depth. Likewise, if we update an object in the database (put it back after fetching and modifying it), db4o only re-stores reachable objects up to a given 'update' depth.

Consequently, the calls to activationDepth() and updateDepth() tell db4o how far into the object tree to reach when fetching and updating objects in the database. Setting both to 4 ensures that when we fetch or update a Customer object, we will also fetch and update the associated Payment objects.
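Depth-limited activation can be illustrated with a small hypothetical sketch: a stored object tree is loaded only down to a fixed depth, and deeper references are left unloaded (shown here as an empty child list). How db4o itself counts levels, for example whether a collection's internal storage forms a level of its own, is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of depth-limited activation of an object tree.
class ActivationDepth {
    record Node(String label, List<Node> children) {}

    // depth 1 = only the root; depth 2 = root plus its direct references; and so on.
    static Node activate(Node stored, int depth) {
        List<Node> loaded = new ArrayList<>();
        if (depth > 1) {
            for (Node child : stored.children()) loaded.add(activate(child, depth - 1));
        }
        return new Node(stored.label(), loaded);
    }

    static int count(Node n) {
        int total = 1;
        for (Node c : n.children()) total += count(c);
        return total;
    }

    public static void main(String[] args) {
        // Customer -> payments list -> two Payment objects
        Node stored = new Node("Customer Bill", List.of(
            new Node("payments list", List.of(
                new Node("Payment $100", List.of()),
                new Node("Payment $200", List.of())))));
        System.out.println(count(activate(stored, 1))); // 1
        System.out.println(count(activate(stored, 2))); // 2
        System.out.println(count(activate(stored, 3))); // 4
    }
}
```

With too small a depth, the Customer arrives without its Payment objects, which is why the example raises both depths before fetching and updating Bill.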

We actually fetch the object associated with customer "Bill" by executing what db4o refers to as a query by example (QBE). We do this by building a template object, custTemplate, and setting its name field to the name we want matched in the database. All other fields are left zero or empty.

We then pass that to db4o's get() method, and db4o will return all objects in the database whose fields match the non-empty/non-zero fields of the template. In our case, there is only one "Bill" object in the database, so we withdraw that object from the returned ObjectSet collection, add a new payment to Bill's payments ArrayList, and put Bill back in the database with a set() call. A commit() and a close(), and our mobile database has been modified.
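The matching rule can be approximated in plain Java, as in this hypothetical sketch: a candidate matches when every field that is set in the template (non-null, non-zero, non-false, non-empty) has an equal value in the candidate. It approximates the behaviour described for get() and is not db4o's implementation; it also examines only the template class's own declared fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of query-by-example matching.
class QueryByExample {
    static class Customer {
        String name;
        List<Integer> payments = new ArrayList<>();
        Customer(String name) { this.name = name; }
    }

    // Null, zero, false and empty collections mean "don't care" in a template.
    static boolean isUnset(Object v) {
        return v == null
            || (v instanceof Number n && n.doubleValue() == 0)
            || Boolean.FALSE.equals(v)
            || (v instanceof Collection<?> c && c.isEmpty());
    }

    static boolean matches(Object template, Object candidate) {
        if (template.getClass() != candidate.getClass()) return false;
        for (Field f : template.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            try {
                Object wanted = f.get(template);
                if (!isUnset(wanted) && !Objects.equals(wanted, f.get(candidate))) return false;
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return true;
    }

    static <T> List<T> get(T template, List<T> candidates) {
        List<T> result = new ArrayList<>();
        for (T candidate : candidates) if (matches(template, candidate)) result.add(candidate);
        return result;
    }

    public static void main(String[] args) {
        Customer bob = new Customer("Bob");
        bob.payments.add(100);
        bob.payments.add(200);
        Customer bill = new Customer("Bill");
        Customer custTemplate = new Customer("Bill");   // only the name is set
        List<Customer> hits = get(custTemplate, List.of(bob, bill));
        System.out.println(hits.size() + " " + hits.get(0).name); // 1 Bill
    }
}
```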

Finally, we can synchronize the mobile database with the parent database, using the following code:

This code looks virtually identical to the code we used to replicate from parent to mobile database. This is because, as far as db4o is concerned, replication and synchronization are really the same thing. The only notable difference in this piece of code is the order of the arguments in the call to Replication.begin().

Recall that replication "flows" from the first argument to the second. So, in the code above, the replication source is now the mobile database (dbMobile), and the replication destination is the master database (db).
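Why argument order matters can be shown with a hypothetical one-way replicate(from, to) function: it copies an object when the source holds a newer version than the destination (or the destination lacks it). Comparing raw version numbers across two databases is a simplification that assumes a shared version clock; map keys stand in for the objects' UUIDs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: replication flows from the first argument to the second.
class ReplicationDirection {
    record Versioned(String value, long version) {}

    static int replicate(Map<String, Versioned> from, Map<String, Versioned> to) {
        int copied = 0;
        for (Map.Entry<String, Versioned> e : from.entrySet()) {
            Versioned there = to.get(e.getKey());
            if (there == null || there.version() < e.getValue().version()) {
                to.put(e.getKey(), e.getValue());   // source copy is newer: overwrite
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, Versioned> master = new HashMap<>();
        master.put("Bob", new Versioned("payments 100, 200", 1));
        master.put("Bill", new Versioned("no payments", 1));
        Map<String, Versioned> mobile = new HashMap<>();

        System.out.println(replicate(master, mobile));      // 2: master -> mobile
        mobile.put("Bill", new Versioned("payment 50", 2)); // edited in the field
        System.out.println(replicate(master, mobile));      // 0: nothing in master is newer
        System.out.println(replicate(mobile, master));      // 1: mobile -> master
        System.out.println(master.equals(mobile));          // true
    }
}
```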

In sync
While synchronization/replication can be an involved process if managed at the application level, having the mechanism incorporated directly into the database engine simplifies matters significantly.

As we've shown, db4o's replication system does the behind-the-scenes heavy-lifting of creating and tracking global object identifiers and version numbers, as well as managing the actual transfer of replicated objects from database to database.

There is still more, however, that space has not allowed us to present. For example, you can create conflict-resolution handlers, and 'hook' them into the replication session. When db4o identifies an object that has been modified in both databases, the handler is called, and is passed references to each object in contention.

Your handler can examine object contents, or even request db4o to provide each object's version information (which includes a timestamp of when the object was modified), and relay back to the replication system which of the two objects should be deemed 'the winner'.
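Where such a handler fits can be sketched as follows. This is a hypothetical illustration, not db4o's replication API: each side remembers the version it had at the last synchronization, and only when BOTH sides have moved past that point is the handler consulted; otherwise the side that changed simply wins.

```java
import java.util.function.BinaryOperator;

// Hypothetical sketch of conflict detection with a pluggable handler.
class ConflictHook {
    record Copy(String value, long version, long modifiedMillis) {}

    static Copy synchronize(Copy a, long aVersionAtLastSync,
                            Copy b, long bVersionAtLastSync,
                            BinaryOperator<Copy> handler) {
        boolean aChanged = a.version() > aVersionAtLastSync;
        boolean bChanged = b.version() > bVersionAtLastSync;
        if (aChanged && bChanged) return handler.apply(a, b); // true conflict: ask the handler
        if (aChanged) return a;                               // only A moved on
        return b;                                             // only B moved on, or neither did
    }

    public static void main(String[] args) {
        // A handler that uses each copy's modification timestamp to pick the winner.
        BinaryOperator<Copy> newestWins = (x, y) -> x.modifiedMillis() >= y.modifiedMillis() ? x : y;
        Copy master = new Copy("Bill: credit 500", 5, 2_000L);
        Copy mobile = new Copy("Bill: credit 900", 7, 1_000L);
        // Both sides changed since their last-sync versions (4 and 6): the handler decides.
        System.out.println(synchronize(master, 4, mobile, 6, newestWins).value()); // Bill: credit 500
        // Only the mobile copy changed since the last sync: no conflict.
        System.out.println(synchronize(master, 5, mobile, 6, newestWins).value()); // Bill: credit 900
    }
}
```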

While db4o's streamlined API makes it quick and easy to incorporate into a database application, its best characteristic is the fact that it is a professional system available as an open-source offering. It might not be a "synchronization-in-a-box" solution, but it's awfully close.

About the author
Rick Grehan is a QA Engineer for Compuware's NuMega Labs. He has been programming for nearly 30 years and has co-authored three books: one on RPCs, another on embedded systems, and a third on object databases in Java.