Local v Distributed Databases

In a local database the data resides in a single location (e.g. a single machine). In a distributed database the data is spread across many locations, yet a database management system presents it to the user or application as a single database and handles the internal complexity. The key advantage of a distributed database is data independence: data can be manipulated and moved between locations without having to change the applications that use it, which a local database tied to one location cannot offer. That advantage brings the key disadvantages with it: the scale, complexity and data independence of a distributed database call for advanced skills, equipment and communications, and therefore much higher costs.
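As a rough illustration of how several locations can be presented as a single database, here is a minimal sketch in Python. The class names (Site, DistributedDatabase), the in-memory dictionaries and the placement rule are all assumptions made for illustration; a real database management system does far more (transactions, replication, failure handling).

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One physical location holding part of the data (hypothetical)."""
    name: str
    rows: dict = field(default_factory=dict)


class DistributedDatabase:
    """Toy facade: callers use put/get and never name a site."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.catalog = {}  # key -> site that currently holds the row

    def put(self, key, value):
        # Existing keys stay where they are; new keys go to the site
        # holding the fewest rows (an assumed, simplistic placement rule).
        site = self.catalog.get(key) or min(self.sites, key=lambda s: len(s.rows))
        site.rows[key] = value
        self.catalog[key] = site

    def get(self, key):
        # The catalog lookup hides the location from the caller.
        return self.catalog[key].rows[key]

    def move(self, key, target):
        # Relocate a row; application code calling get() is unchanged.
        source = self.catalog[key]
        target.rows[key] = source.rows.pop(key)
        self.catalog[key] = target


# Hypothetical sites and data, for illustration only.
uk_site, india_site = Site("uk"), Site("india")
db = DistributedDatabase([uk_site, india_site])
db.put("img-001", "header.tiff")
db.move("img-001", india_site)
print(db.get("img-001"))  # same call before and after the move
```

The application only ever calls put and get, so moving a row to another site does not require any change on the application side. That is the data independence described above, and the catalog that makes it possible is part of the extra complexity.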

For example, in my past career we ran a 150TB graphic file database that was accessed from our own site and also by four other sites in the UK, one in the USA and one in India. You would think it would be ideal to run this as a distributed database system. However, performance issues that could not be overcome (the speed and reliability of external communication lines at the time) meant it was deemed better to have good performance at the UK sites. The other sites performed only those tasks that did not require high bandwidth, and we mirrored data to their local systems. The budget for a distributed system was not a consideration in this operation; performance was the main concern.

Of course, the majority of large-scale “data only” databases are well suited to distributed systems (e.g. UK government databases such as police and customs). Not all applications or users need access to all of the data, so the data can be distributed across locations and managed by a database management system.

As a result, if the computers were located in the same building I would suggest that a local database be used unless there were special conditions calling for a distributed system, for example a future need for scalability or flexibility, or a particular need to control certain information (e.g. for security).

If the computers were spread out over the country or internationally, I would suggest that distributed databases be used where the budget and resources exist unless, again, there was a special case for using a local database, possibly with mirroring as in the example above, since poor response times are a major flaw.
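The recommendations in the last two paragraphs can be summarised as a simple rule of thumb. The sketch below is only my reading of them; the function name, parameter names and return strings are assumptions chosen for illustration, not a formal method.

```python
def recommend_database(same_building,
                       needs_future_scaling=False,
                       needs_data_control=False,
                       budget_available=True,
                       links_fast_and_reliable=True):
    """Return a suggested setup following the rule of thumb above."""
    if same_building:
        # Local by default; distribute only for a special condition
        # such as future scalability or control of certain data.
        if needs_future_scaling or needs_data_control:
            return "distributed"
        return "local"
    # Computers spread across the country or internationally.
    if budget_available and links_fast_and_reliable:
        return "distributed"
    # Special case: slow or unreliable links, or no budget.
    return "local with mirroring"


# The graphic file database example: many sites, poor external links.
print(recommend_database(same_building=False, links_fast_and_reliable=False))
```

In the graphic file example the budget was available but the communication lines were not good enough, so the rule falls through to a local database with mirroring.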

References

Brookshear, J. G. (2009) Computer Science: An Overview. 10th ed. Boston: Pearson Education.