Categories
Software Architect

Control the data, not just the code

Source code control and continuous integration are excellent tools for managing the application build and deployment process. Along with source code, schema and data changes are often a significant part of this process and thus warrant similar controls. If your build and deployment process includes a list of elaborate steps required for data updates, beware. These are the lists that always have you crossing your fingers. They look something like this:

  1. Create a list of scripts that need to be run, in order
  2. E-mail scripts to special database person
  3. Database person copies the scripts to a location where they’re executed by a cron job
  4. Check script execution log and pray that all scripts ran successfully since you’re not exactly sure what will happen if you re-run them
  5. Run validation scripts and spot-check the data
  6. Regression test the application and see what blows up
  7. Write scripts to insert missing data and fix blow-ups
  8. Repeat

OK, so that might be a slight exaggeration, but it’s not that far off. Many a project requires this type of acrobatic workflow for successful database migration. For some reason the data portion of the migration plan is easily overlooked during architecture planning. As a result it can become a brittle, manual process that gets bolted on as an afterthought.

This complex web of manual steps creates many opportunities for process breakdown. To make matters worse, bugs caused by schema and data changes don’t always get caught by unit tests as part of the nightly build report. They like to rear their ugly heads in a loud, boisterous manner immediately after a build has been migrated. Database problems are usually tedious to reverse by hand, and their solutions tend to be more difficult to validate. The value of a completely automated build process that is capable of restoring the database to a known state will never be more evident than when you’re using it to fix an extremely visible issue. If you don’t have the ability to drop the database and restore it to a state that is compatible with a specific build of the application, you are susceptible to the same type of problems you’d have if you couldn’t back out a code change quickly.

Database changes shouldn’t create a ripple in your build’s time-space continuum. You need to be able to build the entire application, including the database, as one unit. Make data and schema management a seamless part of your automated build and testing process early on, and include an undo button; it will pay large dividends. At best it will save hours of painful, high-stress problem solving after a late-night blunder. At worst it will give your team the ability to confidently charge forward with refactoring of the data access layer.
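As a rough illustration of schema changes that live in source control and come with an undo button, here is a minimal sketch in Python using the standard sqlite3 module. The table names, the MIGRATIONS list, and the migrate helper are all hypothetical, and a real project would more likely use a dedicated migration tool; the point is only that every change has a matching reversal and that the build can move the database to any known version.

```python
import sqlite3

# Hypothetical migrations kept in source control next to the code.
# Each entry: (version, SQL to apply the change, SQL to undo it).
MIGRATIONS = [
    (1,
     "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
     "DROP TABLE customer"),
    (2,
     "CREATE TABLE invoice (id INTEGER PRIMARY KEY, "
     "customer_id INTEGER NOT NULL, total REAL NOT NULL)",
     "DROP TABLE invoice"),
]


def schema_version(conn):
    """Read the version number SQLite stores in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Apply or undo migrations one step at a time until target is reached."""
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError(f"unknown schema version: {target}")
    version = schema_version(conn)
    while version != target:
        if version < target:
            sql = MIGRATIONS[version][1]      # apply the next change
            new_version = version + 1
        else:
            sql = MIGRATIONS[version - 1][2]  # undo the most recent change
            new_version = version - 1
        conn.execute(sql)
        conn.execute(f"PRAGMA user_version = {int(new_version)}")
        version = new_version
    conn.commit()


def table_names(conn):
    """List user tables so a build script can check the resulting schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
migrate(conn, 2)
print(schema_version(conn), table_names(conn))  # 2 ['customer', 'invoice']
migrate(conn, 0)                                # the undo button
print(schema_version(conn), table_names(conn))  # 0 []
```

Because the same script builds the schema from nothing and tears it back down, it can run as part of every automated build instead of being e-mailed to someone.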


It is all about the data

As software developers we initially understand software as a system of commands, functions and algorithms. This instruction-oriented view of software aids us in learning how to build software, but it is this very same perspective that starts to hamper us when we try to build bigger systems.

If you stand back a little, a computer is nothing more than a fancy tool to help you access and manipulate piles of data. It is the structure of this data that lies at the heart of understanding how to manage complexity in a huge system. Millions of instructions are intrinsically complicated, but underneath we can easily get our brains around a smaller set of basic data structures.

For instance, if you want to understand the UNIX operating system, digging through the source code line by line is unlikely to help. If, however, you read a book outlining the primary internal data structures for handling things like processes and the filesystem, you’ll have a better chance of understanding how UNIX works underneath. The data is conceptually smaller than the code and considerably less complicated.

As code is running in a computer, the underlying state of the data is continually changing. In an abstract sense, we can see any algorithm as just a simple transformation from one version of the data to another. We can see all functionality as just a larger set of well-defined transformations pushing the data through different revisions.
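To make the idea of functionality as a chain of transformations concrete, here is a small sketch in Python. The order records and the three step functions are invented for illustration; each step takes one version of the data and returns the next, and the whole feature is just the steps applied in sequence.

```python
from functools import reduce


def drop_cancelled(orders):
    """Revision 1 -> 2: keep only orders that were not cancelled."""
    return [o for o in orders if o["status"] != "cancelled"]


def add_totals(orders):
    """Revision 2 -> 3: attach a computed total to every order."""
    return [{**o, "total": o["quantity"] * o["unit_price"]} for o in orders]


def total_by_customer(orders):
    """Revision 3 -> 4: collapse the orders into one total per customer."""
    totals = {}
    for o in orders:
        totals[o["customer"]] = totals.get(o["customer"], 0) + o["total"]
    return totals


def run(data, steps):
    """Push the data through each transformation in turn."""
    return reduce(lambda state, step: step(state), steps, data)


# Hypothetical input data.
orders = [
    {"customer": "ada", "quantity": 2, "unit_price": 5, "status": "paid"},
    {"customer": "ada", "quantity": 1, "unit_price": 7, "status": "cancelled"},
    {"customer": "bob", "quantity": 3, "unit_price": 4, "status": "paid"},
]

report = run(orders, [drop_cancelled, add_totals, total_by_customer])
print(report)  # {'ada': 10, 'bob': 12}
```

Reading the shape of the data before and after each step tells you more about this feature than reading the loop bodies does.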

This data-oriented perspective, seeing the system entirely through the structure of its underlying information, can reduce even the most complicated system to a tangible collection of details. That reduction in complexity is necessary for understanding how to build and run complex systems.

Data sits at the core of most problems. Business domain problems creep into the code via the data. Most key algorithms, for example, are well understood; it is the structure and relationships of the data that frequently change. Operational issues like upgrades are also considerably more difficult if they affect data. Changing code or behavior is not a big issue, since it just needs to be released, but revising data structures can involve a huge effort in transforming the old version into a newer one.
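A small, hypothetical example shows why revising a data structure costs more than releasing new code. Suppose version 1 of a record stores a single name field and version 2 splits it in two. The new code ships once, but every record already stored has to be pushed through a conversion like the Python sketch below; the field names and the schema marker are assumptions made for illustration.

```python
def upgrade_v1_to_v2(record):
    """Convert one version 1 record (single name) to version 2 (split name)."""
    first, _, last = record["name"].partition(" ")
    return {
        "schema": 2,
        "id": record["id"],
        "first_name": first,
        "last_name": last,  # empty when the old name had no space in it
    }


def upgrade_all(records):
    """Upgrade old records and pass through any that are already version 2."""
    upgraded = []
    for record in records:
        # Records without a schema marker are assumed to be version 1.
        if record.get("schema", 1) == 1:
            upgraded.append(upgrade_v1_to_v2(record))
        else:
            upgraded.append(record)
    return upgraded


# Hypothetical stored data: one old record and one already in the new shape.
stored = [
    {"id": 1, "name": "Grace Hopper"},
    {"schema": 2, "id": 2, "first_name": "Alan", "last_name": "Turing"},
]

for record in upgrade_all(stored):
    print(record)
```

Even in this toy case the conversion has to decide what to do with names that do not split cleanly, which is the kind of question that never comes up when only the code changes.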

And of course, many of the base problems in software architecture are really about data. Is the system collecting the right data at the right time, and who should be able to see or modify it? If the data exists, what is its quality and how fast is it growing? If not, what is its structure, and where does it reliably come from? In this light, once the data is in the system, the only other question is whether there is already a way to view or edit that specific data, or whether one needs to be added.

From a design perspective, the critical issue for most systems is to get the right data into the system at the right time. From there, applying different transformations to the data is a matter of making it available, executing the functionality and then saving the results. Most systems don’t have to be particularly complex underneath in order to work; they just need to build up bigger and bigger piles of data. Functionality is what we see first, but it’s data that forms the core of every system.
