Thursday, March 16, 2017

An Artful Chunk


A fairly substantial design issue when planning the logical design of a database is how to "chunk" it. I don't mean vertical or horizontal partitions (for performance optimization) nor segmentation for security constraints. Rather, I use the term chunk here to signify how you associate large swaths of data with business processes such that you can restore a subset of the database.

Here's an operational example. You have several customers for whom you retrieve monthly or weekly datasets of new input, at random intervals for each customer. After you ingest the data you run three large processes to end up with a pick list of what to ship to each one. Say the whole process takes a couple weeks of elapsed time.

One day, after the second big processing step, you are informed that one of your customers sent you the data for the wrong store. What do you do now?

Planning for this type of occurrence takes careful attention to detail. Any manual data updates that occur outside of automated production need an audit trail so you may re-apply them after your data is restored. Large aggregation processes need checkpoints or staging tables so you may recalculate after a backout.

Making contingencies for large contiguous corrections, from the start of logical database design, is the artful way to safely chunk your database.