The Pantry Principle

My wife rules the pantry.  She is not a database developer by any means, but she enforces one core principle that is sorely lacking in many complex schemas: organize your stuff on the way in — not on the way out!  The pantry analogy explains this pretty well.  When groceries come home, the pantry gets restocked in a very methodical way.  Canned fruit on one shelf and vegetables on another, each lined up by variety.  Pasta, pickles and peanut butter all have their standard places, too.  Most importantly, everything is sorted by expiration date with the oldest items in the front (and left to right, top to bottom). 

Sure, some occasional administrative work is needed to optimize shelf space or to accommodate a killer sale, but being organized to begin with makes that pretty easy.  There are other benefits every day; it makes shopping more efficient and meal planning much easier.  The real bonus, in my opinion, comes when I'm asked to fetch a particular item for the chef.  I know right where it is and which one to grab!

This same principle applies to databases.  I recently worked on a project where a large amount data from several legacy systems had been pulled into a large geodatabase.  Active financial data was also being written into tables with a convoluted process that didn't make any effort to standardize datatypes or formats with other related tables.  Since there wasn't a foreign key in the entire schema, the database couldn't enforce these relationships.  Nothing was very well organized.  The scope of my project was GIS only and the goal was to filter feature records by various parameters defined by this legacy data.  In one table, IDs were integers but in another they were text with a character prefixes.  That prefix correlated to a value in another table.  Most dates were stored as text.  Floating point values needed to match up with integers.  "Oh well," the client said, "we just round that off..."

The data was all there somewhere and my client was proud of that accomplishment.  They had already invested time and effort on system integration and now they just wanted their GIS to work.  After some significant effort to understand their business rules, and to then implement them with a series of complex nested views, it did work.  It just didn't work very efficiently.  Rendering a map was bad enough, but they also need to synchronize data with multiple remote devices.  That was painfully slow. 

Inefficient data retrieval is not the only cost, either.  The additional time required to develop convoluted solutions and troubleshoot problems was expensive.  This particular project suffered through several aggravating episodes where a procedure that worked well in testing broke down in the field because it encountered a duplicate or a corrupt record. 

Recognizing and supporting important data relationships in the beginning, before everything gets jumbled together is another is another kind of efficiency -- you're literally fixing mistakes before they happen.  The whole point of relational databases (GIS or otherwise) is to enforce those relationships.  When properly constrained, a database will raise informative errors when incompatible inserts are attempted and prevent those inserts from corrupting your system.  Fixing problems then, and with that clarity, is always the most economical solution.

Had I been involved in this project during the original system integration work, I would have insisted on the Pantry Principle.  You absolutely have to do everything you can when data is written to make it easier and faster to both maintain and retrieve.  Your data should absolutely be kept in a secure database with frequent backups — but it has little value until you get it out and get it out correctly.  I don't want to be constantly searching for the right ingredients.  I want to have dinner!

Posted in Database on Nov 02, 2020.