Capturing History in your Association's Data Warehouse
Now that your association realizes the importance of using data to make decisions, and has a firm grasp on your initial set of business questions, it’s time to plan how your data mart will be set up to support those initiatives. The data mart will likely be designed by your IT team or an experienced vendor, but it is still helpful for business users to understand the underlying concepts. Your technical resource is essentially constructing a safe place for a large asset to reside, so you’ll want to be involved at a high level.
It’s likely that some of your business questions are something along the lines of “How do my membership numbers for this year compare to last year?” or “Are more people coming to events this year than 5 years ago?” These kinds of questions require us to “go back in time” to find the answer. When we are interested in historical analysis, it is crucial to develop a method of capturing changes within the data model. Many AMS systems do a good job of capturing historical information within logs or other tables, but there may be a gap in what is captured historically and what is important to your association.
There are essentially 3 types of changes that can happen in your system:
- Unchanging Dimension
- No changes are likely. For example, Gender.
- Slowly Changing Dimension
- This is the most common
- These are attributes that can change slowly with respect to time. An example of this is price – while the price of an item in your inventory might change periodically, it’s unlikely that you change it on a daily basis. Another example is member status which may only change 1 or 2 times per year.
- Rapidly Changing Dimensions
- These are attributes that change often with respect to time. An example may be the high bid amount on an auction item during your foundation’s annual fundraiser.
Dimensional Modeling Approach
Since slowly changing dimensions are the most common, let’s explore the different methods of storing these changes.
- Type 1 – We don’t wish to store any history. In this type, we know that the dimensions are slowly changing, however, we are not interested in storing those changes. We are only interested in storing the current or latest value. Each time it changes, we will update the old value with the new ones. An example of this may be email address which may not be needed for analysis, but rather for reference.
- Type 2 – In this case we want to store the history of changes for the purpose of analysis. We are interested in the full history and can also extract the history of changes when necessary. In the dimension table, we add 3 new metadata columns to capture the date range the value is good for and a column to indicate if this is the latest information available. These help us determine if a particular record in the table is the latest or not, and what time period during which the record was the latest. If there is a change to the dimension, a new row is added and the old row is updated. Using this design and looking at the example below, it’s possible to go back to any date in history and figure out what the member type of each individual was at that point in time.
- Type 3 – This is used to store partial history. Instead of inserting new rows into a table like we do for Type 2, in Type 3, we add a new column to the table to store the history. This can become quite cumbersome after many changes, because that will result in multiple columns being added, and the lack of date range makes it more difficult to pin point exactly when the change occurred. There is a modified version of this type where we just store the current value and the previous value. The advantage of this type is that multiple rows are not created which eases out performance concerns.
Developing a data mart for your requirements can be an investment of time and resources. However, the rewards of well managed data will vastly improve operations, decision making, and your ability to deliver services to your members. DSK believes that data is a critical asset and we’ve devoted multiple blog postings to Data Quality Management for associations.