InfoKarta Inc.
Aggregation Strategy
Home
About us
Content

An Effective Aggregation Methodology

by

Michael J. Kamfonas

© Infokarta Inc.

This is an overview of what choices and decisions one has to make in order to achieve a robust and effective aggregation strategy:

1.      The Aggregation Lattice: A framework for classifying OLAP query workload, levels and patterns – relational or not, with a large part devoted to understanding the concept of the aggregation lattice. How should we decide on the optimal levels for which to build MQTs?

2.      Balanced Configuration Unit (BCU) Selection: How is the sizing of BCU related to the aggregation strategy and response time expectations? Currently the default BCU selection encourages coarse-grained data organization and large block I/O to satisfy table scans. The use of Multidimensional Clustering (MDC) in conjunction with MQTs reduces both the grain and the magnitude of such scans and favors random extent-size I/O. Is there any merit to introducing medium or fine-grained BCUs, particularly for high concurrency and short response times?

3.      The double-dipping query pattern: How can we keep the MQTs to an efficient size by excluding descriptive columns and still be able to redirect queries to them? Should we explicitly code the queries that way, or make the optimizer derive it? What are the pros and cons of either approach?

4.      Choices in MQT design: When should we embed flattened dimensions in an MQT? Is partial inclusion of levels desirable? How should we decide what dimensions to include and what levels of each dimension?

5.      Union Partitioning and Aggregations: It often happens that facts are not a single table, but a union of tables. Why such practice is likely to remain even after roll-in/roll-out functionality becomes available? How should we approach the MQT definitions over union-partitioned fact tables? How can we ensure that the optimizer will rewrite the queries to use the MQTs? How should we deal with the opposite problem, where a yearlong base table is covered by 12 period MQTs?

6.      Multi-Fact Cubes through Metric Integration Views: Most BI reports involve more than one fact table whose metrics need to be pulled together. What techniques can be used to combine metrics from multiple facts of same (or almost same) dimensionality while exploiting their MQTs? How can we use integration views to hide the complexity? How do we deal with dimensions that are present in one contributing fact but not the other?

7.      DB2 Rewrite vs. PMO: PMO stands for Poor Man’s Optimization, a term jokingly used to describe a mechanism, implemented through union views, used to redirect queries to a desired MQT level. Some BI applications such as Business Objects and Microstrategy provide ways for achieving the same goal. Should we fully rely on optimizer-induced redirection to the MQTs or should we force the level in the application? How can one achieve application-driven redirection? Can we have the cake and eat it too?

8.      MQT Maintenance Hurtles: Should we use system or user-maintained MQTs? What does it take to implement a complete system of MQT maintenance, encompassing complete refresh, incremental and recasting processes and utility scripts that are robust, maintainable and restartable. Such processes must be designed with operability in mind, i.e. they must have predictable and reasonable run times, must allow monitoring, provide regular progress status, must be idempotent and restartable, and must be adaptable for special fixes and manual corrections.

9.      Script Generation and Automation: A simple scheme based on templates and Unix scripting to auto-generate all the code needed to support a robust sophisticated aggregation strategy.

We address these questions and others generated as we attempt to answer those. Although the motivation for the approaches discussed comes from specific experiences, we try to evaluate a variety of options available and propose pros and cons. Some of these may appear farfetched, but they are real and have come out of over fifteen years of experience with trying to apply the concepts, the technology and the design in real VLDB situations.

We welcome your comments at inquiries@kamfonas.com

This page has been hit times!