Summary management has been a prime consideration for data warehouses implemented using relational database technology (ROLAP). The typical steps for managing summaries in a data warehouse project are:
- Identifying the summaries required. This depends on the nature of the analytics and on report performance. If on-the-fly summarization takes too long, the summary is a candidate for pre-aggregation.
- Refreshing the summaries. The job does not end once a summary has been created. There should be a mechanism in place to keep the summaries fresh.
- Querying the pre-aggregated summaries for reporting purposes. Reporting/analysis tools should be able to take advantage of the summaries.
A good summary management mechanism should have the following characteristics:
- Transparency. This applies to both refreshing and querying the summary. The summary should refresh automatically when the base fact table changes. Also, a summary query posed against the base table should be automatically redirected to the summary table if one is available.
- Efficiency. This applies to refreshing. The refresh mechanism should be able to refresh a summary incrementally rather than rebuild it from scratch.
- Speed. Both querying and refreshing should be fast.
- Control over the degree of aggregation. Not every possible summary of a base fact table may be required. The summary management tool should let you tune this.
So what summary management mechanisms were available in Oracle before materialized views were introduced in 8i? There were primarily two ways to maintain summaries before MVs:
- Snapshots. A snapshot captures the state of data at a given point in time and stores it as a separate database object. Snapshots are not transparent to querying and need to be referenced explicitly in the query. The refresh of a snapshot can be scheduled periodically.
- Summary tables. A data warehouse can have a set of tables that store summary data, for example a separate table holding pre-aggregated sales data at the year and country level. These tables are again not transparent to query tools and need to be referenced explicitly in the query. The refresh of these tables also has to be handled in the application (ETL logic).
So neither of the above approaches achieved all the characteristics of an ideal summary management system.
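The summary-table approach can be sketched in a few lines. This is only an illustration, using Python with an in-memory SQLite database as a stand-in for the warehouse; the table and column names (`sales`, `sales_by_year_country`, and so on) are made up for the example. It shows the two drawbacks mentioned above: the report has to name the summary table explicitly, and the summary stays stale until the ETL code refreshes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_year INTEGER, country TEXT, amount REAL);
    INSERT INTO sales VALUES
        (2004, 'US', 100.0), (2004, 'US', 50.0),
        (2004, 'IN', 70.0),  (2005, 'US', 30.0);
    -- Hand-built summary table at the year, country level (hypothetical name).
    CREATE TABLE sales_by_year_country AS
        SELECT sale_year, country, SUM(amount) AS total_amount
        FROM sales GROUP BY sale_year, country;
""")

def summary_total(year, country):
    # The report must name the summary table explicitly (no transparency).
    row = conn.execute(
        "SELECT total_amount FROM sales_by_year_country "
        "WHERE sale_year = ? AND country = ?", (year, country)).fetchone()
    return row[0] if row else None

def refresh_summary():
    # ETL-owned refresh: rebuild the summary from the base table.
    with conn:
        conn.execute("DELETE FROM sales_by_year_country")
        conn.execute(
            "INSERT INTO sales_by_year_country "
            "SELECT sale_year, country, SUM(amount) FROM sales "
            "GROUP BY sale_year, country")

# A new fact arrives; the summary is stale until the ETL refreshes it.
conn.execute("INSERT INTO sales VALUES (2004, 'US', 25.0)")
stale = summary_total(2004, "US")
refresh_summary()
fresh = summary_total(2004, "US")
print(stale, fresh)
```

Between the insert and the call to `refresh_summary()`, the summary and the base table disagree, and nothing in the database itself knows or corrects that.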
Materialized views (MVs) arrived with Oracle 8i and replaced snapshots; the core functionality of snapshots became part of MVs. The radical new characteristic of MVs was query rewrite. Query rewrite enables the optimizer to rewrite a query to access a materialized view instead of the base table when the query is seeking summary information. This feature sets MVs apart from all the previous approaches to summary management.
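To make the idea of query rewrite concrete, here is a toy sketch in Python over SQLite. It is not how Oracle's optimizer works; it only illustrates the principle of redirecting a summary query from the base table to a pre-aggregated table when that table covers the requested grouping. The catalog `SUMMARIES`, the function `rewrite`, and all table names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_year INTEGER, country TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        (2004, 'US', 'A', 100.0), (2004, 'US', 'B', 50.0),
        (2004, 'IN', 'A', 70.0),  (2005, 'US', 'A', 30.0);
    -- Pre-aggregated summary at the year, country level (hypothetical name).
    CREATE TABLE mv_sales_year_country AS
        SELECT sale_year, country, SUM(amount) AS amount
        FROM sales GROUP BY sale_year, country;
""")

# Hypothetical catalog: summary name -> the grouping columns it was built on.
SUMMARIES = {"mv_sales_year_country": {"sale_year", "country"}}

def rewrite(group_by):
    """Build SQL for SUM(amount) by group_by, using a summary when one covers it."""
    source = "sales"
    for name, dims in SUMMARIES.items():
        # SUM can be rolled up further, so any summary whose grouping
        # columns include the requested ones can answer the query.
        if set(group_by) <= dims:
            source = name
            break
    cols = ", ".join(group_by)
    return f"SELECT {cols}, SUM(amount) FROM {source} GROUP BY {cols} ORDER BY {cols}"

by_year_sql = rewrite(["sale_year"])     # answered from the summary
by_product_sql = rewrite(["product"])    # summary lacks product: base table
by_year = conn.execute(by_year_sql).fetchall()
by_product = conn.execute(by_product_sql).fetchall()
print(by_year_sql)
print(by_year, by_product)
```

The caller asks the same question either way; only the generated SQL differs. That is the transparency the earlier approaches lacked.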
Apart from query rewrite, the other major capability of MVs is the wide range of refresh options. The refresh options available with MVs can be classified under two categories:
- How to refresh:
  - Fast Refresh: apply only the incremental changes made to the base fact table to the MV.
  - Complete Refresh: rebuild the MV completely.
  - Force: attempt a fast refresh and, if that is not possible, do a complete refresh.
- When to refresh:
  - On Commit: any change committed to the base table is propagated to the MV as part of that commit. The refresh is driven by changes to the base fact table.
  - On Demand: the refresh runs only when it is explicitly requested, for example by calling one of the DBMS_MVIEW refresh procedures. Until then the MV may be stale.
  - Automatically: the refresh is scheduled on a periodic basis. This is independent of both changes to the base table and querying of the MV.
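The difference between a fast and a complete refresh can be sketched as follows. This is again a Python/SQLite illustration under simplifying assumptions, not Oracle's implementation: the change log `sales_log` is a made-up table maintained by the helper `insert_sale`, and only inserts are handled (no updates or deletes on the base table).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (country TEXT, amount REAL);
    -- Hypothetical change log: rows inserted since the last refresh.
    CREATE TABLE sales_log (country TEXT, amount REAL);
    CREATE TABLE sales_summary (country TEXT PRIMARY KEY, total REAL);
""")

def insert_sale(country, amount):
    # Record the row in the base table and in the change log.
    conn.execute("INSERT INTO sales VALUES (?, ?)", (country, amount))
    conn.execute("INSERT INTO sales_log VALUES (?, ?)", (country, amount))

def complete_refresh():
    # Rebuild the summary from the whole base table.
    conn.execute("DELETE FROM sales_summary")
    conn.execute("INSERT INTO sales_summary "
                 "SELECT country, SUM(amount) FROM sales GROUP BY country")
    conn.execute("DELETE FROM sales_log")

def fast_refresh():
    # Apply only the logged changes to the existing summary rows.
    deltas = conn.execute(
        "SELECT country, SUM(amount) FROM sales_log GROUP BY country").fetchall()
    for country, delta in deltas:
        cur = conn.execute(
            "UPDATE sales_summary SET total = total + ? WHERE country = ?",
            (delta, country))
        if cur.rowcount == 0:  # no summary row for this country yet
            conn.execute("INSERT INTO sales_summary VALUES (?, ?)",
                         (country, delta))
    conn.execute("DELETE FROM sales_log")

def totals():
    return dict(conn.execute("SELECT country, total FROM sales_summary"))

insert_sale("US", 100.0)
insert_sale("IN", 70.0)
complete_refresh()
insert_sale("US", 25.0)
insert_sale("UK", 10.0)
fast_refresh()                 # touches only the two logged rows
after_fast = totals()
complete_refresh()             # rescans the whole base table
after_complete = totals()
print(after_fast == after_complete, after_fast)
```

Both paths end in the same summary, but the fast path reads only the rows logged since the last refresh, which is why it matters for large fact tables.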
Many enhancements have been made to MVs since they were first introduced with Oracle 8i. One can now have indexes on MVs, the Summary Advisor came along in Oracle 9i, the MV refresh mechanism has been further enhanced, and so on.
Quick link on MVs in Oracle 10g: http://download-west.oracle.com/docs/cd/B14117_01/server.101/b10759/statements_6002.htm#sthref4967