Saturday, May 21, 2005

Hand-coded v/s Tools-based ETL:

There have been rounds of debates going around on which paradigm to opt, for developing ETL for typical data warehouse project.

I would share my experience working with Handcoding the ETL and using Oracle Warehouse Builder (Tool Based ETL.)

The ETL of old days has not remain same now. Challenges ETL has faced in the recent times are:

  1. Complex business systems and Analytics requirement
  2. Volume of data
  3. Small refresh window
  4. Orchestrating and automation of Loads
  5. Monitoring capabilities and exception handling
  6. Complex data quality and integrity requirements
  7. Complex transformation
  8. Match-merge and cleansing
  9. Flexibility to the change ETL logic on the fly as the business changes
  10. Varied nature of source systems

And list goes on. So our very old hand-coded ETL does not cope up with this all.

Let me delineate my understanding of ETL. ETL is not piece of code or application, which enables you to do Extraction, Transformation and Loading. E, T and L are three primary pillars and core of typical ETL. But there are few more aspects, which fall under the purview of ETL. Today, ETL stands for Extraction, transformation, Loading, infrastructure elements (for extraction, transformation, loading ).

insert into sales values(customerid, …,…,revenue)

( select orderheader.customerid,… orderline.qty * orderline.price from orderheader@src , orderline@src where orderheader.order_id = orderline.order_id)

Above could be simple ETL doing Extraction (select..) , transformation (qty*price) and loading(insert into..) but it still lacks something to suffice above 10 challenges.

  1. Infrastructure for scheduling and automating the loads
  2. Mechanism to handle errors and exceptions and to throw out necessary alerts.
  3. Resilience and error recovery. .
  4. Abilities to switch from Set-based (insert into …select *) to row based (picking up one row at a time and loading it) loading type
  5. Workflow to orchestrate the various sub-loads (individual units)
  6. Metadata management
  7. Versioning and management of design metadata.
  8. Capturing Runtime audit logs
  9. Capabilities to leverage the specific features of target data warehouse platform.
  10. Flexibility to change the configuration at any level ranging from whole ETL project to individual object.

It’s unrealistic for a hand-coded ETL project to cover all this.

ETL tools suites these days are typically bundled with all this functionalities. Oracle Warehouse Builder is one such tool. Apart from enabling one to model the ETL flows (and not code), it also gives you all the above functionalities. Tools like OWB gives above functionalities in one integrated suite. With the new release of OWB (Paris) things would be more streamlined and all the above functionalities will be accessible through one single interface. I would be sharing my experiences on OWB Paris in subsequent blogs.

As I keep hearing from the research analysts that applications would be modeled and not coded as time goes by. This is driven by the inherent complexities in business applications.

So coming back to our prime question of which approach is better between Hand-coding and tool-based. As everything depends on context, the answer to this also depends on the context. If the ETL requirement is simple and straight without any great need for infrastructure elements like monitoring, metadata management etc. hand coding the ETL would be the wise choice. But if the ETL is complex and there is a requirement of monitoring, metadata management etc. tool-based approach would better go ahead.

1 comment:

Adeptia Integration said...

Extract, transform, load and manage data from single environment through best etl tools / software which provides rapid improvement in data warehousing process.