Testing Data Warehouse

Data Warehouse a.k.a ETL Process a.k.a Enterprise Data Warehouse(EDW) a.k.a Big Data a.k.a Back bone of information to take necessary business decisions a.k.a boring subject during my college days 🙂

And here I am today…Testing Data Warehouse for a Banking application involving huge pile of data which is extremely crucial for making business decisions.

Data Warehouse tends to become complex mainly due to its humongous volume of data being involved. The process can be explained in terms of ETL i.e, Extract Transform Load.

Extract: data is being extracted from different source(i.e, various source DB’s & tables) which is then converted into simple consolidated Data Warehouse format

Transform: Applying different rules to the data:

  • Cleaning (e.g., mapping NULL to 0 or “Male” to “M” and “Female” to “F” etc.)
  • Filtering (e.g., selecting only certain columns to load)
  • Splitting a column into multiple columns and vice versa
  • Joining together data from multiple sources (e.g., lookup, merge)
  • Transposing rows and columns
  • Applying any kind of simple or complex data validation

Load: Finally the data is loaded into Data Warehouse DB which acts like a source for target reporting information

There are various DataWarehouse/ETL Tools available in market to ease up our work. Some of them are  Talend Studio,  Oracle Warehouse Builder, SAP data services, and the list can go on. Tools for DataWarehouse/ETL testing are also available in market like QuerySurge, Informatica, et cetra…

Now everyone may not have the pleasure of using Tools for testing Data Warehouse. Hence, this blog is targeted to explain Test Strategy for Testing DataWarehouse in an effective manner by using simple tool like Beyond Compare.

Testing Data Warehouse

Test Strategy for testing DataWarehouse is focused on 2 simple items:

  • Data accuracy between source & target
  • Config rules for data migration

Here is a detailed look on test strategy for testing DataWarehouse/ETL

14.2 Testing Data Warehouse

Beyond Compare tool makes it extremely easy to compare two excel/csv files(Query result for Source & target tables) along with various GUI features like: Rules(allows you to specify columns to be compared from two excel files), Diff, Same, All, Color Highlighting, and so on. As the end result checking for data correctness for testing Data Warehouse becomes a piece of cake:

  • Verify for Data Correctness
    • Total records count/data in source & target tables
    • Numeric, Char, date & time, etc
    • Data types & formats
  • Verify data for each cloumns, records of table
  • Verify records for missing, duplicate data in target tables
  • Verify that no data truncation occurred

Manual vs. Automation

The only manual job we need to perform is make necessary queries for DataWarehouse tables & respective source tables. Execute those queries & export query result in excel/csv format. Once done, complete job of comparing data can be easily performed by using Beyond Compare tool. Here is how we can maintain all queries in a managed format for future reference or regression purpose:

14.3 Testing DataWarehouse_Query management
Now, after completing all these testing for DataWarehouse, the next question we think of could be: Why not automate the remaining manual process as well?

Answer: Of course, why not!! There are two ways in front of us:

a. We can either subscribe for existing tools available in market for DataWarehouse/ETL testing

OR..

b. Make our own tool which:

executes DB query(for source & targets) >> exports query results into a xls/csv format file >> compare both file data >> reports respective data comparision results

This is it for now, hope the article is useful for fellow testers!!

C Ya…

One thought on “Testing Data Warehouse

Leave a comment