Data Warehouse a.k.a ETL Process a.k.a Enterprise Data Warehouse(EDW) a.k.a Big Data a.k.a Back bone of information to take necessary business decisions a.k.a boring subject during my college days 🙂
And here I am today…Testing Data Warehouse for a Banking application involving huge pile of data which is extremely crucial for making business decisions.
Data Warehouse tends to become complex mainly due to its humongous volume of data being involved. The process can be explained in terms of ETL i.e, Extract Transform Load.
Extract: data is being extracted from different source(i.e, various source DB’s & tables) which is then converted into simple consolidated Data Warehouse format
Transform: Applying different rules to the data:
- Cleaning (e.g., mapping NULL to 0 or “Male” to “M” and “Female” to “F” etc.)
- Filtering (e.g., selecting only certain columns to load)
- Splitting a column into multiple columns and vice versa
- Joining together data from multiple sources (e.g., lookup, merge)
- Transposing rows and columns
- Applying any kind of simple or complex data validation
Load: Finally the data is loaded into Data Warehouse DB which acts like a source for target reporting information
There are various DataWarehouse/ETL Tools available in market to ease up our work. Some of them are Talend Studio, Oracle Warehouse Builder, SAP data services, and the list can go on. Tools for DataWarehouse/ETL testing are also available in market like QuerySurge, Informatica, et cetra…
Now everyone may not have the pleasure of using Tools for testing Data Warehouse. Hence, this blog is targeted to explain Test Strategy for Testing DataWarehouse in an effective manner by using simple tool like Beyond Compare.
Testing Data Warehouse
Test Strategy for testing DataWarehouse is focused on 2 simple items:
- Data accuracy between source & target
- Config rules for data migration
Here is a detailed look on test strategy for testing DataWarehouse/ETL
Beyond Compare tool makes it extremely easy to compare two excel/csv files(Query result for Source & target tables) along with various GUI features like: Rules(allows you to specify columns to be compared from two excel files), Diff, Same, All, Color Highlighting, and so on. As the end result checking for data correctness for testing Data Warehouse becomes a piece of cake:
- Verify for Data Correctness
- Total records count/data in source & target tables
- Numeric, Char, date & time, etc
- Data types & formats
- Verify data for each cloumns, records of table
- Verify records for missing, duplicate data in target tables
- Verify that no data truncation occurred
Manual vs. Automation
The only manual job we need to perform is make necessary queries for DataWarehouse tables & respective source tables. Execute those queries & export query result in excel/csv format. Once done, complete job of comparing data can be easily performed by using Beyond Compare tool. Here is how we can maintain all queries in a managed format for future reference or regression purpose:
Now, after completing all these testing for DataWarehouse, the next question we think of could be: Why not automate the remaining manual process as well?
Answer: Of course, why not!! There are two ways in front of us:
a. We can either subscribe for existing tools available in market for DataWarehouse/ETL testing
OR..
b. Make our own tool which:
executes DB query(for source & targets) >> exports query results into a xls/csv format file >> compare both file data >> reports respective data comparision results
This is it for now, hope the article is useful for fellow testers!!
C Ya…
hi
iam planning to move bigdata how it is please refer me