A lot of people probably have heard the IT guy in their company mention a “data warehouse” before, but exactly what is a business data warehouse?
The easiest way to put it is there are essentially two main ways of storing data: either for operational purposes, or for reporting purposes. When data is stored for operations, you usually want to be able to quickly ADD records to the database – so the database is designed such that data can be inserted into the database quickly. When you want to report on data, then the database will be designed with READING records in mind.
These seems like simple differences, but they create a whole host of incredibly technically challenging issues, and people tend to specialize in either one or the other. When someone refers to a “data warehouse”, they are referring the reporting-oriented database (also known as an OLAP database, for Online Analytical Processing).
The real power of a data warehouse containing your business data is that it is usually structured by creating things called “cubes”. Basically, what this means is that instead of treating the records in the database each as an individual row, they are grouped by some kind of data value, such as “Country” or “Processed Date”.
While this is a somewhat confusing idea, a simple way to look at it is like pages in a book. Imagine you had a book called “World Steel Production”, which contained paragraphs listing all the tons of steel produced on various dates by various steel companies. Now imagine that the pages of that book were organized geographically, so all the steel produced in a given country was listed on the same page together, regardless of date or company.
If the book was structured this way, it would be very quick and easy for you to go to the index and open right to the page for “China Steel Production”, much faster than if the book was simply in chronological order, and you had to go through the whole book to find all the paragraphs related to China.
A data warehouse is basically the same thing. Instead of storing the records in a single chronological fashion, records are grouped by particular measures (sometimes called dimensions), so that they can be accessed quickly for reporting purposes. Each “slice” of the data warehouse “cube” represents a particular value for that dimension – for instance, if the dimension was “Production Year”, there would be a “1998” slice.
Hopefully this sheds some light on data warehouses, which are an incredibly powerful data tool for your business.