Data warehousing helps manage the data flows and processes inherent to typical business operations. Data warehouses serve as the repository of business data, collected from a variety of internal and external sources. Most are largely populated with historical data derived from transaction data.
The data warehouse collects, organizes and stores the “fuel” for the Business Intelligence (BI) program – data from multiple areas like inventory, customers and sales that is analyzed to identify actionable trends to improve business operations.
At their most basic level, data warehouses are a collection of tools that facilitate “ETL”: extracting, transforming and loading data. At a higher level, they also comprise the infrastructure needed to run data-intensive business processes and generate reports for business users.
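The extract, transform and load steps can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the sample CSV export, the `sales_fact` table name, the cleaning rules and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a sales system (illustrative data only).
RAW_SALES_CSV = """order_id,customer,amount,order_date
1001, alice ,19.99,2014-03-01
1002,BOB,5.00,2014-03-02
1003,alice,,2014-03-02
"""


def extract(raw_text):
    """Extract: read source rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize names, convert types, skip rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: incomplete orders are not loaded
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(conn, records):
    """Load: write cleaned records into a (hypothetical) fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Run all three steps, then return sales totals per customer."""
    load(conn, transform(extract(raw_text)))
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales_fact "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_etl(RAW_SALES_CSV, warehouse))
```

The final query hints at the reporting side: once data is cleaned and loaded in a consistent shape, summaries for business users become simple queries.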
One of the more interesting developments to complicate the data warehouse environment in recent years has been the explosion in Big Data. The term refers to data sets, structured and unstructured, whose volume and complexity make them difficult to manage and process with traditional tools – like those represented by data warehouses. The early concern was how to scale up IT systems so that Big Data could deliver business value through BI.
Today, scaling up the data warehouse to manage Big Data remains a concern, but it’s manageable, and businesses are increasingly finding ways to pair Big Data with the data warehouse to drive better BI insights.
Philip Russom, research director for data management at The Data Warehousing Institute (TDWI), notes that Big Data from new social and digital sources ranges from structured to semi-structured to unstructured, and most data warehouses are not designed to store and manage that full range. Big Data is also often fed continuously and in real time – something most data warehouses are not equipped to accommodate.
“Most of the business value coming from Big Data is derived from advanced analytics based on the combination of both traditional enterprise data and new data sources,” according to Russom.
“Big Data and the data warehouse can be a powerful team, providing many new analytic applications that enterprises need to stay competitive. Achieving this, however, requires some modifications to existing infrastructure, tools and process to integrate Big Data into the existing data warehouse environment.”
In her blog, Tamara Dull notes that some companies are converting completely from traditional BI/data warehousing architectures to Big Data and Hadoop, the open source framework for processing large datasets. “…a bold and arduous undertaking,” states Dull, director of emerging technologies at SAS Institute.
She and Russom agree that traditional and new systems need to work together effectively, each used in the way it was intended, to drive better decisions.
Her example: A high-tech company wants to update the customers in its social network of friends, so it pulls data from its social networking site and combines it with data from the data warehouse. It also might use Hadoop to quickly “score” those customers’ social influence. The data is then provisioned back to the data warehouse, where a campaign manager, for example, can view the customers’ influence scores and re-segment the customers as needed.
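The flow in that example (pull social data, combine it with warehouse data, score it, then provision the result back for re-segmentation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the field names, the scoring formula and the threshold are all invented here, and a plain function stands in for the Hadoop scoring job.

```python
# Hypothetical warehouse records, keyed by customer id.
warehouse_customers = {
    "c1": {"name": "Ana", "segment": "standard"},
    "c2": {"name": "Ben", "segment": "standard"},
}

# Hypothetical activity pulled from the social networking site.
social_activity = [
    {"customer_id": "c1", "friends": 850, "shares": 40},
    {"customer_id": "c2", "friends": 30, "shares": 1},
    {"customer_id": "c9", "friends": 500, "shares": 5},  # not a known customer
]


def score_influence(activity):
    """Toy influence score; a made-up formula standing in for a Hadoop job."""
    return activity["friends"] + 10 * activity["shares"]


def combine_and_score(customers, activity_rows):
    """Join social activity to warehouse customers and attach a score."""
    enriched = {}
    for row in activity_rows:
        customer_id = row["customer_id"]
        if customer_id not in customers:
            continue  # ignore social accounts with no warehouse record
        enriched[customer_id] = dict(
            customers[customer_id],
            influence_score=score_influence(row),
        )
    return enriched


def resegment(enriched, threshold=500):
    """Re-segment customers whose score meets an assumed threshold."""
    for record in enriched.values():
        if record["influence_score"] >= threshold:
            record["segment"] = "influencer"
    return enriched


if __name__ == "__main__":
    # Records "provisioned back" for the campaign manager to review.
    for cid, record in resegment(
        combine_and_score(warehouse_customers, social_activity)
    ).items():
        print(cid, record)
```

In practice the scoring step would run where the large, loosely structured social data lives, and only the compact scores would return to the warehouse, which keeps each system doing the work it was designed for.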
For years, data warehouses have played an important role in powering the Business Intelligence that leads to smarter business decisions and more effective business strategy. The advent of Big Data makes the environment more complex, but it also promises to enrich both the process and its intended outcomes.