In a dynamic, global economy, organizations increasingly rely on insights from their customers, internal processes and business operations to uncover new opportunities for growth. Discovering those insights generates large, complex sets of data that must be managed, analyzed and manipulated by skilled professionals. This collection of data is known as “big data.”
Most professionals in the industry consider multiple terabytes or petabytes to be the current big data benchmark. Others, however, are hesitant to commit to a specific quantity, as the rapid pace of technological development may turn today’s “big” into tomorrow’s “normal.” Still others define big data relative to its context: big data is a subjective label attached to situations in which human and technical infrastructures are unable to keep pace with a company’s data needs.
Despite what the word “big” implies, big data isn’t defined simply by volume; it’s also about complexity. Many datasets considered big data consume little physical space but are particularly complex in nature. At the same time, large datasets that require significant physical space may not be complex enough to be considered big data.
Volume is joined by data variety and velocity to make up the three V’s of big data. Variety refers to the different types of structured and unstructured data that organizations can collect, such as transaction-level data, video and audio, or text and log files. Velocity indicates how quickly the data can be made available for analysis.
In addition to the three V’s, some add a fourth to the big data definition. Veracity indicates data integrity: an organization’s ability to trust its data and use it confidently to make crucial decisions.
As stated earlier, organizations are increasingly turning to big data to improve decision-making, uncover opportunities and boost overall performance. For example, big data can be harnessed to address the challenges that arise when information is dispersed across several systems that are not interconnected. By aggregating data across those systems, big data can improve decision-making capability. It can also augment data warehouse solutions by serving as a buffer that processes new data for inclusion in the warehouse, or by absorbing infrequently accessed or aged data removed from it.
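As a minimal sketch of what aggregating across unconnected systems can look like, consider two hypothetical exports, one from a CRM and one from a web-analytics system, joined into a single customer view. All record formats and field names here are invented for illustration.

```python
# Toy illustration: merging customer records scattered across two
# unconnected systems into one profile per shared customer ID.
# All field names and values are invented for this example.

crm_records = [
    {"customer_id": "c1", "name": "Ada", "region": "EU"},
    {"customer_id": "c2", "name": "Ben", "region": "US"},
]
web_records = [
    {"customer_id": "c1", "page_views": 42},
    {"customer_id": "c2", "page_views": 7},
]

def aggregate(*sources):
    """Combine records from several systems into one profile per customer."""
    merged = {}
    for source in sources:
        for record in source:
            profile = merged.setdefault(record["customer_id"], {})
            profile.update(record)
    return merged

profiles = aggregate(crm_records, web_records)
print(profiles["c1"])
# {'customer_id': 'c1', 'name': 'Ada', 'region': 'EU', 'page_views': 42}
```

In practice this kind of join is done by dedicated integration or data warehouse tooling rather than hand-written dictionaries, but the principle is the same: a shared key lets previously siloed records be combined into one decision-ready view.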
Big data can lead to improvements in overall operations by giving organizations greater visibility into operational issues. Operational insights often depend on machine data, which can come from anything from computers to sensors, meters or GPS devices. Big data also provides unprecedented insight into customers’ decision-making processes by allowing companies to track and analyze shopping patterns, recommendations, purchasing behavior and other drivers that are known to influence sales.
Cybersecurity and fraud detection are other uses of big data. With access to real-time data, businesses can enhance security and intelligence analysis platforms. They can also process, store and analyze a wider variety of data types to improve intelligence, security and law enforcement insight.
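One simple way real-time data can feed fraud detection is by screening each incoming transaction against a customer’s own history. The sketch below, with invented data and thresholds, flags any amount far above what the customer typically spends.

```python
# Toy sketch of real-time transaction screening: flag any transaction
# far above a customer's typical amount. Data and the threshold k are
# invented for illustration; real systems use far richer models.

from statistics import mean, stdev

# Past transaction amounts per customer (hypothetical).
history = {"c1": [20.0, 25.0, 22.0, 19.0, 24.0]}

def is_suspicious(customer_id, amount, k=3.0):
    """Flag amounts more than k standard deviations above the customer's mean."""
    past = history.get(customer_id, [])
    if len(past) < 2:
        return False  # not enough history to judge
    return amount > mean(past) + k * stdev(past)

print(is_suspicious("c1", 500.0))  # True  - far outside the usual range
print(is_suspicious("c1", 23.0))   # False - consistent with past behavior
```

Production fraud systems combine many such signals (location, merchant, timing) and score them in milliseconds, but the core idea is the same: compare live events against a learned baseline.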
The challenge of big data is to convert it into usable information by identifying patterns and deviations from those patterns. Many companies are searching for ways to do so. Developers and software providers certified in business intelligence are rising to this challenge, turning big data management into a booming industry with major players in both private industry and open source communities.
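Identifying deviations from a pattern can be as simple as comparing each new value to a moving average of the values before it. The sketch below uses invented sensor readings and an arbitrary tolerance to show the idea.

```python
# Toy sketch: spot deviations from a pattern in a stream of readings
# by comparing each value to a moving average of its predecessors.
# Window size and tolerance are arbitrary choices for this example.

def deviations(values, window=3, tolerance=20.0):
    """Return indices of values far from the moving average of the prior window."""
    flagged = []
    for i in range(window, len(values)):
        avg = sum(values[i - window:i]) / window
        if abs(values[i] - avg) > tolerance:
            flagged.append(i)
    return flagged

readings = [50, 51, 49, 50, 95, 50, 52]
print(deviations(readings))  # [4] - the spike at index 4
```

Real pattern-detection pipelines use statistical models or machine learning rather than a fixed tolerance, but the shape of the task is the same: establish what “normal” looks like, then surface what departs from it.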
Among the private industry offerings are SAP Sybase IQ, Oracle Big Data Appliance, HP Information Optimization Solutions and the IBM Big Data Platform. Open source projects are driving much of the change and innovation in big data management and include Apache Hadoop, Cascading, Apache HBase and MongoDB.
Big data typically falls under the information technology department and requires people highly skilled in programming and data analysis to extract meaningful information and insights. Companies are now turning to data visualization tools to harness the power of big data and get it into the hands of those who can use it. By granting greater access to big data, companies can turn more of their data into useful information that leads to improvements on a wider scale.
With the growth of devices and transactions that generate increasingly complex data streams, effectively using that data is rapidly becoming a significant competitive advantage for many companies. In fact, some companies consider data to be one of their most valuable assets. Therefore, big data should only get bigger as organizations look for more and better ways to tap into existing data and gather new and emerging types of data to make critical decisions, answering questions that were previously considered beyond reach.