Queplix Data Virtualization and Hadoop: Marriage Made in Heaven

Saturday, 19 February 2011 23:50

Queplix CTO Steve Yaskin comments on new technology developments impacting data virtualization business – http://bit.ly/hfGvsi

Author: Steve Yaskin, CTO Queplix

Recently, there has been a lot of news covering advances in parallel processing frameworks such as Hadoop. Innovative data warehouse software vendors are increasingly researching the new development strategies that parallel processing offers. So far, the majority of these efforts have been targeted at improving query performance and optimization within traditional physical data warehouse architectures. For example, traditional data warehouse vendors like Teradata have joined the Hadoop movement and applied parallel processing to their physical DW infrastructures. Companies like Yahoo and Amazon are also spearheading the adoption of Hadoop map/reduce for large-scale data analytics.

I have been monitoring advances on the Hadoop front in particular, as I believe they will provide grounds for convergence with our products and a new development direction for Queplix Data Virtualization. Data virtualization and Hadoop are born from the same premise: to provide data storage scalability and ease of information access and sharing. I see the two technologies complementing each other perfectly.

Hadoop’s data warehouse infrastructure (Hive) is what we are now researching for integration with Queplix Data Virtualization products. Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called Hive QL.

Queplix Data Virtualization will soon combine the flexibility of its object-oriented data modeling with the massive power of Hadoop parallel processing to build virtual data warehouse solutions. Imagine the analytical performance of a virtual data warehouse that uses the Virtual Metadata Catalog and Virtual Entities as its organizational and hierarchical units (instead of traditional tables, columns and SQL-driven access). Such “virtual” data warehouse solutions would be a perfect fit for large-scale operational and analytical processing, data quality and data governance projects, with the full power of Queplix heuristic and semantic data analysis.

Today, data virtualization solutions are deployed by many larger enterprises to gain visibility into disparate application data silos without disrupting the original sources and applications. In the near future, Data Virtualization and Hadoop-based virtual data warehouse solutions will be deployed in tandem to implement the full spectrum of enterprise data management solutions, ranging from larger-scale data integration projects (e.g. massive application data store mergers resulting from M&As between large companies) all the way to Virtual Master Data Management, pioneered by Queplix. Such solutions will not only provide better abstraction and business continuity for enterprise applications, but will also utilize the full power of parallel processing and bring immense scalability to Queplix semantic data analytics and data alignment products.
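To make the Hive description above a little more concrete, here is a minimal Python sketch of the kind of summarization an ad hoc Hive QL query expresses, written out as explicit map, shuffle and reduce steps. It is an illustration only, not Queplix or Hive code: the sample data, the column names (`source`, `entity`, `row_count`) and the table name `app_rows` in the comment are all hypothetical, and a real Hadoop job would distribute these steps across many machines rather than run them in one process.

```python
from collections import defaultdict

# Hypothetical raw lines, standing in for tab-delimited files stored in Hadoop.
RAW_LINES = [
    "crm\tcustomer\t120",
    "erp\tcustomer\t80",
    "crm\tproduct\t40",
    "erp\tproduct\t60",
    "crm\tcustomer\t30",
]

# "Putting structure on the data": name the columns, much as a table
# definition would. These column names are assumptions for this sketch.
COLUMNS = ("source", "entity", "row_count")


def parse(line):
    """Turn one raw line into a record keyed by column name."""
    record = dict(zip(COLUMNS, line.split("\t")))
    record["row_count"] = int(record["row_count"])
    return record


def map_phase(records):
    """Emit (key, value) pairs: the grouping key and the value to aggregate."""
    for record in records:
        yield record["entity"], record["row_count"]


def shuffle(pairs):
    """Group all values that share a key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group down to a single summary value."""
    return {key: sum(values) for key, values in groups.items()}


def summarize(lines):
    # Conceptually similar to the query (hypothetical table name):
    #   SELECT entity, SUM(row_count) FROM app_rows GROUP BY entity;
    records = (parse(line) for line in lines)
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(summarize(RAW_LINES))
```

The point of the sketch is the division of labor: the query author only states the grouping and the aggregate, while the map and reduce steps are what can be spread across a cluster.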

Here are some new and exciting ideas Queplix is working on now:

• utilizing Hadoop for Virtual CEP (Complex Event Processing) within Queplix Virtual Metadata Catalog

• generating “data steward” real-time alerts using predictive data lineage analysis before data quality problems start to affect your enterprise applications

• implementing Hadoop-based virtual data warehouse solutions to provide high availability for large application stores that require massive analytics and semantic data processing

• large-scale Virtual Master Data Management initiatives involving enterprise-wide customer or product catalog building

• large-scale business intelligence projects based on Queplix Virtual Metadata Catalog

Watch this blog for new developments and advances in Queplix technology integrating Hadoop and Data Virtualization as we make announcements throughout the year!

