Today in the world of cloud and grid computing integration of data from heterogeneous databases is inevitable. Virtual Database Technology (VDB) is one of the effective solutions for integration of data from heterogeneous sources. This will become complex when size of the database is very large. MapReduce is a new framework specifically designed for processing huge datasets on distributed sources. Apache’s Hadoop is an implementation of MapReduce. Currently Hadoop has been applied successfully for file based datasets. This paper proposes to utilize the parallel and distributed processing capability of Hadoop MapReduce for handling heterogeneous query execution on large datasets. So, Virtual Database Engine built on top of this will result in effective high performance distributed data integration.
Keywords: Database integration, Hadoop MapReduce, Virtual Database Technology, heterogeneous databases, query optimization