The rollout of the Hadoop Distributed File System (HDFS) was a major technological innovation. Hadoop made it possible to store massive amounts of data, while MapReduce provided a new distributed (parallel) processing capability to crunch that data rapidly. With the first rollout of Hadoop, however, Relational Database Management Systems (RDBMS) and their companion Structured Query Language (SQL), a pair of major innovations from three decades earlier, were left behind.
Relational Database Management Systems & Structured Query Language
Within large enterprises, relational databases and SQL are key components of many mission-critical applications. There are dozens of database types and products, the most popular being IBM DB2, Oracle, MySQL, Microsoft SQL Server and PostgreSQL. To make Hadoop appeal to a broader set of industries, support for processing existing data in relational format was key to wider adoption. Enter Hive.
Hive
Hive provides a look, feel and functionality similar to those of the relational databases traditionally found on non-Hadoop systems. It allows enterprises to load existing data from these systems and keep it in its relational (table) format while using the full power of HDFS and MapReduce. Programmers familiar with SQL can make an easy transition to Hive Query Language (HiveQL).
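To make the link between HiveQL and MapReduce more concrete, here is a minimal Python sketch of the map, shuffle and reduce steps that a simple aggregation query such as `SELECT department, COUNT(*) FROM employees GROUP BY department;` boils down to. It is a single-process illustration under stated assumptions: the `employees` table, its rows and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `count_by_department`) are hypothetical, and real Hive plans and runs distributed jobs across a cluster rather than executing anything like this code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample rows standing in for a Hive table whose files live in HDFS.
# Each row is (employee_name, department).
ROWS: List[Tuple[str, str]] = [
    ("alice", "sales"),
    ("bob", "engineering"),
    ("carol", "sales"),
    ("dave", "support"),
    ("erin", "engineering"),
    ("frank", "sales"),
]

# HiveQL query this sketch imitates (reference only, not executed here):
#   SELECT department, COUNT(*) FROM employees GROUP BY department;


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (department, 1) for every input row, as a mapper would."""
    for _name, department in rows:
        yield department, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, mimicking the shuffle/sort step."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the emitted 1s for each department, as a reducer would."""
    return {key: sum(values) for key, values in sorted(grouped.items())}


def count_by_department(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Chain the three phases: the hand-written equivalent of the GROUP BY query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    for department, count in count_by_department(ROWS).items():
        print(department, count)
```

Running the script prints one line per department with its row count. The design point is that a SQL programmer writes only the one-line GROUP BY query; in Hive's original design, the system generated the equivalent map and reduce work itself.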
While most large enterprises have no current plans to replace existing mission-critical systems with new Hadoop/Hive applications, these applications are becoming a popular addition to those systems, helping meet the rapidly increasing demand for more and better data analytics.
To learn more about Hadoop, Hive or how our solutions can make them even more efficient, contact us today.