e-Business Management

HADOOP APACHE HIVE ODBC DRIVER

https://www.progress.com/products/datadirect-connect/odbc-drivers/data-sources/hadoop-apache-hive

Connectivity for BIG DATA

Ensure Reliable, High Performance Big Data Operations

https://www.progress.com/products/datadirect-connect/solutions/big-data

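Many experts predict that data volumes will explode over the next several years, so that the terabyte era gives way to one in which petabytes are commonplace.

We are now in the era of Big Data.

Organizations face the challenge of processing massive, varied data quickly and turning it into information of value to the business.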

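Apache Hadoop has recently risen to prominence. The Apache Hadoop software library is a framework that enables distributed processing of large data sets spread across clusters of computers using simple programming models. Apache Hive is a data warehouse infrastructure built on Hadoop that provides data summarization, query, and analysis.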

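DataDirect driver for Apache Hive is the only ODBC driver that fully complies with the ODBC standard while being tailored to multiple Hadoop distributions. The Apache Hive ODBC driver delivers top performance for database connectivity and for importing and exporting ever-growing volumes of data.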

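With DataDirect, you can extract data from Hadoop through the Hadoop JDBC driver and bulk load it through DataDirect JDBC drivers into Oracle, DB2, SQL Server, Sybase, and other relational databases.

Bulk Load is useful when processing large volumes of data in scenarios such as the following.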

Data Warehousing

Bulk Load delivers the best performance for loading bulk data into an Oracle, DB2, Sybase, or SQL Server-based data warehouse, while avoiding data latency issues.

Data Migration

Bulk Load is ideal for extract and load data migration operations – whether simple or more complex.

Data Replication

Rather than using FTP or pushing files around a network, Bulk Load can quickly load data into relational database tables. This approach is faster and has the added benefit of storing the data as relational database tables that reporting or Business Intelligence applications can access easily.

Disaster Recovery

Bulk Load can ensure that the data is quickly and easily replicated into disaster recovery databases.

Cloud Data Publication

Bulk Load allows developers to quickly and easily build a simple program that publishes Big Data into the cloud in a way which uses resources efficiently.
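The extract-then-load flow described above can be pictured with a minimal Python sketch. It is not DataDirect's Bulk Load feature or API. It uses Python's built-in sqlite3 module as a stand-in for both the Hive source and the relational target, and the names copy_in_batches, events, and events_copy are hypothetical. It only illustrates the general pattern of reading rows in chunks and writing each chunk with one batched call.

```python
import sqlite3


def copy_in_batches(source, target, select_sql, insert_sql, batch_size=1000):
    """Read rows from `source` in chunks and write each chunk to `target`.

    Hypothetical helper for illustration only; a real driver's bulk load
    mechanism would normally take the place of executemany().
    """
    read_cursor = source.cursor()
    read_cursor.execute(select_sql)
    total = 0
    while True:
        rows = read_cursor.fetchmany(batch_size)  # pull at most one batch into memory
        if not rows:
            break
        target.executemany(insert_sql, rows)  # write the whole batch in one call
        target.commit()
        total += len(rows)
    return total


# Assumption: in-memory SQLite databases stand in for Hive and the warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER, name TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(2500)],
)
source.commit()

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE events_copy (id INTEGER, name TEXT)")

copied = copy_in_batches(
    source,
    target,
    "SELECT id, name FROM events",
    "INSERT INTO events_copy VALUES (?, ?)",
    batch_size=1000,
)
```

Reading with fetchmany keeps memory use bounded by the batch size, which matters once the source table no longer fits in memory.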

Bulk Load – Moving Big Data

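In a well-tuned program, more than 90% of processing time is spent in the database driver accessing data. The driver's role therefore becomes even more important when handling Big Data. Enterprises need to integrate Big Data seamlessly into their existing data infrastructure.

DataDirect ODBC, JDBC, and ADO.NET drivers make this possible.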

         Guarantee the availability of any size data from any source

         Easily move Apache Hive Big Data into any data source

         Manage 'single-driver' connectivity to a wide array of enterprise databases and platforms

         Deliver the best possible bulk load performance, scalability and reliability

         Deploy with no application code changes or database vendor tools

         Reduce the time, cost and risk of making new data sets available to enterprise users

Progress DataDirect Hadoop Apache Hive ODBC Driver

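How do you intend to connect to your data warehouse (DW) effectively and efficiently to gather massive amounts of data?

Today the Apache Hadoop® file system is increasingly used for data warehouses that must store massive volumes of data. So far, however, it is mostly just connected to existing SQL-based business intelligence (BI) tools, so the resulting analysis remains limited.

'DataDirect driver for Apache Hive' is the only fully-compliant ODBC driver that fully supports both the latest version of Apache Hive ("Hive2") and the Cloudera distribution.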

High-performance and throughput with support for Hive2 and concurrent connections

Improved authentication for increased data security

Cloudera CDH 4.5 certification plus Cloudera Hive2 support

Support for Hive Kerberos

In addition to Cloudera, support for Apache, MapR, and Amazon EMR Hadoop distributions

Windows, RedHat, Solaris, SUSE, AIX, and HP-UX platform support

SELECT, INSERT [OVERWRITE] SELECT, LOAD, and CREATE/DROP Hive grammar support

Full driver metadata

Support for parameter arrays, processing the arrays as a series of executions, one execution for each row in the array

Support for standard SQL functionality, including Create Index, Create Table, Create View, Drop Index, Drop Table, Drop View

Support for a wide range of data types: Int, TinyInt, SmallInt, BigInt, String, Double, Binary, Boolean, Float, and Timestamp
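The parameter array behavior listed above (one execution for each row in the array) can be illustrated with a short sketch. This is an illustration under stated assumptions, not the driver's implementation: Python's built-in sqlite3 module stands in for a Hive connection, and execute_param_array and page_views are hypothetical names.

```python
import sqlite3


def execute_param_array(cursor, sql, param_rows):
    """Run `sql` once per row of `param_rows` and return the execution count.

    Hypothetical helper: it mimics the "series of executions" semantics
    described in the feature list, not any actual driver code.
    """
    executions = 0
    for params in param_rows:
        cursor.execute(sql, params)  # one separate execution for this row
        executions += 1
    return executions


# Assumption: an in-memory SQLite database stands in for the Hive data source.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE page_views (user_id INTEGER, url TEXT)")

param_array = [(1, "/home"), (2, "/pricing"), (3, "/docs")]
count = execute_param_array(cur, "INSERT INTO page_views VALUES (?, ?)", param_array)
conn.commit()
```

The application passes the whole array in a single call, while the data source still sees one statement execution per row.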

Supported Apache Hive Versions and Distribution Versions

Amazon Elastic MapReduce (Amazon EMR)
    Distribution Version: N/A
    Apache Hive Version: Hive 0.8.x, Hive 0.11.x

Apache Hadoop Hive
    Distribution Version: N/A
    Apache Hive Version: Hive 0.8.x, Hive 0.9.x, Hive 0.10.x, Hive 0.11.x, Hive 0.12.x, Hive 0.13.x

Cloudera’s Distribution Including Apache Hadoop (CDH)
    Distribution Version: CDH 4.0.x, CDH 4.1, CDH 4.2, CDH 4.5, CDH 5.0
    Apache Hive Version: Hive 0.7.1, Hive 0.8.x, Hive 0.9.x, Hive 0.11.x, Hive 0.12.x, Hive 0.13.x

Hortonworks Distribution for Apache Hadoop
    Distribution Version: HDP 1.3, HDP 2.0, HDP 2.1
    Apache Hive Version: Hive 0.11.x, Hive 0.12.x, Hive 0.13.x

MapR Distribution for Apache Hadoop
    Distribution Version: MapR 1.2, MapR 2.0
    Apache Hive Version: Hive 0.7.1, Hive 0.9.x
