Installing CDH 5.5 Hive and Impala with yum: A Detailed Walkthrough

程序员文章站 2023-10-30 11:50:04


I. Installing Hive

The components are laid out as follows:

172.16.57.75 bd-ops-test-75 mysql-server
172.16.57.77 bd-ops-test-77 hiveserver2 hivemetastore

1. Install Hive

On node 77, install Hive:

# yum install hive hive-metastore hive-server2 hive-jdbc hive-hbase -y

On the other nodes, you can install the client packages:

# yum install hive hive-server2 hive-jdbc hive-hbase -y

2. Install MySQL

Install MySQL with yum:

# yum install mysql mysql-devel mysql-server mysql-libs -y

Start the database:

# Enable start at boot
# chkconfig mysqld on
# service mysqld start

Install the JDBC driver (on the Hive node, since the symlink goes into /usr/lib/hive/lib):

# yum install mysql-connector-java
# ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/hive/lib/mysql-connector-java.jar

Set the initial MySQL root password to bigdata:

# mysqladmin -uroot password 'bigdata'

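Log in to MySQL and run the following:

create database metastore;
use metastore;
source /usr/lib/hive/scripts/metastore/upgrade/mysql/hive-schema-1.1.0.mysql.sql;
-- Create both accounts explicitly so that the remote account ('hive'@'%') also has a password.
create user 'hive'@'localhost' identified by 'hive';
create user 'hive'@'%' identified by 'hive';
grant all privileges on metastore.* to 'hive'@'localhost';
grant all privileges on metastore.* to 'hive'@'%';
flush privileges;

Note: this creates a user named hive with the password hive; change both to suit your needs.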

Edit the following in hive-site.xml:

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.16.57.75:3306/metastore?useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
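
The snippet above sets only the connection URL and the driver class. Hive normally also needs the metastore credentials, which go in the standard javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword properties. A minimal sketch, assuming the hive/hive account created in MySQL earlier; print_metastore_credentials is a hypothetical helper (not part of Hive) that only prints the two property blocks so they can be pasted into hive-site.xml:

```shell
#!/bin/sh
# Hypothetical helper: print the two credential properties for hive-site.xml.
# Values containing XML special characters (&, <, >) are NOT escaped here.
print_metastore_credentials() {
    user="$1"
    password="$2"
    cat <<EOF
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>${user}</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>${password}</value>
</property>
EOF
}

# Example: print_metastore_credentials hive hive
```

Printing the blocks rather than editing hive-site.xml in place keeps the sketch harmless; review the output before pasting it inside the <configuration> element.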

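3. Configure Hive

Edit /etc/hadoop/conf/hadoop-env.sh and add the HADOOP_MAPRED_HOME environment variable. Without it, running MapReduce on YARN fails with an "unknown RPC type" exception.

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

Create the Hive warehouse directory in HDFS:

The Hive warehouse lives at /user/hive/warehouse in HDFS by default. Setting its mode to 1777 is recommended, so that every user can create and access tables but cannot delete tables owned by someone else.

Every user who queries Hive must have an HDFS home directory (under /user; for root it is /user/root).
The /tmp directory on the Hive node must be world-writable.

Create the directories and set permissions: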

# sudo -u hdfs hadoop fs -mkdir /user/hive
# sudo -u hdfs hadoop fs -chown hive /user/hive
# sudo -u hdfs hadoop fs -mkdir /user/hive/warehouse
# sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
# sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
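
The commands above fail on a second run because the directories already exist. A re-runnable variant could be sketched as follows; ensure_hdfs_dir is a hypothetical helper name, and the sketch assumes the hadoop client is on the PATH and that sudo to the hdfs user is allowed:

```shell
#!/bin/sh
# Hypothetical helper: create an HDFS directory as the hdfs superuser, then
# set its owner and (optionally) its mode. Safe to re-run, because
# "-mkdir -p" does not fail when the directory already exists.
ensure_hdfs_dir() {
    path="$1"
    owner="$2"
    mode="${3:-}"   # optional; the mode is left unchanged when omitted
    sudo -u hdfs hadoop fs -mkdir -p "$path" || return 1
    sudo -u hdfs hadoop fs -chown "$owner" "$path" || return 1
    if [ -n "$mode" ]; then
        sudo -u hdfs hadoop fs -chmod "$mode" "$path" || return 1
    fi
}

# Example, mirroring the commands above:
#   ensure_hdfs_dir /user/hive hive
#   ensure_hdfs_dir /user/hive/warehouse hive 1777
```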

Edit hive-env.sh to set the JDK environment variable:

# vim /etc/hive/conf/hive-env.sh
export JAVA_HOME=/opt/programs/jdk1.7.0_67

Start the metastore and HiveServer2:

# service hive-metastore start
# service hive-server2 start

4. Test

$ hive -e 'create table t(id int);'
$ hive -e 'select * from t limit 2;'
$ hive -e 'select id from t;'

Connect with beeline:

$ beeline
beeline> !connect jdbc:hive2://localhost:10000
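
For scripted checks, beeline can also run a single statement non-interactively with its -u (JDBC URL), -n (user name), and -e (statement) options. A minimal sketch; hs2_jdbc_url and run_hs2_query are hypothetical helper names, and the default database name "default" is an assumption:

```shell
#!/bin/sh
# Build a HiveServer2 JDBC URL from host, port, and database name.
hs2_jdbc_url() {
    host="$1"
    port="${2:-10000}"          # 10000 is the HiveServer2 port used above
    database="${3:-default}"    # assumed database name
    printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$database"
}

# Run one statement through beeline against the local HiveServer2 and exit.
run_hs2_query() {
    user="$1"
    query="$2"
    beeline -u "$(hs2_jdbc_url localhost)" -n "$user" -e "$query"
}

# Example: run_hs2_query hive 'show tables;'
```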

5. Integrate with HBase

First install hive-hbase:

# yum install hive-hbase -y

If you are on CDH4, run the following in the Hive shell to add the jars:

-- The guava jar version depends on what is actually installed.
add jar /usr/lib/hive/lib/zookeeper.jar;
add jar /usr/lib/hive/lib/hbase.jar;
add jar /usr/lib/hive/lib/hive-hbase-handler-<hive_version>.jar;
add jar /usr/lib/hive/lib/guava-11.0.2.jar;

If you are on CDH5, run the following in the Hive shell to add the jars:

add jar /usr/lib/hive/lib/zookeeper.jar;
add jar /usr/lib/hive/lib/hive-hbase-handler.jar;
add jar /usr/lib/hbase/lib/guava-12.0.1.jar;
add jar /usr/lib/hbase/hbase-client.jar;
add jar /usr/lib/hbase/hbase-common.jar;
add jar /usr/lib/hbase/hbase-hadoop-compat.jar;
add jar /usr/lib/hbase/hbase-hadoop2-compat.jar;
add jar /usr/lib/hbase/hbase-protocol.jar;
add jar /usr/lib/hbase/hbase-server.jar;

Alternatively, you can configure these jars in hive-site.xml through the hive.aux.jars.path property, or in hive-env.sh with export HIVE_AUX_JARS_PATH=.
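
The hive.aux.jars.path property takes a comma-separated list, so a long jar list is easier to maintain when the value is generated. A small sketch; join_aux_jars is a hypothetical helper, and whether plain paths or file:// URIs are required depends on the setup, so treat the paths in the example as placeholders:

```shell
#!/bin/sh
# Join the given jar paths with commas, the separator used by the
# hive.aux.jars.path property.
join_aux_jars() {
    result=""
    for jar in "$@"; do
        if [ -z "$result" ]; then
            result="$jar"
        else
            result="${result},${jar}"
        fi
    done
    printf '%s\n' "$result"
}

# Example with two of the CDH5 jars listed above:
#   join_aux_jars /usr/lib/hive/lib/hive-hbase-handler.jar /usr/lib/hbase/hbase-client.jar
```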

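II. Installing Impala

Like Hive, Impala can work directly with HDFS and HBase. The difference is that Hive and other frameworks built on MapReduce suit long-running batch jobs, such as bulk extract, transform, and load (ETL) jobs, whereas Impala is aimed at real-time queries.

The components are assigned as follows: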

172.16.57.74 bd-ops-test-74 impala-state-store impala-catalog impala-server 
172.16.57.75 bd-ops-test-75 impala-server
172.16.57.76 bd-ops-test-76 impala-server
172.16.57.77 bd-ops-test-77 impala-server

1. Install

On node 74:

yum install impala-state-store impala-catalog impala-server -y

On nodes 75, 76, and 77:

yum install impala-server -y

2. Configure

2.1 Edit the configuration files

Find the installation paths:

# find / -name impala
/var/run/impala
/var/lib/alternatives/impala
/var/log/impala
/usr/lib/impala
/etc/alternatives/impala
/etc/default/impala
/etc/impala
/etc/default/impala

The impalad configuration directory is given by the IMPALA_CONF_DIR environment variable, which defaults to /usr/lib/impala/conf. Impala's default settings live in /etc/default/impala; edit IMPALA_CATALOG_SERVICE_HOST and IMPALA_STATE_STORE_HOST in that file:

IMPALA_CATALOG_SERVICE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_HOST=bd-ops-test-74
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} -sentry_config=/etc/impala/conf/sentry-site.xml"
IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
IMPALA_SERVER_ARGS=" \
    -log_dir=${IMPALA_LOG_DIR} \
    -use_local_tz_for_unix_timestamp_conversions=true \
    -convert_legacy_hive_parquet_utc_timestamps=true \
    -catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
    -state_store_port=${IMPALA_STATE_STORE_PORT} \
    -use_statestore \
    -state_store_host=${IMPALA_STATE_STORE_HOST} \
    -be_port=${IMPALA_BACKEND_PORT} \
    -server_name=server1 \
    -sentry_config=/etc/impala/conf/sentry-site.xml"
ENABLE_CORE_DUMPS=false
# LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
# MYSQL_CONNECTOR_JAR=/usr/share/java/mysql-connector-java.jar
# IMPALA_BIN=/usr/lib/impala/sbin
# IMPALA_HOME=/usr/lib/impala
# HIVE_HOME=/usr/lib/hive
# HBASE_HOME=/usr/lib/hbase
# IMPALA_CONF_DIR=/etc/impala/conf
# HADOOP_CONF_DIR=/etc/impala/conf
# HIVE_CONF_DIR=/etc/impala/conf
# HBASE_CONF_DIR=/etc/impala/conf

To cap the memory Impala may use, append -mem_limit=70% to the IMPALA_SERVER_ARGS value above.

To set the maximum number of requests per queue, append -default_pool_max_requests=-1 to the IMPALA_SERVER_ARGS value above. A value of -1 means no limit.
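
Because /etc/default/impala is a plain shell fragment, one way to add both flags without retyping the long value is to append to the variable after its definition. A sketch only; the first assignment is a shortened stand-in for the real value so that the snippet runs on its own:

```shell
#!/bin/sh
# Stand-in for the IMPALA_SERVER_ARGS value defined in /etc/default/impala
# (shortened here; the real file holds the full list of flags).
IMPALA_SERVER_ARGS=" -log_dir=/var/log/impala -use_statestore"

# Append the memory cap and the per-queue request limit discussed above.
IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -mem_limit=70% -default_pool_max_requests=-1"

echo "$IMPALA_SERVER_ARGS"
```

In the real file, only the appending line would be added, placed after the existing IMPALA_SERVER_ARGS definition.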

On node 74, create symlinks to hive-site.xml, core-site.xml, and hdfs-site.xml in /etc/impala/conf, then add the following to hdfs-site.xml:

<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>

Sync these files to the other nodes.

2.2 Create the socket path

On every node, create /var/run/hadoop-hdfs:

# mkdir -p /var/run/hadoop-hdfs

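2.3 User requirements

Installing Impala creates a user and a group named impala. Do not delete them.

If Impala is to work with YARN and Llama, add the impala user to the hdfs group.

When Impala executes DROP TABLE, it moves the files to the HDFS trash, so you need to create the HDFS directory /user/impala and make it writable by the impala user. Likewise, Impala needs to read data under the Hive warehouse, so add the impala user to the hive group.

Impala cannot run as root, because the root user is not allowed to perform direct reads.

Create the impala user's home directory and set its ownership: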

sudo -u hdfs hadoop fs -mkdir /user/impala
sudo -u hdfs hadoop fs -chown impala /user/impala

Check which groups the impala user belongs to:

# groups impala
impala : impala hadoop hdfs hive

As shown, the impala user belongs to the impala, hadoop, hdfs, and hive groups.

2.4 Start the services

On node 74, start:

# service impala-state-store start
# service impala-catalog start

2.5 Use impala-shell

Run impala-shell to start the Impala shell, connect to node 74, and refresh the metadata:

# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to bd-dev-hadoop-70:21000
Server version: impalad version 2.3.0-cdh5.5.1 RELEASE (build 73bf5bc5afbb47aa7eab06cfbf6023ba8cb74f3c)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.3.0-cdh5.5.1 (73bf5bc) built on Wed Dec 2 10:39:33 PST 2015)
After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[bd-dev-hadoop-70:21000] > invalidate metadata;

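After creating tables in Hive, run invalidate metadata the first time you start impala-shell so that Impala picks up the new tables. (From Impala 1.2 onward, you only need to run invalidate metadata on one node rather than on every Impala node.)

You can also pass other options. To list them: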

# impala-shell -h
Usage: impala_shell.py [options]

Options:
  -h, --help            show this help message and exit
  -i IMPALAD, --impalad=IMPALAD
                        <host:port> of impalad to connect to
                        [default: bd-dev-hadoop-70:21000]
  -q QUERY, --query=QUERY
                        Execute a query without the shell [default: none]
  -f QUERY_FILE, --query_file=QUERY_FILE
                        Execute the queries in the query file, delimited by ;
                        [default: none]
  -k, --kerberos        Connect to a kerberized impalad [default: False]
  -o OUTPUT_FILE, --output_file=OUTPUT_FILE
                        If set, query results are written to the given file.
                        Results from multiple semicolon-terminated queries
                        will be appended to the same file [default: none]
  -B, --delimited       Output rows in delimited mode [default: False]
  --print_header        Print column names in delimited mode when pretty-
                        printed. [default: False]
  --output_delimiter=OUTPUT_DELIMITER
                        Field delimiter to use for output in delimited mode
                        [default: \t]
  -s KERBEROS_SERVICE_NAME, --kerberos_service_name=KERBEROS_SERVICE_NAME
                        Service name of a kerberized impalad [default: impala]
  -V, --verbose         Verbose output [default: True]
  -p, --show_profiles   Always display query profiles after execution
                        [default: False]
  --quiet               Disable verbose output [default: False]
  -v, --version         Print version information [default: False]
  -c, --ignore_query_failure
                        Continue on query failure [default: False]
  -r, --refresh_after_connect
                        Refresh Impala catalog after connecting
                        [default: False]
  -d DEFAULT_DB, --database=DEFAULT_DB
                        Issues a use database command on startup
                        [default: none]
  -l, --ldap            Use LDAP to authenticate with Impala. Impala must be
                        configured to allow LDAP authentication.
                        [default: False]
  -u USER, --user=USER  User to authenticate with. [default: root]
  --ssl                 Connect to Impala via SSL-secured connection
                        [default: False]
  --ca_cert=CA_CERT     Full path to certificate file used to authenticate
                        Impala's SSL certificate. May either be a copy of
                        Impala's certificate (for self-signed certs) or the
                        certificate of a trusted third-party CA. If not set,
                        but SSL is enabled, the shell will not verify Impala's
                        server certificate [default: none]
  --config_file=CONFIG_FILE
                        Specify the configuration file to load options. File
                        must have case-sensitive '[impala]' header. Specifying
                        this option within a config file will have no effect.
                        Only specify this as a option in the commandline.
                        [default: /root/.impalarc]
  --live_summary        Print a query summary every 1s while the query is
                        running. [default: False]
  --live_progress       Print a query progress every 1s while the query is
                        running. [default: False]
  --auth_creds_ok_in_clear
                        If set, LDAP authentication may be used with an
                        insecure connection to Impala. WARNING: Authentication
                        credentials will therefore be sent unencrypted, and
                        may be vulnerable to attack. [default: none]

Export data with Impala:

impala-shell -i '172.16.57.74:21000' -r -q "select * from test" -B --output_delimiter="\t" -o result.txt

That concludes this walkthrough of installing CDH 5.5 Hive and Impala with yum. I hope it helps.
