Installing and Using Sqoop


I. Overview

1. Sqoop is short for "SQL-to-Hadoop".

2. It bridges traditional relational databases and Hadoop: it imports relational data into Hadoop systems (such as HDFS, HBase, and Hive), and exports data from Hadoop back into relational databases.

3. It transfers data in batches using MapReduce.

4. Sqoop has two major versions, Sqoop 1 (1.4.x) and Sqoop 2 (1.99.x). Because the project does not recommend Sqoop 2 for production use, this article uses Sqoop 1.

5. Under the hood, Sqoop 1 runs as a MapReduce job with map tasks only (no reduce phase).

6. Sqoop jobs are submitted to YARN for execution.

7. Before running the examples in this article, start Hadoop (including YARN) and MySQL.


II. Installing Sqoop 1

1. Download location

http://mirror.bit.edu.cn/apache/sqoop/1.4.7/


2. Place the downloaded archive in a directory on Linux, then extract and rename it

[aaa@qq.com sqoop]# pwd
/opt/softWare/sqoop
[aaa@qq.com sqoop]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[aaa@qq.com sqoop]# mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7
[aaa@qq.com sqoop]# ls
sqoop-1.4.7  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

3. Configure environment variables

vim /etc/profile
 #sqoop
 export SQOOP_HOME=/opt/softWare/sqoop/sqoop-1.4.7
 export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
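
After sourcing the profile, a quick sanity check (these two commands are not part of the original session) confirms the variables took effect:

echo $SQOOP_HOME    # should print /opt/softWare/sqoop/sqoop-1.4.7
which sqoop         # should resolve to $SQOOP_HOME/bin/sqoop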

4. Edit Sqoop's configuration file

cd conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
 #Set the path for where zookeper config dir is
 #export ZOOCFGDIR=
 export HADOOP_COMMON_HOME=/opt/softWare/hadoop/hadoop-2.7.3
 export HADOOP_MAPRED_HOME=/opt/softWare/hadoop/hadoop-2.7.3
 export HIVE_HOME=/opt/softWare/hive/apache-hive-2.1.1-bin
 export HBASE_HOME=/opt/softWare/hbase/hbase-1.2.6

Append the export lines above to the end of the configuration file.

5. Copy the MySQL JDBC driver into Sqoop's lib directory

mysql-connector-java-5.1.47.jar
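
A minimal sketch of the copy, assuming the jar was downloaded to the current directory:

cp mysql-connector-java-5.1.47.jar /opt/softWare/sqoop/sqoop-1.4.7/lib/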

6. List the available Sqoop commands

[aaa@qq.com bin]# ./sqoop help
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:43:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
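
As the last line suggests, per-command help is available too, for example:

./sqoop help import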

7. Use Sqoop to list the databases in MySQL

[aaa@qq.com bin]# ./sqoop list-databases --connect jdbc:mysql://192.168.230.21:3306/test?characterEncoding=UTF-8 --username root --password '123456'
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 10:44:06 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 10:44:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 10:44:06 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
information_schema
hive
ke
mysql
performance_schema
test
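
With the connection verified, the tables inside a specific database can be listed the same way, using the list-tables command from the help output above. A sketch (-P prompts for the password interactively, as the warning in the log recommends):

./sqoop list-tables --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P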

8. The contents of the stu table in MySQL

mysql> select * from stu;
+----------+------+----------+
| name     | age  | address  |
+----------+------+----------+
| zhangsan |   20 | henan    |
| lisi     |   20 | hebei    |
| wangwu   |   20 | beijing  |
| liuqi    |   20 | shandong |
| xuwei    |   20 | fujian   |
+----------+------+----------+
5 rows in set (0.00 sec)
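
The same query can also be run through Sqoop itself with the eval command from the help list; a minimal sketch:

./sqoop eval --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P --query "SELECT * FROM stu"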

9. Import the stu table from MySQL into HDFS

[aaa@qq.com bin]# ./sqoop import -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root --password 123456 --table stu --target-dir /user/sqoop/datatest
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/softWare/sqoop/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/06/29 09:52:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/06/29 09:52:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/06/29 09:52:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/06/29 09:52:29 INFO tool.CodeGenTool: Beginning code generation
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/softWare/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/softWare/hbase/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `stu` AS t LIMIT 1
20/06/29 09:52:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/softWare/hadoop/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/06/29 09:52:31 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/de2079aebddc781b7b85a54da470b803/stu.jar
20/06/29 09:52:31 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/06/29 09:52:31 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/06/29 09:52:31 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/06/29 09:52:31 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/06/29 09:52:31 INFO mapreduce.ImportJobBase: Beginning import of stu
20/06/29 09:52:31 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/06/29 09:52:32 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/06/29 09:52:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.230.21:8032
20/06/29 09:52:46 INFO db.DBInputFormat: Using read commited transaction isolation
20/06/29 09:52:46 INFO mapreduce.JobSubmitter: number of splits:1
20/06/29 09:52:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1593394152340_0001
20/06/29 09:52:48 INFO impl.YarnClientImpl: Submitted application application_1593394152340_0001
20/06/29 09:52:48 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1593394152340_0001/
20/06/29 09:52:48 INFO mapreduce.Job: Running job: job_1593394152340_0001
20/06/29 09:53:00 INFO mapreduce.Job: Job job_1593394152340_0001 running in uber mode : false
20/06/29 09:53:00 INFO mapreduce.Job:  map 0% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job:  map 100% reduce 0%
20/06/29 09:53:11 INFO mapreduce.Job: Job job_1593394152340_0001 completed successfully
20/06/29 09:53:12 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=138019
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=84
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=7657
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=7657
		Total vcore-milliseconds taken by all map tasks=7657
		Total megabyte-milliseconds taken by all map tasks=7840768
	Map-Reduce Framework
		Map input records=5
		Map output records=5
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=77
		CPU time spent (ms)=990
		Physical memory (bytes) snapshot=114147328
		Virtual memory (bytes) snapshot=2082172928
		Total committed heap usage (bytes)=42663936
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=84
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Transferred 84 bytes in 39.4567 seconds (2.1289 bytes/sec)
20/06/29 09:53:12 INFO mapreduce.ImportJobBase: Retrieved 5 records.

Note: the -m flag (also --num-mappers) sets how many map tasks run in parallel for the import.
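
When -m is greater than 1, Sqoop needs a column to partition the input across map tasks; by default it uses the table's primary key, and for a table without one you must pass --split-by. A hedged sketch (the choice of age as the split column and the new target directory are illustrative only; --target-dir must not already exist):

./sqoop import -m 2 --split-by age --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P --table stu --target-dir /user/sqoop/datatest2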

10. Verify the result

[aaa@qq.com ~]# hdfs dfs -ls /user/sqoop/datatest
Found 2 items
-rw-r--r--   1 root supergroup          0 2020-06-29 09:53 /user/sqoop/datatest/_SUCCESS
-rw-r--r--   1 root supergroup         84 2020-06-29 09:53 /user/sqoop/datatest/part-m-00000
[aaa@qq.com ~]# hdfs dfs -cat /user/sqoop/datatest/part-m-00000
zhangsan,20,henan
lisi,20,hebei
wangwu,20,beijing
liuqi,20,shandong
xuwei,20,fujian
[aaa@qq.com ~]#
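
The reverse direction works the same way: sqoop export writes HDFS files back into a relational table. A minimal sketch, assuming an empty table with the same schema as stu already exists in MySQL (the table name stu_copy is hypothetical):

./sqoop export -m 1 --connect jdbc:mysql://192.168.230.21:3306/mysql --username root -P --table stu_copy --export-dir /user/sqoop/datatest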
