Reading Hive data with Spark SQL in IDEA

程序员文章站 2022-07-14 15:12:59
  1. Copy hive-site.xml from Hive's conf directory into the project's resources directory;
  2. Add the Spark-Hive dependency to pom.xml:
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
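
  The spark-hive module alone is not enough to compile and run the code below; assuming a standard Maven setup, the project also needs the spark-sql artifact and a spark.version property. A minimal sketch (the version shown is illustrative; it must match your cluster's Spark, and the _2.11 suffix ties it to Scala 2.11):

    <properties>
      <spark.version>2.4.3</spark.version>
    </properties>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>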

3. Code:

  import org.apache.spark.sql.SparkSession

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName(this.getClass.getSimpleName)
      .master("local[2]")
      // Have HDFS return DataNode hostnames instead of internal IPs,
      // so a client outside the cluster network can reach them
      .config("dfs.client.use.datanode.hostname", "true")
      // Enable Hive support; reads hive-site.xml from the classpath
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show tables").show()
    spark.sql("select * from employee").show()
    // Equivalent to the query above, via the table API
    spark.table("employee").show()
    spark.stop()
  }
  Result:
+-------+----+
|user_id|name|
+-------+----+
|      1|小林|
+-------+----+
4. hive-site.xml configuration:
<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://hadoop001:9000/tmp/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/hivedb?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://hadoop001:9000/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.Multithreaded</name>
    <value>true</value>
  </property>
</configuration>
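
  Because the ConnectionDriverName above is com.mysql.jdbc.Driver, the MySQL JDBC driver must also be on the application classpath, or the embedded metastore client will fail to load the driver class. Assuming Maven, a sketch (the version is illustrative; pick one compatible with your MySQL server):

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.47</version>
    </dependency>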

5. Common exceptions:

  • 5.1 Without config("dfs.client.use.datanode.hostname", "true"), the HDFS client tries to reach the DataNode by its internal IP and times out:
19/10/17 16:30:40 WARN DFSClient: Failed to connect to /192.168.0.3:50010 for block BP-744454093-192.168.0.3-1567066072363:blk_1073742661_1838, add to deadNodes and continue. 
java.net.ConnectException: Connection timed out: no further information
  • 5.2 Without the spark-hive_2.11 dependency in pom.xml, enableHiveSupport() fails:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.