Hive中静态分区和动态分区

差别

两者主要的差别在于:加载数据的时候,静态分区需要手动设定不同的分区,按分区分别导入数据,;而动态分区不需要指定分区key的值,会根据key对应列的值自动分区写入,如果该列值对应的分区目录还没有创建, 会自动创建并写入数据。

静态分区

创建分区表

create table zhangsiyao.dt_0802_0815 (itime string,uid string,gid string,app_ver string,unet int ,device_type string,device_os string,client_type string,crtv_id int,country string,province string ,city string,isp string,ad_location string,ad_status string,age_gt string,sex_gt string,income_gt string,marital_status_gt string,sponsor_id int,creative_name string,creative_title string,creative_abstract string,category_id int,create_time string,update_time string) partitioned by (dt string);

导入数据

load data local inpath ‘/home/warehouse/user.txt’ overwrite into table teacher partition(work_date=”2016-07-12”);

需要按照分区一个分区一个分区导入数据

动态分区

创建分区表(和静态分区一样)

create table zhangsiyao.dt_0811_0820 (itime string,uid string,gid string,app_ver string,unet int ,device_type string,device_os string,client_type string,crtv_id int,country string,province string ,city string,isp string,ad_location string,ad_status string,age_gt string,sex_gt string,income_gt string,marital_status_gt string,sponsor_id int,creative_name string,creative_title string,creative_abstract string,category_id int,create_time string,update_time string) partitioned by (dt string);

创建数据表(中间数据)

create table zhangsiyao.data_analy_11_20 as select a.itime,a.uid,a.gid,a.app_ver,a.unet ,a.device_type,a.device_os,a.client_type,a.crtv_id,a.country,a.province,a.city,a.isp,a.ad_location,a.ad_status,a.dt,b.age_gt,b.sex_gt,b.income_gt,b.marital_status_gt,c.sponsor_id,c.creative_name,c.creative_title,c.creative_abstract,c.category_id,c.create_time,c.update_time from (select from ad.wireless_ad_org_final where dt>’2017-08-10’ and crtv_id <>-1) a left join (select from user_portraint.focus_user_portraint_profile where dt>’2017-08-10’) b on (a.dt=b.dt and a.client_type=b.client_type and a.uid=b.uid) left join (select * from ad.ad_creative_ods where dt>’2017-08-10’) c on (a.dt=b.dt and a.crtv_id=c.creative_id);

设置动态分区

set hive.exec.dynamic.partition=true;(可通过这个语句查看:set hive.exec.dynamic.partition;)
set hive.exec.dynamic.partition.mode=nonstrict; (strict要求至少有一个静态分区, nonstrict可以都是动态分区)
set hive.exec.max.dynamic.partitions=100000;(如果自动分区数大于这个参数,将会报错)
set hive.exec.max.dynamic.partitions.pernode=100000;

导入数据

insert overwrite table zhangsiyao.dt_0811_0820 partition(dt) select itime,uid,gid,app_ver,unet,device_type,device_os,client_type,crtv_id,country,province,city,isp,ad_location,ad_status,age_gt,sex_gt,income_gt,marital_status_gt,sponsor_id,creative_name,creative_title,creative_abstract,category_id,create_time,update_time,dt from zhangsiyao.data_analy_11_20 where dt<’2017-08-21’;

这里需要注意的是,用select选择数据导入动态分区时,要把关键字放在最后面,因为动态分区默认以最后一个关键字作为分区关键字