首页 > 资讯 > 云计算 > 正文

Hive与HBase实现数据互导

16-05-30 来源：[db:作者]

收藏我要投稿

Hive与HBase实现数据互导

建立与HBase的识别表

hive> create table hive_hbase_1(key int,value string)

> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:info")

> TBLPROPERTIES ("hbase.table.name" = "userinfo");

Time taken: 8.896 seconds

hive>

查看Hbase中的表

hbase(main):001:0> list

TABLE

blog

friend

friend02

heroes

heroesIndex

myt

stu

table

tanggao

tanggao11

tanggao111

tanggaozhou

test

userinfo

word

word2

16 row(s) in 0.7760 seconds

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):002:0>

2.使用sql导入数据

i.预先准备数据在hdfs的user/tg目录下放一个a.txt

1tanggao

2zhousiyuan

3mother

4father

a)新建hive的数据表

hive> create table famaly(id int,name string) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

Time taken: 0.483 seconds

2.0.0版本会默认到你的hdfs根目录下的user/hadoop用户下找，比如我的是user/tg

hive> load data inpath 'a.txt' overwrite into table famaly;

Loading data to table default.famaly

Time taken: 2.139 seconds

查看信息

hive> select * from famaly;

1tanggao

2zhousiyuan

3mother

4father

Time taken: 2.912 seconds, Fetched: 4 row(s)

查看表结构

hive> desc famaly;

id int

name string

Time taken: 0.549 seconds, Fetched: 2 row(s)

第二种方法查看表结构同上

hive> describe famaly;

id int

name string

Time taken: 0.087 seconds, Fetched: 2 row(s)

hive>

使用sql导入数据到hive_hbase_1

hive> insert overwrite table hive_hbase_1 select * from famaly where id=1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0001, Tracking URL = http://master:8088/proxy/application_1464498775870_0001/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0001

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 22:57:58,093 Stage-0 map = 0%, reduce = 0%

2016-05-28 22:58:28,439 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 3.81 sec

MapReduce Total cumulative CPU time: 3 seconds 810 msec

Ended Job = job_1464498775870_0001

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1 Cumulative CPU: 3.91 sec HDFS Read: 10964 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 910 msec

Time taken: 89.327 seconds

查看数据

会显示刚刚插入的数据

hive> select * from hive_hbase_1;

1tanggao

Time taken: 0.916 seconds, Fetched: 1 row(s)

hive>

登录HBase 查看HBase数据

hbase(main):002:0> scan 'userinfo'

ROW COLUMN+CELL

1 column=cf1:info, timestamp=1464501508097, value=tanggao

1 row(s) in 0.7990 seconds

hbase(main):003:0>

是不是很神奇，在hive中添加的数据已经在Hbase中了

下面再看看在Hbase中插入数据，看看hive中是不是也有了?试试看

hbase(main):003:0> put 'userinfo','fid','cf1:info','tangshaoyan'

0 row(s) in 0.2270 seconds

hbase(main):004:0>

查看hive

hive> select * from hive_hbase_1;

1tanggao

NULLtangshaoyan

Time taken: 0.235 seconds, Fetched: 2 row(s)

hive>

没错，刚刚在hbase中插入的数据，已经在hive里了

hive访问Hbase中已经存在的Hbase表

HBase表报备

：已经存在了heroes表

hbase(main):007:0> scan 'heroes'

ROW COLUMN+CELL

0 column=info:email, timestamp=1463743975381, value=0@qq.com

0 column=info:name, timestamp=1463743975381, value=peter

0 column=info:power, timestamp=1463743975381, value=Idotknow

1 column=info:email, timestamp=1463743975391, value=1@qq.com

1 column=info:name, timestamp=1463743975391, value=hiro

1 column=info:power, timestamp=1463743975391, value=Idotknow

2 column=info:email, timestamp=1463743975396, value=2@qq.com

2 column=info:name, timestamp=1463743975396, value=sylar

2 column=info:power, timestamp=1463743975396, value=Idotknow

3 column=info:email, timestamp=1463743975399, value=3@qq.com

3 column=info:name, timestamp=1463743975399, value=claire

3 column=info:power, timestamp=1463743975399, value=Idotknow

4 column=info:email, timestamp=1463743975403, value=4@qq.com

4 column=info:name, timestamp=1463743975403, value=noah

4 column=info:power, timestamp=1463743975403, value=Idotknow

5 row(s) in 0.2140 seconds

hbase(main):008:0>

使用CREATE EXTERNAL TABLE：

hive> create external table hbase_hive_1(key int,value string)

> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "info:name")

> TBLPROPERTIES("hbase.table.name" = "heroes");

Time taken: 0.424 seconds

hive> select * from hbase_hive_1;

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.222 seconds, Fetched: 5 row(s)

hive>

从上面的操作后，hive已经可以访问HBase中已经存在的原有数据了

三、多列和多列族(Multiple Columns and Families)

hive> CREATE TABLE hive_hbase_add1(key int, value1 string, value2 int, value3 int)

> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:col1,info:col2,city:area")

> TBLPROPERTIES("hbase.table.name" = "student_info");

Time taken: 2.624 seconds

hive> select * from hbase_hive_1;

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.225 seconds, Fetched: 5 row(s)

hive> set hive.cli.print.header=true;

hive> select * from hbase_hive_1;

hbase_hive_1.keyhbase_hive_1.value

0peter

1hiro

2sylar

3claire

4noah

Time taken: 0.203 seconds, Fetched: 5 row(s)

hive> desc hbase_hive_1;

col_namedata_typecomment

key int

value string

Time taken: 0.198 seconds, Fetched: 2 row(s)

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,value from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0002, Tracking URL = http://master:8088/proxy/application_1464498775870_0002/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0002

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:48:40,536 Stage-0 map = 0%, reduce = 0%

2016-05-28 23:49:06,565 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 3.52 sec

MapReduce Total cumulative CPU time: 3 seconds 520 msec

Ended Job = job_1464498775870_0002

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1 Cumulative CPU: 3.52 sec HDFS Read: 5082 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 520 msec

_col0_col1_col2_col3

Time taken: 86.323 seconds

hive> insert overwrite table hive_hbase_add1 select key ,value,key+1,key+1000 from hbase_hive_1;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = tg_20160528223128_abf71520-622b-42b5-94c3-3bbb5492b558

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1464498775870_0003, Tracking URL = http://master:8088/proxy/application_1464498775870_0003/

Kill Command = /software/hadoop-2.6.4/bin/hadoop job -kill job_1464498775870_0003

Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0

2016-05-28 23:54:01,682 Stage-0 map = 0%, reduce = 0%

2016-05-28 23:54:24,725 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 4.23 sec

MapReduce Total cumulative CPU time: 4 seconds 230 msec

Ended Job = job_1464498775870_0003

MapReduce Jobs Launched:

Stage-Stage-0: Map: 1 Cumulative CPU: 4.23 sec HDFS Read: 5166 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 230 msec

keyvaluec2c3

Time taken: 72.543 seconds

hive> select * from hive_hbase_add1;

hive_hbase_add1.keyhive_hbase_add1.value1hive_hbase_add1.value2hive_hbase_add1.value3

0peter1NULL

1hiro2NULL

2sylar3NULL

3claire4NULL

4noah5NULL

Time taken: 0.454 seconds, Fetched: 5 row(s)

第四列类型不匹配，没有插进，都是null

下面换为正确的类型

hive> select * from hive_hbase_add1;

hive_hbase_add1.keyhive_hbase_add1.value1hive_hbase_add1.value2hive_hbase_add1.value3

0peter11000

1hiro21001

2sylar31002

3claire41003

4noah51004

Time taken: 0.746 seconds, Fetched: 5 row(s)

hive>

登录HBase查看数据

hbase(main):008:0> list

TABLE

blog

friend

friend02

heroes

heroesIndex

myt

stu

student_info

table

tanggao

tanggao11

tanggao111

tanggaozhou

test

userinfo

17 row(s) in 0.1140 seconds

=> ["blog", "friend", "friend02", "heroes", "heroesIndex", "myt", "stu", "student_info", "table", "tanggao", "tanggao11", "tanggao111", "tanggaozhou", "test", "userinfo", "word", "word2"]

hbase(main):009:0> scan 'student_info'

ROW COLUMN+CELL

0 column=city:area, timestamp=1464504

863521, value=1000

0 column=info:col1, timestamp=1464504

863521, value=peter

0 column=info:col2, timestamp=1464504

863521, value=1

1 column=city:area, timestamp=1464504

863521, value=1001

1 column=info:col1, timestamp=1464504

863521, value=hiro

1 column=info:col2, timestamp=1464504

863521, value=2

2 column=city:area, timestamp=1464504

863521, value=1002

2 column=info:col1, timestamp=1464504

863521, value=sylar

2 column=info:col2, timestamp=1464504

863521, value=3

3 column=city:area, timestamp=1464504

863521, value=1003

3 column=info:col1, timestamp=1464504

863521, value=claire

3 column=info:col2, timestamp=1464504

863521, value=4

4 column=city:area, timestamp=1464504

863521, value=1004

4 column=info:col1, timestamp=1464504

863521, value=noah

4 column=info:col2, timestamp=1464504

863521, value=5

5 row(s) in 0.0720 seconds

hbase(main):010:0>

发现hive中的四列 key int, value1 string, value2 int, value3 int

key对应HBase的行健

value1，value2对应Hbase的列族info的col1,col2

value3 对应Hbase列族city:area

是不是发现了hive中的表，多列存放到hbase少量固定的列簇中。

点击复制链接与好友分享!回本站首页