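HBase Practice
Introduction
HBase is an open-source, distributed, versioned, non-relational (column-oriented) database. HDFS provides its storage and MapReduce provides parallel computation; it is an open-source implementation of Google Bigtable.
Being non-relational, it does not require strict relationships between data, and even allows different rows of the same column to store different types of data.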
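Advantages:
- A typical NoSQL database
- Large capacity: a single table can hold tens of billions of rows and millions of columns; data can be added along both the row and column dimensions, which makes it highly elastic.
- Uses an LSM tree as its internal storage structure (small files are periodically merged into larger ones, reducing disk access)
- Column-oriented: storage and access control are organized by column
- Sparse: in a relational DB, every field's type is defined up front and occupies fixed storage, so even null values take space. In HBase, empty columns take no storage.
- Highly scalable: tables are partitioned by region size and the regions are stored on different nodes in the cluster. Nodes can be added online, without stopping running services.
- Highly reliable: the WAL (Write-Ahead Log) records data operations (writes, updates, deletes), and the replication mechanism replays the log to synchronize data.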
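The LSM-tree point above (small sorted files merged periodically into larger ones) can be illustrated with a toy compaction. This is a minimal sketch, not HBase's actual compaction code: the `compact` function and the `(key, timestamp, value)` tuple layout are assumptions made for illustration, and it keeps only the newest value per key, whereas HBase retains as many versions as the column family's VERSIONS setting allows.

```python
import heapq


def compact(*runs):
    """Merge sorted runs of (key, timestamp, value) into one sorted run,
    keeping only the newest entry for each key (a simplification)."""
    # Order by key, then newest timestamp first, so the first entry seen
    # for a key is the one to keep.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    out = []
    for key, ts, value in merged:
        if out and out[-1][0] == key:
            continue  # a newer entry for this key was already kept
        out.append((key, ts, value))
    return out


# Two small "files", each already sorted by key, flushed at different times.
older = [("row1", 1, "a"), ("row3", 1, "c")]
newer = [("row1", 2, "a2"), ("row2", 2, "b")]
compacted = compact(older, newer)
print(compacted)
```

Because every input run is already sorted, the merge is a single sequential pass over each file, which is why this layout favors sequential disk access over random writes.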
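Processes
HMaster
- Manages table-level operations (create, alter, delete)
- Balances load across RegionServers and schedules regions (assignment and removal)
- Handles RegionServer failover
RegionServer
- Serves the regions assigned to it
- Handles client requests
- Flushes the cache to HDFS
- Handles region splits
- Performs compactions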
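Concepts
- Column Family: a group of related columns that share the same name prefix, similar to a sub-table; each column family is stored separately.
- Column Qualifier: the column identifier within a family.
- Cell: row key + column family + column qualifier identify a cell; a cell can hold multiple versions of the same piece of data. The number kept is set per column family through VERSIONS (the describe output later in this walkthrough shows a default of 1 on HBase 1.7.1; older releases defaulted to 3).
- Timestamp: used to fetch a specific version of the data; the latest version is returned by default.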
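To make the cell and timestamp concepts concrete, here is a toy in-memory model. It is only a sketch: `TinyTable` and its methods are hypothetical names, the names stored are made-up sample values, and `max_versions=3` is an illustrative setting standing in for a column family's VERSIONS.

```python
from collections import defaultdict


class TinyTable:
    """Toy model: (row key, 'family:qualifier') -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.cells = defaultdict(dict)

    def put(self, row, column, value, ts):
        versions = self.cells[(row, column)]
        versions[ts] = value
        # Drop the oldest versions beyond the retention limit.
        for old_ts in sorted(versions)[:-self.max_versions]:
            del versions[old_ts]

    def get(self, row, column, ts=None):
        versions = self.cells.get((row, column), {})
        if not versions:
            return None
        if ts is None:
            ts = max(versions)  # default: newest version
        return versions.get(ts)


table = TinyTable(max_versions=3)
for ts, name in enumerate(["Tom", "Tommy", "Thomas", "T."], start=1):
    table.put("0001", "StuInfo:Name", name, ts)

latest = table.get("0001", "StuInfo:Name")         # newest version
second = table.get("0001", "StuInfo:Name", ts=2)   # a specific version
evicted = table.get("0001", "StuInfo:Name", ts=1)  # beyond the limit
print(latest, second, evicted)
```

After four writes with a limit of three, the version at timestamp 1 is gone, while a read without a timestamp returns the most recent value.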
Connecting to HMaster
[root@hadoop001 ~]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hbase-1.7.1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2022-05-12 17:41:43,952 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 1.7.1, r2d9273667e418e7023f9104a830cdcb8233b6f25, Fri Jul 16 00:20:26 PDT 2021
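Creating a table - create
Notes:
- Unlike (Hive) SQL, everything except keywords, such as table names and column family names, must be quoted;
- Names are case-sensitive, e.g. NAME cannot be written as name;
- Commands need not, and must not, end with a semicolon ';'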
hbase(main):002:0> create 'Students','StuInfo', 'Grades'
0 row(s) in 2.5090 seconds
=> Hbase::Table - Students
hbase(main):025:0* exists 'Students'
Table Students does exist
0 row(s) in 0.0060 seconds
hbase(main):026:0> describe 'Students'
Table Students is ENABLED
Students
COLUMN FAMILIES DESCRIPTION
{NAME => 'Grades', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DE
LETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION =>
'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_S
COPE => '0'}
{NAME => 'StuInfo', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_D
ELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION =
> 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_
SCOPE => '0'}
2 row(s) in 0.0260 seconds
hbase(main):021:0> create 'Students2', {NAME=>'StuInfo',VERSIONS=>3}, {NAME=>'Grades', BLOCKCACHE=>true}
0 row(s) in 1.2410 seconds
=> Hbase::Table - Students2
VERSIONS => 3 means a cell can keep up to 3 versions of its data. BLOCKCACHE => true means data blocks may be cached when they are read.
Altering a table - alter
hbase(main):028:0> alter 'Students', {NAME=>'Grades', VERSIONS=>4}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9060 seconds
hbase(main):029:0> describe 'Students'
Table Students is ENABLED
Students
COLUMN FAMILIES DESCRIPTION
{NAME => 'Grades', BLOOMFILTER => 'ROW', VERSIONS => '4', IN_MEMORY => 'false', KEEP_DE
LETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION =>
'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_S
COPE => '0'}
{NAME => 'StuInfo', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_D
ELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION =
> 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_
SCOPE => '0'}
2 row(s) in 0.0250 seconds
Adding/removing a column family - alter & delete
hbase(main):030:0> alter 'Students', 'Hobby'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.8800 seconds
hbase(main):032:0> alter 'Students', 'delete' => 'Hobby'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.1330 seconds
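Inserting data - put
A single put statement inserts or updates one field (one column in one column family) of one row. An optional trailing argument sets the cell's timestamp explicitly, as the first put in this session does with 1.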
hbase(main):052:0> put 'Students', '0001', 'StuInfo:Name', 'Tome', 1
0 row(s) in 0.0470 seconds
hbase(main):053:0> put 'Students', '0001', 'StuInfo:Age', '18'
0 row(s) in 0.0100 seconds
hbase(main):054:0> put 'Students', '0001', 'StuInfo:Sex', 'Male'
0 row(s) in 0.0050 seconds
hbase(main):055:0> put 'Students', '0001', 'StuInfo:Mobile', '13811112222'
0 row(s) in 0.0040 seconds
hbase(main):056:0> put 'Students', '0001', 'Grades:English', '80'
0 row(s) in 0.0050 seconds
hbase(main):057:0> put 'Students', '0001', 'Grades:Math', '85'
0 row(s) in 0.0050 seconds
hbase(main):058:0> put 'Students', '0001', 'Grades:Chinese', '96'
0 row(s) in 0.0050 seconds
Retrieving data - get
Retrieve all fields of a row, or only selected ones.
hbase(main):059:0> get 'Students', '0001'
COLUMN CELL
Grades:Chinese timestamp=1652351587028, value=96
Grades:English timestamp=1652351573490, value=80
Grades:Math timestamp=1652351579064, value=85
StuInfo:Age timestamp=1652351534074, value=18
StuInfo:Mobile timestamp=1652351560808, value=13811112222
StuInfo:Name timestamp=1, value=Tome
StuInfo:Sex timestamp=1652351544030, value=Male
1 row(s) in 0.0270 seconds
hbase(main):073:0> get 'Students', '0001', {COLUMN=>['StuInfo:Name', 'Grades:English']}
COLUMN CELL
Grades:English timestamp=1652351573490, value=80
StuInfo:Name timestamp=1, value=Tome
1 row(s) in 0.0110 seconds
Scanning & filtering - scan & filter
hbase(main):005:0> scan 'Students'
ROW COLUMN+CELL
0001 column=Grades:Chinese, timestamp=1652351587028, value=96
0001 column=Grades:English, timestamp=1652351573490, value=80
0001 column=Grades:Math, timestamp=1652351579064, value=85
0001 column=StuInfo:Age, timestamp=1652351534074, value=18
0001 column=StuInfo:Mobile, timestamp=1652351560808, value=13811112222
0001 column=StuInfo:Name, timestamp=1, value=Tome
0001 column=StuInfo:Sex, timestamp=1652351544030, value=Male
0002 column=Grades:English, timestamp=2, value=78
0002 column=Grades:Math, timestamp=2, value=83
0002 column=StuInfo:Age, timestamp=2, value=17
0002 column=StuInfo:Name, timestamp=2, value=Marry
0002 column=StuInfo:Sex, timestamp=2, value=Female
2 row(s) in 0.0160 seconds
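Comparators
- BinaryComparator: matches against the complete byte array
- BinaryPrefixComparator: matches against a byte-array prefix
- BitComparator: matches bit by bit
- NullComparator: matches null values
- RegexStringComparator: matches a regular expression
- SubstringComparator: matches a substring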
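The following sketch mimics how four of these comparators decide an equality (`=`) match on raw bytes. The function names are made up for illustration and are not HBase APIs. Two behaviors are modeled on my understanding of HBase and should be checked against the version in use: SubstringComparator compares case-insensitively, and RegexStringComparator matches if the pattern is found anywhere in the value.

```python
import re


def binary(value: bytes, operand: bytes) -> bool:
    """BinaryComparator with '=': the whole byte array must be equal."""
    return value == operand


def binary_prefix(value: bytes, operand: bytes) -> bool:
    """BinaryPrefixComparator with '=': only the leading bytes are compared."""
    return value.startswith(operand)


def substring(value: bytes, operand: bytes) -> bool:
    """SubstringComparator with '=': operand occurs anywhere (case-insensitive)."""
    return operand.lower() in value.lower()


def regex_string(value: bytes, pattern: str) -> bool:
    """RegexStringComparator with '=': the pattern is found in the value."""
    return re.search(pattern, value.decode("utf-8")) is not None


print(binary(b"0001", b"0001"))                      # exact match
print(binary_prefix(b"0001", b"00"))                 # prefix match
print(substring(b"StuInfo", b"info"))                # substring, ignoring case
print(regex_string(b"13811112222", r"^138\d{8}$"))   # regular expression
```

In the shell's filter strings these comparators are selected by the prefixes `binary:`, `binaryprefix:`, `substring:` and `regexstring:`.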
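Row key filters
- RowFilter: used with a comparator and an operator to compare and filter row keys
- PrefixFilter: row key prefix filter. scan 'Students', FILTER => "PrefixFilter('0001')" returns the same rows here as scan 'Students', FILTER => "RowFilter(=, 'substring:0001')", although substring matches anywhere in the row key; 'binaryprefix:0001' is the exact equivalent
- KeyOnlyFilter: returns only the key of each cell, without the value. scan 'Students', FILTER => "KeyOnlyFilter()"
- FirstKeyOnlyFilter: returns only the first cell of each row, with its key-value pair. scan 'Students', FILTER => "FirstKeyOnlyFilter()"
- InclusiveStopFilter: replaces ENDROW and includes the stop row in the result. scan 'Students', { STARTROW => '0001', FILTER => "InclusiveStopFilter('0002')" } gives the same result here as scan 'Students', { STARTROW => '0001', ENDROW => '0003' }, because ENDROW is exclusive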
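Two distinctions in the list above are easy to miss: an exclusive ENDROW versus an inclusive stop row, and prefix matching versus substring matching. The sketch below models both on a sorted list of row keys. The `scan` helper and the row keys `1000` and `a0001` are hypothetical, added only so that the two kinds of matching give different answers.

```python
def scan(rows, start=None, stop=None, inclusive_stop=False):
    """Return row keys in [start, stop), or [start, stop] if inclusive_stop."""
    result = []
    for row in rows:  # rows must already be sorted, as HBase row keys are
        if start is not None and row < start:
            continue
        if stop is not None:
            if row > stop or (row == stop and not inclusive_stop):
                break
        result.append(row)
    return result


# Hypothetical row keys, chosen so the filters behave differently.
rows = sorted(["0001", "0002", "0003", "1000", "a0001"])

end_row = scan(rows, start="0001", stop="0003")            # like ENDROW => '0003'
inclusive = scan(rows, start="0001", stop="0002",
                 inclusive_stop=True)                      # like InclusiveStopFilter('0002')
by_prefix = [r for r in rows if r.startswith("0001")]      # like PrefixFilter('0001')
by_substring = [r for r in rows if "0001" in r]            # like 'substring:0001'
print(end_row, inclusive, by_prefix, by_substring)
```

The two range scans return the same rows, while the substring match also picks up `a0001`, which a prefix match excludes.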
hbase(main):006:0> scan 'Students', FILTER=>"PrefixFilter('0001')"
ROW COLUMN+CELL
0001 column=Grades:Chinese, timestamp=1652351587028, value=96
0001 column=Grades:English, timestamp=1652351573490, value=80
0001 column=Grades:Math, timestamp=1652351579064, value=85
0001 column=StuInfo:Age, timestamp=1652351534074, value=18
0001 column=StuInfo:Mobile, timestamp=1652351560808, value=13811112222
0001 column=StuInfo:Name, timestamp=1, value=Tome
0001 column=StuInfo:Sex, timestamp=1652351544030, value=Male
1 row(s) in 0.0190 seconds
hbase(main):012:0> scan 'Students', FILTER=>"RowFilter(=, 'binary:0001')"
ROW COLUMN+CELL
0001 column=Grades:Chinese, timestamp=1652351587028, value=96
0001 column=Grades:English, timestamp=1652351573490, value=80
0001 column=Grades:Math, timestamp=1652351579064, value=85
0001 column=StuInfo:Age, timestamp=1652351534074, value=18
0001 column=StuInfo:Mobile, timestamp=1652351560808, value=13811112222
0001 column=StuInfo:Name, timestamp=1, value=Tome
0001 column=StuInfo:Sex, timestamp=1652351544030, value=Male
1 row(s) in 0.0160 seconds
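Column family and column filters
- FamilyFilter: filters by column family
- QualifierFilter: column qualifier filter; returns only the data in columns with matching names
- ColumnPrefixFilter: filters by column name prefix
- MultipleColumnPrefixFilter: filters column names by any of several prefixes
- ColumnRangeFilter: filters by a range of column names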
hbase(main):003:0> scan 'Students', FILTER => "FamilyFilter(=, 'substring:StuInfo')"
ROW COLUMN+CELL
0001 column=StuInfo:Age, timestamp=1652351534074, value=18
0001 column=StuInfo:Mobile, timestamp=1652351560808, value=13811112222
0001 column=StuInfo:Name, timestamp=1, value=Tome
0001 column=StuInfo:Sex, timestamp=1652351544030, value=Male
0002 column=StuInfo:Age, timestamp=2, value=17
0002 column=StuInfo:Name, timestamp=2, value=Marry
0002 column=StuInfo:Sex, timestamp=2, value=Female
2 row(s) in 0.0160 seconds
hbase(main):004:0> scan 'Students', FILTER => "QualifierFilter(=, 'substring:Math')"
ROW COLUMN+CELL
0001 column=Grades:Math, timestamp=1652351579064, value=85
0002 column=Grades:Math, timestamp=2, value=83
2 row(s) in 0.0130 seconds
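Value filters
- ValueFilter: value filter that returns the key-value pairs whose value satisfies the condition; it also applies to get 'Students', '0001'
- SingleColumnValueFilter: value filter that compares against a specified column family and column
- SingleColumnValueExcludeFilter: like SingleColumnValueFilter, but leaves the tested column out of the returned cells
List the subjects and grades scoring above 80: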
hbase(main):030:0> scan 'Students', FILTER=>"(FamilyFilter(=, 'binary:Grades') AND ValueFilter(>,'binary:80'))"
ROW COLUMN+CELL
0001 column=Grades:Chinese, timestamp=1652351587028, value=96
0001 column=Grades:Math, timestamp=1652351579064, value=85
0002 column=Grades:Math, timestamp=2, value=83
2 row(s) in 0.0060 seconds
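One caveat about the scan above: `binary:80` compares raw bytes lexicographically, not numbers. It gives the expected answer here only because every grade is a two-digit string. The sketch below uses Python `bytes` comparison, which is also lexicographic, to show where the two orderings diverge; the values `100` and `9` are hypothetical and do not appear in the table.

```python
# Scores stored as strings, the way the put commands in this walkthrough store them.
scores = [b"96", b"85", b"80", b"100", b"9"]

# Byte-wise comparison, analogous to ValueFilter(>, 'binary:80').
bytewise = [s for s in scores if s > b"80"]

# What a numeric "greater than 80" would return.
numeric = [s for s in scores if int(s) > 80]

# Zero-padding to a fixed width makes byte order agree with numeric order.
padded = [s.zfill(3) for s in scores]
padded_bytewise = [s for s in padded if s > b"080"]

print(bytewise)
print(numeric)
print(padded_bytewise)
```

Byte-wise, `100` sorts before `80` and `9` sorts after it, so values meant to be compared numerically are usually stored zero-padded or in a fixed-width binary encoding.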
Combining multiple filters
hbase(main):026:0> scan 'Students', FILTER=>"(RowFilter(=, 'binary:0001') AND QualifierFilter(=,'binary:Math'))"
ROW COLUMN+CELL
0001 column=Grades:Math, timestamp=1652351579064, value=85
1 row(s) in 0.0180 seconds
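The filter string above joins two filters with AND; the shell's filter language also accepts OR. As a rough mental model, each filter is a predicate over cells and the connectives combine predicates. The helper names below are invented for this sketch, and the cells are the Grades:English and Grades:Math values from the scan output earlier in this walkthrough.

```python
# Cells as (row key, column family, qualifier, value).
cells = [
    ("0001", "Grades", "English", b"80"),
    ("0001", "Grades", "Math", b"85"),
    ("0002", "Grades", "English", b"78"),
    ("0002", "Grades", "Math", b"83"),
]


def row_is(key):
    """Predicate similar in spirit to RowFilter(=, 'binary:<key>')."""
    return lambda cell: cell[0] == key


def qualifier_is(name):
    """Predicate similar in spirit to QualifierFilter(=, 'binary:<name>')."""
    return lambda cell: cell[2] == name


def all_of(*predicates):
    """AND: a cell passes only if every predicate accepts it."""
    return lambda cell: all(p(cell) for p in predicates)


def any_of(*predicates):
    """OR: a cell passes if at least one predicate accepts it."""
    return lambda cell: any(p(cell) for p in predicates)


both = [c for c in cells if all_of(row_is("0001"), qualifier_is("Math"))(c)]
either = [c for c in cells if any_of(row_is("0001"), qualifier_is("Math"))(c)]
print(both)
print(either)
```

With AND only the single Math cell of row 0001 remains, matching the shell output; with OR every cell of row 0001 plus the Math cell of row 0002 would pass.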
Emptying a table - truncate
hbase(main):048:0* truncate 'Students'
Truncating 'Students' table (it may take a while):
- Disabling table...
- Truncating table...
0 row(s) in 3.3760 seconds
Dropping a table - disable & drop
Unlike other databases, the table must be disabled before it can be dropped.
hbase(main):040:0> drop 'Students'
ERROR: Table Students is enabled. Disable it first.
Here is some help for this command:
Drop the named table. Table must first be disabled:
hbase> drop 't1'
hbase> drop 'ns1:t1'
hbase(main):041:0> disable 'Students'
0 row(s) in 2.2430 seconds
hbase(main):042:0> drop 'Students'
0 row(s) in 1.2410 seconds
hbase(main):043:0> exists 'Students'
Table Students does not exist
0 row(s) in 0.0130 seconds