TPC-H Benchmark
SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将介绍 SelectDB 在 TPC-H 标准测试上的测试方法和测试结果。
概述
TPC-H 是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成,使用的数据具有广泛的行业相关性。该基准测试通过一系列的查询操作,来评估数据库系统在处理复杂查询和数据挖掘任务时的性能。
说明
- 本文 TPC-H 的实现基于 TPC-H 的基准测试,并不符合 TPC-H 基准测试的所有要求。本测试结果不能等同于完全遵守 TPC-H 测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
- TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议使用实际业务数据进行进一步的测试。
测试环境
-
数据库环境
环境配置项 配置说明 地域和可用区 华东1(杭州)地域,可用区K 规格 64核 512GB 磁盘 800GB 高性能云硬盘 SelectDB Cloud 内核版本 3.0.6 -
客户端环境
环境配置项 配置说明 下载测试工具的设备 云服务器ECS实例 地域和可用区 华东1(杭州)地域 实例规格 ecs.g7.2xlarge 操作系统 Ubuntu 22.04.1 LTS 网络 用户专有网络(VPC)
测试数据集
整个测试模拟生成 TPC-H 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关及数据量。
TPC-H表名 | 行数 | 导入后大小 | 备注 |
---|---|---|---|
REGION | 5 | 400KB | 区域表 |
NATION | 25 | 7.714KB | 国家表 |
SUPPLIER | 100万 | 85.528MB | 供应商表 |
PART | 2000万 | 752.330MB | 零部件表 |
PARTSUPP | 8000万 | 4.375GB | 零部件供应表 |
CUSTOMER | 1500万 | 1.317GB | 客户表 |
ORDERS | 1.5亿 | 6.301GB | 订单表 |
LINEITEM | 6亿 | 20.882GB | 订单明细表 |
测试步骤
如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具
步骤一:安装 unzip 工具
-
安装 unzip,示例如下:
sudo apt install unzip
这个命令将会安装 unzip 以及它的所有依赖项。
-
安装完成后,可以通过运行以下命令来验证 unzip 是否已成功安装。
unzip --help
步骤二:下载安装 TPC-H 数据生成工具
从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译 tpch-dbgen 工具,示例如下:
tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/tpch-tools/bin
bash build-tpch-dbgen.sh
安装成功后,将在TPC-H_Tools_v3.0.0/
目录下生成dbgen
二进制文件。
步骤三:生成 TPC-H 测试集
在安装测试工具目录执行以下脚本生成 TPC-H 数据集,示例如下:
cd ./yaochi_performance_tool/tpch-tools/bin
bash gen-tpch-data.sh
数据会以.tbl
为后缀在tpch-data/
目录下生成,默认情况下的文件总大小约 100GB。生成时间可能在数分钟到 1 小时不等。
步骤三:建表
-
准备
doris-cluster.conf
文件在调用导入脚本前,需要将测试数据库的连接信息写在
doris-cluster.conf
文件中。文件位置在tpch-tools/conf/
目录下,内容包括连接集群的地址、HTTP 端口、用户名、密码和待导入数据的 DB。export FE_HOST="xxx"
export FE_HTTP_PORT="8080"
export FE_QUERY_PORT="9030"
export USER="root"
export PASSWORD='xxx'
export DB="tpch1"说明 您可以在 SelectD Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。
-
创建TPC-H表。
在
./yaochi_performance_tool/tpch-tools/in
目录下执行脚本来自动创建测试用表。bash create-tpch-tables.sh
步骤四:导入数据
在./yaochi_performance_tool/tpch-tools/bin
目录下执行脚本完成测试集数据导入。
bash ./load-tpch-data.sh
步骤五:检查导入数据
按照上述流程和参数执行的场合,数据量应和上文[测试数据集]给出的生成数据的行数一致。
SELECT COUNT(*) FROM lineitem;
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM partsupp;
SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM supplier;
SELECT COUNT(*) FROM nation;
SELECT COUNT(*) FROM region;
SELECT COUNT(*) FROM revenue0;
步骤六:查询测试
-
执行查询脚本
执行下面的命令完成查询测试。
./run-tpch-queries.sh
测试SQL详情请参见TPCH-Query-SQL。
说明 目前 SelectDB 的查询优化器和统计信息功能仍有提升空间,所以我们在 TPC-H 中重写了一些查询以适应SelectDB的执行框架,但不影响结果的正确性。
-
单个 SQL 执行
下面是本次测试时使用的 SQL 语句,您也可以从代码库中获取最新的查询语句,最新测试查询语句请参见TPC-H 测试查询语句。
--ENV config
set global experimental_enable_nereids_planner=true;
set global experimental_enable_pipeline_engine=true;
set global enable_runtime_filter_prune=false;
set global runtime_filter_wait_time_ms=10000;
set global enable_fallback_to_original_planner=false;
set global query_timeout=1000;
--Q1
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '90' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
--Q2
select
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
partsupp join
(
select
ps_partkey as a_partkey,
min(ps_supplycost) as a_min
from
partsupp,
part,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
and p_size = 15
and p_type like '%BRASS'
group by a_partkey
) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
part,
supplier,
nation,
region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15
and p_type like '%BRASS'
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;
--Q3
select l_orderkey,
sum(l_extendedprice * (1 - l_discount)) as revenue,
o_orderdate,
o_shippriority
from
(
select l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey from
lineitem join orders
where l_orderkey = o_orderkey
and o_orderdate < date '1995-03-15'
and l_shipdate > date '1995-03-15'
) t1 join customer c
on c.c_custkey = t1.o_custkey
where c_mktsegment = 'BUILDING'
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;
--Q4
select o_orderpriority,
count(*) as order_count
from
(
select
*
from
lineitem
where l_commitdate < l_receiptdate
) t1
right semi join orders
on t1.l_orderkey = o_orderkey
where
o_orderdate >= date '1993-07-01'
and o_orderdate < date '1993-07-01' + interval '3' month
group by
o_orderpriority
order by
o_orderpriority;
--Q5
select n_name,
sum(l_extendedprice * (1 - l_discount)) as revenue
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and l_suppkey = s_suppkey
and c_nationkey = s_nationkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'ASIA'
and o_orderdate >= date '1994-01-01'
and o_orderdate < date '1994-01-01' + interval '1' year
group by
n_name
order by
revenue desc;
--Q6
select sum(l_extendedprice * l_discount) as revenue
from
lineitem
where
l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
and l_discount between .06 - 0.01 and .06 + 0.01
and l_quantity < 24;
--Q7
select supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
(
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
extract(year from l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) as volume
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and o_orderkey = l_orderkey
and c_custkey = o_custkey
and s_nationkey = n1.n_nationkey
and c_nationkey = n2.n_nationkey
and (
(n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
)
and l_shipdate between date '1995-01-01' and date '1996-12-31'
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
--Q8
select o_year,
sum(case
when nation = 'BRAZIL' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
lineitem,
orders,
customer,
supplier,
part,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'AMERICA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'ECONOMY ANODIZED STEEL'
) as all_nations
group by
o_year
order by
o_year;
--Q9
select nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
lineitem join orders on o_orderkey = l_orderkey
join[shuffle] part on p_partkey = l_partkey
join[shuffle] partsupp on ps_partkey = l_partkey
join[shuffle] supplier on s_suppkey = l_suppkey
join[broadcast] nation on s_nationkey = n_nationkey
where
ps_suppkey = l_suppkey and
p_name like '%green%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
--Q10
select c_custkey,
c_name,
sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
(
select o_custkey,l_extendedprice,l_discount from lineitem, orders
where l_orderkey = o_orderkey
and o_orderdate >= date '1993-10-01'
and o_orderdate < date '1993-10-01' + interval '3' month
and l_returnflag = 'R'
) t1,
nation
where
c_custkey = t1.o_custkey
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;
--Q11
select ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
(
select s_suppkey
from supplier, nation
where s_nationkey = n_nationkey and n_name = 'GERMANY'
) B
where
ps_suppkey = B.s_suppkey
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * 0.000002
from
partsupp,
(select s_suppkey
from supplier, nation
where s_nationkey = n_nationkey and n_name = 'GERMANY'
) A
where
ps_suppkey = A.s_suppkey
)
order by
value desc;
--Q12
select l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in ('MAIL', 'SHIP')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date '1994-01-01'
and l_receiptdate < date '1994-01-01' + interval '1' year
group by
l_shipmode
order by
l_shipmode;
--Q13
select c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey) as c_count
from
orders right outer join customer on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders
group by
c_count
order by
custdist desc,
c_count desc;
--Q14
select 100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
part,
lineitem
where
l_partkey = p_partkey
and l_shipdate >= date '1995-09-01'
and l_shipdate < date '1995-09-01' + interval '1' month;
--Q15
select s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
)
order by
s_suppkey;
--Q16
select p_brand,
p_type,
p_size,
count(distinct ps_suppkey) as supplier_cnt
from
partsupp,
part
where
p_partkey = ps_partkey
and p_brand <> 'Brand#45'
and p_type not like 'MEDIUM POLISHED%'
and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
and ps_suppkey not in (
select
s_suppkey
from
supplier
where
s_comment like '%Customer%Complaints%'
)
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
--Q17
select sum(l_extendedprice) / 7.0 as avg_yearly
from
lineitem join [broadcast]
part p1 on p1.p_partkey = l_partkey
where
p1.p_brand = 'Brand#23'
and p1.p_container = 'MED BOX'
and l_quantity < (
select
0.2 * avg(l_quantity)
from
lineitem join [broadcast]
part p2 on p2.p_partkey = l_partkey
where
l_partkey = p1.p_partkey
and p2.p_brand = 'Brand#23'
and p2.p_container = 'MED BOX'
);
--Q18
select c_name,
c_custkey,
t3.o_orderkey,
t3.o_orderdate,
t3.o_totalprice,
sum(t3.l_quantity)
from
customer join
(
select * from
lineitem join
(
select * from
orders left semi join
(
select
l_orderkey
from
lineitem
group by
l_orderkey having sum(l_quantity) > 300
) t1
on o_orderkey = t1.l_orderkey
) t2
on t2.o_orderkey = l_orderkey
) t3
on c_custkey = t3.o_custkey
group by
c_name,
c_custkey,
t3.o_orderkey,
t3.o_orderdate,
t3.o_totalprice
order by
t3.o_totalprice desc,
t3.o_orderdate
limit 100;
--Q19
select sum(l_extendedprice* (1 - l_discount)) as revenue
from
lineitem,
part
where
(
p_partkey = l_partkey
and p_brand = 'Brand#12'
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
and l_quantity >= 1 and l_quantity <= 1 + 10
and p_size between 1 and 5
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
and p_size between 1 and 10
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
)
or
(
p_partkey = l_partkey
and p_brand = 'Brand#34'
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
and l_quantity >= 20 and l_quantity <= 20 + 10
and p_size between 1 and 15
and l_shipmode in ('AIR', 'AIR REG')
and l_shipinstruct = 'DELIVER IN PERSON'
);
--Q20
select s_name, s_address from
supplier left semi join
(
select * from
(
select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
from lineitem
where l_shipdate >= date '1994-01-01'
and l_shipdate < date '1994-01-01' + interval '1' year
group by l_partkey,l_suppkey
) t2 join
(
select ps_partkey, ps_suppkey, ps_availqty
from partsupp left semi join part
on ps_partkey = p_partkey and p_name like 'forest%'
) t1
on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
and t1.ps_availqty > t2.l_q
) t3
on s_suppkey = t3.ps_suppkey
join nation
where s_nationkey = n_nationkey
and n_name = 'CANADA'
order by s_name;
--Q21
select s_name, count(*) as numwait
from
lineitem l2 right semi join
(
select * from
lineitem l3 right anti join
(
select * from
orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus = 'F'
join
(
select * from
supplier join nation
where s_nationkey = n_nationkey
and n_name = 'SAUDI ARABIA'
) t1
where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
) t2
on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and l3.l_receiptdate > l3.l_commitdate
) t3
on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey
group by
t3.s_name
order by
numwait desc,
t3.s_name
limit 100;
--Q22
with tmp as (select
avg(c_acctbal) as av
from
customer
where
c_acctbal > 0.00
and substring(c_phone, 1, 2) in
('13', '31', '23', '29', '30', '18', '17'))
select cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
(
select
substring(c_phone, 1, 2) as cntrycode,
c_acctbal
from
orders right anti join customer c on o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av
where
substring(c_phone, 1, 2) in
('13', '31', '23', '29', '30', '18', '17')
) as custsale
group by
cntrycode
order by
cntrycode;
步骤七:测试结果
以下将给出 TPCH 100GB 和 500GB 的测试结果:
Query | TPCH 100GB(s) | TPCH 500GB(s) |
---|---|---|
Q1 | 1.74 | 10.04 |
Q2 | 0.07 | 0.19 |
Q3 | 0.34 | 3.43 |
Q4 | 0.19 | 1.1 |
Q5 | 0.81 | 7.52 |
Q6 | 0.03 | 0.15 |
Q7 | 0.54 | 5.74 |
Q8 | 0.26 | 2.56 |
Q9 | 2.62 | 18.44 |
Q10 | 0.91 | 5.45 |
Q11 | 0.08 | 0.36 |
Q12 | 0.09 | 0.47 |
Q13 | 1.32 | 6.6 |
Q14 | 0.12 | 0.59 |
Q15 | 0.18 | 0.85 |
Q16 | 0.28 | 1.17 |
Q17 | 0.1 | 0.45 |
Q18 | 1.7 | 9.94 |
Q19 | 0.18 | 1.9 |
Q20 | 0.39 | 0.62 |
Q21 | 0.65 | 7.23 |
Q22 | 0.19 | 1.04 |
合计 | 12.79 | 85.84 |