跳到主要内容

TPC-H Benchmark

SelectDB Cloud 致力于提供极致性能、简单易用的云上实时数据分析服务,在宽表聚合、多表关联和高并发点查等场景下均具有优异的性能表现。本文将介绍 SelectDB 在 TPC-H 标准测试上的测试方法和测试结果。

概述

TPC-H 是一个决策支持基准(Decision Support Benchmark),它由一套面向业务的特别查询和并发数据修改组成,使用的数据具有广泛的行业相关性。该基准测试通过一系列的查询操作,来评估数据库系统在处理复杂查询和数据挖掘任务时的性能。

说明

  • 本文 TPC-H 的实现基于 TPC-H 的基准测试,并不符合 TPC-H 基准测试的所有要求。本测试结果不能等同于完全遵守 TPC-H 测试规范所获得的测试结果,因此不能与完全遵守该测试规范获得的测试结果进行对比。
  • TPC-H 在内的标准测试集通常和实际业务场景差距较大,并且部分测试会针对测试集进行参数调优。所以标准测试集的测试结果仅能反映数据库在特定场景下的性能表现。建议使用实际业务数据进行进一步的测试。

测试环境

  • 数据库环境

    环境配置项配置说明
    地域和可用区华东1(杭州)地域,可用区K
    规格64核 512GB
    磁盘800GB 高性能云硬盘
    SelectDB Cloud 内核版本3.0.6
  • 客户端环境

    环境配置项配置说明
    下载测试工具的设备云服务器ECS实例
    地域和可用区华东1(杭州)地域
    实例规格ecs.g7.2xlarge
    操作系统Ubuntu 22.04.1 LTS
    网络用户专有网络(VPC)

测试数据集

整个测试模拟生成 TPC-H 100GB 和 500GB 的数据并导入 SelectDB 进行测试,下面是测试 100GB 数据表的相关及数据量。

TPC-H表名行数导入后大小备注
REGION5400KB区域表
NATION257.714KB国家表
SUPPLIER100万85.528MB供应商表
PART2000万752.330MB零部件表
PARTSUPP8000万4.375GB零部件供应表
CUSTOMER1500万1.317GB客户表
ORDERS1.5亿6.301GB订单表
LINEITEM6亿20.882GB订单明细表

测试步骤

如下介绍进行测试所需要的前置工作和测试步骤。涉及的脚本获取请参见测试工具

步骤一:安装 unzip 工具

  1. 安装 unzip,示例如下:

    sudo apt install unzip

    这个命令将会安装 unzip 以及它的所有依赖项。

  2. 安装完成后,可以通过运行以下命令来验证 unzip 是否已成功安装。

    unzip --help

步骤二:下载安装 TPC-H 数据生成工具

从上述脚本库中获取脚本后,解压脚本文件并进入对应目录,执行以下指令,下载并编译 tpch-dbgen 工具,示例如下:

tar -zxvf yaochi_performance_tool.tar.gz
cd ./yaochi_performance_tool/tpch-tools/bin
bash build-tpch-dbgen.sh

安装成功后,将在TPC-H_Tools_v3.0.0/目录下生成dbgen二进制文件。

步骤三:生成 TPC-H 测试集

在安装测试工具目录执行以下脚本生成 TPC-H 数据集,示例如下:

cd ./yaochi_performance_tool/tpch-tools/bin
bash gen-tpch-data.sh

数据会以.tbl为后缀在tpch-data/目录下生成,默认情况下的文件总大小约 100GB。生成时间可能在数分钟到 1 小时不等。

步骤三:建表

  1. 准备doris-cluster.conf文件

    在调用导入脚本前,需要将测试数据库的连接信息写在doris-cluster.conf文件中。文件位置在tpch-tools/conf/目录下,内容包括连接集群的地址、HTTP 端口、用户名、密码和待导入数据的 DB。

    export FE_HOST="xxx"
    export FE_HTTP_PORT="8080"
    export FE_QUERY_PORT="9030"
    export USER="root"
    export PASSWORD='xxx'
    export DB="tpch1"

    说明 您可以在 SelectD Cloud 平台上选择仓库,点击 连接 > 查看当前仓库的 私有连接地址 (或 公网连接地址 )以及 HTTP协议端口,然后替换上述脚本中的变量。

  2. 创建TPC-H表。

    ./yaochi_performance_tool/tpch-tools/in目录下执行脚本来自动创建测试用表。

    bash create-tpch-tables.sh

步骤四:导入数据

./yaochi_performance_tool/tpch-tools/bin目录下执行脚本完成测试集数据导入。

bash ./load-tpch-data.sh

步骤五:检查导入数据

按照上述流程和参数执行的场合,数据量应和上文[测试数据集]给出的生成数据的行数一致。

SELECT  COUNT(*) FROM lineitem;
SELECT COUNT(*) FROM orders;
SELECT COUNT(*) FROM partsupp;
SELECT COUNT(*) FROM part;
SELECT COUNT(*) FROM customer;
SELECT COUNT(*) FROM supplier;
SELECT COUNT(*) FROM nation;
SELECT COUNT(*) FROM region;
SELECT COUNT(*) FROM revenue0;

步骤六:查询测试

  • 执行查询脚本

    执行下面的命令完成查询测试。

    ./run-tpch-queries.sh

    测试SQL详情请参见TPCH-Query-SQL

    说明 目前 SelectDB 的查询优化器和统计信息功能仍有提升空间,所以我们在 TPC-H 中重写了一些查询以适应SelectDB的执行框架,但不影响结果的正确性。

  • 单个 SQL 执行

    下面是本次测试时使用的 SQL 语句,您也可以从代码库中获取最新的查询语句,最新测试查询语句请参见TPC-H 测试查询语句

    --ENV config
    set global experimental_enable_nereids_planner=true;
    set global experimental_enable_pipeline_engine=true;
    set global enable_runtime_filter_prune=false;
    set global runtime_filter_wait_time_ms=10000;
    set global enable_fallback_to_original_planner=false;
    set global query_timeout=1000;

    --Q1
    select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
    from
    lineitem
    where
    l_shipdate <= date '1998-12-01' - interval '90' day
    group by
    l_returnflag,
    l_linestatus
    order by
    l_returnflag,
    l_linestatus;

    --Q2
    select
    s_acctbal,
    s_name,
    n_name,
    p_partkey,
    p_mfgr,
    s_address,
    s_phone,
    s_comment
    from
    partsupp join
    (
    select
    ps_partkey as a_partkey,
    min(ps_supplycost) as a_min
    from
    partsupp,
    part,
    supplier,
    nation,
    region
    where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'
    and p_size = 15
    and p_type like '%BRASS'
    group by a_partkey
    ) A on ps_partkey = a_partkey and ps_supplycost=a_min ,
    part,
    supplier,
    nation,
    region
    where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = 15
    and p_type like '%BRASS'
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'EUROPE'

    order by
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey
    limit 100;

    --Q3
    select l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
    from
    (
    select l_orderkey, l_extendedprice, l_discount, o_orderdate, o_shippriority, o_custkey from
    lineitem join orders
    where l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-15'
    and l_shipdate > date '1995-03-15'
    ) t1 join customer c
    on c.c_custkey = t1.o_custkey
    where c_mktsegment = 'BUILDING'
    group by
    l_orderkey,
    o_orderdate,
    o_shippriority
    order by
    revenue desc,
    o_orderdate
    limit 10;

    --Q4
    select o_orderpriority,
    count(*) as order_count
    from
    (
    select
    *
    from
    lineitem
    where l_commitdate < l_receiptdate
    ) t1
    right semi join orders
    on t1.l_orderkey = o_orderkey
    where
    o_orderdate >= date '1993-07-01'
    and o_orderdate < date '1993-07-01' + interval '3' month
    group by
    o_orderpriority
    order by
    o_orderpriority;

    --Q5
    select n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
    from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
    where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1994-01-01'
    and o_orderdate < date '1994-01-01' + interval '1' year
    group by
    n_name
    order by
    revenue desc;

    --Q6
    select sum(l_extendedprice * l_discount) as revenue
    from
    lineitem
    where
    l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    and l_discount between .06 - 0.01 and .06 + 0.01
    and l_quantity < 24;

    --Q7
    select supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
    from
    (
    select
    n1.n_name as supp_nation,
    n2.n_name as cust_nation,
    extract(year from l_shipdate) as l_year,
    l_extendedprice * (1 - l_discount) as volume
    from
    supplier,
    lineitem,
    orders,
    customer,
    nation n1,
    nation n2
    where
    s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and c_custkey = o_custkey
    and s_nationkey = n1.n_nationkey
    and c_nationkey = n2.n_nationkey
    and (
    (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
    or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
    )
    and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
    group by
    supp_nation,
    cust_nation,
    l_year
    order by
    supp_nation,
    cust_nation,
    l_year;

    --Q8

    select o_year,
    sum(case
    when nation = 'BRAZIL' then volume
    else 0
    end) / sum(volume) as mkt_share
    from
    (
    select
    extract(year from o_orderdate) as o_year,
    l_extendedprice * (1 - l_discount) as volume,
    n2.n_name as nation
    from
    lineitem,
    orders,
    customer,
    supplier,
    part,
    nation n1,
    nation n2,
    region
    where
    p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01' and date '1996-12-31'
    and p_type = 'ECONOMY ANODIZED STEEL'
    ) as all_nations
    group by
    o_year
    order by
    o_year;

    --Q9
    select nation,
    o_year,
    sum(amount) as sum_profit
    from
    (
    select
    n_name as nation,
    extract(year from o_orderdate) as o_year,
    l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
    from
    lineitem join orders on o_orderkey = l_orderkey
    join[shuffle] part on p_partkey = l_partkey
    join[shuffle] partsupp on ps_partkey = l_partkey
    join[shuffle] supplier on s_suppkey = l_suppkey
    join[broadcast] nation on s_nationkey = n_nationkey
    where
    ps_suppkey = l_suppkey and
    p_name like '%green%'
    ) as profit
    group by
    nation,
    o_year
    order by
    nation,
    o_year desc;

    --Q10
    select c_custkey,
    c_name,
    sum(t1.l_extendedprice * (1 - t1.l_discount)) as revenue,
    c_acctbal,
    n_name,
    c_address,
    c_phone,
    c_comment
    from
    customer,
    (
    select o_custkey,l_extendedprice,l_discount from lineitem, orders
    where l_orderkey = o_orderkey
    and o_orderdate >= date '1993-10-01'
    and o_orderdate < date '1993-10-01' + interval '3' month
    and l_returnflag = 'R'
    ) t1,
    nation
    where
    c_custkey = t1.o_custkey
    and c_nationkey = n_nationkey
    group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
    order by
    revenue desc
    limit 20;

    --Q11
    select ps_partkey,
    sum(ps_supplycost * ps_availqty) as value
    from
    partsupp,
    (
    select s_suppkey
    from supplier, nation
    where s_nationkey = n_nationkey and n_name = 'GERMANY'
    ) B
    where
    ps_suppkey = B.s_suppkey
    group by
    ps_partkey having
    sum(ps_supplycost * ps_availqty) > (
    select
    sum(ps_supplycost * ps_availqty) * 0.000002
    from
    partsupp,
    (select s_suppkey
    from supplier, nation
    where s_nationkey = n_nationkey and n_name = 'GERMANY'
    ) A
    where
    ps_suppkey = A.s_suppkey
    )
    order by
    value desc;

    --Q12
    select l_shipmode,
    sum(case
    when o_orderpriority = '1-URGENT'
    or o_orderpriority = '2-HIGH'
    then 1
    else 0
    end) as high_line_count,
    sum(case
    when o_orderpriority <> '1-URGENT'
    and o_orderpriority <> '2-HIGH'
    then 1
    else 0
    end) as low_line_count
    from
    orders,
    lineitem
    where
    o_orderkey = l_orderkey
    and l_shipmode in ('MAIL', 'SHIP')
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '1994-01-01'
    and l_receiptdate < date '1994-01-01' + interval '1' year
    group by
    l_shipmode
    order by
    l_shipmode;

    --Q13
    select c_count,
    count(*) as custdist
    from
    (
    select
    c_custkey,
    count(o_orderkey) as c_count
    from
    orders right outer join customer on
    c_custkey = o_custkey
    and o_comment not like '%special%requests%'
    group by
    c_custkey
    ) as c_orders
    group by
    c_count
    order by
    custdist desc,
    c_count desc;

    --Q14
    select 100.00 * sum(case
    when p_type like 'PROMO%'
    then l_extendedprice * (1 - l_discount)
    else 0
    end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
    from
    part,
    lineitem
    where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-09-01' + interval '1' month;

    --Q15
    select s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
    from
    supplier,
    revenue0
    where
    s_suppkey = supplier_no
    and total_revenue = (
    select
    max(total_revenue)
    from
    revenue0
    )
    order by
    s_suppkey;

    --Q16
    select p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
    from
    partsupp,
    part
    where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type not like 'MEDIUM POLISHED%'
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
    and ps_suppkey not in (
    select
    s_suppkey
    from
    supplier
    where
    s_comment like '%Customer%Complaints%'
    )
    group by
    p_brand,
    p_type,
    p_size
    order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

    --Q17
    select sum(l_extendedprice) / 7.0 as avg_yearly
    from
    lineitem join [broadcast]
    part p1 on p1.p_partkey = l_partkey
    where
    p1.p_brand = 'Brand#23'
    and p1.p_container = 'MED BOX'
    and l_quantity < (
    select
    0.2 * avg(l_quantity)
    from
    lineitem join [broadcast]
    part p2 on p2.p_partkey = l_partkey
    where
    l_partkey = p1.p_partkey
    and p2.p_brand = 'Brand#23'
    and p2.p_container = 'MED BOX'
    );

    --Q18
    select c_name,
    c_custkey,
    t3.o_orderkey,
    t3.o_orderdate,
    t3.o_totalprice,
    sum(t3.l_quantity)
    from
    customer join
    (
    select * from
    lineitem join
    (
    select * from
    orders left semi join
    (
    select
    l_orderkey
    from
    lineitem
    group by
    l_orderkey having sum(l_quantity) > 300
    ) t1
    on o_orderkey = t1.l_orderkey
    ) t2
    on t2.o_orderkey = l_orderkey
    ) t3
    on c_custkey = t3.o_custkey
    group by
    c_name,
    c_custkey,
    t3.o_orderkey,
    t3.o_orderdate,
    t3.o_totalprice
    order by
    t3.o_totalprice desc,
    t3.o_orderdate
    limit 100;

    --Q19
    select sum(l_extendedprice* (1 - l_discount)) as revenue
    from
    lineitem,
    part
    where
    (
    p_partkey = l_partkey
    and p_brand = 'Brand#12'
    and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
    and l_quantity >= 1 and l_quantity <= 1 + 10
    and p_size between 1 and 5
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
    and l_quantity >= 10 and l_quantity <= 10 + 10
    and p_size between 1 and 10
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
    p_partkey = l_partkey
    and p_brand = 'Brand#34'
    and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
    and l_quantity >= 20 and l_quantity <= 20 + 10
    and p_size between 1 and 15
    and l_shipmode in ('AIR', 'AIR REG')
    and l_shipinstruct = 'DELIVER IN PERSON'
    );

    --Q20
    select s_name, s_address from
    supplier left semi join
    (
    select * from
    (
    select l_partkey,l_suppkey, 0.5 * sum(l_quantity) as l_q
    from lineitem
    where l_shipdate >= date '1994-01-01'
    and l_shipdate < date '1994-01-01' + interval '1' year
    group by l_partkey,l_suppkey
    ) t2 join
    (
    select ps_partkey, ps_suppkey, ps_availqty
    from partsupp left semi join part
    on ps_partkey = p_partkey and p_name like 'forest%'
    ) t1
    on t2.l_partkey = t1.ps_partkey and t2.l_suppkey = t1.ps_suppkey
    and t1.ps_availqty > t2.l_q
    ) t3
    on s_suppkey = t3.ps_suppkey
    join nation
    where s_nationkey = n_nationkey
    and n_name = 'CANADA'
    order by s_name;

    --Q21
    select s_name, count(*) as numwait
    from
    lineitem l2 right semi join
    (
    select * from
    lineitem l3 right anti join
    (
    select * from
    orders join lineitem l1 on l1.l_orderkey = o_orderkey and o_orderstatus = 'F'
    join
    (
    select * from
    supplier join nation
    where s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
    ) t1
    where t1.s_suppkey = l1.l_suppkey and l1.l_receiptdate > l1.l_commitdate
    ) t2
    on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and l3.l_receiptdate > l3.l_commitdate
    ) t3
    on l2.l_orderkey = t3.l_orderkey and l2.l_suppkey <> t3.l_suppkey

    group by
    t3.s_name
    order by
    numwait desc,
    t3.s_name
    limit 100;

    --Q22
    with tmp as (select
    avg(c_acctbal) as av
    from
    customer
    where
    c_acctbal > 0.00
    and substring(c_phone, 1, 2) in
    ('13', '31', '23', '29', '30', '18', '17'))

    select cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
    from
    (
    select
    substring(c_phone, 1, 2) as cntrycode,
    c_acctbal
    from
    orders right anti join customer c on o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av
    where
    substring(c_phone, 1, 2) in
    ('13', '31', '23', '29', '30', '18', '17')
    ) as custsale
    group by
    cntrycode
    order by
    cntrycode;

步骤七:测试结果

以下将给出 TPCH 100GB 和 500GB 的测试结果:

QueryTPCH 100GB(s)TPCH 500GB(s)
Q11.7410.04
Q20.070.19
Q30.343.43
Q40.191.1
Q50.817.52
Q60.030.15
Q70.545.74
Q80.262.56
Q92.6218.44
Q100.915.45
Q110.080.36
Q120.090.47
Q131.326.6
Q140.120.59
Q150.180.85
Q160.281.17
Q170.10.45
Q181.79.94
Q190.181.9
Q200.390.62
Q210.657.23
Q220.191.04
合计12.7985.84