ClickHouse 18.12.13-2018-09-10: New Features Explained


ClickHouse is well known for its rapid release pace.

The team recently released version 18.12.13-2018-09-10.

The accompanying CHANGELOG is strikingly long.

To help make better use of the new features, this post walks through them in detail.

The new-features section was translated and proofread by hand, since it is the most important part; the later sections are machine translated.

New feature list (in the order of the GitHub CHANGELOG)

Decimal support

  • Added the DECIMAL(digits, scale) data type (Decimal32(scale), Decimal64(scale), Decimal128(scale)). To enable it, use the setting allow_experimental_decimal_type. #2846 #2970 #3008 #3047
SELECT *
FROM data_type_families
WHERE name LIKE '%De%'

┌─name───────┬─case_insensitive─┬─alias_to─┐
│ Decimal32  │                1 │          │
│ Decimal64  │                1 │          │
│ Decimal128 │                1 │          │
│ Decimal    │                1 │          │
└────────────┴──────────────────┴──────────┘
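
A minimal sketch (not from the original post) of trying the type out, assuming the experimental setting is enabled for the session; the table name dec_test is made up for illustration, and toDecimal64 converts a string to Decimal64 with the given scale:

SET allow_experimental_decimal_type = 1

-- dec_test is an illustrative table name
CREATE TABLE dec_test
(
    amount Decimal64(4)
)
ENGINE = Memory

INSERT INTO dec_test VALUES (toDecimal64('3.1415', 4))

SELECT
    amount,
    toTypeName(amount) AS type
FROM dec_test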

New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…))

  • New WITH ROLLUP modifier for GROUP BY (alternative syntax: GROUP BY ROLLUP(…)). #2948
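
A quick sketch (not from the original post), grouping generated numbers by number % 2; WITH ROLLUP adds a total row with the grouping key at its default value:

SELECT
    number % 2 AS k,
    count() AS c,
    sum(number) AS s
FROM numbers(10)
GROUP BY k WITH ROLLUP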

JOIN queries now expand * to the columns of all tables

  • In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting asterisk_left_columns_only to 1 on the user configuration level. Winter Zhang
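
A hedged sketch of the change (t1 and t2 are hypothetical tables sharing a key column k; the changelog describes the setting at the user configuration level, the SET form is shown here only for illustration):

-- t1, t2: hypothetical tables with a common key column k
-- * now expands to the columns of both t1 and t2
SELECT *
FROM t1
ALL INNER JOIN t2 USING (k)

-- restore the old behavior (only the left table's columns)
SET asterisk_left_columns_only = 1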

JOIN supports table functions (remote / merge / numbers / url)

  • Added support for JOIN with table functions. Winter Zhang
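
A minimal sketch (not from the original post), joining the numbers table function against itself; remote, merge and url can be used the same way:

SELECT number
FROM numbers(10)
ALL INNER JOIN numbers(5) USING (number)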

Tab completion in clickhouse-client

  • Autocomplete by pressing Tab in clickhouse-client. Sergey Shcherbin

Ctrl+C in clickhouse-client clears the entered query

  • Ctrl+C in clickhouse-client clears a query that was entered. #2877

The default JOIN strictness can now be configured

  • Added the join_default_strictness setting (values: '', 'any', 'all'). This allows you to not specify ANY or ALL for JOIN. #2982
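
A sketch of how the setting might be used (value casing follows the changelog entry above): once a default strictness is set, a plain JOIN no longer needs ANY or ALL.

SET join_default_strictness = 'all'

SELECT number
FROM numbers(10)
INNER JOIN numbers(5) USING (number)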

The server log shows the query ID

  • Each line of the server log related to query processing shows the query ID. #2482
2018.09.15 23:01:12.934700 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: (from xx.xx.80.34:37066, user: user) select * from numbers(100)
2018.09.15 23:01:12.934984 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2018.09.15 23:01:12.935029 [ 86 ] <Trace> SystemLog (system.query_log): Flushing system log
2018.09.15 23:01:12.935036 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Limit
   Numbers

2018.09.15 23:01:12.935391 [ 277 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Trace> ThreadStatus: Thread 277 exited
2018.09.15 23:01:12.935449 [ 87 ] <Trace> SystemLog (system.query_thread_log): Flushing system log
2018.09.15 23:01:12.935517 [ 275 ] {b7a52046-e852-41c9-9e13-54533917bc56} <Information> executeQuery: Read 100 rows, 800.00 B in 0.001 sec., 146631 rows/sec., 1.12 MiB/sec.
2018.09.15 23:01:12.935557 [ 275 ] <Debug> MemoryTracker: Peak memory usage (total): 1.00 MiB.
2018.09.15 23:01:12.935572 [ 275 ] <Information> TCPHandler: Processed in 0.001 sec.

Query logs can be printed directly in the client

  • Now you can get query execution logs in clickhouse-client (use the send_logs_level setting). With distributed query processing, logs are cascaded from all the servers. #2482
SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ none  │       0 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 rows in set. Elapsed: 0.002 sec.

xx.xx.xx.xx. :) set send_logs_level = 'trace';

SET send_logs_level = 'trace'

Ok.

0 rows in set. Elapsed: 0.001 sec.

xx.xx.xx.xx. :) select * from system.settings where name = 'send_logs_level' ;

SELECT *
FROM system.settings
WHERE name = 'send_logs_level'

[xx.xx.xx.xx.] 2018.09.15 23:30:51.158294 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: (from 127.0.0.1:59056, user: user) select * from system.settings where name = 'send_logs_level'
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159056 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159152 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Debug> executeQuery: Query pipeline:
Expression
 Expression
  Filter
   One

┌─name────────────┬─value─┬─changed─┬─description──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ send_logs_level │ trace │       1 │ Send server text logs with specified minumum level to client. Valid values: 'trace', 'debug', 'info', 'warning', 'error', 'none' │
└─────────────────┴───────┴─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159654 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 398 ] <Trace> ThreadStatus: Thread 398 exited
[xx.xx.xx.xx.] 2018.09.15 23:30:51.159803 {dbecff24-3edc-4584-8379-02cb23aedcb2} [ 275 ] <Information> executeQuery: Read 178 rows, 25.15 KiB in 0.001 sec., 127014 rows/sec., 17.53 MiB/sec.


# Logs from distributed query processing are also printed to the current client

Changed settings are recorded in query_log

  • The system.query_log and system.processes (SHOW PROCESSLIST) tables now have information about all changed settings when you run a query (the nested structure of the Settings data). Added the log_query_settings setting. #2482
SELECT *
FROM system.query_log
ORDER BY event_time DESC
LIMIT 1

Row 1:
──────
type:                 1
event_date:           2018-09-15
event_time:           2018-09-15 23:42:17
query_start_time:     2018-09-15 23:42:17
query_duration_ms:    0
read_rows:            0
read_bytes:           0
written_rows:         0
written_bytes:        0
result_rows:          0
result_bytes:         0
memory_usage:         0
query:                select * from system.metrics
exception:
stack_trace:
is_initial_query:     1
user:                 user
query_id:             0881d528-a79c-4bd7-8c0a-38ce270b95f1
address:              xx.xx.xx.xx
port:                 33384
initial_user:         user
initial_query_id:     0881d528-a79c-4bd7-8c0a-38ce270b95f1
initial_address:      xx.xx.xx.xx
initial_port:         33384
interface:            2
os_user:
client_hostname:
client_name:
client_revision:      0
client_version_major: 0
client_version_minor: 0
client_version_patch: 0
http_method:          1
http_user_agent:      Go-http-client/1.1
quota_key:
revision:             54407
thread_numbers:       []
ProfileEvents.Names:  []
ProfileEvents.Values: []
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','readonly','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','1','32212254720']

Thread numbers are recorded

  • The system.query_log and system.processes tables now show information about the number of threads that are participating in query execution (see the thread_numbers column). #2482
SELECT thread_numbers
FROM system.query_log
ORDER BY event_time DESC
LIMIT 10

┌─thread_numbers─┐
│ []             │
│ [88]           │
│ []             │
│ [77,173]       │
│ []             │
│ []             │
│ [77,174]       │
│ []             │
│ [77,171]       │
│ []             │
└────────────────┘


SELECT thread_numbers
FROM system.processes
LIMIT 10

┌─thread_numbers─┐
│ [77]           │
└────────────────┘

More detailed ProfileEvents statistics

  • Added ProfileEvents counters that measure the time spent on reading and writing over the network and reading and writing to disk, the number of network errors, and the time spent waiting when network bandwidth is limited. #2482

  • Added ProfileEvents counters that contain the system metrics from rusage (you can use them to get information about CPU usage in userspace and the kernel, page faults, and context switches), as well as taskstats metrics (use these to obtain information about I/O wait time, CPU wait time, and the amount of data read and recorded, both with and without page cache). #2482

  • The ProfileEvents counters are applied globally and for each query, as well as for each query execution thread, which allows you to profile resource consumption by query in detail. #2482

SELECT *
FROM system.processes
LIMIT 10

Row 1:
──────
is_initial_query:     1
user:                 user
query_id:             7e65449f-8899-4c5f-8859-20eeae32d1b1
address:              127.0.0.1
port:                 38610
initial_user:         user
initial_query_id:     7e65449f-8899-4c5f-8859-20eeae32d1b1
initial_address:      127.0.0.1
initial_port:         38610
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
client_revision:      54407
http_method:          0
http_user_agent:
quota_key:
elapsed:              0.000800632
is_cancelled:         0
read_rows:            0
read_bytes:           0
total_rows_approx:    0
written_rows:         0
written_bytes:        0
memory_usage:         880
peak_memory_usage:    880
query:                select * from system.processes  limit 10
thread_numbers:       [181]
ProfileEvents.Names:  ['Query','SelectQuery','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','ContextLock','RWLockAcquiredReadLocks']
ProfileEvents.Values: [1,1,36,1,10,1,57,3,1]
Settings.Names:       ['max_threads','use_uncompressed_cache','background_pool_size','load_balancing','log_queries','max_memory_usage']
Settings.Values:      ['48','0','64','random','1','64424509440']

Information about each query execution thread

  • Added the system.query_thread_log table, which contains information about each query execution thread. Added the log_query_threads setting. #2482
SELECT *
FROM system.query_thread_log
ORDER BY event_time DESC
LIMIT 3

Row 1:
──────
event_date:           2018-09-15
event_time:           2018-09-15 22:29:17
query_start_time:     2018-09-15 22:29:17
query_duration_ms:    4
read_rows:            2178
read_bytes:           1446750
written_rows:         0
written_bytes:        0
memory_usage:         27136
peak_memory_usage:    44996168
thread_name:          ParalInputsProc
thread_number:        252
os_thread_id:         19255
master_thread_number: 74
master_os_thread_id:  9227
query:                select * from system.query_thread_log order by event_time desc  limit 10
is_initial_query:     1
user:                 user
query_id:             55740e27-e796-4b0f-a9ff-530363f91d76
address:              xx.xx.xx.xx
port:                 52456
initial_user:         user
initial_query_id:     55740e27-e796-4b0f-a9ff-530363f91d76
initial_address:      xx.xx.xx.xx
initial_port:         52456
interface:            1
os_user:              root
client_hostname:      xx.xx.xx.xx.
client_name:          ClickHouse client
client_revision:      54407
client_version_major: 18
client_version_minor: 12
client_version_patch: 13
http_method:          0
http_user_agent:
quota_key:
revision:             54407
ProfileEvents.Names:  ['FileOpen','ReadBufferFromFileDescriptorRead','ReadBufferFromFileDescriptorReadBytes','ReadCompressedBytes','CompressedReadBufferBlocks','CompressedReadBufferBytes','IOBufferAllocs','IOBufferAllocBytes','MarkCacheMisses','CreatedReadBufferOrdinary','DiskReadElapsedMicroseconds','ContextLock','RealTimeMicroseconds','UserTimeMicroseconds','SystemTimeMicroseconds','SoftPageFaults']
ProfileEvents.Values: [80,118,231033,229113,46,1101082,80,42862276,40,40,1135,2,4328,890,2671,678]

Built-in documentation in system.metrics and system.events

  • The system.metrics and system.events tables now have built-in documentation. #3016
SELECT *
FROM system.metrics

┌─metric───────────┬─value─┬─description───────────────────────────────────┐
│ Query            │     1 │ Number of executing queries                    │
│ Merge            │     0 │ Number of executing background merges          │
│ PartMutation     │     0 │ Number of mutations (ALTER DELETE/UPDATE)      │
│ ReplicatedFetch  │     0 │ Number of data parts fetching from replica     │
│ ReplicatedSend   │     0 │ Number of data parts sending to replicas       │
│ ReplicatedChecks │     0 │ Number of data parts checking for consistency  │


SELECT *
FROM system.events
LIMIT 10

┌─event───────────────────────────────────┬────value─┬─description────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
 Query                                         127  Number of queries started to be interpreted and maybe executed. Does not include queries that are failed to parse, that are rejected due to AST size limits; rejected due to quota limits or limits on number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. 
 SelectQuery                                   124  Same as Query, but only for SELECT queries.                                                                                                                                                                                                                
 FileOpen                                    54504  Number of files opened.                                                                                                                                                                                                                                    
 Seek                                           47  Number of times the 'lseek' function was called.

Added the arrayEnumerateDense function

  • Added the arrayEnumerateDense function. Amos Bird
SELECT arrayEnumerateDense([(1, 2), (3, 4), (1, 2), (1, 2), (2, 3), (2, 3)])

┌─arrayEnumerateDense(array(tuple(1, 2), tuple(3, 4), tuple(1, 2), tuple(1, 2), tuple(2, 3), tuple(2, 3)))─┐
│ [1,2,1,1,3,3]                                                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Added the arrayCumSumNonNegative / arrayDifference functions

  • Added the arrayCumSumNonNegative and arrayDifference functions. Aleksey Studnev

  • These functions do not appear to be available in this build yet; their expected behavior is sketched below.
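
For reference, a sketch of the expected behavior based on the functions' descriptions (not verified on this build):

-- arrayDifference returns element-wise differences with the first element as 0,
-- e.g. [1, 2, 3, 5] -> [0, 1, 1, 2]
SELECT arrayDifference([1, 2, 3, 5])

-- arrayCumSumNonNegative is a cumulative sum that is clamped at 0 whenever
-- the running total would become negative, e.g. [1, 1, -4, 1] -> [1, 2, 0, 1]
SELECT arrayCumSumNonNegative([1, 1, -4, 1])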

Added the retention aggregate function

  • Added the retention aggregate function. Sundy Li
SELECT
    uid,
    retention(date = '2018-08-06', date = '2018-08-07', date = '2018-08-08') AS r
FROM retention_test
WHERE date IN ('2018-08-06', '2018-08-07', '2018-08-08')
GROUP BY uid
ORDER BY uid ASC
LIMIT 3

┌─uid─┬─r───────┐
│   0 │ [1,1,1] │
│   1 │ [1,1,1] │
│   2 │ [1,1,1] │
└─────┴─────────┘

Aggregate function states can be combined with the plus operator

  • Now you can add (merge) states of aggregate functions by using the plus operator, and multiply the states of aggregate functions by a nonnegative constant. #3062 #3034
CREATE TABLE add_aggregate
(
    a UInt32,
    b UInt32
)
ENGINE = Memory

INSERT INTO add_aggregate VALUES(1, 2);

INSERT INTO add_aggregate VALUES(3, 1);


SELECT minMerge(x)
FROM
(
    SELECT minState(a) + minState(b) AS x
    FROM add_aggregate
)

┌─minMerge(x)─┐
│           1 │
└─────────────┘

Virtual column _partition_id

  • Tables in the MergeTree family now have the virtual column _partition_id. #3089
SELECT _partition_id
FROM test.partition_id
ORDER BY _partition_id ASC

┌─_partition_id─┐
│ 197004        │
│ 197004        │
│ 197007        │
│ 197007        │
│ 197010        │
│ 197010        │
│ 201809        │
│ 201809        │
│ 201809        │
└───────────────┘



SELECT *
FROM test.partition_id
ORDER BY d ASC

┌──────────d─┬───x─┐
│ 1970-04-11 │   1 │
│ 1970-04-11 │   1 │
│ 1970-07-20 │   2 │
│ 1970-07-20 │   2 │
│ 1970-10-28 │   3 │
│ 1970-10-28 │   3 │
│ 2018-09-13 │ 100 │
│ 2018-09-14 │ 100 │
│ 2018-09-15 │ 100 │
└────────────┴─────┘

Bug fixes:

  • Fixed an issue with Dictionary tables (throws the Size of offsets doesn't match size of column or Unknown compression method exception). This bug appeared in version 18.10.3. #2913
  • Fixed a bug when merging CollapsingMergeTree tables if one of the data parts is empty (these parts are formed during a merge or by ALTER DELETE when all data has been deleted) and the vertical algorithm was used for the merge. #3049
  • Fixed a race condition during DROP or TRUNCATE for Memory tables with a simultaneous SELECT, which could lead to server crashes. This bug appeared in version 1.1.54388. #3038
  • Fixed the possibility of data loss when inserting into Replicated tables if the Session is expired error is returned (data loss can be detected by the ReplicatedDataLoss metric). This error occurred in version 1.1.54378. #2939 #2949 #2964
  • Fixed a segfault during JOIN ... ON. #3000
  • Fixed incorrect column name lookup when a WHERE expression consists entirely of a qualified column name, such as WHERE table.column. #2994
  • Fixed the "Not found column" error that occurred when executing distributed queries, if a single column consisting of an IN expression with a subquery is requested from a remote server. #3087
  • Fixed the Block structure mismatch in UNION stream: different number of columns error that occurred for distributed queries if one of the shards is local and the other is not, and the optimization that moves conditions to PREWHERE is triggered. #2226 #3037 #3055 #3065 #3073 #3090 #3093
  • Fixed the pointInPolygon function for certain cases of non-convex polygons. #2910
  • Fixed the incorrect result when comparing nan with integers. #3024
  • Fixed an error in the zlib-ng library that could lead to a segfault in rare cases. #2854
  • Fixed a memory leak when inserting into a table with AggregateFunction columns, if the state of the aggregate function is not simple (allocates memory separately), and if a single insert request results in multiple small blocks. #3084
  • Fixed a race condition when creating and deleting the same Buffer or MergeTree table simultaneously.
  • Fixed the possibility of a segfault when comparing tuples made up of certain non-trivial types, such as tuples. #2989
  • Fixed the possibility of a segfault when running certain ON CLUSTER queries. Winter Zhang
  • Fixed an error in the arrayDistinct function for Nullable array elements. #2845 #2937
  • The enable_optimize_predicate_expression option now correctly supports cases with SELECT *. Winter Zhang
  • Fixed a segfault when re-initializing the ZooKeeper session. #2917
  • Fixed potential blocking when working with ZooKeeper.
  • Fixed incorrect code for adding nested data structures in a SummingMergeTree table.
  • When allocating memory for states of aggregate functions, alignment is now taken into account correctly, which makes it possible to use operations that require alignment when implementing states of aggregate functions. chenxing-xc

Security fixes:

  • Safe use of ODBC data sources. Interaction with ODBC drivers now uses a separate clickhouse-odbc-bridge process, so errors in third-party ODBC drivers no longer cause problems with server stability or vulnerabilities. #2828 #2879 #2886 #2893 #2921
  • Fixed incorrect validation of the file path in the catBoostPool table function. #2894
  • The contents of the system tables (tables, databases, parts, columns, parts_columns, merges, mutations, replicas, and replication_queue) are now filtered according to the user's configured access to databases (allow_databases). Winter Zhang

Backward incompatible changes:

  • In requests with JOIN, the star character expands to a list of columns in all tables, in compliance with the SQL standard. You can restore the old behavior by setting asterisk_left_columns_only to 1 on the user configuration level.

Build changes:

  • Most integration tests can now be run by commit.
  • Code style checks can also be run by commit.
  • The memcpy implementation is chosen correctly when building on CentOS7/Fedora. Etienne Champetier
  • When building with clang, some warnings from -Weverything have been added, in addition to the regular -Wall -Wextra -Werror. #2957
  • Debugging the build uses the jemalloc debug option.
  • The interface of the library for interacting with ZooKeeper is declared abstract. #2950

