How the Citus distributed query executor adapts to your Postgres workload

栏目: IT技术 · 发布时间: 3年前

内容简介:In one of our recent releases of theWe call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.In a distributed database like Citus, there ar

In one of our recent releases of the open source Citus extension , we overhauled the way Citus executes distributed SQL queries—with the net effect being some huge improvements in terms of performance, user experience, Postgres compatibility, and resource management. The Citus executor is now able to dynamically adapt to the type of distributed SQL query, ensuring fast response times both for quick index lookups and big analytical queries.

We call this new Citus feature the “adaptive executor” and we thought it would be useful to walk through what the Citus adaptive executor means for Postgres and how it works.

Why we needed to improve upon the original Citus query executors

In a distributed database like Citus, there are two ways of handling a Postgres query. Either:

  • the query contains enough information to determine in which shard (i.e. in which Postgres table on the worker) the data is stored—and to route the query to the right node with minimal overhead, or
  • the query is parallelized across all shards to handle large data volumes

Citus supports both models for handling SQL queries in Postgres, which has allowed Citus to scale bothmulti-tenant SaaS applications as well asreal-time analytics applications, including time series dashboards .

To handle the requirements of these different types of PostgreSQL workloads, Citus traditionally had multiple executor code paths. Previously:

  • the Citus router executor was optimized for routing queries directly to the right node, with minimal overhead, and
  • the Citus real-time executor was optimized for large parallel computations by opening a connection per shard, to query many shards using all available cores

Because the original Citus real-time executor was not designed with distributed transactions in mind, it did not have the ability to reuse connections for multiple shards on the same worker.

On the other hand, because the original Citus router executor was designed for minimal overhead, it explicitly reused connections for all shards.

Using different executors in the same transaction block used to complicate migrations from PostgreSQL to Citus. Moreover, there is a large class of queries that do span across shards, but do not benefit from multi-core parallelism yet still paid the overhead of establishing many connections per Citus worker node.

In one of our recent Citus open source releases, we delivered a single executor that can handle different workloads and solves all the shortcomings of the real-time and router executor in a single unified code path: The Citus adaptive executor.

Introducing the Citus Adaptive Executor

The Citus adaptive executor uses a dynamic pool of connections to each worker node to execute Postgres queries on the shards (SQL tasks).

What this means is: the Citus adaptive executor can execute multiple tasks over a single connection per Citus worker node, to minimize overhead or to parallelize queries across connections per worker node to use multiple cores—giving you the ability to use hundreds of cores and combine the memory of many servers.

The Citus executor parallelizes not only SELECT queries, but also DML (e.g. UPDATE ), DDL (e.g. CREATE INDEX ) and other utility statements (e.g. VACUUM ) across multiple worker nodes and multiple cores. Moreover, these parallelized statements can be part of bigger transaction blocks or stored procedures that are executed as one distributed Postgres transaction.

To ensure queries can always see the result of preceding statements in a transaction block, the adaptive executor first checks whether SQL tasks need to be assigned to a particular connection. If there were preceding writes on the shard(s) that the task accesses, then the executor assigns the SQL task to the connection that did the write—and otherwise the executor assigns the task to the pool to be executed as soon as a connection becomes available.

By dynamically adjusting the pool size to the SQL query, the Citus adaptive executor can handle:

  • queries that are routed to a single worker node,
  • queries that are parallelized across all worker node over a single connection (=core) per node, as well as
  • queries that are parallelized across many cores per worker.

The adaptive executor also enables users to limit the number of connections per worker that a query can open, to handle distributed PostgreSQL queries with high concurrency.

The not-so-secret sauce behind the Adaptive Executor

Part of what makes the executor adaptive is a technique we call “slow start”, which is inspired by the famous TCP Slow Start algorithm that allows TCP to adapt to the available bandwidth and avoid congestion.

Every 10 milliseconds, the number of new connections that the adaptive executor can open to a worker node will double, if there are tasks remaining for that worker node. That way, if a distributed query does a quick ~1ms index lookup on each shard, the executor will never open more than a single connection per node. On the other hand, if we’re dealing with an analytical query that takes several seconds, the executor will quickly parallelize it across all available cores by running tasks in parallel over multiple connections per node.

So the Citus adaptive executor does not need to guess how long a SQL query will take, but rather will adapt to each query’s runtime.

In the upcoming Citus 9.3 open source release, the adaptive executor will also factor in the total number of connections it makes to each worker node. When it approaches the connection limit on a worker ( max_connections ), the executor will avoid opening additional connections for parallelism, such that the total number of connections to each worker never exceeds the total number of connections that the client is making to the coordinator. That ensures you can run many parallel queries concurrently without any issues.

Finally, while this might not seem important to most users, as a developer I really like the Citus adaptive executor’s well-documented code and especially its elegant state machines . The extensive code comments and state machines help ensure we keep one of the c


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Java EE WEB开发与项目实战

Java EE WEB开发与项目实战

李俊青 / 华中科技大学出版社 / 2011-11 / 59.80元

本书采用工程案例的形式,将日常Java EE项目开发所涉及的技术要点进行了解析,系统介绍了Apache的安装、Tomcat的安装、虚拟主机的配置、开发工具的搭配使用、验证码的使用、过滤器的使用、密码的加密与解密、JavaMail邮件发送、Web在线编辑器的使用、文件上传、数据库连接池、Ajax与Servlet的身份认证、Struts框架的应用、JSF框架的应用、Spring框架的应用、Hibern......一起来看看 《Java EE WEB开发与项目实战》 这本书的介绍吧!

SHA 加密
SHA 加密

SHA 加密工具

XML、JSON 在线转换
XML、JSON 在线转换

在线XML、JSON转换工具

HEX HSV 转换工具
HEX HSV 转换工具

HEX HSV 互换工具