Materialize: A Streaming Data Warehouse

Category: IT Technology · Published: 6 years ago


Databases, and data infrastructure generally, have made substantial progress over the years.

We now have access to cloud-native infrastructure that allows just about anyone to set up, maintain, and query databases at substantial scale. This is a serious departure from the monolithic software of years past, where getting access to a database involved multiple people and several companies.

However, the data still doesn’t move as fast as it should.

We believe that all information across an enterprise should be up-to-date, immediately. When a storefront accepts an order from a customer, this information should be visible everywhere: from portals used by customer service agents, to back-office inventory management and logistics, from mobile apps that consumers use to track their order, to business analysts optimizing their organization. There is little gained, and a great deal lost, by slowing down the movement of data. No data user wants to wait overnight for “jobs” to complete. Often even minutes can be too long. Demand milliseconds.

This shouldn’t come at the cost of the gains made by data infrastructure over the years: analysts still want to use declarative query languages rather than directly programming applications. Interoperability is paramount: existing dashboards, visualization, and tooling use standards and protocols that cannot simply be jettisoned. Cloud-native deployment is non-negotiable. A viable solution should look and feel like much of existing infrastructure, except instantaneous.

We also cannot regress on delivering strong consistency. Even when there is a delay between a change to your data and analysts observing the result, users should never be presented with incorrect information. All results should reflect correct answers at some point in time (which ideally moves forward as briskly as possible).

Given these requirements, how do we get there? Traditional data processing infrastructure, but faster, isn’t the answer: it’s designed to repeatedly ask about the current state of the world, rather than to react to changes as they occur. We need fundamentally new infrastructure based on reactive models of computation that move new information through established dataflows as quickly as possible.

Streaming without Compromises

We believe that streaming architectures are the only ones that can produce this ideal data infrastructure. Streaming is more than a different programming model, pivoting data processing from a query-based “polling” design – with staleness built in – to a reactive model that responds to changes the moment they happen. It also bypasses repeated work on unchanged data, which allows it to scale to substantially larger volumes of work.
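The contrast between polling and reacting can be made concrete with a toy example. The sketch below is only an illustration of the general idea and not Materialize's implementation: the names `recompute_totals` and `IncrementalTotals` are hypothetical, and the data is a made-up list of `(customer, amount)` orders. The polling function rescans every order each time it is asked, while the reactive version touches only the entry affected by each change.

```python
from collections import defaultdict


def recompute_totals(orders):
    """Polling style: rescan every order each time the question is asked."""
    totals = defaultdict(int)
    for customer, amount in orders:
        totals[customer] += amount
    return dict(totals)


class IncrementalTotals:
    """Reactive style: maintain the answer as each change arrives.

    Hypothetical helper for illustration; work per change is constant,
    regardless of how much unchanged data already exists.
    """

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, customer, amount, diff=1):
        # diff=+1 records a new order, diff=-1 retracts an earlier one.
        self.totals[customer] += diff * amount
        self.counts[customer] += diff
        if self.counts[customer] == 0:
            # No orders remain for this customer, so drop the entry.
            del self.totals[customer]
            del self.counts[customer]

    def snapshot(self):
        return dict(self.totals)


if __name__ == "__main__":
    orders = [("alice", 30), ("bob", 20), ("alice", 5)]
    view = IncrementalTotals()
    for customer, amount in orders:
        view.apply(customer, amount)
    # Both approaches agree on the answer; only the amount of work differs.
    print(view.snapshot() == recompute_totals(orders))
```

Both approaches return the same totals. The difference is that the polling version does work proportional to all orders on every query, whereas the incremental version does work proportional to the changes alone.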

To fully leverage streaming’s potential, we need to rebuild the data warehouse from the inside out, so that users do not have to rebuild their data infrastructure themselves. Many people hoped that event-streaming itself would be the revolution. Cobbled together with free software, streaming is indeed an exciting development, but today requires huge sacrifices in interoperability, flexibility, and ease of use. Catering to data platform experts, it leaves millions of users who would benefit from real-time analytics behind. We believe the real solution looks a lot more like the familiar data warehouse that organizations have been used to for decades, modernized for the always-up-to-date real-time world of 2020, with industry-standard SQL as the interface.

Today we’d like to introduce Materialize: the first Streaming Data Warehouse. It connects directly to your existing event-streaming infrastructure, and to the client, it walks and quacks like Postgres, so that familiar tooling can plug-and-play with it exactly as if it were talking to an analytics-capable read-replica of an OLTP database. Materialize builds on years of award-winning research and open-source development. Built on top of the Timely Dataflow research project, it gives users the power of cutting-edge streaming computation with the declarative ease of PostgreSQL.

We’re excited to take the wrapping off of Materialize today. Download it to play around on your laptop, check out the source on GitHub, or sign up for regular updates to this blog!

