Microservices architectures tend to distribute responsibility for data throughout an organization. This poses challenges to ensuring that data is deleted. A common solution is to set an organization-wide standard of per-dataset or per-record retention policies. There will always be data, however, that spans multiple datasets and records. This data is often distributed throughout your microservices architecture, requiring coordination between systems and teams to delete it.
One solution is to think of data deletion not as an event, but as a process. At Twitter, we call this process “erasure” and coordinate data deletion between systems using an erasure pipeline. In this post, we’ll discuss how to set up an erasure pipeline, including data discoverability, access, and processing. We’ll also touch on common problems and how to ensure ongoing maintenance of an erasure pipeline.
Discoverability
First, you’ll need to find the data that needs to be deleted. Data about a given event, user, or record could be in online or offline datasets, and may be owned by disparate parts of your organization. So your first job will be to use your knowledge of your organization, the expertise of your peers, and organization-wide communication channels to compile a list of all relevant data.
Data Access and Processing Methods
The data you find will usually be accessible to you in one of three ways. Online data will be mutable via (1) a real-time API or (2) an asynchronous mutator. Offline warehoused data will be mutable via (3) a parallel-distributed processing framework like MapReduce. In order to reach every piece of data, your pipeline will need to support each of these three processing methods.
Data mutable via a real-time API is the simplest. Your erasure pipeline can call that API to perform data deletion tasks. Once the API calls have succeeded for each piece of data, the data has been deleted and your erasure pipeline is finished.
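As a rough sketch of this synchronous path in Python, assuming a hypothetical HTTP deletion endpoint per dataset (the URLs and payload shape are illustrative, not Twitter's actual APIs):

```python
import requests

# Hypothetical synchronous deletion endpoints, one per dataset.
DELETION_ENDPOINTS = [
    "https://profile-service.internal/v1/erase",
    "https://settings-service.internal/v1/erase",
]

def erase_synchronously(user_id: str) -> bool:
    """Call each real-time deletion API; return True only if all succeed."""
    all_succeeded = True
    for endpoint in DELETION_ENDPOINTS:
        response = requests.post(endpoint, json={"user_id": user_id}, timeout=5)
        if response.status_code != 200:
            # A failed call means erasure is incomplete; the pipeline
            # should retry later rather than mark this request done.
            all_succeeded = False
    return all_succeeded
```

The key property is that success is only recorded once every call has succeeded, so a partial failure leaves the request eligible for retry.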
The downside of this approach is that it assumes every data deletion task can be completed within the span of an API call, usually milliseconds to seconds, when it may take longer. In that case, your erasure pipeline has to get a bit more complicated. Examples of data that can’t be deleted in the span of an API call include data that is exported to offline snapshots, or data that exists in multiple backend systems and caches. This denormalization is inherent to a microservices architecture and improves performance. It also means that responsibility for the data’s lifecycle is delegated to the teams that own the data’s APIs and business logic.
You’ll need to inform data owners that data deletion needs to happen. Your erasure pipeline can publish erasure events to a distributed queue, like Kafka, which partner teams subscribe to in order to initiate data deletion. They process the erasure event and call back to your team to confirm that the data was deleted.
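A minimal sketch of the publishing side using the kafka-python client; the topic name and event schema here are assumptions for illustration:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Broker address, topic name, and event fields are illustrative assumptions.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

def publish_erasure_event(user_id: str, data_types: list[str]) -> None:
    """Publish an erasure event for partner teams to consume."""
    event = {"user_id": user_id, "data_types": data_types}
    producer.send("erasure-events", value=event)
    producer.flush()  # ensure the event is handed off to the broker
```

Partner teams consume from the same topic, perform their own deletions, and confirm back through whatever callback mechanism your pipeline exposes.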
Finally, there may be completely offline datasets containing data that needs to be deleted, such as snapshots or model training data. In these cases, you can provide an offline dataset which partner teams use to remove erasable data from their datasets. This offline dataset can be as simple as persisted logs from your erasure event publisher.
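As one way a partner team might consume such an offline dataset, here is a sketch that assumes the persisted erasure logs and the dataset are newline-delimited JSON; file layout and field names are assumptions:

```python
import json

def load_erasable_ids(erasure_log_path: str) -> set[str]:
    """Collect user IDs from persisted erasure-event logs (NDJSON assumed)."""
    with open(erasure_log_path) as f:
        return {json.loads(line)["user_id"] for line in f if line.strip()}

def scrub_dataset(input_path: str, output_path: str, erasable_ids: set[str]) -> None:
    """Rewrite an offline dataset, dropping records for erased users."""
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            if record.get("user_id") not in erasable_ids:
                dst.write(line)
```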
An Erasure Pipeline
The erasure pipeline we’ve described thus far has a few key requirements. It must:
- Accept incoming erasure requests
- Track and persist which pieces of data have been deleted
- Call synchronous APIs to delete data
- Publish erasure events for asynchronous erasure
- Generate an offline dataset of erasure events
An example erasure pipeline might tie these requirements together as follows.
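Below is a minimal Python sketch; every name in it, from the request shape to the set of required confirmations, is an illustrative stub rather than Twitter's actual implementation:

```python
from dataclasses import dataclass, field

# Hypothetical set of partner teams that must confirm asynchronous erasure.
REQUIRED_ACKS = {"timeline-team", "ads-team", "search-team"}

@dataclass
class ErasureRequest:
    user_id: str
    synced: bool = False                           # synchronous deletions done
    acked_teams: set = field(default_factory=set)  # async confirmations so far

def erase_synchronously(user_id: str) -> bool:
    """Stub for the real-time API calls sketched earlier."""
    return True

def publish_erasure_event(user_id: str) -> None:
    """Stub for the event publisher sketched earlier; its persisted
    logs double as the offline dataset."""

def save_state(req: ErasureRequest) -> None:
    """Stub: persist per-request progress, e.g. to a database."""

def handle_request(req: ErasureRequest) -> None:
    """Accept an incoming erasure request and start both deletion paths."""
    if not req.synced:
        req.synced = erase_synchronously(req.user_id)
    publish_erasure_event(req.user_id)
    save_state(req)

def on_confirmation(req: ErasureRequest, team: str) -> None:
    """Record a partner team's callback; finish once everything is confirmed."""
    req.acked_teams.add(team)
    save_state(req)
    if req.synced and req.acked_teams >= REQUIRED_ACKS:
        print(f"erasure complete for user {req.user_id}")
```

The essential design choice is that the pipeline tracks progress per request and treats erasure as complete only when the synchronous path has succeeded and every required team has confirmed.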