An Intro to the Lambda Architecture

Category: IT Technology · Published: 6 years ago


The Lambda Architecture is a software design pattern that aims to unify data processing. Its design enables it to process substantial quantities of data by applying both batch and stream processing. The two methods are combined because the pattern has to deal with typical obstacles such as latency, throughput and fault tolerance.

It is used for high-availability online applications in which data must remain valid despite processing delays. Batch processing generates precise and complete views, while views of the most recent online data are provided at the same time.

Functionality

The Lambda Architecture has three main components, which together are responsible for two main tasks: ingesting and processing newly incoming data, and answering queries on the existing data. Incoming data is handed off to both the batch layer and the speed layer for further processing.

Batch Layer

The batch layer is responsible for managing the master data set, an append-only, immutable collection that contains only raw data. It does so using a distributed processing system that can handle massive amounts of data at once.

It gains its accuracy by being able to process all available data when generating views. Because the views are precomputed from the complete data set, errors can be fixed by recomputing the views and replacing the existing ones. The output is typically generated by using map-reduce.

Map-reduce is a technique which takes a large data set and divides it into subsets. A specific function is then performed on each subset. The results for these subsets are then combined to form the output.
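The three steps described above can be sketched in a few lines of Python. This is only a single-process illustration, not a distributed implementation; the page-view data and the function names (split_into_subsets, map_subset, combine, map_reduce) are assumptions made up for this example.

```python
from collections import defaultdict
from functools import reduce


def split_into_subsets(records, size):
    """Divide the data set into subsets of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_subset(subset):
    """Map step: count page views per page within one subset."""
    counts = defaultdict(int)
    for page in subset:
        counts[page] += 1
    return dict(counts)


def combine(left, right):
    """Reduce step: merge two partial results into one."""
    merged = dict(left)
    for page, count in right.items():
        merged[page] = merged.get(page, 0) + count
    return merged


def map_reduce(records, size=2):
    """Split, map each subset, then combine all partial results."""
    partials = [map_subset(s) for s in split_into_subsets(records, size)]
    return reduce(combine, partials, {})


# Hypothetical raw data: one entry per page view.
page_views = ["/home", "/about", "/home", "/cart", "/home"]
batch_view = map_reduce(page_views)
print(batch_view)
```

In a real batch layer the subsets would be spread across many machines, but the shape of the computation stays the same.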

This output is usually stored in a read-only database, where each update completely replaces the existing precomputed views. The batch layer also allows older data sets to be processed. By analysing these it is possible to optimize the processing function used in the map-reduce step.

Speed Layer

The speed layer processes data streams in real time. It therefore guarantees neither that its data is accurate nor that corrupt data has been fixed. It attempts to minimize latency whilst granting real-time views into the most recent data. Its main purpose is thus to fill the gaps caused by the batch layer's lag in providing views based on the most recent data. The output of the speed layer may be thrown away once the batch layer has finished its calculations.
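As a rough sketch of this behaviour, the Python class below updates an in-memory view one event at a time and discards it once the batch layer has caught up. The class name SpeedLayer, its methods and the page-view events are hypothetical and chosen only for illustration.

```python
class SpeedLayer:
    """Keeps an incrementally updated, in-memory view of recent events."""

    def __init__(self):
        self.realtime_view = {}

    def on_event(self, page):
        # Update the view immediately, one event at a time.
        # No validation or correction of the incoming data happens here.
        self.realtime_view[page] = self.realtime_view.get(page, 0) + 1

    def discard(self):
        # Called once the batch layer has processed the same events.
        self.realtime_view.clear()


speed = SpeedLayer()
for page in ["/home", "/cart"]:
    speed.on_event(page)
print(speed.realtime_view)
```

Updating per event keeps latency low, at the cost of the accuracy guarantees that the batch layer provides.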

Serving Layer

The serving layer combines the output of the batch layer and the speed layer. As the entry point for queries, it receives them and responds to them. It has a view of the complete data set available, because it can use the precomputed views or build views from the processed data.
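One way to picture this combination is a query function that adds the precomputed batch result to the real-time result. The sketch below assumes both views are simple page-view counters held in dictionaries; the function name query and the sample values are invented for this example.

```python
def query(page, batch_view, realtime_view):
    """Answer a query by combining the batch count with the real-time count."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical views: counts precomputed by the batch layer, plus
# counts for events that the batch layer has not processed yet.
batch_view = {"/home": 3, "/about": 1}
realtime_view = {"/home": 2, "/cart": 1}

print(query("/home", batch_view, realtime_view))
```

Simple addition works here because counts are additive. Other kinds of views would need a merge rule suited to their data.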
