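PiFlow is a big data pipeline system built on the distributed computing framework Spark. It wraps each stage of data processing (collection, cleaning, computation, storage, and so on) into a component, and pipelines are configured in a what-you-see-is-what-you-get (WYSIWYG) manner. It is easy to use and powerful, with the following features:
- Easy to use: configure pipelines visually, monitor their running status in real time, and view logs
- Powerful: provides 100+ data processing components covering Hadoop, Spark, MLlib, Hive, Solr, Redis, MemCache, ElasticSearch, JDBC, MongoDB, HTTP, FTP, XML, CSV, JSON, and more, and also integrates algorithms from the microbiology domain
- Extensible: supports developing custom data processing components
- High performance: built on the distributed computing engine Spark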
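The component-and-pipeline model described above can be illustrated with a small, hypothetical Python sketch. It is not PiFlow's API (PiFlow runs on Spark, and its actual component interface is defined in its own repository); the names `Stop`, `Source`, `Distinct`, `LowercaseEmail`, and `Pipeline` are invented for illustration. Each stop transforms a list of records, the pipeline executes stops in topological order so that a stop runs only after all of its upstream stops finish, and a user-defined stop such as `LowercaseEmail` plugs in the same way as a built-in one.

```python
from collections import defaultdict, deque
from typing import Dict, List

Record = Dict[str, object]


class Stop:
    """Hypothetical component interface; not PiFlow's real (Scala/Spark) API."""

    def __init__(self, name: str):
        self.name = name

    def perform(self, inputs: List[List[Record]]) -> List[Record]:
        raise NotImplementedError


class Source(Stop):
    """Emits fixed records, standing in for a data-collection component."""

    def __init__(self, name: str, records: List[Record]):
        super().__init__(name)
        self.records = records

    def perform(self, inputs):
        return list(self.records)


class Distinct(Stop):
    """Keeps the first occurrence of each record (order-preserving dedup)."""

    def perform(self, inputs):
        seen, out = set(), []
        for record in inputs[0]:
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                out.append(record)
        return out


class LowercaseEmail(Stop):
    """A user-defined cleaning stop: trims and lowercases the 'email' field."""

    def perform(self, inputs):
        return [
            {**r, "email": r["email"].strip().lower()} if "email" in r else dict(r)
            for r in inputs[0]
        ]


class Pipeline:
    """Runs stops in dependency order using Kahn's topological sort."""

    def __init__(self):
        self.stops: Dict[str, Stop] = {}
        self.children: Dict[str, List[str]] = defaultdict(list)
        self.parents: Dict[str, List[str]] = defaultdict(list)

    def add(self, stop: Stop) -> "Pipeline":
        self.stops[stop.name] = stop
        return self

    def connect(self, src: str, dst: str) -> "Pipeline":
        self.children[src].append(dst)
        self.parents[dst].append(src)
        return self

    def run(self) -> Dict[str, List[Record]]:
        indegree = {name: len(self.parents[name]) for name in self.stops}
        ready = deque(name for name, d in indegree.items() if d == 0)
        results: Dict[str, List[Record]] = {}
        while ready:
            name = ready.popleft()
            # A stop runs only after every upstream stop has produced output.
            inputs = [results[p] for p in self.parents[name]]
            results[name] = self.stops[name].perform(inputs)
            for child in self.children[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(results) != len(self.stops):
            raise ValueError("pipeline contains a cycle")
        return results


pipeline = (
    Pipeline()
    .add(Source("read", [{"email": " A@x.org"}, {"email": "a@x.org"}, {"email": "b@y.org"}]))
    .add(LowercaseEmail("clean"))
    .add(Distinct("dedup"))
    .connect("read", "clean")
    .connect("clean", "dedup")
)
print(pipeline.run()["dedup"])  # [{'email': 'a@x.org'}, {'email': 'b@y.org'}]
```

Passing in-memory lists between stops keeps the sketch small; a Spark-based system instead operates on distributed datasets, so the same ordering idea applies but the data never has to fit on one machine.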
Trial site: http://piflow.ml/piflow-web/login (username/password: admin/admin)
Installation and usage instructions: https://github.com/cas-bigdatalab/piflow
Supported data processing components:
| Group | Component |
|---|---|
| Hive | cn.piflow.bundle.hive.SelectHiveQL |
| Hive | cn.piflow.bundle.hive.PutHiveStreaming |
| Hive | cn.piflow.bundle.hive.PutHiveQL |
| Hdfs | cn.piflow.bundle.hdfs.PutHdfs |
| Hdfs | cn.piflow.bundle.hdfs.DeleteHdfs |
| Hdfs | cn.piflow.bundle.hdfs.UnzipFilesOnHDFS |
| Hdfs | cn.piflow.bundle.hdfs.GetHdfs |
| Hdfs | cn.piflow.bundle.hdfs.ListHdfs |
| Http | cn.piflow.bundle.http.InvokeUrl |
| Http | cn.piflow.bundle.http.GetUrl |
| Http | cn.piflow.bundle.http.UnGZip |
| Http | cn.piflow.bundle.http.PostUrl |
| Http | cn.piflow.bundle.http.LoadZipFromUrl |
| Http | cn.piflow.bundle.http.FileDownHDFS |
| RDF | cn.piflow.bundle.rdf.CsvToNeo4J |
| RDF | cn.piflow.bundle.rdf.RdfToDF |
| Spider | cn.piflow.bundle.internetWorm.spider |
| Jdbc | cn.piflow.bundle.jdbc.JdbcRead |
| Jdbc | cn.piflow.bundle.jdbc.JdbcReadFromOracle |
| Jdbc | cn.piflow.bundle.jdbc.JdbcWrite |
| Jdbc | cn.piflow.bundle.jdbc.JdbcWriteToOracle |
| Streaming | cn.piflow.bundle.streaming.FlumeStream |
| Streaming | cn.piflow.bundle.streaming.KafkaStream |
| Streaming | cn.piflow.bundle.streaming.SocketTextStreamByWindow |
| Streaming | cn.piflow.bundle.streaming.SocketTextStream |
| Streaming | cn.piflow.bundle.streaming.TextFileStream |
| MongoDB | cn.piflow.bundle.impala.SelectImpala |
| MongoDB | cn.piflow.bundle.mongodb.GetMongo |
| MongoDB | cn.piflow.bundle.mongodb.PutMongo |
| CSV | cn.piflow.bundle.csv.FolderCsvParser |
| CSV | cn.piflow.bundle.csv.CsvSave |
| CSV | cn.piflow.bundle.csv.CsvParser |
| CSV | cn.piflow.bundle.csv.CsvStringParser |
| File | cn.piflow.bundle.file.PutFile |
| File | cn.piflow.bundle.file.FetchFile |
| File | cn.piflow.bundle.file.RegexTextProcess |
| Script | cn.piflow.bundle.script.ShellExecutor |
| Script | cn.piflow.bundle.script.DataFrameRowParser |
| Common | cn.piflow.bundle.common.Distinct |
| Common | cn.piflow.bundle.common.ConvertSchema |
| Common | cn.piflow.bundle.common.Fork |
| Common | cn.piflow.bundle.common.SelectField |
| Common | cn.piflow.bundle.common.Join |
| Common | cn.piflow.bundle.common.DoFlatMapStop |
| Common | cn.piflow.bundle.common.ExecuteSQLStop |
| Common | cn.piflow.bundle.common.Merge |
| Common | cn.piflow.bundle.common.DoMapStop |
| Common | cn.piflow.bundle.common.Subtract |
| Data Clean | cn.piflow.bundle.clean.IdentityNumberClean |
| Data Clean | cn.piflow.bundle.clean.PhoneNumberClean |
| Data Clean | cn.piflow.bundle.clean.EmailClean |
| Data Clean | cn.piflow.bundle.clean.TitleClean |
| Message Queue | cn.piflow.bundle.kafka.WriteToKafka |
| Message Queue | cn.piflow.bundle.kafka.ReadFromKafka |
| Microorganism | cn.piflow.bundle.microorganism.Ensembl_gff3Parser |
| Microorganism | cn.piflow.bundle.microorganism.GeneParser |
| Microorganism | cn.piflow.bundle.microorganism.RefseqParser |
| Microorganism | cn.piflow.bundle.microorganism.GoDataParse |
| Microorganism | cn.piflow.bundle.microorganism.PfamDataParser |
| Microorganism | cn.piflow.bundle.microorganism.GoldDataParse |
| Microorganism | cn.piflow.bundle.microorganism.Swissprot_TrEMBLDataParser |
| Microorganism | cn.piflow.bundle.microorganism.EmblParser |
| Microorganism | cn.piflow.bundle.microorganism.PDBParser |
| Microorganism | cn.piflow.bundle.microorganism.GenBankParse |
| Microorganism | cn.piflow.bundle.microorganism.TaxonomyParse |
| Microorganism | cn.piflow.bundle.microorganism.BioProjetDataParse |
| Microorganism | cn.piflow.bundle.microorganism.BioSampleParse |
| Microorganism | cn.piflow.bundle.microorganism.InterprodataParse |
| Microorganism | cn.piflow.bundle.microorganism.MicrobeGenomeDataParser |
| Memcache | cn.piflow.bundle.memcache.ComplementByMemcache |
| Memcache | cn.piflow.bundle.memcache.PutMemcache |
| Memcache | cn.piflow.bundle.memcache.GetMemcache |
| Machine Learning | cn.piflow.bundle.ml_clustering.GaussianMixtureTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.LogisticRegressionTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.RandomForestTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.DecisionTreePrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.RandomForestPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.BisectingKMeansPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.LDAPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.MultilayerPerceptronTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.GBTTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.BisectingKMeansTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.MultilayerPerceptronPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.GBTPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.KmeansTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.NaiveBayesPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.DecisionTreeTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.LDATraining |
| Machine Learning | cn.piflow.bundle.ml_classification.LogisticRegressionPrediction |
| Machine Learning | cn.piflow.bundle.ml_feature.WordToVec |
| Machine Learning | cn.piflow.bundle.ml_clustering.KmeansPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.NaiveBayesTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.GaussianMixturePrediction |
| ElasticSearch | cn.piflow.bundle.es.PutEs |
| ElasticSearch | cn.piflow.bundle.es.QueryEs |
| ElasticSearch | cn.piflow.bundle.es.FetchEs |
| Redis | cn.piflow.bundle.redis.WriteToRedis |
| Redis | cn.piflow.bundle.redis.ReadFromRedis |
| Xml | cn.piflow.bundle.xml.XmlParser |
| Xml | cn.piflow.bundle.xml.XmlStringParser |
| Xml | cn.piflow.bundle.xml.FlattenXmlParser |
| Xml | cn.piflow.bundle.xml.XmlSave |
| Xml | cn.piflow.bundle.xml.FolderXmlParser |
| Ftp | cn.piflow.bundle.hdfs.SelectFilesByName |
| Ftp | cn.piflow.bundle.ftp.UploadToFtp |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtp |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtpUrl |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtpToHDFS |
| Ftp | cn.piflow.bundle.ftp.UnGz |
| Ftp | cn.piflow.bundle.ftp.NewLoadFromFtp |
| Excel | cn.piflow.bundle.excel.ExcelParser |
| Solr | cn.piflow.bundle.solr.PutIntoSolr |
| Solr | cn.piflow.bundle.solr.GetFromSolr |
| Json | cn.piflow.bundle.json.JsonStringParser |
| Json | cn.piflow.bundle.json.MultiFolderJsonParser |
| Json | cn.piflow.bundle.json.FolderJsonParser |
| Json | cn.piflow.bundle.json.JsonSave |
| Json | cn.piflow.bundle.json.JsonParser |
| Json | cn.piflow.bundle.json.EvaluateJsonPath |
| GraphX | cn.piflow.bundle.graphx.LoadGraph |
| GraphX | cn.piflow.bundle.graphx.LabelPropagation |
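Each component in the table is identified by a fully qualified class name, which a pipeline configuration can refer to. The following Python sketch illustrates that idea by serializing a three-stop chain (CsvParser → Distinct → JsonSave) to JSON and rejecting names it does not recognize. The JSON layout (the `flow`, `stops`, `paths`, `package`, and `properties` keys) and the helpers `package_of` and `build_flow` are assumptions made for illustration, not PiFlow's documented flow format; consult the GitHub repository for the real schema and for each component's configurable properties.

```python
import json
from typing import Dict, List

BUNDLE_PREFIX = "cn.piflow.bundle."

# A handful of fully qualified component names copied from the table above.
KNOWN_BUNDLES = {
    "cn.piflow.bundle.csv.CsvParser",
    "cn.piflow.bundle.common.Distinct",
    "cn.piflow.bundle.common.SelectField",
    "cn.piflow.bundle.json.JsonSave",
}


def package_of(bundle: str) -> str:
    """Return the package segment after the common prefix, e.g. 'csv'."""
    if not bundle.startswith(BUNDLE_PREFIX):
        raise ValueError(f"not a cn.piflow.bundle name: {bundle}")
    return bundle[len(BUNDLE_PREFIX):].rsplit(".", 1)[0]


def build_flow(name: str, chain: List[Dict[str, object]]) -> str:
    """Serialize a linear chain of stops to JSON (hypothetical schema)."""
    for stop in chain:
        if stop["bundle"] not in KNOWN_BUNDLES:
            raise ValueError(f"unknown bundle: {stop['bundle']}")
    names = [stop["name"] for stop in chain]
    flow = {
        "name": name,
        # Record each stop's package alongside its class name for readability.
        "stops": [dict(stop, package=package_of(stop["bundle"])) for stop in chain],
        # Connect each stop to the next one in the chain.
        "paths": [{"from": a, "to": b} for a, b in zip(names, names[1:])],
    }
    return json.dumps({"flow": flow}, indent=2)


# Component-specific settings would go in "properties"; their keys are not
# given in this article, so they are left empty here.
doc = build_flow("csv-to-json", [
    {"name": "parse", "bundle": "cn.piflow.bundle.csv.CsvParser", "properties": {}},
    {"name": "dedup", "bundle": "cn.piflow.bundle.common.Distinct", "properties": {}},
    {"name": "save", "bundle": "cn.piflow.bundle.json.JsonSave", "properties": {}},
])
print(doc)
```

Grouping by package is only an approximation of the table's groups: SelectImpala sits in the `impala` package but is listed under MongoDB, and SelectFilesByName sits in the `hdfs` package but is listed under Ftp.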
That concludes this introduction to "PiFlow v0.5 Released: A Big Data Pipeline System".