mysql-kettle-superset电商可视化数据分析

栏目: IT技术 · 发布时间: 5年前

内容简介：对电商业务中的用户、商品、订单的数据进行分析，观察运营的情况业务数据库：Mysql：存储最原始的数据ETL：Kettle

1、项目概述

需求

对电商业务中的用户、商品、订单的数据进行分析，观察运营的情况

架构

业务数据库：Mysql：存储最原始的数据

ETL：Kettle

数据仓库：Mysql：存储需要进行分析处理的数据

分析处理：SQL/Kettle

可视化：Superset

2、准备工作

系统

linux系统

软件

VMware虚拟机——安装 linux 操作系统

1 Windows版下载地址:
2 https://www.vmware.com/

finalshell——远程操作系统

1 Windows版下载地址:
2 http://www.hostbuf.com/downloads/finalshell_install.exe
3 Mac版,Linux版安装及教程:
4 http://www.hostbuf.com/t/1059.html

mysql——数据库（安装版和压缩包版）

1 Windows版下载地址:
2 https://www.mysql.com//downloads/

datagrip——数据库管理工具

链接：https://pan.baidu.com/s/1K1pPIX9uZiAKOAiFgHMlnw 
提取码：lhr4

Navicat——数据库管理工具

链接：https://pan.baidu.com/s/1eaW3CMhen_7X5sjVgs7enw 
提取码：fqov

kettle——如有安装问题请自行度娘

1、Kettle的下载与安装（本文使用kettle版本为pdi-ce-7.1.0.0-12）点击下载地址官方网站

可视化工具

superset——有问题请度娘

linux环境安装依赖
yum upgrade python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel
安装superset
supersetcd /root/anaconda3/
pip install email_validator -i https://pypi.douban.com/simple
pip install superset==0.30.0 -i https://pypi.douban.com/simple

3、数据环境

1、导入业务数据

将这段 sql 代码下载运行，生成数据库，表格

链接：https://pan.baidu.com/s/1uVYISah6hYkBqiyhIk407w 
提取码：sfdm

2、构建数据仓库

通过kettle将业务数据抽取到数据分析的数据库中

链接：https://pan.baidu.com/s/1shH0zexh3WraQnMt17n-SA 
提取码：ao7n

生成表格——kettle操作略

mysql> use itcast_shop_bi;

Database changed

mysql> show tables;
+--------------------------+
| Tables_in_itcast_shop_bi |
+--------------------------+
| ods_itcast_good_cats     |商品分类表
| ods_itcast_goods         |商品表
| ods_itcast_order_goods   |订单及详情表
| ods_itcast_orders        |订单表
| ods_itcast_users         |用户表
| ods_itcast_area　　　　　　|行政区域表
+--------------------------+

3、自动化构建抽取实现

1、地区表以及商品分类表的自动抽取

2、商品表、订单表、订单详情表、用户表

3、设置定时自动运行

4、数据分析

需求1

需求：统计 2019-09-05 订单支付的总金额、订单的总笔数

演变：统计每天的订单支付的总金额和订单的总笔数

指标：总金额、订单总笔数

维度：天

-- 创建结果表
use itcast_shop_bi;
create table app_order_total(
    id int primary key auto_increment,
    dt date,
    total_money double,
    total_cnt int
);

-- 将分析的结果保存到结果表
insert into app_order_total
select
  null,
  substring(createTime,1,10) as dt,-- 2019-09-05这一天的日期
  round(sum(realTotalMoney),2) as total_money, -- 分组后这一天的所有订单总金额
  count(orderId) as total_cnt -- 分组后这一天的订单总个数
from
  ods_itcast_orders
where
  substring(createTime,1,10) = '2019-09-05'
group by
  substring(createTime,1,10);

-- 表结构及内容
mysql> desc app_order_user;
+----------------+------+------+-----+---------+----------------+
| Field          | Type | Null | Key | Default | Extra          |
+----------------+------+------+-----+---------+----------------+
| id             | int  | NO   | PRI | NULL    | auto_increment |
| dt             | date | YES  |     | NULL    |                |
| total_user_cnt | int  | YES  |     | NULL    |                |
+----------------+------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select * from app_order_user;
+----+------------+----------------+
| id | dt         | total_user_cnt |
+----+------------+----------------+
|  1 | 2019-09-05 |             11 |
|  2 | 2019-09-05 |             11 |
+----+------------+----------------+
2 rows in set (0.01 sec)

需求2

需求：统计2019-09-05当天所有下单的用户总数

演变：统计订单表中2019-09-05这一天的所有订单的用户id的个数

-- 创建结果表
use itcast_shop_bi;
create table app_order_user(
    id int primary key auto_increment,
    dt date,
    total_user_cnt int
);

-- 插入结果数据
insert into app_order_user
select
  null,
  substring(createTime,1,10) as dt,-- 2019-09-05这一天的日期
  count(distinct userId) as total_user_cnt
from
  ods_itcast_orders
where
  substring(createTime,1,10) = '2019-09-05'
group by
  substring(createTime,1,10);

需求3

需求； 每天不同支付方式订单总额/订单笔数分析

指标：订单总额、订单总笔数

维度：时间维度【天】、支付方式维度

-- 创建结果表
create table app_order_paytype(
    id int primary key auto_increment,
    dt date,
    pay_type varchar(20),
    total_money double,
    total_cnt int
);

-- 插入结果数据
insert into app_order_paytype
select
  null,
  substring(createTime,1,10) as dt,-- 获取每一天的日期
  case payType when 1 then '支付宝' when 2 then '微信' when 3 then '现金' else '其他' end as pay_type,
  round(sum(realTotalMoney),2) as total_money, -- 分组后这一天的所有订单总金额
  count(orderId) as total_cnt -- 分组后这一天的订单总个数
from
  ods_itcast_orders
group by
  substring(createTime,1,10),payType;

需求4

需求；统计2019年9月下订单最多的用户TOP5，也就是前5名

方式一：上面考虑的是简单的情况，只获取订单个数最多的前5个人

select
    date_format(dt,'%Y-%m') as dt,
    userId,
    userName,
    count(orderId) as total_cnt
from
    ods_itcast_orders
where
    date_format(dt,'%Y-%m') = '2019-09'
group by
    date_format(dt,'%Y-%m'),userId,userName
order by
    total_cnt desc
limit 5;

方式二：我们希望得到订单个数最多的排名的前5名，如果个数相同排名相同

select
       *
from (
          select *,
                 dense_rank() over (partition by dt order by total_cnt desc) as rn
          from (
                   select date_format(dt, '%Y-%m') as dt,
                          userId,
                          userName,
                          count(orderId)           as total_cnt
                   from ods_itcast_orders
                   where date_format(dt, '%Y-%m') = '2019-09'
                   group by date_format(dt, '%Y-%m'), userId, userName
               ) tmp1
) tmp2 where rn < 6;

需求5

需求： 统计不同分类的订单总金额以及订单总笔数【类似于统计不同支付类型的订单总金额和总笔数】

-- 创建结果表
use itcast_shop_bi;
drop table if exists app_order_goods_cat;
create table app_order_goods_cat(
    id int primary key auto_increment,
    dt date,
    cat_name varchar(20),
    total_money double,
    total_num int
);
-- step2：先构建三级分类与一级分类之间的关系
-- 使用join实现
drop table if exists tmp_goods_cats;
create temporary table tmp_goods_cats as
select
    t3.catId as t3Id,-- 三级分类id
    t3.catName as t3Name, -- 三级分类名称
    t2.catId as t2Id,
    t2.catName as t2Name,
    t1.catId as t1Id,
    t1.catName as t1Name
from
    ods_itcast_good_cats t3  join ods_itcast_good_cats t2 on t3.parentId = t2.catId
    join ods_itcast_good_cats t1 on t2.parentId = t1.catId;
    
    
CREATE UNIQUE INDEX idx_goods_cat3 ON tmp_goods_cats(t3Id);
CREATE UNIQUE INDEX idx_itheima_goods ON ods_itcast_goods(goodsId);
CREATE INDEX idx_itheima__order_goods ON ods_itcast_order_goods(goodsId);

-- 插入结果数据
insert into app_order_goods_cat
select
    null,
    substring(c.createtime,1,10) as dt,
    a.t1Name,
    sum(c.payPrice) as total_money,
    count(distinct orderId) as total_num
from
    tmp_goods_cats a left join ods_itcast_goods b on a.t3Id = b.goodsCatId
    left join ods_itcast_order_goods c on b.goodsId = c.goodsId
where
    substring(c.createtime,1,10) = '2019-09-05'
group by
    substring(c.createtime,1,10),a.t1Name;

5、构建自动化Kettle作业实现自动化分析

创建一个作业

配置SQL脚本

mysql-kettle-superset电商可视化数据分析

定义作业的变量

6、可视化构建

订单销售总额

订单总笔数

订单总用户数

不同支付方式的总订单金额比例

不同支付方式的订单个数

不同商品分类的订单总金额

不同商品分类的订单总个数

词云图

7、构建看板

以上就是本文的全部内容，希望本文的内容对大家的学习或者工作能带来一定的帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Writing Apache Modules with Perl and C

Lincoln Stein、Doug MacEachern / O'Reilly Media, Inc. / 1999-03 / USD 39.95

Apache is the most popular Web server on the Internet because it is free, reliable, and extensible. The availability of the source code and the modular design of Apache makes it possible to extend Web......一起来看看《Writing Apache Modules with Perl and C》这本书的介绍吧!

码农工具

JS 压缩/解压工具

在线压缩/解压 JS 代码

MD5 加密

MD5 加密工具

HEX CMYK 转换工具

HEX CMYK 互转工具

mysql-kettle-superset电商可视化数据分析

1、项目概述

需求

架构

2、准备工作

系统

软件

可视化工具

3、数据环境

1、导入业务数据

2、构建数据仓库

3、 自动化构建抽取实现

4、数据分析

需求1

需求2

需求3

需求4

方式一：上面考虑的是简单的情况，只获取订单个数最多的前5个人

需求5

5、构建自动化Kettle作业实现自动化分析

创建一个作业

配置SQL脚本

定义作业的变量

6、可视化构建

订单销售总额

订单总笔数

订单总用户数

不同支付方式的总订单金额比例

不同支付方式的订单个数

不同商品分类的订单总金额

不同商品分类的订单总个数

词云图

7、 构建看板

Writing Apache Modules with Perl and C

JS 压缩/解压工具

MD5 加密

HEX CMYK 转换工具

3、自动化构建抽取实现

7、构建看板