python3 + web.py + gunicorn: Principles and Development

Category: Python · Published: 7 years ago

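This article is based on hands-on experimentation and shows how to understand and use web.py correctly.

Code

Full demo: https://github.com/owenliang/webpy-demo

Principles

Python web development consists of two parts: a web container and an application framework.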

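Web container

First you need a web container, much as in Java. The web container implements the socket server side of the HTTP protocol. It may use a single-threaded blocking model, I/O multiplexing, multiple processes, or multiple threads. Either way, concurrency is the container's responsibility, and it should provide a suitable concurrency mechanism.

The web container does not know how to handle business logic, so application code is needed. The application code receives the HTTP request and returns an HTTP response to the web container, which serializes it and sends it back to the client.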

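The WSGI specification

The key question is what bridges the web container and the application. The answer is the WSGI specification (PEP 3333): the application provides a callback function that handles requests, and the callback's parameters and return value are precisely specified: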

def wsgi(env, start_resp):

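The web container puts the request information into env. start_resp is a response function: once the application has produced a response, it calls start_resp to hand the status code and headers to the web container, and returns the response body as an iterable of bytes. The web container takes care of the rest.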
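To make the callback contract concrete, here is a minimal sketch of a WSGI callable. The helper call_app is hypothetical: it only imitates what a container does for a single request so that the callback can be exercised without a server.

```python
def wsgi(env, start_resp):
    """Minimal WSGI callable: echo the request path back as plain text."""
    body = ("hello from " + env.get("PATH_INFO", "/")).encode("utf-8")
    # The status line and headers go to the container through start_resp ...
    start_resp("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # ... and the body is returned as an iterable of bytes.
    return [body]


def call_app(app, path):
    """Hypothetical helper that plays the container's role for one request."""
    captured = {}

    def start_resp(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app({"REQUEST_METHOD": "GET", "PATH_INFO": path}, start_resp)
    return captured["status"], b"".join(chunks)
```

Assuming the code is saved as app.py (a file name chosen only for this example), gunicorn can serve it with `gunicorn app:wsgi`.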

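We do not have to write the container ourselves because open-source implementations exist. The one used here is a very practical one: gunicorn.

Application framework

To develop an application, we only need to write a .py file that implements a callback in the WSGI format and then tell the web container where that module is. The web container is itself a Python program: it dynamically imports our module (for example through the __import__ function), finds the callback in it, and can then start accepting client requests.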
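The dynamic loading step can be sketched roughly as follows. This is not gunicorn's actual code: load_app is a hypothetical helper that only illustrates how a "module:attribute" string can be resolved to a callable, and falling back to the name "application" is an assumption made for this sketch.

```python
import importlib


def load_app(spec):
    """Resolve a 'module:attribute' string to the object it names.

    Illustration only; a real container validates far more than this.
    """
    module_name, _, attr = spec.partition(":")
    # Import the module by its dotted name at runtime.
    module = importlib.import_module(module_name)
    # Look up the callback inside the imported module.
    return getattr(module, attr or "application")
```

For example, load_app("json:dumps") returns the dumps function from the standard json module, in the same way a container would pick the WSGI callback out of our own module.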

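However, the raw WSGI interface is too bare: even reading a form field from the request is tedious. That is why we use a web framework. It implements the WSGI callback for us, processes the request further, and exposes a convenient API, which makes development much more productive. The framework covered here is web.py, one of the most classic and simplest Python web frameworks. Building a website does not require much machinery; it is essentially just MVC.

Key difficulties

Once the relationship between the web container and the app is clear, we are already halfway there.

The next step is to understand how web.py is implemented. Everything in web.py starts from the WSGI callback function.

The official web.py tutorial teaches the basic usage, but that alone is not enough to use it with confidence, so I skimmed the web.py source code.

Reading the source resolved several important questions for me, which I share below.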

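How the web container works

web.py supports multi-threaded containers: within one web container process, several threads may handle several requests at the same time, so the framework has to be thread-safe.

I was curious about two things: how the framework is designed for this situation, and whether the database module offers features such as connection pooling under multiple threads.

There are two ways to design a thread-safe web framework:

Wrap the request and response context in an object and pass it between the framework's components and layers.
Keep the request and response context in thread-local storage, where every component of the framework can read it directly.

The former is quite cumbersome for a lightweight framework: each thread handles a different request, so each thread first has to build its own request object and then pass it along through every later stage of processing.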

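Thread-local storage

web.py takes the second approach. Take request handling as an example:

The web container calls web.py's WSGI function in some thread. Inside that function, web.py first populates thread-local request state, then routes the request to our business logic. Our code can read the request directly from thread-local storage, because it is by definition the request the current thread is handling. This spares the framework from passing the request through every layer.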
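The flow just described can be sketched with the standard threading.local. This is an illustration, not web.py's code: ctx, handler and serve_in_threads are hypothetical names, and ctx merely plays the role that web.ctx plays in web.py.

```python
import threading

# Hypothetical module-level context, standing in for web.py's web.ctx.
ctx = threading.local()


def handler():
    """Business code: reads the current request without receiving it as an argument."""
    return "path=" + ctx.path


def wsgi(env, start_resp):
    # Store per-request data in the slot that belongs to the current thread ...
    ctx.path = env.get("PATH_INFO", "/")
    try:
        body = handler().encode("utf-8")
    finally:
        # ... and drop it afterwards so that a reused thread starts clean.
        ctx.__dict__.clear()
    start_resp("200 OK", [("Content-Type", "text/plain")])
    return [body]


def serve_in_threads(paths):
    """Run one request per thread and collect each response body."""
    results = {}

    def worker(path):
        results[path] = b"".join(wsgi({"PATH_INFO": path}, lambda s, h: None))

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread sees only the path it stored itself, even though every thread reads the same module-level ctx name.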

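That is not the only reason for using thread-local storage. Another important purpose is centralized resource cleanup. Let us first look in detail at how web.py uses thread-local storage.

First, it subclasses threading.local to implement its own thread-local dictionary: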

from threading import local as threadlocal    
 
class ThreadedDict(threadlocal):
    """
    Thread local storage.
    
        >>> d = ThreadedDict()
        >>> d.x = 1
        >>> d.x
        1
        >>> import threading
        >>> def f(): d.x = 2
        ...
        >>> t = threading.Thread(target=f)
        >>> t.start()
        >>> t.join()
        >>> d.x
        1
    """
    _instances = set()
    
    def __init__(self):
        ThreadedDict._instances.add(self)
        
    def __del__(self):
        ThreadedDict._instances.remove(self)
        
    def __hash__(self):
        return id(self)

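If you have never used threading.local, do not worry: it behaves like a dictionary whose key-value pairs (stored as attributes) are visible only to the current thread, so every thread sees its own contents.

web.py's key addition is a class-level attribute named _instances. Every newly created ThreadedDict instance is added to it, so it records every thread-local dictionary the program has created (an instance removes itself again in __del__).

The framework's thread-local dict

This thread-local dictionary is used in two key places.

The first is the framework core. When we initialize the framework, an application object is created: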

app = web.application(urls, globals())

At the top of application.py, another module is imported:

from . import webapi as web

webapi.py defines the first threaded dict:

ctx = context = threadeddict()

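This is the dictionary the framework uses to keep each thread's request context separate.

When a request reaches web.py's WSGI function, the request's env is parsed into ctx so that business code and later stages can read it: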

    def load(self, env):
        """Initializes ctx using env."""
        ctx = web.ctx
        ctx.clear()
        ctx.status = '200 OK'
        ctx.headers = []
        ctx.output = ''
        ctx.environ = ctx.env = env
        ctx.host = env.get('HTTP_HOST')
 
        if env.get('wsgi.url_scheme') in ['http', 'https']:
            ctx.protocol = env['wsgi.url_scheme']
        elif env.get('HTTPS', '').lower() in ['on', 'true', '1']:
            ctx.protocol = 'https'
        else:
            ctx.protocol = 'http'
        # ... (remainder of load() omitted)

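The framework's goal here is simple: isolate threads from one another, because different threads handle different requests.

Thread-local state in the database layer

The other use of the threaded dict is the database layer.

web.py provides database management. You only need to define one database object, and it is thread-safe: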

# -*- coding: utf-8 -*-
 
import web
 
# Database
db_webpy = web.database(dbn='mysql', user='root', pw='baidu@123', db='webpy', pooling=False)   # single connection, released as soon as each request ends

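db_webpy can be accessed from any request to operate on the database.

db_webpy is an instance of the DB class, and its constructor creates a threaded dict stored inside the object: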

class DB: 
    """Database"""
    def __init__(self, db_module, keywords):
        """Creates a database.
        """
        # some DB implementaions take optional paramater `driver` to use a specific driver modue
        # but it should not be passed to connect
        keywords.pop('driver', None)
 
        self.db_module = db_module
        self.keywords = keywords
 
        self._ctx = threadeddict()

When a thread accesses the database, the DB class checks self._ctx for an existing database connection:

    def _getctx(self): 
        if not self._ctx.get('db'):
            self._load_context(self._ctx)
        return self._ctx
    ctx = property(_getctx)

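This is part of the DB class. If you have not used property before, the net effect is that reading self.ctx calls self._getctx().

_getctx checks whether the threaded dict already holds a db connection and, if not, calls _load_context to create one: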

    def _load_context(self, ctx):
        ctx.dbq_count = 0
        ctx.transactions = [] # stack of transactions
        
        if self.has_pooling:
            ctx.db = self._connect_with_pooling(self.keywords)
        else:
            ctx.db = self._connect(self.keywords)
        ctx.db_execute = self._db_execute
        
        if not hasattr(ctx.db, 'commit'):
            ctx.db.commit = lambda: None
 
        if not hasattr(ctx.db, 'rollback'):
            ctx.db.rollback = lambda: None
            
        def commit(unload=True):
            # do db commit and release the connection if pooling is enabled.            
            ctx.db.commit()
            if unload and self.has_pooling:
                self._unload_context(self._ctx)
                
        def rollback():
            # do db rollback and release the connection if pooling is enabled.
            ctx.db.rollback()
            if self.has_pooling:
                self._unload_context(self._ctx)
                
        ctx.commit = commit
        ctx.rollback = rollback

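When a query is issued on the DB object, it calls self.ctx.db.cursor() to create a database cursor. Evaluating self.ctx runs self._getctx, which initializes the connection if necessary, so one expression does two jobs, which is rather neat: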

    def _db_cursor(self):
        return self.ctx.db.cursor()
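The lazy, per-thread connection pattern can be reproduced in a few lines. The sketch below is not web.py's DB class: LazyDB and FakeConnection are hypothetical stand-ins, and the connects counter exists only to make the behaviour observable.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a real DB-API connection."""

    def cursor(self):
        return "cursor"


class LazyDB:
    def __init__(self):
        self._ctx = threading.local()
        self.connects = 0  # number of connections opened, for the demo only
        self._lock = threading.Lock()

    def _getctx(self):
        # First access in this thread: open a connection and cache it.
        if getattr(self._ctx, "db", None) is None:
            with self._lock:
                self.connects += 1
            self._ctx.db = FakeConnection()
        return self._ctx

    ctx = property(_getctx)

    def cursor(self):
        # Reading self.ctx triggers _getctx, so this line may also connect.
        return self.ctx.db.cursor()
```

Calling cursor() twice in one thread opens a single connection, while a call from a second thread opens another one, because each thread has its own slot in _ctx.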

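None of that is the main point. The main point is that, with pooling disabled, the DB class itself never releases the db connection stored in the thread-local self._ctx. Does this mean every thread in the web container keeps a long-lived connection to the database?

It does not.

The reason lies in ThreadedDict.__init__, which adds each instance to the _instances set. When our db_webpy object constructs self._ctx, another thread-local dictionary is added to ThreadedDict._instances.

Once the current thread has finished handling a request, the framework iterates over the thread-local dictionaries in ThreadedDict._instances (two in this example: one created along with the application, the other created by the DB object) and clears everything the current thread stored in them. The objects those entries referred to can then be garbage collected, so resources such as the database connection are released and the connection is closed.

At the end of the WSGI function, _cleanup() is called to release whatever the current thread created while handling the request: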

    def _cleanup(self):
        # Threads can be recycled by WSGI servers.
        # Clearing up all thread-local state to avoid interefereing with subsequent requests.
        utils.ThreadedDict.clear_all()

That is, it iterates over the two ThreadedDict instances and drops all their references in one go:

    def clear_all():
        """Clears all ThreadedDict instances.
        """
        for t in list(ThreadedDict._instances):
            t.clear()
    clear_all = staticmethod(clear_all)
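The whole cleanup mechanism can be condensed into a self-contained sketch. TrackedLocal is a simplified imitation of ThreadedDict, while Resource and handle_request are hypothetical names used only to show a resource being released. The immediate release relies on CPython's reference counting; other interpreters may release it later.

```python
import threading


class TrackedLocal(threading.local):
    """Thread-local storage that registers every instance (a simplified ThreadedDict)."""

    _instances = set()

    def __init__(self):
        TrackedLocal._instances.add(self)

    def clear(self):
        # Only the calling thread's attributes are removed.
        self.__dict__.clear()

    @staticmethod
    def clear_all():
        for inst in list(TrackedLocal._instances):
            inst.clear()


class Resource:
    """Hypothetical resource that records when it is released."""

    released = 0

    def __del__(self):
        Resource.released += 1


def handle_request(web_ctx, db_ctx):
    try:
        web_ctx.path = "/"
        db_ctx.db = Resource()  # stands in for a database connection
        return "ok"
    finally:
        # Mirrors the cleanup step at the end of the WSGI function.
        TrackedLocal.clear_all()
```

After handle_request returns, neither thread-local object holds any attribute for the calling thread, so the Resource is no longer referenced and can be reclaimed.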

Next steps

Now that the principles are clear, try reading my demo again; it should look quite different. Questions are welcome in the comments.
