一道面试题：去重排序

栏目: IT技术 · 发布时间: 5年前

内容简介：给定列表，列表内的数据全是str类型，对列表内的数据进行统计元素出现的次数，并排序。例：返回：

题目

给定列表，列表内的数据全是str类型，对列表内的数据进行统计元素出现的次数，并排序。

例： ["a", "b", "g", "g", "b", "g", "l", "k", "k"]

返回： {'g': 3, 'b': 2, 'k': 2, 'a': 1, 'l': 1}

解1

不使用任何内置函数：

def count_list_1(data:list):
    res = {}
    for item in data:
        if item not in res.keys():
            res[item] = 1
        else:
            res[item] += 1
    return dict(sorted(res.items(), key=lambda i:i[1], reverse=True))

测试：

print(count_list_1(["a", "b", "g", "g", "b", "g", "l", "k", "k"]))

# 返回
{'g': 3, 'b': 2, 'k': 2, 'a': 1, 'l': 1}

解2

使用列表的 count 函数。

def count_list_2(data:list):
    res = {}
    for item in data:
        res[item] = data.count(item)
    return dict(sorted(res.items(), key=lambda i:i[1], reverse=True))

测试：

print(count_list_2(["a", "b", "g", "g", "b", "g", "l", "k", "k"]))

# 返回
{'g': 3, 'b': 2, 'k': 2, 'a': 1, 'l': 1}

解3

使用 collections.Counter

import collections

def count_list_3(data:list):
    return dict(sorted(collections.Counter(data).items(), key=lambda i: i[1], reverse=True))

测试：

print(count_list_3(["a", "b", "g", "g", "b", "g", "l", "k", "k"]))

# 返回
{'g': 3, 'b': 2, 'k': 2, 'a': 1, 'l': 1}

消耗时间对比

对代码进行改造，随机生成一个字符串列表。然后使用 cProfile 进行测试。

# 统计列表中的重复的元素的个数, 并对结果进行排序
# ["a", "b", "g", "g", "b", "g"] => {"a": 1, "b":2, "g": 3}
import collections
import random
import string


def count_list_1(data: list):
    res = {}
    for item in data:
        if item not in res.keys():
            res[item] = 1
        else:
            res[item] += 1
    return dict(sorted(res.items(), key=lambda i: i[1], reverse=True))


def count_list_2(data: list):
    res = {}
    for item in data:
        if item not in res.keys():
            res[item] = data.count(item)
    return dict(sorted(res.items(), key=lambda i: i[1], reverse=True))


def count_list_3(data: list):
    return dict(sorted(collections.Counter(data).items(), key=lambda i: i[1], reverse=True))


if __name__ == "__main__":
    data = [random.choice(string.ascii_letters) for i in range(10000)]
    count_list_1(data)
    count_list_2(data)
    count_list_3(data)

测试命令: python -m cProfile -s tottime test.py |grep count_list

列表长度为10000，测试结果：

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1    0.002    0.002    0.003    0.003 test.py:9(count_list_1)
  1    0.001    0.001    0.010    0.010 test.py:19(count_list_2)
  1    0.000    0.000    0.000    0.000 test.py:27(count_list_3)

列表长度为50000，测试结果：

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.010    0.010    0.013    0.013 test.py:9(count_list_1)
    1    0.006    0.006    0.044    0.044 test.py:19(count_list_2)
    1    0.000    0.000    0.002    0.002 test.py:27(count_list_3)

列表长度为100000，测试结果：

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.020    0.020    0.026    0.026 test.py:9(count_list_1)
    1    0.012    0.012    0.090    0.090 test.py:19(count_list_2)
    1    0.000    0.000    0.004    0.004 test.py:27(count_list_3)

由此可见， collections.Counter 性能是最好的, 列表中的 count 函数次之。

有兴趣的童鞋可以研究下 Counter 的源码是怎么写的。

Enjoy your code, good luck.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

菜鸟侦探挑战数据分析

[日] 石田基广 / 支鹏浩 / 人民邮电出版社 / 2017-1 / 42

本书以小说的形式展开，讲述了主人公俵太从大学文科专业毕业后进入征信所，从零开始学习数据分析的故事。书中以主人公就职的征信所所在的商业街为舞台，选取贴近生活的案例，将平均值、t检验、卡方检验、相关、回归分析、文本挖掘以及时间序列分析等数据分析的基础知识融入到了生动有趣的侦探故事中，讲解由浅入深、寓教于乐，没有深奥的理论和晦涩的术语，同时提供了大量实际数据，使用免费自由软件RStudio引领读者进一步......一起来看看《菜鸟侦探挑战数据分析》这本书的介绍吧!

码农工具

一道面试题：去重排序

题目

解1

解2

解3

消耗时间对比

菜鸟侦探挑战数据分析

RGB转16进制工具

Markdown 在线编辑器

RGB CMYK 转换工具