Bloomfilters debunked: Dispelling 30 Years of math with Coq

栏目: IT技术 · 发布时间: 4年前

内容简介：There's this rather nifty feature of modern web browsers (such asWhile

There's this rather nifty feature of modern web browsers (such as Firefox or Chrome ) where the browser will automatically warn the user if they happen to navigate to a " malicious " url:

While conceptually simple, this feature actually requires more engineering effort than one would expect - in particular, tracking the set of known malicious URLs in a practical manner turns out to be somewhat difficult.

This is because, on one hand, trying to store the database of all known malicious URLs, which, bear in mind, may contain millions and millions of entries, is something that is just practically infeasible for most users.

Bloomfilters debunked: Dispelling 30 Years of math with Coq

Conversely, sending every URL that a user visits to some external service, where it could be logged and data-mined by nefarious third parties (i.e Google), is something that most users should probably not be comfortable with.

As it turns out, browsers actually implement this functionality by means of a rather interesting compromise .

Using a probabilistic data structure known as a Bloomfilter , Browsers maintain a approximate representation of the set of known malicious URLs locally. By querying this space-efficient local set, browsers will only send up a small proportion of URLs that have a high likelihood of actually being malicious.

Use a Bloomfilter to act as a local proxy for queries to an external database .

This proxy technique, which safe-guards the privacy of millions of users ever day, depends on two key properties of Bloomfilters:

No false negatives: This states that if a query for an URL in the Bloomfilter returns negative, then the queried item can be guaranteed to not be present in the set of malicious URLs - i.e the Bloomfilter is guaranteed to return a positive result for all known malicious URLs .
Low false positive rate: This property states that for any URL that is not in the set of malicious URLs, the likelihood of a Bloomfilter query returning a positive result should be fairly low - thus minimising the number of unnecessary infractions on user privacy.

This mechanism works quite well, and the guarantees of a Bloomfilter seem to be perfectly suited for this particular task, however it has one small problem, and that is:

The widely cited expression for the false positive rate of a bloomfilter is wrong!

In fact, as it turns out, the behaviours of a Bloomfilter have actually been the subject of 30 years of mathematical contention, requiring multiple corrections and even corrections of these corrections .

Given this history of errors, can we really have any certainty in our understanding of a Bloomfilter at all?

Well, never fear , I am writing this post to inform you that we have just recently used Coq to produce the very first certified proof of the false positive rate of a Bloomfilter, finally putting an end to this saga of errors and returning certainty (pardon the pun) to a mechanism that countless people rely on every single day.

In the rest of this post, we'll explore the main highlights of this research, answering the following questions:

What is a Bloomfilter?
Why did these errors initially arise?
What is the true behaviour of a Bloomfilter? and how can we be certain?

This research was just recently presented at CAV2020 under the title "Certifying Certainty and Uncertainty in Approximate Membership Query Structures", you can find the paper here , and a video presentation of it here .

The code and proofs used in this research project are FOSS, and can be found on the Ceramist repo: https://github.com/certichain/ceramist

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Bloomfilters debunked: Dispelling 30 Years of math with Coq

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

高可用MySQL

)Charles Bell Mats Kindahl Lars Thalmann / 宁青、唐李洋诸云萍 / 电子工业出版社 / 2011-10 / 98.00元

《高可用mysql:构建健壮的数据中心》是“mysql high availability”的中文翻译版，主要讲解真实环境下如何使用mysql 的复制、集群和监控特性，揭示mysql 可靠性和高可用性的方方面面。本书由mysql 开发团队亲自执笔，定位于解决mysql 数据库的常见应用瓶颈，在保持mysql 的持续可用性的前提下，挖潜各种提高性能的解决方案。本书分为三个部分。第一部分讲述mysql......一起来看看《高可用MySQL》这本书的介绍吧!

码农工具

Bloomfilters debunked: Dispelling 30 Years of math with Coq

高可用MySQL

随机密码生成器

HTML 编码/解码

HEX HSV 转换工具