Scaling Software Development

Category: IT Technology · Published: 4 years ago


software engineering is programming integrated over time

This quote, from Titus Winters, expresses an important notion: that software engineering as a discipline must be concerned not just with programming at a point in time, but with programming over an extended period. One of the things that tends to happen to a codebase as time passes is that it scales. We often think of scale in terms of the amount of traffic to be served or the volume of data to be processed, and these parameters do often increase as a codebase ages. But more importantly, the codebase itself scales.

As a codebase ages, it tends to grow along several axes: the number of developers working on it, the number of lines of code, and the number of design requirements (of which traffic/data scale are an example). As Titus suggests, and as I've seen in my experience, all of these factors change what's required in order to develop software that meets its goals.

I would like to contribute one of my own observations of what the integration of programming over time means:

As the scale of a codebase increases, any properties of it which are not programmatically enforced will tend to regress.

A property of a program is any boolean statement or metric about the code or the resulting program. "Runs on a particular input without crashing" is a property, so is "adheres to our style-guide", as is "percent of lines of code covered by automated tests".

Programmatically enforced means that there is a software-driven process, which on some cadence (e.g. on every pull request, on every merge, or on a fixed schedule) ensures that a given property of the codebase is upheld.
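As a rough illustration of these two definitions, the sketch below models a property as a function from source text to a boolean, and enforcement as a routine that a CI job could run on every pull request. The names `max_line_length`, `no_tabs`, and `enforce` are hypothetical and chosen only for this example; real projects would typically rely on existing linters rather than hand-rolled checks.

```python
from typing import Callable, Dict, List

# A "property" is modeled here as a boolean statement about source text.
Property = Callable[[str], bool]


def max_line_length(limit: int) -> Property:
    """Build a property: every line is at most `limit` characters long."""
    def check(source: str) -> bool:
        return all(len(line) <= limit for line in source.splitlines())
    return check


def no_tabs(source: str) -> bool:
    """Property: the source contains no tab characters."""
    return "\t" not in source


def enforce(files: Dict[str, str], properties: Dict[str, Property]) -> List[str]:
    """Return one message per violation; an empty list means every property holds."""
    failures = []
    for path, source in sorted(files.items()):
        for name, prop in sorted(properties.items()):
            if not prop(source):
                failures.append(f"{path}: violates {name}")
    return failures
```

Under these assumptions, the "cadence" is simply when `enforce` gets called: a CI step could run it over the changed files and fail the build whenever the returned list is non-empty.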

I believe this observation is important because it connects our understanding of desirable properties of our code to scale. I believe the two are often discussed independently of one another, or that a particular scale is assumed without being stated, and that this leads to people talking past each other.

For example, the appeal of an automatic code formatter such as go fmt or black is not immediately obvious for a small development team. Its members may all have similar notions of code style, and their normal approach to development is capable of maintaining a consistently formatted codebase. However, as you grow to hundreds of developers and millions of lines of code, without automated tooling it becomes impossible to maintain consistent style.

Another desirable property is correctness. For a small program, manual testing as you make changes is often practical. However, as a project grows, remembering to perform all the manual tests becomes impractical, and automated test suites (whether unit or integration) generally become necessary to protect against regressions and ensure code works correctly in the first place.
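To make the shift from manual to automated testing concrete, here is a minimal sketch using Python's standard `unittest` module. The function `parse_version` is a hypothetical stand-in for any piece of project code; the point is only that each manual check someone once had to remember becomes a test that runs on every change.

```python
import unittest
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (hypothetical helper)."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {len(parts)}")
    return tuple(int(part) for part in parts)


class ParseVersionTests(unittest.TestCase):
    def test_parses_plain_version(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_ignores_surrounding_whitespace(self):
        # The kind of case that is easy to forget when testing by hand.
        self.assertEqual(parse_version(" 1.2.3\n"), (1, 2, 3))

    def test_rejects_short_version(self):
        with self.assertRaises(ValueError):
            parse_version("1.2")
```

Saved as a module, a suite like this can be run with `python -m unittest`, which is what lets a CI system execute it on every pull request instead of relying on anyone's memory.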

These two examples highlight a common mistake people make: believing that because they can maintain a property of their code through attention to detail at a small scale, attention to detail will also be sufficient at a larger scale.

As the field of software engineering has developed, our notions of programmatic checks have advanced. At first, developers were free to format their code however they pleased. Then we introduced descriptive (but unenforced) style guides such as PEP8 to guide individuals. Then we began programmatically enforcing adherence with tools such as pep8. And then we introduced auto-formatters such as go fmt -- and more recent auto-formatters (e.g. black or rustfmt) are even more aggressive.

Similarly, the growth of static type checkers for dynamically typed languages (e.g. sorbet and mypy) demonstrates the desire to programmatically enforce type safety in order to facilitate scale.
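A small sketch of what this looks like in practice, assuming a Python codebase checked with mypy. The functions `find_user_id` and `next_user_id` are hypothetical; the annotations let a checker reason about the `None` case without running the program.

```python
from typing import Dict, Optional


def find_user_id(name: str, directory: Dict[str, int]) -> Optional[int]:
    """Return the id for `name`, or None if the name is unknown."""
    return directory.get(name)


def next_user_id(name: str, directory: Dict[str, int]) -> int:
    user_id = find_user_id(name, directory)
    # Removing this check leaves `user_id` typed as Optional[int]; a checker
    # such as mypy is expected to reject `user_id + 1` in that case.
    if user_id is None:
        raise KeyError(name)
    return user_id + 1
```

On a small team, "remember that the lookup can return None" is a convention everyone knows. With the annotations in place and the checker run on every change, it becomes a property that no longer depends on anyone remembering it.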

All of this brings me to a property that is near and dear to my heart -- security, specifically memory safety. I've written many times about the problems I see with memory unsafe languages (principally C and C++). And I often get pushback of the form "I write C and it's fine, why aren't the Windows/Chrome/Android/iOS/Firefox developers more disciplined?" I believe scale is the key to understanding this question.

Even a significant project such as OpenSSL is dwarfed by the scale of a web browser or operating system. They have more developers, more lines of code, and more competing design requirements (e.g. performance, new features, customizability, security). The differences in scale along these axes can be significant: browsers usually have more than 100 commits per day and tens of millions of lines of code, while OpenSSL has single-digit commits per day and hundreds of thousands of lines of code. At the scale of a browser or operating system, discipline is empirically irrelevant; programmatic enforcement is the only thing capable of withstanding the deluge of new code and churn in existing code. These projects use automated code formatters, tests, performance measurements, and other tools to cope with the complexity this scale brings.

In the same way that code formatters and coverage measurement are introduced to projects to ensure those properties do not regress at scale, tooling needs to be introduced to ensure security does not regress in C and C++ codebases. Examples of such tools are the sanitizers and fuzzers. And indeed the very large projects I mentioned above all make extensive use of these. And yet they still struggle with getting a handle on their memory unsafety vulnerabilities. That is because none of these tools enforce the property these projects actually care about: that there are no security vulnerabilities due to memory unsafety. They instead enforce weaker properties, such as that no test case exhibits memory unsafety that ASAN is capable of catching. The much stronger property of absolute memory safety cannot be enforced over arbitrary C or C++. Hence safer languages such as Rust and Swift, where it can be enforced by disallowing unsafe.
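The gap between "no test case exhibits the problem" and "no input exhibits the problem" can be sketched with a loose analogy. Python is itself memory safe, so in the example below a silently short read stands in for an out-of-bounds access; `read_field` and its deliberate off-by-one bounds check are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple

BUF = b"abcdefgh"  # 8 bytes of sample data


def read_field(buf: bytes, offset: int, length: int) -> bytes:
    """Hypothetical helper: return exactly `length` bytes starting at `offset`."""
    # Deliberate off-by-one: the correct check is `offset + length > len(buf)`.
    if offset + length > len(buf) + 1:
        raise ValueError("field out of bounds")
    return buf[offset:offset + length]


def passes_handwritten_tests() -> bool:
    """The weaker property: a few chosen inputs behave correctly."""
    return (
        read_field(BUF, 0, 4) == b"abcd"
        and read_field(BUF, 4, 4) == b"efgh"
        and read_field(BUF, 2, 0) == b""
    )


def find_counterexample() -> Optional[Tuple[int, int]]:
    """The stronger property, checked exhaustively over a tiny input space."""
    for offset in range(len(BUF) + 1):
        for length in range(len(BUF) + 1):
            try:
                result = read_field(BUF, offset, length)
            except ValueError:
                continue  # rejecting a bad request is acceptable
            if len(result) != length:
                return (offset, length)
    return None
```

Here the hand-written tests all pass, while the exhaustive search finds an input (offset 1, length 8) that gets a short result instead of an error. Exhaustive search only works because the input space is tiny; for real C or C++ code it is not an option, which is why test-based tools can only establish the weaker property.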

My conclusion from this analysis is that I need to amend my critique: It is possible to write secure C/C++ code. It's just not possible to do it at scale. However, as Titus's quote reminds us, scale comes from time. All large codebases start out as small codebases. Therefore the only prudent thing to do is avoid memory unsafe languages entirely. And so long as you do have to maintain a C or C++ codebase, enforce as many safety properties programmatically as you can.

Hi, I'm Alex. I'm currently at a startup called Alloy. Before that I was an engineer working on Firefox security and before that at the U.S. Digital Service. I'm an avid open source contributor and live in Washington, DC.

