Memory in Javascript— beyond leaks

栏目: IT技术 · 发布时间: 5年前

内容简介：The way you create and access your data can affect your app’s performance. Let’s see how.JavaScript is a very high-level language. As such, most developers don’t tend to think a lot about how data is stored in memory. In this article, we will go over how d

The way you create and access your data can affect your app’s performance. Let’s see how.

Intro

JavaScript is a very high-level language. As such, most developers don’t tend to think a lot about how data is stored in memory. In this article, we will go over how data is stored in the memory, how it affects the CPU and memory and how the way you distribute and access your data in JS affects performance.

The Romantic Triangle

When the computer needs to do some computation, the processing unit (the CPU) needs data to process. So, according to the task at hand, it sends a request to the memory to fetch the data via the bus.

It looks like this:

So this is our romantic triangle -

CPU -> Bus -> Memory

Keeping a Healthy Relationship in a Trio is Hard…

The CPU is much faster than the memory. So this process of CPU -> BUS -> Memory -> BUS -> CPU “wastes” computation time. While memory is being looked up, the CPU sits idle and cannot do anything else.

In order to prevent idle time, a cache was added to the system. We won’t go into much detail about cache or cache types, but suffice to say that the cache is the CPU’s internal memory.

When the CPU gets orders to run, it first searches the cache for the data, and if the data isn’t there, it sends a request via the bus.

The bus, in turn, brings the requested data plus a contingent portion of memory and stores it in the cache for quick reference.

This way, the CPU will have fewer complaints about how slow the memory is and hence less CPU idle time.

Quarrels are a Part of Every Relationship

A problem that might arise — especially when we are dealing with processing massive amounts of data — is a phenomenon called “ cache miss” .

Cache missmeans that during a computation, the CPU finds out it doesn’t have the necessary data in the cache, and needs to request this data via the regular channels — you know, that sloooow memory.

An illustration of a cache miss. Data from an array is being processed, but data outside of cache limit is also requested for the computation — and this creates a “cache miss”

Ok — but I’m a JavaScript developer, why should I care?

Good question. In most cases, you wouldn’t. Today though, as more and more data is pouring into Node.js servers and even unfortunate rich clients , you have more chance of getting into a cache miss issue while iterating over large data sets.

I’ll believe it when I see it!!!

Fair enough. Let’s view an example.

Here’s a class called Boom .

The Boom class

This class ( Boom ) has just 3 properties — id , x, and y.

Now, let’s create a method that populates x and y.

Let’s build the setup:

Building the setup — 1000 times 1000 elements in the array

Now we will use this setup in a method:

Local access just does a 1000 times 1000 sets of the ‘x’ property to 0

What local access does is go over the array linearly and sets x to be 0.

If we repeat this function 100 times (look at the repeats constant in the setup), we can measure how long it takes to run:

The repeat function

The log output is this:

The Price to Pay for Missing Cache

Now, according to what we learned above, we can cause cache misses if we process data that is far away during the iteration. Far away data is data that is not in adjacent index. Here’s the function for this:

Iterate over elements that are far away from each other

What happens here is, that on every iteration we address the index that is ROWS away from the last iteration. So if ROWS is 1000 (as in our case), we get the following iteration: [0, 1000, 2000, … , 1, 1001, 2001, …].

Let’s add it to our speed test:

And here’s the end result:

The non-local iteration was almost 4 times slower. This difference will grow with bigger data. It happens because of the time that the CPU sits idle due to cache misses.

So what is the price you pay? It all depends on the size of your data.

Ok, I swear I will never do that!

You might not think so, but… there are cases in which you’d like to access an array with some logic that is not linear (e.g. 1,2,3,4,5) or not contingent (e.g. for (let i = 0; i < n; i+=1000) ).

For instance, you get data from a service or the DB, and need to process it sorted or filtered by some complex logic. This might cause the data to be accessed similarly to what was shown in the farAccess function.

Here’s an illustration that will (hopefully) make it clearer:

Looking at the image above, we see the data as it is stored in memory (top gray bar). Below, we see the array that was created when the data arrived from the server. Finally, we see the sorted array that holds references to objects stored in various positions in memory.

This way, iterating over the sorted array, can cause the multiple cache misses seen in the example above.

Note that this example is for a small array. Cache misses are relevant for much bigger data.

In a world where you need slick animations in the frontend and where you can be charged for every millisecond of CPU time in the backend (serverless or other) — this can become critical.

Oh no! All is lost!!!

No, not really.

There are various solutions to this problem, but now that you know the reason for this performance hit, you can probably think of a solution yourself. You just need to store data that is processed together closer in memory.

Such techniques are called the Data Locality design pattern .

Let’s continue our example. Assume that in our app the most common process is going over the data with the logic shown in the farAccess function. We’d like to optimize the data so that it will run fast in the most common for loop .

We will arrange the same data like this:

Sorting the data for a Data Locality optimization

So now, in diffArr , the objects that are in indices [0,1,2, …] in the original array, are now set like this [0,1000,2000, … , 1, 1001, 2001, …, 2, 1002, 2002, …]. The numbers denote the index of the object. This simulates sorting the array — which is one way to implement the Data Locality design pattern.

In order to easily test this, we will change our farAccess function a bit to get a custom array:

And now add the scenario to our test:

Run all the tests with a 2 seconds delay

We run this, and we get:

And voila! — we’ve optimized our data to fit the more common way we need to look at it.

The full example is live at https://yonatankra.com/performance/memory/liveExamples/objectPool2/index.html

Summary

In this article, we demonstrated the Data Locality design pattern. In essence, we showed that accessing a data structure in a way it is not optimized for, may cost us in performance. We then optimized the data to fit our way of processing it — and we saw how it improved performance.

The Data Locality design pattern used to be the domain of game developers, who had to deal with a lot of entities that are iterated over a lot of times. In the results above we see that data locality still matters even in high-level languages like JavaScript — and not just in gaming.

Today, with the amount of data being transferred between servers or even pushed into browsers, design patterns that used to be the day-to-day of game developers , should be considered by app developers — whether server-side or client-side.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

个体与交互

Ken Howard、Barry Rogers / 贾永娜、张凯峰 / 机械工业出版社华章公司 / 2012-3-20 / 45.00元

对敏捷软件开发的关注重点，通常都集中在“机制”方面，即过程和工具。“敏捷宣言”认为，个体与交互的价值要高于过程和工具，但这一点很容易被遗忘。在敏捷开发中，如果你重新将注意力放在人的方面，将会收获巨大利益。本书展示了如何解决敏捷团队在实际项目中遭遇的问题。同时，本书也是很有实用价值的敏捷用户指南，其中包含的故事、最佳实践方法、经验以及技巧均可应用到实际项目当中。通过逐步实践，你将学会如何让团......一起来看看《个体与交互》这本书的介绍吧!

码农工具