Generating globally unique mostly-sequential keys for DB rows across multiple processes

栏目: IT技术 · 发布时间: 4年前

内容简介：I wanted my domain classes to be able to reference each other by ID without having to add association properties - as I felt having all of this navigation in the model made it messy. I'd much rather have association properties within my aggregates (Purchas

I wanted my domain classes to be able to reference each other by ID without having to add association properties - as I felt having all of this navigation in the model made it messy. I'd much rather have association properties within my aggregates (PurchaseOrder to PurchaseOrderLines) but not elsewhere.

This was okay when I needed to add a reference to an object that already existed in the DB, but when I was adding a reference to a new object I needed to know the objects Id before Entity Framework Core saved it to the DB.

As my tables were using DB assigned incrementing INT columns this was not possible, so it become obvious I was going to have to switch to GUID Ids. The problem with using GUIDs as clustered primary keys is that GUIDs are designed to be random. Whereas an auto-incrementing INT key will always result in new rows being appended to the end of a table, GUID keys would result in new rows being inserted at random locations in the table (which is slower).

As we have a CreatedUtc column on each of our tables we could have made those our clustered index to ensure new rows are appended at least near the end of the table (if not at the very end), however, as these values will be used in foreign key indexes it means we have the same problem for inserting GUIDs into an index instead of always appending.

What I needed was some kind of sequential GUID. Microsoft has provided the SequentialGuidValueGenerator class for this kind of thing.

The Microsoft class creates a GUID, keeps the 8 most random bytes, and then assigns a sequential number to the remaining 8 bytes (these are the bytes that SQL Server uses for sorting GUIDs). Using this approach we are able to ensure we always append to the end of the table, and the 8 random bytes ensure we don't get key clashes.

The SequentialGuidValueGenerator class starts its first sequential number on UtcNow , and then simply increments the value each time we need a new key. This is fine in a single process scenario, but when we are running our app on Azure with multiple processes running it means that the sequential value of some of the processes can be significantly lagged behind the others.

For example, if we go through a period of time where we are receiving a single job to perform at a time then we might have a single process used more often than the others. The other Azure instances either sit idle or get shut down. Then, when things get busy and we need multiple processed then either

The running (but idle) processes find themselves in demand, and continue to generate sequential numbers based on the last time they were busy.
Azure starts up new processes to handle the extra load, their sequential numbers are based on the current time whereas the already running processes' sequential numbers are based on some point in the past.

Either way, if all of our processes aren't started at approximately the same time or they don't get allocated jobs on a purely round-robin basis (each process gets exactly the same number of jobs) then we end up having multiple processes inserting into different parts of the table because their sequential numbers are based on different historical times.

So, as I am developing an Azure system, I needed an alternative.

Requirements

I want GUID length unique identifiers at runtime *before* I update the DB so I can add references to other rows that have not yet been inserted.
I need the value to be unique not only within the current process, but across processes.
The last 8 bytes need to be sequential so that new rows on SQL Server tables with clustered primary indexes are always added to (or near to) then end of the table*
The last 8 bytes must be fairly close in range across processes so we don't end up with one processes appending new rows to the end of the table and another inserting them in the middle.

*SQL Server sorts GUIDs on the last 6 or 8 bytes.

My proposed format is

(8 crypto secure random bytes to identify the current process)

then

(8 bytes sequential time that must always be unique within the current process).

Resulting in a unique Id that is not only sequential, but where across processes will be based on approximately the same time and therefore mostly be appened to the end of tables, and also to the end of indexes.

// Gets the current DateTime.UtcNow.Ticks and adds
// an offset so we never get the same value twice
// If the resulting ticks are before UtcNow.Ticks
// then we jump the values forward so we don't return a
// value that would cause inserts to the middle of the table.
// In a single process this would ensure
// the value is always added to the end of the table.
public static class LongGeneratorPM
{
  private static long Offset = Consts.GetTime().Ticks;

  public static long Next()
  {
    long result;
    long ticksNow = Consts.GetTime().Ticks;
    do
    {
      result = Interlocked.Increment(ref Offset);
      if (result >= ticksNow)
        return result;
      Interlocked.CompareExchange(ref Offset, ticksNow, result);
    } while (true);
  }
}


// Creates a one-off 8 byte crypto secure random
// number as the process ID when the app starts
// and then appends the time based sequence number
// as the next 8 bytes to give us a sequential
// value this is statistically globally unique
public static class SequentialGuidGeneratorPMRng
{
  private static byte[] ProcessId = new byte[8];

  static SequentialGuidGeneratorPMRng()
  {
    using (var rng = new RNGCryptoServiceProvider())
    {
      rng.GetBytes(ProcessId);
    }
  }

  public static Guid Next()
  {
    byte[] result = new byte[16];
    byte[] sequentialTimeStamp =
   BitConverter.GetBytes(LongGeneratorPM.Next());
    if (!BitConverter.IsLittleEndian)
      Array.Reverse(sequentialTimeStamp);

    result[0] = ProcessId[0];
    result[1] = ProcessId[1];
    result[2] = ProcessId[2];
    result[3] = ProcessId[3];
    result[4] = ProcessId[4];
    result[5] = ProcessId[5];
    result[6] = ProcessId[6];
    result[7] = ProcessId[7];
    result[8] = sequentialTimeStamp[1];
    result[9] = sequentialTimeStamp[0];
    result[10] = sequentialTimeStamp[7];
    result[11] = sequentialTimeStamp[6];
    result[12] = sequentialTimeStamp[5];
    result[13] = sequentialTimeStamp[4];
    result[14] = sequentialTimeStamp[3];
    result[15] = sequentialTimeStamp[2];
    return new Guid(result);
  }
}

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Generating globally unique mostly-sequential keys for DB rows across multiple processes

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

深入浅出HTML5编程

弗里曼 (Eric Friiman)、罗布森 (Elisabdth Robson) / 东南大学出版社 / 2012-4 / 98.00元

《深入浅出HTML5编程(影印版)(英文)》就是你的特快车票，它可以带你学习如何使用今天的标准同时也会是明日的最佳实践来搭建Web应用。同时，你会了解HTML5的新API的基本知识，甚至你还会弄明白这些API是如何与你的网页进行交互，JaVaScript如何为它们提供动力，以及你如何使用它们来搭建能够打动你的老板并且吸引你的朋友的Web应用。一起来看看《深入浅出HTML5编程》这本书的介绍吧!

码农工具

RGB HSV 转换

RGB HSV 互转工具

RGB CMYK 转换工具

RGB CMYK 互转工具