Generating globally unique mostly-sequential keys for DB rows across multiple processes

栏目: IT技术 · 发布时间: 3年前

内容简介:I wanted my domain classes to be able to reference each other by ID without having to add association properties - as I felt having all of this navigation in the model made it messy. I'd much rather have association properties within my aggregates (Purchas

I wanted my domain classes to be able to reference each other by ID without having to add association properties - as I felt having all of this navigation in the model made it messy. I'd much rather have association properties within my aggregates (PurchaseOrder to PurchaseOrderLines) but not elsewhere.

This was okay when I needed to add a reference to an object that already existed in the DB, but when I was adding a reference to a new object I needed to know the objects Id before Entity Framework Core saved it to the DB.

As my tables were using DB assigned incrementing INT columns this was not possible, so it become obvious I was going to have to switch to GUID Ids. The problem with using GUIDs as clustered primary keys is that GUIDs are designed to be random. Whereas an auto-incrementing INT key will always result in new rows being appended to the end of a table, GUID keys would result in new rows being inserted at random locations in the table (which is slower).

As we have a CreatedUtc column on each of our tables we could have made those our clustered index to ensure new rows are appended at least near the end of the table (if not at the very end), however, as these values will be used in foreign key indexes it means we have the same problem for inserting GUIDs into an index instead of always appending.

What I needed was some kind of sequential GUID. Microsoft has provided the SequentialGuidValueGenerator class for this kind of thing.

The Microsoft class creates a GUID, keeps the 8 most random bytes, and then assigns a sequential number to the remaining 8 bytes (these are the bytes that SQL Server uses for sorting GUIDs). Using this approach we are able to ensure we always append to the end of the table, and the 8 random bytes ensure we don't get key clashes.

The SequentialGuidValueGenerator class starts its first sequential number on UtcNow , and then simply increments the value each time we need a new key. This is fine in a single process scenario, but when we are running our app on Azure with multiple processes running it means that the sequential value of some of the processes can be significantly lagged behind the others.

For example, if we go through a period of time where we are receiving a single job to perform at a time then we might have a single process used more often than the others. The other Azure instances either sit idle or get shut down. Then, when things get busy and we need multiple processed then either

  1. The running (but idle) processes find themselves in demand, and continue to generate sequential numbers based on the last time they were busy.
  2. Azure starts up new processes to handle the extra load, their sequential numbers are based on the current time whereas the already running processes' sequential numbers are based on some point in the past.

Either way, if all of our processes aren't started at approximately the same time or they don't get allocated jobs on a purely round-robin basis (each process gets exactly the same number of jobs) then we end up having multiple processes inserting into different parts of the table because their sequential numbers are based on different historical times.

So, as I am developing an Azure system, I needed an alternative.

Requirements

  1. I want GUID length unique identifiers at runtime *before* I update the DB so I can add references to other rows that have not yet been inserted.
  2. I need the value to be unique not only within the current process, but across processes.
  3. The last 8 bytes need to be sequential so that new rows on SQL Server tables with clustered primary indexes are always added to (or near to) then end of the table*
  4. The last 8 bytes must be fairly close in range across processes so we don't end up with one processes appending new rows to the end of the table and another inserting them in the middle.

*SQL Server sorts GUIDs on the last 6 or 8 bytes.

My proposed format is

(8 crypto secure random bytes to identify the current process)

then

(8 bytes sequential time that must always be unique within the current process).

Resulting in a unique Id that is not only sequential, but where across processes will be based on approximately the same time and therefore mostly be appened to the end of tables, and also to the end of  indexes.

// Gets the current DateTime.UtcNow.Ticks and adds
// an offset so we never get the same value twice
// If the resulting ticks are before UtcNow.Ticks
// then we jump the values forward so we don't return a
// value that would cause inserts to the middle of the table.
// In a single process this would ensure
// the value is always added to the end of the table.
public static class LongGeneratorPM
{
  private static long Offset = Consts.GetTime().Ticks;

  public static long Next()
  {
    long result;
    long ticksNow = Consts.GetTime().Ticks;
    do
    {
      result = Interlocked.Increment(ref Offset);
      if (result >= ticksNow)
        return result;
      Interlocked.CompareExchange(ref Offset, ticksNow, result);
    } while (true);
  }
}


// Creates a one-off 8 byte crypto secure random
// number as the process ID when the app starts
// and then appends the time based sequence number
// as the next 8 bytes to give us a sequential
// value this is statistically globally unique
public static class SequentialGuidGeneratorPMRng
{
  private static byte[] ProcessId = new byte[8];

  static SequentialGuidGeneratorPMRng()
  {
    using (var rng = new RNGCryptoServiceProvider())
    {
      rng.GetBytes(ProcessId);
    }
  }

  public static Guid Next()
  {
    byte[] result = new byte[16];
    byte[] sequentialTimeStamp =
   BitConverter.GetBytes(LongGeneratorPM.Next());
    if (!BitConverter.IsLittleEndian)
      Array.Reverse(sequentialTimeStamp);

    result[0] = ProcessId[0];
    result[1] = ProcessId[1];
    result[2] = ProcessId[2];
    result[3] = ProcessId[3];
    result[4] = ProcessId[4];
    result[5] = ProcessId[5];
    result[6] = ProcessId[6];
    result[7] = ProcessId[7];
    result[8] = sequentialTimeStamp[1];
    result[9] = sequentialTimeStamp[0];
    result[10] = sequentialTimeStamp[7];
    result[11] = sequentialTimeStamp[6];
    result[12] = sequentialTimeStamp[5];
    result[13] = sequentialTimeStamp[4];
    result[14] = sequentialTimeStamp[3];
    result[15] = sequentialTimeStamp[2];
    return new Guid(result);
  }
}

以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

C程序设计

C程序设计

谭浩强 / 清华大学出版社 / 2005-7-1 / 26.00元

《C程序设计(第3版)》作者具有丰富的教学经验和编写教材的经验,并针对初学者的特点,精心策划、准确定位,使得《C程序设计(第3版)》概念清晰、例题丰富、深入浅出,受到专家和读者的一致好评。《C程序设计(第3版)》被普遍认为是学习C语言的好教材,并被全国大多数高校选用。十多年来《C程序设计(第3版)》累计发行了800多万册,刨同类书的全国最高记录,是学习C语言的主流用书。《C程序设计(第3版)》曾荣......一起来看看 《C程序设计》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

URL 编码/解码
URL 编码/解码

URL 编码/解码

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器