I write in .NET and am faced with the requirement of storing/retrieving a large number of string key-value pairs from disk. To minimize the total amount of RAM consumed by the application, I would read the key-value database by specific keys, process those entries, and then move on to other keys.
I currently use C# Dictionary and ConcurrentDictionary collections to hold data in memory that needs fast inserts/retrievals. However, this approach does not scale for larger applications. I have also tried databases like SQLite and SQL Server, where a simple table holds Key and Value columns (with the Key column set as a unique primary key), but fetching a single value for a key is generally slow, mainly due to the round-trip cost of running each query.
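For reference, the access pattern I mean looks roughly like the sketch below (a minimal reconstruction, assuming the Microsoft.Data.Sqlite provider and a hypothetical KeyValue(Key, Value) table; the driver and schema names are illustrative). Even with one open connection and a reused parameterized command, each lookup is still a full query, so the per-query round trip dominates:

using Microsoft.Data.Sqlite;

using var connection = new SqliteConnection("Data Source=kv.db");
connection.Open();
using var command = connection.CreateCommand();
command.CommandText = "SELECT Value FROM KeyValue WHERE Key = $key";
var keyParam = command.Parameters.Add("$key", SqliteType.Text);

// One round trip per key, even though the command itself is reused.
keyParam.Value = "12345";
var value = (string?)command.ExecuteScalar();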
In my research I came across ZoneTree and RocksDB, among other options, for holding key-value pairs.
My questions are the following:
- How can RocksDB or ZoneTree be used to hold string key-value pairs?
- How does their insert/fetch performance compare with in-memory collections like ConcurrentDictionary?
- Between RocksDB and ZoneTree, which one offers better memory utilization?
After some research I was able to answer my own question. Here is how I set up a benchmark that compares RocksDB and ZoneTree for storing key-value pair data, including their memory utilization and insert/fetch performance for ~10 million items.
Here is my code. It is a .NET 8 console application, run on an 11th Gen Intel Core i7 with 64 GB of RAM and an SSD (Dell Inspiron 14 5000 2-in-1):
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using RocksDbSharp;
using Tenray.ZoneTree;

namespace TestRocksDb
{
    internal class Program
    {
        public static int MaxItems = 10000000;

        static void Main(string[] args)
        {
            TestZoneTreePerformanceSingleThread();
            Console.WriteLine("--------");
            TestConcurrentDictionaryPerformanceSingleThread();
            Console.WriteLine("--------");
            TestRocksDBPerformanceSingleThread();
            Console.WriteLine("--------");
            TestConcurrentDictionaryPerformanceMultiTaskThread();
            Console.WriteLine("--------");
            TestRocksDBPerformanceMultiThread();
            Console.WriteLine("--------");
            Console.ReadLine();
        }
        public static void TestConcurrentDictionaryPerformanceSingleThread()
        {
            Console.WriteLine("Initiating trial for single-thread performance of ConcurrentDictionary");
            Stopwatch stopwatch = new Stopwatch();

            // Populate a ConcurrentDictionary with MaxItems Student entries
            var dict = new ConcurrentDictionary<string, Student>();
            var random = new Random();
            stopwatch.Restart();
            for (int i = 0; i < MaxItems; i++)
            {
                var student = new Student
                {
                    Id = i,
                    Name = $"Student{i}",
                    Age = random.Next(18, 25),
                    GPA = Math.Round(random.NextDouble() * 4.0, 2),
                    Address = $"Address {i}",
                    PhoneNumber = $"555-0100{i:D4}",
                    Email = $"student{i}@example.com",
                    EnrollmentDate = DateTime.Now.AddDays(-random.Next(1000)).ToString("yyyy-MM-dd")
                };
                dict.TryAdd(i.ToString(), student);
            }
            stopwatch.Stop();
            long totalInsertTime = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Time taken to insert {MaxItems} items: {totalInsertTime} ms");

            // Measure the time taken to read items back by key
            stopwatch.Restart();
            for (int i = 0; i < MaxItems; i++)
            {
                var value = dict[i.ToString()];
            }
            stopwatch.Stop();
            double totalReadTime = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Time taken to read {MaxItems} items: {totalReadTime} ms");

            // Additional metrics
            double avgInsertTime = Math.Round((totalInsertTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average insert time per item: {avgInsertTime} us");
            double avgReadTime = Math.Round((totalReadTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average read time per item: {avgReadTime} us");
            Console.WriteLine("Test Finished");
        }
        public static void TestRocksDBPerformanceSingleThread()
        {
            Console.WriteLine("Initiating trial for single-thread performance of RocksDB");
            bool insertAllowed = true;
            bool deleteEarlierRecords = true;
            // Directory that stores the RocksDB database
            string dbPath = "rocksdb";
            if (Directory.Exists(dbPath) && deleteEarlierRecords)
            {
                Console.WriteLine("Deleting earlier database directory");
                Directory.Delete(dbPath, true);
            }
            Stopwatch stopwatch = new Stopwatch();
            Console.WriteLine("Initiating RocksDB trial, Max Concurrent Tasks: 1, MaxItems: " + MaxItems);
            stopwatch.Start();
            var options = new DbOptions().SetCreateIfMissing(true);
            // Initialize RocksDB
            using (var db = RocksDb.Open(options, dbPath))
            {
                stopwatch.Stop();
                Console.WriteLine($"DB Initialization Time: {stopwatch.ElapsedMilliseconds} ms");
                long totalInsertTime = 0;
                if (insertAllowed)
                {
                    var random = new Random();
                    stopwatch.Restart();
                    for (int i = 0; i < MaxItems; i++)
                    {
                        var student = new Student
                        {
                            Id = i,
                            Name = $"Student{i}",
                            Age = random.Next(18, 25),
                            GPA = Math.Round(random.NextDouble() * 4.0, 2),
                            Address = $"Address {i}",
                            PhoneNumber = $"555-0100{i:D4}",
                            Email = $"student{i}@example.com",
                            EnrollmentDate = DateTime.Now.AddDays(-random.Next(1000)).ToString("yyyy-MM-dd")
                        };
                        string serializedStudent = JsonSerializer.Serialize(student);
                        db.Put(i.ToString(), serializedStudent);
                    }
                    stopwatch.Stop();
                    totalInsertTime = stopwatch.ElapsedMilliseconds;
                    Console.WriteLine($"Time taken to insert {MaxItems} items: {totalInsertTime} ms");
                }
                else Console.WriteLine("Insertion will be skipped.");

                // Measure the time taken to read items back by key
                stopwatch.Restart();
                for (int i = 0; i < MaxItems; i++)
                {
                    string value = db.Get(i.ToString());
                    var student = JsonSerializer.Deserialize<Student>(value);
                }
                stopwatch.Stop();
                long totalReadTime = stopwatch.ElapsedMilliseconds;
                Console.WriteLine($"Time taken to read {MaxItems} items: {totalReadTime} ms");

                // Additional metrics
                double avgInsertTime = Math.Round((totalInsertTime / (double)MaxItems) * 1000, 1);
                Console.WriteLine($"Average insert time per item: {avgInsertTime} us");
                double avgReadTime = Math.Round((totalReadTime / (double)MaxItems) * 1000, 1);
                Console.WriteLine($"Average read time per item: {avgReadTime} us");
                Console.WriteLine("Execution Finished");
            }
        }
        public static void TestZoneTreePerformanceSingleThread()
        {
            Console.WriteLine("Initiating trial for single-thread performance of ZoneTree");
            bool insertAllowed = true;
            bool deleteEarlierRecords = true;
            // Directory that stores the ZoneTree database
            string dbPath = "zonetree";
            if (Directory.Exists(dbPath) && deleteEarlierRecords)
            {
                Console.WriteLine("Deleting earlier database directory");
                Directory.Delete(dbPath, true);
            }
            Stopwatch stopwatch = new Stopwatch();
            Console.WriteLine("Initiating ZoneTree trial, Max Concurrent Tasks: 1, MaxItems: " + MaxItems);
            stopwatch.Start();
            // Initialize ZoneTree and its background maintainer
            using var zoneTree = new ZoneTreeFactory<string, string>()
                .SetDataDirectory(dbPath)
                .OpenOrCreate();
            using var maintainer = zoneTree.CreateMaintainer();
            stopwatch.Stop();
            Console.WriteLine($"DB Initialization Time: {stopwatch.ElapsedMilliseconds} ms");
            long totalInsertTime = 0;
            if (insertAllowed)
            {
                var random = new Random();
                stopwatch.Restart();
                for (int i = 0; i < MaxItems; i++)
                {
                    var student = new Student
                    {
                        Id = i,
                        Name = $"Student{i}",
                        Age = random.Next(18, 25),
                        GPA = Math.Round(random.NextDouble() * 4.0, 2),
                        Address = $"Address {i}",
                        PhoneNumber = $"555-0100{i:D4}",
                        Email = $"student{i}@example.com",
                        EnrollmentDate = DateTime.Now.AddDays(-random.Next(1000)).ToString("yyyy-MM-dd")
                    };
                    string serializedStudent = JsonSerializer.Serialize(student);
                    zoneTree.Upsert(i.ToString(), serializedStudent);
                }
                // Wait for the maintainer's background merge/flush work to finish
                maintainer.CompleteRunningTasks();
                stopwatch.Stop();
                totalInsertTime = stopwatch.ElapsedMilliseconds;
                Console.WriteLine($"Time taken to insert {MaxItems} items: {totalInsertTime} ms");
            }
            else Console.WriteLine("Insertion will be skipped.");

            // Measure the time taken to read items back by key
            stopwatch.Restart();
            for (int i = 0; i < MaxItems; i++)
            {
                zoneTree.TryGet(i.ToString(), out string value);
                var student = JsonSerializer.Deserialize<Student>(value);
            }
            stopwatch.Stop();
            long totalReadTime = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Time taken to read {MaxItems} items: {totalReadTime} ms");

            // Additional metrics
            double avgInsertTime = Math.Round((totalInsertTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average insert time per item: {avgInsertTime} us");
            double avgReadTime = Math.Round((totalReadTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average read time per item: {avgReadTime} us");
            Console.WriteLine("Execution Finished");
        }
        public static void TestConcurrentDictionaryPerformanceMultiTaskThread()
        {
            Console.WriteLine("Initiating trial for multi-task read performance of ConcurrentDictionary");
            Stopwatch stopwatch = new Stopwatch();

            // Populate a ConcurrentDictionary with MaxItems Student entries
            var dict = new ConcurrentDictionary<string, Student>();
            var random = new Random();
            stopwatch.Restart();
            for (int i = 0; i < MaxItems; i++)
            {
                var student = new Student
                {
                    Id = i,
                    Name = $"Student{i}",
                    Age = random.Next(18, 25),
                    GPA = Math.Round(random.NextDouble() * 4.0, 2),
                    Address = $"Address {i}",
                    PhoneNumber = $"555-0100{i:D4}",
                    Email = $"student{i}@example.com",
                    EnrollmentDate = DateTime.Now.AddDays(-random.Next(1000)).ToString("yyyy-MM-dd")
                };
                dict.TryAdd(i.ToString(), student);
            }
            stopwatch.Stop();
            long totalInsertTime = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Time taken to insert {MaxItems} items: {totalInsertTime} ms");

            // Measure the time taken to read items in parallel.
            // Note: this enumerates the dictionary's entries rather than
            // performing keyed lookups, unlike the RocksDB multi-task test.
            stopwatch.Restart();
            var options = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount // Adjust as needed
            };
            // Parallel.ForEach blocks until all iterations have completed
            Parallel.ForEach(dict, options, item =>
            {
                var studentID = item.Value.Id;
                var gpa = item.Value.GPA;
            });
            stopwatch.Stop();
            double totalReadTime = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Time taken to read {MaxItems} items: {totalReadTime} ms");

            // Additional metrics
            double avgInsertTime = Math.Round((totalInsertTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average insert time per item: {avgInsertTime} us");
            double avgReadTime = Math.Round((totalReadTime / (double)MaxItems) * 1000, 1);
            Console.WriteLine($"Average read time per item: {avgReadTime} us");
            Console.WriteLine("Test Finished");
        }
        public static void TestRocksDBPerformanceMultiThread()
        {
            Console.WriteLine("Initiating trial for multi-task read performance of RocksDB");
            bool insertAllowed = true;
            bool deleteEarlierRecords = true;
            // Directory that stores the RocksDB database
            string dbPath = "rocksdb";
            if (Directory.Exists(dbPath) && deleteEarlierRecords)
            {
                Console.WriteLine("Deleting earlier database directory");
                Directory.Delete(dbPath, true);
            }
            Stopwatch stopwatch = new Stopwatch();
            Console.WriteLine("Initiating RocksDB trial, Max Concurrent Tasks: " + Environment.ProcessorCount + ", MaxItems: " + MaxItems);
            stopwatch.Start();
            var options = new DbOptions().SetCreateIfMissing(true);
            // Initialize RocksDB
            using (var db = RocksDb.Open(options, dbPath))
            {
                stopwatch.Stop();
                Console.WriteLine($"DB Initialization Time: {stopwatch.ElapsedMilliseconds} ms");
                long totalInsertTime = 0;
                if (insertAllowed)
                {
                    var random = new Random();
                    stopwatch.Restart();
                    for (int i = 0; i < MaxItems; i++)
                    {
                        var student = new Student
                        {
                            Id = i,
                            Name = $"Student{i}",
                            Age = random.Next(18, 25),
                            GPA = Math.Round(random.NextDouble() * 4.0, 2),
                            Address = $"Address {i}",
                            PhoneNumber = $"555-0100{i:D4}",
                            Email = $"student{i}@example.com",
                            EnrollmentDate = DateTime.Now.AddDays(-random.Next(1000)).ToString("yyyy-MM-dd")
                        };
                        string serializedStudent = JsonSerializer.Serialize(student);
                        db.Put(i.ToString(), serializedStudent);
                    }
                    stopwatch.Stop();
                    totalInsertTime = stopwatch.ElapsedMilliseconds;
                    Console.WriteLine($"Time taken to insert {MaxItems} items: {totalInsertTime} ms");
                }
                else Console.WriteLine("Insertion will be skipped.");

                // Measure the time taken to read items back by key in parallel
                stopwatch.Restart();
                var pfOptions = new ParallelOptions
                {
                    MaxDegreeOfParallelism = Environment.ProcessorCount // Adjust as needed
                };
                // Parallel.For blocks until all iterations have completed
                Parallel.For(0, MaxItems, pfOptions, i =>
                {
                    string value = db.Get(i.ToString());
                    var student = JsonSerializer.Deserialize<Student>(value);
                    var studentID = student.Id;
                    var gpa = student.GPA;
                });
                stopwatch.Stop();
                long totalReadTime = stopwatch.ElapsedMilliseconds;
                Console.WriteLine($"Time taken to read {MaxItems} items: {totalReadTime} ms");

                // Additional metrics
                double avgInsertTime = Math.Round((totalInsertTime / (double)MaxItems) * 1000, 1);
                Console.WriteLine($"Average insert time per item: {avgInsertTime} us");
                double avgReadTime = Math.Round((totalReadTime / (double)MaxItems) * 1000, 1);
                Console.WriteLine($"Average read time per item: {avgReadTime} us");
                Console.WriteLine("Execution Finished");
            }
        }
        public class Student
        {
            public int Id { get; set; }
            public string Name { get; set; }
            public int Age { get; set; }
            public double GPA { get; set; }
            public string Address { get; set; }
            public string PhoneNumber { get; set; }
            public string Email { get; set; }
            public string EnrollmentDate { get; set; }
        }
    }
}
Performance results for the test run (~10 million items):

| Test | DB init (ms) | Insert 10M (ms) | Read 10M (ms) | Avg insert (us/item) | Avg read (us/item) |
|---|---|---|---|---|---|
| ZoneTree, single thread | 136 | 67,030 | 41,052 | 6.7 | 4.1 |
| ConcurrentDictionary, single thread | n/a | 21,166 | 1,658 | 2.1 | 0.2 |
| RocksDB, single thread | 748 | 65,248 | 45,764 | 6.5 | 4.6 |
| ConcurrentDictionary, multi-task read | n/a | 20,733 | 1,322 | 2.1 | 0.1 |
| RocksDB, multi-task read | 32 | 74,728 | 38,001 | 7.5 | 3.8 |

The multi-task reads used MaxDegreeOfParallelism = Environment.ProcessorCount; inserts were single-threaded in every test.
Here is my conclusion:
ConcurrentDictionary offered very fast read performance and comparable write performance. I included it only as a reference point; I realize that comparing a disk-based persistent store with an in-memory collection isn't an entirely fair comparison.
While ZoneTree offered slightly faster single-threaded performance than RocksDB, I noticed that peak memory usage during the trial was about 5.3 GB for both ZoneTree and ConcurrentDictionary, which means that at some point all values were actually sitting in memory while ZoneTree's maintainer did its job. This is a clear disadvantage for ZoneTree, since optimal memory utilization is the reason to turn to a persistent key-value store in the first place.
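If the spike comes from values accumulating in ZoneTree's in-memory mutable segment before they are moved to disk, capping the segment size should bound it. Below is a minimal sketch, assuming the factory's SetMutableSegmentMaxItemCount option works as documented for Tenray.ZoneTree (I have not re-run the benchmark with it; verify against your version):

using var zoneTree = new ZoneTreeFactory<string, string>()
    .SetDataDirectory("zonetree")
    // Assumption: move the in-memory segment to disk after ~100k items
    // instead of letting all values accumulate in RAM.
    .SetMutableSegmentMaxItemCount(100_000)
    .OpenOrCreate();
using var maintainer = zoneTree.CreateMaintainer();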
As for RocksDB, peak memory utilization stayed below 130 MB, the database did not use more than 550 MB of disk space, and reads/writes averaged under 10 us per item. Additionally, RocksDB seems to handle concurrency well: while a multi-threaded application was not able to extract much higher performance, the built-in protection against concurrent access (within the same process, at least) is welcome.
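RocksDB's footprint can also be tuned explicitly. The sketch below shows the usual knobs (memtable size and an LRU block cache) as exposed by RocksDbSharp; I did not use these in the benchmark above, and the exact method names should be checked against the RocksDbSharp version in use:

// Assumption: these setters exist as in current RocksDbSharp releases.
var tunedOptions = new DbOptions()
    .SetCreateIfMissing(true)
    // Smaller memtables flush to disk sooner, trading some write
    // throughput for a lower peak RAM footprint.
    .SetWriteBufferSize(32 * 1024 * 1024)
    .SetMaxWriteBufferNumber(2);
// Cap the read-side block cache at 64 MB with an LRU cache.
var tableOptions = new BlockBasedTableOptions()
    .SetBlockCache(Cache.CreateLru(64 * 1024 * 1024));
tunedOptions.SetBlockBasedTableFactory(tableOptions);
using var db = RocksDb.Open(tunedOptions, "rocksdb");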