Collection Lookups

FindInCollection

Yesterday, I was discussing a method with a co-worker where I suggested we loop through a collection of records and, for each record, do another retrieval-by-ID via LINQ. He brought up that this would probably be done more efficiently by creating a dictionary before the loop and retrieving from the dictionary instead of repeatedly executing the LINQ query. So I decided to do some research.

Firstly, I learned about two new LINQ methods: ToDictionary and ToLookup. Lookups and dictionaries serve a similar purpose, but the primary distinction is that a lookup will allow duplicate keys. Check out this article for a quick comparison of the two structures.

With my new tools in hand, I wanted to compare the performance. I first came up with a test. I created a collection of simple objects that had an ID and then looped through and retrieved each item by ID. Here’s what the test looks like:

void Main()
{
	var iterations = 10000;
	var list = new List<Human>();
	for (int i = 0; i < iterations; i++)
	{
		list.Add(new Human(i));
	}
	
	var timesToAvg = 100;
	
	Console.WriteLine("Avg of .Where search: {0} ms", 
		AverageIt((l, i) => TestWhere(l, i), list, iterations, timesToAvg));
	
	Console.WriteLine("Avg of for-built Dictionary search: {0} ms", 
		AverageIt((l, i) => TestDictionary(l, i), list, iterations, timesToAvg));
		
	Console.WriteLine("Avg of LINQ-built Dictionary search: {0} ms", 
		AverageIt((l, i) => TestToDictionary(l, i), list, iterations, timesToAvg));
		
	Console.WriteLine("Avg of Lookup search: {0} ms", 
		AverageIt((l, i) => TestLookup(l, i), list, iterations, timesToAvg));
}

decimal AverageIt(Action<List<Human>, int> action, List<Human> list, int iterations, int timesToAvg)
{
	var sw = new Stopwatch();
	
	decimal sum = 0;
	for (int i = 0; i < timesToAvg; i++)
	{
		sw.Reset();
		sw.Start();
		action(list, iterations);
		sw.Stop();
		sum += sw.ElapsedMilliseconds;
	}
	return sum / timesToAvg;
}

class Human
{
	public int id;
	
	public Human(int id)
	{
		this.id = id;
	}
}

Then, I wrote a method for each algorithm I wanted to test: using .Where, using a manually-built dictionary, using a ToDictionary-built dictionary, and using a lookup. Here are the methods I wrote for each of the algorithms:

void TestWhere(List<Human> list, int iterations)
{	
	for (int i = 0; i < iterations; i++)
	{
		var h = list.Where(x => x.id == i).FirstOrDefault();
	}
}

void TestDictionary(List<Human> list, int iterations)
{
	var dict = new Dictionary<int, Human>();
	foreach (var h in list)
	{
		dict.Add(h.id, h);
	}
	for (int i = 0; i < iterations; i++)
	{
		var h = dict[i];
	}
}

void TestToDictionary(List<Human> list, int iterations)
{
	var dict = list.ToDictionary(x => x.id);
	for (int i = 0; i < iterations; i++)
	{
		var h = dict[i];
	}
}

void TestLookup(List<Human> list, int iterations)
{
	var lookup = list.ToLookup(
		x => x.id,
		x => x);
	for (int i = 0; i < iterations; i++)
	{
		var h = lookup[i];
	}
}

Here are the results:

Avg of .Where search: 987.89 ms
Avg of for-built Dictionary search: 1.85 ms
Avg of LINQ-built Dictionary search: 1.67 ms
Avg of Lookup search: 2.14 ms

I would say that the results are what I expected in terms of what performed best. I was surprised by just how poorly the .Where queries performed, though–it was awful! One note about the manually-built dictionary versus the one produced by LINQ’s ToDictionary method: in repeated tests, the better performing method was inconsistent, leading me to believe that there is no significant benefit or disadvantage to using one or the other. I’ll likely stick with ToDictionary in the future due to its brevity, though.

These results seem to prove that a dictionary is optimal for lookups when key uniqueness is guaranteed. If the key is not unique or its uniqueness is questionable, a lookup should be used instead. Never do what I wanted to do, though, and use a .Where as an inner-loop lookup retrieval mechanism.

12/10/2012 Update:
A co-worker pointed out that I don’t need to chain Where and FirstOrDefault. Instead, I can just use FirstOrDefault with a lambda. So I added this to the test app to see how it compared. Surprisingly, this seems to consistently run slower than using Where in conjunction with FirstOrDefault!

void TestFirstOrDefault(List<Human> list, int iterations)
{	
	for (int i = 0; i < iterations; i++)
	{
		var h = list.FirstOrDefault(x => x.id == i);
	}
}

We also agreed that there should be a for-each loop as a base comparison, so I added that as well.

void TestForEach(List<Human> list, int iterations)
{
	for (int i = 0; i < iterations; i++)
	{
		foreach (var x in list)
		{
			if (i == x.id)
			{
				break;
			}
		}
	}
}

Here are the full results with the two new algorithms:

Avg of ForEach search: 741.05 ms
Avg of .Where search: 980.13 ms
Avg of .FirstOrDefault search: 1189.01 ms
Avg of for-built Dictionary search: 1.57 ms
Avg of LINQ-built Dictionary search: 1.57 ms
Avg of Lookup search: 1.74 ms

**********
Complete code:

void Main()
{
	var iterations = 10000;
	var list = new List<Human>();
	for (int i = 0; i < iterations; i++)
	{
		list.Add(new Human(i));
	}
	
	var timesToAvg = 100;
	
	Console.WriteLine("Avg of ForEach search: {0} ms", 
		AverageIt((l, i) => TestForEach(l, i), list, iterations, timesToAvg));
	
	Console.WriteLine("Avg of .Where search: {0} ms", 
		AverageIt((l, i) => TestWhere(l, i), list, iterations, timesToAvg));
		
	Console.WriteLine("Avg of .FirstOrDefault search: {0} ms", 
		AverageIt((l, i) => TestFirstOrDefault(l, i), list, iterations, timesToAvg));
	
	Console.WriteLine("Avg of for-built Dictionary search: {0} ms", 
		AverageIt((l, i) => TestDictionary(l, i), list, iterations, timesToAvg));
		
	Console.WriteLine("Avg of LINQ-built Dictionary search: {0} ms", 
		AverageIt((l, i) => TestToDictionary(l, i), list, iterations, timesToAvg));
		
	Console.WriteLine("Avg of Lookup search: {0} ms", 
		AverageIt((l, i) => TestLookup(l, i), list, iterations, timesToAvg));
}

decimal AverageIt(Action<List<Human>, int> action, List<Human> list, int iterations, int timesToAvg)
{
	var sw = new Stopwatch();
	
	decimal sum = 0;
	for (int i = 0; i < timesToAvg; i++)
	{
		sw.Reset();
		sw.Start();
		action(list, iterations);
		sw.Stop();
		sum += sw.ElapsedMilliseconds;
	}
	return sum / timesToAvg;
}

class Human
{
	public int id;
	
	public Human(int id)
	{
		this.id = id;
	}
}

void TestForEach(List<Human> list, int iterations)
{
	for (int i = 0; i < iterations; i++)
	{
		foreach (var x in list)
		{
			if (i == x.id)
			{
				break;
			}
		}
	}
}

void TestWhere(List<Human> list, int iterations)
{	
	for (int i = 0; i < iterations; i++)
	{
		var h = list.Where(x => x.id == i).FirstOrDefault();
	}
}

void TestFirstOrDefault(List<Human> list, int iterations)
{	
	for (int i = 0; i < iterations; i++)
	{
		var h = list.FirstOrDefault(x => x.id == i);
	}
}

void TestDictionary(List<Human> list, int iterations)
{
	var dict = new Dictionary<int, Human>();
	foreach (var h in list)
	{
		dict.Add(h.id, h);
	}
	for (int i = 0; i < iterations; i++)
	{
		var h = dict[i];
	}
}

void TestToDictionary(List<Human> list, int iterations)
{
	var dict = list.ToDictionary(x => x.id);
	for (int i = 0; i < iterations; i++)
	{
		var h = dict[i];
	}
}

void TestLookup(List<Human> list, int iterations)
{
	var lookup = list.ToLookup(
		x => x.id,
		x => x);
	for (int i = 0; i < iterations; i++)
	{
		var h = lookup[i];
	}
}
Advertisements

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s