Optimizing Jint part 3: Banishing LINQ

So you want your code go fast? Just remove all occurrences of
using System.Linq;
and make the code compile again without it and be done with!

In all seriousness, LINQ is a powerful tool, but it also has it's pitfalls. It really makes a difference if you use LINQ in the wrong places, one being the hot paths of a Javascript interpreter.

Next I'm going to show examples of better performing code - it's neither prettier nor more idiomatic than the LINQ counterpart. So be warned. You should again weigh your use case and whether anything here is really needed.

Jint optimization: usage of LINQ in Jint

Jint used LINQ in some places and made the code quite easy to read. Jint being an interpreter, it made things unnecessarily slow however. When scripts are run there can be millions of iterations in loops, choosing the correct way of looping and collecting data can make a big difference.

Removing LINQ is quite the easy optimization for libraries that have hot paths and intensive calls.

What's wrong with LINQ

LINQ is a powerful abstraction that allows you to create idiomatic and logical constructs, usually easy to read code. It however, does this with the price of performance. Every abstraction you add, adds performance penalty. 

Here's a classic example, that brings out most of the bad things you can think of. Let's filter and transform an array using LINQ and then compare that with boring old procedural c# code that's somewhat optimized for the case and is not using anything fancy.

So the LINQ version is quite idiomatic and gets the job done in much less lines of code, what about the numbers?

I'm deliberately using .NET Framework here, the LINQ version will run a lot faster under .NET Core, but the full framework is still the case we all need to consider as a runtime environment.

Can you spot the difference? There's a closure allocation because outer context is being referenced (the forbidden variable), the ToArray has trouble determining the wanted size and grows with the flow. Then there a are those little things like .Where taking a IEnumerable which requires a wrapper creation as array does not implement IEnumerable. Fun times.

OK it only takes 2.5x the time with LINQ and the overall time is a drop in the big ocean of run times of your software. But the thing is, these things add up. When we are talking about interpreter, there's a lot of loops and they all add up. If you need to call the logic 5000 times it's either 3 seconds or 5 seconds of wall clock run time.

Context-capturing closures

When LINQ captures context, like variables outside of it's lambda functions, LINQ needs to build a instance of a class to hold to the context state. Above example showed this via the forbidden lookup. Converting to regular loop gives you better performance when a lot of data is processed.

LINQ over arrays

It might not be directly evident, but LINQ needs to create a wrapper to allow you use the LINQ methods when you access an array. Array does not directly implement IEnumerable so tricks are needed. Let's examine how foreach and LINQ-based functions to return whether array of strings contains a non-null element differs.

LINQ against an array

The LINQ version produces famous <>c classes that you might have seen in task traces. So looking at the following example you can see how the machinery to hold state is produced by the compiler.

Regular loop against an array

Compare the earlier to a version that doesn't use LINQ, see how much less code and state to handle.

Don't optimize too early (or too late)

I don't believe too much in "don't optimize early" mentality. I believe in:

Understand the use case, common usage patterns, input variance and size - and then optimize or skip optimization accordingly.
If the code is in hot path, called a lot and affects the overall system performance - just make it fast to begin with.

There are however those many places where writing LINQ makes code a lot cleaner and idiomatic. When you know performance won't be an issue, just endorse it. It's a good tool as long as you know its limits.

Things to remember and consider

  • Non-LINQ version is usually/always faster
  • LINQ can capture context and throw memory allocations through the roof
  • Prefer concrete (collection) types when using LINQ extensions
  • Array might be a bad source in hot paths as it requires a wrapper

Comments

Popular posts from this blog

Optimizing Jint part 6: DictionarySlim is coming to town

Running docker-compose against WSL2