How can List be faster than native arrays?
List is an extremely fast immutable list that can be used as an alternative to native JavaScript arrays. Since List is immutable it is made for functional programming. For this use case it is often much faster than native JavaScript arrays and libraries that operate on arrays like Lodash and Ramda.
Jeremy Likness recently tweeted about List (thanks a lot). One of the things he mentioned was the claim that List is faster than arrays. Several people responded with a healthy skepticism to this statement. How can something be faster than arrays which are native to JavaScript and highly optimized by the JavaScript engines? The short answer is that by exploiting immutability List can avoid a lot of unnececarry copying through structural sharing. This blog post will go into more detail and show how structural sharing leads to huge improvements in performance.
The problem with arrays
In functional programming, we never mutate or change our
data-structures. Instead, all operations on data-structures that needs
to change something must return a new version without modifying the
old one. An example of this is the concat
method on arrays. The code
arrayA.concat(arrayB)
changes neither arrayA
nor arrayB
. It
returns a brand new array.
Arrays are stored in memory as a consequetive sequence of bits.
Therefore, when a pure function operates on arrays the only thing it
can do in order to return a new array is to copy the entire array.
For instance, to append a single element to a list by using concat
or Ramda’s append
the entire array that is being appended to must be
copied. As another example, to remove the first element of an array
one can write array.slice(1)
, use the function _.initial
from
Lodash, or R.init
from Ramda. All of these functions make a copy of
the entire slice. If the array is 1000 elements long the code will
construct a brand new array of length 999.
When doing functional programming with arrays a lot of time is wasted copying arrays all the time. This not only makes our code slower, it also gives the garbage collector extra work because it has to free all the memory again. We pay twice.
How List solves the problem
List is an implementation of an immutable data-structure called relaxed radix balanced trees. Like most immutable data-structures it uses a technique called structural sharing to avoid the unnecessary copying we saw with arrays. Internally a sequence of elements is stored by List in a format that looks a bit like this.
One way of looking at the above is that List stores elements in several small chunks. Whenever we need to create a new list we can reuse all the chunks that didn’t change. This type of reusing is structural sharing. Arrays, on the other hand, store everything in one big chunk so it’s not possible to use share anything when changing an array.
If we appended an element to the above list we would get a new list that, internally, looked something like this.
The dashed boxes represent the new list. Note how everything in the old list is reused in the new list. This is one part of why List can perform so well. Appending an element to a list takes essentially the same time no matter how large a list we’re appending to. This can be seen in benchmarks.
Appending an element to an array takes more and more time as the array gets larger. For List the time is constant. Appending an element to a list takes the same time on a list of length 10 as it does on a list of length 10.000.
This performance improving technique applies to a wide range of
operations. Sharing is useful for things such as changing a single
element in a list, inserting elements into a list, concatenating
lists, and slicing a list. The graph below compares the performance of
slice
between List and arrays.
Again, every time we want to change something in an array we must make a copy of the entire array. But, to change something in a list we can reuse large parts that can be shared between the new and the old list.
Another aspect that makes List faster than native arrays is that many
methods on native arrays are actually quite slow. This applies to
map
, filter
, reduce
, and more. They have a lot of complexity
that slows them down. For instance
filter
must handle sparse arrays and must accommodate for the filtering
predicate mutating the array during the filter. These things are
required per the JavaScript specification and they slow down native
arrays. The graph below shows the performance of filtering a List
versus filtering an array.
Caveats
Even though List is very fast across the board no data-structure can be the fastest at everything. There are a few operations where List is slower than native arrays. One of these is random accessing. I.e. accessing arbitrary elements out of order. But, even in the few cases where List is behind native arrays, it is not behind by much.
Another caveat is that some functions in List are slower than arrays
for very small lists—even though they scale better to larger lists.
This can be seen in the plot of slice
above. List’s slice
is a lot
faster on large lists but for small lists, it is slightly slower. This
is something that I’m planning to improve in the future.
Conclusion
By using structural sharing List can implement a lot of operations in a way that is much faster than what is possible with arrays. This means that for most functional use cases using List will be faster than arrays. However, that is only part of the reason why someone may use List. Other benefits, that in many cases are more important, are enforced immutability and the extensive API that List offers.
For more information about List check it out on GitHub. For more extensive benchmarks see the benchmark report.