CodeShop

Javascript uniq?

May 14th, 2007

protoype

Protoype is a cool javascript framework very usefull for dynamic webapp creation; however, the uniq algorithm in prototype.js is rather slow for large sets, infact it is O(n2) (quadratic).

This happens because uniq does not expects a sorted array and thus all elements are compared to all elements in the array (the array.include does a foreach for each value). And secondly, the array.concat function ‘will yield a new, intermediary array every time it encounters a new value (a value that wasn’t already in the result array)’.

1
2
3
4
5
  uniq: function() {
    return this.inject([], function(array, value) {
      return array.include(value) ? array : array.concat([value]);
    });
  },

stl

If you look at the c++ stl (standard template libary), this problem is solved in O(n) (linear) time, so a solution is near: convert a little bit of c++ to javascript!

(Note: there is a jsstl available on the web, but it does not implement the algorithm part of the stl, only the containers & iterators …)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
    // TEMPLATE FUNCTION unique
  template<class _FwdIt> inline
  _FwdIt unique(_FwdIt _First, _FwdIt _Last)
  {  // remove each matching previous
  for (_FwdIt _Firstb; (_Firstb = _First) != _Last && ++_First != _Last; )
    if (*_Firstb == *_First)
      {  // copy down
      for (; ++_First != _Last; )
        if (!(*_Firstb == *_First))
          *++_Firstb = *_First;
      return (++_Firstb);
      }
  return (_Last);
  }

(The above is Copyright© 1992-2002 by P.J. Plauger.)

solution

This translates fairly forward to:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
  // Removes redundant elements from the array
  function unique_inplace(an_array)
  {
    var first = 0;
    var last = an_array.length;
    for(var firstb; (firstb = first) != last && ++first != last; )
    {
      if(an_array[firstb] == an_array[first])
      {
        for(; ++first != last; )
        {
          if (!(an_array[firstb] == an_array[first]))
          {
            an_array[++firstb] = an_array[first];
          }
        }
        ++firstb;
        an_array.length = firstb;
        return;
      }
      an_array.length = firstb;
    }

And this does not take forever on an array of more than 10 elements. It works on the concept of ranges: since the input array is sorted, all similar items will be grouped together; so if we look for the first of a group of similar, we can then skip the others. In the end will we will have an array without the doubles.

One last caveat: you will need to sort your array before you use this function!

Sorry, comments are closed for this article.