Algorithm

Optimization in a Sorted Container

0 Background

Finding the index of a given value in a container, like an array/vector/etc., is a basic task every programmer implements in daily work. When the data volume is low, no one cares about performance; a correct output is all that is required. When the volume is high, with millions or billions of items stored in the container, performance becomes more and more important, and we have to use everything we have to speed up finding the input value. In fact, the key is as simple as everyone knows: reduce the loop count and use as much information as we can. Let’s start from the very beginning.

1 Find the Index in an Array

How do we find the index of an input value in an array, or some other container? Just loop through the container until the element we get is equal to the input, like this:

<pre>
// Linear scan: compare every element with the input until a match is found
for (int i = 0; i < size; i++)
{
    if (arr[i] == input)
    {
        index = i;
        break;
    }
}
</pre>

Here we have T(n) = O(n); if the size is large enough, the performance will be really poor.

2 Find the Index in a Sorted Array

In our daily usage, lots of arrays are sorted. For example, in the list of payloads that constructs a file, every start offset is sorted in increasing order, of course. How do we improve the performance? We have to use the information we have, ‘sorted’, together with dichotomy (binary) search. I think every programmer could figure it out and implement it, no matter how junior. We use a Cpp array here as one example:

<pre>
int DichotomySearch(int input[], int n, int inputValue, int& loop)
{
    int mid = -1;
    int low = 0;
    int high = n - 1;
    int midValue = 0;

    // Terminate once 'mid' stops moving, i.e. 'low' and 'high' are adjacent
    while (mid != (low + high) / 2) {
        loop++;
        mid = (low + high) / 2;
        midValue = input[mid];
        if (inputValue == midValue) {
            return mid;
        }
        if (inputValue > midValue) {
            low = mid;
        }
        if (inputValue < midValue) {
            high = mid;
        }
    }
    // Integer division rounds down, so the element at 'high' may be left unchecked
    if (mid + 1 < n && inputValue == input[mid + 1])
    {
        return mid + 1;
    }
    return -1;
}
</pre>

The terminating condition is the while check above, which fires once ‘mid’ stops changing: the ‘low’ iterator is right next to the ‘high’ iterator, and the rest cannot be ‘dichotomized’ any more. Since integer division in Cpp truncates, (low + high) / 2 is the biggest integer not larger than the exact quotient, so ‘mid’ always rounds down and there is a chance that the input is exactly what the ‘high’ iterator points to; that is why the extra check after the loop is needed.
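As a small illustration, consider a hypothetical trace of searching for 30 in {10, 20, 30}:

<pre>
// pass 1: low = 0, high = 2, mid = (0 + 2) / 2 = 1, input[1] = 20 < 30, so low = 1
// now mid == (low + high) / 2 == (1 + 2) / 2 == 1, so the loop terminates
// the final check finds input[mid + 1] = input[2] == 30 and returns index 2
</pre>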

In theory, T(n) = O(log n): the worst case is better than O(n), and even better than the average of n/2 comparisons above. Here we have ‘loop’ to indicate how many iterations it takes to get the output, so let’s just try an array with 16 elements to verify.

And use a ‘main’ like this:

<pre>
#include <cstdio>

// Sorted test array: 16 elements from 0 to 150 (matches the output below)
int a[16] = { 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150 };

int main()
{
    int loop = 0;
    for (int i = 0; i < 180; i += 10) {
        int ret = DichotomySearch(a, 16, i, loop);
        printf("Value: [%d],\tindex: [%d],\tloop: [%d]\n", i, ret, loop);
        loop = 0;
    }
    return 0;
}
</pre>
<pre>
Value: [0],    index: [0],   loop: [4]
Value: [10],   index: [1],   loop: [3]
Value: [20],   index: [2],   loop: [4]
Value: [30],   index: [3],   loop: [2]
Value: [40],   index: [4],   loop: [4]
Value: [50],   index: [5],   loop: [3]
Value: [60],   index: [6],   loop: [4]
Value: [70],   index: [7],   loop: [1]
Value: [80],   index: [8],   loop: [4]
Value: [90],   index: [9],   loop: [3]
Value: [100],  index: [10],  loop: [4]
Value: [110],  index: [11],  loop: [2]
Value: [120],  index: [12],  loop: [4]
Value: [130],  index: [13],  loop: [3]
Value: [140],  index: [14],  loop: [4]
Value: [150],  index: [15],  loop: [4]
Value: [160],  index: [-1],  loop: [4]
Value: [170],  index: [-1],  loop: [4]
</pre>

Finally, we can see that the maximum loop count is only log2(16) = 4, just as we expected. Perfect: a simple algorithm, but really useful~

3 Extension – Head, Tail, and as Many Hints as We Can Use

Now we have gone through this really simple and useful algorithm, dichotomy search, which can improve our search performance to such an extent. But sometimes the problems in real life are much more interesting. Let’s start from a requirement, such as the payloads within one file, which was mentioned above.

Usually, we have one container that stores the payload offsets indexed by payload index, like offset[0] = 0, offset[1] = a, offset[2] = b, offset[3] = c, …, offset[n] = x, where x > c > b > a. But sometimes, since the free space between payloads is 0, the offset of one payload can be calculated from the previous payload’s offset and size, like

offset[n] = 0, n = 0

offset[n] = offset[n-1] + size[n-1] , n > 0

And since runs of consecutive payloads often share the same size, only the size and the payload count of each run are stored in the container, to save memory consumption, like

count[n] = payload count,

size[n] = payload size.

Then we can get the payload offset like this:

<pre>
int CalculatePayloadOffset(int countArray[], int sizeArray[], int payloadIndex, int entryCount)
{
    int firstPayloadIndex = 0;
    int firstPayloadOffset = 0;
    for (int i = 0; i < entryCount; i++)
    {
        // Is the target payload inside the run described by entry i?
        if (payloadIndex >= firstPayloadIndex && payloadIndex < firstPayloadIndex + countArray[i])
        {
            return firstPayloadOffset + (payloadIndex - firstPayloadIndex) * sizeArray[i];
        }
        firstPayloadIndex += countArray[i];
        firstPayloadOffset += (countArray[i] * sizeArray[i]);
    }
    return -1;
}
</pre>

Yes, just loop over the entry container, accumulating the offset delta (payload count times size) of each former payload run, and then we get the expected offset. Of course, T(n) = O(n), where n is the entry count, and if the input payload index is one of the last, we have to loop over all the entries to calculate the output.
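For instance, here is a small hypothetical driver; the entry tables below are made up for illustration, and looking up the very last payload has to walk every entry:

<pre>
#include <cstdio>

int main()
{
    // Hypothetical entry tables: 4 runs describing 4 + 2 + 8 + 1 = 15 payloads
    int countArray[4] = { 4, 2, 8, 1 };
    int sizeArray[4]  = { 100, 50, 10, 200 };

    // Payload 14 (the last one) needs all 4 iterations:
    // offset = 4 * 100 + 2 * 50 + 8 * 10 + 0 * 200 = 580
    printf("offset: [%d]\n", CalculatePayloadOffset(countArray, sizeArray, 14, 4));
    return 0;
}
</pre>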

Could we use dichotomy as above? Unfortunately not, because offset[n] = offset[n-1] + size[n-1]: we have to walk through the whole recurrence to obtain the current value, unless there is some general closed form in ‘n’ for calculating the output.

Let’s add one more condition to make this case easier. The whole file consists of payloads, there are only payloads in the file, and the file size is easy to obtain, which means,

offset[n] = 0, n = 0;

offset[n] = offset[n-1] + size[n-1], n > 0;

offset[n] = fileSize - size[n], n = payloadCount - 1

Yes, here is one more piece of information besides the ‘head’ one: the ‘tail’ one. How can we use it to optimize our code? In the implementation above, the worst part is the later indexes: the closer the input index is to the tail, the more entries have to be looped over to calculate the offset. If we start from the ‘tail’ offset whenever the input index is a ‘later’ one, the speed-up can be achieved, as below,

offset[n] = offset[n-1] + size[n-1], for the former (smaller) n

offset[n] = offset[n+1] - size[n], for the later (larger) n

Since the entries’ values are essentially random, the ‘later half’ of the payloads is expected to live in the ‘later half’ of the entries, closer to the ‘tail’ than to the ‘head’, so it can be described as,

offset[n] = offset[n-1] + size[n-1], n <= payloadCount / 2

offset[n] = offset[n+1] - size[n], n > payloadCount / 2

Updating the code above, here comes the implementation:

<pre>
int CalculatePayloadOffset(int CountArray[], int SizeArray[], int payloadIndex, int payloadCount, int fileSize, int& loopCount)
{
    int entryCount = 10; // entry count of the container, hard-coded for this example
    // +1 means searching forward from the head, -1 means backward from the tail
    int indicator = (payloadIndex <= payloadCount / 2) ? 1 : -1;

    // Pretend there is another last payload after the current last payload
    int previousPayloadIndex = (indicator == 1) ? 0 : payloadCount;
    int previousPayloadOffset = (indicator == 1) ? 0 : fileSize;
    int entryIter = (indicator == 1) ? 0 : entryCount - 1;

    while (entryIter >= 0 && entryIter <= entryCount - 1)
    {
        loopCount++;
        if ((payloadIndex >= previousPayloadIndex && payloadIndex < previousPayloadIndex + CountArray[entryIter] && indicator == 1) ||
            (payloadIndex >= previousPayloadIndex - CountArray[entryIter] && payloadIndex < previousPayloadIndex && indicator == -1))
        {
            return previousPayloadOffset + (payloadIndex - previousPayloadIndex) * SizeArray[entryIter];
        }
        previousPayloadIndex += (CountArray[entryIter] * indicator);
        previousPayloadOffset += (CountArray[entryIter] * SizeArray[entryIter] * indicator);
        entryIter += indicator;
    }
    return -1;
}
</pre>

Here we have an ‘indicator’ that separates the increasing and decreasing search cases: the loop either walks forward from the head or backward from the tail.
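To see the effect, here is a small hypothetical driver; the tables and file size are made up for illustration (note that entryCount is hard-coded to 10 in the function above), and loopCount is passed by reference just like in the earlier DichotomySearch demo:

<pre>
#include <cstdio>

int main()
{
    // Hypothetical tables: 10 runs describing 40 payloads in a 2070-byte file
    int countArray[10] = { 4, 2, 8, 1, 5, 3, 6, 2, 7, 2 };             // sums to 40
    int sizeArray[10]  = { 100, 50, 10, 200, 20, 30, 40, 60, 80, 90 };
    int fileSize = 2070;                                               // sum of count * size
    int loopCount = 0;

    // A 'later' index now starts from the tail: 1 iteration instead of 10
    int offset = CalculatePayloadOffset(countArray, sizeArray, 39, 40, fileSize, loopCount);
    printf("offset: [%d], loop: [%d]\n", offset, loopCount);           // offset: [1980], loop: [1]
    return 0;
}
</pre>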

4 Extension – Previous Results as Hints

The ‘tail’, i.e. the file size, is what we have to thank for the optimization above: one more piece of information obtained, then used, then a success. Is there any opportunity to make it even faster? Of course, as long as we can get more useful information.

In our daily work, an algorithm/interface is usually called more than once during a process life-cycle, which means that by the time we call this function, it has probably been called before and the calculation procedure has already been executed. So we can simply store the previous output, trading memory consumption for performance. Here only the simplest case is considered: using just one previous output to help speed up our algorithm.

But how? Just like how we used the ‘tail’ information: the closer the starting iterator is to the target, the faster the search. It can be described as below,

offset[i] = offset[i-1] + size[i-1], 0 < i < previousIndex / 2

offset[i] = offset[previousIndex] +/- sum(size(previousIndex ~ i)), previousIndex / 2 < i < (payloadCount + previousIndex) / 2

offset[i] = offset[i+1] - size[i], i > (payloadCount + previousIndex) / 2

<pre>
---------------------------------------------------------
| 1 | 2 | ... | j-1 |  j  | j+1 | ... | n-1 |  n  | n+1 |
---------------------------------------------------------
  ^        ^           ^           ^                ^
  |        |           |           |                |
head    midLow      previous    midHigh           tail
</pre>

Taking the code above as a reference, I think this is not very hard to implement; a rough sketch follows below. It takes only one previous output as a hint. If the performance requirement is high, we could consider keeping every previous output to speed things up, in just the same way. One more thing needs attention: if there are more previous outputs than entries, a lot of time would be spent calculating every ‘mid’ and comparing them to decide which output should be the starter while looping over all the previous outputs. That is something like ‘sampling’ and is out of this article’s scope.
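Here is a minimal sketch of the single-hint variant, assuming we cache, from the last successful lookup, the base index and offset of the entry run that contained it plus that run’s position in the entry table; the Hint struct, the function name, and all other names are illustrative, not a fixed design:

<pre>
// Cached state from the last successful lookup; start with valid = false
struct Hint
{
    int firstPayloadIndex;  // payload index where the cached entry run starts
    int firstPayloadOffset; // file offset of that payload
    int entryIter;          // position of that run in the entry table
    bool valid;
};

int CalculatePayloadOffsetWithHint(int CountArray[], int SizeArray[], int entryCount,
                                   int payloadIndex, int payloadCount, int fileSize,
                                   Hint& hint)
{
    int base, baseOffset, entryIter, indicator;
    int prev = hint.valid ? hint.firstPayloadIndex : -1;

    if (hint.valid && payloadIndex >= prev / 2 &&
        payloadIndex <= (payloadCount + prev) / 2)
    {
        // The previous result is the closest starter: walk from its run
        indicator = (payloadIndex >= prev) ? 1 : -1;
        base = prev;
        baseOffset = hint.firstPayloadOffset;
        entryIter = (indicator == 1) ? hint.entryIter : hint.entryIter - 1;
    }
    else if (payloadIndex <= payloadCount / 2)
    {
        indicator = 1; base = 0; baseOffset = 0; entryIter = 0;          // from head
    }
    else
    {
        indicator = -1; base = payloadCount; baseOffset = fileSize;      // from tail
        entryIter = entryCount - 1;
    }

    while (entryIter >= 0 && entryIter <= entryCount - 1)
    {
        if ((indicator == 1 && payloadIndex >= base && payloadIndex < base + CountArray[entryIter]) ||
            (indicator == -1 && payloadIndex >= base - CountArray[entryIter] && payloadIndex < base))
        {
            // Normalize to the start of the matching run, then cache it for next time
            int runIndex = (indicator == 1) ? base : base - CountArray[entryIter];
            int runOffset = (indicator == 1) ? baseOffset
                          : baseOffset - CountArray[entryIter] * SizeArray[entryIter];
            hint.firstPayloadIndex = runIndex;
            hint.firstPayloadOffset = runOffset;
            hint.entryIter = entryIter;
            hint.valid = true;
            return runOffset + (payloadIndex - runIndex) * SizeArray[entryIter];
        }
        base += (CountArray[entryIter] * indicator);
        baseOffset += (CountArray[entryIter] * SizeArray[entryIter] * indicator);
        entryIter += indicator;
    }
    return -1;
}
</pre>

Since the hint is refreshed on every successful call, a run of lookups around the same region keeps starting close to its target.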

Just be careful: the performance improvement per hint diminishes as the number of stored previous outputs increases~
