Algorithm Learning Notes-02
Date: 2020-03-18 14:53:38
Updated: 2023-02-22 15:30:06
URL: /archives/algorithm-notes-02
Categories:
- Study Notes
Tags:
- Algorithm
Analysis of Best, Worst, Average, and Amortized Time Complexity
## Best and Worst Case Time Complexity
Consider the following code:
```java
// n represents the length of the array
int find(int[] array, int n, int x) {
    int i = 0;
    int pos = -1;
    for (; i < n; ++i) {
        if (array[i] == x) {
            pos = i;
            break;
        }
    }
    return pos;
}
```
Is the time complexity of this code O(n)? Not necessarily — it depends on the input.
In this case, the time complexity of the code needs to be discussed based on different scenarios. Therefore, there are several time complexities:
- Best-case time complexity: the complexity when the input is most favorable. In this example, if we are lucky and x sits in array[0], the loop stops right away, so the best-case time complexity is O(1).
- Worst-case time complexity: the complexity when the input is least favorable. In this example, if we are unlucky and x sits in array[n-1], or is not in the array at all, the code must traverse the entire array, so the worst-case time complexity is O(n).
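The two extremes can be exercised directly (the class name and test values below are my own, not from the original notes):

```java
// Demo of best- and worst-case inputs for find().
public class FindDemo {
    // n represents the length of the array
    static int find(int[] array, int n, int x) {
        int pos = -1;
        for (int i = 0; i < n; ++i) {
            if (array[i] == x) {
                pos = i;
                break;   // best case: hit on the first element, O(1)
            }
        }
        return pos;      // worst case: full scan with no hit, O(n)
    }

    public static void main(String[] args) {
        int[] array = {7, 3, 5, 9};
        System.out.println(find(array, 4, 7));  // found at [0] -> 0  (best case)
        System.out.println(find(array, 4, 9));  // found at [n-1] -> 3
        System.out.println(find(array, 4, 4));  // not found -> -1    (worst case)
    }
}
```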
## Average Time Complexity
However, the best case time complexity and worst case time complexity may be too extreme and may not occur frequently. In order to better represent the complexity in the average case, we need to introduce the concept of average time complexity.
Returning to the previous example, there are n+1 possible scenarios for the position of the variable x: in one of the positions [0]~[n-1] (n scenarios) or not in the array at all (1 scenario). We can add up the number of elements that must be traversed in each scenario, divide by n+1, and obtain the average number of elements traversed. This can be represented as:

(1 + 2 + 3 + … + n + n) / (n + 1) = n(n+3) / (2(n+1))
According to the big O notation, coefficients, constants, and lower-order terms can be ignored. Therefore, the simplified average time complexity is O(n).
March 18, uncertain: how is it simplified? I don't understand. Do we just keep the highest-order term when the expression is a fraction?
March 19, answer: according to an explanation from a master on LeetCode, expand n(n+3) / (2(n+1)) to (n² + 3n) / (2n + 2). For large n the lower-order terms and constants can be dropped, giving roughly (n² + 3n) / (2n) = (n + 3) / 2, which grows linearly in n. Therefore the simplified average time complexity is O(n)!!!
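One quick way to convince yourself of this simplification (my own sanity check, not from the course): evaluate n(n+3) / (2(n+1)) for a large n and compare it against n/2. The ratio tends to 1, so the expression is of order n.

```java
// Numeric check that n(n+3) / (2(n+1)) grows like n/2, i.e. is O(n).
public class AvgSimplification {
    public static void main(String[] args) {
        double n = 1_000_000;
        double exact = n * (n + 3) / (2 * (n + 1)); // average elements traversed
        double ratio = exact / (n / 2);             // compare against n/2
        System.out.println(ratio);                  // very close to 1.0
    }
}
```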
However, there are still some small issues with this conclusion. The probabilities of the n+1 scenarios for the position of variable x in the array are not the same.
First, is the variable x in the array or not? To simplify, let's assume that the probabilities of being in the array and not being in the array are both 1/2. In addition, the probabilities of appearing in positions [0]~[n-1] in the array are the same, which is 1/n. Therefore, according to the multiplication rule of probability, the probability of x appearing in any position in [0]~[n-1] is 1/(2n).
Therefore, when we take the probability of each scenario into account, the calculation of the average complexity becomes:

1 × 1/(2n) + 2 × 1/(2n) + 3 × 1/(2n) + … + n × 1/(2n) + n × 1/2 = (3n+1) / 4
This value is the weighted average in probability theory, also known as the expected value. Therefore, the full name of the average time complexity should be weighted average time complexity or expected time complexity.
Finally, the weighted average time complexity of this code is still O(n).
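The weighted sum above can be verified numerically (a sketch of my own; the closed form (3n+1)/4 follows from summing the series term by term):

```java
// Verify that the probability-weighted number of comparisons equals (3n+1)/4.
public class WeightedAvg {
    public static void main(String[] args) {
        int n = 1000;
        double expected = 0;
        // x at position [i-1] costs i comparisons, each with probability 1/(2n)
        for (int i = 1; i <= n; ++i) {
            expected += i * (1.0 / (2 * n));
        }
        // x not in the array costs n comparisons, with probability 1/2
        expected += n * 0.5;
        System.out.println(expected);           // the weighted average
        System.out.println((3.0 * n + 1) / 4);  // the closed form: same value
    }
}
```

Since (3n+1)/4 is linear in n, the weighted average time complexity is still O(n), matching the conclusion above.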
In fact, in most cases, we do not need to distinguish between the best case, worst case, and average case time complexities. We only use these three complexity notations to distinguish when the time complexities of the same code have significant differences in magnitude under different scenarios.
## Amortized Time Complexity
Let's take another example:
```java
// array represents an array of length n
// array.length in the code equals n
int[] array = new int[n];
int count = 0;

void insert(int val) {
    if (count == array.length) {
        int sum = 0;
        for (int i = 0; i < array.length; ++i) {
            sum = sum + array[i];
        }
        count = 1; // clear the array
        array[0] = sum;
    }
    array[count] = val;
    ++count;
}
```
The logic of this code is as follows: First, the basic function of this function is to insert a value into an integer array. When the array is full, that is, count == array.length, we use a for loop to traverse the array and calculate the sum, then clear the array, and put the sum value into position [0] of the array. When the array is not full, that is, count != array.length, we directly insert the data into the array.
Note: here "clearing" the array simply means resetting count to 1. For writable storage, space counts as empty as soon as the user treats it as empty; if you prefer to define "empty" as overwriting every element with 0 or some other value, that works too. The caller only cares about the new values to be stored.
In this code, the time complexity analysis using the previous three methods is as follows:
- In the best case scenario, when the array is not full, we only need to insert the data into position [count]. The best case time complexity is O(1).
- In the worst case scenario, when the array is full, we need to traverse the array and calculate the sum, and then insert the value. The worst case time complexity is O(n).
- The average time complexity can be analyzed using probability theory. Assuming the array length is n, there are n scenarios in which the insert costs O(1), plus one "extra" scenario — inserting when the array is full — which costs O(n). Assuming these n+1 scenarios are equally likely, each with probability 1/(n+1), the weighted-average calculation gives an average time complexity of:

  (1 × n + n × 1) / (n + 1) = 2n / (n + 1)
March 18, uncertain. Why is it 2n / (n+1), and why does that count as O(1)? I really don't understand...
March 19, answer: I finally asked a master on LeetCode. The master said, "In 2n/(n+1), the numerator and denominator are both of order n, so it can be regarded as 2n/n = 2, a constant — hence O(1)." After reading this, I suddenly got it. The answer is so enlightening!!!
In fact, we don't need to calculate the average complexity of this example in such a complex way. We can compare the find() and insert() examples to see the difference.
First, the find() function has a time complexity of O(1) only in extreme cases. However, the insert() function has a time complexity of O(1) in most cases, and only has a higher complexity of O(n) in a few cases. This is the first difference between insert() and find().
Second, for the insert() function, the O(1) insertion and the O(n) insertion with a higher time complexity occur in a predictable sequence with a certain time order. Generally, after n-1 O(1) insertions (inserting data when the array is not full), there will be an O(n) insertion (traversing the array, calculating the sum, and clearing the array), and this pattern repeats.
Therefore, for the analysis of this special scenario, we introduce a simpler analysis method: amortized analysis, which calculates the amortized time complexity.
Using the example, every O(n) insertion is followed by n-1 O(1) insertions. Therefore, the time-consuming operation of the higher complexity is amortized over the n-1 time-consuming operations of the lower complexity, and the amortized time complexity of this group of consecutive operations is O(1).
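To see the amortization concretely, here is an instrumented variant of insert() (the ops counter and the fixed n = 100 are my additions). Over many consecutive calls, the total element reads and writes divided by the number of calls settles near the constant 2:

```java
// Count the total element reads/writes performed by insert() over many calls.
public class AmortizedInsert {
    static int n = 100;
    static int[] array = new int[n];
    static int count = 0;
    static long ops = 0;              // tally of element reads/writes

    static void insert(int val) {
        if (count == array.length) {
            int sum = 0;
            for (int i = 0; i < array.length; ++i) {
                sum = sum + array[i];
                ops++;                // one read per element: the O(n) case
            }
            count = 1;                // "clear" the array
            array[0] = sum;
            ops++;
        }
        array[count] = val;
        ops++;                        // the O(1) write every call performs
        ++count;
    }

    public static void main(String[] args) {
        int calls = 1_000_000;
        for (int i = 0; i < calls; ++i) {
            insert(i);
        }
        // Each cycle of n-1 calls costs about 2n operations: ~2 per call, O(1) amortized.
        System.out.println((double) ops / calls);
    }
}
```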
## Applications of Average Time Complexity and Amortized Time Complexity
When performing a series of consecutive operations on a data structure, the time complexity is generally low in most cases, and only a few cases have higher time complexity. Moreover, these operations have a coherent time order. In this case, we can analyze this group of operations together to see if the time-consuming operation with higher complexity can be amortized over the operations with lower complexity. Moreover, in situations where amortized time complexity analysis can be applied, the amortized time complexity is generally equal to the best case time complexity.
Amortized time complexity is a special case of average time complexity.
## Summary
Several concepts related to complexity analysis include: best case time complexity, worst case time complexity, average time complexity, and amortized time complexity. These concepts are introduced because the time complexity of the same code may vary depending on different inputs. After introducing these concepts, we can more comprehensively represent the execution efficiency of a piece of code.
## After-class Thinking Questions
```java
// Global variables: an array of size 10, its length len, and index i.
int array[] = new int[10];
int len = 10;
int i = 0;

// Add an element to the array
void add(int element) {
    if (i >= len) { // the array is full
        // Allocate a new array with twice the size
        int new_array[] = new int[len * 2];
        // Copy the data from the original array into new_array one by one
        for (int j = 0; j < len; ++j) {
            new_array[j] = array[j];
        }
        // Point array at new_array; its size is now twice the original len
        array = new_array;
        len = 2 * len;
    }
    // Put the element at index i and increment i
    array[i] = element;
    ++i;
}
The logic of this code is as follows: Add an element to the array. When the array is not full, the data is inserted directly. When the array is full, a new array space with twice the size is allocated, and the data from the original array is copied to the new array one by one. Finally, the new array is assigned to the original array, replacing it.
In this code, the best case time complexity is when the array is not full and the data is inserted directly. It is O(1).
The worst case time complexity is when the array is full and the data needs to be copied to a new array. It is O(n).
The average time complexity can be analyzed by insertion position. There are n scenarios with time complexity O(1), plus one "extra" scenario — inserting when the array is full — with time complexity O(n). Assuming each of these n+1 scenarios has the same probability 1/(n+1), the weighted average is (1 × n + n × 1) / (n + 1) = 2n / (n + 1), which simplifies to O(1).
The amortized time complexity is O(1), because every n O(1) operations are followed by one O(n) operation, and the cost of the expensive operation is amortized over the cheap ones.
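The same accounting can be applied to the thinking-question code (the copies counter and the fixed workload below are my additions). Because the capacity doubles each time, the total number of element copies over m appends stays below 2m, so each append costs O(1) amortized:

```java
// Count element copies performed by the doubling add() over many appends.
public class DynamicArrayDemo {
    static int[] array = new int[10];
    static int len = 10;
    static int i = 0;
    static long copies = 0;                 // element copies during resizes

    static void add(int element) {
        if (i >= len) {                     // the array is full
            int[] newArray = new int[len * 2];
            for (int j = 0; j < len; ++j) {
                newArray[j] = array[j];     // O(n) copy, once per doubling
                copies++;
            }
            array = newArray;
            len = 2 * len;
        }
        array[i] = element;
        ++i;
    }

    public static void main(String[] args) {
        int m = 1_000_000;
        for (int k = 0; k < m; ++k) {
            add(k);
        }
        // Total copies: 10 + 20 + 40 + ... < 2m, so copies per append stay below 2.
        System.out.println((double) copies / m);
    }
}
```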