
Algorithm Learning Notes-01

Analysis of Complexity

1. Why do we need complexity analysis?#

Usually, we evaluate performance by collecting statistics and monitoring data from actual runs. This common method of performance analysis is sometimes called "post-hoc analysis".

However, although post-hoc analysis is widely used, it has some shortcomings:

  1. The test results are highly dependent on the test environment.
    • The same code segment may produce different results on different machines and under different compilation environments.
  2. The test results are greatly affected by the test scale.
    • The same code segment can behave very differently at large and small data scales.

So, is there a method that allows us to ignore the test environment and accurately analyze the performance of the code at any scale?

Of course there is: use "Big O notation" to analyze the code's "time complexity" and "space complexity".

2. Big O Notation#

Big O notation: The time complexity of an algorithm is usually expressed using the Big O notation, defined as $T(n) = O\big(f(n)\big)$. It is said that the function $T(n)$ is bounded by or limited to $f(n)$. If the scale of a problem is n, the time required for an algorithm to solve this problem is $T(n)$. $T(n)$ is called the "time complexity" of this algorithm. When the input size n gradually increases, the extreme case of time complexity is called the "asymptotic time complexity" of the algorithm. - Baidu Baike

Time Complexity#

Time complexity: The full name is "asymptotic time complexity", which represents the growth relationship between the execution time of an algorithm and the data scale.

We use a formula to represent it:

$$T(n) = O\big(f(n)\big)$$

| Symbol | Meaning |
| --- | --- |
| $T(n)$ | Total execution time of the code |
| $n$ | Scale of the data |
| $f(n)$ | Total number of executions of all lines of code |
| $O$ | Indicates that $T(n)$ is directly proportional to $f(n)$ |

To better understand this formula, let's take an example:

#include <stdio.h>
int main(){
    int n = 100;    // data scale, fixed here so the example compiles
    int c = 0;
    for(int i=1; i<=n; ++i)
        c += i;

    return 0;
}

From the CPU's perspective, every line of this code segment performs similar operations: read data, perform a calculation, write data. Although the number of executions and the actual running time of each line differ, we are only making a rough estimate here, so we can assume that every line takes the same execution time, unit_time. Based on this assumption, what is the total execution time of this code segment?

The statement int c = 0; requires 1 unit_time of execution time, and the loop header and the loop body c += i; each execute n times, so together they require 2n * unit_time. Therefore, the total execution time of this code segment is (2n+1) * unit_time. As you can see, the total execution time T(n) of the code is directly proportional to the total number of executions of its statements.
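Expressed as a formula, with unit_time treated as a constant:

$$T(n) = (2n+1) \cdot \text{unit\_time}$$

Dropping the constant factor and the low-order term (justified by the rules in the next section), this gives $T(n) = O(2n+1) = O(n)$.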

Methods of Time Complexity Analysis#

1. Only focus on the code segment with the most loop executions#

Here is an important inference: since the low-order terms, coefficients, and constants do not affect the growth trend, we can take the highest-order term as the representative.
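For example, suppose a code segment's statements execute $2n^2 + 3n + 5$ times in total. Then

$$T(n) = O(2n^2 + 3n + 5) = O(n^2)$$

because the coefficient 2, the low-order term $3n$, and the constant 5 are all dropped, leaving only the highest-order term.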

2. Addition Rule#

The total complexity is equal to the complexity of the code segment with the highest order of magnitude

Let's take an example:

#include <stdio.h>
int main(){
    int n = 100;    // data scale, fixed here so the example compiles

    int x=0;
    for(int i=1; i<=100; ++i)
        x += i;

    int y=0;
    for(int i=1; i<=n; ++i)
        y += i;

    int z=0;
    for(int i=1; i<=n; ++i){
        for(int j=1; j<=n; ++j)
            z += j;
    }

    printf("%d %d %d\n", x, y, z);
    return 0;
}

In this example, we need to calculate x, y, and z. We calculate the time complexity of each part separately, and then take the highest order as the complexity of the entire code segment.

For the part that calculates x, the loop runs 100 times. Since 100 is a constant independent of n, it does not affect the growth trend and can be ignored.

For the part that calculates y, it loops n times. It is denoted as $O(n)$.

For the part that calculates z, it loops $n^2$ times. It is denoted as $O(n^2)$.

We take the highest order $O(n^2)$ as the complexity of the entire code segment.
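Written out as one worked step, applying the addition rule to the three parts:

$$T(n) = O(1) + O(n) + O(n^2) = O\big(\max(1,\ n,\ n^2)\big) = O(n^2)$$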

Therefore, the addition formula can be expressed as:

$$\because T_1(n)=O\big(f(n)\big),\quad T_2(n)=O\big(g(n)\big)$$

$$\therefore T_{total}(n)=T_1(n)+T_2(n)=\max\Big(O\big(f(n)\big),\ O\big(g(n)\big)\Big)=O\Big(\max\big(f(n),\ g(n)\big)\Big)$$

3. Multiplication Rule#

The complexity of nested code is equal to the product of the complexity of the nested inner and outer code

Let's take an example:

#include <stdio.h>
int f(int n);
int main(){
    int n = 100;    // data scale, fixed here so the example compiles
    int x = 0;
    for(int i=0; i<=n; ++i)
        x += f(i);

    return 0;
}

int f(int n){
    int y = 0;
    for(int i=0; i<=n; ++i)
        y += i;

    return y;
}

In this example, if we treated f() as an ordinary constant-time operation, the time complexity of the loop in main() would be $T_1(n)=O(n)$. However, f() contains its own loop, with time complexity $T_2(n)=O(n)$. Therefore, the time complexity of the whole code segment is $T_{total}(n)=T_1(n) \times T_2(n)=O(n \times n)=O(n^2)$.

Analysis of Several Common Time Complexities#

(Figure: common complexity levels, from low to high)

These common complexities can be divided into two categories: polynomial complexity and non-polynomial complexity. Only two of the common ones are non-polynomial: $O(2^n)$ and $O(n!)$.

Problems for which only non-polynomial-time algorithms are known are often loosely called NP (non-deterministic polynomial) problems. As the data scale n grows, the execution time of a non-polynomial-complexity algorithm increases sharply, quickly becoming impractical, so such algorithms are very inefficient.
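To get a feel for the difference: at a modest scale of n = 20, an $O(n^2)$ algorithm performs on the order of 400 operations, while an $O(2^n)$ algorithm performs about $10^6$ and an $O(n!)$ algorithm about $2.4 \times 10^{18}$.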

1. $O(1)$ - Constant Time Complexity#
  • The '1' here is just a notational convention; it stands for any constant, not literally one operation.

As long as the execution time of the code does not grow as n grows (in general, as long as the algorithm contains no loop or recursive statements), its time complexity is $O(1)$, even if the code has thousands of lines.
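A minimal sketch of such a segment: it executes a fixed number of statements no matter how large the data is.

#include <stdio.h>
int main(){
    int a = 1;          // each statement executes exactly once,
    int b = 2;          // so the total time is a constant
    int c = a + b;      // that does not grow with any data scale n
    printf("%d\n", c);
    return 0;
}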

2. $O(\log n)$, $O(n\log n)$ - Logarithmic Time Complexity#
  • All logarithmic time complexities are written as $O(\log n)$, regardless of the base; the same applies to $O(n\log n)$.

By the change-of-base formula, $\log_a n = \log_a b \cdot \log_b n = C \cdot \log_b n$, where $C = \log_a b$ is a constant. By the earlier conclusion, constant coefficients do not affect the growth trend and can be ignored. Therefore, when writing logarithmic time complexities, we drop the base of the logarithm and represent them uniformly as $O(\log n)$.
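A minimal sketch of a logarithmic loop (n is fixed to an example value here): i doubles on every iteration, so the loop runs about $\log_2 n$ times; nesting such a loop inside one that runs n times would give $O(n\log n)$.

#include <stdio.h>
int main(){
    int n = 1024;       // example data scale
    int steps = 0;
    for(int i = 1; i <= n; i *= 2)  // i takes the values 1, 2, 4, ...,
        ++steps;                    // so the loop runs about log2(n) times
    printf("%d\n", steps);          // prints 11 when n = 1024
    return 0;
}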

3. $O(m+n)$, $O(m*n)$ - Time Complexity of Two Data Scales#

In general, when the complexity of a piece of code is determined by the scales of two independent data sets, we cannot use the addition rule to omit either one, because we do not know which of m and n is larger. Therefore we write the complexity as $O(m+n)$ or $O(m \times n)$.
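A minimal sketch (the concrete values of m and n are chosen only so the example compiles): two independent sequential loops give $O(m+n)$, and nesting one inside the other would give $O(m \times n)$.

#include <stdio.h>
int main(){
    int m = 100, n = 200;            // two independent data scales
    int sum = 0;
    for(int i = 1; i <= m; ++i)      // this loop contributes O(m)
        sum += i;
    for(int j = 1; j <= n; ++j)      // this loop contributes O(n)
        sum += j;
    // We cannot tell in advance which of m and n dominates,
    // so neither term can be dropped: the total is O(m+n).
    printf("%d\n", sum);
    return 0;
}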

In this case, the addition rule should be:

$$\because T_1(m)=O\big(f(m)\big),\quad T_2(n)=O\big(g(n)\big)$$

$$\therefore T_{total}(m,n)=T_1(m)+T_2(n)=O\big(f(m)+g(n)\big)$$

And the multiplication rule remains the same.

Space Complexity Analysis#

Space complexity, also known as asymptotic space complexity, represents the growth relationship between the storage space of an algorithm and the data scale.

Let's look at an example.

#include <stdio.h>
int main(){
    int n = 100;    // data scale, fixed here so the example compiles
    int a[n];       // allocates storage for n ints
    for(int i=0; i<n; ++i)
        a[i] = i * i;
    for(int i=n-1; i>=0; --i)
        printf("%d ", a[i]);

    return 0;
}

Analyzing this code: the line int a[n]; allocates an int array of size n, and apart from that the remaining variables occupy only constant space, so the space complexity of the entire code segment is $O(n)$.

3. Summary#

Complexity, also known as asymptotic complexity, includes time complexity and space complexity; it describes the growth relationship between an algorithm's resource usage and the data scale. Roughly speaking, the higher the complexity, the lower the execution efficiency of the algorithm. The common complexities are not many; from low to high they are $O(1)$, $O(\log n)$, $O(n)$, $O(n\log n)$, and $O(n^2)$.

(Figure: the proportional relationship between T(n) and n for the common complexities)

Reference Articles#

Basic Usage of MathJax

Data Structures and Algorithms - Wang Zheng
