Software Development : Multithreaded Programming



What is a Thread

Probably the Simplest Approximation

A thread is a single control flow.

A Simple Approximation

For the sake of simplicity you might imagine a thread to be a living creature that freezes from time to time and interacts with other living creatures. The creatures might keep some items private, they might share some items, they might race with each other by holding running competitions and they might not all fit through the entrance of a tunnel at the same time. The creatures might even kill each other, die of "old age" or be killed by a god called the "computer user".

Unfortunately that approximation does not cover many aspects of threads, which is why there exists the next section titled...


A More Exact Explanation of a Thread

If assembly language code is a set of roads on a game level map, then a thread is a carriage that has at most one horse (a single CPU core) moving it. In the world of computers the carriage would be an execution flow pointer in the assembly code. The carriage can stand still on two occasions: standing still with the horse (the horse/CPU is idle, sleeping) and standing still without having a horse attached to it. A horse can be disconnected from one carriage and re-connected to some other carriage that does not yet have a horse attached to it. There can be more carriages than there are horses. Horses can warp from one game level map to another. Carriages can move around and warp only within a single game level map and they can do that only when a horse is attached to them. There are bridges (critical sections in computer terms) that can be crossed by only one carriage at a time. A critical section is considered to be locked, when a carriage is on the bridge.


The activity of determining how long and when each horse (a CPU core) spends servicing each carriage (thread) is called scheduling. The moving of a horse from one carriage to another is called context switching. If the number of carriages is greater than the number of horses, then the impression that all carriages are moving at the same time, generally called concurrency, is achieved by moving the horses (CPU cores) between different carriages (threads) after very small time intervals. The work of the horses (CPU cores) is parallel, because the horses do work at the same time, but if there are more carriages (threads) than there are horses, then the moving of the carriages (the execution of threads) is parallel only part of the time and concurrent all of the time. That explains the expression concurrency.


The analogy of the RAM that is managed by a thread is the set of goods on a carriage and the analogy of the CPU registers is a carriage specific saddle on the horse, where the different pockets of the saddle are the CPU registers. When the horse is moved from one carriage to another (when a context switch happens), the horse leaves its saddle behind (the values of the CPU registers of that thread are saved) and takes on the saddle of the new carriage (the values of the CPU registers of the other thread are loaded into the registers). The saving of CPU register states during the context switch explains some of the extra complexity of the logical model of single CPU cores that support the execution of multiple concurrent threads. The rest of the complexity of the logical model of single CPU cores comes from speed optimizations, energy consumption optimizations and memory management. The analogy of memory management is the management of the "goods" in carriages. Some "good items" are shared between carriages, even between carriages that reside on different game level maps. The proper computer term for the sharing of goods is shared memory. The carriage goods might also be shared with some shelf on the game map, for example a bridge that can be crossed by only one carriage at a time (a critical section).


The activity of throwing a carriage into a black hole is called "thread killing". Operating system scheduled threads, designated in this informal explanation as carriages, can be created by asking the operating system to create them. The asking can be done by using some special library or API, for example OpenMP for C and C++, but some of the operating system scheduled threads might be registered at creation time as threads, "carriages", with special records attached to them, and those threads, the ones with the special records, are called "operating system processes" or by their shorter name, "processes". Killing a thread with such records is also called process killing. Linux and BSD have an application for killing operating system processes and it is called "kill". Linux and BSD also have an application called "killall".


Threads and Design Patterns

Many, if not most, programming languages support thread dependent design patterns by having their own thread types. The programming language specific threads might be, but do not have to be, mapped one-to-one to operating system threads. If they are not mapped one-to-one to operating system threads, then the programming language implementation has its own scheduler for the threads. (The scheduler of the programming language implementation may depend on the operating system scheduler.)
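
As one concrete instance, the following minimal C++ sketch (using the standard std::thread type; the worker function and its argument are made up for illustration) asks for a new thread to be created and then waits for it to finish:

#include <iostream>
#include <thread>

# A made-up worker function; it runs in its own thread.
void count_to(int i_limit) {
    for (int i = 1; i <= i_limit; i++) {
        std::cout << "counted to " << i << "\n";
    }
}

int main() {
    // The operating system is asked to create a new thread.
    std::thread worker(count_to, 3);
    // The main thread sleeps until the worker thread exits.
    worker.join();
    return 0;
}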


Thread-safe Code

To make sure that a calculation result does not depend on the order in which the threads access the calculation input data, the code that is executed by more than one thread should be seen as a function that gets all of its input data in the form of constants during its function call. The formal expression for this is functional programming without any side effects. Concurrent threads must not overwrite each other's calculation results, the function outputs.
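
A minimal sketch of the idea in C++ (the function and variable names are made up for illustration): the pure function below reads only its arguments and writes only its return value, so any number of threads can call it at the same time, whereas the impure function races on shared state:

#include <thread>

int i_shared_total = 0; // shared mutable state

// Thread-safe: all inputs arrive as call-time constants,
// the result is returned, nothing shared is touched.
int add_pure(int a, int b) { return a + b; }

// Not thread-safe: two threads can read the old value of
// the i_shared_total at the same time and one update gets lost.
void add_impure(int a) { i_shared_total = i_shared_total + a; }

int main() {
    std::thread t1(add_impure, 1);
    std::thread t2(add_impure, 2);
    t1.join();
    t2.join();
    // i_shared_total may be 1, 2 or 3 depending on the order of the threads.
    return 0;
}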


A thread must never read values that some other thread has not finished writing. For example, a data record that consists of 2 fields, "forename" and "surname", might have the form of some container, for example an array with 2 elements or a hashtable with 2 key-value-pairs or some instance (sometimes also called "an object"). Most likely the "forename" and "surname" fields have some default values, which might even be some "nil" or "null" or "0" or an empty string. If one thread, in this example $thread_1$, reads the whole record, the whole pair, before another thread, in this example $thread_2$, has finished writing it, then it is possible that the $thread_1$ reads in $("Joe","nil")$ while the $thread_2$ has not completed writing the surname, "TheGreat", and the $thread_1$ performs all of its calculations with the flawed value of $("Joe","nil")$ instead of using the correct value of $("Joe","TheGreat")$. The solution is to lock the record by making the code that reads the record and the code that writes the record a critical section (a bridge that can be crossed by only a single carriage at a time), a code region that can be entered by only a single thread. That way the thread (carriage) that writes the record completes its task (moves off the bridge) before any other thread is allowed to read or write the record (move onto the bridge). The writing of the pair of names, forename and surname, can be seen as a single compound operation. Compound operations that are carried out within a critical section are called atomic operations.
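
A minimal sketch of the record locking in C++ (the record fields and the thread bodies are made up for illustration; std::lock_guard acquires the lock on construction and releases it when it goes out of scope):

#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

std::mutex record_mutex; // guards both fields of the record
std::string s_forename = "nil";
std::string s_surname  = "nil";

// Critical section: the pair is written as one atomic operation.
void write_record(const std::string& forename, const std::string& surname) {
    std::lock_guard<std::mutex> lock(record_mutex);
    s_forename = forename;
    s_surname  = surname;
}

// Critical section: the pair is read as one atomic operation,
// so the flawed value ("Joe","nil") can never be observed.
std::pair<std::string, std::string> read_record() {
    std::lock_guard<std::mutex> lock(record_mutex);
    return {s_forename, s_surname};
}

int main() {
    std::thread writer(write_record, "Joe", "TheGreat");
    auto pair = read_record(); // either ("nil","nil") or ("Joe","TheGreat")
    std::cout << pair.first << " " << pair.second << "\n";
    writer.join();
    return 0;
}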


Locks

There are multiple possible mechanisms that can be used for making sure that a critical section is run by only one thread at a time (at most one carriage moves on the bridge that can be crossed by only one carriage at a time). The simplest mechanism is a lock, and the more complex and more flexible mechanisms can be developed by advancing the idea of the lock. A critical section (the bridge) is considered to be locked, if there is at least one thread (carriage) executing it (residing on the bridge).


Locks can be implemented by using a thread that is dedicated to reading and writing its private variable, b_critical_section_locked (the b_critical_section_locked is a good item in the carriage that is the analogue of that thread). To simplify the explanation in this section, that dedicated thread is named in this text the lock_administrator_thread, but that is not an official term, nor is it a widely used name like Bob, Alice and Eve are in cryptography texts. A procedure oriented, extremely simplistic, implementation of a lock can look like this:


# All variables are writable by all threads, like in the 
# language C, but there are rules that determine, which 
# variables are written/read by which thread.

# Read and written by the lock_administrator_thread and 
# the thread in the critical section:
    boolean b_declare_critical_section_exit=false; 
    string_or_integer x_critical_section_entry_granted_to_ID=null;
    # The ID is never the ID of the lock_administrator_thread, but
    # it designates an ID of a thread that actually wants
    # to do some application specific work at the critical section.

# Written and read only by the lock_administrator_thread
    boolean b_critical_section_locked=false;

# Read by the lock_administrator_thread and 
# written by all the rest of the threads:
    string_or_integer x_thread_ID_that_requests_entry=null;

function run_thread_1(){
    if(x_critical_section_entry_granted_to_ID==my_ID){
        #--------
        # It is OK to overwrite some other thread's
        # critical section entry request. It is 
        # before the call to the run_critical_section_code()
        # to save a little bit of time.
        x_thread_ID_that_requests_entry=null; 
        #--------
        run_critical_section_code();
        x_critical_section_entry_granted_to_ID=null;
        b_declare_critical_section_exit=true;
    } else {
        if(x_critical_section_entry_granted_to_ID==null){
            # some other thread might perform the same test at this line
            x_thread_ID_that_requests_entry=my_ID;
            # some other thread might overwrite the value of 
            # the x_thread_ID_that_requests_entry at this line
        } else {
             # Use some CPU interrupt/trap mechanisms or 
             # run assembler NOP commands.
             sleep_or_waste_a_little_bit_of_time();
        }; # else
    }; # else
};

# The run_thread_2() and the run_thread_3() are 
# similar to the run_thread_1()

function run_lock_administrator_thread(){
    if(b_critical_section_locked==true){
        if(b_declare_critical_section_exit==true){
            b_declare_critical_section_exit=false;
            if(x_critical_section_entry_granted_to_ID!=null){
                throw_exception("This can not be happening.");
            }; # if
            if(x_thread_ID_that_requests_entry!=null){
                x_critical_section_entry_granted_to_ID=x_thread_ID_that_requests_entry;
                # The critical section remains locked,
                # just that now it is locked for the 
                # new entry.
            } else {
                b_critical_section_locked=false;
            }; # else             
        } else {
             # The thread in the critical section is
             # still busy in the critical section.
             sleep_or_waste_a_little_bit_of_time();
        }; # else
    } else { # critical section is unlocked
        if(b_declare_critical_section_exit==true){
            throw_exception("This can not be happening.");
        };
        if(x_thread_ID_that_requests_entry==null){
            # It might also be that the critical section
            # is free to enter, unlocked, but 
            # no thread wants to enter it.
            sleep_or_waste_a_little_bit_of_time();
        } else { # some thread wants to enter the critical section
            b_critical_section_locked=true;
            x_critical_section_entry_granted_to_ID=x_thread_ID_that_requests_entry;
        };
    }; # else
};

while(true){
    if(random_integer(0,1)==1) run_thread_1();
    if(random_integer(0,1)==1) run_thread_2();
    if(random_integer(0,1)==1) run_thread_3();
    if(random_integer(0,1)==1) run_lock_administrator_thread();
};

A general rule with locks is that they should be used as sparingly as possible, because a situation where multiple threads, maybe even tens or hundreds of threads, are waiting for some other thread to exit the critical section slows down the application a lot. For example, in the case of web applications the central database can be a serious performance limiter, especially for sites that get hundreds of requests per second. One way to mitigate the bottleneck that simple locks impose is to use 2 different types of locks: a read-lock and a write-lock. The idea behind the read-lock is that it is forbidden to overwrite the data while at least one thread is reading it, but multiple threads may read the data at the same time. The write-lock behaves like the ordinary lock and its idea is that data that is in the process of being updated must not be read or written by any other thread. Read-locked data can thus be simultaneously read by more than one thread, and the accountancy for keeping track of the number of threads that reside in the semi-critical section that is used for only reading the data is the following:

critical_section_with_an_ordinary_lock{
    function increment_threadcount(){ i_threadcount++; }
    function decrement_threadcount(){ i_threadcount--; }
};

critical_section_with_the_readlock{
    function read_the_data(){ blablabla; }
};


my_single_thread_with_an_ID_of_42{
    # no locks acquired by this thread
    increment_threadcount()
    # ordinary lock released, readlock on: writers are blocked
    # while the i_threadcount is greater than 0
    read_the_data()
    # readlock still on
    decrement_threadcount()
    # ordinary lock released, readlock off if no other thread holds the readlock
};
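
In C++ the 2 types of locks are available through, for example, the standard std::shared_mutex; below is a minimal sketch under that assumption (the data and the function bodies are made up for illustration):

#include <mutex>
#include <shared_mutex>
#include <string>

std::shared_mutex data_mutex;
std::string s_data = "initial";

// Many threads may hold the read-lock (shared lock) at the same time;
// a writer is blocked while at least one reader holds it.
std::string read_the_data() {
    std::shared_lock<std::shared_mutex> readlock(data_mutex);
    return s_data;
}

// The write-lock (exclusive lock) waits until all readers have left
// and then blocks both readers and writers.
void write_the_data(const std::string& s_new) {
    std::unique_lock<std::shared_mutex> writelock(data_mutex);
    s_data = s_new;
}

int main() {
    write_the_data("updated");
    return read_the_data() == "updated" ? 0 : 1;
}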

In classical operating system terminology the closest counterpart of a lock is the semaphore: a lock corresponds to a binary semaphore and a counting semaphore generalizes the idea. A deadlock is a situation, where at least 2 threads, both in different critical sections, are waiting for the other thread to exit the critical section that the other thread has locked. For example, if $thread_1$ is in $criticalsection_1$ and $thread_2$ is in $criticalsection_2$ and the $criticalsection_1$ has a function call that takes the control flow to the $criticalsection_2$ and the $criticalsection_2$ has a function call that takes the control flow to the $criticalsection_1$, then the $thread_1$ waits for the $thread_2$ to exit the $criticalsection_2$, but the $thread_2$ can not exit the $criticalsection_2$, because before exiting the $criticalsection_2$ it needs to visit the $criticalsection_1$, which is occupied by the $thread_1$. A race condition is a situation, where the calculation result depends on the order in which the threads that have equal roles read or write some data.
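
A minimal sketch of the deadlock scenario in C++ (the mutex and function names are made up for illustration; std::scoped_lock, which acquires multiple mutexes as one deadlock-free operation, is one standard way to avoid the problem):

#include <mutex>
#include <thread>

std::mutex criticalsection_1;
std::mutex criticalsection_2;

// thread_1 locks criticalsection_1 and then needs criticalsection_2.
void run_thread_1() {
    std::lock_guard<std::mutex> lock_a(criticalsection_1);
    // If thread_2 locks criticalsection_2 right here, both threads wait forever.
    std::lock_guard<std::mutex> lock_b(criticalsection_2);
}

// thread_2 locks the same 2 mutexes in the opposite order: deadlock-prone.
void run_thread_2_deadlock_prone() {
    std::lock_guard<std::mutex> lock_a(criticalsection_2);
    std::lock_guard<std::mutex> lock_b(criticalsection_1);
}

// One fix: acquire both locks as a single deadlock-free operation.
void run_thread_2_fixed() {
    std::scoped_lock lock(criticalsection_2, criticalsection_1);
}

int main() {
    std::thread t1(run_thread_1);
    std::thread t2(run_thread_2_fixed); // with run_thread_2_deadlock_prone this may hang
    t1.join();
    t2.join();
    return 0;
}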


Lock Implementation Examples from Practice

The locking system of the SQLite database engine can be seen as one example.

Thread Dependent Design Pattern: GUI

Some tasks, for example feedback to users through a graphical user interface (GUI), have an inherent requirement that they must be executed within some time frame. The solution is to put long calculations into a separate thread that has a lower priority than the thread that handles the GUI events. Oftentimes, for example in the case of JavaScript in 2015 era web browsers, individual GUI events do not reside in their own threads and there exists a single thread that executes the JavaScript events sequentially.
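
A minimal sketch of the pattern in C++ (the GUI loop and the calculation are made up for illustration; the result is handed over through an atomic flag, so the GUI thread never blocks on the calculation):

#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<bool> b_result_ready{false};
long long i_result = 0; // written by the worker before the flag is set

// The long calculation runs in a separate thread.
void run_worker_thread() {
    long long sum = 0;
    for (long long i = 0; i < 100000000; i++) { sum += i; }
    i_result = sum;
    b_result_ready.store(true); // publish the result
}

int main() {
    std::thread worker(run_worker_thread);
    while (!b_result_ready.load()) {
        // The GUI thread stays responsive: handle events, redraw, etc.
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
    std::cout << "result: " << i_result << "\n";
    worker.join();
    return 0;
}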


Thread Dependent Design Pattern: Robot Control

Tasks that are timing-wise not rigidly dependent on each other can be placed into different threads. For example, it does not matter when a garbage truck takes away the trash or when goods arrive at a grocery store, as long as it is done often enough. For example, in robotics the conveyor belts can run in their own threads and exchange the conveyed goods with neighboring robots and neighboring conveyor belts at their own pace, leaving the checks of whether there exists free space for new goods to the robotic arms and other systems.
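
A minimal sketch of such pace-decoupling in C++ (a bounded queue between a conveyor belt thread and a robotic arm thread; all names are made up for illustration):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

std::mutex queue_mutex;
std::condition_variable queue_changed;
std::queue<int> goods_queue;     // goods waiting between the two machines
const size_t MAX_GOODS = 10;     // free space of the hand-over area

// The conveyor belt produces goods at its own pace and
// waits only when the hand-over area is full.
void run_conveyor_belt_thread() {
    for (int good = 0; good < 100; good++) {
        std::unique_lock<std::mutex> lock(queue_mutex);
        queue_changed.wait(lock, [] { return goods_queue.size() < MAX_GOODS; });
        goods_queue.push(good);
        queue_changed.notify_all();
    }
}

// The robotic arm consumes goods at its own pace.
void run_robotic_arm_thread() {
    for (int taken = 0; taken < 100; taken++) {
        std::unique_lock<std::mutex> lock(queue_mutex);
        queue_changed.wait(lock, [] { return !goods_queue.empty(); });
        goods_queue.pop();
        queue_changed.notify_all();
    }
}

int main() {
    std::thread belt(run_conveyor_belt_thread);
    std::thread arm(run_robotic_arm_thread);
    belt.join();
    arm.join();
    return 0;
}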


Thread Dependent Design Pattern: Map-Reduce

The details of it are described in the loops section, but the general idea is that different threads can be run on different CPU cores or even on different computers, allowing more hardware to work in parallel. The scheme is:

some_part_of_a_singlethreaded_application();
dismantle_the_problem_to_independent_tasks_so_that_they_be_run_in_parallel();
run_the_independent_tasks_in_different_threads();
assemble_the_computation_results_of_all_of_the_different_threads_together();
do_something_in_a_single_thread_by_using_the_assembled_computation_result();


Data parallelism is based on the idea that different pieces of data, for example different photos, are processed in different threads. Task parallelism is based on the idea that the same task, for example building a house, can be divided into multiple sub-tasks that can be carried out in parallel, independently of each other, in different threads. For example, a single task of building all of the walls of a single floor of a building can be carried out in parallel by multiple construction workers who work on different regions of the walls. The transformation of the initial task into independent, concurrently executable sub-tasks is called parallelisation.
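
A minimal sketch of the scheme in C++ (data parallelism over a vector of numbers; std::async runs each independent task in a separate thread and the futures are used to assemble the results; all names are made up for illustration):

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// The independent task: sum one piece of the data.
long long sum_piece(const std::vector<int>& data, size_t begin, size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

int main() {
    std::vector<int> data(1000, 1);           // some_part_of_a_singlethreaded_application()
    size_t half = data.size() / 2;            // dismantle the problem to independent tasks
    auto f1 = std::async(std::launch::async, sum_piece, std::cref(data), 0, half);
    auto f2 = std::async(std::launch::async, sum_piece, std::cref(data), half, data.size());
    long long total = f1.get() + f2.get();    // assemble the computation results
    std::cout << "total: " << total << "\n";  // use the assembled result in a single thread
    return 0;
}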


Thread Dependent Design Pattern: Application Level Threads

In addition to operating system threads and programming language implementation specific threads, there exist application level threads that are not specific to any programming language or operating system. An example of an application level atomic operation that must be conducted within an application level critical section is the writing of data to the database of some web application by using multiple calls from the application code, may be PHP or Ruby, to the database server, may be some SQL-server.


The existence of application level threads can be difficult to spot, but if they are not designed in from the very start, as part of the software architecture, a lot of development time can get wasted.


Asynchronous Communication

Threads of a single application, for example a web application, can run on different computers that may shut down at any moment. Communication between the computers is never 100% reliable, even within a single rack. If thread A sends a function call, a command, a message, to thread B and waits for B to answer, then it might happen that B dies and A waits forever, unless A has some spare plan for doing something else.

Another scenario to consider is that A receives hopelessly out of date answers. For example, maybe B did not die, but was too slow to answer and A has already sent B a new function call request with new data. The newer function call results might arrive at A sooner than the results of the older function call request. Maybe the older function call request was more expensive to execute.
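
A minimal sketch of defending against stale answers in C++ (all names are made up for illustration; every request carries a unique ID and thread A ignores any reply whose ID is not the latest one it sent):

#include <iostream>
#include <string>

long long i_latest_request_id = 0;

// Called by thread A before sending a request to thread B.
long long next_request_id() {
    return ++i_latest_request_id;
}

// Called by thread A when a reply from thread B arrives.
// Replies to superseded requests are discarded.
void handle_reply(long long i_reply_to_id, const std::string& s_answer) {
    if (i_reply_to_id != i_latest_request_id) {
        return; // a hopelessly out of date answer
    }
    std::cout << "usable answer: " << s_answer << "\n";
}

int main() {
    long long old_id = next_request_id(); // request 1 sent
    long long new_id = next_request_id(); // request 2 sent with newer data
    handle_reply(new_id, "new result");   // arrives first, is used
    handle_reply(old_id, "old result");   // arrives later, is discarded
    return 0;
}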

Ideas and Observations

  • When designing software architecture, always assume that messages can get lost, communication channels can break, other threads can die at any moment and remote computers can go down at any moment.
  • Each single message must have a unique ID and each session, i.e. a set of messages, must have a unique ID.
  • Messages arrive at their destination and are answered in pseudo-random order.
  • In the case of crappy threading models, like the one of JavaScript, where threads do not have an explicit sleeping state, the sleeping during asynchronous communication can be simulated by packaging all of the code of a "simulated thread" into a single instance, along with a state machine of the "simulated thread". The state machine can have a state "sleep". The simulated thread is awakened by the arrival of a proper message or by the occurrence of some specific event.
  • If 2 threads exchange data with each other by using commonly accessible memory, then one idea for the implementation is that the common memory is divided into 2 regions and each of the regions is cooperatively writable by only one of the threads, so that there is a one-to-one relation between a memory region and its writing thread until both of the memory regions get deallocated, returned to the operating system. Both threads can read both memory regions whenever they want. A minimal sketch of the scheme follows this list.
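
A minimal sketch of the 2-region scheme in C++ (all names are made up for illustration; each thread writes only its own region and publishes each write through an atomic counter, so that the other thread never reads a half-written value):

#include <atomic>
#include <thread>

// Region 1 is written only by thread 1; region 2 only by thread 2.
int region_1 = 0;
int region_2 = 0;
std::atomic<int> region_1_version{0}; // incremented after each write to region 1
std::atomic<int> region_2_version{0}; // incremented after each write to region 2

void run_thread_1() {
    region_1 = 42;                                             // write own region
    region_1_version.fetch_add(1, std::memory_order_release);  // publish the write
    if (region_2_version.load(std::memory_order_acquire) > 0) {
        int seen = region_2; // safe to read the other thread's region
        (void)seen;
    }
}

void run_thread_2() {
    region_2 = 7;
    region_2_version.fetch_add(1, std::memory_order_release);
    if (region_1_version.load(std::memory_order_acquire) > 0) {
        int seen = region_1;
        (void)seen;
    }
}

int main() {
    std::thread t1(run_thread_1);
    std::thread t2(run_thread_2);
    t1.join();
    t2.join();
    return 0;
}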

Buzz-expressions

The buzz-expressions change over time.

