Introduction To Parallel Computing



Welcome to our course on parallel programming. In the following weeks we will see some of the cutting-edge technologies for parallel programming, see how to write efficient parallel programs, and learn the basic principles and algorithms of this fast-moving and exciting field of computing. In this first lecture we give a general introduction to parallel computing and study various forms of parallelism. We will also give a summary of what you can expect in the rest of this course.

The first big question that we need to answer is: what is parallel computing? Let's define it. We will say that parallel computing is a type of computation in which many calculations execute at the same time. Parallel computing is based on the following principle: a computational problem can be divided into smaller subproblems, which can then be solved simultaneously. Parallel computing assumes the existence of some sort of parallel hardware, which is capable of undertaking these computations simultaneously, that is, in parallel.
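To make that principle concrete, here is a minimal sketch (not part of the original lecture) that divides a summation into two subproblems and solves them at the same time. It uses plain Scala futures on the global execution context; the array contents and the split point are arbitrary choices for illustration, and whether the two halves really run simultaneously depends on the available parallel hardware.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelSum extends App {
  val numbers = (1 to 10000).toArray        // arbitrary example data

  // Divide the problem into two smaller subproblems.
  val (left, right) = numbers.splitAt(numbers.length / 2)

  // Solve both subproblems simultaneously (hardware permitting).
  val leftSum  = Future { left.sum }
  val rightSum = Future { right.sum }

  // Combine the partial results into the final answer.
  val total = Await.result(for (l <- leftSum; r <- rightSum) yield l + r, Duration.Inf)
  println(s"Sum = $total")
}
```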
With that definition in place, we will take a short glance into the history of parallel computing to better understand its origin. Despite its recent rise in popularity, parallel execution was present ever since the early days of computing, when computers were still mechanical devices. It is difficult to identify the exact conception of the first parallel computing machine, but we can agree on the following: in the 19th century, Charles Babbage invented the concept of a programmable computer, which he called the Analytical Engine. The Analytical Engine was able to execute various user-written programs, and for this Babbage is credited as a pioneer in the field of programmable computers. After Babbage presented his ideas about the Analytical Engine, the Italian general and mathematician Luigi Menabrea wrote down a description of Babbage's computer. Menabrea envisioned an extension in which independent computations occur simultaneously; his idea was to speed up Babbage's Analytical Engine.

Modern computers based on transistor technology appeared much later, in the 20th century. Researchers from IBM, Daniel Slotnick, John Cocke and Gene Amdahl, to mention a few, were among the early evangelists of parallel processing. Amdahl's famous quote comes from this era: "For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers." Regardless of Amdahl's visionary views, at the time parallel computing did not become mainstream, but it attained a niche within the high-performance computing community. In high-performance computing, the main areas of focus are parallel algorithms for physics and biology simulations, weather forecasting, cryptography, and other applications that require a lot of processing power. The performance requirements of regular users were satisfied by increasing the clock frequency of the CPU, which meant that the CPU could execute a higher number of instructions per second.

At the beginning of the 21st century it became obvious that scaling the processor frequency is no longer feasible: after a certain point, the power required by the processor starts to grow non-linearly with frequency. This is known as the power wall. So, instead of increasing the CPU clock frequency, processor vendors turned to providing multiple CPU cores on the same processor chip, and named these new devices multi-core processors.

It is interesting to note that these three stories share a common theme: whether the need for faster computation arose from the sluggishness of a mechanical computer, the limitations of the available technology, or the physical boundaries imposed by the power wall, additional computing power was provided through parallelization.

This brings us to the next big question: why do we need parallel computing? As you will learn in this course, parallel programming is much harder than sequential programming. There are several reasons for this. First, separating a sequential computation into parallel subcomputations is a challenging mental task for the programmer, and it is sometimes not even possible. Usually this means identifying independent parts of the program and executing them at the same time; however, some problems cannot be divided this way, as illustrated in the lecture figure. The second source of difficulty is ensuring correctness: we will see that many new types of errors can arise in parallel programming, making the programmer's life harder.

With these increased complexities, why then would we bother writing parallel programs at all? As shown in the earlier anecdotes, the main benefit of parallelization is increased program performance. When we write a parallel program, we expect it to be faster than its sequential counterpart. Thus, speed-up is the main reason why we want to write parallel programs. In fact, achieving speed-up is the third source of difficulty in parallel computing.

The speed-up discussion brings us to the next big topic: the relationship of parallel programming to a closely related discipline called concurrent programming. In spite of their connection, parallel and concurrent programming are not the same thing. There are many contrasting definitions of these programming disciplines, and people sometimes disagree on what they are, so we will decide on one particular definition and stick to it. A parallel program is a program that uses the provided parallel hardware to execute a computation more quickly. As such, parallel programming is concerned mainly with efficiency, and it answers questions such as: how do we divide a computational problem into subproblems that can be executed in parallel, and how do we make the best use of the underlying parallel hardware to obtain the optimal speed-up? A concurrent program is a program in which multiple executions may or may not execute at the same time. Concurrent programming serves to structure programs better, in order to improve modularity, responsiveness to input and output events, or understandability of the program. As such, concurrent programming is concerned with questions like: when can a specific execution start, when and how can two concurrent executions exchange information, and how do computations manage access to shared resources such as files or database handles?

So, parallel programming is mainly concerned with algorithmic problems, numerical computations, or big-data applications; examples include matrix multiplication, data processing, computer graphics rendering, or simulation of fluid movement. Concurrent programming is targeted at writing asynchronous applications such as web servers, user interfaces, or databases. While parallel programming is concerned mainly with speed-up, concurrent programming is concerned with convenience, better responsiveness, and maintainability. The two disciplines often overlap: parallel programming may rely on insights from concurrent programming and vice versa, and concurrent programming may be used to solve parallel programming problems. However, neither discipline is a superset of the other.
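Since speed-up is the central measure here, the following is a small sketch (again not from the lecture) of how it might be estimated: speed-up is simply the sequential running time divided by the parallel running time. The workload, array size, and timing helper are all illustrative assumptions; careful benchmarking on the JVM also needs warm-up and repeated runs, which belongs to the performance-analysis material later this week.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SpeedupDemo extends App {
  val data = Array.tabulate(10000000)(i => i.toDouble)   // arbitrary CPU-bound workload

  def work(xs: Array[Double]): Double = xs.map(math.sqrt).sum

  // Run a block and return its result together with the elapsed time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  val (_, seqTime) = timed { work(data) }                 // sequential version

  val (_, parTime) = timed {                              // parallel version: two subproblems
    val (left, right) = data.splitAt(data.length / 2)
    val f1 = Future { work(left) }
    val f2 = Future { work(right) }
    Await.result(f1, Duration.Inf) + Await.result(f2, Duration.Inf)
  }

  // Speed-up = sequential running time / parallel running time.
  println(f"sequential: $seqTime%.1f ms, parallel: $parTime%.1f ms, speed-up: ${seqTime / parTime}%.2f")
}
```

On a single-core machine the measured speed-up will stay close to one, which is exactly why the parallel hardware discussed below matters.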
Having more clearly established what parallel programming is, let's take a look at various forms of parallelism. It turns out that parallel computing manifests itself on various granularity levels.

For example, some microprocessors have historically had a 4-bit word size. Number ranges that did not fit into four bits had to be represented with multiple words. This meant that variables in the 8-bit range, such as x and y, had to be represented with 8 bits, that is, two words. Adding these two variables together required two separate add instructions, each operating on 4-bit words. Word size subsequently increased to 8 bits, then 16 bits, and eventually 32 bits, and each time this happened several instructions could be replaced by just a single one. This meant processing more data in the same time interval; in effect, performance was improved by parallelizing operations at the bit level, so we call this form of parallelization bit-level parallelism. A decade ago we witnessed the switch from 32-bit to 64-bit architectures in commercial processors, and more recently a similar effect was achieved through the introduction of vector instructions in Intel and ARM processors.

On a more coarse-grained scale, parallelism can be achieved by executing multiple instructions at the same time. Consider the example sketched in the code below: the processor has no reason not to compute b and c in parallel, because neither b nor c depends on the other to be computed. This form of parallelization is thus called instruction-level parallelism. Note, however, that the computation of d can only be started after both b and c have been computed. In summary, instruction-level parallelism executes different instructions from the same instruction stream in parallel whenever this is possible.

Finally, task-level parallelism deals with the parallel execution of entirely separate sequences of instructions; these instructions can execute on the same data or on entirely different data. Bit-level and instruction-level parallelism are exploited by the underlying parallel hardware; in other words, they are in most cases implemented inside the processor itself. This course will mainly focus on task-level parallelism, which is usually achieved through software support. This level of granularity will allow us to express general parallel algorithms.
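To illustrate the last two granularity levels, here is a small sketch that is not from the lecture; the variables, values, and helper functions are made up. The first part shows the kind of independent expressions a processor can overlap on its own (instruction-level parallelism, no software effort required), while the second expresses task-level parallelism in software, here with plain Scala futures rather than any course-specific construct.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelismLevels extends App {
  // Instruction-level parallelism: b and c are independent, so the processor
  // itself may compute them at the same time; d must wait for both.
  val a = 21
  val b = a + 1      // does not depend on c
  val c = a * 2      // does not depend on b
  val d = b + c      // depends on both b and c

  // Task-level parallelism: two entirely separate sequences of instructions,
  // expressed explicitly in software and operating on different data.
  def wordCount(text: String): Int = text.split("\\s+").length
  def maximum(xs: Array[Int]): Int = xs.max

  val t1 = Future { wordCount("parallel computing executes many calculations at the same time") }
  val t2 = Future { maximum(Array(3, 1, 4, 1, 5, 9, 2, 6)) }

  println(s"d = $d, words = ${Await.result(t1, Duration.Inf)}, max = ${Await.result(t2, Duration.Inf)}")
}
```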
The different forms of parallelism that we just saw manifest themselves on some concrete parallel hardware. It turns out that there are many classes of parallel computers out there, and we will take a look at some of them.

A multi-core processor is a processor that contains multiple processing units, called cores, on the same chip. Processor vendors such as Intel, AMD, ARM, Oracle, IBM, and Qualcomm produce different models of multi-core processors.

A symmetric multiprocessor, or SMP, is a computer system with multiple identical processors that share memory and connect to it via a bus. Here the multiple execution units are not on the same chip; an SMP itself can contain multi-core processors, as is the case in the illustration from the lecture.

A general-purpose graphics processing unit, or GPGPU, is a form of coprocessor originally intended for graphics processing. As a coprocessor, it does not execute all user programs by default, but it can execute a program when this is explicitly requested by the host processor.

A field-programmable gate array, or FPGA, is another form of coprocessor, one that can rewire itself for a given task. Its advantage is improved performance when optimized for a particular application. FPGAs are usually programmed in hardware description languages such as Verilog and VHDL.

The final class of parallel computers that we will mention is computer clusters: groups of computers connected by a network that do not have a common shared memory. Sometimes the computers in a cluster also have coprocessors such as GPUs, and then we call them heterogeneous clusters.

In this course we focus mainly on multi-core processors and symmetric multiprocessors, which are closely related. All our examples and code will be optimized for multi-cores and SMPs; however, the algorithms that we will learn about generalize and can be implemented on most forms of parallel hardware.

In this lecture we learned some basics of parallel computing. The rest of this week will focus on simple parallel programming examples and on the performance analysis of parallel programs. The second week deals with task parallelism and some basic parallel algorithms. The third week focuses on data parallelism and data-parallel computations, with a preview of Scala parallel collections. Finally, the fourth week will focus on data structures which are tailored for parallel computations. In the next lecture we will learn how parallel computations are implemented on the JVM: we will see the basic primitives used for parallelism and learn about some concrete foundations for parallel programming.
