This article reviews parallel and distributed computing. Further specialized articles that focus on particular systems follow later.
The most important question regarding parallel and distributed computing is: Why do it?
The most straightforward answer is: do it whenever there is no alternative!
A conventional, linear program running in one address space with one thread is easier to develop, simpler to debug and usually painless to operate. A parallel program is more complicated to develop, substantially more difficult to debug and more burdensome to operate. Whenever possible, parallelism should be avoided.
This may sound like strange advice from someone who makes a living working with parallel software. Still, there are situations where the use of distributed resources cannot be avoided, and it is in those cases that I step in and look for the best ways to do it. So let's look at some places where a linear program is insufficient: high-performance computing, distributed services, remote access for many users, and high availability.
The first category of high-performance computing is important whenever the computational demand exceeds the resources of a single machine. This may be the case for computations with huge data models that cannot fit into the memory of a single machine. Or it may apply to computations that are so costly that any single CPU would take many years to complete the calculation. Climate models, computer simulations of crash tests and financial market simulations are examples of such situations, but also projects like SETI@home and similar come to mind.
Projects like SETI@home are examples of computation-bound systems, i.e. the data size is not a problem at all. Climate models demand both huge memory and computational power to be sufficiently accurate. It is not possible to just run them on a couple of loosely coupled smaller machines. For such applications dedicated supercomputers are needed.
The second category of distributed service may concern physical as well as virtual resources. Every office has mail servers, print servers and file servers, and every business also has at least one web server. They may run on physically distinct machines or simply reside on virtual machines. Some of these services, like the print server, need specialized hardware.
The last category was actually the first model that was available in the age of mainframes. Users worked at a single mainframe via remote terminals. As compute power became cheaper, these were replaced by isolated machines that had everything a user needed locally. Today, we are moving back to the more centralized model by using virtual servers and systems.
High-availability is a special requirement that can be met by duplicating resources at several installations. Although not strictly speaking a parallel computing challenge, it belongs to the general class of distributed computing systems and thus is listed here.
All these different applications of distributed systems require varying models and have dissimilar needs and requirements. To find the “best solution” for a given job, one needs to identify and specify the requirements exactly. In the following I give examples and name specific solutions for the cases in question.
This is probably the most important step of all. It is not sufficient to understand the goal the final system is supposed to achieve; one also has to understand the limitations of each technology. There is no “one size fits all” solution to problems of distributed computing. In subsequent articles I even give examples where designing for performance may trump holding on to basic principles of OO design. That's right – we often need to make compromises, and a good developer of distributed software not only needs to know the rules of system design but, more importantly, which rules to break.
This means identifying what needs to be distributed: the data, the computation, or the users' access to remote programs.
It is important to note that remote code execution introduces far more complexity than exchanging data alone. In general, debugging is much more complicated, and understanding the information flow in the system is not well supported by standard development tools – one needs to figure out where a piece of data came from and why it looks the way it does.
There are other pitfalls specific to such a situation: a race condition can occur where the result of the calculation depends on the timing of threads. Locks may be needed to implement mutual exclusion, but may in turn lead to deadlock situations.
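To make the race-condition problem concrete, here is a minimal Python sketch (the counter and function names are illustrative, not taken from any particular system). Several threads increment a shared counter; the read-modify-write is only correct because it is guarded by a lock:

```python
import threading

# Illustrative shared state -- all names here are assumptions for the sketch.
counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Without the lock, the increment below is a read-modify-write that
        # can interleave with another thread and silently lose updates
        # (a race condition). With two locks acquired in inconsistent
        # order, the opposite failure -- deadlock -- becomes possible.
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- guaranteed only because of the lock
```

Removing the `with lock:` line makes the final count nondeterministic, which is exactly the kind of bug that must be anticipated at the design stage.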
These problems all need to be anticipated and worked around during the design stage and this is what makes distributed systems so much more complex.
In the following we discuss a couple of techniques that can be used to implement distributed systems.
Below I list a couple of techniques which are available on almost any POSIX-compliant system. In some cases the simpler techniques are enough to solve a given problem without more complex middleware.
While it is possible to communicate via control files, you have to keep in mind that the files need to be communicated via a network file system among the machines involved. So the network file system serves as an abstraction of the actual network infrastructure, but does not offer the full power and flexibility the network normally has.
Implementation: Practically, you will need to have at least two files for each pair of client/servers: a file the server uses to send a request to the client(s) and a file for the responses. If each process can communicate both ways, you need to duplicate this. Furthermore, both the client(s) and the server(s) work on a “pull” basis, i.e., they need to revisit the control files frequently and see if something new has been delivered.
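A minimal sketch of such a control-file channel follows; the file names and the message contents are my own illustrative assumptions. It shows the two ingredients mentioned above: one file per direction, and readers that work on a “pull” basis. Writing to a temporary file and renaming it avoids one of the consistency pitfalls, since on POSIX systems the rename is atomic:

```python
import os
import time
import tempfile
import pathlib

# Illustrative file-based request/response channel; all names are assumptions.
CHANNEL = pathlib.Path(tempfile.mkdtemp())
REQUEST = CHANNEL / "request.txt"
RESPONSE = CHANNEL / "response.txt"

def send(path, message):
    # Write to a temporary file first, then rename: on POSIX the rename is
    # atomic, so the other party never picks up a half-written file.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(message)
    os.replace(tmp, path)

def poll(path, timeout=5.0, interval=0.05):
    # "Pull" basis: revisit the control file until something new appears.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if path.exists():
            msg = path.read_text()
            path.unlink()
            return msg
        time.sleep(interval)
    raise TimeoutError("no message arrived")

send(REQUEST, "compute 6*7")   # one party posts a request
req = poll(REQUEST)            # the other party picks it up
send(RESPONSE, "42")           # ... and posts the answer
resp = poll(RESPONSE)          # the first party collects it
print(req, "->", resp)
```

Even this toy version already has to deal with atomicity and polling frequency by hand – a taste of the demands this approach places on the program logic.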
The advantage of this solution is that it minimizes the need for learning new techniques. The big disadvantage is that it places huge demands on the program logic; a lot of things need to be taken care of by the developer – will the files be written in one piece, or can it happen that a party picks up inconsistent files? How frequently should checks be made? Do I need to worry about the file system, caching, etc.? Can I add encryption later without toying around with things outside of my program code? …
If portability were an issue, this solution would be easy to port to different systems and even different programming languages. In general, though, I do not recommend this approach unless the other solutions are notably inferior (see e.g. the Qt/OpenGL client on my scientific software page).
This approach is commonly used for many client/server use cases. Mail servers, web servers etc. all are based on TCP/IP sockets, making it a viable and popular option. Furthermore, traffic can easily be rerouted to different ports using SSH/SSL-style VPN connections. Plus, they are available on practically any system that supports TCP/IP.
Implementation: Practically, you will need a similar logic as before, i.e., client(s) and server(s) communicating via the network. A definite plus of this approach is that the processes can work on a “push” basis, i.e., they can listen on a socket until a message arrives which is superior to needing to check control files regularly. Network corruption and inconsistencies are also not your concern.
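The “push” basis can be seen in a minimal TCP echo pair; this is a generic sketch on localhost, not code from any of the systems mentioned. The server blocks in accept() and recv() until data actually arrives – no periodic checking is needed:

```python
import socket
import threading

# Minimal TCP client/server pair on localhost; port 0 lets the OS pick
# a free port. All names are illustrative.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def serve_once():
    # "Push" basis: accept() blocks until a client connects -- no polling.
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)          # blocks until a message arrives
        conn.sendall(b"echo: " + data)

threading.Thread(target=serve_once, daemon=True).start()

client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
print(reply.decode())
```

Compared to the control-file approach, the transport-level consistency problems simply disappear; what remains is agreeing on the format of the bytes exchanged.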
The advantage is that a lot of things are taken off your shoulders. The disadvantage is that you still need to change your program logic substantially in order to make sure that you send and receive the correct information (file types, data formats etc.).
One more note: As soon as numbers are involved their endianness becomes an issue and the developer has to take care of this problem manually – this is a common (!) special case of the “correct information” issue I mentioned above.
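Python's standard struct module makes the endianness choice explicit, which is a convenient way to illustrate the problem (the value below is arbitrary):

```python
import struct

# "!" means network byte order (big-endian), "<" means little-endian.
# Packing explicitly makes the wire format unambiguous between machines.
value = 0x01020304
wire = struct.pack("!I", value)      # 4 bytes, big-endian
little = struct.pack("<I", value)    # same number, little-endian

print(wire.hex())     # 01020304
print(little.hex())   # 04030201 -- identical value, reversed byte order
assert struct.unpack("!I", wire)[0] == value
```

A little-endian machine that naively dumps its memory to the socket sends the second form; the receiver must know which convention was used, or the numbers are silently wrong.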
This is an advanced technique that is discussed in a separate article in more detail. Common candidates in this category are MPI, OpenMP and PVM. The most common use is in massively parallel applications and supercomputing when data needs to be distributed and communicated efficiently. Usually the hardware is uniform, so conversion between different endianness etc. is not an issue (and even if it was, the middleware would automatically take care of this problem).
Implementation: Practically, the program logic needs to be adapted. However, in many cases the changes are not as drastic as in the previous cases, since in many numerical applications the work distribution is symmetric, i.e., every node does the same thing with different data packets. Thus, a program for 64 nodes may be very similar to one for 2 nodes, and the difference from the linear program just sits in additional code to treat boundaries and in an initial and final part that distributes the work and collects the results.
This is probably the most powerful and versatile approach of all. Unfortunately, it is also the most difficult to learn and use. The idea is that the high-level program logic does not need to be changed. The objects communicate via messages and receive results and the messages together with the return types are identical to what you are used to from your local implementation.
Implementation: The state-of-the-art middleware in this category is CORBA. In a later article I discuss this architecture in more detail and give working example code. The strengths of CORBA are platform independence and interoperability. CORBA programs can work together regardless of the machines and the programming languages used. Still, the architecture is among the most efficient since the transmission format is optimized.
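Working CORBA code is deferred to the later article, since it requires an ORB. To illustrate just the central idea – that a remote call reads exactly like a local one – here is a sketch using Python's built-in xmlrpc, a far simpler RPC mechanism than CORBA; the server, port handling and the add function are my own illustrative choices:

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

# Not CORBA -- xmlrpc is a much simpler protocol -- but it shows the same
# principle: the call through the proxy below looks like a local call.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]
server.register_function(lambda a, b: a + b, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)   # reads like a local method call, runs on the server
print(result)              # 5
server.shutdown()
```

The high-level logic of the caller is untouched; all the networking is hidden behind the proxy object, which is precisely the appeal of the distributed-objects approach.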
If one does not need platform and language independence, there are a couple of proprietary technologies which offer similar benefits. These architectures differ in license and focus.
Java RMI (Remote Method Invocation) allows remote and distributed objects, but is not independent of the programming language. IBM Websphere MQ is quite versatile, but it is also a proprietary technology with no quality Open-Source implementation. Apple's NSProxy class cluster, Microsoft's DCOM and .Net are similar proprietary technologies. Whereas for Apple's distributed object technology an Open-Source implementation named Gnustep is available, the implementations of the mechanism are not interoperable, only source-code compatible with MacOS X.
Such a native technology will in general be a better option if one is restricted to a particular platform. If interoperability of different platforms is required, CORBA is an ideal candidate.
My business has recently worked on a couple of projects that involve iOS (as used on the iPhone and iPad) as a use case for distributed data. This is a very natural application, so I would like to summarize my findings (status: July 2010):
MacOS X offers distributed objects via NSProxy. This mechanism is also available in the Open-Source Gnustep implementation. There is an introduction with two explicit examples at the Gnustep project server; it illustrates how the technology works and is a good starting point for experimenting.
NSProxy is available on the iPhone, but the other parts of the distributed objects mechanism are not. This excludes this solution for networking MacOS X with iPhone/iPad at the moment.
An NSConnection-based solution is not available, but there are others, like a CORBA implementation named AdORB. Thus, CORBA is a viable option for this operating system if the license conditions for the CORBA solution can be satisfied. Regarding the use of CORBA in general for such projects, this thread on StackOverflow gives a couple of different perspectives. Clearly, careful drafting of the requirements is crucial before settling on this option.
In conclusion, for mobile systems the range of options is more limited, but so are the typical use cases. There is simply no need for juggling huge amounts of data or running time-critical simulation code, so the available options reflect most users' needs on such a platform quite well.
In distributed computing identifying the exact requirements is the key step. Is the intention to exchange data, is it necessary to distribute computation or is it desired to have a user model that needs remote programs? The answer to this question determines the solution that best fits.
In general, distribution adds a tremendous amount of complexity to a program. Debugging remote calls is more complicated and difficult than debugging local calls, and concurrency issues (like interdependencies between threads) abound. Furthermore, remote calls are substantially slower than local ones, typically by orders of magnitude. Thus, optimization issues are much more pressing than in purely local programs.
© 1997-2013 Dr. Wolfram Schroers. This site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
Additional permissions available at http://www.field-theory.org/editorial/index.html.
Wolfram is a leading software engineer focused on Enterprise and B2B apps on iOS. His clients rank from small independent studios to companies in the German DAX index.
He has worked at top Universities on three continents in the past decade and is a popular speaker at conferences. He is currently working in Berlin, Germany, and can be reached at his company website.