field-theory.org
the blog of wolfram schroers

Parallel and distributed computing — Overview

This article reviews parallel and distributed computing. Further specialized articles that focus on particular systems follow later.

Table of contents:

Introduction
Know thy requirements!
Overview of technologies
Practical: working with MacOS, the iPhone and the iPad
Summary and conclusions

Introduction

The most important question regarding parallel and distributed computing is: Why do it?

The most straightforward answer is: do it whenever there is no alternative!

A conventional, linear program running in one address space with one thread is easier to develop, simpler to debug and usually painless to operate. A parallel program is more complicated to develop, substantially more difficult to debug and more burdensome to operate. Whenever possible, parallelism should be avoided.

This may sound like strange advice from someone who makes a living working with parallel software. Still, there are situations where the use of distributed resources cannot be avoided. And it is in those cases that I step in and look for the best ways to do it. So let's look at some places where a linear program is insufficient:

  • High-performance computing/supercomputing
    • Computation needs to be distributed
    • Data needs to be distributed
  • Specialized systems – service is distributed
  • Users are distributed
  • High availability

The first category, high-performance computing, is important whenever the computational demand exceeds the resources of a single machine. This may be the case for computations with huge data models that cannot fit into the memory of a single machine. Or it may apply to computations that are so costly that any single CPU would take many years to complete the calculation. Climate models, computer simulations of crash tests and financial market simulations are examples of such situations; projects like SETI@home also come to mind.

Projects like SETI@home are examples of computation-bound systems, i.e. the data size is not a problem at all. Climate models, by contrast, demand both huge memory and computational power to be sufficiently accurate. They cannot simply run on a couple of loosely coupled smaller machines; such applications need dedicated supercomputers.

The second category, distributed services, may concern physical as well as virtual resources. Every office has mail servers, print servers and file servers. And every business also has at least one web server. They may run on physically distinct machines or simply reside on virtual machines. Some of these services, like the print server, need specialized hardware.

The last category was actually the first model available in the age of mainframes. Users worked at a single mainframe via remote terminals. As compute power became cheaper, the mainframes were replaced by isolated machines that had everything a user needed locally. Today, we are moving back to the more centralized model by using virtual servers and systems.

High availability is a special requirement that can be met by duplicating resources across several installations. Although not strictly speaking a parallel computing challenge, it belongs to the general class of distributed computing systems and thus is listed here.

All these different applications of distributed systems require varying models and have dissimilar needs and requirements. To find the “best solution” for a given job, one needs to identify and specify the requirements precisely. In the following I give examples and name specific solutions for the cases in question.

Know thy requirements!

This is probably the most important step of all. It is not sufficient to understand the goal the final system is supposed to achieve; one must also understand the limitations of each technology. There is no “one size fits all” solution to problems of distributed computing. In subsequent articles I even give examples where designing for performance may trump holding on to basic principles of OO design. That's right – we often need to make compromises, and a good developer of distributed software needs not only to know the rules of system design but, more importantly, which rules to break.

This means identifying what needs to be distributed:

Data
This usually has to happen. Different parts of the system may or may not have the same work assignment. But even if they do the same kind of work, they will typically need to do so with different data. One exception is fault-tolerant systems, which replicate one computation on several machines and verify that the results agree.
Code
In many numerical applications it is not necessary to execute different parts of the calculation on different machines. In enterprise-level distributed applications, on the other hand, it is common that different systems specialize in one task. In the former case, remote code execution is not necessary, in the latter it is.
User
When the user workflow gets distributed, we have a special case of distributed code. We further need to distinguish between the case where user interactions are independent of each other (easy) and the case where they may depend on each other (hard). The latter is hard to handle in practice, since the user needs to be notified about conflicts but should not need to know about the details of the design. A real-world example might be several people sharing a single bank account, with equal rights for some transactions.

It is important to note that remote code execution introduces far more complexity than an exchange of data alone. In general, debugging is much more complicated, and understanding the information flow in the system is not well supported by standard development tools: one needs to figure out where a piece of data came from and why it looks the way it does.

There are other pitfalls specific to such a situation: a race condition can occur, where the result of the calculation depends on the timing of threads. Locks may be needed to implement mutual exclusion, but they may in turn lead to deadlock situations.
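
As an illustration, here is a minimal Python sketch of the race condition and of mutual exclusion via a lock. (The example is mine and purely illustrative; with enough concurrent increments the unsafe version will typically lose updates.)

    import threading

    counter = 0
    lock = threading.Lock()

    def unsafe_increment(n):
        # Race condition: the read-modify-write on the shared counter is not
        # atomic, so concurrent threads can overwrite each other's updates.
        global counter
        for _ in range(n):
            counter += 1

    def safe_increment(n):
        # Mutual exclusion: the lock serializes access to the shared counter.
        # (Careless ordering of several locks is exactly what produces the
        # deadlocks mentioned above.)
        global counter
        for _ in range(n):
            with lock:
                counter += 1

    threads = [threading.Thread(target=safe_increment, args=(100000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # always 400000 with the lock; often less with unsafe_increment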

These problems all need to be anticipated and worked around during the design stage and this is what makes distributed systems so much more complex.

In the following we discuss a couple of techniques that can be used to implement distributed systems.

Overview of technologies

Below I list a couple of techniques which are available on almost any POSIX-compliant system. In some cases the simpler techniques are enough to solve a given problem without more complex middleware.

Control files

While it is possible to communicate via control files, you have to keep in mind that the files need to be shared via a network file system among the machines involved. So the network file system serves as an abstraction of the actual network infrastructure, but does not offer the full power and flexibility the network normally has.

Implementation: Practically, you will need at least two files for each client/server pair: a file the server uses to send a request to the client(s) and a file for the responses. If each process can communicate both ways, you need to duplicate this. Furthermore, both the client(s) and the server(s) work on a “pull” basis, i.e., they need to revisit the control files frequently to see if something new has been delivered.
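
As a sketch of one side of this scheme in Python, assuming hypothetical file names on a shared network file system, the polling loop might look like this:

    import os
    import time

    REQUEST_FILE = "/shared/request.txt"      # hypothetical paths on the
    RESPONSE_FILE = "/shared/response.txt"    # network file system

    def serve_forever(poll_interval=1.0):
        """Poll for a request file, process it, write a response file."""
        while True:
            if os.path.exists(REQUEST_FILE):
                with open(REQUEST_FILE) as f:
                    request = f.read()
                # Guard against readers picking up half-written files:
                # write to a temporary file first, then rename atomically.
                tmp = RESPONSE_FILE + ".tmp"
                with open(tmp, "w") as f:
                    f.write(request.upper())  # stand-in for the real work
                os.replace(tmp, RESPONSE_FILE)
                os.remove(REQUEST_FILE)       # mark the request as consumed
            time.sleep(poll_interval)         # "pull" basis: check again later

Note how even this tiny sketch must already deal with atomicity of writes and with the polling frequency – precisely the kind of bookkeeping discussed next.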

The advantage of this solution is that it minimizes the need for learning new techniques. The big disadvantage is that it places heavy demands on the program logic; a lot of things need to be taken care of by the developer – will the files be written in one piece, or can it happen that one party picks up inconsistent files? How frequently should checks run? Do I need to worry about the file system, caching, etc.? Can I add encryption later without toying with things outside of my program code? …

If portability is a concern, this solution is easy to port to different systems and even different programming languages. In general, though, I do not recommend this approach unless the other solutions are notably inferior (see e.g. the Qt/OpenGL client on my scientific software page).

Sockets

This approach is commonly used for many client/server use cases. Mail servers, web servers etc. are all based on TCP/IP sockets, making it a viable and popular option. Furthermore, traffic can easily be rerouted to different ports using SSH/SSL-style VPN connections. Plus, sockets are available on practically any system that supports TCP/IP.

Implementation: Practically, you will need logic similar to that of the previous approach, i.e., client(s) and server(s) communicating via the network. A definite plus is that the processes can work on a “push” basis, i.e., they can listen on a socket until a message arrives, which is superior to checking control files regularly. Network corruption and inconsistencies are also not your concern.
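
A minimal push-style server sketch using Python's standard socket module (the port number and the echo behavior are placeholders) could look as follows; a client is symmetric, using socket.create_connection() plus sendall()/recv():

    import socket

    HOST, PORT = "0.0.0.0", 5000    # hypothetical address and port

    def serve_forever():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            srv.bind((HOST, PORT))
            srv.listen()
            while True:
                conn, addr = srv.accept()   # "push" basis: block until a client arrives
                with conn:
                    data = conn.recv(4096)  # read the request
                    conn.sendall(data)      # echo it back (stand-in for real work)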

The advantage is that a lot of things are taken off your shoulders. The disadvantage is that you still need to change your program logic substantially in order to make sure that you send and receive the correct information (file types, data formats etc.).

One more note: as soon as binary numbers are involved, their endianness becomes an issue, and the developer has to take care of this problem manually – this is a common (!) special case of the “correct information” issue mentioned above.
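
In Python the problem is usually handled with the struct module, which lets the developer fix the byte order explicitly; in C, the same job is done by htonl()/ntohl(). A small sketch:

    import struct

    value = 1025

    # ">" forces big-endian ("network byte order"), "<" little-endian.
    big = struct.pack(">I", value)      # b'\x00\x00\x04\x01'
    little = struct.pack("<I", value)   # b'\x01\x04\x00\x00'

    # Sender and receiver must agree on one convention; the usual choice
    # is to transmit in network byte order and convert back on arrival.
    assert struct.unpack(">I", big)[0] == value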

Message-passing middleware

This is an advanced technique that is discussed in more detail in a separate article. Common candidates in this category are MPI and PVM (OpenMP, by contrast, targets shared-memory parallelism within a single machine). The most common use is in massively parallel applications and supercomputing, where data needs to be distributed and communicated efficiently. Usually the hardware is uniform, so conversion between different endianness etc. is not an issue (and even if it were, the middleware would automatically take care of it).

Implementation: Practically, the program logic needs to be adapted. However, in many cases the changes are not as drastic as in the previous approaches, since in many numerical applications the work distribution is symmetric, i.e., every node does the same thing with different data packets. Thus, a program for 64 nodes may be very similar to one for 2 nodes, and the difference from the linear program just sits in additional code to treat boundaries, plus an initial and a final part that distribute the data and collect the results.
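
As a sketch of this symmetric structure, assuming the third-party mpi4py bindings and an MPI runtime (this is illustrative only, not the working example code of the later article):

    # Run with e.g.:  mpiexec -n 4 python compute.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()    # this node's id
    size = comm.Get_size()    # total number of nodes

    # Initial part: the root node distributes one data packet per node.
    data = ([list(range(i * 10, (i + 1) * 10)) for i in range(size)]
            if rank == 0 else None)
    packet = comm.scatter(data, root=0)

    # Symmetric part: every node runs the same code on its own packet.
    partial = sum(packet)

    # Final part: the root collects and combines the partial results.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("total =", total)

The same program runs unchanged on 2 or 64 nodes; only the data distribution scales.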

Web services

The term “web services” applies to a couple of mechanisms that operate via the HTTP protocol. Thus, it is easy to use existing infrastructure and also to add encryption at the transport layer. The downside is that developers need to take care of packing and unpacking the data themselves. The SOAP protocol – a popular way to serialize data before transmission – is XML-based; this means that the data is not packed efficiently and needs to be parsed on the client side, a comparatively time-consuming process. The JSON format is more efficient and is based on the JavaScript notation for objects. Despite its origin, it is language independent and parsers exist for (virtually) all programming languages. JSON is the preferred format for many Apple services, e.g. Push Notifications.
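
As a sketch of the packing and unpacking in Python using only the standard library (the endpoint URL is hypothetical):

    import json
    import urllib.request

    URL = "https://example.com/api/report"    # hypothetical endpoint

    payload = {"device": "iPhone", "readings": [1.2, 3.4, 5.6]}

    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),   # pack: object -> JSON -> bytes
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as response:
        result = json.loads(response.read())        # unpack: bytes -> JSON -> object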

Distributed objects

This is probably the most powerful and versatile approach of all. Unfortunately, it is also the most difficult to learn and use. The idea is that the high-level program logic does not need to be changed: objects communicate via messages and receive results, and the messages together with the return types are identical to what you are used to from a local implementation.
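
To illustrate the idea, here is a toy Python sketch of the proxy concept only (not how CORBA or NSProxy are actually implemented); all names are made up for this example:

    import pickle
    import socket

    class RemoteProxy:
        """Toy stand-in for a distributed-objects proxy: method calls on
        the local proxy are forwarded over the network, and the result
        comes back as if the object lived in the local address space."""

        def __init__(self, host, port):
            self._addr = (host, port)

        def __getattr__(self, method):
            def call(*args):
                with socket.create_connection(self._addr) as conn:
                    conn.sendall(pickle.dumps((method, args)))  # marshal the call
                    conn.shutdown(socket.SHUT_WR)
                    data = b""
                    while chunk := conn.recv(4096):
                        data += chunk
                return pickle.loads(data)                       # unmarshal the result
            return call

    # Usage (hypothetical server): calc = RemoteProxy("server.example.com", 9000)
    # calc.add(2, 3) then looks exactly like a local method call.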

Implementation: The state-of-the-art middleware in this category is CORBA. In a later article I discuss this architecture in more detail and give working example code. The strengths of CORBA are platform independence and interoperability. CORBA programs can work together regardless of the machines and the programming languages used. Still, the architecture is among the most efficient since the transmission format is optimized.

If one does not need platform and language independence, there are a couple of proprietary technologies which offer similar benefits. These architectures differ in license and focus.

Java RMI (Remote Method Invocation) allows remote and distributed objects, but is tied to a single programming language. IBM Websphere MQ is quite versatile, but it is also a proprietary technology with no quality Open-Source implementations. Apple's NSProxy class cluster, Microsoft's DCOM and .Net are similar proprietary technologies. While an Open-Source implementation of Apple's distributed object technology is available in Gnustep, the two implementations are not interoperable, only source-code compatible with MacOS X.

Such a native technology will in general be a better option if one is restricted to a particular platform. If interoperability of different platforms is required, CORBA is an ideal candidate.

Practical: working with MacOS, the iPhone and the iPad

My business has recently worked on a couple of projects that involve iOS (as used on the iPhone and iPad) as a use case for distributed data. This is a very natural application, so I would like to summarize my findings (status: July 2010):

  • MacOS/Cocoa (NOT iOS): As mentioned above, MacOS X offers a proprietary distributed object system based on the class cluster around NSProxy. This mechanism is also available for the Open-Source Gnustep implementation. There is an introduction with two explicit examples at the Gnustep project server; it illustrates how the technology works and is a good starting point for experimenting.
    Unfortunately, the disadvantages are a loss of compatibility with other systems and loss of portability to other languages. Gnustep with Objective-C is at best code-compatible, but there is no way to communicate between Gnustep and Cocoa.
    While NSProxy is available on the iPhone, the other parts of the distributed objects mechanism are not. This rules out the solution for networking MacOS X with the iPhone/iPad at the moment.
  • Distributed file system: I am not aware of any way of implementing a network file system on iOS. Thus, control files are not an option. Nonetheless, I would not recommend this approach even if they were!
  • Sockets: Socket programming via TCP/IP is a very natural approach on iOS. However, endianness is a big issue – even the PowerPC-based Macs and the Intel-based Macs have different formats here. This has to be kept in mind when one considers exchanging binary data!
  • Message-passing middleware: This type of middleware typically targets numerical calculations which are of no interest on a mobile device. Thus, there is no point in trying it. (That was obvious, wasn't it?)
  • Web services: This is a natural candidate for all mobile communication needs. Since the web infrastructure allows for easy encryption and is in place already, web services are suited for all non-performance-critical communications. There is a good thread on StackOverflow discussing available options. As mentioned above, many of Apple's own services use the JSON format for remote services, although they locally use XML to store application data.
  • Distributed objects: As pointed out above, Apple's native NSConnection-based solution is not available, but there are alternatives like a CORBA implementation named AdORB. Thus, CORBA is a viable option for this operating system if the license conditions for the CORBA solution can be satisfied. Regarding the use of CORBA in general for such projects, this thread on StackOverflow gives a couple of different perspectives. Clearly, a careful drafting of requirements is crucial before settling on this option.

In conclusion, I can say that for mobile systems the range of options is more limited, but so are the typical use cases. There is simply no need to juggle huge amounts of data or run time-critical simulation code; thus the available options reflect most users' needs on such a platform quite well.

Summary and conclusions

In distributed computing, identifying the exact requirements is the key step. Is the intention to exchange data, is it necessary to distribute computation, or does the user model require remote programs? The answer to this question determines which solution fits best.

In general, distribution adds a tremendous amount of complexity to a program. Debugging remote calls is more complicated and difficult than debugging local ones, and concurrency issues (like interdependencies between threads) abound. Furthermore, remote calls are substantially slower than local ones, typically by orders of magnitude. Thus, optimization issues are much more pressing than in purely local programs.

For further reading on parallel and distributed systems, please proceed to the article on supercomputing and to the article on CORBA.