Provided for non-commercial research and education use. Not for reproduction, distribution or commercial use.



This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues.

Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited.

In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:

http://www.elsevier.com/copyright

J. Parallel Distrib. Comput. 69 (2009) 295-306

# ELSEVIER

## Contents lists available at ScienceDirect I. Parallel Distrib. Comput.

journal homepage: www.elsevier.com/locate/jpdc



## Interconnection of distributed memory models\*

## Vicent Cholvi<sup>a,\*</sup>, Ernesto Jiménez<sup>b</sup>, Antonio Fernández Anta<sup>c</sup>

<sup>a</sup> Universitat Jaume I, 12071 Castellón, Spain

<sup>b</sup> Universidad Politécnica de Madrid, 28031 Madrid, Spain

<sup>c</sup> LADyR, GSyC, Universidad Rey Juan Carlos, 28933 Móstoles, Spain

#### ARTICLE INFO

Article history: Received 21 December 2007 Received in revised form 26 September 2008 Accepted 25 November 2008 Available online 10 December 2008

Keywords: Distributed shared memory Memory models Interconnection systems Distributed algorithms Impossibility result Correctness proofs

#### 1. Introduction

Distributed shared memory (DSM) is a well-known mechanism for interprocess communication in distributed environments [21]. Roughly speaking, it consists in using read and write operations for interprocess communication, thus hiding the particular communication technique employed by the programmers to avoid the need to be involved in the management of messages. However, this can cause problems in systems where several processes independently and simultaneously submit reads and writes, since they can see each other's operations out of order. This problem led to the concept of *consistency models*. A consistency model is a specification of the allowable behavior of memory, and it can be seen as a contract between memory implementation and the program utilizing memory: the memory implementation guarantees that for any input it will produce some output from the set of allowable outputs specified by the consistency model, and the program must

E-mail addresses: vcholvi@lsi.uji.es, vcholvi@uji.es (V. Cholvi).

#### ABSTRACT

In this paper, we present a framework to formally describe and study the interconnection of distributed shared memory systems. Using it allows us to classify the consistency models in two groups, depending on whether they are *fast* or not. In the case of non-fast consistency models, we show that they cannot be interconnected in any way. In contrast, in the case of fast consistency models we provide protocols to interconnect some of them.

© 2008 Elsevier Inc. All rights reserved.

be written to work correctly for any output allowed by the consistency model. Depending on the semantics of the memory operations, a number of consistency models has been proposed in the literature (see for instance [21,7,13,11,16]).

In this paper, we study the interconnection of distributed shared memory systems. By this we mean the addition of an *interconnection system* to several existing distributed shared memory systems that implement a given consistency model in order to obtain a single distributed shared memory system that implements the same consistency model. There are two main reasons for interconnecting DSM systems with new protocols instead of using a single protocol for the whole system:

- First, in this way we can interconnect systems that are already running without changing them. They can go on using their protocols at their local level.
- Second, depending on the network topology, it could be more efficient to implement several systems and interconnect them than to have one single large system. An example of this would be a DSM system that has to be implemented on two local area networks connected with a low-speed point-to-point link. If the protocol that is used broadcasts updates, in a single system with many popular protocols there would be a large number of messages crossing the point-to-point link for the same variable update. In this case, it would seem appropriate to implement one system in each of the local area networks, and use an interconnecting protocol via the link to connect the whole system. With the appropriate interconnecting protocol, many fewer messages cross the link for each variable update.

<sup>&</sup>lt;sup>☆</sup> A preliminary version of this paper appeared in the Proceedings of Opodis'03 [E. Jiménez, A. Fernández, V. Cholvi, Decoupled interconnection of distributed memory models, in: OPODIS, 2003, pp. 235–246]. This work was partially supported by the Spanish Ministry of Science and Technology under Grants No. TSI2006-07799, No. TSI2004-02940 and No. TIN2005-09198-C02-01, and by the Comunidad de Madrid under Grant No. S-0505/TIC/0285. Grant: TIN2007-67353-C02-01

<sup>\*</sup> Corresponding address: Departamento de Lenguajes y Sistemas Informáticos, Universitat Jaume I, Campus de Riu Sec, 12071 Castellón, Spain.

<sup>0743-7315/\$ -</sup> see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2008.11.008

It is interesting to compare our approach with the concept of locality, defined by Herlihy and Wing [12]. Both approaches have to do with the ability to compose DSM systems. However, locality addresses composability of DSM systems with the same set of processes but disjoint sets of memory objects, while our approach studies the composability of DSM systems with the same set of memory objects but disjoint sets of processes.

A first contribution of this work is the introduction of a framework for the interconnection of memory systems and the formalization of the interactions between the existing memory systems and the interconnection system. Furthermore, we identify the *fastness* of a memory model (a concept that will be defined later in the paper) as the key property that will qualify it to be an interconnectable memory model or not.

In the case of non-fast consistency models, we show that they cannot be interconnected in any way, thus deriving that a number of popular memory models can not be interconnected (e.g., the atomic, safe, regular and sequential models [18], the PCG and PCD consistency models [10,1], the eager release model [9], the lazy release model [17], the entry model [6], the scope model [14], etc.).

In contrast, we show that several fast consistency models can, indeed, be interconnected (namely, the pRAM [20], causal [2], and cache models [10]). However, whereas the cache model can be interconnected without any restriction, we found that the other two memory models can only be interconnected when the subsystems fulfill certain restrictions. In this last situation, we give sufficient conditions and the corresponding interconnecting protocols to do so.

Regarding previous work that has been carried out on the interconnection of distributed shared systems, as far as we know, it has only been studied in [8].<sup>1</sup> Here, we extend the results of that paper in a number of ways. First, we consider consistency models other than the causal one (which was the only one considered in [8]). Second, we provide some impossibility results related with interconnection of consistency models. Third, we use much weaker assumptions on the systems to be interconnected.

The rest of the paper is organized as follows. In Section 2, we introduce the framework for the interconnection of systems. In Section 3, we show the impossibility of interconnection for non-fast consistency models. In Section 4, we study the interconnection of pRAM systems, in Section 5 the interconnection of causal systems, and in Section 6 we show how to interconnect cache systems. In Section 7, we briefly study the performance of the proposed interconnecting protocols. Finally, in Section 8, we present some concluding remarks.

#### 2. System model

We consider *distributed shared memory systems* (or *systems* for short) formed by a collection of *application processes* that interact via a *shared memory* consisting of a set of *variables*. All the interactions between the application processes and the memory are performed through read and write operations (*memory operations*) on variables of the memory.

Each memory operation is applied on a named variable and has an associated value. A write operation of the value v in the variable x, denoted w(x)v, stores v in the variable x. A read operation of the value v from the variable x, denoted r(x)v, reports to the issuing application process that the variable x holds the value v. To simplify the analysis, we assume that a given value is written at most once in any given variable and that the initial values of the variables are set by using fictitious write operations. Furthermore, we also consider explicit *synchronization operations*. Synchronizations can be used just to import information, as with the acquiring of a lock, or just to export information, as with the release of a lock.

In order to characterize the system model, we specify the components that form it, the *consistency model*, the *system architecture* and the *interconnecting system*.

#### 2.1. The consistency model

Roughly speaking, a *consistency model* (also called *memory model*) is a specification of the allowable behavior of the system's operations. To formally define a consistency model, first we introduce what a system's execution is. An *execution*  $\alpha$  of a system *S* consists of a set of read and write operations, as well as synchronization operations (if any), issued by the application processes that form system *S*. Such operations must preserve the so called *execution order*. To define this, first we introduce the *process order*.

**Definition 1** (*Process Order*). Let *p* be a process of *S* and *op*,  $op' \in \alpha$ . Then *op* precedes *op'* in *p's process order*, denoted *op*  $\prec_p op'$ , if *op* and *op'* are operations issued by *p*, and *op* is issued before *op'*.

**Definition 2** (*Execution Order*). Let op,  $op' \in \alpha$ . Then op precedes op' in the *execution order*, denoted  $op \prec op'$ , if any of the following hold:

- (1) op and op' are operations from the same process p and  $op \prec_p op'$ .
- (2) op = w(x)v and op' = r(x)v.

(3) There is an operation  $op'' \in \alpha$  such that  $op \prec op'' \prec op'$ .

Now, we formally define a *consistency model* as follows:

**Definition 3** (*Consistency Model*). A *consistency model M* is a set formed by all executions of type *M*.

Obviously, for this definition to make sense, it is necessary to define what an execution of type M is in each case. The specification of particular types of executions will be dealt with later in the paper. For such a task, we need to define several related concepts.

**Definition 4** (*View*). Let  $\prec^{\circ}$  be an order defined on the operations of execution  $\alpha$ , and let  $\alpha' \subseteq \alpha$ . A view  $\beta$  of  $\alpha'$  preserving  $\prec^{\circ}$  is a sequence formed by all operations of  $\alpha'$  such that this sequence preserves the order  $\prec^{\circ}$ .

Note that if  $\prec^{o}$  does not define a total order on  $\alpha'$ , then there can be several views of  $\alpha'$ . We use  $op \xrightarrow{\beta} op'$  to denote that op precedes op' in a view  $\beta$ . We will omit the view when it is clear from the context. We will also use  $\alpha \rightarrow \alpha'$ , where  $\alpha$  and  $\alpha'$  are sets of operations, to denote that all the operations in  $\alpha$  precede all the operations in  $\alpha'$ .

**Definition 5** (*Legal View*). Let  $\prec^{o}$  be an order defined on the operations of execution  $\alpha$ , and let  $\alpha' \subseteq \alpha$ . A view  $\beta$  of  $\alpha'$  preserving  $\prec^{o}$  is *legal* if for each read operation  $r(x)v \in \alpha'$ ,

(a) there is a write operation  $w(x)v \in \alpha'$  such that  $w(x)v \stackrel{\beta}{\to} r(x)v$ , and

(b) there is no write operation  $w(x)u \in \alpha'$  such that  $w(x)v \xrightarrow{\rho} w(x)u \xrightarrow{\beta} r(x)v$ .

#### 2.2. The system architecture

From a physical point of view, we consider distributed systems as consisting of a set of *nodes* and *a network* that provides communication among them. The essence of this model has

296

<sup>&</sup>lt;sup>1</sup> In addition, of course, to the preliminary version of this paper, appeared in OPODIS'03 [15].

### Author's personal copy

V. Cholvi et al. / J. Parallel Distrib. Comput. 69 (2009) 295-306



Fig. 1. System architecture.

been taken from [4]. The application processes of the system are actually executed in the nodes of the distributed system. We assume that the shared memory abstraction is implemented by a memory consistency system (MCS). The MCS is composed of MCS-processes that use local memory at the various nodes and cooperate following a distributed algorithm, or MCS-protocol, to provide the application processes with the impression of having a shared memory. The MCS-processes are executed at the nodes of the distributed system and exchange information as specified by the MCS-protocol. They use the communication network to interact if they are in different nodes. Each MCS-process can serve several application processes, but an application process is assigned to only one local MCS-process. For each application process p we use mcs(p) to denote its MCS-process. An application process and its MCS-process have to be in the same node, as stated by the following assumption.

**Assumption 1.** Let p be an application process. Process p and mcs(p) are in the same node.

An application process sequentially issues read/write/ synchronization operations on the shared variables by sending (read/write/synchronization) calls to its *MCS-process*. After sending a call, the application process blocks until it receives the corresponding response from its *MCS-process*, which ends the operation. We assume an asynchronous model. This means that there is no bound on the amount of time instructions and message transmissions take. We do not assume synchronized clocks among processes. We also assume that no system component (processes, nodes, and networks) fails. Fig. 1 shows an example of the system architecture described above.

Regarding the consistency model implemented by a system (i.e., by its *MCS*), we follow the same approach taken when defining a consistency model:

**Definition 6** (*System*). A *system* is of type *M* if all its executions are of type *M*.

Furthermore, we consider systems in which at least the last write operation on every variable must be eventually visible in every process of the system. This is a very natural property which is preserved by every system that we have found in the literature. In our terminology, it means that their *MCSs* must satisfy the following property:

*Liveness property.* Consider any execution  $\alpha$  of system *S*. If there is only one process writing on variable *x* and its last operation on *x* was w(x)u, then eventually the response to any read call on *x* issued by any application process will contain the value *u*.

#### 2.3. The Interconnection system

Interconnecting several systems involves making them to behave as though they were one single system. Using the terminology defined above, this actually means interconnecting several *MCSs*. In our model, the load of such an interconnection will fall on an *interconnection system* (*IS*). An *IS* is a set of processes (*IS-processes*) that execute some distributed algorithm or protocol (*IS-protocol*). For simplicity in the *IS* design, we consider the existence of one *IS*-process for each *MCS* to be interconnected.

The *IS-process* of each system is an application process and, hence, it has an *MCS-process* that by Assumption 1 is in its same node. The *IS-process* uses the *MCS-process* to read and write on the shared memory of the local system. In particular, the only way a value written by an application process in some system can be read by an application process in another system is if the *IS-process* of the latter system writes it. *IS-processes* exchange information with each other (as specified by the *IS-protocol*) by using a reliable FIFO communication network. Note that, after the interconnection, the overall system has a *global MCS* formed by the *MCSs* of the original systems plus the *IS* that interconnects them. Fig. 2 presents an example of an *IS* interconnecting two systems.

**Definition 7.** We will say that a consistency model can be *interconnected* if for any collection of systems implementing this consistency model there is an IS-protocol that interconnects them.

In the rest of the paper we will use *N* to denote the number of systems to be interconnected. The systems to be interconnected will be denoted by  $S^0, \ldots, S^{N-1}$ , and the resulting interconnected system by  $S^T$ . The *IS*-process for each system  $S^k$  (where  $k \in \{0, \ldots, N - 1\}$ ) is denoted by  $isp^k$ . It is worth remarking that  $isp^k$  is part of the system  $S^k$ . We consider that the set of processes of  $S^T$  includes all the processes in  $S^0, \ldots, S^{N-1}$  except  $isp^0, \ldots, isp^{N-1}$  (since they are only used to interconnect the systems  $S^0, \ldots, S^{N-1}$ ).

Regarding how the *ISs* operate, we note that it is necessary to guarantee that any given *IS-process* be eventually aware of the writes taking place at the *MCS-processes* that it manages, so that it could exchange such information with other *IS-processes*. This functionality can be implemented in a number of ways.

- (1) Within the *IS-process*: for instance, it can be implemented by making the *IS-process* check for any updated variable, by periodically reading the whole memory. In this case, the *ISprocess* will behave as a regular application process and no additional assumption is made on the *MCS-processes*.
- (2) Within the *MCS-processes*: in this case the *MCS-processes* have to communicate explicitly any update to the *IS-process.* Whereas such an approach could be more efficient than the previous one, it requires that the *MCS-processes* be able to perform such a task.
- (3) By using a combination of both.

In order to maintain it as general as possible, in this paper we only assume that there is an *interface* (between the *MCS* and the *IS*) that provides the above mentioned functionality,



Fig. 2. Interconnection system.

without considering how it is implemented.<sup>2</sup> However, in order to "decouple" as much as possible the original systems and the interconnecting protocol, this interface does not allow the *IS* to contact the *MCS* (except to read or write variables). In particular, the *IS* cannot block the *MCS* (as was done in [8]). More formally, the interface guarantees the following assumption:

**Assumption 2.** When any *MCS-process* updates its local memory (as a result of a write operation issued by an application process), the *IS-process* will be asynchronously notified about these events (i.e., about the updated variable, the written value and the application process). Other than this, there is no *MCS*-initiated interaction between *MCS* and *IS* processes.

#### 3. Fast vs non-fast consistency models

In this section, we show that only systems implementing *fast* consistency models can be interconnected. Formally, we define a fast consistency model as follows:

**Definition 8.** We say that a consistency model is *fast* if there is an *MCS-protocol* that implements it, such that memory operations only require local computations before returning control, even in systems with several nodes.

Since there are several examples of popular fast and non-fast models, this implies that the property of being fast classifies the set of memory models in a non-trivial way. The following observation will be useful to prove some subsequent results.

**Observation 1.** Every IS-protocol that interconnects N > 2 systems can be used to interconnect 2 systems. Furthermore, every IS-protocol that interconnects 2 systems can be used to interconnect N > 2 systems.

**Proof.** For the first part, let us consider that there is an *IS-protocol* that interconnects N > 2 systems through a set of *N IS-processes*. If we only have two systems, one of the two *IS-processes* can simulate N - 2 empty systems and their *IS-processes*. Then, we have an interconnected system of two systems.

For the second part, we use induction on *i* to show that *i* systems can be interconnected for any  $i \ge 2$ . For i = 2 the claim is trivially true. Now, assume that we can obtain a system S' by interconnecting the systems  $S^0, S^1, \ldots, S^{i-2}$ . The result of the interconnection is a single system. Then, the *IS*-protocol can be used to properly interconnect S' and  $S^{i-1}$ .  $\Box$ 

In what follows, we consider the interconnection of only two systems, and use this observation to generalize our results to several systems. Now, we prove that non-fast memory models cannot, in general, be interconnected.

**Theorem 1.** There is no IS that guarantees the interconnection of systems implementing non-fast memory models.

**Proof.** We show the result by contradiction. Assume that there is a non-fast memory model *M* that can be interconnected. From Observation 1, we can consider the interconnection of two systems. Therefore, let us assume there is an IS I that interconnects two systems implementing M. Let us first take a distributed system with two nodes. In each node we implement a system with one MCS-process, at least one application process, and the corresponding IS-process. By Assumption 1, the MCS-process and the application processes (the IS-process included) are in the same node. Then, in each of these two single-node systems each memory operation only requires local computations. Now, we use *I* to interconnect these two systems into a unique system implementing M. By Assumption 2, I cannot block the MCSprocesses. Then, every memory operation in the resulting system still requires only local computations, which contradicts the fact that *M* is not fast. 

As a consequence of this theorem, we derive that a number of popular memory models cannot be interconnected. In [4] it is shown that the sequential consistency model is not fast. Hence, it cannot be interconnected, and the same happens with the atomic consistency model and its derivations, i.e., the safe and regular memory models [19]. Similarly, Attiya and Friedman [5] have shown that the processor consistency models PCG and PCD [10, 1] are not fast, and consequently cannot be interconnected. Finally et al. [5] also proved that any algorithm for the mutual exclusion problem using fast operations must be cooperative. This implies that any synchronization operation that guarantees mutual exclusion must be non-fast. Therefore, any synchronized memory model that provides exclusive access cannot be interconnected. As a result, we have that memory models such as the eager release [9], the lazy release [17], the entry [6] or the scope [14] cannot be interconnected.

On the other hand, there is a number of consistency models that are fast and for which Theorem 1 does not apply. In the following sections we show that some of the most popular fast memory models (namely, the pRAM [20], the causal [2] and the cache [10]) can indeed be interconnected, although, in some cases, in a constrained fashion.

We will assume that these fast systems control the replicas by *propagating* the new values to update the replicas. This assumption does not significantly restrict the domain of application of our results, since all current implementations of fast models we are aware of have been obtained by using propagation.

298

<sup>&</sup>lt;sup>2</sup> In addition to the above mentioned approaches, a local copy of the shared memory could be stored in a *protected* zone of the physical memory, so that any modification generates an interruption that informs the *IS-process* without using the *MCS-processes* However, here we do not consider this case, since it requires some "help" from the operating system.

#### 4. Interconnection of pRAM Systems

In this section, we study the interconnection of pRAM systems [20]. In this model, every process performs all its operations locally and transmits updated values to the other processes along FIFO channels. These updates are later performed asynchronously at the remote processes. Formally, we define a pRAM system as follows:

**Definition 9** (*pRAM System*). A system *S* is pRAM if for every execution  $\alpha$  and every process *p* there is a legal view  $\beta_p$  of  $\alpha_p$  preserving  $\prec_q$  for all *q* (where  $\alpha_p$  denotes the subset of operations obtained by removing from execution  $\alpha$  all read operations issued by processes other than *p*).

Following, we show that, in general, the interconnection of pRAM systems is not possible. That is, there is no IS that interconnects every pair of pRAM systems. The proof is based on the fact that some pRAM systems may not be FIFO ordered (the formal definition of FIFO ordered system is provided below). Indeed, as we pointed at the end of Section 2, it is necessary to have a functionality that guarantees that any given IS-process be eventually aware of the writes taking place at the MCS-processes that it manages. If such a functionality is implemented within the IS-process (i.e., by making the IS-process periodically read the whole memory), it is easy to prove that pRAM systems, in general, cannot be interconnected. Basically, the idea is that when the IS-process finds two variables that have been modified since the last time it read them, it cannot know in which order they were written, and may propagate them to the other systems in an incorrect order. Hence, such a functionality must be implemented within the MCS-processes (i.e., by making them to communicate explicitly any update to the IS-process, as described in Assumption 2). Depending on how this update is done, it could happen that local replicas are updated in a different order than the write operations were issued.<sup>3</sup> Then, the communication of the updates between the MCS-processes and the IS-process (which is performed without using read and write operations but using explicit messages) could be out of order.

In what follows in the rest of the paper, we extend our notation and use both subscripts and superscripts to respectively denote the process that performs the operations and the system where such a process is located.

**Theorem 2.** There is no IS that guarantees pRAM interconnection for every pair of pRAM systems.

**Proof.** The proof is based on the fact that when some process p in  $S^k$  issues several write operations, it may update the corresponding variables in its local memory in a different order from p's process order.

Let us assume, by way of contradiction, that there is a system  $S^T$  which is the result of interconnecting two pRAM systems  $S^0$  and  $S^1$  through some interconnection system *I*. From Definition 9, we know that for every execution  $\alpha^T$  there is a legal view  $\beta_p^T$  of  $\alpha_p^T$ , for all *p*, preserving  $\prec_q$ , for all *q*.

Assume that we have an execution  $\alpha^0$  with the following sequence of write operations issued by process p of  $S^0$ :  $w_p^0(x)s \prec_p w_p^0(y)l$ . We know, from the Liveness Property (see Section 2.2), that there is a time t after which any read operation on x and y issued by any process in  $S^1$  returns s and l, respectively. We now assume that after this time *t* the process *p* issues the write operations  $w_p^0(x)u$  and  $w_p^0(y)v$ . We know, following the definition of interface's behavior, that when any *MCS-process* updates its local memory, the interface informs *isp*<sup>0</sup> about these events. Then, *I* can take one of the following actions:

- *Case* 1:  $isp^1$  issues  $w_{isp^1}^1(x)u$  and  $w_{isp^1}^1(y)v$ , in this order, in  $S^1$ . In this case, if  $w_p^0(x)u$  and  $w_p^0(y)v$  were issued by process p in the order  $w_p^0(y)v \prec_p w_p^0(x)u$  and some process q of  $S^1$  issues the read operations  $r_q^1(x)u \prec_q r_q^1(y)l$  (which is possible if the pRAM systems are not FIFO ordered), then it is impossible to form a legal view  $\beta_q^T$  preserving  $\prec_p$ . Hence, we reach a contradiction.
- *Case2*: *isp*<sup>1</sup> issues  $w_{isp1}^1(y)v$  and  $w_{isp1}^1(x)u$ , in this order, in  $S^1$ . In this case, if  $w_p^0(x)u$  and  $w_p^0(y)v$  were issued by process p in the order  $w_p^0(x)u \prec_p w_p^0(y)v$  and some process q of  $S^1$  issues the read operations  $r_q^1(y)v \prec_q r_q^1(x)s$  (which is possible if the pRAM systems are not FIFO ordered), then it is impossible to form a legal view  $\beta_q^T$  preserving  $\prec_p$ . Hence, we reach a contradiction.
- legal view  $\beta_q^T$  preserving  $\prec_p$ . Hence, we reach a contradiction. • *Case3*: *isp*<sup>1</sup> does not issue  $w_{isp1}^1(y)v$  or  $w_{isp1}^1(x)u$  in S<sup>1</sup>. From the Liveness Property, this case is not possible.  $\Box$

Despite the previous result, we have found that for certain types of pRAM systems, which we call *FIFO ordered*, it is still possible to do this.

**FIFO ordered systems.** We say a system is *FIFO ordered* if for each process p in  $S^k$ , if p issues two write operations  $w(x)v \prec_p w(y)u$ , then the  $mcs(isp^k)$  process updates its local replica of x with the value v before updating its local replica of y with the value u.

In Fig. 3, we present an IS-protocol that can be used to interconnect pRAM systems that are FIFO ordered. It consists of two concurrent tasks,  $Propagate_{out}^k$  and  $Propagate_{in}^k$ . The first task,  $Propagate_{out}^{k}$ , deals with transferring write operations issued in  $S^{k}$ to  $S^l$ ,  $l \neq k$ . It is activated upon notification from the interface to *isp<sup>k</sup>* that the variable *x* has been updated to value *v* due to a write operation issued by the application process *p*. Then,  $Propagate_{out}^{\kappa}$ sends the pair  $\langle x, v \rangle$  to  $isp^l$ ,  $l \neq k$ . We avoid re-propagating write operations received from other systems by checking that the write operation was not issued in  $S^k$  by  $isp^k$ . The second task,  $Propagate_{in}^{k}$ , deals with applying within  $S^{k}$  the write operations transferred from the systems  $S^l$ ,  $l \neq k$ . It is activated whenever a pair  $\langle x, v \rangle$  is received from process *isp*<sup>l</sup>,  $l \neq k$ . As a result, the  $isp^k$  process performs a write operation w(x)v, thus propagating the value v to all the replicas of variable x within  $S^k$ . We note that the FIFO ordered property do not undergo substantial constraints with respect to systems that are not FIFO ordered. Theorem 3 states formally the guarantees provided by the above mentioned interconnection protocol.

**Theorem 3.** The system  $S^T$  obtained by connecting N FIFO ordered pRAM systems  $S^0, \ldots, S^{N-1}$  using the pRAM IS-protocol in Fig. 3 is pRAM.

**Proof.** See Appendix A.1.

#### 5. Interconnection of causal systems

In this section, we study the interconnection of causal systems [2]. In the causal model, in addition to the conditions of the pRAM executions, read operations are forced to return the value written by the latest causally ordered operation (i.e., read operations preserve the execution order in Definition 2). Formally, we define a causal system as follows:

<sup>&</sup>lt;sup>3</sup> An example of this behavior is the causal (and hence pRAM) algorithm of [16], in which the update of several variables are batched into a single message and applied in arbitrary order, in mutual exclusion with all read operations.

| 1        | Task $Propagate_{out}^k$ :: executed upon notifi- | 1        | Task $Propagate_{in}^k$ :: executed      |
|----------|---------------------------------------------------|----------|------------------------------------------|
|          | cation from the interface to $isp^k$ that the     |          | upon reception of $\langle x, v \rangle$ |
|          | variable x has been updated in $mcs(isp^k)$ to    |          | from $isp^l, l \neq k$                   |
|          | value $v$ due to a write operation issued by      | <b>2</b> | begin                                    |
|          | p                                                 | 3        | w(x)v                                    |
| 2        | begin                                             | 4        | end                                      |
| 3        | if $p \neq i s p^k$ then                          |          |                                          |
| 4        | send $\langle x, v \rangle$ to $isp^l, l \neq k$  |          |                                          |
| <b>5</b> | end                                               |          |                                          |

**Fig. 3.** The pRAM *IS*-protocol for each  $isp^k$ ,  $k \in \{0, 1\}$ .

**Definition 10** (*Causal System*). A system *S* is causal if for every execution  $\alpha$  and every process *p* there is a legal view  $\beta_p$  of  $\alpha_p$  preserving  $\prec$ .

As in the case of pRAM systems, here we consider systems that are implemented by using *propagation*. First of all, we have that since the pRAM model is strictly weaker than the causal model [7, 2], the result of impossibility in Section 4 is also applicable to causal systems.

**Corollary 1** (From Theorem 2). There is no IS that guarantees causal interconnection for every pair of causal systems.

In spite of this result, in the previous section we presented an *IS-protocol* for interconnecting pRAM systems that are FIFO ordered. Thus, a question that naturally arises is whether it is possible to interconnect causal systems that are also FIFO ordered. However, here we show that this result does not apply to causal systems. That is, there is no *IS* that interconnects every pair of causal systems, even if they are FIFO ordered.

## **Theorem 4.** There is no IS that guarantees causal interconnection for every pair of causal systems, even if they are FIFO ordered.

**Proof.** Let us assume, by way of contradiction, that there is a system  $S^T$  which is the result of interconnecting two FIFO-ordered causal systems  $S^0$  and  $S^1$  with the *IS I*. From Definition 10, we know that for every execution  $\alpha^T$  there is a legal view  $\beta_p^T$  of  $\alpha_p^T$ , for all p, preserving  $\prec$ .

Assume that we have an execution  $\alpha^0$  with the following write operations issued by process r of  $S^0$ :  $w_r^0(x)s \prec w_r^0(y)l$ . From the Liveness Property, we know that there is a time t after which any read operation on x and y issued by any process in  $S^0$  returns s and l, respectively. We now assume that after this time t the processes p and g issue the write operations  $w_p^0(x)u$  and  $w_g^0(y)v$ , causally related to each other through read operations (detailed below in each case). Following the definition of interface's behavior, we consider that when all *MCS-processes* update their local memory, the interface communicates  $isp^0$  about these events. Then, I can take one of the following actions:

- *Case* 1: *isp*<sup>1</sup> issues  $w_{isp^1}^1(x)u$  and  $w_{isp^1}^1(y)v$ , in this order, in  $S^1$ . Now, some process q of  $S^1$  issues the following read operations  $r_q^1(x)u \prec_q r_q^1(y)l$ . In this case, if  $r_g^0(y)l \prec w_g^0(y)v \prec r_p^0(y)v \prec w_p^0(x)u$ , then it is impossible to form a legal view  $\beta_q^T$  preserving  $\prec$ . Hence, we reach a contradiction.
- <. Hence, we reach a contradiction. • *Case 2*: *isp*<sup>1</sup> issues  $w_{isp1}^1(y)v$  and  $w_{isp1}^1(x)u$ , in this order, in S<sup>1</sup>. Now, some process *q* of S<sup>1</sup> issues the following read operations  $r_q^1(y)v \prec_q r_q^1(x)s$ . In this case, if  $r_p^0(x)s \prec w_p^0(x)u \prec r_g^0(x)u \prec w_g^0(y)v$ , then it is impossible to form a legal view  $β_q^T$  preserving  $\prec$ . Hence, we reach a contradiction.
- *Case* 3: *isp*<sup>1</sup> does not issue  $w_{isp^1}^1(y)v$  or  $w_{isp^1}^1(x)u$  in  $S^1$ . From the Liveness Property, this case is not possible.  $\Box$

Nevertheless and although in general the interconnection of causal systems is not possible even if they are FIFO ordered, we found that it is still possible to interconnect causal systems that are globally ordered.

**Globally Ordered Systems** We say that a system is globally ordered if for each two write operations  $w(x)v \prec w(y)u$  issued by (maybe different) processes in  $S^k$ , each mcs(p) with p in  $S^k$  updates its local replica of x with the value v before updating its local replica of y with the value u.

In Fig. 4, we present an *IS-protocol* that can be used to connect causal systems that are globally ordered. It consists of two concurrent tasks,  $Propagate_{out}^k$  and  $Propagate_{in}^k$ , such as in the *IS-protocol* in Fig. 3. In fact, the  $Propagate_{out}^k$  task is the same. The key difference is found in task  $Propagate_{out}^k$ , where a pair  $\langle x, v \rangle$  is not sent to the other systems until all the *MCS* replicas of *x* have been updated.

Clearly, globally ordered systems provide stronger guarantees than FIFO ordered systems. However and similar to this latter type, they do not undergo substantial constraints with respect to systems that are not globally ordered. Theorem 5 states formally the guarantees provided by the above mentioned protocol.

**Theorem 5.** The system  $S^T$  obtained by connecting N globally ordered causal systems  $S^0, \ldots, S^{N-1}$  using the causal IS-protocol in Fig. 4 is causal.

**Proof.** See Appendix A.2.  $\Box$ 

#### 6. Interconnection of cache systems

In this section, we study the interconnection of cache systems [10]. Roughly speaking, this memory model forces independent variables considered in isolation to be sequential. That is, data operations on any individual variable must "appear" to have been executed atomically in an order that is consistent with the order seen in individual processes. Formally, a cache system is defined as follows:

**Definition 11** (*Cache System*). A system *S* is cache if for every execution  $\alpha$  and every variable *x* there is a legal view  $\beta_x$  of  $\alpha_x$  preserving  $\prec$  (where  $\alpha_x$  denotes the subset of operations obtained by removing from execution  $\alpha$  all the operations on variables other than *x*).

We show that, unlike the previous models, the interconnection of cache systems is always possible, independently of how they are implemented. In Fig. 5, we present an *IS-protocol* that can be used to connect cache systems of any type. It consists of only one task *Propagate<sup>k</sup>*. Note that each *IS-process* maintains a copy of the latest value propagated from the other system in *last*(*x*) for each variable *x*. That copy must be initialized with a special value (namely, *NoData*). Note also that initially one of the *IS-processes* (for instance *isp*<sup>0</sup>) must send a message with  $\langle x, NoData \rangle$  to the other for each variable *x* to start the interconnection. Theorem 6 states formally the guarantees provided by the above mentioned protocol.

| 1 | Task $Propagate_{out}^k$ :: executed upon        | 1        | Task $Propagate_{in}^k$ :: executed      |
|---|--------------------------------------------------|----------|------------------------------------------|
|   | notification from the interface to               |          | upon reception of $\langle x, v \rangle$ |
|   | $isp^k$ that the variable x has been             |          | from $isp^l, l \neq k$                   |
|   | updated in $mcs(q)$ for all $q$ in $S^k$         | <b>2</b> | begin                                    |
|   | to value $v$ due to a write operation            | 3        | w(x)v                                    |
|   | issued by $p$                                    | 4        | end                                      |
| 2 | begin                                            |          |                                          |
| 3 | if $p \neq i s p^k$ then                         |          |                                          |
| 4 | send $\langle x, v \rangle$ to $isp^l, l \neq k$ |          |                                          |
| 5 | end                                              |          |                                          |

**Fig. 4.** The causal *IS*-protocol for each  $isp^k$ ,  $k \in \{0, 1\}$ .

| 1        | Task $Propagate^{k}(x)$ :: executed upon reception of $\langle x, v \rangle$ |  |
|----------|------------------------------------------------------------------------------|--|
|          | from $isp^l$ , $l \neq k$                                                    |  |
| <b>2</b> | begin                                                                        |  |
| 3        | if $v \neq NoData$ then                                                      |  |
| 4        | w(x)v                                                                        |  |
| 5        | last(x) := v                                                                 |  |
| 6        | r(x)u                                                                        |  |
| 7        | if $u = last(x)$ then                                                        |  |
| 8        | u := NoData                                                                  |  |
| 9        | send $\langle x, u \rangle$ to $isp^l, l \neq k$                             |  |
| 10       | end                                                                          |  |

**Fig. 5.** The cache *IS*-protocol for each  $isp^k$ ,  $k \in \{0, 1\}$ .

**Theorem 6.** The system  $S^T$  obtained by connecting N cache systems (regardless of whether they are FIFO/globally ordered or not)  $S^0, \ldots, S^{N-1}$ , using the cache IS-protocol in Fig. 5 is cache.

#### **Proof.** See Appendix A.3.

Observe that the proposed interconnection algorithm is usually highly inefficient in terms of the network traffic it causes, given that, for each memory object, there is a continuous exchange of messages between the IS-processes. Since the main target of this work is to identify whether consistency models can be interconnected, we are not very concerned about efficiency. However, we note that the algorithm can be easily optimized to decrease the network traffic. For instance, instead of handling each variable independently, the algorithm could work at a larger scale, dealing with several (or even all the) variables simultaneously. This optimization can be done by modifying task  $Propagate^{k}()$  so that it is executed only upon reception of a given set of  $\langle x, v \rangle$  pairs (for different variables), and executing the same code for each one of them. Additionally, instead of continuously transferring messages between the IS-processes, such a transfer could be performed only when a given time interval has passed or when the variable xis updated at the sending system, whichever happens first. This optimizations can be done by adding some simple pieces of code that implement this wait condition. Note, however, that in some cases, these changes could increase the latency. The correctness proofs of these alternative interconnecting protocols are in essence the same as the one for the algorithm shown in Fig. 5.

#### 7. Performance

In this paper, we have approached the interconnection problem from a theoretical point of view, trying to decide whether it is possible to interconnect given consistency models. We have not been concerned above with the efficiency of the interconnection algorithms we have proposed and the performance of the resulting interconnected system. However, we do here a brief and simple performance evaluation of algorithms and systems. We compare the performance of a system obtained using our *IS-protocols* with the performance of a system that uses a *MCS-protocol* connecting all the processes directly. We assume that the same *MCS-protocol* is used in the global DSM system of reference and in each of the systems interconnected with our *IS-protocols*.

First, observe that our *IS-protocols* should not affect the *response time* a process observes when issuing a memory operation, since its *MCS-process* is not affected (in particular, cannot be blocked) by the interconnection. Since the three models that we study are fast, the response time of the algorithms that implement them (e.g., [20] for pRAM, [2,3] for causal, or [16] for cache) only depends on local computation at a node. This does not change with the interconnection.

Second, let us look at the *latency* of a DSM system, which is the largest time until a value that is written becomes visible in any other process. In a single DSM system this time depends on the *MCS-protocol* used. For instance, if we discard the time for local computations, for the pRAM and causal algorithms [20,2,3] the latency depends on the time to complete a broadcast in the system. Let us denote this time by  $T_B(n)$  in a system with *n MCSprocesses*, which we safely assume is at most linear on *n*. The only algorithm for cache consistency of which we are aware [16] has instead latency  $L_{\text{cache}}(n) = \Theta(nT_B(n))$ .<sup>4</sup>

The three proposed IS-protocols propagate as soon as they can any new value they are aware of by sending a message. In the case of pRAM and causal consistencies this is done immediately, while in the case of cache consistency the IS-process may need to wait for a message from the other IS-process. Then, using the algorithms referred above, the interconnection of two systems S<sup>0</sup> and S<sup>1</sup> with  $n_0$  and  $n_1$  MCS-processes respectively ( $n = n_0 + n_1$ ), has latency  $T_B(n_0) + T_B(n_1) + d$  (IS-processes are assumed to use existing MCSprocesses) in the case of pRAM and causal, where d is the delay of a point-to-point communication. In the case of cache this latency is  $L_{\text{cache}}(n_0) + L_{\text{cache}}(n_1) + d$ . In the three cases, since  $T_B(n)$  is at most linear, the latency of the interconnected system is larger than the latency in the original system. However, since the broadcast delay cannot be smaller than d, the latency increases at most by a constant factor of 3. If this is generalized to N systems the increase factor depends on the topology. In the worst case, systems are connected as a line, and the worst factor becomes 2N - 1. In the best case systems are connected as a star, and the increase factor is again bounded by a small constant 5.

Regarding the *network traffic*, the pRAM and causal protocols [20,2,3] broadcast a message for each write operation. Let

<sup>&</sup>lt;sup>4</sup> The algorithm of [16] uses a token-passing scheme in which the *nMCS-processes* broadcast messages in a cyclic fashion.

| Memory Model    | Globally Ordered | FIFO Ordered | General |
|-----------------|------------------|--------------|---------|
|                 | Systems          | Systems      | Systems |
| Non-fast Models | No               | No           | No      |
| Causal          | Yes              | No           | No      |
| pRAM            | Yes              | Yes          | No      |
| Cache           | Yes              | Yes          | Yes     |

Fig. 6. Possibilities of interconnection under the different types of systems in this work.

 $M_B(n)$  be the number of messages that a broadcast requires, which again is at most linear on n. Since the interconnection of  $S^0$  and  $S^1$  implies sending one point-to-point message for each variable update, the traffic in the interconnected system is  $M_B(n_0)+M_B(n_1)+1$  messages per write operation. This again implies a small increase of networks traffic, bounded by a factor of 3 if  $M_B(n)$  is constant, and bounded by an additive constant term if  $M_B(n)$  is linear. Generalizing for N systems the worst case is a factor of 2N - 1, when  $M_B(n)$  is constant and systems are connected as a line, and the best is an additive term of O(N), when  $M_B(n)$  is linear and systems are connected in a star.

The evaluation of network traffic for the cache consistency model cannot be based on messages per write operation, since both the basic *MCS-protocol* [16] and the *IS-protocol* of Fig. 5 send messages continuously, even if variables are not written. The difference is that the *IS-protocol* sends one message per variable in the memory. If both protocols transmit at similar intervals, the single system sends  $M_B(n)$  messages every interval, while the system after the interconnection of  $S^0$  and  $S^1$  sends  $M_B(n_0) +$  $M_B(n_1) + V$  messages, where V is the number of variables. Clearly, this latter value can be very large if V is large. However, as mentioned in Section 6, some optimizations can reduce the traffic. For instance, if all the changes in the memory are sent by the *IS-process* in one single message, the traffic becomes  $M_B(n_0) +$  $M_B(n_1) + 1$  messages. Then, the increase of traffic is similar to the one observed for the pRAM and causal consistencies.

#### 8. Conclusions

In this paper, the interconnection of distributed shared memory systems has been studied. We have classified the consistency models in two groups, depending on whether they are *fast* or not. In the case of non-fast consistency models, we have shown that they cannot be interconnected in any way. In contrast, in the case of fast consistency models we have provided protocols with which to interconnect some of them. Whereas in some cases it is possible to interconnect fast consistency models without any restriction, other fast consistency models need some additional constraints. In this last situation, we gave sufficient conditions and the corresponding protocols to do so. Fig. 6 summarizes these results. At this point, we note that whereas we have shown that cache systems can be interconnected in a more general fashion than pRAM and causal systems, the protocol that we have used for such a task is, in general, less efficient than the protocols used for interconnecting pRAM and causal systems.

#### Acknowledgments

The authors wish to thank the anonymous referees for several constructive suggestions and corrections.

#### Appendix

#### A.1. Correctness proof of Theorem 3

Let *p* be some process in  $S^k$ ,  $k \in \{0, 1\}$  and  $\beta_p^k$  be a legal view of execution  $\alpha_p^k$  preserving  $\prec_q$  for all *q* in  $S^k$ , as in Definition 5. From Definition 9, such a legal view must exist by the fact that  $S^k$  is a pRAM system. We denote by *orig*(*op*) the original write operation propagated as operation *op* in  $\alpha_p^k$  by process  $isp^k$ . Similarly, given a write operation *op* issued in  $S^l$ ,  $l \neq k$ , we denote by prop(op) the write operation issued by  $isp^k$  as a result of propagating *op* to  $S^k$  as defined by the IS-protocol. We define  $\beta_p^T$  as the sequence obtained by replacing in  $\beta_p^k$  every write operation *op* from  $isp^k$  by the write operation *orig*(*op*).

#### **Lemma 1.** $\beta_p^T$ is formed by all operations of $\alpha_p^T$ .

**Proof.** First of all, note that the difference between  $\alpha_p^k$  and  $\alpha_p^T$  is that, for each operation *op* issued by  $isp^k$  in  $\alpha_p^k$ ,  $\alpha_p^T$  contains the original operation orig(op). Since  $\beta_p^k$  is a sequence formed by all operations of  $\alpha_p^k$ , and  $\beta_p^T$  is obtained by replacing in  $\beta_p^k$  every write operation *op* from  $isp^k$  by the write operation orig(op), then the set of operations in  $\beta_p^T$  is the same as that of  $\alpha_p^T$ .  $\Box$ 

The following Lemmas show that  $\beta_p^T$  preserves the order in which the operations are issued in any process of  $S^T$ .

**Lemma 2.** Let  $op = w_q^k(x)v$  and  $op' = w_q^k(y)u$  be two operations of  $\alpha^T$  issued by the same process q of  $S^k$ . If  $op \prec_q op'$  on  $\alpha^k$ , then  $Propagate_{out}^k$  will send to  $S^l$ ,  $l \neq k$ ,  $\langle x, v \rangle$  before  $\langle y, u \rangle$ .

**Proof.** Directly since, as the system  $S^k$  is FIFO ordered,  $isp^k$  receives the message in  $S^k$  with the value v of variable x from process q before the message with the value u of variable y also from process q, and then  $Propagate_{out}^k$  sends the pair  $\langle x, v \rangle$  to  $isp^l$  before it sends  $\langle y, u \rangle$ .  $\Box$ 

**Lemma 3.** Let op and op' be two write operations of  $\alpha^T$  issued by the same process q of  $S^l$ , where  $l \neq k$ . If  $op \prec_q op'$  on  $\alpha^l$ , then  $prop(op) \rightarrow prop(op')$  in  $\beta_p^k$ , for all p.

**Proof.** We know that  $\beta_p^k$  is a legal view that preserves the *q*'s process order  $\prec_q$  on  $\alpha^k$ , for all *q*. Then, the result follows from Lemma 2, from the fact that the channel connecting *isp*<sup>*l*</sup> to *isp*<sup>*k*</sup> is reliable and FIFO, and from the implementation of task *Propagate*\_{in}^k (see Fig. 3).  $\Box$ 

**Lemma 4.**  $\beta_p^T$  preserves  $\prec_q$  for all q.

**Proof.** By way of contradiction, let us assume that  $\beta_p^T$  does not preserve the order among operations issued by a process q of  $S^T$ . Hence, there must be at least two operations op and op' of  $\alpha_p^T$  issued by q such that  $op \prec_q op'$  but op' precedes op in  $\beta_p^T$ . Let us consider two possible cases.

- *Case* 1: *q* is in *S<sup>k</sup>*. Since *op*' precedes *op* in  $\beta_p^T$ , *op*' also precedes *op* in  $\beta_p^k$ , by definition of  $\beta_p^T$ . Then,  $\beta_p^k$  does not preserve *q*'s process order  $\prec_q$ . However, this is not possible since, by definition,  $\beta_p^k$  is a legal view preserving  $\prec_q$ , for all *q*. Hence, we reach a contradiction.
- *Case* 2: *q* is in  $S^l$ ,  $l \neq k$ . Since both operations are in  $\beta_p^T$ , which only contains read operations from process *p* of system  $S^k$ , both must be write operations. Let *op* and *op'* be propagated as operations *prop(op)* and *prop(op')*, respectively, issued by process *isp<sup>k</sup>*. From Lemma 3, we have that *prop(op)*  $\rightarrow$  *prop(op')* in  $\beta_p^k$ . Observe now that, by definition, operation

*prop*(*op*) in  $\beta_p^k$  is replaced by *op* and operation *prop*(*op*') is replaced by *op*' to obtain  $\beta_p^T$ . Then *op* precedes *op*' in  $\beta_p^T$  and we reach a contradiction.  $\Box$ 

#### **Lemma 5.** $\beta_p^T$ is legal.

**Proof.** By definition,  $\beta_p^k$  is legal. Also by definition,  $\beta_p^T$  is obtained by replacing in  $\beta_p^k$  every write operation *op* from *isp*<sup>k</sup> by the write operation *orig*(*op*). Therefore,  $\beta_p^T$  is legal.  $\Box$ 

**Theorem 7** (Corresponds to Theorem 3). The system  $S^T$  obtained by connecting N FIFO ordered pRAM systems  $S^0, \ldots, S^{N-1}$  using the pRAM IS-protocol in Fig. 3 is pRAM.

**Proof.** Let N = 2. From Lemma 1,  $\beta_p^T$  is formed by all operations of  $\alpha_p^T$ . Also, from Lemma 4,  $\beta_p^T$  preserves  $\prec_q$  for all q. Finally, from Lemma 5,  $\beta_p^T$  is legal. Then,  $\beta_p^T$  is a legal view of  $\alpha_p^T$  preserving  $\prec_q$  for all q. Hence,  $S^T$  is a pRAM system. The extension to more than 2 systems follows from Observation 1.  $\Box$ 

#### A.2. Correctness proof of Theorem 5

Let *p* be some process in system  $S^k$ ,  $k \in \{0, 1\}$ , and let mcs(p) be its *MCS-process*. Recall that  $\alpha_p^k$  (resp.  $\alpha_p^T$ ) is the set obtained by removing from  $\alpha^k$  (resp.  $\alpha^T$ ) all read operations except those from process *p*. We define  $\beta_p^k$  as a sequence with the same operations as  $\alpha_p^k$  that preserves the order in which all operations of  $\alpha_p^k$  are issued by process *p*, and the order in which every write operation is applied in mcs(p). Formally,

**Definition 12.** Let  $\beta_p^k$  be a sequence of the operations in  $\alpha_p^k$ . Let *op* and *op'* be in  $\alpha_p^k$ . Then *op*  $\rightarrow$  *op'* in  $\beta_p^k$ , if any of the following happen:

- (1) op and op' are operations from the same process p of  $S^k$  and  $op \prec_p op'$ .
- (2)  $op = w_q^k(x)u$ ,  $op' = w_s^k(y)v$ , and in mcs(p) the local copy of x is updated with u before updating y with v.
- (3)  $op = w_q^k(x)u$ ,  $op' = r_p^k(y)v$ , and in mcs(p) the local copy of x is updated with u before p issues op'.

Note that, as in  $\alpha_p^k$ , every write operation of process  $isp^k$  in  $\beta_p^k$  is the propagation of a write operation issued by a process of  $S^l$ ,  $l \neq k$ . We define  $\beta_p^T$  as the sequence obtained by replacing in  $\beta_p^k$  every write operation *op* from *isp*<sup>k</sup> by the write operation *orig*(*op*).

**Definition 13** (*Non-Transitive Execution Order*). Let *op* and *op'* be two operations in an execution  $\alpha$ . Then *op* precedes *op'* in the non-transitive execution order (*op*  $\prec_{nt} op'$ ) on  $\alpha$  if any of the following holds:

- (1) *op* and *op'* are operations from the same process *p* and *op*  $\prec_p op'$  on  $\alpha$ .
- (2) op = w(x)v and op' = r(x)v.

**Definition 14** ( $\prec$ -*Related Sequence*). Let *op* and *op'* be two operations in an execution  $\alpha$  such that *op*  $\prec$  *op'* on  $\alpha$ . A  $\prec$ -*related sequence* between *op* and *op'* is a sequence of operations  $op^1, op^2, \ldots, op^m$  belonging to  $\alpha$  such that  $op^1 = op, op^m = op'$ , and  $op^i \prec_{nt} op^{i+1}$  on  $\alpha$ , for  $1 \le i < m$ .

Note that at least one  $\prec$ -related sequence always exists between *op* and *op'* if *op*  $\prec$  *op'* on  $\alpha$ .

When considering the composed system  $S^T$ , a  $\prec$ -related sequence *Seq* between operations *op* and *op'* of execution  $\alpha^T$  can be divided into *n* subsequences *subSeq*<sub>1</sub>, *subSeq*<sub>2</sub>, ..., *subSeq*<sub>n</sub>, such that all the operations in subsequence *subSeq*<sub>i</sub> belong to the same system  $S^k$  and the operations in consecutive subsequences

belong to different systems. We use  $subSeq_i^k$  to express that all the operations of the *i*th subsequence belong to system  $S^k$ .

We use *first*(*subSeq*<sup>k</sup><sub>i</sub>) and *last*(*subSeq*<sup>k</sup><sub>i</sub>) to denote the first and last operation of the subsequence *subSeq*<sup>k</sup><sub>i</sub>, respectively. Note that, in two consecutive subsequences *subSeq*<sup>k</sup><sub>i</sub> and *subSeq*<sup>l</sup><sub>i+1</sub> of a given sequence, *last*(*subSeq*<sup>k</sup><sub>i</sub>) =  $w_j^k(x)v$  and *first*(*subSeq*<sup>l</sup><sub>i+1</sub>) =  $r_l^l(x)v$ , i.e. the first operation of the later subsequence reads the value written by the last operation of the former subsequence.

**Lemma 6.** Let op and op' be two operations in  $\alpha_p^T$  issued in system  $S^k$  such that op  $\prec$  op' in  $\alpha^T$ . If there is a  $\prec$ -related sequence between op and op' with one single subsequence subSeq<sup>k</sup>, then op  $\rightarrow$  op' in  $\beta_p^k$ .

**Proof.** Let us assume, by way of contradiction, that the claim does not hold. Then,  $op \prec op'$  on  $\alpha^k$ , and  $op' \rightarrow op$  in  $\beta_p^k$ . This is only possible if there are at least two "consecutive" operations  $op^i$  and  $op^{i+n}$  in  $subSeq_1^k$  and belonging to  $\alpha_t^k$  such that  $op^{i+n} \rightarrow op^i$  in  $\beta_p^k$ . We say  $op^{i+n}$  and  $op^i$  are two consecutive operations in  $subSeq_1^k$  if they are in  $\alpha_t^k$ ,  $t \neq p$ , and between them there is no other operation belonging to  $\alpha_p^k$  (i.e., every operation  $op^{i+l}$ ,  $1 \leq l < n$ , is a read operation issued by a process other than p). Note that if n > 1 then these two consecutive operations  $op^i$  and  $op^{i+n}$  can only be write operations. We have three cases:

- *Case* 1:  $op^i = w^k(x)v$  and  $op^{i+n} = w^k(y)u$ . From the definition of the  $\prec$ -related sequence,  $op^i \prec op^{i+n}$  on  $\alpha^k$ . As the system  $S^k$  is Globally Ordered, if  $op^i \prec op^{i+n}$  on  $\alpha^k$ , then  $op^i$  must be applied in all processes of  $S^k$  (and, of course, in p) before  $op^{i+n}$ . Therefore, from the second condition of Definition 12,  $op^i \rightarrow op^{i+n}$  in  $\beta_n^k$ , and we reach a contradiction.
- op<sup>i</sup> → op<sup>i+n</sup> in β<sup>k</sup><sub>p</sub>, and we reach a contradiction.
  Case 2: op<sup>i</sup> = w<sup>k</sup>(x)v and op<sup>i+1</sup> = r<sup>k</sup><sub>p</sub>(x)v. From the definition of the ≺-related sequence, op<sup>i</sup> ≺<sub>nt</sub> op<sup>i+1</sup> on α<sup>k</sup>. Obviously, the write operation w<sup>k</sup>(x)v must be applied before issuing r<sup>k</sup><sub>p</sub>(x)v, since, otherwise, op<sup>i+1</sup> could not obtain the value v in x. Therefore, from the third condition of Definition 12, op<sup>i</sup> → op<sup>i+1</sup> in β<sup>k</sup><sub>p</sub>, and we reach a contradiction.
- *Case* 3:  $op^i$  and  $op^{i+1}$  are issued by the same process *p*. From the definition of the  $\prec$ -related sequence,  $op^i \prec_{nt} op^{i+1}$  on  $\alpha^k$  and, from case 1 of  $\prec_{nt}$ ,  $op^i \prec_p op^{i+1}$ . Then, from the first condition of Definition 12,  $op^i \rightarrow op^{i+1}$  in  $\beta_p^k$ , and we reach a contradiction.

**Lemma 7.** Let op and op' be two operations in  $\alpha_n^T$ .

- (1) If they are issued by system  $S^k$  and  $op \prec op'$  on  $\alpha^T$ , then  $op \rightarrow op'$ in  $\beta_p^k$ .
- (2) If they are issued by system  $S^k$  and  $op = w^k(x)v$  and  $op' = w^k(y)u$  and  $op \prec op'$  on  $\alpha^T$ , then Propagate<sup>k</sup><sub>out</sub> will send the pairs  $\langle x, v \rangle$  and  $\langle y, u \rangle$  to  $S^l$  in this order.
- (3) If they are issued by system  $S^l$ ,  $l \neq k$ , and are write operations and  $op \prec op'$  on  $\alpha^T$ , then prop(op)  $\rightarrow prop(op')$  in  $\beta_n^k$ .
- (4) If they are issued by systems  $S^l$  and  $S^k$  respectively and  $op = w^l(x)v \prec op'$  on  $\alpha^T$ , then prop $(op) \rightarrow op'$  in  $\beta_n^k$ .
- (5) If they are issued by systems  $S^k$  and  $S^l$  respectively and op  $\prec$ op' =  $w^l(x)v$  on  $\alpha^T$ , then op  $\rightarrow$  prop(op') in  $\beta_p^k$ .
- **Proof.** Proof of Part 1: Let Seq be a  $\prec$ -related sequence between op and op'. We use induction on the number of subsequences of Seq to show the result. Note that this number has to be odd. In the base case, the sequence Seq has only one subsequence  $subSeq_1^k$ . Hence, from Lemma 6,  $op = first(subSeq_1^k) \rightarrow op' = last(subSeq_1^k)$ in  $\beta_p^k$ . Assume the claim is true for sequences with *i* subsequences. We show it also holds if Seq has i + 2subsequences. By the induction hypothesis, we have that

 $op = first(subSeq_1^k) \rightarrow last(subSeq_i^k)$  in  $\beta_p^k$ . Note that  $last(subSeq_i^k) = w_t^k(x)v$  is propagated to system  $S^l, l \neq k$ , by process  $isp^k$  after, in all processes of  $S^k$ , the local copy of x is updated with the value v. Later on,  $isp^l$  propagates the pair (y, u) from  $last(subSeq_{i+1}^l) = w_q^l(y)u$  as  $w_{isp^k}^k(y)u$ (see Fig. 4). Then,  $w_t^k(x)v$  is applied by all processes in  $S^k$ (and, of course, by p) before  $w_{isp^k}^k(y)u$  and therefore, from the second condition of  $\beta_p^k$  in Definition 12,  $w_t^k(x)v \rightarrow w_{isp^k}^k(y)u$  in  $\beta_p^k$ . From the second condition of  $\prec$ -related order,  $w_{isp}^k(y)u \prec_n t$  first(subSeq\_{i+2}^k) =  $r_s^k(y)u$  on  $\alpha^k$ , and then,  $w_{isp}^k(y)u \prec_o p' = last(subSeq_{i+2}^k)$  on  $\alpha^k$ . Then, from Lemma 6,  $w_{isp^k}^k(y)u \rightarrow op' = last(subSeq_{i+2}^k)$  in  $\beta_p^k$ . Hence, by transitivity,  $op = first(subSeq_1^k) \rightarrow op' = last(subSeq_{i+2}^k)$  in  $\beta_p^k$ .

- *Proof of Part 2*: If there is a  $\checkmark$ -related sequence between *op* and *op'* with a single subsequence, then *op*  $\prec$  *op'* on  $\alpha^k$ , and it follows (since the system is globally ordered) that *op* is applied in all processes of  $S^k$  before *op'*. Otherwise, the proof of Part 1 shows the same fact when the  $\prec$ -related sequences between *op* and *op'* have more than one subsequence. Thus, since the task *Propagate*<sup>k</sup><sub>out</sub> of our IS–protocol (see Fig. 4) propagates operations in the order they are locally applied, it will send the pair  $\langle x, v \rangle$  of *op* to  $S^l$  before the pair  $\langle y, u \rangle$  of *op'*.
- Proof of Part 3: From Part 1,  $op \rightarrow op'$  in  $\beta_q^l$ ,  $l \neq k$ . Then, the result follows from Part 2, from the fact that the channel connecting  $isp^l$  to  $isp^k$  is reliable and FIFO, and from the implementation of task  $Propagate_{in}^k$  (see Fig. 4). The process  $isp^k$  issues prop(op) and prop(op') in  $S^k$  and, thus, from the first condition of execution order,  $prop(op) \prec prop(op')$  on  $\alpha^k$ . Hence, from Lemma 6,  $prop(op) \rightarrow prop(op')$  in  $\beta_n^k$ .
- Proof of Part 4: Let  $Seq^{k}$  be a  $\prec$ -related sequence with n subsequences between op and op'. Let us assume  $last(subSeq_{n-1}^{l}) = w_{q}^{l}(y)u$  and  $first(subSeq_{n}^{k}) = r_{s}^{k}(y)u$ . From Part 3,  $prop(op) \rightarrow prop(last(subSeq_{n-1}^{l})) = prop(w_{q}^{l}(y)u) = w_{isp^{k}}^{k}(y)u$  in  $\beta_{p}^{k}$ . From the second condition of  $\prec$ -related order,  $w_{isp}^{k}(y)u \prec_{nt} first(subSeq_{n}^{k}) = r_{s}^{k}(y)u$  on  $\alpha^{k}$ , and then,  $w_{isp}^{k}(y)u \prec_{op'} = last(subSeq_{n}^{k})$  on  $\alpha^{k}$ . As we know, because the system  $S^{k}$  is Globally Ordered and from Definition 12 of  $\beta_{p}^{k}, w_{isp^{k}}^{k}(y)u \rightarrow op' = last(subSeq_{n}^{k})$  in  $\beta_{p}^{k}$ . Hence, by transitivity,  $op = prop(op) \rightarrow op' = last(subSeq_{n}^{k})$  in  $\beta_{p}^{k}$ .

*Proof of Part 5*: Similar to the proof of Part 4.

#### **Lemma 8.** $\beta_p^T$ preserves $\prec$ .

**Proof.** In this proof we show that if there are two operations *op* and *op'* in  $\alpha_p^T$  such that  $op \prec op'$  on  $\alpha^T$ , then  $op \rightarrow op'$  in  $\beta_p^T$ . Let us make a case analysis:

- Case op and op' are issued by processes in  $S^k$ : From Part 1 of Lemma 7, if  $op \prec op'$  on  $\alpha^T$ , then  $op \rightarrow op'$  in  $\beta_p^k$ . Then, by definition of  $\beta_p^T$ , we have that  $op \rightarrow op'$  in  $\beta_p^T$ .
- Case op and op' are issued by processes in  $S^l$ , where  $l \neq k$ : Since both operations are in  $\alpha_p^T$ , which only contains read operations from process p of system  $S^k$ , both operations must be write operations. Then, let op and op' be propagated as prop(op) and prop(op') operations.

From Part 3 of Lemma 7, we have that if  $op \prec op'$  on  $\alpha^T$ , then  $prop(op) \rightarrow prop(op')$  in  $\beta_p^k$ . Hence, replacing prop(op) and

prop(op') by op and op', respectively, we have that, by definition of  $\beta_p^T$ ,  $op \to op'$  in  $\beta_p^T$ .

• Case op is issued by some process in  $S^l$  and op' is issued by some process in  $S^k$ , where  $l \neq k$ : op must be a write operation, since  $\alpha_p^T$  only contains read operations from process p of system  $S^k$ . Such an operation will be propagated from  $S^l$  to  $S^k$  as described by the IS-protocol and it will appear in  $S^k$  as a (write) operation prop(op) issued by process  $isp^k$ .

From Part 4 of Lemma 7, if  $op \prec op'$  on  $\alpha^T$ , then  $prop(op) \rightarrow op'$  in  $\beta_p^k$ . Hence, replacing prop(op) by op, we have that, by definition of  $\beta_p^T$ ,  $op \rightarrow op'$  in  $\beta_p^T$ .

• Case op is issued by some process in  $S^k$  and op' is issued by some process in  $S^l$ , where  $l \neq k$ : op' must be a write operation, since  $\alpha_p^T$  only contains read operations from process p of system  $S^k$ . Such an operation will be propagated from  $S^l$  to  $S^k$  as described by the IS-protocol and it will appear in  $S^k$  as a (write) operation prop(op') issued by process  $isp^k$ .

From Part 5 of Lemma 7, if  $op \prec op'$  on  $\alpha^T$ , then  $op \rightarrow prop(op')$  in  $\beta_p^k$ . Hence, replacing prop(op') by op', we have that, by definition of  $\beta_p^T$ ,  $op \rightarrow op'$  in  $\beta_p^T$ .  $\Box$ 

#### **Lemma 9.** $\beta_p^T$ is legal.

**Proof.** If process *p* issues some read operation  $op = r_p^k(x)u$  is because, when this operation is invoked, it has the value *u* in its local copy of *x*. Then, the latest write operation applied on *x* in *p* is  $op' = w^k(x)u$ . Hence, from the third condition of Definition 12, op' must be the previous nearest write operation on *x* in  $\beta_p^k$ . Therefore, from Definition 5,  $\beta_p^k$  is legal. Note that, by definition of  $\beta_p^T$ , if we replace in  $\beta_p^k$  every write operation *op* from *isp<sup>k</sup>* by the write operation *orig(op)*, we obtain  $\beta_p^T$ . Then,  $\beta_p^T$  is legal.

**Theorem 8** (Corresponds to Theorem 5). The system  $S^T$  obtained by connecting N globally ordered causal systems  $S^0, \ldots, S^{N-1}$  using the causal IS-protocol in Fig. 4 is causal.

**Proof.** Let N = 2. From Lemma 1,  $\beta_p^T$  is formed by all operations of  $\alpha_p^T$ . Also, from Lemma 8,  $\beta_p^T$  preserves  $\prec$ . Finally, from Lemma 9,  $\beta_p^T$  is legal. Then,  $\beta_p^T$  is a legal view of  $\alpha_p^T$  preserving  $\prec$ . Hence,  $S^T$  is a causal system. The extension to more than 2 systems follows from Observation 1.  $\Box$ 

#### A.3. Correctness proof of Theorem 6

Let  $\beta_x^k$  be a legal view of  $\alpha(x)^k$  preserving  $\prec$  on  $\alpha(x)^k$ , as described in Definition 5. Such a legal view must exist by the fact that  $S^k$  is a cache system. We define  $op_i$  as the *i*th write operation propagated by process *isp* from one system to the other (regardless of the system in which it is issued). We use  $op(x)_i^k$  to indicate that  $op_i$  is issued by some process in  $S^k$  on variable *x*. We use  $prop^l(op(x)_i^k)$  to denote the write operation issued by the task  $Propagate_i^l$  as a result of the propagation of  $op(x)_i^k$ .

We define  $\beta_{x,i}^k$  as the subsequence of operations of  $\beta_x^k$  issued by processes of  $S^k$  from  $op(x)_i^k$  (or  $prop^k(op(x)_i^l)$ ) until  $op(x)_{i+1}^k$  (or  $prop^k(op(x)_{i+1}^l)$ ) without including them.

We define  $\beta_{x,i}^T$  as the sequence formed by all operations issued by processes of  $S^T$  between the *i*th and *i*th + 1 propagation of write operations on variable *x* so that operations belonging to  $\alpha(x)^k$ follow the order they have in  $\beta_x^k$ , and operations belonging to  $\alpha(x)^l$ follow the order they have in  $\beta_x^l$ . Formally,  $\beta_{x,i}^T$  can be obtained as follows:  $op(x)_i \cdot head(x)_i^k \cdot head(x)_i^l \cdot tail(x)_i^k \cdot tail(x)_i^l$ , where  $head(x)_i^k$ denotes the subsequence of  $\beta_{x,i}^k$  that includes all read operations

from the beginning of  $\beta_{x,i}^k$  until the first write operation in  $\beta_{x,i}^k$  (not included), and  $tail(x)_i^k$  is the subsequence of  $\beta_{x,i}^k$  that includes all the operations in  $\beta_{x,i}^k$  that are not in  $head(x)_i^k$ .

Finally, we define  $\beta_x^T$  as the sequence obtained by concatenating the sequences of  $\beta_{x,i}^T$  such that  $\beta_{x,i}^T$  goes before  $\beta_{x,i+1}^T$ ,  $\forall i$ . In what follows, we will prove that it is a legal view of  $\alpha(x)^T$ .

#### **Lemma 10.** $\beta_x^T$ is a sequence formed by all operations of $\alpha(x)^T$ .

**Proof.**  $\alpha(x)^T$  is, by definition, the set of all operations in  $\alpha(x)^k$  and  $\alpha(x)^l$  issued by all processes of  $S^k$  and  $S^l$  other than  $isp^k$  and  $isp^l$  (i.e., by all processes of  $S^T$ ).

We know that  $\beta_x^k$  and  $\beta_x^l$  are sequences of all operations of  $\alpha(x)^k$  and  $\alpha(x)^l$ , respectively, because they are legal views. Then, since  $\beta_x^T$  is formed as the sequence of operations of  $S^T$  obtained by concatenating the sequences of legal views  $\beta_x^k$  and  $\beta_x^l$ , it is a sequence of all operations of  $\alpha(x)^T$ .  $\Box$ 

#### **Lemma 11.** $\beta_x^T$ preserves $\prec$ .

**Proof.** We show that if there are two operations *op* and *op'* in  $\alpha(x)^T$  such that  $op \prec op'$  on  $\alpha(x)^T$ , then  $op \rightarrow op'$  in  $\beta_x^T$ . We have two possible cases.

- *Case*1: *op* and *op'* have been issued by processes of the same system. Let us suppose that *op* and *op'* are issued by processes of *S<sup>k</sup>*. Note that, by definition, *S<sup>k</sup>* is a cache system. Then, from Definition 11, there must be a legal view  $\beta_x^k$  preserving the execution order  $\prec$  on  $\alpha(x)^k$  and hence, from Definition 4, if  $op \prec op'$  on  $\alpha(x)^k$  then  $op \rightarrow op'$  in  $\beta_x^k$ . It is easy to check from the definition of  $\beta_x^T$  that operations of  $\alpha^T$  and issued by processes of *S<sup>k</sup>* appear in  $\beta_x^T$  and in  $\beta_x^k$  in the same order. Hence,  $op \rightarrow op'$  in  $\beta_x^T$ .
- *Case* 2: *op* and *op'* have been issued by processes of different systems. Let us suppose that *op* is issued by some process of  $S^k$ , and *op'* is issued by some process of  $S^l$ . We know, from Case 1, that  $\beta_x^T$  preserves  $\prec$  on  $\alpha(x)^k$ , and also preserves  $\prec$  on  $\alpha(x)^l$ . Then,  $\beta_x^T$  will preserve  $\prec$  on  $\alpha(x)^T$  if it also preserves  $\prec$  on  $\alpha(x)^l$ . Then,  $\beta_x^T$ , it is enough to show that the second condition of the Execution Order is preserved between two operations *op* and *op'* from different systems such that *op* =  $w^k(x)u$  and *op* =  $r^l(x)u$  in  $\beta_{x,i}^T$ . We can see, by definition, that *op* must be the *i*<sup>th</sup> write operation propagated from  $S^k$  to  $S^l$  (that is, *op*( $x_i^k$ ), and *op'* is a read operation in *head*( $x_i^l$ . Then, by definition, *op*  $\rightarrow$  *op'* in  $\beta_{x,i}^T$ .  $\Box$

#### **Lemma 12.** $\beta_x^T$ is legal.

**Proof.** Let op = r(x)u be a read operation of  $\beta_x^T$ . From Definition 5,  $\beta_x^T$  is legal if op' = w(x)u is the nearest previous write operation to op in  $\beta_x^T$ . We know, by definition, that  $\beta_x^k$  is the same sequence as  $\beta_x^T$  but replacing each write operation op from  $isp^k$  by prop(op). We have two possible cases.

- *Case* 1:  $op = r^k(x)u$  and  $op' = w^k(x)u$  are operations in  $\alpha(x)^T$  issued by processes of  $S^k$ . By definition, as  $\beta_x^k$  is a legal view of execution  $\alpha(x)^k$  preserving  $\prec$  on  $\alpha(x)^k$ ,  $op' = w^k(x)u$  is the nearest previous write operation to  $op = r^k(x)u$  in  $\beta_x^k$ . Then, by definition of  $\beta_x^T$ , op' is also the nearest previous write operation to op in  $\beta_x^T$ . Therefore,  $\beta_x^T$  is legal.
- *Case2*:  $op = r^k(x)u$  and  $op' = w^l(x)u$  are operations in  $\alpha(x)^T$ issued by systems  $S^k$  and  $S^l$  respectively. Let  $op' = w^l(x)u$  be the write operation  $op(x)_i^l$ . Then, its corresponding write operation in  $S^k$  is  $prop^k(op(x)_i^l) = w_{isp^k}^k(x)u$ . By definition, as  $\beta_x^k$  is a legal view of  $\alpha(x)^k$  preserving  $\prec$  on  $\alpha(x)^k$ ,  $prop^k(op(x)_i^l)$  is the nearest previous write operation to op in  $\beta_x^k$ . Then, by definition of  $\beta_x^T$ ,  $prop^k(op(x)_i^l)$  is replaced by  $op' = op(x)_i^l$  to obtain  $\beta_x^T$ , and

 $op' = op(x)_i^l$  is also the nearest previous write operation to op in  $\beta_x^T$ . Therefore,  $\beta_x^T$  is legal.  $\Box$ 

**Theorem 9** (Corresponds to Theorem 6). The system  $S^T$  obtained by connecting N cache systems (regardless of whether they are FIFO/globally ordered or not)  $S^0, \ldots, S^{N-1}$ , using the cache ISprotocol in Fig. 5 is cache.

**Proof.** Let N = 2. From Lemma 10,  $\beta_x^T$  is formed by all operations of  $\alpha(x)^T$ . From Lemma 11,  $\beta_x^T$  preserves  $\prec$ . Finally, from Lemma 12,  $\beta_x^T$  is legal. Then,  $\beta_x^T$  is a legal view of  $\alpha(x)^T$  preserving  $\prec$ . Hence,  $S^T$  is a cache system. The extension to more than 2 systems follows from Observation 1.  $\Box$ 

#### References

- M. Ahamad, R. Bazzi, R. John, P. Kohli, G. Neiger, The power of processor consistency, in: Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures, 1993, pp. 251–260.
- [2] M. Ahamad, G. Neiger, J. Burns, P. Kohli, P. Hutto, Causal memory: Definitions, implementation and programming, Distrib. Comput. 9 (1) (1995) 37–49.
- [3] M. Ahamad, M. Raynal, G. Thia-Kime, An adaptive protocol for implementing causally consistent distributed services, in: ICDCS, 1998, pp. 86–93.
- [4] H. Attiya, J. Welch, Sequential consistency versus linearizability, ACM Trans. Comput. Syst. 12 (2) (1994) 91–122.
- [5] H. Attiya, R. Friedman, Limitations of fast consistency conditions for distributed shared memories, Inform. Process. Lett. 57 (1996) 243–248.
- [6] B. Bershad, M. Zekauskas, W. Sawdon, The Midway distributed shared memory system, in: COMPCON, 1993.
- [7] V. Cholvi, J. Bernabéu, Relationships between memory models, Inform. Process. Lett. 90 (2) (2004) 53–58.
- [8] A. Fernández, E. Jiménez, V. Cholvi, On the interconnection of causal memory systems, J. Parallel Distrib. Comput. 64 (4) (2004) 498–506.
- [9] K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, J. Hennessy, Memory consistency and event ordering in scalable shared-memory multiprocessors, in: Proceedings of the 7th Annual International Symposium on Computer Architecture, ACM, 1990, pp. 15–26.
- [10] J. Goodman, Cache consistency and sequential consistency, Tech. Rep. 61, IEEE Scalable Coherence Interface Working Group, March 1989.
- [11] S. Haldar, K. Vidyasankar, On specification of read/write shared variables, J. ACM 54 (6) (2007) 31.
- [12] M. Herlihy, J. Wing, Linearizability: A correctness condition for concurrent objects, ACM Trans. Program. Lang. Syst. 12 (3) (1990) 463–492.
- [13] L. Higham, L. Jackson, J. Kawash, Specifying memory consistency of write buffer multiprocessors, ACM Trans. Comput. Syst. 25 (1) (2007) 1.
- [14] L. Iftode, J.P. Singh, K. Li, Scope consistency: a bridge between release consistency and entry consistency, in: SPAA'96: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, ACM Press, New York, NY, USA, 1996, pp. 277–287.
- [15] E. Jiménez, A. Fernández, V. Cholvi, Decoupled interconnection of distributed memory models, in: OPODIS, 2003, pp. 235–246.
- [16] E. Jiménez, A. Fernández, V. Cholvi, A parametrized algorithm that implements sequential, causal, and cache memory consistencies, J. Syst. Softw. 81 (1) (2008) 120–131.
- [17] P. Keleher, Distributed shared memory using lazy consistency, Ph.D. Thesis, Rice University, 1994.
- [18] L. Lamport, How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Trans. Comput. 28 (9) (1979) 690–691.
- [19] L. Lamport, On interprocess communication: Parts I and II, Distrib. Comput. 1 (2) (1986) 77–101.
- [20] R. Lipton, J. Sandberg, PRAM: A scalable shared memory, Tech. Rep. CS-TR-180-88, Princeton University, Department of Computer Science, September 1988.
- [21] R.C. Steinke, G.J. Nutt, A unified theory of shared memory consistency, J. ACM 51 (5) (2004) 800–849.



Vicent Cholvi graduated in Physics from the Universitat de València (Spain) and received his doctorate in Computer Science in 1994 from the Polytechnic University of Valencia (Spain). In 1995, he joined the Jaume I University in Castelló (Spain) where he is currently an associate professor. His interests are in distributed and communication systems.



**Ernesto Jiménez** graduated in Computer Science from the Universidad Politécnica de Madrid (Spain) and got a Ph.D. in Computer Science from the University Rey Juan Carlos (Spain) in 2004. He is currently an associate professor at the Universidad Politécnica de Madrid.



Antonio Fernández Anta is a professor at the Universidad Rey Juan Carlos in Madrid, Spain, since 1998. Previously, he was on the faculty of the Universidad Politécnica de Madrid. He graduated in Computer Science from the Universidad Politécnica de Madrid in 1991, and got a PhD in Computer Science from the University of Southwestern Louisiana in 1994. His research interests include data communications, computer networks, parallel and distributed processing, algorithms, and discrete and applied mathematics.

306