[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Real-time multiprocessor locks with nesting: optimizing the common case

  • Published:
Real-Time Systems Aims and scope Submit manuscript

Abstract

In prior work on multiprocessor real-time locking protocols, only protocols within the RNLP family support unrestricted lock nesting while guaranteeing asymptotically optimal priority-inversion blocking bounds. However, these protocols support nesting at the expense of increasing the cost of processing non-nested lock requests, which tend to be the common case in practice. To remedy this situation, a new fast-path mechanism is presented herein that extends prior RNLP variants by ensuring that non-nested requests are processed efficiently. This mechanism yields overhead and blocking costs for such requests that are nearly identical to those seen in the most efficient single-resource locking protocols. In experiments, the proposed fast-path mechanism enabled observed blocking times for non-nested requests that were up to 18 times lower than under an existing RNLP variant and improved schedulability over that variant and a simple group lock.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. The terminology “fast-in-the-common-case RW-RNLP,” which is obviously too verbose, would be more technically precise.

  2. A job can also be pi-blocked at release by lower-priority jobs executing non-preemptively. By our analysis assumptions, this release blocking is asymptotically upper bounded by the maximum spin blocking for any such jobs, so we focus on spin blocking.

  3. Line 5 can be modified to handle overflow in and . In a spin-based implementation, at most m requests can be active at once, so and can be at most m apart. Therefore, the condition in Line 5 can be modified to to mitigate overflow.

  4. The lock and unlock routines for the \(\text {R}^3\text {LP}\) or RW-RNLP* routines have been denoted in a slightly abbreviated way. For example, W_Lock\(^{{nn}}\) denotes the lock routine invoked by non-nested write requests under the chosen protocol.

  5. More precisely, the bounds presented are \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) and \((m-1)(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}})\) for read and write requests, respectively.

  6. Entitlement is a property of a request, and Definition 1 and Definition 2 give conditions upon which a request becomes entitled in terms of the entitlement of other requests. Therefore, while Definitions 1 and  2 reference each other parenthetically to aid the reader, they are not in fact circularly defined.

  7. This extra phase erroneously combined Lines 3–6 into a single loop in the conference paper (Nemitz et al. 2017), given there as Listing 3, but the source code was correct. It has been corrected here.

  8. See online appendix: http://www.cs.unc.edu/~anderson/papers.html.

References

  • Afshar S, Behnam M, Bril R, Nolte T (2014) Flexible spin-lock model for resource sharing in multiprocessor real-time systems. In: Proceedings of the 9th IEEE international symposium on industrial embedded systems, pp 41–51. IEEE

  • Afshar S, Behnam M, Bril R, Nolte T (2017) An optimal spin-lock priority assignment algorithm for real-time multi-core systems. In: Proceedings of the 23rd IEEE international conference on embedded and real-time computing systems and applications, pp 1–11. IEEE

  • Afshar S, Behnam M, Bril R, Nolte T (2018) Per processor spin-based protocols for multiprocessor real-time systems. Leibniz Trans Embed Syst 4(2):1–30

    Google Scholar 

  • Afshar S, Behnam M, Nolte T (2013) Integrating independently developed real-time applications on a shared multi-core architecture. ACM SIGBED Rev 10(3):49–56

    Article  Google Scholar 

  • Afshar S, Khalilzad N, Nemati F, Nolte T (2015) Resource sharing among prioritized real-time applications on multiprocessors. ACM SIGBED Rev 12(1):46–55

    Article  Google Scholar 

  • Bacon D, Konuru R, Murthy C, Serrano M (1998) Thin locks: featherweight synchronization for Java. In: ACM SIGPLAN Notices, vol 33, pp 258–268. ACM

  • Baker TP (1991) Stack-based scheduling of realtime processes. Real-Time Syst 3(1):67–99

    Article  MathSciNet  Google Scholar 

  • Baruah S (2007) Techniques for multiprocessor global schedulability analysis. In: Proceedings of the 28th IEEE real-time systems symposium, pp 119–128. IEEE

  • Biondi A, Brandenburg B, Wieder A (2016) A blocking bound for nested FIFO spin locks. In: Proceedings of the 37th IEEE real-time systems symposium, pp 291–302. IEEE

  • Block A, Leontyev H, Brandenburg B, Anderson J (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings of the 13th IEEE international conference on embedded and real-time computing systems and applications, pp 71–80. IEEE

  • Brandenburg B (2011) Scheduling and locking in multiprocessor real-time operating systems. Ph.D. thesis, University of North Carolina, Chapel Hill, NC

  • Brandenburg B (2014) The FMLP+: an asymptotically optimal real-time locking protocol for suspension-aware analysis. In: Proceedings of the 26th euromicro conference on real-time systems, pp 61–71. IEEE

  • Brandenburg B, Anderson J (2007) Feather-trace: a lightweight event tracing toolkit. In: Proceedings of the 3rd international workshop on operating systems platforms for embedded real-time applications, pp 19–28

  • Brandenburg B, Anderson J (2008) A comparison of the M-PCP, D-PCP, and FMLP on LITMUS\(^{{\text{RT}}}\). In: Proceedings of the 12th international conference on principles of distributed systems, pp 105–124. Springer

  • Brandenburg B, Anderson J (2008) An implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP real-time synchronization protocols in LITMUS\(^{{\text{ RT }}}\). In: Proceedings of the 14th IEEE international conference on embedded and real-time computing systems and applications, pp 185–194. IEEE

  • Brandenburg B, Anderson J (2010) Spin-based reader-writer synchronization for multiprocessor real-time systems. Real-Time Syst 46(1):25–87

    Article  MATH  Google Scholar 

  • Brandenburg B, Anderson J (2011) Real-time resource-sharing under clustered scheduling: mutex, reader-writer, and \(k\)-exclusion locks. In: Proceedings of the 9th ACM international conference on embedded software, pp 69–78. ACM

  • Brandenburg B, Anderson J (2013) The OMLP family of optimal multiprocessor real-time locking protocols. Des Autom Embed Syst 17(2):277–342

    Article  Google Scholar 

  • Burns A, Wellings A (2013) A schedulability compatible multiprocessor resource sharing protocol—MrsP. In: Proceedings of the 25th euromicro conference on real-time systems, pp 282–291. IEEE

  • Chang Y, Davis R, Wellings A (2010) Reducing queue lock pessimism in multiprocessor schedulability analysis. In: Proceedings of the 18th international conference on real-time and network systems, pp 99–108

  • Chen C, Tripathi S (1994) Multiprocessor priority ceiling based protocols. Tech. Rep. CS-TR-3252, University of Maryland

  • Courtois P, Heymans F, Parnas D (1971) Concurrent control with readers and writers. Commun ACM 14(10):667–668

    Article  Google Scholar 

  • Dijkstra E (1978) Two starvation free solutions to a general exclusion problem. EWD 625, Plataanstraat 5, 5671 Al Nuenen, The Netherlands

  • Elliott G, Anderson J (2013) An optimal k-exclusion real-time locking protocol motivated by multi-GPU systems. Real-Time Syst 49(2):140–170

    Article  MATH  Google Scholar 

  • Faggioli D, Lipari G, Cucinotta T (2010) The multiprocessor bandwidth inheritance protocol. In: Proceedings of the 22nd euromicro conference on real-time systems, pp 90–99. IEEE

  • Faggioli D, Lipari G, Cucinotta T (2012) Analysis and implementation of the multiprocessor bandwidth inheritance protocol. Real-Time Syst 48(6):789–825

    Article  MATH  Google Scholar 

  • Gai P, Di Natale M, Lipari G, Ferrari A, Gabellini C, Marceca P (2003) A comparison of MPCP and MSRP when sharing resources in the Janus multiple-processor on a chip platform. In: Proceedings of the 9th IEEE real-time and embedded technology and applications symposium, pp 189–198. IEEE

  • Gai P, Lipari G, Di Natale M (2001) Minimizing memory utilization of real-time task sets in single and multi-processor systems-on-a-chip. In: Proceedings of the 22nd IEEE real-time systems symposium, pp 73–83. IEEE

  • Garrido J, Zhao S, Burns A, Wellings A (2017) Supporting nested resources in MrsP. In: Proceedings of the Ada-Europe international conference on reliable software technologies, pp 73–86. Springer

  • Havender J (1968) Avoiding deadlock in multitasking systems. IBM Syst J 7(2):74–84

    Article  Google Scholar 

  • Herlihy M, Wing J (1990) Linearizability: a correctness condition for concurrent objects. ACM Trans Progr Lang Syst 12(3):463–492

    Article  Google Scholar 

  • Huang H, Pillai P, Shin K (2002) Improving wait-free algorithms for interprocess communication in embedded real-time systems. In: Proceedings of the general track of the annual conference on USENIX annual technical conference, pp 303–316. USENIX Association

  • Jarrett C, Ward B, Anderson J (2015) A contention-sensitive fine-grained locking protocol for multiprocessor real-time systems. In: Proceedings of the 23rd international conference on real-time networks and systems, pp 3–12. ACM

  • Joung Y (2000) Asynchronous group mutual exclusion. Distrib Comput 13(4):189–206

    Article  Google Scholar 

  • Keane P, Moir M (1999) A simple local-spin group mutual exclusion algorithm. In: Proceedings of the 18th annual ACM symposium on principles of distributed computing, pp 23–32. ACM

  • Lakshmanan K, Niz D, Rajkumar R (2009) Coordinated task scheduling, allocation and synchronization on multiprocessors. In: Proceedings of the 30th IEEE real-time systems symposium, pp 469–478. IEEE

  • Mellor-Crummey J, Scott M (1991) Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans Comput Syst 9(1):21–65

    Article  Google Scholar 

  • Nemati F, Behnam M, Nolte T (2011) Independently-developed real-time systems on multi-cores with shared resources. In: Proceedings of the 23rd euromicro conference on real-time systems, pp 251–261. IEEE

  • Nemitz C, Amert T, Anderson J (2017) Real-time multiprocessor locks with nesting: optimizing the common case. In: Proceedings of the 25th international conference on real-time networks and systems, pp 38–47. ACM

  • Nemitz C, Amert T, Anderson J (2018) Using lock servers to scale real-time locking protocols: chasing ever-increasing core counts. In: Proceedings of the 30th euromicro conference on real-time systems, pp 25:1–25:24. Schloss Dagstuhl–Leibniz-Zentrum für Informatik

  • Rajkumar R (1990) Real-time synchronization protocols for shared memory multiprocessors. In: Proceedings of the 10th international conference on distributed computing systems, pp 116–123. IEEE

  • Rajkumar R (1991) Synchronization in real-time systems: a priority inheritance approach. Kluwer Academic Publishers, Boston

    Book  MATH  Google Scholar 

  • Rajkumar R, Sha L, Lehoczky J (1988) Real-time synchronization protocols for multiprocessors. In: Proceedings of the 9th IEEE real-time systems symposium, pp 259–269. IEEE

  • Sha L, Rajkumar R, Lehoczky JP (1990) Priority inheritance protocols: an approach to real-time synchronization. IEEE Trans Comput 39(9):1175–1185

    Article  MathSciNet  MATH  Google Scholar 

  • SchedCAT: schedulability test collection and toolkit. http://www.mpi-sws.org/bbb/projects/schedcat. Accessed 31 May 2018

  • Takada H, Sakamura K (1995) Real-time scalability of nested spin locks. In: Proceedings of the 2nd IEEE international workshop on real-time computing systems and applications, pp 160–167. IEEE

  • Wang C, Takada H, Sakamura K (1996) Priority inheritance spin locks for multiprocessor real-time systems. In: Proceedings of the 2nd IEEE international symposium on parallel architectures, algorithms, and networks, pp 70–76. IEEE

  • Ward B (2015) Relaxing resource-sharing constraints for improved hardware management and schedulability. In: Proceedings of the 36th IEEE real-time systems symposium, pp 153–164. IEEE

  • Ward B (2016) Sharing non-processor resources in multiprocessor real-time systems. Ph.D. thesis, University of North Carolina, Chapel Hill, NC

  • Ward B, Anderson J (2012) Supporting nested locking in multiprocessor real-time systems. In: Proceedings of the 23rd euromicro conference on real-time systems, pp 223–232. IEEE

  • Ward B, Anderson J (2013) Fine-grained multiprocessor real-time locking with improved blocking. In: Proceedings of the 21st international conference on real-time networks and systems, pp 67–76. ACM

  • Ward B, Anderson J (2014) Multi-resource real-time reader/writer locks for multiprocessors. In: Proceedings of the 28th IEEE international parallel and distributed processing symposium, pp 177–186. IEEE

  • Ward B, Elliott G, Anderson J (2012) Replica-request priority donation: a real-time progress mechanism for global locking protocols. In: Proceedings of the 18th IEEE international conference on embedded and real-time computing systems and applications, pp 280–289. IEEE

  • Wieder A, Brandenburg B (2013) On spin locks in AUTOSAR: blocking analysis of FIFO, unordered, and priority-ordered spin locks. In: Proceedings of the 34th IEEE real-time systems symposium, pp 45–56. IEEE

  • Wieder A, Brandenburg B (2014) On the complexity of worst-case blocking analysis of nested critical sections. In: Proceedings of the 35th IEEE real-time systems symposium, pp 106–117. IEEE

  • Yang M, Wieder A, Brandenburg B (2015) Global real-time semaphore protocols: a survey, unified analysis, and comparison. In: Proceedings of the 36th IEEE real-time systems symposium, pp 1–12. IEEE

  • Zhao S, Garrido J, Burns A, Wellings A (2017) New schedulability analysis for MrsP. In: Proceedings of the 23rd IEEE international conference on embedded and real-time computing systems and applications, pp 1–10. IEEE

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Catherine E. Nemitz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Work supported by NSF Grants CNS 1409175, CPS 1446631, CNS 1563845, and CNS 1717589, AFOSR Grant FA9550-14-1-0161, ARO Grant W911NF-14-1-0499, ARO Grant W911NF-17-1-0294, and funding from General Motors. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGS 1650116. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Additional details

Additional details

This appendix provides additional details on several claims made in the body of the paper.

1.1 Tight blocking bounds for the RW-RNLP*

To show that each blocking bound proven for the RW-RNLP* is tight, we show that each worst-case bound can actually occur by means of examples. An example corresponding to each lemma and theorem about the RW-RNLP* is presented below in the order in which the lemmas and theorems appear in Sect. 4.4. In each example, requests are numbered in the order in which they were issued.

Lemma 5 bounds the acquisition delay that a write request can experience after becoming entitled to \(L ^r_{{\textit{max}}}\).

Fig. 16
figure 16

A simple example that shows worst-case acquisition delay for a read request and the acquisition delay a write may experience after becoming entitled

Example 6

As shown in Fig. 16, write request \({\mathcal {R}}^w_2\), issued just after \({\mathcal {R}}^r_1\), is immediately entitled and can experience \(L ^r_{{\textit{max}}}\) acquisition delay. This is exactly the upper bound presented in Lemma 5.

Theorem 3 bounds the acquisition delay a read request can experience to \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\).

Example 7

In Fig. 16, read request \({\mathcal {R}}^r_3\) experiences an acquisition delay of up to \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) time units. It was issued after the issuance of requests \({\mathcal {R}}^r_1\) and \({\mathcal {R}}^w_2\), all for the same resource. \({\mathcal {R}}^r_3\) cannot be satisfied initially, as \({\mathcal {R}}^w_2\) is entitled. Therefore it waits for up to \(L ^r_{{\textit{max}}}\) time units for \({\mathcal {R}}^r_1\) to complete. Once \({\mathcal {R}}^w_2\) is satisfied, \({\mathcal {R}}^r_3\) waits for up to \(L ^w_{{\textit{max}}}\) time units for \({\mathcal {R}}^w_2\) to complete before acquiring the resource.

According to Lemma 6, a write request \({\mathcal {R}}^w_i\) may experience up to \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) blocking after becoming the earliest-timestamped active write request for each resource in \(D_i\).

Fig. 17
figure 17

An issuance order which may cause the maximum blocking after a write request \({\mathcal {R}}^w_3\) becomes the earliest-timestamped active write request for each of its resources, here just \({\ell } _2\)

Example 8

Similarly to the previous examples, in Fig. 17, write request \({\mathcal {R}}^w_3\) can experience the worst-case delay stated in Lemma 6. Because requests were issued in increasing index order, \({\mathcal {R}}^w_3\) can potentially block for the entire critical sections of \({\mathcal {R}}^w_1\) and \({\mathcal {R}}^r_2\), which can be as high as \(L ^w_{{\textit{max}}}\) and \(L ^r_{{\textit{max}}}\), respectively.

The earliest-timestamped non-nested write request with no nested requests present can experience blocking of \(L ^r_{{\textit{max}}}\). This upper-bound proven in Lemma 7 is shown to be tight in Example 6 with \({\mathcal {R}}^w_2\) as depicted in Fig. 16.

Lemma 8 bounds the time a nested write request must wait before becoming the earliest-timestamped write request for all of its resources to \(2L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\). The following example shows this bound is tight.

Fig. 18
figure 18

A series of read and write requests that illustrate the worst-case acquisition delay for nested and non-nested write requests

Example 9

As shown in Fig. 18a, \({\mathcal {R}}^{w,n}_4\) is not the earliest-timestamped active write request for each of \(D_4=\{{\ell } _2,{\ell } _3\}\) when it is issued. In fact, it must wait until \({\mathcal {R}}^w_3\) has completed. Given that each of these requests could have been issued immediately after each other and that \({\mathcal {R}}^w_4\) will need to wait until \({\mathcal {R}}^w_1\), \({\mathcal {R}}^r_2\), and \({\mathcal {R}}^w_3\) complete, \({\mathcal {R}}^w_4\) may wait up to \(2L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) time units to become the earliest-timestamped active write request.

Lemma 9 has two cases for how soon a non-nested write request \({\mathcal {R}}^{w,nn}_i\) will become the earliest-timestamped request for each of its resources. Case (i) does not need an example: the worst-case delay for \({\mathcal {R}}^{w,nn}_i\) becoming the earliest-timestamped active write request in the RW-RNLP* for \(D_i\) is zero when no nested requests are active. Case (ii) bounds this time to \(4L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\) in the presence of nested requests.

Example 10

Consider \({\mathcal {R}}^w_5\) in Fig. 18. This non-nested write request may wait for up to \(4L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\) time units to become the earliest-timestamped request for \(D_5 = \{{\ell } _3\}\). To become the earliest-timestamped request for \(D_5\), \({\mathcal {R}}^w_5\) must wait for \({\mathcal {R}}^w_4\) to complete, which in turn must wait for \({\mathcal {R}}^w_3\) to complete. As shown in Fig. 18a, \({\mathcal {R}}^w_3\) may wait for up to \(L ^w_{{\textit{max}}}\) time units to become entitled (the time for \({\mathcal {R}}^w_1\) to complete, after which \({\mathcal {R}}^r_2\) is no longer entitled). After \({\mathcal {R}}^w_3\) becomes entitled, \({\mathcal {R}}^r_6\) is issued, as shown in (b). After up to \(L ^r_{{\textit{max}}}\) time units after \({\mathcal {R}}^w_3\) becomes entitled, \({\mathcal {R}}^r_2\) completes and \({\mathcal {R}}^w_3\) is satisfied. \({\mathcal {R}}^w_3\) may execute for just under \(L ^w_{{\textit{max}}}\) time units before \({\mathcal {R}}^w_7\) and \({\mathcal {R}}^r_8\) are issued, as shown in (c). Once \({\mathcal {R}}^w_3\) completes, \({\mathcal {R}}^w_4\) may still wait for up to \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) time units before becoming satisfied (for \({\mathcal {R}}^w_7\) and \({\mathcal {R}}^r_8\) to complete). After \({\mathcal {R}}^w_4\) is satisfied, it may execute for \(L ^w_{{\textit{max}}}\) time units, after which \({\mathcal {R}}^w_5\) is finally the earliest-timestamped request for \(D_5\) after waiting for up to \(4L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\) time units.

Theorem 4 presents four bounds for write requests. Non-nested write requests may experience up to \(L ^r_{{\textit{max}}}\) time units of acquisition delay if no nested requests are active (illustrated in Fig. 16 and described in Example 6). If nested read requests may be present but no nested write requests are active, non-nested write requests may experience up to \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) time units of acquisition delay, as illustrated in Fig. 17 and described in Example 8. The third bound presented in Theorem 4 is that non-nested write requests in the presence of nested requests may experience up \(5L ^w_{{\textit{max}}}+ 3L ^r_{{\textit{max}}}\) time units of acquisition delay (illustrated below). Finally, nested write requests may experience acquisition delay of up to \(3L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\) (also illustrated below).

Example 11

As illustrated by Fig. 18 and Example 10, \({\mathcal {R}}^w_5\) may wait for \(4L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\) time units to become the earliest-timestamped request for its resources. Suppose just before \({\mathcal {R}}^w_4\) completes, \({\mathcal {R}}^w_9\) and \({\mathcal {R}}^r_{10}\) are issued, as illustrated in Fig. 18d. (This is similar to the situation in Example 10 when \({\mathcal {R}}^w_7\) and \({\mathcal {R}}^r_8\) were issued just before the completion of \({\mathcal {R}}^w_3\).) \({\mathcal {R}}^w_5\) may indeed need to wait an additional \(L ^w_{{\textit{max}}}+ L ^r_{{\textit{max}}}\) time units before being satisfied, making its total acquisition delay \(5L ^w_{{\textit{max}}}+ 3L ^r_{{\textit{max}}}\) time units.

Figure 18 also illustrates that a nested write request, namely \({\mathcal {R}}^w_4\), may experience acquisition delay of \(3L ^w_{{\textit{max}}}+ 2L ^r_{{\textit{max}}}\). Indeed, \({\mathcal {R}}^w_4\) waits for the completion of three write requests (\({\mathcal {R}}^w_1\), \({\mathcal {R}}^w_3\), and \({\mathcal {R}}^w_7\)), which may only barely overlap, and two read phases (those of \({\mathcal {R}}^r_2\) and \({\mathcal {R}}^r_8\)) that do not overlap with any of the write requests.

1.2 Corner case for nested read requests

Fig. 19
figure 19

Illustrates the edge case in which a write request (\({\mathcal {R}}^w_4\)) would need to wait unnecessarily behind a nested read request (\({\mathcal {R}}^r_3\)) if the extra code step had not been added in Listing 5

If the extra phase in Lines 3-6 of Listing 5 is not included for the \(\textsc {R*\_Lock}^\textit{n}\) routine, a potential edge case exists, as demonstrated in Fig. 19. In this edge case, write requests suffer unnecessary transitive blocking caused by read requests incorrectly marking themselves entitled.

In this scenario, read request \({\mathcal {R}}^{r,nn}_1\) is satisfied and write request \({\mathcal {R}}^{w,nn}_2\) is entitled when read request \({\mathcal {R}}^{r,n}_3\) is issued. At this point, \({\mathcal {R}}^{r,nn}_1\) has completed the \(\textsc {R*\_Lock}^\textit{nn}\) routine and \({\mathcal {R}}^{w,nn}_2\) is waiting at Line 13 for its requested resource to become available.

Without Lines 3-6, \({\mathcal {R}}^{r,n}_3\) immediately modifies \({\ell } _1\)’s rin variable, effectively marking itself entitled, and waits at Line 12 for \({\mathcal {R}}^{w,nn}_2\) to complete. When \({\mathcal {R}}^{w,nn}_4\) is released, it must wait at Line 13 (Listing 4) for \({\mathcal {R}}^{r,n}_3\) to complete, even though it should have been immediately satisfied following the rules of the fast RW-RNLP.

Using the implementation given in Listing 5, however, \({\mathcal {R}}^{r,n}_3\) must wait at Line 6 due to \({\mathcal {R}}^{w,nn}_2\) having marked itself present in the bottom byte of \({\ell } _1\)’s rin variable. This prevents \({\mathcal {R}}^{r,n}_3\) from modifying \({\ell } _1\)’s rin variable before it becomes entitled. Therefore, when \({\mathcal {R}}^{w,nn}_4\) is released, the condition at Line 13 in Listing 4 is true for its single resource \({\ell } _2\), so it is immediately satisfied.

1.3 Linearizability

Herlihy and Wing presented linearizability as a correctness condition for concurrent objects that “provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response.” Linearizability is a local property; if the operations on each object can be linearized, the system as a whole is considered to be linearizable (Herlihy and Wing 1990). In the following discussion, we focus on the fast RW-RNLP with the RW-RNLP* as our example, but the fast RW-RNLP with the \(\text {R}^3\text {LP}\) is linearizable as well.

In the body of the paper, we claim that each routine we presented has a linearization point; this is the point at which the routine can be considered to take effect (atomically). For the non-nested routines, these points are clear. A read request enqueues atomically at Line 3 (Listing 4) and can be viewed as executing the lock function as a whole atomically at the end of the procedure. Similarly for \(\textsc {R*\_Unlock}^\textit{nn}\), the routine’s linearization point can be considered to be at its invocation. The non-nested write routines function similarly, with linearization points at the end of the lock routine’s execution and the beginning of the execution of the unlock routine.

The nested routines grant access to groups of resources at a time (Listing 5). Considering the routines themselves, each call of the lock routine can be said to linearize to the last point in its execution. That is, no access to any of the requested resources occurs before that point in time, and the order of request accesses to those resources is exactly the order of termination of the lock routines. (Recall that linearization is defined relative to a specific resource; there may be requests for other resources occurring concurrently. These requests are not granted access clearly before or after the non-conflicting request. Again, linearization is a local property and there may be multiple legal sequential histories (Herlihy and Wing 1990).) Just like non-nested requests, the invocation of each unlock routine can be considered to be the linearization point of the entire routine.

Fig. 20
figure 20

Illustration of a series of lock and unlock calls by requests \({\mathcal {R}}_1\) through \({\mathcal {R}}_5\) with the linearization point of each operation shown with a circle

An example of the linearization of several objects is shown in Fig. 20. An operation invocation op on a set of shared resources \(D_i\) by request \({\mathcal {R}}_i\) is indicated by \(D_i~ \textit{op}~ {\mathcal {R}}_i\) above a line whose length corresponds to the duration of time each invocation takes. The linearization point of each operation’s execution is indicated with a circle at some point during its execution. As discussed above, this point can always be selected at the end of the execution of a lock operation and at the beginning of the execution of an unlock operation. In Fig. 20, time moves forward to the right.

Example 12

In Fig. 20, \({\mathcal {R}}^{w,n}_1\) is the first to begin executing the lock logic to gain mutually exclusive access to \(D_1 = \{{\ell } _1, {\ell } _2\}\). Then \({\mathcal {R}}^{w,nn}_2\) is issued for \(D_2 = \{{\ell } _2\}\). \({\mathcal {R}}^{w,nn}_2\) calls \(\textsc {W*\_Lock}^\textit{nn}\) for \({\ell } _2\). It is granted access to \({\ell } _2\) first (at the end of the lock routine), and then enters its critical section before calling \(\textsc {W*\_Unlock}^\textit{nn}\).

During \({\mathcal {R}}^{w,nn}_2\)’s execution of the lock operation, \({\mathcal {R}}^{r,n}_3\) invoked the lock call for \(D_3 = \{{\ell } _1, {\ell } _2\}\).

At some point \({\mathcal {R}}^{w,nn}_2\) completes its critical section and invokes the unlock routine. The unlock routine can be linearized to the point indicated in the Fig. 20, which clearly comes before the point at which \({\mathcal {R}}^{w,n}_1\) or \({\mathcal {R}}^{r,n}_3\) has linearized its respective lock call. Note that this properly reflects the mutually exclusive access for \({\mathcal {R}}^{w,nn}_2\) for \({\ell } _2\); a request is considered to access the resource between the linearization point for its call to the lock routine and the linearization point for its call to the unlock routine.

At a later point in time, \({\mathcal {R}}^{w,n}_1\) finishes execution of the lock routine, enters its critical section, and then calls the unlock routine.

While \({\mathcal {R}}^{w,n}_1\) is executing the unlock routine for \(D_1=\{{\ell } _1,{\ell } _2\}\), \({\mathcal {R}}^{r,nn}_4\) and \({\mathcal {R}}^{w,nn}_5\) are issued for \(D_4=\{{\ell } _1\}\) and \(D_5=\{{\ell } _1\}\), respectively.

At some point in time after \({\mathcal {R}}^{w,n}_1\) has updated the writer bits of \({\ell } _1\)’s rin variable, \({\mathcal {R}}^{r,nn}_4\) becomes satisfied and completes its call to the lock routine. Similarly, after \({\mathcal {R}}^{w,n}_1\) has updated \({\ell } _2\)’s rin variable, \({\mathcal {R}}^{r,n}_3\) becomes satisfied and completes its call to the lock routine. Note that overlapping critical sections for \({\ell } _1\) is expected behavior for these requests; read requests may overlap.

Once the read requests finish accessing their respective resources, they both call the unlock routine. At a future point in time, \({\mathcal {R}}^{w,nn}_5\) completes its call to the lock routine and can begin its critical section. Note that the linearization points correctly reflect mutually exclusive access for this request for \({\ell } _1\).

1.4 Constraints used in schedulability study

As mentioned in the body of the paper, we tightened the calculated blocking from the the worst-case acquisition-delay bounds presented in Table 1 by applying several constraints. We illustrate these with an example using a ticket lock, and then discuss how these can be applied to more complex protocols.

Example 13

Suppose that we are working with an application with four tasks (\(n=4\)) and six processors (\(m=6\)) in which all tasks have the same period and access the same resource. The access of this resource is protected by a ticket lock. The requests issued by these tasks are \({\mathcal {R}}^w_1,...,{\mathcal {R}}^w_4\), with critical-section lengths \(L_1=50\mu \)s, \(L_2=60\mu \)s, \(L_3=30\mu \)s, and \(L_4=20\mu \)s. When calculating the worst-case blocking that the task issuing request \({\mathcal {R}}^w_1\) may experience, we can use the analytical bound for a ticket lock of \((m-1)L_{max} ^w\), where \(L_{max} ^w=60\mu \)s. Then the worst-case blocking for \({\mathcal {R}}^w_1\) is bounded by \(5 \cdot 60 = 300\mu \)s. However this blocking can never occur in practice.

1.4.1 Period-based constraints

Based on the period of each task in the system, we can compute how many jobs (and thus requests) of that task may be active while a given request is active. Then we can tighten our analysis to select only the highest instances of critical sections that may delay a given request.

Example 13 (cont’d)

As all tasks have the same period, at most two jobs of a given task can overlap with the job of interest. Thus, the job that issues \({\mathcal {R}}^w_2\) can only issue it twice while \({\mathcal {R}}^w_1\) is waiting or executing. We can tighten our analysis to select the top \(m-1\) instances of critical sections that may delay \({\mathcal {R}}^w_1\). Our new bound on the worst-case blocking after applying period-based constraints is \(60+60+30+30+20=200\mu \)s. Note that we do not count \({\mathcal {R}}^w_1\) as potentially increasing its own blocking.

1.4.2 FIFO constraints

The write requests handled by a ticket lock are enqueued in a FIFO queue. Thus, each request can only delay a given request once.

Example 13 (cont’d)

By imposing FIFO constraints, we can bound worst-case blocking to \(60+30+20=110\mu \)s.

Example 14

We now modify the task system presented in Example A.4 by adding a read request \({\mathcal {R}}^r_5\) with \(L_5=3\mu \)s. The task that issues \({\mathcal {R}}^r_5\) has a shorter period such that it could be issued six times while a given write request is active. We switch from using a ticket lock to using a PF-TL. When considering the blocking \({\mathcal {R}}^w_1\) may experience, the above analysis still holds for the write requests. Now we must account for the blocking \({\mathcal {R}}^w_1\) may experience due to read requests.

Note that when we consider locking protocols with both read and write requests, FIFO constraints may apply to the write requests but will not apply to read requests; all reader/writer protocols we have discussed handle read requests separately from write requests to yield constant-time blocking. Because of this, a given read request may execute multiple times before a write request of interest. However, we can constrain the total number of read requests that must be included by period-based constraints and a new constraint.

1.4.3 Read-write constraints

Each protocol examined in this paper functions by alternative write and read phases in some manner. Thus, the number of read phases by which a request is blocked is constrained by the number of write phases possible, which may be limited by the above constraints.

Example 14 (cont’d)

In this scenario, only three write requests could contribute to the blocking \({\mathcal {R}}^w_1\) experiences. In the worst case, before each of these write requests (including \({\mathcal {R}}^w_1\)), a read phase could occur. Thus, we are limited to the four highest read critical-section instances. Because our read request could occur six times, its critical section can be counted for all four instances. Thus the blocking \({\mathcal {R}}^w_1\) experiences due to read requests is at most \(4 \cdot 3 = 12\mu \)s and the total blocking is at most \(110 + 12=122\mu \)s.

While we have focused on the blocking a write request may experience in the above examples, the same constraints may be applied when calculating the write and read critical sections that may block a read request.

1.4.4 Contention constraints

For both fast RW-RNLP variants, the blocking bound for non-nested write requests depends on contention. Recall that in Theorem 1 the contending requests that contributed to blocking were only other non-nested write requests for the same resource. Therefore, for each task system and each request, we use the number of contending non-nested write requests for the value of contention.

1.4.5 Constraints applied to protocols

Recall that we evaluated four protocols: the PF-TL, the RW-RNLP, the fast RW-RNLP with the \(\text {R}^3\text {LP}\), and the fast RW-RNLP with the RW-RNLP*. The PF-TL was applied as group lock for a static group of all resources.

When the period-based, FIFO, and read-write constraints are applied to the group PF-TL, the blocking computed for read and write requests may be tightened. The same analysis holds for the RW-RNLP. While the RW-RNLP is a fine-grained locking protocol, its worst-case blocking for write requests is still \((m-1)(L_{max} ^w + L_{max} ^r)\) due to potential transitive blocking chains between nested write requests. Computing these transitive blocking chains is an expensive process, and in a system with randomly requested resources, write expansion and potential chains imply that it is likely that each write request could delay any other write request. However, the FIFO manner in which the RW-RNLP functions allows us to use the FIFO constraint to limit the contribution of each write request to blocking others only once. Without expensive tighter analysis that considers possible transitive blocking chains, the tightened blocking bounds of the RW-RNLP are identical to those of the group PF-TL.

FIFO constraints can also be applied to the fast RW-RNLP, but the modular structure means that FIFO constraints cannot be applied to write requests of the opposite type. For example, a given nested write request may enter the RNLP and then be allowed to execute by the global arbitration mechanism multiple times while a non-nested write request waits behind other non-nested write requests in its ticket lock. The same scenario can occur for nested write requests. Thus the FIFO constraints can only be applied to nested write requests which share at least one resource or non-nested write requests for the same resource. For any write requests to which FIFO constraints cannot be applied, the period-based constraints can still be applied. Due to the way in which requests are grouped by the \(\text {R}^3\text {LP}\), when we account for blocking in the fast RW-RNLP with the \(\text {R}^3\text {LP}\), we add the highest critical-section length instances of non-nested write requests and nested write requests separately to get a tighter bound.

Based on the number of requests being generated in each scenario, we expect that in some scenarios, the manner in which the FIFO constraints can be applied in the context of the group PF-TL may greatly reduce the number of critical sections that may be counted toward blocking. (Because it is a group lock, each write can only effect the request of interest once by the FIFO constraint.) The FIFO constraint cannot be applied as broadly to the fine-grained nesting protocols, as some requests may delay the request of interest multiple times. This argues for tighter blocking analysis, though getting exact blocking analysis for nested resource accesses is NP-hard (Wieder and Brandenburg 2014).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Nemitz, C.E., Amert, T. & Anderson, J.H. Real-time multiprocessor locks with nesting: optimizing the common case. Real-Time Syst 55, 296–348 (2019). https://doi.org/10.1007/s11241-019-09328-w

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11241-019-09328-w

Keywords

Navigation