Abstract
Dataflow applications, such as machine learning algorithms, can run for days, making it desirable to have assurances that they will work correctly. Current tools are not good enough: too often the interactions between tasks are not type-safe, leading to undesirable runtime errors. This paper presents a new declarative Haskell Embedded DSL (eDSL) for dataflow programming: \(\textsf {CircuitFlow}\). Defined as a Symmetric Monoidal Preorder (SMP) on data that models dependencies in the workflow, it has a strong mathematical basis, refocusing on how data flows through an application, resulting in a more expressive solution that not only catches errors statically, but also achieves competitive run-time performance. In our preliminary evaluation, \(\textsf {CircuitFlow}\) outperforms the industry-leading Luigi library of Spotify by scaling better with the number of inputs. The innovative creation of \(\textsf {CircuitFlow}\) is also of note, exemplifying how to create a modular eDSL whose semantics necessitates effects, and where storing complex type information for program correctness is paramount.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 265–283. USENIX Association, Savannah, November 2016
Apache: Airflow. http://airflow.apache.org
Bahr, P., Hvitved, T.: Compositional data types. In: Proceedings of the Seventh ACM SIGPLAN Workshop on Generic Programming, WGP 2011, pp. 83–94. Association for Computing Machinery, New York (2011)
Bernardy, J.P., Spiwack, A.: Evaluating linear functions to symmetric monoidal categories. In: Proceedings of the 14th ACM SIGPLAN International Symposium on Haskell, Haskell 2021, pp. 14–26. Association for Computing Machinery, New York (2021)
Chambers, C., et al.: Easy, efficient data-parallel pipelines. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2 Penn Plaza, Suite 701 New York, NY 10121–070, pp. 363–3751 (2010)
Coecke, B., Fritz, T., Spekkens, R.W.: A mathematical theory of resources. Inf. Comput. 250, 59–86 (2016)
Dennis, J.B., Misunas, D.P.: A preliminary architecture for a basic data-flow processor. In: Proceedings of the 2nd Annual Symposium on Computer Architecture, ISCA 1975, pp. 126–132. Association for Computing Machinery, New York (1974)
Eisenberg, R.A., Vytiniotis, D., Peyton Jones, S., Weirich, S.: Closed type families with overlapping equations. In: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2014, pp. 671–683. Association for Computing Machinery, New York (2014)
Eisenberg, R.A., Weirich, S.: Dependently typed programming with singletons. In: Proceedings of the 2012 Haskell Symposium, Haskell 2012, pp. 117–130. Association for Computing Machinery, New York (2012)
Erdmann, M., Fischer, B., Fischer, R., Rieger, M.: Design and execution of make-like, distributed analyses based on spotify’s pipelining package Luigi. J. Phys. Conf. Ser. 898, 072047 (2017)
Evans, R., Frohlich, S., Wang, M.: CircuitFlow: a domain specific language for dataflow programming (with appendices) (2021)
Fokkinga, M.: Monadic maps and folds for arbitrary datatypes. Memoranda Informatica (94–28), June 1994. Imported from EWI/DB PMS [db-utwente:tech:0000003538]
Gibbons, J., Wu, N.: Folding domain-specific languages: deep and shallow embeddings (functional pearl). In: Proceedings of the ACM SIGPLAN International Conference on Functional Programming, ICFP 49, August 2014
Gonzalez, G.: Pipes. https://hackage.haskell.org/package/pipes
Hils, D.D.: Visual languages and computing survey: data flow visual programming languages. J. Vis. Lang. Comput. 3, 69–101 (1992)
Hughes, J.: Generalising monads to arrows. Sci. Comput. Program. 37(1), 67–111 (2000)
Inc, A.: Quartz composer user guide, July 2007
Johann, P., Ghani, N.: Foundations for structured programming with GADTs. In: Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, pp. 297–308. Association for Computing Machinery, New York (2008)
Kiselyov, O., Lämmel, R., Schupke, K.: Strongly typed heterogeneous collections. In: Proceedings of the 2004 ACM SIGPLAN Workshop on Haskell, Haskell 2004, pp. 96–107. Association for Computing Machinery, New York (2004)
Kotliar, M., Kartashov, A.V., Barski, A.: CWL-Airflow: a lightweight pipeline manager supporting Common Workflow Language. GigaScience 8(7), giz084 (2019)
Lampa, S., Dahlö, M., Alvarsson, J., Spjuth, O.: SciPipe: a workflow library for agile development of complex and dynamic bioinformatics pipelines. GigaScience 8(5), giz044 (2019)
Lee, E.A., Parks, T.M.: Dataflow process networks. Proc. IEEE 83(5), 773–801 (1995)
Maries, I.C.: Time. https://pypi.org/project/pytest-benchmark/
Marlow, S., Brandy, L., Coens, J., Purdy, J.: There is no fork: an abstraction for efficient, concurrent, and concise data access. In: Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming, ICFP 2014, pp. 325–337. Association for Computing Machinery, New York (2014)
Marlow, S., Peyton Jones, S., Kmett, E., Mokhov, A.: Desugaring Haskell’s do-notation into applicative operations. SIGPLAN Not. 51(12), 92–104 (2016)
McBride, C.: Functional pearl: Kleisli arrows of outrageous fortune. J. Funct. Program (2011, accepted for publication)
Mcbride, C., Paterson, R.: Applicative programming with effects. J. Funct. Program. 18(1), 1–13 (2008)
Murray, D., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), pp. 439–455. ACM, November 2013
O’Sullivan, B.: Criterion. http://www.serpentine.com/criterion/
Parès, Y., Bernardy, J.P., Eisenberg, R.A.: Composing effects into tasks and workflows. In: Proceedings of the 13th ACM SIGPLAN International Symposium on Haskell, Haskell 2020, pp. 80–94. Association for Computing Machinery, New York (2020)
Paterson, R.: A new notation for arrows. In: International Conference on Functional Programming, pp. 229–240. ACM Press, September 2001
Schrijvers, T., Peyton Jones, S., Chakravarty, M., Sulzmann, M.: Type checking with open type functions. In: Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming, ICFP 2008, pp. 51–62. Association for Computing Machinery, New York (2008)
Spotify: Spotify: Luigi. https://github.com/spotify/luigi
Spotify: Tasks, April 2020. https://luigi.readthedocs.io/en/stable/tasks.html
Svenningsson, J., Axelsson, E.: Combining deep and shallow embedding of domain-specific languages. Comput. Lang. Syst. Struct. 44, 143–165 (2015). sI: TFP 2011/12
Swierstra, W.: Data types á la carte. J. Funct. Program. 18(4), 423–436 (2008)
Wadler, P.: The expression problem, November 1998
Willis, J., Wu, N., Pickering, M.: Staged selective parser combinators. Proc. ACM Program. Lang. 4(ICFP), 1–30 (2020)
Yorgey, B.A., Weirich, S., Cretin, J., Peyton Jones, S., Vytiniotis, D., Magalhães, J.P.: Giving haskell a promotion. In: Proceedings of the 8th ACM SIGPLAN Workshop on Types in Language Design and Implementation, TLDI 2012, pp. 53–66. Association for Computing Machinery, New York (2012)
Yu, Y., et al.: DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI 2008, pp. 1–14. USENIX Association, USA (2008)
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, USA (2010)
Acknowledgements
The authors would like to thank Jamie Willis for his insights while creating \(\textsf {CircuitFlow}\) and the anonymous reviewers for their constructive and helpful comments.
The work is partly supported by EPSRC Grant EXHIBIT: Expressive High-Level Languages for Bidirectional Transformations (EP/T008911/1) and Royal Society Grant Bidirectional Compiler for Software Evolution (IESR3170104).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Evans, R., Frohlich, S., Wang, M. (2022). \(\textsf {CircuitFlow}\): A Domain Specific Language for Dataflow Programming. In: Cheney, J., Perri, S. (eds) Practical Aspects of Declarative Languages. PADL 2022. Lecture Notes in Computer Science(), vol 13165. Springer, Cham. https://doi.org/10.1007/978-3-030-94479-7_6
Download citation
DOI: https://doi.org/10.1007/978-3-030-94479-7_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-94478-0
Online ISBN: 978-3-030-94479-7
eBook Packages: Computer ScienceComputer Science (R0)