One key feature of the Data Processing Library is its set of compilation patterns. These patterns guide you in implementing incremental, distributed compilers. Each task executes one compiler, which can consist of your code alone or of your code combined with one of the provided patterns.
There are two types of patterns:
- Functional patterns: provide specific interfaces that guide you to structure the compiler in a more constrained way. In these patterns, Spark is hidden inside the pattern implementation, so the compiler focuses on the business logic while the library takes care of the distributed processing and incremental compilation details.
- Spark RDD-based patterns: expose Spark RDDs, allowing the compiler implementation to perform parallel operations on data and metadata using Spark, such as join, cogroup, filter, or map. In these patterns, the interfaces are less rigid, and you may need to actively support incremental compilation yourself.
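To make the distinction concrete, the following Scala sketch contrasts the two styles. The trait names and signatures here are simplified assumptions for illustration only, not the library's actual API:

```scala
import org.apache.spark.rdd.RDD

// Hypothetical, simplified interfaces -- not the library's actual API.

// Functional style: you implement a per-item transformation; the
// library drives Spark for you and decides which inputs actually
// need recompiling, so incrementality comes for free.
trait FunctionalCompilerSketch[In, Out] {
  def compileItem(key: String, input: In): Out
}

// RDD-based style: the library hands you Spark RDDs, and you are
// responsible for the parallel operations (join, cogroup, filter,
// map, ...) and, where needed, for supporting incremental
// compilation yourself.
trait RddCompilerSketch[In, Out] {
  def compile(input: RDD[(String, In)]): RDD[(String, Out)]
}
```

The trade-off is flexibility versus effort: the RDD-based style lets you express global algorithms across the whole dataset, while the functional style keeps Spark and incremental bookkeeping out of your code.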
Table 1: Compilation Patterns Overview
| Compiler Class | Incremental Processing | References to Other Tiles | Global Algorithms | Functional or RDD-Based | Complexity |
|---|---|---|---|---|---|
| DirectCompiler | Yes | No | No | Functional | Simple |
| MapGroupCompiler | Yes | No | No | Functional | Simple |
| RefTreeCompiler | Yes | Yes | No | Functional | Medium |
| NonIncrementalCompiler | No | Yes | Yes | RDD | Simple |
| DepCompiler | Partially | No | Yes | RDD | Medium |
| IncrementalDepCompiler | Yes | No | Yes | RDD | Complex |
Note: Where possible, use the functional patterns rather than the Spark RDD-based patterns.