Sometimes, you may have tasks that only need a subset of the input layer partitions to produce their output layers. In such cases, it is usually desirable to exclude unnecessary partition keys as early as possible in the execution process to avoid wasting CPU, memory, and network bandwidth.
While you can manually filter the input keys in the compiler's front-end implementation for most compilation patterns, for some patterns, such as the RefTreeCompiler, it can be difficult to decide whether a key should be filtered, because you typically do not know whether a subject partition or a referenced partition is being processed.
For these cases, the library provides a way to configure a partition filter to select partition keys and metadata from the list of partitions to process. This configuration works as follows:
- For the RefTreeCompiler, only the subject partitions are filtered. This is something to remember, particularly if an input layer is itself the product of an upstream compiler in a multi-compiler driver task with filtering applied. If a compiler down the chain references neighbor partition names from a layer produced by an upstream compiler, make sure that the filtering is permissive enough to output all partitions to be referenced by any following compilation in the driver task. You can use the byId executor config to configure a different filter for a compiler.
- The here.platform.data-processing.executors.partitionKeyFilters configuration is part of fingerprints, which means that changing this configuration triggers a non-incremental run for the compilation to remain deterministic.
- Partition key filters can also be used with the DeltaSet interface. Partition key filters defined in here.platform.data-processing.executors.partitionKeyFilters are used in all query and readBack DeltaSet transformations to filter the input partitions.
Partition key filtering can be configured in application.conf. For example, this filter configures a task to process only partitions contained within a given latitude/longitude bounding box:
here.platform.data-processing.executors.partitionKeyFilters = [
  {
    className = "BoundingBoxFilter"
    param.boundingBox { north = 24.8, south = 24.68, east = 121.8, west = 121.7 }
  }
]
The root of the property is a list. Multiple filters specified at this level are combined as their union (OR logic). To combine them with AND logic, put them under a single AndFilter at the root.
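As a sketch of the AND case, the following configuration places two bounding boxes under a single AndFilter at the root, so that only partitions falling inside both boxes are processed. The param.operands key follows the AndFilter usage shown in the layer-specific example in this section; the coordinates of the second box are made up for illustration.

```hocon
here.platform.data-processing.executors.partitionKeyFilters = [
  {
    className = "AndFilter"
    param.operands = [
      {
        className = "BoundingBoxFilter"
        param.boundingBox { north = 24.8, south = 24.68, east = 121.8, west = 121.7 }
      },
      {
        # Hypothetical second box that partially overlaps the first one.
        className = "BoundingBoxFilter"
        param.boundingBox { north = 24.75, south = 24.6, east = 121.9, west = 121.75 }
      }
    ]
  }
]
```

Because the root list contains a single entry, no OR logic is applied at the root, and the result is the intersection of the two boxes.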
This is the list of built-in filters:
BoundingBoxFilter
AllowListFilter
AndFilter
OrFilter
NotFilter
A custom filter can also be applied from applications by extending PartitionKeyFilter. These filters return either true or false from shouldProcess. Boolean logic is applied to combine them up to the root filter, such as:
- Or of two bounding boxes is a union.
- And of two bounding boxes is an intersection.
- Not of a bounding box means that only partitions outside of the underlying bounding box are considered.
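These combinations can be nested. The sketch below selects partitions that fall inside either of two boxes but outside a third, excluded box. The NotFilter uses the param.operand key that appears in the layer-specific example in this section. That OrFilter accepts param.operands in the same way as AndFilter is an assumption, and all coordinates other than the first box are illustrative only.

```hocon
here.platform.data-processing.executors.partitionKeyFilters = [
  {
    className = "AndFilter"
    param.operands = [
      {
        # Union of two boxes (assumes OrFilter mirrors AndFilter's param.operands).
        className = "OrFilter"
        param.operands = [
          {
            className = "BoundingBoxFilter"
            param.boundingBox { north = 24.8, south = 24.68, east = 121.8, west = 121.7 }
          },
          {
            # Hypothetical second box.
            className = "BoundingBoxFilter"
            param.boundingBox { north = 25.1, south = 25.0, east = 121.6, west = 121.5 }
          }
        ]
      },
      {
        # Exclude everything inside a hypothetical sub-area of the first box.
        className = "NotFilter"
        param.operand = {
          className = "BoundingBoxFilter"
          param.boundingBox { north = 24.75, south = 24.7, east = 121.78, west = 121.72 }
        }
      }
    ]
  }
]
```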
Filters can only be applied to partition key parameters, such as catalog and layer IDs and the partition name.
Note
For performance reasons, filters do not filter based on partition metadata or payloads.
Override Filters for Specific Compilers
If one of the compilers needs a different partition filter applied, you can use the byId mechanism to configure a different set of filters for the executor that wraps it. Overriding these filters replaces the whole default set of filters; the two sets are not combined.
Example:
here.platform.data-processing.executors.partitionKeyFilters = [
  {
    className = "BoundingBoxFilter"
    param.boundingBox { north = 24.8, south = 24.68, east = 121.8, west = 121.7 }
  }
]
here.platform.data-processing.executors.byId {
  intermediate-compiler.partitionKeyFilters = [
    {
      className = "BoundingBoxFilter"
      param.boundingBox { north = 24.9, south = 24.58, east = 121.9, west = 121.6 }
    }
  ]
  another-compiler.partitionKeyFilters = []
}
Apply a Filter to Specific Layers Only
The AllowListFilter can be used to filter based on a fixed list of partition names. When combined with Boolean operation filters, AllowListFilter can also be used to apply some filters to specific layers only.
For example, to apply a bounding box only to inLayer of inCatalog, you can configure your application as follows:
here.platform.data-processing.executors.partitionKeyFilters = [
  {
    className = "NotFilter"
    param.operand = {
      className = "AllowListFilter"
      param.catalogsAndLayers = {"inCatalog": ["inLayer"]}
    }
  },
  {
    className = "AndFilter"
    param.operands = [
      {
        className = "AllowListFilter"
        param.catalogsAndLayers = {"inCatalog": ["inLayer"]}
      }, {
        className = "BoundingBoxFilter"
        param.boundingBox { north = 2.8, south = 2.68, east = 121.8, west = 121.7 }
      }
    ]
  }
]
Because the two filters at the root are combined with OR logic, the NotFilter lets through every partition that does not belong to inLayer of inCatalog, while the AndFilter lets through partitions of that layer only when they also fall inside the bounding box.
For additional information about configuring partition key filters, see Configure The Library.