# Functional Groups

The figure below illustrates the functional architecture of the Data Validation Library:

where:

• f is a transformation function that produces the data under test (Output), plus auxiliary information that simplifies the testing phase.
• v is a validating function that produces unaggregated test results; these results are at the individual feature level rather than per partition or dataset.
• m is a function that calculates quality metrics from the test results.
• q is a quality assessment function that calculates a set of reports, for example per data area; the Output is made available based on the quality assessment results.

The architecture decouples the test running phase from the analysis phase because:

• Testing can be seen as a data transformation (v), on the data under test (Output and Auxiliary Output). In most cases, this transformation can be run incrementally using the Data Processing Library.
• Multiple assessments based on multiple metrics can be run on a single test output, that is, on a single validation result set produced by the function v.

The two phases are performed by the following three components:

• Incremental Testing Component -- the function v
• Quality Metrics Calculator -- the function m
• Quality Assessment Component -- the function q
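
The decoupling can be illustrated with a small in-memory sketch. All names below (`Feature`, `TestResult`, `MetricValue`, and the concrete rules) are invented for illustration and are not part of the library; the point is only that the results of v are computed once and then reused by several metrics and assessments.

```scala
object ArchitectureSketch {
  // Hypothetical stand-ins for the real data types; not part of the library.
  final case class Feature(id: String, length: Double)
  final case class TestResult(featureId: String, passed: Boolean)
  final case class MetricValue(name: String, value: Double)

  // v: validates each feature and emits unaggregated, per-feature results.
  // The rule (length must be positive) is an invented example.
  def v(output: Seq[Feature]): Seq[TestResult] =
    output.map(f => TestResult(f.id, f.length > 0.0))

  // m: two independent metrics computed from the same test results.
  def failureCount(results: Seq[TestResult]): MetricValue =
    MetricValue("failureCount", results.count(r => !r.passed).toDouble)

  def passRatio(results: Seq[TestResult]): MetricValue =
    MetricValue(
      "passRatio",
      if (results.isEmpty) 1.0 else results.count(_.passed).toDouble / results.size
    )

  // q: an assessment that turns a metric into a verdict.
  def q(metric: MetricValue, maxAllowed: Double): Boolean =
    metric.value <= maxAllowed

  def main(args: Array[String]): Unit = {
    val output  = Seq(Feature("a", 12.5), Feature("b", 0.0), Feature("c", 3.0))
    val results = v(output)          // the test phase runs once
    println(failureCount(results))   // the analysis can be repeated
    println(passRatio(results))      // without re-running the tests
    println(q(failureCount(results), maxAllowed = 0.0))
  }
}
```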

## Incremental Testing Component

Depending on the input partitioning, testing is done incrementally as much as possible by:

• Accessing the minimal amount of data
• Checking the minimal amount of information based on the changes in the input and output information.

If both the input data and the test results are partitioned, and the testing task has enough locality, it should be possible to use one of the functional patterns that the Data Processing Library offers.

Tests need to be structured under the MapReduce programming model.

## Test Result

In the context of the Data Validation Library, test results are the actual output of the testing phase, not the overall test result.

For each output key, all the information is available as an iterable object that can be analyzed together.

Each partition has its own test result, which can be one of the following:

• PASS: Indicates success, where the feature satisfies the test.
• FAIL: Indicates failure, where the feature does not satisfy the test.
• ERROR: Indicates an exception, where the test did not complete due to a software error.
• INSUFFICIENT_CONTEXT: Indicates a lack of context, where the spatial context provided was insufficient for the test to complete, for example, a feature at the edge of a quad tile. Typically, a larger spatial context would result in a PASS or FAIL outcome.
• INCOMPLETE_ATTRIBUTION: Indicates a precondition was not met for the test to complete. Consequently, the feature state fails. In this case, some other test must exist to evaluate the same precondition as its primary goal and produce a FAIL outcome when it is not met.
• SKIPPED: Indicates that a test has been skipped, so a sequence of features could not be checked.
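
As a rough sketch, the outcomes listed above could be modelled as a closed set of values. The type name `TestResultType` and the helper `isConclusive` are assumptions made for this example; the library's own representation may differ.

```scala
object TestResultSketch {
  // The six outcomes listed above, modelled as a closed set.
  sealed trait TestResultType
  case object PASS extends TestResultType
  case object FAIL extends TestResultType
  case object ERROR extends TestResultType
  case object INSUFFICIENT_CONTEXT extends TestResultType
  case object INCOMPLETE_ATTRIBUTION extends TestResultType
  case object SKIPPED extends TestResultType

  // Hypothetical helper: did the test run to completion on the feature?
  // Only PASS and FAIL are described above as completed checks.
  def isConclusive(result: TestResultType): Boolean = result match {
    case PASS | FAIL => true
    case ERROR | INSUFFICIENT_CONTEXT | INCOMPLETE_ATTRIBUTION | SKIPPED => false
  }
}
```

Because the trait is sealed, the compiler can warn when a `match` forgets one of the outcomes.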

## Test Cases

Based on the functional architecture illustrated in the figure above, a conceptual test is represented as follows:

Computations are done on a partitioned basis, where for each output partition the test:

• Collects relevant information to test as well as relevant auxiliary information, stored as (k, v) pairs.
• Groups this information based on the key value and then checks to ensure all the information needed to analyze the partition under discussion is available.
• Returns the result for every checked feature.
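
The three steps above can be sketched with plain Scala collections. This is only an in-memory illustration: `Segment`, the tile key, and the speed-limit rule are invented for the example and are not part of the Data Validation Library.

```scala
object PartitionedTestSketch {
  // Hypothetical record: a road segment and the tile (partition) it belongs to.
  final case class Segment(id: String, tile: Long, speedLimit: Int)

  // Step 1: collect the relevant information as (key, value) pairs,
  // keyed by the output partition the result belongs to.
  def collect(segments: Seq[Segment]): Seq[(Long, Segment)] =
    segments.map(s => s.tile -> s)

  // Step 2: group the pairs by key so each partition is analyzed as a whole.
  def group(pairs: Seq[(Long, Segment)]): Map[Long, Seq[Segment]] =
    pairs.groupBy(_._1).map { case (tile, kvs) => tile -> kvs.map(_._2) }

  // Step 3: return a result for every checked feature.
  // The check (speed limit between 1 and 130) is an invented example rule.
  def check(grouped: Map[Long, Seq[Segment]]): Map[Long, Seq[(String, Boolean)]] =
    grouped.map { case (tile, segs) =>
      tile -> segs.map(s => s.id -> (s.speedLimit >= 1 && s.speedLimit <= 130))
    }
}
```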

In more detail, a test could be represented via the following Scala trait:

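TestTrait
```scala
trait Test[C] {
  // Identifier of the test
  def testId: String
  // Maps an input key and its context to the output keys it contributes to
  def map(context: (Kio, C)): Iterable[(Ko, C)]
  // Produces the result from all the context grouped under one output key
  def reduce(data: (Ko, Iterable[C])): Result
}
```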

The test case interacts with a hypothetical general compileInFn function that calls the testing mapping phase immediately after calculating the test's context.
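
To make the interaction concrete, here is a minimal, self-contained sketch of a test and an in-memory driver. The key aliases, the `Result` case class, `NonNegativeTest`, and `run` are all assumptions made for this example; in particular, `run` is only a stand-in for what a compiler would do and is not the compileInFn function mentioned above.

```scala
object TestCaseSketch {
  // Hypothetical stand-ins for the types the trait leaves abstract.
  type Kio = String // input key, for example a partition name
  type Ko  = String // output key
  final case class Result(key: Ko, passed: Boolean)

  trait Test[C] {
    def testId: String
    def map(context: (Kio, C)): Iterable[(Ko, C)]
    def reduce(data: (Ko, Iterable[C])): Result
  }

  // Invented example: every value grouped under a key must be non-negative.
  object NonNegativeTest extends Test[Int] {
    val testId = "non-negative"
    def map(context: (Kio, Int)): Iterable[(Ko, Int)] = Iterable(context)
    def reduce(data: (Ko, Iterable[Int])): Result =
      Result(data._1, data._2.forall(_ >= 0))
  }

  // In-memory stand-in for the driver: map each input, group the pairs
  // by output key, then reduce each group to a single result.
  def run[C](test: Test[C], inputs: Seq[(Kio, C)]): Seq[Result] =
    inputs
      .flatMap(in => test.map(in))
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (key, kvs) => test.reduce(key -> kvs.map(_._2)) }
}
```

With these definitions, `TestCaseSketch.run(NonNegativeTest, inputs)` returns one `Result` per output key, sorted by key.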

## Test Case Families

To share the same context, test cases writing to the same output catalog layer, according to the tiling schema, should be grouped and run together using the same compiler instance. This prevents the need to load and create the same data multiple times.

A sequence of such test cases with its own ID forms a test case family.
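
The idea of sharing one context across a family can be sketched as follows. `SimpleTest` and `TestFamily` are hypothetical, simplified types invented for this example; they only show that the context is built once and then passed to every test case in the family.

```scala
object TestFamilySketch {
  // Hypothetical, simplified test case: it checks a shared context of type C.
  trait SimpleTest[C] {
    def testId: String
    def check(context: C): Boolean
  }

  // A family: an ID plus the sequence of test cases that share one context.
  final case class TestFamily[C](familyId: String, tests: Seq[SimpleTest[C]]) {
    // The context is built once and handed to every test in the family.
    def run(buildContext: () => C): Map[String, Boolean] = {
      val context = buildContext()
      tests.map(t => t.testId -> t.check(context)).toMap
    }
  }
}
```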

## Groups of Test Case Families

Different test case families may need to publish data on different layer sets due to different layer partitioning schemas. In this case, a sequence of Data Processing Library pipelines can be used to perform the testing and then write the data in a transactional way.

## Quality Metrics

Test results are analyzed according to metrics that you define and inject. Metrics are MapReduce steps carried out on the test output catalogs. You can implement these metrics using interfaces that closely resemble the ones defined for testing.

### Apply Weight to Test Results

Test results are weighted depending on their perceived impact on the quality.

The following weights are available:

• CRITICAL: The failure impacts the related artifact at critical severity. Normally this is a blocking issue that needs to be checked immediately, regardless of the phase in which it appears.
• MEDIUM: The failure impacts the related artifact at medium severity.
• LOW: The failure impacts the related artifact at low severity.
• NONE: The failure is not counted against the related artifact. This weight lets you inject custom logic to ignore certain errors, if required.
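
A minimal sketch of weighting, assuming that weights are looked up by test ID. The `Failure` record, the lookup rule, and the choice to treat unknown tests as `NONE` are all invented for this example.

```scala
object WeightSketch {
  sealed trait Weight
  case object CRITICAL extends Weight
  case object MEDIUM extends Weight
  case object LOW extends Weight
  case object NONE extends Weight

  // Hypothetical failed-test record: which test failed on which feature.
  final case class Failure(testId: String, featureId: String)

  // Invented weighting rule: look the weight up by test ID;
  // tests without an entry are ignored (NONE).
  def weigh(weights: Map[String, Weight], failure: Failure): Weight =
    weights.getOrElse(failure.testId, NONE)

  // Count failures per weight, dropping the ignored ones.
  def countByWeight(weights: Map[String, Weight], failures: Seq[Failure]): Map[Weight, Int] =
    failures
      .map(f => weigh(weights, f))
      .filter(w => w != NONE)
      .groupBy(identity)
      .map { case (w, ws) => w -> ws.size }
}
```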

#### Metrics Analyzers

The metrics described above represent the basic functionality exposed by the validation library, but you can also inject additional metrics.

In most cases, metrics calculation can be accomplished with one of the following dependency patterns:

• local calculation: One output tile may depend only on one or more input tiles close to it, based on a fixed law. In this case, the direct compilation patterns are suitable.
• calculation with dependencies based on data in a particular layer: In this case, the MapGroupCompiler is suitable.

##### Map and Reduce Metric Calculator

A similar abstraction to the test trait defined above, also built around map and reduce methods, can be defined for metrics:

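Metric
```scala
trait Metric[M] {
  // Identifier of the metric
  def metricId: String
  // Maps an input key and its context to the output keys it contributes to
  def map(context: (Kio, M)): Iterable[(Ko, M)]
  // Produces the metric result from all the context grouped under one output key
  def reduce(data: (Ko, Iterable[M])): Result
}
```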

Metrics can be grouped in the same way as tests.

The trait described above can also be simplified so that it maps the input key to a single output key, while leaving the context untouched.

Below is one implementation:

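MapMetric
```scala
// InKey and OutKey must be the same key types that Metric uses (Kio and Ko above)
trait MapMetric[C] extends Metric[C] {

  // Maps the input key to a single output key
  def mapKey(inKey: InKey): OutKey

  // Re-keys the context without modifying it; tuple fields are accessed with _1 and _2
  final override def map(context: (InKey, C)): Iterable[(OutKey, C)] =
    Iterable((mapKey(context._1), context._2))
}
```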
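
As an illustration, the sketch below implements a metric on top of this simplified trait. It redeclares the traits so that it compiles on its own; the key aliases, the `Result` case class, and the `FailuresPerBucket` metric are assumptions made for this example, and dividing the key by 4 is only a placeholder for a real tiling scheme.

```scala
object MapMetricSketch {
  // Hypothetical stand-ins for the types the traits leave abstract.
  type InKey  = Long // for example, a tile ID
  type OutKey = Long
  final case class Result(key: OutKey, value: Double)

  trait Metric[M] {
    def metricId: String
    def map(context: (InKey, M)): Iterable[(OutKey, M)]
    def reduce(data: (OutKey, Iterable[M])): Result
  }

  trait MapMetric[C] extends Metric[C] {
    def mapKey(inKey: InKey): OutKey
    final override def map(context: (InKey, C)): Iterable[(OutKey, C)] =
      Iterable((mapKey(context._1), context._2))
  }

  // Invented metric: sums per-tile failure counts into a coarser bucket.
  object FailuresPerBucket extends MapMetric[Int] {
    val metricId = "failures-per-bucket"
    def mapKey(inKey: InKey): OutKey = inKey / 4
    def reduce(data: (OutKey, Iterable[Int])): Result =
      Result(data._1, data._2.sum.toDouble)
  }
}
```

Only `mapKey` and `reduce` need to be written for each metric; the re-keying logic in `map` is inherited.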

#### Metrics Family

To share a context, for example to avoid creating the same data multiple times, metrics can be grouped and run together using the same compiler instance. A metrics family is a sequence of metric classes with its own ID.

#### Groups of Metrics Families

Due to different layer partitioning schemas, different metrics families may need to publish data on different layer sets. In this case, you can use a sequence of Data Processing Library compilers to perform the metrics calculation and then write the data in a transactional way.

### Quality Assessment

The quality metrics provide an estimate of quality for their related features. Since different products require different levels of quality, the validation library lets you define thresholds for your metrics. This way, you can decide whether the quality level of a feature meets your product's requirements.

#### Weighted Tests Quality Assessment

Weighted tests let you perform quality assurance based on the outcome of failing tests.

You can define the assessment criteria's logic, such as:

• a threshold on the CRITICAL tests (normally 1);
• a threshold on the MEDIUM impact tests;
• a threshold on the LOW impact tests.
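
A threshold-based criterion could look like the sketch below. The `Thresholds` and `FailureCounts` types are hypothetical, and the comparison rule is an assumption: here the assessment fails as soon as a failure count reaches its threshold, so a CRITICAL threshold of 1 means that a single critical failure is enough to fail.

```scala
object ThresholdSketch {
  // Hypothetical thresholds, one per weight.
  final case class Thresholds(critical: Int, medium: Int, low: Int)

  // Hypothetical number of failing tests observed per weight.
  final case class FailureCounts(critical: Int, medium: Int, low: Int)

  // Assumed rule: pass only while every count stays below its threshold.
  def assess(counts: FailureCounts, thresholds: Thresholds): Boolean =
    counts.critical < thresholds.critical &&
      counts.medium < thresholds.medium &&
      counts.low < thresholds.low
}
```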

#### General Quality Assessment Layer

The quality assessment output catalog contains a single layer that indicates a final PASS or FAIL status of the candidate catalog.

The assessment output layer contains a single partition named "assessment".

The content is:

TestAssessmentResultType
```protobuf
enum TestAssessmentResultType {
  PASS = 0;
  FAIL = 1;
}
```

#### Assessment Components

The abstractions in this context are the same as the ones introduced in the previous two cases related to testing and metrics calculation.

##### Assessment Criteria

The assessment criteria is the component that performs a quality assessment on one or more metrics layers.

Below is an example trait that represents an assessment criteria:

Criteria
```scala
trait Criteria {

  /** The assessment output layer */
  def assessmentResultLayer: Layer.Id

  /** The single partition in the output layer to contain the final result */
  def assessmentResultPartition: Name

  def reduce(data: (OutKey, Iterable[AssessmentContext]))
            (implicit logContext: LogContext): Option[TestAssessment]
}
```

##### Assessment Family

An assessment family aggregates multiple assessment criteria.

An abstraction for such a family is:

Assessment Family
```scala
trait AssessmentFamily {

  /** The quality criteria */
  def criteria: Seq[Criteria]

  /** Aggregates the results */
  def aggregate(qualityAssesments: Seq[(Boolean, String)]): (Boolean, String)
}
```
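
One possible `aggregate` implementation is sketched below as a standalone function. The aggregation rule is an assumption made for this example: the family passes only if every criterion passes, and the messages of the failing criteria are joined into one string.

```scala
object AssessmentFamilySketch {
  // Each input pair is (passed, message) for one criterion.
  // Assumed rule: all criteria must pass for the family to pass.
  def aggregate(qualityAssessments: Seq[(Boolean, String)]): (Boolean, String) = {
    val failures = qualityAssessments.collect { case (false, message) => message }
    if (failures.isEmpty) (true, "all criteria passed")
    else (false, failures.mkString("; "))
  }
}
```

Other rules, such as letting a family pass when only low-impact criteria fail, would fit the same signature.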