Batch Processing Module

To use the module, add the dependency with sbt:

```scala
libraryDependencies ++= Seq(
  "" %% "location-spark" % "0.21.139"
)
```

or with Gradle:

```groovy
dependencies {
    compile group: '', name: 'location-spark_2.11', version: '0.21.139'
}
```

At the moment, the only available algorithm is clustering.

The clustering algorithm implements a distributed version of DBScan. It clusters geographically spread, geolocated items. To exemplify what the algorithm does, see the following image:

Figure 1. Clustering example

The input data contains different trips and the events reported along them (marked in green). The image shows the resulting clusters; for each of them, an instance of Cluster is returned, and the blue markers represent the cluster centers.
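To make the clustering behaviour concrete, here is a minimal single-machine DBSCAN sketch. Everything in it (the `Point` type, the `dbscan` function, the plain Euclidean distance) is illustrative and not part of the library's API; the library works with geographic distances in meters and distributes the computation across partitions:

```scala
// Minimal single-machine DBSCAN sketch. Illustrative only: the library's
// distributed implementation uses geographic distances in meters.
object DbscanSketch {
  final case class Point(x: Double, y: Double)

  // Returns the clusters found; points without enough neighbors are left out.
  def dbscan(points: Vector[Point], eps: Double, minNeighbors: Int): Vector[Vector[Point]] = {
    // All points within distance eps of p, including p itself.
    def neighbors(p: Point): Vector[Point] =
      points.filter(q => math.hypot(p.x - q.x, p.y - q.y) <= eps)

    val visited  = scala.collection.mutable.Set.empty[Point]
    val clusters = scala.collection.mutable.Buffer.empty[Vector[Point]]

    for (p <- points if !visited(p)) {
      if (neighbors(p).size >= minNeighbors) { // p is a core point: grow a cluster
        val cluster = scala.collection.mutable.Buffer.empty[Point]
        val queue   = scala.collection.mutable.Queue(p)
        while (queue.nonEmpty) {
          val q = queue.dequeue()
          if (!visited(q)) {
            visited += q
            cluster += q
            val qn = neighbors(q)
            // Only core points propagate the cluster further.
            if (qn.size >= minNeighbors) queue ++= qn
          }
        }
        clusters += cluster.toVector
      }
    }
    clusters.toVector
  }
}
```

With two dense groups of points and one isolated point, this returns two clusters and leaves the isolated point unclustered, which mirrors the behaviour pictured in Figure 1.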

You can use DistributedClustering as follows:

```scala
import{Cluster, DistributedClustering}
import org.apache.spark.rdd.RDD

val events: RDD[Event] = mapPointsToEvents(sensorData)

val dc = new DistributedClustering[Event](neighborhoodRadiusInMeters = 20.0,
                                          minNeighbors = 3,
                                          partitionBufferZoneInMeters = 125.0)

val clusters: RDD[Cluster[EventWithPosition]] = dc(events)

val result = clusters.collect()

// Print some statistics
val clusterCount = result.length
val clusteredEvents = result.map(_.events.size).sum // assuming Cluster exposes its clustered events
println(s"Found $clusterCount clusters")
println(s"Found $clusteredEvents events in total.")
println(s"An average of ${clusteredEvents / clusterCount} events per cluster")
```

You need an implicit instance of GeoCoordinateOperations in scope that extracts a GeoLocation from your own Event type.
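The exact signature of GeoCoordinateOperations is not shown in this document, so the sketch below defines stand-ins: the `GeoLocation` case class, the single `geoLocation` method, and the `Event` fields are all assumptions to illustrate the typeclass pattern, and should be adapted to the library's real trait:

```scala
// Stand-in for the library's GeoLocation type (assumption, not the real class).
final case class GeoLocation(latitude: Double, longitude: Double)

// Your own event type, carrying whatever payload you need plus coordinates.
final case class Event(id: Long, lat: Double, lon: Double)

// Assumed shape of the typeclass: one method extracting a GeoLocation from T.
trait GeoCoordinateOperations[T] {
  def geoLocation(item: T): GeoLocation
}

object Event {
  // Placing the instance in the companion object makes it implicitly
  // available wherever DistributedClustering[Event] is used.
  implicit val eventGeoOps: GeoCoordinateOperations[Event] =
    new GeoCoordinateOperations[Event] {
      def geoLocation(e: Event): GeoLocation = GeoLocation(e.lat, e.lon)
    }
}
```

Because the instance lives in `Event`'s companion object, no extra import is needed at the call site; the compiler finds it during implicit resolution.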
