Data API

The Data API is a REST interface that provides access to data and data management functions. To combine robustness, scalability, and efficient storage with flexible querying, its design rests on a simple principle: separating data (blobs) from metadata (partitions).

The following flows show how data is typically retrieved from and stored in the Data API.

Data consumer flow:

  • Discover metadata (partitions)
  • Fetch blobs (data) for each partition

Data producer flow:

  • Publish data, collect metadata
  • Publish metadata

As you can see, uploading or retrieving data is usually a two-step process. To retrieve data, an application first discovers blob IDs (dataHandle values) by querying the Metadata API, Query API, or Index API, and then fetches the data referenced by these dataHandles via the Blob API or Volatile Blob API.

To upload data to the Data API, the process is reversed: an application first uploads the data to the Blob API or Volatile Blob API, collecting metadata as it goes, and as a second step uploads the metadata (in batches) via the Publish API.
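The reversed upload flow might look like the sketch below. Everything in it is an assumption made for illustration: `InMemoryBlobStore`, `InMemoryPublisher`, `upload`, and the batch size of 2 are hypothetical and only mimic the roles of a blob service and a publish service.

```python
import uuid
from typing import Dict, List, Tuple

Metadata = Tuple[str, str]  # (partition id, dataHandle)


class InMemoryBlobStore:
    """Hypothetical stand-in for a blob service: stores payloads by handle."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        handle = str(uuid.uuid4())  # the handle format is an assumption
        self.blobs[handle] = payload
        return handle


class InMemoryPublisher:
    """Hypothetical stand-in for a publish service: records metadata batches."""

    def __init__(self) -> None:
        self.batches: List[List[Metadata]] = []

    def publish(self, batch: List[Metadata]) -> None:
        self.batches.append(list(batch))


def upload(partitions: Dict[str, bytes], blob_store: InMemoryBlobStore,
           publisher: InMemoryPublisher, batch_size: int = 2) -> List[Metadata]:
    # Step 1: upload each blob and collect the resulting metadata.
    metadata = [(pid, blob_store.put(data)) for pid, data in partitions.items()]
    # Step 2: publish the collected metadata in batches.
    for start in range(0, len(metadata), batch_size):
        publisher.publish(metadata[start:start + batch_size])
    return metadata


store, publisher = InMemoryBlobStore(), InMemoryPublisher()
published = upload({"p1": b"a", "p2": b"b", "p3": b"c"}, store, publisher)
```

With three partitions and a batch size of 2, the metadata goes out in two batches, and every published dataHandle points at a blob that was already uploaded in step 1.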

To achieve better performance in some use cases, it is possible to combine metadata and data and pass them as a single message, as in stream processing, or to skip the metadata discovery step entirely by preloading or caching the required “working set”.
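One way to picture the caching variant is a small lookup table of dataHandles that is loaded once and then reused. The sketch below is only an assumption about how an application could do this; `WorkingSetCache` and its methods are hypothetical and not part of the Data API.

```python
from typing import Callable, Dict, Iterable, Tuple


class WorkingSetCache:
    """Hypothetical cache mapping partition ids to dataHandles."""

    def __init__(self, discover: Callable[[], Iterable[Tuple[str, str]]]) -> None:
        self._discover = discover
        self._handles: Dict[str, str] = {}
        self.discovery_calls = 0  # counts how often metadata was queried

    def preload(self) -> None:
        """Query metadata once and remember every (partition id, dataHandle) pair."""
        self.discovery_calls += 1
        self._handles = dict(self._discover())

    def handle_for(self, partition_id: str) -> str:
        """Return the cached dataHandle; rediscover only on a cache miss."""
        if partition_id not in self._handles:
            self.preload()
        return self._handles[partition_id]


cache = WorkingSetCache(lambda: [("tile-1", "handle-a"), ("tile-2", "handle-b")])
cache.preload()
handles = [cache.handle_for("tile-1"), cache.handle_for("tile-2")]
```

After the single preload, both lookups are served from memory, so the discovery step is not repeated per read. A real application would also need a policy for refreshing the cache when new metadata is published, which this sketch leaves out.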

The Data API consists of the following APIs:

  • API Lookup
  • Config API
  • APIs to work with metadata
      • Metadata API
      • Index API
      • Query API
      • Publish API
  • APIs to work with data
      • Blob API
      • Volatile Blob API
  • APIs to work with streams
      • Ingest API
      • Notification APIs
      • Stream API

Let’s take a detailed look at each set of APIs.
