The Data API is a REST interface that provides access to data and data management functions. To provide robustness, scalability, efficient storage, and flexible querying at the same time, the Data API design is based on a simple principle: separating data (blobs) from metadata (partitions).
The following diagrams show how data is typically retrieved from and stored in the Data API.
Data consumer flow:
Data producer flow:
As you can see, uploading or retrieving data is usually a two-step process. To retrieve data, an application first discovers blob IDs (dataHandle values) by querying the Metadata API, Query API, or Index API, and then fetches the data referenced by these dataHandle values via the Blob API or Volatile Blob API.
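The two-step retrieval can be sketched with in-memory stand-ins. This is a minimal illustration, not the actual REST calls: the `Partition` class, the `METADATA` and `BLOBS` dictionaries, and the `discover`, `fetch`, and `read_partitions` functions are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Metadata record: names a partition and points at its blob."""
    partition_id: str
    data_handle: str


# Hypothetical in-memory stand-ins for the metadata and blob stores.
METADATA = {
    "tile-1": Partition("tile-1", "handle-a"),
    "tile-2": Partition("tile-2", "handle-b"),
}
BLOBS = {
    "handle-a": b"payload for tile 1",
    "handle-b": b"payload for tile 2",
}


def discover(partition_ids):
    """Step 1: look up metadata (stands in for the Metadata, Query, or Index API)."""
    return [METADATA[pid] for pid in partition_ids if pid in METADATA]


def fetch(data_handle):
    """Step 2: fetch the blob (stands in for the Blob or Volatile Blob API)."""
    return BLOBS[data_handle]


def read_partitions(partition_ids):
    """Resolve partitions to dataHandles, then fetch each referenced blob."""
    return {p.partition_id: fetch(p.data_handle) for p in discover(partition_ids)}
```

In this sketch, partition IDs with no metadata are simply skipped; a real client would decide for itself whether a missing partition is an error.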
To upload data, the process is reversed: an application first uploads the data to the Blob API or Volatile Blob API, collecting metadata as it goes, and then, as a second step, uploads the metadata (in batches) via the Publish API.
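The reversed order on the producer side can be sketched the same way. Everything here is an assumption made for illustration: `blob_store`, `metadata_store`, `upload_blob`, `publish_batch`, and the batch size of 2 are hypothetical, and the handles are random UUIDs rather than anything the real service prescribes.

```python
import uuid
from itertools import islice

blob_store = {}      # hypothetical stand-in for blob storage
metadata_store = {}  # hypothetical stand-in for published metadata
publish_calls = []   # size of each published batch, for inspection


def upload_blob(payload: bytes) -> str:
    """Step 1: store the payload and return the handle that references it."""
    handle = uuid.uuid4().hex
    blob_store[handle] = payload
    return handle


def publish_batch(batch: dict) -> None:
    """Step 2: publish a batch of partition -> dataHandle metadata."""
    publish_calls.append(len(batch))
    metadata_store.update(batch)


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def upload_partitions(payloads: dict, batch_size: int = 2) -> None:
    """Upload all blobs first, then publish the collected metadata in batches."""
    pending = [(pid, upload_blob(data)) for pid, data in payloads.items()]
    for chunk in batched(pending, batch_size):
        publish_batch(dict(chunk))
```

Uploading the blobs before publishing means that, in this sketch, a reader never sees metadata that points at a blob which does not exist yet.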
To achieve better performance in some use cases, it is possible to combine metadata and data and pass them as a single message, as in stream processing, or to skip the metadata discovery step entirely by preloading or caching the required “working set”.
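One way to picture the caching variant is a reader that resolves its working set once and then goes straight to the blobs. This is a sketch under stated assumptions: `CachedReader` and the `discover` and `fetch` callables are hypothetical, and the in-memory dictionaries stand in for the real services.

```python
class CachedReader:
    """Caches partition -> dataHandle lookups so reads can skip discovery."""

    def __init__(self, discover, fetch):
        self._discover = discover  # callable: partition IDs -> {id: handle}
        self._fetch = fetch        # callable: handle -> bytes
        self._handles = {}

    def preload(self, partition_ids):
        """Resolve the whole working set in a single metadata query."""
        self._handles.update(self._discover(partition_ids))

    def read(self, partition_id):
        """Fetch a blob, querying metadata only on a cache miss."""
        if partition_id not in self._handles:
            self.preload([partition_id])
        # Raises KeyError if the partition has no metadata at all.
        return self._fetch(self._handles[partition_id])


# Hypothetical stand-ins for the metadata and blob services.
calls = {"discover": 0}
metadata = {"p1": "h1", "p2": "h2"}
blobs = {"h1": b"one", "h2": b"two"}


def discover(partition_ids):
    calls["discover"] += 1
    return {pid: metadata[pid] for pid in partition_ids if pid in metadata}


reader = CachedReader(discover, blobs.__getitem__)
reader.preload(["p1", "p2"])
```

After the preload, reading `p1` and `p2` triggers no further metadata queries. The trade-off is that a cached dataHandle can become stale if the metadata changes, so a cache like this probably needs some refresh policy.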
The Data API consists of the following APIs:
Let’s take a detailed look at each of these APIs.