Get data from an index layer

Note

Currently, the blob service supports REST API versions v1 and v2. Version v1 should be used to access versioned, index and stream (if the stream payload is larger than 1MB) layers. Version v2 should be used to access the object store layer. Always pick the proper API version from API Lookup to ensure you get back the correct API version response. For instructions, see the API Lookup Developer's Guide.

An index layer is an index of the catalog’s data. You can query the index layer to get the data handles of data that meets your query criteria, and you can then use those data handles to retrieve the corresponding data. For example, one use of an index layer is to archive data from a stream layer so you can query it. For more information, see Archive stream data.

The typical flow for getting indexed data consists of the following steps.

  1. Obtain an authorization token.
  2. Use the api-lookup v1 API to get API base URLs.
  3. Use the index v1 API to query the index to get the data handles for the data that matches your query.
  4. Use the blob v1 API to get the corresponding data.

Obtain an authorization token

Obtain an authorization token for your HTTP requests. For instructions, see the Identity & Access Management Guide.

Get API base URLs

Use the API Lookup service to get:

  • The API endpoints for the index v1 API of the catalog containing the index layer you want to query.
  • The API endpoints for the blob v1 API for the catalog containing the data you want to get.

For instructions, see the API Lookup Developer's Guide.

Get the data handle

To get data, you must first obtain the data handle of the data that meets your query criteria. To get the data handle, call the index API and pass your RSQL expression in the "query" request parameter.

GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>?query=<RSQL> HTTP/1.1
Host: <Hostname for the index API from the API Lookup Service>
Authorization: Bearer <Authorization Token>
Accept: application/json
Cache-Control: no-cache
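The request above can be sketched in Python as follows. This is a minimal illustration, not an official client: the base URL, layer ID and token are hypothetical placeholders that you would replace with the values returned by the API Lookup service and your own credentials. The function only builds the request and does not send it. The RSQL expression is percent-encoded because it contains characters such as "=", ";" and "(".

```python
import urllib.parse
import urllib.request


def build_index_query(base_url: str, layer_id: str, rsql: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an index layer query."""
    # Percent-encode the RSQL expression so that '=', ';', '(' and ',' survive in the URL.
    query_string = urllib.parse.urlencode({"query": rsql})
    layer = urllib.parse.quote(layer_id, safe="")
    url = f"{base_url.rstrip('/')}/layers/{layer}?{query_string}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Cache-Control": "no-cache",
        },
    )


# Hypothetical values for illustration only.
index_request = build_index_query(
    "https://index.example.com/index/v1/catalogs/my-catalog",
    "my-index-layer",
    "ingestionTime==1552381200000;tile=incountry=DEU",
    "<Authorization Token>",
)
print(index_request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request.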

Note the following:

  • The index API supports the following RSQL operators: "==", "!=", ">", ">=", "<", "<=", "=in=", "=out=", "=inboundingbox=", "=incircle=", "=inpolygon=" and "=incountry=". For information on RSQL, see: https://github.com/jirutka/rsql-parser. Uppercase operators are deprecated.
  • A single query cannot contain more than one occurrence of the "=inpolygon=" and "=incircle=" operators.
  • The "=inpolygon=" and "=incircle=" operators cannot be used in a query where the "OR" operator is present.
  • The "=incountry=" operator enables you to quickly and efficiently query all tiles within a country's borders as defined by the ISO country code. This operator supports querying in a maximum of one country. You can combine multiple "=incountry=" operators to query in multiple countries using the "and" or "or" operators. [1]
  • The list of elements following the "=in=" operator should be small. Large lists may result in unexpected behavior. For example, URL length limits might be hit, causing the request to fail.
  • The timestamp and timewindow indexing attributes cannot be used in the same query.
  • There is a buffer time of 2 seconds between when data is written to the index layer and when it is available to query.
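One way to keep "=in=" lists small is to split a long list of values into several queries. The sketch below is only an illustration: the helper name in_clauses and the default chunk size of 50 are assumptions, not values defined by the index API. Each returned clause is meant to be sent as a separate query, and the caller merges the results.

```python
from typing import Iterable, List


def in_clauses(attribute: str, values: Iterable[object], chunk_size: int = 50) -> List[str]:
    """Split values into several small "=in=" clauses, one per query."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    items = [str(value) for value in values]
    return [
        f"{attribute}=in=({','.join(items[start:start + chunk_size])})"
        for start in range(0, len(items), chunk_size)
    ]


# Hypothetical tile IDs, split into clauses of at most two values each.
clauses = in_clauses("tile", [23618359, 23618360, 23618361], chunk_size=2)
for clause in clauses:
    print(clause)
```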

Query Performance Tips

  • Use the timewindow indexing attribute in the query.
  • To get results back quickly, split a timewindow-based query into smaller ranges so that each range covers a manageable amount of data (100,000 or less).
  • Limit the usage of additional constraints.
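Splitting a timewindow query can be sketched as follows. The helper name split_timewindow_query and the one-hour step are assumptions chosen for illustration. The right step size depends on how much data each range contains in your layer.

```python
from typing import List


def split_timewindow_query(attribute: str, start_ms: int, end_ms: int, step_ms: int) -> List[str]:
    """Return RSQL queries that cover [start_ms, end_ms) in consecutive steps."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    queries = []
    lower = start_ms
    while lower < end_ms:
        upper = min(lower + step_ms, end_ms)
        # ';' combines the two conditions, as in the example queries in this guide.
        queries.append(f"{attribute}>={lower};{attribute}<{upper}")
        lower = upper
    return queries


# Two hours starting at March 12, 2019 09:00:00 AM GMT, split into one-hour queries.
hourly_queries = split_timewindow_query("ingestionTime", 1552381200000, 1552388400000, 3600000)
for query in hourly_queries:
    print(query)
```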

If a query runs for a long time without returning a response, you can use the Part query support option.

Example RSQL queries:

ingestionTime==1552381200000;tile=inboundingbox=(52.52,52.51,13.31,13.30)
ingestionTime==1552381200000;tile=incircle=(52.52,13.30,500)
ingestionTime==1552381200000;tile=inpolygon=(13.31,52.52,13.30,52.51,13.29,52.53)
ingestionTime==1552381200000;tile=incountry=DEU

Spatial queries can only be computed on heretile indexing attributes. Specify the arguments as follows:

inboundingbox=(northLatitude, southLatitude, eastLongitude, westLongitude)
incircle=(focusLatitude, focusLongitude, radiusInMeters)
inpolygon=(longitude1, latitude1, longitude2, latitude2, ...)
incountry=isoCountryCode
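Because the argument order differs between operators (latitude first for the bounding box and circle, longitude first for the polygon), small helper functions can reduce mistakes. The helpers below are hypothetical and only format the RSQL text. They do not validate coordinates beyond the north >= south rule for bounding boxes.

```python
from typing import Sequence, Tuple


def in_bounding_box(attribute: str, north: float, south: float, east: float, west: float) -> str:
    """Format an =inboundingbox= clause in (north, south, east, west) order."""
    if north < south:
        raise ValueError("north must be greater than or equal to south")
    return f"{attribute}=inboundingbox=({north},{south},{east},{west})"


def in_circle(attribute: str, latitude: float, longitude: float, radius_m: float) -> str:
    """Format an =incircle= clause in (latitude, longitude, radius in meters) order."""
    return f"{attribute}=incircle=({latitude},{longitude},{radius_m})"


def in_polygon(attribute: str, vertices: Sequence[Tuple[float, float]]) -> str:
    """Format an =inpolygon= clause from (longitude, latitude) pairs."""
    coordinates = ",".join(f"{longitude},{latitude}" for longitude, latitude in vertices)
    return f"{attribute}=inpolygon=({coordinates})"


box_clause = in_bounding_box("tile", 52.52, 52.51, 13.31, 13.30)
circle_clause = in_circle("tile", 52.52, 13.30, 500)
polygon_clause = in_polygon("tile", [(13.31, 52.52), (13.30, 52.51), (13.29, 52.53)])
```

Note that Python prints 13.30 as 13.3, which denotes the same coordinate.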

Note

  • The maximum length of the request line is 8k bytes.
  • Bounding box function parameters should use the following sequence (north, south, east, west), north >= south.
  • Only tiles with zoom levels between 6 and 14 are returned by the =incountry= operator.
  • The country code associated with a tile is only guaranteed to be accurate at the time of insertion. In other words, if borders that involve the tile are changed after data is inserted, there is no guarantee that the associated country code will be updated retroactively.
  • The granularity of country codes is zoom level 12. This means that while country code association is supported up to zoom levels 13 and 14, the country codes associated with tiles in zoom level 13 and 14 reflect the borders from zoom level 12.
  • [1] When querying using the =incountry= operator, only valid ISO 3166-1 alpha-3 country codes are supported. For example "USA" for the United States of America, "GBR" for the United Kingdom of Great Britain and Northern Ireland and "DEU" for Germany.

This query produces the following response which contains an array of index metadata:

{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}

Note

In the example above, ingestionTime and tile are user-defined indexing attributes. The id field is the data handle.

The metadata field is stored as a string, so the returned value is a string and is not unwrapped.
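A client could extract the data handles from this response as sketched below. The sample body is copied from the response shown above. Decoding the metadata field a second time assumes that the string holds JSON text, as it does in this sample. If your metadata is an arbitrary string, skip that step.

```python
import json

# Sample body copied from the response shown above.
response_body = """
{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}
"""

entries = json.loads(response_body)["data"]

# The "id" field of each entry is the data handle.
data_handles = [entry["id"] for entry in entries]

# "metadata" arrives as a string, so it needs a second decoding step.
metadata = [json.loads(entry["metadata"]) for entry in entries]

print(data_handles)
```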

Example: Query on a timewindow attribute

The timewindow is a time slice, not just a point in time. This time slice is defined by the attribute's duration field at the time of index layer creation. The value of duration can range from 10 minutes to 24 hours (1440 minutes).

Let's assume your index layer has a timewindow attribute with the name ingestionTime and a duration of 60 minutes.

You want to upload the following indexes:

  • Index-1 with an ingestionTime value of 1552383031000 (In GMT this is March 12, 2019 09:30:31 AM)
  • Index-2 with an ingestionTime value of 1552386633000 (In GMT this is March 12, 2019 10:30:33 AM)
  • Index-3 with an ingestionTime value of 1552388398000 (In GMT this is March 12, 2019 10:59:58 AM)

Because the index layer stores the timewindow attribute value at the finest granularity determined by the duration field, your indexes will be stored as follows:

  • Index-1 with an ingestionTime value of 1552381200000 (In GMT this is March 12, 2019 09:00:00 AM)
  • Index-2 with an ingestionTime value of 1552384800000 (In GMT this is March 12, 2019 10:00:00 AM)
  • Index-3 with an ingestionTime value of 1552384800000 (In GMT this is March 12, 2019 10:00:00 AM)
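The truncation shown above can be reproduced with a short calculation. The sketch assumes that time slices are aligned to the Unix epoch, which is consistent with the values in this example. The function name truncate_to_timewindow is hypothetical.

```python
def truncate_to_timewindow(timestamp_ms: int, duration_minutes: int) -> int:
    """Round a millisecond timestamp down to the start of its time slice."""
    slice_ms = duration_minutes * 60 * 1000
    return timestamp_ms - (timestamp_ms % slice_ms)


# The three ingestionTime values from this example, with a duration of 60 minutes.
stored_values = [
    truncate_to_timewindow(value, 60)
    for value in (1552383031000, 1552386633000, 1552388398000)
]
print(stored_values)
```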

You can query the ingestionTime attribute in multiple ways. For example, to query for a specific time slice (based on your duration field):

ingestionTime==1552381200000 #(Equal to March 12, 2019 09:00:00 AM)

The above query produces the following response (Index-1 is returned):

{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}

To query for a specific time slice (based on your duration field) using a time range:

ingestionTime>1552382100000;ingestionTime<1552385700000 #(Greater than March 12, 2019 9:15:00 AM AND less than March 12, 2019 10:15:00 AM)

The above query produces the following response:

  • Index-1 is not returned. Its original value 1552383031000 (March 12, 2019 09:30:31 AM) is greater than 1552382100000 (March 12, 2019 9:15:00 AM), but it is stored in the index layer with the truncated value 1552381200000 (March 12, 2019 09:00:00 AM), which falls outside the queried range.
  • Index-2 is returned. Its original value 1552386633000 (March 12, 2019 10:30:33 AM) is greater than 1552385700000 (March 12, 2019 10:15:00 AM), but it is stored in the index layer with the truncated value 1552384800000 (March 12, 2019 10:00:00 AM), which falls inside the queried range.
  • Index-3 is returned. Its original value 1552388398000 (March 12, 2019 10:59:58 AM) is greater than 1552385700000 (March 12, 2019 10:15:00 AM), but it is stored in the index layer with the truncated value 1552384800000 (March 12, 2019 10:00:00 AM), which falls inside the queried range.

{
  "data": [
    {
      "id": "c291c4c3-8603-472b-a828-63ab594146c4",
      "size": 132,
      "checksum": "a6feb574-50b6-4162-906d-ecbfedf8a248",
      "metadata": "{}",
      "timestamp": 1552386672000,
      "ingestionTime": 1552384800000,
      "tile": 23618359
    },
    {
      "id": "22bc518c-5797-4c77-a487-ce346dfd7ac5",
      "size": 289,
      "checksum": "e162582f-d21a-4742-a076-1beeae0d8b7b",
      "metadata": "{}",
      "timestamp": 1552388403000,
      "ingestionTime": 1552384800000,
      "tile": 23618359
    }
  ]
}
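The effect of truncation on a range query can be simulated locally. The sketch below assumes a 60-minute duration and epoch-aligned time slices, as in this example. It also shows one possible way to avoid missing Index-1: align the lower bound to the start of its time slice and use ">=". This is a suggestion that follows from the truncation behavior, not a documented requirement.

```python
SLICE_MS = 60 * 60 * 1000  # 60-minute duration, as assumed in this example

# ingestionTime values before truncation.
original = {
    "Index-1": 1552383031000,
    "Index-2": 1552386633000,
    "Index-3": 1552388398000,
}

# Values as stored in the index layer (rounded down to the slice start).
stored = {name: value - value % SLICE_MS for name, value in original.items()}

lower, upper = 1552382100000, 1552385700000

# Equivalent of: ingestionTime>1552382100000;ingestionTime<1552385700000
strict_matches = [name for name, value in stored.items() if lower < value < upper]

# Align the lower bound to its slice start and make it inclusive.
aligned_lower = lower - lower % SLICE_MS
aligned_query = f"ingestionTime>={aligned_lower};ingestionTime<{upper}"
aligned_matches = [name for name, value in stored.items() if aligned_lower <= value < upper]

print(strict_matches)
print(aligned_query)
print(aligned_matches)
```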

Note

  • In the examples above, ingestionTime is the only user-defined indexing attribute. The value of ingestionTime is truncated according to the time slice defined by the duration field of the index layer. The id field is the data handle. The timestamp field is the time when the data was inserted into the index layer.
  • Querying the data based on the timestamp field is not recommended, because it can differ from the value of ingestionTime before truncation. For example, for Index-3 (id: 22bc518c-5797-4c77-a487-ce346dfd7ac5), the ingestionTime value before truncation was 1552388398000 (March 12, 2019 10:59:58 AM) and the timestamp value was 1552388403000 (March 12, 2019 11:00:03 AM).
  • Queries on checksum and metadata fields are prohibited.

Get data

Now that you have the index metadata, use the data handle to retrieve data using the blob API:

GET /<Base path for the blob API from the API Lookup Service>/layers/<Layer ID>/data/<Data Handle> HTTP/1.1
Host: <Hostname for the blob API from the API Lookup Service>
Authorization: Bearer <Authorization Token>
Cache-Control: no-cache
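In Python, the blob request could be built as sketched below. The base URL and layer ID are hypothetical placeholders for the values returned by the API Lookup service, and the data handle is the id from the index query example. The function builds the request without sending it.

```python
import urllib.parse
import urllib.request


def build_blob_request(base_url: str, layer_id: str, data_handle: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the data behind a data handle."""
    layer = urllib.parse.quote(layer_id, safe="")
    handle = urllib.parse.quote(data_handle, safe="")
    url = f"{base_url.rstrip('/')}/layers/{layer}/data/{handle}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}", "Cache-Control": "no-cache"},
    )


# Hypothetical base URL and layer ID for illustration only.
blob_request = build_blob_request(
    "https://blob.example.com/blob/v1/catalogs/my-catalog",
    "my-index-layer",
    "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
    "<Authorization Token>",
)
print(blob_request.full_url)
```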

The response consists of the data that was uploaded most recently to the given data handle.

Note

We recommend that your application includes retry logic for handling HTTP 5xx errors. Use exponential backoff in the retry logic.
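A minimal sketch of such retry logic is shown below. The function name call_with_retries, the default of five attempts and the 0.5-second base delay are assumptions chosen for illustration, not values prescribed by the service. The sketch retries only on HTTP 5xx errors raised as urllib.error.HTTPError and doubles the delay after each failed attempt.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on HTTP 5xx errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except urllib.error.HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            # Errors outside the 5xx range are not retried.
            if not 500 <= error.code < 600 or is_last_attempt:
                raise
            # Delays grow as base, 2 * base, 4 * base, ...
            sleep(base_delay_s * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

For example, call_with_retries(lambda: urllib.request.urlopen(request).read()) would wrap a single request, assuming that urllib.request has been imported and that request was built beforehand. Production code often also adds random jitter to the delay, which this sketch leaves out.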
