Get data from an index layer
Note
Currently, the blob service supports REST API versions v1 and v2. Version v1 should be used to access versioned, index, and stream (if the stream payload is larger than 1 MB) layers. Version v2 should be used to access the object store layer. Always pick the proper API version from API Lookup to ensure you get back the correct API version response. For instructions, see the API Lookup Developer's Guide.
An index layer is an index of the catalog’s data. You can query the index layer to get the data handles of data that meets your query criteria, and you can then use those data handles to retrieve the corresponding data. For example, one use of an index layer is to archive data from a stream layer so you can query it. For more information, see Archive stream data.
The typical flow for getting indexed data consists of the following steps.
- Obtain an authorization token.
- Use the api-lookup v1 API to get API base URLs.
- Use the index v1 API to query the index to get the data handles for the data that matches your query.
- Use the blob v1 API to get the corresponding data.
Obtain an authorization token
Obtain an authorization token for your HTTP requests. For instructions, see the Identity & Access Management Guide.
Get API base URLs
Use the API Lookup service to get:
- The API endpoints for the index v1 API of the catalog containing the index layer you want to query.
- The API endpoints for the blob v1 API for the catalog containing the data you want to get.
For instructions, see the API Lookup Developer's Guide.
Get the data handle
To get data, you must first obtain the data handle of the data that meets your query criteria. To get the data handle, use the index API with the "query" request parameter:
GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>?query=<RSQL> HTTP/1.1
Host: <Hostname for the index API from the API Lookup Service>
Authorization: Bearer <Authorization Token>
Accept: application/json
Cache-Control: no-cache
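As a minimal sketch of the request above, the following Python builds the query URL and headers using only the standard library, without sending anything. The base URL, layer ID, and token are placeholder values, and `build_index_query_request` is a hypothetical helper name, not part of any SDK. Percent-encoding the whole RSQL expression is an assumption about what the service accepts.

```python
from urllib.parse import quote
from urllib.request import Request


def build_index_query_request(base_url, layer_id, rsql, token):
    """Build (but do not send) a GET request for the index API query endpoint."""
    # Percent-encode the RSQL expression so that characters such as '=', ';'
    # and ',' are carried as a single "query" parameter value.
    url = (
        f"{base_url.rstrip('/')}/layers/{quote(layer_id, safe='')}"
        f"?query={quote(rsql, safe='')}"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
        "Cache-Control": "no-cache",
    }
    return Request(url, headers=headers, method="GET")


# Placeholder values: take the real base URL from the API Lookup service.
request = build_index_query_request(
    "https://index.example.com/index/v1/catalogs/my-catalog",
    "my-index-layer",
    "ingestionTime==1552381200000",
    "<Authorization Token>",
)
print(request.full_url)
```

The request object can then be passed to `urllib.request.urlopen` once real values are substituted.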
Note the following:
- The index API supports the following RSQL operators: "==", "!=", ">", ">=", "<", "<=", "=in=", "=out=", "=inboundingbox=", "=incircle=", "=inpolygon=" and "=incountry=". For information on RSQL, see: https://github.com/jirutka/rsql-parser. Uppercase operators are deprecated.
- A single query cannot contain more than one occurrence of the "=inpolygon=" and "=incircle=" operators.
- The "=inpolygon=" and "=incircle=" operators cannot be used in a query where the "OR" operator is present.
- The "=incountry=" operator enables you to quickly and efficiently query all tiles within a country's borders as defined by the ISO country code. A single "=incountry=" operator accepts at most one country. To query in multiple countries, combine multiple "=incountry=" operators using the "and" or "or" operators.
- The list of elements following the "=in=" operator should be small. Large lists may result in unexpected behavior. For example, URL length limits might be hit, causing the request to fail.
- The timestamp and timewindow indexing attributes cannot be used in the same query.
- There is a buffer time of 2 seconds between when data is written to the index layer and when it is available to query.
- Use the timewindow indexing attribute in the query.
- To get results back within a short period of time, a query based on timewindow can be split so that the data in each timewindow range is manageable (100,000 or less).
- Limit the usage of additional constraints.
If a query runs for a long time without returning any response, you can use the Part query support option.
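One way to split a timewindow query into smaller ranges is sketched below. It is an illustration only: `split_timewindow_queries` is a hypothetical helper, `ingestionTime` is the attribute name used in this page's examples, and the chunk size is an arbitrary choice. Whether a given chunk stays within a manageable result size depends on your data.

```python
def split_timewindow_queries(attribute, start_ms, end_ms, chunk_ms):
    """Return RSQL expressions that together cover [start_ms, end_ms)."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    queries = []
    lower = start_ms
    while lower < end_ms:
        # Each sub-range is half-open so that adjacent ranges do not overlap.
        upper = min(lower + chunk_ms, end_ms)
        queries.append(f"{attribute}>={lower};{attribute}<{upper}")
        lower = upper
    return queries


ONE_HOUR_MS = 60 * 60 * 1000
queries = split_timewindow_queries(
    "ingestionTime", 1552381200000, 1552388400000, ONE_HOUR_MS
)
for query in queries:
    print(query)
```

Each resulting expression can be sent as a separate request, and the returned "data" arrays concatenated by the caller.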
Example RSQL queries:
ingestionTime==1552381200000;tile=inboundingbox=(52.52,52.51,13.31,13.30)
ingestionTime==1552381200000;tile=incircle=(52.52,13.30,500)
ingestionTime==1552381200000;tile=inpolygon=(13.31,52.52,13.30,52.51,13.29,52.53)
ingestionTime==1552381200000;tile=incountry=DEU
For spatial queries, which can only be computed on heretile indexing attributes, specify the arguments as follows:
inboundingbox=(northLatitude, southLatitude, eastLongitude, westLongitude)
incircle=(focusLatitude, focusLongitude, radiusInMeters)
inpolygon=(longitude1, latitude1, longitude2, latitude2, ...)
incountry=isoCountryCode
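Because each spatial operator expects its arguments in a different order, small helper functions can reduce mistakes. The following is a sketch under stated assumptions: the function names are hypothetical, and the validation covers only the ordering rules stated on this page (north >= south for bounding boxes, three-letter country codes).

```python
def in_bounding_box(attribute, north, south, east, west):
    """Build an =inboundingbox= clause in (north, south, east, west) order."""
    if north < south:
        raise ValueError("north must be greater than or equal to south")
    return f"{attribute}=inboundingbox=({north},{south},{east},{west})"


def in_circle(attribute, latitude, longitude, radius_m):
    """Build an =incircle= clause: latitude first, then longitude, then radius in meters."""
    return f"{attribute}=incircle=({latitude},{longitude},{radius_m})"


def in_polygon(attribute, points):
    """Build an =inpolygon= clause from (longitude, latitude) pairs."""
    flat = ",".join(f"{lon},{lat}" for lon, lat in points)
    return f"{attribute}=inpolygon=({flat})"


def in_country(attribute, iso_alpha3):
    """Build an =incountry= clause from a three-letter country code."""
    if len(iso_alpha3) != 3 or not iso_alpha3.isalpha():
        raise ValueError("expected a three-letter country code such as 'DEU'")
    return f"{attribute}=incountry={iso_alpha3}"


print(in_bounding_box("tile", 52.52, 52.51, 13.31, 13.30))
print(in_circle("tile", 52.52, 13.30, 500))
print(in_polygon("tile", [(13.31, 52.52), (13.30, 52.51), (13.29, 52.53)]))
print(in_country("tile", "DEU"))
```

Note that Python prints the float 13.30 as 13.3, so the generated text differs from the examples above only in trailing zeros.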
Note
- The maximum length of the request line is 8k bytes.
- Bounding box function parameters must be given in the order (north, south, east, west), with north >= south.
- Only tiles with zoom levels between 6 and 14 are returned by the =incountry= operator.
- The country code associated with a tile is only guaranteed to be accurate at the time of insertion. In other words, if borders that involve the tile are changed after data is inserted, there is no guarantee that the associated country code will be updated retroactively.
- The granularity of country codes is zoom level 12. This means that while country code association is supported up to zoom levels 13 and 14, the country codes associated with tiles in zoom level 13 and 14 reflect the borders from zoom level 12.
- When querying using the =incountry= operator, only valid ISO 3166-1 alpha-3 country codes are supported. For example, "USA" for the United States of America, "GBR" for the United Kingdom of Great Britain and Northern Ireland, and "DEU" for Germany.
The query produces the following response, which contains an array of index metadata:
{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}
Note
In the example above, ingestionTime and tile are user-defined indexing attributes. The id field is the data handle.
The metadata field is stored as a string, so the returned value is a string and is not unwrapped.
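The following sketch shows one way to pull the data handles out of such a response with Python's standard `json` module. The response body is copied from the example above, and `extract_data_handles` is a hypothetical helper name. Decoding the metadata string a second time assumes that it holds JSON, as the "{}" value in the example suggests.

```python
import json

# Response body copied from the example above.
response_body = """
{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}
"""


def extract_data_handles(body):
    """Return the id (data handle) of every entry in the response."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


handles = extract_data_handles(response_body)
print(handles)

# The metadata value arrives as a string, so it needs its own decoding step.
first_entry = json.loads(response_body)["data"][0]
metadata = json.loads(first_entry["metadata"])
print(type(first_entry["metadata"]).__name__, metadata)
```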
Example: Query on a timewindow attribute
A timewindow is a time slice, not just a point in time. This time slice is defined by the attribute's duration field at the time of index layer creation. The value of duration can range from 10 minutes to 24 hours (1440 minutes).
Let's assume your index layer has a timewindow attribute with the name ingestionTime and a duration of 60 minutes.
You want to upload the following indexes:
- Index-1 with an ingestionTime value of 1552383031000 (in GMT, this is March 12, 2019 09:30:31 AM)
- Index-2 with an ingestionTime value of 1552386633000 (in GMT, this is March 12, 2019 10:30:33 AM)
- Index-3 with an ingestionTime value of 1552388398000 (in GMT, this is March 12, 2019 10:59:58 AM)
Because the index layer stores the timewindow attribute value at the granularity set by the duration field, your indexes will be stored as follows:
- Index-1 with an ingestionTime value of 1552381200000 (in GMT, this is March 12, 2019 09:00:00 AM)
- Index-2 with an ingestionTime value of 1552384800000 (in GMT, this is March 12, 2019 10:00:00 AM)
- Index-3 with an ingestionTime value of 1552384800000 (in GMT, this is March 12, 2019 10:00:00 AM)
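The stored values above are consistent with rounding each timestamp down to the start of its time slice. The sketch below reproduces that arithmetic. It assumes slices are aligned to the Unix epoch, which matches the numbers in this example but is not stated explicitly here. `truncate_to_timewindow` is a hypothetical helper name.

```python
def truncate_to_timewindow(value_ms, duration_minutes):
    """Round an epoch-millisecond value down to the start of its time slice."""
    if not 10 <= duration_minutes <= 1440:
        raise ValueError("duration must be between 10 and 1440 minutes")
    window_ms = duration_minutes * 60 * 1000
    # Assumption: slices start at multiples of the duration since the Unix epoch.
    return value_ms - (value_ms % window_ms)


uploaded = {
    "Index-1": 1552383031000,
    "Index-2": 1552386633000,
    "Index-3": 1552388398000,
}
stored = {name: truncate_to_timewindow(value, 60) for name, value in uploaded.items()}
for name, value in stored.items():
    print(name, value)
```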
You can query the ingestionTime attribute in multiple ways. For example, to query for a specific time slice (based on your duration field):
ingestionTime==1552381200000
The above query produces the following response (Index-1 is returned):
{
  "data": [
    {
      "id": "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
      "size": 155,
      "checksum": "28271214-1532-4cb3-9cd7-35bef1735055",
      "metadata": "{}",
      "timestamp": 1552383033000,
      "ingestionTime": 1552381200000,
      "tile": 23618359
    }
  ]
}
To query for time slices (based on your duration field) using a time range:
ingestionTime>1552382100000;ingestionTime<1552385700000
The above query produces the following response:
- Index-1 is not returned. Its original value 1552383031000 (March 12, 2019 09:30:31 AM) is greater than the lower bound 1552382100000 (March 12, 2019 09:15:00 AM), but it is stored in the index layer with the truncated value 1552381200000 (March 12, 2019 09:00:00 AM), which is below the lower bound.
- Index-2 is returned. Its original value 1552386633000 (March 12, 2019 10:30:33 AM) is greater than the upper bound 1552385700000 (March 12, 2019 10:15:00 AM), but it is stored in the index layer with the truncated value 1552384800000 (March 12, 2019 10:00:00 AM), which is within the range.
- Index-3 is returned. Its original value 1552388398000 (March 12, 2019 10:59:58 AM) is greater than the upper bound 1552385700000 (March 12, 2019 10:15:00 AM), but it is stored in the index layer with the truncated value 1552384800000 (March 12, 2019 10:00:00 AM), which is within the range.
{
  "data": [
    {
      "id": "c291c4c3-8603-472b-a828-63ab594146c4",
      "size": 132,
      "checksum": "a6feb574-50b6-4162-906d-ecbfedf8a248",
      "metadata": "{}",
      "timestamp": 1552386672000,
      "ingestionTime": 1552384800000,
      "tile": 23618359
    },
    {
      "id": "22bc518c-5797-4c77-a487-ce346dfd7ac5",
      "size": 289,
      "checksum": "e162582f-d21a-4742-a076-1beeae0d8b7b",
      "metadata": "{}",
      "timestamp": 1552388403000,
      "ingestionTime": 1552384800000,
      "tile": 23618359
    }
  ]
}
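The range behavior can be checked with a small local simulation that compares the query bounds against the stored (truncated) values rather than the uploaded ones. This is only a model of the comparison, not a call to the index API. The second call illustrates one possible workaround, which is an assumption rather than a documented recommendation: align the lower bound down to the start of its time slice and use ">=" so that Index-1 is also matched.

```python
WINDOW_MS = 60 * 60 * 1000  # duration of 60 minutes, as in this example

# Values as stored in the index layer after truncation.
stored = {
    "Index-1": 1552381200000,
    "Index-2": 1552384800000,
    "Index-3": 1552384800000,
}


def matching(lower_ms, upper_ms, include_lower=False):
    """Return names whose stored value lies between the bounds (upper is exclusive)."""
    result = []
    for name, value in stored.items():
        above_lower = value >= lower_ms if include_lower else value > lower_ms
        if above_lower and value < upper_ms:
            result.append(name)
    return result


lower, upper = 1552382100000, 1552385700000

# Models ingestionTime>1552382100000;ingestionTime<1552385700000
print(matching(lower, upper))  # ['Index-2', 'Index-3']

# Models ingestionTime>=1552381200000;ingestionTime<1552385700000
aligned_lower = lower - (lower % WINDOW_MS)
print(matching(aligned_lower, upper, include_lower=True))  # ['Index-1', 'Index-2', 'Index-3']
```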
Note
- In the examples above, ingestionTime is the user-defined indexing attribute used in the queries. The value of ingestionTime is truncated according to the time slice selected by the duration field of the index layer. The id field is the data handle. The timestamp field is the time when the data was inserted in the index layer.
- Querying the data based on the timestamp field is not recommended, because its value can differ from the value of ingestionTime before truncation. For example, for Index-3 (id: 22bc518c-5797-4c77-a487-ce346dfd7ac5), the ingestionTime value before truncation was 1552388398000 (March 12, 2019 10:59:58 AM) and the timestamp value was 1552388403000 (March 12, 2019 11:00:03 AM).
- Queries on the checksum and metadata fields are prohibited.
Get data
Now that you have the index metadata, use the data handle to retrieve the data using the blob API:
GET /<Base path for the blob API from the API Lookup Service>/layers/<Layer ID>/data/<Data Handle> HTTP/1.1
Host: <Hostname for the blob API from the API Lookup Service>
Authorization: Bearer <Authorization Token>
Cache-Control: no-cache
The response consists of the data that was uploaded most recently to the given data handle.
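As with the index query, the blob request can be assembled with the standard library. This is a sketch with placeholder values: the base URL and layer ID are made up, `build_blob_request` is a hypothetical helper, and the data handle is the id value from the first example response on this page.

```python
from urllib.parse import quote
from urllib.request import Request


def build_blob_request(base_url, layer_id, data_handle, token):
    """Build (but do not send) a GET request for the data behind a data handle."""
    url = (
        f"{base_url.rstrip('/')}/layers/{quote(layer_id, safe='')}"
        f"/data/{quote(data_handle, safe='')}"
    )
    headers = {
        "Authorization": f"Bearer {token}",
        "Cache-Control": "no-cache",
    }
    return Request(url, headers=headers, method="GET")


# Placeholder values: take the real base URL from the API Lookup service.
blob_request = build_blob_request(
    "https://blob.example.com/blob/v1/catalogs/my-catalog",
    "my-index-layer",
    "e9e05a2b-25d1-415d-bc6a-14a1be626c9a",
    "<Authorization Token>",
)
print(blob_request.full_url)
```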
Note
We recommend that your application includes retry logic for handling HTTP 5xx errors. Use exponential backoff in the retry logic.
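A minimal sketch of such retry logic is shown below. It assumes requests are sent with `urllib.request`, which raises `urllib.error.HTTPError` for error status codes. The attempt count and base delay are arbitrary illustrative defaults, and `call_with_retries` is a hypothetical helper name.

```python
import time
from urllib.error import HTTPError


def call_with_retries(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry on HTTP 5xx errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except HTTPError as error:
            is_last_attempt = attempt == max_attempts - 1
            # Only 5xx responses are retried; other errors are re-raised at once.
            if not 500 <= error.code < 600 or is_last_attempt:
                raise
            # The delay doubles after every failed attempt: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * (2 ** attempt))


# Example usage with a prepared request object (not executed here):
# from urllib.request import urlopen
# data = call_with_retries(lambda: urlopen(request).read())
```

Passing the `sleep` function as a parameter keeps the helper easy to test. Production code might also add random jitter to the delay, which this sketch omits.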