Huge Query Support
Index queries support huge queries, such as a query that returns 5 million records. You should set huge=true in the request when either:
- You expect results of more than 100,000 records.
- There is a timeout error or no response for a few minutes.
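The two conditions above can be captured in a small client-side helper. This is only a sketch: the function name should_set_huge and its parameters are hypothetical and are not part of the index API; only the 100,000 threshold comes from this section.

```python
PROBE_LIMIT = 100_000  # result threshold stated in this section


def should_set_huge(expected_records=None, previous_attempt_timed_out=False):
    """Return True when the guidance above suggests sending huge=true.

    expected_records: the caller's own estimate of matching records, or None
        if unknown (hypothetical parameter).
    previous_attempt_timed_out: True if an earlier attempt without huge=true
        timed out or produced no response for a few minutes.
    """
    if previous_attempt_timed_out:
        return True
    return expected_records is not None and expected_records > PROBE_LIMIT
```

For example, should_set_huge(5_000_000) evaluates to True, while should_set_huge(500) evaluates to False.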
To understand the effect of setting huge=true, consider how a query works if you set huge=false (or if you omit the huge parameter):
1. The query is executed with a limit of 100,000, so only the first 100,000 results that match the query are returned. This is called a probe query.
2. If the probe query returns fewer than 100,000 results, no other queries are performed.
3. If the probe query returns 100,000 results, then the query is split into multiple smaller "split" queries. The results of each split query are streamed to the user. This step is a full scan of all the records in the index layer.
If you set huge=true, the service performs split queries without performing the probe query. This prevents the query from running for a long time without returning any response, so you begin to receive streaming responses sooner.
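The control flow described above can be modeled roughly as follows. This is an illustrative model, not the service's implementation: probe and split_scan are hypothetical stand-ins for the probe query and the split queries, and the model assumes that probe results are discarded when the limit is reached, which this section does not state.

```python
from typing import Callable, Iterator, List

PROBE_LIMIT = 100_000  # probe query limit stated in this section


def run_index_query(
    probe: Callable[[int], List[dict]],
    split_scan: Callable[[], Iterator[dict]],
    huge: bool = False,
    limit: int = PROBE_LIMIT,
) -> Iterator[dict]:
    """Model of the probe/split behavior (hypothetical helper names)."""
    if not huge:
        # Run the query once with a result limit (the probe query).
        first = probe(limit)
        if len(first) < limit:
            # Fewer results than the limit: the probe result is complete.
            yield from first
            return
    # Either huge=true skipped the probe, or the probe hit the limit:
    # stream the results of the split queries (a full scan).
    yield from split_scan()
```

With huge=True the probe callable is never invoked, which mirrors why the first streamed results arrive without waiting for a probe query.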
The format of a huge query is:
GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>?query=<RSQL>&huge=true HTTP/1.1
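As a rough illustration of assembling this request line on the client side, the sketch below builds the path and query string with Python's standard library. The function name build_huge_query_path is hypothetical, and the base path and layer ID in the usage note are placeholders; the real base path comes from the API Lookup Service. Percent-encoding the RSQL expression is an assumption based on general URL rules.

```python
from urllib.parse import quote, urlencode


def build_huge_query_path(base_path: str, layer_id: str, rsql: str) -> str:
    """Build the path and query string for a huge index query.

    base_path: base path for the index API (placeholder value in examples).
    layer_id: ID of the index layer.
    rsql: RSQL filter expression, for example "tileId=321535565".
    """
    # urlencode percent-encodes characters such as '=', ';', '<' and '>'.
    query = urlencode({"query": rsql, "huge": "true"})
    return f"{base_path.rstrip('/')}/layers/{quote(layer_id, safe='')}?{query}"
```

Calling build_huge_query_path("/index/v1", "my-layer", "tileId=321535565"), with a made-up base path and layer ID, yields a path ending in ?query=tileId%3D321535565&huge=true.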
Note
- Setting huge=true forces a full scan of all records in the index layer.
- Using the huge parameter does not guarantee that the query will finish faster. It only prevents the query from running for a long time without returning a response. With huge=true, no matter how long the query ultimately takes, a response is continuously streamed to the client until all records are scanned.
- When the probe query reaches its limit of 100,000 results, the service automatically switches to Step 3 (full scan). However, when a probe query hangs or times out, the service does not automatically switch to a full scan. Instead, you have to kill the query and set huge=true to force a full scan.
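Because the switch to a full scan is not automatic after a hung probe query, a client may want to handle it explicitly. The following is a minimal sketch, assuming a hypothetical run_query(huge) callable that returns an iterable of records and raises Python's built-in TimeoutError when the client-side timeout expires; neither name comes from the index API.

```python
from typing import Callable, Iterable, Iterator

_NO_RECORD = object()  # sentinel marking an empty result stream


def query_with_huge_fallback(
    run_query: Callable[[bool], Iterable[dict]],
) -> Iterator[dict]:
    """Try without huge first; retry with huge=true if no response arrives.

    Only a timeout that occurs before the first record triggers the retry,
    so records are never yielded twice.
    """
    try:
        records = iter(run_query(False))
        first = next(records, _NO_RECORD)
    except TimeoutError:
        # The probe query hung: abandon it and force a full scan.
        yield from run_query(True)
        return
    if first is not _NO_RECORD:
        yield first
        yield from records
```

A timeout raised after streaming has already started is deliberately not retried here; it propagates to the caller.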
Example
Your index layer contains 1 billion records. Each record contains these fields: size, timestamp, tileId, eventTime, and eventType. In the records, there are 20,000 different eventTime values, 100 different size values, and 1,000 different tileId values, distributed evenly. Internally, a secondary index is created on the eventTime field, which is defined as the type timewindow.
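The approximate match counts quoted in the scenarios that follow can be derived from these figures. The sketch below is plain arithmetic and assumes, as the example does, that values are evenly distributed and that the fields are independent of each other; the function name expected_matches is made up for illustration.

```python
TOTAL_RECORDS = 1_000_000_000
DISTINCT_VALUES = {"eventTime": 20_000, "size": 100, "tileId": 1_000}


def expected_matches(equal_fields=(), event_time_windows=None):
    """Estimate how many records match, assuming an even distribution.

    equal_fields: fields constrained to a single value (equality filter).
    event_time_windows: number of distinct eventTime windows covered by a
        range filter, or None if there is no eventTime range.
    """
    count = TOTAL_RECORDS
    for field in equal_fields:
        count //= DISTINCT_VALUES[field]
    if event_time_windows is not None:
        count = count * event_time_windows // DISTINCT_VALUES["eventTime"]
    return count


print(expected_matches(["eventTime"]))                    # 50000
print(expected_matches(["eventTime", "size"]))            # 500
print(expected_matches(event_time_windows=20))            # 1000000
print(expected_matches(["tileId"], 20))                   # 1000
print(expected_matches(["tileId"]))                       # 1000000
print(expected_matches(["tileId", "size"]))               # 10000
```

Only the first two estimates fall under the 100,000 probe limit with a filter that can use the eventTime secondary index, which is consistent with those being the scenarios that return within seconds by default.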
Scenario #1: Find records in a specific hour
eventTime=1566961200000
- By default (huge=false), the query returns about 50,000 records within seconds.
- With huge=true, the query would take 5 minutes.
Scenario #2: Find records in a specific hour, and with a specific data size
eventTime=1566961200000;size=100
- By default (huge=false), the query returns about 500 records within seconds.
- With huge=true, the query would take 5 minutes.
Scenario #3: Find records within a time range
eventTime>=1566961200000;eventTime<1567029600000
This time range covers 20 distinct eventTime windows, and there are about 1 million records matching the criteria.
- By default (huge=false), the query could time out, or take 10 minutes plus the probe query time.
- With huge=true, the query would take 10 minutes.
Scenario #4: Find records within a time range and with a specific tile ID
eventTime>=1566961200000;eventTime<1567029600000;tileId=321535565
This time range covers 20 distinct eventTime windows, and there are about 1,000 records matching the criteria.
- By default (huge=false), the query could time out, or take only the probe query time, for example 30 seconds.
- With huge=true, the query would take 10 minutes.
Scenario #5: Find records with a specific tile ID
tileId=321535565
There are about 1 million records matching the criteria.
- By default (huge=false), the query would take 10 minutes plus a short probe query time.
- With huge=true, the query would take 10 minutes.
Scenario #6: Find records with a specific tile ID and data size
tileId=321535565;size=100
There are about 10,000 records matching the criteria.
- By default (huge=false), the query will almost certainly time out; if it does not, the probe query will take a long time.
- With huge=true, the query would take 10 minutes.