Huge Query Support

The index API supports huge queries, such as a query that returns 5 million records. Set huge=true in the request when either of the following applies:

  • You expect the query to return more than 100,000 records.
  • A query without huge=true times out or returns no response for a few minutes.

To understand the effect of setting huge=true, consider how a query works if you set huge=false (or if you omit the huge parameter):

  1. The query is executed with a limit of 100,000, so only the first 100,000 results that match the query are returned. This is called a probe query.
  2. If the probe query returns fewer than 100,000 results, no other queries are performed.
  3. If the probe query returns 100,000 results, then the query is split into multiple smaller "split" queries. The results of each split query are streamed to the user. This step is a full scan of all the records in the index layer.
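The steps above can be modelled in a few lines. The following is only an illustrative sketch of the decision logic as described in this section, not the service's actual implementation: run_query, probe, and split_scan are hypothetical names, and the sketch assumes the split queries together return every matching record.

```python
from typing import Callable, Iterable, Iterator, List

PROBE_LIMIT = 100_000  # probe query limit stated in this section


def run_query(
    probe: Callable[[int], List[dict]],
    split_scan: Callable[[], Iterable[dict]],
    huge: bool = False,
    probe_limit: int = PROBE_LIMIT,
) -> Iterator[dict]:
    """Model of the probe/split behaviour (hypothetical, for illustration)."""
    if not huge:
        # Step 1: probe query, capped at probe_limit results.
        results = probe(probe_limit)
        if len(results) < probe_limit:
            # Step 2: the probe returned everything; no further queries.
            yield from results
            return
    # Step 3 (or huge=true): stream the results of the split queries.
    # Assumption: the split queries cover all matching records.
    yield from split_scan()
```

With huge=True the probe callable is never invoked, which is why the first records can arrive sooner even though the total run time is not necessarily shorter.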

If you set huge=true, the service performs split queries without first performing the probe query. This prevents the query from running for a long time without returning any response, so you begin to receive streaming responses sooner.

The format of a huge query is:

GET /<Base path for the index API from the API Lookup Service>/layers/<Layer ID>?query=<RSQL>&huge=true HTTP/1.1
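As a rough sketch, the request URL can be assembled as follows. The helper build_index_query_url is hypothetical, and the base path and layer ID in the usage lines are placeholders; obtain the real base path from the API Lookup Service. RSQL operators such as ;, < and >= are percent-encoded so that they survive as a query-string value.

```python
from urllib.parse import quote, urlencode


def build_index_query_url(
    base_path: str, layer_id: str, rsql: str, huge: bool = False
) -> str:
    """Build an index query URL (hypothetical helper, for illustration)."""
    params = {"query": rsql}
    if huge:
        # Only send the parameter when requesting a huge query;
        # omitting it is equivalent to huge=false.
        params["huge"] = "true"
    layer = quote(layer_id, safe="")
    return f"{base_path.rstrip('/')}/layers/{layer}?{urlencode(params)}"


# Placeholder values; the base path here is an assumption, not a real endpoint.
url = build_index_query_url(
    "https://index.example.com/index/v1",
    "my-layer",
    "eventTime>=1566961200000;eventTime<1567029600000",
    huge=True,
)
print(url)
```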

Note

  • Setting huge=true forces a full scan of all records in the index layer.
  • Using the huge parameter does not guarantee that the query will finish faster. It only prevents the query from running for a long time without returning a response. With huge=true, no matter how long the query ultimately takes, a response is continuously streamed to the client until all records are scanned.
  • When the probe query returns 100,000 results, which indicates that more results may exist, the service automatically switches to step 3 (full scan). However, when a probe query hangs or times out, the service does not switch to a full scan automatically. Instead, you must cancel the query and resubmit it with huge=true to force a full scan.
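On the client side, the last note suggests a simple fallback: try the default query first and resubmit with huge=true if it times out before any record arrives. The sketch below assumes a caller-supplied fetch(rsql, huge) function that streams records and raises TimeoutError when the request times out; both the function and that error behaviour are assumptions, not part of the index API.

```python
from typing import Callable, Iterable, Iterator


def query_with_fallback(
    fetch: Callable[[str, bool], Iterable[dict]], rsql: str
) -> Iterator[dict]:
    """Retry with huge=true if the default query times out (illustrative)."""
    received_any = False
    try:
        for record in fetch(rsql, False):
            received_any = True
            yield record
        return
    except TimeoutError:
        if received_any:
            # Records were already delivered; restarting would duplicate them.
            raise
    # The probe query produced nothing before timing out: force a full scan.
    yield from fetch(rsql, True)
```

The retry is attempted only when nothing was received, so the caller never sees the same record twice.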

Example

Your index layer contains 1 billion records. Each record contains these fields: size, timestamp, tileId, eventTime, and eventType. The records contain 20,000 different eventTime values, 100 different size values, and 1,000 different tileId values, all evenly distributed. Internally, a secondary index is created on the eventTime field, which is defined with the type timewindow.
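The record counts quoted in the scenarios that follow can be derived from these figures. The short calculation below assumes that the fields are distributed evenly and independently of one another, and it takes the figure of 20 eventTime windows for the time range directly from the scenario text; expected_matches is a hypothetical helper name.

```python
TOTAL_RECORDS = 1_000_000_000
EVENT_TIME_VALUES = 20_000
SIZE_VALUES = 100
TILE_ID_VALUES = 1_000
WINDOWS_IN_RANGE = 20  # as stated for the time-range scenarios


def expected_matches() -> dict:
    """Approximate match counts per scenario, assuming independent fields."""
    per_window = TOTAL_RECORDS // EVENT_TIME_VALUES
    in_range = per_window * WINDOWS_IN_RANGE
    per_tile = TOTAL_RECORDS // TILE_ID_VALUES
    return {
        1: per_window,                  # one eventTime value
        2: per_window // SIZE_VALUES,   # one eventTime value and one size
        3: in_range,                    # 20 eventTime windows
        4: in_range // TILE_ID_VALUES,  # 20 windows and one tileId
        5: per_tile,                    # one tileId
        6: per_tile // SIZE_VALUES,     # one tileId and one size
    }


for scenario, count in expected_matches().items():
    print(f"Scenario #{scenario}: about {count:,} records")
```

Only scenarios 3 and 5 exceed the 100,000-record probe limit, which is why the default query ends up performing a full scan in those two cases.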

Scenario #1: Find records in a specific hour

eventTime=1566961200000

  • By default (huge=false), the query returns about 50,000 records within seconds.
  • With huge=true, the query would take 5 minutes.

Scenario #2: Find records in a specific hour and with a specific data size

eventTime=1566961200000;size=100

  • By default (huge=false), the query returns about 500 records within seconds.
  • With huge=true, the query would take 5 minutes.

Scenario #3: Find records within a time range

eventTime>=1566961200000;eventTime<1567029600000

This time range covers 20 distinct eventTime windows, and there are about 1 million records matching the criteria.

  • By default (huge=false), the query could time out, or take 10 minutes plus the probe query time.
  • With huge=true, the query would take 10 minutes.

Scenario #4: Find records within a time range and with a specific tile ID

eventTime>=1566961200000;eventTime<1567029600000;tileId=321535565

This time range covers 20 distinct eventTime windows, and there are about 1,000 records matching the criteria.

  • By default (huge=false), the query could time out, or take only the probe query time, for example 30 seconds.
  • With huge=true, the query would take 10 minutes.

Scenario #5: Find records with a specific tile ID

tileId=321535565

There are about 1 million records matching the criteria.

  • By default (huge=false), the query would take 10 minutes, plus a short probe query time.
  • With huge=true, the query would take 10 minutes.

Scenario #6: Find records with a specific tile ID and data size

tileId=321535565;size=100

There are about 10,000 records matching the criteria.

  • By default (huge=false), the query will almost certainly time out; if it does not, the probe query will take a long time.
  • With huge=true, the query would take 10 minutes.
