{ Limezest 🍋 }

Drive API's Batch requests

Aug 25, 2022
7 minutes
dev python Google Drive
Photo by [Peggy_Marco via Pixabay](https://pixabay.com/fr/photos/matruschka-matriochka-babushka-1029685/)

Each HTTP connection that your application makes results in a certain amount of overhead.

The google-api-python-client library supports batching, allowing your application to send several API calls within a single HTTP request, reducing overhead and code complexity.

Examples of situations when you might want to use batching:

  • You have many small requests to make and would like to minimize HTTP request overhead.
  • A user made changes to data while your application was offline, so your application needs to synchronize its local data with the server by sending a lot of updates and deletes.

Note: google-api-python-client limits a single batch request to 1000 calls.
Some APIs set a lower limit: Drive allows at most 100.

If you need to make more calls than that, use multiple batch requests.


Sending out one single batch request of 50 API calls will still count as 50 units towards the API’s Quota.

However, since a batch request is a blocking operation, it may be a way to stay under some rate limits (number of calls in a sliding window of time).
For instance, if each batch of 100 requests blocks for 30 seconds, sending batches back to back averages about 333 requests per 100 seconds, which is close to the rate limit of the Sheets API for write operations.
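
The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the function name `requests_per_window` is made up for illustration and is not part of any library.

```python
def requests_per_window(batch_size: int, batch_seconds: float, window_seconds: float = 100.0) -> float:
    """Average number of API calls per window when batches are sent back to back."""
    if batch_seconds <= 0:
        raise ValueError("batch_seconds must be positive")
    # Calls per second, scaled up to the length of the rate-limit window.
    return batch_size / batch_seconds * window_seconds


rate = requests_per_window(batch_size=100, batch_seconds=30)
print(f"{rate:.0f} requests per 100 seconds")  # prints "333 requests per 100 seconds"
```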


Solution

You create batch requests by calling new_batch_http_request() on your service object, and then calling add() for each request you want to execute.

The add() method also allows you to supply a request_id parameter for each request.
These IDs are provided to the callbacks. If you don’t supply one, the library creates one for you. The IDs must be unique for each API request, otherwise add() raises an exception.
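
A minimal sketch of these two calls, assuming `service` is a Drive v3 service object already built with google-api-python-client. The helper name `build_delete_batch` is hypothetical, and using the file ID as `request_id` is just one convenient choice.

```python
def build_delete_batch(service, file_ids):
    """Queue one files().delete() call per file ID in a new batch request.

    `service` is assumed to be a Drive v3 service object.
    Nothing is sent until the caller runs batch.execute().
    """
    batch = service.new_batch_http_request()
    for file_id in file_ids:
        # Using the file ID as request_id tells callbacks which file a
        # response belongs to. IDs must be unique within the batch.
        batch.add(service.files().delete(fileId=file_id), request_id=file_id)
    return batch
```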


You may pass in a callback with each request that is called with the response to that request.
The callback function arguments are: a unique request identifier for each API call, a response object which contains the API call response, and an exception object which may be set to an exception raised by the API call.
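
To illustrate that signature, here is a sketch of a callback that sorts outcomes into two dictionaries. The names `results`, `errors`, and `on_response` are hypothetical; only the three-argument shape comes from the library.

```python
results: dict = {}
errors: dict = {}


def on_response(request_id, response, exception):
    """Record the outcome of one API call from a batch."""
    if exception is not None:
        # The call failed: keep the exception, keyed by request ID.
        errors[request_id] = exception
    else:
        # The call succeeded: keep the response body.
        results[request_id] = response
```

Such a function would be attached to a single request with `batch.add(request, callback=on_response)`.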


You can also supply a single callback that gets called for each response.
If you supply a callback to both new_batch_http_request() and add(), they both get called.

Inside the callback you can check for each individual call’s API response.
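
One way to inspect each call's outcome is to look at the HTTP status carried by the exception. The sketch below assumes API errors arrive as `googleapiclient.errors.HttpError` instances, which expose the HTTP response as `resp`. The function name and the outcome labels are invented for this example, and the list of retryable statuses is a common convention rather than a Drive-specific rule.

```python
def classify_outcome(exception):
    """Map the callback's exception argument to a coarse outcome label."""
    if exception is None:
        return "ok"
    # Assumption: API errors are HttpError instances carrying the HTTP
    # response in `resp`. Other exception types have no status code.
    status = getattr(getattr(exception, "resp", None), "status", None)
    if status == 404:
        return "not_found"  # e.g. the file no longer exists
    if status in (429, 500, 502, 503, 504):
        return "retry_later"  # transient errors that are usually worth retrying
    return "error"
```

A callback can call `classify_outcome(exception)` and decide whether to log the failure, ignore it, or queue the request ID for a later retry.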


After you’ve added the requests, you call execute() to send them all.
The execute() function blocks until all callbacks have been called.


Note: If you plan to iterate over a long list and send multiple smaller batch requests, be aware that running batch.execute() after each chunk does not empty the list of requests in the BatchHttpRequest object.

Make sure you recreate the BatchHttpRequest object with a new call to new_batch_http_request() for each chunk.
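
The pattern can be sketched as follows, assuming `service` is a service object built with google-api-python-client. The helper name `execute_in_batches` and its `requests_by_id` parameter are hypothetical.

```python
def execute_in_batches(service, requests_by_id, callback, batch_size=100):
    """Send many requests as several batch requests of at most batch_size calls each.

    `requests_by_id` maps a unique request ID to an unexecuted request object,
    such as the value returned by service.files().delete(fileId=...).
    """
    items = list(requests_by_id.items())
    for start in range(0, len(items), batch_size):
        # A fresh BatchHttpRequest per chunk: execute() does not clear the
        # requests already queued on an existing one.
        batch = service.new_batch_http_request(callback=callback)
        for request_id, request in items[start:start + batch_size]:
            batch.add(request, request_id=request_id)
        batch.execute()
```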



Code sample

The following Python snippet deletes a large number of Drive files using batch HTTP requests.

Drive API’s Files.delete(fileId) method permanently deletes the file passed as the fileId parameter, skipping the trash.

Deleting several files means calling this same method several times in a row (or in parallel, if you’re brave enough to try it), which makes this use case more complex to handle: more HTTP calls mean more request overhead and potentially more network errors, and you may hit a rate limit on concurrent calls.


The delete_shortcuts(shortcuts) function first splits the big list of files into smaller chunks, then iterates over the chunks and sends one batch request per chunk. The callback collects any errors, which the function returns at the end.

# In this sample we want to delete a large number of Drive files (shortcuts).
# The `delete_shortcuts(shortcuts)` function expects a big list of file IDs to delete.

from app.config import settings
from app.services.drive import DriveService


def delete_shortcuts(shortcuts: list) -> list:
    """
    Calls the Drive Files.delete() API method using batch requests of at most (settings.BATCH_SIZE - 1) calls.

    Args:
        shortcuts (list): List of shortcut IDs to delete.

    Returns:
        list: List of errors, each one a dict containing the shortcut id, the detailed API response and the error.
    """
    err: list = []

    def callback(request_id, response=None, exception=None) -> None:
        """
        Callback for batch request to Drive API.

        Args:
            request_id (str): id of the request. In this case the shortcut file id.
            response: response to the request by the Drive API.
            exception (Exception): exception raised by the request, None if no exception
        """
        if exception:
            # Just strip the file id from the error message
            # so that the same error type doesn't give several unique error messages
            err.append(
                {
                    "shortcut_id": request_id,
                    "response": response,
                    "error": str(exception).replace(request_id, ""),
                }
            )

    def make_batch(iterable: list, size: int = 100):
        """
        Split a list into several smaller chunks of size N.
        Yield each chunk so we can iterate on it.

        Args:
            iterable (list): The big list to be split into smaller chunks.
            size (int, optional): Size of each chunk. Adapt this parameter to what the Drive API allows:
                google-api-python-client accepts up to 1000 calls per batch, but the Drive API caps it at 100.

        Yields:
            list: Each chunk of the big list, in order.
        """
        # Slicing past the end of a list is safe, so the last chunk is simply shorter.
        for ndx in range(0, len(iterable), size):
            yield iterable[ndx : ndx + size]

    i: int = 1
    # settings.BATCH_SIZE = 100, subtract 1 to be extra safe
    for shortcuts_chunk in make_batch(iterable=shortcuts, size=(settings.BATCH_SIZE - 1)):
        drive_batch_requests = drive_service.service.new_batch_http_request(callback)

        for shortcut_id in shortcuts_chunk:
            drive_batch_requests.add(
                request_id=shortcut_id,
                request=drive_service.service.files().delete(
                    fileId=shortcut_id,
                    supportsAllDrives=True,
                ),
            )

        print(f"Sending batch n°{i} with {len(shortcuts_chunk)} requests")
        # `drive_service.make_call()` is a wrapper around request.execute() with exception handling and an exponential backoff decorator.
        # You can replace the next line with a plain `drive_batch_requests.execute()`.
        # Note that BatchHttpRequest.execute() does not take the `num_retries` argument that single requests accept, so retries are up to you.
        drive_service.make_call(drive_batch_requests)
        i += 1

    return err

if __name__ == "__main__":
    # This drive service uses Domain-wide Delegation
    # (scopes=drive as defined in settings)
    drive_service = DriveService(subject=settings.ADMIN_DRIVE_EMAIL)

    shortcuts: list = [
        "a1b2c3d4e5f6",
        "…",  # (long)
        "a1b2c3d4e5f6",
    ]

    err: list = delete_shortcuts(shortcuts=shortcuts)
    if err:
        # If there are errors, do something useful such as displaying the unique error messages
        print({e.get("error") for e in err})