fsspec+s3fs integration for Google Cloud Storage

Basic setup instructions

Install the following packages:

pip install fsspec s3fs

Follow these steps to get an HMAC access key and secret from GCP:

  1. Open a web browser and go to https://console.cloud.google.com
  2. Go to Cloud Storage
  3. Open Settings
  4. Go to the Interoperability tab
  5. Create an HMAC key for a service account
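
Once the key exists, it helps to keep it out of source code. A minimal sketch, assuming the key and secret have been exported as the environment variables GCS_HMAC_KEY and GCS_HMAC_SECRET (both names are hypothetical, chosen for this example; neither GCP nor s3fs defines them):

```python
import os


def load_hmac_credentials(env=None):
    """Return (key, secret) read from the environment.

    GCS_HMAC_KEY and GCS_HMAC_SECRET are hypothetical variable names
    chosen for this example; they are not defined by GCP or s3fs.
    """
    env = os.environ if env is None else env
    names = ("GCS_HMAC_KEY", "GCS_HMAC_SECRET")
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["GCS_HMAC_KEY"], env["GCS_HMAC_SECRET"]
```

The returned pair can then be passed as key= and secret= to fsspec.filesystem instead of string literals.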

Code without the fix

import fsspec

fs = fsspec.filesystem(
    "s3",
    key="hmac access key",
    secret="hmac secret",
    endpoint_url="https://storage.googleapis.com",
)

with fs.open("s3://some-bucket/test.txt", "w") as fp:
    fp.write("Hello world!")

Running this code raises the following error:

PermissionError: Invalid argument.

which actually masks the following error from aiobotocore:

botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: Invalid argument.

Why the error happens

Quoting from Gemini:

The error botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: Invalid argument. when you are using Google Cloud Storage (GCS) with Boto3/Botocore is almost certainly due to an incompatibility in checksum calculation in newer versions of the AWS SDK for Python.

You are using the AWS S3-compatible XML API to interact with GCS, and a recent update in the AWS SDK (which Boto3/Botocore uses) changed the default data integrity settings, which GCS does not support by default.
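
To check whether an environment is likely affected, the installed botocore version can be inspected. A minimal sketch, assuming the defaults changed in botocore 1.36.0; that version number is an assumption not stated in the quote above and should be verified against the botocore changelog:

```python
import re
from importlib import metadata

# Assumed threshold: the release believed to have changed the checksum defaults.
ASSUMED_FIRST_AFFECTED = (1, 36, 0)


def parse_version(text):
    """Turn a version string such as '1.36.2' into the tuple (1, 36, 2)."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    # Pad short versions such as '1.36' so tuple comparison behaves as expected.
    return tuple(numbers + [0] * (3 - len(numbers)))


def is_probably_affected(version_text):
    """True if the version is at or above the assumed threshold."""
    return parse_version(version_text) >= ASSUMED_FIRST_AFFECTED


def installed_botocore_version():
    """Return the installed botocore version string, or None if it is absent."""
    try:
        return metadata.version("botocore")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_botocore_version()
    if version is None:
        print("botocore is not installed")
    elif is_probably_affected(version):
        print(f"botocore {version}: probably affected")
    else:
        print(f"botocore {version}: probably not affected")
```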

Fixed code

import fsspec

fs = fsspec.filesystem(
    "s3",
    key="hmac access key",
    secret="hmac secret",
    endpoint_url="https://storage.googleapis.com",
    config_kwargs={
        "request_checksum_calculation": "when_required",
        "response_checksum_validation": "when_required",
    },
)

with fs.open("s3://some-bucket/test.txt", "w") as fp:
    fp.write("Hello world!")

Quoting from the boto3 docs:

request_checksum_calculation

    Determines when a checksum will be calculated for request payloads. Valid
    values are:

        when_supported – When set, a checksum will be calculated for all request
        payloads of operations modeled with the httpChecksum trait where
        requestChecksumRequired is true or a requestAlgorithmMember is modeled.

        when_required – When set, a checksum will only be calculated for request
        payloads of operations modeled with the httpChecksum trait where
        requestChecksumRequired is true or where a requestAlgorithmMember is
        modeled and supplied.

response_checksum_validation

    Determines when checksum validation will be performed on response payloads.
    Valid values are:

        when_supported – When set, checksum validation is performed on all
        response payloads of operations modeled with the httpChecksum trait
        where responseAlgorithms is modeled, except when no modeled checksum
        algorithms are supported.

        when_required – When set, checksum validation is not performed on
        response payloads of operations unless the checksum algorithm is
        supported and the requestValidationModeMember member is set to ENABLED.
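
botocore also documents environment variables for these two settings, AWS_REQUEST_CHECKSUM_CALCULATION and AWS_RESPONSE_CHECKSUM_VALIDATION. A minimal sketch, assuming the installed botocore version honors them and that they are set before the filesystem is created (the helper name apply_checksum_env is made up for this example):

```python
import os

# Same values as the config_kwargs used in the fixed code.
CHECKSUM_ENV = {
    "AWS_REQUEST_CHECKSUM_CALCULATION": "when_required",
    "AWS_RESPONSE_CHECKSUM_VALIDATION": "when_required",
}


def apply_checksum_env(env=None):
    """Set the checksum variables unless the caller already set them.

    apply_checksum_env is a hypothetical helper, not part of fsspec or s3fs.
    """
    env = os.environ if env is None else env
    for name, value in CHECKSUM_ENV.items():
        # setdefault keeps any value that was exported deliberately.
        env.setdefault(name, value)
    return env
```

Calling apply_checksum_env() once at program start, before fsspec.filesystem("s3", ...), may be convenient when the filesystem is created by code that cannot be given config_kwargs.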