fsspec+s3fs integration for Google Cloud Storage
Basic setup instructions
Install the following packages:
pip install fsspec s3fs
Follow these steps to create an HMAC access key and secret in GCP:
- Open https://console.cloud.google.com in a web browser
- Go to Cloud Storage
- Open Settings
- Go to Interoperability
- Create an HMAC key for a service account
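Once the key exists, one option is to keep the access ID and secret out of the source code by reading them from environment variables. The following is a minimal sketch only: the variable names `GCS_HMAC_KEY` and `GCS_HMAC_SECRET` and the helper `load_hmac_credentials` are hypothetical names chosen for illustration, not something GCP, fsspec, or s3fs defines.

```python
import os

# S3-compatible (XML API) endpoint used throughout this document.
GCS_S3_ENDPOINT = "https://storage.googleapis.com"


def load_hmac_credentials(env=None):
    """Build the credential kwargs for fsspec.filesystem("s3", ...).

    GCS_HMAC_KEY and GCS_HMAC_SECRET are hypothetical variable names
    picked for this sketch; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    required = ("GCS_HMAC_KEY", "GCS_HMAC_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "key": env["GCS_HMAC_KEY"],
        "secret": env["GCS_HMAC_SECRET"],
        "endpoint_url": GCS_S3_ENDPOINT,
    }
```

The returned dictionary can then be unpacked into the call, for example `fsspec.filesystem("s3", **load_hmac_credentials())`.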
Code without the fix
import fsspec

fs = fsspec.filesystem(
    "s3",
    key="hmac access key",
    secret="hmac secret",
    endpoint_url="https://storage.googleapis.com",
)

with fs.open("s3://some-bucket/test.txt", "w") as fp:
    fp.write("Hello world!")
It shows the following error:
PermissionError: Invalid argument.
which actually masks the following error from aiobotocore:
botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: Invalid argument.
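To see the masked error without reading the whole traceback, the exception chain can be walked by hand. This sketch assumes that s3fs attaches the original botocore error to the `PermissionError` as `__cause__` (or leaves it as `__context__`); that may differ between s3fs versions. The helpers `exception_chain` and `describe` are illustrative names, not part of fsspec or s3fs.

```python
def exception_chain(exc):
    """Return exc followed by its chained exceptions, outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        # Prefer the explicit cause ("raise ... from ..."),
        # otherwise fall back to the implicit context.
        exc = exc.__cause__ or exc.__context__
    return chain


def describe(exc):
    """Format each exception in the chain as 'TypeName: message'."""
    return [f"{type(e).__name__}: {e}" for e in exception_chain(exc)]
```

Wrapping the failing `fs.open(...)` call in `try` / `except PermissionError as err` and printing `describe(err)` should then list the outer error first and the underlying one after it.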
Why the error happens
Quoting from Gemini:
The error botocore.exceptions.ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: Invalid argument. when you are using Google Cloud Storage (GCS) with Boto3/Botocore is almost certainly due to an incompatibility in checksum calculation in newer versions of the AWS SDK for Python.
You are using the AWS S3-compatible XML API to interact with GCS, and a recent update in the AWS SDK (which Boto3/Botocore uses) changed the default data integrity settings, which GCS does not support by default.
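A quick way to check whether an environment is likely affected is to look at the installed botocore version. The sketch below assumes the default checksum behaviour changed in botocore 1.36.0; treat that threshold as an assumption to verify against the botocore changelog. The function names are illustrative.

```python
import re
from importlib import metadata

# Assumption: the new checksum defaults first shipped in botocore 1.36.0.
CHECKSUM_DEFAULT_CHANGE = (1, 36, 0)


def parse_version(text):
    """Turn '1.36.2' into (1, 36, 2), padding missing parts with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def likely_affected(version_text):
    """True if this botocore version is at or past the assumed threshold."""
    return parse_version(version_text) >= CHECKSUM_DEFAULT_CHANGE


def installed_botocore_version():
    """Return the installed botocore version string, or None if absent."""
    try:
        return metadata.version("botocore")
    except metadata.PackageNotFoundError:
        return None
```

For example, `likely_affected(installed_botocore_version() or "0")` gives a rough yes/no answer for the current environment.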
Fixed code
import fsspec

fs = fsspec.filesystem(
    "s3",
    key="hmac access key",
    secret="hmac secret",
    endpoint_url="https://storage.googleapis.com",
    # Only compute/validate checksums when the operation requires them.
    config_kwargs={
        "request_checksum_calculation": "when_required",
        "response_checksum_validation": "when_required",
    },
)

with fs.open("s3://some-bucket/test.txt", "w") as fp:
    fp.write("Hello world!")
Quoting from boto3 docs:
request_checksum_calculation
Determines when a checksum will be calculated for request payloads. Valid values are:
- when_supported – When set, a checksum will be calculated for all request payloads of operations modeled with the httpChecksum trait where requestChecksumRequired is true or a requestAlgorithmMember is modeled.
- when_required – When set, a checksum will only be calculated for request payloads of operations modeled with the httpChecksum trait where requestChecksumRequired is true or where a requestAlgorithmMember is modeled and supplied.
response_checksum_validation
Determines when checksum validation will be performed on response payloads. Valid values are:
- when_supported – When set, checksum validation is performed on all response payloads of operations modeled with the httpChecksum trait where responseAlgorithms is modeled, except when no modeled checksum algorithms are supported.
- when_required – When set, checksum validation is not performed on response payloads of operations unless the checksum algorithm is supported and the requestValidationModeMember member is set to ENABLED.
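If changing the `fsspec.filesystem(...)` call is inconvenient (for example when a third-party library creates the filesystem), the same two settings can, as far as I know, also be supplied through the environment variables `AWS_REQUEST_CHECKSUM_CALCULATION` and `AWS_RESPONSE_CHECKSUM_VALIDATION`. The sketch below assumes botocore reads them when the client is created, so they have to be set before the first S3 call; `apply_checksum_env` is an illustrative helper name.

```python
import os

# Environment-variable counterparts of the two config options
# (assumed to be honoured by the installed botocore version).
CHECKSUM_ENV = {
    "AWS_REQUEST_CHECKSUM_CALCULATION": "when_required",
    "AWS_RESPONSE_CHECKSUM_VALIDATION": "when_required",
}


def apply_checksum_env(env=None):
    """Set the checksum variables unless the caller already defined them."""
    env = os.environ if env is None else env
    for name, value in CHECKSUM_ENV.items():
        env.setdefault(name, value)
    return env
```

Calling `apply_checksum_env()` at program start, before any filesystem is created, leaves explicitly exported values untouched because it uses `setdefault`.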