I am currently trying to read a Parquet file from Google Cloud Storage using R and the arrow package, following this documentation:
https://arrow.apache.org/docs/r/articles/fs.html
I have no issues replicating the example for the public buckets, but I am confused by the documentation around Google Cloud Storage for private files. The screenshot below seems to imply that I need a Google Cloud access key and secret key, but I thought access and secret keys are an AWS concept, not a Google Cloud one.
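For comparison, this is roughly how credentials are passed for S3 in arrow (a sketch with placeholder values, not something from my setup), which is why the access/secret key wording confuses me for GCS:

library(arrow)

# Sketch only: the S3 filesystem takes AWS-style access/secret keys,
# which is what the docs' wording seems to suggest for GCS as well.
s3 <- S3FileSystem$create(access_key = "MY_ACCESS_KEY", secret_key = "MY_SECRET_KEY")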
I am able to authenticate and interact with the Google Storage bucket with other packages, e.g. with googleAuthR:
library(googleAuthR)
gar_auth_service(json_file = "~/.ssh/myjson.json")
But the below code to read in a private parquet file from Google Cloud always times out.
library(arrow)
dat <- read_parquet("gs://mybucket/mypath/myfile.parquet?retry_limit_seconds=10")
I am able to connect to the bucket via the file system commands:
fs <- GcsFileSystem$create(json_credentials="~/.ssh/myjson.json")
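By the file system commands I mean calls like the following (same made-up bucket/path as above), which is why I believe the credentials themselves are fine:

# Illustration of what I mean by interacting with the bucket via the filesystem:
# GetFileInfo() returns metadata for the object at the given path.
fs$GetFileInfo("mybucket/mypath/myfile.parquet")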
So I don’t think it is a credential/access issue. I just think I need to modify this piece of code, but I am unsure what modification to make (my best guess is sketched after the snippet below).
library(arrow)
dat <- read_parquet("gs://mybucket/mypath/myfile.parquet?retry_limit_seconds=10")
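My best guess, and it is only a guess using the same made-up bucket/path as above, is that read_parquet() should be given a path on the GcsFileSystem I created rather than a bare gs:// URI, something like:

library(arrow)

# Guesswork, not verified: build the filesystem from the service-account JSON,
# then hand read_parquet() a path on that filesystem via fs$path().
fs <- GcsFileSystem$create(json_credentials = "~/.ssh/myjson.json")
dat <- read_parquet(fs$path("mybucket/mypath/myfile.parquet"))

Is something along these lines the intended approach, or is there a different modification I should be making?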