
Support connecting to local s3 object stores in datafusion-cli #10072

@alamb

Is your feature request related to a problem or challenge?

I am trying to use Sprox locally to query Parquet files.

Sprox currently proxies requests to an actual S3 instance or local file cache.

I would like to be able to create an EXTERNAL table to read from this instance. Here is how it works in DuckDB:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    ENDPOINT 'localhost:8080',
    USE_SSL false,
    URL_STYLE path
);

select * from read_parquet('s3://sprox/sample.parquet');

Describe the solution you'd like

I would like to do something like this in datafusion-cli:

-- Create external table
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
OPTIONS(
    'aws.access_key_id' 'A',
    'aws.secret_access_key' 'B',
    'aws.endpoint' 'http://localhost:8080',
)
LOCATION 's3://sprox/sample.parquet';

When I run that today, here is the error I get:

datafusion-cli -f sprox.sql
DataFusion CLI v37.0.0
Internal error: Config value "" not found on AwsOptions.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Error during planning: table 'datafusion.public.sample' not found

I think this particular error occurs because the config provider doesn't check for aws.endpoint. However, even once I fixed that locally, I still couldn't create the external table: I get an error about the scheme not being allowed.

Describe alternatives you've considered

Note that this workflow already works using environment variables:

$ AWS_ALLOW_HTTP=true AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B AWS_ENDPOINT=http://localhost:8080 datafusion-cli
DataFusion CLI v37.0.0
> CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
0 row(s) fetched.
Elapsed 2.266 seconds.
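
For anyone who wants to repeat the workaround, here is a minimal sketch of it as a script. It assumes the same placeholder credentials (A and B), the same local endpoint, and the sprox.sql file name from the failing run above; the check for datafusion-cli on PATH is only there so the script degrades gracefully and is not part of the original workflow.

```shell
#!/bin/sh
# Placeholder credentials and endpoint taken from the issue text.
export AWS_ACCESS_KEY_ID=A
export AWS_SECRET_ACCESS_KEY=B
export AWS_ENDPOINT=http://localhost:8080
# The endpoint is plain http, so the workaround also sets this variable.
export AWS_ALLOW_HTTP=true

# Write the statement that succeeds under the environment-variable workaround.
cat > sprox.sql <<'EOF'
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
EOF

# Run the file only when datafusion-cli is installed (assumption: it is on PATH).
if command -v datafusion-cli >/dev/null 2>&1; then
    datafusion-cli -f sprox.sql
else
    echo "datafusion-cli not found on PATH; wrote sprox.sql only" >&2
fi
```

The point of the feature request is to make the same settings expressible inside the OPTIONS clause, so that no environment setup like this is needed.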

Additional context

No response

Labels: enhancement (New feature or request)