I am trying to load data into MinIO storage using Spark.
Below is the Spark program -
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import datetime
from pyspark.sql import Window, functions as F
spark = SparkSession.builder.appName("MinioTest").getOrCreate()
sc = spark.sparkContext
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "https://minioendpoint.com/")
spark.conf.set("spark.hadoop.fs.s3a.access.key", "username")
spark.conf..set("spark.hadoop.fs.s3a.secret.key", "password" )
spark.conf..set("spark.hadoop.fs.s3a.path.style.access", True)
spark.conf..set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = spark.read.csv('s3a://bucketname/spark-operator-on-k8s/data/input/input.txt',header=True)
df.write.format('csv').options(delimiter='|').mode('overwrite').save('s3a://bucketname/spark-operator-on-k8s/data/output/')
Spark Submit Command -
/usr/middleware/spark-3.2.0-bin-hadoop3.2/bin/spark-submit \
  --jars /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar \
  --driver-class-path /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar \
  --conf spark.executor.extraClassPath="/usr/middleware/maven/hadoop-aws-3.2.0.jar:/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar" \
  /usr/middleware/miniocerts/minio.py
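A hedged alternative, assuming the host can reach Maven Central: spark-submit can resolve a matching hadoop-aws (and its AWS SDK bundle, pulled in transitively) from Maven coordinates, which avoids hand-managed jar and classpath mismatches:

/usr/middleware/spark-3.2.0-bin-hadoop3.2/bin/spark-submit \
  --packages org.apache.hadoop:hadoop-aws:3.2.0 \
  /usr/middleware/miniocerts/minio.py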
Error 1 -
Although the access key and secret key were provided in the script, I am not sure why this error is thrown -
java.nio.file.AccessDeniedException: s3a://bucketname/spark-operator-on-k8s/data/output: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
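One possible cause, an assumption rather than something the logs confirm: spark.hadoop.* values set with spark.conf.set() after the session already exists do not reliably reach the Hadoop configuration that the S3A connector reads, so it falls back to the environment-variable provider chain seen in the error. A minimal sketch (same placeholder endpoint and credentials as the question) that sets the options on the builder, before the filesystem is first touched:

from pyspark.sql import SparkSession

# Placeholder endpoint and credentials, as in the question.
spark = (
    SparkSession.builder
    .appName("MinioTest")
    .config("spark.hadoop.fs.s3a.endpoint", "https://minioendpoint.com/")
    .config("spark.hadoop.fs.s3a.access.key", "username")
    .config("spark.hadoop.fs.s3a.secret.key", "password")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)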
Then I exported the credentials as environment variables and re-ran the job:
export AWS_ACCESS_KEY_ID=username
export AWS_SECRET_KEY=password
/usr/middleware/spark-3.2.0-bin-hadoop3.2/bin/spark-submit --jars /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --driver-class-path /usr/middleware/maven/hadoop-aws-3.2.0.jar,/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar --conf spark.executor.extraClassPath="/usr/middleware/maven/hadoop-aws-3.2.0.jar:/usr/middleware/maven/aws-java-sdk-bundle-1.11.375.jar" /usr/middleware/miniocerts/minio.py
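A hedged aside: variables exported in the submitting shell reach the driver process, but not necessarily the executors. A quick check from inside the job shows what the driver actually inherited:

import os

# The SDK accepts AWS_SECRET_ACCESS_KEY or AWS_SECRET_KEY, per the error above.
print("AWS_ACCESS_KEY_ID set:", "AWS_ACCESS_KEY_ID" in os.environ)
print("AWS_SECRET_KEY set:", "AWS_SECRET_KEY" in os.environ)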
Now the error is -
java.nio.file.AccessDeniedException: s3a://bucketname/spark-operator-on-k8s/data/input/input.txt: getFileStatus on s3a://bucketname/spark-operator-on-k8s/data/input/input.txt: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: XX9TMS6ANGYZXXKN; S3 Extended Request ID: t4kasdasda=; Proxy: null), S3 Extended Request ID: t4kYUfgfSAnw7ymP:403 Forbidden
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:249)
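The 403 means credentials are now reaching the server but the request is being rejected. A Spark-independent sanity check, sketched here under the assumption that boto3 is available and using the same placeholder endpoint and keys, exercises the two permissions that getFileStatus depends on (a HEAD on the object, covered by s3:GetObject, and a bucket listing, covered by s3:ListBucket):

import boto3

# Same placeholder endpoint and credentials as the Spark job.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minioendpoint.com/",
    aws_access_key_id="username",
    aws_secret_access_key="password",
)

# Exercises s3:ListBucket on the bucket.
resp = s3.list_objects_v2(Bucket="bucketname",
                          Prefix="spark-operator-on-k8s/data/input/")
for obj in resp.get("Contents", []):
    print(obj["Key"])

# Exercises s3:GetObject (HEAD) on the exact input file.
head = s3.head_object(Bucket="bucketname",
                      Key="spark-operator-on-k8s/data/input/input.txt")
print(head["ContentLength"])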
mc S3 policy -
The access policy for the user is:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucketname",
                "arn:aws:s3:::bucketname/*"
            ]
        }
    ]
}
Any pointers on what could be causing the error?
I modified your code. The duplicated dots in spark.conf..set(...) were the problem, and fs.s3a.path.style.access should be passed as the string "true". It should work now.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MinioTest").getOrCreate()

# Point the S3A connector at MinIO. Path-style access is needed because
# MinIO does not serve virtual-hosted bucket URLs by default.
spark.conf.set("spark.hadoop.fs.s3a.endpoint", "https://minioendpoint.com/")
spark.conf.set("spark.hadoop.fs.s3a.access.key", "username")
spark.conf.set("spark.hadoop.fs.s3a.secret.key", "password")
spark.conf.set("spark.hadoop.fs.s3a.path.style.access", "true")
spark.conf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

# Read the input with its header row, then write it back pipe-delimited.
df = spark.read.csv('s3a://bucketname/spark-operator-on-k8s/data/input/input.txt', header=True)
df.write.format('csv').options(delimiter='|').mode('overwrite').save('s3a://bucketname/spark-operator-on-k8s/data/output/')
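If the write goes through, a quick read-back over the same output path (a small sketch, same placeholders) confirms the round trip:

# Read the pipe-delimited output back. No header=True here, because the
# write above did not emit a header row.
check = spark.read.options(delimiter='|').csv(
    's3a://bucketname/spark-operator-on-k8s/data/output/')
check.show(5, truncate=False)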