columns = [('Sn','Products')]
df1 = spark.createDataFrame(([x[0],*x[1]] for x in sdata), schema=columns)
Getting error:
AttributeError: 'tuple' object has no attribute 'encode'
How do I load this variable-length data?
You can represent tuples as a StructType, but a StructType has a fixed set of fields, so it cannot model "variable length" tuples. If your requirement is to support a variable number of elements in a collection-type column, you can either define an explicit schema with an ArrayType field:
sdata = [(1,(10,20,30)),
(2,(100,20)),
(3,(100,200,300))]
from pyspark.sql.types import StructType, StructField, LongType, ArrayType

schema = StructType([
    StructField('Sn', LongType()),
    StructField('Products', ArrayType(LongType())),
])
df1 = spark.createDataFrame(sdata, schema=schema)
[Out]:
+---+---------------+
| Sn| Products|
+---+---------------+
| 1| [10, 20, 30]|
| 2| [100, 20]|
| 3|[100, 200, 300]|
+---+---------------+
or skip the explicit schema, pass the inner collections as Python lists, and let Spark infer the array type from the data:
sdata = [(1,[10,20,30]),
(2,[100,20]),
(3,[100,200,300])]
columns = ['Sn','Products']
df1 = spark.createDataFrame(sdata, schema=columns)
[Out]:
+---+---------------+
| Sn| Products|
+---+---------------+
| 1| [10, 20, 30]|
| 2| [100, 20]|
| 3|[100, 200, 300]|
+---+---------------+
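If you want to keep the original tuple-of-tuples data from the question, a minimal pre-processing sketch (plain Python, no Spark required) converts each inner tuple to a list so that the schema-inference approach above accepts it:

```python
# Original data from the question: variable-length inner tuples.
sdata = [(1, (10, 20, 30)),
         (2, (100, 20)),
         (3, (100, 200, 300))]

# Convert each inner tuple to a list; the result can then be passed to
# spark.createDataFrame(rows, schema=['Sn', 'Products']) as in the
# second example above.
rows = [(sn, list(products)) for sn, products in sdata]
```

This keeps the row count and the `Sn` values unchanged; only the inner collection type changes from tuple to list.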