I have a CSV file with below format :
c1,c2,c3
1,ABC,"{'InvestigationName': None, 'Status': None, 'ProcessedBySentinel': 2}"
2,XYZ,"{'InvestigationName': None, 'Status': None, 'ProcessedBySentinel': 1}"
3,MNC,"{'InvestigationName': 1234, 'Status': None, 'ProcessedBySentinel': 1}"
where column c3 is of JSON format. I need to parse that Json value to different columns and generate a CSV file. I am using parse functionlaity :
But I am getting the below error:
StructType(StructField(InvestigationName,StringType,true), StructField(Status,StringType,true)) (of class org.apache.spark.sql.types.StructType)
for more details on the issue:
https://stackoverflow.com/questions/68362992/parse-json-column-value-in-csv-file
Can someone point out what I am doing wrong?
Welcome back
@Nandan Hegde
. Thank you for bringing this to our attention.
I don't see anything you are doing wrong. I am getting the same error message. I'll get another person to help me look, as I don't see the cause. Also thank you for linking the questions.
So far I tried swapping ' and " , but that made not difference. Next I'll try replacing the None with different value.
@Nandan Hegde
we found the(?) cause. There was a setting we missed.
In the Parse Settings, below format, there is Json settings. It started out minimized and set to "Document per line". After I changed it to "Single document", things started to work.
Update:
There was another thing I did. I made derived column using
replace(replace(c3,"'",'"'),'None','"non"')
I am testing whether one or both of these replaces are necessary.
Yes it seems both changes are necessary. I tried putting each individually in the Parse expression column. The output was nulls.
replace(c3,"'",'"')
replace(c3,"None",'"non"')
Actually, these are linked. The None
is without "double quotes"
, so it isn't a string. None
is also not a number, so it breaks the integers.