ADF Parsing JSOn column values in csv file

相关文章推荐

稳重的啤酒 · C语言读取csv文件并保存到二维数组 - C羽言 · 4 天前 ·

慷慨大方的毛巾 · java解析CSV文件三种方法(openCS ...· 3 天前 ·

活泼的弓箭 · 使用opencsv实现Csv与Java对象转 ...· 3 天前 ·

卖萌的眼镜 · 描述块 | Microsoft Learn· 2 年前 ·

儒雅的薯片 · 将 expression 转换为数据类型 ...· 2 年前 ·

大鼻子的水龙头 · Python 绘图，我只用 ...· 2 年前 ·

I have a CSV file with below format :

c1,c2,c3
1,ABC,"{'InvestigationName': None, 'Status': None, 'ProcessedBySentinel': 2}"
2,XYZ,"{'InvestigationName': None, 'Status': None, 'ProcessedBySentinel': 1}"
3,MNC,"{'InvestigationName': 1234, 'Status': None, 'ProcessedBySentinel': 1}"

where column c3 is of JSON format. I need to parse that Json value to different columns and generate a CSV file. I am using parse functionlaity :

But I am getting the below error:

StructType(StructField(InvestigationName,StringType,true), StructField(Status,StringType,true)) (of class org.apache.spark.sql.types.StructType)

for more details on the issue:
https://stackoverflow.com/questions/68362992/parse-json-column-value-in-csv-file

Can someone point out what I am doing wrong?

Welcome back @Nandan Hegde . Thank you for bringing this to our attention.

I don't see anything you are doing wrong. I am getting the same error message. I'll get another person to help me look, as I don't see the cause. Also thank you for linking the questions.

So far I tried swapping ' and " , but that made not difference. Next I'll try replacing the None with different value.

@Nandan Hegde we found the(?) cause. There was a setting we missed.

In the Parse Settings, below format, there is Json settings. It started out minimized and set to "Document per line". After I changed it to "Single document", things started to work.

Update:
There was another thing I did. I made derived column using

replace(replace(c3,"'",'"'),'None','"non"')  
I am testing whether one or both of these replaces are necessary.    
											Yes it seems both changes are necessary.  I tried putting each individually in the Parse expression column.  The output was nulls.  
replace(c3,"'",'"')  
replace(c3,"None",'"non"')  
Actually, these are linked.  The None is without "double quotes" , so it isn't a string.  None is also not a number, so it breaks the integers.