I have converted a DataFrame to JSON using toJSON() in PySpark, which gives me each row as a JSON string, but I want to reformat the result a bit.
My code is given below:
from pyspark.sql import SparkSession

# Raw string so the Windows backslashes are not read as escape sequences
spark = SparkSession.builder.config("spark.sql.warehouse.dir", r"C:\spark\spark-warehouse").appName("TestApp").enableHiveSupport().getOrCreate()
sqlstring = "SELECT lflow1.LeaseType as LeaseType, lflow1.Status as Status, lflow1.Property as property, lflow1.City as City, lesflow2.DealType as DealType, lesflow2.Area as Area, lflow1.Did as DID, lesflow2.MID as MID from lflow1, lesflow2 WHERE lflow1.Did = lesflow2.MID"

def queryBuilder(sqlval):
    df = spark.sql(sqlval)
    df.show()
    return df

result = queryBuilder(sqlstring)
# toJSON() returns an RDD of JSON strings (one per row); collect() brings them back as a list of str
resultlist = result.toJSON().collect()
print(resultlist)
print("Type of", type(resultlist))
After this, the output is:
'{"LeaseType":"Offer to Lease","Status":"Fully Executed","property":"10230104","City":"Edmonton","DealType":"Renewal","Area":"2312","DID":"79cc3959ffc8403f943ff0e7e93584f8","MID":"79cc3959ffc8403f943ff0e7e93584f8"}',
'{"LeaseType":"Offer to Renew","Status":"Fully Executed","property":"1040HAMI","City":"Vancouver","DealType":"Renewal","Area":"784","DID":"ecf922d0583247c0a4cb591bd4b3843e","MID":"ecf922d0583247c0a4cb591bd4b3843e"}',
'{"LeaseType":"Offer to Renew","Status":"Fully Executed","property":"1040HAMI","City":"Vancouver","DealType":"Renewal","Area":"2223","DID":"ecf922d0583247c0a4cb591bd4b3843e","MID":"ecf922d0583247c0a4cb591bd4b3843e"}',
'{"LeaseType":"Offer to Lease","Status":"Conditional","property":"106PORTW","City":"Toronto","DealType":"Renewal","Area":"2212","DID":"69c3af0527014fd99d1ccf156c0bce93","MID":"69c3af0527014fd99d1ccf156c0bce93"}',
'{"LeaseType":"Offer to Lease","Status":"Fully Executed","property":"106PORTW","City":"Toronto","DealType":"0","Area":"","DID":"04aedb01da5d44fead7e1315115c2530","MID":"04aedb01da5d44fead7e1315115c2530"}'
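The single quotes in this output are not part of the data: collect() returns a Python list of str objects, and printing a list shows each element's repr(), which wraps strings in quotes. A minimal stdlib sketch, using two of the rows above as literal strings standing in for the real collect() result, shows that parsing each string with json.loads yields a plain dict with no quotes left to strip:

```python
import json

# Two rows copied from the output above, standing in for result.toJSON().collect()
resultlist = [
    '{"LeaseType":"Offer to Lease","Status":"Fully Executed","property":"10230104","City":"Edmonton","DealType":"Renewal","Area":"2312","DID":"79cc3959ffc8403f943ff0e7e93584f8","MID":"79cc3959ffc8403f943ff0e7e93584f8"}',
    '{"LeaseType":"Offer to Renew","Status":"Fully Executed","property":"1040HAMI","City":"Vancouver","DealType":"Renewal","Area":"784","DID":"ecf922d0583247c0a4cb591bd4b3843e","MID":"ecf922d0583247c0a4cb591bd4b3843e"}',
]

# Each element is a str that contains one JSON object
print(type(resultlist[0]))  # <class 'str'>

# json.loads turns each string into a dict
rows = [json.loads(s) for s in resultlist]
print(type(rows[0]))        # <class 'dict'>
print(rows[0]["City"])      # Edmonton
```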
Instead, I want the result formatted as a JSON array of objects, like this (only the first two rows shown):
[
    {
        "LeaseType": "Offer to Lease",
        "Status": "Fully Executed",
        "property": "10230104",
        "City": "Edmonton",
        "DealType": "Renewal",
        "Area": "2312",
        "DID": "79cc3959ffc8403f943ff0e7e93584f8",
        "MID": "79cc3959ffc8403f943ff0e7e93584f8"
    },
    {
        "LeaseType": "Offer to Renew",
        "Status": "Fully Executed",
        "property": "1040HAMI",
        "City": "Vancouver",
        "DealType": "Renewal",
        "Area": "784",
        "DID": "ecf922d0583247c0a4cb591bd4b3843e",
        "MID": "ecf922d0583247c0a4cb591bd4b3843e"
    }
]
In other words, I want to get rid of the single quotes (') around each element.
One suggestion is to strip the quotes from the string form of the list with a regular expression and then pretty-print the result:
import json
import re

resultlist = [
    '{"LeaseType":"Offer to Lease","Status":"Fully Executed","property":"10230104","City":"Edmonton","DealType":"Renewal","Area":"2312","DID":"79cc3959ffc8403f943ff0e7e93584f8","MID":"79cc3959ffc8403f943ff0e7e93584f8"}',
    '{"LeaseType":"Offer to Renew","Status":"Fully Executed","property":"1040HAMI","City":"Vancouver","DealType":"Renewal","Area":"784","DID":"ecf922d0583247c0a4cb591bd4b3843e","MID":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"LeaseType":"Offer to Renew","Status":"Fully Executed","property":"1040HAMI","City":"Vancouver","DealType":"Renewal","Area":"2223","DID":"ecf922d0583247c0a4cb591bd4b3843e","MID":"ecf922d0583247c0a4cb591bd4b3843e"}',
    '{"LeaseType":"Offer to Lease","Status":"Conditional","property":"106PORTW","City":"Toronto","DealType":"Renewal","Area":"2212","DID":"69c3af0527014fd99d1ccf156c0bce93","MID":"69c3af0527014fd99d1ccf156c0bce93"}',
    '{"LeaseType":"Offer to Lease","Status":"Fully Executed","property":"106PORTW","City":"Toronto","DealType":"0","Area":"","DID":"04aedb01da5d44fead7e1315115c2530","MID":"04aedb01da5d44fead7e1315115c2530"}'
]
# str(resultlist) looks like ['{...}', '{...}']; removing every ' leaves JSON array text
# (assumes no value contains an apostrophe)
data_to_dump = re.sub(r"'", "", str(resultlist))
# Parse the text before dumping; json.dumps on the string itself would only re-encode it as one quoted string
json_data = json.dumps(json.loads(data_to_dump), indent=4)
print(json_data)
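A caveat with the regex approach: it deletes every apostrophe in the text, not only the ones Python adds when converting the list to a string. The sketch below uses a hypothetical row whose property value contains an apostrophe (this row is not from the question's data) to compare the regex result with parsing each element via json.loads:

```python
import json
import re

# Hypothetical row (not from the question's data) whose value contains an apostrophe
resultlist = ['{"City":"Edmonton","property":"St. John\'s Plaza"}']

# Regex approach: str() escapes the apostrophe as \' and the regex then removes the
# quote, leaving a stray backslash that is not a valid JSON escape
stripped = re.sub(r"'", "", str(resultlist))
print(stripped)  # [{"City":"Edmonton","property":"St. John\s Plaza"}]
try:
    json.loads(stripped)
    regex_valid = True
except json.JSONDecodeError as exc:
    regex_valid = False
    print("regex output is not valid JSON:", exc.msg)

# Parsing each element instead keeps the value intact
rows = [json.loads(s) for s in resultlist]
print(rows[0]["property"])  # St. John's Plaza
```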
You have a list of JSON strings, so if you want the entire list as a single JSON block, load each string back into a Python dictionary and then serialize the whole list:
import json
resultlist_json = [json.loads(x) for x in resultlist]
print(json.dumps(resultlist_json, sort_keys=True, indent=4))
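If the rows only need to be emitted as a compact JSON array, without pretty-printing, another option (not from the original answers) is to join the strings directly: each element is already a complete JSON object, so wrapping the comma-joined strings in brackets gives valid JSON without decoding every row. Pretty-printing with indent still needs the json.loads round trip above. The rows below are shortened versions of the question's data, used only for illustration:

```python
import json

# Shortened stand-in for result.toJSON().collect(): each element is one JSON object as text
resultlist = [
    '{"LeaseType":"Offer to Lease","City":"Edmonton","Area":"2312"}',
    '{"LeaseType":"Offer to Renew","City":"Vancouver","Area":"784"}',
]

# Join the already-valid JSON objects with commas and wrap them in brackets
json_array = "[" + ",".join(resultlist) + "]"
print(json_array)

# Sanity check: the joined text parses back into a list of dicts
parsed = json.loads(json_array)
print(len(parsed))  # 2
```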