This is incredibly frustrating, especially given that it takes about 10 minutes for the spark pool to respond each time you try!
Hello @Matty-2070 ,
Thanks for the question and for using the MS Q&A platform.
Could you please share the content of the requirements.txt?
As per the error message: "ERROR: numpy-1.23.3-cp38-cp38-win32.whl is not a supported wheel on this platform." The cp38-win32 tags mean this wheel was built for CPython 3.8 on 32-bit Windows, while Spark pool nodes run Linux, so pip rejects it.
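In case it helps with debugging, the compatibility tags can be read straight off the wheel filename; here is a minimal stdlib sketch (the helper name `wheel_tags` is mine; the naming scheme comes from PEP 427):

```python
def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) compatibility tags.

    Wheel names follow: {dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    """
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    # The last three hyphen-separated fields are always the compatibility tags.
    return parts[-3], parts[-2], parts[-1]

print(wheel_tags("numpy-1.23.3-cp38-cp38-win32.whl"))
# ('cp38', 'cp38', 'win32') -> a 32-bit Windows wheel, hence rejected on Linux
```

A manylinux-tagged wheel (e.g. `...-manylinux_2_17_x86_64.whl`) is the kind a Linux Spark pool can actually install.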
As per my repro, I'm able to successfully install the numpy package using requirements.txt as shown below:
The above requirements.txt installed successfully on the Apache Spark pool:
Check out the numpy package update from the previous version as shown below:
Hope this helps. Please let us know if you have any further queries.
------------------------------
Please don't forget to click on the "Accept Answer" or upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how.
Want a reminder to come back and check responses? Here is how to subscribe to a notification.
If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators.
Thank you for your detailed response.
I've tried to install numpy==1.23.3 exactly as suggested, but I get an error after around 10 minutes of it running.
The method I am using:
Requirements.txt file contains numpy==1.23.3
Then upload the file via the following method:
'Manage' > 'Apache Spark pools' > choose 'Packages' on the specific spark pool > upload Requirements.txt file via the 'Requirements files' section > 'Upload' > 'Apply'
Then wait patiently for 10 mins for it to work (or fail in this case).
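Before each 10-minute wait, it's cheap to sanity-check that every line of the file is a plain pinned name==version entry; a small stdlib sketch (the helper name and the strict-pinning assumption are mine):

```python
import re

# Assumption: every non-comment line should be a simple pinned requirement.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.+!_-]+$")

def unpinned_lines(requirements_text):
    """Return (line_number, line) pairs that are not pinned name==version entries."""
    bad = []
    for n, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if not PINNED.match(line):
            bad.append((n, line))
    return bad

print(unpinned_lines("numpy==1.23.3\n"))  # [] -> nothing suspicious
print(unpinned_lines("numpy>=1.0\n"))     # [(1, 'numpy>=1.0')]
```

This won't catch a blocked network (as turned out to be the case here), but it rules out malformed file contents as a cause.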
Do you know what the issue might be? Are there any other configs that I might need to change to get things to work?
Cheers,
Matty
Interestingly, when I try the above using a different Azure account, it works fine. So the issue is exclusively related to the specific account/Synapse instance.
If there's anything you can recommend I check, that would be appreciated.
Thanks,
Matty
I have tried loading to a completely new spark pool this morning, but it failed again. Here's the error:
Error details (ProxyLivyApiAsyncError):
LibraryManagement - Spark Job for sparkpooltest in workspace **** in subscription **** failed with status:
{"id":18,"appId":"application_****","appInfo":{"driverLogUrl":"http://vm-****/node/containerlogs/container_****/trusted-service-user","sparkUiUrl":"http://vm-****/proxy/application_****/","isSessionTimedOut":null,"isStreamingQueryExists":"false","impulseErrorCode":null,"impulseTsg":null,"impulseClassification":null},"state":"dead","log":["Elapsed: -","","An HTTP error occurred when trying to retrieve this URL.","HTTP errors are often intermittent, and a simple retry will get you on your way.","'https://conda.anaconda.org/conda-forge/linux-64'","","","22/10/11 08:16:58 ERROR b\"Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you.\\nCollecting package metadata (repodata.json): ...working... failed\\n\"","22/10/11 08:16:58 INFO Cleanup following folders and files from staging directory:","22/10/11 08:17:01 INFO Staging directory cleaned up successfully"],"registeredSources":null}
I am now wondering whether the issue is linked to how Azure has been configured within our corporate environment, given that things work fine from my personal Azure account, but I wouldn't know what to check. Any ideas?
Cheers,
Matty
I've noticed in the logs that exfiltration protection is set to true:
INFO Data exfiltration protection set to: true
I wonder if this is the issue? Link to the Microsoft article on this below:
Cheers,
Matty
Hello @Matty-2070 ,
Users can provide an environment configuration file to install Python packages from public repositories like PyPI. In data exfiltration protected workspaces, connections to outbound repositories are blocked. As a result, Python libraries installed from public repositories like PyPI are not supported.
As an alternative, users can upload workspace packages or create a private channel within their primary Azure Data Lake Storage account. For more information, visit Package management in Azure Synapse Analytics.
Thanks for responding - your explanation makes sense.
I will now look at uploading via the method described to work around the data exfiltration protection.
Cheers,
Matty
Hello @Matty-2070 ,
Following up to see if the above suggestion was helpful. If you have any further queries, do let us know.