I have created a custom pipeline in Azure Machine Learning that runs a script called data_prep.py. This script requires the en_core_web_sm model from the spaCy library for data cleaning.
I have defined an environment YAML file for my pipeline, where I included the spacy package as a dependency. However, when I try to include the en_core_web_sm model in the YAML file, it fails to create the environment.
I have also attempted to download the en_core_web_sm model manually and store it in a folder, and then referenced the model path in my data_prep.py script. However, this approach is not working either.
I would greatly appreciate any guidance on how to correctly include the en_core_web_sm model in my Azure ML pipeline environment. Is there a recommended method to include spaCy models in Azure ML environments? How can I ensure that the en_core_web_sm model is accessible to my pipeline and can be used by data_prep.py?
Thank you for your assistance.
Hello @Kavinamoole, Abhishek,
Thanks for reaching out to us. To include the en_core_web_sm model in your Azure ML pipeline environment, you can try the following steps:
Install the en_core_web_sm model in your environment: Add the model as a pip dependency in your environment YAML file, pointing at the model's wheel URL. Here is an example:
dependencies:
  - python=3.8
  - spacy
  - pip:
    - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.1.0/en_core_web_sm-3.1.0-py3-none-any.whl
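This installs the en_core_web_sm model from the official spaCy models releases on GitHub. Note that the model version must be compatible with the installed spaCy version (en_core_web_sm 3.1.0 targets spaCy 3.1.x), so pin the spacy dependency to a matching version if the environment build or the model load fails.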
Load the en_core_web_sm model in your script: Once the model is installed in your environment, you can load it by name with the spacy.load() function, without referencing a folder path. Here is an example:
import spacy
nlp = spacy.load("en_core_web_sm")
This will load the en_core_web_sm model into memory and allow you to use it for data cleaning purposes.
Verify that the model is accessible: Run a simple test script in the same environment that loads the model and processes a short piece of text. If the script runs without errors, the en_core_web_sm model is installed correctly and can be used in your pipeline.
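As a rough illustration of such a test, here is a minimal sketch. The helper name check_spacy_model and the sample sentence are made up for this example and are not part of spaCy or Azure ML. It assumes that spacy.load() raises OSError when the model package cannot be found, which is how spaCy reports a missing model.

```python
import importlib.util


def check_spacy_model(model_name="en_core_web_sm"):
    """Hypothetical helper: return (ok, message) for a spaCy model load attempt."""
    # spaCy itself must be importable before any model can be loaded.
    if importlib.util.find_spec("spacy") is None:
        return False, "spaCy is not installed in this environment"

    import spacy

    try:
        nlp = spacy.load(model_name)
    except OSError as exc:
        # spaCy raises OSError when the model package or path is not found.
        return False, f"could not load {model_name!r}: {exc}"

    # Run a short sentence through the pipeline as a smoke test.
    doc = nlp("Azure ML pipelines can run spaCy models.")
    tokens = [token.text for token in doc]
    return True, f"loaded {model_name!r} with spaCy {spacy.__version__}; tokens: {tokens}"
```

Calling check_spacy_model() at the start of data_prep.py and printing the returned message should put the result in the pipeline step's logs, which makes it easier to tell a missing model apart from other failures.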
I hope this helps! Please give it a try and let me know if you have any other questions.
Regards,
Yutong