I'm trying to build a new conda environment in our SageMaker EC2 environment in a terminal session. Packages in the original copy of the environment were corrupted, and the environment became unusable. The issue couldn't be fixed by removing and re-installing packages or by running conda update.
I nuked the environment with
conda env remove -n python3-cn
and then attempted to recreate the environment with:
conda env create -p /home/ec2-user/SageMaker/anaconda3/envs/python3-cn --file=${HOME}/SageMaker/efs/.sagemaker/python3-cn_environment.yml --force
This environment has been created a number of times on several EC2 instances for individual SageMaker users.
Conda logs the following:
Collecting package metadata (repodata.json): done
Solving environment: done
Downloading and Extracting Packages
pytest-arraydiff-0.2 | 14 KB | ##################################################################################################### | 100%
partd-0.3.8 | 32 KB | ##################################################################################################### | 100%
... several progress bar lines later...
psycopg2-2.7.5 | 507 KB | ##################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
ERROR conda.core.link:_execute(700): An error occurred while installing package 'defaults::mkl-2018.0.3-1'.
Rolling back transaction: done
[Errno 28] No space left on device
The No space left on device error is consistent. I've tried:
- conda clean --all, removing the environment, and rebuilding the environment
- removing the caches, removing the environment, and rebuilding the environment
- removing the environment, then shutting down and restarting JupyterLab (our SageMaker is configured to create python3-cn if the environment doesn't exist when JupyterLab starts)
In the first two, I get Errno 28.
In the last one, the instance is not created and conda env list does not show python3-cn, but I see there is a python3-cn directory in the anaconda/envs/ directory. If I do conda activate python3-cn, I see the prompt change, but the environment is unusable. If I try conda update --all, I get a notification that one of the package files has been corrupted.
Not really sure what to do here. I'm looking for space hogs, but not really finding anything significant.
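Before hunting for individual space hogs, it can help to confirm how much room the volume actually has left. The following is a minimal sketch, not part of the original setup: free_kb and has_free_mb are hypothetical helper names, the path is the one from the conda env create command above, and the 5000 MB threshold is an arbitrary guess rather than a measured requirement for this environment.

```shell
# Print the free space, in KB, on the filesystem that holds the given path.
free_kb() {
    # -P: POSIX output, one line per filesystem; -k: 1024-byte blocks.
    # Row 2, column 4 is the "Available" figure.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the path's filesystem has at least the given number of MB free.
has_free_mb() {
    [ "$(free_kb "$1")" -ge $(( $2 * 1024 )) ]
}

# Example (path from the question; the threshold is an assumption):
#   has_free_mb /home/ec2-user/SageMaker 5000 || echo "low on space"
# Running out of inodes also produces Errno 28, so on Linux it is worth checking:
#   df -i /home/ec2-user/SageMaker
```

If the check fails right after conda clean --all, the space is being consumed outside the conda package caches.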
Try increasing the EBS volume size of your notebook instance. This blog post explains it well: https://aws.amazon.com/blogs/machine-learning/customize-your-notebook-volume-size-up-to-16-tb-with-amazon-sagemaker/
Also, best practice is to use lifecycle configuration scripts to build or add new dependencies. Official docs: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html
This GitHub page has some great template examples, for instance for setting up specific configs such as conda: https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/tree/master/scripts
@thePurplePython Thank you for this info; it's most helpful. I found that there were some hidden directories in the notebook that were consuming just enough space to allow the build to start, but not to finish. However, I really like the lifecycle config and will look at it as our dependencies grow, which is bound to happen.
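Hidden directories like the ones described here are easy to miss because du -sh * skips dotfiles. One way to surface them is sketched below. It assumes a find that supports -mindepth and -maxdepth (GNU and BSD both do), and largest_entries is a hypothetical helper name, not something from the thread.

```shell
# List the largest entries, hidden ones included, directly under a directory.
# Usage: largest_entries DIR [COUNT]
largest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # find returns every immediate child, dotfiles included;
    # du -sk prints each child's total size in KB; sort puts the largest first.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Example (path taken from the question):
#   largest_entries /home/ec2-user/SageMaker 15
```

Running it on the notebook's home directory first and then on whichever entry tops the list narrows the search one level at a time.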