docker image prune/rm removes in-use image tags · Issue #36295 · moby/moby

BUG REPORT INFORMATION

Describe the results you received:
Running docker image prune --all --force removes IN-USE images causing locally created images to get lost.

Describe the results you expected:
If a service is running via locally-create-image:1 , running docker image prune --all --force SHOULD NOT remove locally-create-image:1

Steps to reproduce the issue:


    docker swarm init

# simply create a swarm


    mkdir /test-bug


    cd /test-bug


    echo "FROM nginx:1.13.8" > Dockerfile

# this is an example, it can be anything


    docker build -t myimage:1 .


    docker service create --name my-service myimage:1

Wait and make sure that your container is running. Check using


    docker ps -a

You will see that the image is listed as


    myimage:1


    docker image prune --all --force

# this is supposed to remove all "unused" images


    docker service update my-service --force

# this can be an update of environmental variables or anything else that causes a container restart.

You will see that the service does not get restarted, because the myimage:1 image no longer exists!

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version :

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:11:19 2017
 OS/Arch:	linux/amd64
Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:09:53 2017
  OS/Arch:	linux/amd64
  Experimental:	false
Output of docker info:
Containers: 7
 Running: 4
 Paused: 0
 Stopped: 3
Images: 66
Server Version: 17.12.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 113
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: hl08z3yjk34msargicyuklvzy
 Is Manager: true
 ClusterID: k883g1werpi8dj207nr09kj3h
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 174.7.187.204
 Manager Addresses:
  174.7.187.204:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-112-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 7.682GiB
Name: REMOVED
ID: REMOVED
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: dockersaturn
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
          Yes, but that condition is a filter for when to check if a container uses it. If it evaluates to false it proceeds to untag the image.
Why would it remove a tag that is used, just because there is more than one tag? Doesn't that sound broken?
          I understand that, I don't think that changes my expectation for the behaviour. Also this issue is about services, which do use the reference when updated.
I would argue this issue is a bug with ImageDelete, not specific to prune.
Also this issue is about services, which do use the reference when updated.
With the current code/api/ux, a container or rmi has no concept of a cluster.
I would argue this issue is a bug with ImageDelete, not specific to prune.
I think this was the desired behavior for ImageDelete. Whether it is the best solution is of course arguable. The base issue seems to be that someone thought all cases can be covered without the need for untag, and that assumption doesn't seem to hold. But the bug, in this case, seems to be that prune uses the same code path as rmi although it expects a different behavior than what is documented as the intended behavior for rmi.
the bug, in this case, seems to be that prune uses the same code path as rmi although it expects a different behavior
I disagree. image prune and image rm should absolutely use the same code path and have the exact same behaviour. There is no good reason for them to behave differently in some very subtle way. That is a terrible UX.
That leaves us with two options:
1. ImageDelete is wrong, but no one has really cared until now because it was only subtly wrong.
2. The current behaviour of docker prune is correct.
I personally think it is the first case.
image prune and image rm should absolutely use the same code path and have the exact same behaviour.
rmi is a combination of untag and/or delete image. This is not subtle but a very intentional decision from the first 0.* releases, with the effect that references are treated the same as images. You can't just skip the reference check in prune (which is different, as no reference is explicit) and expect rmi to error when an image wasn't deleted, because it will just untag then. Even if we fix this container ID vs image ID check (#36346) and this starts to work again, it is almost accidental that the rmi code works here, as most of it will be impossible to reach from prune. The bug in ImageDelete is that it should be called ImageDeleteOrUntag, as it can return successfully without any image being deleted (as is clearly stated in the code comments and rmi docs).
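The untag-or-delete semantics described here can be sketched as a toy model. This is not moby's ImageDelete: the `store` type, the `removeRef` method, and the result strings are hypothetical, and the rule encoded is only what this thread describes (with more than one reference only the tag goes away, and the container check applies only to the last reference).

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical, simplified image store: it maps each reference
// (for example "myimage:1") to an image ID and records which image IDs are
// used by containers.
type store struct {
	refs  map[string]string // reference -> image ID
	inUse map[string]bool   // image ID -> used by some container
}

var errConflict = errors.New("conflict: image is used by a container (must force)")

// removeRef models the rmi behaviour discussed in this thread, not the real
// implementation. It returns "untagged" when only the reference was removed
// and "deleted" when the last reference (and so the image) went away.
func (s *store) removeRef(ref string) (string, error) {
	id, ok := s.refs[ref]
	if !ok {
		return "", fmt.Errorf("no such image: %s", ref)
	}
	// Count how many references point at the same image.
	n := 0
	for _, other := range s.refs {
		if other == id {
			n++
		}
	}
	if n > 1 {
		// Other references remain, so the tag is dropped without
		// consulting containers at all.
		delete(s.refs, ref)
		return "untagged", nil
	}
	if s.inUse[id] {
		// Last reference to an image that a container uses.
		return "", errConflict
	}
	delete(s.refs, ref)
	return "deleted", nil
}

func main() {
	s := &store{
		refs:  map[string]string{"myimage:1": "d3117424aaee", "redis:alpine": "d3117424aaee"},
		inUse: map[string]bool{"d3117424aaee": true},
	}
	res, err := s.removeRef("myimage:1")
	fmt.Printf("%q %v\n", res, err)
	res, err = s.removeRef("redis:alpine")
	fmt.Printf("%q %v\n", res, err)
}
```

In this model the first call succeeds with only an untag, and the second is refused, which is the "can return successfully without any image being deleted" case.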
rmi is a combination of untag and/or delete image. This is not subtle but very intentional decision
I agree, this is absolutely correct.
The problem is not that it untags images; the problem is that what it untags is not intuitive. When it was just the container API this was not a big deal, but with the service API and service update it is more of an issue (which is why this GitHub issue was opened). My reasoning for why this is not just a problem with prune is that the same problem exists with image rm:
$ docker image ls
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
myimage             1                   d3117424aaee        2 weeks ago         27.1MB
redis               alpine              d3117424aaee        2 weeks ago         27.1MB
$ docker service create myimage:1
lbeozsxxzluvrtkqvdqgumxho
$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
50a94c08cf19        myimage:1           "docker-entrypoint..."   10 seconds ago      Up 9 seconds        6379/tcp            serene_volhard.1.t434m24w64enows81ed5034r3
$ docker image rm myimage:1
Untagged: myimage:1
$ docker service update --force lbeozsxxzluvrtkqvdqgumxho
overall progress: 0 out of 1 tasks
1/1: No such image: myimage:1
service update paused: update paused due to failure or early termination of task 
Exact same problem with image rm, the service is left in a broken state even though no --force was used on rm. This is exactly the same issue.
The problem is, as I mentioned above, that a reference that is "in use" is allowed to be removed. It's not really relevant that the container is actually using an image ID, because the image reference needs to exist for a user to re-run the docker run command, or to run the service update command. So from the perspective of a user, it is still in use. (Maybe we need to only consider the tag if a container uses the same image ID as the tag points at; I'm not sure if that extra check is necessary.)
ImageDelete should only allow you to untag if nothing is "using" that reference. The number of references that share the same image ID should be irrelevant.
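A minimal sketch of that proposed rule, under stated assumptions: the `Container` type and `canUntag` function are hypothetical names for illustration, and the sketch assumes the daemon can recover the reference string a container was created from.

```go
package main

import "fmt"

// Container is a hypothetical, simplified view of a container: the image ID
// it runs and the reference string the user typed when creating it.
type Container struct {
	ID       string
	ImageID  string
	ImageRef string
}

// canUntag sketches the proposed rule: a reference may be untagged only if no
// container was created from that exact reference. How many other references
// share the same image ID is deliberately ignored.
func canUntag(ref string, containers []Container) bool {
	for _, c := range containers {
		if c.ImageRef == ref {
			return false
		}
	}
	return true
}

func main() {
	containers := []Container{
		{ID: "50a94c08cf19", ImageID: "d3117424aaee", ImageRef: "myimage:1"},
	}
	fmt.Println(canUntag("myimage:1", containers))
	fmt.Println(canUntag("redis:alpine", containers))
}
```

With the container from the example above, `myimage:1` would be protected while `redis:alpine` could still be untagged without `--force`.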
Issue title changed from "docker image prune --all removes in-use image" to "docker image prune/rm removes in-use image tags" on Feb 23, 2018.
Exact same problem with image rm, the service is left in a broken state even though no --force was used on rm. This is exactly the same issue.
I don't think this is the same issue. The issue you are describing is about different semantics between services and containers. A container is tied to an image, while a service is tied to a reference that should point to a highly available external storage (with a fallback case in a worker if it fails). This fallback only works if a worker node has been manually set up in a specific way to answer a possible cluster request, while the node itself has no knowledge of being in a cluster. You are conveniently mixing cluster-manager and local-node commands in that example but in reality, we need to consider these commands running on different machines. To correctly solve this containers vs services problem, we need local cluster-aware storage that keeps track of all service images without the need for an external registry.
Prune is wrong in this case regardless of whether you want to change the deletion rules (that were designed before there were services). You don't need to use a service for prune to untag your image that is being used by a container.
Did you see my example? It is identical.
You don't need to use a service for prune to untag your image that is being used by a container.
Yes, both prune and image rm are broken for both containers and services, it's just more obviously wrong with services.
You are conveniently mixing cluster-manager and local-node commands in that example but in reality, we need to consider these commands running on different machines.
It's just as valid to run them on a single machine. It needs to work for both.
I think you're making this problem way more complicated than it needs to be. It's simple. A user issues a request to delete unused images. It removes an image that the user considers in-use, and that our CLI/API shows as in-use. That sounds like a bug to me. Either the CLI/API shouldn't show the image tag, or the image delete is wrong.
A user issues a request to delete unused images. It removes an image that the user considers in-use, and that our CLI/API shows as in-use. That sounds like a bug to me
Exactly. Prune is broken. It is broken because of a typo that compared image IDs to container IDs. Let's fix that please.
If you want to change the semantics of rmi that have been there for a long time and are documented and do have clear use-cases, please create another issue/proposal to discuss that.
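The thread does not show the offending code, so the following only illustrates the class of mistake being described (a set of "used" IDs keyed by the wrong field). The `container` type and both function names are hypothetical; the actual fix is the one referenced as #36346.

```go
package main

import "fmt"

// container is a hypothetical, minimal container record.
type container struct {
	ID      string // container ID
	ImageID string // ID of the image the container was created from
}

// usedImagesBuggy keys the set by container ID, so a later lookup by image ID
// never matches and every image looks unused.
func usedImagesBuggy(cs []container) map[string]bool {
	used := make(map[string]bool)
	for _, c := range cs {
		used[c.ID] = true
	}
	return used
}

// usedImages keys the set by image ID, which is what an in-use check needs.
func usedImages(cs []container) map[string]bool {
	used := make(map[string]bool)
	for _, c := range cs {
		used[c.ImageID] = true
	}
	return used
}

func main() {
	cs := []container{{ID: "50a94c08cf19", ImageID: "d3117424aaee"}}
	// Buggy set: the image appears unused even though a container runs it.
	fmt.Println(usedImagesBuggy(cs)["d3117424aaee"])
	// Corrected set: the image is reported as in use.
	fmt.Println(usedImages(cs)["d3117424aaee"])
}
```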
It is broken because of a typo that compared image IDs to container IDs. Let's fix that please.
Keeping a map of image IDs was never the right solution. Prune is fixed; now the only issue is with image remove. image delete and image prune must behave the same way. We don't need a separate issue.
do have clear use-cases,
What is the use case? Your example of keeping at least one tag is still supported by my proposed behaviour.
image delete and image prune must behave the same way
Why? They are very different functionality: in one, the user is requesting to remove something specific; in the other, the daemon is deciding what might be eligible for removal.
We shouldn't break the image delete behavior to fix a bug in prune. We have been fairly consistent in not breaking this sort of behavior because it tends to have more unexpected consequences for users than the original behavior. Honestly, it is confusing either way so we just need to make it clear in documentation. I would rather avoid changing this behavior now as it will have an impact on any future changes involving referencing/tagging.
          prune should just be an automated version of image rm. That should be the only difference. Otherwise it's unnecessary confusion for the user.
In the case of image rm the engine still needs to figure out if it's eligible for removal as well. It's even documented here:
      moby/daemon/image_delete.go, lines 47 to 54 (commit 8f6a40a)
We shouldn't break the image delete behavior to fix a bug in prune
We aren't. We're fixing a bug in image rm:
$ docker pull redis:alpine
$ docker tag redis:alpine myimage:1
$ docker run -d myimage:1
552ae8f1134235f9c7d41ec220c9978cb51fd387d4a2d2cd4215b108950ee7c1
$ docker container ls --format '{{.Image}}'
myimage:1
$ docker image rm myimage:1
Untagged: myimage:1
$ docker image rm redis:alpine
Error response from daemon: conflict: unable to remove repository reference "redis:alpine" (must force) - container 20c7d341f7f5 is using its referenced image d3117424aaee
$ docker container ls --format '{{.Image}}'
d3117424aaee
I don't know how you can say this isn't broken. It's fine to untag a reference that IS being used and referenced by a container, but it's not ok to remove a completely unreferenced tag?
If we have to keep a tag around, we should keep the tag that's actually being used.
For example:
- tagging image with myimage-backup without removing the container and freeing up the reference to be used by the new container with an option to rollback to the old image
- clearing up junk images that are duplicates of used ones
- untagging myrepo:latest when you know that it is not up-to-date anymore. so that when new containers are created they would point to/pull a new image
prune should just be an automated version of image rm. That should be the only difference. Otherwise it's unnecessary confusion for the user.
I explained the differences 3 comments above. Prune is about removing unused images, rmi is not only about removing images but also about untagging a specific reference that user has picked out.
Those untagging rules are running before determining conflicts and documented just 10 lines before the conflicts documentation.
I don't know how you can say this isn't broken.
It isn't broken because this is what it was designed to do and how it is documented. That design idea/compromise was that a container is not related to a reference but to an actual image. Changing a reference after creating a container has no effect on the container after that; it is only a pointer for finding an image at creation time. rmi isn't the only way to untag an image; you can do the same thing with tag and pull. Your example is much easier to understand than these cases. The user explicitly asks to remove or untag the image. It is successfully untagged. It should not be a surprise at all that it can't be used in a completely new docker run. The container list shows a link to the parent image, so all the tools using that link still work (as well as restarting the container). The ref doesn't disappear from the container list because the reference was deleted, but because the list code knows that references are mutable and always checks if whatever string the user typed when starting a container would still be valid as a reference to that image. Inspect always shows the image ID, and inspecting images shows all current refs as equal.
In case you have better ideas, feel free to make proposals, but finding a documented behavior that has weird edge cases does not make it a bug. rmi behaves as documented; prune does not behave as documented.
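The listing behaviour described in this comment (show the typed reference only while it still resolves to the container's image) can be sketched roughly as follows. The `imageColumn` function and the plain map standing in for the reference store are hypothetical, not the daemon's list code.

```go
package main

import "fmt"

// imageColumn returns what a container listing would show under the rule
// described above: the reference the user typed, if it still resolves to the
// container's image, otherwise the image ID.
func imageColumn(typedRef, imageID string, refs map[string]string) string {
	if id, ok := refs[typedRef]; ok && id == imageID {
		return typedRef
	}
	return imageID
}

func main() {
	// reference -> image ID, as in the redis:alpine / myimage:1 example.
	refs := map[string]string{
		"myimage:1":    "d3117424aaee",
		"redis:alpine": "d3117424aaee",
	}
	fmt.Println(imageColumn("myimage:1", "d3117424aaee", refs))

	// Equivalent of: docker image rm myimage:1
	delete(refs, "myimage:1")
	fmt.Println(imageColumn("myimage:1", "d3117424aaee", refs))
}
```

This reproduces the switch from `myimage:1` to `d3117424aaee` seen in the `docker container ls --format '{{.Image}}'` output above.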
tagging image with myimage-backup without removing the container and freeing up the reference to be used by the new container with an option to rollback to the old image
This is still supported by my proposed behaviour because you can always build or image tag over an existing tag. You don't even need to delete the original tag. Why would you need to "free it"?
clearing up junk images that are duplicates of used ones
Should still be supported as long as no container is using them. If a container is using them they aren't junk, right? Or if you really want to you can always -f.
untagging myrepo:latest when you know that it is not up-to-date anymore. so that when new containers are created they would point to/pull a new image
This is the only use case that isn't directly supported. However, a more straightforward way of doing this would be to docker pull myrepo:latest. Why just rmi the thing instead of updating it directly?
It isn't broken because this is what it was designed to do and how it is documented
That's fair. Broken is the wrong word for it. I think it's unexpected behaviour that needs to be fixed, so you're correct that I should open a new issue for this. I have opened #36435.
I still maintain that image prune and image rm should behave the same way (with the exception of one being automated). Not because image rm should act like prune, but because consistent and expected behaviour are important qualities of an API. So I consider this issue (#36295) as a wont-fix, as it will be addressed by #36435.
      Is there an option to delete old (tagged) docker image AUTOMATICALLY if they are not associated to container
      caprover/caprover#234
          We have images tagged with both :latest and :[version], and we use :[version] in our compose file. After pruning the images and restarting the server/Docker, the services just refuse to work (without re-deploying).
This is certainly wrong.
          copying from docker/cli#2247 (comment)
I agree that for docker image prune -a this is definitely confusing;

If an image is in use, and tagged under multiple names, then running docker image prune will remove tags for the image that is in use but will keep 1 tag. Not sure if the tags that are removed are "random" or explicitly include the tag that's actually in use, but in the example below, the tag that's actually used is removed, while one of the unused tags is kept.
If possible, I think that that behaviour (at least for docker image prune) should be changed.
example
Pull the busybox:latest image, and tag the image under some other names:
docker image pull busybox:latest
docker image tag busybox:latest  thajeztah/busybox:latest
docker image tag busybox:latest  thajeztah/busybox2:latest
there's now three tags for the same image
docker image ls
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
busybox              latest              6d5fcfe5ff17        13 days ago         1.22MB
thajeztah/busybox2   latest              6d5fcfe5ff17        13 days ago         1.22MB
thajeztah/busybox    latest              6d5fcfe5ff17        13 days ago         1.22MB
Now, run a container, using the busybox:latest tag
docker run -dit --name mycontainer busybox:latest
Prune images with the -a/--all option set
docker image prune -a
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
untagged: busybox:latest
untagged: busybox@sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
untagged: thajeztah/busybox2:latest
Total reclaimed space: 0B
Notice that in this case, it removed the thajeztah/busybox2:latest and busybox:latest tags (even though the container was using the latter), but kept one tag so that the image is still referenced.
Removing the container, and running docker image prune -a again will remove the remaining tag for the image, and will remove the image itself:
docker rm -f mycontainer
docker image prune -a
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
untagged: thajeztah/busybox:latest
deleted: sha256:6d5fcfe5ff170471fcc3c8b47631d6d71202a1fd44cf3c147e50c8de21cf0648
deleted: sha256:195be5f8be1df6709dafbba7ce48f2eee785ab7775b88e0c115d8205407265c5
Total reclaimed space: 1.22MB
To make the behavior less surprising, I think it should (for docker image prune) remove the unused tag(s), and keep the tags that are in use.
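A rough sketch of that suggested behaviour, using the tags from the example: `splitTags` and the `usedRefs` set are hypothetical, and the sketch assumes the daemon knows which reference each container was started from.

```go
package main

import "fmt"

// splitTags partitions the references of one image into those to keep
// (a container was started from them) and those that may be untagged.
func splitTags(refs []string, usedRefs map[string]bool) (keep, untag []string) {
	for _, r := range refs {
		if usedRefs[r] {
			keep = append(keep, r)
		} else {
			untag = append(untag, r)
		}
	}
	return keep, untag
}

func main() {
	refs := []string{
		"busybox:latest",
		"thajeztah/busybox2:latest",
		"thajeztah/busybox:latest",
	}
	// mycontainer was started from busybox:latest.
	used := map[string]bool{"busybox:latest": true}

	keep, untag := splitTags(refs, used)
	fmt.Printf("keep=%v untag=%v\n", keep, untag)
}
```

Under this rule the prune in the example would keep `busybox:latest` and untag the two `thajeztah/...` tags, rather than the other way round.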
          Maybe some helpful input:

This was working as expected in Docker API version 1.26 but currently not on version 1.39.
Tried to look into the source code but I'm far too inexperienced in Golang to be able to help more unfortunately. Hope it is solved soon since it's a great feature when working :)
          Think I ran into this issue as well, I build and deploy images like so:
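for dir in onfw fhwb fnwb nfwb jurwb; do
  cd "$dir/dockerapp"
  export APPNAME=$dir
  # VERSION is not set in this snippet; it is assumed to come from the environment
  docker build --build-arg EXISTADDONSVERSION=2.5-SNAPSHOT --secret id=adminpw,src=adminpw -t "${APPNAME}:${VERSION}" .
  # remove the old stack, wait for it to shut down, then redeploy
  docker stack rm "$APPNAME"
  sleep 15
  docker stack deploy --compose-file docker-compose.yml "$APPNAME"
  cd ../..
done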
image ls will now show 5 images

Regularly I run docker system prune -a

The 4 images last created are removed, although in use.
isSingleReference() iterates through the references in order of creation without checking whether any container exists; since isSingleReference() returns false, we proceed to remove the references.
The above-mentioned behavior doesn't happen when using docker image rm [IMAGE_ID], since it relies on a different set of checks before deleting, and moreover relies on imageDeleteHelper().

      moby/daemon/images/image_delete.go, lines 157 to 164 (commit 2f74fa5)
Also, while looking into this, I found a small discrepancy in behavior between the two flows.
docker image rm [IMAGE_ID] gives the error below if a container using the same image is running (it cannot be forced):
root@a46e2988ae4f:/go/src/github.com/docker/docker# docker ps -a
DEBU[2021-08-22T21:31:20.412062382Z] Calling GET /_ping                           
DEBU[2021-08-22T21:31:20.412332961Z] Calling GET /v1.30/containers/json?all=1     
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
19d35ebceb80        busybox:latest      "sh"                12 minutes ago      Up 1 second                             mycontainer
----------------------------------------------
Error response from daemon: conflict: unable to delete 42b97d3c2ae9 (cannot be forced) - image is being used by running container 19d35ebceb80
docker image rm [IMAGE_NAME] gives the error below in the same scenario (it can be forcibly removed successfully):
Error response from daemon: conflict: unable to remove repository reference "busybox" (must force) - container 19d35ebceb80 is using its referenced image 42b97d3c2ae9
To fix the issue at hand, I'm thinking of moving the check mentioned here into the if expression, so as to give priority to running containers.

https://github.com/moby/moby/blob/master/daemon/images/image_delete.go#L89
Since the second discrepancy isn't documented anywhere (I couldn't find it, at least), and assuming it's an issue, I think we should re-use the same checks across the two variations (by ID ref and by repo ref).
Please let me know if this approach is valid; I will push the bug fix after that.