Restrict a Container's Syscalls with seccomp

FEATURE STATE: Kubernetes v1.19 [stable]

Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12. It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a node to your Pods and containers.

Identifying the privileges required for your workloads can be difficult. In this tutorial, you will learn how to load seccomp profiles into a local Kubernetes cluster, how to apply them to a Pod, and how to begin crafting profiles that grant only the privileges your container processes need.

Objectives

  • Learn how to load seccomp profiles on a node
  • Learn how to apply a seccomp profile to a container
  • Observe auditing of syscalls made by a container process
  • Observe behavior when a missing profile is specified
  • Observe a violation of a seccomp profile
  • Learn how to create fine-grained seccomp profiles
  • Learn how to apply a container runtime default seccomp profile

Before you begin

To complete all steps in this tutorial, you must install kind and kubectl.

This tutorial shows some examples that are still beta (since v1.25) and others that use only generally available seccomp functionality. You should make sure that your cluster is configured correctly for the version you are using.

The tutorial also uses the curl tool for downloading examples to your computer. You can adapt the steps to use a different tool if you prefer.

Download example seccomp profiles

The contents of these profiles will be explored later on, but for now go ahead and download them into a directory named profiles/ so that they can be loaded into the cluster.
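If you prefer a script over curl, the download step can be sketched roughly as follows. This is a minimal illustration rather than an official tool: the base URL is an assumption about where the example files are published, and the helper names `plan_downloads` and `download_profiles` are hypothetical. Adjust both to match your environment.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: location where the example profiles are published.
# Replace this if the files live somewhere else for you.
BASE_URL = "https://k8s.io/examples/pods/security/seccomp/profiles"

# The three example profiles used in this tutorial.
PROFILE_NAMES = ["audit.json", "violation.json", "fine-grained.json"]


def plan_downloads(dest_dir="profiles", base_url=BASE_URL):
    """Return (url, local_path) pairs, one per example profile."""
    dest = Path(dest_dir)
    return [(f"{base_url}/{name}", dest / name) for name in PROFILE_NAMES]


def download_profiles(dest_dir="profiles", base_url=BASE_URL):
    """Create dest_dir if needed and fetch every profile into it."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for url, path in plan_downloads(dest_dir, base_url):
        urlretrieve(url, str(path))  # performs the network request
        print(f"saved {path}")
```

Calling `download_profiles()` from your working directory should leave the three files under profiles/, ready to be loaded into the cluster.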

pods/security/seccomp/profiles/audit.json

    {
        "defaultAction": "SCMP_ACT_LOG"
    }

pods/security/seccomp/profiles/violation.json

    {
        "defaultAction": "SCMP_ACT_ERRNO"
    }

pods/security/seccomp/profiles/fine-grained.json

    {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [
            "SCMP_ARCH_X86_64",
            "SCMP_ARCH_X86",
            "SCMP_ARCH_X32"
        ],
        "syscalls": [
            {
                "names": [
                    "accept4", "epoll_wait", "pselect6", "futex",
                    "madvise", "epoll_ctl", "getsockname", "setsockopt",
                    "vfork", "mmap", "read", "write",
                    "close", "arch_prctl", "sched_getaffinity", "munmap",
                    "brk", "rt_sigaction", "rt_sigprocmask", "sigaltstack",
                    "gettid", "clone", "bind", "socket",
                    "openat", "readlinkat", "exit_group", "epoll_create1",
                    "listen", "rt_sigreturn", "sched_yield", "clock_gettime",
                    "connect", "dup2", "epoll_pwait", "execve",
                    "exit", "fcntl", "getpid", "getuid",
                    "ioctl", "mprotect", "nanosleep", "open",
                    "poll", "recvfrom", "sendto", "set_tid_address",
                    "setitimer", "writev"
                ],
                "action": "SCMP_ACT_ALLOW"
            }
        ]
    }
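To build intuition for how the three profiles differ, the following sketch models the decision a profile encodes: a syscall listed in a rule gets that rule's action, and anything else falls back to defaultAction. It is a simplified illustration only. The helper `action_for` is hypothetical, the embedded profile is a shortened copy of fine-grained.json, and real seccomp filtering happens in the kernel and can also take architectures and syscall arguments into account, which this model ignores.

```python
import json

# Shortened copy of fine-grained.json, kept small for illustration.
PROFILE_TEXT = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "close", "openat"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def action_for(profile, syscall):
    """Return the action a profile assigns to a syscall name.

    The first rule that lists the name wins; otherwise the
    profile's defaultAction applies.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


profile = json.loads(PROFILE_TEXT)
print(action_for(profile, "read"))   # SCMP_ACT_ALLOW
print(action_for(profile, "mkdir"))  # SCMP_ACT_ERRNO
```

Under this model, audit.json maps every syscall to SCMP_ACT_LOG and violation.json maps every syscall to SCMP_ACT_ERRNO, because neither defines any rules beyond the default.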