Cgroups for the best

Using container technologies as a PaaS provider

@m4d_z
alwaysdata
We're Gonna Talk About the Future

You Wanna Use
Microservices

You Need to Run Containers
Operating In Prod Mode May Be Hazardous

The Microservices Architecture

Reminder: Microservices

  • Collection of Services
  • Loosely Coupled
  • Business Capabilities Oriented
  • Independently Deployable On the Fly
microservices.io

Microservices

  • Business oriented: one service only does one thing
    (e.g. a settings manager module)
  • Exposes an API: each service allows interaction through a standardized API
    (e.g. the settings manager exposes getters/setters for settings)
  • Is independent: a service doesn’t require another one
    (e.g. the interface service gets settings from the manager, or falls back to defaults if it is unavailable)
  • Relies on bus messages: a service publishes changes to a common bus
    (e.g. the settings manager pushes changes to the bus and the interface service subscribes to those messages to update)
  • Runs Stateless: data are stored on dedicated storage backends
    (e.g. the settings manager stores values to a shared DB)
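
The settings manager example above can be sketched in a few lines. This is only an illustration under assumptions: the `Bus`, `SettingsManager` and `InterfaceService` classes, the `settings.changed` topic and the in-memory dict standing in for the shared DB are all hypothetical names, not part of the talk.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class Bus:
    """Tiny in-process stand-in for a message bus (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str, str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, key: str, value: str) -> None:
        for handler in self._subscribers[topic]:
            handler(key, value)


class SettingsManager:
    """Does one thing: stores settings and announces changes on the bus."""

    def __init__(self, bus: Bus, storage: dict[str, str]) -> None:
        self._bus = bus
        self._storage = storage  # assumption: stands in for the shared DB

    def get(self, key: str) -> str:
        return self._storage[key]

    def set(self, key: str, value: str) -> None:
        self._storage[key] = value
        self._bus.publish("settings.changed", key, value)


class InterfaceService:
    """Keeps working when the manager is missing by falling back to defaults."""

    DEFAULTS = {"theme": "light"}

    def __init__(self, bus: Bus, manager: SettingsManager | None) -> None:
        self._manager = manager
        self._cache: dict[str, str] = {}
        bus.subscribe("settings.changed", self._on_change)

    def _on_change(self, key: str, value: str) -> None:
        self._cache[key] = value

    def setting(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        try:
            if self._manager is None:
                raise ConnectionError("settings manager unavailable")
            return self._manager.get(key)
        except (ConnectionError, KeyError):
            return self.DEFAULTS[key]
```

The interface service never calls the manager to learn about updates: it only reacts to bus messages, which is what keeps the two services loosely coupled.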

Do You Need Microservices?

  • You want to split a complex monolith architecture
  • Your business is spread across different units
  • You need real scaling capabilities
  • Your team can be split into multiple small projects
  • Your team has solid DevOps skills

When Not to Use Them?

  • You’re not ready for observability
  • Your team doesn’t have DevOps roles
  • You don’t even know containers
  • You’re running the Web version of Flappy Bird

The Multi-Languages Architecture

  • Components vs Services
  • Doesn’t need high DevOps skills
  • Offers the Benefits of Microservices
    (e.g. Scalability, Flexibility, Maintenance)
  • Flappy Bird Compatible!

Handling the Containers

Running In Isolation

Isolation Benefits

  • Preventing data leaks
  • Distributing resources
  • Sandboxing environments
    (e.g. versions, libraries)
  • Improving Observability

You may think that Isolation means Containers
like k8s, LXC/Jails, Virtual Machines…
but You’re Wrong

The POSIX Basics

  • Processes: isolation of execution
  • I/O Controls: isolation of access
  • Message Passing: isolation of communication
  • Permissions: isolation of resources

You do not need containers
You need a safe isolation system

The Underlying Technology: Cgroups

  • Used by every container technology
    (Docker, containerd, LXC, etc.)
  • Used at system level
    (systemd)
  • Kernel-level isolation

Running On Your Own

  • Pros:
    • Built into the Kernel
    • Lots of documentation
    • Standardized
  • Cons:
    • Needs Strong Linux Skills
    • Not “just push an image”
Containers vs Processes

Building the Platform

We built a Cloud Platform
before the Cloud Era,
15 years ago

Cloud Definition

  • High Availability
  • Elastic Scalability
  • Embedded Services
  • Edge Infrastructure
  • Native Isolation

Cgroups Based Isolation

  • Isolation per User
  • Cgroup per Process per User
  • POSIX Permissions for Resources
# ls -l /sys/fs/cgroup/system.slice/container.service/users/intranet/proxy/apache.upstream

-r--r--r-- 1 root root 0 Apr 13 14:37 cgroup.controllers
-r--r--r-- 1 root root 0 Apr 13 14:37 cgroup.events
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.freeze
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.max.depth
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.max.descendants
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.procs
-r--r--r-- 1 root root 0 Apr 13 14:37 cgroup.stat
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.subtree_control
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.threads
-rw-r--r-- 1 root root 0 Apr 13 14:37 cgroup.type
...
# cat /sys/fs/cgroup/[...]/apache.upstream/cgroup.procs

986352
1297585
1297586
1297587
3598699
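
The listing above can also be read programmatically. The sketch below assumes a cgroup v2 tree where each leaf directory exposes a `cgroup.procs` file with one PID per line; the helper names `read_pids` and `pids_by_cgroup` are hypothetical.

```python
from pathlib import Path


def read_pids(cgroup: Path) -> list[int]:
    """Return the PIDs listed in a cgroup's cgroup.procs file (one per line)."""
    return [int(line) for line in (cgroup / "cgroup.procs").read_text().split()]


def pids_by_cgroup(root: Path) -> dict[str, list[int]]:
    """Walk a cgroup subtree and map each non-empty cgroup to its PIDs."""
    result: dict[str, list[int]] = {}
    for procs_file in sorted(root.rglob("cgroup.procs")):
        try:
            pids = read_pids(procs_file.parent)
        except OSError:
            # Some cgroups cannot be read this way; skip them in this sketch.
            continue
        if pids:
            result[procs_file.parent.relative_to(root).as_posix()] = pids
    return result
```

Pointed at a per-user subtree, `pids_by_cgroup` gives a quick view of which processes belong to which user and service.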

Setting the Limits

  • Cgroups Native Capability
  • Writing into files
  • Capping Hardware Resources
  • Using Kernel balancing features
# cat /sys/fs/cgroup/[...]/apache.upstream/memory.max

4294967296
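
Since limits are plain files, setting one is a single write. The sketch below is a minimal illustration: `to_bytes`, `set_memory_max` and `get_memory_max` are hypothetical helpers, and the unit suffixes are a convenience of this example. In cgroup v2, `memory.max` holds a byte count or the literal `max` for "no limit".

```python
from pathlib import Path
from typing import Optional

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> int:
    """Convert a size such as '4G' or '512M' to a number of bytes."""
    size = size.strip().upper()
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def set_memory_max(cgroup: Path, size: str) -> int:
    """Write the hard memory limit of a cgroup and return it in bytes."""
    limit = to_bytes(size)
    (cgroup / "memory.max").write_text(f"{limit}\n")
    return limit


def get_memory_max(cgroup: Path) -> Optional[int]:
    """Read the hard memory limit; None means unlimited ('max')."""
    raw = (cgroup / "memory.max").read_text().strip()
    return None if raw == "max" else int(raw)
```

With these helpers, `set_memory_max(cgroup, "4G")` writes 4294967296, the value shown in the `memory.max` output above.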

Orchestrating: Running the Containers

  • Need a Central Interface
    (e.g. à la Kubernetes)
  • Manage Cgroups CRUD
  • Interact with Users’ Profiles
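
A central interface for cgroup CRUD can stay very small. The sketch below is not the platform's actual orchestrator: the `CgroupManager` class and its `users/<user>/<name>` layout are assumptions made for illustration. It relies only on how the cgroup filesystem behaves: `mkdir` creates a cgroup, control files are read and written as text, and `rmdir` removes an empty cgroup.

```python
from pathlib import Path


class CgroupManager:
    """Create, read, update and delete per-user cgroups under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _path(self, user: str, name: str) -> Path:
        # Refuse components that could escape the users/ subtree.
        for part in (user, name):
            if not part or "/" in part or part in (".", ".."):
                raise ValueError(f"invalid cgroup component: {part!r}")
        return self.root / "users" / user / name

    def create(self, user: str, name: str) -> Path:
        path = self._path(user, name)
        path.mkdir(parents=True, exist_ok=True)
        return path

    def read(self, user: str, name: str, control: str) -> str:
        return (self._path(user, name) / control).read_text().strip()

    def update(self, user: str, name: str, control: str, value: str) -> None:
        # On a real cgroup v2 tree, the matching controller must be enabled
        # in the parent's cgroup.subtree_control for this file to exist.
        (self._path(user, name) / control).write_text(f"{value}\n")

    def delete(self, user: str, name: str) -> None:
        # The kernel only allows removing a cgroup with no processes or children.
        self._path(user, name).rmdir()
```

Keeping every operation behind one class gives a single place to validate user input and to log who changed which limit.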

Interfacing: PAM and iptables

  • Sandboxing any process
  • SSH process: PAM script
  • Network: iptables for private IPs
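
One possible shape for the PAM script, assuming it is run through `pam_exec`, which exports `PAM_USER` and `PAM_TYPE` to the script it launches. The `users/<user>/ssh` layout, the function names and the choice of moving the parent process are assumptions of this sketch, not details from the talk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def attach(pid: int, cgroup: Path) -> None:
    """Move a process into a cgroup by writing its PID to cgroup.procs."""
    (cgroup / "cgroup.procs").write_text(f"{pid}\n")


def on_session_open(
    root: Path,
    environ: Mapping[str, str] = os.environ,
    pid: Optional[int] = None,
) -> Optional[Path]:
    """Sandbox a new session; return its cgroup, or None for other PAM events."""
    if environ.get("PAM_TYPE") != "open_session":
        return None
    user = environ["PAM_USER"]
    if not user or "/" in user or user in (".", ".."):
        raise ValueError(f"unexpected user name: {user!r}")
    cgroup = root / "users" / user / "ssh"  # hypothetical layout
    cgroup.mkdir(parents=True, exist_ok=True)
    # Assumption: the parent of this script is the process serving the
    # session, so processes it forks later start in the same cgroup.
    attach(pid if pid is not None else os.getppid(), cgroup)
    return cgroup
```

Because child processes start in their parent's cgroup, attaching the session process once is enough to cover everything the user launches from that session.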

Bonus
Patching the Kernel
avoids the need for iptables

diff -ru linux-5.10.1/net/ipv4/af_inet.c linux-5.10.1~/sources/net/ipv4/af_inet.c
--- linux-5.10.1/net/ipv4/af_inet.c	2020-12-14 19:33:01.000000000 +0100
+++ linux-5.10.1~/net/ipv4/af_inet.c	2020-12-16 15:16:26.195915654 +0100
@@ -464,10 +464,23 @@
 	struct sockaddr_in *addr = (struct sockaddr_in *)uaddr;
 	struct inet_sock *inet = inet_sk(sk);
 	struct net *net = sock_net(sk);
-	unsigned short snum;
+	unsigned short snum = ntohs(addr->sin_port);
 	int chk_addr_ret;
 	u32 tb_id = RT_TABLE_LOCAL;
 	int err;
+	int gid = current_gid().val;
+
+	if (gid >= 2000 && addr->sin_port) {
+		int ad_requested_ip = ntohl(addr->sin_addr.s_addr);
+		int ad_private_ip = 0x7f000000 | gid;
+
+		if (ad_requested_ip == INADDR_ANY && snum >= 8000 && snum < 8300)
+			addr->sin_addr.s_addr = htonl(ad_private_ip);
+		else if ((ad_requested_ip & 0xff000000) == 0x7f000000 &&
+			 (ad_requested_ip & 0x00ffffff) != gid &&
+			 (ad_requested_ip & 0x00ffffff) >= 2000)
+			return -EACCES;
+	}
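
To make the rule in the patch easier to follow, here is a user-space model of the same decision in Python. It only mirrors the conditions visible in the hunk above; the function names are hypothetical, and it assumes GIDs fit in the low 24 bits of a `127.0.0.0/8` address.

```python
import ipaddress

LOOPBACK_NET = 0x7F000000  # 127.0.0.0
MIN_GID = 2000             # the patch only applies from this GID upwards


def private_ip(gid: int) -> str:
    """Loopback address reserved for a group: 127.0.0.0 OR-ed with the GID."""
    return str(ipaddress.IPv4Address(LOOPBACK_NET | gid))


def check_bind(gid: int, address: str, port: int) -> str:
    """Return the address the bind ends up using, or raise PermissionError."""
    requested = int(ipaddress.IPv4Address(address))
    if gid >= MIN_GID and port:
        if requested == 0 and 8000 <= port < 8300:
            # Binding 0.0.0.0 on the reserved port range is silently
            # rewritten to the group's private loopback address.
            return private_ip(gid)
        host = requested & 0x00FFFFFF
        if (requested & 0xFF000000) == LOOPBACK_NET and host != gid and host >= MIN_GID:
            # Another group's private loopback address: refused (EACCES).
            raise PermissionError("address belongs to another group")
    return address


print(private_ip(2000))  # 127.0.7.208
```

For example, a process with GID 2000 binding `0.0.0.0:8080` ends up on `127.0.7.208`, while the same process trying `127.0.7.209` is refused because that address maps to GID 2001.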

Bonus
Forbid process read-access
to other users

Going Further: Namespaces

  • Partition Kernel Resources
  • Used in every container technology
  • Useful but not mandatory

POSIX & Cgroups instead of Containers?

  • as easy as mkdir & tee
  • fully-agnostic, POSIX compatible
  • no need for images
  • no orchestrator to deploy and manage
  • no extra consumption of resources

You don’t explicitly need
k8s or whatever

You need fair Isolation
on a reliable Platform

Be ready for the Future now
The Wasm-Serverless
based Architecture 🥳

m4dz

Paranoïd Web Dino · Tech Evangelist

https://www.alwaysdata.com

Thank You!

Available under licence CC BY-SA 4.0

Illustrations

m4dz, CC BY-SA 4.0

Interleaf images

Courtesy of Unsplash and Pexels contributors

Icons

  • Layout icons are from Entypo+
  • Content icons are from FontAwesome

Fonts

  • Cover Title: Sinzano
  • Titles: Argentoratum
  • Body: Mohave
  • Code: Fira Code

Tools

Powered by Reveal.js

Source code available at
https://git.madslab.net/talks