If your container is not starting, or not behaving as you would expect,
the first thing to do is to look at the console logs generated by the
container, using the
lxc console --show-log CONTAINERNAME command.
In this example, we will investigate a RHEL 7 system in which systemd
cannot start.

    # lxc console --show-log systemd

    Console log:

    Failed to insert module 'autofs4'
    Failed to insert module 'unix'
    Failed to mount sysfs at /sys: Operation not permitted
    Failed to mount proc at /proc: Operation not permitted
    [!!!!!!] Failed to mount API filesystems, freezing.

The errors here say that /sys and /proc cannot be mounted, which is expected in an unprivileged container. However, LXD mounts these filesystems automatically if it can.
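
To check a saved console log for this failure signature, you can search it for the mount errors shown above. This is a minimal sketch: the function name `has_mount_failure` and the log file path are illustrative assumptions, not part of LXD.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): succeed if a saved console log
# contains the "Failed to mount ... Operation not permitted" lines for
# /sys or /proc.
has_mount_failure() {
    # $1: file holding the output of `lxc console --show-log NAME`
    grep -Eq "Failed to mount .* at /(sys|proc): Operation not permitted" "$1"
}

# Example usage (assumes a container named "systemd"):
#   lxc console --show-log systemd > /tmp/console.log
#   has_mount_failure /tmp/console.log && echo "API filesystems could not be mounted"
```
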
The container requirements specify that
every container must come with empty
/dev, /proc, and /sys folders, as well as
/sbin/init existing. If those folders don't
exist, LXD will be unable to mount to them, and systemd will then
try to. As this is an unprivileged container, systemd does not have
the ability to do this, and it then freezes.
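
If you have the template's root filesystem unpacked on disk, a short check like the one below can confirm the requirements before you build an image from it. This is a sketch only: the function name `check_rootfs` is hypothetical, and it assumes the required empty folders are /dev, /proc and /sys, alongside /sbin/init.

```shell
#!/bin/sh
# Hypothetical helper: check an unpacked root filesystem for the paths the
# container requirements call for. Assumes the required folders are
# /dev, /proc and /sys, plus an existing /sbin/init.
check_rootfs() {
    rootfs="$1"
    rc=0
    for dir in dev proc sys; do
        if [ ! -d "$rootfs/$dir" ]; then
            echo "missing directory: /$dir"
            rc=1
        fi
    done
    # /sbin/init is often a symlink, so accept a link as well as a file.
    if [ ! -e "$rootfs/sbin/init" ] && [ ! -L "$rootfs/sbin/init" ]; then
        echo "missing file: /sbin/init"
        rc=1
    fi
    return "$rc"
}

# Example usage (the path is a placeholder):
#   check_rootfs /path/to/unpacked/rootfs || echo "template is incomplete"
```
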
So that you can see the environment before anything is changed, you can
explicitly change the init in a container using the
raw.lxc config parameter. This is equivalent to setting
init=/bin/bash on the Linux kernel command line.

    lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'

Here is what it looks like:

    root@lxc-01:~# lxc config set systemd raw.lxc 'lxc.init.cmd = /bin/bash'
    root@lxc-01:~# lxc start systemd
    root@lxc-01:~# lxc console --show-log systemd

    Console log:

    [root@systemd /]#
    root@lxc-01:~#

Now that the container has started, you can look in it and see that things are not running as well as expected.

    root@lxc-01:~# lxc exec systemd bash
    [root@systemd ~]# ls
    [root@systemd ~]# mount
    mount: failed to read mtab: No such file or directory
    [root@systemd ~]# cd /
    [root@systemd /]# ls /proc/
    sys
    [root@systemd /]# exit

Because LXD tries to auto-heal, it did create some of the folders when it was starting up. Shutting down and restarting the container will fix the problem, but the original cause is still there - the template does not contain the required files.
In a larger production environment, it is common to have multiple VLANs and have LXD clients attached directly to those VLANs. Be aware that if you are using netplan and systemd-networkd, you will encounter some bugs that could cause catastrophic issues.
At the time of writing (2019-03-05), netplan cannot assign a random MAC address to
a bridge attached to a VLAN. It always picks the same MAC address, which causes
layer 2 issues when you have more than one machine on the same network segment.
It also has difficulty creating multiple bridges. Make sure you use
network-manager instead. An example config is below, with a management
address of 10.61.0.25, and VLAN102 being used for client traffic.

    network:
      version: 2
      renderer: NetworkManager
      ethernets:
        eth0:
          dhcp4: no
          accept-ra: no
          # This is the 'Management Address'
          addresses: [ 10.61.0.25/24 ]
          gateway4: 10.61.0.1
          nameservers:
            addresses: [ 220.127.116.11, 18.104.22.168 ]
        eth1:
          dhcp4: no
          accept-ra: no
          # A bogus IP address is required to ensure the link state is up
          addresses: [ 10.254.254.25/32 ]

      vlans:
        vlan102:
          accept-ra: no
          dhcp4: no
          id: 102
          link: eth1

      bridges:
        br102:
          accept-ra: no
          dhcp4: no
          interfaces: [ "vlan102" ]
          # A bogus IP address is required to ensure the link state is up
          addresses: [ 10.254.102.25/32 ]
          parameters:
            stp: false

- eth0 is the Management interface, with the default gateway.
- vlan102 uses eth1.
- br102 uses vlan102, and has a bogus /32 IP address assigned to it.
The other important thing is to set
stp: false, otherwise the bridge will sit in the
learning state for up to 10 seconds, which is longer than most DHCP requests
last. As there is no possibility of cross-connecting and causing loops, this is
safe to do.
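
Once the configuration has been applied, you can confirm that STP is really off by reading the bridge's `stp_state` file in sysfs, where the kernel reports 0 when STP is disabled. The helper below is a sketch: the function name `stp_disabled` and its optional second argument (an alternative sysfs root, useful for testing) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if STP is disabled on the given bridge.
# The kernel reports 0 in stp_state when STP is off.
stp_disabled() {
    bridge="$1"
    sysfs="${2:-/sys}"   # overridable root, an assumption made for testing
    state_file="$sysfs/class/net/$bridge/bridge/stp_state"
    if [ ! -r "$state_file" ]; then
        echo "no such bridge: $bridge" >&2
        return 2
    fi
    [ "$(cat "$state_file")" = "0" ]
}

# Example usage:
#   stp_disabled br102 && echo "STP is off on br102"
```
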
Many switches do not allow MAC address changes, and will either drop traffic with an incorrect MAC or disable the port entirely. If you can ping an LXD instance from the host, but cannot ping it from a different host, this could be the cause. To diagnose this, run tcpdump on the uplink (in this case, eth1). You will see one of two things: either 'ARP Who has xx.xx.xx.xx tell yy.yy.yy.yy' requests, to which you send responses that are never acknowledged, or ICMP packets going in and out successfully but never being received by the other host.
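
The capture described above can be narrowed to ARP and ICMP traffic for the affected instance. The wrapper below is a sketch rather than a prescribed procedure: the function name `watch_uplink` and the `DRY_RUN` switch are assumptions added for illustration, while the tcpdump options (`-n`, `-e`, `-i`) and the filter expression are standard.

```shell
#!/bin/sh
# Hypothetical wrapper: watch ARP and ICMP traffic for one instance on the
# uplink. Set DRY_RUN=1 to print the command instead of running it.
watch_uplink() {
    iface="$1"   # uplink interface, e.g. eth1
    ip="$2"      # address of the LXD instance that cannot be reached
    # -n: no name resolution, -e: print MAC addresses on each line
    set -- tcpdump -n -e -i "$iface" "host $ip and (arp or icmp)"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example usage (run as root; replace INSTANCE_IP with the instance address):
#   watch_uplink eth1 INSTANCE_IP
```

With `-e`, each line shows the source and destination MAC addresses, which makes it easier to see whether replies are leaving with the instance's MAC rather than the host's.
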
A privileged container can do things that affect the entire host. For example, it can use things in /sys to reset the network card, which resets it for the entire host and causes network blips. Almost everything can be run in an unprivileged container. For things that require unusual privileges, such as mounting NFS filesystems inside the container, you may need to use bind mounts.
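
One way to set up such a bind mount is to mount the NFS share on the host and pass the directory into the container as a disk device with `lxc config device add`. The sketch below assumes the share is already mounted on the host; the function name `share_host_dir`, the `DRY_RUN` switch, and all paths and device names in the usage example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: bind-mount a host directory (for example an NFS share
# already mounted on the host) into a container as a disk device.
# Set DRY_RUN=1 to print the command instead of running it.
share_host_dir() {
    container="$1"   # container name
    device="$2"      # device name, chosen by you
    host_path="$3"   # directory on the host
    target="$4"      # mount point inside the container
    set -- lxc config device add "$container" "$device" disk \
        "source=$host_path" "path=$target"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example usage (names and paths are placeholders):
#   share_host_dir mycontainer nfsdata /mnt/nfs/data /data
```
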