Docker problem with network changes¶
All Docker containers of one service and its commissioning file run inside a separate Docker network. This service-specific Docker network is named simply by the service ID. It connects all containers of this service and also the ingress proxy container of Connectware. The ingress proxy container is named connectware, with the docker-compose project name as prefix (e.g. connectware) and the decimal 1 as suffix, so the full container name can be connectware_connectware_1.
In addition to the service-specific networks, there is one more Docker network, which connects the Connectware-internal application containers. This internal network is named with the docker-compose project name as prefix (e.g. connectware) and the name default as suffix, for example connectware_default.
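The naming rules above can be summarized in a short sketch. The helper function names are hypothetical and exist only for illustration; the name patterns themselves are the ones described in the text.

```python
def ingress_container_name(project: str) -> str:
    """Ingress proxy container: <project>_connectware_1."""
    return f"{project}_connectware_1"


def internal_network_name(project: str) -> str:
    """Connectware-internal network: <project>_default."""
    return f"{project}_default"


def service_network_name(service_id: str) -> str:
    """A service-specific network is named by the service ID alone."""
    return service_id


# With the default docker-compose project name "connectware":
print(ingress_container_name("connectware"))
print(internal_network_name("connectware"))
print(service_network_name("anotherService"))
```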
There is one somewhat unexpected caveat in the event of changing Docker network configurations, leading to temporary loss of certain data connections each time this occurs. This event can occur only during enabling or disabling of a Connectware service. For details, read on.
Service-specific networks¶
Each time a Connectware service with container resources is enabled, a new service-specific Docker network will be created. The service-specific network will then be connected to all service containers, and also to the common ingress proxy container. Upon disabling a Connectware service, this network will be disconnected from the ingress proxy and the other containers again.
Hence, the Docker network configuration of the ingress proxy container changes every time a service is enabled or disabled, provided that service contains containers. This is not a problem as long as the network change does not change the default gateway setting of the ingress proxy.
Ingress proxy default gateway¶
Here’s the catch: Whether or not a connected Docker network sets the default gateway of a container depends on the lexicographical ordering of the network names. Yes, unfortunately this is not a joke.
Without any service-specific networks, the ingress proxy is connected to the Connectware-internal network named after the docker-compose project name, for example connectware_default. Any data connection from outside clients, e.g. to REST endpoints of Connectware, relies on the ingress proxy configuration, which should route the request from the outside to the Connectware-internal network so that the appropriate service container can respond to the specific request. This concerns both HTTP clients (for REST) and MQTT clients (for MQTT data connections).
The problem occurs if there are services whose service ID sorts lexicographically before the name of the Connectware-internal network. In the default installation, the Connectware-internal network name is connectware_default. If another service with service ID anotherService is enabled, the a comes before the c. Hence, upon enabling anotherService, the Docker runtime changes the ingress proxy's default gateway from the connectware_default network to the anotherService network.
This will cause a connection loss for several seconds, until the reachability of the internal containers is restored by trying not only the default gateway network, but one by one the additional networks. Connection loss on the order of 10 seconds has been observed, after which everything is being restored automatically.
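The ordering rule can be illustrated with a minimal sketch. It assumes that the network whose name sorts first under a plain string comparison provides the default gateway, as described above; the function names are hypothetical, and this only models the behaviour rather than querying Docker.

```python
def default_gateway_network(connected_networks):
    """Return the network name that sorts first lexicographically.

    Assumption: the default gateway comes from the network whose name
    is smallest under plain string comparison.
    """
    return min(connected_networks)


def gateway_changes(networks_before, networks_after):
    """True if connecting or disconnecting networks moves the default gateway."""
    return default_gateway_network(networks_before) != default_gateway_network(networks_after)


before = ["connectware_default"]
after = ["connectware_default", "anotherService"]

print(default_gateway_network(after))
print(gateway_changes(before, after))
```

With a service ID such as s_machine instead of anotherService, the same check reports no change, because s sorts after c.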
Observed errors¶
If the network change triggers the described error, the observed effect is that any ongoing MQTT or HTTP client loses its connection to Connectware for approx. 10 seconds. After that time, the reachability of the internal containers is automatically restored by trying the routing not only on the default gateway network but also on the additional Docker networks.
Also, if this event occurs, the Docker logs of the container-manager will contain a message of log level WARNING, stating:
Ingress container default network changed to anotherService. Was connectware_default.
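To spot this event in collected logs, a line can be matched against the message shown above. This is a minimal sketch: the regular expression and the function name are assumptions derived only from the example message, and the surrounding log format (timestamps, level prefix) is not specified here, so the pattern searches within the line.

```python
import re
from typing import Optional, Tuple

# Pattern derived from the example message; network names are assumed
# to contain no whitespace.
GATEWAY_CHANGE = re.compile(
    r"Ingress container default network changed to (\S+)\. Was (\S+)\."
)


def parse_gateway_change(line: str) -> Optional[Tuple[str, str]]:
    """Return (new_network, old_network) if the line reports a change."""
    match = GATEWAY_CHANGE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


line = "Ingress container default network changed to anotherService. Was connectware_default."
print(parse_gateway_change(line))
```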
Scope limitation¶
This error concerns only a very specific scenario. To clarify the scope limitation: The error will not occur in any of the following cases:
Connectware services are not enabled or disabled but just keep on running
The enabled Connectware services do not contain any Cybus::Container resource
All installed services have service IDs that sort lexicographically after the Connectware-internal network name
There are no ongoing MQTT, HTTP, or TCP client connections from outside clients to Connectware. Connections from Connectware to other servers or machines are not affected.
Workarounds¶
If this error is a problem for you, any of the following workarounds can be applied to avoid this error:
Renaming the Connectware-internal network to be lexicographically always before any service ID, such as a_a_connectware... . To achieve this renaming, the docker-compose project name must be changed. By default, the docker-compose project name is taken from the current working directory where the docker-compose.yml file is installed and the docker-compose command is called. This directory can be renamed from connectware to the desired other name. Alternatively, the -p option can be used in all program calls to docker-compose, such as docker-compose -p a_a_connectware ... . Watch out: When renaming a running installation, the Docker volumes must be renamed, too.
Renaming all service IDs to be lexicographically always after the Connectware-internal network. There could be a convention that every service ID must start with a prefix such as s_ or similar.
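Either workaround can be checked before enabling services with a small sketch like the following. It assumes the internal network is named <project>_default as described earlier and that names are compared as plain strings; the function name and the sample service IDs are hypothetical.

```python
def conflicting_service_ids(service_ids, project_name):
    """Return the service IDs that sort before the internal network name.

    Assumption: the internal network is "<project_name>_default" and the
    comparison is a plain string comparison.
    """
    internal_network = f"{project_name}_default"
    return [sid for sid in service_ids if sid < internal_network]


services = ["anotherService", "s_machine"]

# Default project name: anotherService sorts before connectware_default.
print(conflicting_service_ids(services, "connectware"))

# Renamed project: both service IDs sort after a_a_connectware_default.
print(conflicting_service_ids(services, "a_a_connectware"))
```

Note that under a plain string comparison, uppercase letters sort before lowercase ones, so a service ID starting with an uppercase letter would still be reported as conflicting even with a project name like a_a_connectware.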
Further reading¶
This problem in the Docker runtime has been reported and has been known for many years. Further reading:
https://github.com/moby/moby/issues/21741#issuecomment-210090728
https://github.com/moby/libnetwork/issues/1141#issuecomment-215522809
https://gist.github.com/jfellus/cfee9efc1e8e1baf9d15314f16a46eca