The idea is to implement a locking mechanism that uses the backupstore
to prevent the following dangerous cases of concurrent operations:
1. prevent backup deletion during backup restoration
2. prevent backup deletion while a backup is in progress
3. prevent backup creation during backup deletion
4. prevent backup restoration during backup deletion
The locking solution shouldn't unnecessarily block operations, so the
following cases should be allowed:
1. allow backup creation during restoration
2. allow backup restoration during creation
The locking solution should have a maximum wait time for lock acquisition,
after which the operation fails so that the user does not have to wait
forever.
The locking solution should be self-expiring, so that when a process dies
unexpectedly, future processes are able to acquire the lock.
The locking solution should guarantee that only a single type of lock is
active at a time.
The locking solution should allow a lock to be passed down into
asynchronously running goroutines.
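A minimal sketch of the idea in Go, using an in-memory stand-in for the
backupstore (the real lock record would live as a file in the backupstore
itself; all names and durations here are illustrative):

```go
package backuplock

import (
	"errors"
	"sync"
	"time"
)

// LockType separates the two mutually exclusive classes of operations:
// backup creation and restoration share one class, deletion gets the other.
type LockType int

const (
	LockTypeBackupAndRestore LockType = iota
	LockTypeDelete
)

const (
	lockMaxWaitTime = 30 * time.Second  // give up instead of waiting forever
	lockExpiry      = 150 * time.Second // self-expire so a dead holder can't block forever
	lockRetryDelay  = time.Second
)

// Lock models the lock record; the real one would be a file in the backupstore.
type Lock struct {
	Type       LockType
	AcquiredAt time.Time
}

// Store is an in-memory stand-in for the backupstore.
type Store struct {
	mu     sync.Mutex
	active *Lock
}

// Acquire retries until the requested lock type is compatible with the
// currently active one, or fails after lockMaxWaitTime.
func (s *Store) Acquire(t LockType) (*Lock, error) {
	deadline := time.Now().Add(lockMaxWaitTime)
	for {
		s.mu.Lock()
		// Treat an expired lock as free, so crashed processes don't
		// block future ones.
		if s.active != nil && time.Since(s.active.AcquiredAt) > lockExpiry {
			s.active = nil
		}
		// Only a single lock type may be active at a time, but locks of
		// the same type are shared (backup during restore is fine).
		if s.active == nil || s.active.Type == t {
			l := &Lock{Type: t, AcquiredAt: time.Now()}
			s.active = l // the sketch tracks one record; refreshing it extends the expiry
			s.mu.Unlock()
			return l, nil
		}
		s.mu.Unlock()
		if time.Now().After(deadline) {
			return nil, errors.New("timed out waiting for backupstore lock")
		}
		time.Sleep(lockRetryDelay)
	}
}
```

Because the returned *Lock is a plain value, it can be passed into the
asynchronously running goroutines that do the actual work and released
once they all complete.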
Longhorn #612
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
We changed the imagePullPolicy to IfNotPresent so that users can easily
install Longhorn in an air-gapped installation.
Also add a bash script for developers to quickly change all the
imagePullPolicies back to Always so that k8s always pulls the
latest images. This is useful when developers use a tag such as master.
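For reference, the field in question on the container spec, sketched with
the client-go types (image name and tag are placeholders):

```go
package deploy

import corev1 "k8s.io/api/core/v1"

// managerContainer shows the changed field; image and tag are placeholders.
func managerContainer() corev1.Container {
	return corev1.Container{
		Name:            "longhorn-manager",
		Image:           "longhornio/longhorn-manager:master",
		ImagePullPolicy: corev1.PullIfNotPresent, // previously corev1.PullAlways
	}
}
```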
Longhorn #1491
Signed-off-by: Phan Le <phan.le@rancher.com>
The test deployment creates 4 replicas that continuously write the
current date and time once a second into the file `/mnt/nfs-test/test.log`.
This is a good test for an RWX volume, since it replicates an append-only
log that is used by multiple pods.
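Each replica's writer amounts to something like this sketch (not the
actual test image, just the behavior it replicates):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"time"
)

func main() {
	// O_APPEND appends each write, so the four replicas interleave whole
	// lines in the shared log rather than overwriting each other.
	f, err := os.OpenFile("/mnt/nfs-test/test.log",
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	for {
		// One timestamped line per second.
		fmt.Fprintln(f, time.Now().Format(time.RFC3339))
		time.Sleep(time.Second)
	}
}
```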
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
Use the service's cluster IP so that, on a delete & recreate of the
service, the previous PVs still point to this nfs-provisioner. We cannot
use the hostname, since the actual host doesn't know how to resolve
service addresses inside the cluster. Supporting this would require
installing kube-dns and modifying the /etc/resolv.conf file on each host.
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
This makes the NFS client use a new source port for each TCP reconnect.
This way, after a crash, the faulty connection isn't kept alive in the
connection cache (NAT). This should allow the cluster IP to be resolved
to the new destination pod IP.
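This matches the behavior of the NFS `noresvport` mount option (naming
the option is an assumption; the commit itself doesn't name it). A sketch
of mounting with it from Go; the server address and paths are examples:

```go
package main

import (
	"log"
	"os/exec"
)

func main() {
	// noresvport tells the NFS client to use a new, non-reserved source
	// port for each TCP connection, so a reconnect after a crash is not
	// matched against the stale NAT/conntrack entry of the old pod.
	// (Option name is an assumption; server IP and paths are examples.)
	out, err := exec.Command("mount", "-t", "nfs4",
		"-o", "vers=4.1,noresvport",
		"10.43.0.10:/export", "/mnt/nfs-test").CombinedOutput()
	if err != nil {
		log.Fatalf("mount failed: %v\n%s", err, out)
	}
}
```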
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
Add tolerations so that the nfs-provisioner pod gets evicted from a failing
node after 60 seconds + a 30-second grace period (relevant for the VA
recovery policy). Add liveness + readiness probes, so that no traffic gets
routed to a failed NFS server. Disable device-based fsids (major:minor),
since our block device mapping can change from node to node, which makes
the IDs unstable.
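The tolerations part, sketched with the client-go types (the taint keys
and the 60-second value follow the description above; the probe and fsid
settings are omitted):

```go
package deploy

import corev1 "k8s.io/api/core/v1"

// nfsProvisionerTolerations evicts the pod from a failing node after 60
// seconds instead of the 300-second default (plus the pod's 30-second
// termination grace period).
func nfsProvisionerTolerations() []corev1.Toleration {
	sixty := int64(60)
	return []corev1.Toleration{
		{
			Key:               "node.kubernetes.io/not-ready",
			Operator:          corev1.TolerationOpExists,
			Effect:            corev1.TaintEffectNoExecute,
			TolerationSeconds: &sixty,
		},
		{
			Key:               "node.kubernetes.io/unreachable",
			Operator:          corev1.TolerationOpExists,
			Effect:            corev1.TaintEffectNoExecute,
			TolerationSeconds: &sixty,
		},
	}
}
```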
Signed-off-by: Joshua Moody <joshua.moody@rancher.com>
Attempting to use Longhorn with NFSv3 for backup causes errors related to rpc.statd, which is not managed by Longhorn processes. Longhorn works properly with NFSv4 servers only.
Signed-off-by: Douglas Mayle <dmayle@dmayle.com>
commit 1c1bb3571c25378bc8e3680bed72ef3ce61c4360
Author: Sheng Yang <sheng.yang@rancher.com>
Date: Sat May 30 08:55:06 2020 -0700
Longhorn v1.0.0 release
Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
commit fdd34f4f1986da6db3bac1b53c385c23203b0775
Author: Sheng Yang <sheng.yang@rancher.com>
Date: Sat May 23 13:00:11 2020 -0700
Longhorn v1.0.0-rc2 release
Signed-off-by: Sheng Yang <sheng.yang@rancher.com>
When pod security policies are used, the default restricted policy does
not allow root permissions. Even when a more permissive policy is
assigned to the service account, we need to inform k8s that we need root
permissions so that the correct policy can be selected.
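In pod-spec terms, this means declaring the requirement in the container's
securityContext; a sketch with the client-go types (whether Privileged is
also needed depends on the workload, so treat the values as illustrative):

```go
package deploy

import corev1 "k8s.io/api/core/v1"

// rootSecurityContext declares the root requirement explicitly, so the
// PSP admission controller can match a policy that permits it instead of
// rejecting the pod under the default restricted policy.
func rootSecurityContext() *corev1.SecurityContext {
	rootUID := int64(0)
	privileged := true
	return &corev1.SecurityContext{
		RunAsUser:  &rootUID,
		Privileged: &privileged, // illustrative; not every workload needs full privilege
	}
}
```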
Signed-off-by: Aaron Spettl <aaron@spettl.de>