Modify Engine Identity Validation LEP for changes to longhorn-engine PR

Signed-off-by: Eric Weber <eric.weber@suse.com>
This commit is contained in:
Eric Weber 2023-06-23 13:57:36 -05:00 committed by David Ko
parent 914fb89687
commit b76d853800

View File

@ -110,32 +110,32 @@ in this LEP are backwards compatible. All gRPC metadata validation is by demand
upgraded) client only injects some metadata (e.g. `volume-name` but not `instance-name`), the server only validates the upgraded) client only injects some metadata (e.g. `volume-name` but not `instance-name`), the server only validates the
metadata provided. metadata provided.
Add an `instance-name` flag to the `longhorn controller <volume-name>` command (e.g. `longhorn controller <volume-name> Add a global `volume-name` flag and a global `engine-instance-name` flag to the engine CLI (e.g. `longhorn -volume-name
--instance-name <instance-name>`). This command already accepts the volume name, so an additional `volume-name` flag <volume-name> -engine-instance-name <engine-instance-name> <command> <args>`). Virtually all CLI commands create a
would be redundant. The longhorn-engine controller server remembers its volume and instance name. controller client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests
that reach the wrong longhorn-engine controller server are rejected.
Add a `volume-name` flag and an `instance-name` flag to the `longhorn replica <directory>` command (e.g. `longhorn Use the global `engine-instance-name` flag and the pre-existing `volume-name` positional argument to allow the
replica <directory> -volume-name <volume-name> -instance-name <instance-name>`). The longhorn-engine replica server longhorn-engine controller server to remember its volume and instance name (e.g. `longhorn -engine-instance-name
remembers its volume and instance name. <instance-name> controller <volume-name>`). Ignore the global `volume-name` flag, as it is redundant.
Add a `volume-name` flag and an `instance-name` flag to the `longhorn replica <directory>` command (e.g. `longhorn Use the global `volume-name` flag or the pre-existing local `volume-name` flag and a new `replica-instance-name` flag to
replica <directory> -volume-name <volume-name> -instance-name <instance-name>`). The longhorn-engine sync-agent server allow the longhorn-engine replica server to remember its volume and instance name (e.g. `longhorn -volume-name
remembers its volume and instance name. <volume-name> replica <directory> -replica-instance-name <replica-instance-name>`).
Add a `volume-name` and `engine-instance-name` flag to every CLI command that launches an asynchronous task (e.g. Use the global `volume-name` flag and a new `replica-instance-name` flag to allow the longhorn-engine sync-agent server
`longhorn ls-replica -volume-name <volume-name> -engine-instance-name <engine-instance-name>`). All such to remember its volume and instance name (e.g. `longhorn -volume-name <volume-name> sync-agent -replica-instance-name
commands create a controller client and these flags allow appropriate gRPC metadata to be injected into every client <replica-instance-name>`).
request. Requests that reach the wrong longhorn-engine controller server are rejected.
Add an additional `replica-instance-name` flag to CLI commands that launch asynchronous tasks that communicate directly Add an additional `replica-instance-name` flag to CLI commands that launch asynchronous tasks that communicate directly
with the longhorn-engine replica server (e.g. `longhorn add-replica <address> -size <size> -current-size <current-size> with the longhorn-engine replica server (e.g. `longhorn -volume-name <volume-name> add-replica <address> -size <size>
-volume-name <volume-name> -engine-instance-name <engine-instance-name> -replica-instance-name -current-size <current-size> -replica-instance-name <replica-instance-name>`). All such commands create a replica
<replica-instance-name>`). All such commands create a replica client and these flags allow appropriate gRPC metadata to client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests that reach the
be injected into every client request. Requests that reach the wrong longhorn-engine replica server are rejected. wrong longhorn-engine replica server are rejected.
Return 5 NOT FOUND with an appropriate message when metadata validation fails. In this case, the NOT_FOUND refers to the Return 9 FAILED_PRECONDITION with an appropriate message when metadata validation fails. This code is chosen in
longhorn-engine component the caller was actually attempting to communicate with. (The particular return code is accordance with the [RPC API](https://grpc.github.io/grpc/core/md_doc_statuscodes.html), which instructs developers to
definitely open to discussion.) use FAILED_PRECONDITION if the client should not retry until the system system has been explicitly fixed.
#### Longhorn-Instance-Manager #### Longhorn-Instance-Manager
@ -318,7 +318,7 @@ validation.
#### Longhorn-Manager Integration #### Longhorn-Manager Integration
Ensure the engine and replica controllers launch engine and replica processes with `--volume-name` and `--instance-name` Ensure the engine and replica controllers launch engine and replica processes with `-volume-name` and `-instance-name`
flags so that these processes can validate identifying gRPC metadata coming from requests. flags so that these processes can validate identifying gRPC metadata coming from requests.
Ensure the engine controller supplies correct information to the ProxyEngineService client functions so that identity Ensure the engine controller supplies correct information to the ProxyEngineService client functions so that identity
@ -328,6 +328,7 @@ validation can occur in the lower layers.
This issue/LEP was inspired by [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). In the This issue/LEP was inspired by [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). In the
situation described in this issue: situation described in this issue:
1. An engine controller with out-of-date information (including a replica address the associated volume does not own) 1. An engine controller with out-of-date information (including a replica address the associated volume does not own)
[issues a ReplicaAdd [issues a ReplicaAdd
command](https://github.com/longhorn/longhorn-manager/blob/a7dd20cdbdb1a3cea4eb7490f14d94d2b0ef273a/controller/engine_controller.go#L1819) command](https://github.com/longhorn/longhorn-manager/blob/a7dd20cdbdb1a3cea4eb7490f14d94d2b0ef273a/controller/engine_controller.go#L1819)
@ -339,8 +340,9 @@ situation described in this issue:
is used to expand the replica before a followup failure to actually add the replica to the controller's backend. is used to expand the replica before a followup failure to actually add the replica to the controller's backend.
After this improvement, the above scenario will be impossible: After this improvement, the above scenario will be impossible:
1. Both the engine and replica controllers will launch engine and replica processes with the `--volume-name` and
`--instance-name` flags. 1. Both the engine and replica controllers will launch engine and replica processes with the `-volume-name` and
`-instance-name` flags.
2. When the engine controller issues a ReplicaAdd command, it will do so using the expanded embedded 2. When the engine controller issues a ReplicaAdd command, it will do so using the expanded embedded
`ProxyEngineRequest` message (with `volume_name` and `instance_name` fields) and an additional `ProxyEngineRequest` message (with `volume_name` and `instance_name` fields) and an additional
`replica_instance_name` field. `replica_instance_name` field.
@ -353,9 +355,10 @@ After this improvement, the above scenario will be impossible:
### Test plan ### Test plan
TODO: Integration test plan. #### TODO: Integration Test Plan
In my test environment, I have experimented with: In my test environment, I have experimented with:
- Running new versions of all components, making gRPC calls to the longhorn-engine controller and replica processes with - Running new versions of all components, making gRPC calls to the longhorn-engine controller and replica processes with
wrong gRPC metadata, and verifying that these calls fail. wrong gRPC metadata, and verifying that these calls fail.
- Running new versions of all components, making gRPC calls to instance-manager with an incorrect volume-name or - Running new versions of all components, making gRPC calls to instance-manager with an incorrect volume-name or
@ -367,10 +370,26 @@ This is really a better fit for a negative testing scenario (do something that w
communication, then verify that communication fails), but we have already eliminated the only known recreate for communication, then verify that communication fails), but we have already eliminated the only known recreate for
[longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709).
TODO: Engine integration test plan. #### Engine Integration Test Plan
I'm unfamiliar with our engine integration tests, but it should be fairly easy to create negative tests that verify the Rework test fixtures so that:
expected behavior.
- All controller and replica processes are created with the information needed for identity validation.
- It is convenient to create controller and replica clients with the information needed for identity validation.
- gRPC metadata is automatically injected into controller and replica client requests when clients have the necessary
information.
Do not modify the behavior of existing tests. Since these tests were using clients with identity validation information,
no identity validation is performed.
Create new tests that:
- Ensure validation fails when a directly created client attempts to communicate with a controller or replica server
using the wrong identity validation information.
- Ensure validation fails when an indirectly created client (by the engine) tries to communicate with a replica server
using the wrong identity validation information.
- Ensure validation fails when an indirectly created client (by a CLI command) tries to communicate with a controller or
replica server using the wrong identity validation information.
### Upgrade strategy ### Upgrade strategy
@ -383,6 +402,7 @@ metadata to non-upgraded processes, it will be ignored.
We will only populate extra ProxyEngineService fields when longhorn-manager is running with an update ProxyEngineService We will only populate extra ProxyEngineService fields when longhorn-manager is running with an update ProxyEngineService
client. client.
- RPCs from an old client to a new ProxyEngineService server will succeed, but without the extra fields, - RPCs from an old client to a new ProxyEngineService server will succeed, but without the extra fields,
instance-manager will have no useful gRPC metadata to inject into its longhorn-engine requests. instance-manager will have no useful gRPC metadata to inject into its longhorn-engine requests.
- RPCs from a new client to an old ProxyEngineService will succeed, but instance-manager will ignore the new fields and - RPCs from a new client to an old ProxyEngineService will succeed, but instance-manager will ignore the new fields and