Modify Engine Identity Validation LEP for changes to longhorn-engine PR
Signed-off-by: Eric Weber <eric.weber@suse.com>
This commit is contained in:
parent
914fb89687
commit
b76d853800
@ -110,32 +110,32 @@ in this LEP are backwards compatible. All gRPC metadata validation is by demand
|
|||||||
upgraded) client only injects some metadata (e.g. `volume-name` but not `instance-name`), the server only validates the
|
upgraded) client only injects some metadata (e.g. `volume-name` but not `instance-name`), the server only validates the
|
||||||
metadata provided.
|
metadata provided.
|
||||||
|
|
||||||
Add an `instance-name` flag to the `longhorn controller <volume-name>` command (e.g. `longhorn controller <volume-name>
|
Add a global `volume-name` flag and a global `engine-instance-name` flag to the engine CLI (e.g. `longhorn -volume-name
|
||||||
--instance-name <instance-name>`). This command already accepts the volume name, so an additional `volume-name` flag
|
<volume-name> -engine-instance-name <engine-instance-name> <command> <args>`). Virtually all CLI commands create a
|
||||||
would be redundant. The longhorn-engine controller server remembers its volume and instance name.
|
controller client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests
|
||||||
|
that reach the wrong longhorn-engine controller server are rejected.
|
||||||
|
|
||||||
Add a `volume-name` flag and an `instance-name` flag to the `longhorn replica <directory>` command (e.g. `longhorn
|
Use the global `engine-instance-name` flag and the pre-existing `volume-name` positional argument to allow the
|
||||||
replica <directory> -volume-name <volume-name> -instance-name <instance-name>`). The longhorn-engine replica server
|
longhorn-engine controller server to remember its volume and instance name (e.g. `longhorn -engine-instance-name
|
||||||
remembers its volume and instance name.
|
<instance-name> controller <volume-name>`). Ignore the global `volume-name` flag, as it is redundant.
|
||||||
|
|
||||||
Add a `volume-name` flag and an `instance-name` flag to the `longhorn replica <directory>` command (e.g. `longhorn
|
Use the global `volume-name` flag or the pre-existing local `volume-name` flag and a new `replica-instance-name` flag to
|
||||||
replica <directory> -volume-name <volume-name> -instance-name <instance-name>`). The longhorn-engine sync-agent server
|
allow the longhorn-engine replica server to remember its volume and instance name (e.g. `longhorn -volume-name
|
||||||
remembers its volume and instance name.
|
<volume-name> replica <directory> -replica-instance-name <replica-instance-name>`).
|
||||||
|
|
||||||
Add a `volume-name` and `engine-instance-name` flag to every CLI command that launches an asynchronous task (e.g.
|
Use the global `volume-name` flag and a new `replica-instance-name` flag to allow the longhorn-engine sync-agent server
|
||||||
`longhorn ls-replica -volume-name <volume-name> -engine-instance-name <engine-instance-name>`). All such
|
to remember its volume and instance name (e.g. `longhorn -volume-name <volume-name> sync-agent -replica-instance-name
|
||||||
commands create a controller client and these flags allow appropriate gRPC metadata to be injected into every client
|
<replica-instance-name>`).
|
||||||
request. Requests that reach the wrong longhorn-engine controller server are rejected.
|
|
||||||
|
|
||||||
Add an additional `replica-instance-name` flag to CLI commands that launch asynchronous tasks that communicate directly
|
Add an additional `replica-instance-name` flag to CLI commands that launch asynchronous tasks that communicate directly
|
||||||
with the longhorn-engine replica server (e.g. `longhorn add-replica <address> -size <size> -current-size <current-size>
|
with the longhorn-engine replica server (e.g. `longhorn -volume-name <volume-name> add-replica <address> -size <size>
|
||||||
-volume-name <volume-name> -engine-instance-name <engine-instance-name> -replica-instance-name
|
-current-size <current-size> -replica-instance-name <replica-instance-name>`). All such commands create a replica
|
||||||
<replica-instance-name>`). All such commands create a replica client and these flags allow appropriate gRPC metadata to
|
client and these flags allow appropriate gRPC metadata to be injected into every client request. Requests that reach the
|
||||||
be injected into every client request. Requests that reach the wrong longhorn-engine replica server are rejected.
|
wrong longhorn-engine replica server are rejected.
|
||||||
|
|
||||||
Return 5 NOT FOUND with an appropriate message when metadata validation fails. In this case, the NOT_FOUND refers to the
|
Return 9 FAILED_PRECONDITION with an appropriate message when metadata validation fails. This code is chosen in
|
||||||
longhorn-engine component the caller was actually attempting to communicate with. (The particular return code is
|
accordance with the [RPC API](https://grpc.github.io/grpc/core/md_doc_statuscodes.html), which instructs developers to
|
||||||
definitely open to discussion.)
|
use FAILED_PRECONDITION if the client should not retry until the system system has been explicitly fixed.
|
||||||
|
|
||||||
#### Longhorn-Instance-Manager
|
#### Longhorn-Instance-Manager
|
||||||
|
|
||||||
@ -318,7 +318,7 @@ validation.
|
|||||||
|
|
||||||
#### Longhorn-Manager Integration
|
#### Longhorn-Manager Integration
|
||||||
|
|
||||||
Ensure the engine and replica controllers launch engine and replica processes with `--volume-name` and `--instance-name`
|
Ensure the engine and replica controllers launch engine and replica processes with `-volume-name` and `-instance-name`
|
||||||
flags so that these processes can validate identifying gRPC metadata coming from requests.
|
flags so that these processes can validate identifying gRPC metadata coming from requests.
|
||||||
|
|
||||||
Ensure the engine controller supplies correct information to the ProxyEngineService client functions so that identity
|
Ensure the engine controller supplies correct information to the ProxyEngineService client functions so that identity
|
||||||
@ -328,6 +328,7 @@ validation can occur in the lower layers.
|
|||||||
|
|
||||||
This issue/LEP was inspired by [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). In the
|
This issue/LEP was inspired by [longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709). In the
|
||||||
situation described in this issue:
|
situation described in this issue:
|
||||||
|
|
||||||
1. An engine controller with out-of-date information (including a replica address the associated volume does not own)
|
1. An engine controller with out-of-date information (including a replica address the associated volume does not own)
|
||||||
[issues a ReplicaAdd
|
[issues a ReplicaAdd
|
||||||
command](https://github.com/longhorn/longhorn-manager/blob/a7dd20cdbdb1a3cea4eb7490f14d94d2b0ef273a/controller/engine_controller.go#L1819)
|
command](https://github.com/longhorn/longhorn-manager/blob/a7dd20cdbdb1a3cea4eb7490f14d94d2b0ef273a/controller/engine_controller.go#L1819)
|
||||||
@ -339,8 +340,9 @@ situation described in this issue:
|
|||||||
is used to expand the replica before a followup failure to actually add the replica to the controller's backend.
|
is used to expand the replica before a followup failure to actually add the replica to the controller's backend.
|
||||||
|
|
||||||
After this improvement, the above scenario will be impossible:
|
After this improvement, the above scenario will be impossible:
|
||||||
1. Both the engine and replica controllers will launch engine and replica processes with the `--volume-name` and
|
|
||||||
`--instance-name` flags.
|
1. Both the engine and replica controllers will launch engine and replica processes with the `-volume-name` and
|
||||||
|
`-instance-name` flags.
|
||||||
2. When the engine controller issues a ReplicaAdd command, it will do so using the expanded embedded
|
2. When the engine controller issues a ReplicaAdd command, it will do so using the expanded embedded
|
||||||
`ProxyEngineRequest` message (with `volume_name` and `instance_name` fields) and an additional
|
`ProxyEngineRequest` message (with `volume_name` and `instance_name` fields) and an additional
|
||||||
`replica_instance_name` field.
|
`replica_instance_name` field.
|
||||||
@ -353,9 +355,10 @@ After this improvement, the above scenario will be impossible:
|
|||||||
|
|
||||||
### Test plan
|
### Test plan
|
||||||
|
|
||||||
TODO: Integration test plan.
|
#### TODO: Integration Test Plan
|
||||||
|
|
||||||
In my test environment, I have experimented with:
|
In my test environment, I have experimented with:
|
||||||
|
|
||||||
- Running new versions of all components, making gRPC calls to the longhorn-engine controller and replica processes with
|
- Running new versions of all components, making gRPC calls to the longhorn-engine controller and replica processes with
|
||||||
wrong gRPC metadata, and verifying that these calls fail.
|
wrong gRPC metadata, and verifying that these calls fail.
|
||||||
- Running new versions of all components, making gRPC calls to instance-manager with an incorrect volume-name or
|
- Running new versions of all components, making gRPC calls to instance-manager with an incorrect volume-name or
|
||||||
@ -367,10 +370,26 @@ This is really a better fit for a negative testing scenario (do something that w
|
|||||||
communication, then verify that communication fails), but we have already eliminated the only known recreate for
|
communication, then verify that communication fails), but we have already eliminated the only known recreate for
|
||||||
[longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709).
|
[longhorn/longhorn#5709](https://github.com/longhorn/longhorn/issues/5709).
|
||||||
|
|
||||||
TODO: Engine integration test plan.
|
#### Engine Integration Test Plan
|
||||||
|
|
||||||
I'm unfamiliar with our engine integration tests, but it should be fairly easy to create negative tests that verify the
|
Rework test fixtures so that:
|
||||||
expected behavior.
|
|
||||||
|
- All controller and replica processes are created with the information needed for identity validation.
|
||||||
|
- It is convenient to create controller and replica clients with the information needed for identity validation.
|
||||||
|
- gRPC metadata is automatically injected into controller and replica client requests when clients have the necessary
|
||||||
|
information.
|
||||||
|
|
||||||
|
Do not modify the behavior of existing tests. Since these tests were using clients with identity validation information,
|
||||||
|
no identity validation is performed.
|
||||||
|
|
||||||
|
Create new tests that:
|
||||||
|
|
||||||
|
- Ensure validation fails when a directly created client attempts to communicate with a controller or replica server
|
||||||
|
using the wrong identity validation information.
|
||||||
|
- Ensure validation fails when an indirectly created client (by the engine) tries to communicate with a replica server
|
||||||
|
using the wrong identity validation information.
|
||||||
|
- Ensure validation fails when an indirectly created client (by a CLI command) tries to communicate with a controller or
|
||||||
|
replica server using the wrong identity validation information.
|
||||||
|
|
||||||
### Upgrade strategy
|
### Upgrade strategy
|
||||||
|
|
||||||
@ -383,6 +402,7 @@ metadata to non-upgraded processes, it will be ignored.
|
|||||||
|
|
||||||
We will only populate extra ProxyEngineService fields when longhorn-manager is running with an update ProxyEngineService
|
We will only populate extra ProxyEngineService fields when longhorn-manager is running with an update ProxyEngineService
|
||||||
client.
|
client.
|
||||||
|
|
||||||
- RPCs from an old client to a new ProxyEngineService server will succeed, but without the extra fields,
|
- RPCs from an old client to a new ProxyEngineService server will succeed, but without the extra fields,
|
||||||
instance-manager will have no useful gRPC metadata to inject into its longhorn-engine requests.
|
instance-manager will have no useful gRPC metadata to inject into its longhorn-engine requests.
|
||||||
- RPCs from a new client to an old ProxyEngineService will succeed, but instance-manager will ignore the new fields and
|
- RPCs from a new client to an old ProxyEngineService will succeed, but instance-manager will ignore the new fields and
|
||||||
|
Loading…
Reference in New Issue
Block a user