# Scim and Migration Tooling

We need to be able to synchronise content from other directory or identity management systems. To do this, we need the capability to have "pluggable" synchronisation drivers. This is because not all deployments will be able to use our generic versions, or may have customisations they wish to perform that are unique to them.

To achieve this we need a layer of separation - this effectively becomes an "extract, transform, load" process. In addition, this process must be stateful: it must be able to run multiple times, or even continuously, and each run must bring Kanidm into synchronisation.

We use "synchronisation" to mean a complete, successful extract, transform and load cycle.

There are three expected methods of using the synchronisation tools for Kanidm:

- Kanidm as a "read only" portal allowing access to its specific features and integrations. This is less of a migration, and more of a way to "feed" data into Kanidm without relying on its internal administration features.
- "Big Bang" migration. This is where all the data from another IDM is synchronised in a single execution and applications are swapped to Kanidm. This is rare in larger deployments, but may be used in smaller sites.
- Gradual migration. This is where data is synchronised to Kanidm and then both the existing IDM and Kanidm co-exist. Applications gradually migrate to Kanidm. At some point a "final" synchronisation is performed where Kanidm 'gains authority' over all identity data and the existing IDM is disabled.

In these processes there may be a need to "reset" the synchronised data. The diagram below shows the possible workflows which account for the above.

```text

                          ┏━━━━━━━━━━━━━━━━━┓
                          ┃                 ┃
                          ┃    Detached     ┃
┌──────────────────────┬──┃ (Initial State) ┃◀─────────────────────────┐
│                      │  ┃                 ┃                          │
│                      │  ┗━━━━━━━━━━━━━━━━━┛                          │
│                      └──────────────────────────┐                    │
│                                                 │                    │
├───────────────────────┬─────────────────────┐   │                    │
│  ┌─────────────┐      │   ┌─────────────┐   │   │   ┌─────────────┐  │
│  │             │      │   │             │───┘   │   │             │  │
│  │   Initial   │      │   │   Active    │       │   │    Final    │  │
└─▶│ Synchronise │──────┴──▶│ Synchronise │───────┴──▶│ Synchronise │──┤
   │             │          │             │           │             │  │
   └─────────────┘          └─────────────┘           └─────────────┘  │
          │                        │                                   │
          │                        │                  ┌─────────────┐  │
          │                        │                  │             │  │
          │                        │                  │    Purge    │  │
          └────────────────────────┴─────────────────▶│   Content   │──┘
                                                      │             │
                                                      └─────────────┘

```

Kanidm starts in a "detached" state from the external IDM source.

For Kanidm as a "read only" application source, the initial synchronisation is performed, followed by periodic active (partial) synchronisations. At any time a full initial synchronisation can be performed again to reset the data of the provider. The provider can be reset and removed by a purge, which resets Kanidm to the detached state.

For a gradual migration, this process is the same as the read only case. However, when ready to perform the final cut over, a final synchronisation is performed which retains the data of the external system and grants Kanidm the authority over it. This moves Kanidm back to the detached state, but with a full copy of the provider's data.

A "big bang" migration is this same process, but the "final" synchronisation is the first and only step required, where all data is loaded and then immediately granted authority to Kanidm.

## ETL process

### Extract

First a user must be able to retrieve their data from their supplying IDM source. Initially we will target LDAP and systems with LDAP interfaces, but in the future there is no barrier to supporting other transports.

To achieve this, we initially provide synchronisation primitives in the ldap3 crate.
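
As a rough sketch of the extract step, the following uses the ldap3 crate's plain synchronous search API to pull entries from a directory. The host, bind credentials, base DN and attribute list are placeholders only; a real driver would additionally use the content synchronisation (syncrepl) primitives to obtain a state cookie.

```rust
use ldap3::{LdapConn, LdapError, Scope, SearchEntry};

fn main() -> Result<(), LdapError> {
    // Placeholder connection details for an example external IDM.
    let mut ldap = LdapConn::new("ldap://idm.example.com")?;
    ldap.simple_bind("cn=sync,dc=example,dc=com", "password")?
        .success()?;

    // Extract the raw person entries and the attributes we care about.
    let (entries, _res) = ldap
        .search(
            "ou=people,dc=example,dc=com",
            Scope::Subtree,
            "(objectClass=person)",
            vec!["uid", "cn", "mail"],
        )?
        .success()?;

    for entry in entries {
        let entry = SearchEntry::construct(entry);
        // Each entry's attributes are available as a HashMap<String, Vec<String>>.
        println!("{} -> {:?}", entry.dn, entry.attrs);
    }

    ldap.unbind()
}
```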

### Transform

This process will be custom developed by the user, or may use a generic driver that we provide. Our generic tools may provide attribute mapping abilities to allow some limited customisation.
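
For example, a minimal transform could be little more than an attribute rename table. This sketch is illustrative only; the attribute names on both sides are assumptions, not a fixed Kanidm schema.

```rust
use std::collections::HashMap;

// A hypothetical attribute-mapping transform; the attribute names on both
// sides are examples only, not a fixed Kanidm schema.
fn transform(ldap_entry: &HashMap<String, Vec<String>>) -> HashMap<String, Vec<String>> {
    // Source (LDAP) attribute name -> target (SCIM/Kanidm) attribute name.
    let mapping: HashMap<&str, &str> =
        [("uid", "name"), ("cn", "displayname"), ("mail", "mail")]
            .into_iter()
            .collect();

    ldap_entry
        .iter()
        .filter_map(|(attr, values)| {
            mapping
                .get(attr.to_lowercase().as_str())
                .map(|target| (target.to_string(), values.clone()))
        })
        .collect()
}
```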

### Load

Finally, to load the data into Kanidm, we will make a SCIM interface available. SCIM is a "spiritual successor" to LDAP and aligns with Kanidm's design. SCIM allows structured data to be uploaded (unlike LDAP, which is limited to strings). Because of this, SCIM will allow us to expose more complex types than we have previously been able to provide.

The largest benefit of SCIM's model is its ability to perform "batched" operations, which work with Kanidm's transactional model to ensure that content is always valid and consistent during load events.

## Configuring a Synchronisation Provider in Kanidm

Kanidm has a strict transactional model with full ACID compliance. Attempting to create an external model that needs to interoperate with Kanidm's model and ensure both are compliant is fraught with danger. As a result, Kanidm sync providers should be stateless, acting only as an ETL bridge.

Additionally, sync providers need permission to access and write to content in Kanidm, which also necessitates Kanidm being aware of the sync relationship.

For this reason a sync provider is a derivative of a service account, which also allows storage of the state of the synchronisation operation. An example of this is that LDAP syncrepl provides a cookie defining the "state" of what the ETL bridge has "consumed up to". During the load phase the modified entries and the cookie are persisted together. This means that if the operation fails the cookie also rolls back, allowing a retry of the sync. If it succeeds, the next sync knows that Kanidm is in the correct state. Graphically:

```text

┌────────────┐                    ┌────────────┐                   ┌────────────┐
│            │                    │            │     Retrieve      │            │
│            │                    │            │──────Cookie──────▶│            │
│            │                    │            │                   │            │
│            │                    │            │    Provide        │            │
│            │                    │            │◀────Cookie────────│            │
│            │   Sync Request     │            │                   │            │
│  External  │◀───With Cookie─────│    ETL     │                   │            │
│    IDM     │                    │   Bridge   │                   │   Kanidm   │
│            │   Sync Response    │            │                   │            │
│            │────New Cookie─────▶│            │                   │            │
│            │                    │            │                   │            │
│            │                    │            │  Upload Entries   │            │
│            │                    │            │──Persist Cookie──▶│            │
│            │                    │            │                   │            │
│            │                    │            │◀─────Result───────│            │
└────────────┘                    └────────────┘                   └────────────┘

```

The operation may fail at any point, so by persisting the state cookie together with the uploaded entries, a reported success guarantees that the upload has been applied and stored. A success really means it!
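
The bridge's main loop can therefore stay very small. This sketch uses entirely hypothetical types and method names (they are not real kanidm or ldap3 APIs) purely to show how the cookie bounds each cycle.

```rust
// All types and method names here are hypothetical placeholders; they only
// show how the cookie bounds each ETL cycle.
struct Kanidm;
struct ExternalIdm;
struct Changes {
    entries: Vec<String>,
    new_cookie: Vec<u8>,
}

impl Kanidm {
    // Retrieve the cookie persisted by the last successful load (if any).
    fn get_sync_cookie(&self) -> Option<Vec<u8>> {
        None
    }
    // Upload entries and persist the new cookie in a single transaction.
    fn upload(&self, _entries: &[String], _cookie: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

impl ExternalIdm {
    // Ask the external IDM for everything changed since `cookie`.
    fn changes_since(&self, _cookie: Option<Vec<u8>>) -> Changes {
        Changes { entries: vec![], new_cookie: vec![] }
    }
}

fn sync_cycle(kanidm: &Kanidm, idm: &ExternalIdm) -> Result<(), String> {
    let cookie = kanidm.get_sync_cookie();
    let changes = idm.changes_since(cookie);
    // If this upload fails, neither the entries nor the cookie are persisted,
    // so the next cycle simply retries from the previous cookie.
    kanidm.upload(&changes.entries, &changes.new_cookie)
}
```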

## SCIM

### Authentication to the endpoint

This will be based on Kanidm's existing authentication infrastructure, allowing service accounts to use bearer tokens. These tokens will be internally bound such that changes from the account MUST contain the associated state identifier (cookie).

### Batch Operations

Per RFC 7644 Section 3.7.

A requirement of the sync account will be a PATCH request to update the state identifier as the first operation of the batch request. Failure to do so will result in an error.
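
As a purely illustrative shape (the sync-state path, attribute names and state identifiers below are assumptions, not a published Kanidm API), such a batch request might look like the following, built here with serde_json:

```rust
use serde_json::json;

fn main() {
    // Hypothetical request shape for a sync batch upload.
    let batch = json!({
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:BulkRequest"],
        "Operations": [
            // First operation: advance the sync state identifier (cookie).
            {
                "method": "PATCH",
                "path": "/SyncState",
                "data": { "from_state": "cookie-0001", "to_state": "cookie-0002" }
            },
            // Remaining operations create or modify synchronised entries.
            {
                "method": "PUT",
                "path": "/Users/placeholder-uuid",
                "data": { "name": "claire", "displayname": "Claire Example" }
            }
        ]
    });
    println!("{}", serde_json::to_string_pretty(&batch).unwrap());
}
```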

### Schema and Attributes

SCIM defines a number of "generic" schemas for Users and Groups. Kanidm will provide its own schema definitions that extend or replace these. TBD.

## Post Migration Concerns

### Reattaching a Provider Post Final Sync

In the case that a provider is re-attached after it has been through a final synchronisation, entries that Kanidm now has authority over will NOT be synced and will be highlighted as conflicts. The administrator then needs to decide how to proceed with these conflicts, determining which data source is the authority on the information.

## Internal Batch Update Operation Phases

We have to consider in our batch updates that there are multiple stages to the update. This is because at any point the lifecycle of a presented entry may change within a single batch. Because of this, we have to treat the operation in distinct phases within Kanidm to ensure a consistent outcome.

Additionally we have to "fail fast". This means that on any conflict the sync will abort and the administrator must intervene.

To understand why we chose this, we have to look at what happens in a "soft fail" condition.

In this example we have an account named X and a group named Y. The group contains X as a member.

When we submit this for an initial sync, or after account X is created, a "soft" fail during the import of the account would reject it from being added to Kanidm, but the synchronisation would then continue. The group Y would then be imported, and since the member pointing to X would not be valid, it would be silently removed.

At this point we would have group Y imported, but with no members, and account X would not have been imported. The administrator may intervene and fix account X to allow the sync to proceed. However, this would not repair the missing group membership. To repair it, a change to group Y would need to be triggered so that the group's status is synced again.

Since the admin may not be aware of this, the membership would silently remain missing.

To avoid this we "fail fast": if account X couldn't be imported for any reason, the whole sync process stops until it can be repaired. Once repaired, both account X and group Y sync and the membership is intact.
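
The difference can be shown with a tiny sketch using placeholder types only: a soft-fail loop would skip the failing entry and continue, while the loop below propagates the first error and aborts the whole batch.

```rust
// Placeholder types: `Entry` and `import_entry` are illustrative only.
struct Entry {
    name: String,
}

fn import_entry(entry: &Entry) -> Result<(), String> {
    // Validation and write logic would live here.
    if entry.name.is_empty() {
        Err("entry has no name".to_string())
    } else {
        Ok(())
    }
}

fn import_batch(entries: &[Entry]) -> Result<(), String> {
    for entry in entries {
        // A soft-fail approach would log and `continue` here; failing fast
        // propagates the first error and aborts the whole batch instead.
        import_entry(entry)?;
    }
    Ok(())
}
```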

### Phase 1 - Validation of Update State

In this phase we need to assert that the batch operation can proceed and is consistent with the expectations we have of the server's state.

1. Assert the token provided is valid, and contains the correct access requirements.
2. From this token, retrieve the related synchronisation entry.
3. Assert that the batch update's from and to state identifiers are consistent with the synchronisation entry.
4. Retrieve the `sync_parent_uuid` from the sync entry.
5. Retrieve the `sync_authority` value from the sync entry.
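
A sketch of these checks, using placeholder types and field names rather than the real kanidmd internals:

```rust
// Placeholder types and field names; these do not mirror the kanidmd internals.
struct SyncToken {
    valid: bool,
}
struct SyncEntry {
    current_state: String,
    sync_parent_uuid: String,
    sync_authority: Vec<String>,
}
struct BatchRequest {
    from_state: String,
    to_state: String,
}

fn validate_update_state(
    token: &SyncToken,
    sync: &SyncEntry,
    req: &BatchRequest,
) -> Result<(), String> {
    // Assert the token is valid and carries the required sync access.
    if !token.valid {
        return Err("invalid or unauthorised token".into());
    }
    // Assert the batch's from_state matches the state we last committed;
    // the to_state is only persisted later, in phase 5.
    if req.from_state != sync.current_state {
        return Err("inconsistent state identifier".into());
    }
    // sync_parent_uuid and sync_authority are carried forward to later phases.
    let _ = (&sync.sync_parent_uuid, &sync.sync_authority, &req.to_state);
    Ok(())
}
```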

### Phase 2 - Entry Location, Creation and Authority

In this phase we are ensuring that all the entries within the operation are within the control of this sync domain. We also ensure that the entries we intend to act upon exist with our authority markers, such that the subsequent operations are all "modifications" rather than a mix of creates and modifies.

For each entry in the sync request, if an entry with that uuid exists, retrieve it.

- If an entry exists in the database, assert that its `sync_parent_uuid` matches the one in our agreement.
  - If there is no `sync_parent_uuid`, or the `sync_parent_uuid` does not match, reject the operation.
- If no entry exists in the database, create a "stub" entry with our `sync_parent_uuid`.
  - Create the entry immediately, and then retrieve it.
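
A sketch of this location/stub logic with placeholder types, where a `HashMap` keyed by uuid stands in for the database:

```rust
use std::collections::HashMap;

// Placeholder entry type for illustration only.
#[derive(Clone)]
struct DbEntry {
    sync_parent_uuid: Option<String>,
}

fn locate_or_stub(
    db: &mut HashMap<String, DbEntry>,
    incoming_uuid: &str,
    our_parent_uuid: &str,
) -> Result<DbEntry, String> {
    match db.get(incoming_uuid) {
        // Entry exists and is owned by this agreement: later phases treat it
        // as a modification.
        Some(entry) => match entry.sync_parent_uuid.as_deref() {
            Some(parent) if parent == our_parent_uuid => Ok(entry.clone()),
            // No sync_parent_uuid, or a different one: reject the operation.
            _ => Err(format!("entry {incoming_uuid} is outside this sync agreement")),
        },
        // No entry exists: create the stub immediately, then retrieve it.
        None => {
            let stub = DbEntry {
                sync_parent_uuid: Some(our_parent_uuid.to_string()),
            };
            db.insert(incoming_uuid.to_string(), stub.clone());
            Ok(stub)
        }
    }
}
```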

### Phase 3 - Entry Assertion

Remove all attributes in the sync request that overlap with our `sync_authority` value.

For all uuids in the entry-present set, assert that their attributes match what was synced in. Resolve types that need resolving (name2uuid, externalid2uuid).

Write all of the entries.
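
A sketch of the authority filtering with placeholder types; the exact semantics of `sync_authority` are still to be settled, so this just shows attributes being dropped before the assert and write:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Placeholder representation of a synced entry's attributes.
type Attrs = BTreeMap<String, Vec<String>>;

fn apply_authority(mut synced: Attrs, authority_overlap: &BTreeSet<String>) -> Attrs {
    // Drop any attribute that overlaps with our sync_authority value: Kanidm
    // keeps authority over those, so the synced values are ignored.
    synced.retain(|attr, _| !authority_overlap.contains(attr));
    // Reference values (e.g. member names, external ids) would be resolved to
    // uuids here (name2uuid / externalid2uuid) before the entries are written.
    synced
}
```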

### Phase 4 - Entry Removal

For all uuids in the `delete_uuids` set: if their `sync_parent_uuid` matches ours, assert that they are deleted (recycled).
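
A sketch of the removal check with placeholder types, again using a `HashMap` to stand in for the database:

```rust
use std::collections::{HashMap, HashSet};

// Placeholder entry type for illustration only.
struct DbEntry {
    sync_parent_uuid: Option<String>,
    recycled: bool,
}

fn delete_entries(
    db: &mut HashMap<String, DbEntry>,
    delete_uuids: &HashSet<String>,
    our_parent_uuid: &str,
) {
    for uuid in delete_uuids {
        if let Some(entry) = db.get_mut(uuid) {
            // Only entries owned by this sync agreement may be recycled.
            if entry.sync_parent_uuid.as_deref() == Some(our_parent_uuid) {
                entry.recycled = true;
            }
        }
    }
}
```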

### Phase 5 - Commit

Write the updated "state" from the request's `to_state` to our current state of the sync.

Write the updated "authority" value to the agreement, defining which attributes we are allowed to change.

Commit the transaction.
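
A final sketch, with placeholder types only, of why this ordering matters: the new state and authority values are written as part of the same transaction as the entry changes from phases 2 to 4, so either the whole batch lands or none of it does.

```rust
// Placeholder types for illustration only.
struct SyncAgreement {
    current_state: String,
    sync_authority: Vec<String>,
}
// Stands in for the write transaction that already holds the phase 2-4 changes.
struct Txn;

impl Txn {
    fn commit(self) -> Result<(), String> {
        Ok(())
    }
}

fn commit_batch(
    agreement: &mut SyncAgreement,
    to_state: String,
    new_authority: Vec<String>,
    txn: Txn,
) -> Result<(), String> {
    // Record the request's to_state as the new current sync state.
    agreement.current_state = to_state;
    // Record the updated authority: the attributes the sync may change.
    agreement.sync_authority = new_authority;
    // Commit the transaction that carries the entry changes and these updates.
    txn.commit()
}
```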