From e1c41d549af954935c7aeb4b56533ffb0a2bef40 Mon Sep 17 00:00:00 2001
From: William Brown
Date: Wed, 1 May 2019 14:07:13 +1000
Subject: [PATCH] Docs update

---
 designs/memberof.rst                   | 159 ++++++++++
 designs/repl_future_considerations.rst | 409 ++++++++++++++++++++++++-
 src/lib/server.rs                      |   7 +
 3 files changed, 573 insertions(+), 2 deletions(-)
 create mode 100644 designs/memberof.rst

diff --git a/designs/memberof.rst b/designs/memberof.rst
new file mode 100644
index 000000000..3cbb9292b
--- /dev/null
+++ b/designs/memberof.rst
@@ -0,0 +1,159 @@
+
+MemberOf
+--------
+
+MemberOf is a plugin that serves a fundamental tradeoff: precomputation of
+the relationships between a group and a user is more effective than looking
+up those relationships repeatedly.
+
+There are a few reasons for this plugin to exist.
+
+The major one is that the question is generally framed as "what groups is this
+person a member of?". This is true for application access checks (is this user
+in group Y?) and for NSS calls such as 'id name'. As a result, we want to keep
+the data for our users and groups in close locality. Given the design of the
+KaniDM system, where we generally frame and present user id tokens, it is on
+the user that we want to keep the reference to its groups.
+
+At this point one could ask "why not just store the groups on the user in the
+first place?". There is a security benefit to the relationship "groups have
+members" rather than "users are members of groups": delegated administration.
+It is much easier to define access controls over *who* can alter the content
+of a group, including the addition of new members, whereas the ability to
+write to all users' memberOf attributes would mean that anyone with that
+right could add anyone to any group.
+
+For example, if Claire has write access to "library users", she can only add
+members to that group.
+
+However, if users held the memberships, then for Claire to add someone to
+"library users" we would need to either allow Claire to arbitrarily write any
+group name to users, or we would need to increase the complexity of the ACI
+system to support validation of the content of changes.
+
+
+As a result, from a user interaction viewpoint, management of groups that
+have members is the simpler and more powerful solution. However, from a query
+and access viewpoint, the relationship of "what groups is this user a member
+of" is the more useful structure.
+
+To this end, we have the memberOf plugin. Given a set of groups and their
+members, it updates the reverse reference on the users to contain the
+memberOf relationship to the group.
+
+
+There is one final benefit to memberOf - it allows us to have *fast* group
+nesting capability, where the inverse lookup becomes N operations to resolve
+the full structure.
+
+Design
+------
+
+Due to the nature of this plugin, there is a single attribute - 'member' -
+whose content is examined to build the relationship to others - 'memberOf'.
+We will examine a single group and user situation without nesting. We assume
+the user already exists, as the situation where the group exists and we add
+the user can't occur due to refint.
+
+* Base Case
+
+The base case is the state where memberOf:G-uuid is present in U:memberOf.
+When this case is met, no action is taken. To determine this, we assert that
+entry pre:memberOf == entry post:memberOf in the modification - i.e. no
+action was taken.
+
+* Modify Case
+
+As memberOf:G-uuid is not present in U:memberOf, we do a "modify" to add it.
+The modify will recurse to the base case, which asserts it is present, then
+will return.
+
+
+Now let's consider the nested case: G1 -> G2 -> U. We'll assume that G2 -> U
+already exists, but that now we need to add G1 -> G2. This is now trivial to
+apply given that we use recursion to apply these changes.
+
+An important aspect of this is that groups *also* contain memberOf
+attributes. This benefits us, because we can then apply the memberOf of our
+group to the members of the group!
+
+::
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: G1  memberOf: G1, G2
+
+So at each step, if we are a group, we take our uuid and add it to the set,
+then make a present modification of our memberOf + our uuid. Translated:
+
+::
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: -   memberOf: G2
+
+    -> [ G1, ]
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: G1  memberOf: G2
+
+    -> [ G2, G1 ]
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: G1  memberOf: G2, G1
+
+It's important to note that we only recurse on groups - nothing else. This is
+what breaks the cycle on U, as memberOf is now fully applied.
+
+
+As a result of our base case, we can now handle the most evil of cases:
+circular nested groups and cycle breaking.
+
+::
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: --    memberOf: --
+
+    -> [ G1, ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: G1    memberOf: --
+
+    -> [ G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: G1    memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1    memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-3
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-3
+
+    BASE CASE -> Application of G1-3 on G1 has no change. END.
+
+To supplement this, *removal* of a member from a group is the same process,
+but we use the "removed" modify keyword instead of "present".
+The base case remains the same: if no changes occur, we have completed the
+operation.
+
+
+Considerations
+--------------
+
+* Preventing recursion: As of course, we are
+
+* Replication
+
diff --git a/designs/repl_future_considerations.rst b/designs/repl_future_considerations.rst
index 67dd6b846..3c3e17776 100644
--- a/designs/repl_future_considerations.rst
+++ b/designs/repl_future_considerations.rst
@@ -8,7 +8,7 @@ At first glance it may seem correct to no-op a change where the state is:
 
     name: william
 }
 
-with a "delete name; add name william".
+with a "purge name; add name william".
 
 However, this doesn't express the full possibilities of the replication topology
 in the system. The following events could occur:
@@ -17,7 +17,6 @@
 
     DB 1                     DB 2
     ----                     ----
-    n: w
     del: name
     n: l
     del: name
@@ -27,3 +26,409 @@ The events of DB 1 seem correct in isolation, to no-op the del and re-add, however
 when the changelogs will be replayed, they will then cause the events of DB2 to
 be the final state - when the timing of events on DB 1 should actually be the
 final state.
+
+To contrast, if you no-oped the purge name:
+
+::
+
+    DB 1                     DB 2
+    ----                     ----
+    n: l
+                             n: w
+
+Your final state is now n: [l, w] - note that we have an extra name field we
+didn't want!
+
+
+
+CSN
+---
+
+The CSN is a concept from 389 Directory Server. It is the Change Serial Number
+of a modification or event in the database. The CSN is a Lamport clock: it is
+the current time in UTC, but it can never move *backwards*.
+
+RID
+---
+
+The RID is a concept from 389 Directory Server. It is the Replica ID of a
+server. The RID must be a unique value that identifies exactly this server.
+
+CID
+---
+
+The CID is a (rename?) of a concept from 389 Directory Server. It is the pair
+of CSN and RID, allowing changes to be qualified with a specific server origin
+and an ordering between multiple servers.
+
+As a result, this value is likely to be:
+
+::
+
+    (CSN, RID)
+
+RUV
+---
+
+The RUV is a concept from 389 Directory Server. It is the Replication
+Up-to-dateness Vector.
+
+This is an array of RIDs and their MIN-MAX CSN locations in the changelog for
+those RIDs - MIN being the oldest change in the log related to that RID, and
+MAX being the latest change in the log related to that RID.
+
+::
+
+    Server A:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  000  |  010  |
+    |  02  |  002  |  005  |
+    |  03  |  004  |  008  |
+    |----------------------|
+
+To translate, this says that for RID 01 we have CSNs 000 through 010. We can
+use these two values to recreate the CID of the change itself.
+
+Now, critically, it is important to be able to compare RUVs to determine what
+changes are required to be sent, and in which order. Let's assume we have a
+second server with a RUV of:
+
+::
+
+    Server B:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  005  |  008  |
+    |  02  |  000  |  002  |
+    |  03  |  004  |  012  |
+    |----------------------|
+
+If we compare these, we can see that for RID 01, Server A has 000 -> 010, and
+B has 005 -> 008. You can make similar determinations for the other values.
+
+Importantly, in this case we need to ensure that the MAX of Server B is equal
+to or greater than our MIN for each RID.
+
+Once we have asserted this, we can generate a list of CIDs to supply:
+
+::
+
+    (003,02)
+    (004,02)
+    (005,02)
+    (009,01)
+    (010,01)
+
+It's important to note that these have been ordered by their CID - primarily
+by CSN.
+After the replication completes, Server B's RUV would now be:
+
+::
+
+    Server B:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  005  |  010  |
+    |  02  |  000  |  005  |
+    |  03  |  004  |  012  |
+    |----------------------|
+
+There are some other notes here. Server B is *ahead* of us for RID 03, so we
+actually send nothing related to it: it's likely that Server B will connect
+to us later and will supply the changes 11, 12 to us.
+
+Consider also that two servers make a change at the same time. Both could
+generate an identical CSN value, but because a CID is (CSN, RID), ordering
+can still take place between the events, with the server RID used to
+determine the order.
+
+
+Repl Proto Ideas
+----------------
+
+We should have push-based replication. There should be two versions of the
+system:
+
+* Entry Level Replication
+* Attribute Level Replication
+
+Both should be able to share the same RUV details.
+
+Entry Based
+===========
+
+This is the simpler version of the replication system. This is likely ONLY
+appropriate on a read-only consumer of data.
+
+The read-only stores *no* server RID, and contains an initially empty RUV.
+The provider would then supply its RUV to the consumer (so that it now has a
+state of where it is), but with all CSN MIN/MAX set to 0.
+
+The list of CIDs is derived by RUV comparison, but instead of supplying the
+changelog, the entries are sent whole, and the read-only blindly replaces
+them. We rely on the provider to have completed a correct entry update
+resolution process for this to make sense.
+
+To achieve this, we store a list of CIDs and which entries were affected
+within each CID.
+
+One can imagine a situation where two servers change the entry, but between
+those changes the read-only is supplied the CID. We don't care in what order
+they changed it, only that a change *must* have occurred.
+
+As an example: let's take entry A, with servers A and B, and a read-only R.
+
+::
+
+    A {
+        data: ...
+        uuid: x,
+    }
+
+    CID-list:
+    [
+        (001, A): [x, ...]
+    ]
+
+So the entry was created with CID (001, A). We connect to R and it has an
+empty RUV.
+
+::
+
+    RUV A:        RUV R:
+    A 0/1         A 0/0
+
+We then determine that the set of CIDs to transmit must be:
+
+::
+
+    (001, A)
+
+Referencing our CID-list, we know that uuid: x was modified, so we transmit
+that entry to the server.
+
+Now we add server B. The RUVs now are:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/1         A 0/1         A 0/1
+    B 0/0         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+    ]
+
+At this point a change happens on B *and* A at almost the same time. We'll
+say B happened first in this case:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/1
+    B 0/0         B 0/2
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+    ]
+
+Remember, however, this protocol is asynchronous. At this point something
+happens - server A replicates to R first, but without the changes from B yet.
+A RUV comparison yields that RUV R must be updated with the empty RUV B, and
+that the CID (3, A) must be sent. The entry x is sent to R again.
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/0         B 0/2         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+    ]
+
+Now Server B connects to A and supplies its changes. Since the changes on B
+happen *before* the changes on A, the CID slots between the existing changes
+(and an update resolution would take place, which is out of scope of this
+part of the design).
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/2         B 0/2         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+Next, Server A again connects to Server R, and determines based on the RUV
+that the difference is: (2, B).
+
+Consulting our CID-list, we see that entry x was changed in this CID.
+Here's what's important: the order of the change doesn't matter, because we
+take the *latest* version of uuid x, which has (1, A), (2, B) and (3, A) all
+fully resolved. We send the entry x as a whole, so all state of (2, B) and
+LATER changes is applied.
+
+This now means that, because the whole entry was sent, we can assert the
+entry had changes (2, B) and (3, A), so we can update RUV R to:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/2         B 0/2         B 0/2
+
+Now, this protocol is not without flaws: read-onlies should only be supplied
+data by a single server, as one could imagine the content of R flip-flopping
+while servers A/B are not in sync. Consider a situation such as:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+In this case, one can imagine B would then supply data, and when A received
+B's changes, it would again supply to R. However, this can be easily avoided
+by adhering to the following:
+
+* A server can only supply to a read-only if all of the supplying server's
+  RUV CSN MAX values are contained within the destination's RUV CSN MAX
+  values.
+
+By following this, B would determine that as it does *not* have (3, A)
+(which is greater than its local RUV CSN MAX for A), it should not supply at
+this time. Once A and B resolve their changes:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/3         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+Note that B has A's changes, but A does not yet have B's - but now server B
+does satisfy the RUV conditions and COULD supply to R. Similarly, A now does
+not meet the conditions to supply to R until B replicates to A. There could,
+however, be a risk of starvation of R in high write-load conditions. It could
+be preferable to just allow the flip-flop, but the risk there is a lack of
+overall consistency of the entire server state.
+This risk is minimised by the fact that we support batching of operations, so
+all changes should be complete as a whole, and that if changes happen on A in
+series, they must logically be valid.
+
+
+Deletion of entries is a different problem. Due to the entry lifecycle, most
+entries actually step to recycled first, which would trigger the above
+process. Similarly, when recycle ends, we then move to tombstone, which again
+triggers the above.
+
+However, we must now discuss the tombstone purging process.
+
+A tombstone would store the CID upon which it was ... well - tombstoned. As a
+result, the entry itself is aware of its state.
+
+The tombstone purge process would work by detecting the MIN RUV of all
+replicas. If the MIN RUV is greater than the tombstone CID, then it must be
+true that all replicas HAVE the tombstone as a tombstone, along with all the
+changes leading to that fact (as URP would dictate that all servers arrive at
+the same tombstone state). At this point we can safely remove the tombstone
+from our database, and no replication needs to occur - as all other replicas
+would also remove it! This applies to read-onlies as well.
+
+However, this poses the question: how do we move the MIN RUV of a server? To
+achieve this, we need to assert that *all other servers* have at least moved
+past a certain state, allowing us to trim our changelog UP TO the MIN RUV.
+
+Let's consider the supplier to read-only situation first, as this is the
+simplest:
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 0/0
+
+    GRUV A:
+    A:R ???
+
+To achieve this, we need to view the RUV of every server we connect to - even
+the ROs, despite their lack of RID (in fact, this could be a reason to
+PROVIDE a RID to ROs). We create a global RUV (GRUV) state, which would look
+like the following:
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 0/0
+
+    GRUV A:
+    R (A: 0/0, )
+
+So A has connected to R, polled the RUV, and received a 0/0.
+We can now supply our changes to R:
+
+::
+
+    RUV A:   -->  RUV R:
+    A 0/3         A 3/3
+
+    GRUV A:
+    R (A: 0/0, )
+
+As R is a read-only, it has no concept of the changelog, so it sets MIN to
+MAX.
+
+Now we poll the RUV again. Protocol-wise, RUV polling should be separate
+from the supplying of data!
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 3/3
+
+    GRUV A:
+    R (A: 3/3, )
+
+Now we can see that server R has changes MAX up to 3 - since this is the
+minimum of the set of all MAX values in the GRUV, we can now purge the
+changelog of A up to MIN 3:
+
+::
+
+    RUV A:        RUV R:
+    A 3/3         A 3/3
+
+    GRUV A:
+    R (A: 3/3, )
+
+And we are fully consistent!
+
+Let's imagine now we have two read-onlies, R1, R2.
+
+
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+    GRUV A:
+    A:B ???
+    A:R ???
+
+So, at this point, A would contact both
+
+
+
+
+
+Attribute Level Replication
+===========================
+
+TBD
+
diff --git a/src/lib/server.rs b/src/lib/server.rs
index 3d5a1d588..058eeec1e 100644
--- a/src/lib/server.rs
+++ b/src/lib/server.rs
@@ -827,6 +827,13 @@ impl<'a> QueryServerWriteTransaction<'a> {
             return plug_pre_res;
         }
 
+        // TODO: There is a potential optimisation here, where if
+        // candidates == pre-candidates, then we don't need to store anything
+        // because we effectively just did an assert. However, like all
+        // optimisations, this could be premature - so for now, we just
+        // do the CORRECT thing and recommit, as we may find later that we
+        // always want to add CSNs or other data.
+
         let res: Result>, SchemaError> = candidates
             .into_iter()
             .map(|e| e.validate(&self.schema))