From e1c41d549af954935c7aeb4b56533ffb0a2bef40 Mon Sep 17 00:00:00 2001
From: William Brown
Date: Wed, 1 May 2019 14:07:13 +1000
Subject: [PATCH] Docs update

---
 designs/memberof.rst                   | 159 ++++++++++
 designs/repl_future_considerations.rst | 409 ++++++++++++++++++++++++-
 src/lib/server.rs                      |   7 +
 3 files changed, 573 insertions(+), 2 deletions(-)
 create mode 100644 designs/memberof.rst

diff --git a/designs/memberof.rst b/designs/memberof.rst
new file mode 100644
index 000000000..3cbb9292b
--- /dev/null
+++ b/designs/memberof.rst
@@ -0,0 +1,159 @@
+
+MemberOf
+--------
+
+MemberOf is a plugin that serves a fundamental tradeoff: precomputation of
+the relationships between a group and a user is more effective than looking
+up those relationships repeatedly.
+
+There are a few reasons for this plugin to exist.
+
+The major one is that the question is generally framed as "what groups is this
+person a member of?". This is true for application access checks (is this user
+in group Y?) and for NSS calls such as 'id name'. As a result, we want to keep
+the data for our users and groups in close locality. Given the design of the
+KaniDM system, where we generally frame and present user id tokens, it is on
+the user that we want to keep the reference to its groups.
+
+At this point one could ask "why not just store the groups on the user in the
+first place?". There is a security benefit to the relationship "groups have
+members" rather than "users are members of groups": delegated administration.
+It is much easier to define access controls over *who* can alter the content
+of a group, including the addition of new members, whereas the ability to
+write to all users' memberOf attributes would mean that anyone with that
+right could add anyone to any group.
+
+For example, if Claire has write access to "library users", she can only add
+members to that group.
+
+However, if users held the memberships, then for Claire to add someone to
+"library users" we would need to either allow Claire to arbitrarily write any
+group name to users, or we would need to increase the complexity of the ACI
+system to support validation of the content of changes.
+
+
+As a result, from a user interaction viewpoint, management of groups that
+have members is the simpler and more powerful solution. However, from a query
+and access viewpoint, the relationship of "what groups is this user a member
+of" is the more useful structure.
+
+To this end, we have the memberOf plugin. Given a set of groups and their
+members, it updates the reverse reference on the users to contain the
+memberOf relationship to the group.
+
+
+There is one final benefit to memberOf - it allows us to have *fast* group
+nesting capability, where the inverse lookup becomes N operations to resolve
+the full structure.
+
+Design
+------
+
+Due to the nature of this plugin, there is a single attribute - 'member' -
+whose content is examined to build the relationship to others - 'memberOf'.
+We will examine a single group and user situation without nesting. We assume
+the user already exists, as the situation where the group exists and we add
+the user can't occur due to refint.
+
+* Base Case
+
+The base case is the state where memberOf:G-uuid is present in U:memberOf.
+When this case is met, no action is taken. To determine this, we assert that
+entry pre:memberOf == entry post:memberOf in the modification - i.e. no
+action was taken.
+
+* Modify Case
+
+As memberOf:G-uuid is not present in U:memberOf, we do a "modify" to add it.
+The modify will recurse to the base case, which asserts it is present, then
+will return.
+
+
+Now let's consider the nested case: G1 -> G2 -> U. We'll assume that G2 -> U
+already exists, but that now we need to add G1 -> G2. This is now trivial to
+apply given that we use recursion to apply these changes.
+
+An important aspect of this is that groups *also* contain memberOf
+attributes. This benefits us, because we can then apply the memberOf of our
+group to the members of the group!
+
+::
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: G1  memberOf: G1, G2
+
+So at each step, if we are a group, we take our uuid and add it to the set,
+then make a present modification of our memberOf + our uuid. Translated:
+
+::
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: -   memberOf: G2
+
+    -> [ G1, ]
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: G1  memberOf: G2
+
+    -> [ G2, G1 ]
+
+    G1            G2            U
+    member: G2    member: U
+    memberOf: -   memberOf: G1  memberOf: G2, G1
+
+It's important to note that we only recurse on groups - nothing else. This is
+what breaks the cycle on U, as memberOf is now fully applied.
+
+
+As a result of our base case, we can now handle the most evil of cases:
+circular nested groups and cycle breaking.
+
+::
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: --    memberOf: --
+
+    -> [ G1, ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: G1    memberOf: --
+
+    -> [ G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: --    memberOf: G1    memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1    memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-2
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-3
+
+    -> [ G3, G2, G1 ]
+
+    G1              G2              G3
+    member: G2      member: G3      member: G1
+    memberOf: G1-3  memberOf: G1-3  memberOf: G1-3
+
+    BASE CASE -> Application of G1-3 on G1 has no change. END.
+
+To supplement this, *removal* of a member from a group is the same process,
+but we use the "removed" modify keyword instead of "present".
+The base case remains the same: if no changes occur, we have completed the
+operation.
+
+
+Considerations
+--------------
+
+* Preventing recursion: As of course, we are
+
+* Replication
+
diff --git a/designs/repl_future_considerations.rst b/designs/repl_future_considerations.rst
index 67dd6b846..3c3e17776 100644
--- a/designs/repl_future_considerations.rst
+++ b/designs/repl_future_considerations.rst
@@ -8,7 +8,7 @@ At first glance it may seem correct to no-op a change where the state is:
 
     name: william
 }
 
-with a "delete name; add name william".
+with a "purge name; add name william".
 
 However, this doesn't express the full possibilities of the replication topology
 in the system. The following events could occur:
@@ -17,7 +17,6 @@
 
     DB 1                     DB 2
     ----                     ----
-    n: w
     del: name
     n: l
     del: name
@@ -27,3 +26,409 @@ The events of DB 1 seem correct in isolation, to no-op the del and re-add, however
 when the changelogs will be replayed, they will then cause the events of DB2 to
 be the final state - when the timing of events on DB 1 should actually be the
 final state.
+
+To contrast, if you no-oped the purge name:
+
+::
+
+    DB 1                     DB 2
+    ----                     ----
+    n: l
+                             n: w
+
+Your final state is now n: [l, w] - note that we have an extra name field we
+didn't want!
+
+
+
+CSN
+---
+
+The CSN is a concept from 389 Directory Server. It is the Change Serial Number
+of a modification or event in the database. The CSN is a Lamport clock: it is
+the current time in UTC, but it can never move *backwards*.
+
+RID
+---
+
+The RID is a concept from 389 Directory Server. It is the Replica ID of a
+server. The RID must be a unique value that identifies exactly this server.
+
+CID
+---
+
+The CID is a (rename?) of a concept from 389 Directory Server. It is the pair
+of CSN and RID, allowing changes to be qualified with a specific server origin
+and an ordering between multiple servers.
+
+As a result, this value is likely to be:
+
+::
+
+    (CSN, RID)
+
+RUV
+---
+
+The RUV is a concept from 389 Directory Server. It is the Replication
+Up-to-dateness Vector.
+
+This is an array of RIDs and their MIN-MAX CSN locations in the changelog for
+those RIDs - MIN being the oldest change in the log related to that RID, and
+MAX being the latest change in the log related to that RID.
+
+::
+
+    Server A:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  000  |  010  |
+    |  02  |  002  |  005  |
+    |  03  |  004  |  008  |
+    |----------------------|
+
+To translate, this says that for RID 01 we have CSNs 000 through 010. We can
+use these two values to recreate the CID of the change itself.
+
+Now, critically, it is important to be able to compare RUVs to determine what
+changes are required to be sent, and in which order. Let's assume we have a
+second server with a RUV of:
+
+::
+
+    Server B:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  005  |  008  |
+    |  02  |  000  |  002  |
+    |  03  |  004  |  012  |
+    |----------------------|
+
+If we compare these, we can see that for RID 01, Server A has 000 -> 010, and
+B has 005 -> 008. You can make similar determinations for the other values.
+
+Importantly, in this case we need to ensure that the MAX of Server B is equal
+to or greater than our MIN for each RID.
+
+Once we have asserted this, we can generate a list of CIDs to supply:
+
+::
+
+    (003,02)
+    (004,02)
+    (005,02)
+    (009,01)
+    (010,01)
+
+It's important to note that these have been ordered by their CID - primarily
+by CSN.
+After the replication completes, Server B's RUV would now be:
+
+::
+
+    Server B:
+    |----------------------|
+    |  ID  |  MIN  |  MAX  |
+    |----------------------|
+    |  01  |  005  |  010  |
+    |  02  |  000  |  005  |
+    |  03  |  004  |  012  |
+    |----------------------|
+
+There are some other notes here. Server B is *ahead* of us for RID 03, so we
+actually send nothing related to it: it's likely that Server B will connect
+to us later and will supply the changes 11, 12 to us.
+
+Consider also that two servers make a change at the same time. Both could
+generate an identical CSN value, but because a CID is (CSN, RID), ordering
+can still take place between the events, with the server RID used to
+determine the order.
+
+
+Repl Proto Ideas
+----------------
+
+We should have push-based replication. There should be two versions of the
+system:
+
+* Entry Level Replication
+* Attribute Level Replication
+
+Both should be able to share the same RUV details.
+
+Entry Based
+===========
+
+This is the simpler version of the replication system. This is likely ONLY
+appropriate on a read-only consumer of data.
+
+The read-only stores *no* server RID, and contains an initially empty RUV.
+The provider would then supply its RUV to the consumer (so that it now has a
+state of where it is), but with all CSN MIN/MAX set to 0.
+
+The list of CIDs is derived by RUV comparison, but instead of supplying the
+changelog, the entries are sent whole, and the read-only blindly replaces
+them. We rely on the provider to have completed a correct entry update
+resolution process for this to make sense.
+
+To achieve this, we store a list of CIDs and which entries were affected
+within each CID.
+
+One can imagine a situation where two servers change the entry, but between
+those changes the read-only is supplied the CID. We don't care in what order
+they changed it, only that a change *must* have occurred.
+
+As an example: let's take entry A, with servers A and B, and a read-only R.
+
+::
+
+    A {
+        data: ...
+        uuid: x,
+    }
+
+    CID-list:
+    [
+        (001, A): [x, ...]
+    ]
+
+So the entry was created with CID (001, A). We connect to R and it has an
+empty RUV.
+
+::
+
+    RUV A:        RUV R:
+    A 0/1         A 0/0
+
+We then determine that the set of CIDs to transmit must be:
+
+::
+
+    (001, A)
+
+Referencing our CID-list, we know that uuid: x was modified, so we transmit
+that entry to the server.
+
+Now we add server B. The RUVs now are:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/1         A 0/1         A 0/1
+    B 0/0         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+    ]
+
+At this point a change happens on B *and* A at almost the same time. We'll
+say B happened first in this case:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/1
+    B 0/0         B 0/2
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+    ]
+
+Remember, however, this protocol is asynchronous. At this point something
+happens - server A replicates to R first, but without the changes from B yet.
+A RUV comparison yields that RUV R must be updated with the empty RUV B, and
+that the CID (3, A) must be sent. The entry x is sent to R again.
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/0         B 0/2         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+    CID-list B:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+    ]
+
+Now Server B connects to A and supplies its changes. Since the changes on B
+happen *before* the changes on A, the CID slots between the existing changes
+(and an update resolution would take place, which is out of scope of this
+part of the design).
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/2         B 0/2         B 0/0
+
+    CID-list A:
+    [
+        (001, A): [x, ...]
+        (002, B): [x, ...]
+        (003, A): [x, ...]
+    ]
+
+Next, Server A again connects to Server R, and determines based on the RUV
+that the difference is: (2, B).
+
+Consulting our CID-list, we see that entry x was changed in this CID.
+Here's what's important: the order of the change doesn't matter, because we
+take the *latest* version of uuid x, which has (1, A), (2, B) and (3, A) all
+fully resolved. We send the entry x as a whole, so all state of (2, B) and
+LATER changes is applied.
+
+This now means that, because the whole entry was sent, we can assert the
+entry had changes (2, B) and (3, A), so we can update RUV R to:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/2         B 0/2         B 0/2
+
+Now, this protocol is not without flaws: read-onlies should only be supplied
+data by a single server, as one could imagine the content of R flip-flopping
+while servers A/B are not in sync. Consider a situation such as:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+In this case, one can imagine B would then supply data, and when A received
+B's changes, it would again supply to R. However, this can be easily avoided
+by adhering to the following:
+
+* A server can only supply to a read-only if all of the supplying server's
+  RUV CSN MAX values are contained within the destination's RUV CSN MAX
+  values.
+
+By following this, B would determine that as it does *not* have (3, A)
+(which is greater than its local RUV CSN MAX for A), it should not supply at
+this time. Once A and B resolve their changes:
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/3         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+Note that B has A's changes, but A does not yet have B's - but now server B
+does satisfy the RUV conditions and COULD supply to R. Similarly, A now does
+not meet the conditions to supply to R until B replicates to A. There could,
+however, be a risk of starvation of R in high write-load conditions. It could
+be preferable to just allow the flip-flop, but the risk there is a lack of
+overall consistency of the entire server state.
+This risk is minimised by the fact that we support batching of operations, so
+all changes should be complete as a whole, and that if changes happen on A in
+series, they must logically be valid.
+
+
+Deletion of entries is a different problem. Due to the entry lifecycle, most
+entries actually step to recycled first, which would trigger the above
+process. Similarly, when recycle ends, we then move to tombstone, which again
+triggers the above.
+
+However, we must now discuss the tombstone purging process.
+
+A tombstone would store the CID upon which it was ... well - tombstoned. As a
+result, the entry itself is aware of its state.
+
+The tombstone purge process would work by detecting the MIN RUV of all
+replicas. If the MIN RUV is greater than the tombstone CID, then it must be
+true that all replicas HAVE the tombstone as a tombstone, along with all the
+changes leading to that fact (as URP would dictate that all servers arrive at
+the same tombstone state). At this point we can safely remove the tombstone
+from our database, and no replication needs to occur - as all other replicas
+would also remove it! This applies to read-onlies as well.
+
+However, this poses the question: how do we move the MIN RUV of a server? To
+achieve this, we need to assert that *all other servers* have at least moved
+past a certain state, allowing us to trim our changelog UP TO the MIN RUV.
+
+Let's consider the supplier to read-only situation first, as this is the
+simplest:
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 0/0
+
+    GRUV A:
+    A:R ???
+
+To achieve this, we need to view the RUV of every server we connect to - even
+the ROs, despite their lack of RID (in fact, this could be a reason to
+PROVIDE a RID to ROs). We create a global RUV (GRUV) state, which would look
+like the following:
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 0/0
+
+    GRUV A:
+    R (A: 0/0, )
+
+So A has connected to R, polled the RUV, and received a 0/0.
+We can now supply our changes to R:
+
+::
+
+    RUV A:   -->  RUV R:
+    A 0/3         A 3/3
+
+    GRUV A:
+    R (A: 0/0, )
+
+As R is a read-only, it has no concept of the changelog, so it sets MIN to
+MAX.
+
+Now we poll the RUV again. Protocol-wise, RUV polling should be separate
+from the supplying of data!
+
+::
+
+    RUV A:        RUV R:
+    A 0/3         A 3/3
+
+    GRUV A:
+    R (A: 3/3, )
+
+Now we can see that server R has changes MAX up to 3 - since this is the
+minimum of the set of all MAX values in the GRUV, we can now purge the
+changelog of A up to MIN 3:
+
+::
+
+    RUV A:        RUV R:
+    A 3/3         A 3/3
+
+    GRUV A:
+    R (A: 3/3, )
+
+And we are fully consistent!
+
+Let's imagine now we have two read-onlies, R1, R2.
+
+
+
+::
+
+    RUV A:        RUV B:        RUV R:
+    A 0/3         A 0/1         A 0/3
+    B 0/1         B 0/4         B 0/1
+
+    GRUV A:
+    A:B ???
+    A:R ???
+
+So, at this point, A would contact both
+
+
+
+
+
+Attribute Level Replication
+===========================
+
+TBD
+
diff --git a/src/lib/server.rs b/src/lib/server.rs
index 3d5a1d588..058eeec1e 100644
--- a/src/lib/server.rs
+++ b/src/lib/server.rs
@@ -827,6 +827,13 @@ impl<'a> QueryServerWriteTransaction<'a> {
             return plug_pre_res;
         }
 
+        // TODO: There is a potential optimisation here, where if
+        // candidates == pre-candidates, then we don't need to store anything
+        // because we effectively just did an assert. However, like all
+        // optimisations, this could be premature - so for now, we just
+        // do the CORRECT thing and recommit, as we may find later that we
+        // always want to add CSNs or other data.
+
         let res: Result>, SchemaError> = candidates
             .into_iter()
             .map(|e| e.validate(&self.schema))