Docs update

This commit is contained in:
William Brown 2019-05-01 14:07:13 +10:00
parent b4ea4fff89
commit e1c41d549a
3 changed files with 573 additions and 2 deletions

designs/memberof.rst (new file)

@@ -0,0 +1,159 @@
MemberOf
--------
Member Of is a plugin that serves a fundamental tradeoff: precomputation of
the relationships between a group and a user is more effective than looking
up those relationships repeatedly.
There are a few reasons for this to exist. The major one is that the question is generally
framed as "what groups is this person a member of?". This is true for application access
checks (is the user in group Y?) and for NSS calls such as 'id name'. As a result, we want
to keep the data for a user and their groups in close locality.

Given the design of the KaniDM system, where we generally frame and present user id tokens,
it is upon the user that we want to keep the reference to its groups.
Now at this point one could ask "why not just store the groups on the user in the first place?".
There is a security benefit to the relationship of "groups have members" rather than "users are
members of groups". That benefit is delegated administration. It is much easier to define access
controls over "who" can alter the content of a group, including the addition of new members, whereas
the ability to write to every user's memberOf attribute would mean that anyone with that right
could add anyone to any group.

IE, if Claire has write access to "library users", she can only add members to that group.
However, if users held the memberships, then for Claire to add someone to "library users" we would
need to either allow Claire to write arbitrary group names to users, OR increase the complexity
of the ACI system to support validation of the content of changes.
So as a result: from a user interaction viewpoint, management of groups that have members is the
simpler and more powerful solution, while from a query and access viewpoint, the relationship
of "what groups is this user a member of" is the more useful structure.

To this end, we have the memberOf plugin. Given a set of groups and their members, it updates the
reverse reference on the users to contain the memberOf relationship to the group.

There is one final benefit to memberOf - it gives us *fast* group nesting, where the inverse
lookup becomes N operations to resolve the full structure.
Design
------
Due to the nature of this plugin, there is a single attribute - 'member' - whose content is examined
to build the relationship to others - 'memberOf'. We will examine a single group and user situation
without nesting. We assume the user already exists, as the situation where the group exists and we add
the user can't occur due to refint.
* Base Case

The base case is the state where memberOf:G-uuid is present in U:memberOf. When this case is met, no
action is taken. To determine this, we assert that entry pre:memberOf == entry post:memberOf in
the modification - IE no action was taken.

* Modify Case

As memberOf:G-uuid is not present in U:memberOf, we do a "modify" to add it. The modify will recurse
to the base case, which asserts it is present, and then return.
Now let's consider the nested case. G1 -> G2 -> U. We'll assume that G2 -> U already exists
but that now we need to add G1 -> G2. This is now trivial to apply given that we use recursion
to apply these changes.
An important aspect of this is that groups *also* contain memberOf attributes: This benefits us because
we can then apply the memberOf from our group to the members of the group!
::
G1                 G2                 U
member: G2         member: U
                   memberOf: G1       memberOf: G1, G2
So at each step, if we are a group, we take our uuid and add it to the set, then make a "present"
modification of our memberOf + our uuid to our members. So translated:
::
G1                 G2                 U
member: G2         member: U
memberOf: -        memberOf: -        memberOf: G2
-> [ G1, ]

G1                 G2                 U
member: G2         member: U
memberOf: -        memberOf: G1       memberOf: G2
-> [ G2, G1 ]

G1                 G2                 U
member: G2         member: U
memberOf: -        memberOf: G1       memberOf: G2, G1
It's important to note that we only recurse on groups - nothing else. This is what breaks the
cycle on U, as memberOf is now fully applied.
As a result of our base-case, we can now handle the most evil of cases: circular nested groups
and cycle breaking.
::
G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: --       memberOf: --       memberOf: --
-> [ G1, ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: --       memberOf: G1       memberOf: --
-> [ G2, G1 ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: --       memberOf: G1       memberOf: G1-2
-> [ G3, G2, G1 ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: G1-3     memberOf: G1       memberOf: G1-2
-> [ G3, G2, G1 ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: G1-3     memberOf: G1-3     memberOf: G1-2
-> [ G3, G2, G1 ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: G1-3     memberOf: G1-3     memberOf: G1-3
-> [ G3, G2, G1 ]

G1                 G2                 G3
member: G2         member: G3         member: G1
memberOf: G1-3     memberOf: G1-3     memberOf: G1-3

BASE CASE -> Application of G1-3 on G1 has no change. END.
To supplement this, *removal* of a member from a group is the same process - but we use the
"removed" modify keyword rather than "present". The base case remains the same: if no
changes occur, we have completed the operation.
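
To make the base case and modify case concrete, below is a minimal, self-contained sketch of the
recursion in Rust. The names and types here (Entry, apply_memberof, a BTreeMap standing in for the
database) are illustrative assumptions and not the real KaniDM plugin API; it also simplifies the
pre/post entry comparison down to checking whether the memberOf set actually changed.

::

    use std::collections::{BTreeMap, BTreeSet};

    type Uuid = &'static str;

    #[derive(Default, Debug)]
    struct Entry {
        is_group: bool,
        member: BTreeSet<Uuid>,
        member_of: BTreeSet<Uuid>,
    }

    /// Push `group`'s memberOf set (plus its own uuid) onto each of its members.
    /// We only recurse into members that are groups *and* that actually changed;
    /// the "no change" base case is what breaks cycles.
    fn apply_memberof(db: &mut BTreeMap<Uuid, Entry>, group: Uuid) {
        let (members, mut propagate) = {
            let g = &db[group];
            (g.member.clone(), g.member_of.clone())
        };
        propagate.insert(group);

        for m in members {
            let target = db.get_mut(m).expect("refint: member must exist");
            let before = target.member_of.clone();
            target.member_of.extend(propagate.iter().copied());
            let changed = target.member_of != before;
            let is_group = target.is_group;
            if changed && is_group {
                // Modify case: recurse, until the base case (no change) is hit.
                apply_memberof(db, m);
            }
        }
    }

    fn main() {
        let mut db = BTreeMap::new();
        db.insert("G1", Entry { is_group: true, member: BTreeSet::from(["G2"]), ..Default::default() });
        db.insert("G2", Entry { is_group: true, member: BTreeSet::from(["U"]), ..Default::default() });
        db.insert("U", Entry::default());

        // Adding G1 -> G2 propagates G1 down the nesting, stopping at the user.
        apply_memberof(&mut db, "G1");
        assert_eq!(db["G2"].member_of, BTreeSet::from(["G1"]));
        assert_eq!(db["U"].member_of, BTreeSet::from(["G1", "G2"]));
    }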
Considerations
--------------
* Preventing recursion: As of course, we are
* Replication


@@ -8,7 +8,7 @@ At first glance it may seem correct to no-op a change where the state is:
name: william
}
with a "delete name; add name william".
with a "purge name; add name william".
However, this doesn't express the full possibilities of the replication topology
in the system. The following events could occur:
@@ -17,7 +17,6 @@ in the system. The following events could occur:
DB 1 DB 2
---- ----
n: w
del: name
n: l
del: name
@@ -27,3 +26,409 @@ The events of DB 1 seem correct in isolation, to no-op the del and re-add, however
when the changelogs are replayed, they will then cause the events of DB 2 to
be the final state - when the timing of events on DB 1 should actually be the
final state.
To contrast, if you no-oped the purge name:
::
DB 1 DB 2
---- ----
n: l
n: w
Your final state is now n: [l, w] - note that we have an extra name field we didn't want!
CSN
---
The CSN is a concept from 389 Directory Server. It is the Change Serial Number of a modification
or event in the database. The CSN is a Lamport clock: it tracks the current time in UTC, but
it can never move *backwards*.
RID
---
The RID is a concept from 389 Directory Server. It is the Replica ID of a server. The RID must
be a unique value that identifies exactly this server.
CID
---
The CID is a (rename?) of a concept from 389 Directory Server. It is the pair of CSN and RID, allowing
changes to be qualified with a specific server origin and ordered between multiple servers.
As a result, this value is likely to be:
::
(CSN, RID)
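
As a small illustration (the field types here are assumptions, not the real representation),
ordering a CID by CSN first and RID second gives a total order even when two servers generate the
same CSN:

::

    /// Sketch of a change identifier: ordered by CSN first, then RID, so two
    /// changes with an identical CSN still have a stable order between servers.
    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    struct Cid {
        csn: u64, // lamport-style timestamp, never moves backwards
        rid: u16, // unique replica id of the originating server
    }

    fn main() {
        let mut changes = vec![
            Cid { csn: 3, rid: 1 },
            Cid { csn: 2, rid: 2 },
            Cid { csn: 2, rid: 1 },
        ];
        changes.sort();
        // (2,1) < (2,2) < (3,1): identical CSNs are tie-broken by the RID.
        println!("{:?}", changes);
    }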
RUV
---
The RUV is a concept from 389 Directory Server. It is the replication up-to-dateness vector.
This is an array of RIDs, and their min-max CSN locations in the changelog for those RIDs. Min being the
oldest change in the log related to that RID, and max being the latest change in the log related
to that RID.
::
Server A:
|----------------------|
| ID | MIN | MAX |
|----------------------|
| 01 | 000 | 010 |
| 02 | 002 | 005 |
| 03 | 004 | 008 |
|----------------------|
To translate, this says that for RID 01, we have CSN 000 through 010. We can use these two values to
recreate the CID of the change itself.
Now, critically, it is important to be able to compare RUVs to determine what changes need
to be sent, and in which order. Let's assume we have a second server with a RUV of:
::
Server B:
|----------------------|
| ID | MIN | MAX |
|----------------------|
| 01 | 005 | 008 |
| 02 | 000 | 002 |
| 03 | 004 | 012 |
|----------------------|
So if we are to compare these, we can see that for ID 1, Server A has 000 -> 010, and B has 005 -> 008.
You can make similar determinations for the other values.
Importantly, in this case we need to ensure the MAX of Server B is equal to or greater than our MIN for each RID.
Once we have asserted this, we can generate a list of CIDs to supply.
::
(003,02)
(004,02)
(005,02)
(009,01)
(010,01)
It's important to note that these have been ordered by their CID - primarily by CSN! After the replication completes, Server B's
RUV would now be:
::
Server B:
|----------------------|
| ID | MIN | MAX |
|----------------------|
| 01 | 005 | 010 |
| 02 | 000 | 005 |
| 03 | 004 | 012 |
|----------------------|
There are some other notes here: Server B is *ahead* of us for RID 3, so we actually send nothing related to
this: it's likely that Server B will connect to us later and will supply the changes 11, 12 to us.
Consider also two servers making a change at the same time. Both could generate an identical CSN
value, but because a CID is (CSN, RID), ordering can still take place between the events, with
the server RID used to break the tie.
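
As a rough sketch of the comparison above - assuming, purely for illustration, that a RUV is a map
from RID to a (MIN, MAX) CSN pair - the supplier can work out, per RID, the CSN range the consumer
is missing, and then expand that into the ordered CID list from its changelog:

::

    use std::collections::BTreeMap;

    type Rid = u16;
    type Csn = u64;

    /// Illustrative RUV shape: per RID, the oldest and newest CSN held.
    type Ruv = BTreeMap<Rid, (Csn, Csn)>;

    /// For each RID, the CSN range the consumer is missing. Returns None if the
    /// consumer's MAX has fallen below our MIN for some RID, i.e. our changelog
    /// no longer reaches back far enough to supply it incrementally.
    fn changes_to_send(ours: &Ruv, theirs: &Ruv) -> Option<BTreeMap<Rid, (Csn, Csn)>> {
        let mut out = BTreeMap::new();
        for (rid, &(our_min, our_max)) in ours {
            let their_max = theirs.get(rid).map(|&(_, max)| max).unwrap_or(0);
            if their_max < our_min {
                return None;
            }
            // The consumer may be *ahead* of us for a RID - then we send nothing for it.
            if their_max < our_max {
                out.insert(*rid, (their_max + 1, our_max));
            }
        }
        Some(out)
    }

    fn main() {
        // Server A and Server B from the tables above.
        let server_a: Ruv = BTreeMap::from([(1, (0, 10)), (2, (2, 5)), (3, (4, 8))]);
        let server_b: Ruv = BTreeMap::from([(1, (5, 8)), (2, (0, 2)), (3, (4, 12))]);
        // Expect RID 1 -> (9, 10) and RID 2 -> (3, 5); nothing for RID 3.
        println!("{:?}", changes_to_send(&server_a, &server_b));
    }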
Repl Proto Ideas
----------------
We should have push-based replication. There should be two versions of the system:
* Entry Level Replication
* Attribute Level Replication.
Both should be able to share the same RUV details.
Entry Based
===========
This is the simpler version of the replication system. This is likely ONLY appropriate on a read-only
consumer of data.
The read-only stores *no* server RID, and contains an initially empty RUV. The provider would then supply its
RUV to the consumer (so that it now has a state of where it is), but with all CSN MIN/MAX set to 0.
The list of CIDs is derived by RUV comparison, but instead of supplying the change log, the entries
are sent whole, and the read-only blindly replaces them. We rely on the provider to have completed
a correct entry update resolution process for this to make sense.
To achieve this, we store a list of CIDs and which entries were affected within each CID.
One can imagine a situation where two servers change the entry, but between
those changes the read-only is supplied the CID. We don't care in what order they changed,
only that a change *must* have occurred.
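
The CID-list described above could be sketched as a map from CID to the set of affected entry uuids
(an assumed shape, for illustration only); working out which whole entries to ship to the read-only
is then just a union over the missing CIDs:

::

    use std::collections::{BTreeMap, BTreeSet};

    #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
    struct Cid { csn: u64, rid: u16 }

    type EntryUuid = &'static str;

    /// For every CID we have applied, the entries it touched.
    type CidList = BTreeMap<Cid, BTreeSet<EntryUuid>>;

    /// Collect the set of whole entries to ship for the CIDs the consumer lacks.
    /// Order does not matter: the latest resolved entry is sent as-is, once.
    fn entries_to_send(cid_list: &CidList, missing: &[Cid]) -> BTreeSet<EntryUuid> {
        missing
            .iter()
            .filter_map(|cid| cid_list.get(cid))
            .flat_map(|uuids| uuids.iter().copied())
            .collect()
    }

    fn main() {
        let mut cid_list = CidList::new();
        cid_list.insert(Cid { csn: 1, rid: 1 }, BTreeSet::from(["x"]));
        cid_list.insert(Cid { csn: 2, rid: 2 }, BTreeSet::from(["x"]));
        cid_list.insert(Cid { csn: 3, rid: 1 }, BTreeSet::from(["x"]));

        // If the read-only is missing (2, B) and (3, A), entry x is sent once, whole.
        let missing = [Cid { csn: 2, rid: 2 }, Cid { csn: 3, rid: 1 }];
        println!("{:?}", entries_to_send(&cid_list, &missing));
    }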
So, for example: let's take an entry with servers A and B, and read-only R.
::
A {
data: ...
uuid: x,
}
CID-list:
[
(001, A): [x, ...]
]
So the entry was created with CID (001, A). We connect to R and it has an empty RUV.
::
RUV A: RUV R:
A 0/1 A 0/0
We then determine that the set of CIDs to transmit must be:
::
(001, A)
Referencing our CID list, we know that uuid: x was modified, so we transmit that to the server.
Now we add server B. The RUVs now are:
::
RUV A:          RUV B:          RUV R:
A 0/1           A 0/1           A 0/1
B 0/0           B 0/0
CID-list A:
[
(001, A): [x, ...]
]
CID-list B:
[
(001, A): [x, ...]
]
At this point a change happens on B *and* A at almost the same time. We'll say B happened first
in this case:
::
RUV A:          RUV B:          RUV R:
A 0/3           A 0/1           A 0/1
B 0/0           B 0/2
CID-list A:
[
(001, A): [x, ...]
(003, A): [x, ...]
]
CID-list B:
[
(001, A): [x, ...]
(002, B): [x, ...]
]
Remember, however, that this protocol is ASYNC. At this point something happens - server A replicates to R first, but
without the changes from B yet. A RUV comparison yields that RUV R must be updated with the (still empty) RUV B,
and that the CID (3, A) must be sent. The entry x is sent to R again.
::
RUV A: RUV B: RUV R:
A 0/3 A 0/1 A 0/3
B 0/0 B 0/2 B 0/0
CID-list A:
[
(001, A): [x, ...]
(003, A): [x, ...]
]
CID-list B:
[
(001, A): [x, ...]
(002, B): [x, ...]
]
Now Server B connects to A and supplies its changes. Since the changes on B happened *before*
the changes on A, the CID slots between the existing changes (and an update resolution would take
place, which is out of scope for this part of the design).
::
RUV A: RUV B: RUV R:
A 0/3 A 0/1 A 0/3
B 0/2 B 0/2 B 0/0
CID-list A:
[
(001, A): [x, ...]
(002, B): [x, ...]
(003, A): [x, ...]
]
Next Server A again connects to Server R, and determines based on the RUV that the differences are: (2, B).
Consulting our CID-list, we see that entry X was changed in this CID. Here's what's important: the order of the change
doesn't matter, because we take the *latest* version of UUID X, which has (1, A), (2, B) and (3, A) all
fully resolved. We send the entry X as a whole, so all state of (2, B) and LATER changes are applied.
This now means that because the whole entry was sent, we can assert the entry had changes (2, B) and
(3, A), so we can update the RUV R to:
::
RUV A: RUV B: RUV R:
A 0/3 A 0/1 A 0/3
B 0/2 B 0/2 B 0/2
Now this protocol is not without flaws: read-onlies should only be supplied data by a single server,
as one could imagine the content of R flip-flopping while servers A and B are not in sync. However,
to prevent a situation such as:
::
RUV A: RUV B: RUV R:
A 0/3 A 0/1 A 0/3
B 0/1 B 0/4 B 0/1
In this case, one can imagine B would then supply data, and when A received B's changes, it would again
supply to R. However, this can be easily avoided by adhering to the following:

* A server can only supply to a read-only if all of the supplying server's RUV CSN MAX are contained
  within the destination RUV CSN MAX.
By following this, B would determine that as it does *not* have (3, A) (which is greater than the local
RUV CSN MAX for A), it should not supply at this time. Once A and B resolve their changes:
::
RUV A: RUV B: RUV R:
A 0/3 A 0/3 A 0/3
B 0/1 B 0/4 B 0/1
Note that B has A's changes, but A does not yet have B's - but now server B does satisfy the RUV conditions
and COULD supply to R. Similarly, A does not meet the conditions to supply to R until B replicates
to A. There could however be a risk of starvation of R in high write-load conditions. It could just
be preferable to allow the flip-flop, but the risk there is a lack of overall consistency of the entire
server state. This risk is minimised by the fact that we support batching of operations, so all
changes should be complete as a whole, and if changes happen on A in series, they must
logically be valid.
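
One way to read the supply condition above - following the worked example, where B refuses to supply
because it lacks the (3, A) change that R already holds - is that a supplier must already cover every
per-RID CSN MAX the destination has. A sketch of that check, using the same illustrative RUV map as
earlier, with no claim to be the final rule:

::

    use std::collections::BTreeMap;

    type Rid = u16;
    type Csn = u64;
    /// Illustrative RUV shape: RID -> (MIN, MAX) CSN.
    type Ruv = BTreeMap<Rid, (Csn, Csn)>;

    /// Refuse to supply a read-only if the destination already holds a change
    /// (per-RID CSN MAX) that we do not have ourselves.
    fn can_supply_to_readonly(supplier: &Ruv, destination: &Ruv) -> bool {
        destination.iter().all(|(rid, &(_, dest_max))| {
            let our_max = supplier.get(rid).map(|&(_, max)| max).unwrap_or(0);
            dest_max <= our_max
        })
    }

    fn main() {
        // RUV B and RUV R from the flip-flop example: RID 1 is server A, RID 2 is B.
        let ruv_b: Ruv = BTreeMap::from([(1, (0, 1)), (2, (0, 4))]);
        let ruv_r: Ruv = BTreeMap::from([(1, (0, 3)), (2, (0, 1))]);
        // B lacks (3, A) which R already holds, so it must not supply yet.
        assert!(!can_supply_to_readonly(&ruv_b, &ruv_r));

        // Once A has replicated to B, B covers everything R holds and may supply.
        let ruv_b_after: Ruv = BTreeMap::from([(1, (0, 3)), (2, (0, 4))]);
        assert!(can_supply_to_readonly(&ruv_b_after, &ruv_r));
    }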
Deletion of entries is a different problem: due to the entry lifecycle, most entries actually
step to recycled, which would trigger the above process. Similarly, when recycle ends, we then
move to tombstone, which again triggers the above.
However, we must now discuss the tombstone purging process.
A tombstone would store the CID at which it was ... well - tombstoned. As a result, the entry
itself is aware of its state.
The tombstone purge process would work by detecting the MIN RUV of all replicas. If the MIN RUV
is greater than the tombstone CID, then it must be true that all replicas HAVE the tombstone as
a tombstone and all changes leading to that fact (as URP would dictate that all servers would
arrive at the same tombstone state). At this point, we can now safely remove the tombstone from our
database, and no replication needs to occur - as all other replicas would also remove it! This applies
to read-onlies as well.
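
A sketch of that purge check, assuming (for illustration) that a tombstone records the CID at which
it was tombstoned and that we can see a RUV for every replica; comparing against the per-RID MIN of
every replica is one plausible reading of "the MIN RUV of all replicas", not a settled design:

::

    use std::collections::BTreeMap;

    type Rid = u16;
    type Csn = u64;
    type Ruv = BTreeMap<Rid, (Csn, Csn)>;

    #[derive(Debug, Clone, Copy)]
    struct Cid { csn: Csn, rid: Rid }

    /// A tombstone created at `at` may be purged once every known replica's RUV
    /// MIN for that RID has moved strictly past it - meaning every replica has
    /// already applied (and trimmed past) the change that tombstoned the entry.
    fn can_purge_tombstone(at: Cid, all_replica_ruvs: &[Ruv]) -> bool {
        all_replica_ruvs.iter().all(|ruv| {
            ruv.get(&at.rid)
                .map(|&(min, _)| min > at.csn)
                // A replica that has never seen this RID blocks the purge.
                .unwrap_or(false)
        })
    }

    fn main() {
        let tombstone_cid = Cid { csn: 5, rid: 1 };
        let ruvs: Vec<Ruv> = vec![
            BTreeMap::from([(1, (6, 9))]), // trimmed past CSN 5
            BTreeMap::from([(1, (3, 9))]), // still holds changes at or before CSN 5
        ];
        assert!(!can_purge_tombstone(tombstone_cid, &ruvs));

        let ruvs_later: Vec<Ruv> = vec![
            BTreeMap::from([(1, (6, 9))]),
            BTreeMap::from([(1, (7, 9))]),
        ];
        assert!(can_purge_tombstone(tombstone_cid, &ruvs_later));
    }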
However, this poses the question - how do we move the MIN RUV of a server? To achieve this we need
to assert that *all other servers* have at least moved past a certain state, allowing us to trim our
changelog UP TO the MIN RUV.
Let's consider the supplier to read-only situation first, as this is the simplest:
::
RUV A: RUV R:
A 0/3 A 0/0
GRUV A:
A:R ???
To achieve this, we need to view the RUV of every server we connect to: even the ROs, despite their
lack of a RID (in fact, this could be a reason to PROVIDE a RID to ROs) ...
We create a global RUV (GRUV) state which would look like
the following:
::
RUV A: RUV R:
A 0/3 A 0/0
GRUV A:
R (A: 0/0, )
So A has connected to R, polled the RUV, and received a 0/0. We can now supply our changes to
R:
::
RUV A: --> RUV R:
A 0/3 A 3/3
GRUV A:
R (A: 0/0, )
As R is a read-only it has no concept of the changelog, so it sets MIN to MAX.
Now, we then poll the RUV again. Protocol-wise, RUV polling should be separate from supplying of data!
::
RUV A: RUV R:
A 0/3 A 3/3
GRUV A:
R (A: 3/3, )
Now we can see that server R has changes MAX up to 3 - since this is the minimum of the set
of all MAX values in the GRUV, we can now purge the changelog of A up to MIN 3:
::
RUV A: RUV R:
A 3/3 A 3/3
GRUV A:
R (A: 3/3, )
And we are fully consistent!
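
A sketch of that trim decision, assuming (illustratively) the same RUV map as above and a GRUV that
records the last RUV polled from each peer; for each RID, our changelog can be trimmed up to the
smallest MAX that every polled peer has acknowledged:

::

    use std::collections::BTreeMap;

    type Rid = u16;
    type Csn = u64;
    type Ruv = BTreeMap<Rid, (Csn, Csn)>;
    /// GRUV: for each peer we poll, the RUV we last saw from it.
    type Gruv = BTreeMap<&'static str, Ruv>;

    /// For each RID we hold locally, the CSN up to which the changelog may be
    /// trimmed: the minimum MAX acknowledged across every polled peer.
    fn trim_points(gruv: &Gruv, local: &Ruv) -> BTreeMap<Rid, Csn> {
        local
            .keys()
            .filter_map(|rid| {
                gruv.values()
                    // A peer that has never seen this RID acknowledges nothing (0).
                    .map(|peer| peer.get(rid).map(|&(_, max)| max).unwrap_or(0))
                    .min()
                    .map(|m| (*rid, m))
            })
            .collect()
    }

    fn main() {
        // The single read-only example above: A holds 0/3 locally, and polled R at 3/3.
        let local: Ruv = BTreeMap::from([(1, (0, 3))]);
        let gruv: Gruv = BTreeMap::from([("R", BTreeMap::from([(1, (3, 3))]))]);
        // RID 1 can be trimmed up to CSN 3, moving RUV A from 0/3 to 3/3.
        println!("{:?}", trim_points(&gruv, &local));
    }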
Let's imagine now we have two read-onlies, R1, R2.
::
RUV A: RUV B: RUV R:
A 0/3 A 0/1 A 0/3
B 0/1 B 0/4 B 0/1
GRUV A:
A:B ???
A:R ???
So, at this point, A would contact both
Attribute Level Replication
===========================
TBD


@@ -827,6 +827,13 @@ impl<'a> QueryServerWriteTransaction<'a> {
return plug_pre_res;
}
// TODO: There is a potential optimisation here, where if
// candidates == pre-candidates, then we don't need to store anything
// because we effectively just did an assert. However, like all
// optimisations, this could be premature - so we for now, just
// do the CORRECT thing and recommit as we may find later we always
// want to add CSN's or other.
let res: Result<Vec<Entry<EntryValid, EntryCommitted>>, SchemaError> = candidates
.into_iter()
.map(|e| e.validate(&self.schema))