From 5049eac1e4af912db77037e21e60f84f8595c12a Mon Sep 17 00:00:00 2001
From: Firstyear <william@blackhats.net.au>
Date: Thu, 7 Nov 2019 18:00:57 +1000
Subject: [PATCH] Design a downgrade process (#144)

---
 designs/downgrade.rst                  | 47 ++++++++++++++++++++++++++
 designs/repl_future_considerations.rst |  2 +-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 designs/downgrade.rst

diff --git a/designs/downgrade.rst b/designs/downgrade.rst
new file mode 100644
index 000000000..b6dbd0844
--- /dev/null
+++ b/designs/downgrade.rst
@@ -0,0 +1,47 @@
+Downgrades
+----------
+
+It's inevitable that someone will find some issue that requires them to downgrade
+their working copy of kanidmd - that means we have to understand that process,
+and at least advertise or document how it should be done.
+
+A major barrier for us to have a downgrade process is the nature of our inplace
+migrations and upgrades - while we have a system that understands how to upgrade
+data and make changes, when downgrading we won't be able to understand the newer
+types to do a downgrade.
+
+Consider we add a new value type XDATA which was previously a UTF8STRING. We go
+from version 1.0 to 1.1. Version 1.1 will change all UTF8STRING to XDATA, which is
+what we want, but if we were to run version 1.0 again it would not understand
+the XDATA field, nor know how to downgrade it, since the type flatly doesn't exist.
+
+As a result this leaves one conclusion - we cannot support downgrades. Rather,
+the correct behaviour for us to support is to advise admins to back up before
+an upgrade, and to restore from the backup if anything goes wrong.
+
+This will affect replication in two ways:
+
+First, it means the RUV of a server node can move backwards. This requires
+us to limit changelog trimming of events to events that have expired by time
+rather than events that are fully resolved. This way within the changelog
+trim window, a server can be downgraded and its RUV can move backwards, but the missing updates will be "replayed" back to it.
+
+Second, it means we have to consider making replication either version (typed)
+data agnostic *or* have CSNs represent a dataset version from the server, which gates or blocks replication events from newer to older instances until *they* are upgraded.
+
+Having the version gate does have a good benefit. Imagine we have three servers
+A, B, C. We upgrade A and B, and they migrate UTF8STRING to XDATA. Server C has
+not been upgraded.
+
+This means that *all changes* from A and B post upgrade will NOT be sent to C. C
+may accept changes and will continue to provide them to A and B (provided all
+other update resolution steps hold). If we now revert B, the changes from A will
+not flow to B which has been downgraded, but C's changes that were accepted WILL
+continue to be accepted by B. The same applies to A. This means in a downgrade scenario
+that any data written on upgraded nodes that are then downgraded will be lost, but
+that replication as a whole will still be valid. This is good!
+
+It does mean we need to consider that we have to upgrade data as it comes in via
+replication from an older server too, to bring fields up to date if needed. This
+may necessitate a "data version" field on each entry, which we can also associate
+with any CSN so that it can be accepted or rejected as required.
diff --git a/designs/repl_future_considerations.rst b/designs/repl_future_considerations.rst
index 3c3e17776..8f31bff6a 100644
--- a/designs/repl_future_considerations.rst
+++ b/designs/repl_future_considerations.rst
@@ -420,7 +420,7 @@ Let's imagine now we have two read-onlies, R1, R2.
 So, at this point, A would contact both
 
 
-
+SEE ALSO: Downgrades
 
 
 Attribute Level Replication