Indexing

Indexing is deeply tied to the concept of filtering. Indexes exist to make the application of a search term (filter) faster.

World without indexing

Almost all databases are built on top of a key-value storage engine of some nature. In our case we are using (as of Feb 2019) SQLite, and hopefully sled in the future.

Our entries, which contain sets of avas, are serialised into a byte format (as of Feb 2019 JSON, but soon CBOR) and stored in a table of "id: entry". For example:

ID data
01 { 'Entry': { 'name': ['name'], 'class': ['person'], 'uuid': ['...'] } }
02 { 'Entry': { 'name': ['beth'], 'class': ['person'], 'uuid': ['...'] } }
03 { 'Entry': { 'name': ['alan'], 'class': ['person'], 'uuid': ['...'] } }
04 { 'Entry': { 'name': ['john'], 'class': ['person'], 'uuid': ['...'] } }
05 { 'Entry': { 'name': ['kris'], 'class': ['person'], 'uuid': ['...'] } }

The ID column is private to the backend implementation and is never revealed to the higher level components. However, the ID is very important to indexing :)

If we wanted to find Eq(name, john) here, what do we need to do? A full table scan is where we perform:

data = sqlite.do("SELECT * FROM id2entry;")
for row in data:
    entry = deserialise(row)
    entry.match_filter(...)  # check Eq(name, john)

For a small database (maybe up to 20 objects), this is probably fine. But as the database grows, this becomes really costly. We continually load, deserialise, check and free data that is not relevant to the search. This is why full table scans of any database (SQL, LDAP, anything) are so costly: it really is scanning everything!

How does indexing work?

Indexing is a pre-computed lookup table of what you might search in a specific format. Let's say in our example we have an equality index on "name" as an attribute. Now in our backend we define an extra table called "index_eq_name". Its contents would look like:

index idl (ID List)
alan [03, ]
beth [02, ]
john [04, ]
kris [05, ]
name [01, ]

So when we perform our search for Eq(name, john) again, we see name is indexed. We then perform:

SELECT * FROM index_eq_name WHERE index = 'john';

This would give us the idl (ID list) of [04, ]. This is the "IDs of every entry where name equals john".

We can now take this back to our id2entry table and perform:

data = sqlite.do("SELECT * FROM id2entry WHERE id = 04;")

The key-value engine only gives us the entry for john, and we have a match! If id2entry had 1 million entries, a full table scan would be 1 million loads and compares - with the index, it was 2 loads and one compare. That's 30000x faster (potentially ;) )!

To improve on this, if we had a query like Or(Eq(name, john), Eq(name, kris)) we can use our indexes to speed this up.

We would query index_eq_name again, performing the search for both john and kris. Because this is an Or, we then union the two idls, and we would have:

[04, 05,]

Now we just have to get entries 04 and 05 from id2entry, and we have our matching query. This means filters are often applied as idl set operations.
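
To make the set operations concrete, here is a minimal sketch using plain std sets in place of the compressed idl type the server actually uses (the ids follow the example tables above):

use std::collections::BTreeSet;

fn main() {
    // Hypothetical idls loaded from index_eq_name for "john" and "kris".
    let idl_john = BTreeSet::from([4u64]);
    let idl_kris = BTreeSet::from([5u64]);

    // Or(Eq(name, john), Eq(name, kris)) -> union of the idls.
    let or_result: BTreeSet<u64> = idl_john.union(&idl_kris).copied().collect();
    assert_eq!(or_result, BTreeSet::from([4u64, 5]));

    // An And over the same terms would instead intersect the idls.
    let and_result: BTreeSet<u64> = idl_john.intersection(&idl_kris).copied().collect();
    assert!(and_result.is_empty());

    // The surviving ids are what we then fetch from id2entry.
    println!("load from id2entry: {:?}", or_result);
}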

Compressed ID lists

To make idl loading and the set operations faster, there is an idl library (developed by me, Firstyear) which will be used for this. To read more see:

https://github.com/Firstyear/idlset

Filter Optimisation

Filter optimisation begins to play an important role when we have indexes. If we indexed something like Pres(class), then the idl for that search is the set of all database entries. Similarly, if our database of 1 million entries has 250,000 class=person, then Eq(class, person) will have an idl containing 250,000 ids. Even with idl compression, this is still a lot of data!

There tend to be two types of searches against a directory like Kanidm.

  • Broad searches
  • Targeted single entry searches

For broad searches, filter optimisation does little - we just have to load those large idls and use them. (Yes, loading the large idl and using it is still better than a full table scan though!)

However, for targeted searches, filter optimisation really helps.

Imagine a query like:

And(Eq(class, person), Eq(name, claire))

In this case, with our database of 250,000 persons, our idls would be:

And( idl[250,000 ids], idl[1 id] )

This means the result will always be either the single id from the smaller idl, or no entries at all, because that id was not present in both.

We add a single concept to the server called the "filter test threshold". Once the candidate set produced so far drops below this threshold, we stop resolving further index terms and instead apply the remaining filter, in the manner of a full table scan, to the partial candidate set - because at that point this is faster than continuing to load and test indexes.

When we have this test threshold, there are two possible orderings for this filter.

And( idl[250,000 ids], idl[1 id] )

We load the 250,000 id idl and then perform the intersection with the idl of 1 id, resulting in 1 or 0 matches.

And( idl[1 id], idl[250,000 ids] )

We load the single idl value for name, and then, as we are below the test threshold, we shortcut out and apply the filter directly to that single candidate entry - yielding a match or no match.

Notice how in the second ordering, by promoting the "smaller" idl, we were able to save the work of the large idl load and the intersection, because our first equality on "name" was more targeted?

Filter optimisation is about re-arranging these filters in the server, using our insight into the data, to provide faster searches and to avoid loading costly indexes unless they are needed.

In this case, we would demote any filter where Eq(class, ...) to the end of the And, because it is highly likely to be less targeted than the other Eq types. Another example would be promotion of Eq filters to the front of an And over a Sub term, since Sub indexes tend to be larger and have longer idls.
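
As a rough sketch of the idea (not the server's actual implementation - the filter type and cost numbers here are invented for illustration), optimisation amounts to sorting the And terms by how selective we expect their indexes to be, so the smallest idls are loaded first and the test threshold can shortcut early:

// A minimal sketch of And-term re-ordering, assuming a simplified filter type.
#[derive(Debug)]
enum Filter {
    Eq(&'static str, &'static str),
    Sub(&'static str, &'static str),
    Pres(&'static str),
}

// A hypothetical cost estimate: lower means "expected to be more selective".
fn estimated_cost(f: &Filter) -> u32 {
    match f {
        // Eq(class, ...) matches huge numbers of entries, so demote it.
        Filter::Eq("class", _) => 100,
        Filter::Eq(_, _) => 1,
        // Sub indexes tend to have longer idls than Eq indexes.
        Filter::Sub(_, _) => 10,
        // Pres(attr) is close to "everything", so it goes last.
        Filter::Pres(_) => 1000,
    }
}

fn optimise_and(mut terms: Vec<Filter>) -> Vec<Filter> {
    terms.sort_by_key(estimated_cost);
    terms
}

fn main() {
    let and_terms = vec![
        Filter::Pres("class"),
        Filter::Eq("class", "person"),
        Filter::Sub("mail", "example.com"),
        Filter::Eq("name", "claire"),
    ];
    // After optimisation, Eq(name, claire) is evaluated first, so its idl of
    // one id lets the test threshold shortcut before the large idls are loaded.
    println!("{:?}", optimise_and(and_terms));
}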

Implementation Details and Notes

Before we discuss the details of the states and update processes, we need to consider the index types we require.

Index types

The standard index is a key-value table, where the key is the lookup and the value is the idl set of candidate ids. The examples above follow this structure.

For us, we will format the table names as:

  • idx_eq_<attrname>
  • idx_sub_<attrname>
  • idx_pres_<attrname>

These will be (string, blob) tables in SQL, where the string key is the pkey.

We will have the Value's "to_index_str" emit the set of index values. It's important to remember this is a set of possible index emissions, where multiple values could be returned from a single Value. This will be important with claims for credentials, so that the claims can be indexed correctly.
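
A minimal sketch of the idea, using a standalone function in place of the real Value method and a substring scheme chosen purely for illustration - the point is that one value can emit several index strings:

// Illustrative only: one value emits many index strings.
fn to_index_strs(value: &str) -> Vec<String> {
    let mut out = Vec::new();
    // The eq index gets the normalised full value.
    out.push(value.to_lowercase());
    // A sub index might also get every trigram of the value
    // (one possible scheme, not necessarily the server's).
    let chars: Vec<char> = value.to_lowercase().chars().collect();
    for window in chars.windows(3) {
        out.push(window.iter().collect());
    }
    out
}

fn main() {
    // "Claire" emits the eq key plus the sub keys used to answer Sub(name, "lai") etc.
    println!("{:?}", to_index_strs("Claire"));
}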

We also require special name-to-uuid and uuid-to-name indexes. These accelerate the name2uuid and uuid2name functions, which are common in resolving during search. These will be named in the tables as:

  • idx_name2uuid
  • idx_uuid2name

They will be structured as (string, string) for both, where the key column (name or uuid, depending on the direction) is the primary key. We could use a single table, but if we change to sled we will need to split this, so we pre-empt that change and duplicate the data here.
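
As a sketch, the two indexes are just mirrored maps, written together so both directions stay consistent (the uuid here is an arbitrary example):

use std::collections::BTreeMap;

fn main() {
    // idx_name2uuid: name -> uuid (name is the pkey).
    let mut name2uuid: BTreeMap<String, String> = BTreeMap::new();
    // idx_uuid2name: uuid -> name (uuid is the pkey).
    let mut uuid2name: BTreeMap<String, String> = BTreeMap::new();

    let name = "claire".to_string();
    let uuid = "cc8e95b4-c24f-4d68-ba54-8bed76f63930".to_string();

    // Both directions are written together so lookups stay consistent.
    name2uuid.insert(name.clone(), uuid.clone());
    uuid2name.insert(uuid.clone(), name.clone());

    assert_eq!(name2uuid.get(&name), Some(&uuid));
    assert_eq!(uuid2name.get(&uuid), Some(&name));
}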

Indexing States

  • Reindex

A reindex is the only time when we create the tables needed for indexing. In all other phases, if we do not have the table needed for an insertion, we log the error and move on, instructing in the logs to reindex as soon as possible.

Reindexing should be performed after we join a replication group, or when we set up the instance for the first time. This means we need an "initial indexed" flag or similar.

For all intents, a reindex is likely the same as "create", just without replacing the entry. We would simply remove all the index tables beforehand.

  • Write operation index metadata

At the start of a write transaction, the schema passes us a map of the current attribute index states so that, on filter application or modification, we are aware of which attrs are indexed. It is assumed that name2uuid and uuid2name are always indexed.
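
A sketch of the shape of this metadata, assuming an illustrative map from attribute name to its maintained index types (the real server types differ):

use std::collections::BTreeMap;

// Illustrative only: which index types exist for each attribute, as handed
// to a write transaction by the schema.
#[derive(Debug, PartialEq)]
enum IndexType {
    Equality,
    Substring,
    Presence,
}

fn main() {
    let mut idxmeta: BTreeMap<&str, Vec<IndexType>> = BTreeMap::new();
    idxmeta.insert(
        "name",
        vec![IndexType::Equality, IndexType::Substring, IndexType::Presence],
    );
    idxmeta.insert("class", vec![IndexType::Equality, IndexType::Presence]);

    // During filter application or modification we consult this map to know
    // whether an attribute has a usable index. name2uuid / uuid2name are
    // assumed to always exist, so they are not listed here.
    let name_eq_indexed = idxmeta
        .get("name")
        .map(|types| types.contains(&IndexType::Equality))
        .unwrap_or(false);
    println!("name has an eq index: {}", name_eq_indexed);
}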

  • Search Index Metadata

When filters are resolved they are tagged by their indexed state to allow optimisation to occur. We then process each filter element and its tag to determine the indexes needed to build a candidate set. Once we reach the threshold, we return the partial candidate set and begin the id2entry process and the entry_match_no_index routine.

And and Or terms carry flags indicating whether they are partially or fully indexed. This allows a shortcut: if the outermost term is fully indexed, we can avoid the entry_match_no_index call.

  • Create

This is one of the simplest steps. On create we iterate over the entry's avas and, referencing the index metadata of the transaction, we create the indexes as needed from the values (before dbv conversion).
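
A rough sketch of the create pass, assuming a simplified in-memory (attribute, value) -> idl structure in place of the real idx_* tables:

use std::collections::{BTreeMap, BTreeSet};

// Illustrative stand-in for the idx_eq_<attrname> tables:
// (attribute, index key) -> idl of entry ids.
type EqIndex = BTreeMap<(String, String), BTreeSet<u64>>;

// An entry here is just id + avas; the real Entry type is richer.
struct Entry {
    id: u64,
    avas: BTreeMap<String, Vec<String>>,
}

fn index_create(idx: &mut EqIndex, entry: &Entry) {
    // For every ava, add this entry's id to the idl of the matching index key.
    // (The real pass also consults the index metadata to skip unindexed attrs.)
    for (attr, values) in &entry.avas {
        for value in values {
            idx.entry((attr.clone(), value.clone()))
                .or_default()
                .insert(entry.id);
        }
    }
}

fn main() {
    let mut idx = EqIndex::new();
    let entry = Entry {
        id: 4,
        avas: BTreeMap::from([
            ("name".to_string(), vec!["john".to_string()]),
            ("class".to_string(), vec!["person".to_string()]),
        ]),
    };
    index_create(&mut idx, &entry);
    println!("{:?}", idx.get(&("name".to_string(), "john".to_string())));
}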

  • Delete

Given the Entry to delete, we remove its avas and id from each index set as needed. Generally this will only happen for tombstones, but we should still check that the process works. Important cases to check will be entries with and without names, ensuring name2uuid/uuid2name are correctly changed, and removal of all the other attributes.
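
Continuing the simplified structure from the create sketch above, delete is the inverse: remove the entry's id from every index key it emits, pruning idls that become empty:

use std::collections::{BTreeMap, BTreeSet};

type EqIndex = BTreeMap<(String, String), BTreeSet<u64>>;

// Illustrative: strip an entry's id from every idl its avas point at,
// removing index keys whose idl becomes empty.
fn index_delete(idx: &mut EqIndex, id: u64, avas: &[(&str, &str)]) {
    for (attr, value) in avas {
        let key = (attr.to_string(), value.to_string());
        let now_empty = match idx.get_mut(&key) {
            Some(idl) => {
                idl.remove(&id);
                idl.is_empty()
            }
            None => false,
        };
        if now_empty {
            idx.remove(&key);
        }
    }
}

fn main() {
    let mut idx = EqIndex::from([(
        ("name".to_string(), "john".to_string()),
        BTreeSet::from([4u64]),
    )]);
    index_delete(&mut idx, 4, &[("name", "john")]);
    assert!(idx.is_empty());
    println!("index after delete: {:?}", idx);
}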

  • Modify

This is the truly scary and difficult situation. The simple method would be to "delete" all indexes based on the pre-entry state, and then to create them again. However, the current design of Entry and modification doesn't work like this, as we only get the Entry to add.

Most likely we will need to change modify to take the set of (pre, post) candidates as a pair, OR have the entry store its own pre/post state internally. Given we already need to store the pre/post entries in the txn, it's likely better to have a pairing of these; that also allows us to index replication metadata later, as the entry will contain its own changelog internally.

Given the pair, we then assert that they are the same entry (same id). We can then use the index metadata to generate an indexing diff between them: a set of index items to remove (due to removal of an attr or value), and a set to add (due to addition).
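
A sketch of that diff, using the same simplified (attribute, value) index keys as above: compute the keys emitted by the pre and post entries, and the two set differences tell us which idls need the id removed and which need it added (the attribute values here are invented examples):

use std::collections::BTreeSet;

// Illustrative: the set of index keys an entry's avas would emit.
fn index_keys(avas: &[(&str, &str)]) -> BTreeSet<(String, String)> {
    avas.iter()
        .map(|(a, v)| (a.to_string(), v.to_string()))
        .collect()
}

fn main() {
    // pre and post states of the same entry (same id), flattened to (attr, value).
    let pre = index_keys(&[("name", "claire"), ("mail", "claire@example.com")]);
    let post = index_keys(&[("name", "claire"), ("mail", "c.smith@example.com")]);

    // Keys only in pre: the entry id must be removed from these idls.
    let remove: Vec<_> = pre.difference(&post).collect();
    // Keys only in post: the entry id must be added to these idls.
    let add: Vec<_> = post.difference(&pre).collect();

    println!("remove from: {:?}", remove);
    println!("add to: {:?}", add);
}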

The major transformation cases for testing are:

  • Add a multivalue (one)
  • Add a multivalue (many)
  • On a multivalue, add another value
  • On multivalue, remove a value, but leave others
  • Delete a multivalue
  • Add a new single value
  • Replace a single value
  • Delete a single value

We also need to check that modification of name correctly changes name2uuid and uuid2name.

  • Recycle to Tombstone (removal of name)
  • Change of UUID (may happen in repl conflict scenario)
  • Change of name
  • Change of name and uuid

Of course, these should work as above too.