Compare commits

...

7 commits

Author SHA1 Message Date
Firstyear 81c38af10b
Merge 71039cf069 into b113262357 2025-04-09 20:57:13 +10:00
Firstyear b113262357
Improve token handling ()
It was possible that a token could be updated in a way that caused
existing cached information to be lost if an event was delayed
in its write to the user token.

To prevent this, the writes to user tokens now require the HsmLock
to be held, and refresh the token just ahead of writing to ensure
that these data can't be lost. The benefit to this approach is that
readers remain unblocked by a writer.
2025-04-09 14:49:06 +10:00
dependabot[bot] d025e8fff0
Bump tokio from 1.44.1 to 1.44.2 in the cargo group ()
Bumps the cargo group with 1 update: [tokio](https://github.com/tokio-rs/tokio).


Updates `tokio` from 1.44.1 to 1.44.2
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.44.1...tokio-1.44.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.44.2
  dependency-type: direct:production
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-09 09:39:19 +10:00
William Brown 71039cf069 Update 2025-04-05 14:00:56 +10:00
William Brown d89fad3f42 Update 2025-04-05 13:55:53 +10:00
William Brown e35d4fddc1 Update 2025-04-05 13:55:53 +10:00
William Brown fa36325618 Ideas about service accounts 2025-04-05 13:55:53 +10:00
6 changed files with 281 additions and 38 deletions
Cargo.lock
Cargo.toml
book/src/developers/designs
unix_integration/resolver/src

Cargo.lock generated
View file

@ -5658,9 +5658,9 @@ dependencies = [
[[package]]
name = "tokio"
version = "1.44.1"
version = "1.44.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f382da615b842244d4b8738c82ed1275e6c5dd90c459a30941cd07080b06c91a"
checksum = "e6b88822cbe49de4185e3a4cbf8321dd487cf5fe0c5c65695fef6346371e9c48"
dependencies = [
"backtrace",
"bytes",

View file

@ -268,7 +268,7 @@ tempfile = "3.15.0"
testkit-macros = { path = "./server/testkit-macros" }
time = { version = "^0.3.36", features = ["formatting", "local-offset"] }
tokio = "^1.43.0"
tokio = "^1.44.2"
tokio-openssl = "^0.6.5"
tokio-util = "^0.7.13"

View file

@ -0,0 +1,171 @@
# Service Account Improvements - 2025
Initially when service accounts were added to Kanidm they were simply meant to be "detached"
accounts that could be used for some API access to Kanidm, or other background tasks.
But as the server has evolved we need to consider how we can use these in other ways.
We have extended the OAuth2 client types to now almost act like a service account, especially
with the behaviour of things like a client credentials grant.
At this point we need to decide how to proceed with service accounts and what shape they could
take in the future.
## Prior Art
* [Microsoft AD-DS Service Accounts](https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/understand-service-accounts)
* [FreeIPA Service Principals](https://www.freeipa.org/page/Administrators_Guide#managing-service-principals)
Note that both of these have some Kerberos-centric ideas, as KRB requires service accounts to mutually
authenticate to clients, which means they need to maintain credentials. This differs from our needs,
but there are still some ideas in these docs worth knowing about and considering, such as group managed
service accounts (gMSA).
## Current state of affairs
We have:
* Break glass accounts (`admin`/`idm_admin`) are service accounts and may not have delegated management.
* OAuth2 clients are not service accounts, but do support delegated management.
* Service accounts can be group or user managed.
* The Applications feature (to be introduced) is an extension of a service account.
From this we can see that we have some separation, but also some crossover of functionality:
break glass isn't delegated but service accounts are, and OAuth2 isn't an SA but Applications are.
## Capabilities
In order to properly handle this, we don't want to grant unbounded abilities to these types, and we
don't want to fully merge them, but we do want to be able to mix and match what they require.
This also makes it possible to more easily assign (or remove) a capability from an account type in
the future.
To achieve this we should introduce the idea of capabilities - capabilities can act via schema
classes, and we can extend the schema such that only the parent class needs to know that the
capabilities class is required.
This allows us to nominate more carefully what each role type can or can't do. More importantly,
within the server we don't have to hardcode that "service accounts and applications" can use
API tokens versus every other capability type. We only need to look for the capability on the entry.
| Capabilities | Api Token | OAuth2 Sessions | Interactive Login |
|-----------------|------------------|------------------------------|---------------------|
| OAuth2 | No | Via Client Credentials Grant | No |
| Application | Yes (ro) | No | No |
| Service Account | Yes (rw capable) | Yes (via session grant, TBD) | Yes (to be removed) |
| Machine Account | Yes (ro) | No | No |
| Break Glass | No | No | Yes |
| Person | No | Yes | Yes |
A key requirement of this is that we want each role to have a defined intent - it shouldn't be
the "everything" role; it still needs to be focused and administered in its own right.
| | Intent |
|-----------------|--------|
| OAuth2          | An OAuth2 client (external server/service) that treats Kanidm as the IDP it trusts to validate user authorisation to its resources. |
| Application     | An application password context, allowing per-user/per-device/per-application passwords to be validated, as well as defining group-based authorisation of who may use this application. |
| Service Account | An account that belongs to a process or automation that needs to read from or write to Kanidm, or a Kanidm-related service. |
| Machine Account | A domain-joined machine that reads user POSIX or login information. May be used to configure machine service accounts in future. |
| Break Glass     | An emergency access account used in disaster recovery. |
| Person          | A human-owned account that needs to authenticate day to day and self-manage its own credentials. A person may need to manage other accounts and resource types. |
This has the benefit that it makes it easier to assign permissions via ACP (since we can filter
on the target class *and* the capability type).
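As a rough sketch of "look for the capability on the entry", a check inside the server could key
purely off a capability class rather than enumerating role types. The class name and function below
are invented for illustration only and are not existing Kanidm schema or APIs:

```rust
/// Hypothetical capability class name, for this sketch only.
const CAP_API_TOKEN: &str = "cap_api_token";

/// Decide whether an entry may hold api tokens by looking for the capability
/// class, instead of hardcoding "service account or application".
fn may_use_api_tokens(entry_classes: &[String]) -> bool {
    entry_classes.iter().any(|class| class == CAP_API_TOKEN)
}

fn main() {
    // A service account that has been extended with the capability class.
    let classes = vec!["account".to_string(), CAP_API_TOKEN.to_string()];
    assert!(may_use_api_tokens(&classes));
}
```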
### Example
An email service has an SMTP gateway and an OAuth2 web UI.
Although this is "the email service", it is made up of multiple parts that each have their own intent.
The web UI has an OAuth2 client created to define the relationship of who may access it.
An LDAP Application is made to allow the IMAP/SMTP processes to authenticate users with application
passwords and to read users' PII via LDAP.
## Below was the drafting process of some ideas
### Attach roles to service accounts
In this approach we centre the service account, and allow optional extension of other concerns. This
would make OAuth2 clients an extension of a service account; similarly, Application would become
an extension of a service account.
This would mean that we create a service account first, then need a way to extend it with the
Application or OAuth2 types.
This would mean that a service account could potentially be everything - an application password
provider, an OAuth2 client, and more. This would make the administration very difficult and deeply
nested on the single service account type, and could encourage bad administration practices as
admins can "shovel in" every possible role to single accounts.
PROS:
* OAuth2 applications get the ability to have API tokens to Kanidm for other functionality
* Full stacks like a mail server get a single SA that does everything
* These whole stack service accounts get access to every auth type and feature available
CONS:
* Makes the API around service accounts a bit messier
* Compromise of the SA or SA Manager may lead to higher impact due to more features in one place
* May be confusing to administrators
* More "inheritance" of schema classes, when we may want to try to simplify to single classes in line with SCIM.
* Harder to audit capabilities
* The administration UI becomes unmanageable as the Service Account is now a kitchen sink.
### Separate Concerns
In this approach we split our concerns. This is similar to today, but taken a bit further.
In this example, we would split Application to be *just* about the concern of an authentication
domain for LDAP applications. OAuth2 stays as *just* a configuration of the client and its behaviour.
We would change the break glass accounts to be a separate type from Service Account. Service Account
becomes closer to the concept of a pure API access account. The break glass accounts become a
dedicated "emergency access account" type.
PROS:
* Similar to today, only small cleanup needed
* Separation of concerns and credentials limits the blast radius of a possible compromise.
* Easier auditing of capabilities of each account
CONS:
* More administrative overhead to manage the multiple accounts
* Stacked applications will need multiple configurations for a role - for example OAuth2, LDAP Application, and Service Account for an email server with a web UI.
### Bit of A, bit of B, cleanup
AKA Capabilities
Rather than fully merging all the types, or fully splitting them, have a *little* merge of some bits,
allowing some limited extension of actions to specific actors. Effectively we end up granting
*capabilities* to different roles, and we can add extra capabilities later if we want.
OAuth2 and Applications would gain the ability to have API tokens associated for some tasks and
could act on Kanidm, but they wouldn't be fully fleshed service accounts.
| Capabilities | Api Token | OAuth2 Sessions | Interactive Login |
|-----------------|------------------|------------------------------|---------------------|
| OAuth2 | No | Via Client Credentials Grant | No |
| Application | Yes | No | No |
| Service Account | Yes (rw capable) | Yes (via session grant, TBD) | Yes (to be removed) |
| Break Glass | No | No | Yes |
PROS:
* Minimises changes to existing deployments
* Grants some new abilities within limits to other roles
* While not as locked down as the separate-concerns proposal, this still minimises the risk of compromise of an SA
CONS:
* Requires admins to have multiple accounts in some contexts (as above).
* Auditing requires knowledge of what each role's capabilities are, and what those capabilities do

View file

@ -194,7 +194,8 @@ impl Into<PamAuthResponse> for AuthRequest {
}
pub enum AuthResult {
Success { token: UserToken },
Success,
SuccessUpdate { new_token: UserToken },
Denied,
Next(AuthRequest),
}
@ -251,6 +252,7 @@ pub trait IdProvider {
async fn unix_user_online_auth_step(
&self,
_account_id: &str,
_current_token: Option<&UserToken>,
_cred_handler: &mut AuthCredHandler,
_pam_next_req: PamAuthRequest,
_tpm: &mut tpm::BoxedDynTpm,
@ -290,7 +292,8 @@ pub trait IdProvider {
// TPM key.
async fn unix_user_offline_auth_step(
&self,
_token: &UserToken,
_current_token: Option<&UserToken>,
_session_token: &UserToken,
_cred_handler: &mut AuthCredHandler,
_pam_next_req: PamAuthRequest,
_tpm: &mut tpm::BoxedDynTpm,

View file

@ -55,8 +55,6 @@ impl KanidmProvider {
tpm: &mut tpm::BoxedDynTpm,
machine_key: &tpm::MachineKey,
) -> Result<Self, IdpError> {
// FUTURE: Randomised jitter on next check at startup.
// Initially retrieve our HMAC key.
let loadable_hmac_key: Option<tpm::LoadableHmacKey> = keystore
.get_tagged_hsm_key(KANIDM_HMAC_KEY)
@ -248,13 +246,25 @@ impl KanidmProviderInternal {
// Proceed
CacheState::Online => true,
CacheState::OfflineNextCheck(at_time) if now >= at_time => {
// Attempt online. If fails, return token.
self.attempt_online(tpm, now).await
}
CacheState::OfflineNextCheck(_) | CacheState::Offline => false,
}
}
#[instrument(level = "debug", skip_all)]
async fn check_online_right_meow(
&mut self,
tpm: &mut tpm::BoxedDynTpm,
now: SystemTime,
) -> bool {
match self.state {
CacheState::Online => true,
CacheState::OfflineNextCheck(_) => self.attempt_online(tpm, now).await,
CacheState::Offline => false,
}
}
#[instrument(level = "debug", skip_all)]
async fn attempt_online(&mut self, _tpm: &mut tpm::BoxedDynTpm, now: SystemTime) -> bool {
let mut max_attempts = 3;
@ -295,7 +305,7 @@ impl IdProvider for KanidmProvider {
async fn attempt_online(&self, tpm: &mut tpm::BoxedDynTpm, now: SystemTime) -> bool {
let mut inner = self.inner.lock().await;
inner.check_online(tpm, now).await
inner.check_online_right_meow(tpm, now).await
}
async fn mark_next_check(&self, now: SystemTime) {
@ -431,6 +441,7 @@ impl IdProvider for KanidmProvider {
async fn unix_user_online_auth_step(
&self,
account_id: &str,
current_token: Option<&UserToken>,
cred_handler: &mut AuthCredHandler,
pam_next_req: PamAuthRequest,
tpm: &mut tpm::BoxedDynTpm,
@ -449,15 +460,23 @@ impl IdProvider for KanidmProvider {
match auth_result {
Ok(Some(n_tok)) => {
let mut token = UserToken::from(n_tok);
token.kanidm_update_cached_password(
let mut new_token = UserToken::from(n_tok);
// Update any keys that may have been in the db in the current
// token.
if let Some(previous_token) = current_token {
new_token.extra_keys = previous_token.extra_keys.clone();
}
// Set any new keys that are relevant from this authentication
new_token.kanidm_update_cached_password(
&inner.crypto_policy,
cred.as_str(),
tpm,
&inner.hmac_key,
);
Ok(AuthResult::Success { token })
Ok(AuthResult::SuccessUpdate { new_token })
}
Ok(None) => {
// TODO: i'm not a huge fan of this rn, but currently the way we handle
@ -552,7 +571,8 @@ impl IdProvider for KanidmProvider {
async fn unix_user_offline_auth_step(
&self,
token: &UserToken,
current_token: Option<&UserToken>,
session_token: &UserToken,
cred_handler: &mut AuthCredHandler,
pam_next_req: PamAuthRequest,
tpm: &mut tpm::BoxedDynTpm,
@ -561,11 +581,13 @@ impl IdProvider for KanidmProvider {
(AuthCredHandler::Password, PamAuthRequest::Password { cred }) => {
let inner = self.inner.lock().await;
if token.kanidm_check_cached_password(cred.as_str(), tpm, &inner.hmac_key) {
if session_token.kanidm_check_cached_password(cred.as_str(), tpm, &inner.hmac_key) {
// Ensure we have either the latest token, or if none, at least the session token.
let new_token = current_token.unwrap_or(session_token).clone();
// TODO: We can update the token here and then do lockouts.
Ok(AuthResult::Success {
token: token.clone(),
})
Ok(AuthResult::SuccessUpdate { new_token })
} else {
Ok(AuthResult::Denied)
}

View file

@ -47,7 +47,6 @@ pub enum AuthSession {
client: Arc<dyn IdProvider + Sync + Send>,
account_id: String,
id: Id,
token: Option<Box<UserToken>>,
cred_handler: AuthCredHandler,
/// Some authentication operations may need to spawn background tasks. These tasks need
/// to know when to stop as the caller has disconnected. This receiver allows that, so
@ -59,7 +58,7 @@ pub enum AuthSession {
account_id: String,
id: Id,
client: Arc<dyn IdProvider + Sync + Send>,
token: Box<UserToken>,
session_token: Box<UserToken>,
cred_handler: AuthCredHandler,
},
System {
@ -225,7 +224,7 @@ impl Resolver {
// Attempt to search these in the db.
let mut dbtxn = self.db.write().await;
let r = dbtxn.get_account(account_id).map_err(|err| {
debug!("get_cached_usertoken {:?}", err);
debug!(?err, "get_cached_usertoken");
})?;
drop(dbtxn);
@ -318,7 +317,12 @@ impl Resolver {
}
}
async fn set_cache_usertoken(&self, token: &mut UserToken) -> Result<(), ()> {
async fn set_cache_usertoken(
&self,
token: &mut UserToken,
// This is just for proof that only one write can occur at a time.
_tpm: &mut BoxedDynTpm,
) -> Result<(), ()> {
// Set an expiry
let ex_time = SystemTime::now() + Duration::from_secs(self.timeout_seconds);
let offset = ex_time
@ -451,6 +455,22 @@ impl Resolver {
let mut hsm_lock = self.hsm.lock().await;
// We need to re-acquire the token now behind the hsmlock - this is so that
// we know that as we write the updated token, we know that no one else has
// written to this token, since we are now the only task that is allowed
// to be in a write phase.
let token = if token.is_some() {
self.get_cached_usertoken(account_id)
.await
.map(|(_expired, option_token)| option_token)
.map_err(|err| {
debug!(?err, "get_usertoken error");
})?
} else {
// Was already none, leave it that way.
None
};
let user_get_result = if let Some(tok) = token.as_ref() {
// Re-use the provider that the token is from.
match self.client_ids.get(&tok.provider) {
@ -486,12 +506,11 @@ impl Resolver {
}
};
drop(hsm_lock);
match user_get_result {
Ok(UserTokenState::Update(mut n_tok)) => {
// We have the token!
self.set_cache_usertoken(&mut n_tok).await?;
self.set_cache_usertoken(&mut n_tok, hsm_lock.deref_mut())
.await?;
Ok(Some(n_tok))
}
Ok(UserTokenState::NotFound) => {
@ -958,7 +977,6 @@ impl Resolver {
client,
account_id: account_id.to_string(),
id,
token: Some(Box::new(token)),
cred_handler,
shutdown_rx,
};
@ -979,7 +997,7 @@ impl Resolver {
account_id: account_id.to_string(),
id,
client,
token: Box::new(token),
session_token: Box::new(token),
cred_handler,
};
Ok((auth_session, next_req.into()))
@ -1022,7 +1040,6 @@ impl Resolver {
client: client.clone(),
account_id: account_id.to_string(),
id,
token: None,
cred_handler,
shutdown_rx,
};
@ -1050,19 +1067,32 @@ impl Resolver {
auth_session: &mut AuthSession,
pam_next_req: PamAuthRequest,
) -> Result<PamAuthResponse, ()> {
let mut hsm_lock = self.hsm.lock().await;
let maybe_err = match &mut *auth_session {
&mut AuthSession::Online {
ref client,
ref account_id,
id: _,
token: _,
ref id,
ref mut cred_handler,
ref shutdown_rx,
} => {
let mut hsm_lock = self.hsm.lock().await;
// This is not used in the authentication, but is so that any new
// extra keys or data on the token are updated correctly if the authentication
// requests an update. Since we hold the hsm_lock, no other task can
// update this token between now and completion of the fn.
let current_token = self
.get_cached_usertoken(id)
.await
.map(|(_expired, option_token)| option_token)
.map_err(|err| {
debug!(?err, "get_usertoken error");
})?;
let result = client
.unix_user_online_auth_step(
account_id,
current_token.as_ref(),
cred_handler,
pam_next_req,
hsm_lock.deref_mut(),
@ -1071,7 +1101,7 @@ impl Resolver {
.await;
match result {
Ok(AuthResult::Success { .. }) => {
Ok(AuthResult::SuccessUpdate { .. } | AuthResult::Success) => {
info!(?account_id, "Authentication Success");
}
Ok(AuthResult::Denied) => {
@ -1087,17 +1117,29 @@ impl Resolver {
}
&mut AuthSession::Offline {
ref account_id,
id: _,
ref id,
ref client,
ref token,
ref session_token,
ref mut cred_handler,
} => {
// This is not used in the authentication, but is so that any new
// extra keys or data on the token are updated correctly if the authentication
// requests an update. Since we hold the hsm_lock, no other task can
// update this token between now and completion of the fn.
let current_token = self
.get_cached_usertoken(id)
.await
.map(|(_expired, option_token)| option_token)
.map_err(|err| {
debug!(?err, "get_usertoken error");
})?;
// We are offline, continue. Remember, authsession should have
// *everything you need* to proceed here!
let mut hsm_lock = self.hsm.lock().await;
let result = client
.unix_user_offline_auth_step(
token,
current_token.as_ref(),
session_token,
cred_handler,
pam_next_req,
hsm_lock.deref_mut(),
@ -1105,7 +1147,7 @@ impl Resolver {
.await;
match result {
Ok(AuthResult::Success { .. }) => {
Ok(AuthResult::SuccessUpdate { .. } | AuthResult::Success) => {
info!(?account_id, "Authentication Success");
}
Ok(AuthResult::Denied) => {
@ -1156,8 +1198,13 @@ impl Resolver {
match maybe_err {
// What did the provider direct us to do next?
Ok(AuthResult::Success { mut token }) => {
self.set_cache_usertoken(&mut token).await?;
Ok(AuthResult::Success) => {
*auth_session = AuthSession::Success;
Ok(PamAuthResponse::Success)
}
Ok(AuthResult::SuccessUpdate { mut new_token }) => {
self.set_cache_usertoken(&mut new_token, hsm_lock.deref_mut())
.await?;
*auth_session = AuthSession::Success;
Ok(PamAuthResponse::Success)
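
Taken together, the resolver changes implement the pattern from the "Improve token handling" commit
message: hold the HSM lock across the token refresh and the cache write, so that a delayed write
cannot discard newer cached data, while readers stay unblocked. The following is a minimal sketch of
that pattern only; every name here (Resolver, Hsm, Token, auth_success) is a stand-in for
illustration, not the real resolver types.

```rust
use std::collections::{BTreeMap, HashMap};
use tokio::sync::Mutex;

#[derive(Clone, Default)]
struct Token {
    // Cached extras, e.g. offline credential material keyed by name.
    extra_keys: BTreeMap<String, Vec<u8>>,
}

// Placeholder for the TPM/HSM handle that the hsm_lock guards.
struct Hsm;

struct Resolver {
    hsm: Mutex<Hsm>,
    cache: Mutex<HashMap<String, Token>>,
}

impl Resolver {
    async fn auth_success(&self, account_id: &str, fresh: Token) {
        // Take the hsm lock: only one task may be in the write phase at a
        // time. Plain readers of the cache never need this lock.
        let _hsm_lock = self.hsm.lock().await;

        // Re-read the cached token *behind* the lock, so nothing written by
        // another task between an earlier read and now can be lost.
        let current = self.cache.lock().await.get(account_id).cloned();

        // Carry cached extras forward into the freshly issued token before
        // persisting it.
        let mut new_token = fresh;
        if let Some(previous) = current {
            new_token.extra_keys.extend(previous.extra_keys);
        }

        // Write back while still holding the hsm lock.
        self.cache
            .lock()
            .await
            .insert(account_id.to_string(), new_token);
    }
}
```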