# Troubleshooting
Some things to try.
## Is the server started?
If you don't see "ready to rock! 🪨" in your logs, it's not started. Scroll back and look for
errors!
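If you're not sure where the logs are, here are a couple of sketches for common deployments (the unit and container names are assumptions; adjust them to your setup):

```shell
# If kanidmd runs as a systemd service (unit name is an assumption):
journalctl -u kanidmd

# If kanidmd runs in a docker container (container name is an assumption):
docker logs kanidmd
```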
## Can you connect?
If the server's running on `idm.example.com:8443` then a simple connectivity test is done using
[curl](https://curl.se).
Run the following command:
```shell
curl https://idm.example.com:8443/status
```
This is similar to what you _should_ see:
```shell
{{#rustdoc_include troubleshooting/curl_connection_test.txt}}
```
This means:
1. You've successfully connected to a host (10.0.0.14).
2. The TLS handshake worked.
3. You received the status response "true".
If you see something like this:
```shell
➜ curl -v https://idm.example.com:8443
* Trying 10.0.0.1:8443...
* connect to 10.0.0.1 port 8443 failed: Connection refused
* Failed to connect to idm.example.com port 8443 after 5 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to idm.example.com port 8443 after 5 ms: Connection refused
```
Then either your DNS is wrong (it's pointing at 10.0.0.1) or you can't connect to the server for
some reason.
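To confirm what the hostname actually resolves to, you can query DNS directly. A quick sketch using `dig` (assuming it's installed; `host` or `nslookup` work just as well):

```shell
dig +short idm.example.com
```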
If you get errors about certificates, try adding `-k` to skip certificate verification and just
test connectivity:
```shell
curl -vk https://idm.example.com:8443/status
```
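To see which certificate the server is actually presenting, a sketch using `openssl s_client` (assuming OpenSSL is installed on your machine):

```shell
openssl s_client -connect idm.example.com:8443 -servername idm.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```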
## Server things to check
- Has the config file got `bindaddress = "127.0.0.1:8443"`? If so, change it to
  `bindaddress = "[::]:8443"` so it listens on all interfaces (you can verify what it's actually
  listening on with the check after this list).
- Is there a firewall on the server?
- If you're running in docker, did you expose the port (`-p 8443:8443`) or configure the network to
host/macvlan/ipvlan?
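To confirm which address and port the server is listening on, a sketch you can run on the server itself (assuming a Linux host with iproute2 installed):

```shell
ss -tlnp | grep 8443
```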
## Client errors
When you receive a client error it will list an "Operation ID", sometimes also called the OpId or
KOpId. This UUID matches the UUIDs in the server logs, allowing you to precisely locate the log
entries related to the failing operation.
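To find the matching server-side entries, you can grep the logs for that ID. A sketch assuming kanidmd runs under systemd (the UUID below is a placeholder, not a real OpId):

```shell
# Replace the UUID with the Operation ID from the client error:
journalctl -u kanidmd | grep 'b71f137e-39f3-4368-9e58-21d26671ae24'
```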
Try running commands with `RUST_LOG=debug` to get more information:
```shell
RUST_LOG=debug kanidm login --name anonymous
```
## Reverse Proxies not sending HTTP/1.1 requests
NGINX (and probably other proxies) sends HTTP/1.0 requests to the upstream server by default.
This'll lead to errors like this in your proxy logs:
```text
*17 upstream prematurely closed connection while reading response header from upstream, client: 172.19.0.1, server: example.com, request: "GET / HTTP/1.1", upstream: "https://172.19.0.3:8443/", host: "example.com:8443"
```
The fix for NGINX is to set the
[proxy_http_version](https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version)
to `1.1`. This can go in the same block as the `proxy_pass` option.
```text
proxy_http_version 1.1;
```
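For context, a minimal sketch of the relevant NGINX `location` block, assuming the upstream is Kanidm on `idm.example.com:8443` (your `proxy_pass` target and other proxy settings will differ):

```text
location / {
    proxy_pass https://idm.example.com:8443;
    proxy_http_version 1.1;
}
```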
## OpenTelemetry errors
If you see something like this:
> `OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The system is not in a state required for the operation's execution): , detailed error message: TRACE_TOO_LARGE: max size of trace (5000000) exceeded while adding 86725 bytes to trace a657b63f6ca0415eb70b6734f20f82cf for tenant single-tenant`
Then you'll need to tweak the maximum trace size in your OTLP receiver. In Grafana Tempo you can
add the following keys to your `tempo.yaml`; in this example we're setting it to 20MiB:
```yaml
overrides:
  defaults:
    global:
      max_bytes_per_trace: 20971520 # 20MiB
```