
# High Availability

To achieve higher availability for your Instance, you can:

- create more Kubernetes Cluster Nodes
- create more replicas of the *nscale* and *nplus* components
- distribute those replicas across multiple nodes using anti-affinities
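In plain Kubernetes terms, a pod anti-affinity is a field on the pod spec. The fragment below is a sketch of the standard Kubernetes `affinity` field, not the template the nplus chart actually renders, and the label `app: web` is a hypothetical selector. It asks the scheduler to prefer placing pods that carry that label on different nodes, using the node hostname as the topology key.

```yaml
# Sketch only: standard Kubernetes pod anti-affinity, not the nplus template.
# The label "app: web" is a hypothetical selector.
affinity:
  podAntiAffinity:
    # "preferred" is a soft rule: the scheduler tries to spread the replicas,
    # but still schedules them on the same node if no other node fits
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          # spread across distinct nodes, identified by their hostname label
          topologyKey: kubernetes.io/hostname
```

With the soft variant, a single-node cluster still runs both replicas. The hard variant, `requiredDuringSchedulingIgnoredDuringExecution`, would leave the second replica pending until another node is available, which is why adding cluster nodes and adding replicas go together.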

To deploy the sample, run:

```
helm install \
  --values samples/ha/values.yaml \
  --values samples/environment/demo.yaml \
  sample-ha nplus/nplus-instance
```

The essence of the values file is as follows:

- We use three *nscale Server Application Layer* (nappl) nodes: two dedicated to user access, one dedicated to jobs.
- If the jobs node fails, the user nodes take over the jobs (handled by priority).
- If one of the user nodes fails, the other one handles the load.
- Kubernetes takes care of restarting failed nodes.
- All other components that support replicas run with two replicas.
- Pod anti-affinities distribute the replicas across the Kubernetes cluster nodes.
- All administration components connect only to the jobs nappl, leaving the user nodes to the users.
- PodDisruptionBudgets are defined for the crucial components. They are set via `minReplicaCount` for components that support multiple replicas, and via `minReplicaCountType` for the **first** replicaSet of components that do not support replicas (in this case, nstla).
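Assuming the chart maps `minReplicaCount` onto the `minAvailable` field of a standard `policy/v1` PodDisruptionBudget (the rendered object may differ, and the name and label below are hypothetical), the budget for `web` would look roughly like this:

```yaml
# Sketch only: assumes minReplicaCount: 1 becomes minAvailable: 1.
# The name and label are hypothetical, not what the chart renders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-ha-web
spec:
  # keep at least one web pod running during voluntary disruptions
  minAvailable: 1
  selector:
    matchLabels:
      app: web
```

With `replicaCount: 2`, such a budget lets a voluntary disruption like `kubectl drain` evict at most one `web` pod at a time. A PodDisruptionBudget does not protect against involuntary failures such as a node crash; those are covered by the second replica and the anti-affinity. The budgets actually created can be listed with `kubectl get pdb -n <namespace>`.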

```
# Two replicas each; minReplicaCount defines the PodDisruptionBudget
web:
  replicaCount: 2
  minReplicaCount: 1
rs:
  replicaCount: 2
  minReplicaCount: 1
ilm:
  replicaCount: 2
  minReplicaCount: 1
cmis:
  replicaCount: 2
  minReplicaCount: 1
webdav:
  replicaCount: 2
  minReplicaCount: 1
# nstla does not support replicas: the budget covers the first replicaSet
nstla:
  minReplicaCountType: 1
# Administration components connect only to the jobs nappl
administrator:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
pam:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
# User-access nappl nodes (two replicas)
nappl:
  replicaCount: 2
  minReplicaCount: 1
  jobs: false
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
# Dedicated jobs nappl node (one replica, no ingress)
nappljobs:
  replicaCount: 1
  jobs: true
  disableSessionReplication: true
  ingress:
    enabled: false
  snc:
    enabled: true
  waitFor:
    - "-service {{ .component.prefix }}database.{{ .Release.Namespace }}.svc.cluster.local:5432 -timeout 600"
# The application component also connects to the jobs nappl
application:
  nstl:
    host: "{{ .component.prefix }}nstl-cluster.{{ .Release.Namespace }}"
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
```
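To go beyond two replicas, one option is to layer a further values file on top of the sample. Helm applies multiple `--values` files in order, with the right-most file taking precedence, so the file only needs to override the counts. The file name `my-scale.yaml` and the numbers are hypothetical; only keys already used in the sample appear, and enough Kubernetes cluster nodes are needed for the anti-affinities to spread the extra replicas.

```yaml
# my-scale.yaml (hypothetical): overrides only the replica settings
web:
  replicaCount: 3
  # tolerates one eviction at a time, assuming minReplicaCount
  # is used as the PodDisruptionBudget minimum
  minReplicaCount: 2
nappl:
  replicaCount: 3
  minReplicaCount: 2
```

It would be applied by appending `--values my-scale.yaml` after the two sample files in the `helm install` command, or in a later `helm upgrade sample-ha nplus/nplus-instance` with the same list of files.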

Public Information 2025-01-24 16:18:47 +01:00