High Availability
To gain a higher level of availability for your Instance, you can
- add more Kubernetes cluster nodes
- run more replicas of the nscale and nplus components
- distribute those replicas across multiple nodes using pod anti-affinities
This is how:
```bash
helm install \
  --values samples/ha/values.yaml \
  --values samples/environment/demo.yaml \
  sample-ha nplus/nplus-instance
```
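Pod anti-affinities are standard Kubernetes scheduling rules. The fragment below is a minimal sketch of the kind of rule that spreads the two `web` replicas across different nodes. The label keys and values are assumptions for illustration and are not necessarily what the nplus chart renders; running `helm template` with the same arguments shows the actual output.

```yaml
# Hypothetical pod spec excerpt: the labels below are illustrative
# assumptions, not taken from the nplus chart.
affinity:
  podAntiAffinity:
    # "preferred": the scheduler tries to place replicas on different nodes,
    # but still schedules them if there are not enough nodes.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: sample-ha   # assumed label
              app.kubernetes.io/component: web        # assumed label
          # Treat each node (by hostname) as a separate failure domain.
          topologyKey: kubernetes.io/hostname
```

With `requiredDuringSchedulingIgnoredDuringExecution` instead, a replica that cannot be placed on a separate node stays `Pending`, which is why adding cluster nodes goes hand in hand with adding replicas.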
The essence of the values file is this:
- We use three nscale Server Application Layer (nappl) instances: two dedicated to user access (`nappl`) and one dedicated to jobs (`nappljobs`)
- If the jobs node fails, the user nodes take over the jobs (handled by priority)
- If one of the user nodes fails, the other one handles the load
- In either case, Kubernetes restarts the failed node's pod
- The other components (`web`, `rs`, `ilm`, `cmis`, `webdav`) also run with two replicas
- Pod anti-affinities distribute the replicas across cluster nodes
- Administrative components (`administrator`, `pam`, `application`) connect only to the jobs nappl, leaving the user nodes to the users
- PodDisruptionBudgets are defined for the crucial components. They are set via `minReplicaCount` for the components that support multiple replicas, and via `minReplicaCountType` for the first replicaSet of the components that do not support replicas, in this case `nstla`.
```yaml
web:
  replicaCount: 2
  minReplicaCount: 1
rs:
  replicaCount: 2
  minReplicaCount: 1
ilm:
  replicaCount: 2
  minReplicaCount: 1
cmis:
  replicaCount: 2
  minReplicaCount: 1
webdav:
  replicaCount: 2
  minReplicaCount: 1
nstla:
  minReplicaCountType: 1
administrator:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
pam:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
nappl:
  replicaCount: 2
  minReplicaCount: 1
  jobs: false
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
nappljobs:
  replicaCount: 1
  jobs: true
  disableSessionReplication: true
  ingress:
    enabled: false
  snc:
    enabled: true
  waitFor:
    - "-service {{ .component.prefix }}database.{{ .Release.Namespace }}.svc.cluster.local:5432 -timeout 600"
application:
  nstl:
    host: "{{ .component.prefix }}nstl-cluster.{{ .Release.Namespace }}"
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
```
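A PodDisruptionBudget is a Kubernetes object that limits voluntary disruptions such as node drains. Assuming that `replicaCount: 2` with `minReplicaCount: 1` corresponds to a budget with `minAvailable: 1` (an assumption; compare with the chart's rendered output or `kubectl get pdb`), the budget for the `web` component would look roughly like this. The name and labels are hypothetical.

```yaml
# Hypothetical sketch of a PodDisruptionBudget for the web component.
# The name and labels are illustrative assumptions, not the chart's output.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-ha-web                         # assumed name
spec:
  # Keep at least one web pod running during voluntary disruptions.
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: sample-ha   # assumed label
      app.kubernetes.io/component: web        # assumed label
```

With two replicas and `minAvailable: 1`, a `kubectl drain` can evict only one web pod at a time; further evictions are refused until the replacement pod is ready again. Setting the minimum equal to the replica count would prevent drains from evicting any pod of that component.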