
# High Availability

To gain a higher level of availability for your Instance, you can

- create more Kubernetes Cluster Nodes
- create more replicas of the nscale and nplus components
- distribute those replicas across multiple nodes using anti-affinities (see the sketch below)
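The anti-affinities themselves are set by the chart through the values shown further down. Purely for illustration, and not necessarily in the exact form the chart renders it, a soft pod anti-affinity that spreads replicas across nodes looks roughly like this in plain Kubernetes terms (the `app: nappl` label is an assumption, the chart's actual labels may differ):

```yaml
# Sketch only: a preferred (soft) anti-affinity asking the scheduler to place
# replicas carrying the same label on different nodes where possible.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: nappl              # assumption: the label grouping the replicas
          topologyKey: kubernetes.io/hostname
```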

This is how you install it:

```sh
helm install \
  --values samples/ha/values.yaml \
  --values samples/environment/demo.yaml \
  sample-ha nplus/nplus-instance
```
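Note that later `--values` files override earlier ones. If you want to see what the layered values render to before installing anything, a dry run with `helm template` (same values, assuming the `nplus` repo is already added) is a quick check:

```sh
# Render the manifests locally without installing anything
helm template \
  --values samples/ha/values.yaml \
  --values samples/environment/demo.yaml \
  sample-ha nplus/nplus-instance | less
```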

The essentials of the values file are:

- We use three (3) nscale Server Application Layers: two dedicated to user access, one dedicated to jobs
- If the jobs node fails, the user nodes take over the jobs (handled by priority)
- If one of the user nodes fails, the other one handles the load
- Kubernetes takes care of restarting nodes should that happen
- All components run with two replicas
- Pod anti-affinities handle the distribution
- Any administration component only connects to the jobs nappl, leaving the user nodes to the users
- PodDisruptionBudgets are defined for the crucial components. They are set via minReplicaCount for the components that support multiple replicas, and via minReplicaCountType for the first ReplicaSet of the components that do not, in this case nstla.

The relevant part of `samples/ha/values.yaml` looks like this:
```yaml
web:
  replicaCount: 2
  minReplicaCount: 1
rs:
  replicaCount: 2
  minReplicaCount: 1
ilm:
  replicaCount: 2
  minReplicaCount: 1
cmis:
  replicaCount: 2
  minReplicaCount: 1
webdav:
  replicaCount: 2
  minReplicaCount: 1
nstla:
  minReplicaCountType: 1
administrator:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
pam:
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
nappl:
  replicaCount: 2
  minReplicaCount: 1
  jobs: false
  waitFor:
    - "-service {{ .component.prefix }}nappljobs.{{ .Release.Namespace }}.svc.cluster.local:{{ .this.nappl.port }} -timeout 600"
nappljobs:
  replicaCount: 1
  jobs: true
  disableSessionReplication: true
  ingress:
    enabled: false
  snc:
    enabled: true
  waitFor:
    - "-service {{ .component.prefix }}database.{{ .Release.Namespace }}.svc.cluster.local:5432 -timeout 600"
application:
  nstl:
    host: "{{ .component.prefix }}nstl-cluster.{{ .Release.Namespace }}"
  nappl:
    host: "{{ .component.prefix }}nappljobs.{{ .Release.Namespace }}"
```
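Once the release is running, a quick way to confirm the setup behaves as described, with replicas spread across nodes and PodDisruptionBudgets in place, is plain `kubectl` (replace the namespace with your own):

```sh
# Show each pod together with the node it landed on (anti-affinity spread)
kubectl get pods -n <namespace> -o wide

# List the PodDisruptionBudgets derived from minReplicaCount / minReplicaCountType
kubectl get pdb -n <namespace>
```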