Clustering

DeepIntShield Clustering delivers production-ready high availability through a peer-to-peer network architecture with automatic service discovery. The clustering system uses gossip protocols to maintain consistent state across nodes while providing seamless scaling, automatic failover, and zero-downtime deployments.

Modern AI gateway deployments require robust infrastructure to handle production workloads:

| Challenge | Impact | Clustering Solution |
|---|---|---|
| Single Point of Failure | Complete service outage if gateway fails | Distributed architecture with automatic failover |
| Traffic Spikes | Performance degradation under high load | Dynamic load distribution across multiple nodes |
| Provider Rate Limits | Request throttling and service interruption | Distributed rate limit tracking across cluster |
| Regional Latency | Poor user experience in distant regions | Geographic distribution with local processing |
| Maintenance Windows | Service downtime during updates | Rolling updates with zero-downtime deployment |
| Capacity Planning | Over/under-provisioning resources | Elastic scaling based on real-time demand |

| Feature | Description |
|---|---|
| Automatic Service Discovery | 6 discovery methods for any infrastructure (K8s, Consul, etcd, DNS, UDP, mDNS) |
| Peer-to-Peer Architecture | No single point of failure with equal node participation |
| Gossip-Based State Sync | Real-time synchronization of traffic patterns and limits |
| Automatic Failover | Seamless traffic redistribution when nodes fail |
| Zero-Downtime Updates | Rolling deployments without service interruption |

DeepIntShield clustering uses a peer-to-peer (P2P) network where all nodes are equal participants. Each node:

  • Discovers peers automatically using the configured discovery method
  • Synchronizes state via gossip protocol
  • Shares traffic patterns and rate limits
  • Handles failover automatically

The gossip protocol ensures all nodes maintain consistent views of:

  • Traffic Patterns: Request volume, latency metrics, error rates
  • Rate Limit States: Current usage counters for each provider/model
  • Node Health: CPU, memory, network status of all peers
  • Configuration Changes: Provider updates, routing rules, policies

Convergence: The gossip protocol is eventually consistent. Under normal network conditions, all nodes converge to the same state within seconds.
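To make the convergence behaviour concrete, here is a minimal simulation of push-style gossip. It is a sketch only: the merge rule (keep the highest counter per key), the node names, and the counter keys are assumptions made for illustration, not DeepIntShield's actual wire protocol.

```python
import random


def gossip_round(views, rng):
    """One push round: every node sends its view to one random peer."""
    nodes = list(views)
    for sender in nodes:
        receiver = rng.choice([n for n in nodes if n != sender])
        for key, counter in views[sender].items():
            # Assumed merge rule: keep the highest counter seen per key.
            views[receiver][key] = max(views[receiver].get(key, 0), counter)


def rounds_until_converged(views, rng, max_rounds=100):
    """Run gossip rounds until every node holds an identical view."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        snapshot = list(views.values())
        if all(view == snapshot[0] for view in snapshot):
            return round_no
    return None


# Five hypothetical nodes, each starting with only its own counter.
rng = random.Random(7)
views = {f"node-{i}": {f"requests-seen-by-node-{i}": 10 * i + 5} for i in range(5)}
rounds = rounds_until_converged(views, rng)
print(f"converged after {rounds} rounds")
```

Each node starts with partial knowledge and ends with the full set of counters, which is the sense in which the state is eventually consistent rather than instantly consistent.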

| Cluster Size | Fault Tolerance | Use Case |
|---|---|---|
| 3 nodes | 1 node failure | Small production deployments |
| 5 nodes | 2 node failures | Medium production deployments |
| 7+ nodes | 3+ node failures | Large enterprise deployments |

The new clustering configuration uses a cluster_config object with integrated service discovery:

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "kubernetes",
      "service_name": "deepintshield-cluster",
      // Discovery-specific configuration here
    },
    "gossip": {
      "port": 10101,
      "config": {
        "timeout_seconds": 10,
        "success_threshold": 3,
        "failure_threshold": 3
      }
    }
  }
}
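The `success_threshold` and `failure_threshold` values in the example above suggest a threshold-based health state per peer. The sketch below shows one plausible reading, in which the checks must be consecutive. That interpretation, and the `PeerHealth` class itself, are assumptions for illustration and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class PeerHealth:
    """Tracks one peer's health from a stream of pass/fail check results."""
    success_threshold: int = 3
    failure_threshold: int = 3
    healthy: bool = True
    _successes: int = 0
    _failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed:
            self._successes += 1
            self._failures = 0  # assumed: a success resets the failure streak
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0  # assumed: a failure resets the success streak
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy
```

With the defaults, a peer is marked unhealthy only on the third failed check in a row, and needs three passing checks in a row to recover, which damps flapping on a briefly unreliable network.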

All discovery methods support these common fields:

| Field | Type | Required | Description |
|---|---|---|---|
| enabled | boolean | Yes | Enable/disable discovery |
| type | string | Yes | Discovery type: kubernetes, consul, etcd, dns, udp, mdns |
| service_name | string | Yes | Service name for discovery |
| bind_port | integer | No | Port for cluster communication (default: 10101) |
| dial_timeout | duration | No | Discovery timeout (default: 10s) |
| allowed_address_space | array | No | CIDR ranges to filter discovered nodes (e.g., ["10.0.0.0/8"]) |

| Field | Description | Default |
|---|---|---|
| port | Gossip protocol port | 10101 |
| timeout_seconds | Health check timeout | 10 |
| success_threshold | Successful checks to mark healthy | 3 |
| failure_threshold | Failed checks to mark unhealthy | 3 |
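A small pre-flight check can catch typos in the common fields before a node starts. The following is a hedged sketch: `validate_cluster_config` is a hypothetical helper, not part of DeepIntShield, and it checks only the required fields and port defaults listed above.

```python
ALLOWED_TYPES = {"kubernetes", "consul", "etcd", "dns", "udp", "mdns"}


def validate_cluster_config(config: dict) -> list[str]:
    """Return a list of problems found in the common clustering fields."""
    errors = []
    cluster = config.get("cluster_config", {})
    discovery = cluster.get("discovery", {})

    if not isinstance(discovery.get("enabled"), bool):
        errors.append("discovery.enabled must be a boolean")
    if discovery.get("type") not in ALLOWED_TYPES:
        errors.append(f"discovery.type must be one of {sorted(ALLOWED_TYPES)}")
    name = discovery.get("service_name")
    if not isinstance(name, str) or not name:
        errors.append("discovery.service_name must be a non-empty string")

    # Both ports default to 10101 according to the field tables.
    for section, key in (("discovery", "bind_port"), ("gossip", "port")):
        port = cluster.get(section, {}).get(key, 10101)
        if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
            errors.append(f"{section}.{key} must be an integer between 1 and 65535")
    return errors
```

An empty list means the common fields look sane; method-specific fields such as `k8s_label_selector` or `etcd_endpoints` would need their own checks.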

DeepIntShield supports 6 service discovery methods to fit any infrastructure. Choose based on your deployment environment:

  • Kubernetes: Native K8s pod discovery via label selectors
  • Consul: HashiCorp Consul service mesh integration
  • etcd: etcd-based distributed discovery
  • DNS: Traditional DNS SRV record discovery
  • UDP Broadcast: Local network broadcast discovery
  • mDNS: Multicast DNS for local development

Kubernetes Discovery

Best for: Kubernetes deployments with StatefulSets or Deployments

Kubernetes discovery uses the K8s API to automatically discover pods based on label selectors. This is the most common method for cloud-native deployments.

  1. Each DeepIntShield pod queries the Kubernetes API for pods matching the label selector
  2. Discovers pod IPs automatically as pods scale up/down
  3. Works seamlessly with StatefulSets, Deployments, and DaemonSets
  4. No external dependencies required
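The label-matching step above can be sketched in a few lines. This is an illustration only: the pod dictionaries are a hypothetical simplification of what the Kubernetes API returns, and the parser handles only `key=value` terms joined by commas, not the `==`, `!=`, or set-based selector forms.

```python
def parse_selector(selector: str) -> dict[str, str]:
    """Parse an equality selector such as "app=deepintshield,env=prod"."""
    requirements = {}
    for term in selector.split(","):
        key, sep, value = term.strip().partition("=")
        # Reject anything that is not a plain key=value term.
        if not sep or not key or key.endswith("!") or value.startswith("="):
            raise ValueError(f"unsupported selector term: {term!r}")
        requirements[key] = value
    return requirements


def matching_pod_ips(pods: list[dict], selector: str) -> list[str]:
    """Return the IPs of pods whose labels satisfy every selector term."""
    wanted = parse_selector(selector)
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in wanted.items())
    ]
```

Every term must match exactly, which is why a pod labelled `app=deepintshield` alone is not discovered by the selector `app=deepintshield,env=prod`.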

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "kubernetes",
      "service_name": "deepintshield-cluster",
      "k8s_namespace": "default",
      "k8s_label_selector": "app=deepintshield"
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| k8s_namespace | No | Kubernetes namespace to search | "default", "production" |
| k8s_label_selector | Yes | Label selector for pod discovery | "app=deepintshield", "app=deepintshield,env=prod" |

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: deepintshield
  namespace: default
spec:
  serviceName: deepintshield-cluster
  replicas: 3
  selector:
    matchLabels:
      app: deepintshield
  template:
    metadata:
      labels:
        app: deepintshield
    spec:
      serviceAccountName: deepintshield
      containers:
        - name: deepintshield
          image: <enterprise_repo_base_url>/deepintshield:latest
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 10101
              name: gossip
          volumeMounts:
            - name: config
              mountPath: /etc/deepintshield
      volumes:
        - name: config
          configMap:
            name: deepintshield-config
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deepintshield
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deepintshield-pod-reader
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deepintshield-pod-reader
  namespace: default
subjects:
  - kind: ServiceAccount
    name: deepintshield
    namespace: default
roleRef:
  kind: Role
  name: deepintshield-pod-reader
  apiGroup: rbac.authorization.k8s.io

Pods not discovering each other

Symptoms: Cluster shows only 1 member, pods running in isolation

Solutions:

  • Verify ServiceAccount has RBAC permissions to list pods
  • Check label selector matches pod labels exactly
  • Ensure namespace is correct (defaults to “default”)
  • Verify gossip port (10101) is not blocked by NetworkPolicies
  • Check logs for “error listing pods” messages

Permission denied errors

Symptoms: “error getting kubernetes config” or “forbidden” errors

Solutions:

  • Create ServiceAccount for DeepIntShield pods
  • Create Role with get, list, watch permissions on pods
  • Create RoleBinding linking ServiceAccount to Role
  • Verify RBAC is enabled in cluster

Cluster forms but nodes show as unhealthy

Symptoms: Nodes discovered but marked as “suspect” or “dead”

Solutions:

  • Verify gossip port (10101) is accessible between pods
  • Check for NetworkPolicies blocking pod-to-pod communication
  • Increase timeout_seconds in gossip config if network is slow
  • Verify pods are in Running state with kubectl get pods

Consul Discovery

Best for: Consul service mesh environments, multi-datacenter deployments

Consul discovery integrates with HashiCorp Consul for service registration and discovery. Ideal for environments already using Consul for service mesh or service discovery.

  1. Each DeepIntShield node registers itself with Consul on startup
  2. Nodes query Consul to discover other DeepIntShield instances
  3. Consul performs health checks on each node
  4. Unhealthy nodes are automatically deregistered
  5. Supports multi-datacenter deployments
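The query step above can be reproduced by hand against Consul's standard health endpoint, `/v1/health/service/<name>?passing=true`, which returns only instances whose checks pass. The sketch below assumes each node registers its gossip port as the Consul service port; that detail, and the helper names, are assumptions rather than documented behaviour.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def healthy_peers(consul_address: str, service_name: str, fetch=_http_get) -> list[str]:
    """List host:port for every passing instance of a Consul service."""
    url = f"http://{consul_address}/v1/health/service/{quote(service_name)}?passing=true"
    peers = []
    for entry in json.loads(fetch(url)):
        service = entry["Service"]
        # Consul leaves Service.Address empty when it equals the node address.
        host = service.get("Address") or entry["Node"]["Address"]
        peers.append(f"{host}:{service['Port']}")
    return peers
```

Calling `healthy_peers("localhost:8500", "deepintshield-cluster")` against a running agent is a quick way to see whether the instances shown in the Consul UI are the ones a node would actually try to join.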

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "consul",
      "service_name": "deepintshield-cluster",
      "consul_address": "consul.service.consul:8500"
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| consul_address | No | Consul agent address | "localhost:8500", "consul.service.consul:8500" (default: localhost:8500) |

version: '3.8'
services:
  consul:
    image: hashicorp/consul:latest
    command: agent -dev -client=0.0.0.0
    ports:
      - "8500:8500"
    networks:
      - deepintshield-net
  deepintshield-1:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config-node1.json:/etc/deepintshield/config.json
    ports:
      - "8080:8080"
    depends_on:
      - consul
    networks:
      - deepintshield-net
  deepintshield-2:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config-node2.json:/etc/deepintshield/config.json
    ports:
      - "8081:8080"
    depends_on:
      - consul
    networks:
      - deepintshield-net
  deepintshield-3:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config-node3.json:/etc/deepintshield/config.json
    ports:
      - "8082:8080"
    depends_on:
      - consul
    networks:
      - deepintshield-net
networks:
  deepintshield-net:
    driver: bridge

Failed to register with Consul

Symptoms: “failed to register service with Consul” errors

Solutions:

  • Verify Consul agent is accessible at configured address
  • Check Consul agent logs for registration errors
  • Ensure Consul ACL token has write permissions if ACLs enabled
  • Verify network connectivity between DeepIntShield and Consul
  • Check firewall rules allow connections to port 8500

Services registered but not discovered

Symptoms: Consul UI shows services but nodes don’t join cluster

Solutions:

  • Verify service_name matches across all nodes
  • Check Consul service health checks are passing
  • Ensure gossip port is accessible between nodes
  • Verify nodes are registered in correct datacenter
  • Check for DNS resolution issues if using service DNS names

Health checks failing

Symptoms: Services show as critical in Consul UI

Solutions:

  • Verify gossip port (10101) is accessible
  • Check Consul agent can reach node’s gossip port
  • Increase health check timeout in Consul if needed
  • Review DeepIntShield logs for startup errors
  • Ensure nodes have correct IP addresses registered

etcd Discovery

Best for: etcd-based distributed systems, existing etcd infrastructure

etcd discovery uses etcd’s distributed key-value store for service registration and discovery. Perfect for environments already using etcd or requiring strong consistency.

  1. Each DeepIntShield node registers itself in etcd with a lease
  2. Nodes maintain lease through keepalive messages
  3. Nodes query etcd prefix to discover other instances
  4. Failed nodes’ leases expire and are automatically removed
  5. Provides strongly consistent service registry
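The lease mechanics in the steps above can be modelled without an etcd server. The class below is an in-memory stand-in written for illustration, not an etcd client; the key prefix and the 10-second TTL in the usage note are hypothetical.

```python
class LeaseRegistry:
    """In-memory model of lease-based registration with a fixed TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._expires_at = {}  # registration key -> lease expiry timestamp

    def register(self, key: str, now: float) -> None:
        """Register a node under a fresh lease."""
        self._expires_at[key] = now + self.ttl

    def keepalive(self, key: str, now: float) -> bool:
        """Extend a live lease; return False if it has already expired."""
        if self._expires_at.get(key, 0) <= now:
            self._expires_at.pop(key, None)
            return False  # the node has to register again
        self._expires_at[key] = now + self.ttl
        return True

    def members(self, prefix: str, now: float) -> list[str]:
        """List keys under a prefix whose leases are still live."""
        return sorted(
            key
            for key, expiry in self._expires_at.items()
            if key.startswith(prefix) and expiry > now
        )
```

With a 10-second TTL, a node that sends a keepalive at t=8 is still listed at t=12, while a node that stays silent drops out of the prefix query once its lease lapses. This is also why unstable networks show up as nodes repeatedly registering and deregistering.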

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "etcd",
      "service_name": "deepintshield-cluster",
      "etcd_endpoints": [
        "http://etcd-1:2379",
        "http://etcd-2:2379",
        "http://etcd-3:2379"
      ],
      "dial_timeout": "10s"
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| etcd_endpoints | Yes | Array of etcd endpoint URLs | ["http://localhost:2379"], ["https://etcd1:2379", "https://etcd2:2379"] |
| dial_timeout | No | Connection timeout | "10s" (default), "30s" |

version: '3.8'
services:
  etcd:
    image: quay.io/coreos/etcd:latest
    command:
      - etcd
      # The member name must match the key used in --initial-cluster
      - --name=etcd
      - --advertise-client-urls=http://etcd:2379
      - --listen-client-urls=http://0.0.0.0:2379
      - --listen-peer-urls=http://0.0.0.0:2380
      - --initial-cluster=etcd=http://etcd:2380
      - --initial-advertise-peer-urls=http://etcd:2380
    ports:
      - "2379:2379"
      - "2380:2380"
    networks:
      - deepintshield-net
  deepintshield-1:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8080:8080"
    depends_on:
      - etcd
    networks:
      - deepintshield-net
  deepintshield-2:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8081:8080"
    depends_on:
      - etcd
    networks:
      - deepintshield-net
  deepintshield-3:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8082:8080"
    depends_on:
      - etcd
    networks:
      - deepintshield-net
networks:
  deepintshield-net:
    driver: bridge

Failed to create etcd client

Symptoms: “etcd client error” on startup

Solutions:

  • Verify etcd endpoints are accessible
  • Check URL format (http:// or https://)
  • Ensure etcd cluster is healthy and running
  • Verify network connectivity to etcd endpoints
  • Check firewall rules allow connections to port 2379
  • Increase dial_timeout if network is slow

Failed to register with etcd

Symptoms: “failed to register with etcd” errors

Solutions:

  • Verify etcd cluster is accepting writes
  • Check etcd cluster has available space
  • Ensure authentication credentials if etcd has auth enabled
  • Review etcd logs for permission or quota errors
  • Verify node can resolve etcd hostnames

Lease keepalive failures

Symptoms: Nodes repeatedly registering/deregistering

Solutions:

  • Check network stability between nodes and etcd
  • Verify etcd cluster is not overloaded
  • Monitor etcd metrics for high latency
  • Increase lease TTL if network has high latency
  • Check for etcd leader election issues

DNS Discovery

Best for: Traditional infrastructure, static node addresses, cloud DNS services

DNS discovery uses standard DNS resolution to discover cluster nodes. Works with any DNS server and is ideal for static deployments or cloud environments with DNS integration.

  1. Configure DNS A records or SRV records for cluster nodes
  2. DeepIntShield queries DNS to resolve configured names
  3. All returned IP addresses are treated as potential cluster members
  4. Supports multiple DNS names for different node groups
  5. Works with internal DNS, cloud DNS, or public DNS
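The resolution steps above amount to "look up each name, collect every address, append the port". A rough sketch using only the standard library follows; it resolves A records (IPv4) only and does not attempt SRV lookups, and the `resolve_peers` helper is hypothetical, written to mirror the `dns_names` and `bind_port` fields.

```python
import socket


def resolve_peers(dns_names, bind_port=10101, resolver=socket.getaddrinfo):
    """Resolve each DNS name and return unique "ip:port" peer candidates."""
    peers = set()
    for name in dns_names:
        try:
            infos = resolver(name, bind_port, family=socket.AF_INET, type=socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # unresolvable name: skip it and keep the rest
        for _family, _type, _proto, _canonname, sockaddr in infos:
            peers.add(f"{sockaddr[0]}:{bind_port}")
    return sorted(peers)
```

Running `resolve_peers(["deepintshield-cluster.local"])` from inside a node (or its container) shows exactly which addresses that node would treat as cluster members, which helps when a name resolves differently inside and outside Docker.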

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "dns",
      "service_name": "deepintshield-cluster",
      "dns_names": [
        "deepintshield-cluster.local",
        "deepintshield-nodes.internal.company.com"
      ],
      "bind_port": 10101
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| dns_names | Yes | Array of DNS names to resolve | ["deepintshield.local"], ["node1.local", "node2.local", "node3.local"] |
| bind_port | No | Port appended to discovered IPs | 10101 (default) |

# Create A records for each node
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890ABC \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "deepintshield-cluster.internal.company.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [
          {"Value": "10.0.1.10"},
          {"Value": "10.0.1.11"},
          {"Value": "10.0.1.12"}
        ]
      }
    }]
  }'

DNS lookup errors

Symptoms: “dns lookup error” in logs, no nodes discovered

Solutions:

  • Verify DNS names are resolvable: nslookup deepintshield-cluster.local
  • Check DNS server is accessible from DeepIntShield nodes
  • Verify /etc/resolv.conf has correct nameserver
  • Test DNS resolution from inside container if using Docker
  • Check for DNS caching issues (try flushing DNS cache)

No nodes discovered via DNS

Symptoms: DNS resolves but cluster has 0 members

Solutions:

  • Verify DNS returns multiple A records (not CNAME)
  • Check that returned IPs are correct and reachable
  • Ensure bind_port matches actual gossip port on nodes
  • Verify nodes are listening on returned IP addresses
  • Use dig or nslookup to verify DNS response format

Nodes discovered but can’t connect

Symptoms: IPs discovered but gossip connection fails

Solutions:

  • Verify gossip port (10101) is open on all nodes
  • Check firewall rules between nodes
  • Ensure nodes are listening on correct network interface
  • Verify IP addresses match node’s actual network addresses
  • Test connectivity: telnet <ip> 10101

UDP Broadcast Discovery

Best for: Local network deployments, on-premise infrastructure, development clusters

UDP broadcast discovery automatically finds nodes on the same local network using broadcast packets. No external dependencies required.

  1. Nodes broadcast UDP discovery beacons on configured port
  2. Other nodes on the same network respond with acknowledgments
  3. Nodes discover each other’s IP addresses automatically
  4. Limited to nodes on the same broadcast domain (subnet)
  5. Requires allowed_address_space for security
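The `allowed_address_space` filter is ordinary CIDR membership, which Python's `ipaddress` module can reproduce for debugging. This is a sketch of the check, not the product's implementation, and the function names are hypothetical.

```python
import ipaddress


def parse_allowed(cidrs):
    """Parse CIDR strings; strict mode rejects host addresses such as 192.168.1.10/24."""
    return [ipaddress.ip_network(cidr, strict=True) for cidr in cidrs]


def is_allowed(address, networks):
    """Return True if the address falls inside any allowed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


allowed = parse_allowed(["192.168.1.0/24", "10.0.0.0/8"])
print(is_allowed("192.168.1.42", allowed))
print(is_allowed("172.16.0.5", allowed))
```

Strict parsing raises `ValueError` when a range is written with host bits set, which is the "use network address, not host address" mistake in CIDR notation.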

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "udp",
      "service_name": "deepintshield-cluster",
      "udp_broadcast_port": 9999,
      "allowed_address_space": [
        "192.168.1.0/24",
        "10.0.0.0/8"
      ],
      "dial_timeout": "10s"
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| udp_broadcast_port | Yes | Port for broadcast discovery | 9999, 8888 |
| allowed_address_space | Yes | CIDR ranges to limit discovery scope | ["192.168.1.0/24"], ["10.0.0.0/8", "172.16.0.0/12"] |
| dial_timeout | No | Time to wait for responses | "10s" (default) |

version: '3.8'
services:
  deepintshield-1:
    image: <enterprise_repo_base_url>/deepintshield:latest
    network_mode: bridge
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8080:8080"
      - "9999:9999/udp"
      - "10101:10101"
  deepintshield-2:
    image: <enterprise_repo_base_url>/deepintshield:latest
    network_mode: bridge
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    # Only the HTTP port is published here: a host port can be bound by one
    # container only, and peers reach ports 9999/udp and 10101 directly over
    # the bridge network.
    ports:
      - "8081:8080"
  deepintshield-3:
    image: <enterprise_repo_base_url>/deepintshield:latest
    network_mode: bridge
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8082:8080"

No nodes discovered via UDP broadcast

Symptoms: Discovery runs but finds 0 nodes

Solutions:

  • Verify allowed_address_space includes node IP addresses
  • Check UDP broadcast port is open (firewall/security groups)
  • Ensure nodes are on same subnet/broadcast domain
  • Verify broadcast is enabled on network interface
  • Test with tcpdump -i any -n udp port 9999
  • Check Docker network mode supports broadcast (use bridge or host)

Address space filtering issues

Symptoms: “not in allowed address space” warnings

Solutions:

  • Verify CIDR notation is correct (e.g., 192.168.1.0/24)
  • Ensure allowed_address_space covers all node IPs
  • Check node IP addresses: ip addr or ifconfig
  • Remember to use network address, not host address
  • Test CIDR match online or with ipcalc

Permission denied on UDP port

Symptoms: “permission denied” or “address already in use”

Solutions:

  • Check if another process is using the UDP broadcast port
  • Verify port number is > 1024 (non-privileged) or run as root
  • Use netstat -tulpn | grep 9999 to check port usage
  • Change udp_broadcast_port to different value
  • Ensure firewall isn’t blocking UDP on that port

mDNS Discovery

Best for: Local development, testing, zero-configuration setups

mDNS (Multicast DNS) provides zero-configuration service discovery on local networks. Perfect for development and testing without requiring any infrastructure setup.

  1. Nodes advertise themselves via mDNS (Bonjour/Avahi)
  2. Other nodes browse for mDNS services
  3. Automatic discovery within the same local network
  4. No DNS server or configuration required
  5. Limited to local network segment

{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "mdns",
      "service_name": "deepintshield",
      "mdns_service": "_bifrost._tcp",
      "dial_timeout": "10s"
    },
    "gossip": {
      "port": 10101
    }
  }
}

| Parameter | Required | Description | Example |
|---|---|---|---|
| mdns_service | No | mDNS service type | "_bifrost._tcp" (default), "_myapp._tcp" |
| dial_timeout | No | Time to wait for mDNS responses | "10s" (default) |

# Start first node
docker run -p 8080:8080 -p 10101:10101 \
  -v $(pwd)/config-mdns.json:/etc/deepintshield/config.json \
  <enterprise_repo_base_url>/deepintshield:latest

# Start second node (discovers first automatically)
docker run -p 8081:8080 -p 10102:10101 \
  -v $(pwd)/config-mdns.json:/etc/deepintshield/config.json \
  <enterprise_repo_base_url>/deepintshield:latest

# Start third node (discovers both automatically)
docker run -p 8082:8080 -p 10103:10101 \
  -v $(pwd)/config-mdns.json:/etc/deepintshield/config.json \
  <enterprise_repo_base_url>/deepintshield:latest

mDNS services not discovered

Symptoms: Nodes don’t discover each other via mDNS

Solutions:

  • Verify mDNS is enabled on network (check firewall)
  • Ensure multicast is enabled on network interface
  • Check nodes are on same local network segment
  • Verify mDNS port 5353 is not blocked
  • Test mDNS resolution: avahi-browse -a (Linux) or dns-sd -B (macOS)
  • Increase dial_timeout if discovery is slow

Network address validation errors

Symptoms: “skipping invalid host address” warnings

Solutions:

  • This is normal - mDNS returns network/broadcast addresses
  • mDNS automatically filters invalid addresses (127.x.x.x, *.0, *.255)
  • Check that nodes have valid non-loopback IP addresses
  • Ensure nodes are not using 127.0.0.1 for binding
  • Verify network interface has proper IP configuration

Discovery works but cluster unstable

Symptoms: Nodes discover then disconnect repeatedly

Solutions:

  • mDNS has eventual consistency, allow time for propagation
  • Check gossip port accessibility between nodes
  • Verify network doesn’t drop multicast packets
  • Consider using a more robust discovery method for production
  • Check for network congestion or packet loss

Complete Docker Compose example using Consul discovery with a shared PostgreSQL config store and an nginx load balancer:

version: '3.8'
services:
  postgres:
    image: postgres:14
    environment:
      POSTGRES_DB: deepintshield
      POSTGRES_USER: deepintshield
      POSTGRES_PASSWORD: deepintshield_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - deepintshield-net
  consul:
    image: hashicorp/consul:latest
    command: agent -dev -client=0.0.0.0
    ports:
      - "8500:8500"
    networks:
      - deepintshield-net
  deepintshield-1:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - consul
    networks:
      - deepintshield-net
  deepintshield-2:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8081:8080"
    depends_on:
      - postgres
      - consul
    networks:
      - deepintshield-net
  deepintshield-3:
    image: <enterprise_repo_base_url>/deepintshield:latest
    environment:
      - DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json
    volumes:
      - ./config.json:/etc/deepintshield/config.json
    ports:
      - "8082:8080"
    depends_on:
      - postgres
      - consul
    networks:
      - deepintshield-net
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - deepintshield-1
      - deepintshield-2
      - deepintshield-3
    networks:
      - deepintshield-net
volumes:
  postgres_data:
networks:
  deepintshield-net:
    driver: bridge

nginx.conf for load balancing:

events {
    worker_connections 1024;
}

http {
    upstream bifrost_cluster {
        least_conn;
        server deepintshield-1:8080 max_fails=3 fail_timeout=30s;
        server deepintshield-2:8080 max_fails=3 fail_timeout=30s;
        server deepintshield-3:8080 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://bifrost_cluster;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # Timeouts
            proxy_connect_timeout 60s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }

        # Health endpoint answered by nginx itself (does not probe the backends)
        location /health {
            access_log off;
            # default_type sets the response Content-Type without adding a duplicate header
            default_type text/plain;
            return 200 "healthy\n";
        }
    }
}

Production-ready Kubernetes deployment with StatefulSet:

apiVersion: v1
kind: ConfigMap
metadata:
  name: deepintshield-config
  namespace: deepintshield
data:
  config.json: |
    {
      "cluster_config": {
        "enabled": true,
        "discovery": {
          "enabled": true,
          "type": "kubernetes",
          "service_name": "deepintshield-cluster",
          "k8s_namespace": "deepintshield",
          "k8s_label_selector": "app=deepintshield,component=gateway"
        },
        "gossip": {
          "port": 10101,
          "config": {
            "timeout_seconds": 10,
            "success_threshold": 3,
            "failure_threshold": 3
          }
        }
      },
      "config_store": {
        "enabled": true,
        "type": "postgres",
        "config": {
          "host": "postgres.deepintshield.svc.cluster.local",
          "port": "5432",
          "user": "deepintshield",
          "password": "changeme",
          "db_name": "deepintshield",
          "ssl_mode": "require"
        }
      }
    }
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deepintshield
  namespace: deepintshield
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deepintshield-pod-reader
  namespace: deepintshield
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deepintshield-pod-reader
  namespace: deepintshield
subjects:
  - kind: ServiceAccount
    name: deepintshield
    namespace: deepintshield
roleRef:
  kind: Role
  name: deepintshield-pod-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: deepintshield
  namespace: deepintshield
spec:
  serviceName: deepintshield-cluster
  replicas: 3
  selector:
    matchLabels:
      app: deepintshield
      component: gateway
  template:
    metadata:
      labels:
        app: deepintshield
        component: gateway
    spec:
      serviceAccountName: deepintshield
      containers:
        - name: deepintshield
          image: <enterprise_repo_base_url>/deepintshield:latest
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            - containerPort: 10101
              name: gossip
              protocol: TCP
          env:
            - name: DEEPINTSHIELD_CONFIG
              value: /etc/deepintshield/config.json
          volumeMounts:
            - name: config
              mountPath: /etc/deepintshield
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2000m"
              memory: "2Gi"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: deepintshield-config
---
apiVersion: v1
kind: Service
metadata:
  name: deepintshield-cluster
  namespace: deepintshield
spec:
  clusterIP: None
  selector:
    app: deepintshield
    component: gateway
  ports:
    - port: 10101
      name: gossip
      protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: deepintshield
  namespace: deepintshield
spec:
  type: LoadBalancer
  selector:
    app: deepintshield
    component: gateway
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: deepintshield-pdb
  namespace: deepintshield
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: deepintshield
      component: gateway

For bare metal or VM deployments using systemd:

Step 1: Install DeepIntShield on each node

# Download DeepIntShield Enterprise binary
curl -O https://releases.getmaxim.ai/deepintshield-enterprise/latest/deepintshield-enterprise-linux-amd64
chmod +x deepintshield-enterprise-linux-amd64
sudo mv deepintshield-enterprise-linux-amd64 /usr/local/bin/deepintshield-enterprise

Step 2: Create configuration file

sudo mkdir -p /etc/deepintshield
# Use tee so the write itself runs as root ("sudo cat > file" redirects as the calling user)
sudo tee /etc/deepintshield/config.json > /dev/null <<EOF
{
  "cluster_config": {
    "enabled": true,
    "discovery": {
      "enabled": true,
      "type": "dns",
      "service_name": "deepintshield-cluster",
      "dns_names": ["deepintshield-cluster.internal.company.com"]
    },
    "gossip": {
      "port": 10101
    }
  },
  "config_store": {
    "enabled": true,
    "type": "postgres",
    "config": {
      "host": "postgres.internal.company.com",
      "port": "5432",
      "user": "deepintshield",
      "password": "secure_password",
      "db_name": "deepintshield",
      "ssl_mode": "require"
    }
  }
}
EOF

Step 3: Create systemd service

# Use tee so the write itself runs as root ("sudo cat > file" redirects as the calling user)
sudo tee /etc/systemd/system/deepintshield.service > /dev/null <<EOF
[Unit]
Description=DeepIntShield Enterprise API Gateway
After=network.target

[Service]
Type=simple
User=deepintshield
Group=deepintshield
Environment="DEEPINTSHIELD_CONFIG=/etc/deepintshield/config.json"
ExecStart=/usr/local/bin/deepintshield-enterprise
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/deepintshield

[Install]
WantedBy=multi-user.target
EOF

Step 4: Setup DNS records

# Add A records for deepintshield-cluster.internal.company.com
# pointing to all node IPs:
# 10.0.1.10 (node1)
# 10.0.1.11 (node2)
# 10.0.1.12 (node3)

Step 5: Start and enable service

sudo useradd -r -s /bin/false deepintshield
sudo mkdir -p /var/lib/deepintshield
sudo chown deepintshield:deepintshield /var/lib/deepintshield
sudo systemctl daemon-reload
sudo systemctl enable deepintshield
sudo systemctl start deepintshield
sudo systemctl status deepintshield

Step 6: Verify cluster formation

# Check logs on each node
sudo journalctl -u deepintshield -f
# Look for messages like:
# "successfully joined X peers on startup"
# "cluster health: HEALTHY"

Cluster forms but only has 1 member

Symptoms: Each node thinks it’s the only member

Common Causes & Solutions:

  • Discovery not configured: Verify discovery.enabled: true and discovery.type is set
  • Service name mismatch: Ensure all nodes have identical service_name
  • Gossip port blocked: Check firewall allows TCP port 10101 between nodes
  • Discovery method issues: See method-specific troubleshooting above
  • Network isolation: Verify nodes can reach each other on gossip port

Split brain - nodes form separate clusters

Symptoms: Nodes divided into separate clusters

Common Causes & Solutions:

  • Network partition: Check network connectivity between all nodes
  • Different discovery configs: Ensure all nodes use same discovery settings
  • Firewall blocking gossip: Verify bidirectional connectivity on port 10101
  • Discovery scoped incorrectly: Check label selectors, DNS names, or address spaces
  • Restart all nodes: Re-forming the cluster sometimes requires restarting all nodes at the same time

High memory usage in cluster

Symptoms: Memory grows over time, especially in large clusters

Common Causes & Solutions:

  • Large gossip messages: Check size of gossiped data
  • Too many nodes: Clusters are typically sized at 3-7 nodes; check whether a larger cluster is really needed
  • Message deduplication cache: This is normal, cache TTL is 2 minutes
  • Increase node resources: Ensure adequate memory allocation

Cluster unstable - nodes flapping

Symptoms: Nodes repeatedly join and leave cluster

Common Causes & Solutions:

  • Network instability: Check for packet loss or high latency
  • Resource constraints: Ensure nodes have adequate CPU/memory
  • Timeout too aggressive: Increase timeout_seconds in gossip config
  • Health check failures: Review liveness probe configuration
  • Discovery intervals: Check discovery isn’t running too frequently

Cannot broadcast messages to cluster

Symptoms: Broadcast queue errors, messages not propagating

Common Causes & Solutions:

  • Queue not initialized: Check logs for initialization errors
  • No active members: Verify cluster has multiple healthy members
  • Gossip port unreachable: Test connectivity between all nodes
  • Message too large: Check size of broadcast messages

Key log messages to look for:

✅ Successful cluster formation:
- "successfully joined X peers on startup"
- "cluster health: HEALTHY"
- "discovered X nodes"
⚠️ Warning signs:
- "no new nodes discovered"
- "failed to join cluster"
- "cluster health: NOT HEALTHY"
- "node marked as suspect"
❌ Errors:
- "discovery failed"
- "failed to broadcast"
- "timeout waiting for response"
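When tailing logs from several nodes, it can help to bucket lines by the messages listed above. The sketch below is a hypothetical helper that assumes the `X` placeholders in those messages are plain integers; the exact log format is not specified in this document.

```python
import re

# Checked in order, so a line is classified by its most severe match.
PATTERNS = [
    ("error", re.compile(r"discovery failed|failed to broadcast|timeout waiting for response")),
    ("warning", re.compile(
        r"no new nodes discovered|failed to join cluster|"
        r"cluster health: NOT HEALTHY|node marked as suspect"
    )),
    ("ok", re.compile(
        r"successfully joined \d+ peers on startup|"
        r"cluster health: HEALTHY|discovered \d+ nodes"
    )),
]


def classify(line: str):
    """Return "error", "warning", "ok", or None for an unrelated log line."""
    for level, pattern in PATTERNS:
        if pattern.search(line):
            return level
    return None
```

Piping `journalctl -u deepintshield` output through `classify` line by line gives a quick per-node count of warnings versus successful joins.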

Monitor cluster health via HTTP endpoints:

# Check if node is healthy
curl http://localhost:8080/health

# Get cluster status (if exposed)
curl http://localhost:8080/cluster/status

# Expected response shows all cluster members
{
  "local_node": "deepintshield-remote-10101-...",
  "members": 3,
  "healthy_members": 3,
  "cluster_health": "HEALTHY"
}
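For automated checks, the status response can be evaluated in a few lines. This sketch assumes the `/cluster/status` endpoint is exposed and returns the fields shown above; the `cluster_is_healthy` helper and the expected member count are illustrative.

```python
import json


def cluster_is_healthy(payload: str, expected_members: int) -> bool:
    """Check a /cluster/status response body against the expected cluster size."""
    status = json.loads(payload)
    return (
        status.get("cluster_health") == "HEALTHY"
        and status.get("members") == expected_members
        and status.get("healthy_members") == expected_members
    )


# Example with a response body like the one shown above.
sample = '{"members": 3, "healthy_members": 3, "cluster_health": "HEALTHY"}'
print(cluster_is_healthy(sample, expected_members=3))
```

Comparing against an expected size catches the "cluster forms but only has 1 member" case, where each isolated node would otherwise report itself as healthy.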

This clustering implementation ensures DeepIntShield can handle enterprise-scale deployments with high availability, automatic service discovery, and intelligent traffic distribution across any infrastructure.