Long-running sessions can fail after you deploy Vertica on Amazon Elastic Kubernetes Service (EKS) with LoadBalancer as the service type. For example, a query that runs for 70 seconds loses its connection:
dbadmin@v-sc-0:/$ /opt/vertica/bin/vsql -h internal-acc1a79a37984458b9930acd01cba3f5-782667786.us-east-1.elb.amazonaws.com -c "select sleep(70);"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
When the load balancer type is not specified in the YAML file, EKS creates a Classic Load Balancer by default, with a connection idle timeout of 60 seconds. According to the documentation, the aws-load-balancer-connection-idle-timeout annotation can raise this timeout to at most 4000 seconds. But what if a query takes longer than 4000 seconds?
dbadmin@v-sc-0:/$ /opt/vertica/bin/vsql -h internal-acc1a79a37984458b9930acd01cba3f5-782667786.us-east-1.elb.amazonaws.com -c "select sleep(4001);"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost
There is a solution: use a Network Load Balancer (NLB) instead of a Classic Load Balancer. For best performance, Vertica recommends using a Network Load Balancer.
Include the following annotation in the YAML file when you deploy the database to ensure that an NLB is created.
serviceName: LoadbalancerService
serviceType: LoadBalancer
serviceAnnotations:
  service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"
AWS, however, sets the NLB TCP flow idle timeout to 350 seconds by default. This value cannot be changed. Starting with Vertica 11.x, the database supports its own set of keepalive parameters, which can replace the TCP keepalive parameter values. By default, all Vertica keepalive parameters are set to 0, which means that the TCP keepalive settings are used. Adjust the database keepalive settings so that keepalives are sent less than 350 seconds apart. Use the following statements to alter the database's keepalive settings:
ALTER DATABASE DEFAULT SET KeepAliveIdleTime = 300;
ALTER DATABASE DEFAULT SET KeepAliveProbeInterval = 60;
ALTER DATABASE DEFAULT SET KeepAliveProbeCount = 20;
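The reasoning behind these three values can be checked with a little arithmetic. The following Python sketch is only an illustration: the class and method names are hypothetical, and the detection-time estimate assumes the usual TCP keepalive semantics (the first probe is sent after the idle time, then one probe per interval until the probe count is exhausted).

```python
from dataclasses import dataclass

NLB_IDLE_TIMEOUT_S = 350  # NLB TCP flow idle timeout quoted in the text


@dataclass(frozen=True)
class KeepAlive:
    """Hypothetical container for the three database keepalive settings."""
    idle_time: int       # KeepAliveIdleTime: idle seconds before the first probe
    probe_interval: int  # KeepAliveProbeInterval: seconds between unanswered probes
    probe_count: int     # KeepAliveProbeCount: unanswered probes before giving up

    def keeps_flow_alive(self, lb_idle_timeout: int = NLB_IDLE_TIMEOUT_S) -> bool:
        # A value of 0 defers to the operating system's TCP setting, which
        # this sketch cannot see, so it is reported as "not verified".
        if self.idle_time == 0:
            return False
        # The first probe must be sent before the load balancer drops the flow.
        return self.idle_time < lb_idle_timeout

    def dead_peer_detection_time(self) -> int:
        # Approximate seconds of silence before an unresponsive peer is
        # declared lost (assumes standard TCP keepalive behavior).
        return self.idle_time + self.probe_interval * self.probe_count


settings = KeepAlive(idle_time=300, probe_interval=60, probe_count=20)
print(settings.keeps_flow_alive())          # True: 300 s is below the 350 s timeout
print(settings.dead_peer_detection_time())  # 1500: 300 + 60 * 20
```

With an idle time of 300 seconds, a probe crosses the load balancer 50 seconds before the flow would be dropped, which is what keeps an otherwise silent long-running query connected.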
Vertica now completes the sleep test successfully even though it runs for 355 seconds, which is longer than the NLB idle timeout of 350 seconds.
dbadmin@v1-sc-0:/$ date;vsql -h a21a604ba68664737b89a5bf267147f5-7d71000035490e9f.elb.us-east-1.amazonaws.com -c 'select sleep(355)';date
Tue Feb 21 21:17:18 UTC 2023
 sleep
-------
     0
(1 row)

Tue Feb 21 21:23:13 UTC 2023
dbadmin@v1-sc-0:/$
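To repeat the sleep test for several durations, the same vsql invocation can be wrapped in a small script. This is a minimal sketch, not a supported tool: the function names are hypothetical, the vsql path is taken from the console sessions above, and it assumes that vsql either exits with a non-zero status or prints "connection to server was lost" when the load balancer drops the connection.

```python
from __future__ import annotations

import subprocess
import time


def build_sleep_command(host: str, seconds: int,
                        vsql: str = "/opt/vertica/bin/vsql") -> list[str]:
    # Same invocation as in the console sessions, expressed as an argument list.
    return [vsql, "-h", host, "-c", f"select sleep({int(seconds)});"]


def run_sleep_test(host: str, seconds: int) -> tuple[bool, float]:
    """Run the sleep query and return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(build_sleep_command(host, seconds),
                            capture_output=True, text=True)
    elapsed = time.monotonic() - start
    # Assumption: a dropped connection shows up as a non-zero exit status
    # or as the error message seen in the failing sessions.
    lost = "connection to server was lost" in result.stderr
    return result.returncode == 0 and not lost, elapsed
```

For example, calling run_sleep_test with the load balancer host name and 355 should return a success flag of True and an elapsed time of roughly 355 seconds once the keepalive settings are in place.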
Configure TCP keepalive with AWS Network Load Balancer
Limiting the Number and Length of Client Connections
AWS Load Balancer Controller Annotations