SDC - OpenShift Setup
This section outlines the deployment of the DRISTI application on SDC's self-managed OpenShift Container Platform.
Prerequisites:
Knowledge of Kubernetes, Helm, helmfile, and OpenShift
Set up the VPN and log in to the SDC network from your local system
Configure the local system with the following:
kubectl
OpenShift cli
helmfile and the helm-diff plugin
DB & MinIO installed in the VMs provided by SDC
Store all the required images in the container registry whitelisted by SDC.
Processes Followed for SDC Approvals
Steps:
Accessing OpenShift in SDC using VPN
Download Accops Hysecure (VPN client) from this link
Once you receive the VPN credentials from SDC, use them to log in to the VPN client with the server address “evpn.kerala.gov.in”.
Modify /etc/hosts and add the host entries with the details provided by SDC.
Example (UAT environment):
10.2.147.242   console-openshift-console.apps.ocp1.ksdc.in oauth-openshift.apps.ocp1.ksdc.in
10.2.147.240   api.ocp1.ksdc.in
10.2.147.242 is for console access; 10.2.147.240 is for CLI access.
Use the URL and credentials provided for OpenShift by SDC to access the OpenShift Console.
Note: Use the downloaded Accops HySecure client, not the web version. The IPs and host entries differ for each cluster and will be shared by SDC. The console uses private certificates, so the browser will warn that the connection is not secure; proceed and access the URL.
Once VPN access is in place, make sure the VPN user has access to the necessary server IP addresses and ports.
Container Registry:
Our application images are stored in Azure Container Registry (ACR). In SDC only GitHub and Docker Hub are whitelisted by default, so we got the ACR whitelisted as well.
Create a docker-registry secret in the namespace with the ACR credentials and add an imagePullSecrets entry in the deployment file (a sketch follows the note below).
kubectl create secret docker-registry <secret-name> \
--namespace <namespace> \
--docker-server=<container-registry-name>.azurecr.io \
--docker-username=<service-principal-ID> \
--docker-password=<service-principal-password>
Note: Secrets are namespaced Kubernetes API resources, so this secret has to be created in every namespace that pulls images from the ACR.
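A minimal sketch of how the secret is then referenced from a deployment (the secret, container, and image names are placeholders):
spec:
  template:
    spec:
      imagePullSecrets:
        - name: <secret-name>   # the docker-registry secret created above
      containers:
        - name: <container-name>
          image: <container-registry-name>.azurecr.io/<repository>:<tag>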
Virtual Machines:
We are deploying DB and MINIO in Virtual Machines.
Check internet access with curl (ping won't work in SDC).
Check whether the VM is registered with Red Hat by running yum upgrade.
To auto-enable the Red Hat subscription, you may run this command on your VMs:
curl -sSL http://10.5.93.94/subscription/subscribe1.sh | bash
A firewall entry is needed for ports 5432, 9000 & 9001 and the instance IPs so the VMs can communicate with the OpenShift cluster. A quick connectivity check is shown below.
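Since ping is blocked, curl in telnet mode can verify the firewall entries from inside the cluster (for example from any pod); a "Connected" line in the verbose output means the port is reachable (the VM IP is a placeholder):
curl -v --max-time 5 telnet://<vm-ip>:5432
curl -v --max-time 5 telnet://<vm-ip>:9000
curl -v --max-time 5 telnet://<vm-ip>:9001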
Installing Postgres DB.
Postgres version: 15.8
Architecture: master-slave (1 master and 2 slaves)
A virtual IP is assigned to the healthy VM automatically, and repmgr is used to replicate data to both slaves.
Follow this file for DB installation on the VM.
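Once repmgr is set up, a quick sanity check can be run on any DB VM (assuming the repmgr config lives at /etc/repmgr.conf; adjust the path to your installation):
sudo -u postgres repmgr -f /etc/repmgr.conf cluster show
# Expected: one node in the "primary" role and two in the "standby" role, all attached.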
Minio installation:
MinIO is an open-source object storage service that is compatible with AWS S3. It is used to store application assets, uploaded files, PDFs generated by the system, etc., and serves as an alternative to NFS (Network File System) services.
Install MinIO using the rpm package manager
Version details: minio version RELEASE.2024-08-03T04-33-23Z (commit-id=6efb56851c40da88d1ca15112e2d686a4ecec6b3)
Useful commands on the MinIO server:
mc alias set local http://<ip address>:9000 <access-key> <secret-key>
mc mb local/uat
# uat is bucket name
mc admin user add local accesskey secretaccesskey
for policy in $(mc admin policy list local); do mc admin policy attach local $policy --user <username/accesskey>; done
mc anonymous set download local/uat
Note: Two VMs are assigned to the MinIO service in an active-passive configuration, with the passive node taking over in case the active node fails. A virtual IP directs requests to the running VM, and rsync keeps the data in sync between the two VMs.
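A minimal sketch of the rsync step, assuming the MinIO data directory is /data/minio and the passive VM is reachable as <passive-vm-ip> (both are placeholders; adjust to the actual paths and hosts):
# Run periodically (e.g. via cron) on the active node to mirror object data to the passive node
rsync -az --delete /data/minio/ <user>@<passive-vm-ip>:/data/minio/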
Installation Process:
Login to OpenShift cluster Web console with the credentials provided by SDC
Get the namespaces/projects created by the admin; share the SCC and ClusterRole manifest files with the cluster admin and get them deployed.
Log in to the OpenShift CLI using a token generated from the OpenShift console.
Clone the DIGIT-Openshift repo to the system where VPN access is present.
Navigate to the respective directory and run the helmfile -f <filename> apply command.
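For example (the token, directory, and helmfile name are placeholders; 6443 is the default OpenShift API port):
oc login --token=<token> --server=https://api.ocp1.ksdc.in:6443
cd <DIGIT-Openshift-repo>/<directory>
helmfile -f <filename> diff    # preview changes (needs the helm-diff plugin)
helmfile -f <filename> apply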
Note: For monitoring we will use the existing Prometheus, Grafana, and Elasticsearch integrated with the OpenShift cluster rather than the DIGIT monitoring tools, which require access to the nodes.
Steps:
Cluster Admin Access:
As the cluster in SDC hosts many applications, we are given restricted access that lets us deploy Kubernetes objects only in our own namespaces. Most of the objects we need are namespaced resources; for cluster-scoped resources (e.g. a ClusterRole covering our namespaces or the whole cluster) or for the creation of namespaces, admin access is needed. The deployments for which cluster access is required are as follows:
Namespaces:
In OpenShift, Kubernetes namespaces are termed projects. Only the cluster admin can create them.
SCC:
Security Context Constraints (SCC): RBAC grants access (get, list, create, etc.) to Kubernetes resources such as pods, deployments, and services for users or service accounts, whereas SCCs grant the application running in a pod access to underlying host resources such as host volumes and the host network.
By default a pod is associated with the restricted SCC; based on the application's requirements we can assign an existing SCC or create a new one, then grant the SCC to a service account using a role (see the sketch at the end of this subsection).
Get the existing SCC YAML deployed by the cluster admin.
Note: If a pod is not getting deployed due to lack of permissions, use this command to find out which SCC would provide the required access:
kubectl get pod <pod-name> -o yaml | oc adm policy scc-subject-review -f -
To check which SCC a pod is associated with: kubectl describe pod <pod-name> | grep scc
To check whether we have access to create a particular resource:
kubectl auth can-i <verb> <resource>
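For example, to grant an existing SCC (anyuid here, purely as an illustration) to a workload's service account, either of these sketches can be used (all names are placeholders):
# Option 1: direct grant by the cluster admin
oc adm policy add-scc-to-user anyuid -z <service-account> -n <namespace>
# Option 2: a role that allows "use" of the SCC, bound to the service account
oc create role use-anyuid --verb=use --resource=securitycontextconstraints.security.openshift.io --resource-name=anyuid -n <namespace>
oc create rolebinding use-anyuid --role=use-anyuid --serviceaccount=<namespace>:<service-account> -n <namespace>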
Nginx:
This deployment requires the creation of a ClusterRole that provides access to configmaps, pods, and secrets across the application namespaces. Refer to the ingress deployment file for more info.
The cluster may contain more than one ingress controller, so make sure to use distinct controller-class and ingress-class values. The ingress objects of your application should also carry the annotation for your ingress-class:
kubernetes.io/ingress.class: "nginx-prod"
In the ingress controller configmap, add the below:
ssl-redirect: 'false'
worker-processes: '24'
allow-snippet-annotations: 'true'
In the nginx.conf file (under /etc/nginx in the ingress controller pod), worker_processes is set to auto by default, i.e. it tries to use all available processors, which causes an error here: the ingress comes up but serves requests inconsistently, processing some and dropping others. To avoid this, set worker_processes explicitly (24, 16, or 8) via the worker-processes key above.
To avoid a continuous redirect loop, ssl-redirect should be set to false.
Note: Whenever another nginx instance is deployed from the same manifest file, it is necessary to modify the labels as well, along with other details such as the namespace.
Nginx Rate-Limiting and Security Headers:
This helps protect the application against DDoS attacks by limiting the rate of HTTP requests a user can make in a given period of time, e.g. 100 requests per minute.
The annotations below achieve rate limiting and set security headers (to mitigate common web-security threats):
nginx.ingress.kubernetes.io/limit-burst-multiplier: '1'
nginx.ingress.kubernetes.io/limit-rpm: '500'
nginx.ingress.kubernetes.io/limit-connections: '10'
nginx.ingress.kubernetes.io/configuration-snippet: |
  more_set_headers "X-Content-Type-Options: nosniff";
  more_set_headers "X-Frame-Options: DENY";
  more_set_headers "X-Xss-Protection: 1; mode=block";
  limit_req_status 429;
  more_set_headers "Strict-Transport-Security: max-age=31536000; includeSubDomains; preload";
  more_set_headers "Content-Security-Policy: object-src 'self'; media-src 'self'; frame-ancestors 'none'; base-uri 'self'; frame-src 'self'; worker-src 'self'; manifest-src 'self';";
Gateway Rate-Limiting:
Requests can be limited at the nginx level and at the Spring Cloud Gateway level as well. Deploy the latest gateway image and ensure the following values are present on every Kubernetes Service:
service:
  additionalAnnotations: |
    gateway-burstCapacity: "5"
    gateway-keyResolver: "ipKeyResolver"
    gateway-replenishRate: "5"
Spring Cloud Gateway uses the token bucket algorithm: burstCapacity is the maximum number of tokens the bucket can hold (the most requests allowed in a single second), and replenishRate is the number of tokens added to the bucket per second.
With a replenish rate of 5 tokens per second, the service allows 5 × 60 = 300 requests per minute.
Cert-Manager-service:
The Kubernetes API server has default API resources such as deployments, services, and persistent volumes; for any API resource that is not built in, a Custom Resource Definition (CRD) object has to be created (cert-manager ships its own CRDs such as Certificate and Issuer).
This service takes care of automatically renewing certificates, creating challenges, etc., and generally uses Let's Encrypt as the certificate-issuing authority.
Note: In our case this service may not be needed, as the certificates are procured by the Kerala High Court; we just need to create a TLS secret and reference it from the tls section of the ingress object, as sketched below.
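A minimal sketch of that approach, assuming the certificate and key received from the High Court are in tls.crt and tls.key and the secret is named oncourts-tls (placeholder names):
kubectl create secret tls oncourts-tls --cert=tls.crt --key=tls.key -n <namespace>
# Referenced from the ingress object:
spec:
  tls:
    - hosts:
        - oncourts.kerala.gov.in
      secretName: oncourts-tls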
Insights and Resolutions:
Accessibility:
We got 2 public IP addresses assigned by SDC: one for incoming traffic (ingress) and the other for outgoing traffic, mapped to the egress of our application.
Ping command won't work in Kerala SDC network
To check public access use curl
Route: a feature in OpenShift which exposes a service to make it public.
Ingress: Whenever we create a route, whether with the default or a custom host, add a host entry in /etc/hosts against the console IP address (10.2.147.242 in UAT).
Domain name: We created a route to expose the ingress load balancer service using the default endpoint (route), but this endpoint is reachable from internal networks only. So an "A" record is added in the DNS server for oncourts.kerala.gov.in against the incoming public address, and then a CNAME record maps this domain name to the default route created.
Note:
Instead of using the default route and making 2 entries in the DNS server, we used a custom route and mapped the route URL to the ingress public IP address.
Ingress Controller:
As the application is hosted on-premises, there is no LoadBalancer service to expose the ingress controller. Normally we would use a NodePort service plus a proxy server to avoid typing the port every time, but in OpenShift we can do this by creating a route (exposing the service).
Use the passthrough route type, as it sends the HTTPS request to the router and then on to the ingress controller still as HTTPS.
Don't use the edge type, as HTTPS traffic is TLS-terminated at the router and forwarded to the ingress controller as plain HTTP (an HTTP request would then be sent to the HTTPS port).
With the tls section in the ingress object, the ingress controller terminates TLS for that host and forwards the traffic onward as HTTP; all communication within the application is HTTP.
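A sketch of creating such a passthrough route, assuming the ingress controller service is named ingress-nginx-controller (a placeholder; use the actual service name and namespace):
oc create route passthrough oncourts \
  --service=ingress-nginx-controller \
  --hostname=oncourts.kerala.gov.in \
  --port=https \
  -n <namespace>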
Networking issue:
Whenever an egress IP is created it is associated with a particular master node even though it applies to the entire cluster. Let's assume our egress IP is associated with master node 1 (mn1) and some of our services (egov) are deployed on master node 2 (mn2). If the services on mn2 want to reach the internet, the request should go from mn2 to mn1 and then out to the internet.
We can check this by installing tcpdump on the master nodes and inspecting the traffic:
Get the interface name from ip a
tcpdump -neli <interface> port 65530
Send your request with --local-port 65530, for example:
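A possible check from a pod or node being verified (ifconfig.me is just a convenient echo service; any whitelisted external endpoint works):
curl -s --local-port 65530 https://ifconfig.me
# The IP printed should be the egress IP, and the tcpdump on the egress node should show this traffic.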
Issues faced:
Requests were going out directly from mn2 while the egress IP was associated with mn1; as a result, at the receiver's end we appeared with a different IP rather than the one assigned to us and whitelisted by the third party.
The egress IP was created but not whitelisted in SDC; the services on the nodes were not able to reach the internet, while the nodes themselves could still do things like upgrades.
ElasticSearch:
Add an init container script to update the file permissions that prevent the pod from coming up, in both the master and data statefulsets (a sketch follows below).
In the deployment, provide both resource requests and limits, which helps avoid OOM (out of memory) errors and the resulting restarts.
Make sure this variable is present in the statefulset deployment file:
- name: ELASTIC_USERNAME
  valueFrom:
    secretKeyRef:
      name: elasticsearch-master-credentials
      key: username
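A minimal init container sketch for the permissions fix mentioned above, assuming the chart's default data path /usr/share/elasticsearch/data, the elasticsearch UID/GID 1000, and a placeholder data volume name (all assumptions; requires an SCC that permits running as root):
initContainers:
  - name: fix-permissions
    image: busybox
    command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
    securityContext:
      runAsUser: 0
    volumeMounts:
      - name: <data-volume-name>
        mountPath: /usr/share/elasticsearch/data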
SSL-Certificate:
Generated the SSL certificate manually using Let's Encrypt.
Command: sudo certbot certonly --manual --preferred-challenges=http --email <email-details> --server https://acme-v02.api.letsencrypt.org/directory --agree-tos --manual-public-ip-logging-ok -d <"Domain - name ">
Approach:
Deployed a pod with a network multitool image (praqma/network-multitool), which helps in troubleshooting networking issues.
Deployed a temporary nginx pod along with a service, and modified the root ingress of the application to point requests to the temp-nginx service to make it accessible.
Make sure the domain name is accessible (you can curl it) from the network pod.
Run the certbot command in the network pod; it will create an HTTP-01 challenge (a file name and a token string). Copy the text generated.
Exec into the nginx-temp pod and follow the steps below:
cd /usr/share/nginx/html/
mkdir .well-known
cd .well-known/
mkdir acme-challenge
cd acme-challenge/
nano <file-name-string>
## ADD the full string into the file. ##
nano /etc/nginx/conf.d/default.conf
## add this in the server block ##
location ^~ /.well-known {
allow all;
alias /usr/share/nginx/html/.well-known/;
}