Cloud architecture refers to how technologies and components are built in a cloud environment. A cloud environment comprises a network of servers that are located in various places globally, and each serves a specific purpose. With the growth of cloud computing and cloud-native development, modern development practices are constantly changing to adapt to this rapid evolution. This Zone offers the latest information on cloud architecture, covering topics such as builds and deployments to cloud-native environments, Kubernetes practices, cloud databases, hybrid and multi-cloud environments, cloud computing, and more!
API Gateway is the AWS service that allows interfacing an application's back-end with its front-end. The figure below shows an example of such an application, consisting of a web/mobile-based front-end and a back-end residing in a REST API, implemented as a set of serverless Lambda functions, as well as a number of legacy services. The figure above illustrates the so-called Legacy API Proxy design pattern, as described by Peter Sbarski, Yan Cui, and Ajay Nair in their excellent book Serverless Architectures on AWS (Manning, 2022). This pattern covers the use case where Amazon API Gateway and Lambda are employed together to create a new API layer over legacy APIs and services, in order to adapt and reuse them. In this design, the API Gateway exposes a REST interface invoking Lambda functions which, in turn, modify the requests and the responses or transform data to legacy-specific formats. This way, legacy services may be consumed by modern clients that don't support older protocols.

This can be done, of course, using the AWS Console: select the API Gateway service and, in the proposed GUI (Graphical User Interface), browse among the dozens of possible options until, about one hour later, you arrive at a functional skeleton. And when our API specification changes, e.g., several times per month, we need to start again from the beginning. We won't proceed that way. We will rather adopt an IaC (Infrastructure as Code) approach, consisting in defining our API in a repeatable and deterministic manner. This could be done in several ways, via a script-based automation process using, for example, the AWS CLI (Command Line Interface), CloudFormation, or Terraform. But there is another interesting alternative that many developers prefer: OpenAPI. And it's this alternative that we chose to use here, as shown further.

Designing the REST Interface With OpenAPI

In 2011, Swagger, a set of utilities dedicated to the creation and documentation of RESTful services, appeared; it was later taken over by SmartBear Software, a company specializing in testing and monitoring tools. Several years later, in November 2015, under the auspices of the Linux Foundation, this same company announced the creation of a new organization named the OpenAPI Initiative. Other major players, like Google and IBM, committed as founding members. In January 2016, the Swagger specification was renamed the OpenAPI Specification. OpenAPI is a formalism based on the YAML notation, which can also be expressed in JSON. It aims at defining REST APIs in a language-agnostic manner. There are currently a lot of tools around OpenAPI, and our goal here isn't to look extensively at all the possibilities they open to us. One of the most common use cases is probably to log in to the SwaggerHub online service, create a new API project, export the resulting YAML file, and use it in conjunction with the SAM (Serverless Application Model) tool in order to expose the given API via Amazon API Gateway. And since we need to illustrate the modus operandi described above, let's consider the use case of a money transfer service named send-money. This service, as its name clearly shows, is responsible for performing bank account transfers.
It exposes a REST API whose specifications are presented in the table below:

| Resource | HTTP Request | Action | Java Class |
| --- | --- | --- | --- |
| /orders | GET | Get the full list of the currently registered orders | GetMoneyTransferOrders |
| /orders | POST | Create a new money transfer order | CreateMoneyTransferOrder |
| /orders | PUT | Update an existing money transfer order | UpdateMoneyTransferOrder |
| /orders/{ref} | GET | Get the money transfer order identified by its reference, passed as an argument | GetMoneyTransferOrder |
| /orders/{ref} | DELETE | Remove the money transfer order identified by its reference, passed as an argument | RemoveMoneyTransferOrder |

This simple use case, consisting of CRUD (Create, Read, Update, Delete) operations exposed as a REST API, is the one we chose to implement here in order to illustrate the scenario described above. Here are the required steps:

Go to the Send Money API on SwaggerHub. Here you'll find an already prepared project showing the OpenAPI specification of the REST API defined in the table above. This is a public project and, in order to get access, one doesn't need to register and log in. You'll be presented with a screen similar to the one in the figure below. This screen shows in its left pane the OpenAPI description of our API. Once again, the full explanation of the OpenAPI notation is out of our scope here, as this topic could be the subject of an entire book, like the excellent one by Joshua S. Ponelat and Lukas L. Rosenstock, titled Designing APIs with Swagger and OpenAPI (Manning, 2022). The right pane of the screen presents schematically the HTTP requests of our API and allows, among other things, testing it. You may spend some time browsing this part of the screen by clicking the button labeled with an HTTP request and then selecting Try it out. Notice that these tests are simulated, of course, as there is no concrete implementation behind them. However, they allow you to make sure that the API is correctly defined, from both a syntactic and a semantic point of view.

Now that you've finished playing with the test interface, you can use the Export -> Download API -> YAML Resolved function located in the screen's rightmost upper corner to download our API's OpenAPI definition in YAML format. In fact, you don't really have to do that, because you can find this same file in the Maven project used to exemplify this blog post.

Let's now have a quick look at this YAML file. The first thing we notice is the declaration openapi:, which defines the version of the notation that we're using: in this case, 3.0.0. The section labeled info: identifies general information like the API name, its author, the associated contact details, etc. The next element, labeled servers:, defines the auto-mocking function. It allows us to run the simulated tests outside the SwaggerHub site: just copy the URL declared here and use it with your preferred browser. Last but not least, we have the element labeled paths:, where our API endpoints are defined. There are two such endpoints: /orders and /orders/{ref}. For each one, we define the associated HTTP requests, their parameters, as well as the responses, including the HTTP headers. OpenAPI is an agnostic notation and, consequently, it isn't bound to any specific technology, framework, or programming language. However, AWS-specific extensions are available. One of these extensions is x-amazon-apigateway-integration, which allows a REST endpoint to connect to the API Gateway.
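For illustration, an integration element in an endpoint definition looks roughly like this; note that this is a minimal sketch, and the account ID in the Lambda ARN is a placeholder rather than a value from the project:

YAML
paths:
  /orders:
    get:
      # AWS extension wiring this endpoint to a Lambda function through the
      # API Gateway proxy integration; the account ID (123456789012) is a placeholder.
      x-amazon-apigateway-integration:
        type: aws_proxy
        httpMethod: POST   # Lambda invocations are always POSTed by API Gateway
        uri: arn:aws:apigateway:eu-west-3:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-west-3:123456789012:function:MoneyTransferOrderFunction/invocations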
As you can see looking at the OpenAPI YAML definition, each endpoint includes an element labeled x-amazon-apigateway-integration which declares, among other things, the URL of the Lambda function to which the call will be forwarded.

The Project

OK, we have an OpenAPI specification of our API. In order to generate an API Gateway stack out of it and deploy it on AWS, we will use SAM, as explained above. For more details on SAM and how to use it, please don't hesitate to have a look here. Our Java project containing all the required elements may be found here. Once you've cloned it from GitHub, open the file template.yaml. We reproduce it below:

YAML
AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Send Money SAM Template
Globals:
  Function:
    Runtime: java11
    MemorySize: 512
    Timeout: 10
    Tracing: Active
Parameters:
  BucketName:
    Type: String
    Description: The name of the S3 bucket in which the OpenAPI specification is stored
Resources:
  SendMoneyRestAPI:
    Type: AWS::Serverless::Api
    Properties:
      Name: send-money-api
      StageName: dev
      DefinitionBody:
        Fn::Transform:
          Name: AWS::Include
          Parameters:
            Location:
              Fn::Join:
                - ''
                - - 's3://'
                  - Ref: BucketName
                  - '/openapi.yaml'
  MoneyTransferOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: MoneyTransferOrderFunction
      CodeUri: send-money-lambda/target/send-money.jar
      Handler: fr.simplex_software.aws.lambda.send_money.functions.MoneyTransferOrder::handleRequest
      Events:
        GetAll:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: GET
        Get:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders/{ref}
            Method: GET
        Create:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: POST
        Update:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: PUT
        Delete:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders/{ref}
            Method: DELETE
  ConfigLambdaPermissionForMoneyTransferOrderFunction:
    Type: "AWS::Lambda::Permission"
    DependsOn:
      - SendMoneyRestAPI
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref MoneyTransferOrderFunction
      Principal: apigateway.amazonaws.com

Our template.yaml file will create an AWS CloudFormation stack containing an API Gateway. This API Gateway will be generated from the OpenAPI specification that we just discussed. The DefinitionBody element in the SendMoneyRestAPI resource says that the API's endpoints are described by the file named openapi.yaml, located in an S3 bucket whose name is passed as an input parameter. The idea here is that we need to create a new S3 bucket, copy into it our OpenAPI specification in the form of a YAML file, and use this bucket as an input source for the AWS CloudFormation stack containing the API Gateway. A Lambda function named MoneyTransferOrderFunction is defined in this same SAM template as well. The CodeUri parameter configures the location of the Java archive containing the associated code, while the Handler one declares the name of the Java method implementing the AWS Lambda request handler. Last but not least, the Events section sets the HTTP requests that our Lambda function serves.
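For reference, here is a rough sketch of what such a request handler might look like, using the standard aws-lambda-java-events types; this is a hypothetical illustration, not the actual class from the repository:

Java
package fr.simplex_software.aws.lambda.send_money.functions;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;

// Hypothetical handler: API Gateway forwards every /orders request here as a
// proxy event, and the handler dispatches on the HTTP method.
public class MoneyTransferOrder implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>
{
  @Override
  public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent input, Context context)
  {
    switch (input.getHttpMethod())
    {
      case "GET":
        // GET /orders returns the full list; GET /orders/{ref} a single order.
        return new APIGatewayProxyResponseEvent().withStatusCode(200).withBody("[]");
      case "POST":
      case "PUT":
      case "DELETE":
        // Create, update, or remove a money transfer order.
        return new APIGatewayProxyResponseEvent().withStatusCode(200);
      default:
        return new APIGatewayProxyResponseEvent().withStatusCode(405);
    }
  }
}

Presumably, the real implementation delegates to the Java classes listed in the table at the beginning of this post (GetMoneyTransferOrders, CreateMoneyTransferOrder, and so on).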
As you can see in the template, there are five endpoints, labeled as follows (each defined in the OpenAPI specification):

GetAll mapped to the GET /orders operation
Get mapped to the GET /orders/{ref} operation
Create mapped to the POST /orders operation
Update mapped to the PUT /orders operation
Delete mapped to the DELETE /orders/{ref} operation

To build and deploy the project, proceed as shown in the listing below:

Shell
$ mkdir test-aws
$ cd test-aws
$ git clone https://github.com/nicolasduminil/aws-showcase
...
$ mvn package
...
$ ./deploy.sh
...
make_bucket: bucketname-3454
upload: ./open-api.yaml to s3://bucketname-3454/openapi.yaml
Uploading to 73e5d262c96743505970ad88159b929b  2938384 / 2938384  (100.00%)

Deploying with following values
===============================
Stack name            : money-transfer-stack
Region                : eu-west-3
Confirm changeset     : False
Disable rollback      : False
Deployment s3 bucket  : bucketname-3454
Capabilities          : ["CAPABILITY_IAM"]
Parameter overrides   : {"BucketName": "bucketname-3454"}
Signing Profiles      : {}

Initiating deployment
=====================
Uploading to b0cf548da696c5a94419a83c5088de48.template  2350 / 2350  (100.00%)
Waiting for changeset to be created..
CloudFormation stack changeset
...
Successfully created/updated stack - money-transfer-stack in eu-west-3
Your API with ID mtr6ryktjk is deployed and ready to be tested at https://mtr6ryktjk.execute-api.eu-west-3.amazonaws.com/dev

In this listing, we start by cloning the Git repository containing the project. Then, we execute a Maven build, which packages the Java archive named send-money.jar, after having performed some unit tests. The script deploy.sh, as its name implies, is responsible for performing the deployment operation. Its code is reproduced below:

Shell
#!/bin/bash

RANDOM=$$
BUCKET_NAME=bucketname-$RANDOM
STAGE_NAME=dev
AWS_REGION=$(aws configure list | grep region | awk '{print $2}')

aws s3 mb s3://$BUCKET_NAME
echo $BUCKET_NAME > bucket-name.txt
aws s3 cp open-api.yaml s3://$BUCKET_NAME/openapi.yaml
sam deploy --s3-bucket $BUCKET_NAME --stack-name money-transfer-stack --capabilities CAPABILITY_IAM --parameter-overrides BucketName=$BUCKET_NAME
aws cloudformation wait stack-create-complete --stack-name money-transfer-stack
API_ID=$(aws apigateway get-rest-apis --query "items[?name=='send-money-api'].id" --output text)
aws apigateway create-deployment --rest-api-id $API_ID --stage-name $STAGE_NAME >/dev/null 2>&1
echo "Your API with ID $API_ID is deployed and ready to be tested at https://$API_ID.execute-api.$AWS_REGION.amazonaws.com/$STAGE_NAME"

Here, $$ is the shell variable that expands to the current process ID. Assigning it to RANDOM seeds Bash's pseudo-random number generator, so that $RANDOM then yields a random number. By appending this randomly generated number to the name of the S3 bucket that will store the OpenAPI specification file, we satisfy the bucket name's global uniqueness condition. This bucket name is then stored in a local file, such that it can later be retrieved and cleaned up. Notice also the aws configure command, used to get the current AWS region. The command aws s3 mb creates the S3 bucket; here, mb stands for make bucket. Once the bucket is created, we use it to store the open-api.yaml file containing the API specification. This is done with the aws s3 cp command. Now, we are ready to start the deployment process, which is done through the sam deploy command. Since this operation might take a while, we need to wait until the AWS CloudFormation stack is completely created before continuing.
This is done by the statement aws cloudformation wait, as shown in the listing above. The last operation is the deployment of the previously created API Gateway, done by running the aws apigateway create-deployment command. Here we need to pass, as an input parameter, the API Gateway identifier, retrieved using the aws apigateway get-rest-apis command, which returns information about all the current API Gateways. Then, using the --query option, we filter the JSON payload in order to find ours, named send-money-api. At the end of its execution, the script displays the URL of the newly created API Gateway. This is the URL that can be used for testing purposes. For example, you may use Postman, if you have it installed, or simply the AWS Console, which provides a nice and intuitive test interface.

If you decide to use the AWS Console, you need to select the API Gateway service, and you'll be presented with the list of all currently existing ones. Clicking on the one named send-money-api will display the list of the endpoints to be tested. You need to start, of course, by creating a new money transfer order. You can do this by pasting the JSON payload below into the request body:

JSON
{
  "amount": 200,
  "reference": "reference",
  "sourceAccount": {
    "accountID": "accountId",
    "accountNumber": "accountNumber",
    "accountType": "CHECKING",
    "bank": {
      "bankAddresses": [
        {
          "cityName": "poBox",
          "countryName": "countryName",
          "poBox": "cityName",
          "streetName": "streetName",
          "streetNumber": "10",
          "zipCode": "zipCode"
        }
      ],
      "bankName": "bankName"
    },
    "sortCode": "sortCode",
    "transCode": "transCode"
  },
  "targetAccount": {
    "accountID": "accountId",
    "accountNumber": "accountNumber",
    "accountType": "CHECKING",
    "bank": {
      "bankAddresses": [
        {
          "cityName": "poBox",
          "countryName": "countryName",
          "poBox": "cityName",
          "streetName": "streetName",
          "streetNumber": "10",
          "zipCode": "zipCode"
        }
      ],
      "bankName": "bankName"
    },
    "sortCode": "sortCode",
    "transCode": "transCode"
  }
}

If the status code appearing in the AWS Console is 200, the operation has succeeded, and you can now test the two GET operations: the one retrieving all the existing money transfer orders and the one getting a money transfer order identified by its reference. For the latter, you need to initialize the input parameter of the HTTP GET request with the value of the money transfer order reference which, in our test, is simply "reference". In order to test the PUT operation, just paste into its body the same JSON payload used previously to test the POST, and slightly modify it; for example, change the amount from 200 to 500. Now test the two GET operations again: they should retrieve the newly updated money transfer order, this time with an amount of 500. When you've finished playing with the AWS Console interface, test the DELETE operation, passing the same reference as its input parameter. After that, the two GET operations should return an empty result set.

If you're tired of using the AWS Console, you can switch to the provided integration test. First, you need to open the FunctionsIT class in the send-money-lambda Maven module. Here, you need to make sure that the static constant named AWS_GATEWAY_URL matches the URL displayed by the deploy.sh script. Then compile and run the integration tests as follows:

Shell
mvn test-compile failsafe:integration-test

You should see statistics showing that all the integration tests have succeeded.
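You can also smoke-test the deployed API directly from the command line; here is a minimal sketch, assuming the API ID, region, and stage printed by deploy.sh in the listing above:

Shell
# Retrieve all registered money transfer orders (an empty list right after deployment).
$ curl https://mtr6ryktjk.execute-api.eu-west-3.amazonaws.com/dev/orders

Have fun!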
The advent of the Internet has brought revolutionary changes to the IT world. One of the notable changes is that virtualization has advanced with the Internet to become an integral part of the IT infrastructure of modern organizations. As a result, companies now rely on the virtual online entity housing data and services, commonly referred to as the cloud. The switch to the cloud was brought on by the exponential data growth of the last couple of decades. In fact, studies predict that by 2025, the cloud will be storing up to 100 zettabytes of data.

What Is the Cloud?

The cloud refers to a global network of remote servers, each with a unique function, that are connected and work together as a unitary ecosystem. In simple terms, the cloud describes what we commonly know as the "internet." This remote network of servers is designed to store and manage data, run applications, or deliver content or a service, such as streaming videos or accessing social media networks, for anyone with an internet connection.

What Is Cloud Computing?

Cloud computing is the provision of computing resources such as servers, storage, databases, networking, software, analytics, and intelligence over the cloud (internet). It eliminates the need for enterprises to acquire, configure, or manage resources themselves; instead, they only pay for what they use. Virtual computers gained popularity in the 1990s, when the IT industry started to rent virtual private networks. Their use sped up the development of the cloud computing infrastructure that organizations use today. Cloud computing offers a variety of benefits for businesses, some of the key ones being:

Flexible resources
Cost savings
Scalability with growing business needs
Data recovery
Security

With that being said, there are four main types of cloud computing deployments:

Public Cloud - An open infrastructure for general public usage.
Private Cloud - Computing infrastructure that's exclusively used by a single organization.
Hybrid Cloud - A combination of private and public cloud infrastructures.
Community Cloud - A collaborative cloud infrastructure shared by a community of organizations with similar requirements and regulations.

Single and multi-cloud concepts come from employing these deployment types from either one or numerous vendors.

What Is a Single Cloud?

Single cloud is a cloud computing model where organizations rely on a single third-party vendor for their cloud computing services. The provider maintains the servers that deliver any of the following cloud services in the single-cloud environment:

Software-as-a-Service (SaaS) - A software on-demand service allowing users to utilize cloud-based applications such as email.
Infrastructure-as-a-Service (IaaS) - Provides computing resources hosted on the cloud. Amazon Web Services (AWS) is a famous IaaS example.
Platform-as-a-Service (PaaS) - Offers a development and deployment environment hosted on a provider's cloud infrastructure. A good example in this category is Google App Engine.

Single Cloud Use Cases

The single cloud strategy is suitable for companies with the following use cases:

Strict organizational regulations are in place for data and workload governance.
There are not enough skilled cloud engineers for efficient cloud workload management.
The cloud workload is small enough for a single provider to manage.

Single Cloud Strategy Advantages

It is easier to manage, as it does not require workload migration between multiple cloud providers.
Privacy and control are maintained.
Needs limited resources in terms of cloud engineering staffing, as well as managing vendor relationships.
Faster workload handling with a single provider.
Reduced risk of data inconsistencies.
Easier to hold a single vendor accountable in case of any cloud issues.

Single Cloud Strategy Disadvantages

Hard to avoid vendor lock-in with single-platform dependencies.
It can cost more to have all workloads managed by a single vendor.
Choosing the right vendor is difficult, as a single provider has limited cloud resources and flexibility in design.
Risk of cloud resource unavailability due to any cloud issue that results in a single point of failure.

What Is Multi-Cloud?

Multi-cloud describes a cloud computing model where organizations use multiple cloud providers for their infrastructure requirements. The name multi-cloud refers to the use of multiple cloud providers, accounts, availability zones, premises, or a combination of them.

Multi-Cloud Use Cases

The multi-cloud strategy is suitable for companies with the following use cases:

You are unable to fulfill business requirements with a single cloud.
Multi-cloud meets the proximity requirements of your globally distributed users and service requirements in different regions.
The workload is big, varying, and needs to be distributed, which calls for specific cloud services.
The regulations you are subject to require keeping some data in private clouds for security reasons.

Multi-Cloud Strategy Advantages

Organizations consider a multi-cloud environment for the following benefits:

It is a creative approach to simultaneously executing disparate workloads that offers customizable and flexible cloud services.
Organizations spend less by moving workloads between multiple clouds offering the required services at the best prices.
You can switch vendors to ensure data availability, reducing vulnerability to cloud issues.
Having multiple vendors reduces vendor dependencies and saves you from being locked into a single vendor.
Multiple cloud providers in different deployment regions enable you to meet data sovereignty requirements for global cloud services. This minimizes concerns about non-compliance with government regulations.

Multi-Cloud Strategy Disadvantages

The multi-cloud model comes with the following disadvantages:

Multi-cloud management can get complicated due to issues such as multi-vendor management, cloud computing inconsistencies and inefficiencies, as well as task redundancies.
Data migration between multiple cloud vendors can have cost overheads and slow down performance.
Workload implementation can be inconsistent due to distribution among multiple clouds.
Companies require extensive cloud engineering expertise to manage multi-cloud computing.
Single Cloud vs. Multi-Cloud: The Key Differences

This table gives you a side-by-side comparison of the single cloud vs. multi-cloud strategies:

| Differences | Single Cloud | Multi-Cloud |
| --- | --- | --- |
| Vendors | Single vendor dependency | Multiple vendors offering more control |
| Cost | Payment to one provider | Payment to multiple providers |
| Purpose | Provides a single service | Handles multiple services with multiple solutions |
| Required Skillset | Fewer cloud engineers required to manage the cloud | Requires extensive cloud engineering teams with strong multi-cloud expertise |
| Security | Easier to ensure data compliance | Less secure with distributed sensitive data |
| Disaster Recovery | Single point of failure, making it vulnerable to disasters | Easier disaster recovery |
| Management | Easier management | Complex management |

The Cloud Portability Myth Under the Multi-Cloud Model and Potential Workarounds

Migrating cloud services in a multi-cloud environment is always vulnerable to disruption. Cloud portability potentially reduces this vulnerability by facilitating the transfer of services between cloud environments with minimal disruption. While cloud portability may seem practical, some underlying complexities render this concept mythical. Essentially, cloud environments are migrated as compiled containers, which makes an entire cloud environment portable. However, while the containers may be portable, other public clouds cannot execute them without the underlying cloud-native services. Consequently, migrating this way defeats the purpose of employing a multi-cloud strategy.

Achieving cloud portability may be complex, but companies still opt for the multi-cloud strategy to keep up with their competitors. The key is to find a way to work around this myth to run your multi-cloud models successfully. A trial-and-error approach would be to make multiple copies of compiled containers, one for each cloud environment. The container copy that offers the correct solution passes for deployment on the other cloud platforms. Alternatively, you can use a Platform-as-a-Service option to provide portable services that are not dependent on specific cloud platforms. This makes migrating such an application platform achievable for organizations.

Single Cloud vs. Multi-Cloud Strategy: Which Is Better?

When it comes to single cloud vs. multi-cloud strategies, businesses are increasingly adopting the multi-cloud model. This strategy is favored as it allows you to work globally, with data and applications spread across various cloud servers and data centers. However, such a model only suits large organizations, because setting up and maintaining a multi-cloud environment is a costly and complex task. Additionally, multi-cloud environments require extensive resources and robust strategies to optimize cloud migration. It is important to note that despite the use of optimized strategies, cloud portability still remains a myth for multi-cloud organizations. Primarily, at some point, your cloud portability workarounds are bound to become too complex to manage. These complexities include:

Lack of knowledgeable staff
Absence of holistic disaster management
Security gaps

Are all these complexities worth investing in a multi-cloud strategy? The answer depends on your company's use cases. However, another key consideration in this case is focusing on choosing the "right vendor" on top of debating the single cloud vs. multi-cloud strategies, as it is vital to finding the best solution for your business.
Conclusion

Depending on your use case, being locked to a single vendor can do an organization more good than delving into multi-vendor complexities; the opposite is also true. To sum it up, instead of working around a myth, cloud optionality gives you a better chance of adopting a successful cloud strategy. While it may prolong the vendor selection process, determining whether a single cloud or a multi-cloud strategy is right for your business can save your company from costly cloud expenses.
In today's rapidly evolving technology landscape, cloud infrastructure has become an indispensable part of modern business operations. To manage this complex infrastructure, documenting its setup, configuration, and ongoing maintenance is critical. Without proper documentation, it becomes challenging to scale the infrastructure, onboard new team members, troubleshoot issues, and ensure compliance. At Provectus, I have witnessed the advantages of handing over projects with proper documentation and how it allows a successful transition and preserves customer satisfaction. Whether you are an active engineer, an engineering team leader, or a demanding user of cloud infrastructure, in this article, I will help you understand the importance of documentation and offer some easy steps for implementing best practices.

Why Is Documentation Important?

Documentation is a key feature that allows for the consistent maintenance of any process. It is a storehouse of intelligence that can be accessed for future reference and replicated if needed. For example, if an engineer or anyone in the organization has performed, tested, and improved a process, failure to document it would be a waste of intellectual capital and a loss to the organization. Documentation is important for many reasons:

It helps to keep processes and systems up to date for usage
It helps with the onboarding and training of new team members
It helps to improve security by imposing boundaries
It functions as a means of proof for audits
It provides a starting point when documenting from scratch
It helps to continuously improve processes

Documenting your cloud infrastructure is imperative for its smooth and efficient operation.

What Should Be Documented for Cloud Infrastructure?

In the past, building a computing infrastructure required huge investment and vast planning, taking into consideration the required expertise in the field and the needs of your organization. Once servers and hardware were purchased, it was very difficult to make any significant changes. The cloud brought with it significant improvements, making it much easier and more feasible to implement the infrastructure. But still, the ability to make changes and improvements is highly dependent on accurate documentation. Following is a basic list of requirements for documentation to ensure that your cloud infrastructure is easy to use and update.

Architecture Diagrams

An architecture diagram is a visual representation of cloud components and the interconnections that support their underlying applications. The main goal of creating an architecture diagram is to communicate with all stakeholders (clients, developers, engineers, and management) using a common language that everyone can understand. To create a diagram, you need a list of components and an understanding of how they interact. You may need to create multiple diagrams if the architecture is complex or if it has several environments. There are user-friendly tools to help you with this first step, many of which are free: for example, Diagrams.net (formerly Draw.io), Miro, SmartDraw, Lucidchart, and others. Creating an architecture diagram will help with future planning and design when you are ready to improve the infrastructure. It will help you to easily spot issues or areas that need improvement. Your diagram can also help with troubleshooting: engineers will be able to use it to detect flaws in the system and discover their root causes. It will also help with compliance and security requirements.
How-To Instructions

Your infrastructure will likely host many features and applications that require specific steps for access. How-to instructions provide end users with a detailed step-by-step guide that streamlines various processes and saves time. Such instructions are sometimes referred to as detailed process maps, DIYs (do it yourself), walkthroughs, job aids, tutorials, runbooks, or playbooks. Some examples of processes that can benefit from how-to instructions include:

How to request access for developers
How to subscribe to an SNS topic
How to rotate IAM Access Keys
How to retrieve ALB logs

Policies

Your cloud infrastructure will have its own policies, whether they are predefined by the IT department or created in collaboration with different teams. Some policies that can be documented include:

Access policies: What security measures are in place, and what is required for various individuals, groups, or roles to gain access? What are the premises and procedures for access removal? Are we compliant with the least-privilege access best practice?
Security policies: Protective policies for management, practices, and resources for data in the cloud.
Data privacy policies: Data must be classified and collected in ways that keep it secure and protected from unauthorized access.
Compliance policies: Which regulations and auditing processes must be complied with to use cloud services? What are the responsibilities of infrastructure team members?
Incident and change management: Define the necessary steps to respond to incidents and changes; define outage prioritization, SLA response time, ownership, and post-mortem processes.
Monitoring: Along with incident management, there should be documentation of monitors and channels in place to ensure that the infrastructure is up and running. Monitoring is a 24/7 preventative approach to incident management.

Disaster Recovery

A Disaster Recovery plan is one of the most important yet least prioritized documents. It should outline the procedures needed to restore services after a disaster event. The document should cover at least the following items:

Scope
Steps to restore service as soon as possible
How to determine damage or data loss, including risk assessment
Emergency response - who should be notified and how?
Steps to back up all data and services

The main goal of a Disaster Recovery plan is to ensure that business operations continue, even after a disaster. Failure to recover presents a large gap in the infrastructure.

Best Practices You Should Follow

Formatting

When creating documentation, it is important to follow certain rules. Let's identify them:

Organization: A stable company will usually have a brand book that establishes boundaries and provides guidelines for content. In the case of documentation, you may need to use a specific font, size, and layout, and you may be required to include a logo or other elements. Before documenting, find out what the company requirements are. If there are no established guidelines, create your own to establish consistency across your department.
Grammar: The way you communicate while documenting should also follow a standard. Some best practices include: use an active voice, e.g., "The entire infrastructure is described as code via Terraform," and avoid a passive voice, e.g., "Terraform was used to describe the entire infrastructure as code." Avoid long sentences; stick with simply structured sentences that are easy for the reader to follow. Create a glossary of abbreviations, and use consistent terminology.
For example, if you mention an SSL certificate but later refer to it as TLS, the reader might be confused. Use appropriate verb tenses: for example, use the present tense for describing a procedure and the past tense for describing a completed action.
Storage: When saving the document, always use a conventional name that makes it easy to find and share with others. Store the file in the most appropriate path or structure, such as a particular file system or a collaborative tool like Confluence. File naming example: departmentname_typeofdocument_nameofdocument_mm_yyyy, e.g., ManagedServices_internal_stepsfordocumentation_03_2023

Content

How you display your document's content plays a relevant role in the entire process. A document that is attractively laid out and easy to read will help to prevent confusion and avoid unnecessary questions. Here are some tips for you:

Screenshots: A picture is worth a thousand words. Use screenshots to help the user better relate to your instructions, e.g., "Within your AWS Account, go to the EC2 Dashboard and check the Security groups."
Diagrams: A flow chart provides a visual aid to help you describe a step-by-step process so that the reader can easily identify which step they are on, e.g.: Open the console -> Ping the corresponding IP -> If you get an error, copy and paste the message -> Open a ticket in AnyDesk -> Paste the error message -> Assign to AnyTeam.
Table of contents: Use heading formats to create a table of contents. If the document is quite large, the reader will have the option to jump to a specific section. That reader could be you, wanting to update the document a few months later!
Troubleshooting: Readers will likely have some issues when putting your document into action. Be sure to include a troubleshooting section to help resolve common problems.

Lifecycle

One of the most common mistakes in documenting is to think documentation is over because the project is up and running. Keeping your documentation up to date is an important part of the documentation lifecycle:

Maintenance: Considering that your infrastructure is constantly changing, your documentation must be kept current. Outdated documentation will misinform others and could trigger disastrous actions.
Backup: Always keep a backup of your documents. Ideally, your place of storage should have certain features by default, like version control, searching, filtering, collaboration, etc. But it is also a good practice to keep your own backup; it might be useful one day.
Share: Once you have completed documentation, share it with potential users and ask for feedback. They can help suggest improvements that make your documentation more robust.

Conclusion

If you are not 100% convinced about the benefits of documentation, think of it this way: no one wants to waste time figuring out someone else's work or reinventing the wheel by creating a project that has already grown and evolved. Documentation that is clear, concise, and easy to understand is the first step toward building a successful cloud infrastructure.
What Is a Highly-Available System?

You call a system highly available when it can remain operational and accessible even when there are hardware and software failures. The idea is to ensure continuous service. We all want our systems to be highly available. It seems like a good thing to have and makes for a nice bullet point in our application description. But designing a high-availability system is not an easy task. So, how can you go about it? The most reliable approach is to leverage the concept of static stability. But before we get to the meaning of this term, it's important to understand the concept of availability zones.

What Are Availability Zones?

You must have heard about availability zones in AWS or other cloud platforms. If not, here's a quick definition of the term in the context of AWS: Availability Zones are isolated sections of an AWS region. They are physically separated from each other by a meaningful distance so that a single event cannot impair them all at once. For perspective, this single event could be a lightning strike, a tornado, an earthquake, or even a Godzilla attack. That makes it really clear that building an availability zone is not trivial engineering. To achieve this incredible level of separation, availability zones don't share power or other infrastructure. However, they are connected with fast and encrypted fiber-optic networking so that application failover can be smooth as butter. This means that in the case of a catastrophic hardware or software failure, workloads can be quickly and seamlessly transferred to another server without loss of data or interruption of service. Moreover, the use of encryption ensures that sensitive data transmitted across the network stays secure from any type of unauthorized access. Here's a picture showing the AWS Global Infrastructure from a few years ago (source: AWS website). The orange circles denote a region, and the number within each circle is the number of availability zones within that region.

What's Static Stability?

Let's get back to our key term: static stability. Availability zones let you build systems with high availability. But you can go about it in two ways:

Reactive
Proactive

In a reactive approach, you let the service scale up in another availability zone after there is some sort of disruption in one of the zones. You might use something like an AWS Auto Scaling group to manage the scale-up automatically. But the idea is that you react to impairments when they happen rather than being prepared in advance. In a proactive approach, you over-provision the infrastructure so that your system continues to operate satisfactorily even in the case of disruption within a particular Availability Zone. The proactive approach ensures that your service is statically stable. A lot of AWS services use static stability as a guiding principle. Some of the most popular ones are:

AWS EC2
AWS RDS
AWS S3
AWS DynamoDB
AWS ElastiCache

If your system is statically stable, it keeps working even when a dependency becomes impaired. For example, the AWS EC2 service supports static stability by targeting high availability for its data plane (the one that manages existing EC2 instances). This means that once launched, an EC2 instance has local access to all the information it needs to route packets. The main benefit of this approach is that instances can operate independently and maintain their own local state even in the case of a network or service disruption.
However, leveraging static stability is not just for the cloud provider. You can also use static stability when designing your own applications for the cloud. Let's look at a couple of patterns that use the concept of static stability.

Pattern 1: Active-Active High Availability Using AZs

Here's an example of how you can implement a load-balanced HTTP service. You have a public-facing load balancer targeting an auto scaling group that spans three availability zones in a particular region. Also, you make sure to over-provision capacity by 50% (a CloudFormation sketch of this setup follows the takeaway below). If an AZ goes down for whatever reason, you don't need to do much to support the system. The EC2 instances within the problematic AZ will start failing health checks, and the load balancer will shift traffic away from them. This is an important mechanism, since constant monitoring helps the load balancer quickly identify any instances that are experiencing issues and perform the appropriate failover without human intervention. Since the setup is statically stable, it will continue to remain operational without hiccups.

Pattern 2: Active-Standby on Availability Zones

The previous pattern dealt with stateless services. However, you might also need to implement high availability for a stateful service. A prime example is a database system such as Amazon RDS. A typical high-availability setup for this requirement needs a primary instance that takes all the writes and a standby instance kept in a different availability zone. When the primary AZ goes down for whatever reason, RDS manages the failover to the new primary (the standby instance). Again, since we have already over-provisioned, there is no need to create new instances. The switchover can happen seamlessly without impacting availability. In essence, the service is statically stable.

So, What's the Takeaway?

In both patterns, you have already provisioned the capacity needed in case an availability zone goes down. In either case, you are not trying to create new instances on the fly, since you have already over-provisioned the infrastructure across AZs. This means your systems are statically stable and can easily survive outages or disruptions. In other words, your system is highly available in a proactive manner, which is an extremely good characteristic to have.
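To make the over-provisioning arithmetic in Pattern 1 concrete, here is a hypothetical CloudFormation sketch; the subnet IDs, resource names, and launch template are placeholders, not taken from any real deployment:

YAML
# Pattern 1 sketch: an auto scaling group pinned across three AZs with
# static 50% over-provisioning. If peak load needs 6 instances, running
# 9 means losing a whole AZ still leaves 6 healthy ones, with no
# reactive scaling required. All IDs below are placeholders.
Resources:
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:            # one subnet per availability zone
        - subnet-aaa111
        - subnet-bbb222
        - subnet-ccc333
      MinSize: '9'
      MaxSize: '9'
      DesiredCapacity: '9'          # 6 would carry the peak; 9 = 50% extra
      HealthCheckType: ELB          # let the load balancer's health checks decide
      TargetGroupARNs:
        - !Ref WebServerTargetGroup # target group of the public-facing ALB
      LaunchTemplate:
        LaunchTemplateId: !Ref WebServerLaunchTemplate
        Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber

Over to You

Does high availability matter to you? If yes, how do you handle it within your applications? What techniques do you use? Write your replies in the comments section. The inspiration for this post came from a wonderful paper released as part of the Amazon Builders' Library. You can check it out in case you are interested in going deeper into the theoretical foundations of static stability. If you found today's post useful, consider sharing it with friends and colleagues.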
A couple of years ago, I developed an app that helped me manage my conference submission workflow. Since then, I have been a happy user of the free Heroku plan. Last summer, Heroku's owner, Salesforce, announced that it would stop the free plan in November 2022. I searched for a new hosting provider and found Scaleway. In this post, I'd like to explain my requirements, why I chose them, and my experience using them.

The Context

I've already described the app in previous blog posts, especially the deployment part. Yet, here's a summary in case you don't want to reread them. The source of truth is Trello, where I manage the state of my conference CFPs: Backlog, Submitted, Abandoned, Accepted, and Published (on this blog). The "done" state is when I archive a Published card. I wrote the application in Kotlin with Spring Boot. It's a web app that listens to change events from Trello via webhooks. An event starts a BPMN workflow based on Camunda. The workflow manages my Google Calendar and a Google Sheet file. For example, when I move a card from Backlog to Submitted, it adds the conference to my calendar. The event is labeled as "Free" and has a particular gray color to mark it as a placeholder. It also adds a line in the Google Sheet with the status "Submitted." When I move the card from Submitted to Accepted, it changes the Google Calendar event color to the default and marks it as "Busy." It also changes the Google Sheet status to "Accepted."

Why Scaleway?

As I mentioned in the introduction, I was a happy Heroku user. One of the great things about Heroku, apart from the free plan, was the "hibernating" feature: when the app was not in use, it switched it off. In essence, it was scale-to-zero for web apps. The first request in a while was slower, but it wasn't an issue for my usage. The interesting bit is that scale-to-zero was not a feature but Heroku's way to keep costs low. Outside of Heroku's free plan, automatic scaling can only scale down to 1. I'm a big fan of Viktor Farcic's YouTube channel, DevOps Toolkit. At the same time Heroku announced the end of its free plan, I watched "Scaleway - Everything We Expect From A Cloud Computing Service?". By chance, Scaleway offers free credits to startups, including the company I'm currently working for. It didn't take long for me to move the application to Scaleway.

Deploying on Scaleway

Before describing how to deploy on Scaleway, let's explain how I deployed on Heroku. The latter provides a Git repo. Every push to master triggers a build based on what Heroku can recognize. For example, if it sees a pom.xml, it knows it's a Maven project and calls the Maven command accordingly. Under the hood, it creates a regular Docker container, then stores and runs it. For the record, this approach is the foundation of Buildpacks, which Heroku co-created along with VMware. On Heroku, developers follow their regular workflow, and the platform handles both the build and the deployment parts. Scaleway offers a dedicated scale-to-zero feature for its Serverless Containers offering. First, you need an already-built container. When I started to use it, the container had to be hosted on Scaleway's dedicated Container Registry; now, it can be hosted anywhere. On the UI, one chooses the container to deploy and fills in environment variables and secrets, and Scaleway deploys it.

Main Issues

I stumbled upon two main issues using Scaleway so far.

The GUI is the only way to deploy a container: It's a good thing to start with, but it doesn't fit regular usage.
The industry standard is based on build pipelines, which compile, test, create container images, store them in a registry, and deploy them on remote infrastructure.

You need to fill in secrets on every deployment: GitHub and GitLab both allow configuring deployed containers with environment variables. This way, one can create a single container but deploy it in different environments. You can configure some environment variables as secrets: nobody can read them afterward, and they don't appear in logs. Scaleway also offers secrets. However, you must fill them in at every deployment; beyond a couple of them, it's unmanageable.

Bugs and Scaleway's Support

In my short time using Scaleway, I encountered two bugs. The first bug was a long delay between the time I uploaded a container to Scaleway's registry and the time it was available for deployment. It lasted for a couple of days. The support was quick to answer the ticket, but afterward, it became a big mess. There were more than a couple of back-and-forth messages until the support finally acknowledged that the bug affected everybody. The worst part was one of the messages telling me it was due to an existing running container of mine failing to start, i.e., that the bug was on my side. The second bug happened on the GUI: the deployment form reset itself while I was filling in the different fields. I tried to be fast enough when filling it in, but to no avail. The same happened as with the previous issue: many back-and-forths, and no actual fix. Finally, I tried again a couple of days after I created the ticket, and it worked, so I informed the support. They answered that this was normal because they had fixed it, but without telling me. Finally, I opened a ticket to ask whether an automated deployment option was possible. After several messages, the support redirected me to a GitHub project. The latter offers a GitHub Action that seemed to fulfill my requirement. Unfortunately, it cannot provide a way to configure the deployed container with environment variables. The only alternative the support offered was to embed environment variables in the container, including secrets. Regardless of the issue, the support's relevance ranges from average to entirely useless.

Logging

All cloud providers I've tried so far offer a logging console. My experience is that the console looks and behaves like a regular terminal console: the oldest log line is at the top and the newest at the bottom, and one can scroll through the history, limited by a buffer. Scaleway's approach is completely different. It orders the log lines in the opposite order, newest first and oldest last. Worse, there's no scrolling but pagination. Finally, there's no auto-refresh! One has to paginate back and forth to refresh the view, and if new log lines appear, pages don't display the same data. It severely impairs the developer experience and makes it hard to follow the logs. I tried to fathom why Scaleway implemented the logging console this way and came up with a couple of possible explanations:

Engineering doesn't eat its own dog food
Engineering doesn't care about Developer Experience
It was cheaper this way
Product said it was a bad Developer Experience, but Engineering did it anyway because of one of the reasons above and has more organizational power

In any case, it reflects poorly on the product.

Conclusion

Even though my usage of Scaleway is 100% free, I'm pretty unhappy about the deployment part. I came for the free credits and the scale-to-zero capability.
However, the lack of an acceptable automated deployment solution and the support's uneven quality (to be diplomatic) make me reconsider. On the other hand, the Scaleway Cloud service itself has been reliable so far. My Trello workflow runs smoothly, and I cannot complain. Scaleway is typical of a not-bad product ruined by an abysmally bad Developer Experience. If you're developing a product, be sure to take care of this aspect of things: the perception of your product can take a turn for the worse because of a lack of consideration for developers.

To Go Further: Google Cloud Run
Data migration is the process of moving data from one location to another and is an essential aspect of cloud migration, where it involves transferring data from on-premise storage to the cloud. With the rapid adoption of cloud computing, businesses are moving their IT infrastructure to the cloud. This shift from on-premise to cloud computing creates challenges for IT professionals, as it requires careful planning and execution. This article discusses the challenges and best practices of data migration when transferring on-premise data to the cloud. The article will also explore the role of data engineering in ensuring successful data transfer and integration, as well as different approaches to data migration.

Obstacles

Data migration poses several obstacles that businesses must address to ensure a smooth transition to the cloud. Some of the significant challenges of data migration include:

Data Compatibility

Compatibility is the primary challenge of data migration. It is essential to ensure that the data is compatible with the cloud platform before migrating it, and crucial to test data compatibility before migration, as data loss and corruption can occur if the data is not compatible with the cloud platform.

Security and Privacy

Security and privacy are significant concerns for businesses when migrating data to the cloud. It is crucial to keep data secure during migration, as lapses can lead to data breaches and the loss of sensitive data.

Data Integrity

Data integrity is another challenge of data migration. It is crucial to ensure that the data remains consistent and accurate during migration.

Downtime

Downtime is another challenge of data migration. It is essential to ensure that the migration process does not cause any downtime or interruptions to business operations.

Cloud Scaling

Cloud scaling options are an essential aspect of data migration. Cloud scalability is the ability of a cloud platform to scale up or down depending on the workload. The cloud platform should be able to handle the increased workload during the migration process. It should also be scalable enough to handle future workload increases. There are two types of cloud scalability options:

Vertical Scaling: Vertical scaling is the process of adding more resources to a single instance. This method is suitable for workloads that require more processing power, memory, or storage.
Horizontal Scaling: Horizontal scaling is the process of adding more instances to handle the workload. This method is suitable for workloads that require additional resources to handle traffic spikes.

Cloud Hardware Upgrade

Cloud hardware upgrades are critical to data migration. The cloud hardware should be up to date to handle the workload during the migration process. It is therefore essential to ensure that the cloud hardware is capable of handling the workload and is compatible with the cloud platform. Moving to the next generation of cloud hardware involves upgrading to the latest technology; it is essential to ensure that the cloud hardware is scalable and can handle the workload.

Traditional Methodology

The traditional methodology for data migration involves copying data from on-premise storage to the cloud. This method involves a large amount of data transfer, which can lead to data loss and corruption. The classical approach can also cause downtime and interruptions to business operations.

Adaptability

Adaptability is another important aspect of data migration.
Elasticity is the ability of the cloud platform to scale up or down depending on the workload. The cloud platform should be elastic enough to handle the increased workload during the migration process, as well as future workload increases.

Add-Ons

The cloud platform should have additional features to support data migration, such as data backup and recovery, data migration tools, and data monitoring tools. These features ensure that the data is backed up and can be recovered in case of data loss or corruption, and that the migration process runs smoothly.

IT Support Services

IT support services are crucial to the success of data migration. IT organizations should have the necessary expertise to plan and execute the migration process. They should also be able to provide support during the migration process to minimize downtime and interruptions to business operations.

Summary

To summarize, data migration is a complex process that requires careful planning and execution to avoid data loss, corruption, downtime, and interruptions to business operations. To mitigate these challenges, businesses need to consider cloud scalability options, upgrade cloud hardware, leverage elasticity, and use additional features to support data migration. IT organizations should also be involved in the process to ensure a successful transition to the cloud. Furthermore, businesses should consider alternative approaches to data migration, such as using migration tools designed to automate the migration process and reduce the risk of data loss and corruption. These tools can help to ensure a smoother transition to the cloud. Ultimately, businesses should approach data migration with caution and seek expert advice to ensure a successful migration process. With careful planning, execution, and the right support, businesses can achieve a smooth transition from on-premise storage to the cloud and enjoy the benefits of cloud computing, such as increased flexibility, scalability, and cost savings.
Along with extensively discussed trending topics such as AI, hyper-automation, blockchain, and edge computing, cloud computing will be a central component of many firms' IT strategies in the coming years. Its benefits of flexibility, agility, speed, and cost efficiency have become essential for many CIOs. Some businesses are still refining their overall cloud strategy and weighing fundamental decisions, such as whether to go public, private, or a mixture of both. Others have progressed further: they are working hard to modernize their applications and are taking advantage of the PaaS capabilities offered by the cloud to maximize its benefits.

Challenges Faced by Cloud Computing
Such firms can also address the essential concerns of cloud computing, such as security, data coherence, flexibility, and functional consistency, by focusing on one key element: cloud performance. A frequent question in cloud performance engineering is what performance the migrated and modernized system can achieve compared with a purely on-premises landscape. Will it be lower, similar, or even better?

Cloud Scalability Options
Many experts claim that, with the dynamic scaling options in the cloud, it is simple to grow a system linearly just by adding machines. That is certainly the first step to consider. As with on-premises systems, vertical scaling is usually exercised first: traditional hardware capacities such as CPUs and memory are increased. However, large firms' IT systems with high throughput, access rates, and peak loads eventually reach a breaking point. Ambitious expansion strategies combined with poorly organized applications can produce hardware requirements that outpace Moore's Law, so the requisite hardware is simply not available yet.

Next-Generation and Upgradation of Cloud Hardware
On one side, CIOs can hope that the next generation of hardware is about to enter the market and will soon be available to users. On the other side, horizontal scaling has gained a lot of traction: instead of enlarging servers, more servers are added for the same parts of the application. In many situations this requires substantial changes to the application itself, just as it does on-premises. Databases in particular need an elaborated concept that allows data to be persisted autonomously across many servers. For applications with a growing share of read-only transactions, there is an alternative that reaches the performance goals without "real" horizontal scaling: managed PaaS offerings. For example, Microsoft provides the Hyperscale service tier for SQL databases, which dynamically scales compute through caching techniques and distributes reads horizontally across read replicas that act as images of the database. AWS likewise provides read replicas for RDS MySQL, PostgreSQL, MariaDB, and Amazon Aurora, while Oracle Cloud relies on its popular Oracle RAC.
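To make the read-replica pattern concrete, here is a minimal Java sketch of application-level read/write splitting. The endpoint URLs, credentials, and driver are illustrative assumptions only, not values from any service discussed above; managed offerings such as Aurora expose exactly this kind of writer/reader endpoint pair.

import java.sql.Connection;
import java.sql.DriverManager;

public class ReadWriteRouter {
    // Hypothetical endpoints: one writer instance and one load-balanced
    // reader endpoint fronting the read replicas.
    private static final String WRITER_URL =
            "jdbc:mysql://mydb.cluster-example.rds.amazonaws.com:3306/app";
    private static final String READER_URL =
            "jdbc:mysql://mydb.cluster-ro-example.rds.amazonaws.com:3306/app";

    public static Connection forQuery(boolean readOnly) throws Exception {
        // Read-only transactions can fan out across replicas; writes must
        // still funnel to the single writer instance.
        return DriverManager.getConnection(
                readOnly ? READER_URL : WRITER_URL,
                "app_user",                      // placeholder credentials
                System.getenv("DB_PASSWORD"));
    }
}

The design point is simply that the growing share of read-only transactions scales horizontally across replicas, while writes remain on the writer.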
Classical Approach
Beyond vertical and horizontal scalability, cloud performance engineering offers further possibilities; most of the well-known on-premises options are available in the cloud as well. The most common classical approach is to tune your indexes, which determine I/O performance for over 80% of performance-relevant activity; if even a single index is missing, the performance of the entire IT system may suffer. As a result, cloud performance engineers should always prioritize database indexing. In addition, topics around batch processing and session handling, such as the definition of maximum batch sizes, connection durations, read frequencies, idle times, and connection pooling (of SSL connections, for example), can be decisive for system performance: pooling keeps your interface partner's CPUs from being overloaded by the opening of a new connection for each HTTPS request (a sketch follows below). It is likewise desirable to reduce the number of requests to the database and to apply caching mechanisms actively. Similarly, the number of instances, the number of threads, and the hardware itself can be varied until a self-defined performance target is reached.
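As a small illustration of the pooling advice, the following sketch uses HikariCP, a common Java connection pool; the JDBC URL, credentials, and pool size are assumptions for the example rather than recommendations.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PooledAccess {
    // One pool for the whole application: connections (and their SSL
    // handshakes) are established once and then reused across requests.
    private static final HikariDataSource DATA_SOURCE;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/app"); // placeholder
        config.setUsername("app_user");                                  // placeholder
        config.setPassword(System.getenv("DB_PASSWORD"));
        config.setMaximumPoolSize(10); // cap on concurrent connections
        DATA_SOURCE = new HikariDataSource(config);
    }

    public static String findName(long id) throws Exception {
        // Borrow from the pool instead of opening a new connection per call.
        try (Connection conn = DATA_SOURCE.getConnection();
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) { // indexed lookup
            stmt.setLong(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}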
Elasticity
In cloud computing, scalability is just one aspect of performance engineering. One of the features the cloud promises is fully automated elasticity, allowing resources to be adjusted dynamically to meet demand. The hurdle is that on-premises applications are usually designed with static environments in mind, so they must first be made responsive to dynamic scaling. This requires defining and testing dedicated test scenarios for the cloud, with attention on the interaction between the cloud and the applications. Essential metrics are how well the application responds to dynamic scaling, whether it loses connections or shows other unusual behavior, and whether it suffers the performance degradation that typically occurs on a scaling system.

Additional Features
Cloud service providers offer numerous new possibilities to quickly create test environments and to analyze and evaluate performance KPIs at runtime. The best way to cover a planned testing concept in the cloud is to combine existing testing tools with these new cloud-native options. It can even be preferable to rebuild an old application completely rather than heavily customize the existing one, particularly when important functional, non-functional, and technical requirements are not implemented in the current application.

Support of IT Organizations
IT organizations play a vital role here by supporting these activities in the best possible ways and improving cloud performance. They benefit from agile processes, modular container architectures, and time-to-market practices such as CI/CD pipelines. Most of the time, it pays to introduce such practices before moving to the cloud.

Conclusion
Even though shifting to the cloud offers many opportunities and benefits, cloud performance engineering is a challenge that must be met with both proven and new methods. Many large companies cannot simply rely on the cloud's automatic scalability: budgets and time frames must allow for the customization needed during implementation, and well-planned, high-level supervision is strongly recommended to deliver the best possible experience to users. Beyond performance, other testing activities, such as data integrity, security, and resilience, remain significant for delivering outstanding results. Good communication between all the teams involved, including the CEO, CIO, architects, cloud experts, and performance engineering specialists, is essential to achieve the shift to the cloud and to successfully establish this new discipline of cloud performance engineering.
This blog post is for folks interested in learning how to use Golang and AWS Lambda to build a serverless solution. You will be using the aws-lambda-go library along with the AWS Go SDK v2 for an application that will process records from an Amazon Kinesis data stream and store them in a DynamoDB table. But that's not all! You will also use Go bindings for AWS CDK to implement "Infrastructure-as-code" for the entire solution and deploy it with the AWS CDK CLI.

Introduction
Amazon Kinesis is a platform for real-time data processing, ingestion, and analysis. Kinesis Data Streams is a serverless streaming data service (part of the Kinesis streaming data platform, along with Kinesis Data Firehose, Kinesis Video Streams, and Kinesis Data Analytics) that enables developers to collect, process, and analyze large amounts of data in real time from sources such as social media, IoT devices, logs, and more. AWS Lambda, on the other hand, is a serverless compute service that allows developers to run their code without having to manage the underlying infrastructure.

The integration of Amazon Kinesis with AWS Lambda provides an efficient way to process and analyze large data streams in real time. A Kinesis data stream is a set of shards, and each shard contains a sequence of data records. A Lambda function can act as a consumer application and process data from a Kinesis data stream. You can map a Lambda function to a shared-throughput consumer (standard iterator) or to a dedicated-throughput consumer with enhanced fan-out. For standard iterators, Lambda polls each shard in your Kinesis stream for records over HTTP, and the event source mapping shares read throughput with the shard's other consumers.

Amazon Kinesis and AWS Lambda can be used together to build many solutions, including real-time analytics (allowing businesses to make informed decisions), log processing (proactively identifying and addressing server/application issues before they become critical), IoT data processing (analyzing device data in real time and triggering actions based on the results), clickstream analysis (providing insights into user behavior), fraud detection (detecting and preventing fraudulent card transactions), and more. As always, the code is available on GitHub.

Prerequisites
Before you proceed, make sure you have the Go programming language (v1.18 or higher) and AWS CDK installed. Clone the GitHub repository and change to the right directory:

git clone https://github.com/abhirockzz/kinesis-lambda-events-golang
cd kinesis-lambda-events-golang

Use AWS CDK To Deploy the Solution
To start the deployment, simply invoke cdk deploy and wait for a bit. You will see a list of resources that will be created and will need to provide your confirmation to proceed.

cd cdk
cdk deploy

# output
Bundling asset KinesisLambdaGolangStack/kinesis-function/Code/Stage...
✨ Synthesis time: 5.94s
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:
//.... omitted
Do you wish to deploy these changes (y/n)? y

This will start creating the AWS resources required for our application. If you want to see the AWS CloudFormation template that will be used behind the scenes, run cdk synth and check the cdk.out folder. You can keep track of the progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > KinesisLambdaGolangStack.
Once all the resources are created, you can try out the application. You should have:

A Lambda function
A Kinesis stream
A DynamoDB table
Along with a few other components (like IAM roles, etc.)

Verify the Solution
You can check the table and Kinesis stream info in the stack output (in the terminal or the Outputs tab in the AWS CloudFormation console for your stack). Publish a few messages to the Kinesis stream. For the purposes of this demo, you can use the AWS CLI:

export KINESIS_STREAM=<enter the Kinesis stream name from cloudformation output>

aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user1@foo.com --data $(echo -n '{"name":"user1", "city":"seattle"}' | base64)
aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user2@foo.com --data $(echo -n '{"name":"user2", "city":"new delhi"}' | base64)
aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user3@foo.com --data $(echo -n '{"name":"user3", "city":"new york"}' | base64)

Check the DynamoDB table to confirm that the user records have been stored. You can use the AWS console or the AWS CLI:

aws dynamodb scan --table-name <enter the table name from cloudformation output>

Don't Forget To Clean Up
Once you're done, to delete all the services, simply use:

cdk destroy

# output prompt (choose 'y' to continue)
Are you sure you want to delete: KinesisLambdaGolangStack (y/n)?

You were able to set up and try the complete solution. Before we wrap up, let's quickly walk through some of the important parts of the code to get a better understanding of what's going on behind the scenes.

Code Walkthrough
Some of the code (error handling, logging, etc.) has been omitted for brevity since we only want to focus on the important parts.

AWS CDK
You can refer to the CDK code here. We start by creating the DynamoDB table:

table := awsdynamodb.NewTable(stack, jsii.String("dynamodb-table"), &awsdynamodb.TableProps{
    // The Kinesis partition key (the user's email) will be stored here.
    PartitionKey: &awsdynamodb.Attribute{
        Name: jsii.String("email"),
        Type: awsdynamodb.AttributeType_STRING},
})
table.ApplyRemovalPolicy(awscdk.RemovalPolicy_DESTROY)

We create the Lambda function (CDK will take care of building and deploying the function) and make sure we provide it with appropriate permissions to write to the DynamoDB table.

function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("kinesis-function"), &awscdklambdagoalpha.GoFunctionProps{
    Runtime:     awslambda.Runtime_GO_1_X(),
    Environment: &map[string]*string{"TABLE_NAME": table.TableName()},
    Entry:       jsii.String(functionDir),
})
table.GrantWriteData(function)

Then, we create the Kinesis stream and add that as an event source to the Lambda function.

kinesisStream := awskinesis.NewStream(stack, jsii.String("lambda-test-stream"), nil)

function.AddEventSource(awslambdaeventsources.NewKinesisEventSource(kinesisStream, &awslambdaeventsources.KinesisEventSourceProps{
    StartingPosition: awslambda.StartingPosition_LATEST,
}))

Finally, we export the Kinesis stream and DynamoDB table name as CloudFormation outputs.

awscdk.NewCfnOutput(stack, jsii.String("kinesis-stream-name"), &awscdk.CfnOutputProps{
    ExportName: jsii.String("kinesis-stream-name"),
    Value:      kinesisStream.StreamName()})

awscdk.NewCfnOutput(stack, jsii.String("dynamodb-table-name"), &awscdk.CfnOutputProps{
    ExportName: jsii.String("dynamodb-table-name"),
    Value:      table.TableName()})

Lambda Function
You can refer to the Lambda function code here.
The Lambda function handler iterates over each record in the Kinesis event and, for each of them:

Unmarshals the JSON payload in the Kinesis stream into a Go struct
Stores the stream record's partition key as the primary key attribute (email) of the DynamoDB table

The rest of the information is picked up from the stream data and also stored in the table.

func handler(ctx context.Context, kinesisEvent events.KinesisEvent) error {
    for _, record := range kinesisEvent.Records {
        data := record.Kinesis.Data

        // Unmarshal the JSON payload from the stream record.
        var user CreateUserInfo
        err := json.Unmarshal(data, &user)
        if err != nil {
            return err
        }

        // Convert the struct into a DynamoDB attribute-value map.
        item, err := attributevalue.MarshalMap(user)
        if err != nil {
            return err
        }

        // Use the Kinesis partition key as the table's primary key.
        item["email"] = &types.AttributeValueMemberS{Value: record.Kinesis.PartitionKey}

        _, err = client.PutItem(context.Background(), &dynamodb.PutItemInput{
            TableName: aws.String(table),
            Item:      item,
        })
        if err != nil {
            return err
        }
    }
    return nil
}

type CreateUserInfo struct {
    Name string `json:"name"`
    City string `json:"city"`
}

Wrap Up
In this blog, you saw an example of how to use Lambda to process messages in a Kinesis stream and store them in DynamoDB, thanks to the Kinesis and Lambda integration. The entire infrastructure life cycle was automated using AWS CDK. All this was done using the Go programming language, which is well supported by DynamoDB, AWS Lambda, and AWS CDK. Happy building!
Failover is an important feature of systems that rely on near-constant availability. In Hazelcast, a failover client automatically redirects its traffic to a secondary cluster when the client cannot connect to the primary cluster. Consider using a failover client with WAN replication as part of your disaster recovery strategy. In this tutorial, you'll update the code in a Java client so that it automatically connects to a secondary, failover cluster if it cannot connect to its original, primary cluster. You'll also run a simple test to make sure that your configuration is correct and then adjust it to include exception handling. You'll learn how to collect all the resources you need to create a failover client for a primary and secondary cluster, create a failover client based on the sample Java client, test failover, and add exception handling for operations.

Step 1: Set Up Clusters and Clients
Create two Viridian Serverless clusters that you'll use as your primary and secondary clusters, and then download and connect sample Java clients to them.

1. Create the Viridian Serverless cluster that you'll use as your primary cluster. When the cluster is ready to use, the Quick Connection Guide is displayed.
2. Select the Java icon and follow the on-screen instructions to download, extract, and connect the preconfigured Java client to your primary cluster.
3. Create the Viridian Serverless cluster that you'll use as your secondary cluster.
4. Follow the instructions in the Quick Connection Guide to download, extract, and connect the preconfigured Java client to your secondary cluster.

You now have two running clusters, and you've checked that both Java clients can connect.

Step 2: Configure a Failover Client
To create a failover client, update the configuration and code of the Java client for your primary cluster. Start by adding the keystore files from the Java client of your secondary cluster.

1. Go to the directory where you extracted the Java client for your secondary cluster and then navigate to src/main/resources.
2. Rename the client.keystore file to client2.keystore and rename the client.truststore file to client2.truststore to avoid overwriting the files in your primary cluster keystore.
3. Copy both files over to the src/main/resources directory of your primary cluster.
4. Update the code in the Java client (ClientwithSsl.java) of your primary cluster to include a failover class and the connection details for your secondary cluster. You can find these connection details in the Java client of your secondary cluster. Go to the directory where you extracted the Java client for your primary cluster, navigate to src/main/java/com/hazelcast/cloud/, open the Java client (ClientwithSsl.java), and make the updates sketched below. An example failover client is also available for download.
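The exact values come from your own clusters, so the following is only a minimal sketch of the failover wiring, with placeholder cluster names and discovery tokens; the TLS keystore and truststore properties from your Quick Connection Guides (including the renamed client2.keystore and client2.truststore files) still need to be added to each configuration.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientFailoverConfig;
import com.hazelcast.core.HazelcastInstance;

public class FailoverClient {
    public static void main(String[] args) {
        // Configuration for the primary cluster (placeholder values).
        // SSL properties for client.keystore/client.truststore omitted here.
        ClientConfig primary = new ClientConfig();
        primary.setClusterName("primary-cluster-name");
        primary.getNetworkConfig().getCloudConfig()
               .setEnabled(true)
               .setDiscoveryToken("PRIMARY_DISCOVERY_TOKEN");

        // Configuration for the secondary cluster, which would point at the
        // copied client2.keystore/client2.truststore files.
        ClientConfig secondary = new ClientConfig();
        secondary.setClusterName("secondary-cluster-name");
        secondary.getNetworkConfig().getCloudConfig()
                 .setEnabled(true)
                 .setDiscoveryToken("SECONDARY_DISCOVERY_TOKEN");

        // Try the primary configuration first; fail over to the secondary
        // after the configured number of connection attempts.
        ClientFailoverConfig failoverConfig = new ClientFailoverConfig();
        failoverConfig.setTryCount(3);
        failoverConfig.addClientConfig(primary);
        failoverConfig.addClientConfig(secondary);

        HazelcastInstance client =
                HazelcastClient.newHazelcastFailoverClient(failoverConfig);
    }
}

With this in place, newHazelcastFailoverClient tries the configurations in the order they were added, which is what produces the cluster switch you will observe in the next step.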
Step 3: Verify Failover
Check that your failover client automatically connects to the secondary cluster when your primary cluster is stopped.

1. Make sure that both Viridian Serverless clusters are running.
2. Connect your failover client to the primary cluster in the same way as you did in Step 1.
3. Stop your primary cluster: from the dashboard of your primary cluster, in Cluster Details, select Pause.

In the console, you'll see the following messages in order as the client disconnects from your primary cluster and reconnects to the secondary cluster:

CLIENT_DISCONNECTED
CLIENT_CONNECTED
CLIENT_CHANGED_CLUSTER

If you're using the nonStopMapExample in the sample Java client, your client stops. This is expected because write operations are not retryable when a cluster is disconnected. The client has sent a put request to the cluster but has not received a response, so the result of the request is unknown. To prevent the client from overwriting more recent write operations, this write operation is stopped and an exception is thrown.

Step 4: Exception Handling
Update the nonStopMapExample() function in your failover client to trap the exception that is thrown when the primary cluster disconnects.

1. Add the following try-catch block to the while loop in the nonStopMapExample() function. This code replaces the original map.put() call.

try {
    map.put("key-" + randomKey, "value-" + randomKey);
} catch (Exception e) {
    // Captures the exception thrown while the client is disconnected
    e.printStackTrace();
}

2. Verify your code again (repeat Step 3). This time the client continues to write map entries after it connects to the secondary cluster.
What Is the AWS Lambda Cold Start Problem?
AWS Lambda is a serverless computing platform that enables developers to quickly build and deploy applications without having to manage any underlying infrastructure. However, this convenience comes with a downside: the AWS Lambda cold start problem. Cold starts can delay response times for applications running on Lambda, which hurts the user experience and can cost businesses money. In this article, I will discuss what causes the AWS Lambda cold start problem and the techniques that can be used to address it.

What Causes the AWS Lambda Cold Start Problem?
The AWS Lambda cold start problem arises from the initialization time of Lambda functions. It refers to the delay in response time when a function is invoked for the first time, caused by the container bootstrapping process that takes place on that first invocation. The longer this process takes, the more pronounced the cold start problem becomes, leading to longer response times and a degraded user experience.

How To Mitigate the Cold Start Problem
AWS Lambda functions are a great way to scale your applications and save costs, but they can suffer from the cold start problem: a function that has not been used recently takes longer to respond. Fortunately, there are ways to mitigate this issue, such as pre-warming strategies, which keep your Lambda functions ready and responsive by invoking them periodically in advance of when they are needed. You can also warm up your Lambda functions manually using the AWS Console or API calls. By taking these steps, you can ensure your applications respond quickly and reliably without cold start delays. In the following sections, I'll discuss two possible ways to avoid the cold start problem.

1. Lambda Provisioned Concurrency
Lambda provisioned concurrency is a feature that allows developers to launch and initialize execution environments for Lambda functions ahead of time. In other words, it creates pre-warmed Lambdas waiting to serve incoming requests. Because they are pre-provisioned, the configured number of environments is up and running all the time, even when there are no requests to serve, which somewhat contradicts the very essence of serverless. And since environments are provisioned upfront, this feature is not free and comes at a considerable price. I created a simple Lambda function (details in the next section) and tried to configure provisioned concurrency to check the price; following is the screenshot. However, if there are strict performance requirements and cold starts are showstoppers, then provisioned concurrency is certainly a fantastic way of getting over the problem. A programmatic sketch follows below.
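Provisioned concurrency is typically configured in the console or an IaC template, but as a minimal sketch, assuming the AWS SDK for Java v2 and a hypothetical function name and version, it can also be set programmatically:

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigRequest;

public class ProvisionConcurrency {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // Keep five pre-initialized execution environments warm for
            // version 1 of the function. These environments are billed
            // whether or not requests arrive.
            lambda.putProvisionedConcurrencyConfig(
                    PutProvisionedConcurrencyConfigRequest.builder()
                            .functionName("my-function") // placeholder name
                            .qualifier("1")              // version or alias
                            .provisionedConcurrentExecutions(5)
                            .build());
        }
    }
}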
2. SnapStart
The next thing I'll discuss, and a potential game changer, is SnapStart. Amazon released Lambda SnapStart at re:Invent 2022 to help mitigate the cold start problem. With SnapStart, Lambda initializes your function when you publish a function version: it takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. The best part is that, unlike provisioned concurrency, there is no additional cost for SnapStart. SnapStart is currently only available for the Java 11 (Corretto) runtime.

To test SnapStart and see whether it's really worth it, I created a simple Lambda function. I used Spring Cloud Function and deliberately did not try to create a thin jar: I wanted the package to be bulky so that I could see what SnapStart does. Following is the function code:

public class ListObject implements Function<String, String> {

    @Override
    public String apply(String bucketName) {
        System.out.format("Objects in S3 bucket %s:\n", bucketName);

        // List all objects in the given bucket and print their keys.
        final AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.DEFAULT_REGION)
                .build();
        ListObjectsV2Result result = s3.listObjectsV2(bucketName);
        List<S3ObjectSummary> objects = result.getObjectSummaries();
        for (S3ObjectSummary os : objects) {
            System.out.println("* " + os.getKey());
        }
        return bucketName;
    }
}

At first, I uploaded the code using the AWS Management Console and tested it against a bucket full of objects. Following is the screenshot of the execution summary. Note: the time required to initialize the environment was 3980.17 ms. That right there is the cold start time, and it was somewhat expected, as I'm working with a bulky jar file.

Subsequently, I turned on SnapStart from the AWS console: "Configuration" -> "General Configuration" -> "Edit." After a while, I executed the function again, and following is the screenshot of the execution summary. Note: here, "Init duration" has been replaced by "Restore duration," since with SnapStart, Lambda restores the snapshot of the execution environment instead of initializing it. It's also worth noting that the restoration took 408.34 ms, which is significantly lower than the initialization duration. My first impression is that SnapStart is definitely promising and exciting. Let's see what Amazon does with it in the coming days. For completeness, a sketch of enabling SnapStart programmatically follows below.
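SnapStart can likewise be enabled outside the console. The following minimal sketch assumes the AWS SDK for Java v2 and a hypothetical function name; SnapStart applies to versions published after the setting is turned on:

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PublishVersionRequest;
import software.amazon.awssdk.services.lambda.model.SnapStart;
import software.amazon.awssdk.services.lambda.model.SnapStartApplyOn;
import software.amazon.awssdk.services.lambda.model.UpdateFunctionConfigurationRequest;

public class EnableSnapStart {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // Turn on SnapStart for versions published from now on.
            lambda.updateFunctionConfiguration(
                    UpdateFunctionConfigurationRequest.builder()
                            .functionName("my-function") // placeholder name
                            .snapStart(SnapStart.builder()
                                    .applyOn(SnapStartApplyOn.PUBLISHED_VERSIONS)
                                    .build())
                            .build());

            // Publishing a version triggers the snapshot of the initialized
            // execution environment that later invocations resume from.
            lambda.publishVersion(PublishVersionRequest.builder()
                    .functionName("my-function")
                    .build());
        }
    }
}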
In addition, Amazon announced at re:Invent 2022 that it is working on further advancements in serverless computing that could potentially eliminate the cold start issue altogether. By using Lambda SnapStart and keeping an eye out for future developments from AWS, developers can keep their serverless applications running smoothly and efficiently.

Best Strategies To Optimize Your Serverless Applications on AWS Lambda
Serverless applications on AWS Lambda have become increasingly popular because they offer a great way to reduce cost and complexity. However, one of the biggest challenges of serverless architectures is the cold start issue, which introduces latency, affects the user experience, and makes Lambda functions harder to optimize for performance. To ensure that your serverless applications run smoothly, there are several best practices you can apply: reduce latency with warm Lambdas, optimize the memory configuration of your functions, and leverage caching techniques to improve your application's response time. Also, as discussed in the sections above, provisioned concurrency and SnapStart are great ways to mitigate the cold start issue. With these strategies in place, you can ensure your serverless applications run as efficiently as possible and deliver the best user experience.
Boris Zaikin
Senior Software Cloud Architect,
Nordcloud GmbH
Ranga Karanam
Best Selling Instructor on Udemy with 1 MILLION Students,
in28Minutes.com
Samir Behara
Senior Cloud Infrastructure Architect,
AWS
Pratik Prakash
Master Software Engineer (SDE-IV),
Capital One