Cloud architecture refers to how technologies and components are built in a cloud environment. A cloud environment comprises a network of servers that are located in various places globally, and each serves a specific purpose. With the growth of cloud computing and cloud-native development, modern development practices are constantly changing to adapt to this rapid evolution. This Zone offers the latest information on cloud architecture, covering topics such as builds and deployments to cloud-native environments, Kubernetes practices, cloud databases, hybrid and multi-cloud environments, cloud computing, and more!
API Gateway is the AWS service that allows interfacing an application's back-end with its front-end. The figure below shows an example of such an application, consisting of a web/mobile-based front-end and a back-end residing in a REST API, implemented as a set of serverless Lambda functions, as well as a number of legacy services. The figure above illustrates the so-called Legacy API Proxy design pattern, as described by Peter Sbarski, Yan Cui, and Ajay Nair in their excellent book Serverless Architectures on AWS (Manning, 2022). This pattern covers the use case where Amazon API Gateway and Lambda are employed together to create a new API layer over legacy APIs and services, in order to adapt and reuse them. In this design, the API Gateway exposes a REST interface invoking Lambda functions which, in turn, modify the requests and the responses or transform data to legacy-specific formats. This way, legacy services may be consumed by modern clients that don't support older protocols.

This can be done, of course, using the AWS Console: select the API Gateway service and, in the proposed GUI (Graphical User Interface), browse among the dozens of possible options until, about one hour later, you arrive at a functional skeleton. And when our API specification changes, e.g., several times per month, we need to start again from the beginning. We won't proceed that way. We will rather adopt an IaC (Infrastructure as Code) approach, consisting in defining our API in a repeatable and deterministic manner. This could be done in several ways, via a script-based automation process using, for example, the AWS CLI (Command Line Interface), CloudFormation, or Terraform. But there is another interesting alternative that many developers prefer: OpenAPI. And it's this alternative that we chose to use here, as shown further.

Designing the REST Interface With OpenAPI

In 2011, Swagger, a set of utilities dedicated to the creation and documentation of RESTful services, appeared; it was later taken over by SmartBear Software, a company specializing in testing and monitoring tools. Several years later, in November 2015, under the auspices of the Linux Foundation, this same company announced the creation of a new organization named the OpenAPI Initiative. Other major players, like Google and IBM, committed as founding members. In January 2016, the Swagger specification was renamed the OpenAPI Specification. OpenAPI is a formalism based on the YAML notation, which can also be expressed in JSON. It aims at defining REST APIs in a language-agnostic manner. There are currently a lot of tools around OpenAPI, and our goal here isn't to look extensively at all the possibilities they open to us. One of the most common use cases is probably to log in to the SwaggerHub online service, create a new API project, export the resulting YAML file, and use it in conjunction with the SAM (Serverless Application Model) tool in order to expose the given API via Amazon API Gateway. And since we need to illustrate the modus operandi described above, let's consider the use case of a money transfer service named send-money. This service, as its name clearly shows, is responsible for performing bank account transfers.
It exposes a REST API whose specifications are presented in the table below:

| Resource | HTTP Request | Action | Java Class |
| --- | --- | --- | --- |
| /orders | GET | Get the full list of the currently registered orders | GetMoneyTransferOrders |
| /orders | POST | Create a new money transfer order | CreateMoneyTransferOrder |
| /orders | PUT | Update an existing money transfer order | UpdateMoneyTransferOrder |
| /orders/{ref} | GET | Get the money transfer order identified by its reference, passed as an argument | GetMoneyTransferOrder |
| /orders/{ref} | DELETE | Remove the money transfer order identified by its reference, passed as an argument | RemoveMoneyTransferOrder |

This simple use case, consisting of CRUD (Create, Read, Update, Delete) operations exposed as a REST API, is the one we chose to implement here in order to illustrate the scenario described above. Here are the required steps:

Go to the Send Money API on SwaggerHub. Here you'll find an already prepared project showing the OpenAPI specification of the REST API defined in the table above. This is a public project and, in order to get access, one doesn't need to register and log in. You'll be presented with a screen similar to the one in the figure below. This screen shows in its left pane the OpenAPI description of our API. Once again, the full explanation of the OpenAPI notation is out of our scope here, as this topic could be the subject of an entire book, like the excellent one by Joshua S. Ponelat and Lukas L. Rosenstock, titled Designing APIs with Swagger and OpenAPI (Manning, 2022). The right pane of the screen presents schematically the HTTP requests of our API and allows, among other things, testing it. You may spend some time browsing this part of the screen by clicking the button labeled with an HTTP request and then selecting Try it out. Notice that these tests are simulated, of course, as there is no concrete implementation behind them. However, they allow you to make sure that the API is correctly defined, from both a syntactic and a semantic point of view.

Now that you've finished playing with the test interface, you can use the Export -> Download API -> YAML Resolved function located in the screen's rightmost upper corner to download our API's OpenAPI definition in YAML format. In fact, you don't really have to do that, because you can find this same file in the Maven project used to exemplify this blog post.

Let's now have a quick look at this YAML file. The first thing we notice is the declaration openapi:, which defines the version of the notation that we're using: in this case, 3.0.0. The section labeled info: identifies general information like the API name, its author, the associated contact details, etc. The next element, labeled servers:, defines the auto-mocking function. It allows us to run the simulated tests outside the SwaggerHub site: just copy the URL declared here and use it with your preferred browser. Last but not least, we have the element labeled paths:, where our API endpoints are defined. There are two such endpoints: /orders and /orders/{ref}. For each one, we define the associated HTTP requests, their parameters, as well as the responses, including the HTTP headers. OpenAPI is an agnostic notation and, consequently, it isn't bound to any specific technology, framework, or programming language. However, AWS-specific extensions are available. One of these extensions is x-amazon-apigateway-integration, which allows a REST endpoint to connect to the API Gateway.
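For illustration, an integration element in an endpoint definition looks roughly like this; note that this is a minimal sketch, and the account ID in the Lambda ARN is a placeholder rather than a value from the project:

YAML
paths:
  /orders:
    get:
      # AWS extension wiring this endpoint to a Lambda function through the
      # API Gateway proxy integration; the account ID (123456789012) is a placeholder.
      x-amazon-apigateway-integration:
        type: aws_proxy
        httpMethod: POST   # Lambda invocations are always POSTed by API Gateway
        uri: arn:aws:apigateway:eu-west-3:lambda:path/2015-03-31/functions/arn:aws:lambda:eu-west-3:123456789012:function:MoneyTransferOrderFunction/invocations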
As you can see looking at the OpenAPI YAML definition, each endpoint includes an element labeled x-amazon-apigateway-integration which declares, among other things, the URL of the Lambda function to which the call will be forwarded.

The Project

OK, we have an OpenAPI specification of our API. In order to generate an API Gateway stack out of it and deploy it on AWS, we will use SAM, as explained above. For more details on SAM and how to use it, please don't hesitate to have a look here. Our Java project containing all the required elements may be found here. Once you've cloned it from GitHub, open the file template.yaml. We reproduce it below:

YAML
AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: Send Money SAM Template
Globals:
  Function:
    Runtime: java11
    MemorySize: 512
    Timeout: 10
    Tracing: Active
Parameters:
  BucketName:
    Type: String
    Description: The name of the S3 bucket in which the OpenAPI specification is stored
Resources:
  SendMoneyRestAPI:
    Type: AWS::Serverless::Api
    Properties:
      Name: send-money-api
      StageName: dev
      DefinitionBody:
        Fn::Transform:
          Name: AWS::Include
          Parameters:
            Location:
              Fn::Join:
                - ''
                - - 's3://'
                  - Ref: BucketName
                  - '/openapi.yaml'
  MoneyTransferOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: MoneyTransferOrderFunction
      CodeUri: send-money-lambda/target/send-money.jar
      Handler: fr.simplex_software.aws.lambda.send_money.functions.MoneyTransferOrder::handleRequest
      Events:
        GetAll:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: GET
        Get:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders/{ref}
            Method: GET
        Create:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: POST
        Update:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders
            Method: PUT
        Delete:
          Type: Api
          Properties:
            RestApiId:
              Ref: SendMoneyRestAPI
            Path: /orders/{ref}
            Method: DELETE
  ConfigLambdaPermissionForMoneyTransferOrderFunction:
    Type: "AWS::Lambda::Permission"
    DependsOn:
      - SendMoneyRestAPI
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref MoneyTransferOrderFunction
      Principal: apigateway.amazonaws.com

Our template.yaml file will create an AWS CloudFormation stack containing an API Gateway. This API Gateway will be generated from the OpenAPI specification that we just discussed. The DefinitionBody element in the SendMoneyRestAPI resource says that the API's endpoints are described by the file named openapi.yaml, located in an S3 bucket whose name is passed as an input parameter. The idea here is that we need to create a new S3 bucket, copy into it our OpenAPI specification in the form of a YAML file, and use this bucket as an input source for the AWS CloudFormation stack containing the API Gateway. A Lambda function named MoneyTransferOrderFunction is defined in this same SAM template as well. The CodeUri parameter configures the location of the Java archive containing the associated code, while the Handler one declares the name of the Java method implementing the AWS Lambda request handler. Last but not least, the Events section sets the HTTP requests that our Lambda function serves.
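For reference, here is a rough sketch of what such a request handler might look like, using the standard aws-lambda-java-events types; this is a hypothetical illustration, not the actual class from the repository:

Java
package fr.simplex_software.aws.lambda.send_money.functions;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;

// Hypothetical handler: API Gateway forwards every /orders request here as a
// proxy event, and the handler dispatches on the HTTP method.
public class MoneyTransferOrder implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>
{
  @Override
  public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent input, Context context)
  {
    switch (input.getHttpMethod())
    {
      case "GET":
        // GET /orders returns the full list; GET /orders/{ref} a single order.
        return new APIGatewayProxyResponseEvent().withStatusCode(200).withBody("[]");
      case "POST":
      case "PUT":
      case "DELETE":
        // Create, update, or remove a money transfer order.
        return new APIGatewayProxyResponseEvent().withStatusCode(200);
      default:
        return new APIGatewayProxyResponseEvent().withStatusCode(405);
    }
  }
}

Presumably, the real implementation delegates to the Java classes listed in the table at the beginning of this post (GetMoneyTransferOrders, CreateMoneyTransferOrder, and so on).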
As you can see in the template, there are five endpoints, labeled as follows (each defined in the OpenAPI specification):

GetAll mapped to the GET /orders operation
Get mapped to the GET /orders/{ref} operation
Create mapped to the POST /orders operation
Update mapped to the PUT /orders operation
Delete mapped to the DELETE /orders/{ref} operation

To build and deploy the project, proceed as shown in the listing below:

Shell
$ mkdir test-aws
$ cd test-aws
$ git clone https://github.com/nicolasduminil/aws-showcase
...
$ mvn package
...
$ ./deploy.sh
...
make_bucket: bucketname-3454
upload: ./open-api.yaml to s3://bucketname-3454/openapi.yaml
Uploading to 73e5d262c96743505970ad88159b929b  2938384 / 2938384  (100.00%)

Deploying with following values
===============================
Stack name            : money-transfer-stack
Region                : eu-west-3
Confirm changeset     : False
Disable rollback      : False
Deployment s3 bucket  : bucketname-3454
Capabilities          : ["CAPABILITY_IAM"]
Parameter overrides   : {"BucketName": "bucketname-3454"}
Signing Profiles      : {}

Initiating deployment
=====================
Uploading to b0cf548da696c5a94419a83c5088de48.template  2350 / 2350  (100.00%)
Waiting for changeset to be created..
CloudFormation stack changeset
...
Successfully created/updated stack - money-transfer-stack in eu-west-3
Your API with ID mtr6ryktjk is deployed and ready to be tested at https://mtr6ryktjk.execute-api.eu-west-3.amazonaws.com/dev

In this listing, we start by cloning the Git repository containing the project. Then, we execute a Maven build, which packages the Java archive named send-money.jar, after having performed some unit tests. The script deploy.sh, as its name implies, is responsible for performing the deployment operation. Its code is reproduced below:

Shell
#!/bin/bash

RANDOM=$$
BUCKET_NAME=bucketname-$RANDOM
STAGE_NAME=dev
AWS_REGION=$(aws configure list | grep region | awk '{print $2}')

aws s3 mb s3://$BUCKET_NAME
echo $BUCKET_NAME > bucket-name.txt
aws s3 cp open-api.yaml s3://$BUCKET_NAME/openapi.yaml
sam deploy --s3-bucket $BUCKET_NAME --stack-name money-transfer-stack --capabilities CAPABILITY_IAM --parameter-overrides BucketName=$BUCKET_NAME
aws cloudformation wait stack-create-complete --stack-name money-transfer-stack
API_ID=$(aws apigateway get-rest-apis --query "items[?name=='send-money-api'].id" --output text)
aws apigateway create-deployment --rest-api-id $API_ID --stage-name $STAGE_NAME >/dev/null 2>&1
echo "Your API with ID $API_ID is deployed and ready to be tested at https://$API_ID.execute-api.$AWS_REGION.amazonaws.com/$STAGE_NAME"

Here, $$ is the shell variable that expands to the current process ID. Assigning it to RANDOM seeds Bash's pseudo-random number generator, so that $RANDOM then yields a random number. By appending this randomly generated number to the name of the S3 bucket that will store the OpenAPI specification file, we satisfy the bucket name's global uniqueness condition. This bucket name is then stored in a local file, such that it can later be retrieved and cleaned up. Notice also the aws configure command, used to get the current AWS region. The command aws s3 mb creates the S3 bucket; here, mb stands for make bucket. Once the bucket is created, we use it to store the open-api.yaml file containing the API specification. This is done with the aws s3 cp command. Now, we are ready to start the deployment process, which is done through the sam deploy command. Since this operation might take a while, we need to wait until the AWS CloudFormation stack is completely created before continuing.
This is done by the statement aws cloudformation wait, as shown in the listing above. The last operation is the deployment of the previously created API Gateway, done by running the aws apigateway create-deployment command. Here we need to pass, as an input parameter, the API Gateway identifier, retrieved using the aws apigateway get-rest-apis command, which returns information about all the current API Gateways. Then, using the --query option, we filter the JSON payload in order to find ours, named send-money-api. At the end of its execution, the script displays the URL of the newly created API Gateway. This is the URL that can be used for testing purposes. For example, you may use Postman, if you have it installed, or simply the AWS Console, which provides a nice and intuitive test interface.

If you decide to use the AWS Console, you need to select the API Gateway service, and you'll be presented with the list of all currently existing ones. Clicking on the one named send-money-api will display the list of the endpoints to be tested. You need to start, of course, by creating a new money transfer order. You can do this by pasting the JSON payload below into the request body:

JSON
{
  "amount": 200,
  "reference": "reference",
  "sourceAccount": {
    "accountID": "accountId",
    "accountNumber": "accountNumber",
    "accountType": "CHECKING",
    "bank": {
      "bankAddresses": [
        {
          "cityName": "poBox",
          "countryName": "countryName",
          "poBox": "cityName",
          "streetName": "streetName",
          "streetNumber": "10",
          "zipCode": "zipCode"
        }
      ],
      "bankName": "bankName"
    },
    "sortCode": "sortCode",
    "transCode": "transCode"
  },
  "targetAccount": {
    "accountID": "accountId",
    "accountNumber": "accountNumber",
    "accountType": "CHECKING",
    "bank": {
      "bankAddresses": [
        {
          "cityName": "poBox",
          "countryName": "countryName",
          "poBox": "cityName",
          "streetName": "streetName",
          "streetNumber": "10",
          "zipCode": "zipCode"
        }
      ],
      "bankName": "bankName"
    },
    "sortCode": "sortCode",
    "transCode": "transCode"
  }
}

If the status code appearing in the AWS Console is 200, the operation has succeeded, and you can now test the two GET operations: the one retrieving all the existing money transfer orders and the one getting a money transfer order identified by its reference. For the latter, you need to initialize the input parameter of the HTTP GET request with the value of the money transfer order reference which, in our test, is simply "reference". In order to test the PUT operation, just paste into its body the same JSON payload used previously to test the POST, and slightly modify it; for example, change the amount from 200 to 500. Now test the two GET operations again: they should retrieve the newly updated money transfer order, this time with an amount of 500. When you've finished playing with the AWS Console interface, test the DELETE operation, passing the same reference as its input parameter. After that, the two GET operations should return an empty result set.

If you're tired of using the AWS Console, you can switch to the provided integration test. First, you need to open the FunctionsIT class in the send-money-lambda Maven module. Here, you need to make sure that the static constant named AWS_GATEWAY_URL matches the URL displayed by the deploy.sh script. Then compile and run the integration tests as follows:

Shell
mvn test-compile failsafe:integration-test

You should see statistics showing that all the integration tests have succeeded.
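You can also smoke-test the deployed API directly from the command line; here is a minimal sketch, assuming the API ID, region, and stage printed by deploy.sh in the listing above:

Shell
# Retrieve all registered money transfer orders (an empty list right after deployment).
$ curl https://mtr6ryktjk.execute-api.eu-west-3.amazonaws.com/dev/orders

Have fun!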
The advent of the Internet has brought revolutionary changes to the IT world. One of the notable changes is that virtualization has advanced with the Internet to become an integral part of the IT infrastructure of modern organizations. As a result, companies now rely on the virtual online entity housing data and services, commonly referred to as the cloud. The switch to the cloud was brought on by the exponential data growth of the last couple of decades. In fact, studies predict that by 2025, the cloud will be storing up to 100 zettabytes of data.

What Is the Cloud?

The cloud refers to a global network of remote servers, each with a unique function, that are connected and work together as a unitary ecosystem. In simple terms, the cloud describes what we commonly know as the "internet." This remote network of servers is designed to store and manage data, run applications, or deliver content or a service, such as streaming videos or accessing social media networks, for anyone with an internet connection.

What Is Cloud Computing?

Cloud computing is the provision of computing resources such as servers, storage, databases, networking, software, analytics, and intelligence over the cloud (internet). It eliminates the need for enterprises to acquire, configure, or manage resources themselves; instead, they only pay for what they use. Virtual computers gained popularity in the 1990s, when the IT industry started to rent virtual private networks. Their use sped up the development of the cloud computing infrastructure that organizations use today. Cloud computing offers a variety of benefits for businesses, some of the key ones being:

Flexible resources
Cost savings
Scalability with growing business needs
Data recovery
Security

With that being said, there are four main types of cloud computing deployments:

Public Cloud - An open infrastructure for general public usage.
Private Cloud - Computing infrastructure that's exclusively used by a single organization.
Hybrid Cloud - A combination of private and public cloud infrastructures.
Community Cloud - A collaborative cloud infrastructure shared by a community of organizations with similar requirements and regulations.

Single and multi-cloud concepts come from employing these deployment types from either one or numerous vendors.

What Is a Single Cloud?

Single cloud is a cloud computing model where organizations rely on a single third-party vendor for their cloud computing services. The provider maintains the servers that deliver any of the following cloud services in the single-cloud environment:

Software-as-a-Service (SaaS) - A software on-demand service allowing users to utilize cloud-based applications such as email.
Infrastructure-as-a-Service (IaaS) - Provides computing resources hosted on the cloud. Amazon Web Services (AWS) is a famous IaaS example.
Platform-as-a-Service (PaaS) - Offers a development and deployment environment hosted on a provider's cloud infrastructure. A good example in this category is Google App Engine.

Single Cloud Use Cases

The single cloud strategy is suitable for companies with the following use cases:

Strict organizational regulations are in place for data and workload governance.
There are not enough skilled cloud engineers for efficient cloud workload management.
The cloud workload is small enough for a single provider to manage.

Single Cloud Strategy Advantages

It is easier to manage, as it does not require workload migration between multiple cloud providers.
Privacy and control are maintained.
Needs limited resources in terms of cloud engineering staffing, as well as managing vendor relationships.
Faster workload handling with a single provider.
Reduced risk of data inconsistencies.
Easier to hold a single vendor accountable in case of any cloud issues.

Single Cloud Strategy Disadvantages

Hard to avoid vendor lock-in with single-platform dependencies.
It can cost more to have all workloads managed by a single vendor.
Choosing the right vendor is difficult, as a single provider has limited cloud resources and flexibility in design.
Risk of cloud resource unavailability due to any cloud issue that results in a single point of failure.

What Is Multi-Cloud?

Multi-cloud describes a cloud computing model where organizations use multiple cloud providers for their infrastructure requirements. The name multi-cloud refers to the use of multiple cloud providers, accounts, availability zones, premises, or a combination of them.

Multi-Cloud Use Cases

The multi-cloud strategy is suitable for companies with the following use cases:

You are unable to fulfill business requirements with a single cloud.
Multi-cloud meets the proximity requirements of your globally distributed users and service requirements in different regions.
The workload is big, varying, and needs to be distributed, which calls for specific cloud services.
The regulations you are subject to require keeping some data in private clouds for security reasons.

Multi-Cloud Strategy Advantages

Organizations consider a multi-cloud environment for the following benefits:

It is a creative approach to simultaneously executing disparate workloads that offers customizable and flexible cloud services.
Organizations spend less by moving workloads between multiple clouds offering the required services at the best prices.
You can switch vendors to ensure data availability, reducing vulnerability to cloud issues.
Having multiple vendors reduces vendor dependencies and saves you from being locked into a single vendor.
Multiple cloud providers in different deployment regions enable you to meet data sovereignty requirements for global cloud services. This minimizes concerns about non-compliance with government regulations.

Multi-Cloud Strategy Disadvantages

The multi-cloud model comes with the following disadvantages:

Multi-cloud management can get complicated due to issues such as multi-vendor management, cloud computing inconsistencies and inefficiencies, as well as task redundancies.
Data migration between multiple cloud vendors can have cost overheads and slow down performance.
Workload implementation can be inconsistent due to distribution among multiple clouds.
Companies require extensive cloud engineering expertise to manage multi-cloud computing.
Single Cloud vs. Multi-Cloud: The Key Differences

This table gives you a side-by-side comparison of the single cloud vs. multi-cloud strategies:

| Differences | Single Cloud | Multi-Cloud |
| --- | --- | --- |
| Vendors | Single vendor dependency | Multiple vendors offering more control |
| Cost | Payment to one provider | Payment to multiple providers |
| Purpose | Provides a single service | Handles multiple services with multiple solutions |
| Required Skillset | Fewer cloud engineers required to manage the cloud | Requires extensive cloud engineering teams with strong multi-cloud expertise |
| Security | Easier to ensure data compliance | Less secure with distributed sensitive data |
| Disaster Recovery | Single point of failure, making it vulnerable to disasters | Easier disaster recovery |
| Management | Easier management | Complex management |

The Cloud Portability Myth Under the Multi-Cloud Model and Potential Workarounds

Migrating cloud services in a multi-cloud environment is always vulnerable to disruption. Cloud portability potentially reduces this vulnerability by facilitating the transfer of services between cloud environments with minimal disruption. While cloud portability may seem practical, some underlying complexities render this concept mythical. Essentially, cloud environments are migrated as compiled containers, which makes an entire cloud environment portable. However, while the containers may be portable, other public clouds cannot execute them without the underlying cloud-native services. Consequently, migrating this way defeats the purpose of employing a multi-cloud strategy.

Achieving cloud portability may be complex, but companies still opt for the multi-cloud strategy to keep up with their competitors. The key is to find a way to work around this myth to run your multi-cloud models successfully. A trial-and-error approach would be to make multiple copies of compiled containers, one for each cloud environment. The container copy that offers the correct solution passes for deployment on the other cloud platforms. Alternatively, you can use a Platform-as-a-Service option to provide portable services that are not dependent on specific cloud platforms. This makes migrating such an application platform achievable for organizations.

Single Cloud vs. Multi-Cloud Strategy: Which Is Better?

When it comes to single cloud vs. multi-cloud strategies, businesses are increasingly adopting the multi-cloud model. This strategy is favored as it allows you to work globally, with data and applications spread across various cloud servers and data centers. However, such a model only suits large organizations, because setting up and maintaining a multi-cloud environment is a costly and complex task. Additionally, multi-cloud environments require extensive resources and robust strategies to optimize cloud migration. It is important to note that despite the use of optimized strategies, cloud portability still remains a myth for multi-cloud organizations. Primarily, at some point, your cloud portability workarounds are bound to become too complex to manage. These complexities include:

Lack of knowledgeable staff
Absence of holistic disaster management
Security gaps

Are all these complexities worth investing in a multi-cloud strategy? The answer depends on your company's use cases. However, another key consideration in this case is focusing on choosing the "right vendor" on top of debating the single cloud vs. multi-cloud strategies, as it is vital to finding the best solution for your business.
Conclusion

Depending on your use case, being locked to a single vendor can do an organization more good than delving into multi-vendor complexities; the opposite is also true. To sum it up, instead of working around a myth, cloud optionality gives you a better chance of adopting a successful cloud strategy. While it may prolong the vendor selection process, determining whether a single cloud or a multi-cloud strategy is right for your business can save your company from costly cloud expenses.
In today's rapidly evolving technology landscape, cloud infrastructure has become an indispensable part of modern business operations. To manage this complex infrastructure, documenting its setup, configuration, and ongoing maintenance is critical. Without proper documentation, it becomes challenging to scale the infrastructure, onboard new team members, troubleshoot issues, and ensure compliance. At Provectus, I have witnessed the advantages of handing over projects with proper documentation and how it allows a successful transition and preserves customer satisfaction. Whether you are an active engineer, an engineering team leader, or a demanding user of cloud infrastructure, in this article, I will help you understand the importance of documentation and offer some easy steps for implementing best practices.

Why Is Documentation Important?

Documentation is a key feature that allows for the consistent maintenance of any process. It is a storehouse of intelligence that can be accessed for future reference and replicated if needed. For example, if an engineer or anyone in the organization has performed, tested, and improved a process, failure to document it would be a waste of intellectual capital and a loss to the organization. Documentation is important for many reasons:

It helps to keep processes and systems up to date for usage
It helps with the onboarding and training of new team members
It helps to improve security by imposing boundaries
It functions as a means of proof for audits
It provides a starting point when documenting from scratch
It helps to continuously improve processes

Documenting your cloud infrastructure is imperative for its smooth and efficient operation.

What Should Be Documented for Cloud Infrastructure?

In the past, building a computing infrastructure required huge investment and vast planning, taking into consideration the required expertise in the field and the needs of your organization. Once servers and hardware were purchased, it was very difficult to make any significant changes. The cloud brought with it significant improvements, making it much easier and more feasible to implement the infrastructure. But still, the ability to make changes and improvements is highly dependent on accurate documentation. Following is a basic list of requirements for documentation to ensure that your cloud infrastructure is easy to use and update.

Architecture Diagrams

An architecture diagram is a visual representation of cloud components and the interconnections that support their underlying applications. The main goal of creating an architecture diagram is to communicate with all stakeholders (clients, developers, engineers, and management) using a common language that everyone can understand. To create a diagram, you need a list of components and an understanding of how they interact. You may need to create multiple diagrams if the architecture is complex or if it has several environments. There are user-friendly tools to help you with this first step, many of which are free: for example, Diagrams.net (formerly Draw.io), Miro, SmartDraw, Lucidchart, and others. Creating an architecture diagram will help with future planning and design when you are ready to improve the infrastructure. It will help you to easily spot issues or areas that need improvement. Your diagram can also help with troubleshooting: engineers will be able to use it to detect flaws in the system and discover their root causes. It will also help with compliance and security requirements.
How-To Instructions

Your infrastructure will likely host many features and applications that require specific steps for access. How-to instructions provide end users with a detailed step-by-step guide that streamlines various processes and saves time. Such instructions are sometimes referred to as detailed process maps, DIYs (do it yourself), walkthroughs, job aids, tutorials, runbooks, or playbooks. Some examples of processes that can benefit from how-to instructions include:

How to request access for developers
How to subscribe to an SNS topic
How to rotate IAM Access Keys
How to retrieve ALB logs

Policies

Your cloud infrastructure will have its own policies, whether they are predefined by the IT department or created in collaboration with different teams. Some policies that can be documented include:

Access policies: What security measures are in place, and what is required for various individuals, groups, or roles to gain access? What are the premises and procedures for access removal? Are we compliant with the least-privilege access best practice?
Security policies: Protective policies for management, practices, and resources for data in the cloud.
Data privacy policies: Data must be classified and collected in ways that keep it secure and protected from unauthorized access.
Compliance policies: Which regulations and auditing processes must be complied with to use cloud services? What are the responsibilities of infrastructure team members?
Incident and change management: Define the necessary steps to respond to incidents and changes; define outage prioritization, SLA response time, ownership, and post-mortem processes.
Monitoring: Along with incident management, there should be documentation of monitors and channels in place to ensure that the infrastructure is up and running. Monitoring is a 24/7 preventative approach to incident management.

Disaster Recovery

A Disaster Recovery plan is one of the most important yet least prioritized documents. It should outline the procedures needed to restore services after a disaster event. The document should cover at least the following items:

Scope
Steps to restore service as soon as possible
How to determine damage or data loss, including risk assessment
Emergency response - who should be notified and how?
Steps to back up all data and services

The main goal of a Disaster Recovery plan is to ensure that business operations continue, even after a disaster. Failure to recover presents a large gap in the infrastructure.

Best Practices You Should Follow

Formatting

When creating documentation, it is important to follow certain rules. Let's identify them:

Organization: A stable company will usually have a brand book that establishes boundaries and provides guidelines for content. In the case of documentation, you may need to use a specific font, size, and layout, and you may be required to include a logo or other elements. Before documenting, find out what the company requirements are. If there are no established guidelines, create your own to establish consistency across your department.
Grammar: The way you communicate while documenting should also follow a standard. Some best practices include: use an active voice, e.g., "The entire infrastructure is described as code via Terraform," and avoid a passive voice, e.g., "Terraform was used to describe the entire infrastructure as code." Avoid long sentences; stick with simply structured sentences that are easy for the reader to follow. Create a glossary of abbreviations, and use consistent terminology.
For example, if you mention an SSL certificate but later refer to it as TLS, the reader might be confused. Use appropriate verb tenses: for example, use the present tense for describing a procedure and the past tense for describing a completed action.
Storage: When saving the document, always use a conventional name that makes it easy to find and share with others. Store the file in the most appropriate path or structure, such as a particular file system or a collaborative tool like Confluence. File naming example: departmentname_typeofdocument_nameofdocument_mm_yyyy, e.g., ManagedServices_internal_stepsfordocumentation_03_2023

Content

How you display your document's content plays a relevant role in the entire process. A document that is attractively laid out and easy to read will help to prevent confusion and avoid unnecessary questions. Here are some tips for you:

Screenshots: A picture is worth a thousand words. Use screenshots to help the user better relate to your instructions, e.g., "Within your AWS Account, go to the EC2 Dashboard and check the Security groups."
Diagrams: A flow chart provides a visual aid to help you describe a step-by-step process so that the reader can easily identify which step they are on, e.g.: Open the console -> Ping the corresponding IP -> If you get an error, copy and paste the message -> Open a ticket in AnyDesk -> Paste the error message -> Assign to AnyTeam.
Table of contents: Use heading formats to create a table of contents. If the document is quite large, the reader will have the option to jump to a specific section. That reader could be you, wanting to update the document a few months later!
Troubleshooting: Readers will likely have some issues when putting your document into action. Be sure to include a troubleshooting section to help resolve common problems.

Lifecycle

One of the most common mistakes in documenting is to think documentation is over because the project is up and running. Keeping your documentation up to date is an important part of the documentation lifecycle:

Maintenance: Considering that your infrastructure is constantly changing, your documentation must be kept current. Outdated documentation will misinform others and could trigger disastrous actions.
Backup: Always keep a backup of your documents. Ideally, your place of storage should have certain features by default, like version control, searching, filtering, collaboration, etc. But it is also a good practice to keep your own backup; it might be useful one day.
Share: Once you have completed documentation, share it with potential users and ask for feedback. They can help suggest improvements that make your documentation more robust.

Conclusion

If you are not 100% convinced about the benefits of documentation, think of it this way: no one wants to waste time figuring out someone else's work or reinventing the wheel by creating a project that has already grown and evolved. Documentation that is clear, concise, and easy to understand is the first step toward building a successful cloud infrastructure.
What Is a Highly-Available System?

You call a system highly available when it can remain operational and accessible even when there are hardware and software failures. The idea is to ensure continuous service. We all want our systems to be highly available. It seems like a good thing to have and makes for a nice bullet point in our application description. But designing a high-availability system is not an easy task. So, how can you go about it? The most reliable approach is to leverage the concept of static stability. But before we get to the meaning of this term, it's important to understand the concept of availability zones.

What Are Availability Zones?

You must have heard about availability zones in AWS or other cloud platforms. If not, here's a quick definition of the term in the context of AWS: Availability Zones are isolated sections of an AWS region. They are physically separated from each other by a meaningful distance so that a single event cannot impair them all at once. For perspective, this single event could be a lightning strike, a tornado, an earthquake, or even a Godzilla attack. That makes it really clear that building an availability zone is not trivial engineering. To achieve this incredible level of separation, availability zones don't share power or other infrastructure. However, they are connected with fast and encrypted fiber-optic networking so that application failover can be smooth as butter. This means that in the case of a catastrophic hardware or software failure, workloads can be quickly and seamlessly transferred to another server without loss of data or interruption of service. Moreover, the use of encryption ensures that sensitive data transmitted across the network stays secure from any type of unauthorized access. Here's a picture showing the AWS Global Infrastructure from a few years ago (source: AWS website). The orange circles denote a region, and the number within each circle is the number of availability zones within that region.

What's Static Stability?

Let's get back to our key term: static stability. Availability zones let you build systems with high availability. But you can go about it in two ways:

Reactive
Proactive

In a reactive approach, you let the service scale up in another availability zone after there is some sort of disruption in one of the zones. You might use something like an AWS Auto Scaling group to manage the scale-up automatically. But the idea is that you react to impairments when they happen rather than being prepared in advance. In a proactive approach, you over-provision the infrastructure so that your system continues to operate satisfactorily even in the case of disruption within a particular Availability Zone. The proactive approach ensures that your service is statically stable. A lot of AWS services use static stability as a guiding principle. Some of the most popular ones are:

AWS EC2
AWS RDS
AWS S3
AWS DynamoDB
AWS ElastiCache

If your system is statically stable, it keeps working even when a dependency becomes impaired. For example, the AWS EC2 service supports static stability by targeting high availability for its data plane (the one that manages existing EC2 instances). This means that once launched, an EC2 instance has local access to all the information it needs to route packets. The main benefit of this approach is that instances can operate independently and maintain their own local state even in the case of a network or service disruption.
However, leveraging static stability is not just for the cloud provider. You can also use static stability when designing your own applications for the cloud. Let's look at a couple of patterns that use the concept of static stability.

Pattern 1: Active-Active High Availability Using AZs

Here's an example of how you can implement a load-balanced HTTP service. You have a public-facing load balancer targeting an auto scaling group that spans three availability zones in a particular region. Also, you make sure to over-provision capacity by 50% (a CloudFormation sketch of this setup follows the takeaway below). If an AZ goes down for whatever reason, you don't need to do much to support the system. The EC2 instances within the problematic AZ will start failing health checks, and the load balancer will shift traffic away from them. This is an important mechanism, since constant monitoring helps the load balancer quickly identify any instances that are experiencing issues and perform the appropriate failover without human intervention. Since the setup is statically stable, it will continue to remain operational without hiccups.

Pattern 2: Active-Standby on Availability Zones

The previous pattern dealt with stateless services. However, you might also need to implement high availability for a stateful service. A prime example is a database system such as Amazon RDS. A typical high-availability setup for this requirement needs a primary instance that takes all the writes and a standby instance kept in a different availability zone. When the primary AZ goes down for whatever reason, RDS manages the failover to the new primary (the standby instance). Again, since we have already over-provisioned, there is no need to create new instances. The switchover can happen seamlessly without impacting availability. In essence, the service is statically stable.

So, What's the Takeaway?

In both patterns, you have already provisioned the capacity needed in case an availability zone goes down. In either case, you are not trying to create new instances on the fly, since you have already over-provisioned the infrastructure across AZs. This means your systems are statically stable and can easily survive outages or disruptions. In other words, your system is highly available in a proactive manner, which is an extremely good characteristic to have.
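To make the over-provisioning arithmetic in Pattern 1 concrete, here is a hypothetical CloudFormation sketch; the subnet IDs, resource names, and launch template are placeholders, not taken from any real deployment:

YAML
# Pattern 1 sketch: an auto scaling group pinned across three AZs with
# static 50% over-provisioning. If peak load needs 6 instances, running
# 9 means losing a whole AZ still leaves 6 healthy ones, with no
# reactive scaling required. All IDs below are placeholders.
Resources:
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier:            # one subnet per availability zone
        - subnet-aaa111
        - subnet-bbb222
        - subnet-ccc333
      MinSize: '9'
      MaxSize: '9'
      DesiredCapacity: '9'          # 6 would carry the peak; 9 = 50% extra
      HealthCheckType: ELB          # let the load balancer's health checks decide
      TargetGroupARNs:
        - !Ref WebServerTargetGroup # target group of the public-facing ALB
      LaunchTemplate:
        LaunchTemplateId: !Ref WebServerLaunchTemplate
        Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber

Over to You

Does high availability matter to you? If yes, how do you handle it within your applications? What techniques do you use? Write your replies in the comments section. The inspiration for this post came from a wonderful paper released as part of the Amazon Builders' Library. You can check it out in case you are interested in going deeper into the theoretical foundations of static stability. If you found today's post useful, consider sharing it with friends and colleagues.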
A couple of years ago, I developed an app that helped me manage my conference submission workflow. Since then, I have been a happy user of the free Heroku plan. Last summer, Heroku's owner, Salesforce, announced that it would stop the free plan in November 2022. I searched for a new hosting provider and found Scaleway. In this post, I'd like to explain my requirements, why I chose them, and my experience using them.

The Context

I've already described the app in previous blog posts, especially the deployment part. Yet, here's a summary in case you don't want to reread them. The source of truth is Trello, where I manage the state of my conference CFPs: Backlog, Submitted, Abandoned, Accepted, and Published (on this blog). The "done" state is when I archive a Published card. I wrote the application in Kotlin with Spring Boot. It's a web app that listens to change events from Trello via webhooks. An event starts a BPMN workflow based on Camunda. The workflow manages my Google Calendar and a Google Sheet file. For example, when I move a card from Backlog to Submitted, it adds the conference to my calendar. The event is labeled as "Free" and has a particular gray color to mark it as a placeholder. It also adds a line in the Google Sheet with the status "Submitted." When I move the card from Submitted to Accepted, it changes the Google Calendar event color to the default and marks it as "Busy." It also changes the Google Sheet status to "Accepted."

Why Scaleway?

As I mentioned in the introduction, I was a happy Heroku user. One of the great things about Heroku, apart from the free plan, was the "hibernating" feature: when the app was not in use, it switched it off. In essence, it was scale-to-zero for web apps. The first request in a while was slower, but it wasn't an issue for my usage. The interesting bit is that scale-to-zero was not a feature but Heroku's way to keep costs low. Outside of Heroku's free plan, automatic scaling can only scale down to 1. I'm a big fan of Viktor Farcic's YouTube channel, DevOps Toolkit. At the same time Heroku announced the end of its free plan, I watched "Scaleway - Everything We Expect From A Cloud Computing Service?". By chance, Scaleway offers free credits to startups, including the company I'm currently working for. It didn't take long for me to move the application to Scaleway.

Deploying on Scaleway

Before describing how to deploy on Scaleway, let's explain how I deployed on Heroku. The latter provides a Git repo. Every push to master triggers a build based on what Heroku can recognize. For example, if it sees a pom.xml, it knows it's a Maven project and calls the Maven command accordingly. Under the hood, it creates a regular Docker container, then stores and runs it. For the record, this approach is the foundation of Buildpacks, which Heroku co-created along with VMware. On Heroku, developers follow their regular workflow, and the platform handles both the build and the deployment parts. Scaleway offers a dedicated scale-to-zero feature for its Serverless Containers offering. First, you need an already-built container. When I started to use it, the container had to be hosted on Scaleway's dedicated Container Registry; now, it can be hosted anywhere. On the UI, one chooses the container to deploy and fills in environment variables and secrets, and Scaleway deploys it.

Main Issues

I stumbled upon two main issues using Scaleway so far.

The GUI is the only way to deploy a container: It's a good thing to start with, but it doesn't fit regular usage.
The industry standard is based on build pipelines, which compile, test, create container images, store them in a registry, and deploy them on remote infrastructure.

You need to fill in secrets on every deployment: GitHub and GitLab both allow configuring deployed containers with environment variables. This way, one can create a single container but deploy it in different environments. You can configure some environment variables as secrets: nobody can read them afterward, and they don't appear in logs. Scaleway also offers secrets. However, you must fill them in at every deployment; beyond a couple of them, it's unmanageable.

Bugs and Scaleway's Support

In my short time using Scaleway, I encountered two bugs. The first bug was a long delay between the time I uploaded a container to Scaleway's registry and the time it was available for deployment. It lasted for a couple of days. The support was quick to answer the ticket, but afterward, it became a big mess. There were more than a couple of back-and-forth messages until the support finally acknowledged that the bug affected everybody. The worst part was one of the messages telling me it was due to an existing running container of mine failing to start, i.e., that the bug was on my side. The second bug happened on the GUI: the deployment form reset itself while I was filling in the different fields. I tried to be fast enough when filling it in, but to no avail. The same happened as with the previous issue: many back-and-forths, and no actual fix. Finally, I tried again a couple of days after I created the ticket, and it worked, so I informed the support. They answered that this was normal because they had fixed it, but without telling me. Finally, I opened a ticket to ask whether an automated deployment option was possible. After several messages, the support redirected me to a GitHub project. The latter offers a GitHub Action that seemed to fulfill my requirement. Unfortunately, it cannot provide a way to configure the deployed container with environment variables. The only alternative the support offered was to embed environment variables in the container, including secrets. Regardless of the issue, the support's relevance ranges from average to entirely useless.

Logging

All cloud providers I've tried so far offer a logging console. My experience is that the console looks and behaves like a regular terminal console: the oldest log line is at the top and the newest at the bottom, and one can scroll through the history, limited by a buffer. Scaleway's approach is completely different. It orders the log lines in the opposite order, newest first and oldest last. Worse, there's no scrolling but pagination. Finally, there's no auto-refresh! One has to paginate back and forth to refresh the view, and if new log lines appear, pages don't display the same data. It severely impairs the developer experience and makes it hard to follow the logs. I tried to fathom why Scaleway implemented the logging console this way and came up with a couple of possible explanations:

Engineering doesn't eat its own dog food
Engineering doesn't care about Developer Experience
It was cheaper this way
Product said it was a bad Developer Experience, but Engineering did it anyway because of one of the reasons above and has more organizational power

In any case, it reflects poorly on the product.

Conclusion

Even though my usage of Scaleway is 100% free, I'm pretty unhappy about the deployment part. I came for the free credits and the scale-to-zero capability.
However, the lack of an acceptable automated deployment solution and the support's uneven quality (to be diplomatic) make me reconsider. On the other hand, the Scaleway Cloud service itself has been reliable so far. My Trello workflow runs smoothly, and I cannot complain. Scaleway is typical of a not-bad product ruined by an abysmally bad Developer Experience. If you're developing a product, be sure to take care of this aspect of things: the perception of your product can take a turn for the worse because of a lack of consideration for developers.

To Go Further: Google Cloud Run
Data migration is the process of moving data from one location to another and is an essential aspect of cloud migration, where it involves transferring data from on-premise storage to the cloud. With the rapid adoption of cloud computing, businesses are moving their IT infrastructure to the cloud. This shift from on-premise to cloud computing creates challenges for IT professionals, as it requires careful planning and execution. This article discusses the challenges and best practices of data migration when transferring on-premise data to the cloud. The article will also explore the role of data engineering in ensuring successful data transfer and integration, as well as different approaches to data migration.

Obstacles

Data migration poses several obstacles that businesses must address to ensure a smooth transition to the cloud. Some of the significant challenges of data migration include:

Data Compatibility

Compatibility is the primary challenge of data migration. It is essential to ensure that the data is compatible with the cloud platform before migrating it, and crucial to test data compatibility before migration, as data loss and corruption can occur if the data is not compatible with the cloud platform.

Security and Privacy

Security and privacy are significant concerns for businesses when migrating data to the cloud. It is crucial to keep data secure during migration, as lapses can lead to data breaches and the loss of sensitive data.

Data Integrity

Data integrity is another challenge of data migration. It is crucial to ensure that the data remains consistent and accurate during migration.

Downtime

Downtime is another challenge of data migration. It is essential to ensure that the migration process does not cause any downtime or interruptions to business operations.

Cloud Scaling

Cloud scaling options are an essential aspect of data migration. Cloud scalability is the ability of a cloud platform to scale up or down depending on the workload. The cloud platform should be able to handle the increased workload during the migration process. It should also be scalable enough to handle future workload increases. There are two types of cloud scalability options:

Vertical Scaling: Vertical scaling is the process of adding more resources to a single instance. This method is suitable for workloads that require more processing power, memory, or storage.
Horizontal Scaling: Horizontal scaling is the process of adding more instances to handle the workload. This method is suitable for workloads that require additional resources to handle traffic spikes.

Cloud Hardware Upgrade

Cloud hardware upgrades are critical to data migration. The cloud hardware should be up to date to handle the workload during the migration process. It is therefore essential to ensure that the cloud hardware is capable of handling the workload and is compatible with the cloud platform. Moving to the next generation of cloud hardware involves upgrading to the latest technology; it is essential to ensure that the cloud hardware is scalable and can handle the workload.

Traditional Methodology

The traditional methodology for data migration involves copying data from on-premise storage to the cloud. This method involves a large amount of data transfer, which can lead to data loss and corruption. The classical approach can also cause downtime and interruptions to business operations.

Adaptability

Adaptability is another important aspect of data migration.
Elasticity is the ability of the cloud platform to scale up or down depending on the workload. The cloud platform should be elastic enough to handle the increased workload during the migration process, as well as future workload increases.

Add-Ons

The cloud platform should have additional features to support data migration, such as data backup and recovery, data migration tools, and data monitoring tools. These features ensure that the data is backed up and can be recovered in case of data loss or corruption, and that the migration process runs smoothly.

IT Support Services

IT support services are crucial to the success of data migration. IT organizations should have the necessary expertise to plan and execute the migration process. They should also be able to provide support during the migration process to minimize downtime and interruptions to business operations.

Summary

To summarize, data migration is a complex process that requires careful planning and execution to avoid data loss, corruption, downtime, and interruptions to business operations. To mitigate these challenges, businesses need to consider cloud scalability options, upgrade cloud hardware, leverage elasticity, and use additional features to support data migration. IT organizations should also be involved in the process to ensure a successful transition to the cloud. Furthermore, businesses should consider alternative approaches to data migration, such as using migration tools designed to automate the migration process and reduce the risk of data loss and corruption. These tools can help to ensure a smoother transition to the cloud. Ultimately, businesses should approach data migration with caution and seek expert advice to ensure a successful migration process. With careful planning, execution, and the right support, businesses can achieve a smooth transition from on-premise storage to the cloud and enjoy the benefits of cloud computing, such as increased flexibility, scalability, and cost savings.
Along with extensively discussed trending topics such as AI, hyper-automation, blockchain, and edge computing, cloud computing will be a central component of many firms' IT strategies in the coming years. Its benefits of flexibility, agility, speed, and cost efficiency have become essential for many CIOs. Some businesses are still refining their overall cloud strategy and weighing fundamental decisions, such as whether to go public, private, or a mixture of both. Others have progressed further: they are working hard to modernize their applications and are taking advantage of the PaaS capabilities offered by the cloud to maximize its benefits.

Challenges Faced by Cloud Computing
Such firms can also address the essential concerns of cloud computing, such as security, data coherence, flexibility, and functional consistency, by focusing on one key element: cloud performance. A frequent question in cloud performance engineering is what performance the migrated and modernized system can achieve compared with a purely on-premises landscape. Will it be lower, similar, or even better?

Cloud Scalability Options
Many experts claim that, with the dynamic scaling options in the cloud, it is simple to grow a system linearly just by adding machines. That is certainly the first step to consider. As with on-premises systems, vertical scaling is usually exercised first: traditional hardware capacities such as CPUs and memory are increased. However, large firms' IT systems with high throughput, access rates, and peak loads eventually reach a breaking point. Ambitious expansion strategies combined with poorly organized applications can produce hardware requirements that outpace Moore's Law, so the requisite hardware is simply not available yet.

Next-Generation and Upgradation of Cloud Hardware
On one side, CIOs can hope that the next generation of hardware is about to enter the market and will soon be available to users. On the other side, horizontal scaling has gained a lot of traction: instead of enlarging servers, more servers are added for the same parts of the application. In many situations this requires substantial changes to the application itself, just as it does on-premises. Databases in particular need an elaborated concept that allows data to be persisted autonomously across many servers. For applications with a growing share of read-only transactions, there is an alternative that reaches the performance goals without "real" horizontal scaling: managed PaaS offerings. For example, Microsoft provides the Hyperscale service tier for SQL databases, which dynamically scales compute through caching techniques and distributes reads horizontally across read replicas that act as images of the database. AWS likewise provides read replicas for RDS MySQL, PostgreSQL, MariaDB, and Amazon Aurora, while Oracle Cloud relies on its popular Oracle RAC.
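To make the read-replica pattern concrete, here is a minimal Java sketch of application-level read/write splitting. The endpoint URLs, credentials, and driver are illustrative assumptions only, not values from any service discussed above; managed offerings such as Aurora expose exactly this kind of writer/reader endpoint pair.

import java.sql.Connection;
import java.sql.DriverManager;

public class ReadWriteRouter {
    // Hypothetical endpoints: one writer instance and one load-balanced
    // reader endpoint fronting the read replicas.
    private static final String WRITER_URL =
            "jdbc:mysql://mydb.cluster-example.rds.amazonaws.com:3306/app";
    private static final String READER_URL =
            "jdbc:mysql://mydb.cluster-ro-example.rds.amazonaws.com:3306/app";

    public static Connection forQuery(boolean readOnly) throws Exception {
        // Read-only transactions can fan out across replicas; writes must
        // still funnel to the single writer instance.
        return DriverManager.getConnection(
                readOnly ? READER_URL : WRITER_URL,
                "app_user",                      // placeholder credentials
                System.getenv("DB_PASSWORD"));
    }
}

The design point is simply that the growing share of read-only transactions scales horizontally across replicas, while writes remain on the writer.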
Classical Approach
Beyond vertical and horizontal scalability, cloud performance engineering offers further possibilities; most of the well-known on-premises options are available in the cloud as well. The most common classical approach is to tune your indexes, which determine I/O performance for over 80% of performance-relevant activity; if even a single index is missing, the performance of the entire IT system may suffer. As a result, cloud performance engineers should always prioritize database indexing. In addition, topics around batch processing and session handling, such as the definition of maximum batch sizes, connection durations, read frequencies, idle times, and connection pooling (of SSL connections, for example), can be decisive for system performance: pooling keeps your interface partner's CPUs from being overloaded by the opening of a new connection for each HTTPS request (a sketch follows below). It is likewise desirable to reduce the number of requests to the database and to apply caching mechanisms actively. Similarly, the number of instances, the number of threads, and the hardware itself can be varied until a self-defined performance target is reached.
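As a small illustration of the pooling advice, the following sketch uses HikariCP, a common Java connection pool; the JDBC URL, credentials, and pool size are assumptions for the example rather than recommendations.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PooledAccess {
    // One pool for the whole application: connections (and their SSL
    // handshakes) are established once and then reused across requests.
    private static final HikariDataSource DATA_SOURCE;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/app"); // placeholder
        config.setUsername("app_user");                                  // placeholder
        config.setPassword(System.getenv("DB_PASSWORD"));
        config.setMaximumPoolSize(10); // cap on concurrent connections
        DATA_SOURCE = new HikariDataSource(config);
    }

    public static String findName(long id) throws Exception {
        // Borrow from the pool instead of opening a new connection per call.
        try (Connection conn = DATA_SOURCE.getConnection();
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) { // indexed lookup
            stmt.setLong(1, id);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}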
Elasticity
In cloud computing, scalability is just one aspect of performance engineering. One of the features the cloud promises is fully automated elasticity, allowing resources to be adjusted dynamically to meet demand. The hurdle is that on-premises applications are usually designed with static environments in mind, so they must first be made responsive to dynamic scaling. This requires defining and testing dedicated test scenarios for the cloud, with attention on the interaction between the cloud and the applications. Essential metrics are how well the application responds to dynamic scaling, whether it loses connections or shows other unusual behavior, and whether it suffers the performance degradation that typically occurs on a scaling system.

Additional Features
Cloud service providers offer numerous new possibilities to quickly create test environments and to analyze and evaluate performance KPIs at runtime. The best way to cover a planned testing concept in the cloud is to combine existing testing tools with these new cloud-native options. It can even be preferable to rebuild an old application completely rather than heavily customize the existing one, particularly when important functional, non-functional, and technical requirements are not implemented in the current application.

Support of IT Organizations
IT organizations play a vital role here by supporting these activities in the best possible ways and improving cloud performance. They benefit from agile processes, modular container architectures, and time-to-market practices such as CI/CD pipelines. Most of the time, it pays to introduce such practices before moving to the cloud.

Conclusion
Even though shifting to the cloud offers many opportunities and benefits, cloud performance engineering is a challenge that must be met with both proven and new methods. Many large companies cannot simply rely on the cloud's automatic scalability: budgets and time frames must allow for the customization needed during implementation, and well-planned, high-level supervision is strongly recommended to deliver the best possible experience to users. Beyond performance, other testing activities, such as data integrity, security, and resilience, remain significant for delivering outstanding results. Good communication between all the teams involved, including the CEO, CIO, architects, cloud experts, and performance engineering specialists, is essential to achieve the shift to the cloud and to successfully establish this new discipline of cloud performance engineering.
This blog post is for folks interested in learning how to use Golang and AWS Lambda to build a serverless solution. You will be using the aws-lambda-go library along with the AWS Go SDK v2 for an application that will process records from an Amazon Kinesis data stream and store them in a DynamoDB table. But that's not all! You will also use Go bindings for AWS CDK to implement "Infrastructure-as-code" for the entire solution and deploy it with the AWS CDK CLI.

Introduction
Amazon Kinesis is a platform for real-time data processing, ingestion, and analysis. Kinesis Data Streams is a serverless streaming data service (part of the Kinesis streaming data platform, along with Kinesis Data Firehose, Kinesis Video Streams, and Kinesis Data Analytics) that enables developers to collect, process, and analyze large amounts of data in real time from sources such as social media, IoT devices, logs, and more. AWS Lambda, on the other hand, is a serverless compute service that allows developers to run their code without having to manage the underlying infrastructure.

The integration of Amazon Kinesis with AWS Lambda provides an efficient way to process and analyze large data streams in real time. A Kinesis data stream is a set of shards, and each shard contains a sequence of data records. A Lambda function can act as a consumer application and process data from a Kinesis data stream. You can map a Lambda function to a shared-throughput consumer (standard iterator) or to a dedicated-throughput consumer with enhanced fan-out. For standard iterators, Lambda polls each shard in your Kinesis stream for records over HTTP, and the event source mapping shares read throughput with the shard's other consumers.

Amazon Kinesis and AWS Lambda can be used together to build many solutions, including real-time analytics (allowing businesses to make informed decisions), log processing (proactively identifying and addressing server/application issues before they become critical), IoT data processing (analyzing device data in real time and triggering actions based on the results), clickstream analysis (providing insights into user behavior), fraud detection (detecting and preventing fraudulent card transactions), and more. As always, the code is available on GitHub.

Prerequisites
Before you proceed, make sure you have the Go programming language (v1.18 or higher) and AWS CDK installed. Clone the GitHub repository and change to the right directory:

git clone https://github.com/abhirockzz/kinesis-lambda-events-golang
cd kinesis-lambda-events-golang

Use AWS CDK To Deploy the Solution
To start the deployment, simply invoke cdk deploy and wait for a bit. You will see a list of resources that will be created and will need to provide your confirmation to proceed.

cd cdk
cdk deploy

# output
Bundling asset KinesisLambdaGolangStack/kinesis-function/Code/Stage...
✨ Synthesis time: 5.94s
This deployment will make potentially sensitive changes according to your current security approval level (--require-approval broadening).
Please confirm you intend to make the following modifications:
//.... omitted
Do you wish to deploy these changes (y/n)? y

This will start creating the AWS resources required for our application. If you want to see the AWS CloudFormation template that will be used behind the scenes, run cdk synth and check the cdk.out folder. You can keep track of the progress in the terminal or navigate to the AWS console: CloudFormation > Stacks > KinesisLambdaGolangStack.
Once all the resources are created, you can try out the application. You should have:

A Lambda function
A Kinesis stream
A DynamoDB table
Along with a few other components (like IAM roles, etc.)

Verify the Solution
You can check the table and Kinesis stream info in the stack output (in the terminal or the Outputs tab in the AWS CloudFormation console for your stack). Publish a few messages to the Kinesis stream. For the purposes of this demo, you can use the AWS CLI:

export KINESIS_STREAM=<enter the Kinesis stream name from cloudformation output>

aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user1@foo.com --data $(echo -n '{"name":"user1", "city":"seattle"}' | base64)
aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user2@foo.com --data $(echo -n '{"name":"user2", "city":"new delhi"}' | base64)
aws kinesis put-record --stream-name $KINESIS_STREAM --partition-key user3@foo.com --data $(echo -n '{"name":"user3", "city":"new york"}' | base64)

Check the DynamoDB table to confirm that the user records have been stored. You can use the AWS console or the AWS CLI:

aws dynamodb scan --table-name <enter the table name from cloudformation output>

Don't Forget To Clean Up
Once you're done, to delete all the services, simply use:

cdk destroy

# output prompt (choose 'y' to continue)
Are you sure you want to delete: KinesisLambdaGolangStack (y/n)?

You were able to set up and try the complete solution. Before we wrap up, let's quickly walk through some of the important parts of the code to get a better understanding of what's going on behind the scenes.

Code Walkthrough
Some of the code (error handling, logging, etc.) has been omitted for brevity since we only want to focus on the important parts.

AWS CDK
You can refer to the CDK code here. We start by creating the DynamoDB table:

table := awsdynamodb.NewTable(stack, jsii.String("dynamodb-table"), &awsdynamodb.TableProps{
    // The Kinesis partition key (the user's email) will be stored here.
    PartitionKey: &awsdynamodb.Attribute{
        Name: jsii.String("email"),
        Type: awsdynamodb.AttributeType_STRING},
})
table.ApplyRemovalPolicy(awscdk.RemovalPolicy_DESTROY)

We create the Lambda function (CDK will take care of building and deploying the function) and make sure we provide it with appropriate permissions to write to the DynamoDB table.

function := awscdklambdagoalpha.NewGoFunction(stack, jsii.String("kinesis-function"), &awscdklambdagoalpha.GoFunctionProps{
    Runtime:     awslambda.Runtime_GO_1_X(),
    Environment: &map[string]*string{"TABLE_NAME": table.TableName()},
    Entry:       jsii.String(functionDir),
})
table.GrantWriteData(function)

Then, we create the Kinesis stream and add that as an event source to the Lambda function.

kinesisStream := awskinesis.NewStream(stack, jsii.String("lambda-test-stream"), nil)

function.AddEventSource(awslambdaeventsources.NewKinesisEventSource(kinesisStream, &awslambdaeventsources.KinesisEventSourceProps{
    StartingPosition: awslambda.StartingPosition_LATEST,
}))

Finally, we export the Kinesis stream and DynamoDB table name as CloudFormation outputs.

awscdk.NewCfnOutput(stack, jsii.String("kinesis-stream-name"), &awscdk.CfnOutputProps{
    ExportName: jsii.String("kinesis-stream-name"),
    Value:      kinesisStream.StreamName()})

awscdk.NewCfnOutput(stack, jsii.String("dynamodb-table-name"), &awscdk.CfnOutputProps{
    ExportName: jsii.String("dynamodb-table-name"),
    Value:      table.TableName()})

Lambda Function
You can refer to the Lambda function code here.
The Lambda function handler iterates over each record in the Kinesis event and, for each of them:

Unmarshals the JSON payload in the Kinesis stream into a Go struct
Stores the stream record's partition key as the primary key attribute (email) of the DynamoDB table

The rest of the information is picked up from the stream data and also stored in the table.

func handler(ctx context.Context, kinesisEvent events.KinesisEvent) error {
    for _, record := range kinesisEvent.Records {
        data := record.Kinesis.Data

        // Unmarshal the JSON payload from the stream record.
        var user CreateUserInfo
        err := json.Unmarshal(data, &user)
        if err != nil {
            return err
        }

        // Convert the struct into a DynamoDB attribute-value map.
        item, err := attributevalue.MarshalMap(user)
        if err != nil {
            return err
        }

        // Use the Kinesis partition key as the table's primary key.
        item["email"] = &types.AttributeValueMemberS{Value: record.Kinesis.PartitionKey}

        _, err = client.PutItem(context.Background(), &dynamodb.PutItemInput{
            TableName: aws.String(table),
            Item:      item,
        })
        if err != nil {
            return err
        }
    }
    return nil
}

type CreateUserInfo struct {
    Name string `json:"name"`
    City string `json:"city"`
}

Wrap Up
In this blog, you saw an example of how to use Lambda to process messages in a Kinesis stream and store them in DynamoDB, thanks to the Kinesis and Lambda integration. The entire infrastructure life cycle was automated using AWS CDK. All this was done using the Go programming language, which is well supported by DynamoDB, AWS Lambda, and AWS CDK. Happy building!
Failover is an important feature of systems that rely on near-constant availability. In Hazelcast, a failover client automatically redirects its traffic to a secondary cluster when the client cannot connect to the primary cluster. Consider using a failover client with WAN replication as part of your disaster recovery strategy. In this tutorial, you'll update the code in a Java client so that it automatically connects to a secondary, failover cluster if it cannot connect to its original, primary cluster. You'll also run a simple test to make sure that your configuration is correct and then adjust it to include exception handling. You'll learn how to collect all the resources you need to create a failover client for a primary and secondary cluster, create a failover client based on the sample Java client, test failover, and add exception handling for operations.

Step 1: Set Up Clusters and Clients
Create two Viridian Serverless clusters that you'll use as your primary and secondary clusters, and then download and connect sample Java clients to them.

1. Create the Viridian Serverless cluster that you'll use as your primary cluster. When the cluster is ready to use, the Quick Connection Guide is displayed.
2. Select the Java icon and follow the on-screen instructions to download, extract, and connect the preconfigured Java client to your primary cluster.
3. Create the Viridian Serverless cluster that you'll use as your secondary cluster.
4. Follow the instructions in the Quick Connection Guide to download, extract, and connect the preconfigured Java client to your secondary cluster.

You now have two running clusters, and you've checked that both Java clients can connect.

Step 2: Configure a Failover Client
To create a failover client, update the configuration and code of the Java client for your primary cluster. Start by adding the keystore files from the Java client of your secondary cluster.

1. Go to the directory where you extracted the Java client for your secondary cluster and then navigate to src/main/resources.
2. Rename the client.keystore file to client2.keystore and rename the client.truststore file to client2.truststore to avoid overwriting the files in your primary cluster keystore.
3. Copy both files over to the src/main/resources directory of your primary cluster.
4. Update the code in the Java client (ClientwithSsl.java) of your primary cluster to include a failover class and the connection details for your secondary cluster. You can find these connection details in the Java client of your secondary cluster. Go to the directory where you extracted the Java client for your primary cluster, navigate to src/main/java/com/hazelcast/cloud/, open the Java client (ClientwithSsl.java), and make the updates sketched below. An example failover client is also available for download.
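The exact values come from your own clusters, so the following is only a minimal sketch of the failover wiring, with placeholder cluster names and discovery tokens; the TLS keystore and truststore properties from your Quick Connection Guides (including the renamed client2.keystore and client2.truststore files) still need to be added to each configuration.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.client.config.ClientFailoverConfig;
import com.hazelcast.core.HazelcastInstance;

public class FailoverClient {
    public static void main(String[] args) {
        // Configuration for the primary cluster (placeholder values).
        // SSL properties for client.keystore/client.truststore omitted here.
        ClientConfig primary = new ClientConfig();
        primary.setClusterName("primary-cluster-name");
        primary.getNetworkConfig().getCloudConfig()
               .setEnabled(true)
               .setDiscoveryToken("PRIMARY_DISCOVERY_TOKEN");

        // Configuration for the secondary cluster, which would point at the
        // copied client2.keystore/client2.truststore files.
        ClientConfig secondary = new ClientConfig();
        secondary.setClusterName("secondary-cluster-name");
        secondary.getNetworkConfig().getCloudConfig()
                 .setEnabled(true)
                 .setDiscoveryToken("SECONDARY_DISCOVERY_TOKEN");

        // Try the primary configuration first; fail over to the secondary
        // after the configured number of connection attempts.
        ClientFailoverConfig failoverConfig = new ClientFailoverConfig();
        failoverConfig.setTryCount(3);
        failoverConfig.addClientConfig(primary);
        failoverConfig.addClientConfig(secondary);

        HazelcastInstance client =
                HazelcastClient.newHazelcastFailoverClient(failoverConfig);
    }
}

With this in place, newHazelcastFailoverClient tries the configurations in the order they were added, which is what produces the cluster switch you will observe in the next step.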
Step 3: Verify Failover
Check that your failover client automatically connects to the secondary cluster when your primary cluster is stopped.

1. Make sure that both Viridian Serverless clusters are running.
2. Connect your failover client to the primary cluster in the same way as you did in Step 1.
3. Stop your primary cluster: from the dashboard of your primary cluster, in Cluster Details, select Pause.

In the console, you'll see the following messages in order as the client disconnects from your primary cluster and reconnects to the secondary cluster:

CLIENT_DISCONNECTED
CLIENT_CONNECTED
CLIENT_CHANGED_CLUSTER

If you're using the nonStopMapExample in the sample Java client, your client stops. This is expected because write operations are not retryable when a cluster is disconnected. The client has sent a put request to the cluster but has not received a response, so the result of the request is unknown. To prevent the client from overwriting more recent write operations, this write operation is stopped and an exception is thrown.

Step 4: Exception Handling
Update the nonStopMapExample() function in your failover client to trap the exception that is thrown when the primary cluster disconnects.

1. Add the following try-catch block to the while loop in the nonStopMapExample() function. This code replaces the original map.put() call.

try {
    map.put("key-" + randomKey, "value-" + randomKey);
} catch (Exception e) {
    // Captures the exception thrown while the client is disconnected
    e.printStackTrace();
}

2. Verify your code again (repeat Step 3). This time the client continues to write map entries after it connects to the secondary cluster.
What Is the AWS Lambda Cold Start Problem?
AWS Lambda is a serverless computing platform that enables developers to quickly build and deploy applications without having to manage any underlying infrastructure. However, this convenience comes with a downside: the AWS Lambda cold start problem. Cold starts can delay response times for applications running on Lambda, which hurts the user experience and can cost businesses money. In this article, I will discuss what causes the AWS Lambda cold start problem and the techniques that can be used to address it.

What Causes the AWS Lambda Cold Start Problem?
The AWS Lambda cold start problem arises from the initialization time of Lambda functions. It refers to the delay in response time when a function is invoked for the first time, caused by the container bootstrapping process that takes place on that first invocation. The longer this process takes, the more pronounced the cold start problem becomes, leading to longer response times and a degraded user experience.

How To Mitigate the Cold Start Problem
AWS Lambda functions are a great way to scale your applications and save costs, but they can suffer from the cold start problem: a function that has not been used recently takes longer to respond. Fortunately, there are ways to mitigate this issue, such as pre-warming strategies, which keep your Lambda functions ready and responsive by invoking them periodically in advance of when they are needed. You can also warm up your Lambda functions manually using the AWS Console or API calls. By taking these steps, you can ensure your applications respond quickly and reliably without cold start delays. In the following sections, I'll discuss two possible ways to avoid the cold start problem.

1. Lambda Provisioned Concurrency
Lambda provisioned concurrency is a feature that allows developers to launch and initialize execution environments for Lambda functions ahead of time. In other words, it creates pre-warmed Lambdas waiting to serve incoming requests. Because they are pre-provisioned, the configured number of environments is up and running all the time, even when there are no requests to serve, which somewhat contradicts the very essence of serverless. And since environments are provisioned upfront, this feature is not free and comes at a considerable price. I created a simple Lambda function (details in the next section) and tried to configure provisioned concurrency to check the price; following is the screenshot. However, if there are strict performance requirements and cold starts are showstoppers, then provisioned concurrency is certainly a fantastic way of getting over the problem. A programmatic sketch follows below.
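Provisioned concurrency is typically configured in the console or an IaC template, but as a minimal sketch, assuming the AWS SDK for Java v2 and a hypothetical function name and version, it can also be set programmatically:

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PutProvisionedConcurrencyConfigRequest;

public class ProvisionConcurrency {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // Keep five pre-initialized execution environments warm for
            // version 1 of the function. These environments are billed
            // whether or not requests arrive.
            lambda.putProvisionedConcurrencyConfig(
                    PutProvisionedConcurrencyConfigRequest.builder()
                            .functionName("my-function") // placeholder name
                            .qualifier("1")              // version or alias
                            .provisionedConcurrentExecutions(5)
                            .build());
        }
    }
}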
2. SnapStart
The next thing I'll discuss, and a potential game changer, is SnapStart. Amazon released Lambda SnapStart at re:Invent 2022 to help mitigate the cold start problem. With SnapStart, Lambda initializes your function when you publish a function version: it takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access. When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. The best part is that, unlike provisioned concurrency, there is no additional cost for SnapStart. SnapStart is currently only available for the Java 11 (Corretto) runtime.

To test SnapStart and see whether it's really worth it, I created a simple Lambda function. I used Spring Cloud Function and deliberately did not try to create a thin jar: I wanted the package to be bulky so that I could see what SnapStart does. Following is the function code:

public class ListObject implements Function<String, String> {

    @Override
    public String apply(String bucketName) {
        System.out.format("Objects in S3 bucket %s:\n", bucketName);

        // List all objects in the given bucket and print their keys.
        final AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.DEFAULT_REGION)
                .build();
        ListObjectsV2Result result = s3.listObjectsV2(bucketName);
        List<S3ObjectSummary> objects = result.getObjectSummaries();
        for (S3ObjectSummary os : objects) {
            System.out.println("* " + os.getKey());
        }
        return bucketName;
    }
}

At first, I uploaded the code using the AWS Management Console and tested it against a bucket full of objects. Following is the screenshot of the execution summary. Note: the time required to initialize the environment was 3980.17 ms. That right there is the cold start time, and it was somewhat expected, as I'm working with a bulky jar file.

Subsequently, I turned on SnapStart from the AWS console: "Configuration" -> "General Configuration" -> "Edit." After a while, I executed the function again, and following is the screenshot of the execution summary. Note: here, "Init duration" has been replaced by "Restore duration," since with SnapStart, Lambda restores the snapshot of the execution environment instead of initializing it. It's also worth noting that the restoration took 408.34 ms, which is significantly lower than the initialization duration. My first impression is that SnapStart is definitely promising and exciting. Let's see what Amazon does with it in the coming days. For completeness, a sketch of enabling SnapStart programmatically follows below.
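SnapStart can likewise be enabled outside the console. The following minimal sketch assumes the AWS SDK for Java v2 and a hypothetical function name; SnapStart applies to versions published after the setting is turned on:

import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.PublishVersionRequest;
import software.amazon.awssdk.services.lambda.model.SnapStart;
import software.amazon.awssdk.services.lambda.model.SnapStartApplyOn;
import software.amazon.awssdk.services.lambda.model.UpdateFunctionConfigurationRequest;

public class EnableSnapStart {
    public static void main(String[] args) {
        try (LambdaClient lambda = LambdaClient.create()) {
            // Turn on SnapStart for versions published from now on.
            lambda.updateFunctionConfiguration(
                    UpdateFunctionConfigurationRequest.builder()
                            .functionName("my-function") // placeholder name
                            .snapStart(SnapStart.builder()
                                    .applyOn(SnapStartApplyOn.PUBLISHED_VERSIONS)
                                    .build())
                            .build());

            // Publishing a version triggers the snapshot of the initialized
            // execution environment that later invocations resume from.
            lambda.publishVersion(PublishVersionRequest.builder()
                    .functionName("my-function")
                    .build());
        }
    }
}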
In addition, Amazon announced at re:Invent 2022 that it is working on further advancements in serverless computing that could potentially eliminate the cold start issue altogether. By using Lambda SnapStart and keeping an eye out for future developments from AWS, developers can keep their serverless applications running smoothly and efficiently.

Best Strategies To Optimize Your Serverless Applications on AWS Lambda
Serverless applications on AWS Lambda have become increasingly popular because they offer a great way to reduce cost and complexity. However, one of the biggest challenges of serverless architectures is the cold start issue, which introduces latency, affects the user experience, and makes Lambda functions harder to optimize for performance. To ensure that your serverless applications run smoothly, there are several best practices you can apply: reduce latency with warm Lambdas, optimize the memory configuration of your functions, and leverage caching techniques to improve your application's response time. Also, as discussed in the sections above, provisioned concurrency and SnapStart are great ways to mitigate the cold start issue. With these strategies in place, you can ensure your serverless applications run as efficiently as possible and deliver the best user experience.
Boris Zaikin
Senior Software Cloud Architect,
Nordcloud GmbH
Ranga Karanam
Best Selling Instructor on Udemy with 1 MILLION Students,
in28Minutes.com
Samir Behara
Senior Cloud Infrastructure Architect,
AWS
Pratik Prakash
Master Software Engineer (SDE-IV),
Capital One