Is your cloud environment architected to meet your desired business and technical goals? Consider a formal evaluation of your cloud infrastructure with an AWS Well-Architected Review. Learn, measure, and build using architectural best practices to enhance and modernize your infrastructure. This assessment will help your business optimize and accelerate your AWS environment to meet your key business objectives. But what does the phrase “well-architected” mean?
What is an AWS Well-Architected Review (WAR)?
An AWS Well-Architected Review, or WAR, is a framework that was developed by AWS Cloud Architects to help create an efficient and effective infrastructure for applications being used in the AWS environment. The framework is now used globally by AWS Cloud Architect’s to help customers increase the value of their AWS platform for their specific business needs.
AWS Well-Architected Reviews are based on the following five key pillars:
These five key pillars are the foundation of your architecture. Just like buildings, when the foundation is not solid, structural problems can weaken the integrity of the building, leaving you at risk. Incorporating the pillars into your cloud architecture allows you to produce a stable and efficient foundation that can be easily built upon.
What is the Value of an AWS Well-Architected Review?
Conducting a Well Architected Review will help align your technology and business objectives. After this assessment, you will receive direct actionable solutions to strengthen your foundation. These recommendations are highly valuable and if chosen to proceed with the remediations, the benefits your company will experience are very clear. A WAR can provide value to your business in the following ways:
Cut down costs and maximize your company’s IT spend
Help leverage cloud technology to improve your cloud usage and modernize infrastructure
Address any concerns or questions surrounding security, reliability, and operations.
Receive help in navigating the many services provided by the AWS.
How can an AWS Well-Architected Review Support Your Business?
A WAR can teach you how to achieve your business outcomes while cost optimizing in four key ways:
Right sizing your resources so you only pay for what you use
Choosing the right pricing model to meet your cost targets
Meeting changes in demand with cloud elasticity
Measuring, monitoring, and improving your usage and spending to ensure you are taking the most cost-effective approaches
Why Choose Innovative for an AWS Well-Architected Review?
Just like AWS, we are customer-obsessed in everything we do. We want to help customers maximize their AWS platform to get the most out of it. Our experts provide an efficient process to help clients create a roadmap to improve their infrastructure. To help drive confidence in your cloud decisions, we are committed to showing you relentless support. As an AWS Advanced Consulting Partner, we can take your company to the next level through modernizing and transforming your business and technology. We will show you how to harness the power of AWS to experience full business potential.
What Should I Do Next?
There is no better time than now to schedule your AWS Well-Architected Review. Make sure your business is running efficiently in a cost-optimized environment and you are leveraging the right services to meet your key business objectives.
Infrastructure as Code (IaC) has revolutionized the way that infrastructure is provisioned. In short, IaC is defining your cloud infrastructure (Amazon VPC, subnet, Amazon EC2 instances, security groups, etc.) in a template file or in actual code.
By defining your infrastructure as code with a servicelike AWS CloudFormation you can easily build your entire infrastructure with the click of a button. Before cloud computing platforms, like AWS, the infrastructure team would need to manually spin up each server, configure their settings and services, and install any needed software and packages. This was a manual, time-consuming process with a high risk of human error. By using AWS CloudFormation and its associated helper scripts such as cfn-init and cfn-signal, you can install and configure software packages as the infrastructure is provisioned ensuring everything is built in the correct order.
AWS provides the Metadata section in AWS CloudFormation to define information that can be used to customize the setup of an instance. The AWS::CloudFormation::Init: section under Metadata helps us declare information that we need to help install and configure our instances. For example, we can automate the installation and configuration of a LAMP stack onto our Amazon EC2 instance. As seen below, we declare two configSets: Install and Configure. Under the Install configSet, we declare the packages that we want to install and the package manager we want to use to install them (yum in this case).
Further down in the Amazon EC2 resource definition, the UserData section is where we can define commands to run automatically on startup of an instance. In this case, we update the AWS CloudFormation bootstrap package and then run the cfn-init command, which looks at the AWS::CloudFormation::Init section where we defined the packages that we want to install. It passes in the name of the AWS CloudFormation stack, the name of the resource, the configSets that we want to run and the region as command line parameters.
After the cfn-init command, there is another AWS CloudFormation helper script command called cfn-signal. This command is receiving the output (success or failure) from the cfn-init command and signals to the CreationPolicy if the installation was successful. The timeout in the CreationPolicy section means that AWS CloudFormation will wait for five minutes for a success signal. If it doesn’t receive a signal in that time period, the AWS CloudFormation will stop the stack creation and mark it as “failed to create.”
Once you have defined your infrastructure in an AWS CloudFormation template, you can repeatably create environments anytime. Here at Innovative Solutions, we have a standard networking templates that can be used for any new projects. This removes human error involved with manually provisioning your infrastructure with each new project.
By default, an AWS CloudFormation stack allows update actions on all the underlying resources. To solve this, we can define a stack policy that will ensure that the resources in the AWS CloudFormation stack cannot be updated. There are also other tools such as Drift Detection to ensure no one is changing the underlying infrastructure. Ad hoc manual changes to the stack should never be permitted because this could result in a non-compliant environment. Especially for a production environment, all changes should be run through the AWS CloudFormation template via a stack update.
Another great part of having your infrastructure defined as code is you can check it into source control just as you would with any code. This allows you and your team to be able to see the history of templates and the various changes that happen over time. Also, this allows your team to collaborate on the development of templates.
Organizing and managing templates between teams
When starting out with AWS CloudFormation you will probably put all your resources in one template. However, as your infrastructure gets more complex, this will become unmanageable. For example, a company has three teams working on a given application: a network team, an application development team, and a security team. Each team will have multiple resources that they need to provision for the application. Let’s say the network team needs to make a change to the VPC resource they have defined in the AWS CloudFormation template. If the teams are sharing one template, this could cause confusion and unnecessary overlap. To solve this issue, the best practice is to create three separate templates, one for each of the teams. This way each team can manage their own template without needing to check with and coordinate with the other teams before making changes to their resources.
Certainly, there will be resources that will need to be shared and referenced between the three templates. To solve this, we can use cross-stack references, which allow resources to be exported from one template and imported into another. For example, if the security team needs to reference the VPC defined in the network stack, it can do so by importing the VPC resource (if the network stack exported that VPC resource).
In the network stack, we need to export the ProdVPC resource:
In the application stack, we import the VPC Id for use in defining the target group of our Elastic Load Balancer.
Another best practice is to use nested stacks to re-use templates that are commonly used. Let’s say you have a network stack that is used in all the applications you create. Instead of defining the same network stack in each application template, you can make the network stack its own template, host it in Amazon S3, and then when you require another stack, you can define a resource type AWS::CloudFormation::Stack and then point to the location of the network template.
Here is an example of what this looks like in a template:
We are defining a CloudFormation stack that is referencing a template file that is stored in S3.
How does Innovative Solutions leverage IaC?
Innovative Solutions has been leveraging AWS CloudFormation for years because of the many benefits it provides for our organization. We have developed many different templates for networking, security, and others, that have been incrementally improved through the years. Having mature AWS CloudFormation templates at our disposal makes it easy to build infrastructure quickly and reliably. This allows us to save time and focus on the actual workload.
Our templates are stored in source control so they can be easily updated as services evolve. All past versions are tracked and can be easily updated for collaboration. For each project, we are able leverage our AWS CloudFormation templates to easily deploy multiple identical stacks for each of our environments (dev, staging, production).
Cloud Development Kit
When you run your CDK app, an AWS CloudFormation template is synthesized (created). This doesn’t create any resources. The cdk deploy command actually creates the stack and the underlying resources.
Below is a sample Python CDK application that creates an SQS queue and an SNS topic. The queue is added as a subscription to the SNS topic so that it will receive messages when they are pushed to the SNS topic.
As seen above, this is simple, easy to understand code to write in Python. These lines of code create a CloudFormation template that is 150 lines long! The CDK provides an amazing level of abstraction that organizations can adopt quickly, if they haven’t already.
IaC has forever changed how we create virtual infrastructure. Once an organization learns how to leverage IaC, they will never go back to manually creating virtual servers and configuring all the settings and services associated with them. Not only is doing all this manual work extremely tedious, it also poses a high risk of human error because of the manual steps involved. With the development of the CDK, there is less of a barrier to entry leveraging IaC at your organization. You can find numerous sample templates online on the AWS website e. There is some up-front work involved with IaC, but once you are up and running you will appreciate the multitude of benefits that come with it.
Do you still have questions about Infrastructure as Code (IaC) ?
Feel free to contact us, we’d love the opportunity to further discuss anything you have read.
Gone are the days that monolithic applications run on a single server on-premise. The current landscape of cloud computing and microservices has made many aspects of computing easier but monitoring and logging is not one of them. Instead of having logs on one web server, they are now often highly distributed across many different systems.
With cloud computing and containerization becoming so popular, we often have many short-lived resources that are spun up for short periods of time and then destroyed after they are done serving their purpose. This makes logging even more challenging because you need to capture and store them elsewhere before they are destroyed. Not only are the logs distributed across various systems, but there are also an increasing number of logs.
Computing systems produce logs that can be used to give insight into the current system state. Most companies store log files because “that’s what you’re supposed to do”. However, the only time companies look at them is when a system fails, at which point they are already in panic mode trying to figure out what went wrong. Then they begin to look at the various systems logs and sort through them, trying to find the relevant ones. This becomes a nightmare when these logs are spread across multiple systems.
If you don’t have all your logs in a centralized place where you can efficiently sort and filter through them, you won’t be able to quickly troubleshoot problems. The information exists but because there is no easy way to decipher it; you’re essentially looking into a black box.
Innovative Solutions legacy process for monitoring and logging
In order to see logs that the web servers and databases were producing, we had to log into web servers and databases. This was very time-consuming and inefficient. We didn’t have a simple way to monitor our applications. We were reactive to issues reported by customers, rather than having a proactive approach to identify and act on issues before they happened.
Current use of monitoring and logging leveraging AWS
One logging and monitoring pattern that we leverage today is to aggregate Iog files and metrics in third-party cloud-native monitoring platforms such as Datadog. Logs are collected in Amazon CloudWatch and then pulled into Datadog via the Datadog AWS integration. Agents running on compute instances also push logs into Datadog. This allows us to monitor and analyze our production environment in near real-time.
The goal for a logging system is not to collect logs like trading cards; it’s to use them to carry out automated actions and achieve high system visibility. Amazon CloudWatch provides log storage in a centralized location, so you don’t need to go searching across various systems. However, just storing logs in a centralized location isn’t enough. Amazon CloudWatch provides services that allow you to monitor your systems and take action based on events and alarms. Amazon CloudWatch events and alarms integrate with many other AWS services such as Auto Scaling Groups, Amazon SNS, Amazon SQS, AWS CodePipeline, AWS Lambda, and many more.
Monitoring in Amazon CloudWatch
With Amazon CloudWatch you can track system metrics for your instances (e.g. CPU utilization) and have them display on a dashboard. This allows you to see the health of your application without needing to dig through thousands of log files. We can check our dashboards to make sure the operation of our systems is nominal. You can see an example dashboard below that shows the healthy host count, consumed RCUs and WCUs, and incoming log events. A simple dashboard like this can give you a quick idea of how your systems are performing with minimal effort.
Amazon CloudWatch Alarms
Amazon CloudWatch alarms enable us to scale up and down based on thresholds for metrics identified as application bottlenecks. In this example, I created an Amazon CloudWatch alarm that watches the average CPU Utilization metric for all the instances in an Auto Scaling Group. If the average CPU Utilization goes above 60%, an alarm will be triggered which will: send a message to an SNS topic and add an instance to the Auto Scaling Group.
This shows the creation of the Amazon CloudWatch alarm which sends a notification to the ‘Default_CloudWatch_Alarms_Topic’ SNS topic.
Then we added the auto scaling action so that our Auto Scaling Group adds one instance when the alarm (awsec2-devops-competency-CPU-Utilization) is triggered.
You can see the graph of the CPUUtilization with the red horizontal line representing the 60% CPUUtilization alarm. The blue line represents the CPUUtilization of the Amazon EC2 instances in the auto scaling group. When the alarm threshold is met the alarm is triggered and subsequent actions are run. In order to test triggering the alarm I logged into the Amazon EC2 instance and created and ran a Python script that simulates high CPU load.
As you can see below for this CloudWatch alarm, we have set up two actions: a message to an Amazon SNS topic “Default_CloudWatch_Alarms_Topic”, and an auto scaling action that will add one instance to the Auto Scaling Group that we have specified ‘devops-competency.’
After the alarm was triggered, the message was sent to the SNS topic which then sent an email to the subscribers of that topic. I set my email as a subscriber to the topic and received the following email:
Having the appropriate people notified of an alarm being triggered is nice, but you also want to pair that with an appropriate action automatically triggered such as scaling up your Auto Scaling Group. As seen in notification below, after the alarm was triggered the capacity of the Auto Scaling Group was increased from one to two instances.
Amazon CloudWatch provides many types of events. Some examples would be events created upon state transition in AWS CodePipeline.
When you create a pipeline via AWS CodePipeline you are given the option to use Amazon CloudWatch Events as a change detection option in the source stage. This means that when I push my code to the source repository, the pipeline automatically starts and runs through all the subsequent stages.
You can also configure Amazon CloudWatch events to watch your AWS CodePipeline and receive notifications based on state changes. I created a CloudWatch event rule that would detect when the AWS CodePipeline Execution State changed to a FAILED state.
After you specify an event source, you can then specify the target of what you want to happen after a rule is matched. In this case I set an Amazon SNS topic (called ‘codepipeline-failed’) as the target:
Then I created an AWS Lambda function that would send a notification to a specified Slack channel, saying that an AWS CodePipeline stage has failed. The members of the Slack channel will see this and can take the appropriate actions to figure out what went wrong.
I also set the AWS Lambda function as a subscriber to the Amazon SNS topic so that when a message is sent to the Amazon SNS topic the AWS Lambda function will be automatically triggered.
You can see the corresponding message in the Slack channel below:
By using Amazon CloudWatch events, we now have a better integrated CI/CD pipeline. If a developer pushes code to our AWS CodeCommit repository and any stage of the AWS CodePipeline fails, our team will be automatically notified in our Slack channel.
Do you still have questions about monitoring and logging?
Feel free to contact us, we’d love the opportunity to further discuss anything you have read.