If you work on AWS, you are almost certainly working with EC2. Creating and managing your application on EC2 requires a backup strategy. Backing up selected servers by hand on a regular schedule is tedious. Taking backups is important, but managing how long they are retained matters just as much from a cost perspective.
In this blog, we will see how to take an AMI backup of an EC2 instance with a user-defined retention period. We are going to use Lambda functions with Python 2.7 as the runtime to completely automate this process. We will cover the following:
- AMI backup of an EC2 instance
- Deletion of older AMI backups
- Scheduling the backup using CloudWatch Events
- Pushing execution logs to CloudWatch Logs
- Monitoring the Lambda function using CloudWatch metrics
The logic we are going to use is simple:
- We will write two Lambda functions, one for taking the backup and the other for deleting backups once the defined retention period has passed.
- Both Lambda functions are triggered at a scheduled time using CloudWatch Events.
- The AMI backup script searches for all instances with a specific tag and initiates a backup for each of them.
- Once the backup is done, the script adds a DeleteOn tag to the AMI and its snapshots, with the date set according to the retention period you define.
- The AMI deletion script checks the DeleteOn tag on every AMI. If the tag matches the current date, the script deletes that AMI; otherwise it ignores it.
- This process can back up multiple EC2 instances together and keeps deleting the older AMIs that no longer need to be retained.
- We can also use CloudWatch to monitor the Lambda function failure metrics and get notified in case the backup script fails.
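The date logic at the heart of both scripts can be sketched as follows. This is only an illustration, not the script from the repository: the function names, the `YYYY-MM-DD` tag format, and the sample date are assumptions.

```python
from datetime import date, timedelta

# Assumed format for the DeleteOn tag value; the actual script may differ.
DATE_FORMAT = "%Y-%m-%d"


def compute_delete_on(backup_day, retention_days):
    """Return the DeleteOn tag value for a backup taken on backup_day."""
    return (backup_day + timedelta(days=retention_days)).strftime(DATE_FORMAT)


def is_due_for_deletion(delete_on_tag, today):
    """Return True when the AMI's DeleteOn tag matches today's date."""
    return delete_on_tag == today.strftime(DATE_FORMAT)


# A backup taken on the 27th with a 2-day retention period.
tag = compute_delete_on(date(2019, 5, 27), 2)
print(tag)  # 2019-05-29
print(is_due_for_deletion(tag, date(2019, 5, 28)))  # False
print(is_due_for_deletion(tag, date(2019, 5, 29)))  # True
```

The exact-match check mirrors the logic described above. Because dates in this format sort correctly as strings, a `<=` comparison would also catch AMIs whose deletion day was missed, for example if the function failed to run that day.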
Let’s get started.
Create an IAM role that allows the Lambda function to communicate with the required resources. You can find a sample IAM role in the DevOps GitHub repository.
Note: You can modify the IAM policy to make its permissions more specific. For this example I have assigned full permissions for EC2 and CloudWatch Logs.
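If you want to tighten the permissions, a narrower policy could look roughly like the one below. This is not the sample policy from the repository: the action list is an assumption about what a backup and a deletion script typically need, so check it against the calls your scripts actually make.

```python
import json

# Hypothetical narrower policy. The actions listed are an assumption based on
# the steps in this post: describe instances and images, create and tag AMIs,
# deregister AMIs, delete snapshots, and write to CloudWatch Logs.
BACKUP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeImages",
                "ec2:CreateImage",
                "ec2:CreateTags",
                "ec2:DeregisterImage",
                "ec2:DeleteSnapshot",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

# Print the JSON document so it can be pasted into the IAM console.
print(json.dumps(BACKUP_POLICY, indent=2))
```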
Now go to Lambda and click Create Function. Fill in the basic information as shown in the screenshot below and click Create Function.
Make sure to choose Python 2.7 as the runtime and select the Lambda role you created above.
Once the Lambda function is created, grab the AMI backup script and paste it in.
Note: The script reads the Amazon account number and the retention period from environment variables, so make sure you set them correctly.
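Reading and validating those values inside the function might look like this. The variable names `ACCOUNT_ID` and `RETENTION_DAYS` are hypothetical; use whatever names the script you pasted expects.

```python
import os


def load_settings(environ=None):
    """Read the account number and retention period from environment variables.

    ACCOUNT_ID and RETENTION_DAYS are hypothetical names used for illustration.
    """
    if environ is None:
        environ = os.environ

    account_id = environ.get("ACCOUNT_ID", "").strip()
    # AWS account numbers are 12 digits long.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("ACCOUNT_ID must be a 12-digit AWS account number")

    raw_days = environ.get("RETENTION_DAYS", "")
    try:
        retention_days = int(raw_days)
    except ValueError:
        raise ValueError("RETENTION_DAYS must be a whole number, got %r" % raw_days)
    if retention_days < 1:
        raise ValueError("RETENTION_DAYS must be at least 1")

    return account_id, retention_days


# Example with explicit values instead of the real environment.
settings = load_settings({"ACCOUNT_ID": "123456789012", "RETENTION_DAYS": "2"})
print(settings)
```

Failing early with a clear message makes a misconfigured variable show up in the logs on the first run instead of producing AMIs with a wrong DeleteOn date.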
Now verify that you have selected the correct role. Add tags and a description to the function. Leave the memory at 128 MB (the default) and increase the timeout to, say, 2 minutes, depending upon the number of instances you have.
Now click Save.
Most importantly, go to EC2 and create a tag on all the instances that you want to back up.
You can define any tag, but you have to change the tag section in the code to match.
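The tag section boils down to a filter passed to `describe_instances`. Here is a minimal sketch, assuming a tag key of `Backup` with the value `True`; both are placeholders, and the response below is a trimmed, made-up sample.

```python
def build_tag_filter(tag_key="Backup", tag_values=("True",)):
    """Build the Filters argument for describe_instances.

    The default key and value are placeholders; match them to your own tag.
    """
    return [{"Name": "tag:" + tag_key, "Values": list(tag_values)}]


def instance_ids_from(response):
    """Flatten a describe_instances response into a list of instance IDs."""
    return [
        instance["InstanceId"]
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


# In the Lambda function the response would come from something like:
#   ec2 = boto3.client("ec2")
#   response = ec2.describe_instances(Filters=build_tag_filter())
# Here we use a trimmed sample response with made-up IDs.
sample_response = {
    "Reservations": [
        {"Instances": [{"InstanceId": "i-0aaa"}, {"InstanceId": "i-0bbb"}]},
        {"Instances": [{"InstanceId": "i-0ccc"}]},
    ]
}
print(build_tag_filter())
print(instance_ids_from(sample_response))
```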
Now create a test event and click Test to run the Lambda function.
Also check CloudWatch Logs for the detailed execution logs, especially in case of errors. The logs will help you understand what is actually happening.
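For reference, this is roughly how a handler gets its output into CloudWatch Logs. It is a skeleton only: `run_backup` is a hypothetical placeholder for the real backup steps, not the script from the repository.

```python
import logging

# In the Lambda Python runtime, records sent to the root logger
# end up in the function's CloudWatch Logs log group.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def run_backup():
    """Hypothetical placeholder for the real backup steps.

    The real script would return the IDs of the instances it backed up.
    """
    return []


def lambda_handler(event, context):
    logger.info("Backup run started with event: %s", event)
    try:
        backed_up = run_backup()
    except Exception:
        # Log the traceback, then re-raise so the invocation counts as an error.
        logger.exception("Backup run failed")
        raise
    logger.info("Backup run finished, %d instance(s) processed", len(backed_up))
    return {"instances": backed_up}


# Local smoke test with an empty test event.
print(lambda_handler({}, None))
```

Re-raising the exception matters for monitoring: an unhandled exception is what Lambda records in its Errors metric.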
Go to the AMI section and check whether an AMI was created for each instance tagged for backup. Also verify the tags on the created AMIs: each one will have a DeleteOn tag based on the retention period you defined. For example, if the backup is taken on the 27th and the retention period is 2 days, the DeleteOn tag will be the 29th. The AMIDeletion script will delete this AMI on that date, so you always keep the latest 2 days of backups.
Similarly, create another Lambda function with the same runtime, configure it the same way, and name it, say, AMIDeletion. You can find the AMIDeletion script in the official GitHub repository of DevOpsAGE.
Note: Make sure to change the region name at line number 15 of the script.
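To give an idea of what the deletion side has to do, the sketch below picks out the AMIs whose DeleteOn tag matches a given date, together with their snapshot IDs. It is an illustration under assumptions, not the repository script: the function name and the sample data are made up, and the input is the `Images` list of a `describe_images` response.

```python
def images_due_today(images, today_str):
    """Return (image_id, snapshot_ids) pairs whose DeleteOn tag equals today_str."""
    due = []
    for image in images:
        tags = {t["Key"]: t["Value"] for t in image.get("Tags", [])}
        if tags.get("DeleteOn") != today_str:
            continue
        # Collect the EBS snapshots that back this AMI.
        snapshot_ids = [
            mapping["Ebs"]["SnapshotId"]
            for mapping in image.get("BlockDeviceMappings", [])
            if "SnapshotId" in mapping.get("Ebs", {})
        ]
        due.append((image["ImageId"], snapshot_ids))
    return due


# Trimmed sample of the "Images" list, with made-up IDs.
sample_images = [
    {
        "ImageId": "ami-0aaa",
        "Tags": [{"Key": "DeleteOn", "Value": "2019-05-29"}],
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": "snap-0111"}},
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
        ],
    },
    {
        "ImageId": "ami-0bbb",
        "Tags": [{"Key": "DeleteOn", "Value": "2019-05-30"}],
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"SnapshotId": "snap-0222"}},
        ],
    },
]
print(images_due_today(sample_images, "2019-05-29"))
```

With a boto3 EC2 client, each pair would then be handled by calling `deregister_image(ImageId=...)` first and `delete_snapshot(SnapshotId=...)` afterwards, because a snapshot cannot be deleted while a registered AMI still uses it.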
Triggering Lambda Functions using CloudWatch Events
Now that we have created the Lambda functions, they need to be triggered to run. We can use CloudWatch Events to schedule a trigger.
Go to CloudWatch, select Events, and click Get Started. Select Cron expression and schedule it to run at a specific time. The targets will be the Lambda functions we created.
Click Configure Details.
Give the event a name and you are all set. This event will keep triggering the Lambda functions on schedule, and instance backups will be taken and rotated automatically.
Note: Cron expressions use the GMT (UTC) time zone. For example, 8:30 GMT is 3:30 EST (4:30 during daylight saving time, EDT).
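Because the schedule is in UTC, it helps to do the conversion explicitly. The helper below is a small sketch with hypothetical function names; it builds a daily expression in the six-field format CloudWatch Events uses (minutes, hours, day of month, month, day of week, year) and assumes a whole-hour UTC offset.

```python
def local_to_utc(hour_local, minute_local, utc_offset_hours):
    """Convert a local wall-clock time to UTC, given a whole-hour UTC offset."""
    return (hour_local - utc_offset_hours) % 24, minute_local


def daily_cron(hour_utc, minute_utc):
    """Build a CloudWatch Events cron expression that fires once a day."""
    if not (0 <= hour_utc <= 23 and 0 <= minute_utc <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return "cron(%d %d * * ? *)" % (minute_utc, hour_utc)


# 4:30 local time at a UTC offset of -4 hours (US Eastern daylight time).
hour_utc, minute_utc = local_to_utc(4, 30, -4)
print(daily_cron(hour_utc, minute_utc))  # cron(30 8 * * ? *)
```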
Monitoring Lambda Functions using CloudWatch Metrics
Go to CloudWatch, click Create Alarm, and select the Lambda metrics.
Make sure to select the metrics by function name, then select your function with the metric name Errors.
Now set the condition like this:
Errors >= 1
Under Additional Configuration:
Treat missing data as good (not breaching the threshold)
Click Next, select the SNS topic, give the alarm a proper name, verify the setup, and click Create Alarm.
That’s it. Now if your Lambda function fails, you will be notified through the SNS topic, and you can check the CloudWatch logs for troubleshooting.
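The same alarm can also be created from code. The sketch below only builds the parameters for CloudWatch's `put_metric_alarm` call; the alarm name, the 5-minute period, and the sample ARN are assumptions chosen for illustration.

```python
def error_alarm_params(function_name, sns_topic_arn):
    """Build put_metric_alarm parameters for a Lambda Errors alarm."""
    return {
        "AlarmName": "%s-errors" % function_name,  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # assumed 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Same as "Treat missing data as good" in the console.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Made-up function name and SNS topic ARN.
params = error_alarm_params(
    "AMIBackup", "arn:aws:sns:us-east-1:123456789012:backup-alerts"
)
print(params["AlarmName"])

# With boto3 available, the alarm would be created with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```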
Note: Please make sure you test this setup in a dev environment before implementing it directly in production.