How to save costs when working with Amazon Redshift?

Among the many factors that can affect Amazon Redshift billing in an AWS account, the following are worth considering:

A. Regions

Depending on the region, the price of the same node type can vary significantly (price per hour, in US dollars):

[Figure: hourly price of the same node type in different regions, in USD]

Choosing the wrong region can add unnecessary cost to a Redshift cluster.
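
To put a number on the overcost of a region, a minimal Python sketch can compare monthly costs. It assumes a 720-hour month (30 days, as in the snapshot example in section C) and a `prices` dictionary keyed by region that you fill in yourself with the hourly price of your node type from the AWS pricing page; `monthly_cost` and `region_overcost` are hypothetical helper names.

```python
from __future__ import annotations

HOURS_PER_MONTH = 30 * 24  # 720 hours: a 30-day month with the cluster always on


def monthly_cost(hourly_price: float, nodes: int = 1) -> float:
    """Monthly cost in USD of `nodes` nodes billed at `hourly_price` for the whole month."""
    return hourly_price * nodes * HOURS_PER_MONTH


def region_overcost(prices: dict[str, float], chosen_region: str, nodes: int = 1) -> float:
    """Extra monthly cost of `chosen_region` compared with the cheapest region in `prices`."""
    cheapest = min(prices.values())
    return monthly_cost(prices[chosen_region], nodes) - monthly_cost(cheapest, nodes)


# One dc2.large node in N. Virginia at US$0.25 per hour; prints 180.0
print(monthly_cost(0.25))
```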

B. Types of Nodes
When creating a Redshift cluster, it is important to choose the right node type (prices as of October 4, 2018):

[Figure: hourly price per node type, as of October 4, 2018]

Redshift offers two node types (https://aws.amazon.com/es/redshift/pricing):

  1. Dense Compute (dcX.XXXX): 30% to 60% cheaper than Dense Storage, optimized for faster queries, and generally recommended for data sets of up to 500 GB.
  2. Dense Storage (dsX.XXXX): more expensive than Dense Compute, but optimized for storing large data sets; usually recommended for data sets larger than 500 GB.
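
The 500 GB guideline above can be captured in a tiny helper. This is only a sketch of that rule of thumb, with the threshold as a parameter and a hypothetical function name; a real sizing decision may also weigh query performance and data growth.

```python
def suggest_node_family(data_size_gb: float, threshold_gb: float = 500.0) -> str:
    """Rule of thumb: Dense Compute up to the threshold, Dense Storage above it."""
    if data_size_gb < 0:
        raise ValueError("data size cannot be negative")
    if data_size_gb <= threshold_gb:
        return "Dense Compute (dc)"
    return "Dense Storage (ds)"


print(suggest_node_family(200))   # Dense Compute (dc)
print(suggest_node_family(2000))  # Dense Storage (ds)
```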

C. Snapshots
Suppose we have a dc2.large cluster running around the clock in the N. Virginia region (US$0.25 per hour). An ordinary 30-day month has 720 hours, which results in a monthly bill of US$180. But what if the same cluster ran ONLY during working hours (8 hours per day, 5 days per week, 4 weeks per month)?

That would be 8 × 5 × 4 = 160 hours per month, or 160 × US$0.25 = US$40 instead of US$180.

That is a saving of US$140 per month, about 78%!
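
The arithmetic can be checked with a short Python sketch. It assumes per-hour on-demand billing at the US$0.25 dc2.large rate from the example and ignores the time spent taking snapshots and restoring the cluster, as well as any snapshot storage charges; `monthly_savings` is a hypothetical name.

```python
from __future__ import annotations


def monthly_savings(hourly_price: float = 0.25,
                    hours_per_day: int = 8,
                    days_per_week: int = 5,
                    weeks_per_month: int = 4,
                    days_per_month: int = 30) -> tuple[float, float, float]:
    """Return (always-on cost, working-hours cost, fraction saved) for one month."""
    always_on = hourly_price * days_per_month * 24  # 720 hours -> US$180
    working = hourly_price * hours_per_day * days_per_week * weeks_per_month  # 160 hours -> US$40
    return always_on, working, (always_on - working) / always_on


always_on, working, saved = monthly_savings()
print(f"Always on: US${always_on:.2f}, working hours: US${working:.2f}, saved: {saved:.0%}")
# Always on: US$180.00, working hours: US$40.00, saved: 78%
```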

However, at the time of writing, Redshift does not allow a cluster to be stopped and resumed. As an alternative, you can follow this daily process:

  1. At the end of the working day, create a snapshot of the cluster.
  2. Delete the cluster.
  3. Before the next working day starts, restore the cluster from that snapshot.

The following command takes a final snapshot of the cluster and then deletes it:

aws redshift delete-cluster --cluster-identifier motest --final-cluster-snapshot-identifier motest-daily-snapshot
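
The same step could also be scripted in Python with boto3, the AWS SDK for Python. In this sketch `client` is assumed to be a boto3 Redshift client, and the date-stamped snapshot name (for example `motest-20181004`) is a hypothetical convention: manual snapshot identifiers must be unique within an account and region, so reusing a fixed name such as `motest-daily-snapshot` on the next run is expected to make the deletion fail with a snapshot-already-exists error unless the previous snapshot has been deleted first.

```python
from datetime import date


def daily_snapshot_id(cluster_id: str, day: date) -> str:
    """Hypothetical naming convention: one final snapshot per cluster and day."""
    return f"{cluster_id}-{day:%Y%m%d}"


def delete_with_final_snapshot(client, cluster_id: str, day: date) -> str:
    """Take a final snapshot, delete the cluster and wait until it is gone.

    `client` is assumed to be boto3.client("redshift"); returns the snapshot identifier.
    """
    snapshot_id = daily_snapshot_id(cluster_id, day)
    client.delete_cluster(
        ClusterIdentifier=cluster_id,
        SkipFinalClusterSnapshot=False,
        FinalClusterSnapshotIdentifier=snapshot_id,
    )
    # The built-in "cluster_deleted" waiter polls describe_clusters until the cluster disappears.
    client.get_waiter("cluster_deleted").wait(ClusterIdentifier=cluster_id)
    return snapshot_id


# Usage (requires boto3 and configured credentials):
#   import boto3
#   client = boto3.client("redshift", region_name="us-east-1")
#   delete_with_final_snapshot(client, "motest", date.today())
```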

While the snapshot is being created, the cluster remains active:

[Screenshots: AWS console while the final snapshot is being created]

Once the snapshot is complete, the cluster deletion begins:

[Screenshot: AWS console once the snapshot is complete]

[Screenshot: AWS console while the cluster is being deleted]

To restore the cluster from that snapshot, use this command:

aws redshift restore-from-cluster-snapshot --cluster-identifier motest --snapshot-identifier motest-daily-snapshot
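
The restore step can be sketched the same way. `client` is again assumed to be a boto3 Redshift client, and `snapshot_id` must be the identifier used when the cluster was deleted. The sketch relies on boto3's built-in `cluster_available` waiter; the polling delay and attempt limit passed in `WaiterConfig` are arbitrary example values, and `restore_cluster` is a hypothetical name.

```python
def restore_cluster(client, cluster_id: str, snapshot_id: str,
                    delay_seconds: int = 30, max_attempts: int = 120) -> None:
    """Restore `cluster_id` from `snapshot_id` and block until it is available."""
    client.restore_from_cluster_snapshot(
        ClusterIdentifier=cluster_id,
        SnapshotIdentifier=snapshot_id,
    )
    # Poll describe_clusters every `delay_seconds` until the cluster reports "available".
    client.get_waiter("cluster_available").wait(
        ClusterIdentifier=cluster_id,
        WaiterConfig={"Delay": delay_seconds, "MaxAttempts": max_attempts},
    )


# Usage (requires boto3 and configured credentials):
#   import boto3
#   client = boto3.client("redshift", region_name="us-east-1")
#   restore_cluster(client, "motest", "motest-daily-snapshot")
```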

You can monitor this process from the AWS console until the cluster is completely restored:

[Screenshots: AWS console while the cluster is being restored]

[Screenshot: AWS console with the restored cluster]

These commands can be run as scheduled tasks, depending on the operating system: on Windows through the Task Scheduler, or on Linux with crontab.
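
On Linux, each script could be scheduled with one crontab line. The helper below is a hypothetical sketch that builds both lines for Monday to Friday; the script paths are placeholders, and starting the restore some minutes before the working day (to give it time to finish) is an assumption, not part of the original process.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def crontab_entries(start: str, end: str, restore_lead_minutes: int,
                    restore_script: str, delete_script: str) -> list[str]:
    """Build Monday-to-Friday crontab lines: restore before `start`, delete at `end`.

    Times are "HH:MM" strings in the time zone cron runs in.
    """
    start_t = datetime.strptime(start, "%H:%M")
    end_t = datetime.strptime(end, "%H:%M")
    restore_t = start_t - timedelta(minutes=restore_lead_minutes)
    if restore_t.date() != start_t.date():
        raise ValueError("restore_lead_minutes would move the restore to the previous day")
    # crontab fields: minute hour day-of-month month day-of-week (1-5 = Monday to Friday)
    return [
        f"{restore_t.minute} {restore_t.hour} * * 1-5 {restore_script}",
        f"{end_t.minute} {end_t.hour} * * 1-5 {delete_script}",
    ]


for line in crontab_entries("08:00", "16:00", 30,
                            "/opt/redshift/restore-cluster.sh",
                            "/opt/redshift/delete-cluster.sh"):
    print(line)
# 30 7 * * 1-5 /opt/redshift/restore-cluster.sh
# 0 16 * * 1-5 /opt/redshift/delete-cluster.sh
```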

On Windows, we can create a PowerShell script with content similar to this:

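# Configure the AWS CLI credentials and default region
aws configure set aws_access_key_id xxxx
aws configure set aws_secret_access_key yyyy
aws configure set default.region zzzz
# Take a final snapshot named bbbb, then delete cluster aaaa
aws redshift delete-cluster --cluster-identifier aaaa --final-cluster-snapshot-identifier bbbb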

Where:

  • xxxx = access key ID
  • yyyy = secret access key
  • zzzz = cluster region
  • aaaa = cluster name (identifier)
  • bbbb = snapshot name (identifier)

The access key ID and secret access key must belong to an IAM user with enough permissions to run these commands through the AWS CLI:

[Screenshot: IAM user permissions in the AWS console]
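
As a starting point for that user's permissions, a policy could allow only the Redshift actions behind the commands in this post. This Python sketch just builds the JSON document; it is not a verified minimal policy, so additional actions may be required in some setups, and `Resource` could be narrowed from `"*"` to specific cluster and snapshot ARNs. `build_policy` is a hypothetical name.

```python
from __future__ import annotations

import json

# Redshift API actions behind delete-cluster, restore-from-cluster-snapshot,
# describe-clusters and modify-cluster.
ACTIONS = [
    "redshift:DeleteCluster",
    "redshift:RestoreFromClusterSnapshot",
    "redshift:DescribeClusters",
    "redshift:ModifyCluster",
]


def build_policy(actions: list[str], resource: str = "*") -> str:
    """Return an IAM policy document (JSON) that allows `actions` on `resource`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource},
        ],
    }
    return json.dumps(policy, indent=2)


print(build_policy(ACTIONS))
```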

The rest of the process is the same as creating any other scheduled task in Windows. Keep in mind, however, that the .ps1 file must be passed as the argument of the task:

[Screenshot: Task Scheduler action with the .ps1 file as its argument]

The program to run must be powershell.exe.

The task that restores the cluster is created in the same way, but this time the script content must be:

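# Configure the AWS CLI credentials and default region
aws configure set aws_access_key_id xxxx
aws configure set aws_secret_access_key yyyy
aws configure set default.region zzzz
# Start restoring cluster aaaa from snapshot bbbb
aws redshift restore-from-cluster-snapshot --cluster-identifier aaaa --snapshot-identifier bbbb

# Poll every 30 seconds until the restored cluster is available
Do
{
    Start-Sleep -Seconds 30
    $ClusterJSON = aws redshift describe-clusters --cluster-identifier aaaa --output json | Out-String | ConvertFrom-Json
} While ($ClusterJSON.Clusters[0].ClusterStatus -ne 'available')

# Associate the cluster with its VPC security group (only possible once it is available)
aws redshift modify-cluster --cluster-identifier aaaa --vpc-security-group-ids ssss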

Where:

  • xxxx = access key ID
  • yyyy = secret access key
  • zzzz = cluster region
  • aaaa = cluster name (identifier)
  • bbbb = snapshot name (identifier)
  • ssss = security group ID (the ID, not the name: sg…)

Note that when a cluster is restored from a snapshot, the new cluster has the same configuration as the original cluster the snapshot was taken from, EXCEPT for the security group. That is why the last command associates the restored cluster with the proper security group, a change that can only be applied once the cluster is available.
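
The wait-then-modify logic of the restore script can also be sketched in Python. It assumes a boto3 Redshift client, mirrors the 30-second polling of the PowerShell loop, and adds a timeout so a failed restore does not leave the scheduled task polling forever; `sleep` is a parameter only so the loop can be tested without waiting, and both function names are hypothetical.

```python
import time


def wait_until_available(client, cluster_id: str, poll_seconds: int = 30,
                         timeout_seconds: int = 3600, sleep=time.sleep) -> None:
    """Poll describe_clusters until the cluster status is "available"."""
    waited = 0
    while True:
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        if response["Clusters"][0]["ClusterStatus"] == "available":
            return
        if waited >= timeout_seconds:
            raise TimeoutError(f"{cluster_id} not available after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


def attach_security_group(client, cluster_id: str, security_group_id: str,
                          **wait_options) -> None:
    """Once the restored cluster is available, associate it with its VPC security group."""
    wait_until_available(client, cluster_id, **wait_options)
    client.modify_cluster(ClusterIdentifier=cluster_id,
                          VpcSecurityGroupIds=[security_group_id])
```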

Content generated by the Morris & Opazo team
