Move data to Azure Archive Storage using PowerShell

Concept

You may have come across the term multi-tiered storage. It means that the storage solution has multiple arrays: fast and expensive, and slow but cheap. Files accessed very frequently are stored on very fast SSDs; files accessed less frequently are stored on much cheaper but much slower spinning disks. Files designated for long-term archive are often stored on magnetic tape. Most enterprise storage solutions handle the first two tiers automatically, and some can also offload to tape on a schedule.

Azure offers similarly tiered storage: Hot, Cool and Archive. The Archive tier is very cheap (and, yes, also very slow) storage designed for long-term retention. An Azure Storage Account allows changing the tier of existing, individual files. The current tier can be seen in Azure Storage Explorer:

Azure Storage Explorer showing AccessTier: Archive
Azure Storage Explorer showing AccessTier: Cool

The idea is that we can move existing files in an existing Storage Account to the Archive tier, or upload files as we normally would and then mark them for archiving. This process is completely manual, but with the help of PowerShell it can easily be automated.

Microsoft does not reveal how the archive storage works or what exactly happens behind the scenes, i.e. whether files are physically moved to tape or just copied to slower and bigger storage arrays. The speed of retrieval operations would suggest that mechanical operations, i.e. tapes, are involved. If you know the internals of the Archive Storage, please share in the comments.

The archiving process can take anything between a couple of minutes to a few hours, depending on the file size and how busy the archive storage is.
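While a blob is being archived, or later rehydrated, its current state can be polled from the same console. Below is a minimal sketch, assuming the $ctx storage context and the "archive" container used later in this post; the RehydrationStatus property is exposed by the storage SDK behind the cmdlets and may differ between module versions:

# Assumes $ctx was created with New-AzureStorageContext (see the script below).
# Lists each blob's current tier and, where a rehydration is pending, its status.
Get-AzureStorageBlob -Container "archive" -Context $ctx |
    Select-Object Name,
        @{ Name = "Tier";   Expression = { $_.ICloudBlob.Properties.StandardBlobTier } },
        @{ Name = "Status"; Expression = { $_.ICloudBlob.Properties.RehydrationStatus } }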

As of January 2020, according to the official Azure pricing, the archive storage is 90% cheaper than the COOL tier, 95% cheaper than the HOT tier and about 99.4% cheaper than the PREMIUM storage.

|                                 | PREMIUM      | HOT            | COOL         | ARCHIVE         |
|---------------------------------|--------------|----------------|--------------|-----------------|
| First 50 terabyte (TB) / month  | $0.15 per GB | $0.0184 per GB | $0.01 per GB | $0.00099 per GB |
| Next 450 TB / month             | $0.15 per GB | $0.0177 per GB | $0.01 per GB | $0.00099 per GB |
| Over 500 TB / month             | $0.15 per GB | $0.0170 per GB | $0.01 per GB | $0.00099 per GB |
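To put those rates into perspective, here is a quick back-of-the-envelope calculation of the monthly cost of keeping 1 TB (1,024 GB) at the first-50-TB prices above. Note these are storage costs only; transactions and retrieval are billed separately:

# Monthly cost of 1 TB (1,024 GB) at the first-50-TB rates
1024 * 0.15      # PREMIUM: $153.60
1024 * 0.0184    # HOT:     ~$18.84
1024 * 0.01      # COOL:    $10.24
1024 * 0.00099   # ARCHIVE: ~$1.01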

When should I use the Archive Tier?

Remember! The Archive tier is slow. It can take several hours to bring files back before you can use them.

Do not use Archive Storage for operational (most recent) backups, as you may not be able to retrieve the backup in time. Make sure you are complying with your Recovery Time Objective (RTO) requirements.
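Rehydration is the same operation in reverse: the blob's tier is set back to Hot or Cool and, after some hours, the content becomes readable again. A minimal sketch, assuming a $ctx context built as in the script below and a hypothetical blob name:

# Assumes $ctx was created with New-AzureStorageContext (see the script below).
# "report-2019.tar.gz" is a hypothetical blob name used for illustration.
Get-AzureStorageBlob -Container "archive" -Blob "report-2019.tar.gz" -Context $ctx |
    ForEach-Object { $_.ICloudBlob.SetStandardBlobTier("Cool") }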

Regulatory requirements

If your company deals with financial data, you have likely been audited at some point in your career and may be familiar with regulatory compliance. Whether it is SOX or any other regulation, you may be required to store a snapshot of the financial data at the end of the financial year for a number of years. Any future need for this historical data is unlikely to be urgent, so the Archive tier's retrieval time should be more than enough.

Access Audits

Access auditing is another regulatory requirement: the audit data may not require frequent access, but it must be kept for a long time, just in case. We should, therefore, be in a position to offload either event logs or the SQL Server ERRORLOG to Azure Blob Storage.
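A minimal sketch of such an offload, assuming the $ctx context from the script below and a hypothetical ERRORLOG path; adjust both to your environment:

# Hypothetical path; adjust to your SQL Server instance.
$errorLog = "C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Log\ERRORLOG.1"

# Upload the log to the "archive" container; the archiving script below
# will move it to the Archive tier on its next run.
Set-AzureStorageBlobContent `
    -File $errorLog `
    -Container "archive" `
    -Blob ("ERRORLOG_{0:yyyyMMdd}" -f (Get-Date)) `
    -Context $ctx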

Data Migration Projects

Quite often, after a data migration project has concluded, we may want to archive the old data, or even the entire system, for future reference or, again, for regulatory requirements.

Risks of long-term storage

We do not need backups; we need restores. To make sure we can restore from a backup, we should test it periodically, as storage, and thus the files on it, can become corrupted over time. Although this is a minimal risk, especially with Azure Storage, where the Microsoft folks make sure the storage and data integrity are highly resilient, corruption is still a possibility.
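One cheap mitigation is to record a checksum before archiving and verify it after a test restore. A minimal sketch with hypothetical file paths, assuming the blob has already been rehydrated and downloaded:

# Hypothetical paths for the original file and the test-restored copy.
$original = Get-FileHash -Algorithm SHA256 -Path "C:\backups\finance-2019.bak"
$restored = Get-FileHash -Algorithm SHA256 -Path "C:\restore-test\finance-2019.bak"

if ($original.Hash -eq $restored.Hash) {
    Write-Output "Checksums match - the archived copy is intact."
} else {
    Write-Warning "Checksum mismatch - investigate the archived copy!"
}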

Ok, show me how to automate it

Everyone will have their own reason to use Azure Archive Storage, so let’s focus on how we can “push” files to the slow and cheap tier.

First of all, log in to the Azure Portal and start a PowerShell Cloud Shell console:

Azure Cloud Shell running PowerShell in the Azure Portal

Once connected, execute the script below against the Storage Account of your choice. You will need the Storage Account name and one of its access keys:

<# Archive every blob in the "archive" container, excluding the metadata file
   and skipping blobs that are already in the Archive tier.
   THIS MUST BE RUN FROM THE AZURE POWERSHELL CONSOLE #>

$StorageAccountName = "YOURSTORAGEACCOUNT"
$StorageAccountKey  = "YOURSTORAGEACCOUNTKEY"

# Build a storage context from the account name and key, list the blobs,
# filter out the metadata file and anything already archived, then set the tier.
New-AzureStorageContext `
    -StorageAccountName $StorageAccountName `
    -StorageAccountKey $StorageAccountKey `
    | Get-AzureStorageBlob -Container "archive" `
    | Where-Object { $_.Name -notlike "*metadata.tar.gz*" } `
    | Where-Object { $_.ICloudBlob.Properties.StandardBlobTier -ne "Archive" } `
    | ForEach-Object { $_.ICloudBlob.SetStandardBlobTier("Archive") }

First, set the $StorageAccountName:

Azure Portal PowerShell Console Setting Account Name

Secondly, set the $StorageAccountKey:

Azure Portal PowerShell Console Setting Account Key

Lastly, run the archive process. In this example, we are archiving all files in the “archive” container that are not already in the Archive tier, excluding files matching *metadata.tar.gz*.

This approach will allow us to set up a scheduled runbook that will archive any new files in the given container.
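Below is a minimal runbook sketch, assuming an Azure Automation account with a Run As connection, the newer Az module (where New-AzureStorageContext and Get-AzureStorageBlob become New-AzStorageContext and Get-AzStorageBlob), and a hypothetical Automation variable holding the storage key:

# Authenticate with the Automation Run As service principal.
$conn = Get-AutomationConnection -Name "AzureRunAsConnection"
Connect-AzAccount -ServicePrincipal `
    -Tenant $conn.TenantId `
    -ApplicationId $conn.ApplicationId `
    -CertificateThumbprint $conn.CertificateThumbprint

# "StorageAccountKey" is a hypothetical Automation variable name.
$ctx = New-AzStorageContext `
    -StorageAccountName "YOURSTORAGEACCOUNT" `
    -StorageAccountKey (Get-AutomationVariable -Name "StorageAccountKey")

# Same filtering logic as the interactive script above, using the Az cmdlets.
Get-AzStorageBlob -Container "archive" -Context $ctx |
    Where-Object { $_.Name -notlike "*metadata.tar.gz*" } |
    Where-Object { $_.ICloudBlob.Properties.StandardBlobTier -ne "Archive" } |
    ForEach-Object { $_.ICloudBlob.SetStandardBlobTier("Archive") }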

Conclusion

Archive storage, whether Azure's Archive tier or the AWS equivalent, Glacier, is great for long-term storage. However, we have to make sure the rehydration process will be fast enough to comply with our RTO policies. In most cases, it may not be good enough for operational backups, where we may only have a few minutes to a couple of hours to restore the database from the backup.

This post was originally published on April 5, 2020.
