Backup at Scale – Part 2 – MapReduce for Backup

Stretch the analogy

So mentioning MapReduce in connection with backup will probably get lots of funky agile programmers rolling their eyes at me, but hey! I am a simple guy and saw an analogy that might work… let’s see.

So previously we found that in order to back up at scale we need to automate the living daylights out of our backup processes.  This can be done by using off-the-shelf products like EMC Avamar integrated with vCloud Director, or by building a bespoke backup environment yourself (really only an option for a few huge Google-scale environments).

Distribute your load

So onwards with the shoehorned MapReduce analogy; my simple-minded view of the MapReduce process is as follows:

My Simple View

In order to back up at scale we really need to do the same type of distribution of the workload, followed by collection of the results.  So a backup system built around the MapReduce architecture would exhibit this type of workflow:

Backup MapReduce

In traditional backup architectures you would have to roll out backup clients to all of these application or file servers to get them to do a backup.  This locks the backup servers into doing a load of IO and encapsulates all the backups in a proprietary backup format, which is not massively scalable (big, but not huge).

However, the more modern, scalable approach is to integrate with the backup function supplied by the application, get that function to write the data to some protection storage (a deduplication appliance, for instance), and then report back to a central catalog that the backup is done.  This way you can more easily scale your backup catalog, because that server isn’t bogged down with the workload of actually moving the data around.  So schematically the architecture would look like this:
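To stretch the analogy one step further, here is a minimal Python sketch of that distribute-then-collect pattern: backup jobs are "mapped" out to the application servers, each moves its own data to protection storage, and the completion records are "reduced" into a central catalog.  The server names, status fields, and storage location strings are all invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_backup(server):
    # "Map": each application server moves its own data to protection
    # storage (e.g. a dedupe appliance) using the app's native backup
    # function, then returns a small completion record.
    return {"server": server,
            "status": "success",
            "location": f"dedupe://{server}/latest"}

def collect(results):
    # "Reduce": merge the per-server completion records into one
    # central catalog, keyed by server name.
    return {r["server"]: r for r in results}

servers = ["oracle01", "fileserver02", "exchange03"]
with ThreadPoolExecutor() as pool:
    catalog = collect(pool.map(run_backup, servers))
```

The point of the sketch is that the heavy lifting (the `run_backup` calls) happens out on the servers in parallel, while the catalog only handles lightweight metadata.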

mapreduce backup architecture

Summary

To back up at scale, take the IO workload away from the backup server and distribute it throughout the enterprise, using the resources on the application servers.  Send the backups directly to the protection storage in the application’s native format to make recoveries simple.  Create a central backup authority for maintaining a backup catalog, enforcing the backup policies, collecting alerts and providing operational and chargeback reports.

Summary of the two articles on how to back up at scale – Automate and Distribute, simples…

And if this looks a bit like the EMC data protection vision it is completely coincidental! … honest 😉


Backup at Scale – Part 1 – Linear is badness

In a few technologies recently we have seen that, by design, performance grows linearly as building blocks are added.  In clustered systems a building block will include CPU, memory and disk, resulting in linear growth of compute performance and capacity.  In the backup world, linear just doesn’t cut the mustard.

Who cuts mustard anyway?

Don’t get sidetracked with silly questions like that – use Google!  What I am trying to say is that for backup systems, the “work done to achieve backups” must grow significantly more slowly than the data to protect.

Imagine a world where 1TB of protected data requires 10% of a building block of “work done”, where “work done” is a combination of admin time, compute, backup storage etc.  If our backup processes and technologies required linear growth of work done, then much badness occurs.  Diagrammatically…


No one would ever get to the situation described in the diagram above, as they would soon realise that “this just ain’t workin’” and rethink their systems.  However, the question is: what should the “work done” growth look like?  It needs to be a shallower growth curve than that of the data protected, and it needs to flatten as capacities increase.  So we can imagine that we would want to achieve something like this:

slow growth
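To put some (entirely made-up) numbers on those two curves: sticking with the figure above of 10% of a building block per TB, compare linear growth of work done against a hypothetical logarithmic curve as the protected data grows.

```python
import math

def linear_work(tb):
    # Linear: 10% of a building block of "work done" per TB protected.
    return 0.10 * tb

def sublinear_work(tb):
    # Hypothetical sublinear curve: work grows with the log of capacity.
    return 0.10 * math.log2(tb + 1)

for tb in (1, 10, 100, 1000):
    print(tb, round(linear_work(tb), 2), round(sublinear_work(tb), 2))
```

At 1000TB the linear model demands 100 whole building blocks of work, while the logarithmic model still needs less than one – which is the sort of gap that makes or breaks backup at scale.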

But how… How… HOW!?!

A number of methodologies can be employed to work towards this goal.  The first and most obvious step is to A-U-T-O-M-A-T-E (sounds better if you say it in a robotty way).

Phase 1 – Take the drudge processes (and believe me, there are plenty) and automate them:

  1. Checking backup logs for failures
  2. Restarting backups that have failed
  3. Generating reports
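As a flavour of what Phase 1 automation looks like, here is a minimal sketch of items 1 and 2: scan a backup log for failures and queue restarts.  The log format, field layout and the restart mechanism are all invented – a real system would call your backup product’s CLI or API.

```python
def parse_log(lines):
    # Hypothetical log format: "<client> <STATUS> <time>".
    # Collect the client name from every line marked FAILED.
    failures = []
    for line in lines:
        if "FAILED" in line:
            failures.append(line.split()[0])
    return failures

def restart_jobs(clients):
    # Stand-in for invoking the backup product's restart command.
    return [f"restart queued for {c}" for c in clients]

log = ["web01 FAILED 02:13",
       "db02 SUCCESS 02:40",
       "app03 FAILED 03:01"]
actions = restart_jobs(parse_log(log))
```

Even a toy like this beats a human eyeballing thousands of log lines at 7am.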

Phase 2 – Take some of the more difficult but boring jobs and automate them too!

  1. Restore testing
  2. New backup client requests
  3. Restore requests

If your environment is at Google scale, you may want to automate crazy things like purchasing, receipt and labelling of new backup media.  This is an extreme case, but you get the principle: break down the tasks in the backup process and see what you can get machines to do better and more accurately than humans.

There are plenty of people who have already done all this, and many products to look at for help.  Start Googling…

Is that it? – No, we will return with other methods to help you back up at scale


Beginners Guide to Data Protection Strategy – Collectors Edition



Last summer, while recovering from a knee op, I unburdened myself in a series of blog posts about the basics of a data protection strategy.  Lucky for you, the box set has just been released.  Here it is!


Beginners Guide to Data Protection Strategy – Part 1

Beginners Guide to Data Protection Strategy – Part 2

Beginners Guide to Data Protection Strategy – Part 3

Beginners Guide to Data Protection Strategy – Part 4

Beginners Guide to Data Protection Strategy – Part 5

Beginners Guide to Data Protection Strategy – The End

When I say box set, it is basically a lazy blog post, stop judging me …


Networker 8.1 – The New NMDA 1.5 Oracle backup

What’s new?!

Networker 8.1 was included in the July product update announcements a few weeks ago, and lots of lovely new shiny features were mentioned.  One was an additional method of doing Oracle backups.  This article looks at this new feature of the Networker Module for Databases and Applications (NMDA) version 1.5.

First let’s consider the usual ways that you can use to protect Oracle:

  1. Dump and sweep – the Oracle admin dumps his backups to a Flash Recovery Area (FRA), then a completely disconnected backup job sweeps the files off to the backup server.  The backup server doesn’t know it is a DB backup and the recovery is in two steps.
  2. DB Backup Module – this is where the Oracle backup is directed using a DB module for a backup application.  The data streams straight out of the Oracle server to the backup server.  The potential disadvantage is that the Oracle admin feels he hasn’t got his FRA copy for fast recovery.
  3. Oracle direct to Data Domain – This is where the DD Boost agent is installed on an Oracle server and the RMAN script runs as usual, but the data is intercepted by the DD Boost agent, deduped and placed on a DD.  In these cases it is often advantageous to have the FRA on the DD too, something that Oracle admins are often nervous about doing.

NMDA 1.5 – the new workflow…

The new feature we are discussing sees Networker adding a new workflow for Oracle backups.  It uses the Networker Module for Databases and Applications (NMDA 1.5) to monitor the FRA; if an RMAN backup has been written to the FRA, it will initiate a Networker backup.  NMDA will perform the backup into Networker, where it will be indexed as an Oracle DB backup.  Because the Networker backup uses the RMAN SBT interface, it will also be cataloged in Oracle.  This allows the DBA to perform his usual backups to the FRA, after which a Networker Oracle backup runs automagically.  The DBA doesn’t need to know anything about Networker, and the NMDA backup happens without his input.

Because the backup in Networker is cataloged in Oracle, it means that the DBA can access it for a one-step restore, rather than having to do the traditional file restore to the FRA then Oracle restore to the DB.


The way that NMDA 1.5 monitors the FRA for new RMAN backups is by using the probe-based backup feature.  The probe runs at the interval you set in Networker; when it runs, it checks the condition described below:

“DBA_DISK_BACKUP=TRUE, if the probe finds any new Oracle disk backups and no Oracle disk backup is currently running for the database, the probe triggers an NMDA scheduled backup of the new disk backups.”
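The quoted condition boils down to a simple two-part check, which can be sketched as follows (my own illustration of the logic, not NMDA code):

```python
def probe_should_trigger(fra_has_new_backups, rman_backup_running):
    # Trigger an NMDA scheduled backup only when the FRA contains new
    # Oracle disk backups AND no Oracle disk backup is still running.
    return fra_has_new_backups and not rman_backup_running

# 18:00 scenario – new backup found, but RMAN still writing: back off.
decision_1800 = probe_should_trigger(True, True)

# 19:00 scenario – new backup found, FRA quiet: trigger the NMDA backup.
decision_1900 = probe_should_trigger(True, False)
```

The two scenarios here match the 18:00 and 19:00 steps in the walkthrough below.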

A picture speaks a thousand words

So in diagrammatic form let’s look at the workflow:

 NMDA 1.5


18:00 – Oracle RMAN backup is running

1a – Oracle DBA runs his usual RMAN backup to the FRA, he doesn’t need any other software or knowledge of Networker.

1b – NMDA probes the FRA while the RMAN backup is running; it finds a new backup but also finds an RMAN backup in progress.  NMDA backs away and waits for the next probe interval.


19:00 – FRA is all quiet…

2a – NMDA probes again and finds new backup data and no other activity on the FRA, so it initiates an NMDA backup.

2b – NMDA performs an Oracle RMAN SBT backup of the contents of the FRA.

2c – Networker then updates the Oracle catalog to make it aware of the backup that just occurred.


So hope that was all clear… if not tough, I am not saying it again.