Hi, my name is Eric Siebert. I'll be your instructor for this part of Backup Academy.
This lesson will be covering core technologies that are used for virtual machine backup.
Before we begin, a little bit about me. I'm a 25-year IT industry veteran, and I've been focusing on virtualization for the last five years. I'm the author of two books on virtualization: "Maximum vSphere" and "VMware VI3 Implementation and Administration." I'm the proprietor of vSphere-land.com, a VMware information website. I'm a regular contributor to many TechTarget websites, including SearchServerVirtualization.com and SearchVMware.com. I've presented at VMworld twice, in 2008 and 2010, and I've been recognized as a vExpert by VMware in 2009 and 2010.
Here's our agenda today. First we're going to cover the different methods that you're going to use when you back up virtual environments. Next we're going to cover virtual machine snapshots. We're going to cover disk-to-disk backup, which is a common method of backing up data in a virtual environment. We're going to cover the Microsoft Volume Shadow Copy Service, otherwise known as VSS, a key component of getting proper backups. We're going to cover the vStorage APIs, which are APIs that VMware introduced for storage and data protection related functions. Then we're going to cover data deduplication and data compression.

First, a general overview of virtual environment backup methods. Virtualization technology is pretty unique stuff. When you insert that virtualization layer between the hardware of a server and the operating system, it gives you a lot more options and greater flexibility for doing backup and recovery on your servers. The architecture of a virtual environment is dramatically different: the hypervisor controls all the hardware and the resources on the host. You can still use traditional methods to back up your virtual machines, like installing an agent inside the operating system, but done that way, backups create bottlenecks and cause performance problems. They just won't be efficient, and they'll take a lot longer to complete. So, when you're in a virtual environment, you need to think outside the box and use methods that were developed specifically for backing up virtual machines. If you keep using the traditional methods, you're not going to get the efficiencies or take advantage of the virtualization architecture. You want to move to methods built specifically for backing up virtual machines, ones that not only increase efficiency but also have minimal impact on the virtual machines themselves. We'll talk about that more in a little bit.
The way that backups are done in virtual environments is at the image level. Image level means you're backing up the whole image of the virtual machine's disk file. You're not going through an agent inside the operating system anymore to back up data. Doing it this way is a lot more efficient: instead of going through the guest OS, where you have another layer to traverse, you go through the virtualization layer to get to the disk. Why take on the extra overhead of going through the guest OS when you can just go to the virtualization layer instead? That's where the disk lives anyway.

So, like I said, with image-level backups you're backing up the whole image of the virtual machine's disk file. The VMDK file, which is the virtual disk of a VM, resides on a host datastore. When a backup server goes to back up a VM, it goes to that file and backs it up raw; it doesn't look inside the file at all. This is done at the disk block level, instead of the traditional method, which works at the file level, where you actually go through the operating system and back up individual files. Image-level backups work at the disk block level. They can't see inside the operating system, so they can't back up individual files. They go at the VMDK, the virtual disk, and back up each individual disk block.

Done naively, this creates some inefficiencies, because you don't know what's inside that virtual disk. There may be tons of deleted data, empty data, disk blocks that aren't used anymore. If you're working from outside the OS, and aren't aware of what's inside it, how do you know what to back up? The way backup applications get around this and become more efficient, without having to go inside the OS, is that they look at all the disk blocks that come through, and if they see empty disk blocks, they just ignore them. Say you have a 20GB virtual disk but only 2GB of space is in use. The backup application looks at each disk block when it's doing a full backup, sees all the empty ones, and simply discards them: "I'm not going to back those up, because they're empty. I'm only going to back up disk blocks that have data on them."

You also need to track the blocks that have changed since the last backup for incremental backups. Once you do a full backup, you need to know what's changed, and since you're not at the traditional file level, you don't have archive bits that you can set on files to know when they've been modified. The backup application has to go through and check each disk block when it does an incremental backup: "Have you changed since the last backup I took of you?" There are also methods, which we'll talk about in a little bit, that allow the backup application in a virtual environment to track the blocks that have changed since the last backup, so it can do efficient incremental backups.
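To make that concrete, here's a minimal sketch of the idea in Python. It's purely illustrative, not how any real backup product is implemented: it treats a virtual disk as a flat file of fixed-size blocks, skips all-zero blocks on a full backup, and uses per-block hashes to find changed blocks for an incremental, standing in for the changed-block tracking we'll cover later. All of the names here (read_blocks, full_backup, and so on) are made up for the example.

    import hashlib

    BLOCK_SIZE = 1024 * 1024  # pretend the disk is divided into 1MB blocks

    def read_blocks(disk_path):
        """Yield (block_number, data) for every block in the virtual disk file."""
        with open(disk_path, "rb") as disk:
            num = 0
            while True:
                data = disk.read(BLOCK_SIZE)
                if not data:
                    break
                yield num, data
                num += 1

    def full_backup(disk_path):
        """Back up only non-empty blocks; remember a hash of each one."""
        backed_up, hashes = {}, {}
        for num, data in read_blocks(disk_path):
            if data.count(0) == len(data):   # all-zero block: nothing to save
                continue
            backed_up[num] = data            # a real product would write to a repository
            hashes[num] = hashlib.sha1(data).hexdigest()
        return backed_up, hashes

    def incremental_backup(disk_path, last_hashes):
        """Back up only blocks whose contents changed since the last backup."""
        changed = {}
        for num, data in read_blocks(disk_path):
            digest = hashlib.sha1(data).hexdigest()
            if last_hashes.get(num) != digest:
                changed[num] = data
                last_hashes[num] = digest
        return changed

Note the tradeoff the incremental function illustrates: without help from the hypervisor, the application still has to read and hash every block just to find the changed ones, which is exactly the overhead that changed block tracking removes.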
Now we're going to cover virtual machine snapshots, which are the core technology that all the backup applications use. They're not to be confused with other types of snapshots, like storage array snapshots or operating system snapshots taken inside the OS. A virtual machine snapshot is taken at the virtualization layer. It's basically a point-in-time picture of a VM that preserves the virtual disk itself and, optionally, the system memory of the VM at that point in time. You can take multiple snapshots as well, if you want multiple pictures of that VM for multiple recovery points.

Backup applications take advantage of virtual machine snapshots because when you take a snapshot, the VM's disk files become read-only: no more writes can occur to that virtual disk. Of course, the operating system isn't aware this is happening, because it's done at the virtualization layer, so it keeps trying to write to that virtual disk file. What happens is that the hypervisor creates a new delta disk. All the writes that occur when the operating system tries to write to the virtual disk file get deflected away from the original VMDK file and land in that delta file instead. Each virtual disk of a virtual machine gets its own delta file; if a virtual machine has two disk files, each disk will have its own delta. All the changes that occur while the snapshot is active reside in that delta file.

When you delete a snapshot and no longer need it, say the backup is finished, those delta files are merged back into the original VMDK file and then deleted. One by one the delta files are all merged back, then each delta file is deleted, and you're left with only the original VM disk file, which won't be read-only anymore. It gets unlocked and becomes read-write again.
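Here's a small, hypothetical Python model of that redirect-on-write behavior, just to illustrate the mechanics: reads check the delta first and fall back to the read-only base disk, writes always land in the delta, and deleting the snapshot merges the delta back into the base. The class and its methods are invented for the example.

    class SnapshottedDisk:
        """Toy model of a base VMDK with one active snapshot delta."""

        def __init__(self, base_blocks):
            self.base = base_blocks   # dict: block number -> data (now read-only)
            self.delta = {}           # changes made while the snapshot is active

        def write(self, block_num, data):
            # The guest OS thinks it's writing to its disk; the hypervisor
            # deflects the write into the delta file instead.
            self.delta[block_num] = data

        def read(self, block_num):
            # Current state = delta where it exists, base everywhere else.
            return self.delta.get(block_num, self.base.get(block_num))

        def delete_snapshot(self):
            # Merge the delta back into the original disk, then discard it;
            # the base becomes read-write again.
            self.base.update(self.delta)
            self.delta = {}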
If you have multiple snapshots active, the process for deleting them has actually changed across the different vSphere versions. In all versions, a special helper snapshot file, another delta file, gets created. It's a little container to hold any disk writes that occur while the snapshots are being committed to the original disk. As each snapshot is committed back to the original disk, that helper file collects all the writes that happen while the process is running. Eventually, once all the delta files have been written to the disk, the helper file is committed to the original disk as well. The helper file usually isn't active for long, only while the snapshots are being committed, so it usually doesn't get very large.

In older versions of vSphere, the way the process worked was that newer snapshots were copied into each of the older snapshots in order, and then finally the helper file was copied into the original disk file. While this worked, it wasn't efficient: because each snapshot's data was copied into the next snapshot in succession before reaching the original disk, each snapshot would grow as it absorbed the data of the previous one. So you needed a lot more disk space on your datastores to be able to delete those snapshots, because all of them would grow while they were being rolled one into the other.

Starting in vSphere 4.1, and also in later 4.0 releases, VMware changed that process, because they recognized the old way just wasn't efficient. It took up more disk space, which becomes a problem if you're running low, because you don't have the extra room you need to commit snapshots. The new method merges snapshots back directly into the original disk file. As you can see on this slide, this disk file has three snapshots running. What happens now when you merge those is that the first snapshot is merged back into the original disk; once that completes, the second snapshot is merged back into the original disk; finally, the third snapshot is merged back into the original disk. They're no longer rolled into each other. They're each rolled directly back into the original disk in turn. Then, once that's all complete, the helper snapshot that was created while the process was running is rolled back into the original disk file, and all the delta snapshot files are deleted.
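Here's a rough Python sketch, under simplified assumptions, of why the merge order matters for disk space. It models each snapshot as a set of changed block numbers and compares how large the intermediate files get when snapshots are rolled into each other (the old way) versus each being committed directly to the base disk (the new way). Both functions are invented for illustration.

    def old_style_peak(snapshots):
        """Roll newer snapshots into older ones; track how big each file grows."""
        peak = 0
        # Work from newest to oldest: each older snapshot absorbs the newer one.
        for i in range(len(snapshots) - 1, 0, -1):
            snapshots[i - 1] = snapshots[i - 1] | snapshots[i]  # the older file grows
            peak = max(peak, len(snapshots[i - 1]))
        return peak  # largest intermediate snapshot, in blocks

    def new_style_peak(snapshots):
        """Commit each snapshot directly into the base; snapshots never grow."""
        return max(len(s) for s in snapshots)

    # Three snapshots, each with some overlapping changed blocks:
    snaps = [set(range(0, 600)), set(range(300, 900)), set(range(600, 1200))]
    print(old_style_peak([s.copy() for s in snaps]))  # 1200: well beyond any one file
    print(new_style_peak(snaps))                      # 600: bounded by the largest snapshot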
Now we're going to talk about virtual machine snapshot sizes. When you first create a snapshot, the snapshot file starts out small, with an initial size of 16MB. It then grows in 16MB increments as writes are made by the operating system to that virtual machine's disk. As each 16MB allocation fills up, with all of its blocks written to, the snapshot is extended by another 16MB, and that continues as more and more data is written to the snapshot.

A single snapshot file can never exceed the size of the original disk file. The reason is that if a disk block has already been written to once inside the snapshot file and is written to again, the snapshot doesn't allocate another disk block for it; it updates the existing block in the delta file. As a result, if every block inside the virtual disk were written to while the snapshot was active, the snapshot would equal exactly the size of the original virtual disk. If you have a 20GB virtual disk and the operating system wrote to every single disk block while the snapshot was active, that snapshot would equal the size of the virtual disk and not exceed it. If you have multiple snapshots running, though, the combined space they take up can exceed the size of the original disk file, because each individual snapshot can grow up to the size of the original disk on its own.
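A quick back-of-the-envelope helper, using made-up numbers, shows why that matters for capacity planning: each active snapshot can in the worst case grow to the full size of the base disk, so the extra space you should budget scales with the snapshot count.

    def worst_case_snapshot_space(disk_size_gb, active_snapshots):
        """Worst case: every block of the base disk is rewritten under
        every active snapshot, so each delta grows to the full disk size."""
        return disk_size_gb * active_snapshots

    # A 20GB virtual disk with three active snapshots could, in the worst
    # case, need an extra 60GB of datastore space on top of the base disk:
    print(worst_case_snapshot_space(20, 3))  # -> 60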
As for how fast these snapshots grow: like I said, they grow in 16MB increments, and the rate of growth is basically determined by how much disk write activity occurs on your server while the snapshot is active. If you have a simple server, maybe an application server or a web server, there isn't a lot of disk I/O activity occurring; maybe there are a lot of reads, but not a lot of writes. In that case the snapshot will probably stay pretty small, because while it's active, not many disk blocks are getting changed. Say that snapshot is active while a backup runs for an hour. Since there isn't a lot of write activity happening, the snapshot will stay small; it varies with every VM, but it will probably be under a GB. Now, if you have something with very heavy I/O, maybe a database server or an Exchange server that you're backing up, that snapshot can get pretty big pretty quickly, because there are a lot of disk writes happening on that VM. Even during a short time period, heavy write activity will make the snapshot grow fast. That's why it's important, when you have a snapshot active on a VM, not to do things like a defrag inside the operating system or anything else that's really disk intensive, because it will cause that snapshot to grow large at a rapid rate, which you don't want. So, whenever you have virtual machine snapshots running, don't do anything that causes a lot of write I/O on that VM.

Let's talk a little bit about virtual machine snapshot usage, and when you should use snapshots. Actually, I'll tell you right now when you shouldn't use them: they should not be considered a primary method for backing up your VMs. I could say that over and over, because there are some people who do exactly that, and you can't rely on snapshots as a primary backup method. They're useful for short-term backups, where you have to do an ad hoc, on-the-fly backup of a VM to preserve it. Say you're going to patch that VM or upgrade an application on it, and it's useful to be able to roll back to a certain point in time; maybe the upgrade had problems and you need to go back to the previous version. They're great for things like that, but don't use them as a primary method for backing up VMs. When snapshots are running, they slightly degrade the VM's performance. As they grow and the delta file keeps increasing in size, they consume extra disk space on your datastores. Committing and deleting them also takes a lot of resources on the host. Because you're extending the virtual disk into multiple files, it can create complications for that virtual disk in some situations. Also, some vSphere features simply don't work while a VM has a snapshot running on it. So use snapshots for ad hoc, on-the-fly backups, not as a primary backup mechanism.

Having said that, snapshots are a primary enabler for performing image-level backups. When I'm backing up a VM at the VMDK level, the image of that VM, I need to pause that disk so the backup application can read it without any changes occurring to it. A snapshot is great for that, because it makes the VM's disk read-only, so the backup app can mount it and have exclusive access to it. It doesn't have to worry about the operating system writing to that disk while the backup is being completed. So snapshots are used by almost all backup applications that do virtual machine backup at the virtualization level. Once the snapshot of the VM is taken and the backup is completed, the backup application automatically deletes the snapshot. The changes that occurred while the backup was running are rolled back into the original disk file, and the snapshot is removed like it was never even there.
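As a hedged illustration of that workflow, here's how a backup tool might drive it through the vSphere API with the pyVmomi library: snapshot the VM, read the now-frozen base disk, then delete the snapshot so the delta rolls back in. The connection details, VM name, and the backup_vm_disks() helper are hypothetical placeholders; treat this as a sketch, not a production recipe.

    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask

    si = SmartConnect(host="vcenter.example.com",       # hypothetical vCenter
                      user="administrator", pwd="secret")
    vm = si.content.searchIndex.FindByDnsName(None, "appserver01", True)

    # 1. Take a snapshot: the base VMDKs become read-only, writes go to deltas.
    WaitForTask(vm.CreateSnapshot_Task(name="backup-temp",
                                       description="temporary backup snapshot",
                                       memory=False,    # no need to preserve RAM
                                       quiesce=True))   # ask the guest to quiesce

    try:
        backup_vm_disks(vm)   # placeholder: read/copy the frozen base disks here
    finally:
        # 2. Delete the snapshot: the delta merges back into the original disks.
        snap = vm.snapshot.currentSnapshot
        WaitForTask(snap.RemoveSnapshot_Task(removeChildren=False))
        Disconnect(si)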
When it comes to backing up your VMs, in the old days you typically stored that data on a tape target. You'd take those tapes and put them off-site, or somewhere safe, so you'd always have them in case you ever needed to do a restore. Today, a lot of backup applications use disk targets instead of tape. Instead of writing to tape, data is written to a special backup repository created on a storage device. Typically that's somewhere on a network; it could be a local storage device, but usually it's a dedicated network storage device where you can dump all that data into a backup repository. A lot of backup apps now support writing directly to disk targets. Before, to do that, you needed a special thing called a virtual tape library, a VTL, which emulated a tape drive. Say you had NetBackup and it needed a tape target: the VTL presented itself as a tape drive, but it had a storage backend, so it actually stored the data on a storage device instead of on tape. Today most applications don't need that anymore, and VTL technology is used less and less; most apps can write directly to a disk target.

The backup repository that sits on that disk target is typically deduplicated and compressed. It's basically the virtual disk files of the VMs, all crunched together into one large repository, stored as files on whatever storage device you're using as the target. That target storage can be local or remote: you can have a local disk on your backup server storing all the data, or the backup server can access it remotely via an NFS share, a CIFS share, or FTP. The backup server takes all the data from the VMs it backs up and puts it into the backup repository on that target storage device, whether local or remote.

Once the data is written to a backup repository, you get some advantages: you can take that data and replicate it to an off-site location, either via storage replication, if you have an array that supports it, or with products that can do it at the virtualization layer. Those backups are great for DR purposes, because as soon as the data is backed up, it's automatically replicated off-site. You don't have to handle tape media and ship it off-site; the data can be transferred to another location automatically.

Even with a disk repository, there's still a need for tape in a lot of cases. So how do you get that data to tape? A lot of the backup applications that support disk-to-disk backup also support writing the data to tape afterwards. It's called disk-to-disk-to-tape: you write to a disk target first, and then to a tape target afterwards. You might have a tape system that comes through and backs up the repository; after all, it's just files residing on a storage device somewhere, so the tape drive can access and back up the data in the repository. That way, if you want an off-site or locked-up copy, you have another copy of your data, and perhaps a longer-term archive on tape. You might use the disk for shorter-term storage, keeping a month's worth of data there, and use tape for longer retention periods.

Backing up to disk makes restoring data much easier and quicker, because it's already there on your network. You don't have to go finding or recalling tapes; you can access the repository and pull back whatever you need really quickly. With tapes it can take a while, because you may no longer have the tape on site; it could take days to get it back, but with disk you can get the data back almost right away. The total cost of ownership can be a bit higher with disk-to-disk backups compared to traditional tape backups, because these storage systems are typically a lot more expensive than buying tapes. You continually have to add storage, and you typically need a higher-end storage device if you're going to do replication and the like. The price of a disk system can add up compared to the price of a simple tape library with tapes. Tapes are definitely a lower-cost solution with more capacity compared to disk.
Another key core technology that's used in backups pretty frequently is the Volume Shadow Copy Service, otherwise known as VSS. This is a Microsoft-only technology. It's a mechanism for creating consistent point-in-time copies of data, also known as shadow copies. It's basically a Windows service. It was first introduced in Windows XP, and every version of Windows since then has included it. It works at the disk block level of the file system, not at the file level, and it ensures data cannot be changed during backups. Remember how we talked earlier about virtual machine snapshots, where you freeze the virtual disk of a VM? This is the equivalent inside the operating system: the VSS service does pretty much the same thing. It creates a snapshot and ensures that the data cannot be changed during the backup. It also helps avoid problems with file locking, because with the snapshot in place, an application that writes during the backup is writing to a different area, not back to the original disk file. If you look at the services on any Windows machine, you'll see it listed as Volume Shadow Copy. Its startup type is manual, but as long as it's not disabled, it will be started automatically; you don't have to have it running already, because backup applications that use it can start it on their own when it's set to manual.

The Volume Shadow Copy Service works with both the file system and applications. It pauses them and puts them in a proper state for snapshots and backups to occur. When a request is made, it tells the application or the OS: "Hold on a second. I need to prepare my data." The application might flush data from memory to disk, and things like that, before the backup, so the data is in a consistent state to be backed up.

VSS is made up of several core components. First is the requestor, which initiates the request to the VSS service running in the operating system. Typically, the requestor is a backup application, or some type of application that needs to do something inside the OS to back it up. Requestors then work with the writers to collect information about the data to be backed up. The requestor queries the writer and says, "I need information about everything on the disk so I can prepare it in the proper state to be backed up." The writer, in turn, is the part of applications and services designed to work with VSS. A writer prepares its application by quiescing its data, ensuring nothing is written until the shadow copy is created. It does the actual preparation work: it pauses the application, quiesces it, and says, "Don't write any data while I'm doing this, because I've got to get this data written out so it's in a proper state." The writer does the hard work there. That ensures the backup will be consistent: any data sitting in memory that needs to be on disk for consistency gets written out. Finally, the provider does the actual work of creating the shadow copy. Once the writers, acting like traffic cops, have ensured the apps are all quiesced properly, the provider creates and maintains the shadow copies until they're no longer needed. You can think of a shadow copy like a snapshot, where disk writes occur in a different area while it's active.

Speaking of providers, there are different types that can do the work of creating shadow copies. There are hardware providers, which can offload the shadow copy to a hardware storage device. These are typically used with shared storage, or maybe a special array card in a server, that is more efficient at this than the operating system; offloading the whole process to a hardware device makes it a lot more efficient. Then there are software providers, which work at the software layer. They intercept and process I/O requests: just like with a VM snapshot, any write request that comes through gets told, "You can't write this to the original spot, because a shadow copy is active," and is written to a special area instead, which can be on any type of storage device. Finally, the system provider is built into the Windows OS and writes to an NTFS volume on the system.

At the bottom of the slide you can see a little depiction of how this works. The shadow copy service runs in the operating system. A requestor, like a backup application, comes in and makes the request, and the service directs all the different components. It goes to the writers, which work with the applications and the operating system to quiesce them properly so they're in the right state. From that point, it turns things over to the providers, which work at the disk level to intercept the I/O requests, the writes that occur, and move them to a different area while the shadow copy is active. Then the process ends much like a VM snapshot: once it's over, the data is simply taken and written back to the original location.
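Here's a toy Python model of that requestor/writer/provider choreography. It's an invented sketch of the sequence of events, not the real Windows API, which is a COM interface (IVssBackupComponents and friends) rather than anything like this.

    class Writer:
        """Stands in for an application (e.g., a database) that supports VSS."""
        def __init__(self, name):
            self.name, self.quiesced = name, False

        def freeze(self):
            # Flush in-memory data to disk and hold new writes.
            print(f"{self.name}: flushed to disk, writes paused")
            self.quiesced = True

        def thaw(self):
            print(f"{self.name}: writes resumed")
            self.quiesced = False

    class Provider:
        """Stands in for the component that actually creates the shadow copy."""
        def create_shadow_copy(self, volume):
            print(f"shadow copy of {volume} created; new writes diverted")
            return f"shadow-of-{volume}"

    def requestor_backup(volume, writers, provider):
        """The requestor (a backup app) drives the whole sequence."""
        for w in writers:
            w.freeze()                       # writers quiesce their data
        shadow = provider.create_shadow_copy(volume)
        for w in writers:
            w.thaw()                         # apps resume almost immediately
        print(f"backing up from {shadow}")   # read the frozen, consistent copy

    requestor_backup("C:", [Writer("SQL Server"), Writer("Exchange")], Provider())

The key design point the model shows is that applications are only frozen for the instant it takes to create the shadow copy; the long-running backup reads from the shadow copy while the applications keep writing normally.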
When it came to backups, VMware recognized that we can't do things the way we traditionally did. We can't use agents inside the operating system; we need to leverage virtualization to make backups a lot more efficient. What they came up with, in VI3, was the VMware Consolidated Backup framework, also known as VCB, which was essentially a proxy server that acted as a middleman. A backup server, instead of going directly to a VM to back it up, would go through the proxy server. The proxy server would mount the virtual disk of that VM onto the VCB server, without involving the VM at all, and the backup server would back up the disk attached to the VCB server. So all the resources and all the workload sat on that VCB server. This shifted the backup overhead away from the VM, where, if you backed up inside the OS, you'd have an agent consuming CPU, memory, and disk, and away from the host, since the VM runs on the host and uses host resources, onto the proxy server instead. This initial method was more efficient because you no longer had to go directly to the VM and involve the VM and its host. The guest OS on the VM didn't even know it was being backed up, because the proxy server just mounted the VM's virtual disk and the backup server contacted the proxy server instead.

That approach evolved into the vStorage APIs, which VMware introduced in vSphere, and which essentially eliminated VCB and its proxy server. Instead, they leverage a combination of APIs, SDKs, and VMware's Virtual Disk Development Kit, which lets applications interface directly with virtual disks. The vStorage APIs allow third-party applications to connect directly to virtual machine datastores, instead of going through a proxy server. Why go through a proxy server, another hop, when you can go there directly? The vStorage APIs allow exactly that, which improves efficiency and management, because you can go straight to those VMs without involving external components. "vStorage APIs" is basically a marketing term: it's a collection of the various ways that APIs integrate with the different areas of storage in vSphere. There are four categories of these APIs, which we're going to cover in the next few slides.
The first category is the vStorage APIs for Array Integration, also known as VAAI. These weren't available immediately in vSphere; it took some time to develop them, and VMware actually had to work with specific storage vendors to enable storage-array-based capabilities directly from within vSphere itself. VAAI lets vSphere act more efficiently for certain storage-heavy operations. The storage array does this stuff a lot more efficiently, so instead of trying to do it in vSphere, it makes more sense to offload it to the array: vSphere just tells the storage array, "Go do this for me," and the array does it much quicker than vSphere can. This includes operations like array-based snapshots, where you can invoke an array-based snapshot directly from within vSphere. There's copy offload: if you're going to clone a VM or something like that, the copy happens a lot quicker at the storage array than through vSphere itself, so when you clone or copy something, vSphere tells the array, "Hey, do this for me," and the array takes care of it much faster. The same offload applies to zeroing blocks: if you have a lot of disk blocks to zero on a virtual disk, let the storage array do it, because it does it a lot quicker. There's hardware-assisted locking: instead of SCSI reservations, which can cause performance problems, let the array handle locking of the virtual disk files, which it can do far more efficiently. And there's storage provisioning integration, where you can provision storage right through vSphere, so you don't need two separate consoles; you can do all the storage-related tasks directly from within vSphere.
The second category is the vStorage APIs for Multipathing, also known as VAMP. These were specifically designed for third-party storage vendors to expose the multipathing functionality of their arrays through plug-ins that each vendor develops. This allows vendors to utilize multipathing more intelligently. Typically you have multiple paths to your storage device; if you have a back-end SAN, you'll probably have at least two Fibre Channel controllers inside your host. Because multiple paths are available, the vStorage APIs can work with the storage array to get the best possible multipathing, so you get the best storage I/O throughput and path failover for each specific array. Every storage array is different, and they each handle multipathing a little differently. To make this work, each vendor must certify its multipathing extension modules with VMware. Currently this only supports iSCSI and Fibre Channel storage; a lot of things in the vStorage APIs don't support NFS yet, but NFS support will most likely be added in the future. Here are the other two categories of vStorage APIs.
First, there's the vStorage API for Site Recovery Manager. These APIs work with VMware's Site Recovery Manager product, which lets you do site-to-site replication of virtual machines for business continuity or disaster recovery purposes. This product relies on array-based replication to do all the replication at the array level; it can't do any of the replication at the VM level. Basically, vendors develop their own site recovery adapters that plug into Site Recovery Manager for their specific storage subsystems. That integration lets Site Recovery Manager control the replication and keep on top of everything that's replicating, to keep each site updated.

The final category is the vStorage APIs for Data Protection. This is the big one for backups. It's the successor to VCB, and it makes up for a lot of VCB's shortcomings. A lot of people were unhappy with VCB; it was a real pain to set up and use. Where VCB was a separate standalone framework, the vStorage APIs are built directly into vSphere and don't require any additional software to be installed like VCB did. They include all the VCB functionality, but they also add a lot of new functionality that we'll talk about. These vStorage APIs for Data Protection are aimed at third-party backup and data protection applications, giving those applications better integration and greater flexibility for backing up virtual machines and recovering data when needed.

The vStorage APIs for Data Protection introduced more efficient methods for backing up virtual machines. Remember, before, we had to go through a proxy server running VCB to get to the virtual disk of a VM. With the vStorage APIs we no longer have to do that: they allow backup applications and servers to mount a VM's virtual disk directly. There are two ways to do that. If your backup server is a physical server, it can access the VM's datastore directly and mount the virtual disk to back it up. The other way applies when your backup server is a virtual appliance running on a host: in that case it can hot-add the target VM's disk to the backup appliance, so the disk appears as just another local disk on that appliance, and it can back it up directly that way as well. Both of these methods allow much more efficient backups of your virtual disks compared to involving the VM or going through a proxy server like you had to do with VCB.
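To give a feel for the hot-add approach, here's a hedged pyVmomi sketch of attaching another VM's existing VMDK to a backup appliance via a reconfigure task. The VMDK path, controller key, and unit number are hypothetical; a real tool would discover them from the appliance's hardware inventory rather than hard-code them.

    from pyVmomi import vim
    from pyVim.task import WaitForTask

    def hot_add_disk(appliance_vm, vmdk_path):
        """Attach an existing VMDK (the target VM's disk) to the appliance."""
        disk = vim.vm.device.VirtualDisk()
        backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
        backing.fileName = vmdk_path   # e.g. "[datastore1] appserver01/appserver01.vmdk"
        backing.diskMode = "independent_nonpersistent"  # never write to the source disk
        disk.backing = backing
        disk.controllerKey = 1000      # hypothetical: the appliance's SCSI controller
        disk.unitNumber = 1            # hypothetical: a free slot on that controller
        disk.key = -1                  # let vSphere assign a device key

        change = vim.vm.device.VirtualDeviceSpec()
        change.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
        change.device = disk

        spec = vim.vm.ConfigSpec(deviceChange=[change])
        WaitForTask(appliance_vm.ReconfigVM_Task(spec=spec))
        # The target's disk now shows up inside the appliance as a local disk,
        # ready to be read block by block for the backup.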
The change block tracking feature is probably the most significant and standout feature of the vStorage APIs. For incremental backups and replication, you typically have to know which blocks have changed since the last backup or replication cycle. Normally, applications have to compute that on their own: they look at all the disk blocks and figure out what exactly has changed since the last backup or replication cycle, so they can back up only that data. CBT allows third-party apps to simply query the VMkernel for that information. The VMkernel now keeps track of all the disk blocks that have changed since a particular point in time, like the last backup operation or replication cycle. The application can just ask the VMkernel, "Hey, what do I need to back up this time?" and the VMkernel gives it the answer right away. You no longer have all the overhead of an application trying to figure that out on its own; it can query the VMkernel and find out instantly. This allows much faster incremental backups. It also enables near-continuous data protection, because your replication cycles can be a lot quicker now; you no longer have to compute everything that's changed, because the VMkernel tells you right away. Change block tracking supports any type of storage except physical-mode raw device mappings (RDMs); it supports virtual-mode RDMs, but not physical mode. It works with both thin and thick disks, and they can be on any type of storage, whether NFS, iSCSI, Fibre Channel, or local storage.
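As a rough sketch of what that query looks like through the vSphere API with pyVmomi: QueryChangedDiskAreas is the real API call, but the helper around it, its parameters (vm, snapshot, disk_key, capacity), and the paging logic are my own illustrative assumptions. A changeId of "*" asks for all allocated areas (used for the first full backup); a changeId saved from a previous backup returns only what changed since then.

    # Assumes: vm is a pyVmomi VirtualMachine with CBT enabled, snapshot is the
    # snapshot just taken for this backup, disk_key is the device key of the
    # virtual disk, and capacity is its size in bytes.
    def changed_areas(vm, snapshot, disk_key, capacity, change_id="*"):
        offset, extents = 0, []
        while offset < capacity:
            info = vm.QueryChangedDiskAreas(snapshot=snapshot,
                                            deviceKey=disk_key,
                                            startOffset=offset,
                                            changeId=change_id)
            extents.extend((a.start, a.length) for a in info.changedArea)
            if info.length == 0:
                break                    # defensive: avoid looping forever
            offset = info.startOffset + info.length  # advance past covered region
        return extents  # (byte offset, length) pairs to read and back up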
How does change block tracking work? CBT stores all the changed-block information in a special file: if you look in a VM's home directory, you'll see a file with a -ctk.vmdk extension, created for each virtual disk of the VM. This file is fixed-length; it doesn't grow like a snapshot file does, because it's essentially just a map of the VM's virtual disk blocks. Its size varies with the size of the virtual disk, but it's not that big, about half a MB for every 10GB of virtual disk, so these files are relatively small and won't take up much room on your datastores. The state of each block is stored using sequence numbers that tell applications whether a particular block has changed or not. CBT can be enabled by the backup application automatically; typically each backup application has a setting for whether you want to use change block tracking or not. Or, if you want to enable it yourself, there's a virtual machine advanced setting you can set that enables the feature as well.
Another common technology used in backups, whether in a physical or virtual environment, is data deduplication. Your backup repositories, where all the images of your VMs are stored, can grow pretty large over time. So what do you do when you run out of space on those backup repositories? You could purchase more storage, which means upgrading your current storage device, adding more storage to it, or buying a new one; both options can be pretty expensive. You could also limit the number of backups you keep in the repository, but by doing that you limit your ability to quickly restore data from it. Since neither solution is desirable, instead of increasing the amount of storage you use for your backups, it's better to reduce the size of the data stored in your backup repositories. Data deduplication prevents duplicate blocks of data from being stored in the backup repository: it looks at every single block of data that gets sent to the repository and picks out the ones that are already stored there, so they don't get stored again.

There are different ways of doing data deduplication. There's inline versus post-processing: inline means you deduplicate as the backup is occurring, while post-processing means you deduplicate after the backup is finished, going to the target storage device and deduping the data there. There's source versus target: source dedupe happens at the source, typically through the operating system, while target dedupe happens on the target storage device. There are also different chunking methods and chunk sizes: when you dedupe data, you look at it in chunks of a certain size, maybe 16KB or 64KB; you carve all the data into those chunks, then compare the chunks to find the duplicates. There are a lot of different approaches, and each vendor has its own preferred methods of doing data deduplication, which you'll find across the different backup products.

Of the different methods, inline deduplication is the most common. With this method, the hash calculations are done before the blocks are stored in the backup repository. The backup server looks at every single block that comes through it and does a hash calculation on each one. If that hash matches a block that's already stored in the repository, it doesn't store the block again; it simply references the existing one. Doing all these extra hash calculations adds overhead, because the backup server is no longer just moving data from point A to point B; it's having to do math as well, calculating hashes on all the blocks passing through it.

You can tweak the hash block size to get either maximum deduplication or better performance. Smaller hash block sizes get you maximum deduplication: because the blocks are smaller, the chances of duplicate blocks occurring in your repository are greater. But you also get slower backups as a result, because the server has to process a lot more blocks of data to do all those hash calculations. Larger hash block sizes get you more minimal deduplication, because the blocks are larger, so there's less chance of matching existing blocks already in the repository. You still get some level of deduplication, but you also get the best backup job performance. You've got to find the balance that works in your environment to meet your backup windows. If you have a lot of time for backups, maybe go for maximum dedupe; you'll store less data, but your backups will be slower. If your windows are shorter and you have to get backups done quicker, use a larger block size to get the best backup performance possible.
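Here's a minimal Python sketch of inline, target-side deduplication using a content hash as the block identity. It's a simplification (real products use carefully engineered fingerprints, indexes, and reference counting), and the class and its names are invented for illustration; the chunk_size parameter is the knob described above, trading dedupe ratio against speed.

    import hashlib

    class DedupeRepository:
        """Stores each unique chunk once; backups become lists of chunk hashes."""

        def __init__(self, chunk_size=64 * 1024):   # 64KB chunks: faster, less dedupe
            self.chunk_size = chunk_size
            self.chunks = {}      # hash -> chunk data (stored exactly once)
            self.backups = {}     # backup name -> ordered list of chunk hashes

        def store(self, name, stream):
            recipe = []
            while chunk := stream.read(self.chunk_size):
                digest = hashlib.sha256(chunk).hexdigest()  # inline hash per chunk
                if digest not in self.chunks:               # new data: store it
                    self.chunks[digest] = chunk
                recipe.append(digest)                       # duplicates just get referenced
            self.backups[name] = recipe

        def restore(self, name):
            return b"".join(self.chunks[d] for d in self.backups[name])

Shrinking chunk_size in this sketch finds more duplicates but forces more hash computations per backup, which is exactly the dedupe-ratio-versus-backup-window tradeoff described above.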
Not all backup products support deduplication natively or include it. A lot of times you have to buy it as an extra, or do it outside the application itself, maybe on the storage device end. So always check your backup product to see what its level of support is for data deduplication.
Data compression is another technology that can be used to reduce the space that data takes up in your backup repositories. Data compression essentially squeezes the data so that it takes up less space, using particular algorithms to reduce the number of bits the data would normally occupy. All this compression is very CPU intensive; again, your backup server isn't just moving data, it's having to do a lot of math to get those backups done. Compression can really increase backup times, because the server has to compress all of the data before it can store it in your backup repositories. Because it's so CPU intensive, you really can't skimp on the resources of your backup server if you want to use compression; you're going to need a pretty beefy backup server to do all the extra math that happens while the backup is running. Typically, maybe eight CPU cores on the backup server would be a good amount if you're going to use maximum compression. Like dedupe, compression has multiple levels that vary how much compression is done. If you have a big backup window with a lot of time to shrink the data down, you can use a higher level; if your window is smaller and you don't want to spend as much time compressing, you can dial the level down to whatever meets your requirements.
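A few lines of Python with the standard zlib module illustrate that level tradeoff: higher levels squeeze harder but burn more CPU time, which is exactly the backup-window tradeoff just described. The sample data here is made up, and real VM images will compress very differently.

    import time
    import zlib

    data = b"virtual machine disk image " * 400_000  # ~10MB of made-up, repetitive data

    for level in (1, 6, 9):       # 1 = fastest, 9 = maximum compression
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        print(f"level {level}: {len(compressed):>9} bytes in {elapsed:.3f}s")
    # Typical pattern: level 9 yields the smallest output but takes the longest,
    # so shorter backup windows argue for a lower compression level.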
Between data deduplication and compression, you can dramatically shrink your backup repositories, which means you can store more data for a longer period of time without having to look at alternatives like increasing your repository size or pulling data off of it.

Now let's review what we've covered regarding the core technologies for virtual machine backups. We took a look at how virtual environment backups are done, and specifically at technologies such as virtual machine snapshots. We then focused on disk-to-disk backups, and moved on to the Volume Shadow Copy Service, or VSS, which is a way to take an application-consistent backup. We then talked about the vStorage APIs, and specifically change block tracking, which enables the fastest virtual machine backups. We then talked about data deduplication, which can lead to storage savings on your backup targets, and we also talked about data compression.