Network management operations are an important part of the SCP program.
This training video will introduce you to network management operations and explain the
requirements of the SCP program on this subject.
Within this section, you should expect to find things like device management,
capacity planning, and information on service level agreement management.
Enabling devices and enabling the network itself for management is an
important part
of managing any network.
Some considerations you want to keep in mind when enabling these devices
for management by your NMS are, for instance, the network management protocols
that you will be using.
And not only do you have to think about
which protocols, meaning SNMP vs. ICMP
vs. WMI, you also may have to think about which versions and types of
those protocols,
meaning SNMP version 1 vs. version 2 vs. version 3.
Now in the detailed videos for network management protocols,
we go over each protocol in-depth, so we won't do that again here.
But you want to pay attention to that.
You also want to pay attention to the options
certain network management protocols have available.
For instance, with NetFlow, you have the option of using traditional NetFlow, as in
NetFlow version 5,
vs. Flexible NetFlow, which is implemented with
version 9
and with IPFIX.
Something else to think about when it comes to enabling devices and then enabling
the network for management is Event Severity.
You need to think about which types of events you want to send traps on,
which different levels of logging you want to enable,
so that you can control
how detailed your syslog messages are,
and also how many messages you receive. Now obviously if you enable the most
detailed level of syslog logging,
then you receive a tremendous number of messages, which can require
a lot of disk space for storage,
and can place a lot of load upon your NMS, so keep that in mind.
Now with syslog, for instance, there are eight severity levels, numbered 0 through 7,
that you can use in terms of the Event Severity,
and you want to understand what those are
and what the most detailed level is.
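To make the severity levels concrete, here is a minimal sketch of the standard syslog severities (numbered 0 through 7, as defined in RFC 5424) and how a configured logging level decides which messages get sent. The names SyslogSeverity, is_logged, and levels_logged are made up for this illustration and are not part of any NMS or device API.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Standard syslog severities; a lower number is more severe."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7  # the most detailed level


def is_logged(message_severity, configured_level):
    """A message is sent when it is at least as severe as the configured level."""
    return message_severity <= configured_level


def levels_logged(configured_level):
    """List every severity that would be sent at the configured level."""
    return [s.name for s in SyslogSeverity if is_logged(s, configured_level)]


print(levels_logged(SyslogSeverity.WARNING))
print(len(levels_logged(SyslogSeverity.DEBUG)))
```

Setting the level to DEBUG sends every severity, which is why the most detailed level produces the most messages.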
Access lists and firewalls are another thing you have to know about when it comes to
enabling the devices
and the network for management.
Not only
do most devices require that you update an access list or a filter
on the device itself
that limits network management traffic,
you also have to think about allowing those protocols through the network
so they are not blocked en route to the device you are trying to poll.
Another thing to keep in mind is you will have to think about your capacity
requirements and how to determine those when it comes to scoping and
scaling your NMS. You will need to know which statistics mean what.
In other words,
whenever you have a circuit that is overloaded,
you will want to check interface statistics such
as bandwidth utilization,
the bit rate in bits per second,
and any errors and discards on the interface.
However, when a device is being overloaded,
you want to look at statistics like CPU, and memory, and buffer usage.
Understanding which types of statistics to look for
when you're doing capacity planning is really important.
Now you also need to be able to understand how the statistics work
and what they are used for.
For example, SNMP would tell you how much traffic is present
on a network interface, but NetFlow tells you who and what;
who is using the bandwidth, what they are using it for,
what websites they are visiting, what protocols and applications they are using.
Now here is an example of how you calculate bandwidth utilization on a specific
interface.
There are three things you need to know in order to
calculate bandwidth utilization on a circuit: the octet values in and out,
the bandwidth or capacity of the circuit,
and the time frame you are talking about.
Let's take this for example:
assume you have a traditional T-1, which is 1.544
megabits per second.
Now if you were to poll
for the traffic going through it,
typically that's done with either ifInOctets or ifOutOctets.
And typically, that would give you a value back in octets, which is
the same as bytes.
So if I polled that interface, the first value I might get back
would be a million.
And if I poll it five minutes later
and get a value of forty five million,
then I know that in that five minute interval,
forty four million octets went through
this interface:
received if I polled ifInOctets, or sent if I polled ifOutOctets.
Now to do the math to calculate the bit rate, what you will want to do first of all is
multiply the forty four million
times eight.
That takes you from octets into bits.
Once you have done that, divide that number by the number of seconds in
between your polls. And in this case we had an even three hundred seconds,
or five minutes,
and we get a little over one million bits per second.
Now divide that by a thousand to get kilobits per second, and by a thousand again
to get megabits per second,
and you see we are at about 1.17 megabits per second,
divide that of course by the theoretical max which is 1.544
for a T-1,
and you end up with 76% utilization.
And that is effectively what your NMS is doing all the time. Now this is
important
because you want to be able to understand
in the event that your NMS is unable to collect this data
or the data that you are seeing in the NMS looks odd, you need
to know how it works so you can do it manually, to really understand it.
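Here is a minimal sketch of that same calculation, using the numbers from the T-1 example. The function names are made up for this illustration, and it assumes the counter did not wrap or reset between the two polls, which a real NMS would have to handle.

```python
def bit_rate_bps(first_octets, second_octets, interval_seconds):
    """Average bit rate between two polls of an octet counter.

    Assumes the counter did not wrap or reset between the polls.
    """
    delta_octets = second_octets - first_octets
    return delta_octets * 8 / interval_seconds  # octets -> bits, then per second


def utilization_percent(rate_bps, capacity_bps):
    """Bit rate as a percentage of the circuit's theoretical maximum."""
    return rate_bps / capacity_bps * 100


# Values from the example: 1,000,000 then 45,000,000 octets, polled 300 seconds apart
rate = bit_rate_bps(1_000_000, 45_000_000, 300)
util = utilization_percent(rate, 1_544_000)  # T-1 capacity of 1.544 megabits per second

print(f"{rate / 1_000_000:.2f} Mbps, {util:.0f}% utilization")
# prints "1.17 Mbps, 76% utilization"
```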
Now in some cases, people will divide by 1024 instead of 1000
to convert from bits per second to kilobits and up to megabits,
but traditionally if it's a bit rate,
you divide by 1000, or 10^3, and if it is a byte rate,
or if it is measured in bytes, then you divide by 1024.
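To see why the choice of divisor matters, here is a small sketch that converts the same bit rate both ways. The to_mbps helper is made up for this illustration.

```python
def to_mbps(rate_bps, base=1000):
    """Convert bits per second to megabits per second using the given base."""
    return rate_bps / base / base


rate_bps = 44_000_000 * 8 / 300  # the bit rate from the T-1 example

print(f"dividing by 1000: {to_mbps(rate_bps, 1000):.2f} Mbps")
print(f"dividing by 1024: {to_mbps(rate_bps, 1024):.2f} Mbps")
```

Dividing by 1024 gives a result a few percent lower, so a number that looks odd in a report may simply have been converted with the other divisor.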
You also need to be able to understand trends and how to recognize
those within your NMS, specifically within Orion,
and when and how to use the Orion report writer. Now we will cover the report writer
in one of the detailed videos on Orion NPM administration,
and of course you can read about it in the Orion administrator's guide.
Last but not least, you will want to understand service level agreement or SLA management,
how to define and list common SLAs,
when to use charting vs. reporting,
and how to understand 95th percentile.
Now common SLAs typically are delivered either by your service
provider or your carrier,
or there are SLAs you define internally
to really give you a way of measuring the service you are providing to your
internal customers. Now one of the two most common SLAs is around performance.
For instance, if you are paying a carrier for a 10 megabit metro Ethernet circuit,
you want to be able to verify that you are actually able to get
up to 10 megabits at peak load time.
If for instance you are paying for 10 megabit,
but you notice in Orion
you are never spiking above five megabits, even under peak load times, then
they've misprovisioned your circuit, and they probably owe you some money back. You want to really monitor
that and work with your carrier on it.
Now the other common SLA is around availability. Availability simply means
up time.
And so on a lot of networks people like to talk about five nines of availability,
and what five nines means
is simply 99.999 percent available,
which roughly equates to five minutes of downtime per year.
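As a quick check on that figure, here is a minimal sketch that turns an availability percentage into allowed downtime per year. It assumes a 365-day year, and the function name is made up for this illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% available: {allowed_downtime_minutes(target):.1f} minutes per year")
```

For five nines this works out to a little over five minutes per year.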
Now a lot of organizations
are really managed on availability, and a lot of engineers I have worked with are bonused on
it,
so you want to be able to track that and use both charts and reporting in Orion
to tell when and where
you violated these SLAs.
Charting is great
when you want to look at a single interface or single entity, and you want to see the
variations in the statistic over a specific amount of time.
Whereas reporting is more valuable
when you want to see many different entities for a set amount of time, and you want
to see an average for the entire time period.
Now of course 95th percentile is something we get a lot of questions about.
It's actually a very, very simple method
of taking out the highest peaks in your collected traffic
to understand what your true max has been over time. And 95th percentile
simply means you order all of your samples from the data collected
from highest to lowest,
you then drop the top five percent of those samples and the next sample down the
list is your 95th percentile.
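Here is a minimal sketch of that method as described: sort from highest to lowest, drop the top five percent, and take the next sample. The function name is made up for this illustration, and rounding the five percent down to a whole number of samples is an assumption; a given NMS or carrier may round differently.

```python
def percentile_95(samples):
    """95th percentile by the drop-the-top-five-percent method.

    Assumption: the number of samples to drop is rounded down.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples, reverse=True)  # highest to lowest
    drop = len(ordered) * 5 // 100           # top five percent of the samples
    return ordered[drop]                     # next sample down the list


# 100 samples with values 1 through 100: the top five (96 to 100) are dropped
print(percentile_95(list(range(1, 101))))  # prints 95
```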