What is Business Continuity in Networking

The importance of the network in IT disaster recovery planning

By Dr. Jim Kennedy, NCE, MRP, MBCI, CBRM, CHS-IV.

Across the globe, businesses small, medium and large are becoming more and more reliant on their IP networks for survival. This, coupled with the growing trend toward convergence of voice, data, and video over a single IP network, makes an organization's network infrastructure one of the most critical elements in its overall operation. Organizations can no longer afford to omit a thorough and comprehensive plan for the network's continued availability from their business continuity and disaster recovery planning efforts.

Company networks must now provide voice, video, and data services that are increasingly integrated with applications. If the company network fails, all forms of communication with customers, suppliers and employees can fail with it. Worse yet, access to critical information can be lost or potentially compromised. Given that these calamitous results are real possibilities, I find it surprising that many organizations continue to be inadequately prepared to deal with adverse events relating to their business and network operations.

In 2007, a computer crash in the Customs office at Los Angeles International Airport (LAX) caused hours of delays for more than 17,000 airline passengers. US Customs officials found that a malfunctioning network card caused Customs to lose access to its national systems and databases and to its local area network. This connectivity failure created a 'domino' effect leading to a total system failure that caused massive wait times at the airport, stranding some passengers. It took technicians over ten hours to diagnose the problem, halting screening operations until it could be resolved.

In another case, a major medical center in Boston relied heavily upon its networked advanced clinical computing system. With this system, clinicians throughout the medical center and other affiliated hospitals could access laboratory results, radiographs, and electrocardiograms electronically, using a secure intranet. Patients also had secure access to their test results over the Internet. When the network went down, the outage lasted almost a week, during which time hospital staff had to scramble to hand-carry patient records, laboratory test results, and countless other documents around the hospital in order to maintain clinical operations.

Clearly, in both cases, disaster recovery plans that addressed the network and its infrastructure were needed.

It should be clear to everyone responsible for IT within corporations and government agencies that the network and the network infrastructure (DNS, DHCP, and so on) are becoming more complex and thus harder to manage, yet are among the most important contributors to their organization's overall operational success. Networks provide voice, video, and data services that are increasingly integrated with business-critical applications. Applications such as e-mail, CRM, and ERM rely on the network for proper operation. As such, the network should be given great importance in any business continuity or disaster recovery plan.

I hope within this article to provide some important information for those responsible for developing business continuity and disaster recovery plans.

Any disaster recovery planning effort should address all of the elements of an organization's network. Most corporate and government networks comprise three main elements: the LAN, the WAN, and network infrastructure services. The LAN provides interconnectivity within a single organizational location. The WAN provides interconnectivity between geographically separate sites, to business partners, and to public networks such as the public switched telephone network for voice traffic and the Internet for data traffic. The network infrastructure services element provides the services that control the network and the flow of data, such as DNS, DHCP, WINS, and FTP, and those that control access to the network, such as Active Directory, RADIUS, and TACACS.

Power considerations
According to the many producers of business continuity and disaster recovery surveys and statistics, power failures are the single largest cause of network and systems failures. Planning for power failures is therefore essential in any DR plan. This means that all critical network components at the primary data center, call center or failover site must be connected to a power source with very high availability. For a data center, availability should be in the area of 99.999 percent.
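To put an availability percentage into operational terms, it can help to convert it into a yearly downtime budget. The following is a minimal sketch, assuming a 365-day year; the function name and the additional comparison targets (99.9 and 99.99 percent) are illustrative and do not come from the article.

```python
# Illustrative only: convert an availability percentage into a downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# 99.999 percent is the data center figure from the text; the others are for comparison.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% available: about {minutes:.2f} minutes of downtime per year")
```

At 99.999 percent the budget works out to a little over five minutes per year, which is why a single utility feed without UPS and generator backup is unlikely to meet it.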

If the LAN provides critical services, as would be the case in a hospital or a bank, then each component of the distribution and access portions of the LAN, such as the equipment in each floor closet, should be equipped with an uninterruptible power supply (UPS) connected to an emergency power source to maintain internal communication. The WAN routers, switches, firewalls and the like need the same form of protection to provide continuous communication and interconnection to external sites and other public networks.

Many large data centers or critical operations, such as call centers, rely upon multiple electric power companies to provide utility power to their locations. The power is brought in to the critical site from different geographical locations. In that way if power is interrupted by something like a car-pole accident which severs the electric lines at a particular location the other utility can continue to provide uninterrupted power.

Many critical sites operate with emergency power generators, where possible, instead of the alternate utilities described above. These generators, together with UPS equipment, can provide a continuous stream of electrical power for days if necessary while utility power is being restored. However, regular maintenance of these generators and their fuel source is critical to ensure their availability when needed.

Change control and documentation
We all know that organizations change continuously. That is a good thing, because without growth many organizations cease to exist. However, change can be challenging for those who need to protect the network and its infrastructure.

Every network should be properly inventoried and have network diagrams that show the exact state of the network at a given point in time. Each critical element of the LAN, WAN and infrastructure services must be known, identified, and properly classified within the business impact analyses, which are done periodically.

As changes propagate, the network documentation should be continually updated to show the exact configuration of the network topology. Even more importantly, if an alternate recovery site exists it should be subject to the same changes, patches, and configurations as the primary site.

People are asked to perform quickly and under extremely difficult conditions when a disaster occurs. The difference between success and failure of a disaster recovery plan may depend on the accuracy of the documentation or on how current the DR site's network changes are. Many disaster recovery failures occur because a change to a network element, such as a switch, was never completed on the corresponding switch at the disaster recovery site, and a failover connection could not be made.

In particular, processes need to be in place to ensure that patches, address changes, access control lists, and new network equipment are incorporated into the disaster recovery network as changes are made.
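One way to support such a process is a periodic comparison of the settings that are supposed to match between a primary device and its disaster recovery counterpart. The sketch below is illustrative only: the record layout, the setting names, and the sample values are hypothetical, and a real inventory would come from the organization's own configuration management tooling.

```python
def find_drift(primary: dict, dr_site: dict) -> dict:
    """Return settings whose values differ between primary and DR records.

    Each result maps a setting name to a (primary_value, dr_value) pair.
    A setting absent on one side is reported with the marker "<missing>".
    """
    drift = {}
    for key in sorted(primary.keys() | dr_site.keys()):
        primary_value = primary.get(key, "<missing>")
        dr_value = dr_site.get(key, "<missing>")
        if primary_value != dr_value:
            drift[key] = (primary_value, dr_value)
    return drift


# Hypothetical records for a switch and its DR twin. Only settings that are
# expected to be identical at both sites are listed.
primary_switch = {"os_patch": "12.4", "acl_version": "7", "vlans": "10,20,30"}
dr_switch = {"os_patch": "12.2", "acl_version": "7"}

for setting, (at_primary, at_dr) in find_drift(primary_switch, dr_switch).items():
    print(f"{setting}: primary={at_primary} dr={at_dr}")
```

Settings that legitimately differ between sites, such as management addresses, would need to be excluded or mapped before a comparison like this is meaningful.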

Redundancy
Once critical network components have been identified, impact analyses completed, and the necessary recovery point objective established, the level of resiliency of the network should be known. Based on that knowledge, the planner can determine the levels of redundancy needed in both the primary and backup networks.

Consideration should be given to redundancy of network elements such as switches and routers. Consideration should also be given to redundant components within those switches and routers, such as power supplies, CPUs, and circuit cards.

Consideration should also be given to the redundancy and diversity of WAN circuits. Redundancy can be achieved by providing multiple circuits, and multiple types of circuits, between critical sites and applications. For example, if the WAN uses MPLS or ATM, it might be prudent to provide a different type of circuit, such as frame relay, so that if a carrier's entire service goes down (which has happened in the past) the organization has a backup strategy. Satellite or microwave links can also be considered between some critical sites.

Diversity of circuits can be accomplished in two ways. The first is link diversity: ensuring that if two links are used, they travel different routes to your locations. That way, if one link is compromised, the alternate link, traveling a different path, is likely to be unaffected. The second is carrier diversity, which protects against a carrier's service failure by using a second carrier to provide a similar service. Multiple carriers are often used to provide Internet access diversity and redundancy to a company, especially if it relies heavily on Internet connectivity for e-commerce.
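The benefit of redundant, diverse circuits can be estimated with the standard parallel-availability calculation: the connection is down only when every link is down at once. This is a rough sketch that assumes link failures are independent, which is exactly what route and carrier diversity are meant to approximate. The 99 percent per-link figure is a hypothetical example, not a number from the article.

```python
def combined_availability(link_availabilities):
    """Availability of a connection served by several links in parallel.

    Assumes failures are independent. Two links sharing a conduit or a
    carrier backbone violate this assumption and will do worse in practice.
    """
    all_links_down = 1.0
    for availability in link_availabilities:
        if not 0 <= availability <= 1:
            raise ValueError("availability must be a fraction between 0 and 1")
        all_links_down *= 1 - availability
    return 1 - all_links_down


# Hypothetical example: one link versus two diverse links at 99 percent each.
single = combined_availability([0.99])
dual = combined_availability([0.99, 0.99])
print(f"single link: {single:.4%}, two diverse links: {dual:.4%}")
```

The calculation also shows why diversity matters as much as redundancy: if both circuits fail together, the second link adds cost without adding availability.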

Capacity
I have seen disaster recovery plans fail because the capacity of the alternate site was not properly estimated. In one case, a business was relying on a geo-redundant call center for automatic failover if one site experienced a failure. The primary site was in New York and the geo-redundant site was in Chicago. The plan failed because the Chicago site was swamped by unanticipated high traffic volumes coming in from the failed NY site. Volumes were higher than normal because suppliers wanted to determine how they could help, customers wanted to know how orders they had already placed would be affected, and employees from NY and the north-east region were using the network to determine what they should do next.

When developing a failover scenario, the planner must take several capacity factors into consideration. One is the peak traffic of the site to which traffic will be rerouted. The second is the peak traffic coming from the site that has failed. The WAN circuits should be sized to carry both peaks plus an additional 25 to 40 percent. The additional 25 to 40 percent accommodates new peak traffic volumes from added VoIP and data traffic caused by customers, suppliers, and employees needing to find out how the problem affects them.
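The sizing rule above can be sketched as a small calculation. This example assumes the rule means the sum of both sites' peak traffic multiplied by one plus the headroom fraction; the function name and the traffic figures are hypothetical.

```python
def required_wan_capacity(surviving_peak_mbps: float,
                          failed_peak_mbps: float,
                          headroom: float = 0.25) -> float:
    """Estimate WAN capacity needed at the failover site, in Mbps.

    headroom is the extra fraction added for disaster-driven traffic.
    The text suggests 0.25 to 0.40.
    """
    if surviving_peak_mbps < 0 or failed_peak_mbps < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return (surviving_peak_mbps + failed_peak_mbps) * (1 + headroom)


# Hypothetical peaks: 400 Mbps at the surviving site, 600 Mbps from the failed site.
low = required_wan_capacity(400, 600, headroom=0.25)
high = required_wan_capacity(400, 600, headroom=0.40)
print(f"plan for roughly {low:.0f} to {high:.0f} Mbps at the failover site")
```

Comparing the result with the circuits actually provisioned at the alternate site gives a quick indication of whether the failover design is undersized.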

The planner should have a clear communication plan for notifying customers, suppliers, and employees when the business continuity plan is activated. The business continuity plan should also provide for additional personnel to handle the extra call volumes at the failover site; otherwise, degraded service will be the norm.

On a simpler note, if the disaster recovery site is designed with smaller-class routers or switches, or lower-capacity circuits, than the primary site, then the disaster recovery plan is already set up for failure. As soon as traffic is rerouted to the alternate site, even normal traffic volumes will quickly overload these smaller network components. Sizing network components based on expected traffic loads is critical to success. Remember that experience indicates traffic volumes will exceed the norm when a disaster occurs.

Security
Hackers and crackers prey on weak networks. They are like a 'feeding frenzy' of sharks in bloody waters. As soon as they realize a business has suffered problems they look to see if they can breach the information and network security of that organization. Many times they are successful because the same level of security as is found at the organization's primary site is not found at its disaster recovery sites. Firewalls, intrusion detection, virus protection, access controls and the like MUST be at the same level of protection or there will be security breaches. Count on it.

Plan exercising
The last area I will discuss is the importance of exercising the disaster recovery plan. Surveys continually indicate that most organizations do not regularly exercise their disaster recovery plans and many have never exercised their plans at all.

Plans need to be tested to ensure that they will work when they are absolutely needed. As stated before, during times of crisis people are asked to perform under very difficult circumstances. Human thinking is often clouded by the stress of the moment. It is important to make sure that every participant in the disaster recovery plan knows what is expected of them and has had an opportunity to perform their duties under better than disastrous conditions.

Further, regular exercising allows an organization to see whether the disaster recovery plan remains fit for purpose, that changes such as patches and network addresses have been incorporated, and that nothing which has changed since the last exercise has been left out of the plan.

I have often seen cases where the tapes used to recover critical network services, such as the disaster recovery site's DNS/DHCP server, were misplaced and the server could not be recovered. It took many hours to finally get the network online. Critical time was wasted and critical dollars were lost because business applications were not online when needed. Make the mistakes and encounter the errors when they do not impact the operation: during an exercise.

The author
Dr. Jim Kennedy has a PhD in Technology and Operations Management and is the Business Continuity Services Practice lead and Principal Consultant for Alcatel-Lucent. Dr. Kennedy has over 30 years' experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of two books, 'Blackbook of Corporate Security' and 'Disaster Recovery Planning: An Introduction' and author of the e-book, 'Business Continuity & Disaster Recovery – Conquering the Catastrophic'. jtkennedy@alcatel-lucent.com

• Date: 26th February 2008 • Region: US/World • Type: Article • Topic: IT continuity

Source: https://www.continuitycentral.com/feature0554.htm
