
13.5 Information Security Threats and Security Controls Overview

The information made available on a network is exposed to security threats from a variety of sources. Therefore, specific controls must be implemented to secure this information that is stored on an organization’s storage infrastructure. In order to deploy controls, it is important to have a clear understanding of the access paths leading to storage resources.

If each component within the infrastructure is considered a potential access point, the attack surface of all these access points must be analyzed to identify the associated vulnerabilities. To identify the threats that apply to a storage infrastructure, access paths to data storage can be categorized into three security domains.
Image: Information Security Threats and Security Controls Overview (image credits: EMC)
  • Application Access - access to the stored data through the storage network. The application access domain includes the applications that access the data through a file system or a database interface.
  • Management Access to storage and interconnecting devices and to the data residing on those devices. Management access, whether monitoring, provisioning, or managing storage resources, is associated with every device within the storage environment. Most management software supports some form of CLI, system management console, or a web-based interface. Implementing appropriate controls for securing management applications is important because the damage that can be caused by using these applications can be far more extensive.
  • Backup, Replication, and Archive Access - this domain is accessed primarily by storage administrators who configure and manage the environment. Along with the access points in this domain, the backup and replication media also need to be secured.

The Key Security Threats across the Domains

To secure the storage environment, identify the attack surface and existing threats within each of the security domains and classify the threats based on the security goals — availability, confidentiality, and integrity.

Also Read: The next generation IT Data Center Layers
  • Unauthorized access - Unauthorized access is the act of illegally gaining access to an organization's information systems, which include servers, network, storage, and management servers. An attacker may gain unauthorized access to the organization's applications, data, or storage resources in various ways, such as bypassing access controls, exploiting a vulnerability in the operating system, hardware, or application, elevating privileges, spoofing an identity, or stealing a device.
  • Denial of Service (DoS) - A Denial of Service (DoS) attack prevents legitimate users from accessing resources or services. DoS attacks can be targeted against servers, networks, or storage resources in a storage environment. In all cases, the intent of DoS is to exhaust key resources, such as network bandwidth or CPU cycles, thereby impacting production use. 
  • Distributed DoS (DDoS) attack - A Distributed DoS (DDoS) attack is a variant of DoS attack in which several systems launch a coordinated, simultaneous DoS attack on their target(s), thereby causing denial of service to the users of the targeted system(s). 
  • Data loss - Data loss can occur in a storage environment for various reasons other than malicious attacks. Causes include accidental deletion by an administrator and destruction resulting from natural disasters. Deploying appropriate measures such as data backup or replication reduces the impact of such events.
  • Malicious Insiders -  According to Computer Emergency Response Team (CERT), a malicious insider could be an organization’s current or former employee, contractor, or other business partner who has or had authorized access to an organization’s servers, network, or storage. These malicious insiders may intentionally misuse that access in ways that negatively impact the confidentiality, integrity, or availability of the organization’s information or resources. For example, consider a former employee of an organization who had access to the organization’s storage resources. This malicious insider may be aware of security weaknesses in that storage environment. This is a serious threat because the malicious insider may exploit the security weakness. 
  • Account Hijacking - Account hijacking refers to a scenario in which an attacker gains access to an administrator's or user's account(s) using methods such as phishing or installing keystroke-logging malware on the administrator's or user's systems. Phishing is an example of a social engineering attack that is used to deceive users.
  • Insecure APIs - Application programming interfaces (APIs) are used extensively in software-defined and cloud environments to integrate with management software for activities such as resource provisioning, configuration, monitoring, management, and orchestration. These APIs may be open or proprietary. The security of the storage infrastructure depends on the security of these APIs. An attacker may exploit a vulnerability in an API to breach the storage infrastructure's perimeter and carry out an attack. Therefore, APIs must be designed and developed following security best practices such as requiring authentication and authorization, validating input, and avoiding buffer overflows (see the sketch after this list).
  • Shared technology vulnerabilities - The technologies used to build today's storage infrastructures provide a multi-tenant environment that enables the sharing of resources. Multi-tenancy is achieved by using controls that separate resources, such as memory and storage, for each application. Failure of these controls may expose the confidential data of one business unit to users of other business units, raising security risks.
  • Media Theft - Backup and replication are essential business continuity processes in any data center. However, inadequate security controls may expose an organization's confidential information to an attacker. There is a risk of a backup tape being lost, stolen, or misplaced, and the threat is even more severe if the tapes contain highly confidential information. An attacker may also gain access to an organization's confidential data by spoofing the identity of the DR site.
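To make the API best practices above concrete, here is a minimal Python sketch, not taken from any particular product, of an authentication check and strict input validation for a hypothetical LUN provisioning request; the token store, field names, and limits are all illustrative:

import hmac

API_TOKENS = {"s3cr3t-admin-token": "admin"}   # hypothetical token store

def authenticate(token):
    """Return the caller's role, or None if the token is unknown."""
    for known, role in API_TOKENS.items():
        if hmac.compare_digest(token, known):  # constant-time comparison
            return role
    return None

def validate_provision_request(req):
    """Validate input before acting on it; reject anything unexpected."""
    allowed = {"lun_name", "size_gb"}
    if set(req) - allowed:
        raise ValueError("unexpected fields in request")
    if not isinstance(req.get("lun_name"), str) or not req["lun_name"].isalnum():
        raise ValueError("lun_name must be alphanumeric")
    if not isinstance(req.get("size_gb"), int) or not 1 <= req["size_gb"] <= 65536:
        raise ValueError("size_gb must be an integer between 1 and 65536")
    return req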

The Key Information Security Controls

Any security control should account for three aspects: people, process, and technology, and the relationships among them. Security controls can be administrative or technical. Administrative controls include security and personnel policies or standard procedures to direct the safe execution of various operations. Technical controls are usually implemented through tools or devices deployed on the IT infrastructure. To protect a storage infrastructure, various technical security controls must be deployed at the compute, network, and storage levels.

Also Read: Factors affecting SAN performance

At the server level, security controls are deployed to secure hypervisors and hypervisor management systems, virtual machines, guest operating systems, and applications. Security at the network level commonly includes firewalls, demilitarized zones, intrusion detection and prevention systems, virtual private networks, zoning and iSNS discovery domains, port binding and fabric binding configurations, and VLAN and VSAN. At the storage level, security controls include LUN masking, data shredding, and data encryption. Apart from these security controls, the storage infrastructure also requires identity and access management, role-based access control, and physical security arrangements. The Key Information Security Controls are
  • Physical Security
  • Identity and Access Management
  • Role-based Access Control
  • Network Monitoring and Analysis
  • Firewalls
  • Intrusion Detection and Prevention System
  • Adaptive Security
  • Virtual Private Networks
  • Virtual LAN & Virtual SAN
  • Zoning and iSNS discovery domains
  • Port binding and fabric binding
  • Securing hypervisor and management server
  • VM, OS and Application hardening
  • Malware Protection Software
  • Mobile Device Management
  • LUN Masking
  • Data Encryption
  • Data Shredding



13.4 Introduction to Information Security

Information is an organization’s most valuable asset. This information, including intellectual property, personal identities, and financial transactions, is regularly processed and stored in storage systems, which are accessed through the network. As a result, storage is now more exposed to various security threats that can potentially damage business-critical data and disrupt critical services. To protect this information, organizations deploy security tools across the various infrastructure assets. The commonly used infrastructure assets are
  • Servers (which process information)
  • Storage (which stores information)
  • Network (which carries information) 

As organizations are adopting next generation emerging technologies, in which cloud is a core element, one of the key concerns they have is ‘trust’. Trust depends on the degree of control and visibility available to the information’s owner. Therefore, securing storage infrastructure has become an integral component of the storage management process in modern IT datacenters. It is an intensive and necessary task, essential to manage and protect vital information.

Information Security Overview

Information security includes a set of practices that protect information and information systems from unauthorized disclosure, access, use, destruction, deletion, modification, and disruption. It involves implementing safeguards or controls to lessen the risk that a vulnerability in an information system is exploited, which could otherwise significantly impact the organization’s business. From this perspective, security is an ongoing process, not a static one, and requires continuous revalidation and modification. Securing the storage infrastructure begins with understanding the goals of information security.


Image: Information Security Overview (image source: Belvatech)

Goals of Information Security

The goal of information security is to provide
  • Confidentiality
  • Integrity
  • Availability
Confidentiality provides the required secrecy of information and ensures that only authorized users have access to data. Integrity ensures that unauthorized changes to information are not allowed; the objective is to detect and protect against unauthorized alteration or deletion of information. Availability ensures that authorized users have reliable and timely access to servers, storage, network, application, and data resources. Ensuring confidentiality, integrity, and availability is the primary objective of any IT security implementation, and all three are supported through the use of authentication, authorization, and auditing processes.
  • Authentication is a process to ensure that ‘users’ or ‘assets’ are who they claim to be by verifying their identity credentials. A user may be authenticated by a single-factor or multi-factor method. Single-factor authentication involves the use of only one factor, such as a password. Multi-factor authentication uses more than one factor to authenticate a user.
  • Authorization refers to the process of determining whether, and in what manner, a user, device, application, or process is allowed to access a particular service or resource. For example, a user with administrator’s privileges is authorized to access more services or resources than a user with non-administrator privileges. Authorization should be performed only if authentication is successful. The most common authentication and authorization controls used in a data center environment are Windows Access Control Lists (ACLs), UNIX permissions, Kerberos, and the Challenge-Handshake Authentication Protocol (CHAP); a CHAP-style exchange is sketched after this list. It is essential to verify the effectiveness of deployed security controls, which is done through auditing.
  • Auditing refers to the logging of all transactions for the purpose of assessing the effectiveness of security controls. It helps to validate the behaviour of the infrastructure components, and to perform forensics, debugging, and monitoring activities.
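To illustrate the challenge-response idea behind CHAP, here is a minimal Python sketch. Per RFC 1994, the response is an MD5 hash over the message identifier, the shared secret, and the challenge, so the secret itself is never transmitted; the names and values below are illustrative:

import hashlib
import os

shared_secret = b"initiator-secret"   # known to both ends, never sent on the wire

# Authenticator side: issue a random challenge with an identifier.
identifier = b"\x01"
challenge = os.urandom(16)

# Claimant side: prove knowledge of the secret.
response = hashlib.md5(identifier + shared_secret + challenge).digest()

# Authenticator side: recompute and compare.
expected = hashlib.md5(identifier + shared_secret + challenge).digest()
print("authenticated" if response == expected else "rejected")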

Information Security Considerations

Risk assessment
An organization wants to safeguard its assets from threat agents (attackers) who seek to abuse them. Risk is the likelihood that a threat agent will exploit a vulnerability to compromise an asset. Therefore, organizations deploy various countermeasures to minimize risk by reducing vulnerabilities.

Also Read: Introduction to Storage Infrastructure Management

Risk assessment is the first step to determine the extent of potential threats and risks in an infrastructure. The process assesses risk and helps to identify appropriate controls to mitigate or eliminate risks. Organizations must apply their basic information security and risk-management policies and standards to their infrastructure. Some of the key security areas that an organization must focus on while building the infrastructure are: authentication, identity and access management, data loss prevention and data breach notification, governance, risk, and compliance (GRC), privacy, network monitoring and analysis, security information and event logging, incident management, and security management. 

Assets and Threats
Information is one of the most important assets for any organization. Other assets include hardware, software, and the other infrastructure components required to access the information. To protect these assets, organizations deploy security controls. These security controls have two objectives. The first objective is to ensure that the resources are easily accessible to authorized users. The second objective is to make it difficult for potential attackers to access and compromise the system. The effectiveness of a security control can be measured by two key criteria. One, the cost of implementing the control should be a fraction of the value of the protected data. Two, compromising and accessing the assets should cost a potential attacker heavily in terms of money, effort, and time.

Also Read: Unified Storage Systems Overview

Threats are the potential attacks that can be carried out on an IT infrastructure. These attacks can be classified as active or passive. Passive attacks are attempts to gain unauthorized access into the system. Passive attacks pose threats to confidentiality of information. Active attacks include data modification, denial of service (DoS), and repudiation attacks. Active attacks pose threats to data integrity, availability, and accountability.

Vulnerability 
Vulnerability is a weakness of any information system that an attacker exploits to carry out an attack. The components that provide a path enabling access to information are vulnerable to potential attacks. It is important to implement adequate security controls at all the access points on these components.

Attack surface, attack vector, and work factor are the three factors to consider when assessing the extent to which an environment is vulnerable to security threats. Attack surface refers to the various entry points that an attacker can use to launch an attack, which includes people, process, and technology. For example, each component of a storage infrastructure is a source of potential vulnerability. An attacker can use all the external interfaces supported by that component, such as the hardware and the management interfaces, to execute various attacks. These interfaces form the attack surface for the attacker. Even unused network services, if enabled, can become a part of the attack surface. An attack vector is a step or a series of steps necessary to complete an attack. For example, an attacker might exploit a bug in the management interface to execute a snoop attack. Work factor refers to the amount of time and effort required to exploit an attack vector.

Also Read: Taking Backup and Archive to Cloud Storage

Having assessed the vulnerability of the environment, organizations can deploy specific control measures. Any control measure should involve all three aspects of infrastructure: people, process, and technology, and the relationships among them. To secure people, the first step is to establish and assure their identity; based on their identity, selective controls can be implemented for their access to data and resources. The effectiveness of any security measure is primarily governed by processes and policies. The processes should be based on a thorough understanding of risks in the environment, should recognize the relative sensitivity of different types of data, and should help determine the needs of various stakeholders to access the data. Without an effective process, the deployment of technology is neither cost-effective nor aligned to organizations’ priorities.

Security Controls
Finally, the deployed controls should ensure compliance with the processes, policies, and people in order to be effective. These security controls are directed at reducing vulnerability by minimizing attack surfaces and maximizing work factors. Controls can be technical or non-technical. Technical controls are usually implemented at the server, network, and storage levels, whereas non-technical controls are implemented through administrative and physical means. Administrative controls include security and personnel policies or standard procedures to direct the safe execution of various operations. Physical controls include setting up physical barriers, such as security guards, fences, or locks. Controls are categorized as preventive, detective, and corrective.
  • Preventive: Avoid problems before they occur
  • Detective: Detect a problem that has occurred
  • Corrective: Correct the problem that has occurred
Organizations should deploy defense-in-depth strategy when implementing these controls.

Defence in depth
An organization should deploy multiple layers of defense throughout the infrastructure to mitigate the risk of security threats in case one layer of the defense is compromised. This strategy is referred to as defense-in-depth. It may also be thought of as a “layered approach to security” because there are multiple security measures at different levels. Defense-in-depth increases the barrier to exploitation (an attacker must breach each layer of defenses to be successful) and thereby provides additional time to detect and respond to an attack. This potentially reduces the scope of a security breach. However, the overall cost of deploying defense-in-depth is often higher than that of single-layered security controls. An example of defense-in-depth is a virtual firewall installed on a hypervisor when there is already a network-based firewall deployed within the same environment. This provides an additional layer of security, reducing the chance of the hypervisor’s security being compromised if the network-level firewall is breached.


13.3 Factors affecting SAN performance

The common factors that can affect storage infrastructure performance are the RAID level configured, whether cache is enabled or disabled, thin LUN provisioning, latency from network hops, and, in some cases, misconfigured multipathing.

Factors which might affect SAN performance

RAID Configurations
The RAID levels that usually cause the most concern from a performance perspective are RAID 5 and RAID 6, which is why many DB administrators request SAN volumes that are not RAID 5 or RAID 6. Parity-based RAID schemes, such as RAID 5 and RAID 6, perform differently than other RAID schemes such as RAID 1 and RAID 10. This is due to a phenomenon known as the write penalty.

This can lead to lower performance, especially in cases with workloads that consist of a lot of random write activity as is often the case with database workloads. The reason the write penalty occurs is that small-block writes require a lot of parity recalculation, resulting in additional I/O on the backend. 

Also Read: Types of RAID Levels

Small-block writes are relatively hard work for RAID 5 and RAID 6 because they require changes to be made within RAID stripes, which forces the system to read the other members of the stripe in order to recompute the parity. In addition, random small-block write workloads require the R/W heads on the disk to move all over the platter surface, resulting in high seek times. The net result is that lots of small-block random writes with RAID 5 and RAID 6 can be slow. Even so, techniques such as redirect-on-write or write-anywhere filesystems and large caches can go a long way toward masking and mitigating this penalty.
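To make the write penalty concrete, here is a minimal Python sketch of the arithmetic, using the commonly cited backend costs per small random write: 2 for RAID 1/10, 4 for RAID 5 (read data, read parity, write data, write parity), and 6 for RAID 6. The workload figures are illustrative:

# Backend I/O cost per small random write for common RAID schemes.
WRITE_PENALTY = {"RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(front_end_iops, write_ratio, raid_level):
    """Backend IOPS generated by a front-end workload on a given RAID level."""
    reads = front_end_iops * (1 - write_ratio)               # 1 backend I/O each
    writes = front_end_iops * write_ratio * WRITE_PENALTY[raid_level]
    return reads + writes

# Example: 5,000 front-end IOPS, 60 percent random writes.
for level in ("RAID10", "RAID5", "RAID6"):
    print(level, backend_iops(5000, 0.6, level))
# RAID10 -> 8000.0, RAID5 -> 14000.0, RAID6 -> 20000.0 backend IOPS

The same front-end workload generates two and a half times as much backend I/O on RAID 6 as on RAID 10, which is exactly why heavy random-write workloads feel the penalty.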

Cache
Cache is the magic ingredient that has just about allowed storage arrays based on spinning
disk to keep up to speed with the rest of the technology in the data center. If you take DRAM caches and caching algorithms out of the picture, spinning disk–based storage arrays practically grind to a halt. Having enough cache in your system is important in order to speed up average response times. If a read or write I/O can be satisfied from cache (not having to rely on the disks to complete the read or write I/O), it will be amazingly faster than if it has to rely on the disks on the backend. 

Also Read: The next generation RAID techniques

However, not all workloads benefit equally from having a cache in front of spinning disks. Some workloads result in a high cache-hit rate, whereas others don’t. A cache hit occurs when an I/O can be serviced from cache, whereas a cache miss requires access to the backend disks. Even with a large cache in front of your slow spinning disks, there will be some I/Os that result in cache misses and require use of the disks on the backend.

These cache-miss I/Os result in far slower response times than cache hits, meaning that the variance (spread between fastest and slowest response times) can be huge, such as from about 2 ms all the way up to about 100 ms. This is in stark contrast to all-flash arrays,
where the variance is usually very small. Most vendors will have standard ratios of disk capacity to cache capacity, meaning that you don’t need to worry so much about how much cache to put in a system. However, these vendor approaches are one-size-fits-all approaches and may need tuning to your specific requirements. 
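A short sketch shows how strongly the cache-hit rate drives average response time, using the roughly 2 ms and 100 ms extremes mentioned above as illustrative hit and miss latencies:

def avg_response_ms(hit_rate, hit_ms=2.0, miss_ms=100.0):
    """Weighted average response time for a given cache-hit rate."""
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms

for rate in (0.50, 0.90, 0.99):
    print(f"{rate:.0%} hit rate -> {avg_response_ms(rate):.1f} ms average")
# 50% -> 51.0 ms, 90% -> 11.8 ms, 99% -> 3.0 ms

Even a 90 percent hit rate leaves the average dominated by the slow misses, which is why the ratio of disk capacity to cache capacity matters.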

Also Read: Importance of Cache technique in Block Based Storage Systems

Thin LUNs
Thin LUNs work on the concept of allocating space to LUNs and volumes on demand. So on day one when you create a LUN, it has no physical space allocated to it. Only as users and applications write to it is capacity allocated. This allocate-on-demand model can have an impact in two ways:
  • The allocate-on-demand process can add latency.
  • The allocate-on-demand process can result in a fragmented backend layout.
The allocate-on-demand process can theoretically add a small delay to the write process because the system has to identify free extents and allocate them to a volume each time a write request comes into a new area on a thin LUN. However, most solutions are optimized to minimize this impact.
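Here is a minimal Python sketch of the allocate-on-demand model, in which physical extents are taken from a shared pool only when a region of the thin LUN is first written; the extent size and pool layout are purely illustrative:

EXTENT_MB = 8

class ThinLUN:
    def __init__(self, pool):
        self.pool = pool          # shared list of free pool extents
        self.map = {}             # LUN extent index -> pool extent

    def write(self, offset_mb):
        index = offset_mb // EXTENT_MB
        if index not in self.map:                 # first write to this region:
            self.map[index] = self.pool.pop(0)    # allocate on demand
        return self.map[index]

pool = list(range(1000))          # 1,000 free extents in the shared pool
lun = ThinLUN(pool)
lun.write(0)
lun.write(100)
print(len(lun.map), "extents allocated out of", 1000)   # only 2 allocated

Because many thin LUNs draw from the same pool, consecutive regions of one LUN may land on widely scattered pool extents, which is the fragmentation concern discussed next.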

Also Read: Storage Provisioning and Capacity Optimization Techniques

Probably of more concern is the potential for some thin LUNs to end up with a heavily fragmented backend layout because of the pseudo-random nature in which space is allocated to them. This can be particularly noticeable in applications with heavily sequential workloads. If users suspect a performance issue because of the use of thin LUNs, perform representative testing on thin LUNs and thick LUNs and compare the results.

Network Hops
Within the network, whether FC SAN or IP, the number of switches that traffic has to traverse has an impact on response time. Hopping across more switches and routers adds latency, often referred to as network-induced latency. This latency is generally higher in IP/Ethernet networks, where store-and-forward switching techniques are used, in addition to the increased potential for traffic crossing routers.

Also Read: The need for a Converged Enhanced Ethernet (CEE) Network

Multipathing
Multipath I/O (MPIO) gives a host more than one path to its storage, so if one path fails, another takes over without the application or user even noticing. However, MPIO can also have a significant impact on performance. For example, balancing all I/O from a host over two HBAs and HBA ports can provide more bandwidth than sending all I/O over a single port. It also makes the queues and CPU processing power of both HBAs available. MPIO can likewise be used to balance I/O across multiple ports on the storage array. Instead of sending all host I/O to just two ports on a storage array, MPIO can balance the I/O from a single host over multiple array ports, for example, eight ports. This can significantly help to avoid hot spots on the array’s front-end ports, similar to the way that wide-striping avoids hot spots on the array’s backend.
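As a sketch of one common MPIO load-balancing policy, the following Python fragment rotates I/Os round-robin across four illustrative host-to-array paths:

import itertools

paths = ["hba0:array_port0", "hba0:array_port1",
         "hba1:array_port2", "hba1:array_port3"]
next_path = itertools.cycle(paths).__next__   # endless round-robin iterator

def send_io(io_id):
    """Dispatch each I/O down the next path in turn."""
    print(f"I/O {io_id} -> {next_path()}")

for i in range(6):
    send_io(i)
# I/O spreads evenly across both HBAs and all four array ports.

Real MPIO drivers offer further policies, such as least queue depth, but the principle of spreading work across HBAs and array ports is the same.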

Standard SAN Performance Tools

Perfmon
Perfmon is a Windows tool that allows administrators to monitor an extremely wide variety of host-based performance counters. From a storage perspective, these counters can be extremely useful, as they give you the picture as viewed from the host. For example, latency experienced from the host will be end-to-end latency, meaning that it will include host-based, network-based, and array-based latency. However, it will give you only a single figure; it won’t break the overall latency down into host-induced, network-induced, and array-induced latency.

For example, open the Windows Perfmon utility by typing perfmon at a command prompt or in the Run dialog box.
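As a starting point, the PhysicalDisk and LogicalDisk counter objects are usually the most relevant for storage work; counters such as Avg. Disk sec/Read, Avg. Disk sec/Write, Disk Reads/sec, Disk Writes/sec, and Current Disk Queue Length give a host-side view of latency, IOPS, and queuing.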


iostat
iostat is a common tool used in the Linux world to monitor storage performance. For example, run:
iostat -x 20

This reports extended I/O statistics, with each sample averaged over the preceding 20-second interval; the columns include per-device read and write rates (r/s, w/s) along with latency and utilization figures (await, %util).


13.2 Introduction to the key Storage Management Operations

The key storage management operations consist of storage monitoring, storage alerting, and storage reporting. Storage monitoring provides the performance and availability status of various infrastructure components and services. It also helps to trigger alerts when thresholds are reached, security policies are violated, or service performance deviates from the SLA. These functions are explained below.

Storage Management Operations Overview

1) Storage Monitoring

Monitoring forms the basis for performing management operations. Monitoring provides the performance and availability status of various infrastructure components and services. It also helps to measure the utilization and consumption of various storage infrastructure resources by the services. This measurement facilitates the metering of services, capacity planning, forecasting, and optimal use of these resources. Monitoring events in the storage infrastructure, such as a change in the performance or availability state of a component or a service, may be used to trigger automated routines or recovery procedures. 

Also Read: Using Sub-LUN Auto Tiering techniques in SAN Infrastructure

Such procedures can reduce downtime due to known infrastructure errors and the level of manual intervention needed to recover from them. Further, monitoring helps in generating reports for service usage and trends. Additionally, monitoring of the data center environment parameters such as heating, ventilating, and air-conditioning (HVAC) helps in tracking any anomaly from their normal status. A storage infrastructure is primarily monitored for 
  • Configuration Monitoring
  • Availability Monitoring
  • Capacity Monitoring
  • Performance Monitoring
  • Security Monitoring 
Monitoring Configuration
Monitoring configuration involves tracking configuration changes and deployment of storage infrastructure components and services. It also detects configuration errors, non-compliance with configuration policies, and unauthorized configuration changes.

Monitoring Availability
Availability refers to the ability of a component or a service to perform its desired function during its specified time of operation. Monitoring the availability of hardware components (for example, a port, an HBA, or a storage controller) or software components (for example, a database instance or orchestration software) involves checking their availability status by reviewing the alerts generated from the system. For example, a port failure might result in a chain of availability alerts. A storage infrastructure commonly uses redundant components to avoid a single point of failure. Failure of a component might cause an outage that affects service availability, or it might cause performance degradation even though availability is not compromised. Continuously monitoring the expected availability of each component and reporting any deviation help the administrator identify failing services and plan corrective action to maintain SLA requirements.

Monitoring Capacity
Capacity refers to the total amount of storage infrastructure resources available. Inadequate capacity leads to degraded performance or even service unavailability. Monitoring capacity involves examining the amount of storage infrastructure resources used and still usable, such as the free space available on a file system or a storage pool, the number of ports available on a switch, or the utilization of the storage space allocated to a service. Monitoring capacity helps an administrator ensure uninterrupted data availability and scalability by averting outages before they occur. For example, if 90 percent of the ports are utilized in a particular SAN fabric, a new switch might be required if more servers and storage systems need to be attached to the same fabric. Monitoring usually leverages analytical tools to perform capacity trend analysis. These trends help to understand future resource requirements and provide an estimate of the time required to deploy them.
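As a sketch of that kind of trend analysis, the following Python fragment fits illustrative daily usage samples to a straight line and estimates when a 10 TB pool would cross a 90 percent threshold:

used_tb = [6.0, 6.2, 6.3, 6.5, 6.8]      # illustrative daily used-capacity samples
growth_per_day = (used_tb[-1] - used_tb[0]) / (len(used_tb) - 1)

pool_tb, threshold = 10.0, 0.90
days_left = (pool_tb * threshold - used_tb[-1]) / growth_per_day
print(f"~{growth_per_day:.2f} TB/day growth; threshold reached in ~{days_left:.0f} days")
# ~0.20 TB/day growth; threshold reached in ~11 days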

Also Read: How storage is provisioned in Software Defined Storage Environments

Monitoring Performance
Performance monitoring evaluates how efficiently different storage infrastructure components and services are performing and helps to identify bottlenecks. It measures and analyzes behavior in terms of response time, throughput, and I/O wait time, and identifies whether the behavior of infrastructure components and services meets the acceptable and agreed performance levels. Performance monitoring also deals with the utilization of resources, which affects the way resources behave and respond. For example, if a VM continuously experiences 80 percent processor utilization, the VM may be running out of processing power, which can lead to degraded performance and slower response time. Similarly, if the cache and controllers of a storage system are consistently overutilized, performance may degrade.

Monitoring Security
Monitoring a storage infrastructure for security includes tracking unauthorized access, whether accidental or malicious, and unauthorized configuration changes. For example, monitoring tracks and reports the initial zoning configuration performed and all the subsequent changes. Another example of monitoring security is to track login failures and unauthorized access to switches for performing administrative changes. IT organizations typically comply with various information security policies that may be specific to government regulations, organizational rules, or deployed services. Monitoring detects all operations and data movement that deviate from predefined security policies. Monitoring also detects unavailability of information and services to authorized users due to security breach. Further, physical security of a storage infrastructure can also be continuously monitored using badge readers, biometric scans, or video cameras.

2) Storage Alerting

An alert is a system-to-user notification that provides information about events, impending threats, or issues. Alerting of events is an integral part of monitoring and keeps administrators informed about the status of various components and processes. For example, conditions such as the failure of power, storage drives, memory, switches, or an availability zone can impact the availability of services and require immediate administrative attention. Other conditions, such as a file system reaching a capacity threshold, an operation breaching a configuration policy, or a soft media error on storage drives, are considered warning signs and may also require administrative attention.

Monitoring tools enable administrators to define various alerted conditions and assign different severity levels for these conditions based on the impact of the conditions. Whenever a condition with a particular severity level occurs, an alert is sent to the administrator, an orchestrated operation is triggered, or an incident ticket is opened to initiate a corrective action. Alert classifications can range from information alerts to fatal alerts. 
  • Information alerts provide useful information but do not require any intervention by the administrator. The creation of a zone or LUN is an example of an information alert. 
  • Warning alerts require administrative attention so that the alerted condition is contained and does not affect service availability. For example, if an alert indicates that a storage pool is approaching a predefined threshold value, the administrator can decide whether additional storage drives need to be added to the pool. 
  • Fatal alerts require immediate attention because the condition might affect the overall performance or availability. For example, if a service fails, the administrator must ensure that it is restored quickly.
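A minimal Python sketch of this classification might map monitored conditions to the three severities; the condition names and thresholds below are illustrative:

def classify(event):
    """Map a monitored condition to an alert severity."""
    if event["type"] == "service_failure":
        return "fatal"           # requires immediate attention
    if event["type"] == "pool_utilization" and event["value"] >= 0.80:
        return "warning"         # contain before service availability suffers
    return "information"         # no intervention required

for e in ({"type": "zone_created"},
          {"type": "pool_utilization", "value": 0.85},
          {"type": "service_failure"}):
    print(classify(e), "-", e["type"])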

As every IT environment is unique, most monitoring systems require initial set-up and configuration, including defining what types of alerts should be classified as informational, warning, and fatal. Whenever possible, an organization should limit the number of truly critical alerts so that important events are not lost amidst informational messages. Continuous monitoring, with automated alerting, enables administrators to respond to failures quickly and proactively. Alerting provides information that helps administrators prioritize their response to events.

3) Storage Reporting

Like alerting, reporting is associated with monitoring. Reporting on a storage infrastructure involves keeping track of and gathering information from the various components and processes that are monitored. The gathered information is compiled to generate reports for trend analysis, capacity planning, chargeback, performance, and security breaches. 
  • Capacity planning reports contain current and historic information about the utilization of storage, file systems, database tablespace, ports, etc. 
  • Configuration and asset management reports include details about device allocation, local or remote replicas, and fabric configuration. This report also lists all the equipment, with details, such as their purchase date, lease status, and maintenance records. 
  • Chargeback reports contain information about the allocation or utilization of storage infrastructure resources by various users or user groups. 
  • Performance reports provide current and historical information about the performance of various storage infrastructure components and services as well as their compliance with agreed service levels. 
  • Security breach reports provide details of security violations, the duration of the breach, and its impact.
Reports are commonly displayed on a digital dashboard, which provides real-time tabular or graphical views of the gathered information. Dashboard reporting helps administrators make instantaneous and informed decisions on resource procurement, plans for modifications in the existing infrastructure, policy enforcement, and improvements in management processes.

Chargeback Report
Chargeback is the ability to measure storage resource consumption per business unit or user group and charge them back accordingly. It aligns the cost of deployed storage services with the organization’s business goals, such as recovering costs, making a profit, justifying new capital spending, influencing consumption behavior of the business units, and making IT more service-aware, cost-conscious, and accountable. 

Also Read: Taking Backup and Archive to Cloud Storage

To perform chargeback, storage usage data is collected by a billing system that generates a chargeback report for each business unit or user group. The billing system is responsible for accurately measuring the number of units of storage used and reports the cost/charge for the consumed units. 

Chargeback reports can be extended to include a pre-established cost of other resources, such as the number of switch ports, HBAs and storage system ports, and service level requested by the users. Chargeback reports enable metering of storage services, providing transparency for both the provider and the consumer of the utilized services.
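As a sketch, a basic chargeback computation multiplies the measured units of storage consumed per business unit by a pre-established rate; the tiers, rates, and usage figures below are illustrative:

RATE_PER_GB = {"gold": 0.60, "silver": 0.35}   # monthly cost per GB by service tier

usage = [  # (business unit, service tier, allocated GB)
    ("finance", "gold", 2048),
    ("engineering", "silver", 8192),
    ("marketing", "silver", 1024),
]

for unit, tier, gb in usage:
    charge = gb * RATE_PER_GB[tier]
    print(f"{unit:12s} {gb:6d} GB @ {tier} -> ${charge:,.2f} per month")

Extending the rate table to cover switch ports, HBAs, and service levels gives the richer chargeback reports described above.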


13.1 Introduction to Storage Infrastructure Management

Storage infrastructure management covers the management and monitoring of the storage devices in the data center. The three major areas of management are capacity, performance, and availability. These three areas can be summarized simply: good storage management is about making sure that the storage is always available, always has enough space, and is fast in terms of performance. Good storage management requires solid processes, policies, and tools. 


Storage Infrastructure Management Overview

An example of a good process is life-cycle management, including commissioning and decommissioning servers and services. If you just turn off servers and unplug them without reclaiming the storage associated with them, you will end up with hopelessly underutilized storage and islands of wasted capacity. As organizations drive their IT infrastructure to support the workloads of next generation applications such as big data and cloud, storage infrastructure management is also being transformed to meet application requirements. Management functions are optimized to help an organization become a social networking, mobility, big data, or cloud service provider. Before going into storage management techniques, let us look at the storage infrastructure components and how they have traditionally been managed.

Storage Infrastructure Components

The key storage infrastructure components are servers, storage systems, and storage area networks (SANs). These components can be physical or virtual and are used to provide services to users. Storage infrastructure management includes all the storage infrastructure-related functions that are necessary for managing the infrastructure components and services and for maintaining data throughout its lifecycle. These functions help IT organizations align their storage operations and services to their strategic business goals and service level requirements. They ensure that the storage infrastructure is operated optimally by using as few resources as needed. They also ensure better utilization of existing components, thereby limiting the need for excessive ongoing investment in infrastructure.

Traditional Storage Management
Traditionally, storage infrastructure management is component specific. The management tools only enable monitoring and management of specific components. This may cause management complexity and system interoperability issues in a large environment that includes many multi-vendor components residing in worldwide locations. In addition, traditional management operations such as provisioning LUNs and zoning are mostly manual. Provisioning tasks often take days to weeks to complete due to rigid resource acquisition processes and long approval cycles. Further, traditional management processes and tools may not support a service-oriented infrastructure, especially if the requirement is to provide cloud services. They usually lack the ability to execute management operations in an agile manner, respond to adverse events quickly, coordinate the functions of distributed infrastructure components, and meet sustained service levels. This component-specific, largely manual, time-consuming, and overly complex management is simply not appropriate for the next generation storage infrastructure.

Service oriented Storage Management for emerging Technologies

The storage management functions for the new and next generation technologies are different in many ways from the traditional management and have a set of distinctive characteristics. 
  • Service-Focused Approach
  • Software-defined infrastructure aware
  • End-to-end visibility
  • Orchestrated Operations
Service-Focused Approach
The storage infrastructure management for emerging technologies like big data and cloud has a service-based focus. It is linked to the service requirements and the service level agreement (SLA). Service requirements cover the services to be created or upgraded, service features, service levels, and the infrastructure components that constitute a service. An SLA is a formalized contract document that describes service level targets, service support guarantees, service location, and the responsibilities of the service provider and the user. These parameters of a service determine how the storage infrastructure will be managed. For example, managing to the SLA might involve:
  • Determining the optimal amount of storage space needed in a storage pool to meet the capacity requirements of services.
  • Creating a disaster recovery plan to meet the recovery time objective (RTO) of services.
  • Ensuring that the management processes, management software, and staffing are appropriate to provide services.
  • Returning services to the users within the agreed time period in the event of a service failure.
  • Validating changes to the storage infrastructure for creating or modifying a service.
Software-defined infrastructure management
In the cloud environment, more value is given to software-defined infrastructure management than to traditional physical component-specific management. Management functions are increasingly becoming decoupled from the physical infrastructure and moving to an external software controller. As a result of this shift, the infrastructure components are managed through the software controller. The controller usually has a native management tool for configuring components and creating services. Administrators may also use independent management tools for managing the storage infrastructure. Management tools commonly interact with the controller through application programming interfaces (APIs).

Also Read: Information Security Threats and Security Controls Overview

Management through a software controller has changed the way a traditional storage infrastructure is operated. The software controller automates and abstracts many common, repeatable, and physical component-specific tasks, thereby reducing the operational complexity. This allows the administrators to focus on strategic, value-driven activities such as aligning services with the business goal, improving resource utilization, and ensuring SLA compliance. Further, the software controller helps in centralizing the management operations. For example, an administrator may set configuration settings related to automated storage tiering, thin provisioning, backup, or replication from the management console. Thereafter, these settings are automatically and uniformly applied across all the managed components that may be distributed across wide locations. 

End-to-End visibility
Management for the cloud and next generation technologies provides end-to-end visibility into the storage infrastructure components and deployed services. The end-to-end visibility of the storage infrastructure enables comprehensive and centralized management. The administrators can view the configuration, connectivity, capacity, performance, and interrelationships of all infrastructure components centrally. Further, it helps in consolidating reports of capacity utilization, correlating issues in multiple components, and tracking the movement of data and services across the infrastructure.

Depending on the size of the storage infrastructure and the number of services involved, the administrators may have to monitor information about hundreds or thousands of components located in multiple data centers. In addition, the configuration, connectivity, and interrelationships of components change as the storage infrastructure grows, applications scale, and services are updated. Organizations typically deploy specialized monitoring tools that provide end-to-end visibility of a storage infrastructure on a digital dashboard. In addition, they are capable of reporting relevant information in a rapidly changing and varying workload environment.

Orchestrated Operations
Orchestration refers to the automated arrangement, coordination, and management of various system or component functions in a storage infrastructure. Orchestration, unlike an automated activity, is not associated with a specific infrastructure component. Instead, it may span multiple components, located in different locations depending on the size of a storage infrastructure. In order to sustain in a cloud environment, the storage infrastructure management must rely on orchestration. Management operations should be orchestrated as much as possible to provide business agility. Orchestration reduces the time to configure, update, and integrate a group of infrastructure components that are required to provide and manage a service. By automating the coordination of component functions, it also reduces the risk of manual errors and the administration cost.

Also Read: Software Defined Storage (SDS) Architecture

Purpose-built software, called an orchestrator, is commonly used for orchestrating component functions in a storage infrastructure. The orchestrator provides a library of predefined workflows for executing various management operations. A workflow is a series of inter-related component functions that are programmatically integrated and sequenced to accomplish a desired outcome. The orchestrator also provides an interface for administrators or architects to define and customize workflows. It triggers an appropriate workflow upon receiving a service provisioning or management request. Thereafter, it interacts with the components as per the workflow to coordinate and sequence the execution of functions by these components.
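As a minimal Python sketch of the workflow idea, the following fragment sequences illustrative stand-in functions for real component calls; an actual orchestrator would add error handling, rollback, and approval steps:

def create_lun(ctx):  ctx["lun"] = "lun_42"        # stand-in: storage system call
def zone_fabric(ctx): ctx["zone"] = "host_a_zone"  # stand-in: SAN switch call
def mask_lun(ctx):    ctx["masked"] = True         # stand-in: array LUN masking call
def notify(ctx):      print("provisioned:", ctx)

PROVISION_WORKFLOW = [create_lun, zone_fabric, mask_lun, notify]

def run(workflow, request):
    """Execute each inter-related component function in sequence."""
    ctx = dict(request)
    for step in workflow:
        step(ctx)
    return ctx

run(PROVISION_WORKFLOW, {"host": "host_a", "size_gb": 100})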

Storage Infrastructure Management key Functions

Storage infrastructure management performs two key functions
  • Infrastructure discovery
  • Operations management
Infrastructure discovery creates an inventory of infrastructure components and provides information about the components, including their configuration, connectivity, functions, performance, capacity, availability, utilization, and physical-to-virtual dependencies. It provides the visibility needed to monitor and manage the infrastructure components. Discovery is performed using a specialized tool that interacts with infrastructure components, commonly through the native APIs of these components, and collects information from them. A discovery tool may be integrated with the software-defined infrastructure controller, bundled with management software, or be an independent piece of software that passes discovered information to management software. Discovery may also be initiated by an administrator or triggered by an orchestrator when a change occurs in the storage infrastructure.

Operations management involves ongoing management activities to maintain the storage infrastructure and the deployed services. It ensures that the services and service levels are delivered as committed. Operations management involves several management processes and, ideally, should be automated to ensure operational agility. Management tools are usually capable of automating many management operations, and these automated operations can also be logically integrated and sequenced through orchestration. The key functions of storage operations management are

Also Read: How Data is stored and Retrieved from Object Based Storage Systems ?
  • Configuration Management - Configuration management is responsible for maintaining information about configuration items (CIs). CIs are components such as services, process documents, infrastructure components including hardware and software, people, and SLAs that need to be managed in order to deliver services.
  • Capacity Management - Capacity management ensures adequate availability of storage infrastructure resources to provide services and meet SLA requirements. It determines the optimal amount of storage required to meet the needs of a service regardless of dynamic resource consumption and seasonal spikes in storage demand. It also maximizes the utilization of available capacity and minimizes spare and stranded capacity without compromising the service levels.
  • Performance Management - Performance management ensures the optimal operational efficiency of all infrastructure components so that storage services can meet or exceed the required performance level. Performance-related data such as the response time and throughput of components are collected, analyzed, and reported by specialized management tools. The performance analysis provides information on whether a component meets the expected performance levels. These tools also proactively alert administrators about potential performance issues and may prescribe a course of action to improve the situation.
  • Availability Management - Availability management is responsible for establishing a proper guideline based on the defined availability levels of services. The guideline includes the procedures and technical features required to meet or exceed both current and future service availability needs at a justifiable cost. Availability management also identifies all availability-related issues in a storage infrastructure and areas where availability must be improved.
  • Incident Management - An incident is an unplanned event, such as an HBA failure or an application error, that may cause an interruption to services or degrade the service quality. Incident management is responsible for detecting and recording all incidents in a storage infrastructure. The incident management support groups investigate the incidents escalated by the incident management tools or service desk. They provide solutions to bring back the services within the agreed timeframe specified in the SLA. If the support groups are unable to determine and correct the root cause of an incident, error-correction activity is transferred to problem management. In this case, the incident management team provides a temporary solution (workaround) to the incident.
  • Problem Management - A problem is recognized when multiple incidents exhibit one or more common symptoms. Problem management reviews all incidents and their history to detect problems in a storage infrastructure. It identifies the underlying root cause that creates a problem and provides the most appropriate solution and/or preventive remediation for the problem. Incident and problem management, although separate management processes, require automated interaction between them and use integrated incident and problem management tools. These tools may help an administrator to track and mark specific incident(s) as a problem and transfer the matter to problem management for further investigation.
  • Security Management - Security management is responsible for developing the information security policies that govern the organization’s approach towards information security management. It establishes the security architecture, processes, mechanisms, tools, user responsibilities, and standards needed to meet the information security policies in a cost-effective manner. It also ensures that the required security processes and mechanisms are properly implemented. Security management ensures the confidentiality, integrity, and availability of information in a storage infrastructure. It prevents the occurrence of security-related incidents or activities that adversely affect the infrastructure components, management processes, information, and services. It also meets regulatory or compliance requirements (both internal and external) for protecting information at reasonable/acceptable costs.