3.4 Data Coordination
At the center of the data storage software infrastructure, the data coordination functions coordinate data access between applications and storage subsystems. These underlying operations serve other storage management functions, such as data protection, policy management, and resource management. In general, the coordination functions sit closer to the movement and organization of the data than the other functions do.
The data coordination functions are the transaction-focused areas of storage management. Acting as the interface between applications and subsystems, their primary operations include file services for NAS and volume management for block-accessed storage. In both cases, storage consolidation allows administrators to make the most of the infrastructure by pooling or aggregating storage into a common presentation to applications. The basic outline of these functions is shown in Figure 3-7.
Figure 3-7. The data coordination path.
3.4.1 File Services for NAS
File services, or file systems, provide users with the basic functions of
A naming scheme for files and directories that enables organization of data for multiple users and applications.
Authorization for access and modification control, including separate control of create, read, and write operations.
Storage allocation that coordinates with volume management functions to determine where files may reside.
File services provide applications with a simpler mechanism for accessing blocks of disk-based storage. By abstracting the working unit of storage from a disk block fixed in size and location, a file system allows applications to work with files that can be stored across multiple disk blocks and locations through the use of volume management. This facilitates file transactions such as varying the file size, creation and deletion, shared identification and use of file data across multiple users, and access and modification control.
Many early computing applications—particularly those in the mainframe world and those focused on databases—were designed to attach directly to disk storage that was accessed in blocks. Today's computing environments rely heavily on the ability to store and retrieve data in file-based formats. File systems may be used directly by operating systems to maintain information such as what parameters should be used to start the machine, including modules to load and network services to initiate. Once the operating system is running, it may continue to use files to keep information temporarily available but out of memory or to log events that occur during the normal course of operation.
The more common understanding of file systems is for the applications themselves. In these cases, applications use files to hold parameters specific to their requirements, common modules also used by other applications, logs on application-related transaction information, and ultimately the data itself.
The file services function can reside throughout the data storage infrastructure. All host operating systems provide their own file systems, and many provide multiple file systems tailored to different media and workload needs. In addition, specialized file systems can be added to existing host operating system platforms to provide enhanced functionality. On the server, a network file system such as NFS or CIFS maps application-level file system operations into transactions over the network with NAS filers. In turn, NAS filers use a variety of internal file systems to hold the file objects they serve. As specialized servers for file-based storage access, NAS filers provide scalability benefits in multiuser, dynamic-growth storage environments. Newer systems take file services one step further by providing them within storage networking fabrics. Often referred to as NAS heads, these devices offer file access on the front end and block access (volume management) on the back end, adding another level of fabric flexibility.
3.4.2 Volume Management
Volume management provides operating systems with a means to aggregate disks and LUNs into logical storage units called volumes. Volumes can be accessed directly by operating systems, as is often the case with database applications, or through file systems. In either case, each volume can be made up of a set of disks and LUNs that are connected to the appropriate host, directly or through a storage network.
A software volume management layer between the host and the storage subsystem offers specific features to the administrator. The volume management layer allows for the creation of dynamic disks, which can then be logically manipulated as a group. This disk group can be used to support one large logical volume, or software-based RAID may be implemented across it.
Additional features of volume management include modification of dynamic disks within the group, access and control features of hosts to volumes, and the ability to redirect data requests across volumes for failover.
Volume management provides users with the basic functions of
A naming scheme for volumes made up of LUNs that enables organization of data for assignment to multiple hosts and applications.
Storage allocation that coordinates with file system or database management functions to determine where files or databases may ultimately reside.
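To make the aggregation concrete, the sketch below shows how a logical volume concatenated from several LUNs might translate a logical block address into a LUN and offset. The class, names, and sizes are hypothetical illustrations, not any particular volume manager's design; real implementations add striping, mirroring (software RAID), and on-disk metadata.

```python
# Minimal sketch of volume aggregation: a logical volume concatenated
# from several LUNs. Names and sizes are hypothetical.

class LogicalVolume:
    def __init__(self, name, luns):
        # luns: ordered list of (lun_id, size_in_blocks) tuples
        self.name = name
        self.luns = luns

    def total_blocks(self):
        return sum(size for _, size in self.luns)

    def map_block(self, logical_block):
        """Translate a logical block address to (lun_id, physical_block)."""
        offset = logical_block
        for lun_id, size in self.luns:
            if offset < size:
                return lun_id, offset
            offset -= size
        raise ValueError("logical block beyond end of volume")

# One volume presented to the host as a single 3,500-block device.
vol = LogicalVolume("datavol", [("lun0", 1000), ("lun1", 2000), ("lun2", 500)])
print(vol.total_blocks())    # 3500
print(vol.map_block(2500))   # ('lun1', 1500)
```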
3.4.3 Storage Aggregation
The overriding benefit of file services and volume management is the abstraction layer these software tools provide between applications and disks. This abstraction enables consolidation by aggregating file or volume presentation. In a homogeneous environment, file services can bind multiple volumes to a file system, and volume management binds disks to a volume. Another benefit is the ability to provide file locking, record locking, or both across multiple servers.
The basics of virtualization rely upon the storage aggregation capabilities of file services and volume management. For definition purposes, we focus on virtualization primarily in heterogeneous environments, spanning a variety of device types and vendors, as covered in Section 3.8, "Virtualization."
Storage aggregation delivers configuration flexibility to storage administrators, providing the ability to layer more sophisticated storage management, such as policy management, within the configuration.
3.5 Storage Policy Management
An effective storage management operation needs storage policies to control and enforce the use of resources. While SRM provides a view of storage devices and the ability to change specific device features, policy management takes a wider approach across the infrastructure to define, implement, and enforce a set of rules between users, applications, and storage.
This section focuses on the basic elements of storage policy management within the technical storage networking context. Chapter 11, "Managing the Storage Domain," takes a more global view of the corporate oversight of storage resources from an organizational perspective.
First and foremost, storage policies help administrators overcome the unending growth of storage resources within the organization. From the enterprise to departmental to personal storage, such as workstations and laptops, administrators need automated mechanisms to set usage rules and provide service levels required throughout the organization.
Basic storage policies (see Figure 3-8) can be broken into the following groups:
Security and authentication policies ensure the proper access and manipulation of the data.
Capacity and quota management policies ensure measured allocation across multiple users.
Quality of service policies guarantee service levels for both the storage itself and the access to that storage.
Figure 3-8. Storage policies.
3.5.1 Storage Policy Cycle
Storage policies follow a continuous cycle of stages, outlined in Figure 3-9. The exact definition of the stages may vary slightly among organizations, but the important common characteristic is the ongoing process improvement built into the cycle.
Figure 3-9. Storage policy cycle.
Assessment involves observing usage and performance patterns to determine areas for improvement. One example might be an application whose data set continues to grow exponentially, reducing the amount of available storage for other applications sharing the same storage. The definition stage requires identification of the anomaly's root cause and a method to solve it. Implementation comes through setting a rule or policy within the software infrastructure. Depending on the cause, effect, and desired results, setting a policy may occur in more than one component of storage software. Enforcement takes place when the desired actions are achieved through the rules. Careful monitoring provides data for evaluation of the set policy and the ability to begin the cycle again with any required modifications.
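As a rough illustration of one turn of this cycle, the sketch below walks a single capacity policy through assessment and enforcement. The thresholds, metric format, and action names are assumptions made for the example, not part of any specific storage management product.

```python
# Sketch of one pass through the storage policy cycle for a capacity rule.
# Thresholds, metrics, and actions are illustrative assumptions.

policy = {
    "name": "oltp-capacity-guard",
    "target": "oltp_db_volume",
    "rule_pct": 85,                    # defined after assessing growth patterns
    "action": "alert_and_allocate",
}

def assess(metrics, policy):
    # Assessment: observe usage patterns and flag volumes exceeding the rule.
    return [m for m in metrics if m["volume"] == policy["target"]
            and m["utilization_pct"] > policy["rule_pct"]]

def enforce(policy, metric):
    # Enforcement: take the defined action when the rule is violated.
    print(f"{policy['name']}: {metric['volume']} at "
          f"{metric['utilization_pct']}% -> {policy['action']}")

# In practice this loop runs continuously, and monitoring results feed
# back into redefining the rule itself.
metrics = [{"volume": "oltp_db_volume", "utilization_pct": 91}]
for violation in assess(metrics, policy):
    enforce(policy, violation)
```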
In many ways, an active storage infrastructure with a broad mix of users, applications, and resources is like an evolving ecosystem. Cause-and-effect relationships rarely remain isolated, and administrators must track implemented policies to watch for unintended effects. Appropriate monitoring through the use of SRM tools provides adequate notification for administrators to adjust accordingly.
3.5.2 Capacity, Content, and Quota Management
Every organization aims to maximize the use of its resources, and SRM, along with storage policies, helps accomplish that goal. Examples of policies in action include capacity, content, and quota management.
Capacity management involves optimization of storage space on a given device, such as a RAID array, or within a volume. In an ideal world, all devices would operate at or close to 100 percent capacity. However, the reality of our dynamic computing environments dictates otherwise. Unpredictable spikes in storage demand, coupled with the need to keep extra capacity available in real time, mean that 100 percent utilization can actually be counterproductive.
The optimal utilization percentage depends on device and storage cost, application needs, and costs of adding more physical capacity. For example, for a high-end RAID array with relatively expensive storage, the optimal utilization might be around 90 percent with 10 percent reserved for unanticipated peak demand. Less expensive storage might be set with utilization peaks of 60 percent, leaving administrators plenty of leeway for surges in capacity needs before new storage devices become necessary.
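A back-of-the-envelope way to think about these targets is to price the headroom held in reserve at each utilization ceiling. The figures below are invented for illustration only; actual costs and targets vary widely.

```python
# Illustrative headroom cost at different utilization targets.
# Dollar figures are assumptions, not vendor pricing.

def reserved_headroom_cost(total_tb, target_utilization, cost_per_tb):
    """Cost of the capacity held back as headroom at a given target."""
    headroom_tb = total_tb * (1 - target_utilization)
    return headroom_tb * cost_per_tb

# High-end array: expensive per terabyte, so run it closer to full (90%).
print(reserved_headroom_cost(total_tb=50, target_utilization=0.90, cost_per_tb=30000))
# Midrange array: cheaper per terabyte, so a 60% target is affordable headroom.
print(reserved_headroom_cost(total_tb=50, target_utilization=0.60, cost_per_tb=5000))
```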
On the content side, visibility into the types of data and files can help dramatically reduce storage needs. By examining and categorizing content sources, power consumers, whether applications or individuals, can be identified and contained. For example, the rapid proliferation of MP3 music files might indicate improper use of corporate storage resources. Excessive numbers of large files (images, CAD designs) with similar names and underlying data structures could be the result of saving inordinate numbers of copies, a need that might be better satisfied by a less expensive storage mechanism, such as tape archiving.
Quota management, after identifying storage content by source and data type, allows administrators to pinpoint and curtail unnecessary consumption. This may apply to corporate users from engineering to administration, or to key applications. Of course, quota management requires a set of rules and guidelines that fit organizational priorities in concert with IT priorities. A cap on engineering storage capacity that impacts development schedules might justify an increase in the storage purchase budget. On the other hand, capping resources for email storage might compel individuals to be more conscious of their consumption without impacting productivity.
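A minimal quota check along these lines might look like the sketch below; the department names, limits, and warning threshold are hypothetical.

```python
# Minimal quota-management sketch. Quotas (in GB) and departments are
# illustrative assumptions.

quotas_gb = {"engineering": 5000, "marketing": 1000, "email": 2000}
usage_gb = {"engineering": 5400, "marketing": 300, "email": 1900}

def check_quotas(quotas, usage, warn_at=0.9):
    for dept, limit in quotas.items():
        used = usage.get(dept, 0)
        if used > limit:
            print(f"{dept}: over quota ({used} of {limit} GB), curtail or budget more")
        elif used > warn_at * limit:
            print(f"{dept}: approaching quota ({used} of {limit} GB)")

check_quotas(quotas_gb, usage_gb)
```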
3.5.3 Security and Authentication
Security for storage (also see Chapter 9, Section 9.5, "Security for Storage Networking") begins by identifying threats and then implementing solutions to address them. Some of these threats and solutions are outlined in Table 3-1.
Table 3-1. Storage Security Threats and Solutions

Potential Threat                                                   | Solution
Is the entity accessing the storage whom he or she claims to be?  | Authentication
Does the entity accessing the storage have the right clearance?   | Authorization
Is the data being read or shared unintentionally?                 | Privacy
Is the data subject to unintentional or malicious modification?   | Integrity
Enforcement of these storage security solutions occurs throughout the storage management chain, depending on the software in place. While the implementation of these solutions can vary, the objectives remain the same. For example, authentication can be set within the file system, identifying access at the user level, or it can take place within lower layers of the infrastructure, such as the network fabric. Storage security therefore requires a multifaceted approach across the entire data management chain, from user access, to device access, to device management.
The most basic form of storage security is the physical isolation of the storage environment. Fibre Channel-based storage networks provide some physical security because the technology is separate from more common corporate IP networking. In the early days of Fibre Channel, storage administrators viewed this physical isolation as a feature: their networking colleagues could neither understand nor meddle with their setups. Wider availability of Fibre Channel equipment no longer guarantees such independence.
LUN masking determines which disk drives (or logical unit numbers) are seen by servers. LUN masking can be enforced by HBAs, switches, or storage array controllers. Modification of LUN masking requires password-protected access to the configuration utility. Zoning by port or WWN via SAN switches is one form of LUN masking.
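Conceptually, LUN masking amounts to a lookup from an initiator's identity (for example, an HBA's WWN) to the set of LUNs it is allowed to see. The sketch below uses made-up WWNs and is not tied to any vendor's configuration utility.

```python
# Conceptual LUN-masking table: initiator WWN -> LUNs it may see.
# WWNs and LUN numbers are invented for illustration.

masking_table = {
    "10:00:00:00:c9:12:34:56": {0, 1},   # database server HBA
    "10:00:00:00:c9:ab:cd:ef": {2},      # backup server HBA
}

def visible_luns(initiator_wwn):
    """Return the LUNs exposed to an initiator; unknown WWNs see nothing."""
    return masking_table.get(initiator_wwn, set())

print(visible_luns("10:00:00:00:c9:12:34:56"))   # {0, 1}
print(visible_luns("10:00:00:00:de:ad:be:ef"))   # set(), masked out
```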
The use of IP networking technologies for storage also delivers a set of well-defined security mechanisms. Virtual LANs (VLANs) and access control lists (ACLs), common in mainstream IP networking, can be used to segment and isolate traffic between end points or network areas of an IP storage network. Existing IPSec functions, such as authentication and data encryption, which are defined in IETF specifications, can be used with IP storage networking. For example, virtual private network equipment can be used between two IP storage switches or gateways to deliver encrypted IP storage traffic between two SANs.
3.5.4 Storage Encryption
While encryption has long been used as a security mechanism for IP networks, that capability has been largely absent from conventional SANs, partly based on the historical lack of support for Fibre Channel encryption. Recent product introductions now make possible Fibre Channel encryption along with the associated management tools required to administer encryption keys.
While security tools such as zoning, ACLs, and authentication address storage access security, they do not address the security of the data payload. In order for the payload to be protected, it must be encrypted with a key that locks (and unlocks) the original data structures. Encryption keys for 3DES (one of the most secure forms of encryption) are made up of 168 bits.
Encryption can be implemented in software, within Fibre Channel equipment, or in specialized encryption hardware. Current implementations typically use hardware-based encryption to guarantee high throughput rates. A number of configurations can be deployed to guarantee data security at the payload level. Even if an unauthorized user were to access the data, it would be useless without the key. The basic configurations are shown in Figure 3-10. In addition to these solutions, users can encrypt data at the file-system level before entrusting it to the storage subsystem.
Figure 3-10. Types of storage encryption. (Source: NeoScale Systems)
In a fabric-attached deployment, storage encryption takes place within the storage network, allowing multiple storage devices to be protected. All storage that resides behind the encryption device requires key authentication for data access. Simpler configurations, such as subsystem-attached encryption, allow data to be moved to third-party service providers or offsite vaulting with security guarantees. Data that ended up in the wrong hands while being moved offsite would still be protected. A gateway or tunnel implementation allows data to be encrypted at the payload level while traversing MANs or WANs. Application-attached encryption restricts the protection to those applications requiring the highest level of security.
Of course, in any implementation, key management determines the accessibility and recoverability of encrypted data. IT professionals considering payload-level encryption will need to keep keys in multiple secure locations, including legal offices and other venues that guarantee protected access. Loss of a key results in permanent data loss.
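To give a feel for payload-level protection as it might be applied at the file-system level, the sketch below encrypts a buffer with 3DES using the PyCryptodome library. The library choice is an assumption for illustration; fabric- and subsystem-attached products use dedicated hardware, and real deployments hinge on the key management practices described above.

```python
# Illustrative 3DES payload encryption using PyCryptodome (an assumed
# library choice; the text does not prescribe an implementation).
from Crypto.Cipher import DES3
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

# 24-byte (192-bit) key; 168 effective bits after parity adjustment.
key = DES3.adjust_key_parity(get_random_bytes(24))

payload = b"customer order records ..."
cipher = DES3.new(key, DES3.MODE_CBC)
ciphertext = cipher.encrypt(pad(payload, DES3.block_size))

# Without the key, the ciphertext is useless to an unauthorized reader;
# losing the key makes the data permanently unrecoverable.
decipher = DES3.new(key, DES3.MODE_CBC, cipher.iv)
assert unpad(decipher.decrypt(ciphertext), DES3.block_size) == payload
```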
3.5.5 Quality of Service for Storage and Storage Area Networks
Quality of service applies to end-storage devices as well as the storage transport, and both must be considered for end-to-end service guarantees. For end-storage devices, quality of service refers to the availability of the storage media (for example, the RAID level). Mission-critical applications require storage devices with RAID levels set for maximum availability, combined with remote mirroring for business continuity in the event of a disaster. Less critical storage may operate sufficiently with RAID levels that trade some availability for increased performance and use tape archiving as a backup mechanism. This provides cost savings through more usable storage per RAID device, forgoing the remote mirror in favor of tape-based recovery.
Storage professionals should assign storage quality of service requirements to individual applications. These service levels can be implemented through storage management software to create an enterprisewide policy. The exercise alone of matching applications to storage service levels will create a useful framework for balancing availability and cost factors. Carried through to the purchasing decisions of new storage devices, this type of framework clearly demarcates the need for high-end, midrange, or entry-level storage devices.
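One way to capture the result of that exercise is a simple mapping from service tiers to storage attributes and from applications to tiers, as in the sketch below. The tier names, RAID choices, and application assignments are illustrative assumptions.

```python
# Illustrative application-to-service-level mapping. Tier definitions
# and application assignments are assumptions for the example.

service_levels = {
    "mission-critical": {"raid": "RAID 1/0", "remote_mirror": True,  "backup": "disk + tape"},
    "business":         {"raid": "RAID 5",   "remote_mirror": False, "backup": "disk"},
    "archival":         {"raid": "RAID 5",   "remote_mirror": False, "backup": "tape"},
}

app_tier = {
    "oltp_database": "mission-critical",
    "file_shares":   "business",
    "old_cad_files": "archival",
}

for app, tier in app_tier.items():
    print(app, "->", service_levels[tier])
```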
Quality of service also applies to the storage transport, or the storage network. In this case, the interconnect between storage end systems must be adequately provisioned for mission-critical applications. This can apply to allocated bandwidth, multipath availability, or even balancing the long-distance transport across multiple carriers.
Figure 3-11 shows an example of bandwidth prioritization through the use of VLANs and traffic prioritization. Here, the backup of an online transaction processing (OLTP) database needs priority over a less critical backup of corporate files. No matter when the database backup begins, it will be granted the appropriate bandwidth to complete its backup operation within a designated time window.
Figure 3-11. Storage transport quality of service.
3.5.6 Storage Paths
Storage network administrators may also choose to guarantee dual, redundant paths for specific application-to-storage connections. This practice is often referred to as path management. Newer software packages with this feature use storage network topology information to create a map of available connections between servers, fabric switches, and storage devices. Criteria such as dual, redundant paths or 2-Gb/s Fibre Channel links can be specified to meet availability and performance needs.
Attention to storage paths ensures that applications not only have the appropriate data capacity, but also the appropriate service-level transport for data access.
3.5.7 Departmental Segmentation and Accounting
Storage policy management translates to several organizational policies to help assign storage costs. For example, departments can be easily segmented to provide accounting charges based on storage use. Storage administrators can assign storage in terms of both capacity and quality of service, offering options to individual departments and their specific requirements.
Software automation carries this one step further by allowing dynamic changes with minimal manual reconfiguration. For example, capacity can be automatically added on the fly for specific users. Billing systems can then track allocation and keep department heads aware of their capacity usage and expected internal expenses. This focuses attention on storage capacity requirements of the departments through clear, bottom-line financial metrics, eliminating gray areas of infrastructure requirements and allocation.
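A chargeback calculation of this kind can be sketched in a few lines; the per-gigabyte rates, tiers, and allocations below are invented for illustration.

```python
# Departmental chargeback sketch; rates and allocations are illustrative.

rates_per_gb_month = {"mission-critical": 0.50, "business": 0.20, "archival": 0.05}

allocations = [
    {"dept": "engineering", "tier": "business",         "gb": 4000},
    {"dept": "finance",     "tier": "mission-critical", "gb": 800},
    {"dept": "archives",    "tier": "archival",         "gb": 20000},
]

for a in allocations:
    charge = a["gb"] * rates_per_gb_month[a["tier"]]
    print(f"{a['dept']}: {a['gb']} GB at {a['tier']} tier -> ${charge:,.2f}/month")
```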
3.6 Data Protection
Data-protection software, from coordinating tape-based backup to real-time disk replication, defends corporate data assets. Through a range of mechanisms that deliver onsite and offsite data copies at various time intervals and across multiple media formats, corporations mitigate the chance of data loss while striving to provide continuous availability. For most companies, the costs of data loss far outweigh the costs of data protection. As such, they invest millions of dollars per year in data protection techniques such as those described in this section.
The primary decisions for data protection compare cost to availability. In this case, availability refers to both the time taken for the backup operation and the time needed to restore the data set in the event of a disaster. Figure 3-12 outlines the position of several data-protection schemes based on their cost and availability measures.
Figure 3-12. Cost-availability trade-off.
For most organizations, the redundancy built into storage architectures spills into the data-protection implementations as well. For example, many companies implement more than one data-protection technique, such as combining tape-based offsite backup with onsite snapshot copies to disk. This gives them near instant recovery onsite as well as the insurance of an offsite recovery if needed.
3.6.1 Tape-Based Backup
Tape archiving is often relegated to the low end of the totem pole of backup technologies. While falling disk prices have made disk-based backup solutions more attractive to a larger segment of the market, tape still fills a valuable need. Density alone makes tape an attractive choice for large backup operations: tape library capacities measure in hundreds of terabytes, compared to tens of terabytes for disk array units. Additionally, the flexibility of removable tape media cartridges for both onsite and offsite storage adds another level of protection. Today's IP storage networking options may soon reduce the need to transport tape cartridges physically. Why bother moving them between buildings or sites if the data can be easily sent over an IP network? Even with that capability, it may still be more effective to have a tape library for archiving because of device and media capacity.
Storage networks enable sharing of tape libraries among multiple servers, including NAS filers, as shown in Figure 3-13. This type of configuration also requires software to manage access to the tape library, preventing multiple servers from simultaneous access. Backup packages like Veritas NetBackup, Legato Networker, CA Brightstor, and IBM Tivoli for Storage provide the features for managing tape libraries in shared environments, including schedule modules to perform backups at the appropriate intervals, such as daily, weekly, or monthly.
Figure 3-13. Consolidated tape backup across servers and NAS.
In SAN configurations with shared tape libraries, the backup software resides on a backup application server that manages the control information with the individual servers. With clearance established, the backup server grants access to the appropriate server and allows a direct link to the tape library. This process, often referred to as LAN-free backup, facilitates faster backup operations than do more traditional data paths for shared tape libraries through the LAN.
Tape libraries also serve as a useful complement to NAS. Even though NAS delivers file-based storage through its front end, the back end of a NAS system often has a block-based Fibre Channel connection that can be networked to a tape library for block-based backup of the NAS unit. This type of solution provides economical archiving of NAS data and may also operate simultaneously with NAS replication across multiple filers.
Server-free backup (Figure 3-14) is another technique used in conjunction with tape backup, although it can also be used with disk backup. In server-free implementations the data path is completely offloaded to the storage devices, freeing servers to focus on application processing. Server-free backup solutions take advantage of the same LAN-free designs implemented with SANs but also use data-mover agents that reside in the storage fabric to facilitate disk-to-tape or disk-to-disk data paths.
Figure 3-14. Server-free backup through third-party copy.
The data-mover agents employ a third-party copy command called extended copy, which can run on a fabric-based storage device such as a router or appliance. The backup application server initiates a command to the data mover, and the data mover then assumes control of the copy operation until it is complete. The ability to conduct a high-speed, image-level backup free of server intervention builds a scalable backup infrastructure. Key benefits include more backup operations within a fixed window, simpler administration, reduced server load, and faster restores.
3.6.2 Disk Backup and Recovery
Disk-based backup ranks high on the simplicity scale but equally high on the required capacity scale. Specifically, a single disk-based backup will double the required disk storage capacity. This may be on top of disk capacity required for RAID features. In the extreme, an organization may find itself with only 25 percent usable capacity of total terabytes owned. This can be a significant cost when compared to tape archiving.
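One way to arrive at that 25 percent figure, assuming mirrored primary storage plus a disk backup copy that is itself mirrored, is shown below; the mirroring assumption on both copies is illustrative.

```python
# Illustrative arithmetic behind the 25 percent usable-capacity case:
# RAID 1 mirroring on the primary plus a mirrored disk backup copy.
usable_tb = 10
primary_raw = usable_tb * 2            # mirrored primary: 20 TB
backup_raw = usable_tb * 2             # mirrored disk backup copy: 20 TB
total_raw = primary_raw + backup_raw   # 40 TB purchased
print(usable_tb / total_raw)           # 0.25 -> 25 percent usable
```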
Aside from capacity considerations, disk solutions eclipse tape for achieving short backup and restore windows. For applications requiring guaranteed availability, disk-based solutions are the only mechanism for secure, rapid recovery.
Because of these cost, availability, and performance dynamics, permutations of disk copy, such as point-in-time and snapshot copy as well as replication and mirroring, come into play.
Many disk-backup solutions leverage internal RAID functions built into subsystems. This allows for multilayered redundancy to protect against disk failure, mirror failure, unit failure, and in the case of remote facilities, infrastructure failure.
3.6.3 Point-in-Time and Snapshot Copy
Point-in-time and snapshot copy solutions address the capacity requirement concerns of disk backup while providing short restore times. Point-in-time copies, as the name implies, isolate time-stamped copies of data at regular intervals and provide the ability to revert to those images in the event of application or data failure.
The first point-in-time copy includes the entire data set, doubling the storage capacity requirement. Thereafter, only changes to the data set are copied at regular intervals. Administrators can choose to initiate and maintain multiple point-in-time copies so recovery can roll back to the appropriate data set. These intervals can measure from minutes to hours to days.
With an updated copy available at all times, the recovery process with a point-in-time copy occurs almost immediately. Instead of attempting to restore the database, users are simply redirected to the last stable copy.
Snapshot copies operate in a similar fashion to point-in-time, but with the slight difference that the snapshot volume merely tracks incremental changes with pointers back to the original data set. Like point-in-time, snapshots can be created at regular intervals to provide rollback capabilities in the event of failure. However, snapshots do not replicate the original data set. Therefore, they don't require the upfront capacity increase, although additional capacity is required to maintain the snapshot information. This will vary based on the amount of data changed at each interval and the interval or snapshot frequency. The additional capacity required for a snapshot copy correlates to the amount of original content that must be saved before incremental changes are made.
Like point-in-time copies, snapshot copies provide quick recovery in the event of disaster, since no restore is required. But snapshot copies alone cannot protect against all disasters: the copies still point back to portions of the original data set, so a physical storage failure could invalidate several logical snapshot copies. For a complete disk-based backup solution, snapshots would require the addition of replication or mirroring to protect against physical storage failure or site disaster.
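The pointer-based behavior of a snapshot can be sketched as a copy-on-write map: before a block of the live volume is overwritten, its prior contents are saved to the snapshot area, and snapshot reads consult that map first. This is a conceptual sketch, not any vendor's implementation, and it also shows why a physical failure of the original volume invalidates the snapshot.

```python
# Conceptual copy-on-write snapshot. The volume is a dict of block
# number -> data; only blocks changed after the snapshot consume space.

class Snapshot:
    def __init__(self, volume):
        self.volume = volume      # live volume, shared and still changing
        self.saved = {}           # original contents of overwritten blocks

    def write(self, block, data):
        # Copy-on-write: preserve the original block the first time it changes.
        if block not in self.saved:
            self.saved[block] = self.volume.get(block)
        self.volume[block] = data

    def read_snapshot(self, block):
        # Prefer preserved blocks; otherwise point back to the original volume.
        return self.saved.get(block, self.volume.get(block))

volume = {0: "A", 1: "B", 2: "C"}
snap = Snapshot(volume)
snap.write(1, "B'")                         # live volume now holds B'
print(volume[1], snap.read_snapshot(1))     # B' B  (rollback data preserved)
print(len(snap.saved))                      # 1 block of extra capacity used
```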
3.6.4 Hierarchical Storage Management (HSM)
Serving a data-protection function as well as a space-optimization function, hierarchical storage management (HSM) helps storage administrators balance the cost per megabyte against the retrieval time of varying storage media. The conventional business driver behind HSM was the relatively high cost of disk compared with the relatively low cost of tape, coupled with the varying types of data and service levels within the organization (see Figure 3-15).
Figure 3-15. Comparisons of media technologies: disk, optical, tape.
For example, a mission-critical e-commerce database must be instantly accessible at all times and must have mirrored or replicated copies available in the event of disaster and disk backup for fast restore. On the other end of the spectrum, large CAD files for products that have been discontinued probably don't merit the same availability, yet designers are reluctant to discard the data—they want it available if needed. This second example fits perfectly with HSM principles: to take less critical data that doesn't require immediate access times and move it to more economical storage, such as tape. In the event that the design team resurrects a particular product, the CAD files could be transferred back to disk during the production cycle to facilitate faster access times.
With disk costs dropping precipitously, HSM can be applied to disk-only environments as well. For example, mission-critical information might reside on high-end Fibre Channel disk drives, while less critical information could be stored on much less expensive serial Advanced Technology Attachment (ATA) drives.
HSM software manages this entire process, allowing administrators to set guidelines and storage categories that determine when and where files and volumes are stored. Automated HSM policies identify files that haven't been accessed recently, maintain a pointer at the original location, and then move the actual data to less expensive media. This migration process happens transparently to users, requiring only a quick restore when the file is accessed, if ever.
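A rough sketch of such a migration pass appears below; the age threshold, tier names, and catalog format are assumptions made for illustration.

```python
# Illustrative HSM migration pass: files untouched for a set period move
# to cheaper media, leaving a stub (pointer) behind for transparent recall.
import time

STALE_AFTER_DAYS = 180   # illustrative threshold

def migrate_stale_files(catalog, now=None):
    """catalog: list of dicts with 'path', 'last_access' (epoch), 'tier'."""
    now = now or time.time()
    cutoff = now - STALE_AFTER_DAYS * 86400
    for entry in catalog:
        if entry["tier"] == "primary-disk" and entry["last_access"] < cutoff:
            # Move the data to economical media and leave a pointer at the
            # original location so a later access triggers a recall.
            entry["tier"] = "nearline-tape"
            entry["stub_at_original_path"] = True
    return catalog
```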
The concepts of HSM have been a part of enterprise storage strategies for many years. Today, with new SAN architectures and the explosive growth of data, applying HSM principles through specific HSM packages or other storage management software can lead to significant cost savings in storage capacity purchases.