Introduction
In the 1960s, IBM first developed the idea of virtual machines. According to Ali & Meghanathan (2011), this development aimed to provide concurrent, interactive access to a mainframe computer. In the most straightforward sense, a virtual machine is a replica of the underlying physical device: it gives users the illusion of running directly on the physical computer. In the process, a virtual machine isolates workloads, shares resources, and runs multiple flavors and configurations of an operating system using different sets of software technology (Ali & Meghanathan, 2011; Zhou, Hu & Li, 2016).
In terms of how it works, a virtual machine mimics the features of a physical server because it is configured from a number of processors, RAM, storage resources, and network connectivity through virtual network ports (Zhou, Hu & Li, 2016). Typically, it is powered on just like a physical server and loaded with an OS and software solutions so that it can operate like one (Ali & Meghanathan, 2011; Zhou, Hu & Li, 2016). However, a virtual machine only sees the resources it has been configured with, as opposed to all the resources of the physical host itself. The hypervisor facilitates this translation: I/O requests move from the virtual machine to the physical server's devices, and the results are returned to the corresponding virtual machine.
Any virtual machine has a display, processors, memory, a hard disk, a CD/DVD drive, a network adapter, and a USB controller. All these components are virtualized in the sense that the software creates them and stores them in files. It is also worth noting that a virtual machine has a configuration file. This file records the server definition: the virtual processors, the allocated RAM, the I/O devices the VM accesses, the number of network interface cards in the virtual server, and more (Zhou, Hu & Li, 2016). The configuration file also describes the VM's storage access. When the virtual machine is powered on, additional files are typically created for logging and memory paging, among other functions. Thus, because VMs are ultimately just files, copying them produces a backup of the data and a copy of the entire server: OS, applications, and hardware configuration alike.
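To make this concrete, below is a minimal sketch of the kind of metadata such a configuration file records, written as a Python dictionary. The field names are illustrative assumptions, not any particular hypervisor's format.

```python
# Illustrative sketch of the metadata a VM configuration file records.
# Field names are hypothetical, not tied to a specific hypervisor format.
vm_config = {
    "name": "test-server-01",
    "virtual_cpus": 2,
    "ram_mb": 4096,
    "disks": [
        # The virtual hard disk is itself just a file on the host.
        {"path": "test-server-01-disk0.img", "size_gb": 40},
    ],
    "network_interfaces": [
        {"adapter": "nic0", "network": "default"},
    ],
    "devices": ["display", "cdrom", "usb_controller"],
}

# Because the whole VM is described by files (configuration plus disk
# images), copying those files effectively backs up the entire server:
# OS, applications, and hardware configuration alike.
```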
Virtual machines can be classified into two types: system virtual machines and process virtual machines. A system virtual machine replicates all the components and processes of a real computer, while a process virtual machine runs only a single application or process. A system virtual machine allows one to use a single server to run multiple operating systems. A process virtual machine, on the other hand, exists only while the application is in use; it is usually used to run a single program that is incompatible with the underlying operating system.
Why, then, are virtual machines important? Virtual machines are advantageous because they allow users to experiment with numerous operating systems and applications without installing them on physical hardware; once the purpose for running a virtual machine has been served, it can simply be deleted. Many users prefer virtual machines for this flexibility, and because they make it easy to create a test environment for a set of users. Moreover, any activity carried out within a virtual machine is generally sandboxed from the rest of the system, which allows different apps and systems to be tested without those apps affecting the physical host. Most importantly, virtual machines help run software not supported by the host's operating system. Examples of common and useful virtual machine products include VirtualBox (Windows/Mac/Linux), Parallels (Windows/Mac/Linux), VMware (Windows/Linux, basic), QEMU (Linux), and Windows Virtual PC.
Failures occur when running and using virtual machines. A failure can be in the physical device or in the virtual machine. First, a host may fail outright and stop functioning. Second, there can be isolation, a state in which the host becomes network-isolated. There can also be a problem known as partition, where the host loses network connectivity with the master host. Finally, there are what are known as proactive HA failures. This type of failure happens when a host component fails, resulting in a loss of redundancy; when this happens, the functional behavior of the virtual machines on the host is not affected. Against this background, the paper focuses on protecting the host machine in extreme cases where it exceeds certain limitations.
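The failure states described above can be sketched as a simple classification. The names below are our own labels for the categories in this paragraph, not a vendor API.

```python
from enum import Enum

class HostFailureState(Enum):
    """The four host failure states described above (illustrative labels)."""
    FAILURE = "host has stopped functioning"
    ISOLATION = "host is network-isolated but still running"
    PARTITION = "host lost network connectivity with the master host"
    PROACTIVE_HA = "a component failed, causing a loss of redundancy only"

def vms_affected(state: HostFailureState) -> bool:
    # With a proactive HA failure, the VMs on the host keep functioning
    # normally; the other three states can disrupt them.
    return state is not HostFailureState.PROACTIVE_HA
```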
Research Questions
The main research question in this paper is: how can we protect the host machine from failure when it exceeds certain limitations? From this research question, two main objectives follow. This research paper aims to:
- Define the constraints or conditions under which failure occurs
- Outline procedures for protecting the hardware and software from failure
Literature Review
The use of virtual machines has grown in the IT industry, a growth attributed to the many benefits tied to VMs. In their study of virtual machines' installation, performance, networking, and advantages, Ali & Meghanathan (2011) acknowledge these benefits, highlighting better resource mobilization and the ability to manage systems efficiently. Such studies have shown that virtual machines can be used for experimentation, and that they are useful for the simultaneous deployment of virtual software, especially in academic institutions for research and teaching.
Reliability and Failures of Different Systems
Hardware Subsystems
With all these benefits comes growth in computation density, alongside growth in hardware components and in workloads packaged as virtual machines. It is worth noting that the availability of data center services depends heavily on the reliability of the physical and virtual machines. Birke, Giurgiu et al. (2012) analyzed roughly 10,000 virtual and physical machines hosted across five commercial data centers, over an observation period of one year. The aim was to establish clear differences and similarities between failures of physical and virtual machines.
Such an analysis requires careful monitoring of the patterns involved in the physical and virtual machines. For example, the observation entails looking at failure rates, the time distributions between failures and repair times, and the time and space dependency of failures. Faults are then correlated with resource capacity and run-time usage to identify the features of failing servers (Birke, Giurgiu et al., 2012).
The above study reveals critical facts about the reliability of hardware subsystems. For example, disks have the highest failure rates of any hardware component in the physical machine. Specifically, the average annualized failure rate is 2-4%, a figure higher than those quoted in product data sheets. Also, disk failures increase roughly linearly with age, showing no significant infant-mortality effect. The relationship between disk failure and high temperature or utilization is weak; even so, the failure probability remains high, and disk failure is the major cause of failures in storage systems.
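As a rough illustration of what a 2-4% annualized failure rate (AFR) means in practice, the sketch below estimates expected disk failures per year for an assumed fleet size, assuming failures are independent.

```python
# Back-of-the-envelope use of the 2-4% annualized failure rate (AFR)
# quoted above. The fleet size is an assumption for illustration.
def expected_annual_disk_failures(num_disks: int, afr: float) -> float:
    """Expected disk failures per year, assuming independent failures."""
    return num_disks * afr

fleet = 10_000
for afr in (0.02, 0.04):  # the 2-4% range reported in the study
    n = expected_annual_disk_failures(fleet, afr)
    print(f"AFR {afr:.0%}: ~{n:.0f} failures/year")
# AFR 2%: ~200 failures/year
# AFR 4%: ~400 failures/year
```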
HPC Systems
There is a positive correlation between failure rates in HPC systems and the number of processors, the type of workload, and the intensity of the workloads running. Gamma and Weibull distributions are commonly used to capture the distributions of times between failures and of repair times. Studies have also shown that power-related failures, for example, bring about a high probability of a follow-on failure of any kind in HPC systems.
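A minimal sketch of the kind of distribution fitting mentioned above, using SciPy to fit a Weibull distribution to inter-failure times. The data here are synthetic, generated only for illustration.

```python
# Sketch: fitting a Weibull distribution to observed inter-failure times
# (hours), as the HPC literature does. The sample below is synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
inter_failure_hours = rng.weibull(0.8, size=500) * 200.0  # synthetic data

# Fix the location parameter at 0 so only shape and scale are estimated.
shape, loc, scale = stats.weibull_min.fit(inter_failure_hours, floc=0)
print(f"shape={shape:.2f}, scale={scale:.1f} h")

# A shape parameter below 1 indicates a decreasing hazard rate: the longer
# a system has survived since its last failure, the less likely it is to
# fail soon, a common finding for HPC inter-failure times.
```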
Commercial Systems
As opposed to the other systems, commercial systems do not exhibit memoryless failures: for devices like laptops and desktops, the probability of a follow-on failure is higher than that of a random first failure. In a study establishing the failure trends of subsystems in notebooks and desktops, and their dependency on factors such as increasing CPU speed, Nightingale et al. found that overclocking the CPU increases the failure rates of the CPU, memory, and disk.
In commercial servers, failure rates increase with the number of disks. Thus, in predicting failures in commercial servers, there is a need to look at factors such as data center location, manufacturer brand, server age, and configuration; the manufacturer of the server has a significant impact on the failure of different hardware components. It is also worth noting that hardware-related, software-related, and environment-related failures together account for a significant number of crashes in both commercial and HPC systems.
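As one hedged illustration of how such factors might feed a failure predictor, the sketch below trains a simple logistic-regression model on hypothetical server records. The feature names and data are assumptions for illustration, not drawn from the cited studies.

```python
# Sketch: predicting server failure from the factors listed above
# (location, manufacturer, age, configuration). Data are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

servers = pd.DataFrame({
    "location":     ["dc-east", "dc-west", "dc-east", "dc-west"],
    "manufacturer": ["vendorA", "vendorB", "vendorA", "vendorB"],
    "age_years":    [1, 4, 2, 5],
    "num_disks":    [2, 12, 4, 24],
    "failed":       [0, 1, 0, 1],  # label: failed in observation window
})

model = Pipeline([
    # One-hot encode the categorical factors, pass numeric ones through.
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(), ["location", "manufacturer"])],
        remainder="passthrough")),
    ("clf", LogisticRegression()),
])
model.fit(servers.drop(columns="failed"), servers["failed"])
```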
Classification of Host Failures: PMs vs. VMs
Host failures in VMs can be classified as network-related, hardware-related, software-related, power-outage-related, and reboot-related failures. To establish ways of protecting the host from such shortcomings, several facts about how PMs and VMs operate need to be reviewed and understood. First is the failure rate: the number of failures divided by the number of servers. This extends to the random failure probability and the recurrent failure probability. Random failure probability is the number of servers that experience at least one failure relative to the total number of servers. Recurrent failure probability, on the other hand, is the probability that a server fails repeatedly during an observation period. Based on these measures, many studies have shown that PMs have higher failure rates than VMs.
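These definitions translate directly into simple calculations. The sketch below computes the three metrics from hypothetical per-server failure counts; reading "random" as at-least-one failure and "recurrent" as more-than-one is our interpretation of the definitions above.

```python
# Hypothetical per-server failure counts over one observation period.
failures = {"pm-01": 0, "pm-02": 1, "pm-03": 3, "vm-01": 0, "vm-02": 1}

num_servers = len(failures)
total_failures = sum(failures.values())

# Failure rate: number of failures divided by the number of servers.
failure_rate = total_failures / num_servers

# Random failure probability: servers with at least one failure / servers.
random_failure_prob = sum(1 for n in failures.values() if n >= 1) / num_servers

# Recurrent failure probability: servers failing more than once / servers.
recurrent_failure_prob = sum(1 for n in failures.values() if n > 1) / num_servers

print(f"failure rate: {failure_rate:.2f} failures/server")
print(f"random failure probability: {random_failure_prob:.2f}")
print(f"recurrent failure probability: {recurrent_failure_prob:.2f}")
```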
The concept of inter-failure times also matters when protecting the host from failures; determining inter-failure times is vital in designing fault-tolerant systems. Virtual machines and physical machines show similar distributions, though VMs have slightly higher inter-failure times than PMs; thus, VMs have a lower failure rate than PMs. There is also the concept of repair times. Studies have shown that the repair times of PMs are higher than those of VMs, because a significant number of VM failures are caused by unexpected reboots that take a short time to repair. The shortest repair times are seen with power-related failures, while hardware- and network-related failures require long repair times. Software-related failures have moderate repair times, with lower variation than the others. How, then, should the host be protected from these failures when they exceed certain limits?
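A small sketch of the repair-time comparison described here: grouping hypothetical repair records by failure category and reporting the mean repair time per category.

```python
# Sketch: mean repair time per failure category. Records are hypothetical;
# times are in minutes.
from collections import defaultdict
from statistics import mean

repairs = [
    ("power", 12), ("reboot", 5), ("hardware", 240),
    ("network", 180), ("software", 60), ("software", 75),
]

by_category = defaultdict(list)
for category, minutes in repairs:
    by_category[category].append(minutes)

for category, times in sorted(by_category.items()):
    print(f"{category:>8}: mean repair time {mean(times):.0f} min")
```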
Protection of the Host Machine from Failure
The discussion presented in the literature review section lays out the nature of the failures that occur in virtual machines as well as physical machines. Some of these failures can be prevented. Several techniques, supported by empirical studies, are vital in protecting host machines from failure. Those discussed in this paper include the failure prediction technique, the Byzantine fault tolerance algorithm, and scheduling algorithms.
Failure Prediction Technique
Every cloud environment requires reliability. Thus, for the host machine to be protected from failure, there is a need for a fault-tolerance framework that performs duties such as event logging, parallel job management, job monitoring, resource monitoring, and environmental monitoring. This framework would analyze the virtual machine to ascertain its reliability and perform a fault-tolerance service. A fault-tolerance service also requires a...
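A minimal sketch of such a framework's monitoring loop, assuming illustrative metric names and thresholds; the functions below stand in for real event logging and resource/environmental monitoring, rather than any published framework's API.

```python
# Minimal sketch of a fault-tolerance monitoring loop: log events, watch
# resource/environmental metrics, and flag a host whose readings exceed
# configured limits. All names and thresholds are illustrative assumptions.
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ft-monitor")

LIMITS = {"cpu_percent": 95.0, "memory_percent": 90.0, "disk_temp_c": 55.0}

def read_host_metrics() -> dict:
    """Placeholder for real resource and environmental monitoring."""
    return {"cpu_percent": 42.0, "memory_percent": 61.0, "disk_temp_c": 38.0}

def monitor_once() -> bool:
    """Log one round of readings; return True if the host exceeds a limit."""
    metrics = read_host_metrics()
    log.info("host metrics: %s", metrics)             # event logging
    breaches = {k: v for k, v in metrics.items() if v > LIMITS[k]}
    if breaches:
        log.warning("limits exceeded: %s", breaches)  # trigger protection
        return True
    return False

if __name__ == "__main__":
    for _ in range(3):   # in practice this loop would run continuously
        monitor_once()
        time.sleep(1)
```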