The increasing popularity of Cloud computing as an attractive alternative to classic information processing systems has increased the importance of its correct and continuous operation, even in the presence of faulty components. Fault tolerance is a major concern in guaranteeing the availability and reliability of critical services as well as application execution. To minimize the impact of failures on the system and on application execution, failures should be anticipated and handled proactively. Fault tolerance techniques are used to predict these failures and take appropriate action before they actually occur. In this paper, we introduce an innovative, system-level, modular perspective on creating and managing fault tolerance in Clouds. We propose a high-level approach at the SaaS layer that hides the implementation details of the fault tolerance techniques from application developers and users. In particular, the service layer shields the user from the fault tolerance mechanism and does not require knowledge of which fault tolerance techniques are available in the Cloud, which of them is applied, or how they are implemented.


The fault tolerance technique applied uses heartbeat and gossip algorithms to detect whether the application is running correctly. If the application is detected to be down, the proposed work invokes an application recovery mechanism at the SaaS layer, which attempts to start the application, recover it from the failure, or reinstall it, so that users can continue to use it with minimum downtime.
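
The detection and recovery flow described above can be illustrated with a minimal sketch. It is not the implementation proposed in this paper: the names `HeartbeatTable`, `beat`, `merge`, `suspected`, and `recover` are hypothetical, the detector is a simple fixed-timeout check, the gossip step is reduced to keeping the freshest heartbeat timestamp known to either of two monitors, and the ordering of recovery actions (start, then recover, then reinstall) is an assumption. The sketch also assumes that monitors share a roughly synchronized clock.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatTable:
    """Tracks the last heartbeat time seen for each application (hypothetical helper)."""
    timeout: float  # assumed: seconds of silence after which an application is suspected down
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, app_id: str, now: float) -> None:
        # Record a heartbeat received directly from the application,
        # never moving a timestamp backwards.
        self.last_seen[app_id] = max(now, self.last_seen.get(app_id, now))

    def merge(self, other: "HeartbeatTable") -> None:
        # Gossip step: keep the freshest timestamp known to either monitor.
        for app_id, ts in other.last_seen.items():
            if ts > self.last_seen.get(app_id, float("-inf")):
                self.last_seen[app_id] = ts

    def suspected(self, now: float) -> List[str]:
        # An application is suspected down once its last heartbeat is older than the timeout.
        return sorted(a for a, ts in self.last_seen.items() if now - ts > self.timeout)


def recover(app_id: str, actions: List[Callable[[str], bool]]) -> int:
    """Try each recovery action in order; return the index of the first success, or -1."""
    for i, action in enumerate(actions):
        if action(app_id):
            return i
    return -1


# Usage: two monitors exchange heartbeat tables, then recovery is escalated.
monitor_a = HeartbeatTable(timeout=5.0)
monitor_b = HeartbeatTable(timeout=5.0)
monitor_a.beat("crm", now=0.0)
monitor_a.beat("mail", now=0.0)
monitor_b.beat("mail", now=8.0)      # only monitor B hears the later heartbeat
monitor_a.merge(monitor_b)           # gossip brings the fresher timestamp to A
down = monitor_a.suspected(now=10.0)

# Placeholder actions standing in for start, recover, and reinstall.
actions = [lambda app: False, lambda app: True, lambda app: True]
step = recover("crm", actions)
```

In this run, `mail` is not suspected because the gossip exchange delivers its later heartbeat to monitor A, while `crm` exceeds the timeout and is handed to the recovery routine, which stops at the first action that reports success. A production detector would also need to handle clock skew, message loss, and false suspicions, none of which are modeled here.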