Handling SDK failure

At Candu, we take reliability very seriously. While we strive for the highest availability, we recognise that our systems will sometimes fail.

This document is intended to help you understand the ways in which Candu can fail, and how to handle them so you can provide your users with the best possible experience.

We'll list all our potential points of failure, from the least likely to the most likely. An overview of Candu's client architecture can be found here.

This document only covers potential SDK failures. For a more general guide, please refer to our Performance & Reliability guide.

S3 Failure

Our assets are stored on S3, in EU West 1. S3 offers extremely high availability, up to 99.99%. The only built-in safeguard that Candu offers is caching.

If you want to safeguard against extreme failure, please refer to this guide.

CDN Failure

At Candu, we use S3 and Cloudflare to distribute our assets. You can read more about our CDN and caching strategy here. Cloudflare offers a 100% uptime SLA on its Business plan.

API Failure

The most likely failures are API failures. For SDK APIs, we rely on highly available managed databases such as DynamoDB and MemoryDB to reduce the risk of downtime as much as possible, and to provide extremely fast access times.

In case of failure, though, our SDK handles it gracefully by simply rendering content as if there were no state.

For example, if you have multiple pieces of content but we fail to determine which segment the user belongs to, we assume that they belong to the Everyone segment.

Similarly, forms and other stateful components will appear without state.
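The segment fallback described above can be illustrated with a minimal sketch. This is not Candu's internal implementation: the `Variant` type, the `pickVariant` function and the document IDs are hypothetical names introduced only for illustration. It shows the general idea that when no segment could be resolved, the content for the Everyone segment is selected.

```typescript
// Hypothetical shape of a piece of content targeted at a segment (illustrative only).
type Variant = { segment: string; documentId: string };

// Name of the default segment, as described in this document.
const EVERYONE = "Everyone";

// Picks the variant for the resolved segment. If the segment could not be
// resolved (for example because the API call failed), fall back to Everyone.
function pickVariant(
  variants: Variant[],
  segment: string | undefined,
): Variant | undefined {
  const target = segment ?? EVERYONE;
  return variants.find((variant) => variant.segment === target);
}
```

With this sketch, a failed segment lookup is represented by passing `undefined`, so the caller never needs to handle an error path to get default content.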

Handling extreme failure

At Candu, we want to provide the highest availability to our customers. As such, we have built in ways to make sure that content can still be rendered even if S3 or Cloudflare were to fail, as described below.

The implication of a CDN or API failure is that content would not render. Our SDK handles such failures gracefully, without throwing errors.

SDK failure or extreme lag can be detected by using the DOM API to check if content has been rendered within the desired timeframe.
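One possible way to implement this check is to poll the DOM until either the content appears or a deadline passes. The sketch below is an assumption about how you might do this, not part of Candu's SDK: `waitForContent`, `QueryRoot` and the selector are hypothetical names. It takes the root to query as a parameter, so it works with `document` or with any element that has a `querySelector` method.

```typescript
// Minimal structural type satisfied by `document` and by any DOM element.
interface QueryRoot {
  querySelector(selector: string): unknown;
}

// Resolves to true as soon as an element matching `selector` exists under
// `root`, or to false once `timeoutMs` has elapsed without a match.
function waitForContent(
  root: QueryRoot,
  selector: string,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  return new Promise((resolve) => {
    const deadline = Date.now() + timeoutMs;
    const check = (): void => {
      if (root.querySelector(selector) !== null) {
        resolve(true); // content has been rendered
      } else if (Date.now() >= deadline) {
        resolve(false); // gave up: treat as failure or extreme lag
      } else {
        setTimeout(check, intervalMs); // try again shortly
      }
    };
    check();
  });
}
```

In a browser, this could be called as `waitForContent(document, "#candu-content", 3000)`, where the selector is a placeholder for whatever element you expect your content to render into. Polling is used here for simplicity; a `MutationObserver` would be an alternative that avoids repeated queries.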

If not, it is possible to reinitialise Candu's SDK in a stateless mode, which doesn't rely on any external API or config and simply renders any document that is passed in. At that point, you can load a document from a backup source to provide a fallback experience.
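The overall fallback flow could be wired together roughly as follows. This is a sketch under stated assumptions: the actual call to reinitialise the SDK in stateless mode is not shown in this document, so it is represented by a hypothetical `initStateless` callback that you would implement with the real SDK call. `contentRendered` and `loadBackupDocument` are likewise placeholders for your own render check and backup source.

```typescript
// All three callbacks are hypothetical stand-ins supplied by the integrator.
interface FallbackDeps<Doc> {
  // Resolves to true if content rendered within the desired timeframe.
  contentRendered: () => Promise<boolean>;
  // Loads a document from a backup source you control.
  loadBackupDocument: () => Promise<Doc>;
  // Stand-in for reinitialising the SDK in stateless mode with a document.
  initStateless: (doc: Doc) => void;
}

type Outcome = "primary" | "fallback" | "none";

async function renderWithFallback<Doc>(deps: FallbackDeps<Doc>): Promise<Outcome> {
  if (await deps.contentRendered()) {
    return "primary"; // normal path: nothing else to do
  }
  try {
    const doc = await deps.loadBackupDocument();
    deps.initStateless(doc);
    return "fallback";
  } catch {
    // The backup source failed too: leave the page without this content.
    return "none";
  }
}
```

Returning an outcome rather than throwing keeps the fallback path consistent with the SDK's own behaviour of failing without errors, and lets you log how often the fallback is used.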