Most web apps set up a healthcheck: usually an endpoint that tells you whether your databases and websites are up.
I even have a repo ready to go that I install on servers and enable quickly. It's based on another healthcheck middleware, with a few changes to make it easier to use, and it already has endpoints in place for MongoDB, Redis and Elasticsearch if needed.
I run a version of this healthcheck on this blog, extended with checks for the job feed and the PostgreSQL server.
These work great: if one thing goes down, your monitor alerts you.
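At its core, a healthcheck endpoint runs each dependency check and reports a combined status. The sketch below is not the repo mentioned above; it is a minimal, framework-free Python illustration under a few assumptions: each check is a callable that raises when its dependency is down, the check functions are hypothetical stand-ins, and the endpoint answers 200 when everything is up and 503 (Service Unavailable) otherwise, which is a common convention rather than a requirement.

```python
from typing import Callable, Dict, Tuple

# A check returns normally when the dependency is healthy and raises otherwise.
Check = Callable[[], None]


def run_healthchecks(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run every check and return (http_status, body).

    Every check runs even if an earlier one fails, so the body
    shows all dependencies that are down, not just the first one.
    """
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "up"
        except Exception as exc:  # any error counts as "down"
            results[name] = f"down: {exc}"
    healthy = all(value == "up" for value in results.values())
    body = {"status": "ok" if healthy else "fail", "checks": results}
    # 200 when everything is up, 503 (Service Unavailable) otherwise.
    return (200 if healthy else 503), body


if __name__ == "__main__":
    def redis_ok() -> None:
        pass  # hypothetical: a real check would ping Redis here

    def postgres_down() -> None:
        raise ConnectionError("connection refused")  # simulated outage

    status, body = run_healthchecks({"redis": redis_ok, "postgres": postgres_down})
    print(status, body)
```

Running every check instead of stopping at the first failure costs a little time, but it tells you in one request exactly which services are down.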
But what about all the third-party APIs your app uses?
Or if you use a microservice architecture, how do you ensure everything continues working as expected?
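This is where heartbeats come into play. They are a slightly different take on healthchecks: instead of asking whether each service is up, a heartbeat checks that a whole flow still works from start to finish.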
For example, let's look at an e-commerce site that sells spoons:
- User picks an item to purchase
- Item goes to cart
- User logs in (or signs up) and pays for item
- System registers purchase and signals the warehouse and shipping company about the shipment
- Shipment gets picked up
- User gets their new spoon
Several APIs are involved here: the payment processor (maybe Stripe), the shipping processor, and internal APIs such as user management, inventory control, order management and notifications.
In this scenario, what if the shipping processor never gets notified? Or inventory control doesn't update its numbers, and we end up showing more in stock than there actually is?
Nobody likes ordering a sold-out item thinking it'll arrive in a week, and then actually waiting a month or two.
Or worse, ordering an item and the shipping system never getting the pickup request.
So we set up a heartbeat: a simulated order that runs every few hours.
In this case, we can create a special item that never shows up anywhere else, plus a dedicated user account reserved for the heartbeat.
Then your heartbeat user walks through the order system using automated scripts. If any step fails, for example an email never arrives in the heartbeat user's inbox or the order page never shows a successful order, the heartbeat reports it.
You could add extra checks to confirm that the database updates and to verify the responses from the various endpoints. Heartbeats are allowed to be slower than the real order process, since they check everything involved along the way.
The end result: you get a snapshot of your system every few hours and a heads-up whenever something breaks.
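The steps themselves depend on your project, but the runner around them can be sketched. The Python below is a minimal sketch, not a definitive implementation, and it assumes: each step is a hypothetical callable that receives a shared context dict and raises on failure; steps run in order; the first failure stops the run and is reported by step name; and every step is timed so you can spot slow services. `wait_until` polls for asynchronous outcomes such as an email landing in the heartbeat user's inbox. The step names, the in-memory inbox and the timeout values are all placeholders.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# A step receives a shared context dict and raises on failure.
Step = Tuple[str, Callable[[dict], None]]


def wait_until(predicate: Callable[[], bool], timeout: float, interval: float = 1.0) -> None:
    """Poll predicate until it returns True, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


def run_heartbeat(steps: List[Step]) -> Dict[str, object]:
    """Run steps in order, timing each one; stop at the first failure."""
    context: dict = {}
    timings: Dict[str, float] = {}
    failed_step: Optional[str] = None
    error: Optional[str] = None
    for name, step in steps:
        start = time.monotonic()
        try:
            step(context)
        except Exception as exc:
            failed_step, error = name, f"{type(exc).__name__}: {exc}"
            break
        finally:
            # Runs before the break too, so the failing step is also timed.
            timings[name] = time.monotonic() - start
    return {"ok": failed_step is None, "failed_step": failed_step,
            "error": error, "timings": timings}


if __name__ == "__main__":
    inbox: List[str] = []  # hypothetical stand-in for the heartbeat user's mailbox

    def place_order(ctx: dict) -> None:
        ctx["order_id"] = "hb-1"  # hypothetical: would call the order API
        inbox.append("Order hb-1 confirmed")  # simulates the confirmation email

    def confirmation_email(ctx: dict) -> None:
        wait_until(lambda: any(ctx["order_id"] in m for m in inbox), timeout=2, interval=0.1)

    def notify_shipping(ctx: dict) -> None:
        raise RuntimeError("no pickup request registered")  # simulated failure

    report = run_heartbeat([
        ("place_order", place_order),
        ("confirmation_email", confirmation_email),
        ("notify_shipping", notify_shipping),
    ])
    # prints: False notify_shipping RuntimeError: no pickup request registered
    print(report["ok"], report["failed_step"], report["error"])
```

Stopping at the first failure is a deliberate choice here: later steps usually depend on earlier ones, so running them after a failure mostly produces noise. The report can then be fed to whatever monitor you already use for healthchecks.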
Most importantly, clean up once the heartbeat is done. This is also why you should use a dedicated user (or even several): everything the heartbeat creates is easy to identify and remove.
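One way to make cleanup reliable is to run it in a `finally` block, so it happens whether the heartbeat passes or fails, and to delete only records owned by the dedicated heartbeat user. This is a minimal sketch assuming a hypothetical in-memory `orders` list and a hypothetical user id `HEARTBEAT_USER`; in a real system these would be your order store and whichever account you reserve for the heartbeat. Cleanup likely also means restoring the special item's stock count or cancelling test shipments, which the sketch leaves out.

```python
from typing import Callable, Dict, List

# Hypothetical id of the account reserved for heartbeats.
HEARTBEAT_USER = "heartbeat-user"


def cleanup_heartbeat_orders(orders: List[Dict[str, str]]) -> int:
    """Delete only orders placed by the heartbeat user; return how many were removed."""
    before = len(orders)
    orders[:] = [order for order in orders if order["user"] != HEARTBEAT_USER]
    return before - len(orders)


def run_with_cleanup(heartbeat: Callable[[], None], orders: List[Dict[str, str]]) -> None:
    """Run the heartbeat, then clean up whether it passed or raised."""
    try:
        heartbeat()
    finally:
        cleanup_heartbeat_orders(orders)


if __name__ == "__main__":
    orders = [{"id": "1", "user": "alice"}]

    def failing_heartbeat() -> None:
        orders.append({"id": "hb-1", "user": HEARTBEAT_USER})
        raise RuntimeError("payment step failed")  # simulated failure

    try:
        run_with_cleanup(failing_heartbeat, orders)
    except RuntimeError as exc:
        print("heartbeat failed:", exc)  # the failure still reaches the caller
    print(orders)  # the real customer's order is untouched
```

Filtering on the owning user is what makes the dedicated account worth it: the cleanup cannot touch a real customer's data by accident.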
This wasn't a code-heavy post: I shared a repo I use on several projects for healthchecks and mostly explained ways to approach a heartbeat.
It's hard to write a demo heartbeat because the code differs so much from project to project.
So my main recommendation when building a heartbeat is to monitor every step you can. It's the heartbeat of your app, and an unhealthy heartbeat is very dangerous.