Hurry! Book your room fast on Goibibo

Ashwin Sinha · Published in Backstage · Aug 17, 2018


How Goibibo manages its Hotel Inventory Persuasions.

At Goibibo, we strive hard to give our customers an exciting experience and to enable data-driven business. To achieve this we have a dedicated Data team, which consistently builds highly reliable data products that add value to the business. One such product is “Inventory Cache”.

Introduction

With lakhs of hotels on-boarded on Goibibo and tens of room types within each hotel, we can’t hit the database to fetch inventory details on every request. This creates the need for a single, dedicated cache of hotel-room inventory that is extremely reliable and scalable. The cache serves multiple use cases; one of them is Inventory Persuasion: if a hotel’s room inventory falls below a defined threshold, we show the remaining count to the user, creating urgency to book a room in that hotel.

Inventory persuasions using Inventory Cache data to show fast filling rooms
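To illustrate the consumer side of this, here is a minimal Java sketch that reads a hotel’s cached room counts from Redis and decides whether to show the persuasion nudge. The key layout (inventory:hotel:&lt;id&gt;), the field semantics and the threshold value are assumptions for illustration, not the actual schema.

```java
import redis.clients.jedis.Jedis;
import java.util.Map;

public class InventoryPersuasion {

    // Hypothetical threshold below which the "last N rooms left" nudge is shown
    private static final int PERSUASION_THRESHOLD = 3;

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Assumed key layout: one Redis hash per hotel, one field per room type,
            // value = available room count for the searched date
            Map<String, String> roomCounts = jedis.hgetAll("inventory:hotel:12345");

            roomCounts.forEach((roomType, count) -> {
                int available = Integer.parseInt(count);
                if (available > 0 && available < PERSUASION_THRESHOLD) {
                    System.out.printf("Room %s: Hurry! Only %d left%n", roomType, available);
                }
            });
        }
    }
}
```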

Under the hood

It’s Kafka Streams! Yes, the main streaming pipeline behind the show is Kafka Streams. Every change in inventory is captured in Kafka through change data capture (CDC).

Architecture of Inventory Cache pipeline at Goibibo

The flow diagram above represents the high-level architecture of the Inventory Cache.

Change logs in the hotels database are captured by Debezium and sent to Kafka topics. The pipeline consists of three streams: Hotel Details, Room Details and Inventory Details. The Hotels and Rooms streams are persisted into GlobalKTables using Kafka Streams. A GlobalKTable is backed by a log-compacted Kafka topic whose data is stored locally by the streams application in RocksDB.
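As a rough sketch of this step, the Kafka Streams snippet below materializes the Hotels and Rooms topics as GlobalKTables. The topic names and the plain String serdes are assumptions; the real pipeline would consume Debezium’s structured change events.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import java.util.Properties;

public class InventoryCacheTopology {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "inventory-cache");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Debezium change events land on these topics (topic names are assumptions);
        // hotels and rooms are materialized as GlobalKTables backed by local RocksDB stores
        GlobalKTable<String, String> hotels =
                builder.globalTable("cdc.hotels.hotel_details");
        GlobalKTable<String, String> rooms =
                builder.globalTable("cdc.hotels.room_details");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```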

Next, we join the Inventory Details stream with the Hotels and Rooms GlobalKTables to assemble the complete record, and the result is persisted in Redis as hashes keyed by HotelID and CityID. This data also flows back to another Kafka topic, in case some other service wants to use it.
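Continuing the sketch above, the snippet below joins the inventory stream with the two GlobalKTables, writes the enriched record to Redis and echoes it to a downstream topic. The flattened payload format, the id-extraction helpers, the key layout, the topic names and the per-record Jedis connection are all simplifications for illustration.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import redis.clients.jedis.Jedis;

public class InventoryJoin {

    // Continues the topology above: enrich inventory changes with hotel and room details,
    // write the result to Redis, and echo it to a downstream topic (names are assumptions)
    static void buildJoin(StreamsBuilder builder,
                          GlobalKTable<String, String> hotels,
                          GlobalKTable<String, String> rooms) {

        // Assumed flattened inventory payload: "hotelId|roomId|available"
        // (the real pipeline would use structured Debezium records)
        KStream<String, String> inventory =
                builder.stream("cdc.hotels.inventory_details");

        KStream<String, String> enriched = inventory
                .join(hotels,
                      (invKey, invValue) -> hotelIdOf(invValue),
                      (invValue, hotelValue) -> invValue + "|" + hotelValue)
                .join(rooms,
                      (invKey, joined) -> roomIdOf(joined),
                      (joined, roomValue) -> joined + "|" + roomValue);

        enriched.foreach((key, value) -> {
            // Persist the enriched record as a field in the hotel's Redis hash
            // (a pooled Jedis client would be used in practice)
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.hset("inventory:hotel:" + hotelIdOf(value), key, value);
            }
        });

        // Feed the enriched record back to Kafka for other consumers
        enriched.to("inventory-cache-enriched");
    }

    // Placeholder parsers for the assumed flattened payload
    private static String hotelIdOf(String payload) { return payload.split("\\|")[0]; }
    private static String roomIdOf(String payload)  { return payload.split("\\|")[1]; }
}
```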

The Real Problem: Consistency of the Cache

That is only half the story. The biggest challenge here is maintaining the consistency of the cache. If even a single room’s inventory is not reflected correctly to the customer, it can have a severe impact on business and on our reputation. There can be multiple reasons for such inconsistencies, like a Debezium pipeline failure, Kafka topic deletion, etc.

Lambda Architecture and CI Testing for Inventory Cache

To handle this critical piece, we introduced Continuous Integration Testing and a Lambda Architecture.

Continuous Integration (CI) Testing:

Here, two sources of truth are queried hourly for a few random records —

* RedisCache

* HotelsAPI

If there is a significant mismatch in the data, metrics are sent to New Relic, where alert policies are set up to notify the maintainers.
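A hedged sketch of such an hourly check is shown below; the Hotels API endpoint, the sampled hotel ids, the Redis key and field names, and the New Relic metric name are assumptions.

```java
import com.newrelic.api.agent.NewRelic;
import redis.clients.jedis.Jedis;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class CacheConsistencyCheck {

    // Hourly job: compare a few random hotels between the Redis cache and the Hotels API
    public static void main(String[] args) throws Exception {
        List<String> sampleHotelIds = List.of("12345", "67890", "24680");
        int mismatches = 0;

        HttpClient http = HttpClient.newHttpClient();
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            for (String hotelId : sampleHotelIds) {
                String cached = jedis.hget("inventory:hotel:" + hotelId, "total_available");

                HttpRequest request = HttpRequest.newBuilder(
                        URI.create("https://hotels-api.internal/inventory/" + hotelId)).build();
                String fromApi = http.send(request, HttpResponse.BodyHandlers.ofString()).body();

                if (cached == null || !cached.equals(fromApi)) {
                    mismatches++;
                }
            }
        }

        // Report the mismatch count to New Relic; alert policies there notify maintainers
        NewRelic.recordMetric("Custom/InventoryCache/Mismatches", mismatches);
    }
}
```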

The next step of this testing is a corrective measure:

Lambda Architecture:

If CI Testing finds a mismatch, it is obvious that the data in the cache is not consistent. As a corrective measure we incorporated a Lambda Architecture, in which a batch pipeline runs alongside the real-time pipeline for the past hour and feeds data from the database into a separate Kafka topic, which the streams application processes to correct the cache.
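The batch leg could look roughly like the sketch below: re-read the last hour of inventory changes from the database and replay them onto a correction topic for the streams application to consume. The table, column and topic names, and the MySQL connection details, are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

public class HourlyBatchCorrection {

    // Batch leg of the Lambda Architecture: replay the last hour of inventory changes
    // from the database onto a separate correction topic consumed by the streams app
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             Connection db = DriverManager.getConnection("jdbc:mysql://localhost/hotels", "user", "pass");
             PreparedStatement stmt = db.prepareStatement(
                     "SELECT hotel_id, room_id, available FROM inventory " +
                     "WHERE updated_at >= NOW() - INTERVAL 1 HOUR")) {

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    String key = rs.getString("hotel_id") + ":" + rs.getString("room_id");
                    String value = rs.getString("available");
                    // The streams app treats this topic like the CDC topic and overwrites the cache
                    producer.send(new ProducerRecord<>("inventory-correction", key, value));
                }
            }
            producer.flush();
        }
    }
}
```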

Once these pieces are in place, we can be confident about the consistency of our Inventory Cache. This architecture has helped us resolve anomalies without any manual intervention, leading to a win-win situation for both the product and the developers.

Impact

We have seen that when a sold-out hotel suddenly gets a cancellation and we show the “Almost gone! Last 1 room left” persuasion message for that inventory, the average time for it to go back to sold out is 20 minutes. In other words, the last room left sells in roughly 20 minutes on average. This data transparency helps customers take decisions faster and hence serves the business better.

We are solving plenty of interesting problems and would like to work with the best talent out there. Join us, we’re hiring. Email us at jointhecrew@go-mmt.com
