As part of the Triton project the client wanted a meta search engine for the real estate home search eco-system. The meta search engine was to be built in a location agnostic manner.
DURATION:
2021 - Continuous
TECHNOLOGIES USED:
Java, Spring Boot, Linux, Microservices, Apache Nutch, Elasticsearch, Web Crawling.
TITLE
As part of the Triton project the client wanted a meta search engine for the real estate home search eco-system. The meta search engine was to be built in a location agnostic manner.
PROBLEM STATEMENT
The biggest problem statement was to be able to build a distributed meta search engine for the real estate home search eco-system, allowing the client to be able to create a portal where property seekers could then come and search for properties which were originally posted across a variety of different property portals.
Another problem statement was the ability to extract structured information from semi-structured or unstructured pages available on the internet, and also to be able to keep up with the changing structures of the different portals which were being supported.
This meant that the platform would have to be
Also the client wanted to build a user interface on top of the normalized and indexed information.
OUR SOLUTION
We chose the following tech stack
1
Java
2
Spring Boot
3
Microservices
4
Elasticsearch - for the search index.
5
Apache Nutch for distributed crawling.
6
MySQL as the RDBMS
7
React for the frontend portal.
We were able to set up a distributed crawler, which would pre-fetch all the properties from 15+ portals, we would then setup extraction pipelines based on this data which would then extract structured data from each of the fetched properties.
This structured data was then finally indexed into Elasticsearch to allow for super fast retrieval.
We built APIs using Spring Boot on top of the Elasticsearch index so created allowing a React user interface to be created on top of the database.
To facilitate ease of maintenance each crawler was further divided into the following sub-modules.
Using the above structure we were able to make a highly modular scalable platform for the real estate meta search engine.
DURATION
The initial build of this project was done in 6 months. Post which this project was handled under our "Managed Services" business unit, hence the arrangement is on a continuous basis.
RESULT
We were able to set up a distributed crawler in place that targeted 15+ real estate portal across 3 geographies. Were able to fetch property information, extra, normalize & index data in a unified format.
HOW LOGICLOOP TECH BECOMES
YOUR UNDUE ADVANTAGE