STACC Recommender System through the lens of architecture
Activities 2016-2017

STACC built its first recommender system back in 2016. It was an e-mail recommender built to help create campaign e-mails by generating personalized recommendations from a selection of campaign products. You can read more about STACC recommender systems from here.

Recommender system

The central interaction point with the system is the dashboard. To access the dashboard, the user has to be in the client’s network. From the dashboard, the user can create a new campaign, choose the desired machine learning model, and specify the campaign's products, target users, and custom business rules. After that, the model training process can be started, and personalized recommendations are generated for every user. After the campaign, a report with the campaign's KPI metrics can be generated to show how good the recommendations were.

Design

The system is developed in Python as a single monolithic Django application. The application reads directly from the client's live database but has no write permission there. The application’s own data was stored in two ways: on the local filesystem and in a SQLite database. The system was covered with extensive logging.

Deployment

The client specifically required the solution to be installed on-premise in their infrastructure. Deployment was somewhat difficult because STACC could only access the system via Remote Desktop. STACC also lacked know-how in CI/CD and automation at that point.

Figure 1. The architecture of STACC’s first recommender system, built in 2016.
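The campaign flow described above (pick products, target users, and business rules, then generate recommendations for every user) can be illustrated with a minimal sketch. This is not STACC's implementation: the `Campaign` class, the `recommend` function, the rule signature, and the `scores` dictionary standing in for the output of a trained model are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical rule signature: (user_id, product_id) -> True if the
# product may be recommended to that user.
BusinessRule = Callable[[str, str], bool]


@dataclass
class Campaign:
    """Hypothetical campaign configuration mirroring the dashboard inputs."""
    name: str
    products: List[str]
    target_users: List[str]
    rules: List[BusinessRule] = field(default_factory=list)


def recommend(campaign: Campaign,
              scores: Dict[str, Dict[str, float]],
              top_n: int = 3) -> Dict[str, List[str]]:
    """Return up to top_n campaign products for every target user.

    scores[user][product] is an assumed stand-in for a trained model's output.
    """
    result: Dict[str, List[str]] = {}
    for user in campaign.target_users:
        user_scores = scores.get(user, {})
        # Keep only the products that pass every business rule for this user.
        allowed = [
            product for product in campaign.products
            if all(rule(user, product) for rule in campaign.rules)
        ]
        # Highest score first; products the model did not score count as 0.0.
        ranked = sorted(allowed,
                        key=lambda product: user_scores.get(product, 0.0),
                        reverse=True)
        result[user] = ranked[:top_n]
    return result
```

A rule here could be as simple as "do not recommend a product the user already owns". Keeping rules as plain functions separate from the scoring makes them easy to add or remove per campaign, which is one possible way to read the "custom business rules" mentioned above.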
Activities 2017-2018

After acquiring a lot of know-how on building a working recommender system, and taking into account all the mistakes made and the potential improvement opportunities, STACC decided to try developing a recommender system for online retailers that provides recommendations in real time in their Magento web stores.

Magento extension & API

The first challenge STACC had to solve was data synchronization. It is well known that one cannot build a well-performing recommender system without data: the more, the better. To receive events in real time, STACC had to build two new components: an API on our side that receives the data and inserts it into the database, and a Magento extension, installed on the client’s side, that sends the live data (in particular, view, add-to-cart, and purchase events).

Elasticsearch (& business rules)

To provide real-time recommendations in clients’ web stores, STACC decided to use Elasticsearch, a distributed, open-source search and analytics engine. STACC inserted all of the clients’ items into Elasticsearch and then ran different queries (including business rules) to find the best items for a given user. After some time, however, we realized that adding, using, and maintaining business rules in the context of Elasticsearch is a fairly painful process.

Model explorer

An internal tool called the model explorer was also developed. It visualizes the recommendations generated for every user. Put simply, it is a tool for checking how the recommendations for a specific user and a specific item would look in the client’s web store.

Monitoring system

As STACC served multiple clients at once, we needed to keep track of them somehow. The client dashboard gave good insight into our system's performance, but we also needed an overview of the recommender systems themselves: errors, versions, hardware load, etc. To tackle this, we built an in-house monitoring system that provided that overview.

Turnkey recommender system

While acquiring more clients, each with their own custom needs, STACC found that maintaining them all separately was very expensive and therefore decided to develop its own product: a turnkey recommender. The key concept of the turnkey recommender was that anyone could register on our dashboard, configure the client-specific details (for example, business rules), and download and install the extension, while everything else would be automated: data synchronization, model training (including business rules), and metrics/report calculation. To comply with the GDPR, clients could also download and delete all the data related to them.

External services

Integration with external data services became necessary. The services we integrated with included, for example, weather forecast data, currency conversion rates, and the product images from our clients. Using weather data, we could improve our models. The currency conversion data is needed because our clients offer different currencies in their web stores. Images are used to visualize recommendations in the model explorer and are also included in some of the models.

Recommender core

As STACC moved to a microservice architecture, many pieces of code could be reused, such as data handlers, models, etc. To distribute and maintain these parts of the code, we built a package and called it Recommender Core. The package was uploaded to our internal infrastructure and could easily be installed in every microservice, becoming part of it.

Airflow

As the project grew bigger and bigger, an internal orchestration service had to be taken into use. At first, STACC tried to implement its own queue system, but it soon became clear that it makes more sense to use an orchestrator that has already been tested and validated by thousands of users. Therefore, we went with Apache Airflow and have never looked back. Airflow is also convenient because it can easily be integrated with Slack, so that every incident is reported directly to a specific channel.

Figure 2. The architecture of STACC’s turnkey recommender in 2018.
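The data synchronization described at the start of this section (a Magento extension sending view, add-to-cart, and purchase events to an API that inserts them into a database) can be sketched roughly as below. This is only an illustration under stated assumptions: the JSON field names (`event_type`, `user_id`, `item_id`), the table layout, and the functions `create_store` and `ingest_event` are hypothetical, and an in-memory SQLite database stands in for whatever database the real API writes to.

```python
import json
import sqlite3
from datetime import datetime, timezone

# The three event types named in the text; the field names are assumptions.
ALLOWED_EVENTS = {"view", "add_to_cart", "purchase"}
REQUIRED_FIELDS = ("event_type", "user_id", "item_id")


def create_store() -> sqlite3.Connection:
    """Create an in-memory table standing in for the real event database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_type TEXT NOT NULL, user_id TEXT NOT NULL, "
        "item_id TEXT NOT NULL, received_at TEXT NOT NULL)"
    )
    return conn


def ingest_event(conn: sqlite3.Connection, raw_body: str) -> bool:
    """Validate one JSON event and insert it; return False if it is rejected."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    # Every required field must be present and a non-empty string.
    for name in REQUIRED_FIELDS:
        value = event.get(name)
        if not isinstance(value, str) or not value:
            return False
    if event["event_type"] not in ALLOWED_EVENTS:
        return False
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (event["event_type"], event["user_id"], event["item_id"],
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return True
```

Rejecting malformed or unknown events at the API boundary keeps the training data clean, since the models downstream are trained on whatever ends up in this table.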
Activities 2019-2020

So far, STACC had been relying on clients’ infrastructure and also heavily on our own on-premise infrastructure. At some point, it became evident that this was no longer feasible: we needed more hardware, had occasional power or internet outages, etc. It was therefore time to take another big step forward and move to the cloud.

Ansible/Terraform

While migrating to the cloud, we discovered that setting up all the infrastructure for different clients (and projects) takes a lot of time and effort. To alleviate the problem, we looked into infrastructure as code and started automating the infrastructure setup and our code deployments. For automation, STACC started off with Ansible, but after some time we found that (at least with AWS) it contained several unfixed bugs and had some performance issues. Therefore, we switched to Terraform, which seems to be a good alternative. After a commit to a specific branch (feature, develop, or master), a corresponding deployment pipeline is triggered in Bitbucket Pipelines that runs the tests (unit, integration, SonarCloud, etc.), sets up the whole infrastructure, and deploys our services.
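The branch-triggered flow above can be pictured as a small lookup from branch to pipeline steps. The sketch below is purely illustrative: the article does not say which environment each branch deploys to, so the `ENVIRONMENTS` mapping, the `feature/` prefix convention, and the step names are all assumptions, and in practice this logic lives in the Bitbucket Pipelines configuration rather than in Python.

```python
from typing import List, Optional

# Assumed mapping from branch type to target environment (not from the article).
ENVIRONMENTS = {"feature": "dev", "develop": "staging", "master": "production"}


def branch_kind(branch: str) -> Optional[str]:
    """Classify a branch name as feature, develop, or master (else None)."""
    if branch in ("develop", "master"):
        return branch
    if branch.startswith("feature/"):  # assumed naming convention
        return "feature"
    return None


def pipeline_steps(branch: str) -> List[str]:
    """Return the ordered steps for a commit on the given branch."""
    kind = branch_kind(branch)
    if kind is None:
        return []  # commits on other branches trigger nothing in this sketch
    env = ENVIRONMENTS[kind]
    return [
        "run tests",
        f"set up infrastructure ({env})",
        f"deploy services ({env})",
    ]
```

The order matters: tests run first so that a failing commit never reaches the infrastructure or deployment steps.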
How could the recommender system benefit your business?

A recommendation system's main purpose is to increase sales by helping people find what they need and by offering a personalized experience, from the visitor’s first click on the website to making a purchase and becoming a loyal client. You can read about the STACC profitability calculator here! If you are interested in the STACC recommendation system, contact us here! However, if you are looking for technical challenges and want to take part in our exciting adventures, join our team!