Project overview:

In this project, we designed and implemented an architecture on Google Cloud Platform (GCP) that streamlines the deployment of machine learning models and integrates them into BigQuery. The primary objective was to bridge the gap between data scientists and analysts: analysts can use off-the-shelf machine learning models in BigQuery, their preferred environment, without extensive technical knowledge of machine learning frameworks.

Objectives:

  • Seamless Integration: Allow data analysts to run machine learning models within BigQuery, leveraging its SQL querying capabilities.
  • Scalability: Design an architecture that can handle large datasets and scale to meet the demands of data-intensive tasks.
  • Ease of Use: Create a user-friendly interface that allows data scientists and analysts to interact with machine learning models without the need for extensive coding or machine learning expertise.
  • Model Flexibility: Support a variety of machine learning models and algorithms to cater to different use cases.

Technologies:

Python, Cloud Run, BigQuery, GCR (Google Container Registry), Terraform, GitHub Actions

Key achievements:

The architecture has significantly sped up the process of deploying and serving machine learning models. We currently have 30 models deployed on Google Cloud Run, which demonstrates the scalability and robustness of the solution. The architecture also makes it possible to expose Python functions in BigQuery, including functions that call external APIs. As a practical application, we have already deployed sentiment analysis and text cleaning functions, which analysts use to create word clouds and obtain sentiment scores directly within BigQuery. This flexibility lets analysts perform more sophisticated analyses without leaving their preferred environment.
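
To illustrate how a Python function on Cloud Run can be called from BigQuery, here is a minimal sketch. It assumes the functions are exposed through BigQuery remote functions, which this overview does not state explicitly. With remote functions, BigQuery sends an HTTP POST whose JSON body holds a "calls" array (one list of arguments per row) and expects a JSON response with a "replies" array of the same length. The names clean_text and handle_remote_function_request are hypothetical, and the cleaning logic is a toy stand-in, not the deployed implementation.

```python
import json
import re


def clean_text(text):
    """Toy text cleaner (hypothetical): lowercase, drop punctuation, collapse spaces."""
    if text is None:
        # SQL NULL arrives as JSON null; pass it through unchanged.
        return None
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def handle_remote_function_request(body: str) -> str:
    """Map a BigQuery remote function request body to a response body.

    The request carries one argument list per row under "calls";
    the response must return one result per row under "replies".
    """
    request = json.loads(body)
    replies = [clean_text(*args) for args in request["calls"]]
    return json.dumps({"replies": replies})


if __name__ == "__main__":
    example = json.dumps({"calls": [["Hello, World!"], [None]]})
    print(handle_remote_function_request(example))
```

In a real service, this handler would sit behind an HTTP endpoint on Cloud Run, and a remote function defined in BigQuery would point at that endpoint so that analysts can call it from SQL like any other function.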