Building & Optimizing A Movie Recommendation System

Cynthia Fonderson | Nov 3, 2023

The goal of this project is to build a movie recommendation system using the latest data scraped from GroupLens and The Movie Database (TMDb). The GroupLens data was last updated on September 26, 2018. It includes data from 283,228 users between January 9, 1995 and September 26, 2018, and contains 27,753,444 ratings and 1,108,997 tag applications across 58,098 movies.

Data collection

Movie IDs and information were taken from GroupLens and used to scrape movie data from The Movie Database through its API. The scraped data contains information about each movie's name, cast, crew, release year, adult rating, poster, revenue, and runtime, among other fields. Following data cleaning and feature engineering, the dataset had 18 fields, described below:

| Variable | Description |
|----------|-------------|
| id | Movie ID in the TMDb |
| year | Movie release year |
| title | Movie title in English |
| runtime | Movie runtime in minutes |
| collection | Collection name, if applicable |
| genres | Movie genres |
| tagline | Movie tagline |
| overview | Plot overview |
| cast | Names of the first 5 cast members |
| director | Movie director name |
| producer | Movie producer(s) name |
| keywords | Movie’s keywords |
| adult | Movie’s adult rating (bool) |
| prod_comp | Name of production company/ies |
| languages | Languages spoken in the movie |
| popularity | Movie’s popularity on TMDb |
| vote_count | Number of users who rated the movie on TMDb |
| vote_avg | Movie’s average score on TMDb |
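
To make the collection step concrete, here is a minimal Python sketch of how one TMDb movie record could be fetched and flattened into the 18 fields above. It is an illustration under stated assumptions, not the project's actual scraping code. The helper names (`build_url`, `fetch_movie`, `flatten_movie`) are hypothetical, and the response keys (`release_date`, `belongs_to_collection`, `credits`, `spoken_languages`, `vote_average`, and so on) are assumed to follow the TMDb v3 movie details endpoint with `append_to_response=credits,keywords`. You would need your own API key.

```python
import json
import urllib.parse
import urllib.request

# Assumed TMDb v3 movie details endpoint.
TMDB_BASE = "https://api.themoviedb.org/3/movie"


def build_url(tmdb_id, api_key):
    """Build the details URL, asking for credits and keywords in one call."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "append_to_response": "credits,keywords"}
    )
    return f"{TMDB_BASE}/{tmdb_id}?{query}"


def fetch_movie(tmdb_id, api_key):
    """Download and decode one movie record (performs a network request)."""
    with urllib.request.urlopen(build_url(tmdb_id, api_key), timeout=10) as resp:
        return json.load(resp)


def flatten_movie(raw):
    """Map a raw response dict onto the 18 fields listed in the table."""
    credits = raw.get("credits") or {}
    # Keep the first five cast members by billing order.
    cast = sorted(credits.get("cast", []), key=lambda c: c.get("order", 0))[:5]
    crew = credits.get("crew", [])
    directors = [c["name"] for c in crew if c.get("job") == "Director"]
    producers = [c["name"] for c in crew if c.get("job") == "Producer"]
    collection = raw.get("belongs_to_collection") or {}
    release = raw.get("release_date") or ""
    keywords = (raw.get("keywords") or {}).get("keywords", [])
    return {
        "id": raw.get("id"),
        "year": int(release[:4]) if release[:4].isdigit() else None,
        "title": raw.get("title"),
        "runtime": raw.get("runtime"),
        "collection": collection.get("name"),
        "genres": [g["name"] for g in raw.get("genres", [])],
        "tagline": raw.get("tagline"),
        "overview": raw.get("overview"),
        "cast": [c["name"] for c in cast],
        "director": directors[0] if directors else None,
        "producer": producers,
        "keywords": [k["name"] for k in keywords],
        "adult": raw.get("adult"),
        "prod_comp": [p["name"] for p in raw.get("production_companies", [])],
        "languages": [s["name"] for s in raw.get("spoken_languages", [])],
        "popularity": raw.get("popularity"),
        "vote_count": raw.get("vote_count"),
        "vote_avg": raw.get("vote_average"),
    }
```

In this sketch, the TMDb IDs would come from the GroupLens `links.csv` file (its `tmdbId` column), and each one would be passed through `flatten_movie(fetch_movie(tmdb_id, api_key))`. Keeping the network call separate from the flattening step means the field mapping can be checked offline against a saved response.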



Optimization workflow

[Figure: workflow]

Full Project