Our goal was to create a book recommendation system.
We chose a recommendation system because popular platforms such as Netflix, Amazon, and Spotify use them to keep their users interested and to take the guesswork out of choosing the next show to watch, item to purchase, or song to listen to. A recommendation system can also draw on many different inputs about the user or the item. It seemed like a broad and relevant topic in today's data science field and a good opportunity to practice a number of the skills that we learned over the past several months. As an added bonus, we all enjoy reading!
Google slides: link
SQLite, web scraping and source of our data
Our dataset was sourced from Kaggle. It was large and included most of the categories that we wanted to explore. We started by attempting to merge several of the datasets that we found, hoping to build a larger dataset with more categories. This was unsuccessful because there was not enough overlap between them.
One of the main categories that we were missing was the book's genre. Genre was one of the main ways we hoped to group books for recommendation, so we wrote scraping code that searched Google for each book title and picked out the book's genre from a list of preapproved genres. The scraping code worked, but we ran into HTTP Error 429: Too Many Requests and in the end moved forward with dummy data. We chose SQLite for our database because it suited the size of our dataset and the needs of this project.
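For context on the HTTP 429 issue, one common mitigation is to retry with exponential backoff. This is not something the project did (we fell back to dummy data); it is a minimal sketch under assumptions. The helper name `fetch_with_backoff` and its parameters are hypothetical, and `fetch` stands in for whatever function performs the actual request.

```python
import time
from urllib.error import HTTPError


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on HTTP 429.

    fetch is any zero-argument callable that performs the request
    (hypothetical; the project's scraper is not shown here).
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except HTTPError as err:
            # Re-raise anything that is not a rate limit, or the last failure.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Even with backoff, a search engine may keep rejecting automated queries, so this would reduce the errors rather than guarantee a full scrape.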
Data Source: Goodreads Books.csv sourced by Nilim-Kaggle
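As a rough illustration of the SQLite step, the sketch below loads CSV rows into a SQLite table using only the Python standard library. The table name, the column names (`title`, `average_rating`, `ratings_count`), and the sample rows are assumptions made for the example and may not match the actual Kaggle file or our schema.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = """title,average_rating,ratings_count
Book A,4.6,20000
Book B,4.1,50000
"""


def load_books(conn, csv_file):
    """Create a books table (if needed) and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "title TEXT, average_rating REAL, ratings_count INTEGER)"
    )
    rows = [
        (r["title"], float(r["average_rating"]), int(r["ratings_count"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
count = load_books(conn, io.StringIO(SAMPLE_CSV))
top = conn.execute(
    "SELECT title FROM books ORDER BY average_rating DESC LIMIT 1"
).fetchone()[0]
```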
Preliminary data preprocessing and model choice
For our recommendation system, we built a simple recommender and a content-based recommender.
Simple recommenders: offer generalized recommendations to every user, based on book popularity.
Content-based recommenders: suggest similar items based on a particular item. The general idea is that if a person likes a particular book, he or she will like a book similar to that title.
Tableau Public, HTML and GitPages
We chose to create a webpage hosted by GitPages for our dashboard. Our interactive elements are the top navigation bar and a click-to-reveal button in our results section.
Some of the visualizations were created using Tableau Public.
Website Template: W3.CSS templates-Catering
We were able to create two working machine learning models that could recommend books (from this dataset) to readers. The simple recommender selected the five top-rated books from the dataset based on a weighted rating that combines the rating count and the average rating. Simple, yet useful.
The second recommendation system looked at the dummy data that we provided about a user (age, etc.) and, based on a book that they enjoyed, suggested a similarly rated book in the same genre.
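To make the weighted rating concrete, here is a minimal sketch. It assumes an IMDB-style formula, WR = v/(v+m) * R + m/(v+m) * C, where v is a book's rating count, R its average rating, m a minimum rating count, and C the mean rating across all books. That formula, the field names, the `min_count` threshold, and the sample books are all assumptions for illustration and may differ from what our notebook actually used.

```python
from statistics import mean


def weighted_rating(avg_rating, rating_count, min_count, global_mean):
    """IMDB-style weighted rating (assumed formula, see lead-in)."""
    v, m = rating_count, min_count
    return (v / (v + m)) * avg_rating + (m / (v + m)) * global_mean


def top_n(books, n=5, min_count=100):
    """Return the n highest (score, title) pairs among books with enough ratings."""
    global_mean = mean(b["average_rating"] for b in books)
    eligible = [b for b in books if b["ratings_count"] >= min_count]
    scored = [
        (weighted_rating(b["average_rating"], b["ratings_count"],
                         min_count, global_mean), b["title"])
        for b in eligible
    ]
    return sorted(scored, reverse=True)[:n]


# Hypothetical sample data, not taken from the Goodreads dataset.
books = [
    {"title": "Book A", "average_rating": 4.6, "ratings_count": 20000},
    {"title": "Book B", "average_rating": 5.0, "ratings_count": 3},
    {"title": "Book C", "average_rating": 4.1, "ratings_count": 50000},
    {"title": "Book D", "average_rating": 3.2, "ratings_count": 900},
]
ranking = top_n(books, n=2)
```

The point of weighting is that a book with a perfect average from only a handful of ratings (Book B here) does not outrank widely rated books.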
4.565674 - Harry Potter and the Half-Blood Prince
4.562321 - Harry Potter Boxed Set Books 1-5
4.556183 - Harry Potter and the Prisoner of Azkaban
4.546915 - The Complete Calvin and Hobbes
4.508544 - J.R.R. Tolkien 4-Book Boxed Set
One limitation of this content-based model is that it uses randomly generated attributes, so its output will not be accurate. Another is that the title typed in must exactly match the book title in the list. This could be solved by using a drop-down menu of book titles.
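The same-genre, similar-rating idea behind the content-based recommender can be sketched as follows. This is a simplified illustration rather than our actual model: it ignores the user attributes (age, etc.), and the function name, field names, and sample books are hypothetical. It also requires an exact title match, which mirrors the limitation described above.

```python
def recommend_similar(title, books, n=3):
    """Suggest up to n books in the same genre, closest in average rating."""
    by_title = {b["title"]: b for b in books}
    if title not in by_title:
        # Exact match only: a typo or different casing is not found.
        raise KeyError(f"Title not found: {title!r}")
    liked = by_title[title]
    candidates = [
        b for b in books
        if b["genre"] == liked["genre"] and b["title"] != title
    ]
    # Smallest rating gap first.
    candidates.sort(key=lambda b: abs(b["average_rating"] - liked["average_rating"]))
    return [b["title"] for b in candidates[:n]]


# Hypothetical sample data with made-up genres and ratings.
sample_books = [
    {"title": "Book A", "genre": "Fantasy", "average_rating": 4.5},
    {"title": "Book B", "genre": "Fantasy", "average_rating": 4.4},
    {"title": "Book C", "genre": "Fantasy", "average_rating": 3.9},
    {"title": "Book D", "genre": "Mystery", "average_rating": 4.5},
]
suggestions = recommend_similar("Book A", sample_books, n=2)
```

A drop-down menu populated from the keys of `by_title` would remove the exact-match problem, since the user could only pick titles that exist.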
Yurii Hanley (LinkedIn) (GitHub)
Julianne Itliong (LinkedIn) (GitHub)
Natasha Lamperti (LinkedIn) (GitHub)
Ru Sanjeev (LinkedIn) (GitHub)