Context
A few months ago, I started working on a side project that lets users search for books by taking a picture of a book cover. The main barrier at the beginning was the data quality of the book cover datasets available online, so I created this one.
For example, this dataset can be used for building recommendation and Content Based Image Retrieval (CBIR) systems.
Content
main_dataset.csv
This CSV file contains all meta information for each book in the dataset.
image - URL of the book cover. Use this URL to download the image yourself if needed.
name - Title of a book.
author - Author of a book.
format - Physical format of a book (e.g. paperback).
book_depository_stars - Book's rating found on bookdepository.com (NOTE: Because time passes between scraping and your download of the dataset, this value might differ from the one on the website)
price - Book's price found on bookdepository.com at scraping time (NOTE: Because time passes between scraping and your download of the dataset, this value might differ from the one on the website)
currency - Currency of prices found in the dataset.
old_price - Book's old price (if it exists) found on bookdepository.com (NOTE: Because time passes between scraping and your download of the dataset, this value might differ from the one on the website)
isbn - ISBN of a book.
category - Category of a book found on bookdepository.com
img_paths - Local path of the book's cover (after scraping).
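As a rough illustration of working with the columns listed above, the following sketch reads main_dataset.csv with Python's standard csv module and converts the price fields to numbers. The helper names load_books and parse_price are hypothetical, and the handling of empty or non-numeric prices is an assumption, since the exact formatting of those fields is not described here.

```python
import csv
import os
from typing import Optional


def parse_price(value) -> Optional[float]:
    """Return the value as a float, or None if it is missing or not numeric."""
    try:
        return float(value.strip())
    except (ValueError, AttributeError):
        # Assumption: empty or oddly formatted prices are treated as missing.
        return None


def load_books(csv_path: str) -> list:
    """Read the metadata CSV into a list of dicts, one per book."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["price"] = parse_price(row.get("price"))
        row["old_price"] = parse_price(row.get("old_price"))
    return rows


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the CSV was extracted.
    path = "main_dataset.csv"
    if os.path.exists(path):
        books = load_books(path)
        print(len(books), "books loaded")
```

Keeping the parsing in a small helper makes it easy to adjust if the price fields turn out to use a different format.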
book-covers
This folder contains all book covers in .jpg format, sorted into category-based folders.
The dataset contains 33 classes (book categories), each with close to 1k images, so it is fairly balanced.
NOTE: Extract this data into a folder called dataset so that it matches the paths provided in the main_dataset.csv file.
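To check the class balance after extraction, a minimal sketch like the one below counts the .jpg files in each category folder. The function name count_covers is hypothetical, and the root path in the usage part is an assumption about where the covers were extracted; the code only assumes one subfolder per category directly under the root.

```python
from collections import Counter
from pathlib import Path


def count_covers(root: str) -> Counter:
    """Count .jpg files in each immediate subfolder (one subfolder per category)."""
    counts = Counter()
    for image in Path(root).glob("*/*.jpg"):
        counts[image.parent.name] += 1
    return counts


if __name__ == "__main__":
    # Assumed extraction location; change it to match your local layout.
    root = Path("dataset")
    if root.is_dir():
        for category, n in sorted(count_covers(str(root)).items()):
            print(f"{category}: {n}")
```

If the counts differ a lot from roughly 1k per category, the archive was probably extracted to a different level than the function expects.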
Acknowledgements
All data in this dataset was scraped from https://www.bookdepository.com/. (I am not related to them in any way; it is just a great website 👍)