In [15]:
# Libraries for basic data manipulation
import os
import pandas as pd
import numpy as np
# Libraries for the web API requests
import http.client  # import the submodule explicitly; 'import http' alone does not guarantee http.client is loaded
http.client.HTTPConnection.debuglevel = 1  # echo raw HTTP request/response headers for debugging
import json
import requests
# Libraries for the JayDeBeApi (JDBC) connection
import jaydebeapi
import urllib.parse
# Library to measure elapsed time
import time
# Library for loading files into dataframes
import dask.dataframe as dd
# Libraries for tables and graphs
from tabulate import tabulate
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.figure_factory as ff
from IPython.display import Image, display, Markdown
# os.chdir("Your Working Directory")  # Put your working directory here
About the Dataset
The dataset we use for this analysis can be accessed at https://developer.imdb.com/non-commercial-datasets/.
In [7]:
# Specify the path to the image file
image_path = 'About the Dataset.png'
width = 500
# Display the image
Image(filename=image_path, width=width)
To download the dataset:
- Click the link 'https://datasets.imdbws.com/', as shown in the snapshot above.
- Download the files 'name.basics.tsv.gz' and 'title.akas.tsv.gz'.
- Move the downloaded files from your 'Downloads' folder to your working directory.
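The download steps above can also be scripted. The sketch below is an illustrative alternative, not the author's method: it uses Python's standard-library `urllib.request` rather than the `requests` package imported earlier, and the names `BASE_URL`, `FILES`, `build_url` and `download_file` are hypothetical. It assumes the files are still served at https://datasets.imdbws.com/ under the two names listed above.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed base URL and file names, taken from the download instructions
BASE_URL = "https://datasets.imdbws.com/"
FILES = ["name.basics.tsv.gz", "title.akas.tsv.gz"]


def build_url(filename: str) -> str:
    """Join the base URL and a dataset file name (hypothetical helper)."""
    return BASE_URL + filename


def download_file(filename: str, dest_dir: str = ".") -> Path:
    """Stream one .tsv.gz file into dest_dir, skipping it if it already exists."""
    dest = Path(dest_dir) / filename
    if dest.exists():
        return dest
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(build_url(filename)) as response, open(tmp, "wb") as out:
        # copyfileobj copies in chunks, so the whole file is never held in memory
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # rename only after the download has completed
    return dest
```

For example, `for name in FILES: download_file(name)` would fetch both files into the current working directory. Writing to a `.part` file first means an interrupted download is not mistaken for a finished one on the next run.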
In [3]:
# Specify the path to the image file
image_path = 'Datasets_Step2.png'
width = 500
# Display the image
Image(filename=image_path, width=width)
- On Windows, right-click each .tsv.gz file and choose 'Extract All' to decompress it. If your version of Windows does not offer this option for .gz files, a tool such as 7-Zip can extract them.
- On macOS, double-clicking the file should decompress it automatically and create the corresponding .tsv file.
- On Linux, use the gunzip command in a terminal: gunzip filename.tsv.gz (this replaces the .gz file with the decompressed .tsv file).
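As a cross-platform alternative to the operating-system steps above, Python's standard `gzip` and `shutil` modules can decompress the files. This is a minimal sketch: `gunzip_file` is a hypothetical helper name, and unlike the `gunzip` command it keeps the original `.gz` file. The demo at the end runs on a tiny synthetic file, not on real IMDb data.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(gz_path: str) -> Path:
    """Decompress e.g. 'name.basics.tsv.gz' to 'name.basics.tsv' in the same folder."""
    src = Path(gz_path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dest = src.with_suffix("")  # strips only the final '.gz'
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # streams in chunks instead of loading everything
    return dest


# Self-contained demo on a tiny synthetic file (made-up row, not real IMDb data)
demo_dir = Path(tempfile.mkdtemp())
demo_gz = demo_dir / "sample.tsv.gz"
with gzip.open(demo_gz, "wt", encoding="utf-8") as f:
    f.write("nconst\tprimaryName\nnm0000001\tExample Name\n")

demo_tsv = gunzip_file(str(demo_gz))
print(demo_tsv.name)  # sample.tsv
```

Keeping the `.gz` file means the step can be rerun safely; delete it afterwards if disk space matters.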
In [4]:
# Specify the path to the image file
image_path = 'Extract_file.png'
width = 500
# Display the image
Image(filename=image_path, width=width)
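Once the .tsv files are in the working directory, they can be loaded with pandas. The sketch below assumes the format described in IMDb's dataset documentation: tab-separated values with a header row, and the literal `\N` marking a missing field. `load_imdb_tsv` is a hypothetical helper and the two-row demo file is made up. pandas also decompresses `.tsv.gz` files on the fly, so the extraction step is optional for this route.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd


def load_imdb_tsv(path, **kwargs) -> pd.DataFrame:
    """Read an IMDb-style .tsv or .tsv.gz file into a DataFrame (hypothetical helper).

    Assumes tab separators, a header row, and the literal \\N for missing values.
    """
    return pd.read_csv(
        path,
        sep="\t",
        na_values=["\\N"],       # IMDb's null marker
        keep_default_na=False,   # keep strings such as "NA" or "null" (possible real titles) as text
        quoting=csv.QUOTE_NONE,  # treat quote characters inside titles literally
        compression="infer",     # a .gz suffix is decompressed automatically
        **kwargs,
    )


# Demo on a tiny synthetic file (made-up rows, not real IMDb data)
demo_path = Path(tempfile.mkdtemp()) / "sample.tsv"
demo_path.write_text(
    "titleId\tordering\ttitle\tregion\n"
    "tt0000001\t1\tNA\tUS\n"
    "tt0000001\t2\t\"Quoted\" Title\t\\N\n",
    encoding="utf-8",
)
df = load_imdb_tsv(demo_path, dtype={"ordering": "int64"})
print(df)
```

Turning off `keep_default_na` matters here because pandas would otherwise read a title such as "NA" as missing. The same keyword arguments should carry over to `dask.dataframe.read_csv`, which the notebook imports; gzip files cannot be split into blocks, though, so dask needs `blocksize=None` for `.gz` inputs.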