Web Scraping with Python: Comparing My Favorite YouTube Channels, Part 1
- Sonu Kothari
- Sep 10, 2023
Introduction
Web scraping is a technique for extracting data from websites so that it can be analyzed. In this project, we will use Python to collect data on some of my favorite YouTube channels and compare them. The goal is to understand how subscribers, views, and the number of videos relate to each channel's popularity.
Project Overview
In this project, we will use the YouTube Data API to gather statistics for the selected channels. We will then analyze the data to compare subscriber counts, view counts, and the number of videos.
Project Setup
Before starting the project, make sure you have installed all the required libraries:
pip install seaborn
pip install pandas
pip install matplotlib
pip install --upgrade google-api-python-client
# Importing the Libraries
from googleapiclient.discovery import build
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# YouTube API Key and Channel IDs
api_key = 'YOUR_API_KEY'
channel_ids = ['UCiT9RITQ9PW6BhXK0y2jaeg',
               'UCLLw7jmFsvfIVaUFsLs8mlQ',
               'UC2UXDak6o7rBm23k3Vv5dww',
               'UCcfngi7_ASuo5jdWX0bNauQ',
               'UCnz-ZXXER4jOvuED5trXfEA',
               'UC7cs8q-gJRlGwj4A8OmCmXg',
               'UCtYLUTtgS3k1Fg4y5tAhLbw']
youtube = build('youtube', 'v3', developerKey=api_key)
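Hard-coding the key is fine for a quick experiment, but it is easy to publish it by accident along with the notebook. Here is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The name YOUTUBE_API_KEY and the helper load_api_key are hypothetical choices for this example, not something the API requires.

```python
import os


def load_api_key(env_var="YOUTUBE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (commented out so the snippet runs even when no key is set):
# api_key = load_api_key()
# youtube = build('youtube', 'v3', developerKey=api_key)
```

Failing early with a clear message is usually easier to debug than an authentication error from the first API request.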
# Function to get channel statistics
def get_channel_stats(youtube, channel_ids):
    """Return one dict of basic statistics per channel."""
    all_data = []
    # A single request covers all channels: the IDs are passed comma-separated.
    request = youtube.channels().list(
        part='snippet,contentDetails,statistics',
        id=','.join(channel_ids)
    )
    response = request.execute()
    for item in response['items']:
        data = dict(Channel_name=item['snippet']['title'],
                    Subscribers=item['statistics']['subscriberCount'],
                    Views=item['statistics']['viewCount'],
                    Total_videos=item['statistics']['videoCount'],
                    # ID of the playlist that holds all of the channel's uploads
                    playlist_id=item['contentDetails']['relatedPlaylists']['uploads'])
        all_data.append(data)
    return all_data
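The function above indexes straight into the response, so a single missing field raises a KeyError and stops the whole run. The sketch below shows a more tolerant way to flatten one item. The helper name parse_channel_item and the sample response are made up for illustration, and the idea that a count such as subscriberCount can be absent (for instance when a channel hides it) is my assumption rather than something tested here.

```python
def parse_channel_item(item):
    """Flatten one channel resource into a row, tolerating missing fields."""
    stats = item.get("statistics", {})
    uploads = (
        item.get("contentDetails", {})
        .get("relatedPlaylists", {})
        .get("uploads")
    )
    return {
        "Channel_name": item.get("snippet", {}).get("title"),
        # Missing counts become None instead of raising a KeyError.
        "Subscribers": stats.get("subscriberCount"),
        "Views": stats.get("viewCount"),
        "Total_videos": stats.get("videoCount"),
        "playlist_id": uploads,
    }


# Made-up response in the shape get_channel_stats reads; not real data.
sample_response = {
    "items": [
        {
            "snippet": {"title": "Example Channel"},
            "statistics": {"viewCount": "1000", "videoCount": "10"},
            "contentDetails": {"relatedPlaylists": {"uploads": "UU_example"}},
        }
    ]
}

rows = [parse_channel_item(item) for item in sample_response.get("items", [])]
print(rows[0]["Subscribers"])  # None, because the sample has no subscriberCount
```

Rows built this way can be passed to pd.DataFrame exactly like the output of get_channel_stats.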
# Get channel statistics
channel_statistics = get_channel_stats(youtube, channel_ids)
channel_data = pd.DataFrame(channel_statistics)
# Print channel_data Dataframe
channel_data

# Checking Data Type
channel_data.dtypes

# Data Cleaning: the counts arrive as strings, so convert them to numbers
channel_data['Subscribers'] = pd.to_numeric(channel_data['Subscribers'])
channel_data['Views'] = pd.to_numeric(channel_data['Views'])
channel_data['Total_videos'] = pd.to_numeric(channel_data['Total_videos'])
channel_data.dtypes
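To see why the conversion matters, here is a small sketch in plain Python with made-up view counts (not taken from the channels above). While the counts are still strings, sorting compares them character by character, so the order does not match their numeric size.

```python
# Made-up counts, stored as strings the way they arrive from the API.
view_counts = ["950", "12000", "87"]

# String comparison goes character by character, so "12000" sorts first.
print(sorted(view_counts))  # ['12000', '87', '950']

# After converting to integers the order is the numeric one.
numeric_counts = [int(v) for v in view_counts]
print(sorted(numeric_counts))  # [87, 950, 12000]
```

The same applies to plots and sums in pandas, which is why the columns are converted first. If a column might contain missing or malformed values, pd.to_numeric also accepts errors='coerce', which turns them into NaN instead of raising an error.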

# Visualizations
sns.set(rc={'figure.figsize':(15,10)})
# Bar plots (plt.show() keeps each chart on its own figure when run as a script)
ax = sns.barplot(x='Channel_name', y='Subscribers', data=channel_data)
plt.title('Subscribers')
plt.show()

ax = sns.barplot(x='Channel_name', y='Views', data=channel_data)
plt.title('Views')
plt.show()

ax = sns.barplot(x='Channel_name', y='Total_videos', data=channel_data)
plt.title('Total Videos')
plt.show()

# Scatter plots
fig, ax = plt.subplots(1,2)
sns.scatterplot(data=channel_data, x='Subscribers', y='Views', ax=ax[0])
sns.scatterplot(data=channel_data, x='Total_videos', y='Views', ax=ax[1])
plt.tight_layout()

# Line plots
fig, ax = plt.subplots(1,2)
sns.lineplot(data=channel_data, x='Subscribers', y='Views', ax=ax[0])
sns.lineplot(data=channel_data, x='Total_videos', y='Subscribers', ax=ax[1])
plt.tight_layout()

Conclusion
This project demonstrates how web scraping with Python, using the YouTube Data API, lets us collect statistics from different channels for comparison.
Visualizing the number of subscribers, views, and videos gives us insight into how popular these channels are. We also found that, among these channels, more videos does not necessarily mean more views or more subscribers. However, channels with more subscribers do tend to have more views.
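Reading a relationship off a scatter plot is subjective, so one way to back up an observation like this is a correlation coefficient. Below is a minimal sketch of the Pearson correlation in plain Python. The numbers are made up for illustration and are not the statistics of the channels in this project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: views grow roughly in step with subscribers.
subscribers = [100, 200, 300, 400]
views = [1000, 2100, 2900, 4200]

r = pearson(subscribers, views)
print(round(r, 3))  # close to 1 means a strong positive linear relationship
```

With the DataFrame from this project, channel_data[['Subscribers', 'Views', 'Total_videos']].corr() should give the same kind of numbers in one call. With only seven channels, any coefficient is a rough indication rather than proof.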
In Part 2, we're diving deeper into the analysis by focusing on a single channel. We'll unravel the performance metrics of each video, offering a fascinating look into their individual successes and much more! Don't miss out on the next installment - it's bound to be an exciting journey through the world of YouTube analytics!
Make sure to use this project in accordance with YouTube's Terms of Service. Have fun analyzing your favorite YouTube channels!
Study Materials
Google Developers. (n.d.). YouTube Data API v3 - Google Developers. Retrieved Aug 15, 2023, from https://developers.google.com/youtube/v3
Seaborn. (n.d.). seaborn.histplot — Seaborn 0.11.2 documentation. Retrieved Aug 17, 2023, from https://seaborn.pydata.org/generated/seaborn.histplot.html
Stack Overflow. (n.d.). Changing width of bars created with catplot or barplot. Retrieved Aug 17, 2023, from https://stackoverflow.com/questions/34888058/changing-width-of-bars-created-with-catplot-or-barplot