The Billboard #1 Track

The Billboard Hot 100 is the music industry standard record chart in the United States for songs, published weekly by Billboard magazine.

Factors used to calculate the Billboard Hot 100 ranking:

  1. Sales (Physical and Digital)
  2. Radio Play
  3. Online Streaming Data

I scraped this data from Wikipedia's list of Billboard #1 tracks from 2010 to 2019: https://en.wikipedia.org/wiki/Billboard_Hot_100

The Spotify Magic

Spotify has been a leader in enabling discovery of new music. The company uses audio analysis models to extract features of each song, such as how danceable it is and how energetic it is. These features help predict which songs a person is likely to love.

Lucky for us, Spotify gives access to their API here https://developer.spotify.com/

API Calls

Spotipy is a sweet Python package that makes it easy to connect to the Spotify Web API

We'll be primarily using two Spotify API endpoints

  1. Search - takes in a search string and outputs matching songs
  2. Audio Features - takes in a song URI and outputs the audio features of the track
In [527]:
import os
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials

# Read the credentials from environment variables instead of hard-coding them in the notebook
cid = os.environ['SPOTIPY_CLIENT_ID']
secret = os.environ['SPOTIPY_CLIENT_SECRET']
client_credentials_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(client_credentials_manager = client_credentials_manager)


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from fbprophet import Prophet
from sklearn.metrics import mean_squared_error

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Loading the Billboard Decade 2010-2019 data

Sourced from Wikipedia

In [571]:
billboard = pd.read_csv('billboardtop.csv', parse_dates=["date"])
billboard.head()
Out[571]:
id date artist song weeks link
0 1041 2015-01-17 Mark Ronson featuring Bruno Mars "Uptown Funk"♪ [70] 14 [71]
1 1042 2015-04-25 Wiz Khalifa featuring Charlie Puth "See You Again" 12 [72]
2 1043 2015-06-06 Taylor Swift featuring Kendrick Lamar "Bad Blood" 1 [73]
3 1044 2015-07-25 Omi "Cheerleader" 6 [74]
4 1045 2015-08-22 The Weeknd "Can't Feel My Face" 3 [75]
In [572]:
#Cleaning up
billboard['song'] = billboard['song'].apply(lambda x: x.split('"')[1])
billboard['aSearch'] = billboard['artist'].apply(lambda x : " ".join(x.split(" ")[:2]))
billboard['year'] = billboard['date'].dt.year
billboard['searcher'] = billboard['song'] + " " + billboard['aSearch']
In [573]:
billboard.head()
Out[573]:
id date artist song weeks link aSearch year searcher
0 1041 2015-01-17 Mark Ronson featuring Bruno Mars Uptown Funk 14 [71] Mark Ronson 2015 Uptown Funk Mark Ronson
1 1042 2015-04-25 Wiz Khalifa featuring Charlie Puth See You Again 12 [72] Wiz Khalifa 2015 See You Again Wiz Khalifa
2 1043 2015-06-06 Taylor Swift featuring Kendrick Lamar Bad Blood 1 [73] Taylor Swift 2015 Bad Blood Taylor Swift
3 1044 2015-07-25 Omi Cheerleader 6 [74] Omi 2015 Cheerleader Omi
4 1045 2015-08-22 The Weeknd Can't Feel My Face 3 [75] The Weeknd 2015 Can't Feel My Face The Weeknd

Using the 'search' API Call to find the songs in our Billboard list - on Spotify

In [ ]:
#Example
sp.search('See You Again')
In [580]:
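def getTrackURI(searcher):
    # URI of the first search hit, or None when nothing matches
    result = sp.search(searcher)
    try:
        obj1 = result['tracks']['items'][0]['uri']
    except (KeyError, IndexError, TypeError):
        obj1 = None
    return obj1

def getArtists(searcher):
    # Comma-separated artist names of the first search hit, or "" when nothing matches
    result = sp.search(searcher)
    try:
        obj2 = result['tracks']['items'][0]['artists']
        obj2 = ",".join(name['name'] for name in obj2)
    except (KeyError, IndexError, TypeError):
        obj2 = ""
    return obj2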



billboard['uri'] = billboard['searcher'].apply(getTrackURI)
billboard['artists'] = billboard['searcher'].apply(getArtists)

billboard = billboard.dropna()
billboard.head()
retrying ...2secs
Out[580]:
id date artist song weeks link aSearch year searcher uri artists
0 1041 2015-01-17 Mark Ronson featuring Bruno Mars Uptown Funk 14 [71] Mark Ronson 2015 Uptown Funk Mark Ronson spotify:track:32OlwWuMpZ6b0aN2RZOeMS Mark Ronson,Bruno Mars
1 1042 2015-04-25 Wiz Khalifa featuring Charlie Puth See You Again 12 [72] Wiz Khalifa 2015 See You Again Wiz Khalifa spotify:track:2JzZzZUQj3Qff7wapcbKjc Wiz Khalifa,Charlie Puth
2 1043 2015-06-06 Taylor Swift featuring Kendrick Lamar Bad Blood 1 [73] Taylor Swift 2015 Bad Blood Taylor Swift spotify:track:273dCMFseLcVsoSWx59IoE Taylor Swift
3 1044 2015-07-25 Omi Cheerleader 6 [74] Omi 2015 Cheerleader Omi spotify:track:023OVLNzXhX0j7CxswUt6D OMI
4 1045 2015-08-22 The Weeknd Can't Feel My Face 3 [75] The Weeknd 2015 Can't Feel My Face The Weeknd spotify:track:22VdIZQfgXJea34mQxlt81 The Weeknd
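
The two helpers above each send their own search request for the same query. As a minimal sketch, one response can be parsed for both fields. The function name `parse_first_track` and the hand-built `fake_result` dictionary are hypothetical; the only structure assumed is the `tracks -> items -> uri / artists -> name` nesting that the cell above already relies on.

```python
def parse_first_track(result):
    """Return (uri, comma-joined artist names) for the first search hit.

    `result` is assumed to be shaped like the dictionary returned by
    sp.search(...) in the cell above. Returns (None, "") when there is no hit.
    """
    try:
        first = result['tracks']['items'][0]
    except (KeyError, IndexError, TypeError):
        return None, ""
    names = [artist['name'] for artist in first.get('artists', [])]
    return first.get('uri'), ",".join(names)


# Hand-built stand-in for a search response (not real API output).
fake_result = {
    'tracks': {
        'items': [
            {'uri': 'spotify:track:EXAMPLE',
             'artists': [{'name': 'Artist A'}, {'name': 'Artist B'}]}
        ]
    }
}

uri, names = parse_first_track(fake_result)
empty = parse_first_track({'tracks': {'items': []}})
```

With a real client, calling `parse_first_track(sp.search(query))` once per song would give both values from a single request.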

Now let's extract the audio features provided by Spotify for these songs

danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.

loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.

speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.

tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

In [582]:
audio_features = []
for uri in billboard['uri'].values:
    audio_features.append(sp.audio_features(uri)[0])
In [583]:
audio_features = pd.DataFrame(audio_features)[['danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
                             'acousticness', 'liveness', 'valence', 'tempo', 'duration_ms', 'time_signature']]
audio_features['duration'] = audio_features['duration_ms']/60000
audio_features.head()
Out[583]:
danceability energy key loudness mode speechiness acousticness liveness valence tempo duration_ms time_signature duration
0 0.856 0.609 0 -7.223 1 0.0824 0.00801 0.0344 0.928 114.988 269667 4 4.494450
1 0.689 0.481 10 -7.503 1 0.0815 0.36900 0.0649 0.283 80.025 229526 4 3.825433
2 0.652 0.802 7 -6.114 1 0.1810 0.08710 0.1480 0.295 170.157 211933 4 3.532217
3 0.780 0.680 4 -6.081 1 0.0305 0.14100 0.1380 0.594 118.026 180560 4 3.009333
4 0.705 0.769 9 -5.526 0 0.0426 0.11200 0.1050 0.590 107.939 213520 4 3.558667
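
The loop above makes one request per track. Spotipy's `audio_features` also accepts a list of tracks (the Spotify endpoint documents a limit of 100 IDs per request), so the calls could be batched. The sketch below keeps the batching logic separate from the client: `chunked` and `fetch_features` are hypothetical helper names, and `fake_fetch` stands in for `sp.audio_features` so the example runs without credentials.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_features(uris, fetch, batch_size=100):
    """Call `fetch` once per batch and flatten the results into one list."""
    features = []
    for batch in chunked(list(uris), batch_size):
        features.extend(fetch(batch))
    return features


# Stand-in for sp.audio_features: records each batch size and echoes the URIs back.
calls = []

def fake_fetch(batch):
    calls.append(len(batch))
    return [{'uri': uri} for uri in batch]


uris = [f"spotify:track:{i}" for i in range(250)]
features = fetch_features(uris, fake_fetch)
```

For the roughly one hundred songs in this dataset, that would mean one or two requests instead of one per song.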

Let's engineer a few more features

Season

In [584]:
billboard['year'] = billboard['date'].dt.year
billboard['month'] = billboard['date'].dt.month

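# Seasons are approximated by calendar quarters: Jan-Mar = Winter, Apr-Jun = Spring, Jul-Sep = Summer, Oct-Dec = Fall
def season(month):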
    if month in [4, 5, 6]:
        return 'Spring'
    elif month in [7, 8, 9]:
        return 'Summer'
    elif month in [10, 11, 12]:
        return 'Fall'
    else:
        return 'Winter'
    
billboard['season'] = billboard['month'].apply(season)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [605]:
billDf = pd.concat([billboard, audio_features], axis=1)[[
    'song', 'id', 'artists', 'date', 'year', 'season', 'artist', 'weeks',
    'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness',
    'acousticness', 'liveness', 'valence', 'tempo', 'duration', 'time_signature', 'uri']]
In [606]:
billDf.head()
Out[606]:
song id artists date year season artist weeks danceability energy key loudness mode speechiness acousticness liveness valence tempo duration time_signature uri
0 Uptown Funk 1041.0 Mark Ronson,Bruno Mars 2015-01-17 2015.0 Winter Mark Ronson featuring Bruno Mars 14.0 0.856 0.609 0.0 -7.223 1.0 0.0824 0.00801 0.0344 0.928 114.988 4.494450 4.0 spotify:track:32OlwWuMpZ6b0aN2RZOeMS
1 See You Again 1042.0 Wiz Khalifa,Charlie Puth 2015-04-25 2015.0 Spring Wiz Khalifa featuring Charlie Puth 12.0 0.689 0.481 10.0 -7.503 1.0 0.0815 0.36900 0.0649 0.283 80.025 3.825433 4.0 spotify:track:2JzZzZUQj3Qff7wapcbKjc
2 Bad Blood 1043.0 Taylor Swift 2015-06-06 2015.0 Spring Taylor Swift featuring Kendrick Lamar 1.0 0.652 0.802 7.0 -6.114 1.0 0.1810 0.08710 0.1480 0.295 170.157 3.532217 4.0 spotify:track:273dCMFseLcVsoSWx59IoE
3 Cheerleader 1044.0 OMI 2015-07-25 2015.0 Summer Omi 6.0 0.780 0.680 4.0 -6.081 1.0 0.0305 0.14100 0.1380 0.594 118.026 3.009333 4.0 spotify:track:023OVLNzXhX0j7CxswUt6D
4 Can't Feel My Face 1045.0 The Weeknd 2015-08-22 2015.0 Summer The Weeknd 3.0 0.705 0.769 9.0 -5.526 0.0 0.0426 0.11200 0.1050 0.590 107.939 3.558667 4.0 spotify:track:22VdIZQfgXJea34mQxlt81
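
One thing to watch in the `pd.concat(..., axis=1)` step: pandas aligns rows by index label, not by position. `billboard` keeps its original labels after `dropna()`, while `audio_features` is built from a list and gets a fresh 0..n-1 index, so any dropped row would shift the two frames out of step and introduce NaN rows (which would also turn integer columns such as `id` into floats). The toy frames below are made up to illustrate this; resetting the index is one possible remedy, not necessarily what the original run needed.

```python
import pandas as pd

# Toy stand-ins: `songs` lost its row 1 to dropna(), `features` has a fresh index.
songs = pd.DataFrame({'song': ['A', 'B', 'C'], 'uri': ['u1', None, 'u3']}).dropna()
features = pd.DataFrame({'energy': [0.5, 0.9]})  # one row per remaining song

# Label-based alignment: label 0 matches, labels 1 and 2 exist on one side only.
misaligned = pd.concat([songs, features], axis=1)

# Positional alignment after resetting the index.
aligned = pd.concat([songs.reset_index(drop=True), features], axis=1)
```

Here `misaligned` ends up with three partly empty rows, while `aligned` has the expected two complete rows.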

A song with a valence (sentiment) of 0.5 or higher can be assumed to be a 'happy' song.

Since acousticness is the confidence that a song is acoustic, we can treat values of 0.5 or above as acoustic tracks.

The documentation says tracks with a speechiness between 0.33 and 0.66 may contain both music and speech, including rap, so we use that range as a rough proxy for rap songs.

In [607]:
billDf['happy'] = billDf['valence'] >= 0.5
billDf['acoustic'] = billDf['acousticness'] >= 0.5
billDf['rap'] = (billDf['speechiness'] >=0.33) & (billDf['speechiness'] <=0.66)
In [608]:
def tidy_split(df, column, sep='|', keep=False):
    indexes = list()
    new_values = list()
    df = df.dropna(subset=[column])
    for i, presplit in enumerate(df[column].astype(str)):
        values = presplit.split(sep)
        if keep and len(values) > 1:
            indexes.append(i)
            new_values.append(presplit)
        for value in values:
            indexes.append(i)
            new_values.append(value)
    new_df = df.iloc[indexes, :].copy()
    new_df[column] = new_values
    return new_df

artists = tidy_split(billDf, 'artists', ',').dropna()
artists.head(6)
Out[608]:
song id artists date year season artist weeks danceability energy key loudness mode speechiness acousticness liveness valence tempo duration time_signature uri happy acoustic rap
0 Uptown Funk 1041.0 Mark Ronson 2015-01-17 2015.0 Winter Mark Ronson featuring Bruno Mars 14.0 0.856 0.609 0.0 -7.223 1.0 0.0824 0.00801 0.0344 0.928 114.988 4.494450 4.0 spotify:track:32OlwWuMpZ6b0aN2RZOeMS True False False
0 Uptown Funk 1041.0 Bruno Mars 2015-01-17 2015.0 Winter Mark Ronson featuring Bruno Mars 14.0 0.856 0.609 0.0 -7.223 1.0 0.0824 0.00801 0.0344 0.928 114.988 4.494450 4.0 spotify:track:32OlwWuMpZ6b0aN2RZOeMS True False False
1 See You Again 1042.0 Wiz Khalifa 2015-04-25 2015.0 Spring Wiz Khalifa featuring Charlie Puth 12.0 0.689 0.481 10.0 -7.503 1.0 0.0815 0.36900 0.0649 0.283 80.025 3.825433 4.0 spotify:track:2JzZzZUQj3Qff7wapcbKjc False False False
1 See You Again 1042.0 Charlie Puth 2015-04-25 2015.0 Spring Wiz Khalifa featuring Charlie Puth 12.0 0.689 0.481 10.0 -7.503 1.0 0.0815 0.36900 0.0649 0.283 80.025 3.825433 4.0 spotify:track:2JzZzZUQj3Qff7wapcbKjc False False False
2 Bad Blood 1043.0 Taylor Swift 2015-06-06 2015.0 Spring Taylor Swift featuring Kendrick Lamar 1.0 0.652 0.802 7.0 -6.114 1.0 0.1810 0.08710 0.1480 0.295 170.157 3.532217 4.0 spotify:track:273dCMFseLcVsoSWx59IoE False False False
3 Cheerleader 1044.0 OMI 2015-07-25 2015.0 Summer Omi 6.0 0.780 0.680 4.0 -6.081 1.0 0.0305 0.14100 0.1380 0.594 118.026 3.009333 4.0 spotify:track:023OVLNzXhX0j7CxswUt6D True False False
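
The same one-row-per-artist reshaping can also be sketched with pandas' built-in `Series.str.split` and `DataFrame.explode` (available since pandas 0.25). The toy frame below is hypothetical and only mirrors the `song` and `artists` columns used above.

```python
import pandas as pd

toy = pd.DataFrame({
    'song': ['Uptown Funk', 'Cheerleader'],
    'artists': ['Mark Ronson,Bruno Mars', 'OMI'],
})

# Split the comma-joined names into lists, then emit one row per list element.
# The original index label is repeated for each artist, as in the output above.
exploded = toy.assign(artists=toy['artists'].str.split(',')).explode('artists')
```

All other columns are copied onto each new row, so later group-bys on `artists` behave the same way as with `tidy_split`.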

Visualization

In [898]:
yearsAcross = billDf.groupby('year')['id'].count()

f, ax = plt.subplots(figsize=(10, 6))
sns.barplot(x=yearsAcross.index, y=yearsAcross.values, palette="rocket")
plt.title('No. of Billboard #1 Tracks Yearly')
plt.xlabel("Year")
plt.ylabel("No. of Tracks")
Out[898]:
Text(0, 0.5, 'No. of Tracks')
In [899]:
len(billDf)
Out[899]:
116

Artist Collaboration

We can see that most of the Billboard #1 tracks were solo songs. A significant number of duets made it to the top of the chart, but tracks with more than two artists are rare.

In [891]:
f, ax = plt.subplots(figsize=(10, 6))
noPeople = artists.groupby('id').count()['artists'].value_counts()
sns.barplot(x=noPeople.index, y=noPeople.values, palette="rocket")
plt.title('Artist Collaboration on a #1 Billboard Track?')
plt.xlabel("No. of Artists on the track")
plt.ylabel("Frequency")
Out[891]:
Text(0, 0.5, 'Frequency')

Who were the Top Artists on the Billboard #1?

We can see here that Katy Perry had a whopping 7 tracks on Billboard #1, followed by a host of other artists (Maroon 5, Justin Bieber, Bruno Mars, Adele, etc.) with 4 tracks each.

In [768]:
f, ax = plt.subplots(figsize=(10, 6))
artistsCounts = artists.groupby('artists')['id'].count().sort_values(ascending=False)[:10]
sns.barplot(x=artistsCounts.index, y=artistsCounts.values)
plt.xticks(rotation=75)
plt.ylabel('No. of Songs in Billboard #1')
plt.xlabel('Artist')
plt.title('Songs in Billboard #1 : 2010-2019')
Out[768]:
Text(0.5, 1.0, 'Songs in Billboard #1 : 2010-2019')

But is Number of #1 Tracks a good measure?

We can see here that Katy Perry had 7 tracks in a short span between 2010 and 2013, but hasn't landed a #1 in the rest of the decade. You could make the claim that, from the perspective of #1 songs, Katy Perry hasn't stayed relevant.

In [893]:
f, ax = plt.subplots(figsize=(10, 6))
katyYears = artists[artists['artists']=="Katy Perry"]['year'].value_counts()
sns.barplot(x=katyYears.index, y=katyYears.values, palette="rocket")
plt.title('Katy Perry #1 Tracks')
plt.ylabel('Count')
Out[893]:
Text(0, 0.5, 'Count')

Artists - Staying Relevant

Lady Gaga tops the chart here, with Billboard #1 songs spanning 8 years. But Maroon 5 and Bruno Mars have significantly more top songs (4 each) spread over 7 years. We could claim that Bruno Mars and Maroon 5 have been the most reliable at delivering chart-topping songs.

Katy Perry has the highest number of top tracks, but they all fall within a three-year span.

Newer artists like Post Malone and Camila Cabello have naturally been on the Billboard #1 for fewer years. But in that short span they've secured 3 top tracks, which suggests they're on the rise.

In [894]:
adf = pd.DataFrame()
timeline = artists.groupby('artists')['year'].max() - artists.groupby('artists')['year'].min()
counts = artists.groupby('artists')['year'].count()
scores = timeline*counts/10

adf['timeline'] = timeline
adf['counts'] = counts
adf['score'] = scores
adf['artists'] = timeline.index

adf = adf.sort_values(by="timeline", ascending=False)

f, ax = plt.subplots(figsize=(15, 6))
sns.barplot(x=adf['artists'][:15], y=adf['timeline'][:15])
plt.plot(adf['artists'][:15], adf['counts'][:15], color="orange", label="No. of Billboard #1 Tracks")
plt.title('How Relevant were the Top Artists?')
plt.ylabel('Number of Years')
plt.xticks(rotation=75)
plt.legend()

for i, txt in enumerate(adf['counts'][:15]):
    ax.annotate(txt, (adf['counts'][:15].index[i], adf['counts'][:15].values[i]))
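
To make the three quantities concrete, here is a small worked example on made-up rows (the artists and years are invented, not taken from the dataset). `timeline` is the gap between an artist's first and last #1 year, `counts` is the number of #1 rows, and `score` is their product divided by 10, as in the cell above.

```python
import pandas as pd

toy = pd.DataFrame({
    'artists': ['X', 'X', 'X', 'Y', 'Y', 'Z'],
    'year':    [2010, 2011, 2017, 2012, 2012, 2019],
})

# One pass over the groups instead of three separate groupby calls.
summary = toy.groupby('artists')['year'].agg(
    first_year='min', last_year='max', counts='count')
summary['timeline'] = summary['last_year'] - summary['first_year']
summary['score'] = summary['timeline'] * summary['counts'] / 10
```

An artist whose hits all fall in one year gets a timeline (and score) of 0 no matter how many hits they had, which is worth keeping in mind when reading the chart.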

Songs and Presidential Elections

Valence is a measure of how happy the song is on a scale of 0 to 1.

Presidential Election cycles often yield interesting results when analyzing time series. Here we can see that the 2016 Presidential Election coincided with a sharp drop in valence, with sadder #1 songs than at any point so far in the decade. I'll leave the interpretation of the results to you.

In [737]:
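# numeric_only=True skips text columns such as 'song', which newer pandas versions refuse to average
billyear = billDf.groupby('year').mean(numeric_only=True).reset_index()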
billyear['year'] = billyear['year'].apply(int)
In [860]:
f, ax = plt.subplots(figsize=(10, 6))
sns.pointplot(x="year", y="valence", data=billyear)
plt.axvline(2, linestyle='--', color='blue')
plt.axvline(6, linestyle='--', color='red')
plt.text(1,0.51, '2012 Presidential Election', color='black', fontsize=8)
plt.text(5,0.51, '2016 Presidential Election', color='black', fontsize=8)
plt.xticks(rotation=70)
plt.title('How Happy were #1 songs during Presidential Election years?')
plt.ylabel('Valence -- Happiness Index')
Out[860]:
Text(0, 0.5, 'Valence -- Happiness Index')
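
`sns.pointplot` treats `year` as a categorical axis, so the vertical lines sit at category positions (0, 1, 2, ...) rather than at year values; that is why 2012 and 2016 appear as 2 and 6 above. Below is a small sketch of deriving those positions instead of hard-coding them, assuming every year from 2010 to 2019 is present in the data (the helper name `year_position` is hypothetical).

```python
def year_position(years, target):
    """Return the categorical-axis position of `target` within sorted `years`."""
    ordered = sorted(set(years))
    return ordered.index(target)


years = list(range(2010, 2020))
pos_2012 = year_position(years, 2012)
pos_2016 = year_position(years, 2016)
```

If a year were missing from the data, the positions would shift, and a lookup like this would keep the election markers on the right category.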

Track Loudness

There's been a strong movement back towards quieter music, with average loudness dropping by about 1.5 dB over the decade. Previous decades saw artists competing with each other on loudness, so as to stand out on radio. Songs that are mastered to be loud often have poor dynamics and are not great for enjoying on good audio gear.

In addition, songs are now mastered to be played on YouTube, Apple Music and Spotify, all of which impose loudness thresholds to maximize audio quality. Thanks to this disincentive, and an increasing consumer preference for audio quality, tracks are becoming less loud.

In [767]:
f, ax = plt.subplots(figsize=(10, 6))
sns.regplot(x="year", y="loudness", data=billyear)

plt.xticks(rotation=70)
plt.title('Loudness across the Years')
plt.ylabel('Loudness dB')
Out[767]:
Text(0, 0.5, 'Loudness dB')

Drop in Energy

Energy, as Spotify measures it, roughly captures how bright and fast a song is: songs with a lot of high frequencies (think of blaring synths, or simply singers shrieking, yikes!) land pretty high on the energy spectrum.

We can see that Energy has been steadily dropping, by over 25% across the decade, and has become less of a priority for music producers. In parallel, acousticness has seen a 100% lift across the decade. People are definitely liking gentler music now.

In [863]:
f, ax = plt.subplots(figsize=(10, 6))
sns.regplot(x="year", y="energy", data=billyear, label="Energy")
sns.regplot(x="year", y="acousticness", data=billyear, label="Acousticness")
plt.legend()
plt.xticks(rotation=70)
plt.title('Gentler music is winning')
plt.ylabel('Metric')
Out[863]:
Text(0, 0.5, 'Metric')
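
A figure such as "over 25%" can be read off the fitted line rather than from the raw endpoints. The sketch below fits a straight line with `np.polyfit` and compares the fitted values at the first and last year. The yearly values are invented for illustration, and `trend_pct_change` is a hypothetical helper; it is not necessarily how the percentages above were obtained.

```python
import numpy as np


def trend_pct_change(x, y):
    """Percent change between the fitted values at min(x) and max(x)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    start = slope * np.min(x) + intercept
    end = slope * np.max(x) + intercept
    return (end - start) / start * 100


# Invented yearly means lying exactly on a line from 0.80 down to 0.60.
years = np.arange(2010, 2020)
energy = np.linspace(0.80, 0.60, len(years))

change = trend_pct_change(years, energy)
```

On these made-up values the result is a 25% drop; with the real `billyear` frame one would pass `billyear['year']` and `billyear['energy']` instead.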

How danceable is a track?

Danceability has seen a significant upward trend across the decade, even as Energy declined. Do people like to dance to tracks that are less energetic? How do we reconcile the two?

Let's look at tracks on both ends of the spectrum -

2010-11: Teenage Dream by Katy Perry and Rude Boy by Rihanna ruled the charts. These are high-energy tracks, with bright frequencies dominating.

2018-19: Old Town Road by Lil Nas X, Sucker by Jonas Brothers, Without Me by Halsey are all very warm and dark tracks (if one could visualize music) - but they are utterly groovy.

Energetic tracks used to be associated with danceability, but this decade has shifted toward groovy and funky tracks over purely energetic ones.

In [889]:
f, ax = plt.subplots(figsize=(12, 7))
sns.regplot(x="year", y="energy", data=billyear, label="Energy")
sns.regplot(x="year", y="danceability", data=billyear, label="Danceability")
plt.legend()
plt.xticks(rotation=70)
plt.title('Energy vs Danceability')
plt.ylabel('Metric')
plt.text(2010,0.6, 'Rude Boy - Rihanna', color='black', fontsize=8)
plt.text(2008,0.7, 'Teenage Dream - Katy Perry', color='black', fontsize=8)
plt.text(2009,0.65, 'Not Afraid - Eminem', color='black', fontsize=8)
plt.text(2019,0.77, 'Sucker - Jonas Brothers', color='black', fontsize=8)
plt.text(2018,0.7, 'Old Town Road - Lil Nas X', color='black', fontsize=8)
plt.text(2018,0.75, 'Without Me - Halsey', color='black', fontsize=8)
Out[889]:
Text(2018, 0.75, 'Without Me - Halsey')
In [862]:
billDf[(billDf.danceability>0.7) & (billDf.energy < 0.6)].sort_values(by="year").dropna()[-5:][['song', 'artist']]
Out[862]:
song artist
39 Sicko Mode Travis Scott
35 I Like It Cardi B, Bad Bunny and J Balvin
40 Without Me Halsey
44 Sucker Jonas Brothers
45 Old Town Road Lil Nas X solo or featuring Billy Ray Cyrus2
In [861]:
billDf[(billDf.danceability>0.7) & (billDf.energy > 0.6)].sort_values(by="year").dropna()[:5][['song', 'artist']]
Out[861]:
song artist
66 Like a G6 Far East Movement featuring The Cataracs and Dev
64 Teenage Dream Katy Perry
61 Not Afraid Eminem
59 Nothin' on You B.o.B featuring Bruno Mars
58 Rude Boy Rihanna

Effect of Seasons on Music Listening

In [826]:
seasonsDf = billDf.groupby('season').agg({'valence':'median', 'danceability':'median', 'happy':'mean', 'duration':'mean'})
# groupby sorts the seasons alphabetically, so map each one to its display order explicitly
seasonsDf['order'] = seasonsDf.index.map({'Winter': 1, 'Spring': 2, 'Summer': 3, 'Fall': 4})
seasonsDf.sort_values(by="order", inplace=True)
# Express each metric as a % deviation from its mean across the four seasons
for col in ['valence', 'danceability', 'happy', 'duration']:
    seasonsDf[col] = (seasonsDf[col] - seasonsDf[col].mean())*100/seasonsDf[col].mean()
seasonsDf
Out[826]:
valence danceability happy duration order
season
Winter 15.769594 1.291248 21.608980 -4.855333 1
Spring -4.438149 -1.147776 -4.760967 1.678389 2
Summer -11.992446 2.582496 -16.055339 0.656996 3
Fall 0.661001 -2.725968 -0.792674 2.519948 4
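
Each column above is rescaled to its percentage deviation from the mean across the four seasons: (value - mean) * 100 / mean. Here is a small worked sketch on made-up numbers (the helper name `pct_from_mean` is hypothetical).

```python
import pandas as pd


def pct_from_mean(series):
    """Express each value as a percentage deviation from the series mean."""
    mean = series.mean()
    return (series - mean) * 100 / mean


# Made-up seasonal medians; their mean is 0.50.
toy = pd.Series({'Winter': 0.60, 'Spring': 0.45, 'Summer': 0.40, 'Fall': 0.55})
deviation = pct_from_mean(toy)
```

Winter comes out at +20% and Summer at -20% here. Because the baseline is the unweighted mean of the four seasonal values, it is not the same as the mean over all songs when seasons have different numbers of #1 tracks.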

People listen to more happy tracks during Winter

The data seems to point to the idea that people like listening to happier songs during Winter. This seems counterintuitive, but from a psychological standpoint people are more likely to be sad during Winter. Music can often function as an antidote: a happy song on a bad day can really elevate the mood. I have had Feels by Calvin Harris on loop to pump myself up and get through the Minnesotan Winter.

In [850]:
f, ax = plt.subplots(figsize=(10, 6))
sns.barplot(x=seasonsDf.index, y=seasonsDf.happy, palette="vlag")
plt.title("Effect of Seasons on Song Happiness")
plt.ylabel('% Increase in Happiness Index')
plt.axhline(0, linestyle='-', color='grey')
plt.text(2.3,0.5, 'Baseline: Mean Happiness', color='black', fontsize=12)
Out[850]:
Text(2.3, 0.5, 'Baseline: Mean Happiness')

People like more 'dancy' songs in Summer

Summer's #1 songs have the highest median danceability of the four seasons (about 2.6% above the seasonal average), and this makes sense.

In [858]:
f, ax = plt.subplots(figsize=(10, 6))
sns.barplot(x=seasonsDf.index, y=seasonsDf.danceability, palette="rocket")
plt.title("Summer has more danceable songs")

plt.ylabel('% Increase in Danceability of Billboard Songs')
plt.axhline(0, linestyle='-', color='grey')
plt.text(2.55,0.1, 'Baseline: Mean Danceability', color='black', fontsize=9)
Out[858]:
Text(2.55, 0.1, 'Baseline: Mean Danceability')