World Happiness Report

A Journey through World Happiness:)

High-level Goal

To create an interactive Shiny app that visualizes and analyzes the World Happiness Report data, enabling users to explore and understand the factors that contribute to happiness worldwide.

Goal and Motivation

Our primary goal is to develop an interactive Shiny application that makes the complex data in the World Happiness Report accessible and engaging for a wide audience. The report offers valuable insight into the well-being and happiness of people in many countries, but its data can be overwhelming and hard to interpret. We aim to simplify this by building an intuitive, user-friendly app that presents the data visually and dynamically. Our motivation is to bridge the gap between this valuable but complex data and the people who want to understand it. We believe an interactive Shiny app for the World Happiness Report can help individuals, researchers, and policymakers explore global happiness, contributing to a better understanding of well-being and, potentially, to informed decisions that improve quality of life around the world.

Introduction

“Are you happy?” It is a straightforward question, yet it has captivated philosophers, artists, and scientists for millennia. In this project we take a tour through the World Happiness Report dataset to discover what makes people happy around the world. This proposal describes how we plan to investigate this dataset, which covers the years 2015 to 2023. Happiness is a universal quest, and we aim to explore its puzzles in a playful and engaging way.

Dataset

# Read the raw CSV file for each year (2015 to 2023) from the data/ folder
df_2015 <- read.csv("data/2015.csv")
df_2016 <- read.csv("data/2016.csv")
df_2017 <- read.csv("data/2017.csv")
df_2018 <- read.csv("data/2018.csv")
df_2019 <- read.csv("data/2019.csv")
df_2020 <- read.csv("data/2020.csv")
df_2021 <- read.csv("data/2021.csv")
df_2022 <- read.csv("data/2022.csv")
df_2023 <- read.csv("data/2023.csv")
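The nine `read.csv()` calls above could also be written as a loop. This is a minimal sketch, assuming the files keep the `data/<year>.csv` naming; `read_happiness_years()` is a hypothetical helper. It passes `check.names = FALSE` so that headers such as `Happiness Rank` keep their spaces instead of becoming `Happiness.Rank`; whether that suits the existing cleaning code depends on which names it expects.

```r
# Hypothetical helper: read each yearly file into a list named by year
read_happiness_years <- function(dir = "data", years = 2015:2023) {
  paths <- file.path(dir, paste0(years, ".csv"))
  # check.names = FALSE keeps headers such as "Happiness Rank" unchanged
  dfs <- lapply(paths, read.csv, check.names = FALSE)
  names(dfs) <- as.character(years)
  dfs
}

# Usage, assuming the data/ folder layout described under Repo Organisation:
# raw_dfs <- read_happiness_years()
# head(raw_dfs[["2015"]])
```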

Dataset Description

We are excited to explore the World Happiness Report dataset for the years 2015 to 2023. It offers a unique view of the factors associated with happiness and well-being around the globe. The details of our chosen dataset are as follows:

The World Happiness Report dataset is sourced from the official World Happiness Report website. The report is published annually by the UN Sustainable Development Solutions Network and describes the state of happiness and well-being across countries. It is compiled and updated each year, which makes it a reliable source for our analysis.

Dimensions

Nine annual files (2015 to 2023), each covering between 149 and 158 countries; combined, they contain 1,368 country-year rows.
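To check these per-year counts on the combined data, a small helper could count the distinct countries in each `Year`. This is a minimal base-R sketch; `countries_per_year()` and the toy data are hypothetical, and it assumes the `Country` and `Year` columns of `happiness_all_years_df`.

```r
# Hypothetical helper: number of distinct countries per year
countries_per_year <- function(df) {
  # unique() guards against a country appearing twice in the same year
  counts <- tapply(df$Country, df$Year, function(x) length(unique(x)))
  data.frame(Year = as.integer(names(counts)), n_countries = as.vector(counts))
}

# Toy data standing in for happiness_all_years_df
toy <- data.frame(Country = c("A", "B", "C", "A", "B"),
                  Year = c(2015, 2015, 2015, 2016, 2016))
countries_per_year(toy)  # Year 2015 has 3 countries, 2016 has 2
```

Applied to `happiness_all_years_df`, the `n_countries` column should stay within the 149 to 158 range if the description above holds.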

Common variables across all years:

Variable name	Data type	Description
Country	Character (chr)	The name of the country for which the happiness metrics are recorded.
Happiness Rank	Integer (int)	The country's rank by happiness score within a given year.
Happiness Score	Double (dbl)	The national average score on the report's life-evaluation question (0 to 10 scale). It is the measured outcome; the six factor columns below help explain it.
GDP per Capita	Double (dbl)	Estimated contribution of GDP per capita (economic output per person) to the happiness score.
Family	Double (dbl)	Estimated contribution of social support (family ties or community support) to the happiness score.
Health (Life Expectancy)	Double (dbl)	Estimated contribution of healthy life expectancy to the happiness score.
Freedom	Double (dbl)	Estimated contribution of the freedom citizens have to make life choices to the happiness score.
Trust (Government Corruption)	Double (dbl)	Estimated contribution of trust in government, measured through perceptions of corruption, to the happiness score.
Generosity	Double (dbl)	Estimated contribution of the generosity of the country's citizens to the happiness score.
Dystopia Residual	Double (dbl)	The score of Dystopia, a hypothetical country with the world's lowest values on all six factors, plus the residual: the part of the happiness score that the six factors do not explain.
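In the 2015 rows shown in the `glimpse()` output below, the six factor columns plus `Dystopia Residual` add up to `Happiness Score` up to rounding. For Switzerland: 1.39651 + 1.34951 + 0.94143 + 0.66557 + 0.41978 + 0.29678 + 2.51738 = 7.58696 ≈ 7.587. A minimal check, assuming the cleaned column names shown there; whether the identity also holds in later years depends on how each year's columns were mapped during cleaning.

```r
# Columns that should add up to the happiness score (cleaned names)
factor_cols <- c("GDP per Capita", "Family", "Health (Life Expectancy)",
                 "Freedom", "Trust (Government Corruption)", "Generosity",
                 "Dystopia Residual")

# Gap between the published score and the sum of its components
decomposition_gap <- function(df) {
  reconstructed <- rowSums(df[, factor_cols])
  df$`Happiness Score` - reconstructed
}

# First three 2015 rows, values copied from the glimpse() output
rows_2015 <- data.frame(
  Country = c("Switzerland", "Iceland", "Denmark"),
  `Happiness Score` = c(7.587, 7.561, 7.527),
  `GDP per Capita` = c(1.39651, 1.30232, 1.32548),
  Family = c(1.34951, 1.40223, 1.36058),
  `Health (Life Expectancy)` = c(0.94143, 0.94784, 0.87464),
  Freedom = c(0.66557, 0.62877, 0.64938),
  `Trust (Government Corruption)` = c(0.41978, 0.14145, 0.48357),
  Generosity = c(0.29678, 0.43630, 0.34139),
  `Dystopia Residual` = c(2.51738, 2.70201, 2.49204),
  check.names = FALSE
)

decomposition_gap(rows_2015)  # every gap is below 0.001 in absolute value
```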

Reason for Choosing this Dataset

The World Happiness Report dataset combines intriguing questions with meaningful insights. By exploring it, we aim to understand the factors associated with happiness and well-being across diverse societies. We chose this dataset for the following reasons:

1. Global Relevance: Happiness is a fundamental aspect of human well-being, and this dataset provides a unique perspective on happiness trends worldwide. Our analysis can offer insights into what makes societies and individuals happy, transcending borders and cultures.

2. Temporal Insights: The dataset covers nine consecutive years (2015 to 2023), which lets us track happiness trends and identify factors associated with changes over time. This temporal dimension adds depth to our analysis.

Cleaned Dataset for All Years (2015-2023)

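library(dplyr)

# cleaned_2015_df through cleaned_2023_df come from the per-year cleaning
# step (not shown here), which is expected to give every year the same column names
# Collect all yearly data frames in a list for easier handling
list_of_dfs <- list(cleaned_2015_df, cleaned_2016_df, cleaned_2017_df, cleaned_2018_df, 
                    cleaned_2019_df, cleaned_2020_df, cleaned_2021_df, cleaned_2022_df, 
                    cleaned_2023_df)

# Stack the yearly data frames into one (columns are matched by name)
happiness_all_years_df <- dplyr::bind_rows(list_of_dfs)

# Check the combined data frame (glimpse() is also provided by dplyr)
glimpse(happiness_all_years_df)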

Rows: 1,368
Columns: 11
$ Country                         <chr> "Switzerland", "Iceland", "Denmark", "…
$ `Happiness Rank`                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
$ `Happiness Score`               <dbl> 7.587, 7.561, 7.527, 7.522, 7.427, 7.4…
$ `GDP per Capita`                <dbl> 1.39651, 1.30232, 1.32548, 1.45900, 1.…
$ Family                          <dbl> 1.34951, 1.40223, 1.36058, 1.33095, 1.…
$ `Health (Life Expectancy)`      <dbl> 0.94143, 0.94784, 0.87464, 0.88521, 0.…
$ Freedom                         <dbl> 0.66557, 0.62877, 0.64938, 0.66973, 0.…
$ `Trust (Government Corruption)` <dbl> 0.41978, 0.14145, 0.48357, 0.36503, 0.…
$ Generosity                      <dbl> 0.29678, 0.43630, 0.34139, 0.34699, 0.…
$ `Dystopia Residual`             <dbl> 2.51738, 2.70201, 2.49204, 2.46531, 2.…
$ Year                            <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 20…

# Save the combined data; row.names = FALSE avoids writing an extra index column
write.csv(happiness_all_years_df, file = "data/happiness_all_years_df.csv", row.names = FALSE)

Questions

How do various factors contribute to the overall happiness score across different countries over the years?
Can we identify patterns of change in happiness scores related to global events such as the COVID-19 pandemic?

Analysis plan

Approach for Question 1

The factors involved are: economy (“GDP per capita”), family (“social support”), health (“healthy life expectancy at birth”), freedom (“freedom to make life choices”), trust (“perceptions of corruption”), and generosity (“generosity”).

Objective: Let users interactively explore how different variables relate to happiness scores through a customized interface. The app will feature three drop-down menus: one for selecting countries, one for the x-axis variable, and one for the y-axis variable.
Visualization: The app will present a customizable scatter plot. Year buttons let users choose a year, so each selected country appears as a single point on the graph. Hovering over a point will show the values of the chosen variables for that country and year.
Insights and Exploration: The goal is to identify factors that are consistently associated with happiness scores across countries and years by letting users visualize and explore the data points dynamically.
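The core of this scatter plot is a filtering step that turns the drop-down and year selections into one point per country. A minimal base-R sketch, assuming the combined data frame's column names; `scatter_data()` is a hypothetical helper, and in the app its arguments would come from Shiny `input` values.

```r
# Hypothetical helper: rows for the selected countries and year, with the
# chosen x and y variables and a hover label for each point
scatter_data <- function(df, countries, year, x_var, y_var) {
  keep <- df$Country %in% countries & df$Year == year
  out <- df[keep, c("Country", x_var, y_var)]
  names(out) <- c("Country", "x", "y")
  # One-line label combining the chosen variables for the tooltip
  out$label <- sprintf("%s (%s): %s = %.3f, %s = %.3f",
                       out$Country, year, x_var, out$x, y_var, out$y)
  rownames(out) <- NULL
  out
}
```

Inside the app, this step would typically sit in a `renderPlot()` call (or a plotly output, whose `text` values could serve as hover labels), so the plot redraws whenever a selection changes.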

Approach for Question 2

Variables Involved:

Year (Data Type: Integer): To identify the annual data points and capture trends over time.

Country (Data Type: Character): For geographical mapping and comparison between nations.

Happiness Score and related factors (Data Type: Double): To analyze changes and correlations.

Variables to be Created:

Pandemic Period (Data Type: Character): A derived variable indicating pre-pandemic, during-pandemic, and post-pandemic periods.

Change in Happiness Score (Data Type: Double): The difference in a country's happiness score from one year to the next.

Average Change per Period (Data Type: Double): The average change in happiness scores for each period (pre, during, post-pandemic).
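
The three derived variables could be computed in base R roughly as follows. This is a sketch under stated assumptions: the period boundaries (`during_years`) are placeholders the team still needs to choose (each report's scores are averaged over several preceding survey years, so mapping a file year to a pandemic period is a judgment call), and `add_pandemic_vars()` and `average_change_per_period()` are hypothetical helper names.

```r
# Adds "Pandemic Period" and "Change in Happiness Score" to the data.
# during_years is a placeholder assumption, not a project decision.
add_pandemic_vars <- function(df, during_years = 2021:2022) {
  df$`Pandemic Period` <- ifelse(df$Year < min(during_years), "pre-pandemic",
                          ifelse(df$Year <= max(during_years), "during-pandemic",
                                 "post-pandemic"))
  # Sort so that differences are taken between consecutive years per country
  df <- df[order(df$Country, df$Year), ]
  # Year-over-year difference within each country (NA for the first year);
  # if a country is missing a year, the difference spans the gap
  df$`Change in Happiness Score` <- ave(df$`Happiness Score`, df$Country,
                                        FUN = function(s) c(NA, diff(s)))
  df
}

# Average change per period, ignoring the NA first-year rows
average_change_per_period <- function(df) {
  tapply(df$`Change in Happiness Score`, df$`Pandemic Period`,
         mean, na.rm = TRUE)
}
```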

Analysis and Visualization:

Trend Analysis: Create line graphs to visualize happiness scores over the years, with a clear demarcation of the pandemic period.

Pandemic Impact: Zoom in on the period of the pandemic to analyze specific shifts or anomalies in happiness trends.

Correlation and Comparison: Use heatmaps to display a correlation matrix between happiness scores and the other variables. Conduct statistical tests (such as t-tests or ANOVA) to compare happiness levels across periods.
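A minimal sketch of both steps in base R: a correlation matrix that can be drawn as a heatmap, and a one-way ANOVA of happiness scores across periods. It assumes a period label column such as the planned Pandemic Period variable; `correlation_matrix()` and `compare_periods()` are hypothetical helper names.

```r
# Pairwise correlations between the selected numeric columns
correlation_matrix <- function(df, vars) {
  cor(df[, vars], use = "pairwise.complete.obs")
}

# One-way ANOVA: do mean happiness scores differ between periods?
compare_periods <- function(df, score_col = "Happiness Score",
                            period_col = "Pandemic Period") {
  d <- data.frame(score = df[[score_col]], period = factor(df[[period_col]]))
  summary(aov(score ~ period, data = d))
}

# A base-R heatmap of the matrix could then be drawn with, for example:
# heatmap(correlation_matrix(happiness_all_years_df, vars), symm = TRUE)
```

With only two periods, `t.test(score ~ period, data = d)` would be the two-group counterpart.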

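Causality and Control: Implement a causal analysis, such as difference-in-differences, to evaluate the pandemic’s impact on happiness while controlling for other variables like GDP per capita. Because the pandemic affected every country, this requires defining comparison groups that differ in their exposure (for example, in policy response).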
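A difference-in-differences estimate can be obtained from a linear regression with an interaction term: the coefficient on `treated:post` measures how much more (or less) the treated group's score changed after the pandemic began than the comparison group's, assuming both groups would otherwise have followed parallel trends. This is a minimal sketch; the column names `score`, `treated`, `post`, and `gdp` are hypothetical, and how countries are assigned to `treated` is an open design question.

```r
# Difference-in-differences via OLS with an interaction term.
# treated: 1 for the (hypothetical) treatment group, 0 otherwise
# post:    1 for pandemic years, 0 for earlier years
# gdp:     control variable (e.g. GDP per capita)
did_estimate <- function(df) {
  fit <- lm(score ~ treated * post + gdp, data = df)
  coef(fit)[["treated:post"]]
}

# Synthetic, noise-free example where the true effect is -0.4
grid <- expand.grid(treated = 0:1, post = 0:1, rep = 1:3)
grid$gdp <- c(1.00, 1.20, 0.90, 1.40, 1.10, 1.30,
              0.80, 1.50, 1.05, 1.25, 0.95, 1.35)
grid$score <- 5 + 0.5 * grid$treated + 0.3 * grid$post -
  0.4 * grid$treated * grid$post + 0.8 * grid$gdp
did_estimate(grid)  # approximately -0.4
```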

Visualizations: Develop visualizations that highlight the onset of the pandemic, using markers or shaded areas on the line graphs. Create scatter plots and box plots to compare happiness against other variables across different timeframes.
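A minimal base-R sketch of the shaded trend line: the mean happiness score per year, with an assumed pandemic window shaded in grey. `plot_trend()` is a hypothetical helper and `during_years` is a placeholder. Because the set of countries differs slightly between years (149 to 158), a yearly mean mixes slightly different samples.

```r
# Line graph of the yearly mean happiness score with a shaded pandemic window
plot_trend <- function(df, during_years = 2021:2022) {
  yearly <- aggregate(df$`Happiness Score`, by = list(Year = df$Year), FUN = mean)
  names(yearly)[2] <- "mean_score"
  # Empty plot first so the shaded band is drawn behind the line
  plot(yearly$Year, yearly$mean_score, type = "n",
       xlab = "Year", ylab = "Mean happiness score")
  usr <- par("usr")  # current plot limits: x1, x2, y1, y2
  rect(min(during_years) - 0.5, usr[3], max(during_years) + 0.5, usr[4],
       col = "grey90", border = NA)
  lines(yearly$Year, yearly$mean_score, type = "b")
  invisible(yearly)
}
```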

Shiny App Features: An intuitive UI that guides users through the data and findings. Reactive visuals that update with user input. Interactive elements such as sliders and drop-downs for custom analyses.

Analysis and Reporting: Note regional happiness shifts and anomalies. Combine statistical results and visual insights. Draw conclusions about the link between the pandemic and happiness scores, taking data limitations into account.

Note:

If we identify global events beyond COVID-19 that have a sustained impact on the world, we will consider incorporating them into our analysis.

Plan of Attack

Week 1: (Oct 16 - Oct 22) [Everyone]

Data Selection and Initial Planning

Tasks:
- Identify and download the World Happiness Report dataset.
- Clean and preprocess the data, removing any inconsistencies or missing values.
- Familiarize ourselves with the dataset’s structure and variables.
- Refine the specific questions we want to address and the visualizations we plan to use.
- Begin sketching a rough layout for the Shiny app.

Week 2: (Oct 23 - Oct 29) [Everyone]

Proposal Writing

Tasks:
- Write a project proposal that includes the project’s goals, objectives, and scope.
- Define the questions we aim to answer with the Shiny application.
- Outline the design and interactivity features we plan to include.
- Create a project timeline with specific milestones.

Week 3: (Oct 30 - Nov 5) [Everyone]

Data Exploration and Visualization Planning

Tasks:
- Conduct exploratory data analysis (EDA) to understand the dataset better.
- Identify interesting patterns, correlations, or outliers in the data.
- Sketch initial ideas for the app’s layout and visualizations.

Week 4: (Nov 6 - Nov 12) [Everyone]

Shiny Application Layout Designing

Tasks:
- Select the tools and libraries to be used for creating the app.
- Plan the layout, including the placement of plots, widgets, and navigation.
- Create a wireframe for the user interface.

Week 5: (Nov 13 - Nov 19) [TBD]

Plot Creation

Tasks:
- Develop the plots and interactive components.
- Ensure that the plots represent the data accurately and are visually appealing.
- Begin integrating interactive elements into the Shiny app.

Week 6: (Nov 20 - Nov 26) [TBD]

Week 7: (Nov 27 - Dec 3) [Everyone]

Testing and Modifications

Tasks:
- Conduct testing to identify and fix any bugs or issues.
- Gather feedback.
- Make improvements based on the feedback received.

Week 8: (Dec 4 - Dec 11) [Everyone]

Finalization and Presentation

Tasks:
- Make any final refinements to the application.
- Complete the write-up or documentation.
- Prepare a presentation for the project.

Repo Organisation

‘_extra/’: Serves as a repository for informal materials, notes, experimental content, and other items not directly related to the project but retained for potential future reference, without undergoing formal review or grading.
‘_freeze/’: Reserved for storing generated files during the build process, representing the static state of the website at a specific point in time.
‘_site/’: Contains the static website files generated when the Quarto document is rendered.
‘.github/’: Folder for organizing GitHub templates and workflows.
‘data/’: Reserved for storing essential data files used in the project, such as input CSV files.
‘images/’: Holds image files used within the project, including generated images.
‘presentation_files/’: Designated folder for managing files related to presentations.