This study delves into a comprehensive analysis of aircraft crashes in the United States spanning from 1980 to 2022. It focuses on exploring crash locations, timings, consequences, and the influencing factors behind these incidents. And the dataset also comprises of crashes with all kinds of airways like commerical, personal, navy etc,. Leveraging a detailed dataset sourced from the National Transportation Safety Board (NTSB), the research utilizes data visualization techniques and time-series analyses to uncover correlations and trends associated with these aviation mishaps.
Introduction:
The objective of this study is to meticulously examine aircraft crashes’ locations, timings, and consequences during the specified time frame. The research aims to discern correlations, if any, contributing to these incidents and to ascertain whether certain regions are more prone to a higher number of crashes.
The research methodology involves a thorough analysis of the NTSB dataset, employing various data visualization tools and statistical analyses. Specifically, a choropleth is generated to visualize crash frequencies across different regions, while a radial bar plot illustrates crashes during specific flight phases. Additionally, the study investigates crash causes and their correlation with the severity of outcomes through waffle chart to visualize proportions and distribution chart to go over the distrubtion of that categegory over the time. A radar plot is utilized to explore crash occurrences concerning weather conditions and months. And then incorporating the whole visualization into a Shiny App to make it accessible to everyone.
Examining Aircraft Crashes, with a focus on their locations, timings, and consequences
Time series analysis of fatalities, and types of injuries
Approach
We aim to analyze the historical data on fatalities and injuries to understand the trends over the years. To initiate our investigation, we have created two animations. The first animation illustrates the cumulative count of both fatalities and injuries over time, employing the geom_line() function and incorporating a flight image to signify the movement of data points within the plot. In the second animation, we have categorized fatalities based on the severity of injuries, providing a visual representation of how different types of fatalities have evolved over time.
Time Series - Total Fatalities
# Creating Animated Plot for Total FaltalitiesImage <-"images/airplane.png"#Save Image instead of inserting emoji# Labelling the axes and adding title to the plot for Fatalitiesp <-ggplot(flights_ntsb_timeseries,aes(x = event_year,y = total_fatalities)) +geom_line() +geom_image(aes(image = Image),size =0.05) +labs(title ="Yearly Aircraft Crash Fatalities",x ="Year",y ="Total Fatalities") +theme_minimal()# Animating the plotanimated_plot <- p +transition_reveal(event_year) +ease_aes('linear') +shadow_mark()# Animating the plotanimate( animated_plot,nframes =200,width =800,height =600,renderer =gifski_renderer())animated_plot
Findings
There has been a general decrease in the number of total fatalities from 1980 to 2022. A notable spike in fatalities was observed in 2001, attributed to the 9/11 attacks. Post-2001, a significant decline in fatalities was noted. This can be interpreted as more and more safety measures has been taken into consideration as the time passed by which has lead to the decrease in air crashes. There might be some situations where commercial air crashes happen due to new technology being introduced and pilot’s being unable to adapt to it as per this article.
Choropleth map on number of crashes in different regions(US map)
Approach
The approach for creating a choropleth map of flight crashes in different U.S. regions involves data preparation, map generation, and animation. The dataset is filtered for valid latitude and longitude values, and relevant columns are selected. To ensure completeness, unique states and all years are identified. A function, is defined to create maps for each input year, customizing color scales. Maps are saved for each year using a loop.
The animation is created by loading saved map images, joining them into a sequence, and generating the animation. The resulting animation visually represents the temporal evolution of flight crashes across U.S. states.
Choropleth Map - Temporal analysis
# Creating a function to plot a mapgetHexBinMap <-function(input_year) { plot_data <- flights_ntsb_maps |>filter(event_year == input_year)plot_usmap(data = plot_data,values ="total_crashes", color ="black") +theme_void() +scale_fill_gradientn(colors =met.brewer("Hokusai2"),name ="Number of Crashes",limits =c(0, 400) ) +labs(title =sprintf("Flight crashes in US states during %d", input_year)) +theme(legend.position ='right',plot.title =element_text(size =40, hjust =0.5, face ="bold", vjust =-1),legend.text =element_text(size =18),legend.title =element_text(size =20, face ="bold") )}
Findings
Looking at the choropleth map California, Florida, Alaska, Texas has the highest number of crashes. Though the number of crashes has reduced significantly over the time but these states remained at the top in number of crashes. The probable reason might be these states being the east and west border of the Country and consists of more air traffic. And these states also have more air force bases where non-commercial air crashes must have been recorded. Alaska being involved in the list can be related to it’s weather conditions and pilot’s being over confident in those situation as reported in this thread.
Radial Bar Plot
Approach
We plotted the radial bar plot to explore the distribution of flight crashes and associated injuries across different phases of flight. By categorizing flight phases into Landing, Takeoff, Approach, Maneuvering, Climb, and Other this visualization aims to uncover insights into the critical moments during a flight where incidents are more likely to occur. The first graph highlights the count of crashes in each phase, while the second graph focuses on the count of injuries, providing a broad perspective on the safety challenges associated with each phase.
Radial Bar Plot - Total Crashes
# Plotting a Radial Barplot to show the Total Count of Crashes and phase of flight flights_radial_bar_crashes <-ggplot(flights_ntsb_radial,aes(x =fct_rev(flight_phase), y = total_crashes, fill = flight_phase)) +geom_bar(stat ="identity", width =0.8) +geom_text(hjust =1.2, size =3, aes(y =0, label =comma(total_crashes))) +coord_polar(theta ="y") +labs(x =NULL,y =NULL,fill ="Phase of Flight",title ="A Radial View of Total Crashes",subtitle ="as per the phase of flight",caption ="" ) +scale_y_continuous(breaks =seq(0, 27000, by =5000),limits =c(0, 27000) ) +scale_x_discrete(expand =c(0.35, 0)) +scale_fill_frontiers() +theme(legend.position ="bottom",legend.title =element_text(size =12),axis.text =element_blank(),panel.grid.minor =element_blank(),panel.grid.major =element_blank(),plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12) ) +guides(fill =guide_legend(nrow =1,direction ="horizontal",title.position ="top",title.hjust =0.5,label.position ="bottom",label.hjust =1,label.vjust =1,label.theme =element_text(lineheight =0.25, size =11),keywidth =1.5,keyheight =0.5 ) )flights_radial_bar_crashes
Radial Bar Plot - Total Crashes
# remaining - titles, some texts# interactive - plotly# animate - plotly by years or gganimate bar growth
Findings
The radial bar plot offers an intuitive and visually appealing representation of the distribution of crashes and injuries throughout various phases of flight. In the first graph, the bars radiating from the center depict the count of crashes in each phase, allowing for a quick comparison of their frequencies. This visualization enables the identification of phases that might be particularly prone to incidents, guiding further investigation into the contributing factors.
Radial Bar Plot - Total Injuries
# Plotting a Radial Barplot to show the Total Count of Injuries and phase of flight flights_radial_bar_injuries <-ggplot(flights_ntsb_radial,aes(x =fct_rev(flight_phase), y = total_injuries, fill = flight_phase)) +geom_bar(stat ="identity", width =0.8) +geom_text(hjust =1.2, size =3, aes(y =0, label =comma(total_injuries))) +coord_polar(theta ="y") +labs(x =NULL,y =NULL,fill ="Phase of Flight",title ="A Radial View of Total Injuries",subtitle ="as per the phase of flight",caption ="" ) +scale_y_continuous(breaks =seq(0, 13200, by =1000),limits =c(0, 13200) ) +scale_x_discrete(expand =c(0.35, 0)) +scale_fill_frontiers() +theme(legend.position ="bottom",legend.title =element_text(size =12),axis.text =element_blank(),panel.grid.minor =element_blank(),panel.grid.major =element_blank(),plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12) ) +guides(fill =guide_legend(nrow =1,direction ="horizontal",title.position ="top",title.hjust =0.5,label.position ="bottom",label.hjust =1,label.vjust =1,label.theme =element_text(lineheight =0.25, size =11),keywidth =1.5,keyheight =0.5 ) ) flights_radial_bar_injuries
Radial Bar Plot - Total Injuries
# remaining - titles, some texts# interactive - plotly# animate - plotly by years or gganimate bar growth
The second graph, depicting injuries, provides an additional layer of analysis. By comparing the counts of injuries across different flight phases, we can discern whether certain phases are more likely to result in severe consequences. This insight is crucial for understanding the potential risks associated with specific segments of a flight, informing safety measures and protocols.
Analysis of Causes of Crashes
We focused on the text report column of the air crashes data which is probable_cause and we have summarized the text report using hugging face in python which can be found in the folder Probable_Cause. We used the summarized text in exploring the probable causes of these crashes.
Waffle chart
Approach
We prepare data by calculating percentages and determining tile numbers for causes of airplane crashes. It generates a waffle chart using ggplot2, displaying causes represented by ‘✈’ symbols in a grid layout, each tile proportional to the cause’s percentage, and then converts it into an interactive plot using plotly for visualization.
Waffle Chart - Probable Cause Proportion
waffle_data <- count_waffle %>%slice(rep(1:n(), times = number)) %>%mutate(fill_value =ifelse(row_number() <=sum(floored), floored, floored +1))plot_data <-expand.grid(x =0:9, y =0:9) %>%rowwise() |>mutate(index =1+sum(x *10+ y >=cumsum(count_waffle$number)),cause = count_waffle$cause_summary[[index]])# Create a waffle plotp <-ggplot(plot_data, aes(x, y, color = cause, group = cause), show.legend = F) +geom_text(label ="✈",family ='sans',size =8) +scale_color_manual(values =met.brewer("Redon")) +coord_equal() +labs(title ="Waffle chart showing different causes of crashes",colour ='Cause')flight_plot <-ggplotly(p, tooltip =c("color")) |>layout(xaxis =list(showgrid =FALSE,zeroline =FALSE,showticklabels =FALSE,title ="" ),yaxis =list(showgrid =FALSE,zeroline =FALSE,showticklabels =FALSE,title ="" ),showlegend =FALSE )flight_plot
Findings
The waffle chart effectively showcased the distribution of crash causes, emphasizing the prominence of human error(pilot failures) in aviation incidents. The data highlighted the need for enhanced safety measures and training to address the identified causes of crashes.
Density Plot
Approach
We plotted the density plot using ggplot’s geom_density function is to visually analyze the distribution of flight crashes over the years based on their probable causes. By utilizing the probable_cause_flights dataset and focusing on the cause_summary column, this visualization aims to provide insights into the changing patterns and trends of aviation incidents. The x-axis represents the years, offering a chronological perspective, while the y-axis portrays the density of crashes associated with specific causes. Here, we have focused on the attribute Pilot's Failure
This visual representation allows us to identify clusters of high density, indicating periods or years where certain causes were more prevalent. Additionally, it facilitates the detection of outliers or shifts in patterns, enabling a more nuanced exploration of the dataset. Here, we can see that injuries in particular have reduced overtime with the number of Fatal Injuries reducing significantly over the past few decades.
Assessing the Influence of Weather Conditions on Crashes
Radar Plot
Approach
We plot the Radar Plot using Plotly for R to comprehensively assess the influence of weather conditions on flight crashes. Leveraging the flights_ntsb_radar dataset which we derived from the original flights_ntsb dataset and categorizing flight crashes based on Visual Meteorological Conditions (VMC) and Instrument Meteorological Conditions (IMC). This visualization seeks to highlight the varying degrees of impact these conditions have on aviation safety. By layering both datasets on a radar plot, we aim to provide a holistic perspective on how different weather scenarios contribute to flight incidents.
Radar Chart - Weather Relation
# Plotting an Interactive Radar Plot using Plotlyradar_chart <-plot_ly(type ='scatterpolar',fill ='toself',mode ='lines+markers')# first plot tracing of VMC dataradar_chart <- radar_chart |>add_trace(r = radar_trace_r2,theta = radar_theta,name ='VMC' )# second plot tracing of IMC dataradar_chart <- radar_chart |>add_trace(r = radar_trace_r1,theta = radar_theta,name ='IMC' )# plot layout configurationradar_chart <- radar_chart |>layout(polar =list(radialaxis =list(visible = T,range =c(0, 10500) )))radar_chart
Findings
The radar plot serves as an effective means to showcase the multivariate nature of weather conditions and their relationship with flight crashes. Each axis on the radar represents a specific parameter related to aviation safety, such as visibility, cloud cover, wind speed, and temperature. The radar plot allows for the simultaneous comparison of these parameters for VMC and IMC, unveiling patterns and discrepancies in their respective contributions to incidents.
Conclusion
Over the 1980-2020 period, aviation has witnessed a significant decrease in the number of crashes and fatalities. Notably, the events of 9/11 in 2001 led to a spike in fatalities, but overall trends show a consistent decline. The aviation industry’s implementation of stringent rules has played a crucial role in enhancing safety. With California, Alaska, Texas, and Florida experiencing higher frequencies. In addition to human factors and mechanical issues, weather patterns also played a role in aviation incidents. A radar plot revealed distinct correlations between specific weather conditions and increased crash occurrences in certain months.
This collective progress signifies a positive shift towards safer air travel. With fatalities and crashes decreasing over time, the aviation industry is advancing towards a safer and more efficient mode of transport. These improvements align with the industry’s goal of providing faster and safer global travel, with flights covering vast distances in just a couple of hours.
---title: "From Takeoff to Touchdown: Dissecting Data on Air Disasters"subtitle: "INFO 526 - Project Final"author: - name: "Infographic Innovators - Antonio, Bharath, Eshaan, Thanoosha" affiliations: - name: "School of Information, University of Arizona"description: "A shiny app integration with aircraft crash analysis"format: html: code-tools: true code-overflow: wrap embed-resources: trueeditor: visualexecute: warning: false echo: false---## Abstract:This study delves into a comprehensive analysis of aircraft crashes in the United States spanning from 1980 to 2022. It focuses on exploring crash locations, timings, consequences, and the influencing factors behind these incidents. And the dataset also comprises of crashes with all kinds of airways like commerical, personal, navy etc,. Leveraging a detailed dataset sourced from the National Transportation Safety Board (NTSB), the research utilizes data visualization techniques and time-series analyses to uncover correlations and trends associated with these aviation mishaps.## Introduction:The objective of this study is to meticulously examine aircraft crashes' locations, timings, and consequences during the specified time frame. The research aims to discern correlations, if any, contributing to these incidents and to ascertain whether certain regions are more prone to a higher number of crashes.The research methodology involves a thorough analysis of the NTSB dataset, employing various data visualization tools and statistical analyses. Specifically, a choropleth is generated to visualize crash frequencies across different regions, while a radial bar plot illustrates crashes during specific flight phases. Additionally, the study investigates crash causes and their correlation with the severity of outcomes through waffle chart to visualize proportions and distribution chart to go over the distrubtion of that categegory over the time. A radar plot is utilized to explore crash occurrences concerning weather conditions and months. And then incorporating the whole visualization into a [Shiny App](https://99javc-velamala-bharath.shinyapps.io/fromtakeofftotouchdown/) to make it accessible to everyone.```{r load_packages, message=FALSE, include=FALSE}# GETTING THE LIBRARIESif (!require(pacman))install.packages(pacman)pacman::p_load(tidyverse, dplyr, janitor, dlookr, here, ggpubr, maps, plotly, gganimate, MetBrewer, ggsci, scales, fmsb, gifski, ggimage, ggtext, emojifont, magick, lubridate, patchwork, viridis, usmap, sf, sp, animation)pacman::p_load_gh("BlakeRMills/MoMAColors")``````{r ggplot_setup, message=FALSE, include=FALSE}# setting theme for ggplot2ggplot2::theme_set(ggplot2::theme_minimal(base_size =14, base_family ="sans"))# setting width of code outputoptions(width =65)# setting figure parameters for knitrknitr::opts_chunk$set(fig.width =8, # 8" widthfig.asp =0.618, # the golden ratiofig.retina =1, # dpi multiplier for displaying HTML output on retinafig.align ="center", # center align figuresdpi =200, # higher dpi, sharper imagemessage =FALSE)``````{r load_dataset, include=FALSE}# Reading the data using read_csvflights_ntsb <-read_csv(here("data", "flight_crash_data_NTSB.csv"))probable_cause_flights <-read_csv(here("data","new_flights_PC.csv"))``````{r remove_columns, include=FALSE}# selecting columns which are required for our analysisflights_ntsb <- flights_ntsb |>select( EventType, EventDate, City, State, ReportType, HighestInjuryLevel, FatalInjuryCount, SeriousInjuryCount, MinorInjuryCount, ProbableCause, Latitude, Longitude, AirCraftCategory, NumberOfEngines, AirCraftDamage, WeatherCondition ) |># cleaning column names using janitor packageclean_names()probable_cause_flights["CauseSummary"]<-probable_cause_flights["Probable_Cause"]#removing null valuesprobable_cause_flights <-subset(probable_cause_flights, probable_cause_flights$HighestInjuryLevel!="")probable_cause_flights <- probable_cause_flights |>select( EventType, EventDate, City, State, ReportType, HighestInjuryLevel, FatalInjuryCount, SeriousInjuryCount, MinorInjuryCount, ProbableCause, Latitude, Longitude, AirCraftCategory, NumberOfEngines, AirCraftDamage, WeatherCondition, CauseSummary ) |># cleaning column names using janitor packageclean_names()``````{r data_wrangle, include=FALSE}# Modifying the data using mutate()flights_ntsb <- flights_ntsb |># getting event time from the date time columnmutate(event_time =format(event_date, "%H:%M"),# places the column after event_date.after = event_date) |>mutate(# getting event date from the date time columnevent_date =as.Date(event_date),# extracting flight phase based on NTSB's designated terminologyflight_phase =case_when(grepl("Landing", probable_cause, ignore.case =TRUE) ~"Landing",grepl("Stop", probable_cause, ignore.case =TRUE) ~"Landing",grepl("Approach", probable_cause, ignore.case =TRUE) ~"Approach",grepl("Takeoff", probable_cause, ignore.case =TRUE) ~"Takeoff",grepl("Take-off", probable_cause, ignore.case =TRUE) ~"Takeoff",grepl("Maneuvering", probable_cause, ignore.case =TRUE) ~"Maneuvering",grepl("Climb", probable_cause, ignore.case =TRUE) ~"Climb",grepl("Descent", probable_cause, ignore.case =TRUE) ~"Descent",grepl("Taxi", probable_cause, ignore.case =TRUE) ~"Taxi",grepl("Cruise", probable_cause, ignore.case =TRUE) ~"Cruise",grepl("Hover", probable_cause, ignore.case =TRUE) ~"Hover",grepl("Standing", probable_cause, ignore.case =TRUE) ~"Standing",grepl("Uncontrolled Descent", probable_cause, ignore.case =TRUE) ~"Uncontrolled Descent",grepl("Emergency", probable_cause, ignore.case =TRUE) ~"Emergency",grepl("Holding", probable_cause, ignore.case =TRUE) ~"Holding", ),# places the flight phase column after probable cause.after = probable_cause ) |># getting year and month of the eventsmutate(event_year =year(event_date),event_month =month(event_date) )```## Examining Aircraft Crashes, with a focus on their locations, timings, and consequences### Time series analysis of fatalities, and types of injuries**Approach**We aim to analyze the historical data on fatalities and injuries to understand the trends over the years. To initiate our investigation, we have created two animations. The first animation illustrates the cumulative count of both fatalities and injuries over time, employing the `geom_line()` function and incorporating a flight image to signify the movement of data points within the plot. In the second animation, we have categorized fatalities based on the severity of injuries, providing a visual representation of how different types of fatalities have evolved over time.```{r timeseries_data, include=FALSE}# Creating a dataset for the timeSeries plotflights_ntsb_timeseries <- flights_ntsb |>group_by(event_year) |>summarise(total_fatalities =sum(fatal_injury_count, na.rm =TRUE),total_serious_injuries =sum(serious_injury_count, na.rm =TRUE),total_minor_injuries =sum(minor_injury_count, na.rm =TRUE) )#time series plot ----tp <-ggplot(flights_ntsb_timeseries,aes(x = event_year, y = total_fatalities)) +geom_line() +labs(title ="Yearly Aircraft Crash Fatalities",x ="Year",y ="Total Fatalities") +theme_minimal()# Interactive plot with ploty---- total fatalitiesp <-ggplot(flights_ntsb_timeseries,aes(x = event_year, y = total_fatalities)) +geom_line() +geom_text(aes(label ="✈️"),vjust =-0.5,hjust =0.5,size =5) +labs(title ="Yearly Aircraft Crash Fatalities",x ="Year",y ="Total Fatalities") +theme(plot.title =element_text(size =20),axis.text =element_text(size =12),axis.title =element_text(hjust =0, size =12))ggplotly(p)``````{r time_series_gif, echo=TRUE}#| code-fold: true#| code-summary: "Time Series - Total Fatalities"#| fig-alt: "Time series analysis of Total Fatalaties due to air crashed over the year which was visualized using an animation where a flight image is used to signfiy the transition."# Creating Animated Plot for Total FaltalitiesImage <-"images/airplane.png"#Save Image instead of inserting emoji# Labelling the axes and adding title to the plot for Fatalitiesp <-ggplot(flights_ntsb_timeseries,aes(x = event_year,y = total_fatalities)) +geom_line() +geom_image(aes(image = Image),size =0.05) +labs(title ="Yearly Aircraft Crash Fatalities",x ="Year",y ="Total Fatalities") +theme_minimal()# Animating the plotanimated_plot <- p +transition_reveal(event_year) +ease_aes('linear') +shadow_mark()# Animating the plotanimate( animated_plot,nframes =200,width =800,height =600,renderer =gifski_renderer())animated_plot``````{r timeseries_animated, include=FALSE}# Labelling the axes and adding title to the plot for Serious Injuriesp_serious_injuries <-ggplot(flights_ntsb_timeseries,aes(x = event_year,y = total_serious_injuries)) +geom_line() +geom_image(aes(image = Image),size =0.05) +labs(title ="Yearly Aircraft Crash Serious Injuries",x ="Year",y ="Total Serious Injuries") +theme_minimal()# Animating the Aircraft serious injuries plotanimated_serious_injuries <- p_serious_injuries +transition_reveal(event_year) +ease_aes('linear') +shadow_mark()animate( animated_serious_injuries,nframes =200,width =800,height =600,renderer =gifski_renderer())# Plotting a timeseries graph for total minor injuriesp_minor_injuries <-ggplot(flights_ntsb_timeseries,aes(x = event_year,y = total_minor_injuries)) +geom_line() +geom_image(aes(image = Image), size =0.05) +labs(title ="Yearly Aircraft Crash Minor Injuries",x ="Year",y ="Total Minor Injuries") +theme_minimal()# Animating the timeseries plot for total minor injuriesanimated_minor_injuries <- p_minor_injuries +transition_reveal(event_year) +ease_aes('linear') +shadow_mark()animate( animated_minor_injuries,nframes =200,width =800,height =600,renderer =gifski_renderer())# Reshaping the data to a longer formatlong_fcdata <- flights_ntsb_timeseries |>pivot_longer(cols =starts_with("total_"),names_to ="Category",values_to ="Count")# Static plot for 3 animations- interactive# Time series plot with all categoriestp <-ggplot(long_fcdata,aes(x = event_year,y = Count,color = Category)) +geom_line() +labs(title ="Yearly Aircraft Crash Statistics",x ="Year",y ="Count") +theme_minimal()# Interactive plot with plotlyp <-ggplot(long_fcdata, aes(x = event_year,y = Count,color = Category)) +geom_line() +geom_text(aes(label ="✈️"),vjust =-0.5,hjust =0.5,size =5) +labs(title ="Yearly Aircraft Crash Statistics",x ="Year",y ="Count")ggplotly(p)```**Findings**There has been a general decrease in the number of total fatalities from 1980 to 2022. A notable spike in fatalities was observed in 2001, attributed to the 9/11 attacks. Post-2001, a significant decline in fatalities was noted. This can be interpreted as more and more safety measures has been taken into consideration as the time passed by which has lead to the decrease in air crashes. There might be some situations where commercial air crashes happen due to new technology being introduced and pilot's being unable to adapt to it as per this [article](https://commercial.allianz.com/news-and-insights/expert-risk-articles/how-aviation-safety-has-improved.html).```{r animated_labels, echo=TRUE}#| code-fold: true#| code-summary: "Time Series - Injuries severity"#| fig-alt: "Time series analysis of all injuries which are categorized to fatal, minor and serious due to air crashed over the year which was visualized using an animation where a horizontal line and a labelled text is used to visualize the tranistion. And 2001 crashes is highlighted."# all 3 plots with label - animated fcdata <- flights_ntsb_timeseries |>ungroup() |>pivot_longer(cols =starts_with("total_"),names_to ="Category",values_to ="Count") |>mutate(Category =case_when( Category =="total_fatalities"~"Fatalities", Category =="total_serious_injuries"~"Serious\nInjuries", Category =="total_minor_injuries"~"Minor\nInjuries" ) )# Creating the animated plotanimation_fcdata <-ggplot(fcdata,aes(event_year, Count,group = Category,color = Category)) +geom_line(size =1.3, show.legend =FALSE) +geom_segment(aes(xend =max(event_year) +0.1,yend = Count),linetype =2,colour ='grey',show.legend =FALSE ) +geom_point(size =2,show.legend =FALSE) +geom_text(aes(x =max(event_year) +0.1,label = Category,color ="#000000" ),size =5,hjust =0,show.legend =FALSE ) +geom_vline(xintercept =2001, color ="red", size =1.25) +transition_reveal(event_year) +coord_cartesian(clip ='off') +theme_minimal() +theme(plot.title =element_text(size =20),plot.margin =margin(5.5, 40, 5.5, 5.5),axis.text =element_text(size =12),axis.title =element_text(hjust =0, size =12)) +labs(title ='Aircraft Crash Statistics Over Time',y ='Number of Crashes',x =element_blank())# Animating the plotanimated_fcdata <-animate( animation_fcdata,fps =10,duration =25,width =1200,height =700,renderer =gifski_renderer("images/animated_fcdata.gif"))# Display or save the animated plotanimated_fcdata```### Choropleth map on number of crashes in different regions(US map)**Approach**The approach for creating a choropleth map of flight crashes in different U.S. regions involves data preparation, map generation, and animation. The dataset is filtered for valid latitude and longitude values, and relevant columns are selected. To ensure completeness, unique states and all years are identified. A function, is defined to create maps for each input year, customizing color scales. Maps are saved for each year using a loop.The animation is created by loading saved map images, joining them into a sequence, and generating the animation. The resulting animation visually represents the temporal evolution of flight crashes across U.S. states.```{r mapl_plot_data, include=FALSE}# Filtering out NA and invalid Latitude and Longitude valuesflights_ntsb_maps <- flights_ntsb |>subset(!is.na(longitude) &!is.na(latitude)& latitude <75& latitude >10& longitude <-60)# Creating a dataset for the map plot and picking the columns that we needflights_ntsb_maps <- flights_ntsb_maps |># getting total crashes and total injuriesgroup_by(state, event_year) |>summarise(total_crashes =n(),total_injuries =sum(fatal_injury_count, na.rm =TRUE) +sum(serious_injury_count, na.rm =TRUE) +sum(minor_injury_count, na.rm =TRUE) ) |>drop_na()# getting unique states in the dataunique_states <-unique(flights_ntsb_maps$state)# getting all years involved in the dataall_years <-unique(flights_ntsb_maps$event_year)# combining above data so that we will be generating data of the states # which are missing in the original dataall_states_data <-expand.grid(state = unique_states, event_year = all_years)flights_ntsb_maps <-# using left_join() to combine the data based on state and event_yearleft_join(all_states_data, flights_ntsb_maps,by =c("state", "event_year")) |>mutate(total_crashes =ifelse(is.na(total_crashes), 0, total_crashes),total_injuries =ifelse(is.na(total_injuries), 0, total_injuries) )``````{r function_hexbin, echo=TRUE}#| code-fold: true#| code-summary: "Choropleth Map - Temporal analysis"# Creating a function to plot a mapgetHexBinMap <-function(input_year) { plot_data <- flights_ntsb_maps |>filter(event_year == input_year)plot_usmap(data = plot_data,values ="total_crashes", color ="black") +theme_void() +scale_fill_gradientn(colors =met.brewer("Hokusai2"),name ="Number of Crashes",limits =c(0, 400) ) +labs(title =sprintf("Flight crashes in US states during %d", input_year)) +theme(legend.position ='right',plot.title =element_text(size =40, hjust =0.5, face ="bold", vjust =-1),legend.text =element_text(size =18),legend.title =element_text(size =20, face ="bold") )}``````{r images_yearwise, include=FALSE}# Creating a list of Years from datasetyears_list <- flights_ntsb_maps |>arrange(event_year) |>distinct(event_year) |>pull()for (i inseq_along(years_list)) { my_maps <-paste0("images/map_plot/flight_crash_us_states", years_list[i],".jpg")getHexBinMap(input_year = years_list[i])ggsave( my_maps,height =9,width =15,unit ="in",dpi =200 )}``````{r animation_hexbin}#| fig-alt: "Animation of US maps where each state is highlighted based on the number of crashes occurred across the time."# making gif using gganimate packagehex_bin_maps <-list.files(path ="images/map_plot/", full.names =TRUE)hex_bin_maps_list <-lapply(hex_bin_maps, image_read)# Joining all the saved imagesjoined_plots <-image_join(hex_bin_maps_list)# Animating the images using image_animate() and restting the resolution# Setting fps = 1hex_bin_maps_animation <-image_animate(image_scale(joined_plots, "2000x1000"), fps =2)# Saving gif to the repositoryimage_write(image = hex_bin_maps_animation,path ="images/flight_crash_us_states.gif")hex_bin_maps_animation```**Findings**Looking at the choropleth map California, Florida, Alaska, Texas has the highest number of crashes. Though the number of crashes has reduced significantly over the time but these states remained at the top in number of crashes. The probable reason might be these states being the east and west border of the Country and consists of more air traffic. And these states also have more air force bases where non-commercial air crashes must have been recorded. Alaska being involved in the list can be related to it's weather conditions and pilot's being over confident in those situation as reported in this [thread](https://www.quora.com/Why-are-there-so-many-plane-crashes-in-Alaska).### Radial Bar Plot**Approach**We plotted the radial bar plot to explore the distribution of flight crashes and associated injuries across different phases of flight. By categorizing flight phases into `Landing, Takeoff, Approach, Maneuvering, Climb, and Other` this visualization aims to uncover insights into the critical moments during a flight where incidents are more likely to occur. The first graph highlights the count of crashes in each phase, while the second graph focuses on the count of injuries, providing a broad perspective on the safety challenges associated with each phase.```{r radial_plot, include=FALSE}# Creating a filtered dataset for Radial Plotflights_ntsb_radial <- flights_ntsb |># getting total crashes and total injuriesgroup_by(flight_phase) |>summarise(total_crashes =n(),total_injuries =sum(fatal_injury_count, na.rm =TRUE) +sum(serious_injury_count, na.rm =TRUE) +sum(minor_injury_count, na.rm =TRUE) ) |>drop_na() |>arrange(desc(total_crashes))# getting top 5 flight phases where flight crashes occuredflights_ntsb_radial <- flights_ntsb_radial |>slice(1:5) |>bind_rows( flights_ntsb_radial |>slice(-(1:5)) |>summarize(flight_phase ="Other",total_crashes =sum(total_crashes),total_injuries =sum(total_injuries) ) ) |>mutate(flight_phase =fct_inorder(flight_phase))flights_ntsb_radial``````{r radial_plot_crashes, echo=TRUE}#| code-fold: true#| code-summary: "Radial Bar Plot - Total Crashes"#| fig-alt: "Radial Bar Plot visualizing the total number of crashes occurred at different phases of the flight."# Plotting a Radial Barplot to show the Total Count of Crashes and phase of flight flights_radial_bar_crashes <-ggplot(flights_ntsb_radial,aes(x =fct_rev(flight_phase), y = total_crashes, fill = flight_phase)) +geom_bar(stat ="identity", width =0.8) +geom_text(hjust =1.2, size =3, aes(y =0, label =comma(total_crashes))) +coord_polar(theta ="y") +labs(x =NULL,y =NULL,fill ="Phase of Flight",title ="A Radial View of Total Crashes",subtitle ="as per the phase of flight",caption ="" ) +scale_y_continuous(breaks =seq(0, 27000, by =5000),limits =c(0, 27000) ) +scale_x_discrete(expand =c(0.35, 0)) +scale_fill_frontiers() +theme(legend.position ="bottom",legend.title =element_text(size =12),axis.text =element_blank(),panel.grid.minor =element_blank(),panel.grid.major =element_blank(),plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12) ) +guides(fill =guide_legend(nrow =1,direction ="horizontal",title.position ="top",title.hjust =0.5,label.position ="bottom",label.hjust =1,label.vjust =1,label.theme =element_text(lineheight =0.25, size =11),keywidth =1.5,keyheight =0.5 ) )flights_radial_bar_crashes# remaining - titles, some texts# interactive - plotly# animate - plotly by years or gganimate bar growth```**Findings**The radial bar plot offers an intuitive and visually appealing representation of the distribution of crashes and injuries throughout various phases of flight. In the first graph, the bars radiating from the center depict the count of crashes in each phase, allowing for a quick comparison of their frequencies. This visualization enables the identification of phases that might be particularly prone to incidents, guiding further investigation into the contributing factors.```{r radial_plot_injuries, echo=TRUE}#| code-fold: true#| code-summary: "Radial Bar Plot - Total Injuries"#| fig-alt: "Radial Bar Plot visualizing the total number of injuries occurred at different phases of the flight."# Plotting a Radial Barplot to show the Total Count of Injuries and phase of flight flights_radial_bar_injuries <-ggplot(flights_ntsb_radial,aes(x =fct_rev(flight_phase), y = total_injuries, fill = flight_phase)) +geom_bar(stat ="identity", width =0.8) +geom_text(hjust =1.2, size =3, aes(y =0, label =comma(total_injuries))) +coord_polar(theta ="y") +labs(x =NULL,y =NULL,fill ="Phase of Flight",title ="A Radial View of Total Injuries",subtitle ="as per the phase of flight",caption ="" ) +scale_y_continuous(breaks =seq(0, 13200, by =1000),limits =c(0, 13200) ) +scale_x_discrete(expand =c(0.35, 0)) +scale_fill_frontiers() +theme(legend.position ="bottom",legend.title =element_text(size =12),axis.text =element_blank(),panel.grid.minor =element_blank(),panel.grid.major =element_blank(),plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12) ) +guides(fill =guide_legend(nrow =1,direction ="horizontal",title.position ="top",title.hjust =0.5,label.position ="bottom",label.hjust =1,label.vjust =1,label.theme =element_text(lineheight =0.25, size =11),keywidth =1.5,keyheight =0.5 ) ) flights_radial_bar_injuries# remaining - titles, some texts# interactive - plotly# animate - plotly by years or gganimate bar growth```The second graph, depicting injuries, provides an additional layer of analysis. By comparing the counts of injuries across different flight phases, we can discern whether certain phases are more likely to result in severe consequences. This insight is crucial for understanding the potential risks associated with specific segments of a flight, informing safety measures and protocols.## Analysis of Causes of CrashesWe focused on the text report column of the air crashes data which is `probable_cause` and we have summarized the text report using `hugging face` in python which can be found in the folder `Probable_Cause`. We used the summarized text in exploring the probable causes of these crashes.### Waffle chart**Approach**We prepare data by calculating percentages and determining tile numbers for causes of airplane crashes. It generates a waffle chart using ggplot2, displaying causes represented by '✈' symbols in a grid layout, each tile proportional to the cause's percentage, and then converts it into an interactive plot using plotly for visualization.```{r datawragling,include=FALSE}count <- probable_cause_flights |>group_by(cause_summary)|>summarize(perc=n())|>mutate(perc= (perc/sum(perc)) )count_waffle <- count |>mutate(remainder = perc *100-floor(perc *100),floored =floor(perc *100) ) |>arrange(desc(remainder)) |>mutate(number =ifelse(100-sum(floored) >=row_number(), floored +1, floored)) |>arrange(perc)``````{r function-waffle, echo=TRUE}#| code-fold: true#| code-summary: "Waffle Chart - Probable Cause Proportion"#| fig-alt: "A waffle chart using ggplot and plotly to visualize the proportion probable causes of flight crashes using a flight emoji each waffle square/grid."waffle_data <- count_waffle %>%slice(rep(1:n(), times = number)) %>%mutate(fill_value =ifelse(row_number() <=sum(floored), floored, floored +1))plot_data <-expand.grid(x =0:9, y =0:9) %>%rowwise() |>mutate(index =1+sum(x *10+ y >=cumsum(count_waffle$number)),cause = count_waffle$cause_summary[[index]])# Create a waffle plotp <-ggplot(plot_data, aes(x, y, color = cause, group = cause), show.legend = F) +geom_text(label ="✈",family ='sans',size =8) +scale_color_manual(values =met.brewer("Redon")) +coord_equal() +labs(title ="Waffle chart showing different causes of crashes",colour ='Cause')flight_plot <-ggplotly(p, tooltip =c("color")) |>layout(xaxis =list(showgrid =FALSE,zeroline =FALSE,showticklabels =FALSE,title ="" ),yaxis =list(showgrid =FALSE,zeroline =FALSE,showticklabels =FALSE,title ="" ),showlegend =FALSE )flight_plot```**Findings**The waffle chart effectively showcased the distribution of crash causes, emphasizing the prominence of human error(pilot failures) in aviation incidents. The data highlighted the need for enhanced safety measures and training to address the identified causes of crashes.### Density Plot**Approach**We plotted the density plot using ggplot's `geom_density` function is to visually analyze the distribution of flight crashes over the years based on their probable causes. By utilizing the `probable_cause_flights` dataset and focusing on the `cause_summary` column, this visualization aims to provide insights into the changing patterns and trends of aviation incidents. The x-axis represents the years, offering a chronological perspective, while the y-axis portrays the density of crashes associated with specific causes. Here, we have focused on the attribute `Pilot's Failure````{r distribution, echo=TRUE}#| code-fold: true#| code-summary: "Distrubtion Chart - Pilot's Fault?"#| fig-alt: "A Distrubtion chart to visualize the distribution of the probable causes of flight crashes due to pilot across the years."ggplot(subset( probable_cause_flights, probable_cause_flights$cause_summary =="pilot's failure" )) +geom_density(aes(x =year(event_date),fill =factor(highest_injury_level, levels =c("Fatal", "Serious", "Minor")) ), alpha =0.7) +scale_fill_manual(values =c("#bcd67c", "#d398ff", "#82dfe2")) +labs(title ="Distribution of crashes caused by pilot's negligence over time",x ="Year",y ="Density",fill ="Highest Injury Level") +theme(legend.position ="bottom",panel.grid.major =element_blank(),plot.title =element_text(size =16, face ="bold"),plot.subtitle =element_text(size =12),axis.text =element_text(size =10),axis.title =element_text(hjust =0, size =10) )```**Findings**This visual representation allows us to identify clusters of high density, indicating periods or years where certain causes were more prevalent. Additionally, it facilitates the detection of outliers or shifts in patterns, enabling a more nuanced exploration of the dataset. Here, we can see that injuries in particular have reduced overtime with the number of Fatal Injuries reducing significantly over the past few decades.## Assessing the Influence of Weather Conditions on Crashes### Radar Plot**Approach**We plot the Radar Plot using Plotly for R to comprehensively assess the influence of weather conditions on flight crashes. Leveraging the `flights_ntsb_radar` dataset which we derived from the original `flights_ntsb` dataset and categorizing flight crashes based on `Visual Meteorological Conditions (VMC)` and `Instrument Meteorological Conditions (IMC)`. This visualization seeks to highlight the varying degrees of impact these conditions have on aviation safety. By layering both datasets on a radar plot, we aim to provide a holistic perspective on how different weather scenarios contribute to flight incidents.```{r radarchart_data, include=FALSE}# Assigning different weather conditions to variables# IFR and IMC are same conditions so we are combining them# VFR and VMC are same conditions so we are combining themflights_ntsb_radar <- flights_ntsb |>mutate(weather_condition =case_when( weather_condition =="IFR"~"IMC", weather_condition =="VFR"~"VMC", weather_condition =="Unknown"~"UNK",is.na(weather_condition) ~"UNK",TRUE~ weather_condition ) ) |># getting total crashes and total injuriesgroup_by(event_month, weather_condition) |>summarise(total_crashes =n(),total_injuries =sum(fatal_injury_count, na.rm =TRUE) +sum(serious_injury_count, na.rm =TRUE) +sum(minor_injury_count, na.rm =TRUE) ) |>arrange(event_month)``````{r radar_chart, include=FALSE}# Filtering out weather condition 'IMC'radar_trace_r1 <- flights_ntsb_radar |>filter(weather_condition %in%c("IMC")) |>pull(total_crashes)radar_trace_r1 <-c( radar_trace_r1, flights_ntsb_radar |>filter(weather_condition =="IMC"& event_month ==1) |>pull(total_crashes) )# Filtering out weather condition 'VMC'radar_trace_r2 <- flights_ntsb_radar |>filter(weather_condition %in%c("VMC")) |>pull(total_crashes)radar_trace_r2 <-c( radar_trace_r2, flights_ntsb_radar |>filter(weather_condition =="VMC"& event_month ==1) |>pull(total_crashes) )radar_theta <-c("Jan" , "Feb" , "Mar" , "Apr", "May", "Jun", "Jul", "Aug", "Sept", "Oct", "Nov", "Dec", "Jan")``````{r radar_chart_plotly, echo=TRUE}#| code-fold: true#| code-summary: "Radar Chart - Weather Relation"#| fig-alt: "A Radar chart visualizing the impact of both months and weather technology involved in air control in air crashes."# Plotting an Interactive Radar Plot using Plotlyradar_chart <-plot_ly(type ='scatterpolar',fill ='toself',mode ='lines+markers')# first plot tracing of VMC dataradar_chart <- radar_chart |>add_trace(r = radar_trace_r2,theta = radar_theta,name ='VMC' )# second plot tracing of IMC dataradar_chart <- radar_chart |>add_trace(r = radar_trace_r1,theta = radar_theta,name ='IMC' )# plot layout configurationradar_chart <- radar_chart |>layout(polar =list(radialaxis =list(visible = T,range =c(0, 10500) )))radar_chart```**Findings**The radar plot serves as an effective means to showcase the multivariate nature of weather conditions and their relationship with flight crashes. Each axis on the radar represents a specific parameter related to aviation safety, such as visibility, cloud cover, wind speed, and temperature. The radar plot allows for the simultaneous comparison of these parameters for VMC and IMC, unveiling patterns and discrepancies in their respective contributions to incidents.## ConclusionOver the 1980-2020 period, aviation has witnessed a significant decrease in the number of crashes and fatalities. Notably, the events of 9/11 in 2001 led to a spike in fatalities, but overall trends show a consistent decline. The aviation industry's implementation of stringent rules has played a crucial role in enhancing safety. With California, Alaska, Texas, and Florida experiencing higher frequencies. In addition to human factors and mechanical issues, weather patterns also played a role in aviation incidents. A radar plot revealed distinct correlations between specific weather conditions and increased crash occurrences in certain months.This collective progress signifies a positive shift towards safer air travel. With fatalities and crashes decreasing over time, the aviation industry is advancing towards a safer and more efficient mode of transport. These improvements align with the industry's goal of providing faster and safer global travel, with flights covering vast distances in just a couple of hours.::: {.callout-note collapse="true" appearance="minimal"}## ReferencesTime Series info - [2001 crashes](https://en.wikipedia.org/wiki/American_Airlines_Flight_587)Waffle Plot - [Waffle Chart](https://www.filipr.com/posts/waffle-chart/)Shiny - [Shiny](https://shiny.posit.co/r/gallery/)gganimate - [gganimate](https://r-graph-gallery.com/288-animated-barplot-transition.html)plotly - [plotly](https://plotly.com/r/animations/)issues - [legend](https://community.plotly.com/t/how-to-remove-aa-from-the-legend/41506)NTSB Dashboard - [NTSB](https://www.ntsb.gov/safety/data/Pages/GeneralAviationDashboard.aspx):::