From Takeoff to Touchdown: Dissecting Data on Air Disasters
Proposal
High Level Goal
Build a Shiny app where users can explore different plots to analyze the flight crashes that occurred in the United States from 1980 to 2022.
Dataset
# Load the packages used below
library(tidyverse) # read_csv(), select()
library(here)      # here() for project-relative paths
library(janitor)   # clean_names()
# Read the data using read_csv
flights_ntsb <- read_csv(here("data", "flight_crash_data_NTSB.csv"))
# Select the columns needed for the analysis and convert their names to snake_case
flights_ntsb <- flights_ntsb |>
  select(
    EventType, EventDate,
    City, State,
    HasSafetyRec,
    ReportType, HighestInjuryLevel,
    FatalInjuryCount, SeriousInjuryCount,
    MinorInjuryCount, ProbableCause,
    Latitude, Longitude,
    AirCraftCategory, AirportID,
    AirportName, AmateurBuilt,
    NumberOfEngines, AirCraftDamage,
    WeatherCondition
  ) |>
  clean_names()
flights_ntsb
In preparation for our analysis, we loaded the data and carried out a preliminary review of its structure and contents using functions such as `clean_names`, `diagnose`, `describe`, and `formattable`. This gave us an overview of the dataset’s variables and initial statistics. This preliminary step confirmed the dataset’s suitability for answering our research questions, which revolve around identifying temporal patterns and finding relationships between the parameters that affect crashes.
Background
The dataset central to our investigation was procured through a request to the National Transportation Safety Board (NTSB). It encompasses detailed records of aircraft crashes in the U.S. from January 1, 1980, to December 31, 2022. The original dataset is a tibble with 89,134 rows and 38 columns; after removing the columns we did not need, we were left with a tibble of 89,134 rows and 20 columns. This rich dataset is ideal for our purpose because it supports time-series analysis, geo-spatial analysis, and the study of other factors that affect crashes, all of which we believe are critical for understanding the dynamics and geo-spatial distribution of aircraft accidents over time.
Flights Report Data Diagnosis Code
variables | types | missing_count | missing_percent | unique_count | unique_rate |
---|---|---|---|---|---|
event_type | character | 8 | 0.008975251 | 3 | 3.365719e-05 |
event_date | POSIXct | 0 | 0.000000000 | 80322 | 9.011376e-01 |
city | character | 8 | 0.008975251 | 19038 | 2.135885e-01 |
state | character | 155 | 0.173895483 | 58 | 6.507057e-04 |
has_safety_rec | logical | 0 | 0.000000000 | 2 | 2.243813e-05 |
report_type | character | 1 | 0.001121906 | 5 | 5.609532e-05 |
highest_injury_level | character | 126 | 0.141360199 | 5 | 5.609532e-05 |
fatal_injury_count | numeric | 0 | 0.000000000 | 52 | 5.833913e-04 |
serious_injury_count | numeric | 0 | 0.000000000 | 24 | 2.692575e-04 |
minor_injury_count | numeric | 0 | 0.000000000 | 49 | 5.497341e-04 |
probable_cause | character | 29568 | 33.172526757 | 56119 | 6.296026e-01 |
latitude | numeric | 0 | 0.000000000 | 61321 | 6.879642e-01 |
longitude | numeric | 0 | 0.000000000 | 62243 | 6.983082e-01 |
air_craft_category | character | 13 | 0.014584782 | 38 | 4.263244e-04 |
airport_id | character | 40230 | 45.134292189 | 9656 | 1.083313e-01 |
airport_name | character | 32675 | 36.658289766 | 25807 | 2.895304e-01 |
amateur_built | character | 0 | 0.000000000 | 7 | 7.853344e-05 |
number_of_engines | character | 1546 | 1.734467207 | 40 | 4.487625e-04 |
air_craft_damage | character | 21 | 0.023560033 | 29 | 3.253528e-04 |
weather_condition | character | 173 | 0.194089797 | 7 | 7.853344e-05 |
Flights Report Data Describe Code
described_variables | n | na | mean | sd | se_mean | IQR | skewness | kurtosis | p00 | p01 | p05 | p10 | p20 | p25 | p30 | p40 | p50 | p60 | p70 | p75 | p80 | p90 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
fatal_injury_count | 89134 | 0 | 0.3635538 | 2.251023e+00 | 7.539771e-03 | 0.00000 | 65.05258 | 5829.500 | 0 | 0.00 | 0.0000 | 0.0000 | 0.00000 | 0.00 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 4.00000 | 2.650000e+02 |
serious_injury_count | 89134 | 0 | 0.1950210 | 7.612223e-01 | 2.549704e-03 | 0.00000 | 37.67354 | 3044.159 | 0 | 0.00 | 0.0000 | 0.0000 | 0.00000 | 0.00 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 1.00000 | 2.00000 | 8.100000e+01 |
minor_injury_count | 89134 | 0 | 0.3146386 | 1.399480e+00 | 4.687541e-03 | 0.00000 | 42.29602 | 3134.592 | 0 | 0.00 | 0.0000 | 0.0000 | 0.00000 | 0.00 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 0.00000 | 1.00000 | 2.00000 | 3.00000 | 1.370000e+02 |
latitude | 89134 | 0 | 988.6591476 | 2.011036e+05 | 6.735939e+02 | 12.22944 | 211.29325 | 44673.978 | 0 | 0.00 | 0.0000 | 0.0000 | 21.31861 | 29.18 | 31.18077 | 33.94250 | 35.97083 | 38.55095 | 40.43017 | 41.40944 | 42.56583 | 46.36235 | 58.41937 | 64.40744 | 4.351118e+07 |
longitude | 89134 | 0 | -123.9442394 | 6.343362e+03 | 2.124701e+01 | 34.84990 | -149.65240 | 22435.028 | -1004241 | -158.58 | -147.7193 | -122.1001 | -116.60010 | -111.85 | -106.08916 | -96.58913 | -90.31158 | -84.50032 | -80.45880 | -77.00008 | -70.81000 | 0.00000 | 0.00000 | 0.00000 | 1.741242e+02 |
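The describe output above reports a maximum latitude of about 4.35e+07 and a minimum longitude of -1004241, both far outside the valid ranges of [-90, 90] and [-180, 180], and several percentiles are exactly 0. The following is a minimal base R sketch of how such rows could be flagged before any mapping. The helper name `flag_valid_coords`, the toy data, and the decision to treat exact (0, 0) pairs as placeholders are all assumptions for illustration, not part of our finalized cleaning procedure.

```r
# Hypothetical helper: TRUE where a coordinate pair is plausible.
# Assumption: exact (0, 0) pairs are placeholders rather than real crash sites.
flag_valid_coords <- function(latitude, longitude) {
  in_range <- !is.na(latitude) & !is.na(longitude) &
    abs(latitude) <= 90 & abs(longitude) <= 180
  not_origin <- !(latitude == 0 & longitude == 0)
  in_range & not_origin
}

# Toy rows standing in for flights_ntsb (values are illustrative only)
toy <- data.frame(
  latitude  = c(35.97, 0, 43511180, 61.2),
  longitude = c(-90.31, 0, -111.85, -1004241)
)
toy$valid_coords <- flag_valid_coords(toy$latitude, toy$longitude)
toy
```

Only the first toy row is flagged as valid. Whether flagged rows should be dropped or repaired (for example, from the `city` and `state` columns) is left open here.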
Value
Our project’s primary goal is to conduct an in-depth analysis of aircraft crash incidents within the United States to uncover temporal trends and geographical patterns. This objective is fueled by our diverse personal experiences with air travel, which range from enthusiasm to apprehension. This stems from talking about places we’ve traveled to, we chose the data as some of us are more comfortable with flying than others. We decided by looking at aircraft crash data that it could be a chance to get more informed about aircraft crashes within the United States. By examining the data, we hope to gain a clearer understanding of the factors contributing to these incidents and alleviate some of the concerns regarding aviation safety. We wanted do time series analysis as well as geo-spatial analysis and this data presented itself as a great opportunity to do both. We intend to design a shiny app so that we can display the a time series change of crashes throughout the years.
Problem Statement
We conducted Exploratory Data Analysis on the dataset and categorized our analysis into three main areas:
- Examining Aircraft Crashes, with a focus on their locations, timings, and consequences.
- Investigating the Causes of Crashes.
- Assessing the Influence of Weather Conditions on Crashes.
Plan of Action
Examining Aircraft Crashes, with a focus on their locations, timings, and consequences
Choropleth Map - An Animated Choropleth Map of Aircraft Crashes in the United States Over Time:
The animated Choropleth Map provides a dynamic and visually compelling representation of aircraft crashes across the United States. The visualization aggregates crashes by the U.S. state in which they occurred and spans multiple years to reveal temporal patterns. It not only helps identify regions with higher crash frequencies but also offers insight into how air travel safety has evolved over time.
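An animated choropleth needs one value per state per frame. As a rough sketch of that aggregation step in base R, the code below counts crashes per state and year. The toy data frame and the name `crashes_by_state_year` are assumptions for illustration, and the state values are made up; the real `state` column may use a different encoding.

```r
# Toy stand-in for flights_ntsb with only the columns this step needs
toy <- data.frame(
  event_date = as.POSIXct(
    c("1980-03-01", "1980-07-15", "1980-09-30", "1981-01-10"),
    tz = "UTC"
  ),
  state = c("Arizona", "Arizona", "Texas", "Arizona")
)

# Extract the calendar year that would drive each animation frame
toy$year <- as.integer(format(toy$event_date, "%Y"))

# Count crashes per state and year; each output row is one (state, year) cell
crashes_by_state_year <- aggregate(
  n_crashes ~ state + year,
  data = transform(toy, n_crashes = 1L),
  FUN = sum
)
crashes_by_state_year
```

Note that `aggregate()` only returns combinations that occur in the data, so states with no crashes in a given year would need to be filled in with zeros before mapping.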
Time-series Plot - Analysis of Aircraft Crashes and Fatalities:
We will visualize the historical trend in aircraft crashes. We are going to plot this trend by injury severity, using the columns `fatal_injury_count`, `serious_injury_count`, and `minor_injury_count`. The dataset classifies each injury as minor, serious, or fatal based on the level of casualties that occurred in the crash. We would also like to highlight a few notable airplane crashes in history in our time-series analysis.
Radial bar Plot - A Radial Perspective on Aircraft Crashes During Flight Phases:
We will represent the count of crashes on the x-axis and the phases of flight (takeoff, landing, etc.) on the y-axis, and then transform the bar plot into a radial form to emphasize the distribution of crashes across these critical flight stages. This visualization technique allows for a quick understanding of the relative frequency of crashes during takeoff and landing, enabling insights into the safety challenges faced during these flight phases.
Analysis of Causes of Crashes
Waffle plot - Common Causes of Aircraft Crashes:
The waffle plot offers an overview of the most common causes of aircraft crashes throughout the United States. Leveraging the `probable_cause` column, which contains concise descriptions of crash events, we will apply text summarization techniques to extract and categorize these causes into manageable groups. This visualization provides a straightforward representation of the key factors contributing to aviation incidents and helps identify the primary causes that warrant further investigation.
Density plot - Causes and Severity of Aircraft Crashes:
The density plot delves into the relationship between one specific crash cause ("pilot's failure") and the severity of the injuries incurred. Drawing on the `probable_cause` column, as well as the `severity` and `event_date` columns, this visualization quantifies the number of crashes attributed to "pilot's failure" while distinguishing between levels of severity. Visualizing this data in a density plot shows how severe these crashes are.
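One simple way to isolate the "pilot's failure" crashes is a case-insensitive text match on `probable_cause`. The base R sketch below illustrates the idea under several assumptions: the toy sentences are invented, `highest_injury_level` is used as a stand-in for severity, and the injury level labels shown may not match the dataset's actual values. Real reports may also use a typographic apostrophe, which this pattern would not match.

```r
# Toy stand-in for flights_ntsb (text and labels are illustrative only)
toy <- data.frame(
  probable_cause = c(
    "The pilot's failure to maintain airspeed during the approach.",
    "A total loss of engine power for undetermined reasons.",
    "The Pilot's failure to maintain directional control.",
    NA
  ),
  highest_injury_level = c("Fatal", "None", "Minor", "Serious")
)

# Case-insensitive match; a missing probable_cause counts as "not matched"
toy$pilot_failure <- !is.na(toy$probable_cause) &
  grepl("pilot's failure", toy$probable_cause, ignore.case = TRUE)

# Cross-tabulate the flag against the injury level
table(pilot_failure = toy$pilot_failure, injury = toy$highest_injury_level)
```

Treating a missing cause as "not matched" matters here, since about a third of the `probable_cause` values are missing according to the diagnosis table.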
Assessing the Influence of Weather Conditions on Crashes
Radar Plot - Analysis of Aircraft Crashes by Month and Weather Conditions:
Using a radar plot, we would like to display multiple weather conditions on the axes of the plot, while the spokes represent different months. Each point on the radar plot signifies the frequency of crashes occurring in a specific month under a particular weather condition. This comprehensive visualization enables a quick assessment of the relationship between weather conditions, crash occurrences, and the month in which the crash occurred.
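The radar plot needs crash counts for every combination of month and weather condition. A minimal base R sketch of that cross-tabulation follows; the toy rows are invented, and the weather codes `VMC` and `IMC` are assumed values that may differ from those in `weather_condition`.

```r
# Toy stand-in for flights_ntsb (dates and weather codes are illustrative only)
toy <- data.frame(
  event_date = as.POSIXct(
    c("1999-01-05", "1999-01-20", "2001-07-04", "2010-07-19"),
    tz = "UTC"
  ),
  weather_condition = c("VMC", "IMC", "VMC", "VMC")
)

# Month as a factor with all twelve levels, so months without crashes still appear
toy$month <- factor(
  as.integer(format(toy$event_date, "%m")),
  levels = 1:12,
  labels = month.abb
)

# Rows are months, columns are weather conditions, cells are crash counts
crashes_by_month_weather <- table(toy$month, toy$weather_condition)
crashes_by_month_weather
```

Keeping all twelve months as factor levels means every spoke of the radar plot has a value, even when the count is zero.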
Ultimately, we plan to utilize the interactive features of the Shiny application to present these plots in a user-friendly manner.
Variables of focus
Variable | Description |
---|---|
event_type | Type of event - accident, incident or occurrence |
event_date | Date and time at which the event occurred |
city | The city or place closest to the site of the event |
state | The state in which the site of the event is located |
highest_injury_level | Indicates the highest level of injury among all injuries sustained as a result of the event |
fatal_injury_count | The total number of fatal injuries from an event |
serious_injury_count | The total number of serious injuries from an event |
minor_injury_count | The total number of minor injuries from an event |
probable_cause | The probable cause for the aircraft crash as per the NTSB report |
latitude | Latitude for the event site in degrees and decimal degrees. |
longitude | Longitude for the event site in degrees and decimal degrees. |
weather_condition | The basic weather conditions at the time of the event |
Implementation
Weekly Plan
Week | Weekly Tasks | Persons in Charge | Backup |
---|---|---|---|
until November 8th | Explore and finalize the data set and the problem statements | Everyone | Everyone |
- | Complete the proposal and assign some high-level tasks | Everyone | Everyone |
November 9th to 15th | Getting to know about Shiny application | Antonio | Bharath |
- | Data cleaning and Data pre-processing | Thanoosha | Eshaan |
- | Question specific exploration and data categorization | Thanoosha, Bharath | Antonio, Eshaan |
November 16th to 22nd | Generating plots for Heat-Map and Time-Series | Antonio, Bharath | Eshaan |
- | Generating plots for Radial bar Plot and Bar Plot | Eshaan, Antonio | Thanoosha |
- | Generating plots for Stacked Area Chart and Radar Plot | Thanoosha | Bharath |
- | Exploring on how to integrate our specific visualizations and shiny | Eshaan | Antonio |
November 23rd to 29th | Generating remaining parts of the plots for all the plots | Everyone | Everyone |
- | Improving the generated plots | Bharath | Thanoosha |
- | Start integrating shiny and our visualizations | Eshaan | Antonio |
November 30th to December 6th | Refining the code for code review with comments | Everyone | Everyone |
- | Continue with the integration of shiny and our plots | Bharath | Thanoosha |
December 7th to 13th | Complete the shiny application with multiple user functionality | Antonio | Eshaan |
- | Review the generated plots and shiny integration | Thanoosha | Bharath |
- | Write-up and presentation for the project | Everyone | Everyone |
Repo Organization
The following are the folders involved in the Project repository.
‘data/’: Used for storing any necessary data files for the project, such as input files.
‘images/’: Used for storing image files used in the project.
‘presentation_files/’: Used for storing presentation-related files.
‘_extra/’: Used to brainstorm our analysis which won’t impact our project workflow.
‘_freeze/’: This folder is used to store the generated files during the build process. These files represent the frozen state of the website at a specific point in time.
‘_site/’: Folder used to store the generated static website files after the site generator processes the quarto document.
‘.github/’: Folder for storing GitHub templates and workflows.
We will create a few folders inside the `images/` folder to store question-specific images and presentation-related images generated over the course of the project: `images/Q1`, `images/Q2`, and `images/Presentation`.
These are the planned approaches, and we intend to explore and solve the problem statement which we came up with. Parts of our approach might change in the final implementation.