dataset <- read.csv("../../../../data/hospital_data.csv")
Project Hospital Report
Hospital Wait Time Report
Step 0
We began by choosing a dataset to study: the QLD Hospital data.
Step 1
Goal: identify which variables are discrete (categorical) and which are continuous.
To do this we:
First: loaded our data
Second: viewed the column names of our data
names(dataset)
[1] "X.1"
[2] "X"
[3] "Facility.HHS.Code"
[4] "Facility.HHS.Desc"
[5] "Last.Month.in.QTR"
[6] "Triage.Category"
[7] "Number.of.Attendances"
[8] "Variation.in.the.number.of.attendances...."
[9] "Patients.Seen.within.clinically.recommended.times...."
[10] "Median.Waiting.time.to.treatment..minutes."
[11] "Patients.who.did.not.wait.for.treatment...."
[12] "Patients.admitted.from.the.Emergency.Department...."
[13] "Admissions.to.hospital.within.4.hours...."
Third: looked at the structure to see type of data in each column
str(dataset)
'data.frame': 9442 obs. of 13 variables:
$ X.1 : int 1 2 3 4 5 6 7 8 9 10 ...
$ X : int 0 1 2 3 4 5 6 7 8 9 ...
$ Facility.HHS.Code : int 1 1 1 1 1 1 4 4 4 4 ...
$ Facility.HHS.Desc : chr "Mater Adult" "Mater Adult" "Mater Adult" "Mater Adult" ...
$ Last.Month.in.QTR : chr "Jun-24" "Jun-24" "Jun-24" "Jun-24" ...
$ Triage.Category : chr "1" "2" "3" "4" ...
$ Number.of.Attendances : int 58 2998 6035 3767 513 13371 215 4986 13663 9169 ...
$ Variation.in.the.number.of.attendances.... : num -17.1 17.4 -4.4 -8 -20.8 -2.3 -0.5 -12.9 0.7 -8.1 ...
$ Patients.Seen.within.clinically.recommended.times....: num 100 37.6 54.2 78.9 94.8 58.7 100 73.2 61.1 73.9 ...
$ Median.Waiting.time.to.treatment..minutes. : num 0 16 27 28 29 25 0 7 22 28 ...
$ Patients.who.did.not.wait.for.treatment.... : num 0 1.2 3.1 6.4 15.8 4.1 0 0.2 2.5 4.5 ...
$ Patients.admitted.from.the.Emergency.Department.... : num 72.4 44.9 40.8 20.8 6 34.9 71.2 59.5 41.5 13.8 ...
$ Admissions.to.hospital.within.4.hours.... : num 31 22.2 26.1 39.4 41.9 27.4 43.1 44.9 34.1 40.3 ...
Fourth: created a summary of the dataset
summary(dataset)
X.1 X Facility.HHS.Code Facility.HHS.Desc
Min. : 1 Min. : 0.0 Min. : 1 Length:9442
1st Qu.:2361 1st Qu.:157.0 1st Qu.: 68 Class :character
Median :4722 Median :314.0 Median : 117 Mode :character
Mean :4722 Mean :314.2 Mean : 1091
3rd Qu.:7082 3rd Qu.:472.0 3rd Qu.: 192
Max. :9442 Max. :629.0 Max. :99999
Last.Month.in.QTR Triage.Category Number.of.Attendances
Length:9442 Length:9442 Min. : 0
Class :character Class :character 1st Qu.: 84
Mode :character Mode :character Median : 375
Mean : 3754
3rd Qu.: 1501
Max. :640258
Variation.in.the.number.of.attendances....
Min. :-100.000
1st Qu.: -6.313
Median : 5.000
Mean : 13.734
3rd Qu.: 22.095
Max. :1000.000
NA's :724
Patients.Seen.within.clinically.recommended.times....
Min. : 0.00
1st Qu.: 0.00
Median : 19.00
Mean : 40.49
3rd Qu.: 92.00
Max. :100.00
NA's :235
Median.Waiting.time.to.treatment..minutes.
Min. : 0.00
1st Qu.: 6.00
Median : 74.00
Mean : 56.76
3rd Qu.: 98.28
Max. :100.00
NA's :168
Patients.who.did.not.wait.for.treatment....
Min. : 0.00
1st Qu.: 2.00
Median : 7.00
Mean : 17.36
3rd Qu.: 26.59
Max. :100.00
NA's :1000
Patients.admitted.from.the.Emergency.Department....
Min. : 0.00
1st Qu.: 21.15
Median : 50.00
Mean : 52.34
3rd Qu.: 87.50
Max. :100.00
NA's :669
Admissions.to.hospital.within.4.hours....
Min. : 0.00
1st Qu.: 59.00
Median : 86.00
Mean : 76.48
3rd Qu.: 96.97
Max. :100.00
NA's :468
Fifth: created a summary of a specific column, in this case the median wait time to treatment (minutes).
summary(dataset$Median.Waiting.time.to.treatment..minutes.)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 6.00 74.00 56.76 98.28 100.00 168
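As a rough illustration of the Step 1 goal, the sketch below labels each column by its storage type and then overrides the label for an ID-like column. The data frame `toy`, its values, the shortened column name `Median.Wait`, and the helper `classify_column` are all hypothetical and only stand in for the real data. Storage type is a first pass only: an integer code such as `Facility.HHS.Code` identifies a facility rather than measuring a quantity, so it is treated as categorical here.

```r
# Toy data frame standing in for the hospital data (all values are made up).
toy <- data.frame(
  Facility.HHS.Code = c(1L, 1L, 4L),
  Triage.Category = c("1", "2", "3"),
  Number.of.Attendances = c(58L, 2998L, 215L),
  Median.Wait = c(0, 16, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a first-pass label based only on how R stores the column.
classify_column <- function(x) {
  if (is.character(x) || is.factor(x) || is.logical(x)) {
    "categorical"
  } else if (is.numeric(x)) {
    "numeric"
  } else {
    "other"
  }
}

# Apply the helper to every column; the result is a named character vector.
var_types <- vapply(toy, classify_column, character(1))

# Manual override: the facility code is an identifier, not a measurement.
var_types["Facility.HHS.Code"] <- "categorical"

print(var_types)
```

The manual override is the important part: `str()` reports how a column is stored, not what it means, so the final call on each variable still needs a look at the data dictionary.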
Step 2
Goal: filter by a condition, or group by and aggregate over a particular variable.
The condition we chose to filter by was the hospital facility name, Mater Adult. We loaded dplyr and created a filtered copy of the data called “subset”, which contains only the rows for this facility.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
subset <- dataset %>% filter(Facility.HHS.Desc == "Mater Adult")
To group, we created a new copy of the dataset and then grouped it by facility code number.
aggregated=dataset
aggregated %>% group_by(Facility.HHS.Code)
# A tibble: 9,442 × 13
# Groups: Facility.HHS.Code [105]
X.1 X Facility.HHS.Code Facility.HHS.Desc Last.Month.in.QTR
<int> <int> <int> <chr> <chr>
1 1 0 1 Mater Adult Jun-24
2 2 1 1 Mater Adult Jun-24
3 3 2 1 Mater Adult Jun-24
4 4 3 1 Mater Adult Jun-24
5 5 4 1 Mater Adult Jun-24
6 6 5 1 Mater Adult Jun-24
7 7 6 4 Prince Charles Jun-24
8 8 7 4 Prince Charles Jun-24
9 9 8 4 Prince Charles Jun-24
10 10 9 4 Prince Charles Jun-24
# ℹ 9,432 more rows
# ℹ 8 more variables: Triage.Category <chr>, Number.of.Attendances <int>,
# Variation.in.the.number.of.attendances.... <dbl>,
# Patients.Seen.within.clinically.recommended.times.... <dbl>,
# Median.Waiting.time.to.treatment..minutes. <dbl>,
# Patients.who.did.not.wait.for.treatment.... <dbl>,
# Patients.admitted.from.the.Emergency.Department.... <dbl>, …
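The output above still has all 9,442 rows because `group_by()` on its own only records the grouping; nothing is aggregated until a verb such as `summarise()` is applied, and the grouped result is discarded unless it is assigned. A minimal sketch of the difference, using a made-up data frame `toy` rather than the real data:

```r
library(dplyr)

# Made-up attendances for two facilities.
toy <- data.frame(
  Facility.HHS.Code = c(1L, 1L, 4L, 4L),
  Number.of.Attendances = c(10L, 30L, 100L, 300L)
)

# Grouping alone keeps every row and only adds grouping metadata.
grouped <- toy %>% group_by(Facility.HHS.Code)
nrow(grouped)      # number of rows is unchanged
n_groups(grouped)  # number of distinct facility codes

# summarise() collapses each group to a single row.
per_facility <- grouped %>%
  summarise(avg_attendance = mean(Number.of.Attendances))
print(per_facility)
```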
We then had to rename a column due to some errors (removed for reruns -
aggregated <- rename(aggregated, "Median_wait" = "Median.Waiting.time.to.treatment..minutes.")
Then we grouped by facility and summarised, for each facility, the mean attendance, the median wait time, and the mean of the patients-who-did-not-wait-for-treatment column. We also had to remove missing data from the patients-who-did-not-wait-for-treatment column.
Hosp_wait_time <- aggregated %>%
  group_by(Facility.HHS.Code) %>%
  summarise(avg_attendance = mean(Number.of.Attendances),
            median_wait_time = median(Median_wait),
            No_non_wait = mean(Patients.who.did.not.wait.for.treatment...., na.rm = TRUE))
Step 3
Goal: create a visualisation of one to three variables in your summary data.
To visualise this data we first loaded ggplot2:
library(ggplot2)
Then we created a scatter plot of average attendance vs median wait time:
ggplot(data = Hosp_wait_time,
       mapping = aes(x = avg_attendance,
                     y = median_wait_time)) +
  geom_point()
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).
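A likely explanation for the removed rows is that `median()` returns `NA` for any group containing a missing value unless `na.rm = TRUE` is set, so a facility with even one missing wait time ends up with a missing `median_wait_time`. The sketch below shows this on a made-up data frame `toy`; the column names `strict` and `lenient` are hypothetical.

```r
library(dplyr)

# Made-up wait times; facility 4 has one missing value.
toy <- data.frame(
  Facility.HHS.Code = c(1L, 1L, 4L, 4L),
  Median_wait = c(10, 20, 30, NA)
)

res <- toy %>%
  group_by(Facility.HHS.Code) %>%
  summarise(
    strict  = median(Median_wait),               # NA if any value in the group is NA
    lenient = median(Median_wait, na.rm = TRUE)  # ignores missing values
  )
print(res)
```

Whether to use `na.rm = TRUE` is a judgment call: it keeps more facilities on the plot, but the medians for those facilities are then based on fewer quarters or triage categories.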

This showed us a large outlier so we removed this using the code below:
filter(Hosp_wait_time, Facility.HHS.Code != 99999) # the code is an integer, so compare with a number
# A tibble: 104 × 4
Facility.HHS.Code avg_attendance median_wait_time No_non_wait
<int> <dbl> <dbl> <dbl>
1 1 4761. 40.2 22.6
2 4 10404 50.8 25.1
3 11 5747. 55.2 34.8
4 15 6389. 46.5 27.1
5 16 6463. 52.5 26.1
6 22 5290. 66.2 27.5
7 28 4911. 66.6 22.6
8 29 8901 51.4 28.6
9 30 5328. 63 24.8
10 32 7682. 68.1 26.7
# ℹ 94 more rows
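Before dropping a point it can help to confirm which row it is. One way, sketched here on a made-up summary table `toy_summary` (the attendance figure for code 99999 is invented for illustration), is to sort by the plotted variable and inspect the top row.

```r
library(dplyr)

# Made-up per-facility summary with one extreme value.
toy_summary <- data.frame(
  Facility.HHS.Code = c(1L, 4L, 99999L),
  avg_attendance = c(4761, 10404, 250000)
)

# Sort from largest to smallest and keep the first row.
top <- toy_summary %>%
  arrange(desc(avg_attendance)) %>%
  head(1)
print(top)

# Remove that facility by its code.
trimmed <- toy_summary %>% filter(Facility.HHS.Code != 99999)
```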
subset <- Hosp_wait_time %>% filter(Facility.HHS.Code != 99999)
We then reran the graph without the outlier and added labels and a trend line.
ggplot(data = subset,
       mapping = aes(x = avg_attendance,
                     y = median_wait_time)) +
  geom_point() +
  labs(title = "Does Average Attendance Number Affect Median Wait Time?",
       x = "Average Attendance",
       y = "Median Wait Time") +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 42 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).

With the adjusted data, we then produced two more scatter plots with trend lines, looking at other factors that may contribute to wait time and to patients leaving without treatment.
ggplot(data = subset,
       mapping = aes(x = median_wait_time,
                     y = No_non_wait)) +
  geom_point() +
  labs(title = "Does wait time influence Number of People Leaving without Treatment?",
       x = "Median Wait Time",
       y = "Number leaving without Treatment") +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 42 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).

ggplot(data = subset,
       mapping = aes(x = No_non_wait,
                     y = avg_attendance)) +
  geom_point() +
  labs(title = "Does Number Attending Hospital Influence Number of People Leaving without Treatment?",
       x = "Number leaving without Treatment",
       y = "Average Hospital Attendance Numbers") +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Linear Regression
Here we look at the number of people who leave without treatment as a function of attendance numbers.
First we filtered out rows with missing median_wait_time data, then created a linear model.
subset <- subset %>%
  filter(!is.na(median_wait_time)) # drop facilities whose median wait time is missing
main_model <- lm(No_non_wait ~ avg_attendance, data = subset)
We took the coefficients of the intercept and slope.
b0 <- main_model$coefficients[1] # intercept
b1 <- main_model$coefficients[2] # slope
Then we drew the linear regression line on the scatter plot, with labels.
library(ggplot2)
ggplot(subset,
       aes(x = avg_attendance,
           y = No_non_wait)) +
  geom_point() +
  labs(title = "Does Number Attending Hospital Influence Number of People Leaving without Treatment?",
       x = "Average Hospital Attendance Numbers",
       y = "Number leaving without Treatment") +
  geom_abline(intercept = b0, slope = b1)
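To make the intercept and slope concrete, here is a small sketch on made-up data where the true relationship is known: the response is built as exactly 2 + 3 × x, so the fitted coefficients should come back as 2 and 3. The data frame `toy` and model `toy_model` are hypothetical; only the column names mirror the report.

```r
# Made-up data with an exact linear relationship: y = 2 + 3 * x.
toy <- data.frame(avg_attendance = c(1, 2, 3, 4, 5))
toy$No_non_wait <- 2 + 3 * toy$avg_attendance

# Fit the model with a formula object rather than a string.
toy_model <- lm(No_non_wait ~ avg_attendance, data = toy)

# coef() returns a named vector, so coefficients can be pulled out by name
# instead of by position.
b0 <- coef(toy_model)[["(Intercept)"]]
b1 <- coef(toy_model)[["avg_attendance"]]
print(c(intercept = b0, slope = b1))
```

Extracting by name is a little safer than `[1]` and `[2]` if the model later gains extra predictors. On the real data, `summary(main_model)` would additionally report standard errors, p-values and R-squared, which help judge whether the fitted line is meaningful.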
Then we compared wait time to the number of people leaving.
model3 <- lm(median_wait_time ~ No_non_wait, data = subset)
b00 <- model3$coefficients[1] # intercept
b11 <- model3$coefficients[2] # slope
ggplot(subset,
       aes(x = No_non_wait,
           y = median_wait_time)) +
  geom_point() +
  labs(title = "Does Number of People Leaving without Treatment Influence Median Wait Time?",
       x = "Number leaving without Treatment",
       y = "Median Wait Time") +
  geom_abline(intercept = b00, slope = b11)
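As an alternative to fitting the model separately and passing the coefficients to `geom_abline()`, ggplot2 can fit and draw the same least-squares line itself with `geom_smooth(method = "lm")`. This is a sketch on made-up values, not the report's data; the data frame `toy` and plot object `p` are hypothetical.

```r
library(ggplot2)

# Made-up values standing in for the per-facility summary.
toy <- data.frame(
  No_non_wait = c(5, 10, 15, 20, 25),
  median_wait_time = c(12, 18, 31, 38, 52)
)

# method = "lm" fits y ~ x by least squares; se = FALSE hides the confidence band.
p <- ggplot(toy, aes(x = No_non_wait, y = median_wait_time)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Number leaving without Treatment",
       y = "Median Wait Time")
print(p)
```

The trade-off is that `geom_smooth()` does not expose the coefficients, so a separate `lm()` call is still needed when the intercept and slope themselves are of interest.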