Project Hospital Report

25Winter

data: hospital_data.csv

Author

Casey Atkins & Jess

Published

July 1, 2025

Hospital Wait Time Report

Step 0

First, we selected a dataset to study: the Queensland (QLD) hospital data.

Step 1

Goal: identify which variables are discrete (categorical) and which are continuous.

To do this, we first loaded our data:

dataset <- read.csv("../../../../data/hospital_data.csv")

Second, we viewed the column names of our data:

names(dataset)

 [1] "X.1"                                                  
 [2] "X"                                                    
 [3] "Facility.HHS.Code"                                    
 [4] "Facility.HHS.Desc"                                    
 [5] "Last.Month.in.QTR"                                    
 [6] "Triage.Category"                                      
 [7] "Number.of.Attendances"                                
 [8] "Variation.in.the.number.of.attendances...."           
 [9] "Patients.Seen.within.clinically.recommended.times...."
[10] "Median.Waiting.time.to.treatment..minutes."           
[11] "Patients.who.did.not.wait.for.treatment...."          
[12] "Patients.admitted.from.the.Emergency.Department...."  
[13] "Admissions.to.hospital.within.4.hours...."
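The dotted column names come from read.csv(), which by default (check.names = TRUE) passes header names through make.names(), turning spaces and symbols such as "(", "%" and ")" into dots. The sketch below assumes the original CSV headers looked like "Patients who did not wait for treatment (%)". That is an assumption, since the raw header row is not shown in this report. It shows how the names printed above would be produced.

```r
# Assumed original CSV headers (not confirmed by the report)
raw_headers <- c(
  "Patients who did not wait for treatment (%)",
  "Median Waiting time to treatment (minutes)"
)

# read.csv() applies make.names() to headers when check.names = TRUE (the default)
print(make.names(raw_headers))
```

Passing check.names = FALSE to read.csv() would keep the original headers, at the cost of needing backticks to refer to them in R code.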

Third, we looked at the structure of the data to see the type of each column:

str(dataset)

'data.frame':   9442 obs. of  13 variables:
 $ X.1                                                  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ X                                                    : int  0 1 2 3 4 5 6 7 8 9 ...
 $ Facility.HHS.Code                                    : int  1 1 1 1 1 1 4 4 4 4 ...
 $ Facility.HHS.Desc                                    : chr  "Mater Adult" "Mater Adult" "Mater Adult" "Mater Adult" ...
 $ Last.Month.in.QTR                                    : chr  "Jun-24" "Jun-24" "Jun-24" "Jun-24" ...
 $ Triage.Category                                      : chr  "1" "2" "3" "4" ...
 $ Number.of.Attendances                                : int  58 2998 6035 3767 513 13371 215 4986 13663 9169 ...
 $ Variation.in.the.number.of.attendances....           : num  -17.1 17.4 -4.4 -8 -20.8 -2.3 -0.5 -12.9 0.7 -8.1 ...
 $ Patients.Seen.within.clinically.recommended.times....: num  100 37.6 54.2 78.9 94.8 58.7 100 73.2 61.1 73.9 ...
 $ Median.Waiting.time.to.treatment..minutes.           : num  0 16 27 28 29 25 0 7 22 28 ...
 $ Patients.who.did.not.wait.for.treatment....          : num  0 1.2 3.1 6.4 15.8 4.1 0 0.2 2.5 4.5 ...
 $ Patients.admitted.from.the.Emergency.Department....  : num  72.4 44.9 40.8 20.8 6 34.9 71.2 59.5 41.5 13.8 ...
 $ Admissions.to.hospital.within.4.hours....            : num  31 22.2 26.1 39.4 41.9 27.4 43.1 44.9 34.1 40.3 ...
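Step 1's goal was to separate discrete from continuous variables, but str() only reports how each column is stored. A minimal sketch of turning storage types into a first-pass label is below. It assumes that character, factor and integer columns are discrete and that double (num) columns are continuous. This is a rule of thumb, not a definition, and classify_columns is a hypothetical helper name. The toy data frame reuses the first three rows printed by str() above.

```r
# Hypothetical helper: first-pass discrete/continuous label from storage type.
# Assumption: character, factor and integer columns are discrete; doubles are continuous.
classify_columns <- function(df) {
  vapply(df, function(col) {
    if (is.character(col) || is.factor(col) || is.integer(col)) "discrete" else "continuous"
  }, character(1))
}

# Toy data built from the first three rows printed by str() above
toy <- data.frame(
  Facility.HHS.Code = c(1L, 1L, 1L),
  Facility.HHS.Desc = c("Mater Adult", "Mater Adult", "Mater Adult"),
  Triage.Category = c("1", "2", "3"),
  Number.of.Attendances = c(58L, 2998L, 6035L),
  Median.Waiting.time.to.treatment..minutes. = c(0, 16, 27)
)

print(classify_columns(toy))
```

Applied to the full dataset, this rule would label the three chr columns and Number.of.Attendances as discrete and the six num columns as continuous. It would also label X.1, X and Facility.HHS.Code as discrete. That is reasonable only because they are identifiers rather than measurements, so they should be treated as categorical (or ignored) rather than averaged.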

Fourth, we created a summary of the dataset:

summary(dataset)

      X.1             X         Facility.HHS.Code Facility.HHS.Desc 
 Min.   :   1   Min.   :  0.0   Min.   :    1     Length:9442       
 1st Qu.:2361   1st Qu.:157.0   1st Qu.:   68     Class :character  
 Median :4722   Median :314.0   Median :  117     Mode  :character  
 Mean   :4722   Mean   :314.2   Mean   : 1091                       
 3rd Qu.:7082   3rd Qu.:472.0   3rd Qu.:  192                       
 Max.   :9442   Max.   :629.0   Max.   :99999                       
                                                                    
 Last.Month.in.QTR  Triage.Category    Number.of.Attendances
 Length:9442        Length:9442        Min.   :     0       
 Class :character   Class :character   1st Qu.:    84       
 Mode  :character   Mode  :character   Median :   375       
                                       Mean   :  3754       
                                       3rd Qu.:  1501       
                                       Max.   :640258       
                                                            
 Variation.in.the.number.of.attendances....
 Min.   :-100.000                          
 1st Qu.:  -6.313                          
 Median :   5.000                          
 Mean   :  13.734                          
 3rd Qu.:  22.095                          
 Max.   :1000.000                          
 NA's   :724                               
 Patients.Seen.within.clinically.recommended.times....
 Min.   :  0.00                                       
 1st Qu.:  0.00                                       
 Median : 19.00                                       
 Mean   : 40.49                                       
 3rd Qu.: 92.00                                       
 Max.   :100.00                                       
 NA's   :235                                          
 Median.Waiting.time.to.treatment..minutes.
 Min.   :  0.00                            
 1st Qu.:  6.00                            
 Median : 74.00                            
 Mean   : 56.76                            
 3rd Qu.: 98.28                            
 Max.   :100.00                            
 NA's   :168                               
 Patients.who.did.not.wait.for.treatment....
 Min.   :  0.00                             
 1st Qu.:  2.00                             
 Median :  7.00                             
 Mean   : 17.36                             
 3rd Qu.: 26.59                             
 Max.   :100.00                             
 NA's   :1000                               
 Patients.admitted.from.the.Emergency.Department....
 Min.   :  0.00                                     
 1st Qu.: 21.15                                     
 Median : 50.00                                     
 Mean   : 52.34                                     
 3rd Qu.: 87.50                                     
 Max.   :100.00                                     
 NA's   :669                                        
 Admissions.to.hospital.within.4.hours....
 Min.   :  0.00                           
 1st Qu.: 59.00                           
 Median : 86.00                           
 Mean   : 76.48                           
 3rd Qu.: 96.97                           
 Max.   :100.00                           
 NA's   :468
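The NA's rows in the summary are spread across several columns. A compact way to list missing values per column is colSums(is.na(...)). The sketch below runs it on a small hypothetical data frame (values partly made up for illustration) so that it runs without the CSV. On the real data, colSums(is.na(dataset)) gives the same counts as the NA's rows above for the numeric columns.

```r
# Hypothetical toy data with a few missing values (illustrative only)
toy <- data.frame(
  Number.of.Attendances = c(58L, 2998L, 6035L, 3767L),
  Median.Waiting.time.to.treatment..minutes. = c(0, NA, 27, 28),
  Patients.who.did.not.wait.for.treatment.... = c(0, 1.2, NA, NA)
)

# Count missing values in every column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the columns that actually have missing values
print(na_counts[na_counts > 0])
```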

Fifth, we created a summary of a specific column, in this case the median waiting time to treatment (minutes):

summary(dataset$Median.Waiting.time.to.treatment..minutes.)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   0.00    6.00   74.00   56.76   98.28  100.00     168

Step 2

Goal: filter by a condition, or group by and aggregate over a particular variable. The condition we chose to filter by was the hospital facility name, Mater Adult. To do this, we loaded dplyr and created a filtered copy of the data called “subset”, which contains only the rows for this facility.

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

subset <- dataset %>% filter(Facility.HHS.Desc == "Mater Adult")

To group the data, we made a copy of the dataset called aggregated and grouped it by facility code (the grouped result below is printed, not saved):

aggregated=dataset
aggregated %>% group_by(Facility.HHS.Code)

# A tibble: 9,442 × 13
# Groups:   Facility.HHS.Code [105]
     X.1     X Facility.HHS.Code Facility.HHS.Desc Last.Month.in.QTR
   <int> <int>             <int> <chr>             <chr>            
 1     1     0                 1 Mater Adult       Jun-24           
 2     2     1                 1 Mater Adult       Jun-24           
 3     3     2                 1 Mater Adult       Jun-24           
 4     4     3                 1 Mater Adult       Jun-24           
 5     5     4                 1 Mater Adult       Jun-24           
 6     6     5                 1 Mater Adult       Jun-24           
 7     7     6                 4 Prince Charles    Jun-24           
 8     8     7                 4 Prince Charles    Jun-24           
 9     9     8                 4 Prince Charles    Jun-24           
10    10     9                 4 Prince Charles    Jun-24           
# ℹ 9,432 more rows
# ℹ 8 more variables: Triage.Category <chr>, Number.of.Attendances <int>,
#   Variation.in.the.number.of.attendances.... <dbl>,
#   Patients.Seen.within.clinically.recommended.times.... <dbl>,
#   Median.Waiting.time.to.treatment..minutes. <dbl>,
#   Patients.who.did.not.wait.for.treatment.... <dbl>,
#   Patients.admitted.from.the.Emergency.Department.... <dbl>, …
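The printed result shows that group_by() on its own does not aggregate anything: the data still has all 9,442 rows, and the only change is the "Groups: Facility.HHS.Code [105]" header. The aggregation happens when a verb such as summarise() is applied to the grouped data. A minimal sketch on a toy data frame whose values are taken from the first rows printed by str() above:

```r
library(dplyr)

# Toy data: attendances for facility codes 1 and 4, from the str() output above
toy <- data.frame(
  Facility.HHS.Code = c(1L, 1L, 4L, 4L, 4L),
  Number.of.Attendances = c(58L, 2998L, 215L, 4986L, 13663L)
)

grouped <- toy %>% group_by(Facility.HHS.Code)

nrow(grouped)      # still 5 rows: grouping adds metadata only
n_groups(grouped)  # 2 groups

# summarise() collapses each group to a single row
grouped_summary <- grouped %>% summarise(avg_attendance = mean(Number.of.Attendances))
grouped_summary
```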

We then had to rename the median wait time column to Median_wait because of some errors (removed for reruns):

aggregated <- rename(aggregated, "Median_wait" = "Median.Waiting.time.to.treatment..minutes.")
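One likely cause of rerun errors with rename() is that running it a second time fails because the old column name no longer exists. This is an assumption, since the error text was removed. A rerun-safe sketch uses a lookup vector (new name = old name) wrapped in any_of(), which dplyr's rename() accepts and which skips old names that are not present. The toy data frame is hypothetical.

```r
library(dplyr)

# Lookup vector: new name = old name
lookup <- c(Median_wait = "Median.Waiting.time.to.treatment..minutes.")

# Hypothetical toy data with the original column name
toy <- data.frame(Median.Waiting.time.to.treatment..minutes. = c(0, 16, 27))

# any_of() skips old names that are missing, so repeating the call does not error
renamed <- rename(toy, any_of(lookup))
renamed_again <- rename(renamed, any_of(lookup))

names(renamed_again)
```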

Then we grouped by facility and summarised, for each facility, the mean attendance, the median of the median wait times, and the mean percentage of patients who did not wait for treatment. We also had to remove missing data (na.rm = TRUE) from the patients-who-did-not-wait column.

Hosp_wait_time <- aggregated %>%  
  group_by(Facility.HHS.Code) %>%  
  summarise(avg_attendance = mean(Number.of.Attendances),
            median_wait_time = median(Median_wait),
            No_non_wait = mean(Patients.who.did.not.wait.for.treatment....,na.rm=TRUE))
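In this code na.rm = TRUE is passed to mean() for the did-not-wait column but not to median(), so any facility with at least one missing median wait time gets NA for median_wait_time. Because avg_attendance has no missing values (the summary of Number.of.Attendances shows no NA's), these NA medians are the rows that the scatter plots below report as removed. The sketch shows the difference on a small hypothetical vector of wait times.

```r
# Median wait times for one hypothetical facility, with one missing value
waits <- c(16, 27, NA, 28)

median(waits)               # NA: a single missing value propagates
median(waits, na.rm = TRUE) # 27: median of the non-missing values
```

In the pipeline, the equivalent change would be median_wait_time = median(Median_wait, na.rm = TRUE); the outputs shown in this report were produced without it. Whether to drop missing values is an analysis choice rather than an automatic fix.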

Step 3

Goal: create a visualisation of one to three variables in the summary data. To visualise the data, we first loaded ggplot2:

library(ggplot2)

We then created a scatter plot of average attendance against median wait time:

ggplot(data = Hosp_wait_time, 
       mapping= aes(x = avg_attendance,
       y = median_wait_time)) +
  geom_point()

Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).

The plot showed one large outlier (facility code 99999), so we removed it using the code below:

filter(Hosp_wait_time, Facility.HHS.Code != 99999)

# A tibble: 104 × 4
   Facility.HHS.Code avg_attendance median_wait_time No_non_wait
               <int>          <dbl>            <dbl>       <dbl>
 1                 1          4761.             40.2        22.6
 2                 4         10404              50.8        25.1
 3                11          5747.             55.2        34.8
 4                15          6389.             46.5        27.1
 5                16          6463.             52.5        26.1
 6                22          5290.             66.2        27.5
 7                28          4911.             66.6        22.6
 8                29          8901              51.4        28.6
 9                30          5328.             63          24.8
10                32          7682.             68.1        26.7
# ℹ 94 more rows

subset <- Hosp_wait_time %>% filter(Facility.HHS.Code != 99999)
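Rather than reading the outlier off the plot, it can also be located by sorting on the variable that looked extreme. A minimal sketch with dplyr's slice_max() on a hypothetical summary table: the first two rows use rounded values from the table above, and the attendance for the 99999 row is invented for illustration.

```r
library(dplyr)

# Hypothetical summary table; the 99999 row's value is invented for illustration
toy_summary <- data.frame(
  Facility.HHS.Code = c(1L, 4L, 99999L),
  avg_attendance = c(4761, 10404, 500000)
)

# Show the facility with the largest average attendance
top <- toy_summary %>% slice_max(avg_attendance, n = 1)
top

# Drop it, comparing the integer code with a number rather than the string "99999"
toy_subset <- toy_summary %>% filter(Facility.HHS.Code != 99999)
```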

We then reran the graph without the outlier and added labels and a trend line.

ggplot(data = subset, 
       mapping = aes(x = avg_attendance,
                     y = median_wait_time)) +
  geom_point() +
  labs(title = "Does Average Attendance Affect Median Wait Time?",
       x = "Average Attendance",
       y = "Median Wait Time (minutes)") +
  geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 42 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).

Using the filtered data, we also made two more scatter plots with trend lines, looking at other factors related to wait time and to patients leaving without treatment.

ggplot(data = subset, 
       mapping = aes(x = median_wait_time,
                     y = No_non_wait)) +
  geom_point() +
  labs(title = "Does Wait Time Influence the Share of People Leaving without Treatment?",
       x = "Median Wait Time (minutes)",
       y = "Patients Leaving without Treatment (%)") +
  geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Warning: Removed 42 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 42 rows containing missing values or values outside the scale range
(`geom_point()`).

ggplot(data = subset, 
       mapping = aes(x = avg_attendance,
                     y = No_non_wait)) +
  geom_point() +
  labs(title = "Does Hospital Attendance Influence the Share of People Leaving without Treatment?",
       x = "Average Hospital Attendance",
       y = "Patients Leaving without Treatment (%)") +
  geom_smooth()

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Linear Regression

We looked at whether the share of people who leave without treatment depends on attendance numbers.

First, we filtered out rows with missing median_wait_time values, then fitted a linear model:

# Drop facilities whose median wait time is NA
subset <- subset %>%
  filter(!is.na(median_wait_time))
main_model <- lm(No_non_wait ~ avg_attendance, data = subset)
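lm() returns more than the coefficients. The sketch below pulls the estimate, standard error, t value and p-value for each coefficient, plus the model's R², from summary(), which helps judge whether a fitted line says anything meaningful. The toy data are hypothetical and simulated with a fixed seed; the column names only mirror the report's.

```r
set.seed(1)
# Hypothetical simulated data standing in for the facility-level summary
toy <- data.frame(avg_attendance = runif(30, 1000, 10000))
toy$No_non_wait <- 10 + 0.001 * toy$avg_attendance + rnorm(30, sd = 2)

fit <- lm(No_non_wait ~ avg_attendance, data = toy)
fit_summary <- summary(fit)

# Estimate, standard error, t value and p-value for each coefficient
coef(fit_summary)

# Share of the variance in No_non_wait explained by avg_attendance
fit_summary$r.squared
```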

We then extracted the intercept and slope coefficients:

b0 <- main_model$coefficients[1] #intercept
b1 <- main_model$coefficients[2] #slope
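For a single predictor, the least-squares coefficients have a closed form: slope b1 = cov(x, y) / var(x) and intercept b0 = mean(y) - b1 * mean(x). The sketch checks this against coef() on hypothetical toy data, and also shows extracting coefficients by name, which is more explicit than indexing by position.

```r
# Hypothetical toy data (x = attendance, y = % who did not wait)
x <- c(1000, 2500, 4000, 5500, 7000)
y <- c(12, 15, 14, 19, 21)

fit <- lm(y ~ x)

# Extract coefficients by name rather than by position
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

# Closed-form least-squares estimates for one predictor
b1_manual <- cov(x, y) / var(x)
b0_manual <- mean(y) - b1_manual * mean(x)

all.equal(c(b0, b1), c(b0_manual, b1_manual))
```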

Then we added the regression line to a labelled scatter plot:

library(ggplot2)
ggplot(subset,
       aes(x = avg_attendance,
           y = No_non_wait)) +
  geom_point() +
  labs(title = "Does Hospital Attendance Influence the Share of People Leaving without Treatment?",
       x = "Average Hospital Attendance",
       y = "Patients Leaving without Treatment (%)") +
  geom_abline(intercept = b0, slope = b1)
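Computing b0 and b1 by hand and passing them to geom_abline() works. ggplot2 can also fit and draw the same least-squares line itself with geom_smooth(method = "lm"), which adds a confidence band unless se = FALSE. A sketch on hypothetical toy data, checking that the line ggplot2 draws matches the lm() coefficients:

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(
  avg_attendance = c(1000, 2500, 4000, 5500, 7000),
  No_non_wait = c(12, 15, 14, 19, 21)
)

p <- ggplot(toy, aes(x = avg_attendance, y = No_non_wait)) +
  geom_point() +
  # Stating method and formula explicitly also suppresses the geom_smooth() message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 holds the points along the fitted line that ggplot2 draws
line <- ggplot_build(p)$data[[2]]
fit <- lm(No_non_wait ~ avg_attendance, data = toy)
all.equal(line$y, unname(coef(fit)[1] + coef(fit)[2] * line$x))
```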

We then compared median wait time with the share of people leaving without treatment.

model3 <- lm(median_wait_time ~ No_non_wait, data = subset)
b00 <- model3$coefficients[1] #intercept
b11 <- model3$coefficients[2] #slope
ggplot(subset,
       aes(x = No_non_wait,
           y = median_wait_time)) +
  geom_point() +
  labs(title = "Does the Share of People Leaving without Treatment Influence Median Wait Time?",
       x = "Patients Leaving without Treatment (%)",
       y = "Median Wait Time (minutes)") +
  geom_abline(intercept = b00, slope = b11)
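In this model median_wait_time is the response and No_non_wait the predictor, which is the reverse of the earlier question of whether wait time influences people leaving. Swapping response and predictor changes the fitted slope: the slope of y on x and the slope of x on y multiply to r², the squared correlation, so the two fits give the same line only when the correlation is perfect. A sketch on hypothetical toy data:

```r
# Hypothetical toy data (wait = median wait time, left = % who did not wait)
left <- c(20, 22, 25, 27, 30, 35)
wait <- c(40, 48, 50, 61, 66, 70)

slope_wait_on_left <- coef(lm(wait ~ left))[["left"]]
slope_left_on_wait <- coef(lm(left ~ wait))[["wait"]]

# The two slopes are not reciprocals; their product equals r squared
slope_wait_on_left * slope_left_on_wait
cor(wait, left)^2
```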