Excess Heat Indices
Table of Contents
1 Introduction
2 Codes
2.1 ExcessHeatIndices-package.Rd
2.2 tests
2.3 EHF
2.3.1 test
2.3.2 R
2.3.3 man
3 CaseStudies
3.1 bom-downloads: auto-download-bureau-meteorology-diurnal-data
- We're looking at the health impacts of high temperatures at work
- we need to see the highest temperatures during working hours
- the BOM provides hourly data for download, but only about 3 days at a time
- so we build a script, set it on a schedule to run every day, download the data and collate the results
3.1.1 First, the BOM server URL structure
- The URLs are predictable; we just need the station ID, the state and a code for metro (9) or regional (8) (see the worked example after the table)
3.1.2 Station table
Station_ID | State | City_9_or_regional_8_ |
94774 | N | 9 |
95719 | N | 8 |
94768 | N | 9 |
94763 | N | 9 |
94767 | N | 9 |
94910 | N | 8 |
94929 | N | 8 |
95896 | N | 8 |
94693 | N | 8 |
94691 | N | 8 |
95677 | S | 9 |
94675 | S | 9 |
94672 | S | 9 |
94866 | V | 9 |
95867 | V | 9 |
94868 | V | 9 |
94875 | V | 8 |
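To make the pattern concrete, here is how one row of the table maps to a download URL. This is a small illustrative snippet, not part of the scripts below; the full loop over all stations is in bom_download.r.

# values taken from the third data row of the table above
station_id <- 94768
state <- "N"
city_or_regional <- 9  # 9 = metro/city, 8 = regional
url <- paste0("http://www.bom.gov.au/fwo/ID", state, "60", city_or_regional,
              "01/ID", state, "60", city_or_regional,
              "01.", station_id, ".axf")
url
# [1] "http://www.bom.gov.au/fwo/IDN60901/IDN60901.94768.axf"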
- now create a script called "bom_download.r"
- it takes the station details and pastes them into the URLs
- downloads the files
- stores them in a directory for each day's downloads
3.1.3 R Code: bom_download.r
filename = "~/data/ExcessHeatIndices/inst/doc/weather_stations.csv"
output_directory = "~/bom-downloads"
setwd(output_directory)

# read the station details and build the URL for each station
urls <- read.csv(filename)
urls_list <- paste(sep = "",
  "http://www.bom.gov.au/fwo/ID", urls$State, "60", urls$City_9_or_regional_8_,
  "01/ID", urls$State, "60", urls$City_9_or_regional_8_,
  "01.", urls$Station_ID, ".axf")

# store today's files in a directory named by the download date
output_directory <- file.path(output_directory, Sys.Date())
dir.create(output_directory)

for(url in urls_list)
{
  output_file <- file.path(output_directory, basename(url))
  download.file(url, output_file, mode = "wb")
}
print("SUCCESS")
- Now the data can be combined
- we clean up the header and the extraneous extra line at the bottom of each file
3.1.4 R Code: bom_collation.r
# this takes data in the dated directories created by bom_download.r
# first get the list of downloaded files
filelist <- dir(pattern = "axf", recursive = T)
filelist

# next restrict to directories for days we haven't collated yet
if(file.exists("complete_dataset.csv"))
{
  complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = F)
  #str(complete_data)
  last_collated <- max(as.Date(complete_data$date_downloaded))
  #max(complete_data$local_hrmin)
  days_downloaded <- dirname(filelist)
  filelist <- filelist[which(as.Date(days_downloaded) > as.Date(last_collated))]
}

# for these, collate them into the complete file
for(f in filelist)
{
  #f <- filelist[2]
  print(f)
  # skip the header lines and drop the trailing extra line
  fin <- read.csv(f, colClasses = c("local_date_time_full.80." = "character"),
                  stringsAsFactors = F, skip = 19)
  fin <- fin[1:(nrow(fin) - 1),]
  fin$date_downloaded <- dirname(f)
  # split the local date-time string into its parts
  fin$local_year <- substr(fin$local_date_time_full.80., 1, 4)
  fin$local_month <- substr(fin$local_date_time_full.80., 5, 6)
  fin$local_day <- substr(fin$local_date_time_full.80., 7, 8)
  fin$local_hrmin <- substr(fin$local_date_time_full.80., 9, 12)
  fin$local_date <- paste(fin$local_year, fin$local_month, fin$local_day, sep = "-")

  if(file.exists("complete_dataset.csv"))
  {
    write.table(fin, "complete_dataset.csv", row.names = F, sep = ",",
                append = T, col.names = F)
  } else {
    write.table(fin, "complete_dataset.csv", row.names = F, sep = ",")
  }
}
- so now let's automate the process
- make a BAT file
3.1.5 BAT file (Windows)
"C:\Program Files\R\R-2.15.2\bin\Rscript.exe" "~\bom-downloads\bom_download.r"
- add this BAT file to the Scheduled Tasks in your Control Panel
- use cron for a Linux version (a minimal example is sketched below)
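On Linux, a crontab entry can do the same job. This is a minimal sketch under a couple of assumptions: Rscript is on the PATH and the script sits at /home/youruser/bom-downloads/bom_download.r (a placeholder path; adjust the time and path to suit).

# edit the crontab with `crontab -e` and add:
# run the download script every day at 09:00
0 9 * * * Rscript /home/youruser/bom-downloads/bom_download.r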
3.1.6 check the data
#### name: check the data ####
require(plyr)
setwd("~/bom-downloads")
source("bom_download.r")
dir()
source("bom_collation.r")

complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = F)
str(complete_data)

# Quick and dirty de-duplication
table(complete_data$name.80.)
qc <- subset(complete_data, name.80. == "Broken Hill Airport")
qc <- ddply(qc, "local_date_time_full.80.", summarise,
            apparent_temp = mean(apparent_t))
names(qc)

png("qc-diurnal-plot.png")
with(qc, plot(apparent_temp, type = "l"))
dev.off()
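The ddply() call above averages duplicate readings for a single station. For the full dataset, a simple de-duplication pass could drop repeated station/time combinations before analysis. This is a minimal sketch, not part of the original scripts; it assumes the combined file is uniquely keyed by the station name (name.80.) and the observation timestamp (local_date_time_full.80.), and the output file name is just illustrative.

# drop duplicate observations of the same station at the same local time,
# keeping the first occurrence (rows are appended in download-date order)
complete_data <- read.csv("complete_dataset.csv", stringsAsFactors = FALSE)
dedup_key <- paste(complete_data$name.80., complete_data$local_date_time_full.80.)
deduped <- complete_data[!duplicated(dedup_key), ]
write.csv(deduped, "deduplicated_dataset.csv", row.names = FALSE)  # hypothetical output file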
3.1.7 Conclusions
- watch the data roll in
- each daily download contains about 3 days of observations
- meaning duplicates will be frequent, so we need a script to de-duplicate (a starting point is sketched in 3.1.6)
- cheers!