##########################################################################
##########################################################################

This folder contains the new OCR-SSS model for the Chesapeake Bay sub-region.

###########################
File list and descriptions:

"CHES_OCR-SSS.1.12.Rdata"

This is the model itself as an R data object (extension .Rdata). It can only be loaded and used within the R environment. The model requires input data in the same format as the data it was trained with. The data format is a matrix of input satellite data from MODIS-Aqua with minor data transformations. The columns of the input data matrix are as follows:

LAT, LON, R412, R412/R547, R443/R547, R443, R488/R547, SST

The R in front of each number denotes remote sensing reflectance at the specified wavelength (nm). The three band ratios are computed per pixel: for R412/R547, R412 is divided by R547 and the result is included as an input, and likewise for R443/R547 and R488/R547.
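As an illustration only, the Python sketch below assembles one pixel's raw inputs in the column order listed above. The function name `build_input_row` and the choice to reject a non-positive R547 are assumptions for this example, not part of the R scripts, and the sample reflectance values are made up.

```python
# Column order expected by the model, as listed in this README.
COLUMNS = ["LAT", "LON", "R412", "R412/R547", "R443/R547", "R443", "R488/R547", "SST"]


def build_input_row(lat, lon, r412, r443, r488, r547, sst):
    """Hypothetical helper: return one pixel's raw (untransformed) inputs.

    The three band ratios are computed per pixel. Rejecting a non-positive
    R547 is an assumption of this sketch, since the ratios would be undefined.
    """
    if r547 <= 0:
        raise ValueError("R547 must be positive to form band ratios")
    return [lat, lon, r412, r412 / r547, r443 / r547, r443, r488 / r547, sst]


# Made-up example pixel (SST in degrees Celsius, longitude in degrees east).
row = build_input_row(lat=38.5, lon=-76.4, r412=0.002, r443=0.003,
                      r488=0.004, r547=0.005, sst=20.0)
print(dict(zip(COLUMNS, row)))
```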

Data transformations are necessary to make the input data suitable for the neural network model. The R package "nnet" requires all data to be normalized between 0 and 1. The following transformations were made to the data to comply with this requirement:

1. SST was converted to degrees Fahrenheit via the equation SST*(9/5)+32
2. Longitude was converted to the 0-360 scale by adding 360 to each value
3. All data was divided by 2500 (an arbitrarily large number) to scale it to between 0 and 1
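A minimal Python sketch of the three transformation steps, applied to one raw row in the column order given above. It assumes SST arrives in degrees Celsius and that longitudes are negative (western hemisphere, as in the Chesapeake Bay), so that adding 360 places them on the 0-360 scale. The name `transform_row` is hypothetical.

```python
SCALE = 2500.0  # the arbitrarily large divisor used for normalization


def transform_row(row):
    """Hypothetical helper: apply steps 1-3 to one raw row.

    Expected column order: LAT, LON, R412, R412/R547, R443/R547, R443, R488/R547, SST.
    """
    lat, lon, r412, r412_547, r443_547, r443, r488_547, sst_c = row
    sst_f = sst_c * (9 / 5) + 32  # step 1: Celsius to Fahrenheit
    lon_360 = lon + 360           # step 2: assumes a negative (western) longitude
    transformed = [lat, lon_360, r412, r412_547, r443_547, r443, r488_547, sst_f]
    return [value / SCALE for value in transformed]  # step 3: divide everything by 2500


normalized = transform_row([38.5, -76.4, 0.002, 0.4, 0.6, 0.003, 0.8, 20.0])
print(normalized)
```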

Lastly, the input data must be restricted to the bounds of the training data. This is the most important step in making the model work, since the model cannot be trusted to extrapolate outside of those bounds. However, this step discards much of the input data and therefore reduces coverage. An obvious solution is more training data that captures a wider range of the conditions seen in MODIS data. The ranges of the training data are included as a separate file, and an example R script for running the model is also included.
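The range restriction can be sketched as a per-column bounds check. This sketch assumes the training ranges are available as two lists of per-column minimums and maximums in normalized units, and that the bounds are inclusive; how "training_data_range.Rdata" actually stores them is not described here, and the function names and numbers are hypothetical.

```python
def within_training_range(row, mins, maxs):
    """Return True if every value lies inside its column's [min, max] bounds."""
    if not (len(row) == len(mins) == len(maxs)):
        raise ValueError("row, mins and maxs must have the same length")
    return all(lo <= value <= hi for value, lo, hi in zip(row, mins, maxs))


def restrict_to_training_range(rows, mins, maxs):
    """Keep only rows whose every column falls inside the training ranges."""
    return [row for row in rows if within_training_range(row, mins, maxs)]


# Toy two-column example with made-up normalized bounds.
mins = [0.015, 0.113]
maxs = [0.016, 0.114]
rows = [[0.0154, 0.1134], [0.0170, 0.1134], [0.0155, 0.1100]]
kept = restrict_to_training_range(rows, mins, maxs)
print(f"kept {len(kept)} of {len(rows)} rows")  # the coverage loss described above
```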

*** IMPORTANT NOTE: this is a very conservative model. These "flags" on the data (the training range restrictions) can be relaxed to produce a map with greater coverage.

"R_script_model_application.R"

This is an R script (text file with extension .R) with step-by-step procedures for running the model. It references most of the files included in this folder. The working directories in the script need to be changed to your own working directory. I have included some example MODIS satellite data from my directory so you can see the data transformations happening in R. Typing ls() at the R command prompt returns a list of the objects in your current workspace. To glance at an object, type head(object_name) and you will be shown the first few rows of the data it contains. This script outputs salinity to a text file with columns Lon, Lat, Sal. It also includes optional code to plot the salinity directly in R; this portion is commented out with # signs.
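The output step amounts to writing one row per pixel with Lon, Lat and Sal columns. The Python sketch below only illustrates that layout: the space separator, the header line, the number of decimal places and the function names are assumptions, and the actual R script may format the file differently.

```python
def format_salinity_table(records):
    """Hypothetical helper: format (lon, lat, sal) tuples as a space-separated table."""
    lines = ["Lon Lat Sal"]
    for lon, lat, sal in records:
        lines.append(f"{lon:.4f} {lat:.4f} {sal:.2f}")
    return "\n".join(lines) + "\n"


def write_salinity_table(path, records):
    """Write the formatted table to a text file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_salinity_table(records))


# Made-up example values.
text = format_salinity_table([(-76.4, 38.5, 15.234), (-76.3, 38.6, 16.0)])
print(text, end="")
```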

"R_script_OCR-SSS_model_building.R"

This is an R script with step-by-step procedures for building a neural network model in R. It is the same code used to construct the model provided. Many parts of this code are commented out with # signs because they save files to the directory and could overwrite the existing files. Note that the script will not work until you change the working directories within the code to your own.

"satellite-matched_in_situ_salinity_data.R"

This is a text file containing the results of the (lengthy) satellite matching process. It is the more-or-less raw data I used to train the models I have made, including this one. The model building script references this data and transforms it. This data could also be used to train your own models if desired.

"training_data.Rdata"

This is an R data object containing the exact training data I used to train this specific model. Unlike the previous file, it is already in the transformed, normalized format. Load it into R with the load() function. Once it is loaded, use ls() to see what the object is called in R, and head() to glance at its contents.

"training_salinity.Rdata"

Similar to the previous file, this is the salinity data associated with the training data for this model. It is in normalized units.

"training_data_range.Rdata"

Similar to the previous file, but more important for implementing the model, this R data object contains the ranges of the training data to which the input MODIS-Aqua data must be restricted. It is referenced by the "R_script_model_application.R" script. The ranges within the object are for the normalized data and must be multiplied by 2500 to see the actual ranges (SST will still be in degrees Fahrenheit and longitude on the 0-360 scale).
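A minimal sketch that inverts all three transformation steps, not just the division by 2500, so a normalized row (or a normalized min or max vector) can be read in degrees Celsius and negative longitudes. It assumes the same column order as the model inputs and that salinity was only divided by 2500; `to_physical_units` and `salinity_to_psu` are hypothetical names.

```python
SCALE = 2500.0


def to_physical_units(normalized_row):
    """Hypothetical helper: invert the three transformations for one row.

    Assumes the column order LAT, LON, R412, R412/R547, R443/R547, R443, R488/R547, SST.
    """
    values = [v * SCALE for v in normalized_row]  # undo step 3
    values[1] -= 360                              # undo step 2: back to negative longitude
    values[7] = (values[7] - 32) * 5 / 9          # undo step 1: Fahrenheit to Celsius
    return values


def salinity_to_psu(normalized_salinity):
    """Salinity only went through the division by 2500."""
    return normalized_salinity * SCALE


physical = to_physical_units([0.0154, 0.11344, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0272])
print(physical)
print(salinity_to_psu(0.006))
```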

"training_salinity_range.Rdata"

R data object containing the range of the training salinity data (in normalized units). It is not referenced by any of the scripts; it is merely included in case you are curious. Multiply by 2500 for real values.

"testing_data.Rdata"

R data object containing the data that was "unseen" by the model, used to test its performance. After the model was built in the "R_script_OCR-SSS_model_building.R" script, this data was input to the model and its output salinity was compared with the actual salinity associated with this data. The statistics of this comparison are saved in "model_statistics.Rdata". This data is in transformed, normalized units.

"testing_salinity.Rdata"

R data object containing the salinity data associated with "testing_data.Rdata". This was compared to the output of the model from the testing data. This data is in normalized units.

"output_salinity.Rdata"

R data object containing the output salinity from the testing data run of the model. It was directly compared to the testing salinity for the model statistics. 

"model_statistics.Rdata"

R data object containing the model statistics.
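The README does not say which statistics "model_statistics.Rdata" holds. As one plausible illustration, the sketch below computes the mean bias and root-mean-square error between model output and testing salinity, converting the differences from normalized units to PSU by multiplying by 2500. The choice of statistics and the function name are assumptions.

```python
import math

SCALE = 2500.0


def salinity_error_stats(predicted, observed):
    """Hypothetical helper: mean bias and RMSE in PSU from normalized salinities."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [(p - o) * SCALE for p, o in zip(predicted, observed)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"bias_psu": bias, "rmse_psu": rmse}


# Made-up normalized values corresponding to 10 and 20 PSU predicted, 11 and 18 PSU observed.
stats = salinity_error_stats([10 / SCALE, 20 / SCALE], [11 / SCALE, 18 / SCALE])
print(stats)
```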

"coastline.dat"

Text file of Lat, Lon positions of a high-resolution coastline for plotting salinity within the Chesapeake Bay. It is optional, but it is referenced in a commented-out section of the model application script "R_script_model_application.R".

"example_MODIS_files"

Folder containing NetCDF-4 (ncdf4) files of MODIS passes. These files are eight-day averages.

"CHES_EDA_A2005252190000_sal.png" and "CHES_EDA_A2007269184000_sal.png"

Output .png files from the commented-out part of the "R_script_model_application.R" script. They correspond to the first two files in "example_MODIS_files".


##################################################################################
##################################################################################

Other notes:

It is recommended to try the model on eight-day averaged MODIS data, since coverage from single-day passes is sparse due to cloud cover. On top of this, restricting the input to the training data ranges removes a lot of data.

This model is extremely conservative. For the salinity values it does return, you can say they are within ±1.12 PSU.

If you relax the flags (e.g. comment out the section where input data is restricted by the training ranges), the values within the ranges do not appear to change, but the areas of increased coverage should not have the same error of ±1.12 PSU applied to them.
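One way to relax the flags without losing track of which pixels are extrapolated is to keep every pixel but record whether it fell inside the training ranges, so the ±1.12 PSU figure is attached only to in-range pixels. This is a hypothetical Python sketch of that idea, assuming per-column minimum and maximum lists in normalized units with inclusive bounds; it is not what the provided R scripts do.

```python
def flag_pixels(rows, mins, maxs):
    """Hypothetical helper: return (row, in_range) pairs instead of dropping rows."""
    flagged = []
    for row in rows:
        in_range = all(lo <= v <= hi for v, lo, hi in zip(row, mins, maxs))
        flagged.append((row, in_range))
    return flagged


# Made-up normalized bounds and pixels.
mins = [0.015, 0.113]
maxs = [0.016, 0.114]
rows = [[0.0154, 0.1134], [0.0170, 0.1134]]
flags = flag_pixels(rows, mins, maxs)
n_trusted = sum(1 for _, ok in flags if ok)
print(f"{n_trusted} of {len(flags)} pixels inside training ranges")
```

All pixels would still be run through the model; only the in-range ones carry the stated error.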

As always, clouds and standard MODIS flags are an issue for data coverage.