Structure of input data

## 
! {RMODFLOW} is still in its experimental lifecycle stage.
## 
! Use at your own risk, and submit issues here:
## 
! <https://github.com/rogiersbart/RMODFLOW/issues>

Input for MODFLOW models is read from user/GUI-generated files. Typically, every package has a separate file and these are accessed and read on a line-by-line basis. The type, number and order of variables being read are strictly predefined for every file and are documented in the MODFLOW user-manual and online help. The MODFLOW source code is almost entirely written in FORTRAN and has evolved over more than three decades. As such, the capabilities of reading input have evolved as well, maintaining backward compatibility. Initially, only fixed-format input could be read, where every value has a predefined amount of characters. In later versions, the less strict free-format could be used. Because of these rigid rules, writing MODFLOW input files by hand can be a cumbersome task. This is one of the reasons to use a GUI or script-based library such as RMODFLOW.

Input for MODFLOW packages can generally be thought of in three forms:

Scalar values
Array-directed input
List-directed input

1. Scalar values

The first type consists of one or more values on a single line that are read by MODFLOW. These are mostly integers or doubles. For example, the dis file contains the number of rows, columns, layers and stress periods, all as integers. RMODFLOW treats these type of values as scalars. It uses the internal functions rmfi_parse_variables and rmfi_write_variables to read and write these values, respectively. By default, the free-format is used (format = 'free' argument) which simply reads every value on the line. If format = 'fixed', 10-character fields are read for n values. Missing values are assigned zero. When writing or creating, it is advised to consistently use the free-format (default). If you are having trouble reading in a file and don’t know if free-format is being used, try passing the format = 'fixed' argument to the call of the read function. There are no explicit create functions for scalars; they are simply assigned through arguments in the function calls to the rmf_create_* functions, e.g. rmf_create_dis(nrow = 42, ncol = 42, nlay = 3).

2. Array-directed input

Array-directed input represents continuous variables over the MODFLOW grid such as the model elevation (dis$top) or hydraulic conductivity (e.g. lpf$hk). MODFLOW output often comes in the form of arrays as well, e.g. simulated heads or drawdowns. In RMODFLOW, MODFLOW arrays are handled by the rmf_array classes and corresponding S3 methods. There are three rmf_array classes: rmf_2d_array, rmf_3d_array and rmf_4d_array. Arrays that are returned from rmf_read_* functions are always of one of these classes. Arrays passed to rmf_write_* functions are also of these classes.

Creating

To create a rmf_array, use the rmf_create_array() function. When this function is called on an existing matrix or array, it simply sets the class depending on the dim argument (2D, 3D or 4D). Alternatively, you can supply a dim argument yourself:

ar <- matrix(42, nrow = 4, ncol = 5)
rmf_create_array(ar)
#> RMODFLOW 2d array with 4 rows and 5 columns, representing the i & j dimensions. 
#> Not representing stress period data 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   42   42   42   42   42
#> [2,]   42   42   42   42   42
#> [3,]   42   42   42   42   42
#> [4,]   42   42   42   42   42

rmf_create_array(42, dim = c(4, 5))
#> RMODFLOW 2d array with 4 rows and 5 columns, representing the i & j dimensions. 
#> Not representing stress period data 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   42   42   42   42   42
#> [2,]   42   42   42   42   42
#> [3,]   42   42   42   42   42
#> [4,]   42   42   42   42   42

Besides setting the class, rmf_create_array also adds a dimlabel attribute which holds the names of MODFLOW dimensions the array represents. By default, these are i, j, k & l -representing rows, columns, layers and time steps, respectively- for a 4D array, i, j & k for a 3D array and i & j for a 2D array. These can also be set by supplying a dimlabel argument:

rmf_create_array(10, dim = c(2, 5), dimlabels = c('rows', 'columns'))
#> RMODFLOW 2d array with 2 rows and 5 columns, representing the rows & columns dimensions. 
#> Not representing stress period data 
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   10   10   10   10   10
#> [2,]   10   10   10   10   10

A rmf_array is at least 2D, possibly 3D or 4D. When creating, it is implicitely assumed that the first dimension of the rmf_array represents rows, the second dimension columns, the optional third dimension layers and the optional fourth dimension time-steps. So if you have a one-layered model, a rmf_4d_array would have a third dimension of length one:

rmf_create_array("my_value", dim = c(2, 2, 1, 3))
#> RMODFLOW 4d array with 2 rows, 2 columns, 1 layer and 3 timesteps, representing the i, j, k & l dimensions. 
#> Not representing stress period data 
#> , , 1, 1
#> 
#>      [,1]       [,2]      
#> [1,] "my_value" "my_value"
#> [2,] "my_value" "my_value"
#> 
#> , , 1, 2
#> 
#>      [,1]       [,2]      
#> [1,] "my_value" "my_value"
#> [2,] "my_value" "my_value"
#> 
#> , , 1, 3
#> 
#>      [,1]       [,2]      
#> [1,] "my_value" "my_value"
#> [2,] "my_value" "my_value"

A 1D rmf_array does therefore not exist; it is simply an atomic vector. Calling rmf_create_array on an atomic vector returns an error:

rmf_create_array(rep(10, 4))
#> Error: Please provide 2d matrix, or 2d, 3d or 4d array.

In base MODFLOW-2005, 1D arrays are only used to assign the DELR and DELC variables (representing widths along rows and columns). It makes little sense to do this in R with verbose calls such as matrix(100, nrow = 42, ncol = 1) or array(100, dim = c(42, 1)) instead of a simple rep(100, 42).

Subsetting

A rmf_array behaves like a common array in R with two exceptions related to subsetting.

Unlike R arrays, drop = FALSE by default for rmf_arrays. Subsetting a rmf_array will therefore not drop dimensions with length 1 or collapse into the lowest possible dimension unless the result is 1D. This behaviour can be overwritten by setting drop = TRUE.
Unlike R arrays, subsetting a rmf_array will not drop the class. As such, subsetting a rmf_array will always return a rmf_array as long as it has a dim argument, i.e. the subsetted result is 2D, 3D or 4D.

# R array drops dims of length 1
ar <- array(1, dim = c(2,2,1,3))
dim(ar[,,,2])
#> [1] 2 2

# rmf_array keeps dims of length 1
rmf_ar <- rmf_create_array(ar)
dim(rmf_ar[,,,2])
#> [1] 2 2 1

# collapses into atomic vector
class(ar[1,1,,])
#> [1] "numeric"

# remains 4D array since 4th dimension is not subsetted
class(rmf_ar[1,1,,])
#> [1] "rmf_4d_array"

# collapses
class(rmf_ar[1,1,,, drop = TRUE])
#> [1] "numeric"

# drops class
ar <- structure(array(12, dim = c(4, 4, 2)), class = 'not_rmf')
class(ar[,,1])
#> [1] "matrix" "array"

# subsetting a rmf_array always returns a rmf_array...
class(rmf_ar)
#> [1] "rmf_4d_array"
class(rmf_ar[,,1,1:2])
#> [1] "rmf_4d_array"
class(rmf_ar[,,,1])
#> [1] "rmf_3d_array"
class(rmf_ar[,,1,1])
#> [1] "rmf_2d_array"

# ... unless return is 1D
class(rmf_ar[,1,1,1])
#> [1] "numeric"

This creates a consistency when using rmf_arrays. In the example with a one-layered model, classic R arrays would drop the unused 3th dimension. This would result in a 3D array where the third dimension suddenly represents time-steps instead of layers. Furthermore, as long as a subsetted rmf_array remains in fact an array (has a dim argument), it should still be one of the rmf_array classes. For example, extracting a layer from a rmf_3d_array should return a rmf_2d_array instead of becoming a matrix:

# extracting a single layer returns a rmf_array
dis <- rmf_example_file("example-model.dis") %>% 
  rmf_read_dis()

dis$botm[,,5]
#> RMODFLOW 2d array with 10 rows and 10 columns, representing the i & j dimensions. 
#> Not representing stress period data 
#>            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
#>  [1,] -14.50490 -14.83648 -14.92529 -14.85315 -14.88318 -15.22660 -15.60427
#>  [2,] -14.81713 -14.90889 -14.83753 -14.99092 -15.69238 -16.18429 -16.32294
#>  [3,] -14.96890 -14.85298 -15.06713 -15.79833 -16.28175 -16.32756 -16.73219
#>  [4,] -14.99913 -14.83854 -15.57881 -16.28461 -16.33874 -17.08000 -17.76279
#>  [5,] -14.95689 -15.00021 -16.16134 -16.32473 -17.10558 -17.77778 -16.96283
#>  [6,] -14.84188 -15.63926 -16.31119 -17.02771 -17.78451 -17.00444 -16.31083
#>  [7,] -14.89941 -16.07933 -16.50189 -17.78640 -17.02192 -16.31098 -15.87303
#>  [8,] -14.86487 -15.41835 -16.31006 -16.37534 -16.31068 -15.96870 -14.89909
#>  [9,] -14.90482 -14.86794 -14.92409 -15.38964 -15.45983 -14.93413 -14.91179
#> [10,] -14.61301 -15.13022 -15.15843 -14.96830 -14.88495 -14.88413 -14.86478
#>            [,8]      [,9]     [,10]
#>  [1,] -15.74037 -15.36730 -14.83495
#>  [2,] -16.31697 -16.19521 -14.95860
#>  [3,] -17.61780 -16.32182 -14.97984
#>  [4,] -16.82500 -16.19919 -14.84082
#>  [5,] -16.31050 -15.32750 -14.89233
#>  [6,] -15.77645 -14.83797 -14.88540
#>  [7,] -14.87565 -14.90725 -14.53460
#>  [8,] -14.92088 -14.75577 -14.16998
#>  [9,] -14.89658 -14.34830 -14.10139
#> [10,] -14.41982 -14.11653 -14.18420

Time

Time in MODFLOW is represented in two ways: through time-steps and stress periods. Time-steps are inherit to the finite-difference equations used by MODFLOW whereas stress periods are used as a convenience for the user. A stress period encompases one or more time-steps during which all temporal input remains constant. Arrays for input files such as recharge arrays can therefore vary per stress period. Output arrays such as simulated heads however, vary on a time-step basis. This creates a design choice for the rmf_array: does the 4th dimension represent time-steps or stress-periods? As stated above, the 4th dimension for a rmf_array represent time-steps since the other dimensions (rows, columns and layers) are also inherited from the finite-difference approximation. If you want a rmf_array to represent constant input over one or more stress periods, you can supply a kper argument to rmf_create_array:

rmf_create_array(1.5, dim = c(2, 2), kper = c(1,4))
#> RMODFLOW 2d array with 2 rows and 2 columns, representing the i & j dimensions. 
#> Active during stress periods: 1 4 
#>      [,1] [,2]
#> [1,]  1.5  1.5
#> [2,]  1.5  1.5

This sets the kper attribute which is used by the create functions for boundary-conditions to know during which stress-periods this array should be active.

Reading & writing rmf_arrays

In a MODFLOW input file, an array is preceded by a single-line, the array-control record. This line contains information on the type and format of values in the following array and where to read them. There are two types of array-control records:

Fixed-format

LOCAT CNSTNT FMTIN IPRN

Free-format. This type can take one of four possible forms:

CONSTANT CNSTNT
INTERNAL CNSTNT FMTIN IPRN
EXTERNAL Nunit CNSTNT FMTIN IPRN
OPEN/CLOSE FNAME CNSTNT FMTIN IPRN

To add to the complexity, fixed-format and free-format EXTERNAL and OPEN/CLOSE headers can point to binary files which contain the array. After the array-control records, the array is being read. Each control-record is explained in the manual and online help. RMODFLOW can read fixed-format and all 4 types of free-format array-headers. There is no support for writing fixed-format arrays.

To read MODFLOW arrays, RMODFLOW uses the internal rmfi_parse_array function which is called in all rmf_read_* functions that need to read in arrays. If the array-control record is fixed-format, a nam object (as returned from rmf_read_nam) needs to be supplied in the call to the rmf_read_* function (unless LOCAT = 0). This is because the LOCAT variable represent a FORTRAN unit number which points to the file containing the array. Without the corresponding NAME file, it is impossible to know which file this unit number represents. The same is true when reading EXTERNAL files. If the array is binary, the user must know if the array is single or double precision. By default, the precision argument is set to 'single'.

Writing arrays to input files is handled by the internal rmfi_write_array function. By default, arrays are written in free-format using either the CONSTANT or INTERNAL array-control header, depending on whether or not there is more than one unique value in the supplied array. There is support for writing EXTERNAL and OPEN/CLOSE arrays to external files, but this is discouraged. Keeping all data in a single file creates a more clear and cohesive structure. To write EXTERNAL arrays, you will need to supply the external argument to the rmf_write_* call which is a character vector with the names of the data sets to write. It will also need a nam object to check for a unique unit number (Nunit). You will have to add this unit number manually to a NAM file yourself. This is because the rmfi_write_array function is not intended to alter existing RMODFLOW objects; only to write arrays. To write an OPEN/CLOSE file, a fname argument needs to be given which is a character vector with the names of the data sets to write. Each OPEN/CLOSE file can contain only one 2D array. Both EXTERNAL and OPEN/CLOSE files can be written as binary by supplying a binary argument which is a character vector with the names of the data sets to write as binary.

dis <- rmf_example_file("example-model.dis") %>% rmf_read_dis()
nam <- rmf_example_file("example-model.nam") %>% rmf_read_nam()

# this will create separate files for the 3D botm array and for the 2D top array using the EXTERNAL header
# it will also generate a warning as a reminder to add Nunit to the NAME file 
rmf_write_dis(dis = dis, file = "input.dis", external = c("botm", "top"), nam = nam)

# this will create separate binary files for each layer in botm using the OPEN/CLOSE header
rmf_write_dis(dis = dis, file = "input.dis", fname = c("botm"), binary = c("botm"))

Lastly, the rmf_read_array() and rmf_write_array() functions can be used to read/write an rmf_array from/to separate files. These functions are not called when reading MODFLOW packages, but rather serve as utility functions when a user needs to read/write an array from/to MODFLOW supported files instead of writing an entire MODFLOW package. Because they are intended to be used outside the context of packages, they do not recognize array-control records. They can however read/write a MODFLOW output header such as:

KSTP KPER PERTIM TOTIM DESC NCOL NROW ILAY

3. List-directed input

Where array-directed input represents continuous data over the MODFLOW grid, list-directed input can be thought of as discrete (spatial) features. Examples include wells, rivers, drains etc. Some output comes in list-directed form as well, most notably certain cell-by-cell fluxes from discrete boundary conditions. List-directed input basically looks like a R data.frame, containing rows and columns. Each row represent a discrete feature, the columns contain the variables. A MODFLOW list always contains the following columns: i, j, k representing the row, column and layer indices of the feature, respectively. Additional columns contain the variables, e.g. the well package has the q variable representing the volumetric flux of the well.

In RMODFLOW, MODFLOW lists are handled by the rmf_list class and corresponding S3 methods. A rmf_list is basically a data.frame. This might be confusing, since a list is a different class in R. We’ve chosen to follow the naming conventions in MODFLOW however.

Creating

To create a rmf_list, use the rmf_create_list() function on a data.frame-like object which contains at least i, j and k columns. The passed object will be coerced to a data.frame.

wells <- tibble::tribble(~i, ~j, ~k, ~q,
                1,   1,  1, -25,
                2,   1,  1, -50) %>% rmf_create_list()
str(wells)
#> Classes 'rmf_list' and 'data.frame': 2 obs. of  4 variables:
#>  $ i: num  1 2
#>  $ j: num  1 1
#>  $ k: num  1 1
#>  $ q: num  -25 -50

A rmf_list is mostly used when handling boundary condition packages. Because their input may vary per stress period, a kper attribute can be set on rmf_list as well:

wel <- matrix(c(1, 1, 2, -25), ncol = 4) 
colnames(wel) <- c("i", "j", "k", "q")

rmf_create_list(wel, kper = c(1,4)) %>% str()
#> Classes 'rmf_list' and 'data.frame': 1 obs. of  4 variables:
#>  $ i: num 1
#>  $ j: num 1
#>  $ k: num 2
#>  $ q: num -25
#>  - attr(*, "kper")= num [1:2] 1 4

Time-steps are not readily handled by a rmf_list at the moment.

Reading & writing rmf_lists

The internal function rmfi_parse_list is used by rmf_read_* functions that need to read MODFLOW lists. RMODFLOW can handle reading external and open/close lists as well. There is also support for reading the rarely used SFAC value. Additionally, binary lists can also be read. There is no mention of binary lists in the MODFLOW-2005 manual, but the source code does contain functions for handling these types of lists.

There is no separate rmfi_write_list function. Writing a list consist of sequentially calling rmfi_write_variables on each row. There is currently no support for writing external or open/close lists. Writing of the SFAC value and of binary lists is also not available. Because of the limited use of these capabilities, these functionalities might not be added at all in RMODFLOW.

Conversion

RMODFLOW can convert between a rmf_list and a rmf_array with the rmf_as_list() and rmf_as_array() functions.

dis <- rmf_create_dis(nrow = 3, ncol = 2, nlay = 1)
ar <- rmf_create_array(dim = c(dis$nrow, dis$ncol, dis$nlay, 2))
ar[] <- 1:prod(dim(ar))
ar
#> RMODFLOW 4d array with 3 rows, 2 columns, 1 layer and 2 timesteps, representing the i, j, k & l dimensions. 
#> Not representing stress period data 
#> , , 1, 1
#> 
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
#> 
#> , , 1, 2
#> 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11
#> [3,]    9   12

rmf_as_list(ar[,,,1])
#> RMODFLOW list with 6 features and 1 variable 
#> Not representing stress period data 
#>   i j k value
#> 1 1 1 1     1
#> 2 2 1 1     2
#> 3 3 1 1     3
#> 4 1 2 1     4
#> 5 2 2 1     5
#> 6 3 2 1     6

# a rmf_as_list.rmf_4d_array needs a l index for the 4th dimension
lst <- rmf_as_list(ar, l = 2)
lst 
#> RMODFLOW list with 6 features and 1 variable 
#> Not representing stress period data 
#>   i j k value
#> 1 1 1 1     7
#> 2 2 1 1     8
#> 3 3 1 1     9
#> 4 1 2 1    10
#> 5 2 2 1    11
#> 6 3 2 1    12

# rmf_as_array needs a dis object to guess the dimensions of the array
rmf_as_array(lst, dis = dis)
#> RMODFLOW 2d array with 3 rows and 2 columns, representing the i & j dimensions. 
#> Not representing stress period data 
#>      [,1] [,2]
#> [1,]    7   10
#> [2,]    8   11
#> [3,]    9   12

Cas Neyens

2021-10-12

1. Scalar values

2. Array-directed input

Creating

Subsetting

Time

Reading & writing rmf_arrays

3. List-directed input

Creating

Reading & writing rmf_lists

Conversion