Name:
Andrew ID:
Collaborated with:

On this homework, you can collaborate with your classmates, but you must identify their names above, and you must submit your own homework as an knitted HTML file on Canvas, by Sunday 10pm, this week.

Introduction to percolation

In this homework, we’ll be coding a series of functions to investigate percolation via simulation. We’ll spend this section to discuss the problem and setup. You can read related literature on this topic at Wikipedia, but we’ll be working with a simplified setting for this homework. As a word of caution, this is a coding heavy homework (as opposed to statistical), so be prepared to spend a lot of time debugging and testing.

Here’s the idea. Imagine you have a square board (10 by 10 squares) like the left board show below. This board consist of white “open” squares and black “blocked” squares. We are interested in knowing (abstractly) if we “pour water” from the top of the matrix, does the water “leak” from the bottom of the matrix. This is demonstrated in the right board. (Don’t worry, we’ll explain all the necessary specifics later.) You can think of this as the following: we indefinitely keep pouring water into each white square in the top row at the same time, and water runs through the board by spreading to any adjacent white square (left, right, top, and bottom). We keep pouring water until all the possibly-flooded squares are flooded. These are the blue squares shown below in the right board. Once we’re done, we see if the water reached any of the open squares in the bottom row of the board (i.e., are there any blue squares in the bottom row?). If so, we say the board “percolates”. Otherwise, it does not percolate. In the figure shown below, the displayed board percolates.

However, not all boards percolate. Consider the next board, shown below. Once again, we show the initial board on the left, and we start pouring water, resulting in the board on the right.

Now imagine we had a way to randomly generate these boards. We would like to see if there are certain types of randomly-generated boards that more likely to percolate. We’ll formalize this task in this homework.

So what are the goals of this homework? We will be writing the following functions: generate.board() will generate a random board with white “open” squares and black “blocked” squares. is.valid() will check whether or not a given board is correctly formatted. plot.board() will plot the board, similar to the four boards shown above. The last two functions contribute the challenges to this homework: percolate.board() determines whether or not a board percolates, and read.board() reads in a text file that specifies many boards.

We will not give you explicit guidance on how to debug and test throughout most of this homework, but the concepts and tools you’ve learned in lecture will certainly be beneficial as you write and try out your code.

Note: The grading of this homework with depend mainly on whether or not you pass the test cases provided. Your implementation of percolate.board() and read.boards() might look quite different from another classmates, but be you sure that your code passes the tests if you want full credit!

Generating and plotting boards

In the first section, we will write generate.board(), is.valid() and plot.board(). To no surprise, we can represent these boards as square, numeric matrices with dimension n by n. These matrices will only have values 0 (for black “blocked” squares), 1 (for white “open and dry” squares), and 2 (for blue “open and flooded”) squares.

For example, we provide a specific board below, mat.example. The desired plot you should produce when running plot.board(mat.example) is also shown below. Plot your output for plot.board(mat.example).

mat.example = matrix(c(0,1,1,1,0, 0,1,1,0,1, 0,0,1,0,0, 0,0,0,2,2, 2,2,2,2,0), 5, 5)

Percolating the board

We will now write percolate.board() that takes as input mat (a matrix) and outputs a list with two entries: result.mat, the resulting matrix after “water has been poured”, and result, a boolean on whether or not mat percolates. The function should use the is.valid() function to check that mat is a valid matrix first, and then use a separate call to assert_that() to ensure mat contains only 0’s and 1’s.

When computing result.mat, any open & dry square (i.e, value of 1) in the top row of mat are changed to open & flooded (i.e, changed to a value of 2). When done, result.mat should have the same exact pattern of blocked squares (i.e., value of 0) as mat. In addition, any open & dry square that is adjacent to any open & flooded square becomes open & flooded (via left, right, top and bottm). Once no open & dry square can become open & flooded, your algorithm is done computing result.mat. To determine whether or not mat percolates, check to see if there are any open & flooded square along the bottom row of result.mat.

Important: This is not a course about algorithms. You do not need to write an efficient algorithm, but it needs to be correct. Feel free to use any strategy for this algorithm, even if it’s computationally inefficient.

mat.example.list = list(matrix(c(1,1,1,1,0, 0,0,0,1,0, 1,1,1,1,0, 0,1,0,0,0, 0,1,1,1,1), 5, 5), 
                        matrix(c(1,1,1,1,0, 0,0,0,1,0, 0,1,1,1,0, 0,1,0,0,0, 0,1,1,1,1), 5, 5),
                        matrix(c(1,1,1,1,0, 0,0,0,1,0, 0,1,1,0,0, 0,1,0,0,0, 0,1,1,1,1), 5, 5))

You may (optionally) add additional test cases to this question (Q2c) to test specific boards that we provided in mat.list or boards of your own. This is to your own benefit to do so since you want to have some sort of guarantee that as you “fix” your code, old bugs that have been “dealt with” do not “reappear”.

test_that("percolate.board() works with all the test cases",{
  load(url("http://stat.cmu.edu/~ryantibs/statcomp-S18/data/percolation_test.RData"))
  
  your.result.list = lapply(mat.list, percolate.board)
  
  bool.vec = sapply(1:length(mat.list), function(x){
    identical(result.list[[x]], your.result.list[[x]])
  })
  
  expect_true(all(bool.vec))
})
n = 500
system.time(matrix(rnorm(n^2), n, n) %*% matrix(rnorm(n^2), n, n))
##    user  system elapsed 
##   0.147   0.006   0.157

The elapsed number is the total time that has passed. We can extract this number in the following way. (This number might be slightly different from above since we’re running two separate instances of matrix multiplication.)

system.time(matrix(rnorm(n^2), n, n) %*% matrix(rnorm(n^2), n, n))[3]
## elapsed 
##   0.229

For p=0.4, run 10 trials for each value of n from 10 to 100 with spacing of 10. For each trial, generate a random board using generate.board(). Then, record the elapsed time to compute whether or not the board percolates using percolate.board(). That is, there will be a total of 10*10 = 100 times percolate.board() will be called. After all the trials are completed, compute average elapsed time for each n. (Hint: Writing this set of simulations as an sapply() might make your life easier.) (Note: This function might take a while to run, so it’s highly encouraged to fully debug your code beforehand and use coding practices you learned Week 8 when doing this.)

Then, make a line plot of elapsed time verses n^2. Label your plot appropriately. After inspecting your graph, does it seem like your algorithm’s elapsed time (often called the complexity of the algorithm) linear with n^2? (For students more comfortable with CS theory: you can try other values of n and change your x-axis to other terms such as n^2*log(n) if needed.)

This challenge problem is for the CS-savy students. The trick is to use efficient data structures that allow faster computation compared to the matrix. If you want a hint on what to code, you can read about the union/find data structure in this this paper for inspiration.

After inspecting your graph, do you notice a “phase transition” effect where a small change in p results in a large change in the probability of percolation? Roughly what value of p do you estimate this value to be, and how do you think it relates to n?

Reading in boards

In this final section, we will write read.boards(), a function that takes in a filepath to a text file and outputs a list of matrices that the text file represents. We describe the specific format of how read.boards() should work. Let file denote the text file that conatins our boards. Each board in file is separated by exactly ----, four dashes. There must be ---- before and after each board. If not, read.boards() should error, stating that the file is not properly formatted. This is the only way read.boards() should throw an error. Hence, the length of the returned list is equal to the number of ---- lines in file minus one.

The first non-empty line strictly between the two ---- lines must be a positive integer. This number represents n, the size of the board. The next n lines afterwards then denote the precise pattern of the n by n board, row-wise. Each line represents a row of the board and should have exactly n visible characters, * or .. The * character represents a blocked square, while the . character represents a open square. If the lines between the two ---- lines do not meet these specifications, then return an NA (for an incorrectly specified board) instead of an n by n matrix for this particular board.

Other than storing information pertinent to boards, file should not contain any other visible characters, but file might have extra spaces or extra line breaks.

We provide a small example of what to expect. The file at http://stat.cmu.edu/~ryantibs/statcomp-S18/data/percolation_write_example.txt provides 3 boards, the first two of which are correctly specified and the last of which is not. The raw text of this file looks like this:

----
4
* . . *
. . * *
. * . .
. . . .
----
5
. . . . .
* . . . *
. . . . *
. * . . *
* * * * .
----
4
. . . .
* . . *
. . a *
. * . .
----

The output of read.boards("http://stat.cmu.edu/~ryantibs/statcomp-S18/data/percolation_write_example.txt") should then be the following:

## [[1]]
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1    0
## [2,]    1    1    0    0
## [3,]    1    0    1    1
## [4,]    1    1    1    1
## 
## [[2]]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    1    1
## [2,]    0    1    1    1    0
## [3,]    1    1    1    1    0
## [4,]    1    0    1    1    0
## [5,]    0    0    0    0    1
## 
## [[3]]
## [1] NA

Note: To receive full credit for this question, you need to correctly provide the output to Q3a, pass the tests in Q3b, and plot the correct figure in Q3c. You do not need to deal with all the nuisances of how to handle misspecified boards. However, that is handled in the challenge question below if you choose to pursue this.