General Comments: Overall, the section does a decent job of explaining the steps of the SPEW algorithm. I get the impression that the code is well designed and was written with an eye toward future expansion and reuse. The biggest problem throughout this section is that you don't take credit for all of the work you have done! A few things confused me, which I describe below, but most of my suggestions are about taking more credit for your work.
Specific Comments:
Paragraph 1: I think you can combine the first two sentences: "In the previous section we showed that we require population counts, geographies, and microdata to generate synthetic ecosystems."
Paragraph 2: In general, this paragraph reads as if you say SPEW does something and then note that it happens to solve a problem. I think it would read better and be more convincing if you framed it as: we designed SPEW to do something so that it solves this problem. You could begin by stating exactly what the purpose of SPEW is: "We created the R package SPEW to provide a general engine for generating synthetic ecosystems from the three data sources described above." Beginning with what you did adds emphasis and calls your accomplishment to the reader's attention. Then you can list some of the achievements of SPEW:
1) We designed SPEW to read data in a standard format, and this format helps us understand precisely what our data sources must look like to generate a synthetic ecosystem.
2) We also designed SPEW to be general enough to create synthetic ecosystems for any set of data sources.
3) This standardization and generality make it easier to obtain reliable results and to extend the functionality to new methods.
I think changing the framing in this way allows you to take credit for the great work that you have done!
Figure 1: You never reference this figure in the text.
Paragraph 3: I wonder if you could remove this paragraph and just mention where the code is available when you introduce the package at the beginning of paragraph 2. Everything you say about the benefits of having the code available online is true, but I like the way Brian put it: it's better to let the reader realize how awesome it is without having to tell them.
Top of Page 6: "spew" should probably be "SPEW" for consistency. A collection of mutually exclusive sets whose union is the full set is a partition. I wonder if you could define this term here and use it to make it easier to talk about your partition into separate regions.
Page 6, paragraph 2: You talk a lot about PUMS and PUMA, and the similarity of these two acronyms really confused me. It may be impossible to avoid using both, but you could minimize the back and forth by rearranging the sentences in this paragraph so that they are first all about PUMS and then all about PUMA. I was also a bit confused about how the PUMAs and the tracts interact. I think the idea is that there is a region ID, which is the PUMA, and within each PUMA there are tract numbers; the tract numbers are not unique, but the combination of PUMA and tract number is. If there is this hierarchy of labeling, pointing it out would be useful.
Pseudocode, for loop line 3: In the pseudocode you say you attach people to households. Are the people also simulated at some point?
Pseudocode, for loop line 4: I wonder if you could name the other data. Something like: "Add supplementary variables (e.g., schools, workplaces)." Then in the output you can say "Synthetic Households, People, and Supplementary Variables." The "..." at the end of the output makes me think that the output is not fixed. I understand that it will be, but it seemed strange to me when I first saw it.
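To make my reading of the PUMA and tract labeling (Page 6, paragraph 2) concrete, here is a minimal sketch of the hierarchy as I understand it. It is written in Python purely for illustration and is not SPEW code; the IDs and the helper name are made up, and the hierarchy itself is my assumption about your data.

```python
from collections import Counter

# Hypothetical (PUMA ID, tract number) labels; the values are invented for illustration.
regions = [
    ("01701", "0001"),
    ("01701", "0002"),
    ("01702", "0001"),  # tract number "0001" reappears in a different PUMA
    ("01702", "0003"),
]


def duplicated(keys):
    """Return the sorted keys that occur more than once."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


# Tract numbers alone can collide across PUMAs ...
tract_only_duplicates = duplicated(tract for _, tract in regions)
# ... but the (PUMA, tract) pair identifies each region uniquely.
pair_duplicates = duplicated(regions)

print("tract numbers that repeat:", tract_only_duplicates)
print("(PUMA, tract) pairs that repeat:", pair_duplicates)
```

If this matches your setup, a sentence stating that regions are identified by the pair rather than by the tract number alone would have cleared up my confusion.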
Last paragraph of page 6: Again, use "SPEW" instead of "spew" for consistency. Since you haven't used the acronym IPF in this section yet, maybe you can spell it out here. My takeaway from this paragraph is that you are currently using simple random sampling as a way to get a working version of the code running, but ideally you could use better methods in the future. I'm not sure, but I'm guessing the code is designed so that changing the random sampling step is relatively easy. If this is the case, you should say so! It's another example of how modular your code is, and I think calling attention to that would be great.
Section 4.2, Paragraph 2: I don't know if you need to tell the reader the specific breakdown of nodes, processors, and cores. I think just saying you can have 1536 processes running in parallel is sufficient.
Section 4.2, Paragraph 3: While it is true that your code is "embarrassingly parallel", I don't know that you need to tell the reader this! Also, you haven't actually defined "embarrassingly parallel", so the reader may only take away that the task was "embarrassingly" easy, which I don't think is the case. As I understand it, "embarrassingly parallel" essentially means that the sampling process for each tract/PUMA is independent of the others. I think you could rewrite this paragraph to say:
1) The sampling processes are independent (maybe say intuitively why this is the case as well).
2) You can take advantage of this independence to run the sampling processes in parallel.
While this task may be easier than other parallelization tasks, you should still take credit for doing it!
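To illustrate the two points I suggest for Section 4.2, paragraph 3, here is a minimal sketch in Python (again, not your R code). The region names, counts, record IDs, and function names are all hypothetical, and I am assuming sampling with replacement. Each region gets its own random number generator, so its draws do not depend on any other region or on the order in which regions are processed. I use threads only to keep the sketch short; a real run would use separate processes or cluster jobs.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs: household counts per region and a shared pool of microdata record IDs.
population_counts = {"region_a": 3, "region_b": 5, "region_c": 2}
microdata_ids = list(range(100))


def sample_region(region, n_households, seed=0):
    """Draw n_households record IDs with replacement (an assumption) for one region."""
    # A generator private to this region: its draws depend only on (seed, region).
    rng = random.Random(f"{seed}-{region}")
    return [rng.choice(microdata_ids) for _ in range(n_households)]


def run_sequential():
    """Process the regions one after another."""
    return {r: sample_region(r, n) for r, n in population_counts.items()}


def run_parallel(max_workers=3):
    """Process the regions concurrently; no region needs another region's result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {r: pool.submit(sample_region, r, n) for r, n in population_counts.items()}
        return {r: f.result() for r, f in futures.items()}


sequential = run_sequential()
parallel = run_parallel()
print("same result either way:", sequential == parallel)
```

The point for the reader is that the parallel run reproduces the sequential one exactly, which is what independence of the sampling processes buys you.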