Evolving with R #2: Loops & vectors

We’ll explore the simple simulation from the last lesson to learn more about controlling the program with loops, and storing data with vectors. To start, let’s try to understand the format and effect of the squiggly brackets used below.

pop = c(0, 1)
for(i in 1:98)
{
   pop = c(pop, sample(pop, 1))
}

We can actually rewrite this to be a little shorter without any change to how the program runs.

pop = c(0, 1)
for(i in 1:98) pop = c(pop, sample(pop, 1))

So why did we use the squiggly brackets? They create a block of code, which is basically a bunch of lines that are grouped together. This lets us control the execution of those lines as a group. Compare the execution of these two versions:

Code A

pop = c(0,1)
for(i in 1:98) newIndividual = sample(pop, 1)
pop = c(pop, newIndividual)

Code B

pop = c(0,1)
for(i in 1:98)
{
   newIndividual = sample(pop, 1)
   pop = c(pop, newIndividual)
}

Why didn’t version A work in the same way as version B? Simply because, without the brackets to define a block, the first statement following the declaration of the for loop is treated as the entirety of the code to loop over. Then, after 98 repetitions, pop = c(pop, newIndividual) executes once. Blocking the two lines together lets us control them together using the for loop.
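A quick way to see the difference for yourself is to run both versions and check how long the population ends up. This is only a sketch: the names popA and popB are made up here so the two versions don't overwrite each other.

```r
# Code A: only the first statement is looped over
popA = c(0, 1)
for(i in 1:98) newIndividual = sample(popA, 1)
popA = c(popA, newIndividual)   # runs once, after the loop has finished

# Code B: both statements are looped over as a block
popB = c(0, 1)
for(i in 1:98)
{
   newIndividual = sample(popB, 1)
   popB = c(popB, newIndividual)
}

length(popA)   # 3: the two founders plus one offspring
length(popB)   # 100: the two founders plus 98 offspring
```

Version A threw away 97 of its 98 sampled individuals, because only the last value of newIndividual survived to be added to the population.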

Code B also has some formatting that doesn’t matter to R, but helps us; I’ve put each statement in the block on its own line and given each the same indent. This is optional; the following code, using the semicolon to separate statements, works just as well.

pop = c(0,1); for(i in 1:98) {newIndividual = sample(pop, 1); pop = c(pop, newIndividual)}

Generally, you want to use whitespace to format the code nicely and consistently to help you read it. This becomes much more important when you start to nest blocks, that is, to write loops or other control structures within other loops. Nested loops are a simple way to perform replicate simulations.

Founder effect with replication

reps = 10
results = NULL
for(r in 1:reps)
{
   pop = c(0,1)
   for(i in 1:98)
   {
       newIndividual = sample(pop, 1)
       pop = c(pop, newIndividual)
   }
   results = c(results, mean(pop))
}

results
[1] 0.37 0.96 0.44 0.02 0.46 0.70 0.09 0.09 0.02 0.69

Note that we assembled this code inside-out: we wrote and tested the inner loop, which performs a single simulation, then wrapped another loop around it to form a more complex program.
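If it's not obvious how often each part of a nested loop runs, a stripped-down version with counters can help. The variables steps, outerCount and innerCount are invented for this illustration and are not part of the simulation.

```r
reps = 3
steps = 4
outerCount = 0
innerCount = 0
for(r in 1:reps)
{
   outerCount = outerCount + 1        # runs once per replicate
   for(i in 1:steps)
   {
      innerCount = innerCount + 1     # runs steps times per replicate
   }
}

outerCount   # 3
innerCount   # 12, which is reps * steps
```

In the founder effect code the same logic applies: the line that stores mean(pop) runs once per replicate, while the sampling lines run 98 times per replicate.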

The above code used a new trick: declaring results as equal to the special word NULL. This makes a variable with no contents and a length of zero which we can then grow by using c(). If you’re using R’s built-in editor then words like NULL should appear in a distinct color, along with other reserved words: terms that mean special things and can’t be reassigned to variables or functions. These are case-sensitive, so the special coloring helps you see if you’ve typed them correctly.
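Here is a small sketch of that NULL trick on its own, using the first two values from the output above as stand-ins for computed results. The variable emptyLength is just there to hold the length before anything is added.

```r
results = NULL
emptyLength = length(results)
emptyLength                  # 0: nothing stored yet

results = c(results, 0.37)   # first result added
results = c(results, 0.96)   # second result added
length(results)              # 2
results
```

Each call to c() sticks one more value on the end, so the vector is always exactly as long as the number of results stored so far.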

Using c() to grow a vector by adding results as you compute them is a pretty flexible way to store data because you don’t need to predict how many things you will have to store. However, the cost of this flexibility is execution speed: overuse of c() can be slow, because each call builds a new, longer vector and copies the old contents into it. One option is to make a vector of the correct length, filled with placeholder values, before you start filling it in with data. We can easily do this using the rep() function.

results = rep(0, reps)
results
[1] 0 0 0 0 0 0 0 0 0 0

Now we need a way to access each box within the vector. There are actually a few options, but the simplest is to use the built-in address of each box. We call the address of a particular box in a vector an index; indices start with 1 and count up.

Note again the number in square brackets that prints to the side of the vector contents above; this is a little hint that we can use square brackets to access the contents of the vector by index.

results[5] = 10
results
[1] 0 0 0 0 10 0 0 0 0 0
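To compare the two storage strategies side by side, here is a sketch on a deliberately simple task, storing the squares of 1 to 10. The names n, grown and filled are made up for this example.

```r
n = 10

# Strategy 1: start with nothing and grow the vector with c()
grown = NULL
for(i in 1:n) grown = c(grown, i^2)

# Strategy 2: make the boxes first, then fill each one by index
filled = rep(0, n)
for(i in 1:n) filled[i] = i^2

grown
filled
identical(grown, filled)   # TRUE: same contents either way
```

Notice that the loop counter i does double duty in the second strategy: it is both the number being squared and the index of the box that receives the answer.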


Exercise

Modify the code ‘Founder effect with replication’ to make a vector of zeros to store the results of each replicate.


Accessing single boxes in a vector is useful, but R lets you do a lot more than that. You can access multiple boxes at once by putting a vector of indices inside the square brackets.

results[3:5] = 15
results
[1] 0 0 15 15 15 0 0 0 0 0
results[1:4] = c(10, 20, 30, 40)
results
[1] 10 20 30 40 15 0 0 0 0 0
results[1:4] = c(10, 20)
results
[1] 10 20 10 20 15 0 0 0 0 0

With this last example, you may be sensing a theme: R’s flexibility means that code might execute but not do what you want. Because the length of the subvector “results[1:4]” is a multiple of the length of the vector on the right, R cycles through the shorter vector to assign values to the longer one. If you made a mistake in constructing the vector on the right and left out two values, you might wish this command had generated an error. In fact, R will complain if we modify the example a little.

> results[1:5] = c(10, 20)
Warning message:
In results[1:5] = c(10, 20) :
number of items to replace is not a multiple of replacement length

Notice that this is a warning; a warning doesn’t stop the statement from executing, but it does send a message for your attention. Pay attention! Warnings can almost always be fixed and generally indicate real problems.
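If you'd rather find out before assigning, you can compare the lengths yourself. This sketch assumes a vector called newValues holding the values you plan to assign; the %% operator gives the remainder after division.

```r
results = rep(0, 10)
newValues = c(10, 20)

# A remainder of 0 means R will recycle newValues without comment
remainderFour = length(results[1:4]) %% length(newValues)
remainderFour   # 0

# Any other remainder means the assignment would trigger the warning
remainderFive = length(results[1:5]) %% length(newValues)
remainderFive   # 1
```

A remainder of 0 only tells you that R will stay quiet, not that recycling is what you intended, so it is still worth checking that the two lengths are what you expect.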

Another aspect of this flexibility is visible in how R treats the length of a vector. We’ve already seen that you can grow a vector by using c() to add elements to the end. You can also use square brackets to do this. For example, the code below would result in serious errors in a C-like programming environment; in R it executes without error, though with some interesting side effects.

x = 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10
x[15]
[1] NA
x[15] = 12
x
[1] 1 2 3 4 5 6 7 8 9 10 NA NA NA NA 12

NA is a special type of object; we’ll look at its properties in the next lesson when we learn about comparing variables. Here it serves to indicate a value in a vector that you haven’t set, and it’s the only indication that you’ve extended this vector past the limits of its original contents.
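Since R gives no other sign that the vector grew, it can be worth checking for yourself. A minimal sketch, using the built-in functions length() and is.na(); the names lengthBefore, lengthAfter and unsetCount are made up here.

```r
x = 1:10
lengthBefore = length(x)
lengthBefore              # 10

x[15] = 12                # assign past the end of the vector
lengthAfter = length(x)
lengthAfter               # 15

is.na(x)                  # TRUE for the boxes that were never set
unsetCount = sum(is.na(x))
unsetCount                # 4: boxes 11 through 14
```

sum() works here because R counts each TRUE as 1 and each FALSE as 0.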

One less intuitive ability of square brackets is in removing elements from a vector.

x = 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10
x[-5]
[1] 1 2 3 4 6 7 8 9 10
x
[1] 1 2 3 4 5 6 7 8 9 10
x = x[-5]
x
[1] 1 2 3 4 6 7 8 9 10
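Negative indices also work with a vector of positions, so you can drop several elements in one go. A short sketch, with trimmed and tail7 as made-up names:

```r
x = 1:10

trimmed = x[-c(1, 10)]   # drop the first and last elements
trimmed                  # 2 3 4 5 6 7 8 9

tail7 = x[-(1:3)]        # drop the first three elements
tail7                    # 4 5 6 7 8 9 10

length(x)                # still 10: x itself was never reassigned
```

The parentheses in -(1:3) matter: written as -1:3, R would read it as the sequence from -1 up to 3, which is not a valid index.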


Exercise

Modify the code ‘Founder effect with replication’ to start with a population of size 100, with fifty ‘1’s and fifty ‘0’s. Then modify the inner loop to remove a random individual after each offspring is added, so you simulate reproduction and death in equal quantities. Gather and analyze the results of doing 100 reproduction-and-death events for multiple replicate populations.


 
