I have been working on learning R for several months now, and continue to get better at it and enjoy it more all the time. I am currently working on a spoken word recognition project at work. The task we are using is quite simple. Participants listen to words that have been mixed with multi-talker babble (kind of like background conversation at a cocktail party), and type in what they hear. We are analyzing the errors they make to try to discover what sorts of words are activated in the brain, and how people organize the words they know in their brains.
We started running subjects in the summer of 2007, and after running 48 subjects, we made a few methodological changes. One change is that we switched from using the same segment of babble as background noise to using a random segment of babble. We also re-leveled the sound files after adding the babble so that all the words were presented at about the same volume. I recently analyzed the data from these two groups. It turns out the the results from the first 48 subjects performed significantly better than subjects 49-96. However, I realized that these are different subjects, so it could be due to the different subjects, and not the change in methods. Therefore I split each group into 2 subgroups. Now I had 4 groups — 1-24, 24-48, 49-72, and 73-96. The subgroups did not differ statistically from one another, but the main groups did. This seemed to indicate that listeners were becoming habituated to the noise, and could therefore tune it out a bit better, and thus hear the words better.
I originally displayed this in a table, but it was pretty cumbersome, so I decided to make a bar chart instead. I have seen some people produce bar charts such as these where they indicate which groups are statistically different using brackets to group bars, and an asterisk to mark them as statistically different. I also wanted to add error bars in. Adam Krawitz (another post-doc at IU) gave me some code to get started on the error bars, and I found some code from a post to the R-help list about making brackets. First here is the figure:
And here is the code:
# make a nice bar plot
# first we find the mean of each group
bardata <- c(mean(items1to24$pc,na.rm=T),
mean(items25to48$pc,na.rm=T),
mean(items49to72$pc,na.rm=T),
mean(items73to96$pc,na.rm=T)
)
#Then we do t-tests to calculate the 95% confidence intervals for our error bars
items1to24T=t.test(items1to24$pc)
items25to48T=t.test(items25to48$pc)
items49to72T=t.test(items49to72$pc)
items73to96T=t.test(items73to96$pc)
lowerConf=c(items1to24T$conf.int[1],
items25to48T$conf.int[1],
items49to72T$conf.int[1],
items73to96T$conf.int[1]
)
upperConf=c(items1to24T$conf.int[2],
items25to48T$conf.int[2],
items49to72T$conf.int[2],
items73to96T$conf.int[2]
)
# we set up some nice colors
barcolors=c('#FF3333','#FF3333','#4444FF','#4444FF')
bar <- barplot(bardata, ylim = c(0,1), col=barcolors, names.arg=c("1-24","25-48","49-72", "73-96"),ylab='proportion correct', xlab='subject group',xlim=c(0,1), width=.24,tck=.04)
#we use the arrows function to do the error bars
arrows(bar, upperConf, bar, lowerConf,
length = 0.05, # width of the arrowhead
angle = 90,
code = 3
)
# function to draw curly braces in red
# x1...y2 are the ends of the brace
# for upside down braces, x1 > x2 and y1 > y2
# taken from R help
# http://finzi.psych.upenn.edu/R/Rhelp02a/archive/112010.html
library(grid) #requires the grid library
Brack <- function(x1,y1,x2,y2,h)
{
x2 <- x2-x1; y2 <- y2-y1
v1 <- viewport(x=x1,y=y1,width=sqrt(x2^2+y2^2),
height=h,angle=180*atan2(y2,x2)/pi,
just=c("left","bottom"),gp=gpar(col="black"))
pushViewport(v1)
grid.curve(x2=0,y2=0,x1=.125,y1=.5,curvature=.5)
grid.move.to(.125,.5)
grid.line.to(.375,.5)
grid.curve(x1=.375,y1=.5,x2=.5,y2=1,curvature=.5)
grid.curve(x2=1,y2=0,x1=.875,y1=.5,curvature=-.5)
grid.move.to(.875,.5)
grid.line.to(.625,.5)
grid.curve(x2=.625,y2=.5,x1=.5,y1=1,curvature=.5)
popViewport()
}
# Now we use the brack function
# brackY1 and brack Y2 determine the y-value for the placement of the brackets.
# I had to fudge this a bit to get it to look right
brackY1=1.05*max(upperConf[1:2])
brackY2=1.21*max(upperConf[3:4])
Brack(bar[1]+.07,brackY1,bar[2]-.01,brackY1,.07)
Brack(bar[3]-.09,brackY2,bar[4]-.17,brackY2,.11)
# And now we also make some straight line groupings
arrows(mean(bar[1:2]),1.4*upperConf[1],mean(bar[3:4]), 1.4*upperConf[1],length=0)
arrows(mean(bar[1:2]),1.4*upperConf[1],mean(bar[1:2]), 1.35*upperConf[1],length=0)
arrows(mean(bar[3:4]),1.4*upperConf[1],mean(bar[3:4]), 1.35*upperConf[1],length=0)
arrows(mean(bar[2:3]),1.4*upperConf[1],mean(bar[2:3]), 1.45*upperConf[1],length=0)
# we label the groupings
text(mean(bar[1:2]),1.25*upperConf[1],"fixed babble",cex=.9,col=barcolors[1])
text(mean(bar[3:4]),1.25*upperConf[1],"random babble",cex=.9,col=barcolors[3])
# we add the stats in
text(mean(bar[2:3]),1.55*upperConf[1],paste("t =", formatC(itemBetweenT$statistic,digits=3), ", p", prettyPval(itemBetweenT$p.value)),cex=.9)
# and we add the values on the bars as well
text(bar[1],.7*bardata[1],formatC(bardata[1],digits=3),col='white')
text(bar[2],.7*bardata[2],formatC(bardata[2],digits=3),col='white')
text(bar[3],.7*bardata[3],formatC(bardata[3],digits=3),col='white')
text(bar[4],.7*bardata[4],formatC(bardata[4],digits=3),col='white')
# and then I print it to a pdf file
dev.print(pdf,"fig/48v96.pdf",height=2.8,width=4,pointsize=11)
While these results were suggestive of a difference, it is also possible that the difference was due to the re-leveling, and not the fixed vs. random babble. If the listeners were really habituating to the fixed babble, this should show up as an improvement over the course of the experiment. Therefore I also compared the learning rates of these two groups. It is normal for participants to get better as they get more familiar with a task, so we will always expect some learning. However if the participants are becoming habituated to a particular aspect of the stimuli, (in this case the fact that the same segment of babble is being used over and over again), then we would expect more learning in this group. This figure displays the percent correct over a moving 30 trial window. That is, the first point represents trials 1-30, the second point 2-31 and so on. It looks like the fixed-babble group is learning more. To test this statistically, I subtracted the fixed babble values from the random babble values, and performed a correlation on these values against the trial window. If the learning rates are the same, then there should be no correlation. If the fixed babble group is learning more rapidly however, we should see the difference between the two groups increase over the course of the experiment, which is what we found. Here is the nice figure:
And here is the code:
#compute the percent correct for each window
numPoints=min(length(unique(Trials1$Trial)),length(unique(Trials1$Trial)))-trialWindow
pc1=c(1:numPoints)
pc2=c(1:numPoints)
xvalues=c(1:numPoints)
for (i in 0:numPoints) {
upperTrials1=subset(Trials1,Trials1$Trial>i)
theseTrials1=subset(upperTrials1,upperTrials1$Trial<=i+trialWindow)
corr1 = (nrow(subset(theseTrials1, theseTrials1$correct=="CORRECT")))
wrong1 = (nrow(subset(theseTrials1,theseTrials1$correct=="WRONG")))
missing1 = (nrow(subset(theseTrials1,theseTrials1$correct=="MISSING")))
nonword1 = (nrow(subset(theseTrials1,theseTrials1$correct=="NONWORD")))
pc1[i] = corr1/(corr1+wrong1+nonword1+missing1)
upperTrials2=subset(Trials2,Trials2$Trial>i)
theseTrials2=subset(upperTrials2,upperTrials2$Trial<=i+trialWindow)
corr2 = (nrow(subset(theseTrials2, theseTrials2$correct=="CORRECT")))
wrong2 = (nrow(subset(theseTrials2,theseTrials2$correct=="WRONG")))
missing2 = (nrow(subset(theseTrials2,theseTrials2$correct=="MISSING")))
nonword2 = (nrow(subset(theseTrials2,theseTrials2$correct=="NONWORD")))
pc2[i] = corr2/(corr2+wrong2+nonword2+missing2)
xvalues[i]=i;
}
# compute the differences between the two groups
diffs=pc1-pc2
#set up some graphical paramets
par(mar=c(3,3,0.3,3),
fin=c(4.1,2.9),
fig=c(0,1,0,1),
mgp=c(2,1,0),
#usr=c(min(xdata),max(xdata),0,1),
lab=c(10,10,3),
#ps=11,
family='serif'
)
# plot the first group
plot(xvalues,pc1,col='#EE0000',pch="+",ylim=c(.4,.65),axes=F,xlab='trial window', ylab='proportion correct',cex=.8)
# add the second group
points(xvalues,pc2,col='#0000EE',pch="x",ylim=c(.4,.65),cex=.8)
# how much to add to the fitline y values to make it fit on the plot
fudge=.42
# plot the slope of the line we fitted to the differences
fitline=lsfit(xvalues,diffs+fudge)
abline(fitline,col='#000F00')
# do a correlation test
slope=cor.test(xvalues,diffs);
# the stats to put on the graph
stats=paste('r= ', formatC(slope$estimate,digits=3), ", p ", prettyPval(slope$p.value),sep='')
# put the values on the axes
axis(1) #bottom
axis2values=c(.4,.45,.5,.55,.6,.65)
axis(2,at=axis2values) #left
axis(4,at=axis2values,lab=formatC(axis2values-fudge,digits=2)) #right
box() #draw a box around the figure
# put the stats on the graph
text(280,.53,stats,cex=.75)
#label the right axis
mtext("difference between groups", side=4,line=2)
# label the points
text(80,.64,"fixed babble",col='#EE0000', cex=.8)
text(275,.44,"random babble",col='#0000EE', cex=.8)
# print as pdf
dev.print(pdf,"fig/learning.pdf",height=2.8,width=4,pointsize=11)