在图中添加标准差
误差线作为数据波动和可信度的衡量,是必须的科研绘图元素。
常见的误差数据有3种:标准差、标准误以及置信区间。
- 标准差:Standard Deviations,又叫做标准偏差,代表着单次抽样个体之间区别有多大,大部分情况下图表中使用的都是标准差;
- 标准误:Standard Errors,标准误差,它代表多次抽样,样本之间的均值分布差异;SE比SD小,当样本量较大时,SE趋近于0;
- 置信区间:Confidence Intervals,是指由样本统计量所构造的总体参数的估计区间, 和标准误是相关的。
这三种误差数据在R中的计算方式如下:
# 1. 标准差
sd <- sd(vec)
sd <- sqrt(var(vec))
# 2. 标准误
se = sd(vec) / sqrt(length(vec))
# 3. 置信区间
alpha=0.05
t=qt((1-alpha)/2 + .5, length(vec)-1) # tend to 1.96 if sample size is big enough
CI=t*se
1
2
3
4
5
6
7
8
9
10
11
2
3
4
5
6
7
8
9
10
11
想要在图中绘制误差线,可以使用ggplot2
中的geom_errorbar()
, 或者Base plot 里的arrow()
函数。
# 1. 在ggplot2中绘制误差线
# Load ggplot2
library(ggplot2)
library(dplyr)
# Data
data <- iris %>% select(Species, Sepal.Length)
# Calculates mean, sd, se and IC
my_sum <- data %>%
group_by(Species) %>%
summarise(
n=n(),
mean=mean(Sepal.Length),
sd=sd(Sepal.Length)
) %>%
mutate( se=sd/sqrt(n)) %>%
mutate( ic=se * qt((1-0.05)/2 + .5, n-1))
# Standard deviation
ggplot(my_sum) +
geom_bar( aes(x=Species, y=mean), stat="identity", fill="forestgreen", alpha=0.5) +
geom_errorbar( aes(x=Species, ymin=mean-sd, ymax=mean+sd), width=0.4, colour="orange", alpha=0.9, size=1.5) +
ggtitle("using standard deviation")
# Standard Error
ggplot(my_sum) +
geom_bar( aes(x=Species, y=mean), stat="identity", fill="forestgreen", alpha=0.5) +
geom_errorbar( aes(x=Species, ymin=mean-se, ymax=mean+se), width=0.4, colour="orange", alpha=0.9, size=1.5) +
ggtitle("using standard error")
# Confidence Interval
ggplot(my_sum) +
geom_bar( aes(x=Species, y=mean), stat="identity", fill="forestgreen", alpha=0.5) +
geom_errorbar( aes(x=Species, ymin=mean-ic, ymax=mean+ic), width=0.4, colour="orange", alpha=0.9, size=1.5) +
ggtitle("using confidence interval")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# 2. 在base plot中绘制误差线
#Let's build a dataset : height of 10 sorgho and poacee sample in 3 environmental conditions (A, B, C)
data <- data.frame(
specie=c(rep("sorgho" , 10) , rep("poacee" , 10) ),
cond_A=rnorm(20,10,4),
cond_B=rnorm(20,8,3),
cond_C=rnorm(20,5,4)
)
#Let's calculate the average value for each condition and each specie with the *aggregate* function
bilan <- aggregate(cbind(cond_A,cond_B,cond_C)~specie , data=data , mean)
rownames(bilan) <- bilan[,1]
bilan <- as.matrix(bilan[,-1])
#Plot boundaries
lim <- 1.2*max(bilan)
#A function to add arrows on the chart
error.bar <- function(x, y, upper, lower=upper, length=0.1,...){
arrows(x,y+upper, x, y-lower, angle=90, code=3, length=length, ...)
}
#Then I calculate the standard deviation for each specie and condition :
stdev <- aggregate(cbind(cond_A,cond_B,cond_C)~specie , data=data , sd)
rownames(stdev) <- stdev[,1]
stdev <- as.matrix(stdev[,-1]) * 1.96 / 10
#I am ready to add the error bar on the plot using my "error bar" function !
ze_barplot <- barplot(bilan , beside=T , legend.text=T,col=c("blue" , "skyblue") , ylim=c(0,lim) , ylab="height")
error.bar(ze_barplot,bilan, stdev)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
编辑 (opens new window)
上次更新: 2021/09/22, 09:44:39