14. 2. Processing data
• Simple operations (continued)
for loop: for (i in 1:n){…}
(Example) Compute the sum of each column
ans <- c(0, 0, 0)  # one slot per column
for (i in 1:ncol(data)){  # loop over columns, not rows
  ans[i] <- sum(data[, i], na.rm=TRUE)  # sum of column i
}
Row X … Y
1 101 12345
2 102 678
3 103 910
Vectors are created with c(…).
na.rm=TRUE ignores missing values (NA).
Without it, any NA in the column
makes the result NA as well.
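A runnable sketch of the corrected loop. The data frame is hypothetical: its middle column Z and the NA in it are made up, because the middle columns of the table above are elided. colSums() is base R's built-in equivalent and is used here only as a cross-check.

```r
# Hypothetical data: three numeric columns, one missing value (NA)
data <- data.frame(X = c(101, 102, 103),
                   Z = c(1, NA, 3),
                   Y = c(12345, 678, 910))

ans <- c(0, 0, 0)                         # one slot per column
for (i in 1:ncol(data)) {                 # loop over columns
  ans[i] <- sum(data[, i], na.rm = TRUE)  # NA is skipped
}
print(ans)

# Without na.rm = TRUE the NA propagates to the result
print(sum(data$Z))

# Base R cross-check: colSums() gives the same per-column sums
print(colSums(data, na.rm = TRUE))
```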
19. 2. Processing data {dplyr} +α
☕ ☕ ☕ Notation in R
• Standard: ans <- func1(data, …)
ans <- func2(func1(data))
ans <- func3(func2(func1(data)))
• Chain (pipe) notation: ans <- data %>% func1()
ans <- data %>% func1() %>% func2()
ans <- data %>% func1() %>% func2() %>% func3()
The steps can be written in the order they are applied.
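A minimal sketch showing that nested calls and a %>% chain compute the same value. It uses the magrittr pipe that dplyr re-exports; the vector x is a hypothetical input.

```r
library(dplyr)   # re-exports the %>% pipe from magrittr

x <- c(1, 4, 9, 16)   # hypothetical input

# Standard (nested) style: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Chain style: read left to right, in the order the steps run.
# x %>% f(y) is evaluated as f(x, y)
chained <- x %>% sqrt() %>% mean() %>% round(1)

print(c(nested, chained))
```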
20. 2. Processing data {dplyr} +α
With data, we can…
• compute per group
• reshape
• add, extract, and sort
• join
Fruits <- Fruits[,-7]  # drop the 7th column
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
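The slides assume a Fruits data frame is already loaded. It appears to be the Fruits example data from the googleVis package, but that is an assumption, since the slide only shows Fruits[,-7]. To follow along without that package, the 9 × 6 table above can be typed in directly:

```r
# The 9 x 6 table shown above, entered by hand
Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)
str(Fruits)
```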
21. 2. Processing data {dplyr} +α
• Computing per group
Fruits %>%
  group_by(Fruit) %>%
  summarise(avg_sales=mean(Sales, na.rm=TRUE),
            sum_sales=sum(Sales, na.rm=TRUE))
• group_by: the column(s) that define the groups
• summarise: computes one summary row per group
Fruit avg_sales sum_sales
Apples 99.33333 298
Bananas 86.66667 260
Oranges 95.66667 287
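As a cross-check that does not need dplyr, base R's tapply() gives the same group averages and totals. This is an alternative technique, not what the slide uses. The Fruits data is typed in from the table on slide 20.

```r
Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

# tapply() splits Sales by Fruit and applies the function to each group;
# na.rm = TRUE is passed on to mean() and sum()
avg_sales <- tapply(Fruits$Sales, Fruits$Fruit, mean, na.rm = TRUE)
sum_sales <- tapply(Fruits$Sales, Fruits$Fruit, sum, na.rm = TRUE)
print(cbind(avg_sales, sum_sales))
```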
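22. 2. Processing data {dplyr} +α
• Computing per group
Fruits %>%
  group_by(Fruit) %>%
  do(avg_sales=mean(.$Sales, na.rm=TRUE),
     corr=lm(.$Sales~.$Profit)$coefficients[2]) %>%
  as.data.frame()
• group_by: the column(s) that define the groups
• do: runs arbitrary code on each group
• . (dot): the rows of Fruits that belong to the current group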
lm: linear regression
lm(y~x, data)
coefficients[2]: the slope, so corr holds the slope of Sales on Profit rather than a correlation coefficient
Fruit avg_sales corr
Apples 99.333333 1.1498195
Bananas 86.666667 1.5930233
Oranges 95.666667 -0.384615
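Recent dplyr releases mark do() as superseded. A sketch of the same per-group computation with summarise(), computing the slope directly as cov/var. For a simple regression this equals coefficients[2] from lm(). The column name slope is our choice, and dplyr is assumed to be installed.

```r
library(dplyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

res <- Fruits %>%
  group_by(Fruit) %>%
  summarise(
    avg_sales = mean(Sales, na.rm = TRUE),
    # Least-squares slope of Sales on Profit: cov(x, y) / var(x),
    # the same value lm(Sales ~ Profit)$coefficients[2] returns
    slope = cov(Profit, Sales) / var(Profit)
  )
print(res)

# Cross-check one group against lm()
apples <- Fruits[Fruits$Fruit == "Apples", ]
lm_slope <- unname(coef(lm(Sales ~ Profit, data = apples))[2])
print(lm_slope)
```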
23. 2. Processing data {dplyr} +α
• Reshaping: to long (vertical) format
Uses {tidyr}
tidyr::gather
FruitsEdited <- Fruits %>%
  gather(var, val, -Fruit, -Year, -Location)  # every column except Fruit, Year, Location goes into var/val
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
Fruit Year Location var val
Apples 2008 West Sales 98
Apples 2009 West Sales 111
Apples 2010 West Sales 89
Oranges 2008 East Sales 96
Bananas 2008 East Sales 85
Oranges 2009 East Sales 93
Bananas 2009 East Sales 94
Oranges 2010 East Sales 98
Bananas 2010 East Sales 81
Apples 2008 West Expenses 78
Apples 2009 West Expenses 79
Apples 2010 West Expenses 76
Oranges 2008 East Expenses 81
Bananas 2008 East Expenses 76
Oranges 2009 East Expenses 80
Bananas 2009 East Expenses 78
Oranges 2010 East Expenses 91
Bananas 2010 East Expenses 71
Apples 2008 West Profit 20
Apples 2009 West Profit 32
Apples 2010 West Profit 13
Oranges 2008 East Profit 15
Bananas 2008 East Profit 9
Oranges 2009 East Profit 13
Bananas 2009 East Profit 16
Oranges 2010 East Profit 7
Bananas 2010 East Profit 10
(First table: Fruits, the input. Second table: FruitsEdited, the output.)
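In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). Below is a sketch of the same reshape with pivot_longer(), which the slide does not use. Unlike gather(), it emits the rows grouped by original row (Sales, Expenses, Profit for each fruit and year) rather than column by column. tidyr is assumed to be installed.

```r
library(tidyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

# Column names go into var and cell values go into val,
# matching gather(var, val, -Fruit, -Year, -Location)
FruitsLong <- pivot_longer(Fruits,
                           cols = c(Sales, Expenses, Profit),
                           names_to = "var", values_to = "val")
print(head(FruitsLong))
```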
24. 2. Processing data {dplyr} +α
• Reshaping: to wide (horizontal) format
Uses {tidyr}
tidyr::spread
Fruits <- FruitsEdited %>%
  spread(var, val)  # one new column per value of var, filled from val
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
Fruit Year Location var val
Apples 2008 West Sales 98
Apples 2009 West Sales 111
Apples 2010 West Sales 89
Oranges 2008 East Sales 96
Bananas 2008 East Sales 85
Oranges 2009 East Sales 93
Bananas 2009 East Sales 94
Oranges 2010 East Sales 98
Bananas 2010 East Sales 81
Apples 2008 West Expenses 78
Apples 2009 West Expenses 79
Apples 2010 West Expenses 76
Oranges 2008 East Expenses 81
Bananas 2008 East Expenses 76
Oranges 2009 East Expenses 80
Bananas 2009 East Expenses 78
Oranges 2010 East Expenses 91
Bananas 2010 East Expenses 71
Apples 2008 West Profit 20
Apples 2009 West Profit 32
Apples 2010 West Profit 13
Oranges 2008 East Profit 15
Bananas 2008 East Profit 9
Oranges 2009 East Profit 13
Bananas 2009 East Profit 16
Oranges 2010 East Profit 7
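Bananas 2010 East Profit 10
(First table: Fruits, the output. Second table: FruitsEdited, the input. spread() may return the rows and the new columns in a different order from the original Fruits.)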
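spread() is likewise superseded by pivot_wider() in tidyr 1.0.0 and later. Here is a round-trip sketch, which the slide does not use: gather() as on the previous slide, then pivot_wider(). By default pivot_wider() orders the new columns by first appearance, so here it restores Fruits' original column order. tidyr is assumed to be installed.

```r
library(tidyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

# Long format, as on the previous slide
FruitsEdited <- gather(Fruits, var, val, -Fruit, -Year, -Location)

# Back to wide: one column per value of var, filled from val
FruitsWide <- pivot_wider(FruitsEdited, names_from = var, values_from = val)
print(FruitsWide)
```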
25. 2. Processing data {dplyr} +α
• Reshaping: combining columns
Uses {tidyr}
tidyr::unite
FruitsUnited <- Fruits %>%
  unite(ID, Fruit, Year, sep=":")
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
ID Location Sales Expenses Profit
Apples:2008 West 98 78 20
Apples:2009 West 111 79 32
Apples:2010 West 89 76 13
Oranges:2008 East 96 81 15
Bananas:2008 East 85 76 9
Oranges:2009 East 93 80 13
Bananas:2009 East 94 78 16
Oranges:2010 East 98 91 7
Bananas:2010 East 81 71 10
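(First table: Fruits, the input. Second table: FruitsUnited, the output. In unite(ID, Fruit, Year, sep=":"), Fruit and Year are the source columns and ID is the combined destination column.)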
26. 2. Processing data {dplyr} +α
• Reshaping: splitting a column
Uses {tidyr}
tidyr::separate
Fruits <- FruitsUnited %>%
  separate(ID, c("Fruit", "Year"), sep=":")
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
ID Location Sales Expenses Profit
Apples:2008 West 98 78 20
Apples:2009 West 111 79 32
Apples:2010 West 89 76 13
Oranges:2008 East 96 81 15
Bananas:2008 East 85 76 9
Oranges:2009 East 93 80 13
Bananas:2009 East 94 78 16
Oranges:2010 East 98 91 7
Bananas:2010 East 81 71 10
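(First table: Fruits, the output. Second table: FruitsUnited, the input. In separate(ID, c("Fruit", "Year"), sep=":"), ID is the source column and "Fruit", "Year" are the destination columns.)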
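Here is a round-trip sketch of unite() followed by separate(), assuming tidyr is installed. One detail the tables do not show: ID is a string, so separate() returns Year as character ("2008"). Passing convert = TRUE tells separate() to convert the new columns to a suitable type.

```r
library(tidyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

# Combine Fruit and Year into ID ("Apples:2008", ...)
FruitsUnited <- unite(Fruits, ID, Fruit, Year, sep = ":")

# Split ID back; Year comes back as character by default
back <- separate(FruitsUnited, ID, c("Fruit", "Year"), sep = ":")
str(back$Year)

# convert = TRUE turns "2008" back into a number
back2 <- separate(FruitsUnited, ID, c("Fruit", "Year"), sep = ":", convert = TRUE)
str(back2$Year)
```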
27. 2. Processing data {dplyr} +α
• Adding, extracting, sorting
Adding or modifying columns: dplyr::mutate
Fruits %>%
  mutate(over100 = (Sales>100))
Fruit Year Location Sales Expenses Profit over100
Apples 2008 West 98 78 20 FALSE
Apples 2009 West 111 79 32 TRUE
Apples 2010 West 89 76 13 FALSE
Oranges 2008 East 96 81 15 FALSE
Bananas 2008 East 85 76 9 FALSE
Oranges 2009 East 93 80 13 FALSE
Bananas 2009 East 94 78 16 FALSE
Oranges 2010 East 98 91 7 FALSE
Bananas 2010 East 81 71 10 FALSE
28. 2. Processing data {dplyr} +α
• Adding, extracting, sorting
Selecting columns: dplyr::select
Fruits %>%
  mutate(over100 = (Sales>100)) %>%
  dplyr::select(-over100)
• The dplyr:: prefix avoids a clash with MASS::select
• dplyr::select(Fruit:Profit) gives the same result
Result:
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
Input (after mutate):
Fruit Year Location Sales Expenses Profit over100
Apples 2008 West 98 78 20 FALSE
Apples 2009 West 111 79 32 TRUE
Apples 2010 West 89 76 13 FALSE
Oranges 2008 East 96 81 15 FALSE
Bananas 2008 East 85 76 9 FALSE
Oranges 2009 East 93 80 13 FALSE
Bananas 2009 East 94 78 16 FALSE
Oranges 2010 East 98 91 7 FALSE
Bananas 2010 East 81 71 10 FALSE
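Here is a short check, on the slide's data, that the two select() spellings return the same columns. It assumes dplyr is installed.

```r
library(dplyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

withFlag <- Fruits %>% mutate(over100 = Sales > 100)  # adds a logical column

a <- withFlag %>% dplyr::select(-over100)      # drop one column by name
b <- withFlag %>% dplyr::select(Fruit:Profit)  # keep a contiguous range
print(names(a))
```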
29. 2. Processing data {dplyr} +α
• Adding, extracting, sorting
Filtering rows: dplyr::filter
Fruits %>%
  dplyr::filter(Fruit=="Apples")
• The dplyr:: prefix avoids a clash with stats::filter
• Fruits %>% subset(Fruit=="Apples") and
Fruits[Fruits$Fruit=="Apples",] give the same result
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
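The slide says three spellings return the same rows. Here is a quick check on the slide's data. It assumes dplyr is installed; subset() and [ ] are base R.

```r
library(dplyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

r1 <- Fruits %>% dplyr::filter(Fruit == "Apples")  # dplyr
r2 <- Fruits %>% subset(Fruit == "Apples")         # base R subset()
r3 <- Fruits[Fruits$Fruit == "Apples", ]           # base R indexing
print(r1)
```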
30. 2. Processing data {dplyr} +α
• Adding, extracting, sorting
Sorting: dplyr::arrange
Fruits %>%
  arrange(Fruit, Sales)
• arrange(first key, second key, …): sorts by the first key and breaks ties with the next
Fruit Year Location Sales Expenses Profit
Apples 2010 West 89 76 13
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Bananas 2010 East 81 71 10
Bananas 2008 East 85 76 9
Bananas 2009 East 94 78 16
Oranges 2009 East 93 80 13
Oranges 2008 East 96 81 15
Oranges 2010 East 98 91 7
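arrange() sorts each key in ascending order. Below is a sketch of the slide's call next to a variant that uses dplyr's desc() to sort Sales in descending order within each Fruit. The variant is our addition, and dplyr is assumed to be installed.

```r
library(dplyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

asc <- Fruits %>% arrange(Fruit, Sales)                # as on the slide
sales_desc <- Fruits %>% arrange(Fruit, desc(Sales))   # Sales high to low within each Fruit
print(sales_desc)
```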
31. 2. Processing data {dplyr} +α
• Joining data
dplyr::left_join
C <- data.frame(Fruit=c("Apples", "Bananas", "Hatena"),
                Color=c("green", "yellow", "blue"))
Fruits %>%
  left_join(C, by="Fruit")
right_join instead keeps every row of C and attaches the matching rows of Fruits.
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
Bananas 2009 East 94 78 16
Oranges 2010 East 98 91 7
Bananas 2010 East 81 71 10
Fruit Color
Apples green
Bananas yellow
Hatena blue
Fruit Year Location Sales Expenses Profit Color
Apples 2008 West 98 78 20 green
Apples 2009 West 111 79 32 green
Apples 2010 West 89 76 13 green
Oranges 2008 East 96 81 15 NA
Bananas 2008 East 85 76 9 yellow
Oranges 2009 East 93 80 13 NA
Bananas 2009 East 94 78 16 yellow
Oranges 2010 East 98 91 7 NA
Bananas 2010 East 81 71 10 yellow
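(Inputs: Fruits, the first table, and C, the second table. The last table is the left_join result; Color is NA where C has no matching Fruit.)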
32. 2. Processing data {dplyr} +α
• Joining data (continued)
C:
Fruit Color
Apples green
Bananas yellow
Hatena blue
Keeping every row of both Fruits and C (full join):
Fruit Year Location Sales Expenses Profit Color
Apples 2008 West 98 78 20 green
Apples 2009 West 111 79 32 green
Apples 2010 West 89 76 13 green
Oranges 2008 East 96 81 15 NA
Bananas 2008 East 85 76 9 yellow
Oranges 2009 East 93 80 13 NA
Bananas 2009 East 94 78 16 yellow
Oranges 2010 East 98 91 7 NA
Bananas 2010 East 81 71 10 yellow
Hatena NA NA NA NA NA blue
Keeping only rows whose Fruit appears in both tables (inner join):
Fruit Year Location Sales Expenses Profit Color
Apples 2008 West 98 78 20 green
Apples 2009 West 111 79 32 green
Apples 2010 West 89 76 13 green
Bananas 2008 East 85 76 9 yellow
Bananas 2009 East 94 78 16 yellow
Bananas 2010 East 81 71 10 yellow
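The join results on these slides come without code for every case. Judging from the tables, a result with all 9 Fruits rows plus the unmatched Hatena row has the semantics of full_join(), and one with only Apples and Bananas rows has the semantics of inner_join(). That mapping is our interpretation. The sketch below runs four joins on the slide's data and counts rows. right_join() keeps only C's rows, so the Oranges rows drop out. dplyr is assumed to be installed.

```r
library(dplyr)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)
C <- data.frame(Fruit = c("Apples", "Bananas", "Hatena"),
                Color = c("green", "yellow", "blue"),
                stringsAsFactors = FALSE)

left  <- Fruits %>% left_join(C, by = "Fruit")   # every row of Fruits
right <- Fruits %>% right_join(C, by = "Fruit")  # every row of C
full  <- Fruits %>% full_join(C, by = "Fruit")   # every row of both
inner <- Fruits %>% inner_join(C, by = "Fruit")  # only Fruit values in both

print(sapply(list(left = left, right = right, full = full, inner = inner), nrow))
```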
33. 2. Processing data {dplyr} +α
• Combining data frames
Fruits1 <- Fruits[1:3,]
Fruits2 <- Fruits[4:6,]
Fruits1:
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Fruits2:
Fruit Year Location Sales Expenses Profit
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
cbind(Fruits1, Fruits2) places them side by side (3 rows, 12 columns):
Fruit Year Location Sales Expenses Profit Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20 Oranges 2008 East 96 81 15
Apples 2009 West 111 79 32 Bananas 2008 East 85 76 9
Apples 2010 West 89 76 13 Oranges 2009 East 93 80 13
rbind(Fruits1, Fruits2) stacks them (6 rows, 6 columns):
Fruit Year Location Sales Expenses Profit
Apples 2008 West 98 78 20
Apples 2009 West 111 79 32
Apples 2010 West 89 76 13
Oranges 2008 East 96 81 15
Bananas 2008 East 85 76 9
Oranges 2009 East 93 80 13
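Here is a base R sketch that confirms the shapes. cbind() appends columns, so both pieces need the same number of rows. rbind() appends rows, and for data frames it matches columns by name, so both pieces need the same column names.

```r
Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)

Fruits1 <- Fruits[1:3, ]
Fruits2 <- Fruits[4:6, ]

side_by_side <- cbind(Fruits1, Fruits2)  # columns appended: 3 x 12
stacked      <- rbind(Fruits1, Fruits2)  # rows appended:    6 x 6
print(dim(side_by_side))
print(dim(stacked))
```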
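46. 3. Visualizing data
☕ ☕ ☕ Why ggplot is handy
Combine data reshaping with visualization
ggplot(FruitsEdited, aes(Year, val)) +
  geom_line() +
  facet_grid(Fruit~var)
facet_grid(y~x) splits
the device (plotting area) into panels
and draws the graph for each (x, y) combination;
here the rows are Fruit and the columns are var.
The x and y axes are shared across all panels automatically.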
Fruit Year Location var val
Apples 2008 West Sales 98
Apples 2009 West Sales 111
Apples 2010 West Sales 89
Oranges 2008 East Sales 96
Bananas 2008 East Sales 85
Oranges 2009 East Sales 93
Bananas 2009 East Sales 94
Oranges 2010 East Sales 98
Bananas 2010 East Sales 81
Apples 2008 West Expenses 78
Apples 2009 West Expenses 79
Apples 2010 West Expenses 76
Oranges 2008 East Expenses 81
Bananas 2008 East Expenses 76
Oranges 2009 East Expenses 80
Bananas 2009 East Expenses 78
Oranges 2010 East Expenses 91
Bananas 2010 East Expenses 71
Apples 2008 West Profit 20
Apples 2009 West Profit 32
Apples 2010 West Profit 13
Oranges 2008 East Profit 15
Bananas 2008 East Profit 9
Oranges 2009 East Profit 13
Bananas 2009 East Profit 16
Oranges 2010 East Profit 7
Bananas 2010 East Profit 10
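FruitsEdited (input)
(Figure: each Fruit × var combination is placed in its own panel automatically, and the axes are shared across panels.)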
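The sketch below rebuilds FruitsEdited, draws the slide's plot, and inspects the panel layout with ggplot_build() to confirm there are 3 × 3 panels. It assumes tidyr and ggplot2 are installed. The scales = "free_y" variant is our addition: it lets each row of panels use its own y range when a shared axis would flatten the differences.

```r
library(tidyr)
library(ggplot2)

Fruits <- data.frame(
  Fruit    = c("Apples", "Apples", "Apples", "Oranges", "Bananas",
               "Oranges", "Bananas", "Oranges", "Bananas"),
  Year     = c(2008, 2009, 2010, 2008, 2008, 2009, 2009, 2010, 2010),
  Location = c("West", "West", "West", "East", "East", "East", "East", "East", "East"),
  Sales    = c(98, 111, 89, 96, 85, 93, 94, 98, 81),
  Expenses = c(78, 79, 76, 81, 76, 80, 78, 91, 71),
  Profit   = c(20, 32, 13, 15, 9, 13, 16, 7, 10),
  stringsAsFactors = FALSE
)
FruitsEdited <- gather(Fruits, var, val, -Fruit, -Year, -Location)

# Rows of the grid = Fruit, columns = var; axes shared by default
p <- ggplot(FruitsEdited, aes(Year, val)) +
  geom_line() +
  facet_grid(Fruit ~ var)
print(p)

# One row per panel, with its grid position and scale indices
panels <- ggplot_build(p)$layout$layout
print(panels[, c("PANEL", "ROW", "COL", "Fruit", "var")])

# Variant: a separate y scale for each row of panels
p_free <- ggplot(FruitsEdited, aes(Year, val)) +
  geom_line() +
  facet_grid(Fruit ~ var, scales = "free_y")
panels_free <- ggplot_build(p_free)$layout$layout
```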