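The tidylog library looks pretty good. In short: it prints a summary of what each tidyverse-style operation did to your data, at the moment it runs.

This is mostly copied straight from the author's GitHub README, but here is how it behaves.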

GitHub - elbersb/tidylog: Tidylog provides feedback about basic dplyr operations. It provides simple wrapper functions for the most common functions, such as filter, mutate, select, and group_by.

Setup

library(tidyverse)
# Load tidylog after tidyverse so that its wrappers mask the dplyr/tidyr verbs
library(tidylog)

summary <- mtcars %>%
  select(mpg, cyl, hp, am) %>%
  filter(mpg > 15) %>%
  mutate(mpg_round = round(mpg)) %>%
  group_by(cyl, mpg_round, am) %>%
  tally() %>%
  filter(n >= 1)

filter

Reports how many rows were filtered out, as both a count and a percentage.

a <- filter(mtcars, mpg > 20)
# => filter: removed 18 out of 32 rows (56%)
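
To make it clearer what kind of wrapper this is, here is a minimal sketch of the idea. `filter_logged` is a hypothetical helper I made up for illustration; it is not tidylog's actual implementation, just a plain dplyr call with a row count before and after.

```r
library(dplyr)

# Hypothetical helper (not part of tidylog): filter, then report what was removed
filter_logged <- function(.data, ...) {
  result <- dplyr::filter(.data, ...)
  n_before <- nrow(.data)
  n_removed <- n_before - nrow(result)
  message(sprintf(
    "filter_logged: removed %d out of %d rows (%.0f%%)",
    n_removed, n_before, 100 * n_removed / n_before
  ))
  result
}

a <- filter_logged(mtcars, mpg > 20)
```

The real package does the same kind of thing for many verbs at once, so you do not have to write these checks by hand.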

join

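For left_join it reports how many rows were added; for inner_join, how many rows were removed.

This makes unintended one-to-many matches in left_join and silent data loss in inner_join visible (really important).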

a <- left_join(band_members, band_instruments, by = "name")
# => left_join: added 0 rows and added one column (plays)

a <- inner_join(band_members, band_instruments, by = "name")
# => inner_join: removed one row and added one column (plays)
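
As a sketch of the one-to-many problem mentioned above, here is a small example with made-up data (`orders` and `customers` are hypothetical tables, not from the README). It does the row-count check by hand with plain dplyr; with tidylog loaded, the left_join call itself should report the added row.

```r
library(dplyr)

# Made-up data for illustration: customer "b" appears twice in the lookup table
orders <- tibble(order_id = 1:3, customer_id = c("a", "b", "c"))
customers <- tibble(
  customer_id = c("a", "b", "b"),
  region = c("east", "west", "north")
)

joined <- left_join(orders, customers, by = "customer_id")

# Manual version of the check: a left join against a unique key should add no rows
rows_added <- nrow(joined) - nrow(orders)
rows_added  # 1, because the duplicated "b" matched twice
```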

group_by, summarize

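Shows exactly which variables are being grouped by, and the size of the summarize result. The former is handy because you can see at a glance whether a grouping variable was missed, without checking separately.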

a <- mtcars %>%
  group_by(cyl, carb) %>%
  summarize(total_weight = sum(wt))
# => group_by: 2 grouping variables (cyl, carb)
# => summarize: now 9 rows and 3 columns, one group variable remaining (cyl)
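
The "one group variable remaining" part is easy to overlook. As a small sketch using plain dplyr (no tidylog required), `group_vars()` confirms that the result is still grouped by cyl, and `ungroup()` removes that grouping if it is not wanted.

```r
library(dplyr)

a <- mtcars %>%
  group_by(cyl, carb) %>%
  summarize(total_weight = sum(wt))

group_vars(a)  # "cyl" is still a grouping variable after summarize

# Drop the leftover grouping explicitly before further processing
b <- ungroup(a)
group_vars(b)  # character(0)
```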