brmsというStanのラッパーパッケージで遊ぶ。

概要

例えば、rstanを使う場合はStanコードを別ファイルの.stanに記述してそれを呼び出す形でbayes推定をおこなう。一方、brmsを用いるとStanコードをわざわざ書かなくてもbrmsパッケージの関数を用いればbayes推定ができる。正確には、関数を介して内部的にStanコードを走らせているらしい。そのため、brmsを用いて書いたbayes modelが内部的に持っているStanコードはどうなっているか知りたい場合はそのコードを出力することも可能。

また、指定するための事前分布が豊富に存在するので、例えばStanで記述するのが面倒なゼロ過剰ポアソン分布なども簡単に使えるらしい。

ちなみに、brmsははBayesian Regression Models using Stanの略。

今回、brmsの練習のために「StanとRでベイズ統計モデル」のコードをいくつかbrmsで解く。

StanとRでベイズ統計モデリング (Wonderful R)

作者:健太郎, 松浦
発売日: 2016/10/25
メディア: 単行本

github.com

線形回帰

以下のデータに対して重回帰をする。

library(tidyverse)
library(tidylog)
library(rstan)
library(brms)

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

df = read_csv('./chap05/input/data-attendance-1.txt')

# # A tibble: 50 x 3
# A Score     Y
# <dbl> <dbl> <dbl>
#   1     0    69 0.286
# 2     1   145 0.196
# 3     0   125 0.261
# 4     1    86 0.109
# 5     1   158 0.23 
# 6     0   133 0.35 
# 7     0   111 0.33 
# 8     1   147 0.194
# 9     0   146 0.413
# 10     0   145 0.36 
# # … with 40 more rows

OLS推定

まずはlmで以下をOLS推定。

$Y_i = \alpha A_i + \beta Score + \gamma_i + \epsilon_i$
$\epsilon_i \sim Normal(0, \sigma)$

lm_out = df %>% 
  lm(Y ~ A + Score, data = .)
summary(lm_out)

# Call:
#   lm(formula = Y ~ A + Score, data = .)
# 
# Residuals:
#   Min        1Q    Median        3Q       Max 
# -0.104746 -0.030294 -0.004062  0.026596  0.116984 
# 
# Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.1233719  0.0323877   3.809 0.000404 ***
#   A           -0.1437849  0.0145441  -9.886 4.63e-13 ***
#   Score        0.0016243  0.0002558   6.350 7.93e-08 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.05038 on 47 degrees of freedom
# Multiple R-squared:  0.7438, Adjusted R-squared:  0.7329 
# F-statistic: 68.21 on 2 and 47 DF,  p-value: 1.268e-14

結果は $Y = -0.1438 A + 0.002 Score + 0.123$ となる。

ベイズ推定

次に、brmを使ってベイズ推定。

ちなみに、オプションにsave_modelでmodel作成時に裏で走っているStanコードを.stanで保存できる。また、fileで作成したmodelを.rdsで保存できる。これはreadRDS関数で読み込める。

brm_out = df %>% 
  brm(Y ~ A + Score, family="gaussian", data = .,
           save_model = "test_code", # test_code.stanという名前でstanファイルを保存
           file = "test_model") # test_model.rdsという名前でmodelを保存
summary(brm_out)

# Family: gaussian 
# Links: mu = identity; sigma = identity 
# Formula: Y ~ A + Score 
# Data: . (Number of observations: 50) 
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
# total post-warmup samples = 4000
# 
# Population-Level Effects: 
#   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# Intercept     0.12      0.03     0.06     0.19       3988 1.00
# A            -0.14      0.01    -0.17    -0.11       2029 1.00
# Score         0.00      0.00     0.00     0.00       4109 1.00
# 
# Family Specific Parameters: 
#   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# sigma     0.05      0.01     0.04     0.06       2018 1.00
# 
# Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
# is a crude measure of effective sample size, and Rhat is the potential 
# scale reduction factor on split chains (at convergence, Rhat = 1).

結果は $Y = -0.14 A + 0.0 Score + 0.12$ で、（下三桁以下がわからないが）OLSと同様になった。

Stanのsamplingの仕方は以下のようになっている(デフォルト)。

Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; total post-warmup samples = 4000

任意のものにしたい場合はオプションに指定する。

df %>% 
  brm(Y ~ A + Score, family="gaussian", data = .,
      iter = 10,
      warmup = 10,
      seed = 1234,
      chain = 2)

今回familyにgaussianを使ったため、パラメータとしてmuとsigmaが生成される。それらのリンク関数は以下のように、identityとなる。

Links: mu = identity; sigma = identity

また、get_prior関数に作成したいmodelを入れると、そのmodelを使った場合に自動で作成される事前分布が表示される。
事前分布として、Intercept, sigmaともにstudent_t(3, 0, 10)が設定される。書いてないb(各回帰係数)は無情報事前分布が設定される。Intercept, sigmaが無情報じゃなくて弱情報分布なのは謎。sigmaはこのスライドみたいな理由かな？

df %>% 
  get_prior(Y ~ A + Score, family="gaussian", data = .)
#                           prior          class       coef group resp dpar nlpar bound
# 1                                                 b                                  
# 2                                                b           A                            
# 3                                                b    Score                            
# 4 student_t(3, 0, 10)   Intercept                                  
# 5 student_t(3, 0, 10)         sigma

任意の事前分布を入れたい場合はオプションにpriorを入れる。指定方法は何種類かある。

df %>% 
  brm(Y ~ A + Score, family="gaussian", data = .,
            prior = c(prior_string("normal(0, 10)", class = "b"), # 文字列入れる指定方法
                      prior_string("normal(0, 10)", class = "b", coef = "A"), # 文字列入れる指定方法
                      prior(student_t(3, 19, 10), class = Intercept), # 直接入れる指定方法
                      prior_(~student_t(3, 0, 10), class = ~sigma) # ~と入れる指定方法
                      )
            )

make_stancode関数にmodelを記載したらStanコードを吐く。

df %>% 
make_stancode(Y ~ A + Score, family="gaussian", data = .,
              prior = c(prior_string("normal(0, 10)", class = "b"), # 文字列入れる指定方法
                        prior_string("normal(0, 10)", class = "b", coef = "A"), # 文字列入れる指定方法
                        prior(student_t(3, 19, 10), class = Intercept), # 直接入れる指定方法
                        prior_(~student_t(3, 0, 10), class = ~sigma) # ~と入れる指定方法
              )
)

また、コンパイル後でも内部的に走ったStanのコードも見ることができる。
こんな書き方しねー、って感じだけど基本的にはいい感じに早い書き方をしてくれる（らしい）。

brm_out$model`

# // generated with brms 2.8.0
# functions {
# }
# data {
#   int<lower=1> N;  // number of observations
#   vector[N] Y;  // response variable
#   int<lower=1> K;  // number of population-level effects
#   matrix[N, K] X;  // population-level design matrix
#   int prior_only;  // should the likelihood be ignored?
# }
# transformed data {
#   int Kc = K - 1;
#   matrix[N, K - 1] Xc;  // centered version of X
#   vector[K - 1] means_X;  // column means of X before centering
#   for (i in 2:K) {
#     means_X[i - 1] = mean(X[, i]);
#     Xc[, i - 1] = X[, i] - means_X[i - 1];
#   }
# }
# parameters {
#   vector[Kc] b;  // population-level effects
#   real temp_Intercept;  // temporary intercept
#   real<lower=0> sigma;  // residual SD
# }
# transformed parameters {
# }
# model {
#   vector[N] mu = temp_Intercept + Xc * b;
#   // priors including all constants
#   target += student_t_lpdf(temp_Intercept | 3, 0, 10);
#   target += student_t_lpdf(sigma | 3, 0, 10)
#   - 1 * student_t_lccdf(0 | 3, 0, 10);
#   // likelihood including all constants
#   if (!prior_only) {
#     target += normal_lpdf(Y | mu, sigma);
#   }
# }
# generated quantities {
#   // actual population-level intercept
#   real b_Intercept = temp_Intercept - dot_product(means_X, b);
# }

plot関数を用いると結果が可視化できる。他にも限界効果や交互作用を見るmarginal_effectsなどもある。

plot(brm_out)

pp_check(brm_out)

f:id:chito_ng:20190521091922p:plain

f:id:chito_ng:20190521092436p:plain

ある程度はbrms内でできるが細かい可視化は、前回の記事で紹介したようなパッケージが使えるのでそちらに投げると良い。

launch_shiny(brm_out)

knknkn.hatenablog.com

階層ベイズ

以下のようなモデルとそこから生成されるデータを考える。このモデルは推定量は会社毎に異なるが、業界が同じだとある程度似るモデル。

$Y_{会社} = \alpha_{会社} + \beta_{会社} X + \epsilon_{会社} \\ \\ \alpha_{会社} = \alpha_{業界共通} + \alpha_{会社差} \\ \alpha_{業界共通} = \alpha_{全体共通} + \alpha_{業界差} \\ \\ \beta_{会社} = \beta_{業界共通} + \beta_{会社差} \\ \beta_{業界共通} = \beta_{全体共通} + \beta_{業界差}$

  id  X KID GID        a         b    Y_sim
1  1 10   1   1 285.9306  9.499843 403.9857
2  2 28   2   2 336.9215  5.253227 535.2640
3  3 14   3   1 288.4397 15.351148 491.0800
4  4 31   4   2 306.2665 12.613492 639.5556
5  5 33   1   1 285.9306  9.499843 624.5689
6  6  1   2   2 336.9215  5.253227 324.4447

例えば以下の式において、 $\alpha_{会社}$ は $\alpha_{業界共通}$ という固定効果と $\alpha_{会社差}$ に分解できる。
$\alpha_{会社} = \alpha_{業界共通} + \alpha_{会社差}$

階層ベイズモデルを実装する：lme4とbrmsパッケージを用いたマルチレベルモデルの基礎

このような場合、brmsのformula式では ( X | 会社ID) という指定の仕方で表現することができる(おおもとで言えば、混合線形モデル用のパッケージlme4の書き方）。

そのため、今回のモデルでは以下のような書き方となる？ (1 + X | KID)に対して、1 + Xの1は切片、XはXに対する係数となり、それらに対してKIDグループ化(KIDで効果が異なる)、
(1 + X | GID)に対して、GIDでグループ化、ということを表している・・・らしい。

例えば切片部分(1)で考えるなら、 $切片 = (1 | GID) + (1 | KID) = 切片_{業界共通} + 切片_{業界差} + 切片_{会社共通} + 切片_{会社差}$ となり、業界が同じなら $(1 | GID)$ が共通なので似た値になるということが実現されている。そのため、KID GID どちらともで指定すると2階階層となる。

formula = Y_sim ~ 1 + X + (1 + X | KID) + (1 + X | GID)

また、全体的なコードとしては以下のようになる？事前分布として、[tex: X{会社} ~ Normal(X{業界共通}, \sigma)] とかそういう構造は下記のpriorの指定であってるのだろうか・・・？見たままだとNormal(0, 10)なのでなってない気がするが。。。このあたりはもうちょっとちゃんと調べます。

model = brm(Y_sim ~ 1 + X + (1 + X | KID) + (1 + X | GID),
            data    = data,
          prior   = c(prior_string("normal(0,10)", class = "sd", coef = "X", group = "KID"),
                      prior_string("normal(0,10)", class = "sd", coef = "Intercept", group = "KID")
                      ),
            chains  = 4,
            iter    = 1000,
            warmup  = 500)


summary(model)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Y_sim ~ 1 + X + (1 + X | KID) + (1 + X | GID) 
   Data: d (Number of observations: 40) 
Samples: 4 chains, each with iter = 1000; warmup = 500; thin = 1;
         total post-warmup samples = 2000

Group-Level Effects: 
~GID (Number of levels: 2) 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)       70.08     71.79     1.49   276.74       1060 1.00
sd(X)               10.13     10.92     0.21    42.22        261 1.02
cor(Intercept,X)    -0.06      0.61    -0.96     0.94       1063 1.00

~KID (Number of levels: 4) 
                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)       49.69     43.94     7.12   158.23        665 1.01
sd(X)                8.84      6.36     2.84    24.23        461 1.01
cor(Intercept,X)    -0.43      0.49    -0.98     0.74        717 1.01

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept   311.99     60.61   191.35   449.71        861 1.00
X             8.30      4.46    -1.26    16.56        769 1.00

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma    25.53      3.39    19.78    33.23       1305 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
 警告メッセージ: 
There were 169 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup