Under the frequentist setting, the P-score is built on the quantiles
$$P_{kh} = \Phi((\hat{d}_{1k} - \hat{d}_{1h})/s_{kh}),$$
where \(\hat{d}_{1k}\) and \(\hat{d}_{1h}\) are the point estimates of treatment effects for \(k\) vs. 1 and \(h\) vs. 1, respectively, and \(s_{kh}\) is the standard error of \(\hat{d}_{1k} - \hat{d}_{1h}\) (Rucker and Schwarzer, 2015). Moreover, \(\Phi(\cdot)\) is the cumulative distribution function of the standard normal distribution. The quantity \(P_{kh}\) can be interpreted as the extent of certainty that treatment \(k\) is better than \(h\). The frequentist P-score of treatment \(k\) is \(\frac{1}{K-1} \sum_{h \neq k} P_{kh}\).
Analogously to the frequentist P-score, conditional on \(d_{1h}\) and \(d_{1k}\), the quantities \(P_{hk}\) from the Bayesian perspective can be considered as \(I(d_{1k} > d_{1h})\), which are Bernoulli random variables. To quantify the overall performance of treatment \(k\), we may similarly use
$$\bar{P}_k = \frac{1}{K-1} \sum_{h \neq k} I(d_{1k} > d_{1h}).$$
Note that \(\bar{P}_k\) is a parameter under the Bayesian framework, while the frequentist P-score is a statistic. Moreover, \(\sum_{h \neq k} I(d_{1k} > d_{1h})\) is equivalent to \(K - R_k\), where \(R_k\) is the true rank of treatment \(k\). Thus, we may also write \(\bar{P}_k = (K - R_k)/(K - 1)\); this corresponds to the findings by Rucker and Schwarzer (2015). Consequently, we call \(\bar{P}_k\) the scaled rank in the network meta-analysis (NMA) for treatment \(k\). It transforms the range of the original rank between 1 and \(K\) to a range between 0 and 1. In addition, note that \(E[I(d_{1k} > d_{1h} | Data)] = \Pr(d_{1k} > d_{1h} | Data)\), which is analogous to the quantity of \(P_{kh}\) under the frequentist framework. Therefore, we use the posterior mean of the scaled rank \(\bar{P}_k\) as the Bayesian P-score; it is a counterpart of the frequentist P-score.
The scaled ranks \(\bar{P}_k\) can be feasibly estimated via the MCMC algorithm. Let \(\{d_{1k}^{(j)}; k = 2, \ldots, K\}_{j=1}^J\) be the posterior samples of the overall relative effects \(d_{1k}\) of all treatments vs. the reference treatment 1 in a total of \(J\) MCMC iterations after the burn-in period, where \(j\) indexes the iterations. As \(d_{11}\) is trivially 0, we set \(d_{11}^{(j)}\) to 0 for all \(j\). The \(j\)th posterior sample of treatment \(k\)'s scaled rank is \(\bar{P}_k^{(j)} = \frac{1}{K-1} \sum_{h \neq k}I(d_{1k}^{(j)} > d_{1h}^{(j)})\). We can make inferences for the scaled ranks from the posterior samples \(\{\bar{P}_k^{(j)}\}_{j=1}^{J}\), and use their posterior means as the Bayesian P-scores. We may also obtain the posterior medians as another set of point estimates, and the 2.5% and 97.5% posterior quantiles as the lower and upper bounds of 95% credible intervals (CrIs), respectively. Because the posterior samples of the scaled ranks take discrete values, the posterior medians and the CrI bounds are also discrete.
Based on the idea of the Bayesian P-score, we can similarly define the predictive P-score for a future study by accounting for the heterogeneity between the existing studies in the NMA and the new study. Specifically, we consider the probabilities in the new study
$$P_{new,kh} = \Pr(\delta_{new,1k} > \delta_{new,1h}),$$
conditional on the population parameters \(d_{1h}\), \(d_{1k}\), and \(\tau\) from the NMA. Here, \(\delta_{new,1k}\) and \(\delta_{new,1h}\) represent the treatment effects of \(k\) vs. 1 and \(h\) vs. 1 in the new study, respectively. The \(P_{new,kh}\) corresponds to the quantity \(P_{kh}\) in the NMA; it represents the probability of treatment \(k\) being better than \(h\) in the new study. Due to heterogeneity, \(\delta_{new,1k} \sim N(d_{1k}, \tau^2)\) and \(\delta_{new,1h} \sim N(d_{1h}, \tau^{2})\). The correlation coefficients between treatment comparisons are typically assumed to be 0.5; therefore, such probabilities in the new study can be explicitly calculated as \(P_{new,kh} = \Phi((d_{1k} - d_{1h})/\tau)\), which is a function of \(d_{1h}\), \(d_{1k}\), and \(\tau\). Finally, we use
$$\bar{P}_{new,k} = \frac{1}{K-1} \sum_{h \neq k} P_{new,kh}$$
to quantify the performance of treatment \(k\) in the new study. The posterior samples of \(\bar{P}_{new,k}\) can be derived from the posterior samples of \(d_{1k}\), \(d_{1h}\), and \(\tau\) during the MCMC algorithm.
Note that the probabilities \(P_{new,kh}\) can be written as \(E[I(\delta_{new,1k} > \delta_{new,1h})]\). Based on similar observations for the scaled ranks in the NMA, the \(\bar{P}_{new,k}\) in the new study subsequently becomes
$$\bar{P}_{new,k} = \frac{1}{K-1} E\left[\sum_{h \neq k} I(\delta_{new,1k} > \delta_{new,1h})\right] = E\left[\frac{K - R_{new,k}}{K - 1}\right],$$
where \(R_{new,k}\) is the true rank of treatment \(k\) in the new study. Thus, we call \(\bar{P}_{new,k}\) the expected scaled rank in the new study. Like the Bayesian P-score, we define the predictive P-score as the posterior mean of \(\bar{P}_{new,k}\). The posterior medians and 95% CrIs can also be obtained using the MCMC samples of \(\bar{P}_{new,k}\). See more details in Rosenberger et al. (2021).