Thierry Warin, PhD: [Article] Nigeria’s 2015 Presidential Election: A Spatial and Econometric Perspective Based on a Framing Strategy

William Sanger; Thierry Warin

doi:10.6084/m9.figshare.7990835.v2

Robustness Checks

The econometric models used in the core article to which this appendix relates were adapted from a previous article about elections in the Province of Quebec (Sanger & Warin, 2018). For each day, we identified the main topic of the conversation (one of the four main topics). Then, for each topic and each day, we observed which candidate was mentioned the most (either Jonathan or Buhari). Finally, we segmented our dataset into four periods (March 1st to 6th; March 7th to 13th; March 14th to 20th; and March 21st to 27th).

In this context, the question is which logistic estimation will extract high-quality information. We first have to decide between a discrete model with only two outcomes and a discrete model with more than two outcomes. Choosing two outcomes implies creating four binary variables, one for each of the four categories of our initial dependent variable.

Second, if we decide to keep our categorical variable with four categories, another decision has to be made: choosing between a multinomial logistic estimation and an ordered logistic estimation. The issue is not trivial. Although the demarcation line is often clear, in our context we are in a grey area.

Due to the nature of the study, it is not clear which estimation technique is best. In a traditional setting, the categories can be ranked. In this example, there is no order. Here, we collect tweets and aggregate the number of tweets per category at the end of each day. In our context, a person can tweet about one topic and then, at another time of day, tweet about another topic.

In many regards, it is like looking at a permanent poll, which raises interesting statistical questions and thus requires, or allows for, new techniques or protocols. Even if the categories we chose have no inherent order, people tweeting during the day make a choice, as in a poll, and we can assume that if they tweet more about a topic, it is because they believe this topic matters more to them than another one.¹ If this hypothesis is right, then the next question is what the order is. A specific logistic estimator is designed for this kind of situation: the stereotype logistic estimator. Unlike ordered logistic models, stereotype logistic models do not impose the proportional-odds assumption. Stereotype logistic models are often used when subjects are asked to assess or judge something. For these validity tests, we propose here:

A plain-vanilla (unordered) multinomial estimation
A mixed-ordered estimation: a stereotype-ordered logistic estimation

The multinomial logistic estimation fits maximum likelihood models with discrete dependent variables when the dependent variable takes on more than two outcomes and the outcomes have no natural ordering (Greene, 2012; Hosmer Jr, Lemeshow, & Sturdivant, 2013; Long, 1997; Long & Freese, 2014; Treiman, 2009).
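For reference, the unordered multinomial logit with four topics and the social category as base outcome can be written as follows. The notation is introduced here for illustration only and is not taken from the core article: \(x_i\) is the vector of regressors for observation \(i\) (including a constant) and \(\beta_j\) is the coefficient vector specific to topic \(j\).

```latex
\Pr(y_i = j \mid x_i)
  = \frac{\exp(x_i'\beta_j)}{\sum_{k=1}^{4} \exp(x_i'\beta_k)},
  \qquad j = 1, \dots, 4,
  \qquad \beta_1 = 0 .
```

Each non-base topic has its own unrestricted coefficient vector, so nothing in the specification ties the topics to a common ordering.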

Finally, we will also estimate a model that is a compromise between the ordered and unordered logistic estimations: the aforementioned stereotype logistic estimation (Anderson, 1984; Greenland, 1985), because there is an uncertainty about the relevance of the ordering.

The multinomial estimator assumes that there is no order among the categories used to code the dependent variable. The stereotype estimator, by contrast, relies on the hypothesis stated above: people tweeting during the day make a choice, as in a poll, and if they tweet more about a topic, it is because they believe this topic matters more to them than another one. Unlike with the ordered logit estimator, however, we make the reasonable assumption that we do not know all the latent variables needed to establish a proper ranking.

In the following table, we calculate the relative risk ratios. Compared to the base outcome (social category), Mr. Jonathan is less likely to be associated with the integrity category than Mr. Buhari. The same is true for the economy category.

Table 1. Dependent variable: topic {social, integrity, economy, geopolitics}

Model: multinomial logit
Independent variables	Coef.	Relative Risk Ratios	Coef.	Relative Risk Ratios
Topic 1 (social)	base outcome

Topic 2 (integrity)
Jonathan	-0.0027595***	0.9972443***	-0.0028211***	0.9971828***
Buhari	0.0011461*	1.001147*	0.0011777*	1.001178*
PDP	0.0031242**	1.003129**	0.0031851**	1.00319**
APC	-0.0010757	0.9989249	-0.001078	0.9989226
Periods (ref = Period 1)
____Period 2			-0.030186	0.970265
____Period 3			0.0424364	1.04335
____Period 4			0.0435374	1.044499
Constant	0.0271056		0.0124693

Topic 3 (economy)
Jonathan	-0.001625***	0.9983764***	-0.0016934***	0.998308***
Buhari	0.0013541**	1.001355**	0.0014034**	1.001404*
PDP	-0.0014103	0.9985907	-0.0012894	0.9987115
APC	0.004636**	1.004647**	0.004534**	1.004544**
Periods (ref = Period 1)
____Period 2			-0.1076294	0.8979603
____Period 3			-0.000384	0.9996161
____Period 4			-0.0027362	0.9972675
Constant	-0.3239868**		-0.294147

Topic 4 (geopolitics)
Jonathan	0.000212	1.000212	0.0002218	1.000222
Buhari	0.0005398	1.00054	0.0005451	1.000545
PDP	-0.0004569	0.9995432	-0.0003948	0.9996053
APC	0.0055712***	1.005587***	0.0055733***	1.005589***
Periods (ref = Period 1)
____Period 2			0.097285	1.102174
____Period 3			-0.0254721	0.9748496
____Period 4			-0.2292305	0.7951452
Constant	-0.5721883***		0.5426475*

Predicted probabilities	Coef.	Coef.
Pr(y=1)	0.251581***	0.2520359***
Pr(y=0)	0.748419***	0.7479641***
Topic = 1 (social)	0.251581***	0.2520359***
Topic = 2 (integrity)	0.2182243***	0.2184104***
Topic = 3 (economy)	0.2602211***	0.2602559***
Topic = 4 (geopolitics)	0.2699736***	0.2692978***

Statistics
Number of observations	540	540
LR chi2	96.93	98.79
Prob > chi2	0.0000	0.0000
Pseudo R2	0.0647	0.0660
Log likelihood	-700.1345	-699.20412
P-value: \(^{*}<0.1\), \(^{**}<0.05\), \(^{***}<0.01\)
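The relative risk ratios in Table 1 are the exponentiated coefficients, and the pseudo R2 can be recovered from the log likelihood and the LR statistic. The following is a minimal sketch of both calculations on a few values copied from the table. It assumes that the reported pseudo R2 is McFadden's measure; the function and variable names are illustrative.

```python
import math

# (coefficient, reported relative risk ratio) pairs copied from Table 1,
# first pair of columns (the specification without period controls).
TABLE1_ROWS = {
    ("integrity", "Jonathan"): (-0.0027595, 0.9972443),
    ("integrity", "Buhari"): (0.0011461, 1.001147),
    ("economy", "Jonathan"): (-0.001625, 0.9983764),
    ("geopolitics", "APC"): (0.0055712, 1.005587),
}


def relative_risk_ratio(coef: float) -> float:
    """Relative risk ratio implied by a multinomial logit coefficient."""
    return math.exp(coef)


def mcfadden_pseudo_r2(log_likelihood: float, lr_chi2: float) -> float:
    """McFadden pseudo R2 (assumed definition), from the model log likelihood
    and the LR chi2 statistic."""
    # LR = 2 * (LL - LL0), so the null log likelihood is LL - LR / 2.
    null_log_likelihood = log_likelihood - lr_chi2 / 2.0
    return 1.0 - log_likelihood / null_log_likelihood


for (topic, regressor), (coef, reported) in TABLE1_ROWS.items():
    rrr = relative_risk_ratio(coef)
    # Percent change in the relative risk (versus the social base outcome)
    # for a one-unit increase in the regressor.
    pct = (rrr - 1.0) * 100.0
    print(f"{topic:12s} {regressor:9s} exp(coef)={rrr:.7f} "
          f"reported={reported} ({pct:+.3f}%)")

print(f"pseudo R2, first specification:  {mcfadden_pseudo_r2(-700.1345, 96.93):.4f}")
print(f"pseudo R2, second specification: {mcfadden_pseudo_r2(-699.20412, 98.79):.4f}")
```

A ratio below one means that a one-unit increase in the regressor lowers the risk of the topic relative to the social base outcome, and a ratio above one raises it.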

We compute the marginal effects as a robustness check. The following table shows that Mr. Jonathan is more associated with the conversations about the social and geopolitics categories.

Table 2. Dependent variable: topic {social, integrity, economy, geopolitics}

Model: multinomial logit	Marginal Effects
Independent variables	Social	Integrity	Economy	Geopolitics
Jonathan	0.0002435***	-0.000391***	-0.000171*	0.0003185***
Buhari	-0.0001882*	0.0000868	0.0001577**	-0.000563
PDP	-0.000482	0.00064***	-0.0004168	-0.000175
APC	-0.0006228*	-0.000775**	0.0005621**	0.0008357***

Statistics
Number of observations	540
LR chi2	96.93
Prob > chi2	0.0000
Pseudo R2	0.0647
Log likelihood	-700.1345
P-value: \(^{*}<0.1\), \(^{**}<0.05\), \(^{***}<0.01\)
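Because the four predicted probabilities sum to one, the marginal effects of any regressor must sum to zero across the four topics. The sketch below applies this identity to the values printed in Table 2 as a simple consistency check; the names are illustrative.

```python
# Marginal effects copied from Table 2.
# Column order: social, integrity, economy, geopolitics.
MARGINAL_EFFECTS = {
    "Jonathan": [0.0002435, -0.000391, -0.000171, 0.0003185],
    "Buhari": [-0.0001882, 0.0000868, 0.0001577, -0.000563],
    "PDP": [-0.000482, 0.00064, -0.0004168, -0.000175],
    "APC": [-0.0006228, -0.000775, 0.0005621, 0.0008357],
}


def row_sums(effects):
    """Sum the marginal effects of each regressor across the four topics."""
    return {name: sum(values) for name, values in effects.items()}


sums = row_sums(MARGINAL_EFFECTS)
for name, total in sums.items():
    print(f"{name:9s} sum of marginal effects = {total:+.7f}")
```

With the values as printed, the Jonathan and APC rows sum to zero at the reported precision, while the Buhari and PDP rows do not. This may reflect a digit lost in typesetting in those two rows rather than anything in the estimation itself.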

Now, let us present the results based on the stereotype logistic regressions. Stereotype logistic models are used in particular when categories may be indistinguishable. The stereotype logistic model should be seen as a restriction on the multinomial model.
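To make the restriction explicit, one common parameterization of the one-dimensional stereotype logistic model is sketched here. We assume it is the one behind the /phi and /theta parameters reported in the next table; the symbols are introduced for illustration. Here \(x_i\) excludes the constant, \(\theta_j\) are topic-specific intercepts, \(\phi_j\) are scale parameters, and \(\beta\) is a single coefficient vector shared by all topics.

```latex
\Pr(y_i = j \mid x_i)
  = \frac{\exp(\theta_j - \phi_j\, x_i'\beta)}{\sum_{k=1}^{4} \exp(\theta_k - \phi_k\, x_i'\beta)},
  \qquad \theta_4 = 0, \quad \phi_4 = 0, \quad \phi_1 = 1 .
```

Compared with the multinomial logit, the topic-specific coefficient vectors are restricted to be proportional to one another, \(\beta_j = -\phi_j \beta\). When two topics share the same \(\phi\), the regressors do not distinguish between them. This appears to be the case imposed in the constrained specification, where \(\phi_1 = \phi_2 = 1\) and \(\theta_1 = \theta_2\).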

Table 3. Dependent variable: topic {social, integrity, economy, geopolitics}

Model: stereotype ordered logit	Without constraint	With constraint
Independent variables	Coef.	Coef.
Jonathan	0.0011052***	0.000968**
Buhari	-0.0001663	0.0000215
PDP	-0.0013216**	-0.0012697
APC	0.0036409**	0.0055193***

/phi1_1	1***	1***
/phi1_2	1.856209***	1***
/phi1_3	0.5181356**	0.3122322***
/phi1_4

/theta1	0.4423166***	0.659083***
/theta2	0.607536***	0.659083***
/theta3	0.2966307**	0.2649479*
/theta4	0
(category 4 is the base outcome)

Statistics
Number of observations	540
Wald chi2	17.69
Prob > chi2	0.0014
Log likelihood	-715.05661
P-value: \(^{*}<0.1\), \(^{**}<0.05\), \(^{***}<0.01\)

In the previous table, we can observe that Mr. Jonathan is more associated with the conversations about the geopolitics category (\(coef.=0.0011^{***}\)), as is the APC party (\(coef.=0.0036^{**}\)). These results are interesting because they validate the ones we obtained with the plain multinomial logit estimator, although they are somewhat less direct to interpret since they have to be read vis-à-vis the base category (here, geopolitics). It is nonetheless valuable to be able to use a stereotype logistic estimator on our dataset. Indeed, our framing strategy seems to allow us to perform this analysis and to provide some robustness to the analysis of conversations on Twitter.
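The reading relative to the base category can be illustrated with a short sketch. It assumes the sign convention in which the linear index of topic \(j\) is \(\theta_j - \phi_j x'\beta\), so that the slope of the log-odds of topic \(j\) versus geopolitics is \(-\phi_j \beta\). This convention and the names below are assumptions made for illustration; the estimates are copied from the unconstrained column of Table 3.

```python
# Estimates copied from Table 3, "Without constraint" column.
BETA = {"Jonathan": 0.0011052, "Buhari": -0.0001663,
        "PDP": -0.0013216, "APC": 0.0036409}
# Scale parameters for the non-base topics (geopolitics is the base outcome).
PHI = {"social": 1.0, "integrity": 1.856209, "economy": 0.5181356}


def implied_log_odds_slopes(beta: float, phi: dict) -> dict:
    """Slope of log[Pr(topic) / Pr(geopolitics)] with respect to a regressor,
    under the assumed convention: index_j = theta_j - phi_j * x'beta."""
    return {topic: -phi_j * beta for topic, phi_j in phi.items()}


for regressor, beta in BETA.items():
    slopes = implied_log_odds_slopes(beta, PHI)
    formatted = ", ".join(f"{topic}: {value:+.7f}" for topic, value in slopes.items())
    print(f"{regressor:9s} {formatted}")
```

Under this convention, a positive coefficient (as for Jonathan and APC) lowers the log-odds of every other topic relative to geopolitics, which is consistent with the interpretation given above.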

References

Anderson, J. A. (1984). Regression and Ordered Categorical Variables. Journal of the Royal Statistical Society. Series B (Methodological), 46(1), 1–30.

Dupont, W. D. (2009). Statistical modeling for biomedical researchers: a simple introduction to the analysis of complex data. Cambridge University Press.

Gould, W. (2000). Interpreting logistic regression in all its forms. Stata Technical Bulletin, 9(53).

Greenland, S. (1985). An Application of Logistic Models to the Analysis of Ordinal Responses. Biometrical Journal, 27(2), 189–197. http://doi.org/10.1002/bimj.4710270212

Greene, W. H. (2012). Econometric analysis. Boston; London: Pearson.

Hilbe, J. M. (2009). Logistic regression models. CRC press.

Hosmer Jr, D. W., & Lemeshow, S. (2004). Applied logistic regression. John Wiley & Sons.

Hosmer Jr, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (Vol. 398). John Wiley & Sons.

Kleinbaum, D. G., & Klein, M. (2010). Logistic Regression. New York, NY: Springer New York.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. SAGE.

Long, J. S., & Freese, J. (2014). Regression models for categorical dependent variables using stata. Stata Press.

Pagano, M., & Gauvreau, K. (2000). Principles of biostatistics (Vol. 2). Duxbury Pacific Grove, CA.

Pampel, F. C. (2000). Logistic regression: A primer (Vol. 132). Sage.

Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. Wiley.

¹ For a clear introduction to logistic regression, see Hosmer Jr & Lemeshow (2004), Pagano & Gauvreau (2000), or Pampel (2000); for a non-mathematical presentation of logistic regression, see Kleinbaum & Klein (2010); and for a more formal, thorough presentation, see Hosmer Jr, Lemeshow, & Sturdivant (2013). Consider also Gould (2000), Dupont (2009), or Hilbe (2009) for an interpretation of the results.
