Risk Factors Associated with Tuberculosis Among Men; A Study of South Africa

Background: Tuberculosis is a public health problem worldwide, and good health is an important factor in human well-being. Worldwide, there are more cases of tuberculosis among men than among women. Therefore, identifying risk factors associated with tuberculosis among men is essential. This study uses a survey logistic regression model to identify risk factors associated with tuberculosis among men in South Africa, using data from the 2016 South African Demographic and Health Survey. Methods: Because tuberculosis status is a binary variable, logistic regression and survey logistic regression were used for the analysis. Results and Conclusion: The findings from the survey logistic model are presented. The results suggest that a survey logistic model, which accounts for the complex sampling design, is more appropriate than classical logistic regression. The risk factors associated with tuberculosis are chronic disease, current age, region, race, number of times away from home, marital status, weight, smoking status, the interaction of chronic disease and age, and the interaction of smoking status and number of household members. These factors can inform strategies for reducing the risk of tuberculosis.


INTRODUCTION
Tuberculosis remains a public health threat in all countries, and good health is an essential factor in human well-being. Tuberculosis is a contagious infection that usually attacks the lungs, but it can also spread to other parts of the body, such as the brain and spine [1]. It is caused by the bacterium Mycobacterium tuberculosis, which spreads through the air. However, being infected with the bacterium does not always mean a person will get sick. The disease has two forms: latent tuberculosis and active tuberculosis. Latent TB is when someone carries the bacteria, but their immune system prevents the bacteria from spreading. Active TB is when the bacteria multiply and attack the person's organs. TB can affect anyone anywhere, but the literature shows that most people who develop the disease are adults.
There are more cases among men than among women [2]. In this study, we examined the factors that affect the spread of tuberculosis among adult men, and how they affect it, so that those factors can be addressed and the number of persons developing TB, and thus the number of deaths, can be reduced.
According to the World Health Organization [3], about ten million people develop tuberculosis (TB) each year. Even though the disease is preventable and curable, 1.5 million people die from TB each year, "making it the world's top infectious killer." About two-thirds of new TB cases in 2019 were in eight countries: Bangladesh, Indonesia, China, India, Nigeria, Pakistan, the Philippines, and South Africa, all of which are low- and middle-income countries [4]. Statistics South Africa [5] released a report on mortality and causes of death in South Africa, based on deaths that occurred in 2010 and were registered at the Department of Home Affairs. The total number of deaths decreased by 6.3% in 2010 compared to 2009. The data show that more males than females died of tuberculosis, and that the highest number of deaths was recorded in the 30-39 age group. TB was the leading cause of death in South Africa, accounting for about 12% of the deaths that occurred in 2010 [5].
General Household Survey (GHS) results showed that 2.9% of tuberculosis sufferers said they were sick [6]. The age groups most commonly affected by TB were 25-34, 35-44, 45-54, and 55-64 years. The data showed that more Black Africans than members of other race groups were sick. Most of the surveyed people who were sick or injured and who suffered from TB resided in KwaZulu-Natal.
A critical problem with tuberculosis is that it is difficult to diagnose and treat patients quickly enough, before they spread the bacteria to their communities. Another problem is controlling the spread in public areas and on public transport. This study examines the relationships among the factors affecting the spread and identifies the risk factors for tuberculosis.

Data Source
South Africa is a sub-Saharan African country located in the southernmost region of Africa. South African Demographic and Health Surveys were conducted in 1998, 2003 and 2016. The 2016 South African Demographic and Health Survey (SADHS) was used in this study. The survey was implemented by Statistics South Africa (StatsSA) together with the South African Medical Research Council (SAMRC). The survey used a two-stage stratified sampling design. In the first stage, primary sampling units (PSUs) were selected with probability proportional to PSU size. In the second stage, dwelling units were selected by systematic sampling. Since South Africa is divided into nine provinces, primary sampling units were used to ensure survey precision across regions. Each region was stratified into urban, farm, and traditional areas [7].
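The two selection stages described above can be illustrated with a small Python sketch. It is not the SADHS selection procedure itself: the PSU sizes, the number of PSUs and dwellings drawn, the seed, and the use of a systematic pass for the probability-proportional-to-size stage are all assumptions made for illustration, and the function names are hypothetical.

```python
import bisect
import itertools
import random


def pps_systematic(sizes, n_select, rng):
    """Stage 1 sketch: pick n_select PSU indices with probability
    proportional to size, using a systematic pass over cumulative sizes.
    A PSU larger than the sampling interval can be picked more than once."""
    cumulative = list(itertools.accumulate(sizes))
    step = cumulative[-1] / n_select
    start = rng.random() * step
    targets = [start + k * step for k in range(n_select)]
    last = len(sizes) - 1
    # bisect_right finds the first PSU whose cumulative size exceeds the target
    return [min(bisect.bisect_right(cumulative, t), last) for t in targets]


def systematic_sample(units, n_select, rng):
    """Stage 2 sketch: pick n_select units at a fixed interval from a random start."""
    if n_select > len(units):
        raise ValueError("cannot select more units than are available")
    step = len(units) / n_select
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + k * step), last)] for k in range(n_select)]


rng = random.Random(2016)                    # fixed seed, for reproducibility only
psu_sizes = [120, 80, 300, 50, 450, 200]     # hypothetical dwelling counts per PSU
selected_psus = pps_systematic(psu_sizes, 3, rng)
first_psu_dwellings = list(range(psu_sizes[selected_psus[0]]))
dwellings = systematic_sample(first_psu_dwellings, 20, rng)
print(selected_psus, dwellings)
```

Because larger PSUs are more likely to be drawn, the resulting sample is not a simple random sample, which is why sampling weights are needed in the analysis.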

Study Variable
The dependent (response) variable in this study is whether a man was told by a health worker that he has tuberculosis, which is a binary variable. The explanatory variables used in this study are current age, region, ethnicity, smoking status, chronic disease, health, weight, education level, marital status, number of times away from home, wealth index, and the number of household members. These variables were suggested by numerous previous studies.

Statistical Methods
An outcome with two categories is called a binary outcome. The dichotomous response variable used in this research makes logistic regression an obvious choice [8-11]. The binary logistic model describes a probability between zero and one [12]. The logistic regression model is used to determine the association between a dichotomous response and a set of explanatory variables. This model assumes that the data were collected using a simple random sample. The logistic regression model can be defined mathematically as:
$$\operatorname{logit}(\pi(x)) = \log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k$$
where $\pi(x)$ = probability of having tuberculosis, $\alpha$ = the intercept, $\beta_1, \dots, \beta_k$ = slope parameters, and $X_1, \dots, X_k$ = the independent or explanatory variables for the model.
However, the DHS sampling technique follows a complex survey design, so the simple random sampling assumption does not hold [13,14]. Survey logistic regression is an extension of logistic regression that incorporates the sampling design to adjust the estimates of standard errors and variability [15-20].
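As a rough illustration of how sampling weights enter the estimation, the sketch below fits a one-covariate logistic model by maximising a weighted log-likelihood with Newton-Raphson steps. It covers the point estimates only and is not what PROC SURVEYLOGISTIC does internally: design-based standard errors, which depend on strata and clusters, are not computed, and the data, weights, and function name are hypothetical.

```python
import math


def fit_weighted_logit(x, y, w, iters=25):
    """Newton-Raphson for intercept a and slope b maximising
    sum_i w_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)],
    where p_i = 1 / (1 + exp(-(a + b * x_i)))."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = wi * (yi - p)            # weighted score contribution
            v = wi * p * (1.0 - p)       # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * delta = score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical data: x = 1 if a chronic disease is present, y = 1 if TB was reported
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = [2, 1, 1, 1, 1, 1, 1, 1]   # hypothetical sampling weights
a, b = fit_weighted_logit(x, y, w)
print(f"intercept = {a:.4f}, slope = {b:.4f}, odds ratio = {math.exp(b):.3f}")
```

With a single binary covariate, the fitted probabilities equal the weighted proportions of cases in each group, which makes the effect of the weights easy to check by hand.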
Consider a dichotomous dependent variable $Y_{ijh}$, $i = 1, \dots, n_{hj}$; $j = 1, \dots, n_h$; $h = 1, \dots, H$, where $h$ is the stratum, $j$ is the cluster (primary sampling unit), and $i$ is the individual. Let $w_{ijh}$ denote the sampling weight of the $ijh$-th observation, and let $x_{ijh}$ be the row vector of the covariate matrix corresponding to the $i$-th adult man within the $j$-th primary sampling unit, nested in the $h$-th stratum. Suppose that $\pi_{ijh} = P(Y_{ijh} = 1 \mid x_{ijh})$ is the probability of having tuberculosis. The survey logistic regression model is then defined as:
$$\operatorname{logit}(\pi_{ijh}) = \log\left(\frac{\pi_{ijh}}{1-\pi_{ijh}}\right) = x_{ijh}\beta$$
or
$$\pi_{ijh} = \frac{\exp(x_{ijh}\beta)}{1+\exp(x_{ijh}\beta)}$$
where $x_{ijh}$ is the row of covariates and $\beta$ is an unknown vector of regression coefficients to be estimated. The unknown parameters of the model were estimated by pseudo-maximum likelihood, which incorporates the sampling weights and sample design [21,22]. PROC SURVEYLOGISTIC in SAS 9.4 was used to fit the model to the data.

Table 1 shows that the interaction of smoking status and the number of household members is negatively associated with tuberculosis. As the number of household members increases, the odds of having tuberculosis for men who do not smoke and for men who smoke sometimes decrease relative to men who smoke every day, with odds ratios of 0.827 and 0.756, respectively. For the main effect of chronic disease, men with a chronic disease had statistically significantly higher odds of having tuberculosis than those without a chronic disease (OR=24.989, p-value<0.0001). Age is positively associated with tuberculosis: with a one-unit increase in age, the odds of having tuberculosis increase by (1.067-1)×100% = 6.7%. All regions with a statistically significant effect are positively associated with tuberculosis relative to Limpopo. Men from the Western Cape have the highest odds compared to men from Limpopo (OR=6.343, p-value=0.0005), followed by men from the Northern Cape (OR=5.434, p-value=0.0012).
The odds of having tuberculosis for men from the Eastern Cape are 3.975 times those of men from Limpopo (p-value=0.0012); for men from KwaZulu-Natal, they are 3.004 times those of men from Limpopo (p-value=0.0103). Furthermore, the odds of having tuberculosis for men from Mpumalanga are 2.385 times those of men from Limpopo. Whites, Colored, Indians, and others are all negatively associated with tuberculosis compared to Blacks/Africans. Table 1 also suggests that each additional time away from home increases the odds of an adult man having TB by (1.045-1)×100% = 4.5%. The odds of having tuberculosis for widowers are 4.041 times those of men who were never in a union, followed by divorced men, whose odds are 3.562 times those of men never in a union. The odds of having tuberculosis for overweight men are (1-0.332)×100% = 66.8% lower than for underweight men; furthermore, each additional household member increases the odds of tuberculosis by (1.114-1)×100% = 11.4%. Fig. (1) suggests that up to the approximate age of 50, the risk of tuberculosis is smaller for those with no chronic disease. After the approximate age of 50, the risk is higher for those with no chronic disease. Fig. (2) shows that for up to approximately five household members, the risk of tuberculosis is lower for daily smokers than for non-smokers and those who smoke sometimes. Beyond approximately five household members, the risk of tuberculosis is higher for daily smokers.

Table 2 provides the results of the two fitted models, the classical logistic regression model and the survey logistic regression model. The estimated coefficients and standard errors from survey logistic regression differed from those of the logistic regression model fitted under simple random sampling.
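The percentage interpretations quoted from Table 1 all follow the same arithmetic: an odds ratio OR corresponds to a change of (OR - 1) × 100% in the odds. A minimal Python sketch of that conversion, using odds ratios reported in the text (the function name and labels are illustrative only):

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio.
    Positive values mean higher odds, negative values lower odds."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios quoted in the text from Table 1
reported = [
    ("current age (per unit)", 1.067),
    ("times away from home (per time)", 1.045),
    ("overweight vs underweight", 0.332),
    ("household members (per member)", 1.114),
]
for label, odds_ratio in reported:
    print(f"{label}: {percent_change_in_odds(odds_ratio):+.1f}% change in odds")
```

Note that these are changes in odds, not in probability; the two are close only when tuberculosis is rare in the group being compared.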
The survey logistic estimates of the parameters for current age, chronic disease, Northern Cape region, White and Colored ethnicity, widowed and divorced were larger than the logistic estimates and were significant in both models. The estimates for the Western Cape, Eastern Cape, Free State, KwaZulu-Natal, and Mpumalanga regions, and for the number of household members, were smaller in survey logistic regression than in logistic regression and were significant. The coefficient of chronic disease in the survey logistic model increased by 53.84%, and its standard error increased by 48.66%. The coefficients of White and Colored increased by 9.69% and 57.39%, respectively, with increases in standard error of 25.58% and 50.11%, respectively. The coefficients of widowed and divorced increased by 72.91% and 7.67%, with increases in standard error of 98.63% and 13.49%, respectively. The Northern Cape and KwaZulu-Natal coefficients increased by 15.87% and 3.935%, respectively, with increases in standard error of 38.56% and 38.72%, respectively. The coefficients of the Western Cape, Eastern Cape, Free State and Mpumalanga regions decreased by 1.78%, 3.21%, 3.78%, and 18%, respectively, while their standard errors increased by 36.85%, 46.68%, 37.1%, and 37.9%, respectively. The standard errors of the significant parameters in the survey logistic regression were higher than the corresponding standard errors in the logistic regression model. In the survey logistic model, weight (overweight), smoking status, the interaction of the number of household members with smoking sometimes, and the interaction of current age and number of times away from home were significant, while in logistic regression they were not. This suggests that ignoring the complex design may lead to falsely precise standard errors and misleading estimates. Thus, the survey logistic regression model was suitable for this study.

Fig. (1). Interaction effect for the current age and chronic disease for survey logistic regression.
Fig. (2). Interaction for number of household members and smoking status for SLR.

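Table 2 (fragment). No. of household members and smoking status (ref=Everyday)
No. of household members and do not smoke: survey logistic regression estimate -0.1900, standard error 0.0638, p-value 0.0030; logistic regression estimate -0.1306, standard error 0.0447, p-value 0.0035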
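The table row above can be used to illustrate the comparison between the two models. The sketch below assumes that the first triple (-0.1900, 0.0638, 0.0030) belongs to the survey logistic model and the second (-0.1306, 0.0447, 0.0035) to classical logistic regression; this ordering is inferred from the fact that exp(-0.1900) reproduces the odds ratio of 0.827 quoted from Table 1. It also assumes that percentage changes are measured on absolute values relative to the classical model, which the text does not state explicitly.

```python
import math


def relative_change(survey_value, classical_value):
    """Percentage change in magnitude of a survey-model quantity
    relative to the classical-model quantity (assumed convention)."""
    return (abs(survey_value) - abs(classical_value)) / abs(classical_value) * 100.0


# Row: number of household members x do not smoke (ref = smokes every day)
survey = {"estimate": -0.1900, "se": 0.0638}     # assumed: survey logistic regression
classical = {"estimate": -0.1306, "se": 0.0447}  # assumed: classical logistic regression

odds_ratio = math.exp(survey["estimate"])
coef_change = relative_change(survey["estimate"], classical["estimate"])
se_change = relative_change(survey["se"], classical["se"])

print(f"survey odds ratio: {odds_ratio:.3f}")
print(f"change in coefficient magnitude: {coef_change:.2f}%")
print(f"change in standard error: {se_change:.2f}%")
```

For this row the standard error is larger under the survey model, in line with the pattern described for the significant parameters.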

DISCUSSION
The likelihood ratio, score, and Wald tests are all statistically significant at the 5% level of significance. This means that the covariates contribute significantly to the prediction of having tuberculosis. Table 3 shows that 79.8% of the probabilities are predicted correctly, suggesting a good association between the predicted and actual probabilities. The percentage of concordant pairs is 79.4%, Gamma is 60%, and Somers' D is 59.5%.
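The association measures above are computed from pairs formed by one man with tuberculosis and one without, comparing their predicted probabilities. The following Python sketch shows the standard pairwise definitions on a tiny made-up data set; it is not the authors' computation, and the data and function name are hypothetical. Under these definitions the c statistic equals (1 + Somers' D) / 2, so a Somers' D of 59.5% corresponds to about 79.8%, which may be the figure reported in Table 3.

```python
def association_measures(probs, outcomes):
    """Pairwise association between predicted probabilities and binary outcomes.
    Each pair combines one event (outcome 1) with one non-event (outcome 0)."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = tied = 0
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1      # event has the higher predicted probability
            elif p_event < p_nonevent:
                discordant += 1
            else:
                tied += 1
    pairs = concordant + discordant + tied
    return {
        "percent_concordant": 100.0 * concordant / pairs,
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / (concordant + discordant),
        "c": (concordant + 0.5 * tied) / pairs,
    }


# Hypothetical predicted probabilities and observed outcomes
probs = [0.9, 0.3, 0.4, 0.4, 0.1]
outcomes = [1, 1, 1, 0, 0]
measures = association_measures(probs, outcomes)
print(measures)
```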