Browsing by Author "Faragó, Teodóra"

  • Faragó, Teodóra (2024)
    Tobacco smoking has a major impact on health, increasing the risk of cardiovascular disease, respiratory disease, and various types of cancer. Assessing a patient's smoking history is therefore crucial for identifying potential risk factors. Smoking also induces alterations in DNA methylation (DNAm), and the effect is large enough to make smoking a major confounding factor in epigenome-wide association studies (EWAS). However, smoking status is not always recorded in the data, and even when it is, it may be unreliable because it depends on self-reporting, which can bias the analysis. DNAm is an excellent biomarker for smoking because it can be measured cost-effectively and non-invasively with methylation arrays. Several DNAm-based smoking predictors already exist: some return a continuous score associated with smoking, while others return a smoking status, classifying the individual as a current, never, or former smoker. These predictors were developed for Illumina's Infinium 450k array, and no predictor is available for the Infinium MethylationEPIC array, which covers almost twice as many CpG sites as its predecessor. We developed two machine learning models (Model1 and Model2) that classify individuals into three smoking statuses: never-smoker, current-smoker, and former-smoker. Both models are LASSO logistic regression classifiers trained on EPIC array DNAm data from the Young Finns Study cohort. Model1 was trained on a beta matrix pre-processed with the standard minfi pipeline, while Model2 was trained on a beta matrix derived from quantile-normalized (QN) intensity values. Both models were evaluated on an independent test dataset, the Finnish Twin Cohort, reaching overall accuracies of 57.4% and 64.29%, respectively. They separate the classes with micro-averaged one-vs-all (OvA) AUCs of 0.79 and 0.81, and they distinguish the never- and current-smoker categories with average one-vs-one (OvO) AUCs of 0.94 and 0.93.
    Misclassifications were consistent with the individuals' smoking intensities and with the methylation levels of the well-known smoking-associated CpG site cg05575921.
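To illustrate what a three-class LASSO logistic regression classifier involves, here is a minimal pure-Python sketch; it is not the thesis's implementation, and the abstract does not say which software or solver was used. It fits a multinomial logistic model with an L1 penalty on the coefficients (intercepts unpenalized) by proximal gradient descent (ISTA), a solver swapped in for brevity; packages such as glmnet use coordinate descent instead. The function names `fit_lasso_multinomial` and `predict`, the penalty `lam`, the step size `lr`, and the toy beta values are all hypothetical; in a real analysis the penalty would be tuned, for example by cross-validation, over many thousands of CpG sites.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_multinomial(X, y, n_classes, lam=0.01, lr=0.5, n_iter=500):
    """Hypothetical sketch: L1-penalized multinomial logistic regression via ISTA.

    X: list of samples, each a list of beta values (one per CpG site).
    y: integer class labels in range(n_classes).
    """
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        # Gradient of the mean cross-entropy loss.
        grad_W = [[0.0] * p for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for xi, yi in zip(X, y):
            scores = [sum(w * x for w, x in zip(W[k], xi)) + b[k] for k in range(n_classes)]
            probs = softmax(scores)
            for k in range(n_classes):
                err = (probs[k] - (1.0 if yi == k else 0.0)) / n
                grad_b[k] += err
                for j in range(p):
                    grad_W[k][j] += err * xi[j]
        # Plain gradient step on intercepts; gradient step plus L1 prox on coefficients.
        for k in range(n_classes):
            b[k] -= lr * grad_b[k]
            for j in range(p):
                W[k][j] = soft_threshold(W[k][j] - lr * grad_W[k][j], lr * lam)
    return W, b


def predict(W, b, X):
    """Return the index of the highest-scoring class for each sample."""
    labels = []
    for xi in X:
        scores = [sum(w * x for w, x in zip(Wk, xi)) + bk for Wk, bk in zip(W, b)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy example: 0 = never, 1 = current, 2 = former; three hypothetical CpG sites.
X = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2],
     [0.2, 0.8, 0.1], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
y = [0, 0, 1, 1, 2, 2]
W, b = fit_lasso_multinomial(X, y, n_classes=3)
print(predict(W, b, X))  # [0, 0, 1, 1, 2, 2]
```

The soft-thresholding step is what makes the model sparse: coefficients whose gradient stays below the penalty are set exactly to zero, which is why LASSO is a common choice when CpG sites far outnumber samples. With a very large `lam`, every coefficient in this sketch stays at zero.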
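Model2's input is described only as a beta matrix derived from quantile-normalized intensity values. The sketch below shows one way such a matrix could be built, assuming (this is not stated in the abstract) that methylated (M) and unmethylated (U) intensities are quantile-normalized separately across samples and then combined into beta values as M / (M + U + 100), using Illumina's conventional offset of 100. It is not minfi's implementation, ties are broken by probe order for simplicity, and the function names `quantile_normalize`, `beta_values`, and `qn_beta_matrix` are hypothetical.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize intensities so every sample shares one distribution.

    samples: list of samples, each a list of intensities for the same probes.
    Each value is replaced by the mean, across samples, of the values with the
    same rank. Ties are broken by probe order (a simplification).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must cover the same probes")
    # For each sample, probe indices ordered from lowest to highest intensity.
    orders = [sorted(range(n_probes), key=s.__getitem__) for s in samples]
    # Reference distribution: mean of the rank-k values across all samples.
    reference = [mean(s[order[k]] for s, order in zip(samples, orders))
                 for k in range(n_probes)]
    normalized = []
    for order in orders:
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def beta_values(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), computed per probe for one sample."""
    return [m / (m + u + offset) for m, u in zip(meth, unmeth)]


def qn_beta_matrix(meth_samples, unmeth_samples):
    """Assumed pipeline: normalize each channel across samples, then convert to beta."""
    meth_qn = quantile_normalize(meth_samples)
    unmeth_qn = quantile_normalize(unmeth_samples)
    return [beta_values(m, u) for m, u in zip(meth_qn, unmeth_qn)]


# Two hypothetical samples measured on two probes.
print(qn_beta_matrix([[1000, 200], [800, 400]], [[100, 900], [300, 700]]))
```

After normalization both toy samples have identical intensity distributions in each channel, so their beta values coincide (0.75 and 0.25 for the two probes). Normalizing before computing betas removes between-array intensity differences that would otherwise leak into the classifier's features.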
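The two AUC summaries reported for the models can be made concrete with a short sketch. Assuming each model outputs one probability per smoking class, the micro-averaged one-vs-all (OvA) AUC pools every (sample, class) pair into a single binary problem, while the one-vs-one (OvO) AUC for a pair of classes, such as never- versus current-smokers, uses only samples from those two classes and averages the two AUCs obtained by treating each class of the pair as the positive one. The AUC itself is computed here with the Mann-Whitney pair-counting formula. The function names and toy probabilities are hypothetical, and this is not necessarily how the thesis computed its metrics.

```python
from itertools import product


def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def micro_ova_auc(labels, probs, classes):
    """Pool every (sample, class) probability into one binary AUC."""
    scores, truth = [], []
    for label, row in zip(labels, probs):
        for k, cls in enumerate(classes):
            scores.append(row[k])
            truth.append(label == cls)
    return binary_auc(scores, truth)


def ovo_pair_auc(labels, probs, classes, a, b):
    """Average of AUC(a vs b, scored by P(a)) and AUC(b vs a, scored by P(b))."""
    ia, ib = classes.index(a), classes.index(b)
    subset = [(lab, row) for lab, row in zip(labels, probs) if lab in (a, b)]
    auc_a = binary_auc([row[ia] for _, row in subset], [lab == a for lab, _ in subset])
    auc_b = binary_auc([row[ib] for _, row in subset], [lab == b for lab, _ in subset])
    return (auc_a + auc_b) / 2


# Hypothetical predicted probabilities in the order (never, current, former).
classes = ["never", "current", "former"]
labels = ["never", "current", "former", "never"]
probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.7, 0.2, 0.1]]
print(micro_ova_auc(labels, probs, classes))                       # 1.0
print(ovo_pair_auc(labels, probs, classes, "never", "current"))    # 1.0
```

Because the OvO score for a pair ignores samples from the third class, it can be high even when the overall accuracy is modest, which matches the pattern in the reported results, where never- and current-smokers are separated far better than the three classes overall.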