Identifying Important Features For Exoplanet Detection: A Machine Learning Approach

Abdul Karim; Jamal Uddin; Md. Mahmudul Hasan Riyad

doi:10.5281/zenodo.10566250

Abdul Karim, Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh.
Jamal Uddin, Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh.
Md. Mahmudul Hasan Riyad, Department of Applied Mathematics, Noakhali Science and Technology University, Noakhali-3814, Bangladesh.

Keywords: Exoplanet, Machine Learning, KCOI, Important Features

Abstract

The study and discovery of exoplanets (planets outside the solar system) have been a major focus in astronomy. Many efforts have been made to discover exoplanets using ground-based and space-based observatories, NASA’s Exoplanet Exploration Program being one of them. It has developed modern satellites such as Kepler, which are capable of collecting large arrays of data to help researchers study these objects. As the number of exoplanet candidates grows, identifying and verifying their existence becomes a challenging task. In this research, we propose a statistical and machine learning approach to identify important features for exoplanet identification. For this purpose, we use the Kepler Cumulative Object of Interest (KCOI) dataset. After pre-processing the data, we use statistical methods, namely the ANOVA F-test, Mutual Information Gain (MIG), and Recursive Feature Elimination (RFE), to select the most significant features, and we train 10 state-of-the-art classifiers on them recursively to identify the features that lead to the best performance. According to the results of our investigation, classifiers trained on features chosen by Recursive Feature Elimination with Random Forest as the estimator produce superior results, with the CatBoost classifier performing best at an accuracy of 99.61%. Our findings demonstrate the potential of machine learning to help astronomers efficiently and accurately verify exoplanet candidates in large astronomical datasets.
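
To make the first of the selection methods named in the abstract concrete, the following is a minimal sketch of ranking features by a one-way ANOVA F-statistic. It is not the authors' code. The function names, the feature names (`feature_a`, `feature_b`), and the toy values are hypothetical and are not taken from the KCOI dataset. The sketch covers only the ANOVA F-test step, not MIG or RFE.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n, k = len(values), len(groups)
    grand_mean = mean(values)

    # Between-group sum of squares: spread of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(columns, labels):
    """Return feature names sorted by descending F-score, plus the scores."""
    scores = {name: anova_f_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: 1 = confirmed planet, 0 = false positive.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "feature_a": [10.0, 11.0, 9.0, 1.0, 2.0, 0.0],  # separates the two classes
    "feature_b": [5.0, 1.0, 3.0, 4.0, 2.0, 3.0],    # same mean in both classes
}

ranking, scores = rank_features(columns, labels)
print(ranking)
print(scores)
```

A feature whose class means differ strongly relative to its within-class spread receives a high F-score and is ranked first. In a setup like the one the abstract describes, the top-ranked features would then be passed to the classifiers.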
