TY - JOUR
T1 - Ensemble candidate classification for the LOTAAS pulsar survey
AU - Tan, Chia Min
AU - Lyon, Robert
AU - Stappers, Benjamin
AU - Cooper, Sally
AU - Hessels, Jason
AU - Kondratiev, Vlad
AU - Michilli, Daniele
AU - Sanidas, Sotiris
PY - 2018/3/1
Y1 - 2018/3/1
AB - One of the biggest challenges arising from modern large-scale pulsar surveys is the number of candidates generated. Here, we implemented several improvements to the machine learning (ML) classifier previously used by the LOFAR Tied-Array All-Sky Survey (LOTAAS) to look for new pulsars by filtering the candidates obtained during periodicity searches. To assist the ML algorithm, we introduced new features that capture the frequency and time evolution of the signal, and improved the signal-to-noise calculation to account for broad profiles. We enhanced the ML classifier by including a third class characterizing radio-frequency interference (RFI) instances, allowing candidates arising from RFI to be isolated and reducing the false positive return rate. We also introduced a new training data set for the ML algorithm that includes a large sample of pulsars misclassified by the previous classifier. Lastly, we developed an ensemble classifier composed of five different decision trees. Taken together, these updates improve the pulsar recall rate by 2.5 per cent, while also improving the ability to identify pulsars with wide pulse profiles, which were often misclassified by the previous classifier. The new ensemble classifier also reduces the percentage of false positive candidates identified from each LOTAAS pointing from 2.5 per cent (∼500 candidates) to 1.1 per cent (∼220 candidates).
KW - Methods: Data analysis
KW - Methods: Statistical
KW - Pulsars: General
UR - http://www.scopus.com/inward/record.url?scp=85040230377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040230377&partnerID=8YFLogxK
UR - http://www.mendeley.com/catalogue/ensemble-candidate-classification-lotaas-pulsar-survey
U2 - 10.1093/mnras/stx3047
DO - 10.1093/mnras/stx3047
M3 - Article (journal)
SN - 0035-8711
VL - 474
SP - 4571
EP - 4583
JO - Monthly Notices of the Royal Astronomical Society
JF - Monthly Notices of the Royal Astronomical Society
IS - 4
ER -