Improving survey specifications are causing an exponential rise in pulsar candidate numbers and data volumes. We study the candidate filters used to mitigate these problems over the past 50 years. We find that some existing methods, such as applying constraints on the total number of candidates collected per observation, may have detrimental effects on the success of pulsar searches. Methods immune to such effects are, however, ill-equipped to deal with the problems associated with increasing data volumes and candidate numbers, which motivates the development of new approaches. We therefore present a new method designed for online operation. It selects promising candidates using a purpose-built tree-based machine learning classifier, the Gaussian Hellinger Very Fast Decision Tree, and a new set of features for describing candidates. The features have been chosen so as to (i) maximize the separation between candidates arising from noise and those of probable astrophysical origin, and (ii) be as survey-independent as possible. Using these features, our new approach can process millions of candidates in seconds (∼1 million every 15 s), with high levels of pulsar recall (above 90 per cent). This technique is therefore applicable to the large volumes of data expected to be produced by the Square Kilometre Array. Use of this approach has assisted in the discovery of 20 new pulsars in data obtained during the Low-Frequency Array Tied-Array All-Sky Survey.
- pulsar candidate selection
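The abstract names the Gaussian Hellinger Very Fast Decision Tree without describing its internals. The sketch below illustrates the general idea under stated assumptions: each candidate feature is summarised per class (noise vs. pulsar) by a running Gaussian, features are scored by the Hellinger distance between the two class Gaussians, and a node splits only when the Hoeffding bound says the best feature is reliably ahead, as in a standard Very Fast Decision Tree. The helper names (`RunningGaussian`, `hellinger_gaussian`, `hoeffding_bound`, `should_split`) and the default `delta` are hypothetical illustrations, not the authors' implementation.

```python
import math


class RunningGaussian:
    """Streaming mean/variance (Welford's algorithm) for one feature in one class."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def hellinger_gaussian(mu1, sigma1, mu2, sigma2, floor=1e-12):
    """Hellinger distance in [0, 1] between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    s1, s2 = max(sigma1, floor), max(sigma2, floor)  # avoid division by zero
    var_sum = s1 ** 2 + s2 ** 2
    # Bhattacharyya coefficient of two univariate Gaussians.
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 ln(1/delta) / (2n)), as used by VFDT-style learners."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(scores, n, delta=1e-7, value_range=1.0):
    """Split if the best feature's score beats the runner-up by more than epsilon.

    Simplified: real VFDT variants also use a tie-breaking threshold.
    """
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return len(ranked) == 1
    return ranked[0] - ranked[1] > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    import random

    random.seed(0)
    # Synthetic, hypothetical data: feature A separates the classes, feature B does not.
    stats = {(f, c): RunningGaussian() for f in "AB" for c in ("noise", "pulsar")}
    for _ in range(500):
        stats[("A", "noise")].update(random.gauss(0.0, 1.0))
        stats[("A", "pulsar")].update(random.gauss(3.0, 1.0))
        stats[("B", "noise")].update(random.gauss(0.0, 1.0))
        stats[("B", "pulsar")].update(random.gauss(0.1, 1.0))
    scores = {}
    for f in "AB":
        g0, g1 = stats[(f, "noise")], stats[(f, "pulsar")]
        scores[f] = hellinger_gaussian(g0.mean, g0.std, g1.mean, g1.std)
    print(scores, should_split(list(scores.values()), n=1000))
```

Because the Hellinger distance is bounded in [0, 1], the Hoeffding range `value_range` can be fixed at 1, and only constant-size summaries per class and feature need to be stored. This is what makes a tree of this kind plausible for online processing of very large candidate streams.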