This paper summarizes the IWSPA Anti-Phishing shared task pilot. The pilot consisted of two subtasks: identifying phishing emails in a collection of legitimate and phishing email bodies, and separating phishing emails from legitimate emails when given full emails, i.e., with headers and bodies. For both subtasks, training datasets were made available approximately a month before the test data was released. Sixteen teams registered for the task, and nine submitted models and predictions for the test data. We discuss the collection sources and preprocessing of the datasets, and the performance of the teams on the test data from several different perspectives. A unique aspect of the dataset was that it included synthetic attacks. Another emphasis in both subtasks was that the phishing class was much smaller than the legitimate class, to reflect the real-world scenario. Hence, we introduce two evaluation metrics, called balanced detection rate and normalized balanced detection rate, which to our knowledge are new and more suitable for unbalanced datasets. We then evaluate the performance of the teams on the usual metrics as well as on metrics for unbalanced datasets, including the new metrics.