First Security and Privacy Analytics Anti-Phishing Shared Task (IWSPA-AP 2018)

Results will be presented at IWSPA 2018, a workshop co-located with
ACM CODASPY (Conference on Data and Applications Security and Privacy), Tempe, Arizona



Call for Participation

You are invited to participate in the IWSPA-AP Shared Task at IWSPA 2018. The shared task is on the detection and analysis of the nature of emails.

The International Workshop on Security and Privacy Analytics (IWSPA) Anti-Phishing Shared Task is an exercise in applied machine learning and text analysis for cyber security. Participants are asked to build a classifier that distinguishes phishing emails from spam and legitimate ones in an "unbalanced" dataset. To keep the task close to a real-world situation, the training and testing datasets will have realistic ratios of malicious and legitimate emails (not 50:50). A sample training dataset will be provided, and results will be evaluated on a testing dataset that will be posted a week before the results are due. We ask participants to send us their trained model and the results they achieved on the testing dataset. Participants are encouraged to use any data in their possession, in addition to the data provided, to train their model. Participants are also free to use any kind of feature engineering and any type of classifier. However, keep in mind that the dataset is "unbalanced".
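One common way to account for an unbalanced dataset is to weight each class inversely to its frequency during training. The sketch below is illustrative only and is not part of the task requirements: the 90:10 ratio and the label names `legit` and `phish` are assumptions (the actual ratio of the task data is not stated here), and `balanced_class_weights` is a hypothetical helper. The formula, n_samples / (n_classes * class_count), is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Assumed, made-up distribution: 90 legitimate emails and 10 phishing emails.
labels = ["legit"] * 90 + ["phish"] * 10
weights = balanced_class_weights(labels)

# The rare class receives the larger weight, so errors on it cost more.
print(weights)
```

Weights like these can be passed to many classifiers as per-class or per-sample weights. Resampling the training data is another option.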

The proceedings will be published online through the CEUR publication service. This year we will also invite the authors of selected system papers from the Shared Task to submit extended versions to a special issue of a journal (details coming soon).


The overall task description consists of the following:

  • Use the training dataset offered and/or any dataset available online.
  • Analyze email content (Header, Body, URLs). The emails will be in .txt format.
  • Preferably come up with new and interesting features and/or use existing ones in the literature.
  • Build and train a machine learning model or use an already existing one.
  • Finally, report the results using the evaluation metrics specified below.
  • A few probable SubTasks: We may post two types of training datasets:
      • Emails with headers: For this type of dataset, the participants are free to use all the content available in an email to extract information.
      • Emails with no headers: This task will only focus on the body of the emails. Participants may use any type of information extraction related to the body.
  • Evaluation Metrics: The evaluation metrics expected are: Confusion Matrix (FP, FN, TP, TN), Accuracy, F-Score, Precision, Recall, Weighted average of recall and precision.
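The evaluation metrics listed above can be computed from the four confusion-matrix counts. The following is a minimal sketch, not the organizers' scoring script. It assumes binary labels with `phish` as the positive class, uses the balanced F-score (F1), and reads "weighted average of recall and precision" as a plain weighted arithmetic mean with a hypothetical `weight` parameter. The official definitions may differ.

```python
def confusion_counts(y_true, y_pred, positive="phish"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="phish", weight=0.5):
    """Return the confusion counts and the derived metrics as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    # Assumed reading of "weighted average": weight * P + (1 - weight) * R.
    weighted = weight * precision + (1 - weight) * recall
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f_score": f_score,
            "weighted_avg": weighted}


# Made-up example: 2 phishing and 8 legitimate emails.
y_true = ["phish", "phish"] + ["legit"] * 8
y_pred = ["phish", "legit", "phish"] + ["legit"] * 7
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On an unbalanced dataset, accuracy alone can look good even when most phishing emails are missed, which is why precision, recall and F-score are reported alongside it.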


    Registration is through EasyChair. The deadline is January 23rd, 2018.

    Organizations wishing to participate in the AP Shared Task track at IWSPA 2018 are invited to register on EasyChair. Participants are advised to register as soon as possible in order to receive timely access to evaluation resources, including development and testing data. Registration for the task does not commit you to participation, but it helps us with planning. All participants who submit system runs are welcome to present their system at IWSPA 2018.




    We will post the details for the training corpus on January 25th, 2018. Stay tuned!



    Important Dates

    Please consult the IWSPA 2018 Workshop for official dates for the workshop.

    The important deadlines for the Shared Task:
    Event                        Date
    Registration Deadline        January 23, 2018
    Training Data Release        Before January 25, 2018
    Test Data Release            February 25, 2018
    Model + Results Submission   March 3, 2018
    Start of Evaluation          March 5, 2018
    End of Evaluation            March 20, 2018

    All deadlines for the shared task are calculated as 11:59pm Baker Island Time (BIT: UTC/GMT-12).
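    To check a deadline against your own clock, the UTC-12 cutoff can be converted to UTC. A small sketch using only the Python standard library follows. `BAKER_ISLAND` and `deadline_utc` are names chosen for illustration, and the example uses the Model + Results Submission date.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-12 offset, as stated for Baker Island Time.
BAKER_ISLAND = timezone(timedelta(hours=-12), "BIT")


def deadline_utc(year, month, day):
    """Return 11:59pm UTC-12 on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=BAKER_ISLAND)
    return local.astimezone(timezone.utc)


# Model + Results Submission deadline: March 3, 2018.
submission = deadline_utc(2018, 3, 3)
print(submission.isoformat())  # 2018-03-04T11:59:00+00:00
```

    In other words, each deadline falls at 11:59 UTC on the following calendar day.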





    IWSPA 2018

    This is the fourth workshop in the series of workshops on Security and Privacy Analytics. Increasingly, sophisticated techniques from machine learning, data mining, statistics and natural language processing are being applied to challenges in security and privacy fields. However, experts from these areas have had no medium in the past where they can meet and exchange ideas so that strong collaborations can emerge, and cross-fertilization of these areas can occur. Moreover, current courses and curricula in security do not sufficiently emphasize background in these areas and students in security and privacy are not emerging with deep knowledge of these topics. Hence, we propose to continue the workshop that we started in the year 2015 to address the research and development efforts in which analytical techniques from machine learning, data mining, natural language processing and statistics are applied to solve security and privacy challenges (“security and privacy analytics”). Submissions of papers related to methodology, design, techniques and new directions for security and privacy that make significant use of machine learning, data mining, statistics or natural language processing are welcome. Furthermore, submissions on educational topics and systems in the field of security analytics are also highly encouraged.



    Organising Committee

  • Dr. Rakesh Verma, Professor, University of Houston
  • Shahryar Baki, PhD candidate, University of Houston
  • Avisha Das, PhD candidate, University of Houston
  • Ayman Elassal, PhD candidate, University of Houston
  • Luis Felipe Teixeira De Moraes, PhD candidate, University of Houston