Competition Instructions
- Register your email
Please enter your email to receive the login information to download the datasets, to access the datasets and descriptions of previous competitions (NN3, NN5), and to receive future announcements. (Please note that your old login information for the NN3 and NN5 competitions will still allow you to access the old datasets and presentations, but not the new datasets of NN GC1!)
- Select one or more Datasets
The competition offers 18 datasets consisting of 11 time series each. These 18 datasets are predicted in 3 distinct tournaments to be held in 2009 and 2010. The datasets will be released in groups of 6 in different stages during 2009-2010, and each group of 6 constitutes an individual 'tournament'. Each of the three tournaments will include 6 datasets of 11 homogeneous time series, each dataset with a different time series frequency: low-frequency time series of yearly data (NNG-A), quarterly data (NNG-B) and monthly data (NNG-C), and high-frequency time series of weekly data (NNG-D), daily data (NNG-E) and hourly data (NNG-F). Only a small subset, from one up to a maximum of 6 datasets, has to be predicted at any time. All time series within a dataset represent empirical transportation data with an identical time series frequency, e.g. all monthly or all hourly data.
Each set of 6 datasets represents a complete tournament that allows the evaluation of the forecasting accuracy of a particular method across up to 66 time series of different time frequencies. Participants can choose to participate in only a single dataset (e.g. 11 series) or multiples thereof, a single complete time series frequency (e.g. 33 time series) or multiples thereof, a complete tournament (e.g. 66 series), or - ideally - all tournaments and all time series!
Dataset           | Tournament 1         | Tournament 2         | Tournament 3         | Dataset Winners
NNG-A - Yearly    | 1.A 11 series        | 2.A 11 series        | 3.A 11 series        | x.A 33 time series
NNG-B - Quarterly | 1.B 11 series        | 2.B 11 series        | 3.B 11 series        | x.B 33 time series
NNG-C - Monthly   | 1.C 11 series        | 2.C 11 series        | 3.C 11 series        | x.C 33 time series
NNG-D - Weekly    | 1.D 11 series        | 2.D 11 series        | 3.D 11 series        | x.D 33 time series
NNG-E - Daily     | 1.E 11 series        | 2.E 11 series        | 3.E 11 series        | x.E 33 time series
NNG-F - Hourly    | 1.F 11 series        | 2.F 11 series        | 3.F 11 series        | x.F 33 time series
Tournament Winner | 1.x winner 66 series | 2.x winner 66 series | 3.x winner 66 series | Grand Total Winner 198 time series
In order to limit the effort of building models for the competition, the datasets of each tournament will be released sequentially, with 2 datasets of a tournament released every 3 months. The datasets will be released in these three stages (of 2 datasets each) so that you can focus your time and attention on each set separately. Datasets C and E are similar in structure to the monthly and daily data of the NN3 and NN5 competitions respectively, in order to reflect experiences and learning from past competitions and to allow participants to explore their previously developed algorithms on this new but similar data.
Choose one, two, three, four, five or all six datasets of a tournament! Only those forecasting multiple datasets (either all sets per tournament or at least 2 datasets across all 3 tournaments) will be eligible to win the competition!
- Download the data
- Click on the download link below and enter your login & password in the dialog box (case-sensitive entry!) to download the datasets. The login is provided in step 1 when you register your email address and personal details.
- The datasets have the following format:
  a) each dataset in a separate Excel file
  b) one series per column
  c) for each series:
    - Series identification
    - Number of observations (N)
    - Starting date
    - Ending date
    - Description of the time series
    - Observations per smallest seasonal cycle (e.g. days per week, hours per day)
    - Time series with N observations, one per cell, vertically
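Purely for illustration, here is a minimal parsing sketch in Python, assuming pandas is installed together with an Excel reader engine (e.g. openpyxl or xlrd), and assuming the six header cells appear in the listed order at the top of each column, followed directly by the observations. The function name and example file name are hypothetical, not part of the competition materials:

```python
import pandas as pd

def load_nngc1_dataset(path):
    """Parse one dataset file laid out as described above:
    one series per column, six metadata cells at the top,
    then N observations stacked vertically in the same column."""
    raw = pd.read_excel(path, header=None)  # read the sheet without treating any row as a header
    series = {}
    for col in raw.columns:
        column = raw[col]
        name = column.iloc[0]                 # series identification
        n_obs = int(column.iloc[1])           # number of observations (N)
        meta = {
            "start": column.iloc[2],          # starting date
            "end": column.iloc[3],            # ending date
            "description": column.iloc[4],    # description of the time series
            "season": column.iloc[5],         # observations per smallest seasonal cycle
        }
        values = column.iloc[6:6 + n_obs].astype(float).to_numpy()
        series[name] = (meta, values)
    return series

# Hypothetical usage -- the real file name comes from the download above:
# data = load_nngc1_dataset("NNGC1_dataset_1C.xls")
```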
Currently, only 2 datasets of Tournament 1 are released - datasets 1.C (monthly) and 1.E (daily). Additional datasets will be made available here, with information sent to registered members of this site.
If you encounter any problems in submitting, please contact sven.crone@neural-forecasting.com immediately!
General Instructions
- Submissions are restricted to one entry per competitor.
- Competitors must certify upon submission that they did not attempt to retrieve the original data.
- As this is predominantly an academic competition, all advertising based upon or referencing the results of, or participation in, this competition requires prior written consent from the organisers.
Submitting your predictions to us will not automatically allow you to present your method at a conference. In addition to submitting, we therefore encourage you to submit to one of the conferences where we will host special sessions; this will allow you to present your method there. Please check back here regularly for information on submission deadlines & dates for these conferences.
Experimental Design
The competition design and datasets adhere to previously identified requirements to derive valid and reliable results:
- Evaluation on multiple time series, using 11 and 111 daily time series
- Representative time series structure for cash machine demand
- No domain knowledge, no user intervention in the forecasting methodology
- Ex ante (out-of-sample) evaluation
- Single time series origin (1-fold cross-validation) in order to limit the effort of computation & comparisons
- Fixed time horizon of 56 days into the future: t+1, t+2, ..., t+56
- Evaluation using multiple, unbiased error measures
- Evaluation of "novel" methods against established statistical methods & software benchmarks
- Evaluation of "novel" methods against standard neural network software packages
- Testing of the conditions under which NN & statistical methods perform well (using multiple hypotheses)
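To make the single-origin, fixed-horizon setup concrete, here is a minimal sketch in Python of how one series would be split for such an ex ante evaluation; the helper name is illustrative, and the 56-step horizon is taken from the list above:

```python
import numpy as np

HORIZON = 56  # fixed forecast horizon t+1, ..., t+56, as stated above

def single_origin_split(series, horizon=HORIZON):
    """Single time series origin: everything before the origin is available for
    model building; the final `horizon` observations are withheld for the
    ex ante (out-of-sample) evaluation."""
    series = np.asarray(series, dtype=float)
    return series[:-horizon], series[-horizon:]

# Hypothetical usage: fit any method on the first part, forecast `horizon` steps
# ahead, and compare against the withheld part only for evaluation.
```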
Datasets
Two datasets are provided, which may be found [here].
Methods
The competition is open to all methods from Computational Intelligence, listed below. The objective requires a single methodology that is implemented across all time series. This does not require you to build a single neural network with a pre-specified input, hidden and output node structure, but allows you to develop a process in which you run tests and determine the best setup for each time series. Hence you can come up with 111 different network architectures, fuzzy membership functions, mixes of ensemble members etc. for your submission. However, the final model structure should always be selected by the same rigorous process (a sketch of such a process is given after the list below).
- Feedforward Neural Networks (MLP etc.)
- Recurrent Neural Networks (TLRNN, ENN, etc.)
- Fuzzy Predictors
- Decision & Regression Trees
- Particle Swarm Optimisation
- Support Vector Regression (SVR)
- Evolutionary & Genetic Algorithms
- Composite & Hybrid approaches
- Others
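Purely as an illustration of what a "single methodology" can look like in practice, here is a minimal, hypothetical sketch in Python using scikit-learn's MLPRegressor: one fixed, automated selection procedure that is applied identically to every series yet may choose a different lag length and hidden-layer size per series. The candidate grids, hold-out size and all names are assumptions, not competition requirements:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(series, n_lags):
    """Turn a univariate series into a lagged input matrix X and target vector y."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def select_and_fit(series, candidate_lags=(4, 12), candidate_hidden=(2, 4, 8), valid_size=18):
    """One rigorous, automated procedure applied identically to every series:
    grid-search the lag length and hidden-layer size on a hold-out block,
    then refit the winning configuration on all available observations."""
    best, best_err = None, np.inf
    for n_lags in candidate_lags:
        X, y = make_lagged(series, n_lags)
        X_tr, y_tr = X[:-valid_size], y[:-valid_size]
        X_va, y_va = X[-valid_size:], y[-valid_size:]
        for hidden in candidate_hidden:
            model = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0)
            err = np.mean(np.abs(model.fit(X_tr, y_tr).predict(X_va) - y_va))
            if err < best_err:
                best, best_err = (n_lags, hidden), err
    n_lags, hidden = best
    X, y = make_lagged(series, n_lags)  # refit the winning configuration on the full series
    final = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=2000, random_state=0).fit(X, y)
    return final, n_lags
```

Applied to each of the 11 series of a dataset in turn, the same procedure may select different configurations per series while still constituting a single methodology.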
These will be evaluated against established statistical forecasting methods:
- Naïve
- Single, Linear, Seasonal & Dampened Trend Exponential Smoothing
- ARIMA methods
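The official benchmarks are computed with the software packages named in the next paragraph; purely for illustration, the following sketch shows two of the listed benchmark families in Python, the Naïve forecast and a damped-trend seasonal exponential smoothing via statsmodels. The setting seasonal_periods=12 is an assumed monthly configuration, and the damped_trend argument requires a recent statsmodels release:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

def naive_forecast(train, horizon):
    """Naïve benchmark: repeat the last observed value over the forecast horizon."""
    return np.repeat(float(np.asarray(train)[-1]), horizon)

def damped_trend_es_forecast(train, horizon, seasonal_periods=12):
    """Damped-trend, additive-seasonal exponential smoothing benchmark."""
    model = ExponentialSmoothing(
        np.asarray(train, dtype=float),
        trend="add",
        damped_trend=True,
        seasonal="add",
        seasonal_periods=seasonal_periods,
    )
    return model.fit().forecast(horizon)
```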
Statistical benchmarks will be calculated using the software packages AUTOBOX and ForecastPro, two of the leading expert-system packages for automatic forecasting (provided courtesy of Dave Reilly and Eric Stellwagen - THANKS!). We hope to also evaluate a number of additional packages: SAS, NeuralWorks (pending), Alyuda Forecaster (pending), NeuroDimensions (pending). In addition, the competition is open for submissions from statistical benchmark methods. Although these can be submitted and evaluated as benchmarks, only methods from computational intelligence are eligible to "win".
Evaluation
We assume no particular decision problem underlying the forecasting competition and hence assume symmetric costs of errors. To account for the different number of observations in the individual training and test sub-samples, and the different scales of individual series, we propose to use a mean percentage error metric, which is also established best practice in industry and in previous competitions. All submissions will be evaluated using the mean Symmetric Mean Absolute Percent Error (SMAPE) across all time series. The SMAPE calculates the symmetric absolute error in percent between the actuals X and the forecasts F across all observations t of the test set of size n for each time series s:
SMAPE_s = (1/n) * Σ_{t=1}^{n} |X_t − F_t| / ((X_t + F_t)/2) * 100%
(attention: corrected formula from the previously published flawed error measure). The SMAPE of each series will then be averaged over all time series in the dataset to give a mean SMAPE. To determine a winner, all submissions will be ranked by mean SMAPE across all series. However, biases may be introduced by selecting a "best" method based upon a single metric, particularly in the absence of a true objective or loss function. Therefore, while our primary means of ranking forecasting approaches is mean SMAPE, alternative metrics will be used so as to guarantee the integrity of the presented results. All submitted forecasts will also be evaluated on a number of additional statistical error measures in order to analyse sensitivity to different error metrics; a short computational sketch of the SMAPE is given after the list below. Additional metrics for reporting purposes include:
- Average SMAPE (main metric to determine the winner)
- Median SMAPE
- Median absolute percentage error (MdAPE)
- Median relative absolute error (MdRAE)
- Average ranking based upon the error measures
- …
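As referenced above, here is a minimal computational sketch of the primary metric in Python; the function names are illustrative and not part of any official evaluation script:

```python
import numpy as np

def smape(actuals, forecasts):
    """Per-series SMAPE: mean over the test set of |X_t - F_t| / ((X_t + F_t) / 2), in percent."""
    actuals = np.asarray(actuals, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return np.mean(np.abs(actuals - forecasts) / ((actuals + forecasts) / 2.0)) * 100.0

def mean_smape(actual_series, forecast_series):
    """Mean SMAPE across all series of a dataset -- the ranking criterion described above."""
    return float(np.mean([smape(a, f) for a, f in zip(actual_series, forecast_series)]))

# Hypothetical usage with two short series:
# mean_smape([[10, 12, 11], [100, 90, 95]],
#            [[11, 11, 12], [ 98, 92, 90]])
```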
Publication & Non-Disclosure of Results
We respect the decision of individuals to withhold their name should they feel unsatisfied with their results. Therefore each contestant reserves the right to withdraw their name and the software package used after they have learned their relative rank on the datasets. However, we reserve the right to publish an anonymised version of the description of the method and methodology used, e.g. MLP, SVR etc., without the name of the contributor.