Open Issues

Identify which algorithms would be the most useful to implement first. For a list of existing implementations, take a look at the examples repository; for a list of scikit-learn implementations, take a look at this. Add the roadmap to the README and to the website.

Transforming features into a format that is more suitable for specific algorithms is an integral part of machine learning. See this for existing implementations and see this for scikit-learn implementations.
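As one illustration of such a transformation, here is a minimal sketch of feature standardization (zero mean, unit variance per column). It is not taken from doddle-model: the object name StandardScalerSketch and the use of plain Array[Array[Double]] instead of a Breeze DenseMatrix are assumptions made to keep the example self-contained.

```scala
// Hypothetical sketch, not doddle-model's API: standardizes each column
// of a non-empty, rectangular, row-major matrix.
object StandardScalerSketch {

  /** Computes the per-column mean and population standard deviation. */
  def fit(x: Array[Array[Double]]): (Array[Double], Array[Double]) = {
    val n = x.length.toDouble
    val cols = x.head.indices.toArray
    val means = cols.map(j => x.map(_(j)).sum / n)
    val stds = cols.map { j =>
      math.sqrt(x.map(row => math.pow(row(j) - means(j), 2)).sum / n)
    }
    (means, stds)
  }

  /** Centers and scales every value; constant columns are mapped to 0.0. */
  def transform(
      x: Array[Array[Double]],
      means: Array[Double],
      stds: Array[Double]
  ): Array[Array[Double]] =
    x.map(_.zipWithIndex.map { case (v, j) =>
      if (stds(j) == 0.0) 0.0 else (v - means(j)) / stds(j)
    })
}
```

Splitting the work into fit and transform mirrors the scikit-learn convention, so that statistics learned on training data can be reused on unseen data.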

Currently, data can be read from CSV files with this piece of code, but the implementation is very limited: only numerical data (doubles) can be read, and non-numerical fields result in Double.NaN, which is used to encode missing data in doddle-model. The new implementation should be able to encode numerical and categorical variables with missing values, i.e. numerical features should be encoded as doubles directly, while categorical features should first be encoded to a numerical representation (take a look at label encoder) and only then converted to doubles. The function will probably look something like loadCsvDataset(filePath: String, naString: String, headerLine: Boolean = true): DenseMatrix[Double].
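The encoding described above could be sketched roughly as follows. This is only an illustration under several assumptions: the object name CsvLoadingSketch and the helper encodeColumns are hypothetical, the result is a plain Array[Array[Double]] rather than a Breeze DenseMatrix so that the example has no dependencies, and lines are split naively on commas (no quoting or escaping), which a proper CSV parsing library would handle instead.

```scala
import scala.io.Source
import scala.util.Try

// Hypothetical sketch, not the doddle-model implementation.
object CsvLoadingSketch {

  /** Encodes rectangular string rows as doubles, column by column.
    * A column is treated as numerical if every non-missing value parses
    * as a double; otherwise it is label encoded (sorted distinct values
    * are mapped to 0, 1, 2, ...). Missing values become Double.NaN.
    */
  def encodeColumns(rows: Seq[Array[String]], naString: String): Array[Array[Double]] = {
    val numCols = rows.headOption.map(_.length).getOrElse(0)
    val encodedCols = (0 until numCols).map { j =>
      val column = rows.map(_(j).trim).toVector
      val present = column.filter(_ != naString)
      val isNumeric = present.forall(v => Try(v.toDouble).isSuccess)
      if (isNumeric) {
        column.map(v => if (v == naString) Double.NaN else v.toDouble)
      } else {
        val labels = present.distinct.sorted.zipWithIndex.toMap
        column.map(v => if (v == naString) Double.NaN else labels(v).toDouble)
      }
    }
    // Transpose the encoded columns back into row-major order.
    rows.indices.map(i => encodedCols.map(_(i)).toArray).toArray
  }

  /** Reads a comma-separated file and encodes it with encodeColumns.
    * Naive splitting is an assumption of this sketch only.
    */
  def loadCsvDataset(
      filePath: String,
      naString: String,
      headerLine: Boolean = true
  ): Array[Array[Double]] = {
    val source = Source.fromFile(filePath)
    try {
      val lines = source.getLines().filter(_.trim.nonEmpty).toList
      val dataLines = if (headerLine) lines.drop(1) else lines
      encodeColumns(dataLines.map(_.split(",", -1)), naString)
    } finally source.close()
  }
}
```

For example, with naString = "NA", a column containing "red", "blue" and "NA" would be encoded as 1.0, 0.0 and Double.NaN, because labels are assigned in sorted order. Note that inferring the column type from its contents is a design choice of this sketch; an explicit schema argument may be preferable in the real implementation.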

A third-party library should be used to parse CSV files (take a look at this discussion for some starting points).