Support

Requirements

Python 2.7 is currently supported by BigMLer.

BigMLer requires bigml 1.6.0 or
higher. Using the proportional missing strategy additionally requires
the numpy and scipy libraries. They are not installed automatically as
dependencies, since they are quite heavy and only required in this case,
so they have been left for the user to install if needed.
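
If you plan to use that strategy, a plain pip install should be enough
(assuming pip is available on your system):

$ pip install numpy scipy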

Note that using the proportional missing strategy for local predictions can
also require the numpy and scipy libraries, which are not installed by
default. Check the bindings documentation for more info.

BigMLer Installation
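
The latest stable release of BigMLer can be installed with pip:

$ pip install bigmler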

You can also install the development version of bigmler directly
from the Git repository:

$ pip install -e git://github.com/bigmlcom/bigmler.git#egg=bigmler

For detailed installation instructions on Windows, see the
BigMLer on Windows section.

BigML Authentication

All the requests to BigML.io must be authenticated using your username
and API key and are always
transmitted over HTTPS.

The BigML module will look for your username and API key in the environment
variables BIGML_USERNAME and BIGML_API_KEY respectively. You can
add the following lines to your .bashrc or .bash_profile to set
those variables automatically when you log in:
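
# Placeholder values: substitute your own BigML credentials.
export BIGML_USERNAME=myusername
export BIGML_API_KEY=your_api_key_here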

BigMLer on Windows

In addition, you'll need the pip tool to install BigMLer. To
install pip, first open your command line window (type cmd in
the input field that appears when you click on Start and hit Enter),
download the distribute_setup.py Python script
and execute it:

c:\Python27\python.exe distribute_setup.py

After that, you’ll be able to install pip by typing the following command:

c:\Python27\Scripts\easy_install.exe pip

And finally, to install BigMLer, just type:

c:\Python27\Scripts\pip.exe install bigmler

and BigMLer should be installed on your computer. Then
issuing:

bigmler --version

should show BigMLer version information.

Finally, to start using BigMLer to handle your BigML resources, you need to
set your credentials in BigML for authentication. If you want them to be
permanently stored in your system, use:
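
rem Placeholder values: substitute your own BigML credentials.
setx BIGML_USERNAME myusername
setx BIGML_API_KEY your_api_key_here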

If you do not specify the path to an output file, BigMLer will auto-generate one for you under a
new directory named after the current date and time (e.g., MonNov1212_174715/predictions.csv).
With the --prediction-info
flag set to brief, only the prediction result will be stored (the default,
normal, also includes confidence information).
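
For example, a minimal sketch (the file paths are illustrative):

bigmler --train data/iris.csv --test data/test_iris.csv --prediction-info brief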

A different objective field (the field that you want to predict) can be selected using:
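
For example (here 'sepal length' stands in for your own field name, and the
file paths are illustrative):

bigmler --train data/iris.csv --test data/test_iris.csv --objective 'sepal length'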

If you do not explicitly specify an objective field, BigML will default to the last
column in your dataset.

Also, if your test file uses a particular field separator for its data,
you can tell BigMLer using --test-separator.
For example, if your test file uses the tab character as its field separator,
the call should look like this (with illustrative file names):
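
bigmler --train data/iris.csv --test data/test_iris.tsv --test-separator '\t'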

If you don’t provide a file name for your training source, BigMLer will try to
read it from the standard input:

cat data/iris.csv | bigmler --train

BigMLer will try to use the locale of the model both to create a new source
(if the --train flag is used) and to interpret test data. If it
fails, it will fall back to en_US.UTF-8
or English_United States.1252 and print a warning message.
If you want to change this behaviour you can specify your preferred locale:
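
bigmler --train data/iris.csv --test data/test_iris.csv --locale "English_United States.1252"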

If you check your working directory you will see that BigMLer creates a file
with the
model ids that have been generated (e.g., FriNov0912_223645/models).
This file is handy if you later want to use those model ids to generate local
predictions. BigMLer also creates a file with the dataset id that has been
generated (e.g., TueNov1312_003451/dataset) and another one summarizing
the steps taken in the session progress: bigmler_sessions. You can also
store a copy of every created or retrieved resource in your output directory
(e.g., TueNov1312_003451/model_50c23e5e035d07305a00004f) by setting the flag
--store.
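
For instance (a quick sketch; adapt the paths to your own data):

bigmler --train data/iris.csv --test data/test_iris.csv --store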

Prior Versions Compatibility Issues

BigMLer will accept flags written with underscore as word separator like
--clear_logs for compatibility with prior versions. Also --field-names
is accepted, although the more complete --field-attributes flag is
preferred. --stat_pruning and --no_stat_pruning are discontinued
and their effects can be achieved by setting the actual --pruning flag
to statistical or no-pruning values respectively.

1.8.6 (2014-05-22)

1.8.5 (2014-05-19)

1.8.4 (2014-05-07)

Fixing bug in analyze --nodes. The default node steps could not be found.

1.8.3 (2014-05-06)

Setting dependency on new Python bindings version 1.3.1.

1.8.2 (2014-05-06)

Fixing bug: --shared and --unshared should be considered only when set
in the command line by the user. They were always updated, even when absent.

Fixing bug: --remote predictions were not working when --model was used as
the training start point.

1.8.1 (2014-05-04)

Changing the Gazibit report for shared resources to include the model's
shared URL in embedded format.

Fixing bug: train and test data could not be read from stdin.

1.8.0 (2014-04-29)

Adding the analyze subcommand. The subcommand presents new features,
such as:

--cross-validation that performs k-fold cross-validation,
--features that selects the best features to increase accuracy
(or any other evaluation metric) using a smart search algorithm and
--nodes that selects the node threshold that ensures best accuracy
(or any other evaluation metric) in user defined range of nodes.

1.7.1 (2014-04-21)

Fixing bug: the --no-upload flag was not actually used.

1.7.0 (2014-04-20)

Adding the --reports option to generate Gazibit reports.

1.6.0 (2014-04-18)

Adding the --shared flag to share the created dataset, model and evaluation.

1.5.1 (2014-04-04)

Fixing bug in model building: when an objective field was specified and
no --max-category was present, the user-given objective was not used.

Fixing bug: max-category data was stored even when --max-category was not
used.

1.5.0 (2014-03-24)

Adding --missing-strategy option to allow different prediction strategies
when a missing value is found in a split field. Available for local
predictions, batch predictions and evaluations.

Adding new --delete options: --newer-than and --older-than to delete lists
of resources according to their creation date.

Adding --multi-dataset flag to generate a new dataset from a list of
equally structured datasets.

1.4.7 (2014-03-14)

Bug fixing: resuming multi-label processing from a dataset was not working.

Bug fixing: the max parallel resource creation check did not verify that all
the older tasks had ended, only the last one of the slot. This caused
more tasks than permitted to be sent in parallel.