It is based on the xlrd library but is also
checked for compatibility with the xlwings COM-client library.
It requires numpy and (optionally) pandas.
It is developed on python-3 but also tested on python-2 for compatibility.

The xl-ref notation extends ordinary A1 and RC excel coordinates with
conditional traversing operations, based on the cell’s empty/full state.
For instance, to extract a contiguous table near the A1 cell
and make a pandas.DataFrame out of it, use this:

>>> from pandalone import xleash, SheetsFactory

>>> shfac = SheetsFactory()
>>> shfac.list_sheetnames('path/to/workbook.xlsx')
['Sheet1', ...]

>>> ## Search and capture the first contiguous table from the 1st sheet
>>> #  as a pandas-DataFrame:
>>> df = xleash.lasso('path/to/workbook.xlsx#0!A1(DR):..(DR):RLDU:["df"]',
...                   sheets_factory=shfac)

>>> ## Assuming the sheet contains a single table, a lone `:` fetches
>>> #  the same contents.  Additionally, it is possible
>>> #  to skip the sheetname/sheet-index (1st sheet implied).
>>> df = xleash.lasso('#:["df"]',
...                   url_file='path/to/workbook.xlsx',
...                   sheets_factory=shfac)

Assuming that the full-cells of the 1st sheet of the workbook on disk are
those marked with 'X', then the resulting capture-rect of the above call
would be a 2D list-of-lists with the values contained in C2:E4:

   A B C D E
1     ┌─────┐
2     │    X│
3     │X    │
4     │  X  │
5     └─────┘

If another sheet is desired, add its name or 0-based ordinal immediately
after the #, separated by a ! from the rest of the xl-ref - which in that
case might be empty.
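As an illustration of the notation only (not the library's actual parser; `split_xlref` is a hypothetical helper), such a reference can be split into its file, sheet and reference parts like this:

```python
# Hypothetical, simplified splitter for 'url_file#sheet!ref' xl-refs;
# the sheet part may be a name, a 0-based ordinal, or empty.
def split_xlref(xlref):
    """Split 'file#sheet!ref' into (file, sheet, ref); sheet may be ''."""
    url_file, _, frag = xlref.partition('#')
    sheet, sep, ref = frag.partition('!')
    if not sep:                   # no '!': the whole fragment is the ref
        sheet, ref = '', frag
    return url_file, sheet, ref

print(split_xlref('path/to/workbook.xlsx#0!A1(DR):..(DR)'))
# -> ('path/to/workbook.xlsx', '0', 'A1(DR):..(DR)')
print(split_xlref('#:["df"]'))
# -> ('', '', ':["df"]')
```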

The 4 primitive directions, denoted with one of the letters
LURD.
These are used to express both target-moves and expansions.

coordinate

coordinates

Any pair of row/column coordinates specifying a cell position
(i.e. landing-cell, target-cell, bounds of the capture-rect),
written as the first part of the edge syntax, or implicitly resolved.
They can be expressed in A1 or RC format or as a zero-based
(row,col) tuple (num).
Each coordinate might be absolute or dependent, independently.

Due to the state-change on the ‘exterior’ cells, the capture-rect
might be smaller than a wider contiguous but “convex” rectangular area.

The expansions attempt to remedy this by providing for expanding in
arbitrary directions, accompanied by a multiplicity for each one.
If the multiplicity is unspecified, infinite is assumed, so the rect expands
until an empty/full row/column is met.
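The effect of an (infinite) expansion can be sketched with plain numpy on a boolean states-matrix; this is an illustrative re-implementation of the idea, not the library's code:

```python
import numpy as np

# True marks a full cell (the sheet sketched in the docs above, rows 2-4).
states = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=bool)

def expand_down(states, top, left, bottom, right):
    """Expand the rect downwards until an all-empty row (or the edge) is met."""
    nrows = states.shape[0]
    while bottom + 1 < nrows and states[bottom + 1, left:right + 1].any():
        bottom += 1
    return top, left, bottom, right

# Expanding the single-row rect spanning row 1 (0-based) captures rows 1-3:
print(expand_down(states, 1, 0, 1, 4))
# -> (1, 0, 3, 4)
```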

absolute

Any cell row/col identified with column-characters, row-numbers, or
the following special-characters:

The structure to specify some function call in the filter part;
it can be either a json string, a list, or an object, as follows:

string: "func_name"

list: ["func_name",["arg1","arg2"],{"k1":"v1"}]
where the last 2 parts are optional and can be given in any order;

object: {"func":"func_name","args":["arg1"],"kwds":{"k":"v"}}
where the args and kwds are optional.

If the outer-most filter is a dictionary, a 'pop' kwd is popped-out
as the opts.
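A minimal, hypothetical normalizer for the 3 call-spec forms above (`parse_call_spec` is not the library's function, and the exact kwd popped out of the outer dict as the opts is assumed here to be named 'opts'):

```python
def parse_call_spec(spec):
    """Normalize the 3 call-spec forms into (func_name, args, kwds, opts)."""
    opts = None
    if isinstance(spec, str):                 # string form: "func_name"
        return spec, [], {}, opts
    if isinstance(spec, list):                # list form: name + optional parts
        func, args, kwds = spec[0], [], {}
        for part in spec[1:]:                 # last 2 parts: any order
            if isinstance(part, list):
                args = part
            elif isinstance(part, dict):
                kwds = part
        return func, args, kwds, opts
    if isinstance(spec, dict):                # object form
        spec = dict(spec)                     # don't side-effect caller's dict
        opts = spec.pop('opts', None)         # assumed kwd-name for the opts
        return spec['func'], spec.get('args', []), spec.get('kwds', {}), opts
    raise ValueError('Invalid call-spec: %r' % (spec,))

print(parse_call_spec(["func_name", {"k1": "v1"}, ["arg1", "arg2"]]))
# -> ('func_name', ['arg1', 'arg2'], {'k1': 'v1'}, None)
```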

opts

Key-value pairs affecting the lassoing (e.g. opening xlrd-workbooks).
Read the code to be sure what the available choices are :-(
They are a combination of the options specified in code (i.e. in
lasso()) and those extracted from filters by the ‘opts’ key,
and they are stored in the Lasso.

backend

backends

IO-level object providing the actual spreadsheet cells for capturing.
Each backend may provide its own workbooks and sheets, corresponding to:
- different implementations (e.g. the xlrd or xlwings libraries), or
- different origins (e.g. file-based, or network-based per url).

The decision which backend to use is taken by the sheet-factory
following a bidding process.

sheets-factory

IO level object acting as the caching manager for spreadsheets
fetched from different backends. The caching happens per
spreadsheet.

bid

backend-bidding

All backends are asked to provide their willingness to handle
some xl-ref (see SimpleSheetFactory.decide_backend()).
For a sibling sheet, the parent backend is always used.

More specifically, the capturing will always fetch the same values
regardless of “row-first” or “column-first” order; this is not the case
with targeting (LURD) moves.

For instance, to capture B4:E5 in the above sheet we may use
_5(L):E.(U).
In that case the target-cells are B5 and E4, and the target-moves
to reach the 2nd one are UR, which are different from the U
specified on the 2nd cell.
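Targeting can be sketched as scanning along a primitive direction until the cell-state changes; the snippet below is an illustration with numpy, not the library's code:

```python
import numpy as np

# True marks a full cell (same sketch-sheet as in the docs above).
states = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=bool)

def target_up(states, row, col):
    """From an empty landing-cell, move U until a full cell (or row 0) is hit."""
    while row > 0 and not states[row, col]:
        row -= 1
    return row, col

# Landing on the empty (3, 4) (0-based) and moving U hits the full cell (1, 4):
print(target_up(states, 3, 4))
# -> (1, 4)
```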

The plugins are initialized during import time in a 2-stage procedure
by init_plugins().
A plugin is loaded and optionally installed if the setup-configuration
above specifies a no-args <plugin-install-func> callable.
Any collected <plugin-install-func> callables are invoked AFTER all
plugin-modules have finished loading.

Tip

For example, study how this project sets its backend and filters.

Warning

When appending into “hook” lists during installation, remember to avoid
re-inserting duplicate items. In general, try to behave well even when
plugins are initialized multiple times!
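For instance, a well-behaved install function might guard its hook-appends like this (the `installed_hooks` list and `install_my_plugin` function are stand-ins for whatever hook a plugin extends):

```python
# Stand-in for some xleash "hook" list that plugins append into.
installed_hooks = []

def install_my_plugin(hook='my_filter'):
    """Append the plugin's hook only once, so re-initialization is harmless."""
    if hook not in installed_hooks:       # avoid re-inserting duplicates
        installed_hooks.append(hook)

install_my_plugin()
install_my_plugin()                       # the 2nd install is a no-op
print(installed_hooks)
# -> ['my_filter']
```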

>>> st_edge = Edge(Cell('^', '_'), 'L')       # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')  # '+' or would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
the new SheetsFactory created is closed afterwards.
Delegated to make_default_Ranger(), so items override
default ones; use a new Ranger if that is not desired.

If True, values are contained in the returned Lasso instance,
along with all other artifacts of the lassoing procedure.

For more debugging help, create a Ranger yourself and
inspect the Ranger.intermediate_lasso.

context_kwds (Lasso) – Default Lasso fields in case parsed ones are None
(i.e. you can specify the sheet like that).

Variables:

base_opts – Opts affecting the lassoing procedure that are deep-copied and used
as the base-opts for every Ranger.do_lasso(), whether invoked
directly or recursively by recursive_filter().
Read the code to be sure what are the available choices.
Delegated to make_default_Ranger(), so items override
default ones; use a new Ranger if that is not desired.

Returns:

Either the captured & filtered values or the final Lasso,
depending on the return_lassos arg.

sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; does not
close it in the end.
May be None, but do_lasso() will scream unless invoked
with a context_lasso arg containing a concrete ABCSheet.

base_opts (dict) – The opts that are deep-copied and used as the defaults
for every do_lasso(), whether invoked directly or
recursively by recursive_filter().
If unspecified, no opts are used, but this attr is set to an
empty dict.
See get_default_opts().

available_filters (dict or None) – The filters available for an xl-ref to use.
If None, then xleash.installed_filters is used.
Use an empty dict to disable all filters.

intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance
produced during the last execution of do_lasso().
Used for inspecting/debugging.

sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
a new SheetsFactory is created.
Remember to invoke its SheetsFactory.close() to clear
resources from any opened sheets.

pandalone.xleash.installed_filters
A dict of the default filters, keyed by filter-name:
'dict', 'numpy', 'odict', 'pipe', 'py', 'pyeval', 'recurse', 'redim', 'sorted'.
Each value maps 'func' to the filter callable and, optionally, 'desc' to a help-string.

>>> st_edge = Edge(Cell('^', '_'), 'L')       # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')  # '+' or would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

A bulk-filter that passes values through a python-expression using the asteval library.

The expr may access read-write all locals() of this method
(ranger, lasso), the numpy funcs, and the pandalone.xleash
module under the xleash variable.

The expr may return either:

the processed values, or

an instance of the Lasso, in which case only its opt
field is checked and replaced with original if missing.
So better use namedtuple._replace() on the current lasso
which exists in the expr’s namespace.
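The idea can be sketched with Python's builtin eval over a controlled namespace; the real filter uses asteval, and `pyeval_sketch`/`values` here are illustrative names only:

```python
# Illustrative only: evaluate an expression over a restricted namespace,
# roughly the way a pyeval-like filter exposes captured values to the expr.
def pyeval_sketch(expr, values):
    namespace = {'values': values}
    return eval(expr, {'__builtins__': {}}, namespace)

print(pyeval_sketch('[v * 2 for v in values]', [1, 2, 3]))
# -> [2, 4, 6]
```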

Parameters:

expr (str) – The python-expression, which may comprise multiple statements.

The expr fetched from capturing may access read-write
all locals() of this method (i.e. ranger, lasso),
the numpy funcs, and the pandalone.xleash module under
the xleash variable.

The expr may return either:

the processed values, or

an instance of the Lasso, in which case only its opt
field is checked and replaced with original if missing.
So better use namedtuple._replace() on the current lasso
which exists in the expr’s namespace.

sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; does not
close it in the end.
May be None, but do_lasso() will scream unless invoked
with a context_lasso arg containing a concrete ABCSheet.

base_opts (dict) – The opts that are deep-copied and used as the defaults
for every do_lasso(), whether invoked directly or
recursively by recursive_filter().
If unspecified, no opts are used, but this attr is set to an
empty dict.
See get_default_opts().

available_filters (dict or None) – The filters available for an xl-ref to use.
If None, then xleash.installed_filters is used.
Use an empty dict to disable all filters.

intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance
produced during the last execution of do_lasso().
Used for inspecting/debugging.

sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
the new SheetsFactory created is closed afterwards.
Delegated to make_default_Ranger(), so items override
default ones; use a new Ranger if that is not desired.

If True, values are contained in the returned Lasso instance,
along with all other artifacts of the lassoing procedure.

For more debugging help, create a Ranger yourself and
inspect the Ranger.intermediate_lasso.

context_kwds (Lasso) – Default Lasso fields in case parsed ones are None
(i.e. you can specify the sheet like that).

Variables:

base_opts – Opts affecting the lassoing procedure that are deep-copied and used
as the base-opts for every Ranger.do_lasso(), whether invoked
directly or recursively by recursive_filter().
Read the code to be sure what are the available choices.
Delegated to make_default_Ranger(), so items override
default ones; use a new Ranger if that is not desired.

Returns:

Either the captured & filtered values or the final Lasso,
depending on the return_lassos arg.

sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
a new SheetsFactory is created.
Remember to invoke its SheetsFactory.close() to clear
resources from any opened sheets.

The “magic” auto-creates psteps as they are referenced, making it natural
to write code that accesses data-tree paths, while at the same time the
“model” of those tree-data gets discovered.

Each pstep keeps internally the name of a data-tree step which,
when created through recursive referencing, coincides with the parent’s
branch leading to this step. That name can be modified with Pmod,
so the same data-accessing code can refer to differently-named values
in the data-tree.

pname (str) – this pstep’s name, which must coincide with the name of
the parent-pstep’s attribute holding this pstep.
It is stored at _orig; if there is no alias and it is unmapped by pmod,
this becomes the alias.

another pstep to clone attributes from
(used when replacing an existing child-pstep), or

None.

The mappings will apply only if Pmod.descend()
matches pname, and will derive the alias.
alias (str) – Will become the super-str object when no mappings are specified
(_proto_or_pmod is a dict from some prototype pstep).
It gets jsonpointer-escaped if it exists
(see pandata.escape_jsonpointer_part()).
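The auto-creation idea can be sketched with a tiny str subclass (this `Step` class is illustrative, not the actual Pstep implementation):

```python
class Step(str):
    """A str that auto-creates child steps named after accessed attributes."""
    def __getattr__(self, name):
        child = Step(name)            # child's name coincides with the attribute
        setattr(self, name, child)    # cache it, so the same child is reused
        return child

root = Step('')
path = root.some.deep.leaf            # steps created as they are referenced
print(str(path))
# -> 'leaf'
```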

The making of a model involves, among others, schema-validating, reading subtree-branches
from URIs, cloning, converting and merging multiple sub-models in a single unified-model tree,
without side-effecting given input.
All these happen in 4+1 steps:

All steps are executed “lazily”, using generators (with yield).
Before proceeding to the next step, the previous one must have completed successfully.
That way, any ad-hoc code in building-step 5 (curation), for instance, will not suffer a horrible death
due to badly-formed data.
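The lazy, fail-fast chaining can be sketched with plain generators (an illustration of the pattern, not the project's actual step functions):

```python
def step(name, items, validate):
    """Yield items through, raising before the next step ever consumes them."""
    for item in items:
        if not validate(item):
            raise ValueError('step %r: badly-formed item %r' % (name, item))
        yield item

# Chain 2 steps lazily: nothing runs until the result is actually consumed.
pipeline = step('merge',
                step('prevalidate', iter([1, 2, 3]), lambda x: x > 0),
                lambda x: x < 10)
print(list(pipeline))
# -> [1, 2, 3]
```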

[TODO] The storing of a model simply involves distributing model parts into different files and/or formats,
again without side-effecting the unified-model.

Building model

Here is a detailed description of each building-step:

_resolve() and substitute any json-references
present in the submodels with content-fragments fetched from the referred URIs.
The submodels are cloned first, to avoid side-effecting them.

Although by default a combination of JSON and CSV files is expected, this can be customized,
either by the content in the json-ref, within the model (see below), or
as explained below.

The extended json-refs syntax supported provides for passing arguments into _read_branch()
and _write_branch() methods. The syntax is easier to explain by showing what
the default _global_cntxt corresponds to, for a DataFrame:

And here is what is required to read and (later) store into an HDF5 local file with a predefined name:

{"$ref":"file://./filename.hdf5","$inp":["AUTO"],"$out":["HDF5"]}

Warning

Step NOT IMPLEMENTED YET!

Loosely _prevalidate() each sub-model separately with json-schema,
where any pandas-instances (DataFrames and Series) are left as is.
It is the duty of the developer to ensure that the prevalidation-schema is loose enough that
it allows for various submodel-forms, prior to merging, to pass.

Recursively clone and _merge() sub-models in a single unified-model tree.
Branches from sub-models higher in the stack override the respective ones from the sub-models below,
recursively. Different object types need to be converted appropriately (i.e. merging
a dict with a DataFrame results in a DataFrame, so the dictionary has to be
converted to a DataFrame).

The required conversions into pandas classes can be customized as explained
below. Series and DataFrames cannot merge together, and Sequences do not merge
with any other object-type (themselves included); they just “overwrite”.
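For plain python containers, the override-and-overwrite rule can be sketched like this (an illustrative merge, without the pandas conversions):

```python
def merge(base, override):
    """Dicts merge key-wise with the higher (override) model winning;
    everything else (sequences included) just overwrites."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)                  # clone: no side-effects on inputs
        for k, v in override.items():
            merged[k] = merge(merged[k], v) if k in merged else v
        return merged
    return override

print(merge({'a': {'x': 1, 'y': 2}, 'b': [1]},
            {'a': {'y': 20}, 'b': [9, 9]}))
# -> {'a': {'x': 1, 'y': 20}, 'b': [9, 9]}
```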

The default convertor-functions defined both for submodels and models are listed in the following table:

The required conversions from pandas classes can be customized as explained
below.

The default convertor-functions are the same as above.

(Optionally) Apply the _curate() functions on the model to enforce dependencies and/or any
ad-hoc generation-rules among the data. You can think of bash-like expansion patterns,
like ${/some/path:=$HOME}, or expressions like %len(../other/path).

Storing model

When storing model-parts, if unspecified, the filenames to write into will be deduced from the jsonpointer-path
of the $out’s parent, by substituting “strange” chars with underscores (_).
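The described substitution might look like the following (`fname_from_jsonpointer` is a hypothetical helper; the exact set of "strange" chars is an assumption):

```python
import re

def fname_from_jsonpointer(path):
    """Deduce a filename from a jsonpointer-path by replacing any char
    that is not alphanumeric/underscore/dot with an underscore."""
    return re.sub(r'[^\w.]', '_', path.strip('/'))

print(fname_from_jsonpointer('/some/branch/$out'))
# -> 'some_branch__out'
```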

Warning

Functionality NOT IMPLEMENTED YET!

Customization

Some operations within the steps (namely conversion and IO) can be customized by the following means
(from lower to higher precedence):

>>> mm = MyModel()
>>> mm.add_submodel({'a': 1})         ## According to the schema, this should have been a string,
>>> mm.add_submodel({'b': 'string'})  ## and this one, a number.

>>> sorted(mm.build_iter(),key=lambdaex:ex.message)## Fetch a list with all validation errors. [<ValidationError: "'string' is not of type 'number'">, <ValidationError: "1 is not of type 'string'">, <ValidationError: 'Gave-up building model after step 1.prevalidate (out of 4).'>]