Excel formulas interpreter in Python.

vinci1it2000, updated πŸ•₯ 2023-03-24 21:22:33

.. _start-quick:

formulas: An Excel formulas interpreter in Python.

|pypi_ver| |test_status| |cover_status| |docs_status| |dependencies| |github_issues| |python_ver| |proj_license| |binder|

:release: 1.2.6 :date: 2022-12-13 23:15:00 :repository: https://github.com/vinci1it2000/formulas :pypi-repo: https://pypi.org/project/formulas/ :docs: http://formulas.readthedocs.io/ :wiki: https://github.com/vinci1it2000/formulas/wiki/ :download: http://github.com/vinci1it2000/formulas/releases/ :donate: https://donorbox.org/formulas :keywords: excel, formulas, interpreter, compiler, dispatch :developers: .. include:: AUTHORS.rst :license: EUPL 1.1+ <https://joinup.ec.europa.eu/software/page/eupl>_

.. _start-pypi: .. _start-intro:

What is formulas?

formulas implements an interpreter for Excel formulas, which parses and compile Excel formulas expressions.

Moreover, it compiles Excel workbooks to python and executes without using the Excel COM server. Hence, Excel is not needed.

Installation

To install it use (with root privileges):

.. code-block:: console

$ pip install formulas

Or download the last git version and use (with root privileges):

.. code-block:: console

$ python setup.py install

Install extras

Some additional functionality is enabled installing the following extras:

  • excel: enables to compile Excel workbooks to python and execute using: :class:~formulas.excel.ExcelModel.
  • plot: enables to plot the formula ast and the Excel model.

To install formulas and all extras, do:

.. code-block:: console

$ pip install formulas[all]

Development version

To help with the testing and the development of formulas, you can install the development version:

.. code-block:: console

$ pip install https://github.com/vinci1it2000/formulas/archive/dev.zip

.. _end-quick:

Basic Examples

The following sections will show how to:

  • parse a Excel formulas;
  • load, compile, and execute a Excel workbook;
  • extract a sub-model from a Excel workbook;
  • add a custom function.

Parsing formula

An example how to parse and execute an Excel formula is the following:

>>> import formulas
>>> func = formulas.Parser().ast('=(1 + 1) + B3 / A2')[1].compile()

To visualize formula model and get the input order you can do the following:

.. dispatcher:: func :opt: graph_attr={'ratio': '1'} :code:

>>> list(func.inputs)
['A2', 'B3']
>>> func.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap([(=((1 + 1) + (B3 / A2)), SiteMap())])

Finally to execute the formula and plot the workflow:

.. dispatcher:: func :opt: workflow=True, graph_attr={'ratio': '1'} :code:

>>> func(1, 5)
Array(7.0, dtype=object)
>>> func.plot(workflow=True, view=False)  # Set view=True to plot in the default browser.
SiteMap([(=((1 + 1) + (B3 / A2)), SiteMap())])

Excel workbook

An example how to load, calculate, and write an Excel workbook is the following:

.. testsetup::

>>> import os.path as osp
>>> from setup import mydir
>>> fpath = osp.join(mydir, 'test/test_files/excel.xlsx')
>>> dir_output = osp.join(mydir, 'test/test_files/tmp')

.. doctest::

>>> import formulas
>>> fpath, dir_output = 'excel.xlsx', 'output'  # doctest: +SKIP
>>> xl_model = formulas.ExcelModel().loads(fpath).finish()
>>> xl_model.calculate()
Solution(...)
>>> xl_model.write(dirpath=dir_output)
{'EXCEL.XLSX': {Book: <openpyxl.workbook.workbook.Workbook ...>}}

.. tip:: If you have or could have circular references, add circular=True to finish method.

To plot the dependency graph that depict relationships between Excel cells:

.. dispatcher:: dsp :code:

>>> dsp = xl_model.dsp
>>> dsp.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap([(ExcelModel, SiteMap(...))])

To overwrite the default inputs that are defined by the excel file or to impose some value to a specific cell:

>>> xl_model.calculate(
...     inputs={
...         "'[excel.xlsx]'!INPUT_A": 3,  # To overwrite the default value.
...         "'[excel.xlsx]DATA'!B3": 1  # To impose a value to B3 cell.
...     },
...     outputs=[
...        "'[excel.xlsx]DATA'!C2", "'[excel.xlsx]DATA'!C4"
...     ] # To define the outputs that you want to calculate.
... )
Solution([("'[excel.xlsx]'!INPUT_A", <Ranges>('[excel.xlsx]DATA'!A2)=[[3]]),
          ("'[excel.xlsx]DATA'!B3", <Ranges>('[excel.xlsx]DATA'!B3)=[[1]]),
          ("'[excel.xlsx]DATA'!A2", <Ranges>('[excel.xlsx]DATA'!A2)=[[3]]),
          ("'[excel.xlsx]DATA'!A3", <Ranges>('[excel.xlsx]DATA'!A3)=[[6]]),
          ("'[excel.xlsx]DATA'!D2", <Ranges>('[excel.xlsx]DATA'!D2)=[[1]]),
          ("'[excel.xlsx]'!INPUT_B", <Ranges>('[excel.xlsx]DATA'!A3)=[[6]]),
          ("'[excel.xlsx]DATA'!B2", <Ranges>('[excel.xlsx]DATA'!B2)=[[9.0]]),
          ("'[excel.xlsx]DATA'!D3", <Ranges>('[excel.xlsx]DATA'!D3)=[[2.0]]),
          ("'[excel.xlsx]DATA'!C2", <Ranges>('[excel.xlsx]DATA'!C2)=[[10.0]]),
          ("'[excel.xlsx]DATA'!D4", <Ranges>('[excel.xlsx]DATA'!D4)=[[3.0]]),
          ("'[excel.xlsx]DATA'!C4", <Ranges>('[excel.xlsx]DATA'!C4)=[[4.0]])])

To build a single function out of an excel model with fixed inputs and outputs, you can use the compile method of the ExcelModel that returns a DispatchPipe_. This is a function where the inputs and outputs are defined by the data node ids (i.e., cell references).

.. dispatcher:: func :code:

>>> func = xl_model.compile(
...     inputs=[
...         "'[excel.xlsx]'!INPUT_A",  # First argument of the function.
...         "'[excel.xlsx]DATA'!B3"   # Second argument of the function.
...     ], # To define function inputs.
...     outputs=[
...         "'[excel.xlsx]DATA'!C2", "'[excel.xlsx]DATA'!C4"
...     ] # To define function outputs.
... )
>>> func
<schedula.utils.dsp.DispatchPipe object at ...>
>>> [v.value[0, 0] for v in func(3, 1)]  # To retrieve the data.
[10.0, 4.0]
>>> func.plot(view=False)  # Set view=True to plot in the default browser.
SiteMap([(ExcelModel, SiteMap(...))])

.. _DispatchPipe: https://schedula.readthedocs.io/en/master/_build/schedula/utils/dsp/schedula.utils.dsp.DispatchPipe.html#schedula.utils.dsp.DispatchPipe

Alternatively, to load a partial excel model from the output cells, you can use the from_ranges method of the ExcelModel:

.. dispatcher:: dsp :code:

>>> xl = formulas.ExcelModel().from_ranges(
...     "'[%s]DATA'!C2:D2" % fpath,  # Output range.
...     "'[%s]DATA'!B4" % fpath,  # Output cell.
... )
>>> dsp = xl.dsp
>>> sorted(dsp.data_nodes)
["'[excel.xlsx]'!INPUT_A",
 "'[excel.xlsx]'!INPUT_B",
 "'[excel.xlsx]'!INPUT_C",
 "'[excel.xlsx]DATA'!A2",
 "'[excel.xlsx]DATA'!A3",
 "'[excel.xlsx]DATA'!A3:A4",
 "'[excel.xlsx]DATA'!A4",
 "'[excel.xlsx]DATA'!B2",
 "'[excel.xlsx]DATA'!B3",
 "'[excel.xlsx]DATA'!B4",
 "'[excel.xlsx]DATA'!C2",
 "'[excel.xlsx]DATA'!D2"]

JSON export/import ~~~~~~~~~~~~~~~~~~ The ExcelModel can be exported/imported to/from a readable JSON format. The reason of this functionality is to have format that can be easily maintained (e.g. using version control programs like git). Follows an example on how to export/import to/from JSON an ExcelModel:

.. testsetup::

>>> import formulas
>>> import os.path as osp
>>> from setup import mydir
>>> fpath = osp.join(mydir, 'test/test_files/excel.xlsx')
>>> xl_model = formulas.ExcelModel().loads(fpath).finish()

.. doctest::

>>> import json
>>> xl_dict = xl_model.to_dict()  # To JSON-able dict.
>>> xl_dict  # Exported format. # doctest: +SKIP
{
 "'[excel.xlsx]DATA'!A1": "inputs",
 "'[excel.xlsx]DATA'!B1": "Intermediate",
 "'[excel.xlsx]DATA'!C1": "outputs",
 "'[excel.xlsx]DATA'!D1": "defaults",
 "'[excel.xlsx]DATA'!A2": 2,
 "'[excel.xlsx]DATA'!D2": 1,
 "'[excel.xlsx]DATA'!A3": 6,
 "'[excel.xlsx]DATA'!A4": 5,
 "'[excel.xlsx]DATA'!B2": "=('[excel.xlsx]DATA'!A2 + '[excel.xlsx]DATA'!A3)",
 "'[excel.xlsx]DATA'!C2": "=(('[excel.xlsx]DATA'!B2 / '[excel.xlsx]DATA'!B3) + '[excel.xlsx]DATA'!D2)",
 "'[excel.xlsx]DATA'!B3": "=('[excel.xlsx]DATA'!B2 - '[excel.xlsx]DATA'!A3)",
 "'[excel.xlsx]DATA'!C3": "=(('[excel.xlsx]DATA'!C2 * '[excel.xlsx]DATA'!A2) + '[excel.xlsx]DATA'!D3)",
 "'[excel.xlsx]DATA'!D3": "=(1 + '[excel.xlsx]DATA'!D2)",
 "'[excel.xlsx]DATA'!B4": "=MAX('[excel.xlsx]DATA'!A3:A4, '[excel.xlsx]DATA'!B2)",
 "'[excel.xlsx]DATA'!C4": "=(('[excel.xlsx]DATA'!B3 ^ '[excel.xlsx]DATA'!C2) + '[excel.xlsx]DATA'!D4)",
 "'[excel.xlsx]DATA'!D4": "=(1 + '[excel.xlsx]DATA'!D3)"
}
>>> xl_json = json.dumps(xl_dict, indent=True)  # To JSON.
>>> xl_model = formulas.ExcelModel().from_dict(json.loads(xl_json))  # From JSON.

Custom functions

An example how to add a custom function to the formula parser is the following:

>>> import formulas
>>> FUNCTIONS = formulas.get_functions()
>>> FUNCTIONS['MYFUNC'] = lambda x, y: 1 + y + x
>>> func = formulas.Parser().ast('=MYFUNC(1, 2)')[1].compile()
>>> func()
4

.. _end-pypi:

Next moves

Things yet to do: implement the missing Excel formulas.

.. _end-intro: .. _start-badges: .. |test_status| image:: https://github.com/vinci1it2000/formulas/actions/workflows/tests.yml/badge.svg?branch=master :alt: Build status :target: https://github.com/vinci1it2000/formulas/actions/workflows/tests.yml?query=branch%3Amaster

.. |cover_status| image:: https://coveralls.io/repos/github/vinci1it2000/formulas/badge.svg?branch=master :target: https://coveralls.io/github/vinci1it2000/formulas?branch=master :alt: Code coverage

.. |docs_status| image:: https://readthedocs.org/projects/formulas/badge/?version=stable :alt: Documentation status :target: https://formulas.readthedocs.io/en/stable/?badge=stable

.. |pypi_ver| image:: https://img.shields.io/pypi/v/formulas.svg? :target: https://pypi.python.org/pypi/formulas/ :alt: Latest Version in PyPI

.. |python_ver| image:: https://img.shields.io/pypi/pyversions/formulas.svg? :target: https://pypi.python.org/pypi/formulas/ :alt: Supported Python versions

.. |github_issues| image:: https://img.shields.io/github/issues/vinci1it2000/formulas.svg? :target: https://github.com/vinci1it2000/formulas/issues :alt: Issues count

.. |proj_license| image:: https://img.shields.io/badge/license-EUPL%201.1%2B-blue.svg? :target: https://raw.githubusercontent.com/vinci1it2000/formulas/master/LICENSE.txt :alt: Project License

.. |dependencies| image:: https://requires.io/github/vinci1it2000/formulas/requirements.svg?branch=master :target: https://requires.io/github/vinci1it2000/formulas/requirements/?branch=master :alt: Dependencies up-to-date?

.. |binder| image:: https://mybinder.org/badge_logo.svg :target: https://mybinder.org/v2/gh/vinci1it2000/formulas/master?urlpath=lab%2Ftree%2Fbinder%2Findex.ipynb :alt: Live Demo .. _end-badges:

Issues

Accumulator / Stateful / async functions

opened on 2023-03-29 20:56:00 by dberardo

I am evaluating the usage of this library to perform simple simulations based on different type of functions.

Before I start contributing to the code myself, i have a couple of questions:

  • Can asynchronous functions be seamlessly implemented using the library (e.g. i assume that the google.function is doing exactly that, right, as in calling an external API and returning a value)
  • is this library already handling exceptions generating within the workflow?
  • Would it be possible to implement rolling statistics (like rolling means, etc.) And also various Stateful operators like Integrals, etc. with this library,? Where would be the best place to state store? Should it be a "virtual cell" or is there a place in memory for that?

And also: Can input arguments be given in a more flexible way? Can we use dictionaries/kwargs, where order of arguments doesn't metter, and. Can we also pass null values to excel-like functions?

Error in ExcelModel().loads(filename) - 'DefinedNameDict' object has no attribute 'definedName' (Python 3.9)

opened on 2023-02-06 14:30:05 by horkah

Describe the bug The test_excel.py script provided here on Git fails to complete without error.

To Reproduce I downloaded (6feb2023) the files test_excel.py and test_files from here: https://github.com/vinci1it2000/formulas/tree/master/test

I use python 3.9.16 and installed the missing required packages (dill, schedula, formulas[all])

I appended the following lines at the end of test_excel.py: [...] if __name__ == '__main__': print("hello world") t = TestExcelModel() t.setUp() t.test_excel_model() exit()

and receive the following error message: ``` hello world

[info] test_excel_model: Loading excel-model. Traceback (most recent call last): File "[...]\PycharmProjects[...]\test_excel.py", line 285, in t.test_excel_model() File "[...]\PycharmProjects[...]\test_excel.py", line 112, in test_excel_model xl_mdl.loads(self.filename) File "[...]\PycharmProjects[...]\venv\lib\site-packages\formulas\excel__init__.py", line 108, in loads self.load(filename) File "[...]\PycharmProjects[...]\venv\lib\site-packages\formulas\excel__init__.py", line 112, in load book, context = self.add_book(filename) File "[...]\PycharmProjects[...]\venv\lib\site-packages\formulas\excel__init__.py", line 188, in add_book data['references'] = self.add_references(book, context=context) File "[...]\PycharmProjects[...]\venv\lib\site-packages\formulas\excel__init__.py", line 93, in add_references for n in book.defined_names.definedName: AttributeError: 'DefinedNameDict' object has no attribute 'definedName'

Process finished with exit code 1 ```

Expected behavior The test_excel.py provided here with the repository is expected to work under Python 3.9 with all required packages installed.

Desktop: - OS: Windows 11 - Python 3.9.16

Regex doesn't match when sheet has a double quotation in its name (").

opened on 2023-02-01 16:27:23 by jonathanpedersen

Describe the bug I have a sheet title with the word "Paper" in it. When I load it using the follow code...

xl_model = formulas.ExcelModel().loads(filepath)

it fails with the error...

for k, v in _re_range.match(ref).groupdict().items(): AttributeError: 'NoneType' object has no attribute 'groupdict'

Suggesting that the _re_range.match(ref) returns None, suggesting the regex doesn't match.

When I remove the double quotations from the sheet's title, it works flawlessly.

To Reproduce I would like to avoid this if possible, as the excel spreadsheet contains sensitive data that would be hard to remove.

Expected behavior Normal loading.

Desktop (please complete the following information): - OS: macOS Monterey - Version 12.2

Additional context Add any other context about the problem here.

added function NORMSDIST + passing test

opened on 2022-10-20 21:58:40 by codinguncut

Status

READY

Description

Added function NORMSDIST + passing test

Possible to implement TEXT() and VALUE() functions?

opened on 2022-09-28 06:01:18 by viguice

I have the following function '=TEXT(VALUE(A1),"00")' where A1 can be a number or a text. The result is always a text using the correct formatting: A1 -> Result 1 -> 01 10 ->10 '5 -> 05 test -> #VALUE!

The value and text functions don't work as I get #NAME? value for this formula. Can you add them?

Thanks a lot,

CΓ©dric

Update operand.py

opened on 2022-09-16 12:49:16 by tyrbonit

fix formula error, if using named range with backslash in name, for example '=A1 + _1234\5678\90'

Status

READY/IN DEVELOPMENT/HOLD

Description

A few sentences describing the overall goals of the pull request's commits.

Related Issues

List related Issues against other branches:

Issue | Description ----- | ------------ link | ... link | ...

Todos

  • [ ] Tests
  • [ ] Documentation

Steps to Test or Reproduce

Outline the steps to test or reproduce the PR here.

sh git pull --prune git checkout <feature_branch> python ...