Set of tools to handle raster and vector data sets in Python.
This package offers Python classes and functions as well as command line tools to work with both geospatial raster and vector datasets. It is built upon rasterio and GeoPandas. In a single command it can import any geo-referenced dataset that is understood by these libraries, complete with all geo-referencing information, various helper functions and interface between vector/raster data.
bash
conda install --channel conda-forge --strict-channel-priority geoutils
The --strict-channel-priority
flag seems essential for Windows installs to function correctly, and is recommended for UNIX-based systems as well.
From PyPI:
bash
pip install geoutils
Or from the repository tarball: make sure GDAL and PROJ are properly installed, then:
bash
pip install https://github.com/GlacioHack/geoutils/tarball/main
See the full documentation at https://geoutils.readthedocs.io.
GeoUtils is composed of three libraries:
- georaster.py
to handle raster data set. In particular, a Raster class to load a raster file along with metadata.
- geovector.py
to handle vector data set. In particular, a Vector class to load a raster file along with metadata.
- projtools.py
with various tools around projections.
You can find ways to improve the libraries in the issues section. All contributions are welcome.
GlacioHack/geoutils:master
.Direct pushing to the GlacioHack repository is not permitted.
A more detailed contribution instruction can be found here.
Ideas based on Astropy and their amazing NDData class to work in a World CRS with unit/uncertainty (we've been working quite a lot on the error subject also! :stuck_out_tongue:): https://github.com/astropy/astropy/blob/main/astropy/nddata/nddata.py https://github.com/astropy/astropy/blob/main/astropy/nddata/nduncertainty.py
We could add something similar to GeoUtils: the unit
could be derived from the CRS, the uncertainty
would be optional, but propagate naturally if present!
We could have the same structure of uncertainty and same propagation algorithms (simplified by being in 2D)! :smile:
Work in progress in NumPy to add __array_ufunc__
and __array_function__
to np.ma
, after which we'll be able to integrate that much more easily in GeoUtils:
https://github.com/numpy/numpy/pull/22914
https://github.com/numpy/numpy/pull/22913
See #356 This specific case is not fixed after #354.
Describe the bug There seems to be a diagonal ~ half-pixel shift between the DEM.show() displays and the DEM.value_at_coord() output.
To Reproduce
import xdem
img = xdem.DEM(gu.examples.get_path('everest_landsat_b4'))
target_x, target_y = 486811, 3097050 # example of coordinate
fig,ax=plt.subplots(1,1)
img.show(vmin=50, vmax=200, ax=ax)
ax.set_xlim(target_x-60,target_x+60); ax.set_ylim(target_y-60,target_y+60) # zoom on the area
ax.scatter(target_x, target_y,c='r') # plot the desired coordinate on the image plot
ax.set_title('at ({}, {}), dem.value_at_coords={}'.format(target_x, target_y, np.round(img.value_at_coords(target_x, target_y), 2))) # prints as figure title the value returned by value_at_coords()
ax.plot([486805, 486805, 486805+30, 486805+30, 486805],[3097056,3097056-30, 3097056-30, 3097056, 3097056]) # if coord falls in this blue polygon
ax.plot([486820, 486820, 486820+30, 486820+30, 486820],[3097040,3097040-30, 3097040-30, 3097040, 3097040]) # it returns the value of this orange one
Expected behavior The value_at_coord() value does not return same value as the plot most of the time, I expect that the value_at_coords() (169 in the example, value of the orange box pixel) matches the show() pixels (131 in the example, under the red point).
System (please complete the following information):
- OS: Ubuntu 22
- Environment : xdem (0.0.7, 823fff239daa864748677e4d6bcb6bd153c15969 commit), geoutils (0.0.10)
It looks like we are missing these functions:
- [x] Vector.save()
- [x] Vector.show()
- [x] Vector.__str__()
- [x] Vector.__repr__()
And these properties are not defined as such for the API:
- [x] Vector.ds
,
- [x] All of additional SatelliteImage
attributes,
- [x] Raster.crs
and Raster.transform
.
Also, looking at our geospatial tools, looks like we could add:
- [x] Vector.simplify()
to simplify/degrade resolution of a Vector
.
- [x] Make indexing []
possible with boolean ndarrays
.
- [x] Allow to pass a Mask
in Raster.set_mask()
,
- [x] Make "in_value" and "target_value" attribute name consistent between polygonize
and proximity
.
- [x] Make equal_raster
and georeferenced_grid_equal
naming more consistent (start with equal for both?)
- [x] Automatically load data when calling arithmetic, logical or NumPy operation? (remove some load_data=True
in the docs due to this)
- [x] Make show() plot on top of same ax by default.
- [x] Add clip
option to Vector.crop()
.
- [x] A subsample
wrapper function directly in the Raster
class (allow to return coordinates).
- [x] Ensure support of Ufuncs through np.ufunc.reduce
and other similar operations,
- [x] Rename cropGeom
into crop_geom
for consistency with other argument syntax,
- [x] Streamline main description of Raster
and Vector
methods and attributes,
- [x] Add tests for Mask
logical operations and overloaded methods,
- [x] Change nodata
from list
to tuple
(needs to be immutable as a property),
- [x] Add tests for untested functions (Raster.shift()
, Raster.to_xarray()
, etc),
- [x] Improve general test coverage, in particular checking warnings raised directly by GeoUtils.
- [x] Added as_array
option to create_mask
, as it nows returns a Mask
by default. The shape is also always 2D to avoid the squeeze
previously used everywhere.
- [x] Fixed loading multi-band raster explicitly after instantiation with load()
, failed because of how out_shape
was defined.
Bug found:
- [x] Maximum recursion depth calling np.ma.mean
on a Raster
.
- [x] In doc, array interface: isclose
is returning a masked_array
instead of Mask
?
- [x] Cast from Mask
to Raster
does not work in Mask.proximity
, need to modify Raster.from_array
?
- [x] Fixed coords(grid=True)
: add more tests
Changed (need to add tests in the test suite):
- [x] Defined Raster.__repr__
, modified __str__
to be consistent with NumPy,
- [x] Set load_data=False
as default,
- [x] Loading data by default if stats=True
is passed to Raster.info()
,
- [x] Cast array to float automatically in get_nanarray()
, otherwise was raising error,
- [x] Added overload for proximity
and polygonize
for Mask
,
- [x] Changed default parameter of polygonize
to "all"
.
To finish or move to another issue:
- [ ] Polygonizing all values on the Landsat_B4 example creates some LineString inside polygon?
- [ ] Raster.resample
to change only .res
,
- [ ] Homogenize array shape input of interp_points
and value_at_coords
, and variable names.
- [ ] Make output shape consistent for interp_points
and others: if singe-coordinate float
is passed, return the same. Otherwise, if ArrayLike
is passed, return ArrayLike
.
- [ ] A Vector
random point generation in an outline, can pass a Raster
object as match-reference to get a random point in its extent.
- [ ] Simply input of function dealing with points: accept a Vector
of type point, accept any shape of input list. Write a reshaping function that does checks and reshaping smoothly somewhere, and use it everywhere?
- [ ] Nodata plotting for GeoDataFrame (see regular grid interpolation example).
- [ ] Change name of value_at_coords
to value_at_window
? Add mask options for window (disk?)
- [ ] Overload Mask.show
to only a discontinuous, 2-color colorbar,
- [ ] Looks like Raster.to_points
is erroneous, fix for Warp gallery example (to show the grid on first plot)
- [ ] Allow consistently a single shape
or res
argument for Vector rasterizing methods (rasterize
, create_mask
, proximity
).
This PR writes the documentation for GeoUtils! :partying_face:
It also:
- Streamlines several features and behaviours to better fit the objective of the package,
- Re-organizes the structure of several modules,
- Homogenizes the syntax of the docstrings,
- Interfaces GeoPandas/Shapely functionalities in Vector
,
- Adds many tests, solving several bugs and bringing test coverage to more than 90% (was 75% previously).
This PR is branched from #350 on the Mask
addition (now closed and everything is in this PR, as some Mask
fixes started to depend on new behaviours only available here).
The hidden branch with the rendered documentation is here: https://geoutils-rhugonnet.readthedocs.io/en/docs/index.html
Regarding theme and rendering, this PR:
- Implements the sphinx-book-theme (same as OGGM and many other packages, it's a sub-theme of the PyData theme, used by NumPy, Pandas, SciPy, etc). Good news also: the dark theme PR has been merged in their repo, we'll have it available at their next version release (EDIT: It's released! I just need to update the logo files to support the Dark theme :D; SECOND EDIT: Done!),
- Switches to MyST-parser which is an extension of Markdown: partially to avoid re-learning another language with rST, but mostly because it is getting popular thanks to its many functionalities. See next point.
- Switches to MyST-NB that allows to add Jupyter code-cells to Markdown to render code in the documentation, and easily switch to interactive mode with a Binder. It allows to write Python code in Markdown, or to write Markdown in Jupyter notebook. Both will be rendered the same! Older packages (and even more recent ones like Xarray) had to rely on the IPython sphinx extension, which was really a hassle (multi-line is hard, few rendering options, etc, for example here: https://docs.xarray.dev/en/stable/user-guide/indexing.html#positional-indexing).
- Removes the /doc/code
folder and the literalinclude:: code/coregistration.py :lines: 5-21
we were using in xDEM, which was a hassle to maintain, thanks to the previous point. Proper errors should be raised by Sphinx or can be tested in the CI directly (we can run the .md
files as a Jupyter notebook).
- Switches from pip
to mambaforge
for the Readthedocs in .readthedocs.yaml
, which was failing and slow. Also adds pip -e .
to dev-environment.yml
to simplify dev installation of the package.
- Adds sphinx-design
for rendering nice tables in the documentation.
- Soon, the sphinx inheritance diagrams will support intersphinx, so the diagram I created that shows inheritance from GeoUtils to xDEM will finally work (right now it only renders locally). See https://github.com/sphinx-doc/sphinx/pull/10614.
- Adds the sphinx_gallery_conf
option remove_config_comments = True
to remove config comments in gallery examples, such as for choosing thumbnails: # sphinx_gallery_thumbnail_number = 2
,
- Adds a post-build sphinx routine to remove myraster.tif
and myvector.gpkg
created during the "Open/Save" examples by Sphinx-Gallery, as advised here: https://github.com/sphinx-gallery/sphinx-gallery/issues/361.
Regarding the contents of the docs, I let you have a look there directly! In short: - Structure recommended for writing docs (Getting started/Features/Examples/References), - An inviting landing page, with a logo :stuck_out_tongue: and grid-cards shortcuts, - The classic "About/Install/Quick start" section, the first one important to make clear our place in the landscape of geospatial packages, - An explanation of the core concepts in "Fundamentals", key to understand the behaviour of the package (and its added value), with small embedded examples, - A descriptive documentation in "Rasters", "Vectors", with small embedded examples as well, - A full gallery of examples separated in "I/O", "Handling" and "Analysis", with 20 short gallery examples. - An API declined by section to make it more user-friendly, so semi-automatic (how it's done in NumPy, SciPy, Xarray, etc) where we need to specify any new function we add.
This PR:
- Moves satimg
to georaster/
,
- Renames georaster.py
and geovector.py
into raster.py
and vector.py
(the "geo" already being in geoutils.raster
, as previously discussed),
- Re-organizes spatial_tools
into raster/array.py
, raster/multiraster.py
and raster/sampling.py
,
- Moves geoviewer.py
to a new bin/
folder.
This PR:
- Homogenize argument names across functions that use similar arguments,
- Removes camel-case (e.g., cropGeom
renamed into crop_geom
),
- Streamlines docstring names: a "raster" or "vector" is used to designate a Raster
or Vector
object (like an "array" is used for a np.array
), all docstrings start without article (Raster to do that),
- Adds many missing docstrings for class properties.
This PR:
- Adds crs
and transform
as properties to Raster
, with a data.setter
, to ensure those are consistent and listed as properties in the API.
- Adds ds
, crs
and bounds
as properties to Vector
, also for consistency and API,
- Adds datetime
, etc... as properties to SatelliteImage
, also for consistency and API,
- Adds "Is loaded?" into Raster.info()
,
- Adds __repr__
and modifies __str__
for Raster
and Vector
to be consistent with what is done in NumPy and GeoPandas, those now only prints "not_loaded" when the data is unloaded,
- Adds _repr_html_
in Raster
and Vector
for representing rasters and vectors more stylishly in HTML,
- Adds a copy_doc
decorator in misc.py
to automaticallt generated the docstring of wrapped GeoPandas function (their __doc__
) by replacing geopandas.GeoSeries
or geopandas.GeoDataFrame
by Vector
, and adding an intersphinx link to the GeoPandas API in our doc,
- Adds Vector.save
as a wrapper around GeoDataFrame.to_file
for consistency with Raster
,
- Adds Vector.show
as a wrapper around GeoDataFrame.plot
for consistency with Raster
.
- Makes Raster.show
and Vector.show
recognize and respect the first axis, to be able to plot everything on top by simply doing raster.show(); vector.show()
. Axes can still be passed explicitly through ax=
. To plot on a new axis, one can add a plt.figure()
or simply do rst.show(ax="new")
.
Mask
classSee #350, and the documentation. This PR adds tests, fixes behaviour of logical functions and adds overridden reproject
, crop
, polygonize
and proximity
to Mask
.
Notable changes for Raster
include:
- Renames __eq__
to raster_equal
for consistency with NumPy, __eq__
is now used by logical bitwise operations (see documentation and #350) and casts a Mask
,
- Renames equal_georeferenced_grid
to georeferenced_grid_equal
for naming consistency.
Vector
This PR:
- Wraps all geometric functionalities of the geopandas.base.GeoPandasBase
non-public class (inherited by both GeoSeries
and GeoDataFrame
for geometric functionalities) and the geopandas.GeoDataFrame
class to Vector
. Everything can be called using Vector
, for example Vector.unary_union
. For the output, three cases: 1/ returns a Vector
when it is a geometric output, 2/ proposes to append a non-geometric per-feature output (pandas.Series
) to Vector.ds
, and 3/ when it's another type of output, it returns that output directly. Why is this useful? Practicality and consistency: it can be a hassle to work with GeoPandas sometimes as a "geometric" operation on a GeoDataFrame
can return a GeoDataFrame
, a GeoSeries
or a shapely.Geometry
. And "non-geometric per-feature" functions like .area
or .length
return a pandas.Series
of the same length. While GeoPandas' team is still improving a couple things upstream, many behaviours are actually intentional and fixed to maintain a lower-level API for all types of user. End-users focusing on analysis like us don't need this, and it slows us down, so it's quite practical to have everything logically and automatically re-cast to a GeoDataFrame
in the Vector
, or appended to it! :smile:
- Adds many tests to check all functionalities behave as intended, that all GeoPandas functions are covered (with a couple exceptions), that all arguments are still up-to-date, as well as their default values. Inheritance is out of the question as it would prevent us from re-encapsulating the output (so we'd have to override everything the same, anyway). This is how GeoPandas wrap Shapely on their side, also. Right now, we don't wrap the Pandas functionalities, those have to be called through Vector.ds
directly.
- Automatically fetches the documentation from GeoPandas with a renaming for Vector
for the basic description, and points to GeoPandas's API for details, to ensure a minimal maintenance effort.
This PR:
- Changes the default load_data
of Raster.__init__
to False
, and adds a condition to run .load()
when .data
is called (if data is not already loaded). This is, in short, implicit loading! :smile: Adds tests to check this functionality, and modifies a couple things that were breaking this behaviour in .load()
and __init__
. Now implicit loading also works properly with the downsample
and indexes
arguments! Also fixes the behaviour for multi-band rasters, and adds tests,
- Changes default in_valiue
of polygonize
to "all"
, as Mask.polygonize()
now automatically overrides Raster.polygonize
for a boolean input,
- Makes indexing __getitem__
(i.e., []) possible with boolean np.ndarray
and not only Mask
,
- Adds index assignment __setitem
(i.e., [] = x) to Raster
,
- Adds a clip
argument to Vector.crop
, to optionally clip the geometries to the extent,
- Adds a Raster.subsample
function to perform subsampling directly from the raster,
- Adds support for universal function methods through the array interface of Raster
(e.g., reduce
as in np.logical_or.reduce()
),
- Adds Vector.from_bounds_projected
to automatically create a polygon vector in any CRS from the bounds of Raster
or Vector
,
- Adds as_array
argument to Vector.create_mask
to return a boolean np.ndarray (returns a Mask
by default).
This PR:
- Fixes value_at_coords
on several bands and adds tests,
- Fixes coords
that returned a wrong output with its default grid=True
argument and adds tests,
- Fixes a bug in get_nanarray
to run filled(np.nan)
when original data was not of float type and adds tests,
- Fixes a maximum recursion depth error when trying to run np.ma
functions on Raster
due to __array__
(not supported by NumPy, they don't have an array interface for masked-array functions),
- Fixes bugs to geoviewer.py
and adds tests,
- Fixes mutability of Raster.nodata
by defining it as tuple
instead of list
,
- Fixes many untested functions and adds tests: Raster.to_xarray
, Raster.shift
, misc.diff_yml
, etc...
This PR: - Resolves #247, - Resolves #225, - Resolves #143, - Resolves #311, - Resolves #260, - Resolves #122, - Resolves #81, - Resolves #241, - Resolves #355, - Resolves #352, - Resolves #101, - Resolves #338, - Resolves #337.
Added proximity
class method for Raster
and Vector
matching GDAL's.
Added subsetting (crop
) of Raster
and Vector
through []
(__getitem__
) and homogenized crop
functionalities.
Added buffer_metric
method in Vector
class.
Optimized CI runs with caching.
Several bug fixes.
deprecate
version check by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/320pre-commit
by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/325pip
dependency listing in CI by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/328SatelliteImage
class by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/334Raster
or Vector
by bracket call []
, consistency and bug fixes by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/335proximity
and buffer_metric
functionalities with metric
option based on finding local UTM by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/336Full Changelog: https://github.com/GlacioHack/geoutils/compare/v0.0.9...v0.0.10
In summary, the major changes in this release are:
Raster
MemoryFile and fixes on all resulting issues,Raster
, including arithmetic operations,data.setter
and nodata.setter
of Raster
with more intuitive behaviour,Full list:
geoutils
by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/283geoutils-data
including an additional float32
raster with NaNs by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/284HOW_TO_RELEASE
by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/288environment.yml
can import geoutils, and use diff with dev-environment.yml
during CI by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/297yaml
method for diff calculation of environment.yml
and dev-env.yml
by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/301data.setter
of Raster
for consistent array output by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/300Raster
for consistency with np.ma.masked_array
operations by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/302get_nanarray
method by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/308set_nodata
behaviour, improve related warnings and tests by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/309__eq__
, make arithmetic operations fully consistent with masked arrays, deprecate gu.misc.array_equal
and bug fixes by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/313misc.test_deprecate
by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/315__eq__
docstring by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/317Full Changelog: https://github.com/GlacioHack/geoutils/compare/v0.0.8...v0.0.9
Exactly the same as v0.0.7. Only created a new release so that it can be published on PyPI.
For Raster class:
* Add a masking functionality set_mask
by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/271
* Make reproject work even with data not loaded by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/274
* Set default resampling to bilinear instead of nearest in reproject by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/275
Full Changelog: https://github.com/GlacioHack/GeoUtils/compare/v0.0.6...v0.0.7
Minor bugfixes
Full Changelog: https://github.com/GlacioHack/GeoUtils/compare/v0.0.4...v0.0.5
geopandas geospatial-processing gis python raster-data rasterio vector-data masking polygonization rasterization reprojection