Python module for Dataverse Software (dataverse.org).

gdcc, updated 🕥 2022-12-10 19:43:14


pyDataverse

Project Status: Unsupported – The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

pyDataverse is a Python module for Dataverse. It helps you access the Dataverse APIs and manipulate, validate, import and export all Dataverse data types (Dataverse, Dataset, Datafile).

Find out more: Read the Docs

Issues

Add direct datafile upload support (directupload.py) to pyDataverse

opened on 2023-03-16 13:47:11 by cmbz

Background Support for direct upload of datafiles using Python is available via the following standalone script related to the Harvard Dataverse Repository: dataverse.harvard.edu/util/python/direct-upload/directupload.py

This script enables users to upload many datafiles and their associated metadata all at once before requesting reindexing, rather than calling the API for each file, which causes a system performance hit due to frequent reindexing.

Request & Rationale Incorporating this functionality into pyDataverse would benefit Dataverse API users and pyDataverse users at all installations who need to upload large numbers of datafiles.
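For orientation, the registration half of the direct-upload flow boils down to building a small JSON document per uploaded file and POSTing the batch once. A minimal sketch of building that payload; the field names follow the Dataverse direct-upload documentation, but verify them against your installation, and the storage identifiers and checksums below are invented examples:

```python
import json

def build_direct_upload_registration(storage_identifier, file_name, mime_type,
                                     checksum_value, checksum_type="MD5"):
    """Build the jsonData document that registers one directly uploaded file.

    Field names follow the Dataverse direct-upload guide; treat them as
    assumptions and double-check them for your server version.
    """
    return {
        "storageIdentifier": storage_identifier,  # e.g. "s3://bucket:objectid"
        "fileName": file_name,
        "mimeType": mime_type,
        "checksum": {"@type": checksum_type, "@value": checksum_value},
    }

# Register many files in one request body, so the server reindexes once,
# instead of one API call (and one reindex) per file.
files = [
    build_direct_upload_registration("s3://demo:obj1", "a.csv", "text/csv", "abc123"),
    build_direct_upload_registration("s3://demo:obj2", "b.csv", "text/csv", "def456"),
]
body = json.dumps(files)
```

On a live installation, `body` would then be sent to the dataset's file-registration endpoint with an API token.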

edit_dataset_metadata replace bug

opened on 2023-03-03 21:57:06 by arielg96

Even though I set "replace" = True, I get:

You may not add data to a field that already has data and does not allow multiples. Use is_replace=true to replace existing data.

I even printed out the params, with the result: {'replace': True}
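One way to narrow this down is to build the Native API request yourself and confirm the query string actually carries `replace=true`. A hedged sketch; the endpoint path and parameter name are taken from the Native API guide, and the base URL and DOI are placeholders:

```python
from urllib.parse import urlencode

def build_edit_metadata_url(base_url, pid, replace=True):
    """Native API URL for editing dataset metadata by persistent identifier.

    Path and query-parameter names follow the Dataverse Native API guide;
    verify them for your server version.
    """
    query = urlencode({"persistentId": pid, "replace": str(replace).lower()})
    return f"{base_url}/api/datasets/:persistentId/editMetadata?{query}"

url = build_edit_metadata_url("https://demo.dataverse.org", "doi:10.5072/FK2/EXAMPLE")
# The request itself (needs a real token; not executed here):
# requests.put(url, data=metadata_json, headers={"X-Dataverse-key": token})
```

If a direct `PUT` with `replace=true` succeeds where pyDataverse fails, the bug is in how the module forwards the flag.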

pyDataverse changes the JSON of a dataset too much on import.

opened on 2022-11-07 08:34:24 by sergejzr

Thank you for your contribution!

It's great that you want to contribute to pyDataverse.

First, start by reading the Bug reports, enhancement requests and other issues section.

Before we can start

Before moving on, please check some things first:

  • [x] Your issue may already be reported! Please search on the issue tracker before creating one.
  • [ ] Use our issue templates for bug reports and feature requests, if that's what you need.
  • [x] Are you running the expected version of pyDataverse? (check via pip freeze).
  • [ ] Is this something you can debug and fix? Send a pull request! Bug fixes and documentation fixes are welcome. For more information, see the Contributor Guide.
  • [ ] We as maintainers foster an open and welcoming environment. Be respectful, supportive and nice to each other! :)

Issue

[Explain the reason for your issue]

Dear pyDataverse developers, I am running pyDataverse to upload a dataset to our Dataverse installation. Our dataset contains custom metadata ("freeKeywordValue").

As recommended in the tutorial, I create the dataset with "ds.from_json(data_json)" (the example is at the end of this message). However, the upload fails, and when I check with "ds.get()", I see that the custom metadata fields are not there anymore. The server also rejects the input, because pyDataverse removes the entry "metadataLanguage": "en", which is required by Dataverse as stated in the Dataverse guide https://guides.dataverse.org/en/latest/api/native-api.html#id43.

If I use a simple curl request, my JSON is accepted perfectly and the dataset is inserted as expected.

The original JSON:

````
{
  "metadataLanguage": "en",
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {
            "value": "Youth in Austria 2005",
            "typeClass": "primitive",
            "multiple": false,
            "typeName": "title"
          },
          {
            "value": [
              {
                "authorName": {"value": "LastAuthor1, FirstAuthor1", "typeClass": "primitive", "multiple": false, "typeName": "authorName"},
                "authorAffiliation": {"value": "AuthorAffiliation1", "typeClass": "primitive", "multiple": false, "typeName": "authorAffiliation"}
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "author"
          },
          {
            "value": [
              {
                "datasetContactEmail": {"typeClass": "primitive", "multiple": false, "typeName": "datasetContactEmail", "value": "[email protected]"},
                "datasetContactName": {"typeClass": "primitive", "multiple": false, "typeName": "datasetContactName", "value": "LastContact1, FirstContact1"}
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "datasetContact"
          },
          {
            "value": [
              {
                "dsDescriptionValue": {"value": "DescriptionText", "multiple": false, "typeClass": "primitive", "typeName": "dsDescriptionValue"}
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "dsDescription"
          },
          {
            "value": [
              {
                "freeKeywordValue": {"value": "MyKeyword1", "multiple": false, "typeClass": "primitive", "typeName": "freeKeywordValue"}
              },
              {
                "freeKeywordValue": {"value": "MyKeyword2", "multiple": false, "typeClass": "primitive", "typeName": "freeKeywordValue"}
              }
            ],
            "typeClass": "compound",
            "multiple": true,
            "typeName": "freeKeyword"
          },
          {
            "value": ["Medicine, Health and Life Sciences"],
            "typeClass": "controlledVocabulary",
            "multiple": true,
            "typeName": "subject"
          }
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}
````

The output of ds.get():

````
{'citation_displayName': 'Citation Metadata',
 'title': 'Youth in Austria 2005',
 'author': [{'authorName': 'LastAuthor1, FirstAuthor1', 'authorAffiliation': 'AuthorAffiliation1'}],
 'datasetContact': [{'datasetContactEmail': '[email protected]', 'datasetContactName': 'LastContact1, FirstContact1'}],
 'dsDescription': [{'dsDescriptionValue': 'DescriptionText'}],
 'subject': ['Medicine, Health and Life Sciences']}
````

The incoming JSON in the server log:

````
{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "fields": [
          {"typeName": "subject", "multiple": true, "typeClass": "controlledVocabulary", "value": ["Medicine, Health and Life Sciences"]},
          {"typeName": "title", "multiple": false, "typeClass": "primitive", "value": "Youth in Austria 2005"},
          {"typeName": "author", "multiple": true, "typeClass": "compound", "value": [{"authorName": {"typeName": "authorName", "typeClass": "primitive", "multiple": false, "value": "LastAuthor1, FirstAuthor1"}, "authorAffiliation": {"typeName": "authorAffiliation", "typeClass": "primitive", "multiple": false, "value": "AuthorAffiliation1"}}]},
          {"typeName": "datasetContact", "multiple": true, "typeClass": "compound", "value": [{"datasetContactEmail": {"typeName": "datasetContactEmail", "typeClass": "primitive", "multiple": false, "value": "[email protected]"}, "datasetContactName": {"typeName": "datasetContactName", "typeClass": "primitive", "multiple": false, "value": "LastContact1, FirstContact1"}}]},
          {"typeName": "dsDescription", "multiple": true, "typeClass": "compound", "value": [{"dsDescriptionValue": {"typeName": "dsDescriptionValue", "typeClass": "primitive", "multiple": false, "value": "DescriptionText"}}]}
        ],
        "displayName": "Citation Metadata"
      }
    }
  }
}
````
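Until the model round-trips every key, one workaround consistent with this report is to bypass the Dataset model entirely and hand the original JSON string straight to the API, since create_dataset() accepts a raw metadata string. A minimal sketch; the collection alias and the commented-out upload call are illustrative:

```python
import json

# A tiny stand-in for the reporter's file (the real one holds the full
# citation block shown above):
original = {
    "metadataLanguage": "en",
    "datasetVersion": {"metadataBlocks": {"citation": {"fields": []}}},
}
data_json = json.dumps(original)

# No model import/export, so keys like "metadataLanguage" and the custom
# "freeKeyword" block survive. Hypothetical call, not executed here:
# resp = api.create_dataset("my_dataverse", data_json)

parsed = json.loads(data_json)
```

This matches the observation that a plain curl request with the same JSON succeeds.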

error: {'status': 'ERROR', 'message': 'Error parsing Json: incorrect typeClass for field kindOfData, should be controlledVocabulary'}

opened on 2022-10-27 11:56:36 by TutasiCSUC

Hi, recently our institution updated its Dataverse version to 5.11.1. Previously, the Kind of Data metadata field wasn't mandatory for our institution, but now it is, and it appears as a dropdown with some options.

I am trying to create a dataset with pyDataverse with this code:

```python
from pyDataverse.models import Dataset
from pyDataverse.utils import read_file

ds = Dataset()
ds_filename = "dataset.json"
ds.from_json(read_file(ds_filename))
ds.validate_json()
resp = api.create_dataset("pyDataverse_user-guide", ds.json())
resp.json()
```

But I get this error: {'status': 'ERROR', 'message': 'Error parsing Json: incorrect typeClass for field kindOfData, should be controlledVocabulary'}

I attached the JSON I am using.
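Given the error text, the server-side definition of kindOfData now expects a controlled vocabulary, so the corresponding entry in the dataset JSON must use typeClass "controlledVocabulary" and values that match the dropdown options. A fragment along these lines should parse; "Survey" is only an illustrative value, so substitute one of your installation's actual options:

```
{
  "typeName": "kindOfData",
  "multiple": true,
  "typeClass": "controlledVocabulary",
  "value": ["Survey"]
}
```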

How can I delete specific datafile

opened on 2022-10-27 05:27:52 by wybert

How can I delete a specific datafile? How can I replace a datafile?
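For orientation: replacing a file goes through the Native API (pyDataverse 0.3.0 added replace_datafile() for this), while deleting a published file goes through the SWORD API. A sketch of the two URLs involved; the paths follow the Dataverse API guides, so verify them for your server version, and the base URL and file ID are placeholders:

```python
def sword_delete_file_url(base_url, file_database_id):
    """SWORD API URL for deleting a datafile (DELETE with basic auth,
    API token as username). Path per the Dataverse SWORD guide; verify.
    """
    return (f"{base_url}/dvn/api/data-deposit/v1.1/swordv2/"
            f"edit-media/file/{file_database_id}")

def native_replace_file_url(base_url, file_database_id):
    """Native API URL for replacing a datafile via multipart upload.
    Path per the Dataverse Native API guide; verify.
    """
    return f"{base_url}/api/files/{file_database_id}/replace"

delete_url = sword_delete_file_url("https://demo.dataverse.org", 42)
replace_url = native_replace_file_url("https://demo.dataverse.org", 42)
```

Both calls need an API token and, for the SWORD deletion, the file's database ID rather than its DOI.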

Problems with edit_dataset_metadata with pyDataverse

opened on 2022-10-26 08:14:01 by TutasiCSUC

Hi, I'm trying to edit the metadata of an existing dataset, for example its title. My code is:

```python
from pyDataverse.api import NativeApi, DataAccessApi
from pyDataverse.models import Dataverse

base_url = ''
token = ''
api = NativeApi(base_url, token)
data_api = DataAccessApi(base_url, token)

DOI = " "
dataset = api.get_dataset(DOI)
dictmetadata = dataset.json()
dictmetadata['data']['latestVersion']['metadataBlocks']['citation']['fields'][0]['value'] = 'new title'

import json
jsonStr = json.dumps(dictmetadata)

api.edit_dataset_metadata(DOI, jsonStr, is_pid=True, replace=True, auth=True)
```

I get a [500] response and the title isn't changed. How can I fix it? And how would this work with a JSON file? Thanks.
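A plausible cause of the [500]: the code above passes the full get_dataset() response, including the outer data/latestVersion wrapper, while the underlying editMetadata endpoint expects only a small metadata fragment listing the fields to change. A sketch of the expected shape, taken from the Native API guide (treat the exact shape as an assumption to verify for your server version):

```python
import json

def title_update_fragment(new_title):
    """Build the partial metadata document the editMetadata endpoint expects:
    just the fields to change, not the whole dataset JSON returned by
    get_dataset().
    """
    return {"fields": [{"typeName": "title", "value": new_title}]}

json_str = json.dumps(title_update_fragment("new title"))
# Hypothetical call mirroring the snippet above (not executed here):
# api.edit_dataset_metadata(DOI, json_str, is_pid=True, replace=True, auth=True)
```

The same fragment works from a file: read it with json.load() and dump it back to a string before passing it to the API.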

Releases

0.3.1 2021-04-06 11:43:36

Small bugfix of #126.

For help or general questions please have a look in our Docs or email [email protected]

Bugs

  • Fix: missing topicClassVocabURI value in Dataset model (#126)

Thanks

Thanks to Karin Faktor for finding the bug.

PyDataverse is supported by AUSSDA and by funding as part of the Horizon2020 project SSHOC.

v0.3.0 - Ruth Wodak 2021-01-27 01:45:56

This release is a big change in many parts of the package. It adds new APIs, refactored models and lots of new documentation.

Overview of the most important changes:

  • Refactored data models: setters, getters, data validation, and JSON export and import
  • Export and import of metadata to/from pre-formatted CSV templates
  • Add User Guides, Use-Cases, a Contributor Guide and much more to the documentation
  • Add SWORD, Search, Metrics and Data Access APIs
  • Collect the complete data tree of a Dataverse with get_children()
  • Use JSON schemas for metadata validation (jsonschema required)
  • Updated Python requirements: Python>=3.6 (Python 2 no longer supported)
  • curl required only for update_datafile()
  • Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (#52)

Version 0.3.0 is named in honor of Ruth Wodak (Wikipedia), an Austrian linguist. Her work is mainly located in discourse studies, more specifically in critical discourse analysis, which looks at discourse as a form of social practice. She was awarded the Wittgenstein-Preis, the highest Austrian science award.

For help or general questions please have a look in our Docs or email [email protected]

Use-Cases

The new functionalities were developed with some specific use-cases in mind:

See more details in our Documentation.

Retrieve data structure and metadata from Dataverse instance (DevOps)

Collect all Dataverses, Datasets and Datafiles of a Dataverse instance, or just a part of it. The results can then be stored in JSON files, which can be used for testing purposes, like checking the completeness of data after a Dataverse upgrade or migration.

Upload and removal of test data (DevOps)

For testing, you often have to upload a collection of data and metadata, which should be removed after the test is finished. For this, we offer easy-to-use functionalities.

Import data from CSV templates (Data Scientist)

Importing lots of data from data sources outside Dataverse can be done with the CSV templates as a bridge. Fill the CSV templates with your data, by machine or by hand, and import them into pyDataverse for an easy mass upload via the Dataverse API.
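The CSV-to-dict conversion idea can be sketched with the standard library alone. This is a simplified, stdlib-only illustration, not pyDataverse's real read_csv_to_dict() (which lives in pyDataverse.utils): strip the template's `dv.` column prefix, turn boolean strings into booleans, and parse JSON cells.

```python
import csv
import io
import json

def read_csv_to_dicts(csv_text, prefix="dv."):
    """Convert CSV-template text into a list of metadata dicts.

    Simplified stand-in for pyDataverse's read_csv_to_dict(): drops the
    column prefix, converts "TRUE"/"FALSE" strings and loads JSON cells.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        clean = {}
        for key, value in row.items():
            if key.startswith(prefix):
                key = key[len(prefix):]
            if value in ("TRUE", "FALSE"):
                value = value == "TRUE"
            elif value and value.startswith(("[", "{")):
                value = json.loads(value)
            clean[key] = value
        rows.append(clean)
    return rows

data = read_csv_to_dicts('dv.title,dv.multiple\n"My Dataset",TRUE\n')
```

Each resulting dict can then be fed into a Dataset model (via set()) for upload.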

Bugs

  • Missing JSON schemas (#56)
  • Datafile metadata title (#50)
  • Error long_description_content_type (#4)

Features & Enhancements

API

Summary: Add other APIs alongside the Native API, and update the Native API.

  • add Data Access API:
      • get datafile(s) (get_datafile(), get_datafiles(), get_datafile_bundle())
      • request datafile access (request_access(), allow_access_request(), grant_file_access(), list_file_access_requests())
  • add Metrics API:
      • total(), past_days(), get_dataverses_by_subject(), get_dataverses_by_category(), get_datasets_by_subject(), get_datasets_by_data_location()
  • add SWORD API:
      • get_service_document()
  • add Search API:
      • search()
  • Native API:
      • Get all children data types of a Dataverse or a Dataset in a tree structure (get_children())
      • Convert a Dataverse ID to its alias (dataverse_id2alias())
      • Get contents of a Dataverse (Datasets, Dataverses) (get_dataverse_contents())
      • Get Dataverse assignments (get_dataverse_assignments())
      • Get Dataverse facets (get_dataverse_facets())
      • Edit Dataset metadata (edit_dataset_metadata()) (#19)
      • Destroy Dataset (destroy_dataset())
      • Dataset private URL functionalities (create_dataset_private_url(), get_dataset_private_url(), delete_dataset_private_url())
      • Get Dataset version(s) (get_dataset_versions(), get_dataset_version())
      • Get Dataset assignments (get_dataset_assignments())
      • Check if a Dataset is locked (get_dataset_lock())
      • Get Datafile metadata (get_datafiles_metadata())
      • Update Datafile metadata (update_datafile_metadata())
      • Redetect Datafile file type (redetect_file_type())
      • Restrict Datafile (restrict_datafile())
      • Reingest and uningest Datafiles (reingest_datafile(), uningest_datafile())
      • Datafile upload in native Python, no curl dependency anymore (upload_datafile())
      • Replace an existing Datafile (replace_datafile())
      • Roles functionalities (get_dataverse_roles(), create_role(), show_role(), delete_role())
      • API token functionalities (get_user_api_token_expiration_date(), recreate_user_api_token(), delete_user_api_token())
      • Get current user data (get_user()) (#59)
      • Get API Terms of Use (get_info_api_terms_of_use())
      • Add import of an existing Dataset in create_dataset() (#3)
  • Api class:
      • Set User-Agent for requests to pydataverse
      • Change authentication during request functions (get, post, delete, put): if an API token is passed, use it; if not, don't set it. No auth parameter used anymore.
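To make get_children() from the list above concrete: the real call needs a live installation, so it appears only as a comment below, and the helper just walks the documented tree shape. The key names in the sample tree are assumptions based on the documentation.

```python
def count_children(node):
    """Count all nodes in the tree structure returned by get_children().

    Assumes each node is a dict whose "children" key (when present) holds
    a list of child nodes, matching the documented tree format.
    """
    return 1 + sum(count_children(child) for child in node.get("children", []))

# Typical usage against a live installation (not executed here):
#   api = NativeApi(base_url, api_token)
#   tree = api.get_children("my_dataverse_alias",
#                           children_types=["dataverses", "datasets", "datafiles"])

sample_tree = {
    "dataverse_alias": "root", "type": "dataverse",
    "children": [
        {"pid": "doi:10.5072/FK2/AAA", "type": "dataset",
         "children": [{"datafile_id": 1, "type": "datafile"}]},
    ],
}
n = count_children(sample_tree)
```

A count like this is a quick completeness check after an upgrade or migration, as described in the use-cases above.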

Models

Summary: Refactoring of all models (Dataverse, Dataset, Datafile).

New methods:

  • from_json() imports JSON (like Dataverse's own JSON format) to pyDataverse models object
  • get() returns a dict of the pyDataverse models object
  • json() returns a JSON string (like Dataverse's own JSON format) of the pyDataverse models object. Mostly used for API uploads.
  • validate_data() validates a pyDataverse object with a JSON schema

Utils

  • Save list of metadata (Dataverses, Datasets or Datafiles) to a CSV file (write_dicts_as_csv()) (#11)
  • Walk through the data tree from get_children() and extract Dataverses, Datasets and Datafiles (dataverse_tree_walker())
  • Store the results from dataverse_tree_walker() in separate JSON files (save_tree_data())
  • Validate any data model dictionary (Dataverse, Dataset, Datafile) against a JSON schema (validate_data())
  • Clean strings (trim whitespace) (clean_string())
  • Create URLs from identifiers (create_dataverse_url(), create_dataset_url(), create_datafile_url())
  • Update read_csv_to_dict(): replace dv. prefix, load JSON cells and convert boolean cell strings
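The tree-walking utility can be sketched in a few lines. This is a simplified stand-in for dataverse_tree_walker(), not the real implementation; the node key names are assumptions based on the documented get_children() tree format.

```python
def tree_walker(node, dataverses=None, datasets=None, datafiles=None):
    """Split a get_children() tree into flat lists per data type.

    Simplified illustration of dataverse_tree_walker(); returns three lists
    (dataverses, datasets, datafiles) with the "children" key stripped.
    """
    dataverses = [] if dataverses is None else dataverses
    datasets = [] if datasets is None else datasets
    datafiles = [] if datafiles is None else datafiles
    bucket = {"dataverse": dataverses, "dataset": datasets, "datafile": datafiles}
    bucket[node["type"]].append({k: v for k, v in node.items() if k != "children"})
    for child in node.get("children", []):
        tree_walker(child, dataverses, datasets, datafiles)
    return dataverses, datasets, datafiles

dvs, dss, dfs = tree_walker({
    "type": "dataverse", "alias": "root",
    "children": [{"type": "dataset", "pid": "doi:10.5072/FK2/BBB",
                  "children": [{"type": "datafile", "datafile_id": 7}]}],
})
```

The three flat lists are exactly what save_tree_data() would then write to separate JSON files.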

Docs

Many new pages and tutorials.

Tests

  • Add tests for new functions
  • Re-factor existing tests
  • Create fixtures
  • Create test data

Miscellaneous

  • Add Python 3.8 support; remove Python 2.7, 3.4 and 3.5 (Python>=3.6 required now)
  • Add jsonschema as requirement
  • Add JSON schemas for Dataverse upload, Dataset upload, Datafile upload and DSpace to package
  • Add CSV templates for Dataverses, Datasets and Datafiles from pyDataverse_templates
  • Transfer pyDataverse to GDCC - the Global Dataverse Community Consortium (https://github.com/gdcc/pyDataverse/issues/52)
  • Improve code formatting: black, isort, pylint, mypy, pre-commit
  • Add pylint linter
  • Add mypy type checker
  • Add pre-commit for managing pre-commit hooks.
  • Add radon code metrics
  • Add GitHub templates (PR, issues, commit) (#57)
  • Re-structure requirements
  • Get DOI:10.5281/zenodo.4470151 for GitHub repository

Other

Thanks to Daniel Melichar (@dmelichar), Vyacheslav Tykhonov (Slava), GDCC, @ecowan, @BPeuch, @j-n-c and @ambhudia for their support for this release. Special thanks to the Pandas project for their great blueprint for the Contributor Guide.

PyDataverse is supported by funding as part of the Horizon2020 project SSHOC.

v0.2.1 2019-06-19 15:49:52

This release fixes a bug in the Dataset.dict() generation.

For help or general questions please have a look in our Docs or email [email protected]

Bug Fixes

  • FIXED: calling the attributes series, socialScienceNotes and targetSampleSize caused an error in Dataset.dict(), because the contained sub-values were stored directly in their own class attributes.

Contribute

To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!

The most important contribution you can make right now is to use the module. It would be great if you installed it, ran some code against your own Dataverse instance if possible, and gave feedback afterwards (contact).

About pyDataverse

pyDataverse includes a collection of functionalities to import, export and manipulate data and its metadata via the Dataverse API.

-- Greetz, Stefan Kasberger

v0.2.0 - Ida Pfeiffer 2019-06-18 03:34:03

This release adds functionalities to import, manipulate and export the metadata of Dataverses, Datasets and Datafiles.

Version 0.2.0 is named in honor of Ida Pfeiffer (Wikipedia), an Austrian traveler and travel book author. She went on several journeys around the world, where she collected plants, insects, mollusks, marine life and mineral specimens, and brought most of them back home to the Natural History Museum of Vienna.

For help or general questions please have a look in our Docs or email [email protected]

Features

  • add Dataverse API metadata functionalities:
  • set allowed attributes via a list of dict()
  • import of Dataverse and Dataset metadata from Dataverse API JSON
  • validity check of Dataverse, Dataset and Datafile attributes necessary for Dataverse API upload
  • export Dataverse, Dataset and Datafile attributes as dict() and JSON
  • export Dataverse and Dataset metadata JSON necessary for Dataverse API upload
  • tests for Dataverse, Dataset and Datafile
  • add PUT request and edit metadata request to Api() (PR #8)
  • read in CSV files and convert them to Dataverse-compatible dict()s for automatic import of datasets into a Dataset() object

Improvements

  • improved documentation: added docstrings where missing, cleaned them up and added examples
  • added PyPI test to tox.ini
  • added test fixtures for frequently used functions inside tests

Dependencies

  • fixed requests version: requests>=2.12.0 needed

Contribute

From the 18th to the 22nd of June 2019, pyDataverse's main developer Stefan Kasberger will be at the Dataverse Community Conference in Cambridge, MA to exchange with others about pyDataverse and develop it further. If you are interested and around, drop by and join us. If you cannot attend, you can connect with us via the Dataverse Chat.

To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!

The most important contribution you can make right now is to use the module. It would be great if you installed it, ran some code against your own Dataverse instance if possible, and gave feedback afterwards (contact).

Another way is to share this release with others who could be interested (e.g. retweet my tweet, or send an email).

About pyDataverse

pyDataverse includes a collection of functionalities to import, export and manipulate data and its metadata via the Dataverse API.

https://twitter.com/stefankasberger/status/1140832352517668864

Thanks to Ajax23 for the PR #8. Great contribution, and it's always amazing to see the idea of Open Source in action. :)

-- Greetz, Stefan Kasberger

v0.1.1 2019-05-28 15:49:37

This release is a quick bugfix. It adds requests to install_requires and updates the packaging and testing configuration.

For help or general questions please have a look in our Docs or email [email protected]

Bugfixes

  • https://github.com/AUSSDA/pyDataverse/issues/7: fix pip install error: add requests to the install_requires in setup.py

Improvements

  • cleaned setup.py
  • add badges to index.rst
  • cleaned tools/tests-requirements.txt
  • tox.ini: add python versions, add dist test, add pypitest test, clean up and re-structure configuration
  • update docs

Contribute

To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!

The most important contribution right now is simply to use the module. It would be great if you installed it, ran some code against your own Dataverse instance if possible, and gave feedback afterwards (contact).

About pyDataverse

pyDataverse includes the most basic data operations to import and export data via the Dataverse API. The functionality will be expanded in the coming weeks with more requests and a class-based data model for the metadata, which will make it easy to import and export metadata and upload it directly to the API.

Thanks to @moumenuisawe for mentioning this bug.

-- Greetz, Stefan Kasberger

v0.1.0 - Marietta Blau 2019-05-22 10:35:59

This release is the initial one of pyDataverse. It offers basic features to access the Dataverse API via Python: create, retrieve, publish and delete Dataverses, Datasets and Datafiles.

Version 0.1.0 is named in honor of Marietta Blau (Wikipedia), an Austrian researcher in the field of particle physics. In 1950, she was nominated for the Nobel Prize for her contributions.

For help or general questions please have a look in our Docs or email [email protected]

Features

  • api.py:
  • Make GET, POST and DELETE requests.
  • Create, retrieve, publish and delete Dataverses via the API
  • Create, retrieve, publish and delete Datasets via the API
  • Upload and retrieve Datafiles via the API
  • Retrieve server information and metadata via the API
  • utils.py: File IO and data conversion functionalities to support API operations
  • exceptions.py: Custom exceptions
  • tests/*.py: Tests with test data in pytest, run with tox on Travis CI.
  • Documentation with Sphinx, published on Read the Docs
  • Package on PyPI
  • Open Source (MIT)

Contribute

To find out how you can contribute, please have a look at the Contributor Guide. No contribution is too small!

The most important contribution right now is simply to use the module. It would be great if you installed it, ran some code against your own Dataverse instance if possible, and gave feedback afterwards (contact).

Another way is to share this release with others who could be interested (e.g. retweet my tweet, or send an email).

About pyDataverse

pyDataverse includes the most basic data operations to import and export data via the Dataverse API. The functionality will be expanded in the coming weeks with more requests and a class-based data model for the metadata, which will make it easy to import and export metadata and upload it directly to the API.

Thanks to dataverse-client-python, for being the main orientation and input for the start of pyDataverse. Also thanks to @kaczmirek, @pdurbin, @djbrooke and @4tikhonov for their support on this.

-- Greetz, Stefan Kasberger

Global Dataverse Community Consortium

GDCC uses GitHub to coordinate community contributions to Dataverse and to manage the development of software and documentation that extends or interacts with Dataverse.

GitHub Repository Homepage

Topics: dataverse, api, python