The Global Dataverse Community Consortium

Supporting Dataverse Repositories Around the World

pyDataverse Working Group

Motivation and goals

pyDataverse is a well-known library and repository within the Global Dataverse Community Consortium that provides a broad range of features for interacting with Dataverse. It allows users to upload and download datasets and lets admins configure Dataverse, making it a one-stop shop for Datanauts. However, because its development and maintenance have recently stalled, some functionality has become outdated or even broken. Moreover, other libraries such as EasyDataverse and DVCLI have been introduced, offering more options but potentially creating confusion about which tool to use.

To address these concerns, the pyDataverse Re-Vamp working group was launched in late 2023. The working group aims to restore the library to a functional state, resolve issues that hinder functionality, and merge contributions that have gone stale. In addition, the group plans to incorporate concepts from other client libraries such as EasyDataverse and DVCLI into pyDataverse, improving its usability and focusing effort on a single Python library. The initiative also envisions new libraries and concepts that enable large language models to interact with Dataverse, for example by populating metadata from text. Lastly, to address the maintenance gaps common in open-source projects, the Re-Vamp initiative proposes tracking upstream changes to the native API through OpenAPI specifications and code generation. Ultimately, the Re-Vamp seeks to re-establish pyDataverse as an invaluable tool for Datanauts exploring the vast depths of the Dataverse.


New features, bug fixes, and use cases 💎

  • 11/30/23: Admin interface #166 by Brian Brock
  • 12/02/23: Add CI/CD pipeline and re-establish existing tests #167 by Jan Range
  • 12/28/23: Provide local testing functionality #172 by Jan Range
  • 01/31/24: OpenAPI code generation and comparison (repository) by Jan Range
  • 02/08/24: Requests via HTTPX #174 by Jan Range
  • 03/03/24: Asynchronous requests #175 by Jan Range
  • 04/11/24: Switch to pyproject.toml and poetry #180 by Jan Range
  • 04/11/24: Draft - Migrate documentation to mkdocs-material #181 by Jan Range
  • 04/18/24: Fix data access and redirects #182 by Jan Range

Roadmap 🗺️

We have set up a proposal for the roadmap of the pyDataverse Re-Vamp working group. The following list highlights each phase and its progress:

  • Finalize first version of Dataverse Action
  • Unit and integration tests workflow
  • Switch to Poetry
  • Publishing (PyPI) workflow
  • Publish to TestPyPI on merge to develop

Phase 2: Issues and PRs

  • POST request header fix → Breaks current version
  • Work through other issues and PRs

Phase 3: Merging/integrating other Python libraries

Phase 4: Core generation based on Swagger/OpenAPI

  • Conceptualization and planning
  • Implementation of core generation based on Swagger/OpenAPI
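The idea behind Phase 4 can be illustrated with a minimal sketch. It is not the working group's generator: the `SPEC` fragment, its `operationId` values, and every function name below are assumptions made up for illustration. The sketch only shows how an OpenAPI document could drive both the generation of client stubs and a comparison against what a client already implements.

```python
import re

# Hypothetical OpenAPI fragment; paths and operationIds are illustrative only.
SPEC = {
    "paths": {
        "/datasets/{id}": {"get": {"operationId": "getDataset"}},
        "/dataverses/{alias}": {
            "get": {"operationId": "getDataverse"},
            "delete": {"operationId": "deleteDataverse"},
        },
    }
}


def snake_case(name):
    """Convert a camelCase operationId into a Python-style function name."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def list_operations(spec):
    """Return sorted (function_name, HTTP_METHOD, path) tuples from the spec."""
    ops = []
    for path, methods in spec["paths"].items():
        for http_method, op in methods.items():
            ops.append((snake_case(op["operationId"]), http_method.upper(), path))
    return sorted(ops)


def render_stub(name, http_method, path):
    """Render source code for one stub that returns (method, url).

    A real client would send the request; returning the pair keeps the
    sketch free of network access.
    """
    params = re.findall(r"{(\w+)}", path)
    args = ", ".join(["base_url"] + params)
    return (
        f"def {name}({args}):\n"
        f'    """{http_method} {path}"""\n'
        f"    return ({http_method!r}, base_url + f{path!r})\n"
    )


def generate_client(spec):
    """Execute the rendered stubs and return them in a namespace dict.

    exec() is used for brevity; a real generator would write source files.
    """
    namespace = {}
    for op in list_operations(spec):
        exec(render_stub(*op), namespace)
    return namespace


def missing_operations(spec, implemented):
    """Names present in the spec but absent from an existing client."""
    return [name for name, _, _ in list_operations(spec) if name not in implemented]


client = generate_client(SPEC)
print(client["get_dataset"]("https://example.org/api", 42))
print(missing_operations(SPEC, {"get_dataset"}))
```

Regenerating the stubs whenever the upstream specification changes, and reporting the operations a hand-written client lacks, is one way a library could stay aligned with the native API with less manual maintenance.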

As we merge pull requests, we will update the “New features, bug fixes, and use cases” list above.

Milestones

We organize our work into milestones, one for each new version, to document the new functionality and fixes that have been contributed. The following is a list of our current milestones:

You can view PRs and issues that have already been closed by clicking on the closed tab.

Working group meetings

Everyone is welcome to join our meetings! We meet on Wednesdays at 2:00 PM UTC.

The WebEx link is https://unistuttgart.webex.com/unistuttgart/j.php?MTID=m322473ae7c744792437ce854422e52a3

Get in touch

We would love to hear your feedback on our goals and outputs, not just during meetings but also via chat.

Please join us in Zulip, linked from chat.dataverse.org.

Improving this website

Please feel free to open an issue or create a pull request at https://github.com/gdcc/py.gdcc.io