Knitting Data Communities

The focus of this manifesto is a group of overlapping communities sharing data about the natural environment, although its aims, workflows and technologies will be held in common with other kinds of inclusive data communities. Our communities comprise amateur and citizen scientists, professional naturalists, academics, students, teachers, policymakers, Indigenous peoples and other kinds of stakeholders.

We Will Knit

  1. Communities with diverging models for ownership of knowledge – For example, some Indigenous knowledge holders will consider that certain knowledge should be restricted to certain community members, and may even have a model for reclaiming knowledge once it has been sufficiently used. Some biological data will arise from commissioned surveys and hence be commercially sensitive, whereas some aspects of the same dataset may be less sensitive, especially if aggregated.
  2. Communities with diverging ontologies and practices surrounding data – For example, curators of biological samples held in institutions will have slow, long cycles for updating the knowledge relating to them, and may prefer to continue referring them to the taxonomy in place when they were collected as opposed to the current professional consensus. Citizen scientists may prefer to use common names taken from their locale, whereas specialists in some taxonomic or geographical areas may have preferred taxonomies more recent than those in any global repository, and want to retain fields in the data that don't conform to any in public schemas.
  3. Communities with diverging timescales and focus of interest – For example, the interests of academics and students can be cyclic, making it hard to bank and build on contributions over long periods, and to gear them to more long-lived communities of interest.
  4. Different notions of the role of data and its connections to the stories which surround it – For example, a policymaker may be interested in the story that data tells about the results of a change of use of a habitat; a scientist may be interested in the story of how a community was shaped by its geological history; a local community may be interested in the history of species that are special to their culture.
  5. Communities at different scales with different levels of power – Whilst giving due respect to powerful, central platforms and their schemas, and integrating harmoniously with them wherever possible, we give smaller, peripheral communities the tools and power to orbit at a safe distance, sharing and consolidating their common approaches to their necessary artful integrations.

Themes

Going where people are

We emphasise helping different communities work with data in formats they are comfortable with, using tools they are familiar with, even if these differ from professional norms in other communities. An example of this is the Maxwell Creek Watershed Project, undertaken with Transition Salt Spring, which empowers naturalists with moderate technical skills to edit documents in a simply structured R Markdown format. The web output emitted by the R HTMLWidgets system is then reknitted into an interactive scrollytelling interface. These documents are hosted on GitHub Pages, where any member may easily update and republish their documents without needing to consult a technician.

This reknitting approach is consonant with the notion of reknitting as promoted by Amy Twigger Holroyd and others, described in chapter 5 of her book Folk Fashion. In this approach, a "readymade" data pipeline, which meets many but not all of the needs of a community, is "unpicked" at the final stage, ready to be reknitted into a more interactive, domain-appropriate interface.

Other examples of working with vernacular technologies might involve fitting domain-specific interface extensions into widespread data tools such as Microsoft Excel or Google Sheets, and maintaining ecologies of communities cooperating on the basis of these. Other data science communities will centre on scripts written in Python, hold their data in Jupyter notebooks, SQL databases or in centralised communities such as iNaturalist. In each case we try to exploit existing pathways for knitting together these communities as far as they can take us – perhaps needing to "unravel" or back up some pieces of surplus work that they do, in order to go the last mile to our communities.

Working with the real, overlapping structure of communities

Colin Clark and Ethan Winn have recently written on Mesh Cooperativism - Toward Mycorrhizal (Infra)structure for the Cooperative Movement (Slides), making an analogy between the interpenetrating structure of fungal filaments (Mycorrhiza) connecting communities in overlapping groups of a range of sizes, and the goals of a Mesh Cooperativism movement, seeking to empower emerging cooperative networks through a combination of shared infrastructure and interoperable practice.

An example of this is the Data Communities for Inclusion project described below. These overlapping structures are more widespread than is typically acknowledged in technical systems, which prefer to force data into centrally managed repositories such as GBIF and ontologies such as Darwin Core in order to smooth interoperability.

Entangled Artifacts: The Meeting Between a Volunteer-run Citizen Science Project and a Biodiversity Data Platform (Tchernavskij & Bødker, 2022) takes up this theme in the context of the ecology of artifacts and working practices centring on the interaction between iNaturalist, a large centralist platform, and Biodiversity Galiano, a volunteer-run citizen science initiative. It examines the design tensions which emerge around their patterns of employing common interactive objects towards sometimes conflicting goals. The authors appeal to the notion of artful integrations, due to Suchman (2002), to describe the contrivances that a typically more peripheral community puts together to deal with the artefacts produced by a more central community, in order to put them to purposes which are locally useful.

Empowering communities to own their infrastructure

We help communities to navigate the space of complex, proprietary solutions with high installation and running costs, by weaving through them with simply structured data and easily redeployable, often static packages. An example of this is the embedding of visualisations from the IMERSS portal in infrastructures with varying technology, for example the WordPress blogs of BioGaliano and the Weebly sites of the Valdes Island Conservancy. By relying as little as possible on server structures and making as much use as possible of public infrastructure such as GitHub and Google Workspace products, we hope to ease mobility to many varied hosts such as the PHP-based Mukurtu, a highly successful platform for the hosting of knowledge held by Indigenous communities. We have written about the different categories of ownership communities may wish to enjoy, and the barriers that stand in their way, in What Lies in the Path of the Revolution (Basman, Tchernavskij, 2018).

Some technical aspects of ownable infrastructure are the focus of the Malleable Systems Collective, following the core principle that it should be as easy to change software as it is to use it.

Communities we work with

IMERSS

Through collaboration with the Institute for Multidisciplinary Ecological Research in the Salish Sea, we have produced an ecocultural mapping pilot incorporating data from Indigenous and non-Indigenous community members on cultural values, written and spoken Hul'qumi'num names of regions and species, as well as ecological community boundaries exported from the QGIS geographic information system, fused together into a single interactive system. Xetthecum is a small area on the southwest coast of Galiano Island, also known as Retreat Cove. A YouTube video presented by Dana Ayotte of the Inclusive Design Research Centre has been published about the Ecocultural Mapping Tool and was presented at the April 2022 Indigeverse conference.

In this case, the reknitting workflow takes the form of hitching a ride on the qgis2web plugin of the popular open source QGIS geographical information system, which automates a great deal of the legwork of converting desktop-scale map data into formats and coordinate systems suitable to be displayed on the web. The plugin generates a basic Leaflet map application from which we discard most of the packaging, and reparse/reknit the core files into our interactive visualisation.
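The reparsing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes that each exported layer file is a JavaScript file containing a single assignment of a GeoJSON object to a variable, and the variable name, the sample feature and the helper `extract_geojson` are all invented for the example.

```python
import json
import re

# Assumed shape of one exported layer file: a single JavaScript assignment
# whose right-hand side is a GeoJSON FeatureCollection. The variable name,
# feature name and coordinates are made up for illustration.
SAMPLE_LAYER_JS = """var json_Communities_1 = {
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Woodland"},
     "geometry": {"type": "Point", "coordinates": [-123.5, 48.9]}}
  ]
};"""

# Matches "var <name> = <payload>" with an optional trailing semicolon.
ASSIGNMENT = re.compile(r"^\s*var\s+(\w+)\s*=\s*(.*?);?\s*$", re.DOTALL)


def extract_geojson(js_source: str) -> tuple[str, dict]:
    """Return (variable name, parsed GeoJSON) from one layer file's text."""
    match = ASSIGNMENT.match(js_source)
    if match is None:
        raise ValueError("expected a single 'var name = {...}' assignment")
    name, payload = match.groups()
    # The payload must be strict JSON for this to succeed.
    return name, json.loads(payload)


name, layer = extract_geojson(SAMPLE_LAYER_JS)
feature_names = [f["properties"]["name"] for f in layer["features"]]
```

Once the layer data is held as plain GeoJSON, the generated application wrapper is no longer needed and the features can be handed to whatever interface the community prefers.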

Data Communities for Inclusion

The Data Communities for Inclusion project, sponsored by CIFAR and focusing on the Self-Employed Women's Association of India, is studying how digital technologies can be placed at the service of overlapping groups of self-organised women labourers, working in agriculture, handicrafts and services. Because of the lack of traditional hierarchical power and informational structures, and the lack of clear distinctions between the outsides and insides of groups, unique issues are raised around data ownership, responsibility and workflow.

WeCount

The WeCount project, an initiative of the IDRC, aims to create an inclusive data ecosystem by removing bias and exclusion in the data economy. Within WeCount is a demonstration of a pluralistic data infrastructure, which can be framed within the principles of data feminism (D'Ignazio, Klein, 2020), which seeks to challenge differentials of power resulting from relationships within data communities. Within this framework, data collection itself is critiqued in What we do with data: a performative critique of data 'collection' (Benjamin, 2021). Benjamin notes that "Data collection can therefore be a site of oppression and violence, a site in need of refusal and resistance, as well as a site for building communities". Benjamin instead promotes the notion of data compilation as an alternative framing, relying on notions of grouping and curation, rather than making the assumption of consent through an extractive process. Under data compilation, the relationships entailed between communities are retained within the dataset through tracking of provenance, rather than being effaced through a collection process. A demonstration of such an architecture is presented on the WeCount initiative site.
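One way to picture provenance being retained under compilation is the small sketch below. It is an illustration under stated assumptions rather than the WeCount architecture: the `Provenance` and `CompiledRecord` types, the `compile_records` function, the community labels and the sample observations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Provenance:
    community: str  # who contributed the values (hypothetical label)
    terms: str      # conditions under which they were shared


@dataclass
class CompiledRecord:
    values: dict
    provenance: list = field(default_factory=list)


def compile_records(contributions):
    """Group contributed values by key, keeping every source's provenance.

    `contributions` is an iterable of (key, values, Provenance) triples.
    Values for the same key are merged, but no contributing source is
    dropped from the record's provenance list.
    """
    compiled = {}
    for key, values, source in contributions:
        record = compiled.setdefault(key, CompiledRecord(values={}))
        record.values.update(values)
        record.provenance.append(source)
    return compiled


# Invented sample contributions from two hypothetical communities.
contributions = [
    ("obs-1", {"species": "Arbutus menziesii"}, Provenance("Community A", "open")),
    ("obs-1", {"local_name": "arbutus"}, Provenance("Community B", "members only")),
]
compiled = compile_records(contributions)
sources = [p.community for p in compiled["obs-1"].provenance]
```

Because each compiled record carries the list of communities that contributed to it, together with their sharing terms, a downstream consumer can still see whose knowledge it is using and under what conditions.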

Thinking of Knitting?

Get in touch at amb26ponder or file an issue on this repository!