Background & Summary

Almost half of global CH4 emissions originate from natural ecosystems, among which aquatic ecosystems, i.e. wetlands and other inland waters, are major contributors1. At high latitudes, where the highest density of freshwater ecosystems is found2, extreme variability is occurring as a result of climate change3. In the Arctic, air temperatures are expected to rise twice as fast as the global average4. Higher air temperatures accelerate permafrost thaw, making a larger amount of previously sequestered organic matter available for microbial degradation and mineralization. Permafrost thaw can also lead to anoxic conditions in the resulting soils, peat, and/or water bodies, potentially enhancing CH4 emissions. Any disturbance of the natural CH4 cycle could constitute a strong positive feedback on global climate, given the strong radiative effect of this greenhouse gas5,6. For these reasons, documenting microbially mediated CH4 emissions in high-latitude ecosystems is essential to identify tipping points in this positive feedback to climate warming. Projected climate change may also significantly affect CH4 emissions from Sub-Antarctic environments. Ecosystems in the Magellanic ecoregion7 have been understudied compared to their northern counterparts8, and CH4 cycling there has rarely been investigated9,10. However, this region is of major importance: it is an expansive and unique continental area between 45 and 55 °S, it is very sparsely populated, and its ecology has been highly conserved (Cape Horn Biosphere Reserve, UNESCO).

Net atmospheric CH4 emissions from terrestrial and aquatic ecosystems reflect the balance between CH4 production, transport, and oxidation within these ecosystems. Methane is primarily produced by methanogenic Archaea during the anaerobic decomposition of organic matter. Methanogenesis can proceed via the acetoclastic, hydrogenotrophic, or methylotrophic pathways, the contribution of each depending on temperature, substrate availability, and microbial interactions11. The view that methanogenesis is the only biological CH4 source has been challenged by recent evidence of CH4 production under oxic conditions12,13, which may contribute about 20% of CH4 emissions from lakes14,15. However, the global significance of oxic CH4 production, its underlying metabolic pathways, and the identity of the microorganisms actively involved in this process are not yet fully constrained16. Methane emissions are strongly mitigated by microbial oxidation. For example, in aquatic ecosystems, 51–100% of the CH4 produced in deep sediments can be oxidized in the water column before reaching the atmosphere17,18. Under oxic conditions, CH4 oxidation is carried out by aerobic methane-oxidizing bacteria (MOB) belonging to the Alphaproteobacteria, Gammaproteobacteria, and Verrucomicrobia19. Anaerobic oxidation of methane (AOM) has also been identified as a major process in aquatic20,21 and terrestrial22,23 ecosystems, and attributed to anaerobic methane-oxidizing Archaea (ANME)24, bacteria of the NC10 phylum25, or gammaproteobacterial MOB active in anoxic zones20,26.

At the landscape scale, these processes occur with different magnitudes depending on ecosystem type and associated microtopography. Soils are primarily CH4 sinks27,28, because they draw down atmospheric CH4 even at very low concentrations. Lakes and wetlands are recognized as major CH4 sources, despite high flux variability between and within aquatic ecosystems1,29. Among wetlands, organic-rich peatlands are estimated to cover more than 3.7 × 10⁶ km² in northern high latitudes30 and to emit 30 Tg of CH4 per year31. These important CH4 contributors are complex ecosystems with a variety of hydrologic regimes, productivity levels, and vegetation covers, and with variable CH4 emissions. In particular, lower emissions are generally observed in fens than in bogs, in association with different CH4 production pathways32. Temperature, water-table level, and permafrost state also influence CH4 emissions33. In permafrost landscapes, CH4 emissions from ‘wet features’ characteristic of degrading permafrost (e.g. ponds, hollows, thaw lakes, internal lawns, and collapse scars) are usually higher than those from ‘dry features’ characteristic of intact permafrost (e.g. pingos, polygonal peatlands, hummocks, palsas, or peat plateaus)34. Physicochemical characteristics such as pH, nitrogen and phosphorus availability, and carbon source quality and quantity also drive CH4 emissions35,36,37. In the context of global warming, expected changes in these geochemical and physicochemical factors may affect the microbial CH4 cycle.

The present study focuses on the CH4 cycle in three high-latitude regions (Alaska, Patagonia, Siberia). Three ecosystem types (soils, wetlands, and lakes) were investigated in each region during summer, with a systematic evaluation of their different habitats. This dataset addresses the recent call from the global community of microbial scientists to integrate microorganisms into mainstream climate change research on carbon fluxes38,39. Here, a thorough analysis of microbial community diversity and structure, based on gene quantification by qPCR and high-throughput 16S rRNA gene amplicon sequencing, was coupled with the physicochemical characterization of habitats and the measurement of atmospheric CH4 and CO2 fluxes. The physicochemical analysis included the quantification of nutrients and trace elements, as well as the stable isotopic signatures of carbon species (CH4, dissolved organic and inorganic carbon) to track CH4 production and oxidation pathways. This database makes it possible to extend the geographical scope of microbial ecology, biogeography, and/or biogeochemical studies (related to the C cycle or other cycles) to high-latitude ecosystems. Moreover, it is of particular interest to the Earth system science community for parameterizing surface and subsurface biogeochemical processes, which can in turn be used to refine climate and other global models.

Methods

Sites overview and characteristics

This study focused on three regions located at subantarctic, arctic, and subarctic latitudes. The latitudinal and longitudinal ranges covered were 54.95 to 52.08 °S and 72.03 to 67.34 °W in Patagonia; 67.44 to 67.54 °N and 86.59 to 86.71 °E in Siberia; and 63.21 to 68.63 °N and 150.79 to 145.98 °W in Alaska (Figs. 1 and 2). The exact coordinates of each sample are provided in the dataset. The field campaigns were conducted in 2016 during the summer of each region: January–February in Chilean Patagonia, June–July in Alaska, and July–August in Siberia.
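When reusing the coordinates, mixing a hemisphere letter with a minus sign silently flips a longitude, so it helps to settle on one convention. The sketch below assumes the common signed decimal-degree convention (north and east positive) and encodes only the region ranges stated above; the names to_signed, REGIONS, and region_of are hypothetical helpers, and the convention actually used in the dataset's coordinate columns should be checked in its readme.

```python
from typing import Dict, Optional, Tuple

# Assumed convention: north and east positive, south and west negative.
HEMISPHERE_SIGN = {"N": 1.0, "S": -1.0, "E": 1.0, "W": -1.0}


def to_signed(magnitude: float, hemisphere: str) -> float:
    """Convert an unsigned coordinate plus hemisphere letter to signed decimal degrees."""
    if magnitude < 0:
        raise ValueError("give the unsigned magnitude together with its hemisphere letter")
    return HEMISPHERE_SIGN[hemisphere.upper()] * magnitude


# Study-region ranges transcribed from the text, stored as
# (lat_min, lat_max, lon_min, lon_max) in signed decimal degrees.
REGIONS: Dict[str, Tuple[float, float, float, float]] = {
    "Patagonia": (to_signed(54.95, "S"), to_signed(52.08, "S"),
                  to_signed(72.03, "W"), to_signed(67.34, "W")),
    "Siberia": (67.44, 67.54, 86.59, 86.71),
    "Alaska": (63.21, 68.63, to_signed(150.79, "W"), to_signed(145.98, "W")),
}


def region_of(lat: float, lon: float) -> Optional[str]:
    """Return the study region whose bounding box contains the point, if any."""
    for name, (lat_min, lat_max, lon_min, lon_max) in REGIONS.items():
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None
```

A point that falls outside every box (region_of returns None) usually signals a sign or hemisphere mix-up rather than a genuinely new location.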

Fig. 1

Location of the three areas included in this study (panel a). The permafrost state and the number of sites and samples per region are indicated for each area. General views of five sites are provided as examples (panels b–f). Panel b provides a wide view of the ecosystem surrounding wetland ALP2 (Alaska; exact location indicated by the white circle). Lake PCL1 (panel c) is representative of the lakes on Navarino Island (Chilean Patagonia). The glacial lake SIL2 is shown in panel d. At site SIP5, the hollow in the foreground is surrounded by palsa (hummock, in the background), characterized by dark organic matter and lichen vegetation (panel e). The PPP3 peatland shown in panel f is dominated by Sphagnum magellanicum, like most peatlands in the area.

Fig. 2

Maps of sampling sites in Patagonia, Alaska and Siberia, indicating the ecosystem type (lake, wetland, soil). The tables list the sites with complete (white) and partial (grey) characterization. The exact coordinates of each sample are provided in the data record (see Data Records section).

For every site included in the present study, a set of nine qualitative environmental and/or ecological site-scale descriptors was selected and adapted from the Environment Ontology (ENVO)40, including permafrost state, biome, environmental feature, and vegetation type (Table 1, Fig. 3). Permafrost state was obtained from the NSIDC permafrost map41. The biome, a large-scale descriptor based on climate and vegetation criteria, was derived from Olson et al.42; temperate forest, boreal forest, and tundra biomes were included. The environmental features considered were those representative of the three regions: lakes, wetlands, broadleaf/coniferous/mixed forest soils, grassland, tundra, and palsa. All metadata are included in the dataset. Table 2 summarizes the main types of sampled ecosystems and their main characteristics in the three regions, while Supplementary Table S1 provides details of each sampling site.

Table 1 Overview of the dataset contained in the MIMARKS sheet.
Fig. 3

Description of the qualitative environmental/ecological descriptors used to describe every sample, derived from the Environment Ontology (ENVO)40.

Table 2 Main types of sampled ecosystems in the three studied regions.

In Alaska, the study area ranged from the Alaska Range and the Fairbanks area (interior, continental climate, 63–65°N, discontinuous permafrost) to the Toolik Field Station (North Slope, arctic climate, 66–69°N, continuous permafrost; Fig. 2). The physicochemistry and CH4 emissions of lakes ALL1 (Killarney Lake), ALL2 (Otto Lake), ALL3 (Nutella Lake), and ALL4 (Goldstream Lake) were previously characterized35. A number of heterogeneous soil and wetland samples were collected around the studied Alaskan lakes and/or at monitored sites, as detailed in Supplementary Table S1. In the Alaska Range and Fairbanks area, soils were mostly covered by mixed or taiga forests, alpine tundra, and bog or fen wetlands. In the northern Brooks Range, the landscape consisted of piedmont hills with predominantly porous organic peat soils underlain by silt and glacial till, all in a permafrost state, and vegetated mainly by Sphagnum and Eriophorum as well as dwarf shrubs.

In Siberia, the study area was located in the discontinuous permafrost region surrounding Igarka, on the eastern bank of the Yenisei River (Fig. 2). This region was mainly covered by forests dominated by larch (Larix sibirica), birch (Betula pendula), and Siberian pine (Pinus sibirica), and by palsa landscapes (frozen peat mounds), the latter dominated by mosses, lichens, Labrador tea, and dwarf birch. In degraded areas, thermokarst bogs were dominated by Sphagnum spp. and Eriophorum spp. Land cover was an indicator of permafrost status: forested areas reflected a deep permafrost table (>2 m) associated with Pleistocene permafrost, while palsa-dominated landscapes indicated the presence of near-surface (<1 m) Holocene permafrost. In this area, most lakes were of glacial origin and influenced by the permafrost degradation43 observed over the last 30 years, while some were thermokarst lakes (Supplementary Table S1). Two studies focusing on methane cycling in lakes SIL1 to SIL4 were recently published18,20. We sampled organic soils along a degradation gradient from dry palsa to thermokarst bogs44, as detailed in Supplementary Table S1.

Subantarctic sites were located in three areas in the southern part of Chilean Patagonia: the Magellanic region around Punta Arenas, Tierra del Fuego, and Navarino Island (Fig. 2). Most of the sampled lakes in the Magellanic and Tierra del Fuego areas were of glacial origin, while Navarino Island lakes were peatland lakes surrounded by peatland and broadleaf forests. Peatlands had a very low diversity of Sphagnum species and were dominated by S. magellanicum from hollows up to hummocks. The typical broadleaf forests of the area were dominated by Nothofagus. Some grassland soil samples came from a monitored experimental field site (Supplementary Table S1). Samples collected from Patagonian soils and wetlands have been included in a recent survey of soil geochemical characteristics (organic content)45. Sediment samples collected in lakes PPL1, PPL2, PCL1, PCL2, and PCP2 were also included in a recent study by Lavergne et al.46, which showed that increasing air temperature led to enhanced CH4 production and to a metabolic shift in the CH4 production pathway, increasing the relative contribution of hydrogenotrophic versus acetoclastic methanogenesis, together with consistent changes in the microbial community.

Surface area for lakes and elevation for all sites were determined using Google Earth Pro. Climate variables (Table 1) for each site were retrieved from WorldClim – Global47.

Sampling design

A specific sampling strategy was defined for each kind of ecosystem, i.e. lakes, soils, and wetlands (Fig. 4), as follows.

Fig. 4

Sampling strategy for lake, soil and wetland sites (top, bottom left and bottom right panels, respectively). In lakes, at replicate points A, B and C, water sample ‘WT’ was taken at the oxycline and water sample ‘WB’ just above the sediment interface; one sediment sample was also collected. At soil sites, at replicate points A, B and C, two soil layers were sampled: ‘ST’ and ‘SB’, representing the top and bottom layers, respectively. In wetlands, two replicate transects were defined along the hollow-edge-hummock microtopography continuum. In hollows, one water and one solid sample were collected. At edges and hummocks, the same strategy as for soil sites was followed. For each site type, the number of sites and the corresponding number of events (in situ measurement and/or sampling) are indicated in parentheses.

In lakes, surface (0–10 cm) sediments and water samples were collected at three replicate points A, B, and C (Fig. 4) in the deepest zone of the lake, about 2–5 m apart. Two sampling depths were considered for water samples: (i) at the oxycline and (ii) just above the sediment interface. Water was sampled with a 2.2 L Van Dorn bottle (Wildco, Mexico) and sediments with an Ekman dredge.

Mineral soil samples were collected at three replicate points A, B, and C (about 2–5 m apart), at two sampling depths for each point (Fig. 4).

In wetlands, microtopography is known to influence organic matter decomposition, CH4 emissions, microbial community structure, and metabolic pathways48,49,50. The sampling strategy covered the three main microtopographic features of wetlands: hollows (i.e. small depressions or ponds, which may or may not be filled with water at the time of sampling; points A and D, Fig. 4); flat edges (or lawns) at or below the water table level, usually water-saturated and characterized by Sphagnum moss vegetation (points B and E, Fig. 4); and hummocks (i.e. drier elevated mounds or raised domes above the water table level, usually characterized by lichens and shrubs; points C and F, Fig. 4). Two replicate transects, A-B-C and D-E-F, were sampled about 10 m apart. At each point, two sampling depths were considered, following the same strategy as described below for soils.

For both mineral and organic soils, soil blocks (20 × 20 × 20 cm) were collected with a bread knife or a shovel. If soil layers could be clearly identified, top and bottom samples were defined accordingly and reported in the database; otherwise, default depths of 0–10 cm for the top layer and 10–20 cm for the bottom layer were used.
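The layer rule above can be written as a small helper, for example when harmonizing depth fields across samples. This is a minimal sketch that simplifies "clearly identified layers" to a single boundary depth inside the 20 cm block; sampling_layers and boundary_cm are hypothetical names, not fields of the dataset.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (upper, lower) depth in cm below the surface

DEFAULT_TOP: Interval = (0.0, 10.0)      # default top layer
DEFAULT_BOTTOM: Interval = (10.0, 20.0)  # default bottom layer
BLOCK_DEPTH_CM = 20.0                    # depth of the 20 x 20 x 20 cm soil block


def sampling_layers(boundary_cm: Optional[float] = None) -> Tuple[Interval, Interval]:
    """Return (top, bottom) depth intervals for one soil block.

    boundary_cm is a hypothetical field: the depth of a clearly identified
    layer boundary. Without one, the default 0-10 cm and 10-20 cm layers apply.
    """
    if boundary_cm is None:
        return DEFAULT_TOP, DEFAULT_BOTTOM
    if not 0.0 < boundary_cm < BLOCK_DEPTH_CM:
        raise ValueError("layer boundary must lie inside the soil block")
    return (0.0, boundary_cm), (boundary_cm, BLOCK_DEPTH_CM)
```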

In addition to site-scale descriptors, every sample was characterized by point-scale descriptors (latitude, longitude, microtopography, and vegetation type) and sample-scale descriptors such as environmental material (water, sediment, organic or mineral soil; Table 1 and Fig. 3). Soil samples were classified as organic or mineral soils based on their organic matter content (40% threshold), used as the discriminating criterion between these two environmental materials6.

The materials and methods used to characterize these samples in situ and in the laboratory are described in the following sections. At some sites (ALP3, ALS3, ALS4, ALS6, ALS8, ALS9, PCL3, PCP3, PCS1, PPL3, PPP3, PTL1, PTL2, PTP1, PTP2, PTS1, SIL5, SIP6, SIP7, SIS3, SIS4), only a basic characterization was carried out because of harsh conditions and limited access. This basic characterization included a restricted set of parameters (Table 1) that was nonetheless sufficient to meet the objectives of the project. All other sites were fully characterized with the complete set of parameters listed in Table 1 for the corresponding environmental package (water, sediment, soil).

In situ analyses

Physicochemical analyses

At each sampling point and depth in lakes and hollows, dissolved oxygen, temperature, pH, conductivity, and redox potential were measured in water with a multiparameter probe (HI 9828, Hanna Instruments, Mexico). The detection limit for dissolved O2 was 10 µg L−1. At soil sites, temperature was measured with an insertion thermometer (Isolab Laborgeräte GmbH).

Dissolved CH4 and CO2 concentrations

In lakes, dissolved CH4 and CO2 concentrations were measured at each replicate sampling point and depth with the membrane-integrated cavity output spectrometry method, using an ultraportable greenhouse gas analyzer (UGGA, Los Gatos Research, USA)51. The detection limits were 5 nmol L−1 for dissolved CH4 and 4 μmol L−1 for dissolved CO2.

Atmospheric CH4 and CO2 emission rates

CH4 and CO2 emission rates were estimated with a static opaque chamber connected in a closed loop to the UGGA (Los Gatos Research, USA), following a previously described procedure9. Briefly, a 0.102 m2 floating chamber (7.8 L) was placed on the surface of lakes and ponds, and a 0.035 m2 chamber (12.3 L) was installed at soil sites. CH4 and CO2 accumulation was recorded for 5 min, and fluxes were determined from the slopes of CH4 and CO2 concentrations over time. The chamber was then ventilated and closed again to perform another flux measurement. At least three replicate measurements were performed at each location (sampling points defined in Fig. 4). The static chamber method measures the total surface flux, i.e. including both diffusive and ebullitive fluxes. As illustrated in Fig. 5a, the highest CH4 emission rates were found in hollows, especially in Siberian peatlands of discontinuous permafrost and lakes.
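The slope-to-flux step can be sketched with the ideal gas law. This minimal version assumes the analyzer reports dry mole fractions in ppm, that the chamber volume is the only gas volume in the loop (tubing and analyzer cell ignored), and that pressure and temperature inside the chamber are known. It is a generic closed-chamber calculation, not necessarily the exact procedure of the cited method, and the function names are hypothetical.

```python
from typing import Sequence

R_GAS = 8.314462618  # ideal gas constant, J mol-1 K-1


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return sxy / sxx


def chamber_flux(time_s: Sequence[float], conc_ppm: Sequence[float],
                 volume_l: float, area_m2: float, temp_c: float,
                 pressure_pa: float = 101325.0) -> float:
    """Surface flux in µmol m-2 s-1 from the concentration rise in a closed chamber.

    Assumes conc_ppm is a dry mole fraction (µmol gas per mol air) and that
    the chamber volume is the only gas volume in the loop.
    """
    slope_ppm_per_s = ols_slope(time_s, conc_ppm)
    air_mol = pressure_pa * (volume_l / 1000.0) / (R_GAS * (temp_c + 273.15))
    # ppm s-1 multiplied by mol of air gives µmol of gas accumulating per second
    return slope_ppm_per_s * air_mol / area_m2
```

For the floating chamber, a 5 min record would be passed with volume_l=7.8 and area_m2=0.102; the soil chamber would use 12.3 L and 0.035 m2.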

Fig. 5

Methane emission rates measured during the field campaigns (left) and δ13C-CH4 isotopic signatures (right). Methane emission measurements using static chambers were pooled into meaningful categories combining the environmental feature, environmental package, and microtopography descriptors. δ13C-CH4 was measured in water samples only, i.e. in samples collected in the water column of lakes (at the oxycline and at the bottom) and in wetland hollows.

Sample processing in the field

For further analysis, water subsamples were collected in 10 mL glass vials directly in the field. For δ13C-CH4, δ2H-CH4, and total organic carbon (TOC) analyses, samples were acidified (HCl 6 N). For dissolved inorganic carbon (DIC) concentration and δ13C-DIC analyses, HgCl2 was added to the samples to stop any biological activity. After fixation with HCl and/or HgCl2, water subsamples were stored at 4 °C in the dark. Soil samples were also kept at 4 °C for a maximum of 24 h before further processing.

Laboratory methods

Moisture and organic matter content

Soil and sediment samples were dried at 110 °C overnight to determine the dry weight. Organic matter content was assessed via loss on ignition at 550 °C.
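The two gravimetric steps, combined with the 40% organic matter threshold used to separate organic from mineral soils, can be sketched as follows. The wet-mass basis for moisture, classifying exactly 40% as organic, and all function names are assumptions for illustration; masses are net of the crucible.

```python
ORGANIC_THRESHOLD_PCT = 40.0  # organic matter content separating organic from mineral soils


def moisture_pct(wet_g: float, dry_g: float) -> float:
    """Water content as % of wet mass (wet-basis convention assumed)."""
    return 100.0 * (wet_g - dry_g) / wet_g


def organic_matter_pct(dry_g: float, ignited_g: float) -> float:
    """Loss on ignition at 550 °C as % of dry mass."""
    return 100.0 * (dry_g - ignited_g) / dry_g


def environmental_material(om_pct: float) -> str:
    """Classify a soil sample; treating exactly 40% as organic is an assumption."""
    return "organic soil" if om_pct >= ORGANIC_THRESHOLD_PCT else "mineral soil"
```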

Suspended solids

Lake and hollow water samples (20 mL to 3 L, until clogging) were filtered through pre-weighed, combusted GF/F glass microfiber filters (0.7 µm pore size, Whatman). The filters were dried overnight at 105 °C to determine total suspended solids (TSS), then ignited at 550 °C for 2 h to determine the concentration of particulate organic matter (POM).
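The TSS and POM concentrations follow from three filter weighings and the filtered volume. A minimal sketch, assuming the pre-combusted filter itself loses no mass at 550 °C; the function name is hypothetical.

```python
from typing import Tuple


def tss_pom_mg_l(tare_mg: float, dried_mg: float, ignited_mg: float,
                 volume_ml: float) -> Tuple[float, float]:
    """Return (TSS, POM) in mg L-1 from filter masses and the filtered volume.

    tare_mg: pre-weighed combusted filter; dried_mg: filter + solids after 105 °C;
    ignited_mg: same filter after 550 °C.
    """
    volume_l = volume_ml / 1000.0
    tss = (dried_mg - tare_mg) / volume_l     # all retained solids
    pom = (dried_mg - ignited_mg) / volume_l  # fraction lost on ignition
    return tss, pom
```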

Filtration

After pre-filtration at 80 µm (nylon net filters, Merck Millipore, Cork, Ireland), water samples were filtered at 0.22 µm (nitrocellulose GSWP membrane filters, Merck Millipore, Cork, Ireland) until clogging (636 ± 521 mL on average, ranging from 70 to 2930 mL depending on the highly variable suspended matter content of the samples). Filters were frozen at −20 °C prior to DNA extraction. The filtrate was recovered and used to prepare four vials for subsequent analysis of dissolved organic carbon (DOC), the isotopic composition (δ13C) of DOC, optical properties of dissolved organic matter, cations, anions, and trace elements.

Pore water extraction

Water extraction was carried out on soil and sediment samples to assess the mobile fraction of DOC, major anions and cations, trace elements, and the optical properties of dissolved organic matter (DOM). Following the procedure recommended by Jones & Willett52, 40 g of sample were placed in 200 mL of deionized water and gently agitated with a magnetic stirrer at room temperature for 1 h. The liquid phase was then recovered using a microRhizon sampler (Rhizosphere, Netherlands). These extracts were prepared and analysed following the same procedure as for water samples.
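To compare extract concentrations with solid-phase contents, they can be expressed per unit dry mass. The sketch below assumes the 40 g were weighed fresh and that the sample's own water joins the 200 mL extract (1 g of water taken as 1 mL); it reports the water-extractable fraction only, and the function name is hypothetical.

```python
def extractable_mg_per_kg_dry(c_extract_mg_l: float, moisture_wet_pct: float,
                              fresh_mass_g: float = 40.0,
                              water_added_ml: float = 200.0) -> float:
    """Convert an extract concentration (mg L-1) to mg per kg of dry sample.

    Assumes a fresh-mass weighing and that sample water dilutes the extract.
    """
    sample_water_g = fresh_mass_g * moisture_wet_pct / 100.0
    dry_mass_kg = (fresh_mass_g - sample_water_g) / 1000.0
    extract_volume_l = (water_added_ml + sample_water_g) / 1000.0
    return c_extract_mg_l * extract_volume_l / dry_mass_kg
```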

Total and dissolved organic carbon

In water samples collected in lakes and hollows, TOC and DOC concentrations were analysed using a TOC-V CSH analyser (Shimadzu, Japan). For DOC, samples were acidified to pH 2 with HCl 6 N and stored in 10 mL baked clear glass vials. The limit of quantification (LoQ) was 1 mg L−1.

Anions and cations

Major ions were quantified in water samples collected in lakes and hollows and in pore water by HPLC (Dionex, USA), using a Dionex DX-120 analyser for cations (Thermo Fisher Scientific, France) and a Dionex ICS-5000+ analyser for anions (Thermo Fisher Scientific, France), following published recommendations53. The LoQ was 0.5 mg L−1 for calcium, chloride, sulphate, and magnesium; 0.25 mg L−1 for bromide, sodium, and potassium; 0.025 mg L−1 for ammonium and phosphate; and 0.01 mg L−1 for fluoride, nitrate, and nitrite.

Trace elements

For trace element analysis, samples were acidified with ultrapure HNO3, kept in 15 mL polypropylene vials, and analysed by ICP-MS (7500ce, Agilent Technologies). LoQs were <0.5 µg g−1 for aluminium, iron, and manganese; <0.05 µg g−1 for vanadium, chromium, cobalt, nickel, copper, and zinc; and <0.005 µg g−1 for arsenic, strontium, cadmium, antimony, lead, and uranium.

Optical properties

Subsamples for DOM optical properties were collected in 30 mL polypropylene vials. UV absorption spectra of pore water were measured with a spectrophotometer (Secoman UVi-lightXT5) from 190 to 700 nm in a 1 cm quartz cell. The specific UV absorbance at 254 nm (SUVA, L mg C−1 m−1) was calculated as described previously54: SUVA = A254/(b × DOC), where A254 is the sample absorbance at 254 nm (dimensionless), b is the optical path length (m), and DOC is in mg L−1. Fluorescence measurements were performed with a spectrofluorometer (Synergy MX, Biotek). The emission spectrum was recorded at a 370 nm excitation wavelength, and the fluorescence index (FI) was calculated from it as the ratio of the emission intensity at 470 nm to that at 520 nm55,56.
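Both optical indices reduce to simple ratios. The sketch below computes SUVA with the path length in metres (0.01 m for the 1 cm cell) and FI from an emission spectrum excited at 370 nm, linearly interpolating when 470 or 520 nm is not an exact recorded wavelength; the interpolation step and the function names are assumptions, not part of the described protocol.

```python
from bisect import bisect_left
from typing import Sequence


def suva254(a254: float, path_length_m: float, doc_mg_l: float) -> float:
    """SUVA254 in L mg C-1 m-1 = A254 / (b * DOC); a 1 cm cell gives b = 0.01 m."""
    return a254 / (path_length_m * doc_mg_l)


def intensity_at(wavelengths: Sequence[float], intensities: Sequence[float],
                 target_nm: float) -> float:
    """Linearly interpolate an emission spectrum (wavelengths sorted ascending)."""
    i = bisect_left(wavelengths, target_nm)
    if i < len(wavelengths) and wavelengths[i] == target_nm:
        return intensities[i]
    if i == 0 or i == len(wavelengths):
        raise ValueError("target wavelength outside the recorded spectrum")
    x0, x1 = wavelengths[i - 1], wavelengths[i]
    y0, y1 = intensities[i - 1], intensities[i]
    return y0 + (y1 - y0) * (target_nm - x0) / (x1 - x0)


def fluorescence_index(wavelengths: Sequence[float], intensities: Sequence[float]) -> float:
    """FI = emission at 470 nm / emission at 520 nm, both excited at 370 nm."""
    return (intensity_at(wavelengths, intensities, 470.0)
            / intensity_at(wavelengths, intensities, 520.0))
```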

Isotopes

The stable isotopic signature of methane (δ13C-CH4, shown in Fig. 5b, and δ2H-CH4) was analyzed at the Stable Isotope Facility of UC-Davis (https://stableisotopefacility.ucdavis.edu/methane-ch4-gas), using a ThermoScientific Precon concentration unit interfaced to a ThermoScientific Delta V Plus isotope ratio mass spectrometer (ThermoScientific, Germany). Methane was extracted for IRMS analysis following the method of Yarnes et al.57. The LoQ was 5 ppm CH4 for δ2H and 1.7 ppm CH4 for δ13C, and the standard deviation was typically 2‰ for δ2H and 0.2‰ for δ13C. δ13C-CO2 was analyzed using a mass spectrometer (Isoprime 100, Elementar, UK) coupled with an equilibration system (MultiFlow-Geo, Elementar, UK); samples were acidified with phosphoric acid and flushed with helium. δ13C-DOC was analysed at the UC Davis Stable Isotope Facility following the procedure described at http://stableisotopefacility.ucdavis.edu/doc.html, using a TOC Analyzer (OI Analytical, College Station, TX) interfaced to a PDZ Europa 20–20 isotope ratio mass spectrometer (Sercon Ltd., UK) via a GD-100 Gas Trap Interface (Graden Instruments).
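Isotope results are reported in delta notation relative to a reference ratio (VPDB for carbon, VSMOW for hydrogen), and paired δ13C values of CO2 and CH4 are often combined into the apparent fractionation factor αC, a widely used indicator of whether hydrogenotrophic (higher αC) or acetoclastic methanogenesis dominates. The sketch leaves the reference ratio to the caller and does not apply any δ13C-DIC to δ13C-CO2 equilibrium correction, which would be needed if only DIC values are available; the function names are hypothetical.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Delta notation: (R_sample / R_standard - 1) x 1000, in per mil."""
    return (r_sample / r_standard - 1.0) * 1000.0


def alpha_c(d13c_co2: float, d13c_ch4: float) -> float:
    """Apparent carbon isotope fractionation factor between co-occurring CO2 and CH4."""
    return (d13c_co2 + 1000.0) / (d13c_ch4 + 1000.0)
```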

DNA extraction

Soil and sediment samples were subsampled and frozen at −20 °C. DNA was extracted from 0.5 g of soil or sediment subsample and from the previously frozen 0.22 µm filters using the PowerSoil and PowerWater DNA isolation kits, respectively (Qiagen, Hilden, Germany), following the manufacturer's instructions. DNA extracts were stored at −20 °C.

qPCR assay

The abundances of four genes were measured by quantitative PCR (qPCR): the bacterial 16S rRNA gene, the archaeal 16S rRNA gene, the pmoA gene (marker gene for aerobic methane-oxidizing bacteria, targeting the particulate methane monooxygenase), and the mcrA gene (marker gene for methanogens and ANME, targeting the methyl coenzyme M reductase). Duplicate measurements were run in 20 µL reactions using the Takyon SYBR master mix (Eurogentec, Belgium) on CFX96 (Bio-Rad Laboratories, Hercules, CA, US) and AriaMX (Agilent, CA, US) thermocyclers. Primer sequences and concentrations, thermocycling conditions, and standard curve preparation are detailed in Thalasso et al.18. As an illustration, mcrA gene abundance by habitat (i.e. a category combining environmental material and microtopography) is displayed in Fig. 6.

Fig. 6

Abundance of methanogens in samples collected in Patagonia, Siberia and Alaska. Methanogen abundances were derived from qPCR assays targeting the mcrA gene and were pooled according to environmental feature, environmental package and microtopography descriptors.

High-throughput amplicon sequencing

Archaeal and bacterial diversity was assessed by metabarcoding targeting the V4–V5 region of the 16S rRNA gene. Amplicons were obtained from DNA extracts using primers 515F (GTGYCAGCMGCCGCGGTA) and 928R (CCCCGYCAATTCMTTTRAGT)58. MTP Taq DNA polymerase was purchased from Sigma (France). Thermocycling conditions were as follows: 2 min at 94 °C; 30 cycles of 60 s at 94 °C, 40 s at 65 °C, and 30 s at 72 °C; and a final extension of 10 min at 72 °C. PCR products were sequenced in paired-end mode on an Illumina MiSeq (2 × 250 bp). After pre-processing of raw reads with the FROGS pipeline59, a total of 18 369 310 sequences were obtained from the 387 samples and clustered into 121 971 OTUs using Swarm60. OTUs were then filtered at 0.005% relative abundance, as previously recommended61, and taxonomically annotated against the SILVA 132 rRNA database. Community analysis was carried out in R version 4.1.1 with the ‘phyloseq’ package62. The bacterial taxonomic composition by habitat is shown as a bar plot at the phylum level (Fig. 7a). To illustrate the microbial diversity captured by this dataset and the variability of communities among habitats, dissimilarities among the 387 community structures were visualized by principal coordinate analysis (PCoA, also known as multidimensional scaling, MDS) using the ordinate function of the ‘phyloseq’ package62 with Bray–Curtis dissimilarity, computed on the filtered OTU relative abundances standardized to percentages (Fig. 7b).
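The filtering and dissimilarity steps can be illustrated on a small count table. In this sketch the 0.005% filter is interpreted as removing OTUs whose total count across all samples is below 0.005% of all sequences, which is an assumption about how the threshold was applied; the PCoA itself (an eigendecomposition of the dissimilarity matrix, done in the study with phyloseq's ordinate function) is not reproduced. The table layout and function names are hypothetical.

```python
from typing import Dict, List

Table = Dict[str, List[float]]  # sample ID -> counts, same OTU order in every list


def filter_otus(samples: Table, min_fraction: float = 0.00005) -> Table:
    """Drop OTUs whose total count is below 0.005% of all sequences in the table."""
    n_otus = len(next(iter(samples.values())))
    otu_totals = [sum(counts[j] for counts in samples.values()) for j in range(n_otus)]
    threshold = min_fraction * sum(otu_totals)
    keep = [j for j, total in enumerate(otu_totals) if total >= threshold]
    return {sid: [counts[j] for j in keep] for sid, counts in samples.items()}


def to_percent(counts: List[float]) -> List[float]:
    """Standardize one sample to relative abundances in %."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def bray_curtis(u: List[float], v: List[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


def distance_matrix(samples: Table) -> Dict[str, Dict[str, float]]:
    """Pairwise Bray-Curtis dissimilarities on %-standardized profiles."""
    profiles = {sid: to_percent(c) for sid, c in samples.items()}
    return {a: {b: bray_curtis(pa, pb) for b, pb in profiles.items()}
            for a, pa in profiles.items()}
```

The resulting matrix is what a PCoA would then ordinate; identical profiles score 0 and profiles sharing no OTUs score 1.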

Fig. 7

Taxonomic composition and similarities among the 387 microbial communities. The bacterial taxonomic composition is presented at the phylum level, showing only the 15 most abundant phyla (panel a). Relative abundances of the phyla were calculated for seven habitats, i.e. combinations of environmental feature, environmental package and microtopography descriptors. The principal coordinate analysis (PCoA) of the filtered and standardized OTU abundance table was computed with Bray–Curtis dissimilarity to visualize similarities between microbial communities of the different habitats (indicated by symbol shape) across the three studied regions (indicated by symbol color; panel b). The percentage of total variance explained by each component is indicated along the axes, showing high microbial community variability, mainly according to habitat.

Data Records

This paper presents a combination of sample metadata, environmental data (gas flux and biogeochemical measurements), and high-throughput microbiome sequencing data co-located in time and space. Linking these different types of data is crucial for their effective interpretation and reuse. The geo-referenced dataset was deposited in the Dryad repository63 and is fully available under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license, at doi: 10.5061/dryad.rfj6q57dp. The Dryad record also includes a ‘readme’ file providing key information for understanding and reusing the dataset. The data are organized in a standard, easily downloadable table (csv format) with 387 samples (in rows) and 120 parameters (in columns). The first row of the table gives the parameter names and the second row the unit of each parameter. The parameters are organized as follows: Sampling context; Ecosystem characteristics; Sequencing method details; Basic physicochemical parameters; Organic matter characteristics; Nutrients, anions, cations; Greenhouse gases; qPCR quantifications; Micro elements; Bioclimatic variables. A clear and unique sample ID links each row to the corresponding sequence data. The raw, unprocessed sequence data in FASTQ format were archived in the European Nucleotide Archive (ENA) under accession codes PRJEB36731 (Siberia)64, PRJEB36732 (Alaska)65, and PRJEB36733 (Patagonia)66. These microbial datasets, along with sample metadata, have also been published in the Global Biodiversity Information Facility (GBIF), separately for Patagonia67, Siberia68, and Alaska69. Standardized information about sequence data70 was reported together with environmental data and formatted as defined by the Genomic Standards Consortium71, based on the MIMARKS sheet for miscellaneous natural environments.
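Because the table carries parameter names in its first row and units in its second, a generic csv reader treats the unit row as data unless it is split off. A minimal sketch using Python's standard csv module, assuming a comma delimiter; the function name is hypothetical, and real column names should be taken from the file itself and its readme.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_dataset(lines: Iterable[str]) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split a table whose first row holds parameter names and second row holds units."""
    reader = csv.reader(lines)
    names = next(reader)  # row 1: parameter names
    units = next(reader)  # row 2: unit of each parameter
    unit_of = dict(zip(names, units))
    # Remaining rows: one sample per row, keyed by parameter name (values kept as text)
    samples = [dict(zip(names, row)) for row in reader if row]
    return unit_of, samples
```

An open text file handle (opened with newline="" as the csv module documentation recommends) can be passed directly, since it iterates over lines; numeric columns still need explicit conversion.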

Technical Validation

Operator training and harmonization of (meta)data collection took place at the beginning of the first field campaign to ensure that all operators used identical, replicable methods for data acquisition in the field, sampling, sample processing in the field and in the laboratory, and data recording. All data were checked and carefully transferred to the MIMARKS database, which was finally curated manually by a dedicated data manager.

During CH4 and CO2 flux measurements, two criteria were checked before emission trends were validated72: (i) the initial concentration was nearly equal to the ambient atmospheric concentration, and (ii) the coefficient of determination (R2) of the linear regression reached 0.90. When a measurement did not meet these criteria, additional replicates were performed, which occurred on only a few occasions.
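The two acceptance criteria can be applied automatically to each chamber record. "Nearly equal" is not quantified in the text, so the 5% start tolerance below is a hypothetical choice; the R2 threshold of 0.90 is the one stated above, and the function names are hypothetical.

```python
from typing import Sequence


def r_squared(t: Sequence[float], c: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line of c against t."""
    n = len(t)
    t_mean, c_mean = sum(t) / n, sum(c) / n
    sxy = sum((a - t_mean) * (b - c_mean) for a, b in zip(t, c))
    sxx = sum((a - t_mean) ** 2 for a in t)
    syy = sum((b - c_mean) ** 2 for b in c)
    if sxx == 0 or syy == 0:
        return 0.0  # R2 undefined (no time span or no change): flag the record for review
    return sxy * sxy / (sxx * syy)


def accept_flux(t: Sequence[float], c: Sequence[float], ambient: float,
                start_tolerance: float = 0.05, min_r2: float = 0.90) -> bool:
    """Apply both acceptance criteria to one chamber record.

    start_tolerance (5% of ambient) is a hypothetical reading of "nearly equal".
    """
    starts_near_ambient = abs(c[0] - ambient) <= start_tolerance * ambient
    return starts_near_ambient and r_squared(t, c) >= min_r2
```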

Reference materials ION-915 and ION-96.4, both obtained from Environment and Climate Change Canada (Canada), were included in the analytical sequence for TOC and major anion and cation determination; recovery was >95% of the certified values. The certified river water reference material for trace elements SLRS-6 (National Research Council Canada)53 was analysed in every ICP-MS run, with indium as internal standard, and accuracy (i.e. recovery >95%) was checked. The analytical routine included blanks, calibration standards, and a multi-element quality control solution (EPOND) every 12 samples.
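The recovery check on certified reference materials amounts to comparing measured and certified values analyte by analyte. A minimal sketch that applies only the >95% criterion stated above (no upper bound is implied); analyte names, values, and function names are hypothetical.

```python
from typing import Dict, List


def recovery_pct(measured: float, certified: float) -> float:
    """Measured value as % of the certified value."""
    return 100.0 * measured / certified


def failing_analytes(measured: Dict[str, float], certified: Dict[str, float],
                     min_recovery_pct: float = 95.0) -> List[str]:
    """List analytes whose recovery does not exceed the acceptance limit."""
    return sorted(name for name, ref in certified.items()
                  if recovery_pct(measured[name], ref) <= min_recovery_pct)
```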

For δ13C-CO2 analysis, standards (Na2CO3, NaHCO3, and internal water standards) were analyzed every 8 samples to check instrument stability. All samples were analyzed in replicate, and the standard deviation was typically around or below 0.2‰.

Blanks (sterile pure water) were included in the DNA extraction, PCR, and qPCR protocols. The absence of amplification in negative controls (i.e. no contamination) was checked by gel electrophoresis, as were the correct size of 16S rRNA gene amplicons and PCR specificity (single band).

For qPCR of the bacterial and archaeal 16S rRNA genes and the mcrA gene, standard curves were prepared from 10-fold serial dilutions of each target gene amplified, respectively, from Pseudomonas stutzeri SLG510A3-8 (accession number KT153610), clone Arch_21F_10-Berre_sed (accession number KT351355), and Methanosarcina barkeri CM1 (accession number AKJ39604), and cloned into the pGEM-T plasmid (Promega). For pmoA, the standard was synthesized by Eurofins from the Methylobacter sp. BB5.1 pmoA gene sequence (accession number AF016982) and inserted into the TOPO-TA pCR2.1 plasmid. qPCR efficiencies were always >90%, and amplicon size and specificity were confirmed by melting curve analysis and agarose gel electrophoresis.
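The standard curve links quantification cycle (Cq) to log10 copy number, and its slope gives the amplification efficiency checked above (E = 10^(−1/slope) − 1, so a slope near −3.32 means 100%). A minimal least-squares sketch; instrument software normally performs this fit, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

IDEAL_SLOPE = -1.0 / math.log10(2.0)  # about -3.32: slope expected at 100% efficiency


def fit_standard_curve(log10_copies: Sequence[float],
                       cq: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    n = len(cq)
    x_mean, y_mean = sum(log10_copies) / n, sum(cq) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(log10_copies, cq))
    sxx = sum((x - x_mean) ** 2 for x in log10_copies)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1, as a fraction (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate target copies in a reaction."""
    return 10 ** ((cq - intercept) / slope)
```

A curve would pass the criterion above when amplification_efficiency(slope) > 0.90.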

Two samples with fewer than 1325 sequences retrieved from high-throughput sequencing were discarded from the sequence datasets deposited in the European Nucleotide Archive.