Current Draft

🛑 This draft is now deprecated due to changes in the structure of the MInAS project to better align with the MIxS infrastructure. Please see the latest drafts on the other pages in this section.

The current version is v0.0.2.

This can be considered a pre-alpha version, and has been neither reviewed nor approved by the wider palaeogenomics community or by the Genomic Standards Consortium.

A commentable early draft of the MInAS checklist written by members of the SPAAM community can be found here; otherwise, a simplified current version containing only the MInAS-related columns is rendered below.

For the current release of the base MIxS checklists, please see the GenSC website.

Structured comment name Item (rdfs:label) Definition Expected value Value syntax Example Section minas Preferred unit Occurrence MIXS ID Modification Suggestion Requires Further Discussion Reason for further discussion
samp_name sample name A local identifier or name for the material sample used for extracting nucleic acids and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name. text {text} ISDsoil1 investigation M nan 1 MIXS:0001107 Definition update: clarify it's a DNA lab code (museum ID should go in source_mat_id) N nan
samp_taxon_id taxonomy ID of DNA sample NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls. Taxonomy ID {text} [NCBI:txid] Gut Metagenome [NCBI:txid749906] investigation M nan 1 MIXS:0001320 nan N nan
project_name project name Name of the project within which the sequencing was organized nan {text} Forest soil metagenome investigation M nan 1 MIXS:0000092 nan N nan
lat_lon geographic location (latitude and longitude) The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system decimal degrees, limit to 8 decimal points {float} {float} 50.586825 6.408977 environment C nan 1 MIXS:0000009 None, but cases where imprecision may be preferred (e.g. to prevent looting of the site) should be discussed with GSC to find a good solution, i.e. is there a minimum level of precision required? Y What level of imprecision is allowed by GSC?
depth depth The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples. measurement value {float} {unit} 10 meter environment E meter 1 MIXS:0000018 Environment-dependent, i.e. minas-environmental N nan
elev elevation Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit. measurement value {float} {unit} 100 meter environment X nan 1 MIXS:0000093 nan N nan
temp temperature Temperature of the sample at the time of sampling. measurement value {float} {unit} 25 degree Celsius environment X degree Celsius 1 MIXS:0000113 Definition update: For ancient samples, this can be temperature of marine sediments, burial environment, cave atmosphere N nan
geo_loc_name geographic location (country and/or sea,region) The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ) country or sea name (INSDC or GAZ): region(GAZ), specific location name {term}: {term}, {text} USA: Maryland, Bethesda environment M nan 1 MIXS:0000010 Definition update: in cases of ancient locations, use the name of the present-day country in which the location lies. N nan
collection_date collection date The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant date and time {timestamp} 2018-05-11T10:00:00+01:00; 2018-05-11 environment C nan 1 MIXS:0000011 Definition update: For ancient samples, the date of the drilling or subsampling from the main specimen, the sub-sample of which is used for DNA extraction. N nan
neg_cont_type negative control type The substance or equipment used as a negative control in an investigation enumeration or text [distilled water|phosphate buffer|empty collection device|empty collection tube|DNA-free PCR mix|sterile swab|sterile syringe] nan investigation C nan 1 MIXS:0001321 nan N nan
env_broad_scale broad-scale environmental context Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes. {termLabel} {[termID]} oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean environment M nan 1 MIXS:0000012 Definition update: ENVO terms should be reported. Y If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
env_local_scale local environmental context Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. Environmental entities having causal influences upon the entity at time of sampling. {termLabel} {[termID]} litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]. environment M nan 1 MIXS:0000013 Definition update: ENVO terms should be reported Y If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
env_medium environmental medium Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top). The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. {termLabel} {[termID]} soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]|air [ENVO:00002005] environment M nan 1 MIXS:0000014 Definition update: ENVO terms should be reported Y If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
subspecf_gen_lin subspecific genetic lineage Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123. Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar. {rank name}:{text} serovar:Newport nucleic acid sequence source X nan 1 MIXS:0000020 nan N nan
ploidy ploidy The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated genes and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO PATO {termLabel} {[termID]} allopolyploidy [PATO:0001379] nucleic acid sequence source X nan 1 MIXS:0000021 nan N nan
num_replicons number of replicons Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments {integer} 2 nucleic acid sequence source X nan 1 MIXS:0000022 nan N nan
extrachrom_elements extrachromosomal elements Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance). Megaplasmids? Other plasmids (borrelia has 15+ plasmids) number of extrachromosomal elements {integer} 5 nucleic acid sequence source X nan 1 MIXS:0000023 nan N nan
ref_biomaterial reference for biomaterial Primary publication if isolated before genome publication; otherwise, primary genome report. PMID, DOI or URL {PMID}|{DOI}|{URL} doi:10.1016/j.syapm.2018.01.009 nucleic acid sequence source X nan 1 MIXS:0000025 nan N nan
source_mat_id source material identifiers A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14' , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2). for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier {text} MPI012345 nucleic acid sequence source M nan m MIXS:0000026 Definition update: For ancient samples, this is where, for example, the archaeological/museum collection ID goes. May duplicate samp_name. N nan
specific_host host scientific name Report the host's taxonomic name and/or NCBI taxonomy ID. host scientific name, taxonomy ID {text}|{NCBI taxid} Homo sapiens and/or 9606 nucleic acid sequence source C nan 1 MIXS:0000029 Condition: for host(-associated) samples only N nan
host_disease_stat host disease status List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text disease name or Disease Ontology term {termLabel} {[termID]}|{text} rabies [DOID:11260] nucleic acid sequence source X nan m MIXS:0000031 nan N nan
samp_collec_method sample collection method The method employed for collecting the sample. PMID, DOI, URL, or text {PMID}|{DOI}|{URL}|{text} swabbing nucleic acid sequence source M nan 1 MIXS:0001225 Have 'novel' as a selection option. Y [JAFY] Doesn't 'free text' count for this?
samp_mat_process sample material processing A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed. text {text} filtering of seawater, storing samples in ethanol nucleic acid sequence source ??? nan 1 MIXS:0000016 nan N nan
size_frac size fraction selected Filtering pore size used in sample preparation filter size value range {float}-{float} {unit} 0-0.22 micrometer nucleic acid sequence source ??? nan 1 MIXS:0000017 nan N nan
samp_size amount or size of sample collected The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected. measurement value {float} {unit} 5 liter nucleic acid sequence source ??? milliliter, gram, milligram, liter 1 MIXS:0000001 nan N nan
samp_vol_we_dna_ext sample volume or weight for DNA extraction Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001). measurement value {float} {unit} 1500 milliliter nucleic acid sequence source M milliliter, gram, milligram, square centimeter 1 MIXS:0000111 nan N nan
virus_enrich_appr virus enrichment approach List of approaches used to enrich the sample for viruses, if any enumeration [filtration|ultrafiltration|centrifugation|ultracentrifugation|PEG Precipitation|FeCl Precipitation|CsCl density gradient|DNAse|RNAse|targeted sequence capture|other|none] filtration + FeCl Precipitation + ultracentrifugation + DNAse nucleic acid sequence source ??? nan 1 MIXS:0000036 nan N nan
nucl_acid_ext nucleic acid extraction A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample PMID, DOI or URL {PMID}|{DOI}|{URL} https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf sequencing M nan 1 MIXS:0000037 Have 'novel' as a selection option. Y [JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists
nucl_acid_amp nucleic acid amplification A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids PMID, DOI or URL {PMID}|{DOI}|{URL} https://phylogenomics.me/protocols/16s-pcr-protocol/ sequencing ??? nan 1 MIXS:0000038 Have 'novel' as a selection option. Y [JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists
lib_reads_seqd library reads sequenced Total number of clones sequenced from the library number of reads sequenced {integer} 20 sequencing ??? nan 1 MIXS:0000040 Definition update: Total number of reads sequenced from the library N nan
lib_layout library layout Specify whether to expect single, paired, or other configuration of reads enumeration [paired|single|vector|other] paired sequencing M nan 1 MIXS:0000041 nan N nan
lib_screen library screening strategy Specific enrichment or screening methods applied before and/or after creating libraries screening strategy name {text} enriched, screened, normalized sequencing ??? nan 1 MIXS:0000043 Definition update: Suggest splitting screening and enrichment, as these mean different things (at least in aDNA) N nan
target_gene target gene Targeted gene or locus name for marker gene studies gene name {text} 16S rRNA, 18S rRNA, nif, amoA, rpo sequencing X nan 1 MIXS:0000044 nan N nan
target_subfragment target subfragment Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA gene fragment name {text} V6, V9, ITS sequencing X nan 1 MIXS:0000045 nan N nan
pcr_primers pcr primers PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters FWD: forward primer sequence;REV:reverse primer sequence FWD:{dna};REV:{dna} FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT sequencing X nan 1 MIXS:0000046 Definition update: Add in 5' to 3' orientation. N nan
mid multiplex identifiers Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters multiplex identifier sequence {dna} GTGAATAT sequencing ??? nan 1 MIXS:0000047 Definition update: Specify if single or dual tagged/indexed (following similar structure as adapters?) N nan
adapters adapters Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters adapter A and B sequence {dna};{dna} AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT sequencing ??? nan 1 MIXS:0000048 nan N nan
pcr_cond pcr conditions Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...' initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35 sequencing ??? nan 1 MIXS:0000049 Definition update: Targeted PCR or library PCR?; add a DOI requirement N nan
seq_meth sequencing method Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103). Text or OBI {termLabel} {[termID]}|{text} 454 Genome Sequencer FLX [OBI:0000702] sequencing M nan 1 MIXS:0000050 nan N nan
chimera_check chimera check software Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences. name and version of software, parameters used {software};{version};{parameters} uchime;v4.1;default parameters sequencing C nan 1 MIXS:0000052 nan N nan
tax_ident taxonomic identity marker The phylogenetic marker(s) used to assign an organism name to the SAG or MAG enumeration [16S rRNA gene|multi-marker approach|other] other: rpoB gene sequencing C nan 1 MIXS:0000053 nan N nan
assembly_qual assembly quality The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated enumeration [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)] High-quality draft genome sequencing C nan 1 MIXS:0000056 Condition: Conditional on assembly performed - which type of assembly applies here? N nan
assembly_name assembly name Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community name and version of assembly {text} {text} HuRef, JCVI_ISG_i3_1.0 sequencing C nan 1 MIXS:0000057 Condition: Conditional on assembly performed - which type of assembly applies here? N nan
assembly_software assembly software Tool(s) used for assembly, including version number and parameters name and version of software, parameters used {software};{version};{parameters} metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise sequencing C nan 1 MIXS:0000058 Condition: Conditional on assembly performed - which type of assembly applies here? N nan
annot annotation Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter name of tool or pipeline used, or annotation source description {text} prokka sequencing X nan 1 MIXS:0000059 nan N nan
number_contig number of contigs Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG value {integer} 40 sequencing X nan 1 MIXS:0000060 nan N nan
feat_pred feature prediction Method used to predict UViGs features such as ORFs, integration site, etc. names and versions of software(s), parameters used {software};{version};{parameters} Prodigal;2.6.3;default parameters sequencing X nan 1 MIXS:0000061 nan N nan
ref_db reference database(s) List of database(s) used for ORF annotation, along with version number and reference to website or publication names, versions, and references of databases {database};{version};{reference} pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975 sequencing X nan 1 MIXS:0000062 nan N nan
sim_search_meth similarity search method Tool used to compare ORFs with database, along with version and cutoffs used names and versions of software(s), parameters used {software};{version};{parameters} HMMER3;3.1b2;hmmsearch, cutoff of 50 on score sequencing X nan 1 MIXS:0000063 nan N nan
tax_class taxonomic classification Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes classification method, database name, and other parameters {text} vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters) sequencing X nan 1 MIXS:0000064 nan N nan
16s_recover 16S recovered Can a 16S gene be recovered from the submitted SAG or MAG? boolean {boolean} yes sequencing X nan 1 MIXS:0000065 nan N nan
16s_recover_software 16S recovery software Tools used for 16S rRNA gene extraction names and versions of software(s), parameters used {software};{version};{parameters} rambl;v2;default parameters sequencing X nan 1 MIXS:0000066 nan N nan
trnas number of standard tRNAs extracted The total number of tRNAs identified from the SAG or MAG value from 0-21 {integer} 18 sequencing X nan 1 MIXS:0000067 nan N nan
trna_ext_software tRNA extraction software Tools used for tRNA identification names and versions of software(s), parameters used {software};{version};{parameters} infernal;v2;default parameters sequencing X nan 1 MIXS:0000068 nan N nan
compl_score completeness score Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores quality;percent completeness [high|med|low];{percentage} med;60% sequencing C nan 1 MIXS:0000069 Condition: Conditional on assembly performed N nan
compl_software completeness software Tools used for completion estimate, i.e. checkm, anvi'o, busco names and versions of software(s) used {software};{version} checkm sequencing C nan 1 MIXS:0000070 Condition: Conditional on assembly performed N nan
compl_appr completeness approach The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome text [marker gene|reference based|other] other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83) sequencing C nan 1 MIXS:0000071 Condition: Conditional on assembly performed N nan
contam_score contamination score The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for; High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases value {float} percentage 1% sequencing C nan 1 MIXS:0000072 Condition: Conditional on contamination estimation N nan
contam_screen_input contamination screening input The type of sequence data used as input enumeration [reads|contigs] contigs sequencing X nan 1 MIXS:0000005 nan N nan
contam_screen_param contamination screening parameters Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, i.e. kmer and coverage, or reference database and kmer enumeration;value or name [ref db|kmer|coverage|combination];{text|integer} kmer sequencing X nan 1 MIXS:0000073 nan N nan
decontam_software decontamination software Tool(s) used in contamination screening enumeration [checkm/refinem|anvi'o|prodege|bbtools:decontaminate.sh|acdc|combination] anvi'o sequencing X nan 1 MIXS:0000074 nan N nan
sort_tech sorting technology Method used to sort/isolate cells or particles of interest enumeration [flow cytometric cell sorting|microfluidics|lazer-tweezing|optical manipulation|micromanipulation|other] optical manipulation sequencing C nan 1 MIXS:0000075 nan N nan
single_cell_lysis_appr single cell or viral particle lysis approach Method used to free DNA from interior of the cell(s) or particle(s) enumeration [chemical|enzymatic|physical|combination] enzymatic sequencing C nan 1 MIXS:0000076 Use a non-single cell version?; DOI/PMID for custom protocols Y [JAFY] Isn't this mutually exclusive
single_cell_lysis_prot single cell or viral particle lysis kit protocol Name of the kit or standard protocol used for cell(s) or particle(s) lysis kit, protocol name {text} ambion single cell lysis kit sequencing C nan 1 MIXS:0000054 Use a non-single cell version? Y [JAFY] Isn't this mutually exclusive
wga_amp_appr WGA amplification approach Method used to amplify genomic DNA in preparation for sequencing enumeration [pcr based|mda based] mda based sequencing C nan 1 MIXS:0000055 Use a non-single cell version?; DOI/PMID for custom protocols Y [JAFY] Isn't this mutually exclusive
wga_amp_kit WGA amplification kit Kit used to amplify genomic DNA in preparation for sequencing kit name {text} qiagen repli-g sequencing C nan 1 MIXS:0000006 Use a non-single cell version? Y [JAFY] Isn't this mutually exclusive
bin_param binning parameters The parameters that have been applied during the extraction of genomes from metagenomic datasets enumeration [homology search|kmer|coverage|codon usage|combination] coverage and kmer sequencing C nan 1 MIXS:0000077 nan N nan
bin_software binning software Tool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used. names and versions of software(s) used {software};{version}{PID} MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin) sequencing C nan 1 MIXS:0000078 nan N nan
reassembly_bin reassembly post binning Has an assembly been performed on a genome bin extracted from a metagenomic assembly? boolean {boolean} no sequencing C nan 1 MIXS:0000079 nan N nan
mag_cov_software MAG coverage software Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets enumeration [bwa|bbmap|bowtie|other] bbmap sequencing C nan 1 MIXS:0000080 nan N nan
vir_ident_software viral identification software Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used software name, version and relevant parameters {software};{version};{parameters} VirSorter; 1.0.4; Virome database, category 2 sequencing C nan 1 MIXS:0000081 nan N nan
pred_genome_type predicted genome type Type of genome predicted for the UViG enumeration [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized] dsDNA sequencing C nan 1 MIXS:0000082 nan N nan
pred_genome_struc predicted genome structure Expected structure of the viral genome enumeration [segmented|non-segmented|undetermined] non-segmented sequencing C nan 1 MIXS:0000083 Condition: Ancient virus genome only N nan
detec_type detection type Type of UViG detection enumeration [independent sequence (UViG)|provirus (UpViG)] independent sequence (UViG) sequencing C nan 1 MIXS:0000084 Condition: Ancient virus genome only N nan
otu_class_appr OTU classification approach Cutoffs and approach used when clustering “species-level” OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis cutoffs and method used {ANI cutoff};{AF cutoff};{clustering method} 95% ANI;85% AF; greedy incremental clustering sequencing C nan 1 MIXS:0000085 Condition: Ancient virus genome only N nan
otu_seq_comp_appr OTU sequence comparison approach Tool and thresholds used to compare sequences when computing "species-level" OTUs software name, version and relevant parameters {software};{version};{parameters} blastn;2.6.0+;e-value cutoff: 0.001 sequencing C nan 1 MIXS:0000086 Condition: Ancient virus genome only N nan
otu_db OTU database Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if any database and version {database};{version} NCBI Viral RefSeq;83 sequencing C nan 1 MIXS:0000087 Condition: Ancient virus genome only N nan
host_pred_appr host prediction approach Tool or approach used for host prediction enumeration [provirus|host sequence similarity|CRISPR spacer match|kmer similarity|co-occurrence|combination|other] CRISPR spacer match sequencing C nan 1 MIXS:0000088 Condition: Ancient virus genome only N nan
host_pred_est_acc host prediction estimated accuracy For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature false discovery rate {text} CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048) sequencing C nan 1 MIXS:0000089 Condition: Ancient virus genome only N nan
associated resource relevant electronic resources A related resource that is referenced, cited, or otherwise associated with the sequence. reference to resource {PMID}|{DOI}|{URL} http://www.earthmicrobiome.org/ sequencing C nan m MIXS:0000091 nan N nan
sop relevant standard operating procedures Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences reference to SOP {PMID}|{DOI}|{URL} http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/ sequencing C nan m MIXS:0000090 nan N nan
cultural_era nan The cultural era approximating the period in which the individual lived, taken from https://chronontology.dainst.org/ or PeriodO. Specify when no value for 'sample_age' is present, or in addition to it. ChronOntology or PeriodO term; text {termLabel} {[termID]}|{text} Copper Age [ChronOntology: NW6hofAScJSE] investigation C nan m MIXS:XXXXXXX Question: switch to free-text? Conditional to sample_age Y nan
dna_extraction_date nan The date when the nucleic acids were extracted from the sample material. In case no exact time is available, the date can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant date {timestamp} 2015 nucleic acid sequence source M year 1 MIXS:XXXXXXX nan N nan
recovery_date nan Date of excavation or retrieval from burial or depositional context, if known date {timestamp} 1930 investigation X date 1 MIXS:XXXXXXX nan N nan
sample_age nan The approximate date at which the individual lived and died, or at which the sample was exposed to the surface. Typically inferred from archaeological material, or biological material associated with a sediment, with radiocarbon dating and other chronometric methods. Should be the midpoint of the calibrated radiocarbon age. value from BP (1950) {integer} 123440 investigation C cal BP m MIXS:XXXXXXX nan N nan
sample_age_inference_methods nan The method used to infer the sample age. Method (14C, OSL, etc.) and associated information (lab code, etc.). An enumerated list with age-depth model, correlative (relative dating), annual lamination, pollen records, diatom records, physical observation, etc. enumeration ??? C14 investigation C nan m MIXS:XXXXXXX nan Y [JAFY/MS] How to define, very heterogeneous, maybe split: method/lab code
site_type nan The type of site from which the sediment cores were taken. E.g. ocean, marine, freshwater, brackish, ice caves, caves. text {text} cave environment E nan 1 MIXS:XXXXXXX nan Y [JAFY] Can this be expanded to non-sediment-like sites? Open air burials? Is there an overlap here with env_*_scale? Otherwise I think this would go to the environmental_packages
damage_treatment nan Indication of whether characteristic ancient DNA damage has been removed in a laboratory enumeration [none|partial-udg|full-udg|enriched|other] none sequencing M nan 1 MIXS:XXXXXXX nan N [JAFY] what if people upload BAMs of merged multiple libraries
experimental_procedures nan Provide a DOI to refer to the paper where the procedure is explained in more detail PMID, DOI or URL {PMID}|{DOI}|{URL} nan sequencing X nan 1 MIXS:XXXXXXX nan Y [JAFY] What procedures does this refer to? Each paper will have many protocols to cite for extraction, library construction, reconditioning, etc. Duplicate of SOP?
lib_concentration nan Concentration of library in copies per µl, as inferred by qPCR. integer {integer} 123000000 sequencing X copies/µl 1 MIXS:XXXXXXX nan N nan
lib_index_polymerase nan The name of polymerase enzyme used to index DNA libraries text {text} Agilent PfuTurbo Cx HotStart sequencing C nan 1 MIXS:XXXXXXX nan Y [JAFY] Condition on what? And would including the SKU be useful too (as it's a more stable code)?
lib_preparation_protocol nan Citation(s) for the DNA library preparation protocol text {text} Meyer and Kircher, 2010 sequencing C nan 1 MIXS:XXXXXXX nan Y [JAFY] Condition on what?
lib_reamplification_polymerase nan The name of polymerase enzyme used for reamplifying DNA libraries text {text} KAPA HiFi HotStart Uracil+ sequencing C nan 1 MIXS:XXXXXXX nan Y [JAFY] Condition on what?
lib_type library type The type of library created: amplicon-based or non-amplicon-based. An amplicon-based library results in rather short DNA fragments, while non-amplicon-based refers to a non-targeted approach. enumeration [shotgun|amplicon] shotgun sequencing M nan 1 MIXS:XXXXXXX nan Y [JAFY] I'm not sure about the definition of amplicon here, it could be confusing - 'natural' aDNA is normally short, maybe need to rather change the definition to specify using primers targeting specific regions of the genome instead
neg_cont_status negative control status Specify whether the sample is a negative control or not. negative control status {boolean} nan investigation C nan 1 MIXS:XXXXXXX nan Y [JAFY] Wouldn't this be mandatory if it's boolean? A sample is either a negative control or not ... Would this also potentially not apply to all other tables? [AFG/BM] The problem with negative controls is that they come from different batches, so they become hard to control. (M). Issue: in paleometagenomics fewer negative controls are used, as there is more reliance on damage patterns; this is not the case for sedaDNA, where negative controls should be mandatory.
num_capture_reamp_cycles nan Number of amplification cycles after capture enrichment number of amplification cycles {integer} 10 sequencing C nan 1 MIXS:XXXXXXX Condition: if performed N nan
num_reamp_cycles number of reamplification cycles Number of amplification cycles after library indexing PCR number of amplification cycles {integer} 8 sequencing C nan 1 MIXS:XXXXXXX Condition: if performed N nan
preservational_treatment preservation treatment Description of any treatment applied to samples for the purpose of maximising collection preservation that may influence downstream DNA recovery or library construction, such as storage fluid or reconstructive glue text {text} stored in formalin nucleic acid sequence source C nan 1 MIXS:XXXXXXX Condition: if present N nan
sample_alt_lab_ids alternative sample IDs Any alternative sample IDs used by the research group publishing the paper or by other groups, if known. text {text} ABC_24 investigation X nan 1 MIXS:XXXXXXX nan N nan
samp_decontam_pretreat sample decontamination pretreatment Method(s) employed to remove external modern DNA from sample surfaces, i.e. the treatment used on the samples. Depends on the sample type. More relevant for bones than environmental samples. E.g. buffers, EDTA, etc. PMID, DOI or URL {PMID}|{DOI}|{URL}|{text} EDTA wash, 10.17504/protocols.io.bidyka7w sequencing X nan m MIXS:XXXXXXX nan N nan
prev_pubs previous publications Any publications that report data from the same body/skeleton/individual text {DOI}|{URL}|{PMID} 10.1126/science.7761839 investigation X nan m MIXS:XXXXXXX nan Y [AG/PH] Just DNA data, or also contextual/archaeological publications?
collection_context_name context name where location was collected Name of where sample originated and is typically stored. Typically will be 'owning institution' text {text} Natural History London environment X nan 1 MIXS:XXXXXXX nan Y [AG/PH] e.g. Museum, community, field (want to avoid assumption that institutions instead of communities control samples)
ethical_authority ethical authority Name of the authority or institution that awarded sampling and analysis (e.g. human remains) and/or export permission (e.g. animal remains) text {text} Federal Foreign Office (Germany) investigation C nan m MIXS:XXXXXXX nan Y [JAFY] What is the condition on?
ethical_date date of ethical approval Date of award of ethical/export permission. The date can be right truncated i.e. all of these are valid times: 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant date {timestamp} 2018-05-11T10:00:00+01:00; 2018-05-11 investigation C nan m MIXS:XXXXXXX nan Y [JAFY] What is the condition on?
ethical_id ethical permit/approval ID The permissions code or ID provided by the authority associated with approval of the analysis of this particular sample text {text} DE-123-JK investigation C nan m MIXS:XXXXXXX nan Y [JAFY] What is the condition on?
storage_conditions conditions of sample storage General conditions in which the sample was stored in long-term collection storage that may have influenced DNA recovery or library construction. For example, specify temperature, humidity, presence of microbial overgrowth, etc. text {text} Climate-controlled environment X nan 1 MIXS:XXXXXXX nan N nan
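The value syntaxes in the table above can be checked mechanically before submission. The following is a minimal sketch, not part of MInAS or MIxS: it assumes that a `{timestamp}` value takes one of the right-truncated shapes listed in the `dna_extraction_date` definition, and that `damage_treatment` accepts only the enumeration values shown in its row. The function names are hypothetical, and the pattern checks only the shape of a value, not whether the calendar date exists.

```python
import re

# Shapes listed in the dna_extraction_date definition:
# 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008
# Each later component is optional only if everything after it is also absent
# (i.e. the value is "right truncated").
TIMESTAMP_RE = re.compile(
    r"^\d{4}"                # year
    r"(-\d{2}"               # optional month
    r"(-\d{2}"               # optional day
    r"(T\d{2}:\d{2}:\d{2}"   # optional time of day
    r"([+-]\d{2}:\d{2})?"    # optional UTC offset
    r")?)?)?$"
)

# Enumeration copied from the damage_treatment row.
DAMAGE_TREATMENT = {"none", "partial-udg", "full-udg", "enriched", "other"}


def is_valid_timestamp(value: str) -> bool:
    """Return True if value has one of the right-truncated timestamp shapes."""
    return TIMESTAMP_RE.match(value) is not None


def is_valid_damage_treatment(value: str) -> bool:
    """Return True if value is one of the listed damage_treatment terms."""
    return value in DAMAGE_TREATMENT
```

For example, `is_valid_timestamp("2015")` accepts the example value given for `dna_extraction_date`, while a value such as `23/01/2008` is rejected because it does not follow the year-first, hyphen-separated layout.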
Environmental package Structured comment name Package item Definition Expected value Value syntax Example Requirement Preferred unit Occurrence MIXS ID Modification Suggestion Requires Further Discussion Reason for further discussion Completed
host-associated environment burial_context Description of the burial context from which sampled individuals were recovered enumeration [primary inhumation|secondary inhumation|multiple inhumation|commingled assemblage|disarticulated remains|information not available] multiple inhumation X nan 1 MIXS:XXXXXXX N Y Is there some form of ontology for such a thing? nan
host-associated environment host_biological_sex Biological sex of the host individual enumeration [male|female|other|unknown] female X nan 1 MIXS:XXXXXXX N N nan nan
human-oral environment samp_sample_site Specific location in oral cavity enumeration {termLabel} {[termID]} maxilla left 3rd molar [UBERON:0002535] X nan m MIXS:XXXXXXX N N nan nan
sediment environment sediment_type The sediment type. E.g. silt, clay, organic, porous. text {text} cave X nan 1 MIXS:XXXXXXX N Y [PH/AI] Very important for molecular scientists to be able to distinguish between the different sediment types; perhaps a blog post or workshop for the sedaDNA society from a sedimentologist nan
sediment environment sedimentation_rate The sedimentation rate calculated from age-depth models. value {integer} ??? X nan ??? MIXS:XXXXXXX N Y [???] but with NA as it's not always available. nan
sediment sampling coring_system Specify whether the sampling was with open or closed system. An open system is where no corers are used and the samples are exposed, e.g. in caves. A closed system is where a corer is used and the sampling procedure is ‘controlled’. enumeration [open|closed] closed X nan 1 MIXS:XXXXXXX N N nan nan
sediment sampling depths The depths from which the sediments were taken. This depends on the source of sediments. E.g. water depths. value {integer} ??? X cm 1 MIXS:XXXXXXX N N [???] Should influence microbial activity (M) But include NA in the options as not applicable for all samples. nan
sediment sampling stratigraphic_horizon The depths at which the samples were taken from the sediment core. E.g. for a one meter long core, the sample corresponds to a depth of 50 cm; for caves, to the depth from the surface. (M) value {integer} ??? X ??? 1 MIXS:XXXXXXX N Y [JAFY] the description here seems to correspond to depths, I think this needs to be revised nan
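Several rows use the `{termLabel} {[termID]}` value syntax, for example `samp_sample_site`. As a rough illustration of how such a value could be split into its label and ontology identifier, the sketch below assumes the identifier is a `PREFIX:ID` pair without spaces inside square brackets at the end of the value, as in the `samp_sample_site` example. The `OntologyTerm` type and the `parse_term` function are hypothetical names, and the sketch does not check that the identifier exists in the ontology.

```python
import re
from typing import NamedTuple, Optional


class OntologyTerm(NamedTuple):
    label: str
    term_id: str


# A label, optional whitespace, then "[PREFIX:ID]" closing the value.
TERM_RE = re.compile(
    r"^(?P<label>.+?)\s*\[(?P<term_id>[^\[\]:\s]+:[^\[\]\s]+)\]$"
)


def parse_term(value: str) -> Optional[OntologyTerm]:
    """Split '{termLabel} [termID]' into its parts, or return None if malformed."""
    match = TERM_RE.match(value.strip())
    if match is None:
        return None
    return OntologyTerm(match["label"], match["term_id"])
```

Applied to the `samp_sample_site` example, `parse_term("maxilla left 3rd molar [UBERON:0002535]")` yields the label `maxilla left 3rd molar` and the identifier `UBERON:0002535`; a free-text value without a bracketed identifier yields `None`.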