sequence


Description

Each row of the sequence table represents a viral sequence of either DNA or RNA. In both cases, sequences are written with the nucleotide letters A, C, G, T.

Columns

Column Type Size Nulls Auto Default Children Parents Comments
sequence_id serial 10 nextval('sequence_sequence_id_seq'::regclass)
  Children:
    annotation.sequence_id annotation_sequence_id_fkey R
    annotation_sequence.sequence_id annotation_sequence_sequence_id_fkey R
    nucleotide_sequence.sequence_id nucleotide_sequence_sequence_id_fkey R
    nucleotide_variant.sequence_id nucleotide_variant_sequence_id_fkey R
experiment_type_id int4 10 null
  Parents:
    experiment_type.experiment_type_id sequence_experiment_type_id_fkey R
virus_id int4 10 null
  Parents:
    virus.virus_id sequence_virus_id_fkey R
host_sample_id int4 10 null
  Parents:
    host_sample.host_sample_id sequence_host_sample_id_fkey R
sequencing_project_id int4 10 null
  Parents:
    sequencing_project.sequencing_project_id sequence_sequencing_project_id_fkey R
accession_id varchar 2147483647 null

Sequence identifier as extracted from the data source

alternative_accession_id varchar 2147483647 null

Alternative sequence identifier, as extracted from the original data source or from another source

strain_name varchar 2147483647 null

Name of the strain to which the sequence belongs

is_reference bool 1 null

True when the sequence is the reference sequence (from RefSeq) for the virus species, False otherwise

is_complete bool 1 null

True when the sequence is complete, False when the sequence is partial. When completeness is not available from the original source, the value is set to False if the sequence length is less than 95% of the reference sequence length; otherwise it is set to N/D, since completeness cannot be determined with the needed accuracy.
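
The fallback rule for is_complete can be sketched in Python as follows. This is only an illustration of the rule as described, not the loader's actual code: the function name infer_is_complete and its parameters are hypothetical, and representing N/D as None (null) is an assumption.

```python
from typing import Optional


def infer_is_complete(
    source_value: Optional[bool],
    length: int,
    reference_length: int,
) -> Optional[bool]:
    """Return a value for is_complete; None stands for N/D (an assumption)."""
    # The original source states completeness: keep its value.
    if source_value is not None:
        return source_value
    # Shorter than 95% of the reference length: treat as partial.
    # Integer arithmetic avoids floating-point rounding at the threshold.
    if length * 100 < 95 * reference_length:
        return False
    # Long enough, but completeness still cannot be confirmed: N/D.
    return None
```

For example, with a hypothetical reference of 1000 nucleotides, a sequence of 900 nucleotides with no source information yields False, while one of 980 nucleotides yields None.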

strand varchar 2147483647 null

Strand to which the sequence belongs (either positive or negative)

length int4 10 null

Number of nucleotides of the sequence

gc_percentage float8 17,17 null

Percentage of read G and C bases

n_percentage float8 17,17 null

Percentage of unknown bases
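
As a rough illustration of how gc_percentage and n_percentage relate to the nucleotide string, the sketch below computes both values. It assumes that both percentages are taken over the full sequence length and that only the letter N counts as an unknown base; the source does not state either detail, and the function name base_percentages is hypothetical.

```python
from typing import Tuple


def base_percentages(nucleotides: str) -> Tuple[float, float]:
    """Return (gc_percentage, n_percentage) for a nucleotide string."""
    seq = nucleotides.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = seq.count("G") + seq.count("C")
    n_count = seq.count("N")  # assumption: only N marks an unknown base
    total = len(seq)          # assumption: denominator is the full length
    return 100.0 * gc_count / total, 100.0 * n_count / total
```

With these assumptions, base_percentages("ACGTN") gives 40.0 for G and C and 20.0 for unknown bases.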

lineage varchar 2147483647 null

Sequence lineage, taken from the source (for COG-UK) or calculated with the Pangolin software, https://cov-lineages.org/pangolin.html (for other sources)

clade varchar 2147483647 null

Clade as computed by GISAID (when available)

gisaid_only bool 1 null

True if the sequence is only available via GISAID, False if it is also available in GenBank or COG-UK

Indexes

Constraint Name Type Sort Column(s)
sequence_pkey Primary key Asc sequence_id
seq__accession_id Must be unique
seq__alternative_accession_id Must be unique
seq__experiment_id Performance Asc experiment_type_id
seq__host_id Performance Asc host_sample_id
seq__seq_proj_id Performance Asc sequencing_project_id
seq__virus_id Performance Asc virus_id
sequence__is_reference__idx Performance Asc is_reference
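
The effect of the unique and performance indexes can be tried out with a small sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, keeps only three of the columns, and assumes that seq__accession_id covers the accession_id column (the index listing above does not show its column). The accession values are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real PostgreSQL table (reduced to three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence ("
    " sequence_id INTEGER PRIMARY KEY,"
    " accession_id TEXT,"
    " is_reference BOOLEAN)"
)
# Assumption: the unique index is defined on accession_id.
conn.execute("CREATE UNIQUE INDEX seq__accession_id ON sequence (accession_id)")
conn.execute("CREATE INDEX sequence__is_reference__idx ON sequence (is_reference)")

conn.execute(
    "INSERT INTO sequence (accession_id, is_reference) VALUES ('ACC0001', 1)"
)


def try_insert(accession_id: str) -> bool:
    """Insert a non-reference row; return False if the unique index rejects it."""
    try:
        conn.execute(
            "INSERT INTO sequence (accession_id, is_reference) VALUES (?, 0)",
            (accession_id,),
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert("ACC0001")  # same accession_id as the first row
new_accepted = try_insert("ACC0002")        # new accession_id

# Lookup that the is_reference index is meant to speed up.
reference_rows = conn.execute(
    "SELECT accession_id FROM sequence WHERE is_reference = 1"
).fetchall()
```

In this sketch the second insert of ACC0001 is rejected by the unique index, while ACC0002 is accepted.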

Relationships