Columns
| Column | Type | Size | Nulls | Auto | Default | Children | Parents | Comments |
|---|---|---|---|---|---|---|---|---|
| sequence_id | serial | 10 | | √ | nextval('sequence_sequence_id_seq'::regclass) | | | |
| experiment_type_id | int4 | 10 | | | null | | | |
| virus_id | int4 | 10 | | | null | | | |
| host_sample_id | int4 | 10 | | | null | | | |
| sequencing_project_id | int4 | 10 | | | null | | | |
| accession_id | varchar | 2147483647 | | | null | | | Sequence identifier as extracted from the data source |
| alternative_accession_id | varchar | 2147483647 | √ | | null | | | Alternative sequence identifier, as extracted from the original data source or from another one |
| strain_name | varchar | 2147483647 | √ | | null | | | Name of the strain the sequence belongs to |
| is_reference | bool | 1 | | | null | | | True when the sequence is the reference one (from RefSeq) for the virus species, False otherwise |
| is_complete | bool | 1 | √ | | null | | | True when the sequence is complete, False when the sequence is partial. When completeness is not available from the original source, we set False if the sequence length is less than 95% of the reference sequence length; otherwise we set N/D, since completeness cannot be determined with the needed accuracy. |
| strand | varchar | 2147483647 | √ | | null | | | Strand to which the sequence belongs (either positive or negative) |
| length | int4 | 10 | √ | | null | | | Number of nucleotides in the sequence |
| gc_percentage | float8 | 17,17 | √ | | null | | | Percentage of read G and C bases |
| n_percentage | float8 | 17,17 | √ | | null | | | Percentage of unknown bases |
| lineage | varchar | 2147483647 | √ | | null | | | Sequence lineage derived from the source (for COG-UK) or calculated with the Pangolin software https://cov-lineages.org/pangolin.html (for other sources) |
| clade | varchar | 2147483647 | √ | | null | | | Clade as computed by GISAID (when available) |
| gisaid_only | bool | 1 | | | null | | | True if the sequence is only available via GISAID, False if it is also available in GenBank or COG-UK |

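
The derived columns described in the comments above (gc_percentage, n_percentage and the fallback rule for is_complete) can be illustrated with a minimal Python sketch. It is not the loader's actual code: the function names are hypothetical, the percentages are assumed to be on a 0-100 scale relative to the full sequence length, and N/D is assumed to be stored as NULL (`None` here).

```python
from typing import Optional


def base_percentage(sequence: str, bases: str) -> float:
    """Percentage (0-100, assumed scale) of positions whose base is in `bases`."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    hits = sum(1 for base in seq if base in bases)
    # Assumption: the denominator is the full sequence length.
    return 100.0 * hits / len(seq)


def infer_is_complete(
    source_flag: Optional[bool], length: int, reference_length: int
) -> Optional[bool]:
    """Apply the is_complete rule stated in the column comment."""
    if source_flag is not None:
        # The original source provides completeness: keep it as-is.
        return source_flag
    if length < 0.95 * reference_length:
        # Shorter than 95% of the reference length: treated as partial.
        return False
    # Otherwise completeness cannot be determined (N/D, assumed NULL).
    return None


example = "ACGTNNGGCC"  # made-up sequence for illustration only
gc_percentage = base_percentage(example, "GC")
n_percentage = base_percentage(example, "N")
```

With a reference length of 29903, `infer_is_complete(None, 28000, 29903)` yields `False`, while a length of 29800 yields `None`, because the rule never sets True on its own.
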
Indexes
| Constraint Name | Type | Sort | Column(s) |
|---|---|---|---|
| sequence_pkey | Primary key | Asc | sequence_id |
| seq__accession_id | Must be unique | ||
| seq__alternative_accession_id | Must be unique | ||
| seq__experiment_id | Performance | Asc | experiment_type_id |
| seq__host_id | Performance | Asc | host_sample_id |
| seq__seq_proj_id | Performance | Asc | sequencing_project_id |
| seq__virus_id | Performance | Asc | virus_id |
| sequence__is_reference__idx | Performance | Asc | is_reference |
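
As a rough illustration of how a "Must be unique" index differs from a "Performance" index, the sketch below builds a reduced stand-in table in an in-memory SQLite database. Several things here are assumptions rather than facts from the table above: the real database appears to be PostgreSQL (judging by the `nextval(...)::regclass` default), the table name `sequence` is inferred from `sequence_pkey`, and seq__accession_id is assumed to cover accession_id, since its column list is missing above.

```python
import sqlite3

# In-memory SQLite stand-in (assumption: the real schema lives in PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (
        sequence_id  INTEGER PRIMARY KEY,
        accession_id TEXT NOT NULL,
        is_reference BOOLEAN NOT NULL
    );
    -- Assumed column for the unique index; the source does not list it.
    CREATE UNIQUE INDEX seq__accession_id ON sequence (accession_id);
    -- Performance index: speeds up lookups, enforces nothing.
    CREATE INDEX sequence__is_reference__idx ON sequence (is_reference);
    """
)

insert = "INSERT INTO sequence (accession_id, is_reference) VALUES (?, ?)"
conn.execute(insert, ("ACC0001", 1))  # hypothetical accession value

try:
    conn.execute(insert, ("ACC0001", 0))  # same accession_id again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The unique index refuses the second row.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sequence").fetchone()[0]
```

The duplicate insert fails on the unique index, whereas repeated is_reference values are accepted because a performance index places no constraint on the data.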



