API
- chame.util.seq.count_gc(sequence)
Counts the frequency of capital G and C letters in a given string or array of strings.
There is no check that only A, T, G and C are in the string.
- Args:
- sequence: str or array
A sequence or an array of sequences
- Returns:
A float if a string was provided. A numpy array if an array was provided.
- Return type
float
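The behaviour described for count_gc can be illustrated with a minimal pure-Python sketch. The name `gc_fraction` is hypothetical and this is not chame's implementation: it assumes "frequency" means the fraction of characters that are capital G or C, that an empty string yields 0.0, and it returns a plain list rather than a numpy array for multiple sequences.

```python
def gc_fraction(sequence):
    """Hypothetical stand-in for the documented behaviour, not chame's code."""
    if isinstance(sequence, str):
        if not sequence:
            return 0.0  # assumption: an empty sequence yields 0.0
        # Only capital G and C are counted; other characters are not validated.
        return sum(base in "GC" for base in sequence) / len(sequence)
    # Assumption: any non-string iterable is treated as a collection of sequences.
    return [gc_fraction(s) for s in sequence]


print(gc_fraction("GATTACA"))                 # 2 of 7 characters are G or C
print(gc_fraction(["GGCC", "ggcc", "ATNN"]))  # lowercase and N do not count
```

Because there is no alphabet check, lowercase letters and ambiguity codes such as N simply lower the result instead of raising an error.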
- chame.util.seq.sequence_to_onehot(sequence, mapping={'A': 0, 'C': 1, 'G': 2, 'T': 3}, map_unknown_to_x=False)
Maps the sequence into a one-hot encoded matrix.
Follows the interface in AlphaFold.
- Args:
- sequence:
A sequence of items, such as nucleotides.
- mapping (optional):
A dictionary mapping possible sequence items (nucleotides) to integers, { ACGT -> 0123 } by default.
- map_unknown_to_x (optional):
If True, items not in the mapping are mapped to “X”; if the mapping has no “X” entry, an error is raised. False by default.
- Returns:
A numpy array of shape (seq_len, num_unique_items) with one-hot encoding of the sequence.
- Raises:
- ValueError:
If the mapping doesn’t contain values from 0 to num_unique_items - 1 without gaps.
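The encoding and the gap check described above can be sketched in pure Python. The name `one_hot_rows` is hypothetical and this is not chame's implementation: it returns nested lists instead of a numpy array, and the exception raised when “X” is missing from the mapping (a KeyError here) is an assumption, since the documentation does not name it.

```python
def one_hot_rows(sequence, mapping=None, map_unknown_to_x=False):
    """Hypothetical stand-in; returns nested lists, not a numpy array."""
    if mapping is None:
        mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
    num_items = len(mapping)
    # The mapping values must cover 0..num_items-1 with no gaps.
    if sorted(mapping.values()) != list(range(num_items)):
        raise ValueError(
            "mapping must contain values from 0 to num_unique_items - 1 without gaps"
        )
    rows = []
    for item in sequence:
        if map_unknown_to_x and item not in mapping:
            item = "X"  # assumption: the lookup below fails with KeyError if "X" is absent
        row = [0] * num_items
        row[mapping[item]] = 1
        rows.append(row)
    return rows


print(one_hot_rows("ACGT"))

# Unknown items need an explicit "X" column in the mapping.
with_x = {"A": 0, "C": 1, "G": 2, "T": 3, "X": 4}
print(one_hot_rows("AN", mapping=with_x, map_unknown_to_x=True))
```

The result has one row per sequence item and one column per mapping entry, matching the documented shape (seq_len, num_unique_items).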
- Return type