designer_dna.oligos

Common utility functions to work with and analyze oligonucleotide sequences.

Functions

complement(sequence[, dna])

Complement a nucleotide sequence.

complement_py(sequence[, dna])

Return the complement of a nucleotide sequence.

manacher(sequence[, dna])

Find the longest palindromic substring within a nucleotide sequence.

nrepeats(sequence, n)

Calculate the maximum number of tandem repeats of a composite pattern of n characters.

nrepeats_py(sequence, n)

Calculate the longest tandem run of a repeating pattern of n characters.

palindrome(sequence[, dna])

Find the longest palindromic substring within a nucleotide sequence.

palindrome_py(sequence[, dna])

Find the longest palindromic substring within a nucleotide sequence.

reverse(sequence)

Reverse a nucleotide sequence.

reverse_py(sequence)

Reverse a nucleotide sequence.

reverse_complement(sequence[, dna])

Reverse complement a nucleotide sequence.

reverse_complement_py(sequence[, dna])

Reverse complement a nucleotide sequence.

stretch(sequence)

Return the maximum length of a single letter (nucleotide) repeat in a string.

stretch_py(sequence)

Calculate the maximum stretch of a single character in a string.

designer_dna.oligos.complement(sequence, dna=True)

Complement a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – Sequence is DNA, else RNA.

Returns:

(str) Complement of a nucleotide sequence string.

Examples

complement("ATGC", True) == "TACG"
complement("ATGC", False) == "UACG"
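
The base pairing implied by the examples above can be sketched in pure Python with str.translate. This is a minimal illustration, not the library's implementation: the name complement_sketch and both lookup tables are assumptions inferred from the two documented examples, and degenerate bases and lowercase input are left out for brevity.

```python
# Hypothetical lookup tables, inferred only from the documented examples.
_DNA_TABLE = str.maketrans("ATGC", "TACG")
# In RNA mode the examples map A to U; here T and U are both read as pairing with A.
_RNA_TABLE = str.maketrans("AUTGC", "UAACG")


def complement_sketch(sequence: str, dna: bool = True) -> str:
    """Return the base-by-base complement of an uppercase sequence."""
    return sequence.translate(_DNA_TABLE if dna else _RNA_TABLE)


print(complement_sketch("ATGC", True))   # TACG
print(complement_sketch("ATGC", False))  # UACG
```

Characters missing from a table pass through str.translate unchanged, so this sketch silently keeps any symbol it does not know; the library may treat such input differently.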

designer_dna.oligos.complement_py(sequence: str, dna: bool = True) → str

Return the complement of a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – If True, treat the sequence as DNA; otherwise treat it as RNA.

Returns:

Complement of input sequence.

Return type:

(str)

Examples

complement_py("ATGC", True) == "TACG"
complement_py("ATGC", False) == "UACG"

designer_dna.oligos.manacher(sequence, dna=True)

Find the longest palindromic substring within a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – Sequence is DNA, else RNA.

Returns:

(str) Longest palindromic substring within a sequence.

Notes

  • This is a Cython/C++ implementation of Manacher’s algorithm, which runs in O(n) time.
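
For readers unfamiliar with the algorithm named in the note, the following is a textbook Manacher sketch in pure Python. It compares characters for plain equality, so it finds ordinary text palindromes; how the library adapts the comparison to complementary bases is not shown in this reference and is not reproduced here. The name manacher_plain is hypothetical.

```python
def manacher_plain(text: str) -> str:
    """Return the leftmost longest plain-text palindrome in linear time."""
    # Interleave separators so odd and even palindromes share one code path.
    padded = "#" + "#".join(text) + "#"
    size = len(padded)
    radius = [0] * size  # radius[i]: palindrome arm length around padded[i]
    center = right = 0   # center and right edge of the rightmost palindrome seen

    for i in range(size):
        if i < right:
            # Reuse the mirrored position's radius, clipped to the known edge.
            radius[i] = min(right - i, radius[2 * center - i])
        # Expand past what the mirror guarantees.
        while (
            i - radius[i] - 1 >= 0
            and i + radius[i] + 1 < size
            and padded[i - radius[i] - 1] == padded[i + radius[i] + 1]
        ):
            radius[i] += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]

    # max() keeps the first maximum, which is the leftmost longest palindrome.
    best = max(range(size), key=radius.__getitem__)
    start = (best - radius[best]) // 2
    return text[start:start + radius[best]]


print(manacher_plain("babad"))  # bab
print(manacher_plain("cbbd"))   # bb
```

Each position either reuses a mirrored radius or pushes the right edge forward, which is what bounds the total work at O(n).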

designer_dna.oligos.nrepeats(sequence, n)

Calculate the maximum number of tandem repeats of a composite pattern of n characters.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • n (int) – Size of k-mers (composite pattern) to observe.

Returns:

(int) The maximum number of consecutive tandem repeats of any pattern of length n. As the examples show, the first occurrence of the pattern is not counted.

Raises:

ZeroDivisionError – if value of n is 0.

Examples

nrepeats("AAAA", 1) == 3  #  True
nrepeats("AAAA", 2) == 1  #  True
nrepeats("ACAACAACA", 3) == 2  #  True

designer_dna.oligos.nrepeats_py(sequence: str, n: int) → int

Calculate the longest tandem run of a repeating pattern of n characters.

Parameters:
  • sequence (str) – Nucleotide string or Series of string

  • n (int) – Size of the k-mer (composite pattern) to observe.

Returns:

(int) The maximum number of consecutive repeats of any pattern of length n.

Raises:

ValueError – if n < 1.

Examples

nrepeats_py("AAAA", 1) == 3  #  True
nrepeats_py("AAAA", 2) == 1  #  True
nrepeats_py("ACAACAACA", 3) == 2  #  True
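
The counting convention in the examples above (a pattern seen k times in a row is reported as k - 1 repeats) can be sketched as a brute-force scan. This is an illustration under that reading of the examples, not the library's implementation; the name nrepeats_sketch is hypothetical, and the ValueError guard mirrors the documented behavior of nrepeats_py only.

```python
def nrepeats_sketch(sequence: str, n: int) -> int:
    """Count the most consecutive repeats of any n-character pattern."""
    if n < 1:
        raise ValueError("n must be at least 1")

    best = 0
    # Try every window of n characters as the candidate pattern.
    for start in range(len(sequence) - n + 1):
        unit = sequence[start:start + n]
        repeats = 0
        position = start + n
        # Count back-to-back copies that follow the first occurrence.
        while sequence[position:position + n] == unit:
            repeats += 1
            position += n
        best = max(best, repeats)
    return best


print(nrepeats_sketch("AAAA", 1))       # 3
print(nrepeats_sketch("AAAA", 2))       # 1
print(nrepeats_sketch("ACAACAACA", 3))  # 2
```

The scan is quadratic in the worst case, which is acceptable for short oligos; it favors readability over speed.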

designer_dna.oligos.palindrome(sequence, dna=True)

Find the longest palindromic substring within a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – Sequence is DNA, else RNA.

Returns:

(str) Longest palindromic substring within the sequence.

Examples

palindrome("ATAT") == "ATAT"
palindrome("GATATG") == "ATAT"
palindrome("ANT") == "ANT" # Handles degenerate bases

Notes

  • If a sequence contains two or more palindromic substrings of equal length, the leftmost one is returned.
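
The examples above treat a palindrome in the nucleotide sense: a substring that reads the same as its own reverse complement. A small expand-around-center sketch reproduces them, including the leftmost tie-break from the note. It is quadratic rather than linear, handles DNA mode only, and is not the library's implementation; the name palindrome_sketch and the pairing table (with N pairing with itself) are assumptions.

```python
# Hypothetical DNA pairing table; N is assumed to pair with itself.
_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def palindrome_sketch(sequence: str) -> str:
    """Return the leftmost longest reverse-complement palindrome."""
    best_start, best_length = 0, 0
    size = len(sequence)

    # Centers alternate between a base (even index) and a gap between bases.
    for center in range(2 * size - 1):
        left = center // 2
        right = left + center % 2
        # Grow outward while the outer bases are complementary.
        while (
            left >= 0
            and right < size
            and _PAIRS.get(sequence[left]) == sequence[right]
        ):
            left -= 1
            right += 1
        length = right - left - 1
        # Strict comparison keeps the earlier (leftmost) hit on ties.
        if length > best_length:
            best_start, best_length = left + 1, length

    return sequence[best_start:best_start + best_length]


print(palindrome_sketch("ATAT"))    # ATAT
print(palindrome_sketch("GATATG"))  # ATAT
print(palindrome_sketch("ANT"))     # ANT
```

A center that sits on a base only expands when that base is its own complement, which is why "ANT" qualifies while an odd-length run of ordinary bases does not. The sketch returns an empty string when nothing matches; the library's behavior in that case is not documented here.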

designer_dna.oligos.palindrome_py(sequence: str, dna: bool = True) → str

Find the longest palindromic substring within a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – If True, treat the sequence as DNA; otherwise treat it as RNA.

Returns:

Longest palindromic substring within the sequence.

Return type:

(str)

Examples

palindrome_py("ATAT") == "ATAT"
palindrome_py("GATATG") == "ATAT"

Notes

  • Algorithmic time complexity is O(N).

  • If a sequence contains two or more palindromic substrings of equal length, the leftmost one is returned.

designer_dna.oligos.reverse(sequence)

Reverse a nucleotide sequence.

Parameters:

sequence (str) – Nucleotide sequence string.

Returns:

(str) Reversed sequence string.

Examples

reverse("ATATAT") == "TATATA"
reverse("AATATA") == "ATATAA"

designer_dna.oligos.reverse_complement(sequence, dna=True)

Reverse complement a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – Sequence is DNA, else RNA.

Returns:

(str) Reverse complement of sequence string.

Examples

reverse_complement("ATGC", True) == "GCAT"
reverse_complement("ATGC", False) == "GCAU"
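
The reverse complement is the composition of the two simpler operations: complement every base, then reverse the result (or the other way round). The sketch below shows this in pure Python. It is not the library's implementation; the name reverse_complement_sketch and the lookup tables are assumptions inferred from the documented examples.

```python
# Hypothetical lookup tables keyed by the dna flag.
_TABLES = {
    True: str.maketrans("ATGC", "TACG"),     # DNA mode
    False: str.maketrans("AUTGC", "UAACG"),  # RNA mode: A pairs with U
}


def reverse_complement_sketch(sequence: str, dna: bool = True) -> str:
    """Complement each base, then reverse the string."""
    return sequence.translate(_TABLES[dna])[::-1]


print(reverse_complement_sketch("ATGC", True))   # GCAT
print(reverse_complement_sketch("ATGC", False))  # GCAU

# The two steps commute: reversing first gives the same answer.
assert "ATGC"[::-1].translate(_TABLES[True]) == reverse_complement_sketch("ATGC")
```

In DNA mode, applying the sketch twice returns the original sequence, which makes a convenient sanity check when testing strand-handling code.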

designer_dna.oligos.reverse_complement_py(sequence: str, dna: bool = True) → str

Reverse complement a nucleotide sequence.

Parameters:
  • sequence (str) – Nucleotide sequence string.

  • dna (bool) – Sequence is DNA, else RNA.

Returns:

(str) Reverse complement of sequence string.

Examples

reverse_complement_py("ATGC", True) == "GCAT"
reverse_complement_py("ATGC", False) == "GCAU"

designer_dna.oligos.reverse_py(sequence: str) → str

Reverse a nucleotide sequence.

Parameters:

sequence (str) – Nucleotide sequence string.

Returns:

(str) Reversed sequence string.

Examples

reverse_py("ATATAT") == "TATATA"
reverse_py("AATATA") == "ATATAA"

designer_dna.oligos.stretch(sequence)

Return the maximum length of a single letter (nucleotide) repeat in a string.

Parameters:

sequence (str) – Nucleotide sequence string.

Returns:

(int) Length of the longest single-letter run, counted as repeats beyond the first letter (see Examples).

Examples

stretch("ATATAT") == 0  # True
stretch("AATATA") == 1  # True
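
The examples above report a run of k identical letters as k - 1, so a sequence with no adjacent duplicates scores 0. A short sketch with itertools.groupby follows that convention. It is not the library's implementation; the name stretch_sketch is hypothetical, and returning 0 for an empty string is an assumption.

```python
from itertools import groupby


def stretch_sketch(sequence: str) -> int:
    """Return the longest single-letter run, minus the first letter."""
    # groupby yields one group per maximal run of identical characters.
    longest = max(
        (sum(1 for _ in run) for _, run in groupby(sequence)),
        default=0,
    )
    # Subtract the first occurrence; clamp so an empty input gives 0.
    return max(longest - 1, 0)


print(stretch_sketch("ATATAT"))  # 0
print(stretch_sketch("AATATA"))  # 1
print(stretch_sketch("AAAA"))    # 3
```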

designer_dna.oligos.stretch_py(sequence: str) → int

Calculate the maximum stretch of a single character in a string.

Parameters:

sequence (str) – Nucleotide sequence string.

Returns:

Maximum number of consecutive repeats of a single character observed within the sequence.

Return type:

(int)

Examples

stretch_py("AAAA") == 3
stretch_py("AATT") == 1