Repeated DNA Sequences In Java

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example, given String s = “AAAAAACCCCCAAAAAACCCCCCAAAAAAGGGGTTTT”, return: [“AAAAACCCCC”, “CCCCCAAAAA”,”CCCCAAAAAA”].

Solution:
The key to solving the problem is that each of the 4 nucleotides can be stored in 2 bits. So the 10 letter long sequence can be converted to 20 bits long integer. The following is a Java solution. You may use an example to manually execute the program and see how it works.

package com.algorithm.dna;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RepeatedDNA {

    public static void main(String[] args) {
        String s = "AAAAAACCCCCAAAAAACCCCCCAAAAAAGGGGTTTT";
        for (String output : findRepeatedDnaSequences(s)) {
            System.out.println(output);
        }
    }

    public static List<String> findRepeatedDnaSequences(String s) {
        List<String> result = new ArrayList<String>();
        int len = s.length();
        if (len < 10) {
            return result;
        }

        Map<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        Set<Integer> temp = new HashSet<Integer>();
        Set<Integer> added = new HashSet<Integer>();

        int hash = 0;
        for (int i = 0; i < len; i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash & (1 << 20) - 1;
                if (temp.contains(hash) && !added.contains(hash)) {
                    result.add(s.substring(i - 9, i + 1));
                    added.add(hash);
                } else {
                    temp.add(hash);
                }
            }
        }
        return result;
    }
}

Bitwise AND of Numbers range

Problem:
Given a range [m,n] where 0 <= m <= n <= 2147483647, return the bitwise AND of all numbers in this range, inclusive. For example, given the range [5,7], you should return 4.

Solution:
The key to solving this problem is bitwise AND consecutive number. You can use the following the example to walk through the code.

———————————————————————————
  8    4    3    2
———————————————————————————
 5 | 0 1 0 1
 6 | 0 1 1 0
 7 | 0 1 1 1
———————————————————————————

package com.algorithm.bitwise;

public class BitwiseAND {

    public static void main(String[] args) {
        System.out.println(rangeBitwiseAnd(5, 7));
    }

    public static int rangeBitwiseAnd(int m, int n) {
        while (n > m) {
            n = n & n - 1;
        }
        return m & n;
    }
}

Sum of two integers without plus operator

Problem:
Calculate the sum of two integers a and b, but you are not supposed to use + operator. For example, given a=1 and b=2, return 3.

Solution:
Given two numbers a and b, a&b returns the number formed by 1 bits on a and b. When it is left shifted by 1 bit, it is the carry.
For example, give a = 101 and b = 111(binary), a&b = 101. a&b << 1 =1010. a^b is the number formed by different bits of a and b. a&b=10.

package com.algorithm.twosum;

public class TwoSumWithoutPlusOperator {

    public static void main(String[] args) {
        System.out.println("Sum of a and b is: " + getSum(1, 2));
    }

    private static int getSum(int a, int b) {
        while (b != 0) {
            int c = a & b;
            a = a ^ b;
            b = c << 1;
        }
        return a;
    }
}

As per the above, the output is like Sum of a and b is: 3