Codility Genomic Range Query

Question 1

Ah, I was working on the same thing and it took me a long~~~ time to debug, but i managed to get 100/100 in the end.

For example, when S='AGT', and P=[1], Q=[2], the function should return 3 for G, but yours (and mine originally) will return 4 for T

I think this will fix it:

if l > 0: pre_sum = [sol[0][l-1],sol[1][l-1],sol[2][l-1],sol[3][l-1]]

Question 2

This works 100/100 too

def solution(S, P, Q):
    res = []
    for i in range(len(P)):
        if 'A' in S[P[i]:Q[i]+1]:
            res.append(1)
        elif 'C' in S[P[i]:Q[i]+1]:
            res.append(2)
        elif 'G' in S[P[i]:Q[i]+1]:
            res.append(3)
        else:
            res.append(4)
    return res

Question 3

Score 100/100 O(N+M) algorithm without any tricks with language specific implementation of in or contains operators:

Lets define prefix as:
 * last index of particular nucleone before on in current position. If no prev occcurance put -1.
 * 
 * 
 * indexes:     0   1   2   3   4   5   6
 * factors:     2   1   3   2   2   4   1
 *              C   A   G   C   C   T   A
 *              
 * prefix : A  -1   1   1   1   1   1   6
 *          C   0   0   0   3   4   4   4
 *          G  -1  -1   2   2   2   2   2
 *          T  -1  -1  -1  -1  -1   5   5
 *
 * Having such defined prefix let us easily calculate answer question of minimal factor in following way:
 * subsequence S[p]S[p+1]...S[q-1]S[q] has the lowest factor:
 * 1 if prefix index [A][q] >= p
 * 2 if prefix index [C][q] >= p
 * 3 if prefix index [G][q] >= p
 * 4 if prefix index [T][q] >= p

My implementation of this idea

Question 4

If someone is still interested in this exercise, I share my Python solution (100/100 in Codility)

def solution(S, P, Q):

    count = []
    for i in range(3):
        count.append([0]*(len(S)+1))

    for index, i in enumerate(S):
        count[0][index+1] = count[0][index] + ( i =='A')
        count[1][index+1] = count[1][index] + ( i =='C')
        count[2][index+1] = count[2][index] + ( i =='G')

    result = []

    for i in range(len(P)):
      start = P[i]
      end = Q[i]+1

      if count[0][end] - count[0][start]:
          result.append(1)
      elif count[1][end] - count[1][start]:
          result.append(2)
      elif count[2][end] - count[2][start]:
          result.append(3)
      else:
          result.append(4)

    return result

Question 5

We can calculate distances from the current position (i=0,1,...,N-1) to the nearest previous nucleotide for each type of nucleotide, where all previous nucleotides and the current one (at current position) are considered.

The distances array pre_dists will be something like:

    |   C   A    G    C    C    T    A  |
----|-----------------------------------|
 A  |  -1   0    1    2    3    4    0  |
 C  |   0   1    2    0    0    1    2  |
 G  |  -1  -1    0    1    2    3    4  |
 T  |  -1  -1   -1   -1   -1    0    1  |

Based on this distance data, I can get the minimal impact factor for any slice.

My implementation in Python:

def solution(S, P, Q):
    
    N = len(S)
    M = len(P)

    # impact factors
    I = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    
    # distance from current position to the nearest nucleotide
    # for each nucleotide type (previous or current nucleotide are considered) 
    # e.g. current position is 'A' => the distance dist[0] = 0, index 0 for type A
    #                          'C' => the distance dist[1] = 0, index 1 for type C
    pre_dists = [[-1]*N,[-1]*N,[-1]*N,[-1]*N]

    # initial values
    pre_dists[I[S[0]]-1][0] = 0

    for i in range(1, N):
        
        for t in range(4):
            if pre_dists[t][i-1] >= 0:
                # increase the distances
                pre_dists[t][i] = pre_dists[t][i-1] + 1

        # reset distance for current nucleotide type
        pre_dists[I[S[i]]-1][i] = 0
    
    # result keeper
    res = [0]*M

    for k in range(M):
        p = P[k]
        q = Q[k]

        if pre_dists[0][q] >=0 and q - pre_dists[0][q] >= p:
            res[k] = 1
        elif pre_dists[1][q] >=0 and q - pre_dists[1][q] >= p:
            res[k] = 2
        elif pre_dists[2][q] >=0 and q - pre_dists[2][q] >= p:
            res[k] = 3
        else:
            res[k] = 4
    
    return res

I hope this helped. Thanks!

Question 6

100/100

def solution(S,P,Q):
    d = {"A":0,"C":1,"G":2,"T":3}
    n = len(S)
    pref = [[0,0,0,0]]*(n+1)
    for i in range(0,n):
        pref[i] = [x for x in pref[i-1]]
        pref[i][d[S[i]]] += 1
    lst = []
    for i in range(0,len(P)):
        if Q[i] == P[i]:
            lst.append(d[S[P[i]]]+1)
        else:
            x = 0
            while x < 4:
                if pref[Q[i]][x] - pref[P[i]-1][x] > 0:
                    lst.append(x+1)
                    break
                x += 1
    return lst

Question 7

100% for Python3.6:

def solution(S, P, Q):

    NUCLEOTIDES = 'ACGT'
    IMPACTS = {nucleotide: impact for impact, nucleotide in enumerate(NUCLEOTIDES, 1)}

    result = []

    for query in range(len(P)):
        sample = S[P[query]:Q[query]+1]

        for nucleotide, impact in IMPACTS.items():
            if nucleotide in sample:
                result.append(impact)
                break

    return result

Question 8

I found excellent result on GenomicRangeQuery that scores 100%.

def solution(s,p,q):
    n = len(p)
    r = [0]*n

    for i in range(n):
        pi=p[i]
        qi=q[i]+1
        ts=s[pi:qi]
        if 'A' in ts:
            r[i]=1
        elif 'C' in ts:
            r[i]=2
        elif 'G' in ts:
            r[i]=3
        elif 'T' in ts:
            r[i]=4
    return r

s,p,q = 'CAGCCTA', [2, 5, 0], [4, 5, 6]
solution(s,p,q)

Question 9

This is my solution with O(N+M) time complexity

def solution(S, P, Q):

    ans = [0] * len(P)

    for i in range(0, len(P)):

        nucleotide = S[P[i]: Q[i]+1]

        if 'A' in nucleotide:
            ans[i] = 1
        elif 'C' in nucleotide:
             ans[i] = 2
        elif 'G' in nucleotide:
             ans[i] = 3
        elif 'T' in nucleotide:
            ans[i] = 4
    
    return ans