
Monday, August 6, 2012

Patch Data

This is a short script to help patch one incomplete data set with another.

Let's say we have:
 [[2,'B'],[4,'D']]
and we want to fill in any missing numbers from:
 [[1,'A'],[2,'Q'],[3,'C'],[4,'Q'],[5,'E']]
#The data
data2 = [[2,'B'],[4,'D']]
fill2=[[1,'A'],[2,'Q'],[3,'C'],[4,'Q'],[5,'E']]


def patch(data,fill,column):
    #Note: data is modified in place and also returned.
    #Each missing row is inserted at its position in fill, so the
    #result comes out in order when data and fill are both sorted by
    #that column and fill contains every key found in data.

    c=column
    #pull out the column being compared
    #----------------------------------
    data_c=[d[c] for d in data]

    #find rows in fill whose key is not in data
    #and insert them into data
    #--------------------------------------------
    #i is the position of the row within fill
    for i,f in enumerate(fill):
        if f[c] not in data_c:
            print(str(f[c])+' not in data')
            #insert missing row into data at the same position
            data.insert(i,f)
            #rebuild data_c so it includes the new row
            data_c=[d[c] for d in data]

    return data

#to patch by number
print(patch(data2,fill2,0))


Now we see that our data has been patched:
>>> 
1 not in data
3 not in data
5 not in data
[[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']]
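
The result above comes out in order because fill2 is sorted and contains every number in data2. As a rough sketch of an alternative that does not rely on that ordering and leaves the original list untouched, the rows can be keyed by the chosen column and merged through a dictionary. The name patch_by_key is hypothetical and not part of the script above.

```python
def patch_by_key(data, fill, column):
    """Return a new list: rows of data plus rows of fill whose key is missing."""
    # Index the existing rows by the value in the chosen column.
    merged = {row[column]: row for row in data}
    for row in fill:
        # setdefault keeps the row from data when the key already exists.
        merged.setdefault(row[column], row)
    # Sort by key so the result is ordered regardless of input order.
    return [merged[key] for key in sorted(merged)]


data2 = [[2, 'B'], [4, 'D']]
fill2 = [[1, 'A'], [2, 'Q'], [3, 'C'], [4, 'Q'], [5, 'E']]
print(patch_by_key(data2, fill2, 0))
# [[1, 'A'], [2, 'B'], [3, 'C'], [4, 'D'], [5, 'E']]
```

Unlike patch, this version returns a new list, so data2 can be reused for a second run without resetting it.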

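We can also patch by letter instead. patch modifies data2 in place, so we reset data2 to its original value first:
#to patch by letter
data2 = [[2,'B'],[4,'D']]
print(patch(data2,fill2,1))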

>>> 
A not in data
Q not in data
C not in data
E not in data
[[1, 'A'], [2, 'Q'], [3, 'C'], [2, 'B'], [5, 'E'], [4, 'D']]
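
The letter result is less tidy than the number result: [2, 'Q'] is added next to [2, 'B'], and [4, 'Q'] is never inserted because 'Q' is already in the data by the time it is reached. The cause is that the letter column is not a unique key in fill2. A quick check for this before patching might look like the sketch below; duplicate_keys is a hypothetical helper, not part of the script above.

```python
from collections import Counter


def duplicate_keys(rows, column):
    """Return the values that occur more than once in the given column."""
    counts = Counter(row[column] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


fill2 = [[1, 'A'], [2, 'Q'], [3, 'C'], [4, 'Q'], [5, 'E']]
print(duplicate_keys(fill2, 0))  # []
print(duplicate_keys(fill2, 1))  # ['Q']
```

An empty list suggests the column is safe to patch on; any value returned will match more than one row.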
