0

Given a dataframe df:

import pandas

df = pandas.DataFrame(data=[1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1], columns=['A'])

df

Out[20]: 
    A
0   1
1   0
2   0
3   1
4   1
5   1
6   0
7   1
8   0
9   1
10  1
11  1

I would like to find the start and end index of every interval of consecutive ones that is at least 3 long. In this case what I expect is (3, 5) and (9, 11).

  • Right, and your attempt was what? What went wrong? Did you shift the column? – roganjosh Apr 15 at 19:43
  • What have you tried so far? – BenT Apr 15 at 19:43
  • @gabboshow, what do you mean by interval larger than 3? – lmiguelvargasf Apr 15 at 19:43
  • @roganjosh, @BenT I've tried many things... – gabboshow Apr 15 at 19:47
2

Use the shifting cumsum trick to mark consecutive groups, then use groupby to get indices and filter by your conditions.

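# Give each run of equal consecutive values its own group id.
v = (df['A'] != df['A'].shift()).cumsum()
# For each run: does it consist only of ones, and how many rows does it span?
u = df.groupby(v)['A'].agg(['all', 'count'])
# Keep runs of ones that are at least 3 rows long.
m = u['all'] & u['count'].ge(3)

# First and last index label of every run, filtered by the mask.
df.groupby(v).apply(lambda x: (x.index[0], x.index[-1]))[m]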

A
3     (3, 5)
7    (9, 11)
dtype: object
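
To see why the shift/cumsum comparison marks consecutive groups, it can help to print the intermediate values. The following is only a sketch and not the code from the answer above: the helper name `runs_of_ones` and its `min_len` parameter are made up for illustration, and it assumes the column holds only 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1]})

# True wherever a value differs from the one before it, i.e. where a new run starts.
starts = df['A'] != df['A'].shift()

# The running total of run starts gives every row the id of the run it belongs to.
run_id = starts.cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7]


def runs_of_ones(frame, col='A', min_len=3):
    """Hypothetical helper: (first, last) index labels of runs of ones
    that span at least min_len rows."""
    s = frame[col]
    ids = (s != s.shift()).cumsum()
    out = []
    for _, run in s.groupby(ids):
        # A run holds one repeated value, so checking its first element is enough.
        if run.iloc[0] == 1 and len(run) >= min_len:
            out.append(tuple(run.index[[0, -1]].tolist()))
    return out


print(runs_of_ones(df))  # [(3, 5), (9, 11)]
```

The end index here is inclusive, matching the (3, 5) and (9, 11) asked for in the question. Raising `min_len` to 4 would return an empty list for this data, since both runs of ones are exactly 3 rows long.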
0

I don't explicitly know Pandas, but I do know Python, and took this as a small challenge:

def find_sub_in_list(my_list, sublist, greedy=True):
    """Return [start, end] pairs where sublist occurs in my_list (end is exclusive).

    With greedy=True, overlapping matches are merged into a single interval.
    """
    matches = []
    n = len(sublist)
    # Slide a window of len(sublist) over my_list and record every exact match.
    for start in range(len(my_list) - n + 1):
        if my_list[start:start + n] == sublist:
            matches.append([start, start + n])

    if not greedy:
        return matches

    # Merge each match into the previous interval when the two overlap.
    # This also keeps the last match, and handles runs of more than two
    # overlapping matches without producing duplicates.
    results = []
    for start, end in matches:
        if results and start < results[-1][1]:
            results[-1][1] = end
        else:
            results.append([start, end])
    return results


my_list = [1,1,1,0,1,1,0,1,1,1,1]

interval = 3
sublist = [1]*interval

matches = find_sub_in_list(my_list, sublist)
print(matches)  # [[0, 3], [7, 11]]; each end index is exclusive
  • Hey, thanks for the effort! – gabboshow Apr 15 at 21:15
