Efficiently finding intersecting regions in two huge dictionaries

Question 1

I strongly recommend using pandas for this kind of problem.

As proof that it can be done simply with pandas:

import pandas as pd  # install this, and read the docs
from io import StringIO  # only used here to simulate files; you don't need it

#simulating reading the first file
first_file = """contig17 GRMZM2G052619_P03 x
contig33 AT2G41790.1 x
contig98 GRMZM5G888620_P01 x
contig102 GRMZM5G886789_P02 x
contig123 AT3G57470.1 x"""

#simulating reading the second file
second_file = """y GRMZM2G052619_P03 y
y GRMZM5G888620_P01 y
y GRMZM5G886789_P02 y"""

#Here is how you open the files. Instead of using StringIO,
#you simply pass the file path. Give the correct separator:
#sep="\t" for tab-separated data. Here I'm using a space.
#In names, put some relevant names for your columns.
f_df = pd.read_table(StringIO(first_file), 
                     header=None, 
                     sep=" ", 
                     names=['a', 'b', 'c'])
s_df = pd.read_table(StringIO(second_file), 
                     header=None, 
                     sep=" ", 
                     names=['d', 'e', 'f'])
#This is the hard bit. Here I am using a bit of my experience with pandas.
#Basically, it selects the rows of the second data frame whose ID (column e)
#"isin" the ID column (b) of the first data frame.
my_df = s_df[s_df.e.isin(f_df.b)]

Output:

    d   e                   f
0   y   GRMZM2G052619_P03   y
1   y   GRMZM5G888620_P01   y
2   y   GRMZM5G886789_P02   y
#you can save this with:
my_df.to_csv("result.txt", sep="\t")
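
If you also want the columns of the first file next to the matching rows, an inner join is one alternative to the `isin` filter. This is only a minimal sketch with made-up sample data mirroring the files above; the column names `a` to `f` are the same placeholders used earlier, and `pd.read_csv` is swapped in for `pd.read_table` (both accept `sep`).

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data mirroring the two files above.
first_file = StringIO(
    "contig17 GRMZM2G052619_P03 x\n"
    "contig33 AT2G41790.1 x\n"
    "contig98 GRMZM5G888620_P01 x\n"
)
second_file = StringIO(
    "y GRMZM2G052619_P03 y\n"
    "y GRMZM5G888620_P01 y\n"
)

f_df = pd.read_csv(first_file, sep=" ", header=None, names=["a", "b", "c"])
s_df = pd.read_csv(second_file, sep=" ", header=None, names=["d", "e", "f"])

# Inner join: keep only the rows whose ID appears in both frames,
# carrying the columns of both files into the result.
merged = f_df.merge(s_df, left_on="b", right_on="e", how="inner")
print(merged)
```

Unlike the `isin` filter, which only keeps rows of the second frame, the merged frame has one row per matching pair, so duplicated IDs in either file produce several rows.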

Cheers!

Question 2

This is almost the same, but wrapped in a function.

#Creates a function to do the reading for each file
import re

def read_store(file_, dictio_):
    """Given a file name and a dictionary, store each line of the file
    in the dictionary, keyed by the ID in its second column."""
    with open(file_, 'r') as file_0:
        lines_file_0 = file_0.readlines()
    for line in lines_file_0:
        #I couldn't check it against the real files, but this should
        #capture the second whitespace-separated field of the line
        match = re.match(r"\S+\s+(\S+)", line)
        if match:  #skip blank or malformed lines
            dictio_[match.group(1)] = line

To use it:

file1 = {}
read_store("file1.txt", file1)

Then compare the dictionaries as you normally do, but I would split on \s (any whitespace) instead of \t. That will also split between words, but those are easy to rejoin with " ".join(DictA[1:5]).
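
The comparison and the rejoin step could look roughly like the sketch below. The sample lines, the helper name `index_by_second_field`, and the assumption that the ID sits in the second field are all illustrative, not taken from your real files.

```python
import re

# Hypothetical lines standing in for the contents of the two files.
lines_a = [
    "contig17\tGRMZM2G052619_P03\tsome description\n",
    "contig33\tAT2G41790.1\tother text\n",
]
lines_b = [
    "y GRMZM2G052619_P03 y\n",
]

def index_by_second_field(lines):
    """Map the second whitespace-separated field of each line to its fields."""
    index = {}
    for line in lines:
        # \s+ splits on tabs and spaces alike
        fields = re.split(r"\s+", line.strip())
        if len(fields) >= 2:
            index[fields[1]] = fields
    return index

dict_a = index_by_second_field(lines_a)
dict_b = index_by_second_field(lines_b)

# IDs present in both dictionaries.
shared = sorted(dict_a.keys() & dict_b.keys())

# Rejoin the words that the whitespace split broke apart.
description = " ".join(dict_a[shared[0]][2:])
print(shared, description)
```

Intersecting the key views avoids looping over one dictionary and testing membership in the other by hand.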