SequenceMatcher for multiple inputs, not just two?

https://stackoverflow.com/questions/2562893

23-09-2019

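Question

I'm wondering about the best way to approach this particular problem, and whether any libraries exist for it (Python preferred, but I can be flexible if need be).

I have a file with one string per line. I would like to find the longest common patterns and their positions in each line. I know I can use SequenceMatcher to compare lines one and two, one and three, and so on, and then correlate the results, but is there something that already does this?

Ideally these matches would appear anywhere in each line, but for starters I would be fine with them existing at exactly the same offset in each line, and go from there. Something like a compression library with a good API for accessing its string table might be ideal, but so far I have not found anything that fits that description.

For example, with these lines: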

\x00\x00\x8c\x9e\x28\x28\x62\xf2\x97\x47\x81\x40\x3e\x4b\xa6\x0e\xfe\x8b
\x00\x00\xa8\x23\x2d\x28\x28\x0e\xb3\x47\x81\x40\x3e\x9c\xfa\x0b\x78\xed
\x00\x00\xb5\x30\xed\xe9\xac\x28\x28\x4b\x81\x40\x3e\xe7\xb2\x78\x7d\x3e

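I would expect to see that 0-1 and 10-12 match in all lines at the same position, and that line1[4,5] matches line2[5,6] matches line3[7,8].

Thanks,

Solution

If all you want is to find the common substrings that sit at the same offset in every line, something like this is all you need: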

matches = []
# s1, s2, s3 are the three lines to compare; zip() stops at the shortest one.
# list() is needed on Python 3, where zip() returns an iterator without len().
zipped_strings = list(zip(s1, s2, s3))
startpos = -1
for i, (c1, c2, c3) in enumerate(zipped_strings):
  # if you're not inside a match,
  #  look for matching characters and save the match start position
  if startpos == -1 and c1 == c2 == c3:
    startpos = i
  # if you are inside a match,
  #  look for non-matching characters, save the match to matches, reset startpos
  elif startpos > -1 and not c1 == c2 == c3:
    matches.append((startpos, i, s1[startpos:i]))
    # matches will contain (startpos, endpos, matchstring) tuples,
    # where endpos is exclusive
    startpos = -1
# if you're still inside a match when you run out of string, save that match too!
if startpos > -1:
  endpos = len(zipped_strings)
  matches.append((startpos, endpos, s1[startpos:endpos]))
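The same scan can be generalized to any number of lines. The following is only a sketch under some assumptions: the function name `common_runs` is made up for illustration, and the sample lines from the question are assumed to be raw bytes rather than literal backslash-escaped text.

```python
def common_runs(lines):
    """Return (start, end, run) tuples for every stretch where all
    lines hold the same value at the same offset (end is exclusive)."""
    runs = []
    start = None
    # zip(*lines) walks the lines column by column and stops at the shortest line
    columns = list(zip(*lines))
    for i, column in enumerate(columns):
        all_same = len(set(column)) == 1
        if all_same and start is None:
            start = i                       # a new run begins here
        elif not all_same and start is not None:
            runs.append((start, i, lines[0][start:i]))
            start = None                    # the run just ended
    if start is not None:                   # run reaches the end of the shortest line
        runs.append((start, len(columns), lines[0][start:len(columns)]))
    return runs


# Sample lines from the question, assumed to be bytes
lines = [
    b"\x00\x00\x8c\x9e\x28\x28\x62\xf2\x97\x47\x81\x40\x3e\x4b\xa6\x0e\xfe\x8b",
    b"\x00\x00\xa8\x23\x2d\x28\x28\x0e\xb3\x47\x81\x40\x3e\x9c\xfa\x0b\x78\xed",
    b"\x00\x00\xb5\x30\xed\xe9\xac\x28\x28\x4b\x81\x40\x3e\xe7\xb2\x78\x7d\x3e",
]

for start, end, run in common_runs(lines):
    print(start, end, run.hex())
```

On these three lines it reports the runs at offsets 0-1 and 10-12 (as half-open ranges 0:2 and 10:13). The shifted `\x28\x28` pair is not found, because this approach only compares identical offsets.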

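To find the longest common patterns regardless of position, SequenceMatcher does sound like a good idea. But instead of comparing string1 to string2, then string1 to string3, and trying to merge the results, just get all the common substrings of string1 and string2 (with get_matching_blocks), then compare each of those results to string3 to get the matches shared by all three strings.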
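One possible reading of that suggestion is sketched below. The helper names `common_blocks` and `common_to_three` and the `min_len` parameter are hypothetical, and the input is again assumed to be bytes; only `difflib.SequenceMatcher` and its `get_matching_blocks()` method come from the standard library.

```python
from difflib import SequenceMatcher


def common_blocks(a, b, min_len=2):
    """Yield (pos_in_a, pos_in_b, size) for the blocks SequenceMatcher
    aligns between a and b, skipping blocks shorter than min_len."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.size >= min_len:           # also drops the zero-size sentinel
            yield block.a, block.b, block.size


def common_to_three(s1, s2, s3, min_len=2):
    """Return (substring, pos_in_s1, pos_in_s2, pos_in_s3) tuples."""
    results = []
    for p1, p2, size in common_blocks(s1, s2, min_len):
        piece = s1[p1:p1 + size]            # substring shared by s1 and s2
        # Compare the shared piece against the third string
        for q, p3, n in common_blocks(piece, s3, min_len):
            results.append((piece[q:q + n], p1 + q, p2 + q, p3))
    return results


s1 = b"\x00\x00\x8c\x9e\x28\x28\x62\xf2\x97\x47\x81\x40\x3e\x4b\xa6\x0e\xfe\x8b"
s2 = b"\x00\x00\xa8\x23\x2d\x28\x28\x0e\xb3\x47\x81\x40\x3e\x9c\xfa\x0b\x78\xed"
s3 = b"\x00\x00\xb5\x30\xed\xe9\xac\x28\x28\x4b\x81\x40\x3e\xe7\xb2\x78\x7d\x3e"

for sub, p1, p2, p3 in common_to_three(s1, s2, s3):
    print(sub.hex(), p1, p2, p3)
```

Keep in mind that get_matching_blocks returns a single non-overlapping, in-order alignment rather than every common substring, so patterns that repeat or appear in a different order in one of the lines can be missed. For more than three lines, the same narrowing step could be repeated against each additional line.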

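Other tips

Is performance an issue for you?

How large is your input?

Is the minimum length of a string to match 2?

Note that your example is not exact: I think the results you expect do not match the sample strings you provided.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow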