I dont know if this the best algorithm to solve this (I'm sure, this is not), but, I guess this is a good one. The idea is simple, start by searching for the lowest token from blob2 in blob1. When you find a match, try to see if you can match biggers tokens at this position. If this is true, update your token length.
Continue your search, from your last stop, but, at this time, searching for token with the updated token length from blob2. When you find a match, try to see if you can match biggers tokens at this position. If this is true, update your token length. Repeat the previous procedure until the end of your buffer.
Bellow is a simple fluxogram trying to explain this algorithm and, in sequence, a simple complete program, showing an implementation.
#include <algorithm>
#include <vector>
#include <iostream>
/////////////////////0123456789012345678901234567
const char str1[] = "HELLO LOOO HELOO LOOO LOO JU";
const char str2[] = "LOOO TUS";
int main()
{
std::vector<char> blob1(strlen(str1));
std::vector<char> blob2(strlen(str2));
blob1.reserve(30);
blob2.reserve(30);
std::copy(str1, str1+strlen(str1), blob1.begin());
std::copy(str2, str2+strlen(str2), blob2.begin());
auto next = blob1.begin();
auto tokenLength = 1;
auto position = -1;
while ( std::next(next, tokenLength) < blob1.end() ) {
auto current = std::search(next,
blob1.end(),
blob2.begin(),
std::next(blob2.begin(), tokenLength));
if (current == blob1.end() )
break;
position = std::distance(blob1.begin(), current);
next = std::next(current, 1);
for (auto i = tokenLength; std::next(blob2.begin(), i) < blob2.end(); ++i) {
auto x = std::search(std::next(current, i),
std::next(current, i + 1),
std::next(blob2.begin(), i),
std::next(blob2.begin(), i + 1));
if ( x != std::next(current, i) )
break;
++tokenLength;
}
}
std::cout << "Index: " << position << ", length: " << tokenLength << std::endl;
}