Array Searching code challenge
-
21-08-2019 - |
Question
Here's my (code golf) challenge: Take two arrays of bytes and determine if the second array is a substring of the first. If it is, output the index at which the contents of the second array appear in the first. If you do not find the second array in the first, then output -1.
Example Input: { 63, 101, 245, 215, 0 } { 245, 215 }
Expected Output: 2
Example Input 2: { 24, 55, 74, 3, 1 } { 24, 56, 74 }
Expected Output 2: -1
Edit: Someone has pointed out that the bool is redundant, so all your function has to do is return an int representing the index of the value or -1 if not found.
Solution
J
37 characters for even more functionality than requested: it returns a list of all matching indices.
I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#))
Usage:
NB. Give this function a name i =: I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#)) NB. Test #1 245 215 i 63 101 245 215 0 2 NB. Test #2 - no results 24 56 74 i 24 55 74 3 1 NB. Test #3: matches in multiple locations 1 1 i 1 1 1 2 1 1 3 0 1 4 NB. Test #4: only exact substring matches 1 2 i 0 1 2 3 1 0 2 1 2 0 1 7
NB. list[0 to end], list[1 to end], list[2 to end], ...
<@}."0 _~i.@#
NB. Does the LHS completely match the RHS (truncated to match LHS)?
[-:#@[{.>@]
NB. boolean list of match/no match
([-:#@[{.>@])"_ 0(<@}."0 _~i.@#)
NB. indices of *true* elements
I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#))
OTHER TIPS
Common lisp:
(defun golf-code (master-seq sub-seq) (search sub-seq master-seq))
PostScript, 149 146 170 166 167 159 characters (in the "do the work" part):
% define data
/A [63 101 245 215 0] def
/S [245 215] def
% do the work
/d{def}def/i{ifelse}d/l S length 1 sub d/p l d[/C{dup[eq{pop -1}{dup S p
get eq{pop p 0 eq{]length}{/p p 1 sub d C}i}{p l eq{pop}if/p l d C}i}i}d
A aload pop C
% The stack now contains -1 or the position
Note that this find the last occurance of the subarray if it is contained more than once.
Revision history:
- Replace
false
by[[ne
andtrue
by[[eq
to save three characters - Removed a bug that could cause a false negative if the last element of
S
appears twice inA
. Unfortunately, this bugfix has 24 characters. - Made the bugfix a little cheaper, saving four chars
- Had to insert a space again because a dash is a legal character in a name. This syntax error wasn't caught because the test case didn't reach this point.
- Stopped returning the bools as the OP doesn't require them anymore. Saves 8 chars.
Explained version:
Unfortunately, the SO syntax highlighter doesn't know PostScript so readability is still limited.
/A [63 101 245 215 0] def
/S [245 215 ] def
/Slast S length 1 sub def % save the index of the last element of S,
% i.e. length-1
/Spos Slast def % our current position in S; this will vary
[ % put a mark on the bottom of the stack, we need this later.
/check % This function recursively removes values from the stack
% and compares them to the values in S
{
dup [
eq
{ % we found the mark on the bottom, i.e. we have no match
pop -1 % remove the mark and push the results
}
{ % we're not at the mark yet
dup % save the top value (part of the bugfix)
S Spos get
eq
{ % the top element of the stack is equal to S[Spos]
pop % remove the saved value, we don't need it
Spos 0
eq
{ % we are at the beginning of S, so the whole thing matched.
] length % Construct an array from the remaining values
% on the stack. This is the part of A before the match,
% so its length is equal to the position of the match.
% Hence we push the result and we're done.
}
{ % we're not at the beginning of S yet, so we have to keep comparing
/Spos Spos 1 sub def % decrease Spos
check % recurse
}
ifelse
}
{ % the top element of the stack is different from S[Spos]
Spos Slast eq {pop} if % leave the saved top value on the stack
% unless we're at the end of S, because in
% this case, we have to compare it to the
% last element of S (rest of the bugfix)
/Spos Slast def % go back to the end of S
check % recurse
}
ifelse
}
ifelse
}
def % end of the definition of check
A aload % put the contents of A onto the stack; this will also push A again,
% so we have to ...
pop % ...remove it again
check % And here we go!
C99
#include <string.h>
void find_stuff(void const * const array1, const size_t array1length, /* Length in bytes, not elements */
void const * const array2, const size_t array2length, /* Length in bytes, not elements */
char * bReturnBool,
int * bReturnIndex)
{
void * found = memmem(array1, array1length, array2, array2length);
*bReturnBool = found != NULL;
*bReturnIndex = *bReturnBool ? found - array1 : -1;
}
In shorthand, and a bit a LOT messier:
#include <string.h>
#define f(a,b,c,d,e,f) { void * g = memmem(a, b, c, d); f = (e = !!g) ? g - a : -1; }
Python 2 & 3, 73 68 58 Characters
Based on Nikhil Chelliah's answer kaiser.se's answer:
>>> t=lambda l,s:''.join(map(chr,l)).find(''.join(map(chr,s)))
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1
Python 3, 41 36 Characters
Thanks in part to gnibbler:
>>> t=lambda l,s:bytes(l).find(bytes(s))
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1
Haskell, 68 64 Characters
Argument order as specified by the OP:
import List;t l s=maybe(-1)id$findIndex id$map(isPrefixOf s)$tails l
As ephemient points out, we can switch the arguments and reduce the code by four characters:
import List;t s=maybe(-1)id.findIndex id.map(isPrefixOf s).tails
In Python:
def test(large, small):
for i in range(len(large)):
if large[i:i+len(small)] == small:
return i
return -1
But since people want terse, not elegant:
def f(l,s):
for i in range(len(l)):
if l[i:i+len(s)]==s:return i
return -1
Which is 75 characters, counting whitespace.
Ruby, using Array#pack (41 chars body):
def bytearray_search(a,b)
(i=b.pack('C*').index(b.pack('C*')))?i:-1
end
Perl (36 chars body, excluding parameter handling):
sub bytearray_search {
($a,$b) = @_;
index(pack('C*',@$a),pack('C*',@$b))
}
I feel that I'm cheating, but using Perl this would do what the OP wants:
sub byte_substr {
use bytes;
index shift,shift
}
Normally index()
in Perl works on strings with character semantics, but the "use bytes" pragma makes it use byte segmantics instead. From the manpage:
When "use bytes" is in effect, the encoding is temporarily ignored, and each string is treated as a series of bytes.
Another one in Python:
def subarray(large, small):
strsmall = ' '.join([str(c).zfill(3) for c in small])
strlarge = ' '.join([str(c).zfill(3) for c in large])
pos = strlarge.find(strsmall)
return ((pos>=0), pos//4)
Ruby 1.9 (44B)
_=->a,b{[*a.each_cons(b.size)].index(b)||-1}
p _[[63, 101, 245, 215, 0], [245, 215]]
p _[[24, 55, 74, 3, 1], [24, 56, 74]]
goruby (29B)
_=->a,b{a.e_(b.sz).dx(b)||-1}
Python: 84 characters
def f(a,b):
l=[a[i:i+len(b)]for i in range(len(a))]
return b in l and l.index(b)or-1
Prolog: 84 characters (says "no" instead of returning -1):
s(X,[]).
s([H|T],[H|U]):-s(T,U).
f(X,Y,0):-s(X,Y).
f([_|T],Y,N):-f(T,Y,M),N is M+1.
Python oneliner function definition in 64 characters
def f(l,s): return ''.join(map(chr,l)).find(''.join(map(chr,s)))
Since we are explicitly passed an array of bytes we can transform that to Python's native byte array str
and use str.find
Python3 36 bytes
based on Stephan202
>>> t=lambda l,s:bytes(l).find(bytes(s))
...
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1
In Python:
def SearchArray(input, search):
found = -1
for i in range(0, len(input) - len(search)):
for j in range(0, len(search)):
if input[i+j] == search[j]:
found = i
else:
found = -1
break
if found >= 0:
return True, found
else:
return False, -1
To test
print SearchArray([ 63, 101, 245, 215, 0 ], [ 245, 215 ])
print SearchArray([ 24, 55, 74, 3, 1 ], [ 24, 56, 74 ])
Which prints:
(True, 2)
(False, -1)
Note there is a shorter solution, but it uses python language features that aren't really portable.
In C#:
private object[] test(byte[] a1, byte[] a2)
{
string s1 = System.Text.Encoding.ASCII.GetString(a1);
string s2 = System.Text.Encoding.ASCII.GetString(a2);
int pos = s1.IndexOf(s2, StringComparison.Ordinal);
return new object[] { (pos >= 0), pos };
}
Usage example:
byte[] a1 = new byte[] { 24, 55, 74, 3, 1 };
byte[] a2 = new byte[] { 24, 56, 74 };
object[] result = test(a1, a2);
Console.WriteLine("{0}, {1}", result[0], result[1]); // prints "False, -1"
public class SubArrayMatch
{
private bool _IsMatch;
private int _ReturnIndex = -1;
private List<byte> _Input;
private List<byte> _SubArray;
private bool _Terminate = false;
#region "Public Properties"
public List<byte> Input {
set { _Input = value; }
}
public List<byte> SubArray {
set { _SubArray = value; }
}
public bool IsMatch {
get { return _IsMatch; }
}
public int ReturnIndex {
get { return _ReturnIndex; }
}
#endregion
#region "Constructor"
public SubArrayMatch(List<byte> parmInput, List<byte> parmSubArray)
{
this.Input = parmInput;
this.SubArray = parmSubArray;
}
#endregion
#region "Main Method"
public void MatchSubArry()
{
int _MaxIndex;
int _Index = -1;
_MaxIndex = _Input.Count - 1;
_IsMatch = false;
foreach (byte itm in _Input) {
_Index += 1;
if (_Terminate == false) {
if (SubMatch(_Index, _MaxIndex) == true) {
_ReturnIndex = _Index;
_IsMatch = true;
return;
}
}
else {
return;
}
}
}
private bool SubMatch(int BaseIndex, int MaxIndex)
{
int _MaxSubIndex;
byte _cmpByte;
int _itr = -1;
_MaxSubIndex = _SubArray.Count - 1;
_MaxSubIndex += 1;
if (_MaxSubIndex > MaxIndex) {
_Terminate = true;
return false;
}
foreach (byte itm in _SubArray) {
_itr += 1;
_cmpByte = _Input(BaseIndex + _itr);
if (!itm == _cmpByte) {
return false;
}
}
return true;
}
#endregion
}
By Anhar Hussain Miah 'Edited by: Anhar.Miah @: 03/07/2009
PHP
In 105...
function a_m($h,$n){$m=strstr(join(",",$h),join(",",$n));return$m?(count($h)-substr_count($m,",")-1):-1;}
or more explicitly,
function array_match($haystack,$needle){
$match = strstr (join(",",$haystack), join(",",$needle));
return $match?(count($haystack)-substr_count($match,",")-1):-1;
}
GNU C:
int memfind(const char * haystack, size_t haystack_size, const char * needle,
size_t needle_size)
{
const char * match = memmem(haystack, hasystack_size, needle, needle_size);
return match ? match - haystack : -1;
}
ANSI C, without library:
int memfind(const char * haystack, size_t haystack_size, const char * needle,
size_t needle_size)
{
size_t pos = 0;
for(; pos < haystack_size; ++pos)
{
size_t i = 0;
while(pos + i < haystack_size && i < needle_size &&
haystack[pos + i] == needle[i]) ++i;
if(i == needle_size) return pos;
}
return -1;
}
Ruby. Not exactly the shortest in the world, but cool since it's an extension to Array.
class Array
def contains other=[]
index = 0
begin
matched = 0
ndx = index
while other[matched] == self[ndx]
return index if (matched+1) == other.length
matched += 1
ndx += 1
end
end until (index+=1) == length
-1
end
end
puts [ 63, 101, 245, 215, 0 ].contains [245, 215]
# 2
puts [ 24, 55, 74, 3, 1 ].contains [24, 56, 74 ]
# -1
C#, lists called "a" and "b":
Enumerable.Range(-1, a.Count).Where(n => n == -1
|| a.Skip(n).Take(b.Count).SequenceEqual(b)).Take(2).Last();
If you don't care about returning the first instance, you can just do:
Enumerable.Range(-1, a.Count).Last(n => n == -1
|| a.Skip(n).Take(b.Count).SequenceEqual(b));
int m(byte[]a,int i,int y,byte[]b,int j,int z){return i<y?j<z?a[i]==b[j++]?m(a,++i,y,b,j,z):m(a,0,y,b,j,z):-1:j-y;}
Java, 116 characters. Has a little bit of extra functionality thrown in. OK, so it's a kludge to push start condition and array lengths into the caller. Call it like:
m(byte[] substring, int substart, int sublength, byte[] bigstring, int bigstart, int biglength)
As Fredrik has already posted the code using the STRING conversion way. Here's another way it could be done using C#.
jwoolard beat me to it, btw. I too have used the same algorithm as he has. This was one of the problems that we had to solve using C++, in college.
public static bool Contains(byte[] parent, byte[] child, out int index)
{
index = -1;
for (int i = 0; i < parent.Length - child.Length; i++)
{
for (int j = 0; j < child.Length; j++)
{
if (parent[i + j] == child[j])
index = i;
else
{
index = -1;
break;
}
}
}
return (index >= 0);
}
Lisp v1
(defun byte-array-subseqp (subarr arr)
(let ((found (loop
for start from 0 to (- (length arr) (length subarr))
when (loop
for item across subarr
for index from start below (length arr)
for same = (= item (aref arr index))
while same
finally (return same))
do (return start))))
(values (when found t) ; "real" boolean
(or found -1))))
Lisp v2 (NB, subseq creates a copy
(defun byte-array-subseqp (subarr arr)
(let* ((alength (length arr))
(slength (length subarr))
(found (loop
for start from 0 to (- alength slength)
when (equalp subarr (subseq arr start (+ start slength)))
do (return start))))
(values (when found t)
(or found -1))))
C#:
public static object[] isSubArray(byte[] arr1, byte[] arr2) {
int o = arr1.TakeWhile((x, i) => !arr1.Skip(i).Take(arr2.Length).SequenceEqual(arr2)).Count();
return new object[] { o < arr1.Length, (o < arr1.Length) ? o : -1 };
}
In Ruby:
def subset_match(array_one, array_two)
answer = [false, -1]
0.upto(array_one.length - 1) do |line|
right_hand = []
line.upto(line + array_two.length - 1) do |inner|
right_hand << array_one[inner]
end
if right_hand == array_two then answer = [true, line] end
end
return answer
end
Example: irb(main):151:0> subset_match([24, 55, 74, 3, 1], [24, 56, 74]) => [false, -1]
irb(main):152:0> subset_match([63, 101, 245, 215, 0], [245, 215]) => [true, 2]
C#, works with any type that has equality operator:
first
.Select((index, item) =>
first
.Skip(index)
.Take(second.Count())
.SequenceEqual(second)
? index : -1)
.FirstOrDefault(i => i >= 0)
.Select(i => i => 0 ?
new { Found = true, Index = i }
:
new { Found = false, Index - 1 });
(defun golf-code (master-seq sub-seq)
(let ((x (search sub-seq master-seq)))
(values (not (null x)) (or x -1))))
Haskell (114 Chars):
import Data.List
import Data.Maybe
g a b | elem b $ subsequences a = fromJust $ elemIndex (head b) a | otherwise = -1
Ruby, i feel ashamed after seeing Lar's code
def contains(a1, a2)
0.upto(a1.length-a2.length) { |i| return i if a1[i, a2.length] == a2 }
-1
end
Here's a C# version using string comparison. It works correctly but feels a bit hacky to me.
int FindSubArray(byte[] super, byte[] sub)
{
int i = BitConverter.ToString(super).IndexOf(BitConverter.ToString(sub));
return i < 0 ? i : i / 3;
}
// 106 characters
int F(byte[]x,byte[]y){int i=BitConverter.ToString(x)
.IndexOf(BitConverter.ToString(y));return i<0?i:i/3;}
Here's a slightly longer version that performs a true comparison of each individual array element.
int FindSubArray(byte[] super, byte[] sub)
{
int i, j;
for (i = super.Length - sub.Length; i >= 0; i--)
{
for (j = 0; j < sub.Length && super[i + j] == sub[j]; j++);
if (j >= sub.Length) break;
}
return i;
}
// 135 characters
int F(byte[]x,byte[]y){int i,j;for(i=x.Length-y.Length;i>=0;i--){for
(j=0;j<y.Length&&x[i+j]==y[j];j++);if(j>=y.Length)break;}return i;}