Desafío del código de búsqueda de matrices

https://stackoverflow.com/questions/1078770

21-08-2019
|

Pregunta

Aquí está mi desafío (código de golf):Tome dos matrices de bytes y determine si la segunda matriz es una subcadena de la primera.Si es así, genere el índice en el que el contenido de la segunda matriz aparece en la primera.Si no encuentra la segunda matriz en la primera, genere -1.

Entrada de ejemplo:{63, 101, 245, 215, 0} {245, 215}

Rendimiento esperado:2

Ejemplo de entrada 2:{ 24, 55, 74, 3, 1 } { 24, 56, 74 }

Resultado esperado 2:-1

Editar: Alguien ha señalado que bool es redundante, por lo que todo lo que su función tiene que hacer es devolver un int que representa el índice del valor o -1 si no se encuentra.

Solución

J

37 caracteres para aun más funcionalidad que solicitaron:. Devuelve una lista de todos a juego índices

I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#))

Uso:

   NB. Give this function a name
   i =: I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#))
   NB. Test #1
   245 215 i 63 101 245 215 0
2
   NB. Test #2 - no results
   24 56 74 i 24 55 74 3 1

   NB. Test #3: matches in multiple locations
   1 1 i 1 1 1 2 1 1 3
0 1 4
   NB. Test #4: only exact substring matches
   1 2 i 0 1 2 3 1 0 2 1 2 0
1 7

NB. list[0 to end], list[1 to end], list[2 to end], ...
<@}."0 _~i.@#

NB. Does the LHS completely match the RHS (truncated to match LHS)?
[-:#@[{.>@]

NB. boolean list of match/no match
([-:#@[{.>@])"_ 0(<@}."0 _~i.@#)

NB. indices of *true* elements
I.@(([-:#@[{.>@])"_ 0(<@}."0 _~i.@#))

Otros consejos

Common Lisp:

(defun golf-code (master-seq sub-seq)
  (search sub-seq master-seq))

Posdata, ~~149~~ ~~146~~ ~~170~~ ~~166~~ ~~167~~ 159 caracteres (en la parte "hacer el trabajo"):

% define data
/A [63 101 245 215 0] def
/S [245 215] def

% do the work
/d{def}def/i{ifelse}d/l S length 1 sub d/p l d[/C{dup[eq{pop -1}{dup S p
get eq{pop p 0 eq{]length}{/p p 1 sub d C}i}{p l eq{pop}if/p l d C}i}i}d
A aload pop C

% The stack now contains -1 or the position

Tenga en cuenta que esto encuentra el último aparición del subarreglo si está contenido más de una vez.

Revisión histórica:

Reemplazar false por [[ne y true por [[eq para guardar tres caracteres
Se eliminó un error que podía causar un falso negativo si el último elemento de S aparece dos veces en A.Desafortunadamente, esta corrección de errores tiene 24 caracteres.
Hizo la corrección de errores un poco más barata, ahorrando cuatro caracteres.
Tuve que insertar un espacio nuevamente porque un guión es un carácter legal en un nombre.Este error de sintaxis no se detectó porque el caso de prueba no llegó a este punto.
Dejé de devolver los bools porque el OP ya no los requiere.Guarda 8 caracteres.

Versión explicada:

Desafortunadamente, el resaltador de sintaxis SO no conoce PostScript, por lo que la legibilidad aún es limitada.

/A [63 101 245 215 0] def
/S [245 215 ] def

/Slast S length 1 sub def % save the index of the last element of S,
                          % i.e. length-1
/Spos Slast def % our current position in S; this will vary
[ % put a mark on the bottom of the stack, we need this later.

/check % This function recursively removes values from the stack
       % and compares them to the values in S
{
  dup [ 
  eq
  { % we found the mark on the bottom, i.e. we have no match
    pop -1 % remove the mark and push the results
  }
  { % we're not at the mark yet
    dup % save the top value (part of the bugfix)
    S Spos get
    eq 
    {  % the top element of the stack is equal to S[Spos]
       pop % remove the saved value, we don't need it
       Spos 0
       eq 
       { % we are at the beginning of S, so the whole thing matched.
         ] length % Construct an array from the remaining values
                  % on the stack. This is the part of A before the match,
                  % so its length is equal to the position of the match.
                  % Hence we push the result and we're done.
       }
       { % we're not at the beginning of S yet, so we have to keep comparing
         /Spos Spos 1 sub def % decrease Spos
         check % recurse
       }
       ifelse
    }
    { % the top element of the stack is different from S[Spos]
      Spos Slast eq {pop} if % leave the saved top value on the stack
                             % unless we're at the end of S, because in
                             % this case, we have to compare it to the
                             % last element of S (rest of the bugfix)
      /Spos Slast def % go back to the end of S
      check % recurse
    }
    ifelse
 }
 ifelse
}
def % end of the definition of check

A aload % put the contents of A onto the stack; this will also push A again,
        % so we have to ...
pop % ...remove it again
check % And here we go!

C99

#include <string.h>

void find_stuff(void const * const array1, const size_t array1length, /* Length in bytes, not elements */
                void const * const array2, const size_t array2length, /* Length in bytes, not elements */
                char * bReturnBool,
                int * bReturnIndex)
{
    void * found = memmem(array1, array1length, array2, array2length);
    *bReturnBool = found != NULL;
    *bReturnIndex = *bReturnBool ? found - array1 : -1;
}

En la taquigrafía, y ~~un poco~~ una Messier LOT:

#include <string.h>
#define f(a,b,c,d,e,f) { void * g = memmem(a, b, c, d); f = (e = !!g) ? g - a : -1; }

Python 2 y 3, 73 68 58 caracteres

En base a ~~Nikhil Chelliah 's respuesta~~ Kaiser .es 's responder :

>>> t=lambda l,s:''.join(map(chr,l)).find(''.join(map(chr,s)))
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1

Python 3, 41 36 caracteres

Gracias en parte a gnibbler :

>>> t=lambda l,s:bytes(l).find(bytes(s))
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1

Haskell, 68 64 caracteres

orden de los argumentos como se especifica por el OP:

import List;t l s=maybe(-1)id$findIndex id$map(isPrefixOf s)$tails l

ephemient señala, podemos cambiar los argumentos y reducir el código de cuatro caracteres:

import List;t s=maybe(-1)id.findIndex id.map(isPrefixOf s).tails

En Python:

def test(large, small):
    for i in range(len(large)):
        if large[i:i+len(small)] == small:
            return i
    return -1

Sin embargo, ya que las personas quieren conciso, no elegante:

def f(l,s):
 for i in range(len(l)):
  if l[i:i+len(s)]==s:return i
 return -1

Lo que es 75 caracteres, contando los espacios en blanco.

Ruby, usando matriz # paquete (41 caracteres del cuerpo):

def bytearray_search(a,b)
  (i=b.pack('C*').index(b.pack('C*')))?i:-1
end

Perl (36 caracteres cuerpo, excluyendo la administración de parámetros):

sub bytearray_search {
  ($a,$b) = @_;
  index(pack('C*',@$a),pack('C*',@$b))
}

Siento que estoy haciendo trampa, pero usando Perl esto haría lo que quiere el OP:

sub byte_substr {
    use bytes;
    index shift,shift
}

Normalmente index() en Perl funciona con cadenas de caracteres con la semántica, pero el "uso de bytes" pragma hace uso segmantics byte lugar. Desde la página de manual:

Cuando "bytes" es el uso de efecto, la codificación se ignora temporalmente, y cada cadena es tratado como una serie de bytes.

Otra en Python:

def subarray(large, small):
    strsmall = ' '.join([str(c).zfill(3) for c in small])
    strlarge = ' '.join([str(c).zfill(3) for c in large])
    pos = strlarge.find(strsmall)
    return  ((pos>=0), pos//4)

Ruby 1.9 (44B)

_=->a,b{[*a.each_cons(b.size)].index(b)||-1}

p _[[63, 101, 245, 215, 0], [245, 215]]
p _[[24, 55, 74, 3, 1], [24, 56, 74]]

goruby (29B)

_=->a,b{a.e_(b.sz).dx(b)||-1}

Python : 84 caracteres

def f(a,b):
 l=[a[i:i+len(b)]for i in range(len(a))]
 return b in l and l.index(b)or-1

Prolog : 84 caracteres (dice "no" en lugar de devolver -1):

s(X,[]).
s([H|T],[H|U]):-s(T,U).
f(X,Y,0):-s(X,Y).
f([_|T],Y,N):-f(T,Y,M),N is M+1.

Definición de función Python oneliner en 64 caracteres

def f(l,s): return ''.join(map(chr,l)).find(''.join(map(chr,s)))

Dado que se nos pasa explícitamente una serie de bytes podemos transformar eso a la matriz de bytes nativa de Python str y use str.find

python3 36 bytes

basado en Stephan202

>>> t=lambda l,s:bytes(l).find(bytes(s))
... 
>>> t([63, 101, 245, 215, 0], [245, 215])
2
>>> t([24, 55, 74, 3, 1], [24, 56, 74])
-1

En Python:

def SearchArray(input, search):
found = -1
for i in range(0, len(input) - len(search)):
    for j in range(0, len(search)):
        if input[i+j] == search[j]:
            found = i
        else:
            found = -1
            break
if  found >= 0:
    return True, found
else:
    return False, -1

Para probar

print SearchArray([ 63, 101, 245, 215, 0 ], [ 245, 215 ])
print SearchArray([ 24, 55, 74, 3, 1 ], [ 24, 56, 74 ])

que imprime:

(True, 2)
(False, -1)

Nota que hay una solución más corto, pero utiliza las características del lenguaje python que no son realmente portátil.

En C #:

private object[] test(byte[] a1, byte[] a2)
{
    string s1 = System.Text.Encoding.ASCII.GetString(a1);
    string s2 = System.Text.Encoding.ASCII.GetString(a2);
    int pos = s1.IndexOf(s2, StringComparison.Ordinal);
    return new object[] { (pos >= 0), pos };
}

Ejemplo de uso:

byte[] a1 = new byte[] { 24, 55, 74, 3, 1 };
byte[] a2 = new byte[] { 24, 56, 74 };
object[] result = test(a1, a2);
Console.WriteLine("{0}, {1}", result[0], result[1]); // prints "False, -1"

public class SubArrayMatch
{
    private bool _IsMatch;
    private int _ReturnIndex = -1;
    private List<byte> _Input;
    private List<byte> _SubArray;
    private bool _Terminate = false;
#region "Public Properties"
    public List<byte> Input {
        set { _Input = value; }
    }

    public List<byte> SubArray {
        set { _SubArray = value; }
    }

    public bool IsMatch {
        get { return _IsMatch; }
    }

    public int ReturnIndex {
        get { return _ReturnIndex; }
    }
#endregion
#region "Constructor"
    public SubArrayMatch(List<byte> parmInput, List<byte> parmSubArray)
    {
        this.Input = parmInput;
        this.SubArray = parmSubArray;
    }
#endregion
#region "Main Method"
    public void MatchSubArry()
    {
        int _MaxIndex;
        int _Index = -1;
        _MaxIndex = _Input.Count - 1;

        _IsMatch = false;

        foreach (byte itm in _Input) {
            _Index += 1;

            if (_Terminate == false) {
                if (SubMatch(_Index, _MaxIndex) == true) {
                    _ReturnIndex = _Index;
                    _IsMatch = true;
                    return;
                }
            }
            else {
                return;
            }
        }
    }

    private bool SubMatch(int BaseIndex, int MaxIndex)
    {
        int _MaxSubIndex;
        byte _cmpByte;
        int _itr = -1;

        _MaxSubIndex = _SubArray.Count - 1;
        _MaxSubIndex += 1;

        if (_MaxSubIndex > MaxIndex) {
            _Terminate = true;
            return false;
        }

        foreach (byte itm in _SubArray) {
            _itr += 1;

            _cmpByte = _Input(BaseIndex + _itr);

            if (!itm == _cmpByte) {
                return false;
            }
        }

        return true;
    }
#endregion

}

Por Anhar Hussain Miah 'Editado por: Anhar.Miah @: 03/07/2009

PHP

En el año 105 ...

function a_m($h,$n){$m=strstr(join(",",$h),join(",",$n));return$m?(count($h)-substr_count($m,",")-1):-1;}

o más explícitamente,

function array_match($haystack,$needle){
  $match = strstr (join(",",$haystack), join(",",$needle));
  return $match?(count($haystack)-substr_count($match,",")-1):-1;
}

GNU C:

int memfind(const char * haystack, size_t haystack_size, const char * needle,
    size_t needle_size)
{
    const char * match = memmem(haystack, hasystack_size, needle, needle_size);
    return match ? match - haystack : -1;
}

ANSI C, sin biblioteca:

int memfind(const char * haystack, size_t haystack_size, const char * needle,
    size_t needle_size)
{
    size_t pos = 0;
    for(; pos < haystack_size; ++pos)
    {
        size_t i = 0;
        while(pos + i < haystack_size && i < needle_size &&
            haystack[pos + i] == needle[i]) ++i;

        if(i == needle_size) return pos;
    }

    return -1;
}

Rubí . No es exactamente el más corto del mundo, pero fresco, ya que es una extensión de la matriz.

class Array
  def contains other=[]
    index = 0
    begin
      matched = 0
      ndx = index
      while other[matched] == self[ndx]
        return index if (matched+1) == other.length
        matched += 1
        ndx += 1
      end
    end until (index+=1) == length
    -1
  end
end

puts [ 63, 101, 245, 215, 0 ].contains [245, 215]
# 2
puts [ 24, 55, 74, 3, 1 ].contains [24, 56, 74 ]
# -1

C #, listas de llamadas "a" y "b":

Enumerable.Range(-1, a.Count).Where(n => n == -1 
    || a.Skip(n).Take(b.Count).SequenceEqual(b)).Take(2).Last();

Si no se preocupan por devolver el primer caso, sólo se puede hacer:

Enumerable.Range(-1, a.Count).Last(n => n == -1 
    || a.Skip(n).Take(b.Count).SequenceEqual(b));

int m(byte[]a,int i,int y,byte[]b,int j,int z){return i<y?j<z?a[i]==b[j++]?m(a,++i,y,b,j,z):m(a,0,y,b,j,z):-1:j-y;}

Java, 116 caracteres. Tiene un poco de funcionalidad adicional tirado. OK, así que es una chapuza para empujar condiciones de arranque y gama de longitudes en la persona que llama. Llamar así:

m(byte[] substring, int substart, int sublength, byte[] bigstring, int bigstart, int biglength)

Fredrik ya se ha publicado el código usando la manera Conversión de cadenas. Aquí hay otra manera se podría hacer usando C #.

jwoolard se me adelantó, por cierto. Yo también he utilizado el mismo algoritmo que él tiene. Este fue uno de los problemas que hemos tenido que resolver usando C ++, en la universidad.

public static bool Contains(byte[] parent, byte[] child, out int index)
{
    index = -1;

    for (int i = 0; i < parent.Length - child.Length; i++)
    {
        for (int j = 0; j < child.Length; j++)
        {
            if (parent[i + j] == child[j])
                index = i;
            else
            {
                index = -1;
                break;
            }
        }
    }

    return (index >= 0);
}

Lisp v1

(defun byte-array-subseqp (subarr arr)
  (let ((found (loop 
                  for start from 0 to (- (length arr) (length subarr))
                  when (loop 
                          for item across subarr
                          for index from start below (length arr)
                          for same = (= item (aref arr index))
                          while same
                          finally (return same))
                  do (return start))))
    (values (when found t) ; "real" boolean
            (or found -1))))

Lisp v2 (NB, subseq crea una copia

(defun byte-array-subseqp (subarr arr)
  (let* ((alength (length arr))
         (slength (length subarr))
         (found (loop 
                   for start from 0 to (- alength slength)
                   when (equalp subarr (subseq arr start (+ start slength)))
                   do (return start))))
    (values (when found t)
            (or found -1))))

C #:

public static object[] isSubArray(byte[] arr1, byte[] arr2) {
  int o = arr1.TakeWhile((x, i) => !arr1.Skip(i).Take(arr2.Length).SequenceEqual(arr2)).Count();
  return new object[] { o < arr1.Length, (o < arr1.Length) ? o : -1 };
}

En Ruby:

def subset_match(array_one, array_two)
  answer = [false, -1]
  0.upto(array_one.length - 1) do |line|
    right_hand = []
    line.upto(line + array_two.length - 1) do |inner|
      right_hand << array_one[inner]
    end
    if right_hand == array_two then answer = [true, line] end
  end
  return answer
end

Ejemplo: IRB (principal): 151: 0> subset_match ([24, 55, 74, 3, 1], [24, 56, 74]) => [Falso, -1]

IRB (principal): 152: 0> subset_match ([63, 101, 245, 215, 0], [245, 215]) => [Verdadero, 2]

C #, funciona con cualquier tipo que tiene operador de igualdad:

first
  .Select((index, item) => 
    first
     .Skip(index)
     .Take(second.Count())
     .SequenceEqual(second) 
    ? index : -1)
  .FirstOrDefault(i => i >= 0)
  .Select(i => i => 0 ? 
     new { Found = true, Index = i } 
    : 
     new { Found = false, Index - 1 });

(defun golf-code (master-seq sub-seq)
  (let ((x (search sub-seq master-seq)))
    (values (not (null x)) (or x -1))))

Haskell (114 caracteres):

import Data.List
import Data.Maybe
g a b | elem b $ subsequences a = fromJust $ elemIndex (head b) a | otherwise = -1

Ruby, me siento avergonzado después de ver el código de Lar

def contains(a1, a2)
  0.upto(a1.length-a2.length) { |i| return i if a1[i, a2.length] == a2 }
  -1
end

Aquí hay una versión de C # utilizando la comparación de cadenas. Funciona correctamente, pero se siente un poco hacky a mí.

int FindSubArray(byte[] super, byte[] sub)
{
    int i = BitConverter.ToString(super).IndexOf(BitConverter.ToString(sub));
    return i < 0 ? i : i / 3;
}

// 106 characters
int F(byte[]x,byte[]y){int i=BitConverter.ToString(x)
.IndexOf(BitConverter.ToString(y));return i<0?i:i/3;}

Aquí hay una versión ligeramente más largo que realiza una verdadera comparación de cada elemento de la matriz individual.

int FindSubArray(byte[] super, byte[] sub)
{
    int i, j;
    for (i = super.Length - sub.Length; i >= 0; i--)
    {
        for (j = 0; j < sub.Length && super[i + j] == sub[j]; j++);
        if (j >= sub.Length) break;
    }
    return i;
}

// 135 characters
int F(byte[]x,byte[]y){int i,j;for(i=x.Length-y.Length;i>=0;i--){for
(j=0;j<y.Length&&x[i+j]==y[j];j++);if(j>=y.Length)break;}return i;}

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow