Question

This case has an underlying story, but I have boiled it down to the simplest reproducible example I could.

Suppose I have up to 1000 nodes, each represented by a small vector (in this case a 3-cell vector). I concatenate these nodes into a padded 3*1000 input vector and need to find out which node is the most suitable. So the model is trying to predict 1000 float values, one for each node.
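To make the shapes concrete, here is a minimal sketch of one training sample (the variable names and example values here are mine, not from the repro below):

```python
import numpy as np

VECTOR_SIZE, NODE_COUNT = 3, 1000

# two "real" nodes, the remaining 998 slots zero-padded
real_nodes = [10., 4., 3., 20., 6., 2.]
x = np.array(real_nodes + [0.] * (NODE_COUNT * VECTOR_SIZE - len(real_nodes)))

# the target is one value per node; with the argmax framing it becomes
# a one-hot vector marking the best node
y = np.zeros(NODE_COUNT)
y[0] = 1.

print(x.shape, y.shape)  # (3000,) (1000,)
```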

Let's imagine the function to score nodes is this arbitrary code:

def score_vector(v):
  a, b, c = tuple(v)
  if a == 0 or b == 0 or a - c < 2:
    return float(Defs.INVALID_SCORE)
  return float(a * math.sqrt(a - c) / math.log(b + 2, 5))

My model is supposed to learn this function plus an argmax to find the node with the highest score. This looks like a fairly simple problem compared to the ones I have solved so far (though it is different, too).
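To illustrate the target the model is asked to produce, here is the scoring function applied to a handful of example nodes (the node values are mine, chosen arbitrarily to hit both branches):

```python
import math
import numpy as np

INVALID_SCORE = 0  # mirrors Defs.INVALID_SCORE in the repro below

def score_vector(v):
    a, b, c = tuple(v)
    if a == 0 or b == 0 or a - c < 2:
        return float(INVALID_SCORE)
    return float(a * math.sqrt(a - c) / math.log(b + 2, 5))

# four example nodes; the last two hit the "invalid" branches (b == 0, a - c < 2)
nodes = [[10., 4., 3.], [20., 6., 2.], [5., 0., 1.], [50., 10., 49.]]
scores = [score_vector(v) for v in nodes]
best = int(np.argmax(scores))  # the index the model is asked to predict
print(scores, best)  # best == 1 here
```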

So my question is: why doesn't this model converge? I suspect it could be a differentiability issue, but I am fairly lost and have started to doubt everything I know about neural networks (which is not a lot).

Here is the repro code:

import numpy as np
import math

from keras import Sequential, Input
from keras.layers import Flatten, Activation, Dense
from keras.optimizers import Adam

class Defs:
  VECTOR_SIZE=3
  NODE_COUNT=1000
  MAX_REAL_NODE_COUNT=400
  MIN_REAL_NODE_COUNT=20
  INVALID_SCORE=0

def score_vector(v):
  a, b, c = tuple(v)
  if a == 0 or b == 0 or a - c < 2:
    return float(Defs.INVALID_SCORE)
  return float(a * math.sqrt(a - c) / math.log(b + 2, 5))

def build_vector():
  a = np.random.randint(1, 100)
  c = np.random.randint(1, 50) if np.random.choice([False, True, True]) else 0
  b = 0 if c == 0 else np.random.randint(c, c*3)
  return [float(a), float(b), float(c)]

def build_vectorset_score():
  n = np.random.randint(Defs.MIN_REAL_NODE_COUNT, Defs.MAX_REAL_NODE_COUNT)
  vectorset = []
  for i in range(0, n):
    vectorset += build_vector()

  # pad to NODE_COUNT nodes with zero vectors
  vectorset += [0. for i in range((Defs.NODE_COUNT-n) * Defs.VECTOR_SIZE)]
  # score every node slot, then turn the argmax into a one-hot target
  scores = [score_vector(vectorset[i*Defs.VECTOR_SIZE:(i+1)*Defs.VECTOR_SIZE]) for i in range(0, Defs.NODE_COUNT)]
  index = np.argmax(scores)
  scores = [1. if index == i else 0. for i in range(0, len(scores))]
  return vectorset, scores

def build_model():
  model = Sequential()
  model.add(Dense(Defs.VECTOR_SIZE * Defs.NODE_COUNT, input_dim=Defs.VECTOR_SIZE * Defs.NODE_COUNT, activation='relu'))
  model.add(Dense(Defs.NODE_COUNT, activation='relu'))
  model.add(Dense(Defs.NODE_COUNT))
  model.add(Activation('softmax'))
  model.summary()  # summary() prints itself and returns None
  model.compile(loss="categorical_crossentropy",
                optimizer=Adam(lr=0.001), metrics=['categorical_accuracy'])
  return model


if __name__ == '__main__':
  SAMPLE_SIZE = 1 * 1000
  X = []
  Y = []
  for i in range(0, SAMPLE_SIZE):
    x, y = build_vectorset_score()
    X.append(np.array(x))
    Y.append(np.array(y))
  model = build_model()
  model.fit(np.array(X), np.array(Y), batch_size=100, epochs=200, verbose=1)

No correct solution

Licensed under: CC-BY-SA with attribution