Question

I have a number of lists, such as [1,2,3,4], [2,3,4], [1,2] and [2,3,4,6,8,10], which obviously have different lengths.

How can I use these as the input to a placeholder in TensorFlow?

From what I have tried, the following raises an error:

tf.constant([[1,2],[1,2,3],...], dtype=tf.int32)

So I guess a placeholder cannot be fed with lists like the ones above.

Is there any solution?

Edit:

The following is my example. How can I make it run without errors?

[image showing the example code]


Solution

When you create a NumPy array like this:

x_data = np.array([[1,2],[4,5,6],[1,2,3,4,5,6]])

the internal NumPy dtype is object:

array([[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]], dtype=object)

and this cannot be used as a Tensor in TensorFlow. Tensors must have the same size along each dimension: they cannot be "ragged", and their shape is defined by a single number per dimension. TensorFlow assumes this about all of its data types. Although the TensorFlow designers could, in theory, make it accept ragged arrays and include a conversion function, that kind of auto-casting is not always a good idea, because it might hide a problem in the code that produces the input.
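
You can see the rejection directly. A minimal sketch (my own illustration; the exact error type and message vary across TensorFlow versions, so both likely exceptions are caught):

import tensorflow as tf
import numpy as np

# Ragged input becomes an object array (newer NumPy versions require
# the explicit dtype=object), which TensorFlow refuses to convert
x_data = np.array([[1,2],[4,5,6],[1,2,3,4,5,6]], dtype=object)

try:
    tf.constant(x_data, dtype=tf.int32)
except (ValueError, TypeError) as e:
    print("Cannot convert:", e)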

So you need to pad the input data to give it a usable shape. On a quick search, I found this approach on Stack Overflow, replicated here as a change to your code:

import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.int32, [3, None])
y = x * 2

with tf.Session() as session:
    # Newer NumPy versions require an explicit dtype=object for ragged input
    x_data = np.array([[1,2],[4,5,6],[1,2,3,4,5,6]], dtype=object)

    # Get the length of each row of data
    lens = np.array([len(row) for row in x_data])

    # Mask of the valid positions in each row
    mask = np.arange(lens.max()) < lens[:, None]

    # Set up the output array, matching the placeholder's dtype,
    # and put the elements from the data into the masked positions
    padded = np.zeros(mask.shape, dtype=np.int32)
    padded[mask] = np.hstack(x_data)

    # Call TensorFlow
    result = session.run(y, feed_dict={x: padded})

    # Remove the padding. The list() call ensures we create the same
    # datatype as the input; it is not necessary if you are happy with
    # a list of NumPy arrays instead.
    result_without_padding = np.array(
        [list(result[i, 0:lens[i]]) for i in range(lens.size)],
        dtype=object
    )
    print(result_without_padding)

Output is:

[[2, 4] [8, 10, 12] [2, 4, 6, 8, 10, 12]]

You don't have to remove the padding at the end; only do so if you want to show your output in the same ragged format. Also note that when you feed the padded data to more complex routines, the zeros (or whatever other padding value you choose) may be consumed by the algorithm you have implemented, so you may need to carry the mask through the computation as well.
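
For instance, a minimal sketch (my own illustration, not part of the original answer) of feeding the validity mask alongside the padded data, so that the padding zeros do not distort a per-row statistic such as the mean:

import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, [3, None])
mask = tf.placeholder(tf.float32, [3, None])

# Sum only the valid entries, then divide by the count of valid entries
row_mean = tf.reduce_sum(x * mask, axis=1) / tf.reduce_sum(mask, axis=1)

with tf.Session() as session:
    padded = np.array([[1, 2, 0, 0, 0, 0],
                       [4, 5, 6, 0, 0, 0],
                       [1, 2, 3, 4, 5, 6]], dtype=np.float32)
    lens = np.array([2, 3, 6])
    valid = (np.arange(6) < lens[:, None]).astype(np.float32)
    # Prints [1.5 5.  3.5]; the padding zeros do not affect the means
    print(session.run(row_mean, feed_dict={x: padded, mask: valid}))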

If you have many short arrays and just one or two very long ones, then you might want to consider using a sparse tensor representation to save memory and speed up calculations.
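
A minimal sketch of that idea using tf.sparse_placeholder from the TF 1.x API (the doubling operation and the row/column index layout are my own illustrative assumptions):

import tensorflow as tf

x = tf.sparse_placeholder(tf.int32)
# Build a new sparse tensor with the same layout and doubled values
y = tf.SparseTensor(x.indices, x.values * 2, x.dense_shape)

with tf.Session() as session:
    rows = [[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]]
    # Encode each value by its (row, column) position
    indices = [[i, j] for i, row in enumerate(rows) for j in range(len(row))]
    values = [v for row in rows for v in row]
    dense_shape = [len(rows), max(len(row) for row in rows)]
    result = session.run(y, feed_dict={
        x: tf.SparseTensorValue(indices, values, dense_shape)})
    print(result.values)  # [ 2  4  8 10 12  2  4  6  8 10 12]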

OTHER TIPS

As an alternative to using padded arrays, you can just feed all of your data as one big spaghetti string and then do the origami inside the TensorFlow graph.

Example:

import tensorflow as tf

sess = tf.InteractiveSession()

# All of the values concatenated into one flat "noodle"
noodle = tf.placeholder(tf.float32, [None])
# The start/end positions of each inner list within the noodle
chop_indices = tf.placeholder(tf.int32, [None, 2])

# Slice out one inner list by gathering its range of positions
do_origami = lambda list_idx: tf.gather(
    noodle, tf.range(chop_indices[list_idx, 0], chop_indices[list_idx, 1]))

print([do_origami(list_idx=i).eval(
           {noodle: [1, 2, 3, 2, 3, 6],
            chop_indices: [[0, 2], [2, 3], [3, 6]]}).tolist()
       for i in range(3)])

Result:

[[1.0, 2.0], [3.0], [2.0, 3.0, 6.0]]

If you have a variable number of inner lists, though, then good luck. You cannot return a list from tf.while_loop, and you cannot use a list comprehension as above, so you would have to run the computation separately for each inner list.
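
A minimal sketch of that fallback (my own illustration, not from the original answer): keep a single rank-1 placeholder and run the graph once per inner list from Python:

import tensorflow as tf

row = tf.placeholder(tf.int32, [None])
doubled = row * 2

data = [[1, 2], [4, 5, 6], [1, 2, 3, 4, 5, 6]]
with tf.Session() as session:
    # One session.run per ragged row; slower, but shape-safe
    results = [session.run(doubled, feed_dict={row: r}) for r in data]
print([r.tolist() for r in results])
# [[2, 4], [8, 10, 12], [2, 4, 6, 8, 10, 12]]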

For completeness, here is a standalone snippet showing that a single fixed-length list can simply be stored in an ordinary TensorFlow variable:

import tensorflow as tf

sess = tf.InteractiveSession()

# A single, fixed-length list can be held in a variable
my_list = tf.Variable(initial_value=[1,2,3,4,5])

init = tf.global_variables_initializer()
sess.run(init)

sess.run(my_list)

Result: array([1, 2, 3, 4, 5])

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange