As @Beaker points out, if you train it only on pens it will assume everything is a pen. You can use the level of the output as a rough measure of how likely the input is to be a pen, but what you really want is to train it on both pens and non-pens. You may also see better performance with two output nodes, one for pen and one for not-pen. That is how people typically do NN classification, although a single output should work too. Be aware that your choice of activation function determines the range of values passed between nodes (sigmoid outputs lie between 0 and 1, tanh between -1 and 1), so scale your inputs and targets to match. The number of layers and the number of neurons in each hidden layer can also make a big difference. Make sure you have at least one hidden layer; a network with only an input and an output layer is unlikely to do well.
If you have no negative examples, I would use some online images of random objects to train the negative cases. However, since you seem to have a specific use in mind, training on images close to the ones the network will see in actual use will give better performance.
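To make the advice above concrete, here is a minimal sketch in plain NumPy of a network with one hidden layer and two output nodes, trained on both positive and negative examples. Everything specific in it is an assumption for illustration: the 2-D random feature vectors stand in for real image data, and the hidden size, learning rate, and epoch count are arbitrary choices, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 2-D feature vectors instead of real images.
# Class 0 = "not pen", class 1 = "pen". Real pixel data should be scaled
# to a small range (e.g. 0 to 1) before training.
n = 200
X_neg = rng.normal(loc=-1.0, scale=0.5, size=(n, 2))
X_pos = rng.normal(loc=1.0, scale=0.5, size=(n, 2))
X = np.vstack([X_neg, X_pos])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
Y = np.eye(2)[y]  # one-hot targets: one column per output node

# One hidden layer (tanh) followed by two output nodes (softmax).
n_hidden = 8  # arbitrary choice for this sketch
W1 = rng.normal(scale=0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 2))
b2 = np.zeros(2)

def forward(inputs):
    """Return hidden activations and class probabilities."""
    H = np.tanh(inputs @ W1 + b1)
    logits = H @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P = P / P.sum(axis=1, keepdims=True)
    return H, P

# Full-batch gradient descent on the cross-entropy loss.
lr = 0.5
for epoch in range(500):
    H, P = forward(X)
    d_logits = (P - Y) / len(X)       # gradient of softmax + cross-entropy
    dW2 = H.T @ d_logits
    db2 = d_logits.sum(axis=0)
    dZ1 = (d_logits @ W2.T) * (1.0 - H ** 2)  # tanh derivative
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

_, P = forward(X)
accuracy = (P.argmax(axis=1) == y).mean()
print("training accuracy:", accuracy)
print("P(pen) for first positive example:", P[n, 1])
```

The second output column can be read as the "how likely is this a pen" level mentioned above. With real images you would replace the synthetic `X` with flattened, scaled pixel values (or extracted features) and hold out some examples to check performance on data the network has not seen.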