What exactly is an n Gram?

Question 1

N-grams are simply all combinations of adjacent words or letters of length n that you can find in your source text. For example, given the word fox, all 2-grams (or “bigrams”) are fo and ox. You may also count the word boundary – that would expand the list of 2-grams to #f, fo, ox, and x#, where # denotes a word boundary.

You can do the same on the word level. As an example, the hello, world! text contains the following word-level bigrams: # hello, hello world, world #.

The basic point of n-grams is that they capture the language structure from the statistical point of view, like what letter or word is likely to follow the given one. The longer the n-gram (the higher the n), the more context you have to work with. Optimum length really depends on the application – if your n-grams are too short, you may fail to capture important differences. On the other hand, if they are too long, you may fail to capture the “general knowledge” and only stick to particular cases.

Question 2

Usually a picture is worth thousand words.

Source: http://recognize-speech.com/language-model/n-gram-model/comparison

Question 3

An n-gram is a n-tuple or group of n words or characters (grams, for pieces of grammar) which follow one another. So an n of 3 for the words from your sentence would be like "# I live", "I live in", "live in NY", "in NY #". This is used to create an index of how often words follow one another. You can use this in a Markov Chain to create something that will be similar to language. As you populate a mapping of the distributions of word groups or character groups, you can recombine them with the probability that the output will be close to natural, the longer the n-gram is.

Too high of a number, and your output will be a word for word copy of the original, too low of a number, and the output will be too messy.