Creating Lexicon and Scanner in Python

Question 1

I wouldn't use a list to make the lexicon. You're mapping words to their types, so make a dictionary.

Here's the biggest hint that I can give without writing the entire thing:

lexicon = {
    'north': 'directions',
    'south': 'directions',
    'east': 'directions',
    'west': 'directions',
    'go': 'verbs',
    'stop': 'verbs',
    'look': 'verbs',
    'give': 'verbs',
    'the': 'stops',
    'in': 'stops',
    'of': 'stops',
    'from': 'stops',
    'at': 'stops'
}

def scan(sentence):
    words = sentence.lower().split()
    pairs = []

    # Iterate over `words`,
    # pull each word and its corresponding type
    # out of the `lexicon` dictionary and append the tuple
    # to the `pairs` list

    return pairs

Question 2

Based on the ex48 instructions, you could create a few lists for each kind of word. Here's a sample for the first test case. The returned value is a list of tuples, so you can append to that list for each word given.

direction = ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back']

class Lexicon:
    def scan(self, sentence):
        self.sentence = sentence
        self.words = sentence.split()
        stuff = []
        for word in self.words:
            if word in direction:
                stuff.append(('direction', word))
        return stuff

lexicon = Lexicon()

He notes that numbers and exceptions are handled differently.

Question 3

Finally I did it!

lexicon = {
    ('directions', 'north'),
    ('directions', 'south'),
    ('directions', 'east'),
    ('directions', 'west'),
    ('verbs', 'go'),
    ('verbs', 'stop'),
    ('verbs', 'look'),
    ('verbs', 'give'),
    ('stops', 'the'),
    ('stops', 'in'),
    ('stops', 'of'),
    ('stops', 'from'),
    ('stops', 'at')
    }

def scan(sentence):

    words = sentence.lower().split()
    pairs = []

    for word in words:
        word_type = lexicon[word]
        tupes = (word, word_type) 
        pairs.append(tupes)

    return pairs

Question 4

This is a really cool exercise. I had to research for days and finally got it working. The other answers here don't show how to actually use a list with tuples inside like the e-book sugests, so this will do it like that. Owner's answer doesn't quite work, lexicon[word] asks for interger and not str.

lexicon = [('direction', 'north', 'south', 'east', 'west'),
           ('verb', 'go', 'kill', 'eat'),
           ('nouns', 'princess', 'bear')]
def scan():
    stuff = raw_input('> ')
    words = stuff.split()
    pairs = []

    for word in words:

        if word in lexicon[0]:
            pairs.append(('direction', word))
        elif word in lexicon[1]:
            pairs.append(('verb', word))
        elif word in lexicon[2]:
            pairs.append(('nouns', word))
        else: 
            pairs.append(('error', word))

    print pairs

Cheers!

Question 5

clearly Lexicon is another python file in ex48 folder.

like: ex48
      ----lexicon.py

so you are importing lexicon.py from ex 48 folder.

scan is a function inside lexicon.py

Question 6

Like the most here I am new to the world of coding and I though I attach my solution below as it might help other students.

I already saw a few more efficient approaches that I could implement. However, the code handles every use case of the exercise and since I am wrote it on my own with my beginners mind it does not take complicated shortcuts and should be very easy to understand for other beginners.

I therefore thought it might beneficial for someone else learning. Let me know what you think. Cheers!

class Lexicon(object): 

def __init__(self):
    self.sentence = []
    self.dictionary = {
        'north' : ('direction','north'),
        'south' : ('direction','south'),
        'east' : ('direction','east'),
        'west' : ('direction','west'),
        'down' : ('direction','down'),
        'up' : ('direction','up'),
        'left' : ('direction','left'),
        'right' : ('direction','right'),
        'back' : ('direction','back'),
        'go' : ('verb','go'),
        'stop' : ('verb','stop'),
        'kill' : ('verb','kill'),
        'eat' : ('verb', 'eat'),
        'the' : ('stop','the'),
        'in' : ('stop','in'),
        'of' : ('stop','of'),
        'from' : ('stop','from'),
        'at' : ('stop','at'),
        'it' : ('stop','it'),
        'door' : ('noun','door'),
        'bear' : ('noun','bear'),
        'princess' : ('noun','princess'),
        'cabinet' : ('noun','cabinet'),
    }

def scan(self, input):
    loaded_imput = input.split()
    self.sentence.clear()

    for item in loaded_imput:
        try:
            int(item)
            number = ('number', int(item))
            self.sentence.append(number)
        except ValueError:
            word = self.dictionary.get(item.lower(), ('error', item))
            self.sentence.append(word)

    return self.sentence
lexicon = Lexicon()

Question 7

This is my version of scanning lexicon for ex48. I am also beginner in programming, python is my first language. So the program may not be efficient for its purpose, anyway the result is good after many testing. Please feel free to improve the code.

WARNING

If you haven't try to do the exercise by your own, I encourage you to try without looking into any example.

WARNING

One thing I love about programming is that, every time I encounter some problem, I spend a lot of time trying different method to solve the problem. I spend over few weeks trying to create structure, and it is really rewarding as a beginner that I really learn a lot instead of copying from other.

Below is my lexicon and search in one file.

direction = [('direction', 'north'),
            ('direction', 'south'),
            ('direction', 'east'),
            ('direction', 'west'),
            ('direction', 'up'),
            ('direction', 'down'),
            ('direction', 'left'),
            ('direction', 'right'),
            ('direction', 'back')
]

verbs = [('verb', 'go'),
        ('verb', 'stop'),
        ('verb', 'kill'),
        ('verb', 'eat')
]

stop_words = [('stop', 'the'),
            ('stop', 'in'),
            ('stop', 'of'),
            ('stop', 'from'),
            ('stop', 'at'),
            ('stop', 'it')
]

nouns = [('noun', 'door'),
        ('noun', 'bear'),
        ('noun', 'princess'),
        ('noun', 'cabinet')
]   

library = tuple(nouns + stop_words + verbs + direction)

#below is the search method with explanation.

def convert_number(x):
try:
    return int(x)
except ValueError:
    return None


def scan(input):
#include uppercase input for searching. (Study Drills no.3)
lowercase = input.lower()
#element is what i want to search.
element = lowercase.split()
#orielement is the original input which have uppercase, for 'error' type
orielement = input.split()
#library is tuple of the word types from above. You can replace with your data source.
data = library
#i is used to evaluate the position of element
i = 0
#z is used to indicate the position of output, which is the data that match what i search, equals to "i".
z = 0
#create a place to store my output.
output = []
#temp is just a on/off switch. Turn off the switch when i get any match for that particular input.
temp = True
#creating a condition which evaluates the total search needed to be done and follows the sequence by +1.
while not(i == len(element)):
    try:
        #j is used to position the word in the library, eg 'door', 'bear', 'go', etc which exclude the word type.
        j = 0
        while not (j == len(data)):
            #data[j][1] all the single word in library
            matching = data[j][1]
            #when the word match, it will save the match into the output.
            if (matching == element[i]):
                output.append(data[j])
                #print output[z]
                j += 1
                z += 1
                #to switch off the search for else: below and go to next input search. Otherwise they would be considerd 'error'
                temp = False
            #else is everything that is not in the library.
            else:
                while (data[j][1] == data [-1][1]) and (temp == True):
                    #refer to convert_number, to test if the input is a number, here i use orielement which includes uppercase
                    convert = convert_number(orielement[i])
                    #a is used to save number only.
                    a = tuple(['number', convert])
                    #b is to save everything
                    b = tuple(['error', orielement[i]])
                    #conver is number a[1] is the access the number inside, if it returns None from number then it wont append. 
                    if convert == a[1] and not(convert == None):    
                        output.append(a)
                        temp = False
                    else:
                        output.append(b)
                        #keep the switch off to escape the while loop!
                        temp = False
                #searching in next data
                j += 1
        #next word of input
        i += 1
        temp = True
    except ValueError:
        return output
else:
    pass
return output