Программирование аркадных игр
и обучение информатикеLab 15: Spell Check
This lab shows how to create a spell checker. To prepare for the lab, go to:
ProgramArcadeGames.com/index.php?chapter=examples list
...and
download the files listed below. The files are also in the “Searching and Sorting Examples” section.
- AliceInWonderLand.txt - Text of “Alice In Wonderland”
- AliceInWonderLand200.txt - First chapter of “Alice In Wonderland”
- dictionary.txt - A list of words
If you are working off the BitBucket repository template for this class, then the files are already there and you don't need to download them.
15.1 Requirements
Write a single program in Python that checks the spelling of the first chapter of “Alice In Wonderland.” First use a linear search, then use a binary search. Print the line number along with the word that does not exist in the dictionary.
Follow the steps below carefully. If you don't know how to accomplish one step, ask before moving on to the next step.
15.2 Steps to complete:
- If you work off the BitBucket template, skip ahead to step 6.
- Find or create a directory for your project.
- Download the dictionary to the directory.
- Download first 200 lines of Alice In Wonderland to your directory.
- Start a Python file for your project.
- It is necessary to split apart the words in the story so that
they may be checked individually. It is also necessary to remove
extra punctuation and white-space. Unfortunately, there is not any good way of
doing this with what the book has covered so far. The code to do this is short,
but a full explanation is beyond the scope of this class. Include the following function
in your program. Remember, function definitions should go at the top of your
program just after the imports. We'll call this function in a later step.
import re # This function takes in a line of text and returns # a list of words in the line. def split_line(line): return re.findall('[A-Za-z]+(?:\'[A-Za-z]+)?',line)
This code uses a regular expression to split the text apart. Regular expressions are very powerful and relatively easy to learn. To learn more about regular expressions, see:
http://regexone.com/ - Read the file dictionary.txt into an array. Go back to the chapter on Searching, or see the searching_example.py for example code on how to do this. This does not have anything to do with the import command, libraries, or modules. Don't call the dictionary word_list or something generic because that will be confusing. Call it dictionary_list or something similar.
- Close the file.
- Print --- Linear Search ---
- Open the file AliceInWonderLand200.txt
- We are not going to read the story into a list. Do not create a new list here like you did with the dictionary.
- Start a for loop to iterate through each line.
- Call the split_line function to split apart the line of text in the story that was just read in. Store the list that the function returns in a new variable named words. Remember, just calling the function won't do anything useful. You need to assign a variable equal (words) to the result. If you've forgotten now to capture the return value from a function, flip back to the functions chapter to find it.
- Start a nested for loop to iterate through each word in the words list. This should be inside the for loop that runs through each line in the file. (One loop for each line, another loop for each word in the line.)
- Using a linear search, check the current word against the words in the dictionary. Check the chapter on searching or the searching_example.py for example code on how to do this. The linear search is just three lines long. When comparing to the word to the other words in the dictionary, convert the word to uppercase. In your while loop just use word.upper() instead of word for the key. This linear search will exist inside the for loop created in the prior step. We are looping through each word in the dictionary, looking for the current word in the line that we just read in.
- If the word was not found, print the word. Don't print anything if you do find the word, that would just be annoying.
- Close the file.
- Make sure the program runs successfully before moving on to the next step.
- Create a new variable that will track the line number that you are on. Print this line number along with the misspelled from the prior step.
- Make sure the program runs successfully before moving on to the next step.
- Print --- Binary Search ---
- The linear search takes quite a while to run. To temporarily disable it, it may be commented out by using three quotes before and after that block of code. Ask if you are unsure how to do this.
- Repeat the same pattern of code as before, but this time use a binary search. Much of the code from the linear search may be copied, and it is only necessary to replace the lines of code that represent the linear search with the binary search.
- Note the speed difference between the two searches.
- Make sure the linear search is re-enabled, if it was disabled while working on the binary search.
- Upload the final program or check in the final program.
15.3 Example Run
--- Linear Search --- Line 3 possible misspelled word: Lewis Line 3 possible misspelled word: Carroll Line 46 possible misspelled word: labelled Line 46 possible misspelled word: MARMALADE Line 58 possible misspelled word: centre Line 59 possible misspelled word: learnt Line 69 possible misspelled word: Antipathies Line 73 possible misspelled word: curtsey Line 73 possible misspelled word: CURTSEYING Line 79 possible misspelled word: Dinah'll Line 80 possible misspelled word: Dinah Line 81 possible misspelled word: Dinah Line 89 possible misspelled word: Dinah Line 89 possible misspelled word: Dinah Line 149 possible misspelled word: flavour Line 150 possible misspelled word: toffee Line 186 possible misspelled word: croquet --- Binary Search --- Line 3 possible misspelled word: Lewis Line 3 possible misspelled word: Carroll Line 46 possible misspelled word: labelled Line 46 possible misspelled word: MARMALADE Line 58 possible misspelled word: centre Line 59 possible misspelled word: learnt Line 69 possible misspelled word: Antipathies Line 73 possible misspelled word: curtsey Line 73 possible misspelled word: CURTSEYING Line 79 possible misspelled word: Dinah'll Line 80 possible misspelled word: Dinah Line 81 possible misspelled word: Dinah Line 89 possible misspelled word: Dinah Line 89 possible misspelled word: Dinah Line 149 possible misspelled word: flavour Line 150 possible misspelled word: toffee Line 186 possible misspelled word: croquet
You are not logged in. Log in here and track your progress.
English version by Paul Vincent Craven
Spanish version by Antonio Rodríguez Verdugo
Russian version by Vladimir Slav
Turkish version by Güray Yildirim
Portuguese version by Armando Marques Sobrinho and Tati Carvalho
Dutch version by Frank Waegeman
Hungarian version by Nagy Attila
Finnish version by Jouko Järvenpää
French version by Franco Rossi
Korean version by Kim Zeung-Il
Chinese version by Kai Lin