Wednesday, July 30, 2008

Spelling & Other Problems

For no particularly good reason, I occasionally review the stuff I've posted to my blog. I don't post very often, and most of the posts that make it there are some form of complaint. So blogging is therapy? Anyway, I usually write the postings in Word because my spelling is so awful. Once I've gotten rid of all the squiggly red lines I'm a happy camper and post them to the site. Re-reading a few of the posts reveals not only that I have terrible grammar in places but also that I apparently use random words here and there. This crops up in all of my writing and allows me to identify my own work. If you plagiarize from me you get a sprinkling of stupid words in with the other stuff. This also shows up in the comments that I put in the code I write. No, you didn't read that wrong; it clearly says that I write comments in my code. It's a form of madness.
By this point one or both of my readers may have thought ahead a bit and started a response telling me to turn on the grammar checker. I have that on, but it's pretty easy to confuse it. Some of my longer, more wandering sentences live forever with green squiggly lines under them. A quick mouse-over reveals "Fragment (consider revising)", which is what I do: I consider revising it and then move on. Once I've accumulated a lot of green squiggly lines, plus a few red ones under words like 'blogging' that I can't be bothered to add to the Word dictionary, a real hint of a mistake gets lost in the weeds. And if I type 'is' instead of 'as', or 'was' instead of 'wash' or 'saw', Word hasn't got a clue that I've goofed, because the wrong word is still a correctly spelled one.
So this is my candidate for an AI project: let's help Nigel get his writing in better shape. If you have a copy of "The Elements of Style" by Strunk and White, you'll know that there are a whole bunch of rules that can be applied to writing so that it doesn't read like crap. Surely those would be easy to code up in Prolog?
Add to that some wisdom from a few hundred editors as to what sort of structure to use and I'm pretty sure you've got something useful. After all, one human editor could easily find the goofs with a single pass. It's all about practice and recognizing the faults in what should be clean patterns of prose. That's supposed to be what AI systems are good at. Any takers?
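To give a feel for what "coding up the rules" might look like, here is a minimal sketch. It is in Python rather than Prolog, and it only covers one kind of rule: wordy phrases of the sort "The Elements of Style" advises cutting. The rule table, the `check` function, and the sample sentence are all made up for illustration; this is nowhere near a full encoding of the book, and it does nothing for the 'is' versus 'as' problem.

```python
import re

# Hypothetical rule table of (pattern, suggested replacement).
# Illustrative only: a handful of wordy phrases, not a complete rule set.
RULES = [
    (r"\bthe question as to whether\b", "whether"),
    (r"\bowing to the fact that\b", "since"),
    (r"\bin spite of the fact that\b", "though"),
    (r"\bthere is no doubt but that\b", "no doubt"),
    (r"\bin a hasty manner\b", "hastily"),
]


def check(text):
    """Return (offset, matched phrase, suggestion) tuples, sorted by offset."""
    findings = []
    for pattern, suggestion in RULES:
        # Case-insensitive so a phrase at the start of a sentence still matches.
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            findings.append((match.start(), match.group(0), suggestion))
    return sorted(findings)


if __name__ == "__main__":
    sample = "Owing to the fact that I type in a hasty manner, errors slip through."
    for offset, phrase, suggestion in check(sample):
        print(f"{offset}: '{phrase}' -> consider '{suggestion}'")
```

A lookup table like this only catches fixed phrases. Spotting a correctly spelled word in the wrong place needs some notion of context, which is presumably where the AI part would have to earn its keep.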
