Yep, not so simple. Remember I said no extensions; with extensions I would use Regex, which is a great thing to know about.
I would decide first if we are going to:
a) define word characters and treat everything else as a word delimiter, or
b) define word delimiters and treat everything else as a word character
I would go with the first, maybe:
lower and upper case letters (a-zA-Z) and numbers (0-9) and perhaps hyphen and underscore.
To do this I would have a subroutine that returns a flag (word character or not) for any given character, and I would probably use ASCII character codes for this.
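Here is a rough sketch of that idea, written in Python purely for illustration since no language was specified. The names is_word_char and split_words are made up for this example, and including hyphen and underscore is just the "perhaps" choice from above, not a rule.

```python
def is_word_char(ch):
    """Return True if ch counts as a word character (option a)."""
    code = ord(ch)  # character code of the single character
    return (
        48 <= code <= 57        # 0-9
        or 65 <= code <= 90     # A-Z
        or 97 <= code <= 122    # a-z
        or code == 45           # hyphen (assumed to be part of a word)
        or code == 95           # underscore (assumed to be part of a word)
    )


def split_words(text):
    """Collect runs of word characters; any other character ends the word."""
    words = []
    current = []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:  # flush the last word if the text ends inside one
        words.append("".join(current))
    return words
```

Note that with this approach any accented letter (for example the "é" in "café") acts as a delimiter, because it falls outside the listed code ranges.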
Just a thought - to handle other languages I might go with option b.
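A minimal sketch of option b, again in Python only as an illustration. The delimiter set and the name split_on_delimiters are my own assumptions; the real list of delimiters would depend on the text being handled.

```python
# Assumed delimiter set: whitespace plus common punctuation. Adjust to taste.
DELIMITERS = set(" \t\r\n.,;:!?\"()[]{}<>/\\")


def split_on_delimiters(text, delimiters=DELIMITERS):
    """Treat listed characters as delimiters; everything else is a word character."""
    words = []
    current = []
    for ch in text:
        if ch in delimiters:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush the last word if the text ends inside one
        words.append("".join(current))
    return words
```

Because anything not listed is a word character, accented and non-Latin letters stay inside words without having to enumerate them, which is why this option suits other languages better.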