I need to clean some text, as in the code below:

    import re
    ...
    clean_questions.append(clean_text(question))

This code should give me a cleaned list of questions, but clean_questions comes out empty. I reopened Spyder and the list was full but not cleaned; then I reopened it again and the list was empty. The console error says:

    In [...]: clean_questions = ...
    ...: clean_questions.append(clean_text(question))
    text = ...
    File "C:\Users\hp\Anaconda3\lib\re.py", line 192, in sub
      return _compile(pattern, flags).sub(repl, string, count)
    File "C:\Users\hp\Anaconda3\lib\re.py", line 286, in _compile
    File "C:\Users\hp\Anaconda3\lib\sre_compile.py", line 764, in compile
    File "C:\Users\hp\Anaconda3\lib\sre_parse.py", line 930, in parse
      p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    File "C:\Users\hp\Anaconda3\lib\sre_parse.py", line 426, in _parse_sub
    File "C:\Users\hp\Anaconda3\lib\sre_parse.py", line 580, in _parse
      raise source.error(msg, len(this) + 1 + len(that))

I am using Python 3.6, specifically the Anaconda build Anaconda3-2018.12-Windows-x86_64.

Your character class (as shown in the traceback) is invalid: } comes after = in ordinal value (} is 125, = is 61), and the - between them means the regex is trying to match every character from }'s ordinal to ='s, plus everything in between. Since character ranges must go from low ordinal to high ordinal, 125->61 is nonsensical, hence the error.

In a way you got lucky: if the characters around the - had been reversed, e.g. =-}, you'd have silently removed all characters from ordinal 61 to 125 inclusive, which would have included, along with a mess of punctuation, all standard ASCII letters, both lowercase and uppercase.

You could fix this by just removing the second - in your character class (you already included it at the beginning of the class, where it doesn't need to be escaped), changing the line text = re.sub(..., "", text) accordingly.

But I'm going to suggest dropping regular expressions here. The risk of mistakes with lots of literal punctuation is high, and there are other methods that don't involve regex at all, work just fine, and don't leave you wondering whether you escaped all the important stuff (the alternative is over-escaping, which makes the regex unreadable and is still error-prone).

Instead, replace that line with a simple str.translate call. First off, outside the function, make a translation table of the things to remove:

    # The redundant - is harmless here since the result is a dict, which dedupes anyway
    killpunctuation = str.maketrans('', '', ...)

then replace the line:

    text = re.sub(..., "", text)

with:

    text = text.translate(killpunctuation)

It should run at least as fast as the regex (likely faster), and it's far less error-prone, since no character has special meaning. Translation tables are just mappings from Unicode ordinals to None (meaning delete), another ordinal (meaning single-character replacement), or a string (meaning char -> multichar replacement); they have no concept of special escapes.
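The points made in the answer can be checked with a small, self-contained sketch. The sample string and the set of punctuation characters below are assumptions for illustration only, since the original pattern and text are not shown; clean_text here is a stand-in for the asker's function, not a copy of it.

```python
import re

# Hypothetical sample text; the asker's real data is not shown.
sample = "What's up, Doc? a=b {really}-now"

# 1) A descending range: '}' (125) down to '=' (61) is rejected
#    when the pattern is compiled.
try:
    re.compile(r"[}-=]")
    reversed_range_error = None
except re.error as exc:
    reversed_range_error = exc

# 2) The same two characters in ascending order compile fine, but the
#    class now covers every ordinal from 61 to 125, ASCII letters included.
greedy = re.sub(r"[=-}]", "", sample)

# 3) A translation table has no special characters at all.
#    Assumed set of characters to delete; the duplicate '-' is harmless
#    because maketrans builds a dict keyed by ordinal.
kill_punctuation = str.maketrans("", "", "-?,'{}=-")


def clean_text(text):
    """Delete every character listed in the translation table."""
    return text.translate(kill_punctuation)


cleaned = clean_text(sample)

print(reversed_range_error)  # the "bad character range" error from re
print(repr(greedy))          # no letters survive the ascending range
print(cleaned)               # Whats up Doc ab reallynow
```

Building the table once at module level, as the answer suggests, keeps the per-call work down to a single str.translate pass.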