regex - Python regular expressions and CMD

Question:

I'm having some problems with a piece of Python work. I have to write a piece of code that is run through CMD. It then needs to open a file the user names and count how many times each alphabetical character occurs in it.

So far I have this, which I can run through CMD and give a file to open. I've messed around with regular expressions but still can't figure out how to count individual characters. Any ideas? Sorry if I explained this badly.

import sys
import re

filename = raw_input()
count = 0
datafile = open(filename, 'r')

Answer:

I'd stay away from regexes. They'll be slow and ugly. Instead, read the entire file into a string, and use the built-in string method count to count the characters.

To put it together for you:

filename = raw_input()
datafile = open(filename, 'r')
data = datafile.read()
datafile.close()  # Don't forget to close the file!
counts = {}  # make sure counts is an empty dictionary
data = data.lower()  # convert data to lowercase
for k in range(97, 123):  # letters a to z are ASCII codes 97 to 122
    character = chr(k)  # get the ASCII character from the number
    counts[character] = data.count(character)

You then have a dictionary, counts, containing all the counts. For example, counts['a'] gives you the number of a's in the file. For the entire list of counts, use counts.items().
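The same idea can be sketched a little more compactly. This is only an illustration, not the answerer's code: the function name count_letters and the sample string are made up, and string.ascii_lowercase stands in for the 97 to 122 range.

```python
import string


def count_letters(text):
    """Return a dict mapping each letter a-z to its count in text.

    Hypothetical helper: counting is case-insensitive because the text
    is lowercased first, as in the answer above.
    """
    lowered = text.lower()
    # str.count scans the whole string once per letter (26 scans in total).
    return {letter: lowered.count(letter) for letter in string.ascii_lowercase}


counts = count_letters("Hello, World!")
print(counts["l"])  # 3
print(counts["z"])  # 0
```

Every letter gets an entry, including those that never occur, which is handy if you want a fixed 26-row report.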

Answer:

The collections.Counter type is useful for counting items. It was added in Python 2.7:

import collections
counts = collections.Counter()
for line in datafile:
    # Remove the EOL and iterate over each character.
    # If you want the counts to be case-insensitive, replace
    # line.rstrip() with line.rstrip().lower()
    for c in line.rstrip():
        # Missing items default to 0, so there is no special code for new characters
        counts[c] += 1

To count the number of distinct characters, use Counter as already advised; len(counts) then gives that number.
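Since the question asks only about alphabetical characters, a Counter can also be fed a filtered stream of characters. The following is a rough sketch under that assumption; letter_counter and the sample text are hypothetical names chosen for illustration.

```python
import collections


def letter_counter(text):
    """Count alphabetic characters only, ignoring case (hypothetical helper)."""
    # The generator drops spaces, digits, punctuation and newlines.
    return collections.Counter(c for c in text.lower() if c.isalpha())


counts = letter_counter("Mississippi River")
print(counts.most_common(2))  # [('i', 5), ('s', 4)]
print(len(counts))            # 7 distinct letters
```

Note that isalpha() also accepts non-ASCII letters, so this counts more than a to z if the file contains accented characters.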

Answer:

Regular expressions are useful if you want to find complex patterns in a string. Because you want to count (as opposed to find) simple (just single alphabetic characters) “patterns”, regular expressions are not the tool of choice here.

If I understand correctly what you are trying to do, the most transparent way to solve this is to iterate over all lines and over all characters in each line, and, if a character is alphabetic, add 1 to the corresponding dictionary entry. In code:

filename = raw_input()
found = {}
with open(filename) as file:
    for line in file:
        for character in line:
            if character in "abcdefghijklmnopqrstuvwxyz":
                # Checking `in (explicit string)` is not quick, but transparent.
                # You can use something like `character.isalpha()` if you want it to
                # automatically depend on your locale.
                found[character] = found.get(character, 0) + 1
                # If there is no dictionary entry for character yet, assume default 0.
                # If you need e.g. small and capital letters counted together,
                # "normalize" them to one particular type, for example using
                # found[character.upper()] = found.get(character.upper(), 0) + 1
                # (the `in` check above must then accept capital letters too).

After this loop has run through the file, the dictionary found will contain the number of occurrences of each character.
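Because the script is meant to be run from CMD, the file name could also be taken from the command line instead of raw_input(). Below is a minimal sketch of that variant, assuming a script name of count_letters.py; the names count_letters_in_file and main are invented for this example and do not come from the answers above.

```python
def count_letters_in_file(path):
    """Return a dict of letter -> count for the file at path (case-insensitive)."""
    found = {}
    with open(path) as handle:
        for line in handle:
            for character in line:
                if character.isalpha():
                    key = character.lower()  # count 'A' and 'a' together
                    found[key] = found.get(key, 0) + 1
    return found


def main(argv):
    """Hypothetical entry point: argv is expected to look like sys.argv."""
    if len(argv) != 2:
        print("usage: python count_letters.py FILENAME")
        return 1
    counts = count_letters_in_file(argv[1])
    for letter in sorted(counts):
        print("%s %d" % (letter, counts[letter]))
    return 0
```

To use it from CMD, add `import sys` and a final line `sys.exit(main(sys.argv))` to the script, then run `python count_letters.py myfile.txt`.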