
Fast/efficient Counting Of List Of Space Delimited Strings In Python

Given the input: x = ['foo bar', 'bar blah', 'black sheep'] I could do this to get the count of each word in the list of space delimited string: from itertools import chain from c

Solution 1:

Assuming you are on Python 3.x, both chain(*map(str.split, x)) and simple iteration create an intermediate list for each line; neither should take up much memory. Performance should be very close and may be implementation-dependent.
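As a rough sketch of the two approaches being compared, using the list x from the question: one caveat is that star-unpacking, as in chain(*map(str.split, x)), consumes the whole map object before chain is called, so every split list exists at the same time. chain.from_iterable is the lazy form and is the one used here. The names counts_chain and counts_loop are only illustrative.

```python
from collections import Counter
from itertools import chain

x = ['foo bar', 'bar blah', 'black sheep']

# chain.from_iterable pulls one split list at a time from the lazy map object,
# so only the current line's word list needs to be alive.
counts_chain = Counter(chain.from_iterable(map(str.split, x)))

# Simple iteration: split each line and feed its words to the same Counter.
counts_loop = Counter()
for line in x:
    counts_loop.update(line.split())

print(counts_chain == counts_loop)
```

Both variants should produce the same counts; which one is faster is the kind of thing best measured rather than guessed.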

However, the most memory-efficient option is a generator function that feeds Counter() directly. However you call str.split(), it builds an intermediate list that is not strictly needed. This could cause a slowdown on a particularly long line, but to be honest that is unlikely.

Such a generator function is shown below. Note that I am using optional type hints for clarity.

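from collections import Counter
from typing import Iterable, Iterator

def gen_words(strings: Iterable[str]) -> Iterator[str]:
    for string in strings:
        start = 0  # index where the current word begins
        for i, char in enumerate(string):
            if char == ' ':
                if start != i:  # skip empty words caused by repeated spaces
                    yield string[start:i]
                start = i + 1  # the next word starts after this space
        if start != len(string):  # emit the last word unless the line ends in a space
            yield string[start:]

c = Counter(gen_words(x))  # x is the list of strings from the question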
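For comparison, here is a minimal sketch of the same idea that lets the standard library's re.finditer do the scanning; it also yields words lazily without building a per-line list. This is a swapped-in technique, not the method from the answer above, and iter_words and word_counts are hypothetical names chosen for illustration. Like the hand-written generator, it splits on the space character only, whereas str.split() with no arguments splits on any whitespace.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Matches a run of one or more non-space characters, i.e. one word.
WORD = re.compile(r'[^ ]+')

def iter_words(strings: Iterable[str]) -> Iterator[str]:
    """Yield words one at a time (hypothetical helper, not from the answer)."""
    for string in strings:
        # finditer returns a lazy iterator of match objects, not a list.
        for match in WORD.finditer(string):
            yield match.group()

x = ['foo bar', 'bar blah', 'black sheep']
word_counts = Counter(iter_words(x))
print(word_counts)
```

Whether this beats the character-by-character loop or plain str.split() in practice depends on the data and the interpreter, so it is worth timing before adopting it.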

Solution 2:

The answer to your question is profiling.

Following are some profiling tools:
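As one illustration, the standard library's timeit module can compare the candidate approaches directly. This is only a sketch and not necessarily one of the tools the answer had in mind: the input size, the repeat count, and the function names count_with_chain and count_with_loop are assumptions made for the example. cProfile (function-level timing) and tracemalloc (memory allocation tracking) are other standard-library options.

```python
import timeit
from collections import Counter
from itertools import chain

# Assumed test input: the question's list repeated to make timings measurable.
x = ['foo bar', 'bar blah', 'black sheep'] * 1000

def count_with_chain():
    # Lazily chain the per-line word lists into one stream for Counter.
    return Counter(chain.from_iterable(map(str.split, x)))

def count_with_loop():
    # Update a single Counter line by line.
    counts = Counter()
    for line in x:
        counts.update(line.split())
    return counts

# timeit.timeit accepts a zero-argument callable and returns total seconds.
timings = {
    func.__name__: timeit.timeit(func, number=20)
    for func in (count_with_chain, count_with_loop)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

The absolute numbers will vary by machine and Python version, so the useful result is the relative difference on your own data.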
