
Writing Huge Strings In Python

I have a very long string, almost a megabyte long, that I need to write to a text file. The regular file = open('file.txt','w') file.write(string) file.close() works but is too sl

Solution 1:

Your issue is that str(long) is very slow for large integers (millions of digits) in Python. The conversion is quadratic in the number of digits, i.e., for ~1e8 digits it may require ~1e16 operations to convert the integer to a decimal string.
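To see how the conversion time grows, a rough timing sketch like the one below can help. The helper name time_str and the digit counts are illustrative choices, not part of the original answer, and the sizes are kept small so it finishes quickly. On an interpreter with quadratic conversion, doubling the digit count should roughly quadruple the time; newer CPython releases may use a faster algorithm for very large integers, so the ratio you observe may differ.

```python
import sys
import timeit

# Python 3.11+ limits int -> str conversion to 4300 digits by default;
# passing 0 lifts that limit. Older versions lack this function.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)


def time_str(num_digits):
    """Return (seconds, string length) for converting a num_digits-digit int."""
    n = 10 ** num_digits - 1  # an integer made of num_digits nines
    start = timeit.default_timer()
    s = str(n)
    elapsed = timeit.default_timer() - start
    return elapsed, len(s)


# Hypothetical sizes chosen only to keep the run short.
for digits in (20000, 40000, 80000):
    seconds, length = time_str(digits)
    print("%6d digits: %.4f s" % (length, seconds))
```

Timings depend heavily on the machine and the Python version, so treat the numbers as a trend rather than a benchmark.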

Writing a 500 MB file should not take hours, e.g.:

$ python3 -c 'open("file", "w").write("a"*500*1000000)'

returns almost immediately. ls -l file confirms that the file is created and it has the expected size.

Calculating math.factorial(67867957) (the result has ~500M digits) may take several hours but saving it using pickle is instantaneous:

import math
import pickle

n = math.factorial(67867957)  # takes a long time
with open("file.pickle", "wb") as file:
    pickle.dump(n, file)  # very fast (comparatively)

Loading it back with n = pickle.load(open('file.pickle', 'rb')) takes less than a second.
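The pickle round trip can be checked on a much smaller number. The sketch below uses math.factorial(20000) as a stand-in and a temporary directory for the file; both are assumptions made for the example. Pickle is fast here because the binary protocols (the default in Python 3) store the integer's bytes directly instead of converting it to decimal text.

```python
import math
import os
import pickle
import tempfile

# Small stand-in for the huge factorial so the example runs in well under a second.
n = math.factorial(20000)

# Write to a temporary directory to avoid clobbering real files.
path = os.path.join(tempfile.mkdtemp(), "file.pickle")

with open(path, "wb") as f:
    pickle.dump(n, f)  # no decimal conversion is needed here

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == n)
```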

str(n) is still running (after 50 hours) on my machine.

To get the decimal representation fast, you could use gmpy2:

$ python -c 'import gmpy2; open("file.gmpy2", "w").write(str(gmpy2.fac(67867957)))'

It takes less than 10 minutes on my machine.
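If gmpy2 may not be installed everywhere the script runs, one option is to fall back to the built-in integer type. The sketch below is only an illustration: factorial_digits is a hypothetical helper, and the fallback path keeps the slow str() conversion described above.

```python
import math
import sys

try:
    import gmpy2  # optional third-party dependency
except ImportError:
    gmpy2 = None


def factorial_digits(k):
    """Return the decimal digits of k! as a string."""
    if gmpy2 is not None:
        # gmpy2.fac returns an mpz, whose str() conversion is much faster.
        return str(gmpy2.fac(k))
    # Fallback: built-in int. On Python 3.11+ lift the default
    # 4300-digit conversion limit (this changes a process-wide setting).
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)
    return str(math.factorial(k))


print(factorial_digits(20))
```

Either branch produces the same string; only the time taken for very large k differs.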

Solution 2:

OK, this is not really an answer; it is more to show that your reasoning about the delay is wrong.

First, test the write speed of a big string:

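import timeit

def write_big_str(n_bytes=1000000):
    # Write n_bytes of "a" to disk in binary mode.
    with open("test_file.txt", "wb") as f:
        f.write(b"a" * n_bytes)

# Total time for 100 writes of a 1 MB string.
print(timeit.timeit("write_big_str()", "from __main__ import write_big_str", number=100))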

You should see a fairly respectable speed (and that is for repeating it 100 times).

Next, see how long it takes to convert a very big number to a str:

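import timeit, math
n = math.factorial(200000)
# On Python 3.11+, call sys.set_int_max_str_digits(0) first to lift the default 4300-digit limit.
print(timeit.timeit("str(n)", "from __main__ import n", number=1))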

It will probably take ~10 seconds (and that is a roughly million-digit number), which granted is slow, but not hours slow. (OK, it is pretty slow to convert to a string, but it still should not take hours. It actually took more like 243 seconds on my box.)
