Why Python Does Not See All The Rows In A File?
I count the number of rows (lines) in a file using Python in the following way:

    n = 0
    for line in file('input.txt'):
        n += 1
    print n

I run this script under Windows. Then I count
Solution 1:
You most likely have a file with one or more DOS EOF (CTRL-Z) characters in it, ASCII codepoint 0x1A. When Windows opens a file in text mode, it'll still honour the old DOS semantics and end a file whenever it reads that character. See Line reading chokes on 0x1A.
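To check whether a file really contains such a character, one option is to scan it in binary mode, where 0x1A is just another byte. The following is only a sketch: the helper name `find_dos_eof` and the chunk size are made up for illustration and are not part of any library.

```python
EOF_BYTE = b'\x1a'  # DOS EOF (CTRL-Z), ASCII codepoint 0x1A


def find_dos_eof(filename, buf_size=2 ** 15):
    """Return the byte offsets of every 0x1A character in the file.

    Hypothetical helper, written for illustration only.
    """
    offsets = []
    base = 0  # offset of the current chunk within the file
    # Binary mode: no text-mode EOF handling is applied to the data.
    with open(filename, 'rb') as f:
        while True:
            buf = f.read(buf_size)
            if not buf:
                break
            pos = buf.find(EOF_BYTE)
            while pos != -1:
                offsets.append(base + pos)
                pos = buf.find(EOF_BYTE, pos + 1)
            base += len(buf)
    return offsets
```

An empty list means the file has no CTRL-Z characters, in which case the missing rows probably have a different cause.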
Only by opening a file in binary mode can you bypass this behaviour. To do so and still count lines, you have two options:
1. Read in chunks, then count the number of line separators in each chunk:

       import os
       from functools import partial

       def bufcount(filename, linesep=os.linesep.encode('ascii'), buf_size=2 ** 15):
           # binary mode yields bytes, so the separator must be bytes too
           lines = 0
           with open(filename, 'rb') as f:
               last = b''
               # read buf_size bytes at a time until read() returns nothing
               for buf in iter(partial(f.read, buf_size), b''):
                   lines += buf.count(linesep)
                   if last and last + buf[:1] == linesep:
                       # count line separators straddling a chunk boundary
                       lines += 1
                   if len(linesep) > 1:
                       last = buf[-1:]
           return lines

   Take into account that on Windows os.linesep is set to \r\n; adjust as needed for your file. In binary mode line separators are not translated to \n.

2. Use io.open(); the io module's file objects always open the file in binary mode, then do the translations themselves:

       import io

       with io.open(filename) as f:
           lines = sum(1 for line in f)