Reading a file and then writing something back

Published: Tue, 29 Apr 2008 19:09:00 GMT; 6 comments

Hi All -

I'm not sure, but I'm wondering if this is a bug, or maybe (more

likely) I'm misunderstanding something...see below:

'kevin\n'

'dan\n'

Traceback (most recent call last):

File "<stdin>", line 1, in ?

IOError: (0, 'Error')

I've figured out that I can do an open('testfile', 'r+') and then seek

and write something (without an error), but it just seems odd that I

would get an IOError for what I was trying to do. Oh, and I also

tried to do "f.flush()" before the write operation with no luck.

I've searched google, but can't seem to find much. Any thoughts?

TIA,

Kevin

All Comments

  • 6 Comments
    • Kevin T. Ryan wrote:

      > I'm not sure, but I'm wondering if this is a bug, or maybe (more

      > likely) I'm misunderstanding something...see below:

      >

      > 'kevin\n'

      >

      > 'dan\n'

      >

      > Traceback (most recent call last):

      > File "<stdin>", line 1, in ?

      > IOError: (0, 'Error')

      This is just a guess, I don't know the inner workings of files in

      Python, but here we go:

      I think that readline() doesn't read one character at a time from the

      file, until it finds a newline, but reads a whole block of characters,

      looks for the first newline and returns that string (for efficiency

      reasons). Due to this buffering, the file pointer position is undefined

      after a readline(), and so a write() afterwards doesn't make sense.

      Python tries to help you not to fall into this trap.

      > I've figured out that I can do an open('testfile', 'r+') and then seek

      > and write something (without an error), but it just seems odd that I

      > would get an IOError for what I was trying to do.

      When you do a seek(), the file pointer position is clearly defined, so a

      write() makes sense.

      A tentative solution could be:

      pos = f.tell()

      s = f.readline() # Reads 'dan\n'

      f.seek(pos + len(s))

      f.write('chris\n')

      However, I'm not sure you want to do that, as the string written will

      just overwrite the previous content, and will probably not be aligned

      with the next newline in the file. Except if you don't care about the

      data following your write.
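The alignment problem described above can be reproduced with a short sketch. This is only an illustration under assumptions: it uses Python 3 and binary mode (so that tell() returns plain byte offsets), the helper name overwrite_after_first_line is made up for this example, and the three-name file mirrors the one in the question.

```python
import os
import tempfile

def overwrite_after_first_line(path, replacement):
    """Overwrite bytes in place, starting right after the first line.

    Hypothetical helper for illustration; not part of any library.
    """
    with open(path, "r+b") as f:
        f.readline()          # read and discard the first line
        f.seek(f.tell())      # explicit reposition before switching to writing
        f.write(replacement)  # overwrites existing bytes; nothing is inserted

# Demo on a throwaway file with the same three names as in the question.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"kevin\ndan\npat\n")

# The replacement is one byte shorter than the line it lands on.
overwrite_after_first_line(demo_path, b"al\n")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

print(result)
```

Because b"al\n" is one byte shorter than b"dan\n", the old newline of the "dan" line survives and shows up as an empty line before "pat". A longer replacement would instead run into the start of the next line.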

      Hope this helps.

      -- Remy

      Remove underscore and anti-spam suffix in reply address for a timely

      response.

      #1; Tue, 29 Apr 2008 19:10:00 GMT
    • Python's file object is based on ISO C's file I/O primitives

      (fopen, fread, etc) and inherits both the requirements of the standard

      and any quirks of your OS's C implementation.

      According to this document

      http://www.lysator.liu.se/c/rat/d9.html#4-9-5-3

      a direction change is only permitted after a "flushing" operation

      (fsetpos, fseek, rewind, fflush). file.flush calls C's fflush.

      I believe that this C program is equivalent to your Python program:

      #include <stdio.h>

      int main(void) {
          char line[21];

          /* Create the test file with three lines. */
          FILE *f = fopen("testfile", "w");
          fputs("kevin\n", f);
          fputs("dan\n", f);
          fputs("pat\n", f);
          fclose(f);

          /* Reopen for update, read two lines, then try to write. */
          f = fopen("testfile", "r+");
          fgets(line, 20, f); printf("%s", line);
          fgets(line, 20, f); printf("%s", line);
          fflush(f);
          if (fputs("chris\n", f) == EOF) { perror("fputs"); }
          fclose(f);
          return 0;
      }

      On my Linux machine, it prints

      kevin

      pat

      and testfile's third and final line is "chris".

      On a windows machine nearby (compiled with mingw, but using msvcrt.dll)

      it prints

      kevin

      dan

      fputs: No error

      and testfile's third and final line is "pat".

      If I add fseek(f, 0, SEEK_CUR) after fflush(f), I don't get a failure

      but I do get the curious contents

      kevin

      dan

      pat

      chris

      If I use just fseek(f, 0, SEEK_CUR) I get no error and correct

      contents in testfile.

      I don't have a copy of the actual C standard, but even Microsoft's

      documentation says

      When the "r+", "w+", or "a+" access type is specified, both reading

      and writing are allowed (the file is said to be open for “update”).

      However, when you switch between reading and writing, there must be

      an intervening fflush, fsetpos, fseek, or rewind operation. The

      current position can be specified for the fsetpos or fseek

      operation, if desired.

      http://msdn.microsoft.com/library/d...2c_._wfopen.asp

      so it smells like a bug to me. Do you happen to be using Windows? I

      guess you didn't actually say.
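For comparison, here is a rough Python sketch of the same read-then-write sequence with a positioning call in between, following the rule quoted above. It is written under assumptions: it targets Python 3, whose io module does its own buffering rather than going through C stdio, so the explicit seek may not be strictly required there; the function name replace_tail_after_two_lines is invented for this example.

```python
import os
import tempfile

def replace_tail_after_two_lines(path, new_tail):
    """Read two lines, then switch to writing with a seek in between.

    Hypothetical helper for illustration only.
    """
    with open(path, "r+b") as f:
        first = f.readline()
        second = f.readline()
        f.seek(0, os.SEEK_CUR)  # positioning call between the read and the write
        f.write(new_tail)
        f.truncate()            # drop any old bytes left after the new data
    return first, second

# Demo on a throwaway file laid out like "testfile" in the C program.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"kevin\ndan\npat\n")

lines_read = replace_tail_after_two_lines(demo_path, b"chris\n")

with open(demo_path, "rb") as f:
    final_contents = f.read()
os.remove(demo_path)

print(lines_read)
print(final_contents)
```

The truncate() call is a design choice for this sketch: without it, a replacement shorter than the old third line would leave stale bytes at the end of the file.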

      Jeff

      #2; Tue, 29 Apr 2008 19:11:00 GMT
    • Hi Kevin,

      Even though I am fairly new to Python, it appears that you might have

      found a bug with 'r+' writing / reading mode.

      Here's a couple of suggestions which you might find helpful:

      1) To make your programs faster and less 'error'-prone, you might want

      to read the text file into memory (a list) first, like this:

      f = open("c:/test.txt", "r")
      names = f.readlines()  # Read all lines in the file and store them in a list.
      for name in names:     # Display a listing of all lines in the file.
          print(name)
      f.close()              # Close the text file (we'll change it later).

      2) When you want to add new names to the "text file" (in memory), you

      can easily do so by doing this:

      names = names + ["William\n"]  # Adds William to the list (text file in memory).
      names = names + ["Steven\n"]   # Adds Steven to the list.
      names = names + ["Tony\n"]     # Adds Tony to the list also.

      3) If you wish to sort the list in memory, you can do this:

      names.sort()  # Places the names in the list in ascending order (A - Z).

      4) Finally, to re-write the text file on the disk, you can do this:

      f = open("c:/test.txt", "w")  # Re-write the file from scratch with revised info.
      for name in names:            # For each name that is in the list (names)
          f.write(name)             # Write it to the file.
      f.close()                     # The file has now been fully rewritten with new data, so close it.

      Why does this have advantages? Several reasons, which are:

      1) It does the processing in the memory, which is much quicker. Faster

      programs are always a nice feature!

      2) It allows for additional processes to occur, such as sorting, etc.

      3) It reduces the chances of "having a disk problem." One simple read &

      one simple write.
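The four steps above can be condensed into one small function. This is a sketch, not the code from the post: it assumes Python 3, uses with blocks so the file is closed even if an error occurs, and the name add_names_sorted and the temporary demo file are made up for illustration.

```python
import os
import tempfile

def add_names_sorted(path, new_names):
    """Read all lines, add new names, sort, and rewrite the file.

    Hypothetical helper; suitable only when the file fits in memory.
    """
    with open(path, "r") as f:
        names = f.readlines()                        # one simple read
    names.extend(name + "\n" for name in new_names)  # add the new entries
    names.sort()                                     # ascending order (A - Z)
    with open(path, "w") as f:
        f.writelines(names)                          # one simple write
    return names

# Demo on a throwaway file instead of c:/test.txt.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("Kevin\nDan\nPat\n")

sorted_names = add_names_sorted(demo_path, ["William", "Steven", "Tony"])

with open(demo_path, "r") as f:
    file_text = f.read()
os.remove(demo_path)

print(sorted_names)
```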

      Hope this helps,

      Byron

      --

      #3; Tue, 29 Apr 2008 19:12:00 GMT
    • Oops, forgot to add one extra thing:

      ---

      If you would like to see all of the names from your "names" list, you

      can do the following:

      for name in names:
          print(name)

      This provides you with the results:

      Dan

      Kevin

      Pat

      Steven

      Tony

      William

      If you would like to see the first and fourth items in the list, you can

      do the following:

      print(names[0])  # Display the first item in the list. The first item is always at index zero.
      print(names[3])  # Display the fourth item in the list.

      Result is:

      Dan

      Steven

      Finally, if you would like to remove an item from the list:

      del names[3]

      Hope this helps!

      Byron

      --

      #4; Tue, 29 Apr 2008 19:13:00 GMT
    • Remy Blank wrote:

      > Kevin T. Ryan wrote:

      > This is just a guess, I don't know the inner workings of files in

      > Python, but here we go:

      > I think that readline() doesn't read one character at a time from the

      > file, until it finds a newline, but reads a whole block of characters,

      > looks for the first newline and returns that string (for efficiency

      > reasons). Due to this buffering, the file pointer position is undefined

      > after a readline(), and so a write() afterwards doesn't make sense.

      > Python tries to help you not to fall into this trap.

      >

      > When you do a seek(), the file pointer position is clearly defined, so a

      > write() makes sense.

      > A tentative solution could be:

      > pos = f.tell()

      > s = f.readline() # Reads 'dan\n'

      > f.seek(pos + len(s))

      > f.write('chris\n')

      > However, I'm not sure you want to do that, as the string written will

      > just overwrite the previous content, and will probably not be aligned

      > with the next newline in the file. Except if you don't care about the

      > data following your write.

      > Hope this helps.

      > -- Remy

      >

      > Remove underscore and anti-spam suffix in reply address for a timely

      > response.

      Thanks all for the suggestions. I'm guessing that what Remy stated seems to

      be about right. Jeff: I AM using Windows (or at least, was today while I

      was writing that script)...if Remy was wrong, then it still might be a bug

      though - I tried to do the f.flush(), but the error still occurred.

      Byron - thanks for the advice. For my simple example, you're totally

      correct, but I was thinking along the lines of a much bigger file w/ tons

      of records - and therefore didn't want to slurp everything in to memory in

      case the file got to be too big. Maybe I'm wrong though - I don't know how

      much a "normal" computer could hold in memory (maybe 100's of 1,000's of

      lines?).
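For a file too large to hold in memory, one common alternative is to stream it line by line into a temporary file and then swap the two. The following is a minimal sketch under assumptions: Python 3, a text file, and a hypothetical helper named replace_line; nobody in this thread proposed this exact code.

```python
import os
import tempfile

def replace_line(path, line_number, new_line):
    """Replace one line (1-based) while holding only one line in memory.

    Hypothetical helper for illustration only.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final swap
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for number, line in enumerate(src, start=1):
                dst.write(new_line if number == line_number else line)
        os.replace(tmp_path, path)  # swap the rewritten copy into place
    except BaseException:
        os.remove(tmp_path)         # clean up the partial copy on failure
        raise

# Demo on a throwaway file with the three names from the question.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("kevin\ndan\npat\n")

replace_line(demo_path, 3, "chris\n")

with open(demo_path, "r") as f:
    streamed_result = f.read()
os.remove(demo_path)

print(streamed_result)
```

This costs a full copy of the file on disk, but memory use stays at roughly one line at a time, and the replacement line can be any length without disturbing the lines that follow.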

      Oh well, thanks again :)

      #5; Tue, 29 Apr 2008 19:14:00 GMT