
Graceful detection of EOF


Published Wed, 26 Dec 2007 23:50:00 GMT; 20 comments

How does one detect the EOF gracefully? Assuming I have a pickle file containing an unknown number of objects, how can I read them (i.e., call pickle.load()) until the EOF is encountered without generating an EOFError exception?

Thanks for any assistance.

MickeyBob

All Comments


  • 20 Comments
    • Write a file-like object that can "look ahead" and provide a flag to

      check in your unpickling loop, and which implements enough of the file

      protocol ("read" and "readline", apparently) to please pickle. The

      following worked for me.

      class PeekyFile:
          def __init__(self, f):
              self.f = f
              self.peek = ""

          def eofnext(self):
              # Read one character ahead (unless one is already buffered)
              # and report whether the underlying file is exhausted.
              if self.peek: return False
              try:
                  self.peek = self.f.read(1)
              except EOFError:
                  return True
              return not self.peek

          def read(self, n=None):
              # Hand back the buffered look-ahead character first.
              if n is not None:
                  n = n - len(self.peek)
                  result = self.peek + self.f.read(n)
              else:
                  result = self.peek + self.f.read()
              self.peek = ""
              return result

          def readline(self):
              result = self.peek + self.f.readline()
              self.peek = ""
              return result

      import StringIO, pickle

      o = StringIO.StringIO()
      for x in range(5):
          pickle.dump(x, o)

      i = PeekyFile(StringIO.StringIO(o.getvalue()))
      while 1:
          if i.eofnext():
              break
          print pickle.load(i)
      print "at the end"


      #1; Wed, 26 Dec 2007 23:51:00 GMT
    • MickeyBob wrote:

      > How does one detect the EOF gracefully? Assuming I have a pickle file

      > containing an unknown number of objects, how can I read (i.e.,

      > pickle.load()) until the EOF is encountered without generating an EOF

      > exception?

      Why isn't catching the exception graceful?

      # UNTESTED CODE
      import pickle

      def load_pickle_iter(infile):
          while 1:
              try:
                  yield pickle.load(infile)
              except EOFError:
                  break

      for obj in load_pickle_iter(open("mydata.pickle", "rb")):
          print obj

      This is well in line with the normal Python idiom,

      as compared to "look before you leap".

      Andrew

      dalke@dalkescientific.com

      #2; Wed, 26 Dec 2007 23:52:00 GMT
    • Andrew Dalke wrote:

      > MickeyBob wrote:

      >> How does one detect the EOF gracefully? Assuming I have a pickle file

      >> containing an unknown number of objects, how can I read (i.e.,

      >> pickle.load()) until the EOF is encountered without generating an EOF

      >> exception?

      >

      > Why isn't catching the exception graceful?

      > # UNTESTED CODE
      > def load_pickle_iter(infile):
      >     while 1:
      >         try:
      >             yield pickle.load(infile)
      >         except EOFError:
      >             break
      >
      > for obj in load_pickle_iter(open("mydata.pickle", "rb")):
      >     print obj

      >

      > This is well in line with the normal Python idiom,

      > as compared to "look before you leap".

      > Andrew

      > dalke@dalkescientific.com

      So, what you're saying is that the Python way, in contradistinction to

      "look before you leap", is "land in it, then wipe it off?" Can we get

      that in the Zen of Python? :-)

      Seriously, this is beautiful. I understand generators, but haven't

      become accustomed to using them yet. That is just beautiful, which _is_

      Zen.

      Jeremy Jones

      #3; Wed, 26 Dec 2007 23:53:00 GMT
    • A file is too large to fit into memory.

      The first line must receive a special treatment, because

      it contains information about how to handle the rest of the file.

      Of course it is not difficult to test if you are reading the first line

      or another one, but it hurts my feelings to do a test which by definition

      succeeds at the first record, and never afterwards.

      Any suggestions?

      egbert

      --

      Egbert Bouwman - Keizersgracht 197 II - 1016 DS Amsterdam - 020 6257991

      ========================================================================

      #4; Wed, 26 Dec 2007 23:54:00 GMT
    • Egbert Bouwman wrote:

      > A file is too large to fit into memory.

      > The first line must receive a special treatment, because

      > it contains information about how to handle the rest of the file.

      > Of course it is not difficult to test if you are reading the first line

      > or another one, but it hurts my feelings to do a test which by definition

      > succeeds at the first record, and never afterwards.

      >>> lines = iter("abc")

      >>> for first in lines:

      ... print first

      ... break

      ...

      a

      >>> for line in lines:

      ... print line

      ...

      b

      c

      Unless it hurts your feelings to unconditionally break out of a for-loop,

      that is.

      Peter

      #5; Wed, 26 Dec 2007 23:55:00 GMT
    • Peter Otten wrote:

      > >>> lines = iter("abc")

      > >>> for first in lines:

      > ... print first

      > ... break

      > ...

      > a

      > >>> for line in lines:

      > ... print line

      > ...

      > b

      > c

      > Unless it hurts your feelings to unconditionally break out of a for-loop,

      > that is.

      How about:

      >>> lines = iter("abc")

      >>> first = lines.next()

      >>> print first

      a

      >>> for line in lines:

      ... print line

      ...

      b

      c

      That would hurt fewer feelings, I presume.

      Gerrit.

      --

      Weather in Twenthe, Netherlands 08/10 11:25:

      11.0°C Few clouds mostly cloudy wind 0.9 m/s None (57 m above NAP)

      --

      In the councils of government, we must guard against the acquisition of

      unwarranted influence, whether sought or unsought, by the

      military-industrial complex. The potential for the disastrous rise of

      misplaced power exists and will persist.

      -Dwight David Eisenhower, January 17, 1961

      #6; Wed, 26 Dec 2007 23:56:00 GMT
    • Jeremy Jones <zanesdad@bellsouth.net> wrote:

      ...

      > > This is well in line with the normal Python idiom,

      > > as compared to "look before you leap".

      > > Andrew

      > > dalke@dalkescientific.com

      > So, what you're saying is that the Python way, in contradistinction to

      > "look before you leap", is "land in it, then wipe it off?" Can we get

      > that in the Zen of Python? :-)

      The "normal Python idiom" is often called, in honor and memory of

      Admiral Grace Murray-Hopper (arguably the most significant woman in the

      history of programming languages to this time), "it's Easier to Ask

      Forgiveness than Permission" (EAFP, vs the LBYL alternative). This

      motto has been attributed to many, but Ms Hopper was undoubtedly the

      first one to use it reportedly and in our field.

      In the general case, trying to ascertain that an operation will succeed

      before attempting the operation has many problems. Often you end up

      repeating the same steps between the ascertaining and the actual usage,

      which offends the "Once and Only Once" principle as well as slowing

      things down. Sometimes you cannot ensure that the ascertaining and the

      operating pertain to exactly the same thing -- the world can have

      changed in-between, or the code might present subtle differences between

      the two cases.

      In contrast, if a failed attempt can be guaranteed to not alter

      persistent state and only result in an easily catchable exception, EAFP

      can better deliver on its name. In terms of your analogy, there's

      nothing to "wipe off" -- if the leap "misfires", no damage is done.

      Alex

      #7; Wed, 26 Dec 2007 23:57:00 GMT
    • Egbert Bouwman <egbert.list@hccnet.nl> wrote:

      > A file is too large to fit into memory.

      > The first line must receive a special treatment, because

      > it contains information about how to handle the rest of the file.

      > Of course it is not difficult to test if you are reading the first line

      > or another one, but it hurts my feelings to do a test which by definition

      > succeeds at the first record, and never afterwards.

      option 1, the one I would use:

      thefile = open('somehugefile.txt')
      first_line = thefile.next()
      deal_with_first(first_line)
      for line in thefile:
          deal_with_other(line)

      this requires Python 2.3 or better, so that thefile IS-AN iterator; in

      2.2, get an iterator with foo=iter(thefile) and use .next and for on

      that (better still, upgrade!).

      option 2, not unreasonable (not repeating the open & calls...):

      first_line = thefile.readline()

      for line in thefile: ...

      option 3, a bit cutesy:

      for first_line in thefile: break

      for line in thefile: ...

      (again, in 2.2 you'll need some foo=iter(thefile)).

      I'm sure there are others, but 3 is at least 2 too many already,

      so...;-)
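      Option 1 carries over to Python 3, where the file method next() became the built-in function next() (available since Python 2.6). A sketch assuming a made-up header format, a first line such as "sep=;" that names the field separator for the rest of the file; parse is a hypothetical name.

```python
def parse(lines):
    """Parse lines whose first line (e.g. "sep=;") names the field separator.

    The header format is a made-up example of a first line that says how
    to handle the rest of the file.
    """
    it = iter(lines)  # a file object is already its own iterator
    header = next(it)  # consume line 1 exactly once; no per-line test
    sep = header.rstrip("\n").split("=", 1)[1]
    # The comprehension resumes where next() stopped: line 2 onwards.
    return [line.rstrip("\n").split(sep) for line in it]
```

      An empty input makes next() raise StopIteration here, which is exactly the concern raised in later replies.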

      Alex

      #8; Wed, 26 Dec 2007 23:58:00 GMT
    • Gerrit wrote:

      >>>> first = lines.next()

      [as opposed to 'for first in lines: break']

      > That would hurt fewer feelings, I presume.

      >>> iter("").next()

      Traceback (most recent call last):

      File "<stdin>", line 1, in ?

      StopIteration

      I feel a little uneasy with that ...unless I'm sure I want to deal with the

      StopIteration elsewhere.

      Looking at it from another angle, the initial for-loop is just a peculiar way to deal with an empty iterable. So the best (i.e. clear, robust and general) approach is probably

      items = iter(...)
      try:
          first = items.next()
      except StopIteration:
          # deal with empty iterator, e. g.:
          raise ValueError("need at least one item")
      else:
          # process remaining data

      part of which is indeed your suggestion.

      Peter

      #9; Wed, 26 Dec 2007 23:59:00 GMT
    • > >>> lines = iter("abc")

      > >>> first = lines.next()

      > >>> print first

      > a

      > >>> for line in lines:

      > ... print line

      > ...

      > b

      > c

      > That would hurt fewer feelings, I presume.

      Unless it was empty, then you'd get the dreaded StopIteration!

      IMO, unconditionally breaking out of a for loop is the nicer way of

      handling things in this case, no exceptions to catch.

      - Josiah

      #10; Thu, 27 Dec 2007 00:00:00 GMT
    • In article <mailman.4563.1097227579.5135.python-list@python.org>,

      Egbert Bouwman <egbert.list@hccnet.nl> wrote:

      >A file is too large to fit into memory.

      >The first line must receive a special treatment, because

      >it contains information about how to handle the rest of the file.

      >Of course it is not difficult to test if you are reading the first line

      >or another one, but it hurts my feelings to do a test which by definition

      >succeeds at the first record, and never afterwards.

      >Any suggestions?

      f = file("lines.txt", "rt")

      first_line_processing (f.readline())

      for line in f:

      line_processing (line)

      ought to work.

      Regards. Mel.

      #11; Thu, 27 Dec 2007 00:01:00 GMT
    • Peter Otten <__peter__@web.de> wrote:

      ...

      > Looking at it from another angle, the initial for-loop is just a peculiar

      > way to deal with an empty iterable. So the best (i. e. clear, robust and

      > general) approach is probably

      > items = iter(...)
      > try:
      >     first = items.next()
      > except StopIteration:
      >     # deal with empty iterator, e. g.:
      >     raise ValueError("need at least one item")
      > else:
      >     # process remaining data

      I think it can't be optimal, as coded, because it's more nested than it

      needs to be (and "flat is better than nested"): since the exception

      handler doesn't fall through, I would omit the try statement's else

      clause and outdent the "process remaining data" part. The else clause

      would be needed if the except clause could fall through, though.
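      A sketch of the flatter layout suggested above, in Python 3 syntax (process is a hypothetical name, and returning the remaining items as a list stands in for "process remaining data"): because the except clause always raises, the code after the try statement only runs when a first item exists.

```python
def process(iterable):
    """Handle the first item specially, then the rest, without an else clause."""
    items = iter(iterable)
    try:
        first = next(items)
    except StopIteration:
        # The handler never falls through, so no else clause is needed.
        # "from None" hides the StopIteration context in the traceback.
        raise ValueError("need at least one item") from None
    # Only reached when a first item exists; stands in for
    # "process remaining data".
    return first, list(items)
```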

      Alex

      #12; Thu, 27 Dec 2007 00:02:00 GMT
    • Josiah Carlson <jcarlson <at> uci.edu> writes:

      > IMO, unconditionally breaking out of a for loop is the nicer way of

      > handling things in this case, no exceptions to catch.

      There's still a NameError to catch if you haven't initialized line:

      >>> for line in []:
      ...     break
      ...
      >>> line
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      NameError: name 'line' is not defined

      I don't much like the break out of a for loop, because it feels like a misuse

      of a construct designed for iteration... But take your pick: StopIteration or

      NameError. =)
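      Since Python 2.6 there is a third pick: the built-in next() accepts a default that is returned instead of raising StopIteration, so the name is always bound. A sketch with a hypothetical helper name:

```python
def first_then_rest(iterable, default=None):
    """Return (first item or default, iterator over the remaining items)."""
    items = iter(iterable)
    # next() with a second argument returns it instead of raising
    # StopIteration, so an empty iterable needs no try/except and
    # first is always bound.
    first = next(items, default)
    return first, items
```

      If None can be a genuine first item, passing a private sentinel such as object() as the default keeps the two cases apart.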

      Steve

      #13; Thu, 27 Dec 2007 00:03:00 GMT
    • Steven Bethard wrote:

      > There's still a NameError to catch if you haven't initialized line:

      > >>> for line in []:
      > ...     break
      > ...
      > >>> line
      > Traceback (most recent call last):
      >   File "<stdin>", line 1, in ?
      > NameError: name 'line' is not defined

      No, you would put code specific to the first line into the loop before the

      break.

      > I don't much like the break out of a for loop, because it feels like a

      > misuse

      I can understand that.

      Peter

      #14; Thu, 27 Dec 2007 00:04:00 GMT
    • Alex Martelli wrote:

      > Peter Otten <__peter__@web.de> wrote:

      > ...

      >> Looking at it from another angle, the initial for-loop is just a
      >> peculiar way to deal with an empty iterable. So the best (i. e. clear,
      >> robust and general) approach is probably
      >>
      >> items = iter(...)
      >> try:
      >>     first = items.next()
      >> except StopIteration:
      >>     # deal with empty iterator, e. g.:
      >>     raise ValueError("need at least one item")
      >> else:
      >>     # process remaining data

      > I think it can't be optimal, as coded, because it's more nested than it

      > needs to be (and "flat is better than nested"): since the exception

      > handler doesn't fall through, I would omit the try statement's else

      > clause and outdent the "process remaining data" part. The else clause

      > would be needed if the except clause could fall through, though.

      I relied more on the two letters 'e. g.' than I should have as there are two

      different aspects I wanted to convey:

      1. Don't let the StopIteration propagate:

      items = iter(...)
      try:
          first = items.next()
      except StopIteration:
          raise MeaningfulException("clear indication of what caused the error")

      2. General structure when handling the first item specially:

      items = iter(...)
      try:
          first = items.next()
      except StopIteration:
          # handle error
      else:
          # a. code relying on 'first'
          # b. code independent of 'first' or relying on the error handler
          #    defining a proper default.

      where both (a) and (b) are optional.

      As we have now two variants, I have to drop the claim to generality.

      Regarding the Zen incantation, "flat is better than nested", I tend to measure nesting as max(indent level) rather than avg(); i.e., following my (perhaps odd) notion, the else clause would affect nesting only if it contained an additional if, for, etc. Therefore I have no qualms about sometimes using else where it doesn't affect control flow:

      def whosAfraidOf(color):
          if color == red:
              return peopleAfraidOfRed
          else:
              # if it ain't red it must be yellow - nobody's afraid of blue
              return peopleAfraidOfYellow

      as opposed to

      def whosAfraidOf(color):
          if color == red:
              return peopleAfraidOfRed
          return peopleAfraidOfAnyOtherColor

      That said, usually my programs have bigger problems than the above subtlety.

      Peter

      #15; Thu, 27 Dec 2007 00:05:00 GMT
    • On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:

      > option 3, a bit cutesy:

      > for first_line in thefile: break

      > for line in thefile: ...

      > (again, in 2.2 you'll need some foo=iter(thefile)).

      This technique depends on the file being positioned at line 2 after the break.

      However, in the Nutshell book, page 191, you write:

      > Interrupting such a loop prematurely (e.g. with break)

      > leaves the file's current position with an arbitrary value.

      So the information about the current position is useless.

      Do I discover a contradiction ?

      egbert


      #16; Thu, 27 Dec 2007 00:06:00 GMT
    • Egbert Bouwman <egbert.list.python.todaysummary.com.hccnet.nl> wrote:

      > On Fri, Oct 08, 2004 at 11:59:32AM +0200, Alex Martelli wrote:

      > > option 3, a bit cutesy:

      > > for first_line in thefile: break

      > > for line in thefile: ...

      > > (again, in 2.2 you'll need some foo=iter(thefile)).

      > This technique depends on the file being positioned at line 2,

      > after the break.

      Not exactly, if by "being positioned" you mean what's normally meant for

      file objects (what will thefile.tell() respond, what next five bytes

      will thefile.read(5) read, and so on). All it depends on is the

      _iterator_ on the file being "positioned" in the sense in which

      iterators are positioned (what item will come if you call next on the

      iterator).

      In 2.3 a file is-an iterator; in 2.2 you need to explicitly get an

      iterator as indicated in the parenthesis you've also quoted.

      > However, in the Nutshell book, page 191, you write:

      > > Interrupting such a loop prematurely (e.g. with break)

      > > leaves the file's current position with an arbitrary value.

      > So the information about the current position is useless.

      > Do I discover a contradiction ?

      Nope -- the file's current position is (e.g.) what tell will respond if

      you call it, and that IS arbitrary. In 2.2 (which is what the Nutshell

      covers) you need to explicitly get an iterator to do anything else; in

      2.3 you can rely on the fact that a file is its own iterator to make

      your code simpler. But the iteration state is not connected with the

      file's current position.
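      A small Python 3 sketch of that iteration state, with io.StringIO standing in for a real file: the first loop takes one line from the file's iterator, and the second loop over the same object continues with line 2, without consulting tell().

```python
import io

# Stand-in for an open text file.
f = io.StringIO("header\nline 2\nline 3\n")

for first_line in f:  # take one line from the file's iterator...
    break

# ...and a second loop resumes from the same iterator state.
rest = list(f)
```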

      Alex

      #17; Thu, 27 Dec 2007 00:07:00 GMT
    • On Sun, Oct 10, 2004 at 12:41:37AM +0200, Alex Martelli wrote:

      > ...

      > But the iteration state is not connected with the

      > file's current position.

      That is very useful information. Thanks.

      egbert


      #18; Thu, 27 Dec 2007 00:08:00 GMT
    • Steven Bethard <steven.bethard@gmail.com> wrote:

      ...

      > I don't much like the break out of a for loop, because it feels like a misuse

      > of a construct designed for iteration... But take your pick: StopIteration or

      > NameError. =)

      Jacopini and Bohm have much to answer for...;-)

      Alex

      #19; Thu, 27 Dec 2007 00:10:00 GMT
    • Egbert Bouwman <egbert.list@hccnet.nl> wrote in message news:<mailman.4563.1097227579.5135.python-list@python.org>...

      > Of course it is not difficult to test if you are reading the first line

      > or another one, but it hurts my feelings to do a test which by definition

      > succeeds at the first record, and never afterwards.

      > Any suggestions?

      An alternative approach (which I'm sure will offend just as many

      sensibilities) is to use a function that replaces itself.

      -- Pseudo-code --
      def process_firstline(line):
          # ...do something with the first line here...
          global processline
          processline = process_otherlines

      def process_otherlines(line):
          # ...do something with each later line here...
          pass

      processline = process_firstline
      for line in thefile:
          result = processline(line)
      ---------

      If you read more than one file you'll need to reset processline at the

      beginning of each file.

      I never said this was a *good* way, just one way. :-)
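      A variation on the same idea that avoids the global and the manual reset, sketched in Python 3 with hypothetical names: each call to make_line_processor builds a fresh closure whose handler rebinds itself after the first line, so creating one processor per file starts it in the "first line" state automatically.

```python
def make_line_processor(handle_first, handle_other):
    """Return a callable that passes its first line to handle_first and
    every later line to handle_other."""

    def first(line):
        nonlocal handler
        handler = handle_other  # swap in the regular handler after line 1
        return handle_first(line)

    handler = first

    def process(line):
        return handler(line)

    return process
```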

      --Phil.

      #20; Thu, 27 Dec 2007 00:10:00 GMT