Windows, subprocess.Popen & encoding

Published: Sat, 26 Apr 2008 22:51:00 GMT

Hi!

For a long time I have had problems, on Windows, with the strings
returned by subprocess.Popen / stdout.read().

Last night I found, by chance, that testing whether the second byte
equals 0 is perhaps the solution.

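With code like this:

    import subprocess

    # u850 is the poster's own helper; its definition is not shown.
    p=subprocess.Popen(u850("cmd /u/c ...
    tdata=p.stdout.read()
    # A zero second byte suggests UTF-16 output (cmd /u); else assume CP850.
    if ord(tdata[1])==0:
        data=tdata.decode('utf-16')
    else:
        data=tdata.decode('cp850')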
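The same second-byte test can be sketched for Python 3, where indexing a bytes object already yields an integer. This is only an illustration of the heuristic described above, not a general solution: the function name `decode_cmd_output` and the `oem_codepage` parameter are hypothetical, and little-endian UTF-16 without a byte order mark is an assumption about what `cmd /u` writes on Windows.

```python
def decode_cmd_output(raw, oem_codepage="cp850"):
    """Decode bytes read from a pipe using the second-byte heuristic.

    Assumption: the output is either BOM-less little-endian UTF-16
    or text in the given OEM code page.
    """
    # In Python 3, raw[1] is already an int, so ord() is not needed.
    if len(raw) >= 2 and raw[1] == 0:
        return raw.decode("utf-16-le")
    # Fall back to the OEM code page; undecodable bytes become U+FFFD.
    return raw.decode(oem_codepage, errors="replace")


utf16_text = decode_cmd_output("dir é".encode("utf-16-le"))
oem_text = decode_cmd_output("dir é".encode("cp850"))
```

The check only holds when the first character is below U+0100, because the second byte of little-endian UTF-16 is the high byte of the first code unit. Output that starts with any other character would be misread as code-page text.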

Different scripts seem to run OK. I have tried it with:

- a plain dir
- dir on files with Unicode names
- ping
- various other commands

But I have not found anything about this in any documentation.

Can somebody confirm? Am I misled? Am I right?

*Sorry for my bad English.*

Salutations,

Michel Claveau

All Comments

  • 2 Comments
    • > But I have not found anything about this in any documentation.
      > Can somebody confirm? Am I misled? Am I right?

      You are right, and you are misled. The encoding of the data
      that you get from Popen's stdout.read() is not under Python's
      control: not only do you not know it, Python does not know it
      either. The operating system simply has no mechanism for
      indicating which encoding is used on a pipe.

      So different processes may choose different encodings. Some
      may produce UTF-16, others CP-850, yet others UTF-8, and so
      on. There really is no way to tell other than reading the
      documentation *of the program you run* and, failing that,
      reading its source code.
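As a rough illustration of that point, the sketch below starts a child process that writes CP850 bytes and reads them back twice: once as raw bytes, and once with the encoding stated explicitly. The child program in `CHILD` is hypothetical and stands in for any program whose documentation names its output encoding. The `encoding` argument of `subprocess.run` is available in Python 3.6 and later.

```python
import subprocess
import sys

# Hypothetical child: writes "café" to its stdout encoded as CP850.
CHILD = "import sys; sys.stdout.buffer.write('caf\\xe9'.encode('cp850'))"
cmd = [sys.executable, "-c", CHILD]

# Without an encoding, the pipe delivers plain bytes and no label.
raw = subprocess.run(cmd, stdout=subprocess.PIPE).stdout

# With the documented encoding passed in, Python does the decoding.
text = subprocess.run(cmd, stdout=subprocess.PIPE, encoding="cp850").stdout
```

Nothing in `raw` says which encoding produced it. The second call works only because the caller already knew the answer.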

      On Windows, many programs will indeed use one of the two
      system code pages, or UTF-16. It's true that UTF-16 can be
      quite reliably detected by looking at the first two bytes.
      However, the two system code pages (OEM CP and ANSI CP) are
      not so easy to tell apart.
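Both halves of that remark can be sketched in a few lines. The helper `looks_like_utf16` is a hypothetical name, and the check is a heuristic over the first two bytes only. CP850 and CP1252 are used here as example OEM and ANSI code pages; the actual pair depends on the Windows locale.

```python
import codecs


def looks_like_utf16(raw):
    """Heuristic: inspect only the first two bytes of the data."""
    if raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return True  # explicit byte order mark
    # BOM-less little-endian text starting with a character below U+0100
    return len(raw) >= 2 and raw[0] != 0 and raw[1] == 0


utf16_sample = "Volume".encode("utf-16-le")
oem_sample = "é".encode("cp850")  # the single byte 0x82

# The same byte is valid in both code pages but means different things.
as_oem = oem_sample.decode("cp850")    # e with acute accent
as_ansi = oem_sample.decode("cp1252")  # U+201A, a low quotation mark
```

Both decodes succeed without error, so the bytes alone give no hint about which code page was intended.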

      Regards,

      Martin

      #1; Sat, 26 Apr 2008 22:52:00 GMT
    • Thank you.

      Salutations,

      Michel Claveau

      #2; Sat, 26 Apr 2008 22:53:00 GMT