Split with python

8 Comments; published: Thu, 08 May 2008 01:10:00 GMT

Hello,

I have a csv file which has a field that contains something like:

text.csv

"text (xxx)"

"text (text) (yyy)"

"text (text) (text) (zzz)"

I would like to split the last '(text)' out and put it in a new column,

so that I get:

new_test.csv

"text","(xxx)"

"text (text)","(yyy)"

"text (text) (text)","(zzz)"

How can this be done?

Thanks

Norman

  • 8 Comments
    • Norman Khine wrote:

      > Hello,

      > I have a csv file which is has a field that has something like:

      > text.csv

      > "text (xxx)"

      > "text (text) (yyy)"

      > "text (text) (text) (zzz)"

      > I would like to split the last '(text)' out and put it in a new column,

      > so that I get:

      > new_test.csv

      > "text","(xxx)"

      > "text (text)","(yyy)"

      > "text (text) (text)","(zzz)"

      > how can this be done?

      line.rsplit(None, 1)

      seems to do the trick for me:

      >>> tests = [
      ...     ("text (xxx)", ("text", "(xxx)")),
      ...     ("text (text) (yyy)", ("text (text)", "(yyy)")),
      ...     ("text (text) (text) (zzz)", ("text (text) (text)", "(zzz)")),
      ... ]
      >>> for test, result in tests:
      ...     r = test.rsplit(None, 1)
      ...     if r[0] != result[0] or r[1] != result[1]:
      ...         print(test, result)
      ...

      The loop prints nothing, which shows that the results of rsplit() match the expected results.

      -tkc

      #1; Thu, 08 May 2008 01:12:00 GMT
    • Tim Chase wrote:

      > Norman Khine wrote:

      > line.rsplit(None, 1)

      > seems to do the trick for me:

      >

      provided the (xxx) etc. doesn't contain whitespace.

      #2; Thu, 08 May 2008 01:13:00 GMT
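To make that caveat concrete, here is a small sketch in Python 3 (which the thread predates). The helper name `split_last_group` is hypothetical and does not come from the thread. It splits at the last opening parenthesis rather than at the last whitespace, on the assumption that the trailing group contains no nested parentheses.

```python
def split_last_group(field):
    """Split 'text (a) (b c)' into ('text (a)', '(b c)').

    Hypothetical helper: assumes the last parenthesized group has no
    nested parentheses inside it.
    """
    head, sep, tail = field.rpartition("(")
    if not sep:
        # No parenthesis at all: keep the field whole, second part empty.
        return field, ""
    return head.rstrip(), sep + tail


# rsplit(None, 1) splits at the last whitespace, even inside the parentheses.
print("text (text) (yy y)".rsplit(None, 1))

# Splitting at the last "(" keeps the whole group together.
print(split_last_group("text (text) (yy y)"))
```

Which of the two is appropriate depends on the real data: `rsplit` is enough when the last group never contains spaces, and the parenthesis-based split only helps if the groups are never nested.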
    • Tim Chase wrote:

      > Norman Khine wrote:

      > line.rsplit(None, 1)

      > seems to do the trick for me:

      >

      > >>> tests = [
      > ...     ("text (xxx)", ("text", "(xxx)")),
      > ...     ("text (text) (yyy)", ("text (text)", "(yyy)")),
      > ...     ("text (text) (text) (zzz)", ("text (text) (text)", "(zzz)")),
      > ... ]
      > >>> for test, result in tests:
      > ...     r = test.rsplit(None, 1)
      > ...     if r[0] != result[0] or r[1] != result[1]:
      > ...         print(test, result)
      > ...

      > shows that the results of rsplit() match the expected results.

      > -tkc

      Of course, fixing the csv file takes a little more work. It sounds like the

      test lines given were just one of the fields, and there are

      the quotes to worry about.

      csvfile:

      "field1","text (xxx)","field3"

      "field1","text (text) (yyy)","field3"

      "field1","text (text) (text) (zzz)","field3"

      ........................

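      import sys

      def fix(x):
          # Split field number x of every line at its last run of whitespace.
          with open('csvfile') as f:
              for line in f:
                  fields = line.split(',')
                  first, last = fields[x].rsplit(None, 1)
                  # The split lands between the quotes, so close the first
                  # half and reopen the second.
                  fields[x] = first + '"'
                  fields.insert(x + 1, '"' + last)
                  sys.stdout.write(','.join(fields))

      fix(1)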

      ........................

      "field1","text","(xxx)","field3"

      "field1","text (text)","(yyy)","field3"

      "field1","text (text) (text)","(zzz)","field3"

      But then this fails if there are commas in the

      data. I could split and join on '","' but then

      that fails when 'x' is either the first or last field.

      Are there tools in the csv module that make this

      easier?

      Tobiah

      #3; Thu, 08 May 2008 01:14:00 GMT
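The comma problem described in the post above is easy to reproduce. The sample row below is made up for illustration and does not come from the thread; the sketch only contrasts a plain `split(',')` with `csv.reader` from the standard library.

```python
import csv
import io

# A made-up row whose second field contains a comma inside the quotes.
row = '"field1","text, more (xxx)","field3"\n'

# Naive splitting cuts the quoted field in two.
naive = row.rstrip("\n").split(",")
print(len(naive), naive)

# csv.reader honours the quotes and returns three fields, quotes removed.
parsed = next(csv.reader(io.StringIO(row)))
print(len(parsed), parsed)
```

The naive split yields four pieces for a three-field row, while the reader returns the three fields with the embedded comma intact.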
    • tobiah wrote:

      > Of course, fixing the csv file takes a little more work. It sounds like the

      > test lines given were just one of the fields, and there are

      > the quotes to worry about.

      >

      [snip]

      > But then this fails if there are commas in the

      > data. I could split and join on '","' but then

      > that fails when 'x' is either the first or last field.

      > Are there tools in the csv module that make this

      > easier?

      There are no "tools". The main (whole?) purpose of the csv module is to

      intelligently handle the embedded comma and embedded quote problems on

      both input and output. May I suggest that you read the documentation?

      #4; Thu, 08 May 2008 01:15:00 GMT
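As a rough sketch of what the csv module handles here, the earlier `fix(x)` idea can be rewritten on top of `csv.reader` and `csv.writer`. The function name `split_last_column`, the in-memory sample rows, and the use of `csv.QUOTE_ALL` to reproduce the fully quoted output shown earlier in the thread are all assumptions made for illustration, not anything the posters wrote.

```python
import csv
import io


def split_last_column(infile, outfile, x):
    """Copy CSV rows from infile to outfile, splitting field x in two.

    Field x is split at its last run of whitespace, as with
    rsplit(None, 1). If it has no whitespace, the second part is left
    empty (an arbitrary choice for this sketch).
    """
    reader = csv.reader(infile)
    # QUOTE_ALL quotes every field on output, like the thread's samples.
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for fields in reader:
        if len(fields) <= x:
            # Blank or short row: pass it through unchanged.
            writer.writerow(fields)
            continue
        parts = fields[x].rsplit(None, 1)
        if len(parts) < 2:
            parts = [fields[x], ""]
        writer.writerow(fields[:x] + parts + fields[x + 1:])


# Made-up sample data; the second row has a comma inside a quoted field.
source = io.StringIO(
    '"field1","text (xxx)","field3"\n'
    '"a, b","text (text) (yyy)","field3"\n'
)
result = io.StringIO()
split_last_column(source, result, 1)
print(result.getvalue(), end="")
```

With real files, the csv documentation recommends opening them with `newline=""` before handing them to the reader or writer. Because the reader strips the quotes and the writer adds them back, the first-field and last-field special cases from the split-and-join approach do not arise.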
    • Norman Khine:

      > I have a csv file which is has a field that has something like:

      > "text (xxx)"

      > "text (text) (yyy)"

      > "text (text) (text) (zzz)"

      > I would like to split the last '(text)' out and put it in a new column,

      > so that I get:

      > "text","(xxx)"

      > "text (text)","(yyy)"

      > "text (text) (text)","(zzz)"

      Maybe something like this can be useful, after a few improvements (RE

      formatting is a work in progress):

      from io import StringIO
      import re

      datain = StringIO("""
      "text (xxx)"
      "text (text) (yy y) "
      "text (text) (text) ( zzz ) "
      """)

      # Match the last parenthesized group together with the closing quote.
      lastone = re.compile(r"""
          \s* ( \(
                  [^()"]*
                \)
                \s* "
              )
          \s* $
          """, re.VERBOSE)

      def repl(mobj):
          # Close the first field, then reopen the quotes before the group.
          txt_found = mobj.group(1)
          return '", "' + txt_found

      for line in datain:
          line2 = line.strip()
          if line2:
              print(lastone.sub(repl, line2))

      """
      The output is:

      "text", "(xxx)"
      "text (text)", "(yy y) "
      "text (text) (text)", "( zzz ) "
      """

      Bye,

      bearophile

      #5; Thu, 08 May 2008 01:16:00 GMT
    • At Tuesday 29/8/2006 20:31, tobiah wrote:

      >But then this fails if there are commas in the

      >data. I could split and join on '","' but then

      >that fails when 'x' is either the first or last field.

      >Are there tools in the csv module that make this

      >easier?

      Yes, just use the csv module and forget all those splits and joins...

      Gabriel Genellina

      Softlab SRL

      #6; Thu, 08 May 2008 01:17:00 GMT
    • John Machin wrote:

      > tobiah wrote:

      >

      > [snip]

      > There are no "tools". The main (whole?) purpose of the csv module is to

      > intelligently handle the embedded comma and embedded quote problems on

      > both input and output. May I suggest that you read the documentation?

      >

      If you read the entire thread, you may find that I am

      not deserving of a condescending reply.

      #7; Thu, 08 May 2008 01:18:00 GMT
    • tobiah wrote:

      > John Machin wrote:

      > If you read the entire thread, you may find that I am

      > not deserving of a condescending reply.

      >

      I did read the whole thread. I have read it again. I find that given a

      question of the form "are there tools in the X module ...", a response

      suggesting that you read the documentation and failing re.search("F",

      response, re.I) is not unreasonable.

      "condescending" is your inference, not my implication.

      Cheers,

      John

      #8; Thu, 08 May 2008 01:19:00 GMT