
# Why can't I xor strings?


84,921 words with 50 Comments; published: Wed, 26 Dec 2007 23:50:00 GMT

I wrote a function to compare whether two strings are "similar" because
I'm using Python to make a small text adventure engine and I want it
to be responsive to slight misspellings, like "inevtory" and the like. To
save time, the first thing my function does is check whether I'm comparing an
empty vs. a non-empty string, because they are never to be considered
similar. Right now I have to write out the check like this:

    if str1 or str2:
        if not str1 or not str2:
            return 0

Because Python won't let me do str1 ^ str2. Why does this only work for
numbers? Just treat empty strings as 0 and nonempty as 1.

Regardless of whether this is the best implementation for detecting if
two strings are similar, I don't see why xor for strings shouldn't be
supported. Am I missing something? In particular, I think it'd be cool to
have "xor" as opposed to "^". The caret would return the resulting
value, while "xor" would act like and/or do and return the one that was
true (if any).
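For readers who want to try the misspelling idea itself, here is a minimal sketch using the standard library's `difflib`. It is not the poster's function: the command list, the name `match_command`, and the 0.6 cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical command list for a text adventure; not from the original post.
KNOWN_COMMANDS = ["inventory", "look", "north", "south", "take", "drop"]


def match_command(typed, commands=KNOWN_COMMANDS, cutoff=0.6):
    """Return the closest known command, or None if nothing is similar enough."""
    if not typed:
        # An empty input is never considered similar to anything.
        return None
    matches = difflib.get_close_matches(typed, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("inevtory"))  # closest known command
print(match_command(""))          # None
```

The empty-string check comes first for the same reason given in the post: it is cheap and avoids running the similarity comparison at all.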

*http://python.todaysummary.com/q_python_98618.html*

- dataangel <k04jg02.python.todaysummary.com.kzoo.edu> writes:
> To save time the first thing my function does is check if
> I'm comparing an empty vs. a non-empty string, because they are to be
> never considered similar.
> Right now I have to write out the check like this:
>
>     if str1 or str2:
>         if not str1 or not str2:
>             return 0

How about:

    def nonempty(s):
        return len(s) > 0

and then

    if nonempty(str1) != nonempty(str2):
        return 0

Of course there are other ways to write the same thing. Python 2.3 has
a "bool" function which, when called on a string, returns true iff the
string is nonempty.

#1; Wed, 26 Dec 2007 23:51:00 GMT

- Hi DataAngel,
Welcome to the Python community!

From reading your post, it sounds like you are wanting to use XOR
encryption because there are other ways to quickly and easily compare
strings to see if they are equal.

To compare a string, use the following:

    name = "Steven"
    name2 = "Steven"
    print name == name2

The result will be "True." However, if my initial guess is correct that
you are wanting to use XOR for strings, it is probably because you are
wanting a quick way of encrypting data so that basic users won't be able
to view the information. If this is what you are wanting, I might have
something for you...

Let me know,

Byron

--

dataangel wrote:

> I wrote a function to compare whether two strings are "similar" because
> I'm using Python to make a small text adventure engine and I want it
> to be responsive to slight misspellings, like "inevtory" and the like. To
> save time, the first thing my function does is check whether I'm comparing an
> empty vs. a non-empty string, because they are never to be considered
> similar. Right now I have to write out the check like this:
>
>     if str1 or str2:
>         if not str1 or not str2:
>             return 0
>
> Because Python won't let me do str1 ^ str2. Why does this only work for
> numbers? Just treat empty strings as 0 and nonempty as 1.
>
> Regardless of whether this is the best implementation for detecting if
> two strings are similar, I don't see why xor for strings shouldn't be
> supported. Am I missing something? In particular, I think it'd be cool to
> have "xor" as opposed to "^". The caret would return the resulting
> value, while "xor" would act like and/or do and return the one that was
> true (if any).

#2; Wed, 26 Dec 2007 23:52:00 GMT

- Hi DataAngel,
Welcome to the Python community!

From reading your post, it sounds like you are wanting to use XOR for

"encryption." The reason why I say this is because there are many other

ways of comparing the content of two strings to see if they are exactly

the same. An example of how to do such a thing is listed below:

    # How to compare two strings to see if they are exact matches.
    name = "Steven"
    name2 = "Steven"
    print name == name2

The result will be "True."

However, the only reason that one might want to use XOR for a string is

if he or she wanted to use XOR based encryption in order to keep data

semi-private.

Let me know,

Byron

--

dataangel wrote:

> I wrote a function to compare whether two strings are "similar" because
> I'm using Python to make a small text adventure engine and I want it
> to be responsive to slight misspellings, like "inevtory" and the like. To
> save time, the first thing my function does is check whether I'm comparing an
> empty vs. a non-empty string, because they are never to be considered
> similar. Right now I have to write out the check like this:
>
>     if str1 or str2:
>         if not str1 or not str2:
>             return 0
>
> Because Python won't let me do str1 ^ str2. Why does this only work for
> numbers? Just treat empty strings as 0 and nonempty as 1.
>
> Regardless of whether this is the best implementation for detecting if
> two strings are similar, I don't see why xor for strings shouldn't be
> supported. Am I missing something? In particular, I think it'd be cool to
> have "xor" as opposed to "^". The caret would return the resulting
> value, while "xor" would act like and/or do and return the one that was
> true (if any).

#3; Wed, 26 Dec 2007 23:53:00 GMT

- On 2004-10-08, Byron <DesertLinux.python.todaysummary.com.netscape.net> wrote:
> Hi DataAngel,

> Welcome to the Python community!

> From reading your post, it sounds like you are wanting to use XOR

> encryption

No, he wants to do an exclusive or of the boolean values of the

strings.

> The result will be "True." However, if my initial guess is correct that
> you are wanting to use XOR for strings, it is probably because you are
> wanting a quick way of encrypting data so that basic users won't be able
> to view the information.

No, he explained exactly what he was trying to do, and it had

nothing to do with encryption. He wants to know if exactly one

(1) of the strings is the empty string.

--

Grant Edwards (grante at visi.com)
Yow! I hope something GOOD came in the mail today so I have a REASON to live!!

#4; Wed, 26 Dec 2007 23:54:00 GMT

- Grant Edwards wrote:
> No, he explained exactly what he was trying to do, and it had

> nothing to do with encryption. He wants to know if exactly one

> (1) of the strings is the empty string.

Hi Grant,

Oops, that's what happens when one skims through a post at light speed...

lol :-)

Byron

--

#5; Wed, 26 Dec 2007 23:55:00 GMT

- Grant Edwards wrote:
> No, he explained exactly what he was trying to do, and it had

> nothing to do with encryption. He wants to know if exactly one

> (1) of the strings is the empty string.

Hi Grant,

Oops, that's what happens when one skims through a post at light speed...

lol :-)

Byron

--

#6; Wed, 26 Dec 2007 23:57:00 GMT

- Grant Edwards wrote:
> No, he explained exactly what he was trying to do, and it had

> nothing to do with encryption. He wants to know if exactly one

> (1) of the strings is the empty string.

Hi Grant,

Oops, that's what happens when one skims through a post at light

speed... lol :-)

Byron

--

#7; Wed, 26 Dec 2007 23:58:00 GMT

- Grant Edwards:
> No, he explained exactly what he was trying to do, and it had

> nothing to do with encryption. He wants to know if exactly one

> (1) of the strings is the empty string.

BTW, another way to get that is

    if bool(str1) + bool(str2) == 1:
        print "one and only one of them was empty"

Andrew

dalke.python.todaysummary.com.dalkescientific.com

#8; Wed, 26 Dec 2007 23:58:00 GMT

- Grant Edwards <grante <at> visi.com> writes:
> No, he wants to do an exclusive or of the boolean values of the

> strings.

I'm guessing the OP has already guessed this solution from the variety already

provided, but the direct translation of this statement would be:

    bool(x) ^ bool(y)

for example:

    >>> bool("") ^ bool("a")
    True
    >>> bool("") ^ bool("")
    False
    >>> bool("a") ^ bool("a")
    False
    >>> bool("a") ^ bool("")
    True

Steve

#9; Thu, 27 Dec 2007 00:00:00 GMT

- On 2004-10-09, Steven Bethard <steven.bethard.python.todaysummary.com.gmail.com> wrote:
> Grant Edwards <grante <at> visi.com> writes:

>> No, he wants to do an exclusive or of the boolean values of the

>> strings.

> I'm guessing the OP has already guessed this solution from the

> variety already provided, but the direct translation of this

> statement would be:

> bool(x) ^ bool(y)

:)

It dawned on me later that perhaps the jump from what I wrote

to the code you wrote wasn't as obvious as I thought.

--

Grant Edwards (grante at visi.com)
Yow! TAILFINS!!... click...

#10; Thu, 27 Dec 2007 00:00:00 GMT

- On Fri, 08 Oct 2004 16:19:22 -0400, dataangel wrote:
> Regardless of whether this is the best implementation for detecting if

> two strings are similar, I don't see why xor for strings shouldn't be

> supported. Am I missing something?

The basic problem is that there is no obvious "xor on string" operation

*in general*, even if you stipulate "bitwise". In particular, what does it

mean to bitwise xor two different length strings?

"In general" is the key, here. You actually don't have strings,

conceptually, you have "tokens" or "commands" or something that happen to

be strings. I recommend creating a class to match this concept by

subclassing str... or even just write it directly without subclassing

string. You can then make __xor__(self, other) for that class do whatever

makes sense with your concept.

http://docs.python.org/ref/numeric-types.html

I can almost guarantee that you will later find other things to put into

that class. It may very well clean up a lot of code.
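A minimal sketch of the subclassing suggestion, written for Python 3. The class name `Command` and the choice to make `^` a logical xor on truth values are assumptions for illustration; the post leaves the meaning of `__xor__` up to whatever fits the concept.

```python
class Command(str):
    """A hypothetical token type; the name is illustrative only."""

    def __xor__(self, other):
        # Logical xor on truth values: True when exactly one side is empty.
        return bool(self) != bool(other)

    # Let a plain str on the left-hand side work as well.
    __rxor__ = __xor__


print(Command("") ^ Command("look"))
print(Command("take") ^ Command("look"))
```

Because the operator lives on a user class, plain `str ^ str` still raises `TypeError`, so the core language behaviour is untouched.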

#11; Thu, 27 Dec 2007 00:02:00 GMT

- On 2004-10-09, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
>> Regardless of whether this is the best implementation for

>> detecting if two strings are similar, I don't see why xor for

>> strings shouldn't be supported. Am I missing something?

> The basic problem is that there is no obvious "xor on string" operation

Sure there is. Strings have a boolean value, and the xor

operation on boolean values is well-defined.

--

Grant Edwards (grante at visi.com)
Yow! Are you mentally here at Pizza Hut??

#12; Thu, 27 Dec 2007 00:03:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> writes:
> On 2004-10-09, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:

> >> Regardless of whether this is the best implementation for

> >> detecting if two strings are similar, I don't see why xor for

> >> strings shouldn't be supported. Am I missing something?

> > The basic problem is that there is no obvious "xor on string" operation

> Sure there is. Strings have a boolean value, and the xor

> operation on boolean values is well-defined.

That's an operation, but I'm not sure that's the obvious one. For my

part, if I saw "string1 ^ string2" I'd probably expect a byte by byte

xor with the result being a new string.

It doesn't feel natural to me to have my strings suddenly interpreted

as a new data type based on the operation at hand. Logical operators

work that way but not numerics (it would be in the same vein as string

+ number interpreting the string as a number - that way lies Perl :-))
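To make the "byte by byte xor" reading concrete, here is a small Python 3 sketch over `bytes`. The function name `xor_bytes` and the decision to reject unequal lengths are assumptions; the thread itself notes that there is no obvious answer for operands of different length.

```python
def xor_bytes(a, b):
    """Byte-by-byte xor of two equal-length byte strings."""
    if len(a) != len(b):
        # No obvious answer for unequal lengths, so refuse rather than guess.
        raise ValueError("operands must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


mixed = xor_bytes(b"hello", b"world")
print(xor_bytes(mixed, b"world"))  # xor-ing twice with the same key restores the input
```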

-- David

#13; Thu, 27 Dec 2007 00:04:00 GMT

- On 2004-10-09, David Bolen <db3l.python.todaysummary.com.fitlinxx.com> wrote:
>>> The basic problem is that there is no obvious "xor on string"

>>> operation

>>

>> Sure there is. Strings have a boolean value, and the xor

>> operation on boolean values is well-defined.

> That's an operation, but I'm not sure that's the obvious one.

> For my part, if I saw "string1 ^ string2" I'd probably expect

> a byte by byte xor with the result being a new string.

Only because Python lacks a logical xor operator, so you're
used to thinking of ^ as a bitwise operator. What if you saw

    string1 xor string2?

Wouldn't you expect it to be equivalent to

    (string1 and (not string2)) or ((not string1) and string2)

> It doesn't feel natural to me to have my strings suddenly

> interpreted as a new data type based on the operation at hand.

> Logical operators work that way but not numerics

I don't know what you mean by that. Nobody seems to have a

problem with "and" "or" and "not" operators using the truth

values of strings. What is there about "xor" that precludes it

from behaving similarly?

> (it would be in the same vein as string + number interpreting

> the string as a number - that way lies Perl :-))

I don't see that at all.

--

Grant Edwards (grante at visi.com)
Yow! Let's go to CHURCH!

#14; Thu, 27 Dec 2007 00:05:00 GMT

- David Bolen wrote:
> Grant Edwards <grante.python.todaysummary.com.visi.com> writes:

>>On 2004-10-09, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:

>>>

>>>The basic problem is that there is no obvious "xor on string" operation

>>

>>Sure there is. Strings have a boolean value, and the xor

>>operation on boolean values is well-defined.

> That's an operation, but I'm not sure that's the obvious one. For my

> part, if I saw "string1 ^ string2" I'd probably expect a byte by byte

> xor with the result being a new string.

... but you'd get a traceback. ;) As pointed out earlier in this

thread, what works is "bool(string1) ^ bool(string2)", which

certainly doesn't violate the law of least astonishment.

Why would anything else be needed?

#15; Thu, 27 Dec 2007 00:06:00 GMT

- >>>>> "Grant" == Grant Edwards <grante.python.todaysummary.com.visi.com> writes:
Grant> I don't know what you mean by that. Nobody seems to have a

Grant> problem with "and" "or" and "not" operators using the truth

Grant> values of strings. What is there about "xor" that

Grant> precludes it from behaving similarly?

It's just that logical xor is an extremely rare beast. I would

probably prefer to see the operation expanded to the more typical

and-or-nots in real code.

--

Ville Vainio http://tinyurl.com/2prnb

#16; Thu, 27 Dec 2007 00:07:00 GMT

- Ville Vainio <ville.python.todaysummary.com.spammers.com> writes:
> It's just that logical xor is an extremely rare beast. I would

> probably prefer to see the operation expanded to the more typical

> and-or-nots in real code.

I think != is appropriate in this situation.

if bool(x) != bool(y): ...

#17; Thu, 27 Dec 2007 00:08:00 GMT

- Ville Vainio wrote:
> Grant> I don't know what you mean by that. Nobody seems to have a

> Grant> problem with "and" "or" and "not" operators using the truth

> Grant> values of strings. What is there about "xor" that

> Grant> precludes it from behaving similarly?

> It's just that logical xor is an extremely rare beast. I would

> probably prefer to see the operation expanded to the more typical

> and-or-nots in real code.

"imp" and "eqv", anyone?

</F>

#18; Thu, 27 Dec 2007 00:09:00 GMT

- Grant Edwards wrote:
> What if you saw
>
>     string1 xor string2?
>
> Wouldn't you expect it to be equivalent to
>
>     (string1 and (not string2)) or ((not string1) and string2)

I would expect it to give a syntax error.

If not, and 'xor' did become a new boolean operator
in Python, I would expect it to act more like

    xor_f(x, y)

where 'xor_f' the function is defined as

    def xor_f(x, y):
        x = bool(x)
        y = bool(y)
        return (x and not y) or (not x and y)

Why the distinction? In your code you call bool on
an object at least once and perhaps twice. The
truth of an object should only be checked once. You
also have asymmetric return values. Consider

    s1     s2     s1 xor s2
    "A"    "B"    False
    "A"    ""     True
    ""     "B"    "B"
    ""     ""     ""

Esthetics suggest that either "A"/"" return "A" or that
""/"B" return True. Mine does the latter. Yours does
neither. Probably the Pythonic way, assuming 'xor'
can be considered Pythonic, is to return the object
which gave the True value, like this

    def xor_f(x, y):
        bx = bool(x)
        by = bool(y)
        if bx:
            if not by:
                return x
            return False
        else:
            if by:
                return y
            return False

In any case, 'xor' the binary operator is rarely

needed and as you've shown can be interpreted in

a couple different ways. Each alone weighs against

it. Both together make it almost certainly a bad

idea for Python.
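The asymmetry is easy to check by running the expansion directly. The sketch below is an illustration in Python 3; the names `grant_xor` and `bool_xor` are made up here, and `bool_xor` is just one of the alternatives suggested elsewhere in the thread.

```python
def grant_xor(s1, s2):
    # The expansion quoted above, applied to the raw objects.
    return (s1 and (not s2)) or ((not s1) and s2)


def bool_xor(x, y):
    # Coerce each operand exactly once, then compare truth values.
    return bool(x) != bool(y)


for s1, s2 in [("A", "B"), ("A", ""), ("", "B"), ("", "")]:
    print(repr(s1), repr(s2), repr(grant_xor(s1, s2)), bool_xor(s1, s2))
```

Note that with both operands empty, the expansion hands back an empty string, which is falsy but is not the literal `False`; `bool_xor` always returns a real boolean.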

Andrew

dalke.python.todaysummary.com.dalkescientific.com

#19; Thu, 27 Dec 2007 00:10:00 GMT

- Paul Rubin wrote:
> I think != is appropriate in this situation.

> if bool(x) != bool(y): ...

while I prefer my already mentioned

    if bool(x) + bool(y) == 1:
        print "One and only one given"

That's because the pure logic ones, both != and
xor, confuse me while arithmetic doesn't.

It's also extensible to, say, "all three
parameters are either empty strings or non-empty
strings".

    if 0 < bool(x) + bool(y) + bool(z) < 3:
        print "that's not allowed"

BTW, I've used an idiom like this with some
success

    def blah(filename = None, infile = None, url = None):
        if (bool(filename is None) + bool(infile is None) +
            bool(url is None)) != 2:
            raise TypeError("Must specify one and only one of "
                            "'filename', 'infile', or 'url'")
        if filename is not None:
            infile = open(filename)
        elif url is not None:
            infile = urllib.urlopen(url)
        ...

Try that with an xor.
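The counting idiom generalises to any number of arguments. A minimal Python 3 sketch, with the helper names `exactly_one` and `all_or_none` chosen here for illustration:

```python
def exactly_one(*values):
    """True when exactly one of the values is not None."""
    return sum(v is not None for v in values) == 1


def all_or_none(*values):
    """True when the values are either all truthy or all falsy."""
    truthy = sum(bool(v) for v in values)
    return truthy == 0 or truthy == len(values)


print(exactly_one("data.txt", None, None))
print(exactly_one("data.txt", None, "http://example.invalid/"))
print(all_or_none("", "", ""))
```

Summing booleans works because `bool` is a subclass of `int`, so `True` counts as 1 and `False` as 0.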

Andrew

dalke.python.todaysummary.com.dalkescientific.com

#20; Thu, 27 Dec 2007 00:11:00 GMT

- On 2004-10-10, Ville Vainio <ville.python.todaysummary.com.spammers.com> wrote:
>>>>>> "Grant" == Grant Edwards <grante.python.todaysummary.com.visi.com> writes:

> Grant> I don't know what you mean by that. Nobody seems to have a

> Grant> problem with "and" "or" and "not" operators using the truth

> Grant> values of strings. What is there about "xor" that

> Grant> precludes it from behaving similarly?

> It's just that logical xor is an extremely rare beast.

Probably so, but that doesn't support the argument that
there's something wrong with a logical xor operator coercing
its operands to boolean values unless one also argues that
the logical "and", "or" and "not" operators should also not
coerce their operands to booleans.

> I would probably prefer to see the operation expanded to the
> more typical and-or-nots in real code.

If logical operators shouldn't coerce operands, then what you
should see is:

    if (bool(string1) and (not bool(string2))) or ((not bool(string1)) and bool(string2)): ...

--

Grant Edwards (grante at visi.com)
Yow! A shapely CATHOLIC SCHOOLGIRL is FIDGETING inside my costume...

#21; Thu, 27 Dec 2007 00:12:00 GMT

- On 2004-10-10, Andrew Dalke <adalke.python.todaysummary.com.mindspring.com> wrote:
> Why the distinction? In your code you call bool on

> an object at least once and perhaps twice. The

> truth of an object should only be checked once. You

> also have asymmetric return values. Consider

>     s1     s2     s1 xor s2
>     "A"    "B"    False
>     "A"    ""     True
>     ""     "B"    "B"
>     ""     ""     ""

Hey, _that_ particular bug isn't my fault. Somebody else

decided that "and" and "or" don't return boolean values like

they should. ;)

> Esthetics suggest that either "A"/"" return "A" or that

> ""/"B" return True. Mine does the latter. Yours does

> neither. Probably the Pythonic way, assuming 'xor'

> can be considered Pythonic, is to return the object

> which gave the True value, like this

>     def xor_f(x, y):
>         bx = bool(x)
>         by = bool(y)
>         if bx:
>             if not by:
>                 return x
>             return False
>         else:
>             if by:
>                 return y
>             return False

That's ugly. But then again, "A" or "B" returning "A" is ugly.

While I agree with your points, they're immaterial to the

argument I was making. The poster to which I responded was

arguing that "xor" didn't make sense because having it coerce
its arguments to booleans was "wrong".

--

Grant Edwards (grante at visi.com)
Yow! Zippy's brain cells are straining to bridge synapses...

#22; Thu, 27 Dec 2007 00:13:00 GMT

- On Sun, 10 Oct 2004 15:30:32 +0000, Grant Edwards wrote:
> While I agree with your points, they're immaterial to the

> argument I was making. The poster to which I responded was

> arguing that "xor" didn't make sense because having it coerce
> its arguments to booleans was "wrong".

I didn't say "wrong", I said "non-obvious".

And, with respect, the fact that you have to argue your point is evidence

in my favor. Of course "proof" would require a comprehensive survey, and I

think we can all agree this point isn't worth such effort :-)

But I do think that having a bitwise operator silently transform into a
logical operator for strings is not obvious.

What is 388488839405842L ^ "hello"?

Python says that's an "unsupported operand type(s) for ^: 'long' and
'str'". Why is it wrong?

This would also work if Python were more like C++ and we could define

    xor(string, string)
    xor(int, int)

and be done with it, but in Python, there should be an obvious meaning for
int ^ string, and there isn't.

It is also true that I recommended the OP consider subclassing string to

make ^ do what he wants. But it seems to be reasonably well expected that

while user classes can do what they like with operators (as long as they

are willing to pay the sometimes-subtle prices, especially the ones

involved with poorly defining __cmp__), that the core language should not

do such things.
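The refusal is easy to reproduce. The sketch below uses Python 3, where the `long` type has been folded into `int`, so the message names `'int'` rather than `'long'`; the wrapper `try_xor` is a made-up helper for illustration.

```python
def try_xor(a, b):
    """Return a ^ b, or the TypeError message if the operands are unsupported."""
    try:
        return a ^ b
    except TypeError as exc:
        return "TypeError: %s" % exc


print(try_xor(3, 7))
print(try_xor(388488839405842, "hello"))
print(try_xor("inevtory", ""))
```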

#23; Thu, 27 Dec 2007 00:14:00 GMT

- On 2004-10-10, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
> But I do think that having a bitwise operator silently
> transform into a logical operator for strings is not obvious.

> What is 388488839405842L ^ "hello"?

No, I wouldn't want that either. What I was talking about was
the possibility of an "xor" logical operator that would behave
in a manner similar to the other logical operators in that it
coerces its parameters to boolean values.

--

Grant Edwards (grante at visi.com)
Yow! I am covered with pure vegetable oil and I am writing a best seller!

#24; Thu, 27 Dec 2007 00:15:00 GMT

- On Sun, 10 Oct 2004 19:25:16 GMT, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
>On Sun, 10 Oct 2004 15:30:32 +0000, Grant Edwards wrote:

>> While I agree with your points, they're immaterial to the

>> argument I was making. The poster to which I responded was

>> arguing that "xor" didn't make sense because having it coerce
>> its arguments to booleans was "wrong".

>I didn't say "wrong", I said "non-obvious".

>And, with respect, the fact that you have to argue your point is evidence

>in my favor. Of course "proof" would require a comprehensive survey, and I

>think we can all agree this point isn't worth such effort :-)

>But I do think that having a bitwise operator silently transform into a
>logical operator for strings is not obvious.

>What is 388488839405842L ^ "hello"?

>Python says that's an "unsupported operand type(s) for ^: 'long' and

>'str'". Why is it wrong?

I agree it is not wrong. The user is reminded that s/he should be explicit

and create a type that behaves as desired. However (;-) ...

What's right about accepting 3^7 ? Why should xor be defined for integers?

IMO we have implicit subtyping of integers as vectors or column matrices

of bools and operations element by element and implicit re-presentation

of the result as integer.

This dual interpretation of course reflects CPU hardware functionality, and

I would argue that familiarity (not to mention the expediency of conciseness)

has bred acceptance of operator notation without explicit cruft such as

int(boolvec(3)^boolvec(7)) instead of 3^7. The trouble is that a boolvec

should have a length, and boolvec(3) really hides implicit determination of length

(by dropping sign bits of unsigned value or extending signed integer sign bits infinitely),

and the ^ operation between boolvecs of different length hides implicit normalization to
the length of the longer (or infinity with compressed representation). Again,

CPU hardware legacy comes into play, with the hidden imposition of length=32, sometimes

64 now, and we are forced (happily IMO) into defining what we mean in terms of

physical-representation-independent abstractions.

So the question becomes

What is boolvec(388488839405842L) ^ boolvec("hello")?

and boolvec("hello") is the more ambiguous one. If we wrote

boolvec("hello", as_bytes=True)

I think most would have an idea of what it meant -- until they

started to think about endianness ;-) I.e., what is the value of the following?

(boolvec("hello", as_bytes=True) & boolvec(0xffff)).as_string() #note '&' for simpler example

should it be

"he\x00\x00\x00"

or do you see it as big-endian

"\x00\x00\x00lo"

and should "sign" bits be dropped, so the result would be "he" or "lo" ?

Or -- should boolvec("hello") default to boolvec(bool("hello"), length=1) ?
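Since `boolvec` is hypothetical, the following Python 3 sketch only approximates the two ideas with real built-ins: xor as an element-by-element operation on a vector of bools, and the byte-order question for strings via `int.from_bytes` and `int.to_bytes`. The helper names are assumptions for illustration.

```python
def to_bits(n, length):
    """Little-endian list of bools for a non-negative int, padded to length."""
    return [bool((n >> i) & 1) for i in range(length)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from its list of bools."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)


def xor_via_bits(a, b):
    # Normalise both operands to the length of the longer one, then xor bit by bit.
    length = max(a.bit_length(), b.bit_length(), 1)
    pairs = zip(to_bits(a, length), to_bits(b, length))
    return from_bits([x != y for x, y in pairs])


def mask_low_16(data, byteorder):
    """Interpret data as an integer in the given byte order and keep the low 16 bits."""
    n = int.from_bytes(data, byteorder) & 0xFFFF
    return n.to_bytes(len(data), byteorder)


print(xor_via_bits(3, 7), 3 ^ 7)
print(mask_low_16(b"hello", "little"))
print(mask_low_16(b"hello", "big"))
```

The two `mask_low_16` calls give different bytes for the same input, which is the ambiguity the post is pointing at: the result depends entirely on which end of the string is treated as least significant.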

>This would also work if Python were more like C++ and we could define

>xor(string, string)

>xor(int, int)

>and be done with it, but in Python, there should be an obvious meaning for

>int ^ string, and there isn't.

Notice that no one complains about intgr^intgr

not being defined as int(bool(intgr)^bool(intgr)) ?;-)

>It is also true that I recommended the OP consider subclassing string to

>make ^ do what he wants. But it seems to be reasonably well expected that

>while user classes can do what they like with operators (as long as they

>are willing to pay the sometimes-subtle prices, especially the ones

>involved with poorly defining __cmp__), that the core language should not

>do such things.

Except where there is an accepted legacy of such things being done already ;-)

BTW, what is the rationale behind this:

>>> ['',(),[],0, 0.0, 0L].count(0)

3

>>> ['',(),[],0, 0.0, 0L].count(())

1

>>> ['',(),[],0, 0.0, 0L].count([])

1

>>> ['',(),[],0, 0.0, 0L].count('')

1

>>> ['',(),[],0, 0.0, 0L].count(0.0)

3

>>> ['',(),[],0, 0.0, 0L].count(0L)

3

BTW2, just had the thought that ',' could be generalized as an operator. E.g.,

obj, x

would mean

type(obj).__dict__['__comma__'](obj, x)

unless __comma__ was undefined. And you could have __rcomma__ for x, obj.

Returning the same type object would support multiple commas as in obj, x, y

Of course, you would get interesting effects in such contexts as print obj,x,y

and foo(obj,x,y) or even (obj,x,y) vs (obj,x,y,) vs ((obj,x,y),) ;-)

Probably sequence-building should have priority and be overridden by parenthesized

expression. So print obj,x would effectively be print str(obj),x and

print (obj,x,y) would be print str(type(obj).__dict__['__comma__'](obj,x)).

Similarly for arg list formation.

Regards,

Bengt Richter

#25; Thu, 27 Dec 2007 00:16:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> writes:
> Only because Python lacks a logical xor operator, so you're

> used to thinking of ^ as a bitwise operator. What if you saw

> string1 xor string2?

> Wouldn't you expect it to be equivalent to

> (string1 and (not string2)) or ((not string1) and string2)

Yes, no problem. I was definitely working with the bitwise operator

which is what I thought the OP was originally desiring.

> > It doesn't feel natural to me to have my strings suddenly

> > interpreted as a new data type based on the operation at hand.

> > Logical operators work that way but not numerics

> I don't know what you mean by that. Nobody seems to have a

> problem with "and" "or" and "not" operators using the truth

> values of strings. What is there about "xor" that precludes it

> from behaving similarly?

I think we're just talking about different items. I was referring to

the bitwise (^) xor operator (numeric), but not to a logical xor

operator. I'd have no problem with a separate logical operator that

behaved like the other logical operators.

The fact that saying "xor" can imply either isn't helping :-)

-- David

#26; Thu, 27 Dec 2007 00:17:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> writes:
> While I agree with your points, they're immaterial to the

> argument I was making. The poster to which I responded was

> arguing that "xor" didn't make sense because having it coerce
> its arguments to booleans was "wrong".

Well, actually I said:

"Logical operators work that way but not numerics ..."

so I'm not arguing against a logical "xor" working that way, but not

the existing xor (^) operation in Python, which is bitwise.

-- David

#27; Thu, 27 Dec 2007 00:18:00 GMT

- On Sun, 10 Oct 2004 22:03:08 +0000, Bengt Richter wrote:
> What's right about accepting 3^7 ? Why should xor be defined for integers?

> IMO we have implicit subtyping of integers as vectors or column matrices

> of bools and operations element by element and implicit re-presentation

> of the result as integer.

I'm intrigued but torn by your arguments.

For positive numbers, a number really is its vector of bits. In the

mathematical sense of "equal" (which I usually express in English as "two

equal things are fully substitutable with each other in all relevant

contexts"), where all base numbers are written in base 10:

10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

The fact that it happens to really be a bit vector in the computer in this

case actually doesn't count for anything, because just as that bit vector

*is*, in every conceivable mathematical way, a base 10 number, it is also

a base 3 number. I can and have defined ternary math before, and

experimental computers have been built on ternary bases and ternary logic,

and 44 in a binary computer is 44 in a ternary computer.

Only our choice of negative numbers, practically speaking, makes a

difference between the number-as-bitstring and number-qua-number, and

something as high level as Python can conceptually use an extra "negative"

bit so even that distinction goes away. In fact, playing with xor'ing

longs makes me wonder if Python doesn't *already* do that.
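Both points can be checked interactively. This Python 3 sketch confirms that the base only changes the spelling of a number, and shows how `^` treats negative integers; the helper `low_byte` is a made-up name used to peek at the low eight bits.

```python
def low_byte(n):
    """Low eight bits of n, viewed as a non-negative integer."""
    return n & 0xFF


# The same value written in bases 2, 4, 9 and 10.
print(int("1010", 2), int("22", 4), int("11", 9), 10)

# Negative integers xor as if they had infinitely many leading one bits.
print(-1 ^ 5)
print(bin(low_byte(-1 ^ 5)))
```

The result of `-1 ^ 5` equals `~5`, which is consistent with integers behaving like two's complement values of unlimited width rather than like a magnitude plus a separate sign bit.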

> What is boolvec(388488839405842L) ^ boolvec("hello")?

I'd rather see this as the fairly meaningless "388488839405842L base 2",

meaningless because the base of a number does not affect its value, and

bitwise-xor, as the name implies, operates on the number base two.

Since I reject the need to cast ^ in terms of boolvec, I don't feel

compelled to try to define "boolvec" for strings. Cast it into a number

explicitly, refusing the temptation to guess.

> Except where there is an accepted legacy of such things being done already ;-)

> BTW, what is the rationale behind this:

> >>> ['',(),[],0, 0.0, 0L].count(0)

> 3

> >>> ['',(),[],0, 0.0, 0L].count(())

> 1

> >>> ['',(),[],0, 0.0, 0L].count([])

> 1

> >>> ['',(),[],0, 0.0, 0L].count('')

> 1

> >>> ['',(),[],0, 0.0, 0L].count(0.0)

> 3

> >>> ['',(),[],0, 0.0, 0L].count(0L)

> 3

Again, considered abstractly as numbers, 0 is zero no matter how you slice

it. Abstractly, even floats have a binary representation, just as they

have a decimal representation, and we could even xor them. Realistically,

it is much less useful and there is the infinite-length problem of floats

to deal with.

Theoretically, no float should ever equal an int or a long, since an Int

or Long conceptually identifies exactly one point and a float should be

seen as representing a range of numbers depending on the precision that

can be made arbitrarily small but never truly identifies one point. This

is another one of those "practicality beats purity" things.

#28; Thu, 27 Dec 2007 00:19:00 GMT

- On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
>On Sun, 10 Oct 2004 22:03:08 +0000, Bengt Richter wrote:

>> What's right about accepting 3^7 ? Why should xor be defined for integers?

>> IMO we have implicit subtyping of integers as vectors or column matrices

>> of bools and operations element by element and implicit re-presentation

>> of the result as integer.

>I'm intrigued but torn by your arguments.

>For positive numbers, a number really is its vector of bits. In the

^^^^^[1] [2]^^ ^^^[3] ^^^^[4]

[1] Abstract, non-negative integers?

[2] "is" as in "is identical with"? I doubt it.

[3] In what sense does an abstract number possess (its) a vector of bits?

I take it 'its' is in the sense of a 1:1 mapping to the corresponding

(unique) vector of "bits".

[4] A two's complement representation of an (abstract) integer, has "bits"

as _numerical_ values in the sum series with corresponding numerical

coefficients that are powers of two. But "bits" in the context of

bitwise ^ or & or | do not represent abstract _numerical_ values of 0 or 1,

they represent logical values False or True (and 0<->False, 1<->True is

just a convention that we're used to).

>mathematical sense of "equal" (which I usually express in English as "two

>equal things are fully substitutable with each other in all relevant

>contexts"), where all base numbers are written in base 10:

>10 base 10 = 11 base 9 = 22 base 4 = 1010 base 2

>The fact that it happens to really be a bit vector in the computer in this

[1]^^ [2]^^^^^^^^^ ^^^[3]

[1] The abstract integer value?

[2] have 1:1 mapping to?

[3] bit is short for binary _digit_, not binary logic-value -- i.e., it's numeric,

and ties in with your base 2 representation.

>case actually doesn't count for anything, because just as that bit vector

>*is*, in every conceivable mathematical way, a base 10 number, it is also

1:1 correspondence is not identity. IOW, there's no such thing as "a base 10 number"

There is a set of abstract integers, and each can be put into 1:1 correspondence with

a set of symbols and rules. A "base 10 number" is composed with a set of digits [0-9]

and a rule for ordered composition and a rule for an interpretation to map back to

the abstract value. We can discuss each part of this process, but the bottom line is

that when we say that represented values are equal, we are saying that representations

map back to the same unique abstract value, not that representations per se are identical.

>a base 3 number. I can and have defined ternary math before, and

>experimental computers have been built on ternary bases and ternary logic,

>and 44 in a binary computer is 44 in a ternary computer.

Sure, but again we are talking unique abstract integers vs various representations

and the rules for composing them and interpreting them. BTW, did you ever hear

of bi-quinary? That was a 4-bit representation of a decimal digit where the msb

had a "coefficient" of 5 and the 3 lsb's counted in binary modulo 5 ;-)

>Only our choice of negative numbers, practically speaking, makes a

>difference between the number-as-bitstring and number-qua-number, and

>something as high level as Python can conceptually use an extra "negative"

>bit so even that distinction goes away. In fact, playing with xor'ing

>longs makes me wonder if Python doesn't *already* do that.

Yes, it plays the as-if-extended-infinitely-to-the-left game with sign bits,

but the hex representation of negative longs is not much good for seeing what

the bits are in a negative number >;-(
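
To make the sign-extension remark concrete, here is a small sketch in Python 3. The helper `twos_complement_bits` and its `width` parameter are hypothetical names introduced for illustration; masking with a fixed width is just one way to look at the low bits of a negative integer.

```python
def twos_complement_bits(n, width=16):
    """Return the low `width` bits of n as a string of 0s and 1s."""
    mask = (1 << width) - 1
    return format(n & mask, "0{}b".format(width))

print(hex(-6))                      # -0x6, a sign plus the magnitude
print(twos_complement_bits(-6))     # 1111111111111010
print(twos_complement_bits(-6, 8))  # 11111010
print(-1 ^ 5)                       # -6
```

The leading run of 1s grows with whatever width is requested, which is the "extended infinitely to the left" behaviour, while `hex()` shows only a minus sign and the magnitude.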

>> What is boolvec(388488839405842L) ^ boolvec("hello")?

>I'd rather see this as the fairly meaningless "388488839405842L base 2",

>meaningless because the base of a number does not affect its value, and

>bitwise-xor, as the name implies, operates on the number base two.

Well, to be nit-picky, I still think it does not really operate on a number

at all. It converts a number to an ordered set of bools, by way of an intermediate

two's complement _representation_ whose _numeric_ bit values are mapped to bool values.

Then, after operating on the ordered sets of bools, the resulting set is mapped

back to a two's complement integer representation again, which can be interpreted as

an abstract integer value again ;-)
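
The three-step view described above (number to ordered bools, element-wise operation, bools back to number) can be sketched as follows. This is an illustration in Python 3 restricted to non-negative integers; `to_boolvec`, `from_boolvec` and `xor_via_bools` are hypothetical helper names, not anything Python provides.

```python
def to_boolvec(n, width):
    """Map a non-negative int to a list of bools, least significant bit first."""
    return [bool((n >> i) & 1) for i in range(width)]

def from_boolvec(bits):
    """Map a list of bools (least significant first) back to an int."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)

def xor_via_bools(a, b):
    """XOR two non-negative ints by going through bool vectors."""
    width = max(a.bit_length(), b.bit_length())
    pairs = zip(to_boolvec(a, width), to_boolvec(b, width))
    # For bools, "exactly one is true" is the same as "they differ".
    return from_boolvec([x != y for x, y in pairs])

print(xor_via_bools(3, 7))  # 4
```

For non-negative operands this agrees with the built-in `^`, so the sketch cannot settle which description is the "real" one; it only shows that the bool-vector reading is consistent.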

>Since I reject the need to cast ^ in terms of boolvec, I don't feel

>compelled to try to define "boolvec" for strings. Cast it into a number

>explicitly, refusing the temptation to guess.

>> Except where there is an accepted legacy of such things being done already ;-)

>>

>> BTW, what is the rationale behind this:

>>

>> >>> ['',(),[],0, 0.0, 0L].count(0)

>> 3

>> >>> ['',(),[],0, 0.0, 0L].count(())

>> 1

>> >>> ['',(),[],0, 0.0, 0L].count([])

>> 1

>> >>> ['',(),[],0, 0.0, 0L].count('')

>> 1

>> >>> ['',(),[],0, 0.0, 0L].count(0.0)

>> 3

>> >>> ['',(),[],0, 0.0, 0L].count(0L)

>> 3

I guess this shows what's happening

>>> class joker(object):

...     def __cmp__(self, other): return 0

...

>>> ['',(),[],0, 0.0, 0L, joker()].count('')

2

>>> ['',(),[],0, 0.0, 0L, joker()].count(())

2

>>> ['',(),[],0, 0.0, 0L, joker()].count(0)

4

>>> ['',(),[],0, 0.0, 0L, joker()].count(0.0)

4

>>> ['',(),[],0, 0.0, 0L, joker()].count(0L)

4

>>> ['',(),[],0, 0.0, 0L, joker()].count(joker())

7
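
The session above suggests that `list.count` simply counts the elements that compare equal to its argument. Here is a rough Python 3 counterpart, offered as a sketch: `__cmp__` and the `0L` literal no longer exist there, so the class below (an illustrative `Joker`) defines `__eq__` instead and the list has one element fewer.

```python
class Joker:
    """Compares equal to everything (a Python 3 spelling of the __cmp__ trick)."""
    def __eq__(self, other):
        return True

items = ['', (), [], 0, 0.0, Joker()]
print(items.count(0))        # 3: 0, 0.0 and the Joker
print(items.count(''))       # 2: '' and the Joker
print(items.count(Joker()))  # 6: every element
```

The empty containers never equal a number or each other, while `0` and `0.0` compare equal across numeric types, which accounts for the counts in the original question.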

>Again, considered abstractly as numbers, 0 is zero no matter how you slice

>it. Abstractly, even floats have a binary representation, just as they

I think of "float" as indicating representation technique, as opposed to "real"

which I think of as a point in the continuous abstract -+infinity interval

of real numbers.

>have a decimal representation, and we could even xor them. Realistically,

>it is much less useful and there is the infinite-length problem of floats

>to deal with.

Infinite-length problem...must...not...go...there... ;-)

>Theoretically, no float should ever equal an int or a long, since an Int

>or Long conceptually identifies exactly one point and a float should be

>seen as representing a range of numbers depending on the precision that

IMO a float should be seen as what it is (a _representation_), which may be

interpreted variously, just like integers ;-)

IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64

possible states of the representation machinery. Not all of

those states are legal floating point number representations, but each one

that _is_ legal corresponds to an _exact_ abstract real value. It _can_ be

the exact value you wanted to represent, or not, depending on your programming

problem. Inexactness is not a property of floating point representations, it is

a property of their _relation_ to the exact abstract values the programmer is trying

to represent. 0.0 represents the exact same abstract value as integer 0

(perhaps by way of mapping integers conceived as separate to a subset of reals).

>can be made arbitrarily small but never truly identifies one point. This

I disagree. 0.0 truly identifies one point. So does

3.141592653589793115997963468544185161590576171875

which is the _exact_ value of math.pi in decimal representation.

But this is _not_ the exact value of the abstract mathematical pi.

But that is not a problem with floating point numbers' not representing

exact values. It's that the set of available exact values is finite, and

you have to choose one to approximate (including right on sometimes) an abstract value.

(Or you might choose two to represent _exact_ bounds on a value, for interval math).

>is another one of those "practicality beats purity" things.

I think the fact that repr(math.pi) is not printed as above is

a case of "practicality beats purity" ;-)

>>> import math

>>> repr(math.pi)

'3.1415926535897931'

>>> float(repr(math.pi))

3.1415926535897931

>>> float(repr(math.pi)) == float(3.141592653589793115997963468544185161590576171875)

True

IMO floats have their own kind of purity, besides being practical ;-)
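
The exact decimal expansion quoted above can be reproduced with the standard library. This is a sketch in Python 3: `decimal.Decimal` and `fractions.Fraction` both convert a float without rounding, so they expose the value the double actually holds.

```python
import math
from decimal import Decimal
from fractions import Fraction

exact_pi_double = Decimal(math.pi)  # exact conversion, no rounding
print(exact_pi_double)
# 3.141592653589793115997963468544185161590576171875

# The same value as an exact ratio of integers.
ratio = Fraction(math.pi)
denominator_is_power_of_two = ratio.denominator & (ratio.denominator - 1) == 0
print(denominator_is_power_of_two)  # True
```

A finite double is always an integer divided by a power of two, which is why its decimal expansion terminates.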

Regards,

Bengt Richter

#29; Thu, 27 Dec 2007 00:20:00 GMT

- On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:
> On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:

>>For positive numbers, a number really is its vector of bits. In the

> ^^^^^[1] [2]^^ ^^^[3] ^^^^[4]

> [1] Abstract, non-negative integers?

Integers is more convenient here, but any one dimensional number will do.

It so happens that computers do not encode "reals" in the naive way, to

allow for more variation in magnitude.

> [2] "is" as in "is identical with"? I doubt it.

With respect, this isn't something to doubt or not doubt. There is one,

and only one, way to represent any positive number in base two, since

encoding sign is not an issue. Assuming an extra bit to show sign, there

is one and only one way to represent any negative number, too. (Zero gets

to be the exception since then you can have positive and negative zero,

which are theoretically identical, but on some machines that implement

them have practical differences.)

Endianness is just some wackiness in the ordering, but does not really

change that. 60000 + 50000 = 110000 no matter what endianness you have.

> [3] In what sense does an abstract number possess (its) a vector of bits?

It does not. It *is* a series of ones and zeros, in base two. It *is* a

series of ones, zeros, and twos, in base three. It *is* a series of [0-9]

in base ten. It is all of these things at once, and an infinite number of

other things besides.

1 + 1 fully *is* 2. There is no difference mathematically. If there is,

your math is broken.

> [4] A two's complement representation of an (abstract) integer, has

> "bits"

> as _numerical_ values in the sum series with corresponding numerical

> coefficients that are powers of two. But "bits" in the context of

> bitwise ^ or & or | do not represent abstract _numerical_ values of

> 0 or 1, they represent logical values False or True (and 0<->False,

> 1<->True is just a convention that we're used to).

No. They represent both of those things at once. 1 *is* True, and 0 *is*

false, in the domain of bitwise operators.

There is no "either/or" here, only "both/and".

> IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of

> digits [0-9] and a rule for ordered composition and a rule for an

> interpretation to map back to the abstract value.

Well, I thank you for supplying the missing definition, I suppose.

> We can discuss each

> part of this process, but the bottom line is that when we say that

> represented values are equal, we are saying that representations map

> back to the same unique abstract value, not that representations per se

> are identical.

Yep, that is part of my point, see below.

>>> What is boolvec(388488839405842L) ^ boolvec("hello")?

>>

>>I'd rather see this as the fairly meaningless "388488839405842L base 2",

>>meaningless because the base of a number does not affect its value, and

>>bitwise-xor, as the name implies, operates on the number base two.

> Well, to be nit-picky, I still think it does not really operate on a

> number at all. It converts a number to an ordered set of bools, by way

> of an intermediate two's complement _representation_ whose _numeric_ bit

> values are mapped to bool values. Then, after operating on the

> ordered sets of bools, the resulting set is mapped back to a two's

> complement integer representation again, which can be interpreted as an

> abstract integer value again ;-)

In the domain of binary logic, I do not concede a difference between "1"

and "True", on the grounds that there is no way in the domain of binary

logic to distinguish the two.

> IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64

> possible states of the representation machinery. Not all of those states

> are legal floating point number representations, but each one that _is_

> legal corresponds to an _exact_ abstract real value. It _can_ be the

> exact value you wanted to represent, or not, depending on your

> programming problem. Inexactness is not a property of floating point

> representations, it is a property of their _relation_ to the exact

> abstract values the programmer is trying to represent. 0.0 represents

> the exact same abstract value as integer 0 (perhaps by way of mapping

> integers conceived as separate to a subset of reals).

This is a matter of definition. My problem with your definition mostly

lies in the fact that it is not as practically useful as mine. Someone

operating on the "floats are almost, but not quite, exact values" is

*much* less well equipped to handle the way floats bite you than someone

who operates with "floats represent a value and have an error range around

that value", because then they should understand they need to compute not

just with the values, but track the buildup of that error range as well.

It's a point-of-view thing.

>>can be made arbitrarily small but never truly identifies one point. This

> I disagree. 0.0 truly identifies one point. So does

> 3.141592653589793115997963468544185161590576171875

> which is the _exact_ value of math.pi in decimal representation. But

> this is _not_ the exact value of the abstract mathematical pi.

Python 2.3.4 (#1, Sep 28 2004, 20:15:25)

[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> 3.1415926535897931159979634685441851615905761718758327502398752309 == 3.14159265358979311599796346854418516159057617187500000000000

True

Look at the last digits. (Python is just a convenience here, this is

fundamental to all float representation schemes, as opposed to things like

Decimal.)

This is why I say they represent a range. In set theory, we'd call this an

equivalence class for equality. There is *no* way to distinguish between

3.1415926535897931159979634685441851615905761718758327502398752309 and

3.14159265358979311599796346854418516159057617187500000000000 with floats

(of the size Python uses, add more digits for some other bit count), not

even in theory. They are the same in every way. That's because the closest

float represents a range that covers them both. You can shrink that range

arbitrarily small but you can not get to a unique point, such that only

one true real will compare equal to that number and all others do not, not

even in theory.
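
The comparison in this post can be re-run in a current interpreter. This is a minimal sketch in Python 3, parsing the two decimal strings explicitly with `float()`; the names `a` and `b` are illustrative.

```python
import math

a = float("3.1415926535897931159979634685441851615905761718758327502398752309")
b = float("3.14159265358979311599796346854418516159057617187500000000000")

print(a == b)        # True: both decimal strings round to the same double
print(a == math.pi)  # True: that double is the one stored in math.pi
```

Both readings in the thread fit this result: the two literals are different real numbers, and they are mapped to one and the same stored value.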

#30; Thu, 27 Dec 2007 00:21:00 GMT

- On 2004-10-11, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
> With respect, this isn't something to doubt or not doubt.

> There is one, and only one, way to represent any positive

> number in base two, since encoding sign is not an issue.

> Assuming an extra bit to show sign, there is one and only one

> way to represent any negative number, too.

That's news to me. I've used three different base-2

representations for negative numbers in the past week, and I

can think of at least one other one I've used in the past.

> (Zero gets to be the exception since then you can have

> positive and negative zero,

That depends on which base-2 representation you've chosen. In

two's complement and excess-N representations, there is only

one zero value. In signed-magnitude there may be two.
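
As an illustration of the point that base two alone does not fix the encoding of a negative number, here is a sketch in Python 3 that writes -5 in three 8-bit schemes. The width, the bias of 128 and the function names are assumptions chosen for the example.

```python
WIDTH = 8  # illustrative word size

def sign_magnitude(n):
    """Top bit is the sign, remaining bits hold abs(n)."""
    sign = 1 if n < 0 else 0
    return format((sign << (WIDTH - 1)) | abs(n), "08b")

def twos_complement(n):
    """Keep the low WIDTH bits of n."""
    return format(n & ((1 << WIDTH) - 1), "08b")

def excess_n(n, bias=128):
    """Store n + bias as an unsigned value."""
    return format(n + bias, "08b")

print(sign_magnitude(-5))   # 10000101
print(twos_complement(-5))  # 11111011
print(excess_n(-5))         # 01111011
```

In the sign-magnitude scheme the pattern `10000000` is left over as a second, negative zero; the other two schemes have a single pattern for zero.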

--

Grant Edwards grante Yow! I've been WRITING

at to SOPHIA LOREN every 45

visi.com MINUTES since JANUARY 1ST!!

#31; Thu, 27 Dec 2007 00:22:00 GMT

- Andrew Dalke <adalke.python.todaysummary.com.mindspring.com> wrote:
> Grant Edwards:

> > No, he explained exactly what he was trying to do, and it had

> > nothing to do with encryption. He wants to know if exactly one

> > (1) of the strings is the empty string.

> BTW, another way to get that is

> if bool(str1) + bool(str2) == 1:

>     print "one and only one of them was empty"

If this is getting into a many-ways-to-skin-the-cat context, I want to

make sure I get my entry in:

[str1, str2].count('') == 1

Other possibilities (I think -- untested, and it's late here):

len([x for x in (str1, str2) if x]) == 1

'' == min(str1, str2) != max(str1, str2)

len(str1+str2) == max(len(str1),len(str2)) != 0

sum(map(bool, (str1, str2))) == 1

not str1 ^ not str2

(str1 or str2) and not min(str1, str2)

not str1 != not str2

str1 != str2 and (str1+str2) == (str2+str1)

I guess I'd better stop here because I can't actually _prove_ the last

one of these actually implies one of the strings is empty but I can't

find counterexamples either...!-)

Alex

#32; Thu, 27 Dec 2007 00:23:00 GMT

- Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
> On Fri, 08 Oct 2004 16:19:22 -0400, dataangel wrote:

> > Regardless of whether this is the best implementation for detecting if

> > two strings are similar, I don't see why xor for strings shouldn't be

> > supported. Am I missing something?

> The basic problem is that there is no obvious "xor on string" operation

> *in general*, even if you stipulate "bitwise". In particular, what does it

> mean to bitwise xor two different length strings?

What I, personally, would expect here, would be to leave the trailing

part of the longer string "untouched". I.e., just like:

def bitwisexor(s1, s2):
    result = [ chr(ord(a) ^ ord(b)) for a, b in zip(s1, s2) ]
    # at most one of the following 2 stmts actually does anything:
    result.extend(s1[len(result):])
    result.extend(s2[len(result):])
    return ''.join(result)

I realize this is the same as imagining the shorter string to be

conceptually padded with as many '\x00' characters as needed -- that just

seems the only "intuitively natural" way to extend "sequence xor'ing" to

sequences of different length...
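
The zero-padding reading can also be written directly. This is a sketch in Python 3 operating on `bytes` rather than Python 2 strings; `xor_bytes` is a hypothetical name, and `itertools.zip_longest` with `fillvalue=0` supplies the padding.

```python
from itertools import zip_longest

def xor_bytes(b1, b2):
    """XOR two byte strings; the shorter one is treated as zero-padded."""
    return bytes(x ^ y for x, y in zip_longest(b1, b2, fillvalue=0))

print(xor_bytes(b"abc", b"abc"))      # b'\x00\x00\x00'
print(xor_bytes(b"hello", b"\x20"))   # b'Hello'
```

Because XOR with zero leaves a byte unchanged, the tail of the longer argument passes through untouched, which matches the "leave the trailing part alone" behaviour described above.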

Alex

#33; Thu, 27 Dec 2007 00:24:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:
...

> used to thinking of ^ as a bitwise operator. What if you saw

> string1 xor string2?

> Wouldn't you expect it to be equivalent to

> (string1 and (not string2)) or ((not string1) and string2)

Personally, I would not expect 'xor' to be non-commutative, and yet the

expression YOU apparently would expect it to be equivalent to obviously

is...:

'foo' xor ''

would be True, while

'' xor 'foo'

would be 'foo'. At least, that's how the "equivalent to" expression you

give evaluates, I believe. The problem is that "not x" is a bool for

any x, while "x and y" isn't.
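
The asymmetry is easy to confirm by evaluating the quoted expression in both orders. This is a small sketch in Python 3, with `expanded_xor` as an illustrative wrapper name.

```python
def expanded_xor(string1, string2):
    """The expression quoted above, evaluated literally."""
    return (string1 and (not string2)) or ((not string1) and string2)

print(repr(expanded_xor('foo', '')))  # True
print(repr(expanded_xor('', 'foo')))  # 'foo'
```

In the first call the `and` ends on `not string2`, a bool; in the second it ends on `string2` itself, so the operand comes back.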

If I had to design the least-astonishing definition of xor in Python, I

would probably have to settle for a mix of the type-producing behavior

of and/or (always return one of the two operands, a wonderful thing) vs

not (always return a bool, and it's hard to imagine it doing otherwise).

"a xor b" would return one of the two operands if exactly one of them

evaluates to true in a boolean context, but False if both or neither do

(sigh).

E.g.,

def xor(a, b):
    if (a and b) or (not a and not b): return False
    else: return a or b

Unfortunately, it seems to me that this unholy admixture of return types

(hard to avoid...), as well as the inevitable lack of short-circuiting,

makes such a hypothetical 'xor' _WAY_ less useful than the existing

'and'/'or' operators are:-(
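
To see the mixed return types in practice, the definition can be exercised on a few pairs. This sketch repeats the function so that it runs on its own in Python 3; the sample pairs are illustrative.

```python
def xor(a, b):
    # Same logic as the definition in the post, repeated so the sketch runs alone.
    if (a and b) or (not a and not b):
        return False
    return a or b

for pair in [('foo', ''), ('', 'foo'), ('foo', 'bar'), ('', '')]:
    print(pair, '->', repr(xor(*pair)))
```

The first two pairs give back the string `'foo'`, while the last two give the bool `False`, so a caller cannot rely on a single result type. Both arguments are also fully evaluated before the call, which is the lack of short-circuiting mentioned above.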

Alex

#34; Thu, 27 Dec 2007 00:25:00 GMT

- On Mon, 11 Oct 2004 04:08:49 GMT, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
>On Mon, 11 Oct 2004 02:19:03 +0000, Bengt Richter wrote:

>> On Sun, 10 Oct 2004 22:24:57 GMT, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:

>>>For positive numbers, a number really is its vector of bits. In the

>> ^^^^^[1] [2]^^ ^^^[3] ^^^^[4]

>> [1] Abstract, non-negative integers?

>Integers is more convenient here, but any one dimensional number will do.

>It so happens that computers do not encode "reals" in the naive way, to

>allow for more variation in magnitude.

True. But my point (just about everywhere in the post) was to draw attention

to the distinction between abstract entities (e.g. numbers) and concrete representations.

>> [2] "is" as in "is identical with"? I doubt it.

>With respect, this isn't something to doubt or not doubt. There is one,

>and only one, way to represent any positive number in base two, since

I think we are focusing on different things. When you say "a number really is

its vector of bits" I think wait: a number is a number and a vector of bits

is a vector of bits, but you can't say one _is_ the other. You can say they

correspond in some way. When you say of a number "its vector of bits" I presume

that "its" refers to "its" partner in the number<->bit vector relationship,

not in the sense that an aggregate has its components. To me, a (one-dimensional)

number is a primitive entity, and "has" no parts, though it may correspond to

a representation that does have parts and order etc.

I am hard put to think of anything that is _identical_ with any of the

possible representations that it may be put in correspondence with.

>encoding sign is not an issue. Assuming an extra bit to show sign, there

>is one and only one way to represent any negative number, too. (Zero gets

>to be the exception since then you can have positive and negative zero,

>which are theoretically identical, but on some machines that implement

>them have practical differences.)

>Endianness is just some wackiness in the ordering, but does not really

>change that. 60000 + 50000 = 110000 no matter what endianness you have.

Wackiness must be accounted for ;-)

>> [3] In what sense does an abstract number possess (its) a vector of bits?

>It does not. It *is* a series of ones and zeros, in base two. It *is* a

>series of ones, zeros, and twos, in base three. It *is* a series of [0-9]

>in base ten. It is all of these things at once, and an infinite number of

>other things besides.

Depends on what the meaning of "is" is ;-) Your "is" seems to mean

"can be represented by" in my terms. That's not "is" in my book ;-)

>1 + 1 fully *is* 2. There is no difference mathematically. If there is,

>your math is broken.

You are apparently ignoring representation and just thinking in terms of

what is indicated. So yes, '1 + 1' represents something and '2' represents

something, and for some getabstractvalue, getabstractvalue('1 + 1') _is_

getabstractvalue('2'). But whatever your representation, there is that

function that maps to the abstract mathematical value.

>> [4] A two's complement representation of an (abstract) integer, has

>> "bits"

>> as _numerical_ values in the sum series with corresponding numerical

>> coefficients that are powers of two. But "bits" in the context of

>> bitwise ^ or & or | do not represent abstract _numerical_ values of

>> 0 or 1, they represent logical values False or True (and 0<->False,

>> 1<->True is just a convention that we're used to).

>No. They represent both of those things at once. 1 *is* True, and 0 *is*

>false, in the domain of bitwise operators.

ISTM we have different concepts of "is" ;-)

>There is no "either/or" here, only "both/and".

>> IOW, there's no such thing as "a base 10 number"... A "base 10 number" is composed with a set of

>> digits [0-9] and a rule for ordered composition and a rule for an

>> interpretation to map back to the abstract value.

>Well, I thank you for supplying the missing definition, I suppose.

Well, I made up an ad hoc definition to try to express my thinking on it ;-)

>> We can discuss each

>> part of this process, but the bottom line is that when we say that

>> represented values are equal, we are saying that representations map

>> back to the same unique abstract value, not that representations per se

>> are identical.

>Yep, that is part of my point, see below.

>>>> What is boolvec(388488839405842L) ^ boolvec("hello")?

>>>

>>>I'd rather see this as the fairly meaningless "388488839405842L base 2",

>>>meaningless because the base of a number does not affect its value, and

>>>bitwise-xor, as the name implies, operates on the number base two.

>> Well, to be nit-picky, I still think it does not really operate on a

>> number at all. It converts a number to an ordered set of bools, by way

>> of an intermediate two's complement _representation_ whose _numeric_ bit

>> values are mapped to bool values. Then, after operating on the

>> ordered sets of bools, the resulting set is mapped back to a two's

>> complement integer representation again, which can be interpreted as an

>> abstract integer value again ;-)

>In the domain of binary logic, I do not concede a difference between "1"

>and "True", on the grounds that there is no way in the domain of binary

>logic to distinguish the two.

It's a matter of interpretation. I view 1 and True as different, sort of like

>>> type(1), type(bool(1))

(<type 'int'>, <type 'bool'>)

>> IOW, a typical float is a 64-bit IEEE-754 double, so there are 2**64

>> possible states of the representation machinery. Not all of those states

>> are legal floating point number representations, but each one that _is_

>> legal corresponds to an _exact_ abstract real value. It _can_ be the

>> exact value you wanted to represent, or not, depending on your

>> programming problem. Inexactness is not a property of floating point

>> representations, it is a property of their _relation_ to the exact

>> abstract values the programmer is trying to represent. 0.0 represents

>> the exact same abstract value as integer 0 (perhaps by way of mapping

>> integers conceived as separate to a subset of reals).

>This is a matter of definition. My problem with your definition mostly

>lies in the fact that it is not as practically useful as mine. Someone

>operating on the "floats are almost, but not quite, exact values" is

I didn't say that. I said they do represent _exact_ values, and that you

can use the available exact values as you like, including interpreting them

as tokens representing an interval on the real axis containing all the values

that will round to the exact value in question.

>*much* less well equipped to handle the way floats bite you than someone

>who operates with "floats represent a value and have an error range around

>that value", because then they should understand they need to compute not

>just with the values, but track the buildup of that error range as well.

>It's a point-of-view thing.

>>>can be made arbitrarily small but never truly identifies one point. This

>> I disagree. 0.0 truly identifies one point. So does

>>

>> 3.141592653589793115997963468544185161590576171875

>>

>> which is the _exact_ value of math.pi in decimal representation. But

>> this is _not_ the exact value of the abstract mathematical pi.

>Python 2.3.4 (#1, Sep 28 2004, 20:15:25)

>[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)] on linux2

>Type "help", "copyright", "credits" or "license" for more information.

>>>> 3.1415926535897931159979634685441851615905761718758327502398752309 == 3.14159265358979311599796346854418516159057617187500000000000

>True

Two different numbers were rounded to the same exact number ;-)

>Look at the last digits. (Python is just a convenience here, this is

>fundamental to all float representation schemes, as opposed to things like

>Decimal.)

>This is why I say they represent a range. In set theory, we'd call this an

I agree they can be used to _represent_ a range. The range (actually a half-open

continuous interval I think) will be defined in terms of the _exact_ value

and the neighboring _exact_ values available in the set of values with concrete

representations in the particular hardware.

>equivalence class for equality. There is *no* way to distinguish between

>3.1415926535897931159979634685441851615905761718758327502398752309 and

>3.14159265358979311599796346854418516159057617187500000000000 with floats

>(of the size Python uses, add more digits for some other bit count), not

>even in theory. They are the same in every way. That's because the closest

>float represents a range that covers them both. You can shrink that range

Now tell me _exactly_ what that "range" is. I know it's a messy algorithm,

but the intervals are _exactly_ defined, so where will the _exact_ mathematical

values of the limits come from? ;-) (we'll ignore fpu rounding modes ;-)

>arbitrarily small but you can not get to a unique point, such that only

>one true real will compare equal to that number and all others do not, not

>even in theory.

Well, yes, we can only create finite sets of representations ;-)

But you can "get to a unique point" by switching interpretation. I.e.,

if you take the floating point number's exact value as such, instead of

using that value and neighbors as a means to define an interval, then

you do have a unique point. E.g., all the exact values of a 53-bit integer

can be represented _exactly_ by a typical 64-bit double. The floating point

representation is no less exact than the corresponding int or long.
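
The 53-bit claim can be checked in a few lines. This is a sketch in Python 3, assuming the usual IEEE-754 double behind `float`; the name `big` is illustrative.

```python
big = 2 ** 53 - 1  # largest 53-bit integer

print(float(big) == big)                  # True: represented exactly
print(int(float(big)) == big)             # True: the round trip loses nothing
print(float(2 ** 53 + 1) == 2 ** 53 + 1)  # False: needs 54 bits, so it is rounded
```

Python compares an `int` with a `float` by exact value, so the last line is a genuine statement about the stored double rather than a side effect of converting the integer.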

Regards,

Bengt Richter

#35; Thu, 27 Dec 2007 00:26:00 GMT

- aleaxit.python.todaysummary.com.yahoo.com (Alex Martelli) writes:
>> Grant Edwards:

>> > No, he explained exactly what he was trying to do, and it had

>> > nothing to do with encryption. He wants to know if exactly one

>> > (1) of the strings is the empty string.

[...]

> not str1 ^ not str2

[...]

> not str1 != not str2

These give syntax errors, actually. I wouldn't have expected that

either, though :)

> str1 != str2 and (str1+str2) == (str2+str1)

Counter example:

>>> str1 = "x"; str2 = "xx"

>>> str1 != str2 and (str1+str2) == (str2+str1)

True

Bernhard

--

Intevation GmbH http://intevation.de/

Skencil http://sketch.sourceforge.net/

Thuban http://thuban.intevation.org/

#36; Thu, 27 Dec 2007 00:27:00 GMT

- Bernhard Herzog <bh.python.todaysummary.com.intevation.de> wrote:
...

> > not str1 ^ not str2

> > not str1 != not str2

> These give syntax errors, actually. I wouldn't have expected that

> either, though :)

Yep, you need parentheses, as in (not str1) &c.

> > str1 != str2 and (str1+str2) == (str2+str1)

> Counter example:

> >>> str1 = "x"; str2 = "xx"

Yep, excellent counterexample, thanks. So let's try instead:

(str1+str2==str2!=str1) or (str2+str1==str1!=str2)

I _do_ really want something based on concatenation equaling one of the

strings iff the other's empty...
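
A brute-force check of the revised expression, as a sketch in Python 3: `concat_test` wraps the expression from the post, and the sample strings (including the earlier "x" / "xx" counterexample) are illustrative.

```python
from itertools import product

def concat_test(str1, str2):
    """The concatenation-based expression from the post above."""
    return (str1 + str2 == str2 != str1) or (str2 + str1 == str1 != str2)

samples = ['', 'x', 'xx', 'ab', 'abab']
# Collect every pair where the expression differs from "exactly one is empty".
mismatches = [(a, b) for a, b in product(samples, repeat=2)
              if concat_test(a, b) != (bool(a) != bool(b))]
print(mismatches)  # []
```

The reasoning behind the empty result: `str1 + str2 == str2` can only hold when `str1` is empty, and the chained `!= str1` then requires `str2` to be non-empty; the second clause is the mirror image.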

Alex

#37; Thu, 27 Dec 2007 00:28:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:
...

> Probably so, but that doesn't support the argument that

> there's something wrong with a logical xor argument coercing

> its operands to boolean values unless one also argues that

> the logical "and", "or" and "not" operators should also not

> coerce their operands to booleans.

One of the greatest sources of usefulness for 'and' and 'or' is exactly

that these operators _don't_ coerce their operands -- they always return

one of the objects that are given as their operands, never any

'coercion' of it. This lets one code, e.g.,

k, v = (onedict or another).popitem()

instead of

if onedict:

k, v = onedict.popitem()

else:

k, v = another.popitem()

Operator 'xor' couldn't possibly ensure this useful behavior, alas.
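
A sketch of why that is so. The xor function below is a hypothetical helper, not a proposed language feature: when exactly one operand is true it can hand that operand back, but when both are true there is no false operand to return, so it has to produce a fresh value:

```python
def xor(a, b):
    """Hypothetical helper: return the one truthy operand, else False."""
    if a and not b:
        return a
    if b and not a:
        return b
    # Both truthy (or both falsy): with two truthy operands neither one
    # is a false value, so something new has to be returned.
    return False

onedict = {}
another = {"key": "value"}

# 'or' returns the operand itself, so a method can be called on the result.
chosen = onedict or another
k, v = chosen.popitem()

# With two non-empty dicts, xor cannot return either operand.
both_true = xor({"a": 1}, {"b": 2})

print(k, v, both_true)
```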

Alex

#38; Thu, 27 Dec 2007 00:29:00 GMT

- On 2004-10-11, Alex Martelli <aleaxit.python.todaysummary.com.yahoo.com> wrote:
> Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:

> ...

>> Probably so, but that doesn't support the argument that

>> there's something wrong with a logical xor argument coercing

>> its operands to boolean values unless one also argues that

>> the logical "and", "or" and "not" operators should also not

>> coerce their operands to booleans.

> One of the greatest sources of usefulness for 'and' and 'or' is exactly

> that these operators _don't_ coerce their operands

Surely they must coerce the operands for the "test" part if not

for the "return" part.

--

Grant Edwards grante Yow! Being a BALD HERO

at is almost as FESTIVE as a

visi.com TATTOOED KNOCKWURST.

#39; Thu, 27 Dec 2007 00:30:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:
> On 2004-10-11, Alex Martelli <aleaxit.python.todaysummary.com.yahoo.com> wrote:

> > Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:

> > ...

> >> Probably so, but that doesn't support the argument that

> >> there's something wrong with a logical xor argument coercing

> >> its operands to boolean values unless one also argues that

> >> the logical "and", "or" and "not" operators should also not

> >> coerce their operands to booleans.

> > One of the greatest sources of usefulness for 'and' and 'or' is exactly

> > that these operators _don't_ coerce their operands

> Surely they must coerce the operands for the "test" part if not

> for the "return" part.

We may have different conceptions about the meaning of the word

"coerce", perhaps...? For example, Python may be doing len(x) when I

use x in a boolean context, but I sure don't see computing len(x) as

"coercing" anything. To me, Coercion, in Python, is what's documented

at <http://docs.python.org/ref/coercion-rules.html> and is quite a

different concept from Truth Value Testing, which is documented at

<http://docs.python.org/lib/truth.html>.

Alex

#40; Thu, 27 Dec 2007 00:31:00 GMT

- On 2004-10-11, Alex Martelli <aleaxit.python.todaysummary.com.yahoo.com> wrote:
>> Surely they must coerce the operands for the "test" part if not

>> for the "return" part.

> We may have different conceptions about the meaning of the word

> "coerce", perhaps...? For example, Python may be doing len(x) when I

> use x in a boolean context,

I guess I was assuming Python was doing the equivalent of

bool(x) when x was used as an operand of a logical operator.

That's coercion to me.

> but I sure don't see computing len(x) as "coercing" anything.

> To me, Coercion, in Python, is what's documented at

> <http://docs.python.org/ref/coercion-rules.html> and is quite

> a different concept from Truth Value Testing, which is

> documented at <http://docs.python.org/lib/truth.html>.

I guess I'm misusing the term coercion (at least in a Python

context). To _me_ operands of logical operators are being

coerced to boolean.

--

Grant Edwards grante Yow! I'm gliding over a

at NUCLEAR WASTE DUMP near

visi.com ATLANTA, Georgia!!

#41; Thu, 27 Dec 2007 00:32:00 GMT

- Grant Edwards <grante.python.todaysummary.com.visi.com> wrote:
> On 2004-10-11, Alex Martelli <aleaxit.python.todaysummary.com.yahoo.com> wrote:

> >> Surely they must coerce the operands for the "test" part if not

> >> for the "return" part.

> > We may have different conceptions about the meaning of the word

> > "coerce", perhaps...? For example, Python may be doing len(x) when I

> > use x in a boolean context,

> I guess I was assuming Python was doing the equivalent of

> bool(x) when x was used as an operand of a logical operator.

> That's coercion to me.

Well, Python is doing just the same thing it was doing before 'bool' was

introduced (in 2.2.1, I think) -- Truth Value Testing. I believe a very

similar transition happened, for example, in the C++ language: any

number or pointer was and still is acceptable in a "condition" context

(e.g., for C, 'if(x) ...'). That used to be quite different from

coercions. Then a 'bool' type was introduced -- the language still kept

doing exactly the same thing for 'if(x)' and the like, but even though

nothing was changed, it ``feels'' taxonomically different (in C++ the

case is stronger, because a new 'operator bool' was also introduced; and

with it new and important pitfalls, cfr

<http://www.artima.com/cppsource/safebool.html> ).

> > but I sure don't see computing len(x) as "coercing" anything.

> > To me, Coercion, in Python, is what's documented at

> > <http://docs.python.org/ref/coercion-rules.html> and is quite

> > a different concept from Truth Value Testing, which is

> > documented at <http://docs.python.org/lib/truth.html>.

> I guess I'm misusing the term coercion (at least in a Python

> context). To _me_ operands of logical operators are being

> coerced to boolean.

Didactically, I still prefer to say they're "being truth-value tested",

because the concept of a hypothetical coercion which happens during the

testing then "un-happens" to determine the result is just impossible to

get across -- even if I thought it somehow more correct or preferable,

which I don't. Note, again, that in C++ the case to consider

truth-value testing as coercion is stronger, because && and ||

(shortcircuiting logical operators) _DO_ undertake to always return a

bool, when applied to C++'s built-in types. I consider this a

suboptimal decision, prompting people to code 'x?x:y' or 'x?y:x' where

they might have coded, more simply, x||y or x&&y respectively, were it

not for the coercion which the latter forms imply.

BTW, if you consider a form such as 'x?x:y', which is a better match for

the semantics of Python's 'x or y' than C++'s || operator, you will see

why speaking of operandS (plural) as being 'coerced' is surely

misleading. Under _no_ conceivable circumstances is the RHS operand

subject to any operation whatsoever by 'x or y'. Whatever "coercion"

is, it definitely cannot be happening here on y. Similarly for the

analogous 'x and y' -- never is any operation at all done on y.

Truth Value Testing (which you want to call coercion) happens on the

first (LHS) operand, but not on the second (RHS) one, ever...
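
This can be observed directly with a small probe class. The sketch uses Python 3's __bool__ hook; the class name Probe and its counter are invented for the example:

```python
class Probe:
    """Counts how many times it is truth-value tested."""

    def __init__(self, truth):
        self.truth = truth
        self.tests = 0

    def __bool__(self):
        self.tests += 1
        return self.truth

# 'x or y' with a false x: x is tested once, y is returned untested.
x, y = Probe(False), Probe(True)
or_result = x or y

# 'p and q' with a true p: p is tested once, q is returned untested.
p, q = Probe(True), Probe(False)
and_result = p and q

print(x.tests, y.tests, p.tests, q.tests)
```

In both cases the right-hand object comes back as-is, with its counter still at zero.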

Alex

#42; Thu, 27 Dec 2007 00:33:00 GMT

- On Mon, 11 Oct 2004 04:24:28 +0000, Grant Edwards wrote:
> On 2004-10-11, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:

>> With respect, this isn't something to doubt or not doubt.

>> There is one, and only one, way to represent any positive

>> number in base two, since encoding sign is not an issue.

>> Assuming an extra bit to show sign, there is one and only one

>> way to represent any negative number, too.

> That's news to me. I've used three different base-2

> representations for negative numbers in the past week, and I

> can think of at least one other one I've used in the past.

I am aware of only one encoding that uses a single bit to represent sign,

as I stipulated, and discarding endianness issues I'm having a hard time

imagining what reasonable alternatives there are.

>> (Zero gets to be the exception since then you can have

>> positive and negative zero,

> That depends on which base-2 representation you've chosen. In

> two's complement and excess-N representations, there is only

> one zero value. In signed-magnitude there may be two.

I explicitly only discussed signed-magnitude: "Assuming an extra bit to

show sign".

Normally I'd exhort you to read more carefully but Bengt pointed out some

places I wasn't rigorous with my terms. :-) So I earned this.

#43; Thu, 27 Dec 2007 00:34:00 GMT

- On 2004-10-11, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
>>> With respect, this isn't something to doubt or not doubt.

>>> There is one, and only one, way to represent any positive

>>> number in base two, since encoding sign is not an issue.

>>> Assuming an extra bit to show sign, there is one and only one

>>> way to represent any negative number, too.

>>

>> That's news to me. I've used three different base-2

>> representations for negative numbers in the past week, and I

>> can think of at least one other one I've used in the past.

> I am aware of only one encoding that uses a single bit to

> represent sign, as I stipulated, and discarding endianness

> issues I'm having a hard time imagining what reasonable

> alternatives there are.

* Two's complement.

* One's complement.

* Signed-magnitude with a "1" sign bit being positive.

* Signed-magnitude with a "1" sign bit being negative.

* Excess-N notation.

Four of the five are in use in the software I'm working on

today.
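
For illustration, here is how the five encodings in the list differ for the same value, as a sketch with an 8-bit width. The width, the bias of 128 for excess-N, and all function names are assumptions made for this example, not details from the post:

```python
BITS = 8                    # assumed word size for the illustration
MASK = (1 << BITS) - 1      # 0xFF
SIGN = 1 << (BITS - 1)      # 0x80, the top bit

def twos_complement(n):
    return n & MASK

def ones_complement(n):
    # Negative values are the bitwise inverse of their magnitude.
    return n if n >= 0 else (~(-n)) & MASK

def sign_magnitude(n, one_means_negative=True):
    # Top bit carries the sign only; low bits hold the magnitude.
    sign_set = (n < 0) == one_means_negative
    return (SIGN if sign_set else 0) | abs(n)

def excess(n, bias=128):
    # Excess-N stores n + N as an unsigned number; N = 128 is assumed here.
    return n + bias

encodings = {
    "two's complement": twos_complement(-9),
    "one's complement": ones_complement(-9),
    "signed-magnitude, 1 = negative": sign_magnitude(-9),
    "signed-magnitude, 1 = positive": sign_magnitude(-9, one_means_negative=False),
    "excess-128": excess(-9),
}

for name, bits in encodings.items():
    print(f"{name:32} {bits:08b}")
```

Each scheme gives -9 a different bit pattern, which is the sense in which "base two" alone does not pin down a representation for negative numbers.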

>>> (Zero gets to be the exception since then you can have

>>> positive and negative zero,

>>

>> That depends on which base-2 representation you've chosen. In

>> two's complement and excess-N representations, there is only

>> one zero value. In signed-magnitude there may be two.

> I explicitly only discussed signed-magnitude: "Assuming an

> extra bit to show sign".

I don't know what you mean. Two's complement, one's

complement, signed-magnitude, and excess-N _all_ use a single

bit for sign.

--

Grant Edwards grante Yow! Kids, don't gross me

at off... "Adventures with

visi.com MENTAL HYGIENE" can be

carried too FAR!

#44; Thu, 27 Dec 2007 00:35:00 GMT

- On Mon, 11 Oct 2004 19:35:55 +0000, Grant Edwards wrote:
> I don't know what you mean. Two's complement, one's

> complement, signed-magnitude, and excess-N _all_ use a single

> bit for sign.

I meant a single bit exclusively for sign, such that

1 00001001

and

0 00001001

are the same absolute value. The complements don't work that way. I

guess that does leave a question for whether 1 is positive or negative,

but for the purposes of my point, that doesn't matter much.

#45; Thu, 27 Dec 2007 00:36:00 GMT

- On 2004-10-11, Jeremy Bowers <jerf.python.todaysummary.com.jerf.org> wrote:
> On Mon, 11 Oct 2004 19:35:55 +0000, Grant Edwards wrote:

>> I don't know what you mean. Two's complement, one's

>> complement, signed-magnitude, and excess-N _all_ use a single

>> bit for sign.

> I meant a single bit exclusively for sign, such that

> 1 00001001

> and

> 0 00001001

> are the same absolute value. The complements don't work that

> way. I guess that does leave a question for whether 1 is

> positive or negative, but for the purposes of my point, that

> doesn't matter much.

OK, if you meant signed-magnitude, you should have said that. ;)

AFAICT, the only people that use signed-magnitude for _integer_

quantities [IEEE FP representations are signed-magnitude] are

the analog guys who design A/D converters [and somebody needs

to slap them and tell them that out in the real world we want

2's complement].

I haven't _ever_ seen a signed-magnitude integer ALU, but I'm

sure some existed back before I started doing this sort of

stuff 25 years ago.

--

Grant Edwards grante Yow! My TOYOTA is built

at like a... BAGEL with CREAM

visi.com CHEESE!!

#46; Thu, 27 Dec 2007 00:37:00 GMT

- On Mon, 11 Oct 2004 22:25:26 +0000, Grant Edwards wrote:
> OK, if you meant signed-magnitude, you should have said that. ;)

> AFAICT, the only people that use signed-magnitude for _integer_

> quantities [IEEE FP representations are signed-magnitude] are

I never said they existed :-)

#47; Thu, 27 Dec 2007 00:38:00 GMT

- Grant Edwards wrote:
>>One of the greatest sources of usefulness for 'and' and 'or' is exactly

>>that these operators _don't_ coerce their operands

>

> Surely they must coerce the operands for the "test" part if not

> for the "return" part.

Is calling a method considered coercing? The "test" part must call the

object's __nonzero__ method but no Python boolean object is ever created.
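
A sketch of the hooks involved, written for Python 3, where __nonzero__ is spelled __bool__. When a class defines neither hook its instances are simply true; when it defines only __len__, that is used instead. The class names are invented for the example:

```python
class Sized:
    """Defines only __len__, so truth testing falls back to it."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

class Explicit(Sized):
    """Defines __bool__ too, which takes priority over __len__."""

    def __bool__(self):
        return True

empty = Sized(0)
full = Sized(3)
odd = Explicit(0)   # length 0, but __bool__ says it is true

print(bool(empty), bool(full), bool(odd))
```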

Cheers,

Brian

#48; Thu, 27 Dec 2007 00:39:00 GMT

- I suggest you have a look at the following:
http://aspn.activestate.com/ASPN/Co...n/Recipe/252237

I believe it does what you want (and has little to do with the

follow-up discussion so far!)

Andre Roberge

dataangel <k04jg02.python.todaysummary.com.kzoo.edu> wrote in message news:<mailman.4604.1097266756.5135.python-list.python.todaysummary.com.python.org>...

> I wrote a function to compare whether two strings are "similar" because

> I'm using python to make a small text adventure engine and I want it

> to be responsive to slight misspellings, like "inevtory" and the like. To

> save time the first thing my function does is check if I'm comparing an

> empty vs. a non-empty string, because they are to be never considered

> similar. Right now I have to write out the check like this:

> if str1 and str2:

> if not str1 or not str2:

> return 0

> Because python won't let me do str1 ^ str2. Why does this only work for

> numbers? Just treat empty strings as 0 and nonempty as 1.

> Regardless of whether this is the best implementation for detecting if

> two strings are similar, I don't see why xor for strings shouldn't be

> supported. Am I missing something? In particular, I think it'd be cool to

> have "xor" as opposed to "^". The caret would return the resulting

> value, while "xor" would act like and/or do and return the one that was

> true (if any).
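
For the original goal (tolerating slight misspellings, with an empty string never matching a non-empty one), a minimal sketch using the standard library's difflib. This is not the linked recipe, and the function name similar and the 0.8 threshold are assumptions chosen for illustration:

```python
from difflib import SequenceMatcher

def similar(str1, str2, threshold=0.8):
    """Rough similarity test; the threshold value is an arbitrary choice."""
    # Empty vs. non-empty is never similar: compare truth values first.
    if bool(str1) != bool(str2):
        return False
    ratio = SequenceMatcher(None, str1.lower(), str2.lower()).ratio()
    return ratio >= threshold

print(similar("inevtory", "inventory"))
print(similar("", "inventory"))
print(similar("look", "inventory"))
```

The early return is the check discussed in this thread; the threshold would need tuning against the commands a real engine accepts.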

#49; Thu, 27 Dec 2007 00:40:00 GMT

- Alex
> str1 != str2 and (str1+str2) == (str2+str1)

> I guess I'd better stop here because I can't actually _prove_ the last

> one of these actually implies one of the strings is empty but I can't

> find counterexamples either...!-)

str1 = "a"

str2 = "aa"

Andrew

dalke.python.todaysummary.com.dalkescientific.com

#50; Thu, 27 Dec 2007 00:41:00 GMT
