JEREMY MANSON: I know you've all come here to see ***, of
course, but I wanted to first say that this is the latest in
our series of talks about programming
languages topics at Google.
The goal of this series of talks is to have everybody who
knows something about programming languages that
Googlers in general don't know come up here and
give a series of talks.
Obviously, we're very lucky here today to have *** Van
Rossum, the Benevolent Dictator for Life of Python.
But you don't have to be *** Van Rossum to give this talk.
To give this talk, you do.
But to give a talk at the series, you don't have to be.
So please, if you have ideas for talks, if you want to give
a talk, come up and see me.
My name is-- or email me-- my name is Jeremy Manson.
And again--
OK, so now I should move on to the actual
meat of the talk here.
Again, *** Van Rossum is the Benevolent
Dictator for Life of Python.
He is the creator and father of Python, and we're very,
very lucky to have him able to give a talk to us here today.
And here he is.
*** VAN ROSSUM: Thank you.
Jeremy.
Thanks for giving me the opportunity to do a preview of
my talk, which is going to be a keynote at the Python
conference next week.
Reminder for Googlers--
we're going to put this up on Google Video, so please don't
ask any Google-sensitive questions.
Quick overview of what I hope to be talking about, and I'll
make sure that we will not skip the last two bullets--
"What You Can Do Today" and "Questions."
What happened since last summer--
I mean, I started giving Python 3000 talks--
well, I really started giving Python 3000 talks about seven
years ago in 2000.
For a very long time it was purely a daydream.
It was purely conceptual.
It was going to be the next big thing.
Early last year, we decided to really go make an effort, fix
a set of features, and actually start implementing.
And so gradually over last year the plans became more
solid and we had various revisions of the schedule.
I'll give a little bit of a timeline.
I'll give some highlights.
If you have to leave in 10 minutes, stay until the
highlights slide is finished.
Then I'm going to have a long laundry list of various things
that will definitely, or most likely, or in one or two cases
potentially make it into this new release.
And the things that are sort of most interesting from the
developer's and the end user's perspective.
I'll try to say a bit about how you turn your Python 2
code into Python 3 code, which is not completely trivial, but
also doesn't have to be a tedious,
completely manual process.
And I'll start giving some hints.
And over the next six to nine months those hints will
probably improve in quality on what you can do to your code
today to be ready for Python 3000.
Basically to make the final
transition as easy as possible.
So we started with having lots of discussions.
At some point I actually had to say, we've had enough
discussions.
Let's get down to implementation work.
And I had to say that several times, and I think the last
time I said it was around Christmas 2006.
Since then, it's really been very much nose to the
grindstone, work out the details on features that we
know we're going to have, and work on the implementation.
And sometimes the implementation actually
informs the specification as things go.
We did write quite a few PEPs--
still not enough, in my view.
And I think we're pretty much on schedule in terms of
writing code.
But it's certainly going to be a big effort
between now and June.
Which takes us to the "Timeline" slide.
I hope that by April this year we'll really be done with the
sort of feature proposal process.
And the feature selection process should be done soon
after that, because that typically goes hand in hand.
We don't collect all the proposals and then there's a
long pause where somebody selects them.
We discuss them as they are being proposed, and as they
are sort of finalized, that also means they are accepted.
So I hope to be able to complete that by April.
Then by June I hope to actually have
a first alpha release.
Then I'm giving myself and the developers another year to
sort of work through the feedback, shave off the sharp
bits, improve performance.
Because at the moment, we're really feature-driven and
performance sometimes goes by the wayside.
Increasingly get users to actually try the new Python
with their source code, with their applications.
And then hopefully in 2008, in June, or somewhere in the
middle of next year we'll actually have a release that
we can be happy with.
That doesn't mean that at that point, everyone who is using
2.x will be forced to upgrade.
There is going to be a Python 2.6 release actually somewhat
earlier than the planned Python 3.0 release.
Although you never know.
Releases tend to sort of fluctuate a bit.
2.6 is the first release that is going to make an active
effort to also incorporate things that will help you
transition to Python 3000.
It will, in some cases, have options that turn on warnings
for things that are going to disappear.
And in some cases features from Python 3000 will actually
be backported into 2.6.
And unless the transition goes really smoothly for everyone
immediately, it's very likely that there will also be a 2.7
release at the usual schedule for the 2.x releases.
So the highlights--
and I have more slides on each of these-- but print is going
to be a function.
That is sort of--
I just implemented that last week, and I really have to get
used to it still.
But it is the right thing to do.
Dictionary views are even fresher.
Oh, by the way-- a single star means that there is some
working code, but it's not complete, and two stars means
that it's currently completely vaporware, but we know we're
going to do this.
Question marks means that it's not just vaporware, but we're
also not sure that we're going to do that.
But there aren't any question marks with this slide.
Dictionary views is another thing that will impact many
people's code.
Basically, the keys and items and values methods will
return something that smells like a set
rather than like a list.
Comparing objects has also changed.
At least the default comparison will sort of be
more type-safe and less lenient.
Probably one of the biggest things certainly in terms of
implementation is unicode.
We're going to move to a more Java-like model where all
strings are unicode.
And we have a separate bytes data type, which is more like
an array of small integers than like a string.
That means that we also have to implement a new I/O
library, which I'm actually pretty optimistic about.
Some things have already been done in integer unification,
which means that there's only one integer type.
There's no more long--
no more long literals.
And you can get pretty close to that even
today in Python 2.4.
In 2.4 you almost never have to cast
things to long anymore.
And you don't have to cast them back from long to int
because most of those conversions are taken care of
by the system.
In 3000, the long type will completely disappear.
Integer division will return float.
That's been a longstanding wish of mine.
Actually you can turn that on in Python 2--
since, I think probably since Python 2.1 or 2.0 even.
You could turn--
2.1?
Thomas knows everything.
But not too many people use it.
And then, of course, there's lots of other cleanups like
string exceptions no longer exist, classic classes no
longer exist, we're changing the raise statement, and so on
and so forth.
So a little bit more on many of those items and a bunch
that didn't make it to the highlights page.
Print is a function.
We had a discussion and there were a couple
of competing proposals.
One of the proposals was that if we were going to make it a
function, we should also drastically change what it does.
Maybe not insert spaces between items, maybe have
printf functionality.
In the end, we decided to actually go with a very simple
transformation where we have a print function that is just as
convenient as the print statement is currently.
So in most cases, all you have to do is put
parentheses around it.
By the way, you won't have to edit your code yourself.
We have a conversion tool, and while the conversion tool is
far from perfect, this is one of the things it
can do really well.
There's this funny business with a trailing comma that
suppresses the trailing newline.
You can simulate that by--
the print function will have three different keyword arguments.
It will have end, which is the character that is output at
the end of the list of arguments, and which
defaults to a newline.
There's sep, which is the thing that gets output in between
items, which defaults to a space.
And there is file, which is the file where it's going to be
printed to, which defaults to whatever sys.stdout
is at the moment.
So these three forms of print syntax all translate to very
straightforward calls to the print function.
And there's some functionality that you can't easily do with
a print statement at the moment that you can do by
setting, for example, the sep keyword to an empty string.
We can automatically translate this.
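As a minimal sketch of how those keyword arguments might be used, assuming the sep, end, and file spellings described above (the printed values are invented for illustration):

    import sys
    print("fee", "fie", "fo", "fum")        # items separated by spaces, newline at the end
    print("no newline here", end="")        # end="" suppresses the trailing newline
    print("a", "b", "c", sep="")            # sep="" joins the items with nothing in between
    print("something went wrong", file=sys.stderr)   # send the output somewhere else
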
The only place where that fails--
it turns out the print statement has a couple of bits
of cleverness where it works with an attribute on the output
file named softspace, which is mostly hidden.
But it's actually accessible to end users if
you really want to.
And the softspace attribute is used to delay the outputting
of the space between items until you actually know that
you have the next item.
That is pretty murky semantics, and it means that
everybody who implements a file-like object at some point
finds that they also have to support the softspace feature.
So I decided to just get rid of that.
It does mean that there are a few corner cases, like if you
print a string that ends in either a newline or a tab
character, and then comma and another item, the current
print is cleverly suppressing the space
between the two items.
The print function will intentionally be slightly
dumber about that.
So when converting the standard library and the standard
unit tests, I think I had maybe five cases where I had to
fix this manually in the code.
And usually it's very straightforward.
So dictionary views.
This has a star because the dictionary views currently,
while implemented, don't quite behave like set objects yet.
They can be compared to set objects, but they can't quite
implement--
they don't quite implement all the operations that you expect
of set objects like union and intersection.
They do sort of have the basic functionality.
You can iterate over them, you can do a membership check,
and you can compare them to another set for equality,
which is actually a relatively big deal.
In the past, if you wanted to see if two dictionaries had
the same set of keys, you would have to make a copy of
each dictionary's keys into a list and then sort the lists.
Or make copies into sets if you were sort of using a more
recent version of Python, like 2.3 which has a sets module,
or 2.4 which has sets as a built-in type.
And then you had to-- you could compare those two sets
or those two sorted lists.
The problem with that is that if you have a large
dictionary, you end up making a large copy of all the keys.
What you can do with the keys' view is actually, you can just
compare the two keys and because they act as sets, it
will automatically and efficiently compare whether
the two sets have the same elements, whether one is a
subset of the other and vice versa.
Just a mathematical definition of set equality.
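A small sketch of the comparison being described, assuming keys() returns the set-like view discussed here (the dictionaries are invented for illustration):

    d1 = {"a": 1, "b": 2}
    d2 = {"b": 20, "a": 10}
    # Python 2 style: copy the keys into lists (or sets) just to compare them
    same_old = sorted(d1.keys()) == sorted(d2.keys())
    # Python 3000 style as described: the views compare like sets, without copying
    same_new = d1.keys() == d2.keys()     # True -- same set of keys
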
We're doing the same thing for items.
Items also returns a set view.
Values, of course, could not quite return a set because you
can't have duplicates.
And you'd like those duplicates to show up when you
iterate over it.
We continue to maintain the invariant that if you iterate
in parallel over the keys and the
values, that you get matching keys and values at the same
position in the sequence, as long as you don't, of course,
modify the dictionary while you're iterating over it.
This, of course, has all been borrowed from the Java
Collections Framework.
I'm not afraid to borrow stuff from other languages.
I never have been.
I don't think I would've gotten anywhere if I tried to
invent everything myself.
So the important part of the keys--
the dictionary views in general, I expect that keys
and items are going to be the most important ones and values
are going to be only rarely used in practice.
Mostly probably in unit tests-- that's where I found
most of the uses.
These view objects are very lightweight, because they're
basically a structure containing one pointer which
points to the original dictionary.
So iterating over a keys view, or iterating over any
of those three views, was actually trivially
implemented.
Because even though I removed the iterkeys, iteritems, and
itervalues methods, I didn't remove their implementations.
And their implementations are still useful as the iterators
over the view objects.
So because I actually did some of the work on this over the
weekend, there are two unimplemented parts of it.
One is, as I mentioned, the set
semantics are not complete.
You cannot check whether your keys object is a subset of
some other keys object or another set.
You can only check-- compare them for equality.
The other thing is that we currently have about 15 or 20
failing unit tests still.
I expect that most of those unit tests are failing for
very trivial reasons.
I mean, what a lot of code does is it assumes that keys
returns a list.
And then it compares that-- the unit test, especially,
often do things like they create a little dictionary,
they mess around with it a little bit, and then they test
that the list--
the keys after sorting have a certain value.
And they usually just compare the keys object with the list
of constants.
That doesn't work anymore.
You could fix that in two ways.
You can explicitly cast the view to a list object.
That sort of fixes it solidly.
You can also replace the list constant that you compare it
with by a set constant.
Which I haven't mentioned yet, but which is one
of the later slides.
We have set literals now.
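As a sketch of the two fixes just mentioned (the dictionary and the expected values are invented for illustration):

    d = {"a": 1, "b": 2}
    # an old unit-test style assertion, assuming keys() returns a list:
    #     assert d.keys() == ["a", "b"]       # no longer true for a view
    assert sorted(d.keys()) == ["a", "b"]     # fix 1: turn the view into a (sorted) list
    assert d.keys() == {"a", "b"}             # fix 2: compare against a set literal
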
So much for dictionary views.
The default comparison, I already mentioned that in the
highlights.
Equality and not equal, of course, compare whether--
I mean, you have a default comparison and you can
overload your comparison.
You can implement your own comparison any
way you want it.
I'm not touching any of that.
But the default comparison that you get when you derive
from object and you don't overload any comparison
operators is changing quite a bit.
In Python 2, even in Python 1, even in Python 0, I think, if
you compare two objects of different types with an
ordering relationship, we just compare the address of
each object and say the one with the lower address comes
before the one with the higher address.
That turns out to be mostly a useless comparison.
It can give you sort of a false sense of security that
if you sort or compare something and you don't know
what the types of the objects are, it's not going to throw a
type error.
But actually you want to throw a type error.
Because most of the time, if there are objects of different
types that aren't really comparable, that haven't
explicitly programmed how they should be compared with each
other, the default comparison is just
giving you random results.
And maybe in one run, this object always shows up before
that object.
But another run, because you have slightly different input
data, their allocation on the heap is different and the
object that was smaller first is now suddenly larger.
And you can have all sorts of bizarre situations where you
have flaky unit tests.
So in particular, this means that you can no longer
compare, or sort, integers and strings, just like you can't
concatenate them or do anything else with them before
converting.
That's pretty much it.
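A minimal sketch of the change (the list is invented for illustration):

    mixed = [3, "two", 1]
    # Python 2: sorted(mixed) "works", ordering the mixed types by arbitrary rules
    # Python 3000 as described: sorted(mixed) raises TypeError, because int and
    # str don't define an ordering with respect to each other
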
In practice I have not found that this affects much code.
I mean, I have found very little code in the standard
library that actually relied on this default comparison
existing, except again in unit tests that were specifically
checking this behavior.
It always feels good to rip out code.
Then we get to the scary thing.
And it's scary for me because I haven't started
implementing it yet.
I think it's also scary for application developers because
it can potentially affect application performance,
application semantics.
It's going to be one of the bigger things for converting
code to Python 3000.
If you're not using unicode today in your application,
you're probably pretty safe.
If you're using unicode today, sort of everything you know
about keeping track of encodings, and which strings
are unicode which strings are not unicode, will probably
have to be changed somewhat.
So again, we're borrowing heavily from Java.
There's going to be one string type named str.
But again, its implementation will most likely be that of
the 2.x unicode implementation.
And we'll have a separate bytes type which
is new, brand new.
Although its implementation most closely resembles the
array module that has been around probably since
Python 1.5 or so.
You can only ever go between these using an encoding.
If you compare them or concatenate them--
if you compare a bytes object to a string object, it will
just throw a type error.
This is yet another place where the change due to
default comparison is actually helpful, because it just
points out that you're doing
nonsensical operations quicker.
What will completely disappear, and this is
actually a big improvement and the main motivation, is the
endless problems you have in current Python applications
that use a mix of 8-bit and unicode strings.
And occasionally, encoded unicode ends up in an 8-bit
string, so you have characters with a high bit set, and then
suddenly they will not interoperate happily with
actual unicode strings.
The thing is if you have an 8-bit string that only
contains ASCII characters, you can concatenate it or compare
it to a unicode string just fine.
And it will have sort of the proper semantics.
But if you have an 8-bit string that actually uses bit
number 8 of at least one of the characters in that string,
you suddenly cannot compare it or concatenate it
to a unicode object.
And unfortunately, this often happens after your application
has been deployed, especially web applications.
The developers live in the US.
They do a lot of testing, they type in their name, and
there's never an accented character around.
Then their first French customer enters their login
name and everything blows up.
Painful.
So we hope that by forcing you to sort of do all the
conversion between bytes and unicode at a much more
specified point slightly earlier in the life of the
strings, you won't--
I mean, you basically--
if you make a mistake, and you do not explicitly convert your
bytes to unicode, typing a name without accented
characters will also not work.
So you're much more likely to actually have effectively
tested your application for all use cases.
This has caused a lot of discussion, and I think that's
still an understatement.
There are lots of different implementation choices.
My personal choice would be, we'll go with basically the
unicode data type that we currently have in Python 2.--
well, since Python 2.0 it hasn't changed a whole lot.
It uses an internal representation that is either
two bytes per character or four bytes per character.
When it's two bytes per character, technically it's
UTF-16 because you can have surrogates in there, if you
care about that.
But the surrogates for most practical purposes look like
characters through the application unless you really
go to dive deep into unicode.
That is one possible implementation.
Another possible implementation would be to
keep a similar thing, but actually have three internal
representations.
One that is a single byte wide, one that is
two bytes wide, and one that's four bytes wide.
This means it's less easy to use some of the C Standard
Library that might exist, or extensions of the standard
library that might exist for working with unicode
characters on a particular platform.
On the other hand, it means that you would never have to
worry about surrogates, because the surrogates would
always be converted into 4-byte characters.
It means that if you have a string that contains 1
character that doesn't fit in 2 bytes, the entire string is
4 bytes per character.
That's a compromise currently.
You can compile Python in such a way that all unicode
characters are 4 bytes wide.
It's sort of a cultural choice whether it's worth
having the characters be wider and not having to worry about
surrogates.
The C API issues frankly are a mess.
I'm not going to spend much time describing that here.
Generally, my approach to Python 3000 is first I want to
get sort of the Python programmers APIs cleaned up.
And while it's too bad if extension writers will
temporarily have to deal with a sort of slightly messy set
of APIs, I mean, in C code you're used to
things being messy.
There is a different faction in the Python developer
community, or at least in the people who are quite vocal in
the Python 3000 list, which is not necessarily the same, who
would like to see things like--
well, the most extreme view is to actually support
variable-length encodings as the internal representation.
For example, if you have a large file containing unicode
data, you might want to read that into something that calls
itself string object but actually
still contains unicode--
UTF-8 bytes internally.
The problem with that is now I have 10 megabytes of UTF-8,
and I have a program that sort of tries to walk through that
code from the end or just randomly
accesses byte 7 million.
There's no way to find out where-- sorry, character 7
million-- there's no way to find out where character 7
million is without parsing the first 7 million characters.
You could try to optimize that.
Keep a cache of a couple pointers, but it gets messier
and messier and more and more complicated.
I'm not sure that that's at all a viable idea.
Maybe someone can prove me wrong by actually coming up
with an implementation, but I'm skeptical.
A slightly less ambitious, but still very controversial idea
is to optimize things like slicing operations and
potentially also
concatenations, so that if you--
for example, if you have a slice-- you have a string of
10 megabytes, and you take a slice of four megabytes out of
that string, currently Python always copies.
You could say, well, let's just share that array that
already contains those bytes.
I mean, after all, they're immutable
objects, they can't change.
Once they've been read into memory, they are there.
The object's not going to move.
Unfortunately, most of the implementations of that idea
are very easily lured into a worst-case behavior, where you
do something like you read repeatedly--
you read a megabyte string in and you slice 30 bytes out of
it or something.
And so now you have a 30-byte object--
a 30-character string object that references a slice of a
megabyte-long string object.
And you can't deallocate that megabyte until
you deallocate 30-byte--
the 30-character string.
And you can try to work around that with heuristics, like if
the slice is really small, you copy anyway.
Or if it's small relative to the size of the original, or
you can try to use weak references to sort of
dynamically copy.
And the only effect of that is that you have more and more
code that could go wrong, and less and less actual
performance benefit.
So I think in the end, the approach of very
straightforward, simple algorithms that always copy is
still going to be a winner.
But I'm trying to keep an open mind about this.
So the bytes type--
the best way to think of it is a mutable
sequence of small integers.
So it behaves a little bit like a list, but the values
you can store into it are limited to being integers.
They have to be positive and they have to fit in a byte.
It also behaves a little bit like a string.
There's a bunch of string methods that make total sense
for byte arrays like find.
On the other hand, certain string methods that are locale
dependent or character encoding dependent will
definitely not be allowed.
Like you will not be able to lowercase or
uppercase a byte string--
a byte array.
To go from a byte array to a string,
you use the decode method.
To go from a string back to a byte array,
you use the encode method.
And those always require an encoding parameter.
If you want some kind of default encoding, you're going
to have to dig it out of the environment yourself.
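As a minimal sketch of that rule (the utf-8 name and the b"..." literal spelling are assumptions about the eventual Python 3 notation):

    data = b"caf\xc3\xa9"            # a bytes object holding UTF-8 encoded text
    text = data.decode("utf-8")      # bytes -> str; the encoding is explicit
    back = text.encode("utf-8")      # str -> bytes; again explicit
    # comparing or concatenating data and text directly raises a TypeError
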
Bytes type has actually been implemented.
Some of the string behavior probably still needs to be
added, but in general I'm pretty happy with it.
You can actually already use it for I/O in limited
situations.
So that's a nice segue to the new I/O library, which is yet
another idea inspired by Java.
And you could also say it's inspired a little bit by Perl,
which also has stackable components in
its newer I/O library.
So at the very low level, you can read bytes from--
well, from a file descriptor, a file handle.
On Unix it's going to be a tiny object that wraps a Unix
file descriptor.
On Windows it's going to be a tiny object that wraps a
Windows file handle.
It provides read, write, close, seek and tell methods.
There is no buffering going on and it always
talks in terms of bytes.
It doesn't do any carriage return line
feed conversion either.
If you start on a brand new platform that is not at all
like Unix or Linux or Windows or Mac, you're going to have
to provide your own low level byte I/O implementation.
Most likely there's actually a Unix emulation library that
you could probably use, as long as you can turn off any
character translation features it might have.
I mean, that's a possibility for Windows, too, but on
Windows there are actually slightly lower level things
that are more efficient and more flexible.
But that's the only thing you have to do for a platform.
I mean, buffering, unicode, encoding, decoding, carriage
return, line feed translation--
all those things can then be built on top of that without
any platform specific stuff.
Using this, I expect that in most applications, unless you
are doing very messy stuff where you're sort of not sure
whether you're reading binary data or text, which of course
happens, you will not have to change your program.
The open function will continue to
return the file object.
You can tell it to open a binary file or a text file for
reading or for writing.
All those things will still work.
However, if you open a text file, read and
write will use strings.
If you open it as binary, read and write
will use byte arrays.
So that's probably--
if you're doing binary I/O, you're more likely to have to
change your code than if you're doing text I/O.
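A sketch of the text/binary distinction being described; the file names are placeholders, and the mode strings plus the encoding keyword (discussed just below) are assumptions about the eventual spelling:

    f = open("notes.txt", "r")                      # text mode: read() returns a str
    g = open("image.png", "rb")                     # binary mode: read() returns bytes
    h = open("notes.txt", "r", encoding="utf-8")    # explicit encoding, discussed just below
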
Now how does it decide on the encoding when you're doing
text I/O and you don't specify the encoding as
an extra open parameter?
Open will have a keyword parameter that will let you
specify an encoding, but if you don't specify that, it's
going to pick a default.
And I can imagine a number of different ways
of picking a default.
You could say, well we'll pick ASCII, or we'll pick UTF-8, or
we'll sniff the file and actually see whether it looks
like UTF-8 or UTF-16, little-endian
or big-endian encoding.
You may try to see what the user's environment says about
file formats.
There are a couple of different ways.
I mean, if you're dealing with a TTY device in the Windows
environment, I think the TTY device actually knows what
encoding to use.
So that would be another way to get
your encoding by default.
I expect that when you're opening a file for binary I/O,
you will not be able to use the
readlines or readline methods.
Unless it turns out that a lot of code breaks--
I mean, I don't actually honestly know if there is much
code around that has a legitimate reason for calling
readline on binary files, but there might be.
So we'll see.
An interesting thing is also how you're going to tie these
things to sockets.
But I think all the socket has to do is provide a little
wrapper that implements the same read/write operations
that the lowest level binary I/O object does.
And you have to somehow decide on what your encoding is.
[INAUDIBLE]
or ASCII or something else.
And then you will be able to read and write from sockets.
By the way, we're completely weaning ourselves off the C
Standard I/O Library for a number of reasons, mostly
having to do with the C Standard I/O Library not
actually always providing the functionality that we need.
Like it provides buffering, but it doesn't provide an API
to see how many bytes have been buffered, if there's
anything buffered.
It doesn't have a way of peeking in the buffer.
We need those things.
There's also this thing that the C Standard I/O Library
says that basically you could expect a segfault
or World War III when you read and then suddenly
start writing to the same file descriptor, even if the thing
was opened for reading and writing.
You still have to seek when you switch
from reading to writing.
Since the Standard I/O Library-- the C Standard I/O
Library doesn't promise that you get a neat error message
when you forget to seek in between, that's a really
unpleasant thing for Python to have.
So Python has to keep track of, are you
reading or are you writing.
So we end up sort of redoing too much of the C Standard I/O
Library's functionality anyway.
So we'll just throw it out and hopefully have a bigger and
better implementation.
So int/long unification is a really simple thing.
Currently, Python has small ints named int, and large ints
named long.
The large ints are actually arbitrary precision, so you
can represent numbers as long as they fit in memory.
The small integers are actually mapped to C long, so
they are 32 or 64 bits depending on what kind of
platform you have.
That was really a mistake, and I made that mistake sort of
very, very, very early on in Python's design.
And over the years we've made more and more compromises
where you can use int and it will actually behave as if it
were long if it doesn't fit.
Like in older versions of Python, if you kept
multiplying numbers together and the results got bigger and
bigger, at some point you'd get an overflow error.
In modern Pythons--
I think it started in Python 2.3 or so--
certainly in Python 2.4--
when the result doesn't fit in 32 bits or in 64 bits in some
platforms, you'll just get a long integer.
And more and more places, if it doesn't fit in a small
integer, we'll just give you a long integer even to the point
where if you call the int function, and somehow the int
function can do a couple of things.
It can convert a float, or a string, or another
integer to an int.
In most of those cases if the result is a valid integer, but
it doesn't fit in 32 bits, so it's a valid mathematical
integer, nowadays int will just return a long object.
And so the only place where you're still aware of the
difference between ints and longs is if you're explicitly
checking the type of your objects.
If you say, if isinstance(x, int), then do this,
otherwise do that, then your code won't work when someone
passes you a long, even if it's a long containing a very
small value.
So the long thing becomes less and less useful.
And in Python 3000, we're just throwing the type out.
We looked at a number of different implementations.
What we chose was actually taking the long implementation
and renaming it to int, at least at the Python level.
In the C level, the distinction between long and
int is still very much visible.
We did have to optimize it a little bit, because the int
implementation was traditionally very optimized,
like it has a cache of small integers and a couple of other
allocation tricks.
The long type was completely unoptimized.
I think the new long int type is somewhat optimized.
At least it has a cache for small values.
We're probably going to try to get that performance back up
to speed comparable to the best performance in Python 2.x
during the year after the 3.0 alpha 1 release.
I don't know how close we'll get, but I'm hopeful that some
smart people will be able to do magic there.
And it makes life for the programmer much easier because
you know you can actually write isinstance(x, int),
and it will do the right thing.
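A minimal illustration of the difference:

    n = 10 ** 100        # far too big for a machine word
    isinstance(n, int)   # Python 2: False, because n is a long; Python 3000: True
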
Unfortunately, I have no idea what time it is.
I'm worried that I might actually--
15 minutes?
Oh, excellent.
AUDIENCE: You can usually run over, and nobody cares.
*** VAN ROSSUM: Except the tape runs out.
AUDIENCE: [INAUDIBLE]
*** VAN ROSSUM: Doesn't matter.
I'll try to be done in 15 minutes.
I think that's OK.
So we have integer division.
And again, that was a very early mistake where I sort of
mindlessly borrowed behavior from C. If you divide 3 by 4,
it gives you 0.
It turns out that certain algorithms really sort of find
that a *** trap waiting to explode when you
least expect it.
So we're going to make 3 divided by 4 return 3/4 in
some kind of float representation.
And you can use double slash if you really wanted that 0.
Now that double slash operation has been in Python
2.x probably since 2.1 again--
2.2, OK, I believe you.
So you've had plenty of warning and there's also an
option you can pass to Python 2.x that will tell you when
you're using the single slash operator, and it is used on
integer operands.
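A short sketch of the two operators; the future import is the existing Python 2.x opt-in:

    print(3 / 4)     # Python 2: 0 (truncating); Python 3000: 0.75
    print(3 // 4)    # 0 in both -- explicit floor division
    # In Python 2.x you can already opt in to the new behavior with:
    #     from __future__ import division
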
So changes to exceptions.
We're getting rid of string exceptions.
We're also enforcing that all exceptions derive from a
single root exception type which is called BaseException.
In practice, you should derive all your exceptions from
Exception, which is slightly lower in the
hierarchy than BaseException.
But you can, if you know what you're doing, derive from
BaseException.
Also, we're going to move the traceback into
the exception object.
Again, I should mention, this is an area where
Java has been leading.
We're cleaning up the raise statement.
There are two different ways of raising an
exception with arguments.
You can say raise E parentheses arguments close
parentheses, or you can say raise E comma arguments, or
arguments in parentheses even.
But the second syntax was only necessary back in the day of
string exceptions, so we're getting rid of that.
If you want to pass a traceback, you call a
method on the exception object that you already created that
sort of adds a traceback object.
We're also changing the except clause.
When you're catching exceptions there's a pretty common
mistake where you wanted to catch two exceptions but you
forgot to put parentheses around them, and now you're
catching the first exception, and when you catch one, a
local variable is created with the name of the second exception.
In order to prevent that, instead of a comma between the
exception and the variable, we're going to
use the keyword as.
Also new is-- and this has to do with the exceptions now
sort of containing the traceback as an attribute.
We're going to delete that variable, if it still exists
at least, at the end of the except block.
We're basically going to put a try/finally in that
block that you won't see but will be there.
That deletes the value if it exists.
Which means that if you want that value--
if you want that exception value to survive beyond the
except block, you have to just assign it to a
different local variable.
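A minimal sketch of the new except syntax and the implicit cleanup just described (risky is a hypothetical function, defined here only so the snippet runs):

    def risky():
        raise ValueError("boom")          # placeholder, for illustration

    try:
        risky()
    except (ValueError, OSError) as e:    # "as" replaces the old comma
        saved = e                         # bind to another name if you need it later;
                                          # e itself goes away when the except block ends
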
So we're not going to do optional type checking, but we
are going to add some syntax that will allow other people
to implement frameworks that do something like type
checking, or whatever they would like to do.
Basically, currently every parameter of a function has a
default value.
Well, it can have a default value.
We can now also associate an
annotation with every parameter.
The annotation is introduced by a colon; the default value,
of course, is introduced by an equals sign.
You can combine those: the colon, annotation, equals
sign, expression notation.
You can also annotate the function returned
value with an arrow.
All those things are evaluated when the function is defined.
So at the same time the function object is created,
both the default and the annotation, which are just
generic expressions, I have no constraints on
that, but they must--
if they reference variables, those variables must exist at
that point in time.
And then you can pull those annotations out of the
function object by asking for the func_annotations attribute
of the function.
And that's just the dictionary indexed with variable names
and the keyword return.
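A sketch of that notation; the annotation values are arbitrary expressions chosen for illustration, and func_annotations follows the attribute name used in the talk:

    def parse(source: "text to parse", strict: bool = True) -> "a parse tree":
        ...
    # parse.func_annotations would then be roughly:
    # {'source': 'text to parse', 'strict': <class 'bool'>, 'return': 'a parse tree'}
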
If you want to do something with this, you'll have to do
it yourself.
I can imagine all sorts of decorators or metaclasses that
make good use of this to enforce all sorts of things
from actual type checking to automatic adaptation and a
number of other interesting things.
I'm not going to put anything like that in the language, at
least not in 3.0.
Another small change to function signatures,
completely independent from the previous one-- both of
these have been implemented by the way.
Sometimes it's really helpful to have a parameter that is
required to be used as a keyword in your call syntax.
If you really want to enforce that in Python 2, you can use
star star keywords and sort of pull it out of the star star
keywords dictionary.
But it's kind of messy, and you have to sort of check for
each of the keywords that you might expect and check that
there isn't anything else in there in order to be sort of
robust and user friendly.
Now you can just use this strange notation where there's
a star without--
I mean, the star, of course, normally means star--
you can use it already as star args, which means we have a
variable number of positional arguments here that gets
returned as a tuple.
Now if you leave the name out from that syntax, you just
have a star without the star args.
And then you cannot specify arbitrary positional
arguments, but you can, after that, specify more arguments
that will then be required to be keywords.
And they don't even have to have defaults.
So after that star, you could have c is 42, so that's an
optional keyword parameter, but d doesn't have a default
value, so that's a required keyword parameter.
So every call to foo in that case must specify a value for
d, and it must specify it using the keyword notation.
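A sketch of that signature, using the names from the example just described:

    def foo(a, b, *, c=42, d):
        return (a, b, c, d)

    foo(1, 2, d=5)        # fine: d given as a keyword, c takes its default
    # foo(1, 2, 5)        # would raise TypeError: d can only be passed as a keyword
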
Set literals, very simple.
You put a number of expressions in curly braces
and it creates a set object.
Except if there is nothing between the curly braces, it
still creates a dictionary.
At some point I tried to propose to unify the
dictionary and the set object.
That didn't get a lot of support from
the developer community.
If you really want frozen sets, it turns out frozen sets
are only very, very rarely used.
You'll have to cast that thing explicitly to a frozen set.
Or, of course, you can use a frozen
set with a list argument.
We're also going to implement set comprehensions.
Those are not yet in the code base.
It works the same way as a list comprehension except it
returns a set.
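A small sketch of the notation described here:

    s = {1, 2, 3}                         # a set literal
    d = {}                                # still an empty dict, as noted above
    empty = set()                         # how you spell an empty set
    evens = {x * 2 for x in range(5)}     # a set comprehension (planned, not yet implemented)
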
Absolute import--
you can already do that in Python 2.5.
from __future__ import absolute_import--
that means that if you import a module using import foo or
something like that, inside the package, normally in
Python 2.4 and before, it first sees--
tries to find that foo in the package.
If it's not in the package, it looks in sys.path.
In 3.0 or in 2.5, if you have that future statement in your
module, it's not going to look in the package.
That solves a particular ambiguity where you might have
a module in your package that has the same name as a module
in the standard library--
the top level module in the standard library.
Currently, without this future import, there's no way to
reach out and actually import the standard library module
because the one in your current package will always be
seen first.
Well, you could dig it out of sys.modules, but only if
it's already been imported by someone else.
If you want to say, I definitely want the foo that's
in my package, rather than potentially the one on
sys.path, you can say from dot import foo.
That's also already in 2.5.
The only difference, really, is that in 3.0, you always
have that future statement automatically
implied in your code.
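As a sketch, inside a package module (foo is a placeholder module name):

    from __future__ import absolute_import   # the Python 2.5 opt-in; always implied in 3.0
    import foo            # absolute: the top-level foo, not this package's foo
    from . import foo     # explicit relative: this package's own foo
    # (in real code you would pick one form or the other, not both)
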
Exec--
in very early Python versions this actually was a function.
It takes an object which is either a string or a code
object or a file, and then optionally globals and locals.
At some point I thought that the compiler could make good
use of the fact that you were using exec somewhere in a
function and I decided that in order for the compiler to know
about it, it would have to be a statement.
Well, compiler technology has advanced a little bit.
And you can actually tell fairly reliably whether you're
using a function like this.
So there's no need for it to be a statement, and it's
actually easier to have it as a statement.
Sorry, as a function.
So it's back to being a function.
The interesting thing is this is very easy to do in Python
2.x also, because since it once was a function, that same
syntax with a tuple of up to three values is also still
supported in 2.x.
So range--
just like we have keys and iterkeys, we
have range and xrange.
Because range was there first, range creates a list of
integers, potentially many.
Xrange produces only the integers that you ask for.
So we're going to change that so that there's only going to
be a function named range, but it will
behave mostly like xrange.
The difference is the current xrange is optimized so that it
actually only works for integers that are less than
sys.maxint.
And Neal Norwitz has a patch to fix that, but I'm still
waiting for him to upload the patch or something.
Zip--
this is actually a pretty minor issue.
Zip is something that would be a very good candidate for
returning an iterator in Python 2 when it was-- except
it was introduced before iterators existed.
So there's that izip thing in itertools that does return an
iterator; it makes much more sense for zip to be an
iterator in the language.
So string formatting has a couple of problems, and there
is a PEP which I hope will be implemented.
I'm certainly in favor of the proposal to give strings a dot
format method and to use curly braces instead of percent
something as the indicator for replacement
in the format string.
Here, quickly, are a couple of examples.
You can specify format arguments by positions, 0 and
1, or by name, foo.
If you want to include little curly braces
you can double them.
You can even access attributes or use get item dictionary
notation in simple cases on the formatting object.
You can also specify parameters after a colon.
I think that is actually borrowed from .NET.
Although I'm not sure that we are taking
exactly the same notation.
Read the PEP if you're interested.
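A few examples in the spirit of that proposal (the exact format-spec details are in the PEP; the values here are invented for illustration):

    "{0} is {1}".format("Python", 3000)                       # by position
    "{foo} costs {price:.2f}".format(foo="tea", price=2.5)    # by name, with a format spec
    "{{}} gives literal braces".format()                      # doubled braces
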
So this is something that's actually probably not going to
make it, but I'm mentioning it anyway, because it is
potentially an interesting feature.
It's just there are a couple of difficult
decisions to be made.
I mean, it's very easy to come up with a decent
switch style syntax.
You can say switch expression, case
expressions, blah blah blah.
The question is, when do you evaluate the case expressions.
In order to actually benefit from a potential speedup, like
you could do a dispatch based on a dictionary, you would
like to precompile those case expressions.
For example, compile them at the time the function is
defined rather than each time the function is invoked.
But that limits you to actually constants.
And that's not a concept we currently have anywhere else
in the language, which makes it somewhat problematic sort
of conceptually.
Which is why we haven't implemented it yet and it's
marked with both stars and question marks.
Another thing that is more likely to make it, even though
it's slightly ugly, if you have a function, an inner
function that references a variable defined in an outer
function, you can use it, but currently you
cannot assign to it.
You can modify it if it's a mutable object, like if you
have a list object checked in the outer function, you can
append to that list or even index it and change an element
of that list.
But you cannot replace it with a different list object using
plain assignment.
Turns out that there are enough places where people
would like to have that functionality.
And we had a long discussion where Ka-Ping Yee did a
brilliant job of summarizing the discussion and sort of
guiding it towards perhaps not final completion, but at least
closure so that everybody could agree with what was
written down in the PEP.
We're pretty much settled on the
syntax and on the semantics.
The only thing is there are different flavors of keyword
that sort of each have their own advantage and
disadvantage.
Nonlocal is the current favorite.
It's sort of ugly because it's a long word and it has sort of
a negative meaning.
Unfortunately, the only real contenders
were global and outer.
Where the problem of global is that, global for most people's
minds, has fairly set semantics, which really
doesn't mean just go search outward scope by scope by
scope, but really go all the way to the outermost scope,
the global scope.
So that's why--
even though global was my favorite, nobody else seemed
to like it very much.
And I have to respect my users.
Outer was a nice candidate until we found how often that
word is already used as a variable name, or function
name, and that made it much less attractive.
So it's probably going to be nonlocal, which is not
something people tend to use a lot as variable names.
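A sketch of what the new keyword would allow, assuming the nonlocal spelling wins:

    def make_counter():
        n = 0
        def increment():
            nonlocal n      # without this, "n = n + 1" would just create a new local n
            n = n + 1
            return n
        return increment
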
So another very speculative thing is
abstract base classes.
And we had long discussions about
interfaces, generic functions.
Abstract base classes actually, the more I think
about it, the more attractive they look from the perspective
of a somewhat voluntary declaration of, I implement a
particular protocol.
But a protocol is a very informal concept.
We've had concepts like protocols in
Python for a long time.
We've been talking about sequences and mappings as sort
of implementing certain operations and not others.
The problem is if you have an actual object and you don't
know whether it's a sequence or a mapping, there's not
really a good way to decide which one it is.
You can check whether it has a keys method, but there are
actually some cases where you have something that really
behaves like a mapping, but it maps an infinite number of
keys, and you really don't want to implement a keys
methods that tries to enumerate all of them.
So if there was an abstract base type that didn't provide
any semantics or implementation, but just
serves as a marker class, I am implementing the sequence
protocol or I am implementing the mapping protocol or I am
implementing the file protocol.
And it's probably going to be a couple of--
there is going to be more fine-grain distinctions, like
you have readable files and writeable files, and readable
and writeable files, and you probably have mutable
sequences and immutable sequences, and very basic
mappings that only implement the map operation and sort of
very complete mappings that implement lots of other
functionality like update and keys.
But if I get time between now and April, I'll write a PEP
about this and then implementing it
is going to be simple.
But this is something that just adds some stuff.
It's going to be easy to make all the standard types declare
what stuff they implement.
And then it's just up to user code to
voluntarily follow this.
I mean, we won't stop you from implementing sequence protocol
methods without declaring that you're a sequence.
But the carrot in this case is that if you want to interface
with a large framework like Zope or Twisted or something
like that, it might be that eventually future versions of
those frameworks that work under Python 3.x will
actually, instead of sniffing which methods are implemented,
will actually just look at the base classes.
That's the hope, anyway.
So I'm going to skip the miscellaneous changes.
You can get the slides from the web eventually.
This is mostly cleanup of very small stuff.
Library reform is not my own idea of fun.
I like to focus on the language.
Language is big enough that--
other people are interested in reforming the library.
There's currently not a lot of activity going on.
It's certainly something that I think is a fine project to
do after we've released the alpha 1
release of the language.
So again the C API--
I'm currently not too worried.
I'm just randomly changing the C API as object types change.
Of course, if you're writing a third party extension that's
not already part of the Python source tree, you would like to
know what's going to happen.
At this point, the only thing I can promise is I'm not going
to change functions to have a different signature
but the same name.
Or different semantics even with the same signature.
I'm going to add APIs, I'm going to delete APIs that are
no longer relevant or impossible to implement.
I'm not going to change APIs in an incompatible way that
would break your code.
I am going to require everyone to recompile their code.
That's the minimum I can expect.
So if your compilation passes, you're somewhat likely to
actually have a working extension.
Best case scenario.
If you're using APIs that no longer exist, you'll get a
clear compile time error about something that doesn't exist
or maybe a link time error.
So now you have a bunch of Python 2.x code and you want
to turn it into Python 3.0 code.
Well, you could just try to run it with 3.0 and fix all
the syntax errors and then fix all the runtime errors.
Hopefully you have unit tests.
That's going to be pretty tedious because there--
even though the general flavor of the language doesn't change
much, there are clearly a lot of small changes
that really add up.
Classic classes, except-as, different raise syntax, no
comparisons, keys, dictionary views are going to affect a lot
of people; print statements, of course, are going to affect a
lot of people.
Unicode is going to be a major deal for at least some people,
so there is a conversion tool.
Now, we cannot do a perfect conversion because in some
cases it's inevitable that you have to do a symbolic
execution of the application in order to find out what the
types of a particular variable are before you know how to
convert a particular call.
I mean, if I say x dot keys, there's no guarantee that x is
actually a built-in dictionary.
It could be a completely unrelated object that has a
keys method.
However, there's a good chance that it is a dictionary.
If you have something that has an iterkeys method, there is
an even bigger chance that it's a dictionary.
So what we're doing is we have a tool that parses your code
and looks purely at the parse tree and is able to transform
that parse tree in place and then write it back out.
And we annotate this parse tree with exactly where the
white space is and where your comments are.
So in theory-- certainly, I know because I have tested it--
if you don't make any transformations, the output is always
exactly the same as the input.
Every single white space character.
That conversion is perfect.
Now if you make transformations, sometimes
it's possible that you would lose a comment if that comment
sort of is in the middle of an expression that gets
completely discombobulated and transformed into something
completely different.
That's not very likely to happen, because how often do
you have significant comments between the parameters of a
function or right after a binary operator?
Not so common.
So if you're interested in looking at this code,
currently you have to go to svn.python.org, find the
sandbox, and go to the 2to3 subdirectory.
It's relatively easy to add new conversions.
I mean, I've had a couple of Python developers who started
contributing conversions actually.
That's been really great.
The idea is you write a pattern that decides, I want
to match certain nodes in the parse tree that look like--
that match the pattern.
And the pattern completely ignores what the comments say.
It purely looks at what the parser actually sees.
So there are really two parts to the parse tree.
There's the annotation for white space and comments, and
there is the syntactic tokenization and parse.
So the matching is purely concerned with matching nodes
and leaves in the tree.
And I'll show the pattern--
syntax in a minute.
So you write your pattern, and then you write a
transformation function that sort of picks the node you
find apart and puts it back together in a different order
and returns that new node.
And some caveats--
then there's a framework that does all the rest of the work,
like traversing the entire tree looking for all the nodes
that match the pattern and calling your transformation on
each of those.
Sort of a separate strategy that is also going to help is
Python 2.6--
by default it will just be Python 2.6, but it will have
an option where it will warn about things that will go out
of style in Python 3000.
It will probably also backport certain Python 3000 features
so you can start using those.
I don't want to give examples because not much of that has
actually been implemented.
Maybe Thomas can talk about that next week.
So here are a couple of things that the transformer
is really good at.
It can take a call to apply and turn it into the more
modern notation using star args and star star keywords.
And as long as you don't have a local variable named
"apply," this is going to do the right thing.
And it will put extra parentheses around the
function or the arguments, if necessary, to make sure that
it doesn't sort of get affected by nearby operators.
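For example, as a sketch of that transformation (func, args, and kwargs are placeholder names):

    # Python 2 input:
    #     result = apply(func, args, kwargs)
    # converted output, as described:
    #     result = func(*args, **kwargs)
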
Slightly less perfect but still pretty close, it turns
everything that says iterkeys into keys and
iteritems into items.
It can also do a really good job with exec.
It can do a really good job with print.
It can do a really good job with except clauses.
It also recognizes has_key, assuming that you don't have,
again, a user object that happens to implement has_key.
I found one example in the standard library where the BSD
wrapper library actually has a two-argument has_key where the
second argument I think passes in transaction state.
So I'm not quite sure what to do with that.
So just don't convert that one.
But otherwise, turning d.has_key(k) into k in d,
again making sure to parenthesize subexpressions or
the whole thing as necessary based on the context.
So it doesn't add parentheses unless they are necessary to
disambiguate stuff.
On the other hand, if you have redundant parentheses in your
input, you will have the same redundant
parentheses in the output.
It's very simple to turn the less than, greater than
notation for unequal (<>) into the exclamation point
equals sign (!=).
It can turn backticks--
it can even turn long into int.
I've found that actually these things were not quite enough
to get most of the unit tests to pass.
The problem is that a very popular testing framework in
Python is called doctest.
And it works by having documentation strings so
they're just string literals due to parser containing
fragments of Python sessions--
interactive Python sessions that, in theory, you could
just cut and paste them out of your shell window into your
Python source code.
And then there's a framework that automatically tests--
it's sort of a regression framework that checks that
those examples still have the same output as they had when
you pasted them in.
Since all this stuff is inside string literals, it's not so
easy to see how we could convert those, because we
can't just go scan all the string literals and assume
that they contain Python code and turn everything that looks
like a print statement into a print function call.
However, what you can do is--
it turns out that, at least for the doctest stuff,
doctests are pretty recognizable, because they have to start
with a Python prompt, three greater than signs, and if
there are continuation lines they have to start with three
dots, and they all have to be sort of indented the same way.
So with very great reliability I parsed the doctests out of
the source file.
You have to actually run the tool a second time-- maybe
eventually I'll combine that-- currently you have to run the
tool a second time.
And it will just scan the source code
looking for doctests.
And this was a great relief.
I mean, at some point I was a little panicky because I
realized how much unit testing code I would
have to convert manually.
And then I realized I just have to do this.
The only place where it broke down tremendously was the
doctest for the doctest module itself, which applies this
trick recursively.
There, I just ran the test and sort of fixed the thing
manually until it worked.
Nothing else I could do.
Now there are also a whole bunch of things that this
conversion unfortunately cannot do.
If it sees d.iterkeys(), it has no way of knowing whether
d is actually a dictionary.
If it sees d.keys(), it has no way of knowing whether
you're going to expect that thing to be a list or not.
If it sees x / y, it has no way of knowing whether, when
you execute that code, x and y are integers or not.
So it's not able to turn that single slash into a
double slash.
It can't find code that somehow depends on being able
to order objects of different types.
It certainly doesn't clean up your code or remove redundant
definitions.
If you write your own code that emulates a dictionary and
reimplements the mapping protocol, it's not going to
touch that.
It's also not going to fix your string exceptions.
Basically, all it can do is match on a parse tree.
Stuff that you can reliably, or mostly reliably, fix by
looking at the parse tree alone is a good candidate for this tool.
I don't know if that's going to be enough.
Maybe at some point, we'll have to add understanding of
variable scope and things like that so it can actually tell
whether a particular occurrence of a variable named
"apply" is, in fact, the built-in in
function "apply" or not.
I'm currently hoping that we won't need to do that.
Otherwise we would probably have to somehow merge this
tool with PyChecker, which would be quite a refactoring.
So if you're interested--
I'm actually probably going to skip this--
this is what the matching notation looks like.
You basically use the names that are also used in the
grammar file.
Python has its own grammar file.
Here are a couple of examples.
power is a symbol in the grammar: a power is an atom
followed by zero or more trailers, and then optionally
followed by a double star and something called a factor.
And there are a couple of alternatives for what an atom
is, and a definition of what a trailer is.
And there are several hundred lines like this that
make up the entire Python syntax.
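For reference, the relevant lines of CPython's Grammar file read roughly like this-- quoted approximately, since the exact file varies a little between versions:

    power: atom trailer* ['**' factor]
    trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME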
So our conversion tool actually reads that file with
the Python syntax at the start of a run and builds a
parser customized to that syntax.
So it's actually very easy to change the syntax that the
conversion tool uses; you just have to edit one text file.
The trick I use in the patterns is that I use the same
notation as in the grammar, plus regular-expression-style notation.
So you can match here a pattern-- the pattern power-- and
the angle brackets specify that inside this node labeled
power, I must match the following thing.
So this is--
we want to match a power that starts with, well, one or more
nodes of any type, but they must be exactly at that level.
And then a node of type trailer with a particular
substructure, namely the trailer alternative that has a
dot followed by a name.
And the name, in this case, must be iteritems.
And then it can have more trailers.
That's an example of a matching rule that's close to
actually the rule I use for fixing iteritems.
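Written out, a pattern along those lines would look roughly like this-- my reconstruction of the slide, not necessarily character-for-character the shipped rule:

    power< any+ trailer< '.' 'iteritems' > trailer* >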
So if you have the expression a[0].iteritems(), the parser
sees that as an atom containing a, and then a node that's a
trailer-- that's the square brackets-- another node that's a
trailer-- the dot iteritems-- and another node that's a
trailer-- that's the parentheses.
And that happens to match this pattern as follows.
The first two together actually match the any plus.
Then follows the trailer which happens to match a trailer
with that particular substructure.
And then the final trailer matches the trailer star.
And you can nest these things as much as you want, and it's
relatively efficient in just traversing the tree and
finding matches.
What your transformation function gets is the node that
matched the top level of the pattern.
It also gets a dictionary containing elements-- sort of
subnodes of that node.
And what I didn't show-- what I'm not showing here is you
can add names to any particular
section of the pattern.
You can say, oh, this subpattern, call that foo.
Or call this other subpattern bar.
And then you can sort of pull all those subnodes that match
those subsections out, and you can rearrange those in a
different order.
That's, for example, how you do the apply thing.
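As a sketch of how named subpatterns and a transformation function fit together-- written against the fixer API as it later shipped in lib2to3, which may differ in detail from the tool as it existed at the time of this talk; FixIteritems here is a simplified stand-in, not the real fixer:

    from lib2to3 import fixer_base
    from lib2to3.fixer_util import Name

    class FixIteritems(fixer_base.BaseFix):
        # "method" names the sub-pattern, so transform() can pull the matched
        # leaf out of the results dictionary and replace just that piece.
        PATTERN = "power< any+ trailer< '.' method='iteritems' > trailer* >"

        def transform(self, node, results):
            method = results["method"]
            method.replace(Name("items", prefix=method.prefix))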
So here's the slide that you're all waiting for.
What can you do today?
Well, my first recommendation is don't worry about the
changes that the transformation tool can
actually take care of.
I mean, my first version of this slide actually started
out with, OK, so use star args instead of apply, and use
raise Exception(value) instead of raise Exception, value.
And then I realized, no.
You shouldn't have to worry about all the stuff that we
can transform purely syntactically.
I mean, it's unlikely that you'll be able to write code
that is both valid Python 2.6 source code and valid Python
3.0 source code.
So you're going to have to run the
transformation tool anyway.
What you can do is make things easier so that after you've
run the transformation, you actually end up
with working code.
Using Python 2.6 means that you can use Python 2.6's
warnings to find certain things that the transformation
tool cannot handle.
It's always a good idea to have unit tests so you can sort
of see if the semantics of your new code are still what
you expect them to be.
And then there's a couple of things that the transformation
tool does not handle.
Like if you extract the keys from a dictionary and then you
sort the resulting list, the transformation tool is not
smart enough to correlate that the variable you assigned on
line 1 is being sorted on line 27 or on line 2, even.
But you can, starting today, use the built-in sorted
function, which is available in Python 2.4 and up.
And then you have code that can be
easily transformed correctly.
And similarly, if you really have a good reason to want the
return value of keys as a list, call list and pass it
the result of iterkeys.
The iterkeys call will be transformed by the
transformation tool, so it will still be a list and it
will be just as efficient in 2.6 as in 3.0.
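Spelled out as Python 2.6 code-- the form you would feed to the tool; the dictionary d is just a throwaway example:

    d = {"b": 2, "a": 1}

    # Instead of:
    #     keys = d.keys()
    #     keys.sort()
    keys = sorted(d)               # available since Python 2.4, converts cleanly

    # If you really need a list of keys, say so explicitly:
    key_list = list(d.iterkeys())  # the tool rewrites iterkeys to keys,
                                   # so in 3.0 this reads list(d.keys())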
Another thing you can very easily do is make sure that
all your exceptions are actually using classes derived
from Exception.
You can also make all your classes that aren't
exceptions-- that don't have a base class-- derive from
object, so that they're new-style.
There are certain semantic differences between classic
classes and new-style classes.
By converting them to new-style classes now, you catch
those semantic differences while you're sort of thinking about it.
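A minimal sketch; the class names ConfigError and Config are made up for illustration:

    class ConfigError(Exception):   # an exception class derived from Exception
        pass

    class Config(object):           # explicit "object" base: a new-style class
        def __init__(self, path):
            self.path = path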
And then with print-- don't worry about the print syntax.
I recommend that you just use the print statement and rely
on the transformer to turn it into function calls when
the time comes.
But be aware of the two cases where the transformation tool
doesn't do the right thing-- I think I showed it on the
slide about print-- if you have a string ending in a
newline or a tab.
Another thing you can do now is make sure that your code
uses a double slash where you expect an integer division.
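For example:

    total, count = 7, 2

    ratio = total / count     # in 3.0 this is 3.5; in old 2.x code it was 3
    chunks = total // count   # // always means integer division, in 2.x and 3.0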
So now we have--
well, in theory, we have five minutes for questions, if
anybody has the energy.
Yes.
AUDIENCE: You rejected the idea of using UTF-8 or UTF-16
for strings, because that would make random access
impossible to implement in order 1.
I am curious why you think that order-1 random access to
individual characters is a necessary property of strings.
*** VAN ROSSUM: So the question is why do I not want
strings to use an internal UTF-8 or UTF-16 representation,
and why do I think that order-1 indexing
of strings is important.
I think because it's a tradition in Python, unlike
some other languages, that we actually write a lot of code
that sort of traverses a string and keeps track of a
particular index.
There's just lots of code that indexes the string.
I mean, it's very common to say: if s.endswith('.py'),
return s[0:len(s) - 3].
That's all I can say.
Common idioms in Python code use slicing, which uses
numerical indices quite a bit.
And pattern matching is used much less.
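Spelled out-- strip_py_suffix is just a made-up name for the idiom being described:

    def strip_py_suffix(s):
        if s.endswith(".py"):
            return s[:len(s) - 3]   # slicing with a numeric index
        return s

    print(strip_py_suffix("example.py"))   # prints "example"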
AUDIENCE: [INAUDIBLE]
*** VAN ROSSUM: OK.
So the question is, can the transformation tool
potentially be abused for other purposes.
I think it definitely can.
There's nothing that says you have to use it to
transform it to--
you don't have to use it to transform
Python 2.x to 3.x code.
I mean, you can make the input syntax whatever you want it,
and you can slightly alter the driver so that instead of
transformations, you just get error messages if you match
certain patterns.
That's an excellent idea, actually.
AUDIENCE: [INAUDIBLE]
*** VAN ROSSUM: I didn't get the last few words, but your
question is, did I consider some other string abstraction
that would not make it necessary to rely on the
indexing so much.
AUDIENCE: As an adjunct to [INAUDIBLE].
*** VAN ROSSUM: Oh, I see.
Your question is specifically could we have an additional
string class that has sort of a different model.
That's a reasonable question.
I hadn't really considered that.
I see it as a library issue.
I think I would encourage people to sort of write custom
string classes that might be more efficient for certain
situations.
And you can probably write them by--
you can implement them in Python by using a byte array
and a thin layer on top of that.
Or if you're really interested in super performance, you can,
of course, do it all in C. I mean, that's the beauty of an
extensible language.
It doesn't all have to be in the standard library.
In the back?
AUDIENCE: [INAUDIBLE]
*** VAN ROSSUM: Sorry.
Could you speak up?
It's getting noisy.
AUDIENCE: [INAUDIBLE]
*** VAN ROSSUM: OK.
So yes.
So the question is, there's going to be a long period
where library developers--
third party library developers especially--
will sort of be required to maintain a 2.6 and a 3.0
version of the same library.
Or maybe even going back to earlier versions than 2.6, is
the expectation that they limit themselves to code that
can be automatically transformed to 3.0.
Expectation is a strong word.
I would recommend that because I expect that that is the sort
of least painful way for library developers to go.
Of course, if you have an existing library that has
backwards compatibility requirements going back to
Python 2.2 or some time even before, it becomes gradually
harder to maintain your source-- that code in a form
that can still be transformed.
I mean, if you're in the lucky situation that you can
actually say, 2.6 is the oldest version of Python I
support, then at least you can use some of the 3.0 features
that will be backported to 2.6.
But I think the syntactic conversion approach will work.
I mean, there's no reason that the transformer couldn't
convert Python 2.2 code to 3.0.
It would just sort of--
the subset of Python 2.2 that actually is validly
transformable into 3.0 is slightly smaller.
I would recommend that.
I mean, the bigger nightmare is for developers who have
extension modules, because the C API is going to be--
it's going to be a rougher ride, unfortunately.
Well, if you all aren't exhausted, I certainly am.
So I thank you for staying all the way until the end.
[APPLAUSE]