CSC401: Strings and Lists

Strings

An immutable sequence of characters

No separate character type

Immutable: cannot be modified in-place

Safety

Efficiency

Sequence: can be indexed

Indices start at zero

Built-in function len() returns the length of a string

String Indexing

element = "boron"
i = 0
while i < len(element):
    print element[i]
    i += 1
b
o
r
o
n

Slicing

a[start:end] is the elements of a from start up to (but not including) end

Like the C/C++/Java loop: for (i=0; i<N; ++i)

val = "helium"
print val[1:3], val[:2], val[4:]
el he um      # Note that the lower index is inclusive, but the upper index is not
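
The half-open convention has two handy consequences: when both indices are in range, a[i:j] holds exactly j - i characters, and a[:k] + a[k:] always rebuilds a. A minimal sketch of both; the names i, j, k and middle are only for illustration, and print is called with a single parenthesized argument so the script should run on Python 2 and Python 3 alike:

```python
val = "helium"
i, j = 1, 4

# A slice with in-range indices contains exactly j - i characters.
middle = val[i:j]
print(middle)                 # eli
print(len(middle) == j - i)   # True

# Splitting at any position k and gluing the halves back gives the original.
for k in range(len(val) + 1):
    assert val[:k] + val[k:] == val
```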

Bounds

Bounds always checked for item access

See how to handle errors later

But out-of-range slice indices are truncated

Inconsistent, but convenient

val = "helium"
print val[1:22]
x = val[22]
elium
IndexError: string index out of range
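
The difference between the two behaviours can be seen in one script. This is only a sketch: it uses try/except (covered later) to keep running after the error, and the except ... as form assumes Python 2.6 or newer. The names clipped, beyond and message are illustrative:

```python
val = "helium"

# Slice bounds are clipped to the string, so these never raise.
clipped = val[1:22]
beyond = val[10:22]        # entirely out of range: the empty string
print(clipped)             # elium
print(repr(beyond))        # ''

# Item access is always checked.
message = None
try:
    x = val[22]
except IndexError as err:
    message = str(err)
    print("IndexError: " + message)
```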

Negative Indices

Negative indices count backward from the end of the string

x[-1] is the last character

x[-2] is the second-to-last character

Negative Indexing Example

val = "carbon"
print val[-2], val[-4], val[-6]
print val[1:-1]
print val[-1:1]
o r c
arbo
# the empty string
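
One way to read a negative index is "add the length first": val[-k] is the same character as val[len(val) - k]. The sketch below checks that for every position and shows why val[-1:1] is empty; the names n, inner and empty are illustrative:

```python
val = "carbon"
n = len(val)

# val[-k] names the same character as val[n - k], for k = 1..n.
for k in range(1, n + 1):
    assert val[-k] == val[n - k]

# Negative slice bounds are converted the same way, so
# val[1:-1] is val[1:5], and val[-1:1] is val[5:1], which is empty.
inner = val[1:-1]
empty = val[-1:1]
print(inner)          # arbo
print(repr(empty))    # ''
```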

Immutable

The contents of strings can't be modified

greeting = "hello world"
greeting[0:4] = "goodbye cruel"
TypeError: object doesn't support slice assignment

But we can put a new string inside the greeting variable

How would we rewrite the above program to actually work?

greeting = "hello world"
greeting = "goodbye cruel" + greeting[5:]
print greeting
goodbye cruel world

String Methods

Strings are objects, with methods

Yes, it does look a lot like Java, doesn't it?

Not so much convergent evolution as convergent laziness

s.capitalize()              Capitalize first letter.
s.lower()                   Convert all letters to lower case.
s.strip()                   Remove leading and trailing whitespace.
s.rstrip()                  Remove trailing (right-hand) whitespace.
s.upper()                   Convert all letters to upper case.
s.count(pat, start, end)    Count occurrences of pat; start and end are optional.
s.find(pat, start, end)     Return index of first occurrence of pat, or -1; start and end are optional.
s.replace(old, new, limit)  Replace occurrences of old with new; limit is optional.
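
A short sketch of these methods in use. Because strings are immutable, each call returns a new string and leaves s untouched. The sample text and the name stripped are made up for illustration:

```python
s = "  hydrogen and helium \n"

stripped = s.strip()
print(stripped)                        # hydrogen and helium
print(stripped.capitalize())           # Hydrogen and helium
print(stripped.upper())                # HYDROGEN AND HELIUM
print(repr(s.rstrip()))                # '  hydrogen and helium'
print(stripped.count("h"))             # 2
print(stripped.count("h", 1))          # 1 (search starts at index 1)
print(stripped.find("he"))             # 13
print(stripped.find("xenon"))          # -1
print(stripped.replace("h", "H", 1))   # Hydrogen and helium

# The original string is unchanged by all of the calls above.
print(s == "  hydrogen and helium \n")  # True
```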

Lists

A mutable sequence of objects

Like a resizeable vector

List literals: [], [3], [5, "b"]

Empty list is false

Indexed just like strings

x = ["a", 2, "bcd"]
print x[0], x[-1], x[1:-2]
a bcd []
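
"Empty list is false" means a list can be used directly as a condition. A small sketch; the helper describe() is hypothetical and exists only for this example:

```python
def describe(items):
    # An empty list counts as false in a condition; any non-empty list is true.
    if items:
        return "has %d item(s)" % len(items)
    return "empty"

print(describe([]))          # empty
print(describe([3]))         # has 1 item(s)
print(describe([5, "b"]))    # has 2 item(s)
print(describe([[]]))        # has 1 item(s), because the inner empty list is still an element
```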

Updating Lists

Can modify lists by assigning to their elements

Unlike strings

x = ["a", "b", "c", "d"]
i = 0
while i < len(x):
    x[i] = i
    i += 1
print x
[0, 1, 2, 3]

Nesting Lists

Lists of lists of lists of...

Numeric library (since succeeded by NumPy) gives true multi-dimensional arrays

Index from the outside in

Can write nested lists directly: [[1, 2], [3, 4]]

x = [[13, 17, 19], [23, 29]]
print x[1]
print x[0][1:3]
[23, 29]
[17, 19]
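
"Index from the outside in" can be sketched with the same nested list: the first index selects an inner list and the second selects an element within it. The value 31 is an arbitrary replacement chosen for the example:

```python
x = [[13, 17, 19], [23, 29]]

# The first index picks an inner list; the second picks an element of it.
print(x[1][0])        # 23
print(x[0][-1])       # 19

# Inner lists may have different lengths.
print(len(x))         # 2
print(len(x[0]))      # 3
print(len(x[1]))      # 2

# Assigning through two indices changes the inner list in place.
x[1][0] = 31
print(x)              # [[13, 17, 19], [31, 29]]
```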

Indexing Hands Back Content

Nested lists are objects in their own right

Outer list points to inner list

[Nested Lists Diagram]

x = [["a", "b"], ["c", "d"]]
y = x[0]
y[0] = 123
print y
print x
[123, "b"]
[[123, "b"], ["c", "d"]]

Adding and Splicing Lists

Adding lists concatenates them

Yes, you can multiply a list by an integer

Assigning to a slice splices the lists

Replace (possibly empty) section of list with (possibly empty) list

x  = ["a", "b"] + ["c", "d"]
print x
x[1:2] = ["x", "y", "z"]
print x
["a", "b", "c", "d"]
["a", "x", "y", "z", "c", "d"]
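
The bullets above also mention multiplying a list and splicing with empty sections. A minimal sketch of both cases, with made-up values:

```python
# Multiplying by an integer repeats the list.
x = ["a", "b"] * 2
print(x)                  # ['a', 'b', 'a', 'b']

# Assigning to an empty slice inserts without removing anything.
x[1:1] = ["x", "y"]
print(x)                  # ['a', 'x', 'y', 'b', 'a', 'b']

# Replacing a longer section with a shorter list shrinks the list.
x[2:5] = ["z"]
print(x)                  # ['a', 'x', 'z', 'b']
```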

More on Splicing

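The value assigned to a slice must be a sequence, not a single item

x = ["a", "b", "c"]
x[1:2] = 5
TypeError: can only assign an iterable

A string counts as a sequence of characters, so x[1:2] = "yz" splices in "y" and "z"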

Splicing in the empty list removes elements

x = ["a", "b", "c", "d"]
x[1:3] = []
print x
["a", "d"]

Slicing Creates a New Object

Not an alias for a subsection of an existing list

x = ["a", "b", "c", "d"]
y = x[0:2]
y[0] = 123
print y
print x
[123, "b"]
["a", "b", "c", "d"]
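
Contrasting a slice with plain assignment may make the point clearer: assignment creates a second name for the same list, while a full slice x[:] builds a new list. A sketch, using the illustrative names alias and duplicate:

```python
x = ["a", "b", "c", "d"]
alias = x            # a second name for the same list object
duplicate = x[:]     # a full slice builds a new list with the same elements

alias[0] = 123       # visible through x
duplicate[1] = 456   # not visible through x

print(x)                 # [123, 'b', 'c', 'd']
print(duplicate)         # ['a', 456, 'c', 'd']
print(alias is x)        # True
print(duplicate is x)    # False
```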

For Loops

Python's for loop iterates over the contents of a collection of objects

for elt in coll sets elt to each element of coll in turn

for c in "lead":
    print "[" + c + "]",
[l] [e] [a] [d]

Note: a trailing comma in print suppresses the newline

Ranges

So how do you loop from 0 to N?

Built-in function range(a,b) creates [a, a+1, ..., b-1]

range(x) is the same as range(0,x)

range(a,b,s) goes in increments of s

May generate empty list

print range(3)
print range(2, 10, 3)
print range(3, 1)
[0, 1, 2]
[2, 5, 8]
[]
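
A few more cases, including a negative step and the answer to "0 to N inclusive". This is a sketch: the calls are wrapped in list(), which is harmless on Python 2 (where range() already returns a list) and needed on Python 3 to see the values. The name n is illustrative:

```python
# The stop value is never included, whatever the step.
print(list(range(0, 10, 5)))     # [0, 5]

# A negative step counts downward.
print(list(range(10, 0, -3)))    # [10, 7, 4, 1]

# Equal start and stop give an empty range.
print(list(range(5, 5)))         # []

# To include N itself, ask for N + 1 values.
n = 6
zero_to_n = list(range(n + 1))
print(zero_to_n)                 # [0, 1, 2, 3, 4, 5, 6]
```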

Ranges and Loops

Often use range(len(x)) in loop

len() gives upper bound

range() gives loop indices

chars = "abc"
for i in range(len(chars)):
    print i, chars[i]
0 a
1 b
2 c

More Examples of Ranges and Loops

for i in range(1,4):
    print "abcd"
abcd
abcd
abcd
for i in range(1,4):
    print "abcd", "mnp"
abcd mnp
abcd mnp
abcd mnp  # (note the space caused by having two separate arguments for print)
for i in range(1,4):
    print "abcd",
abcd abcd abcd  # we used print without newlines!

And with files ...

This:

import sys

inputfile = file("a.txt","r")
line = inputfile.readline()
while line:
    sys.stdout.write(line)
    line = inputfile.readline()
inputfile.close()

Can also be written as:

inputfile = file("a.txt","r")
for line in inputfile:
    sys.stdout.write(line)
inputfile.close()
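
A self-contained sketch of the for-loop form that does not depend on a.txt existing. It writes a small temporary file first, then reads it back line by line. It uses open() rather than file() so that it also runs on Python 3; the file contents and the names path and lines are made up for the example:

```python
import os
import sys
import tempfile

# Create a small throwaway file so the example is self-contained.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
outputfile = open(path, "w")
outputfile.write("hydrogen\nhelium\nlithium\n")
outputfile.close()

# Looping over the file yields one line at a time, newline included,
# so there is no need to call readline() inside the loop.
lines = []
inputfile = open(path, "r")
for line in inputfile:
    sys.stdout.write(line)
    lines.append(line)
inputfile.close()

os.remove(path)
```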

Use xrange For Efficiency

xrange() returns a lazy sequence object rather than a list

Creates values on demand

Much more efficient for large ranges

total = 0
for i in xrange(1000):
    total += i
print total
499500

Difference between xrange() and range()

x = range(3)
print x
[0, 1, 2]
x = xrange(3)
print x
xrange(3)
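
The "on demand" behaviour can be sketched as follows. xrange() exists only in Python 2, so the example falls back to range() on Python 3, where range() is already lazy. The name lazy_range is a hypothetical alias introduced for this example:

```python
try:
    lazy_range = xrange      # Python 2
except NameError:
    lazy_range = range       # Python 3: range() is already lazy

# Creating the object is cheap; no hundred-million-element list is built.
big = lazy_range(10 ** 8)
print(len(big))               # 100000000
print(big[5])                 # 5

# Values are produced one at a time as the loop asks for them.
total = 0
for i in lazy_range(1000):
    total += i
print(total)                  # 499500
```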

Breaking and Continuing

Jump out of loop at any time using break statement

Only exits one level of loop

Use continue to skip immediately to next iteration of loop

Python (and Java) inherited both of these from C

grades = [88, 58, 23, -19, -77, -15, 56, 11]
for g in grades:
    if g < 0:
        break
print g
-19
Hopefully your grades will be better than these ones
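
The slide shows break only, so here is a sketch of continue and of the "only exits one level" rule. The nested rows and the names valid and first_negatives are invented for the example:

```python
grades = [88, 58, 23, -19, -77, -15, 56, 11]

# continue: skip the negative grades but keep looping.
valid = []
for g in grades:
    if g < 0:
        continue
    valid.append(g)
print(valid)               # [88, 58, 23, 56, 11]

# break leaves only the innermost loop; the outer loop carries on
# with the next row.
first_negatives = []
for row in [[1, -2, 3], [4, 5], [-6, 7]]:
    for g in row:
        if g < 0:
            first_negatives.append(g)
            break
print(first_negatives)     # [-2, -6]
```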

Membership

x in c is True if the value x is in the collection c

Works on all collections

Uses linear search on sequences

vowels = "aeiou"
for v in vowels:
    if v in "uranium":
        print v
a
i
u
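
A few more membership tests, as a sketch. The result is a Boolean (True or False, which compare equal to 1 and 0). For strings, in also accepts a multi-character substring. The variable names are illustrative:

```python
in_list = 3 in [1, 2, 3]
missing = "x" in ["a", "b"]
substring = "ran" in "uranium"     # substrings count for strings
negated = "b" not in ["a", "b"]    # not in is the opposite test

print(in_list)      # True
print(missing)      # False
print(substring)    # True
print(negated)      # False
```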

One More Trick

Python supports multi-valued assignment

a, b = 2, 3 does what you would expect

a, b = b, a swaps the values in a and b

Can be used in for loops to unpack structures on the fly

input = [[1, 2], [3, 4], [5, 6]]
output = []
for (first, second) in input:
    output.append([second, first])
print output
[[2, 1], [4, 3], [6, 5]]
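
A sketch that puts the three uses side by side and also shows what happens when the number of names does not match the number of values. The element data and the names pairs, symbols and numbers are made up; the try/except is there only to keep the script running:

```python
# Swap two values without a temporary variable.
a, b = 2, 3
a, b = b, a
print(a)    # 3
print(b)    # 2

# Unpack each two-element list as the loop visits it.
pairs = [["H", 1], ["He", 2], ["Li", 3]]
symbols = []
numbers = []
for symbol, number in pairs:
    symbols.append(symbol)
    numbers.append(number)
print(symbols)   # ['H', 'He', 'Li']
print(numbers)   # [1, 2, 3]

# The number of names must match the number of values.
mismatch = False
try:
    first, second = [1, 2, 3]
except ValueError:
    mismatch = True
print(mismatch)  # True
```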

Slides originally created by Greg Wilson. Initial adaptation for CSC401 by David James. Revisions by Michelle Craig, Michael Szamosi, Karen Reid, and David James. Revisions for CSC401 Winter 2006 by Cosmin Munteanu.