09-dictionaries.tex

% LaTeX source for ``Python for Informatics: Exploring Information''
% Copyright (c)  2010-  Charles R. Severance, All Rights Reserved

\chapter{Dicionários}
%\chapter{Dictionaries}
\index{dicionário}
% \index{dictionary}

\index{dicionário}
% \index{dictionary}
\index{tipo!dict}
% \index{type!dict}
\index{chave}
% \index{key}
\index{par chave-valor}
% \index{key-value pair}
\index{índice}
% \index{index}


Um {\bf dicionário} é como uma lista, porém mais abrangente. Em uma lista,
os índices devem ser valores inteiros; em um dicionário,
os índices podem ser de qualquer tipo (praticamente).
% A {\bf dictionary} is like a list, but more general.  In a list,
% the index positions have to be integers; in a dictionary,
% the indices can be (almost) any type.

Pode-se considerar um dicionário como um mapeamento entre um conjunto de
índices (chamados de {\bf chaves}) e um conjunto de valores. Cada chave é
mapeada a um valor. A associação entre uma chave e um valor é chamada de
{\bf par chave-valor} ou também como um {\bf item}.
% You can think of a dictionary as a mapping between a set of indices
% (which are called {\bf keys}) and a set of values.  Each key maps to a
% value.  The association of a key and a value is called a {\bf
%   key-value pair} or sometimes an {\bf item}.

Como exemplo, construiremos um dicionário que mapeia palavras inglesas para
palavras em espanhol, portanto chaves e valores são strings.
% As an example, we'll build a dictionary that maps from English
% to Spanish words, so the keys and the values are all strings.

A função {\tt dict} cria um novo dicionário sem itens. Pelo fato de {\tt dict} ser o
nome de uma função padrão da linguagem, esse termo não pode ser usado como nome de variável.
% The function {\tt dict} creates a new dictionary with no items.
% Because {\tt dict} is the name of a built-in function, you
% should avoid using it as a variable name.

\index{funcão dict}
% \index{dict function}
\index{função!dict}
% \index{function!dict}

\beforeverb
\begin{verbatim}
>>> eng2ptbr = dict()
>>> print eng2ptbr
{}
\end{verbatim}
\afterverb
%

Os caracteres chaves, \verb"{}", representam um dicionário vazio.
Colchetes podem ser utilizados para adicionar itens ao dicionário:
% The curly brackets, \verb"{}", represent an empty dictionary.
% To add items to the dictionary, you can use square brackets:

\index{caractere chave}
% \index{squiggly bracket}
\index{caractere!chave}
% \index{bracket!squiggly}

\beforeverb
\begin{verbatim}
>>> eng2ptbr['one'] = 'um'
\end{verbatim}
\afterverb
%

Esta linha cria um item que mapeia da chave {\tt 'one'}
para o valor \verb"'um'". Se exibirmos o dicionário novamente, veremos
um par chave-valor com o caractere dois-pontos entre a chave e o valor:
% This line creates an item that maps from the key
% {\tt 'one'} to the value \verb"'um'".  If we print the
% dictionary again, we see a key-value pair with a colon
% between the key and value:

\beforeverb
\begin{verbatim}
>>> print eng2ptbr
{'one': 'um'}
\end{verbatim}
\afterverb
%

Esse formato de saída também é um formato de entrada. Por exemplo,
pode-se criar um novo dicionário com três itens:
% This output format is also an input format.  For example,
% you can create a new dictionary with three items:

\beforeverb
\begin{verbatim}
>>> eng2ptbr = {'one': 'um', 'two': 'dois', 'three': 'tres'}
\end{verbatim}
\afterverb
%

Mas se exibirmos {\tt eng2ptbr}, podemos nos surpreender:
% But if you print {\tt eng2ptbr}, you might be surprised:

\beforeverb
\begin{verbatim}
>>> print eng2ptbr
{'one': 'um', 'three': 'tres', 'two': 'dois'}
\end{verbatim}
\afterverb
%

A ordem dos pares chave-valor não é a mesma. De fato, se esse mesmo
exemplo for executado em outro computador, um resultado diferente pode
ser obtido. Em linhas gerais, a ordem dos elementos em um dicionário
é imprevisível.
% The order of the key-value pairs is not the same.  In fact, if
% you type the same example on your computer, you might get a
% different result.  In general, the order of items in
% a dictionary is unpredictable.

Entretanto, isso não é um problema, uma vez que os elementos de um dicionário
nunca são indexados por índices inteiros. Ao invés disso, usa-se as chaves para
se buscar os valores correspondentes:
% But that's not a problem because
% the elements of a dictionary are never indexed with integer indices.
% Instead, you use the keys to look up the corresponding values:

\beforeverb
\begin{verbatim}
>>> print eng2ptbr['two']
'dois'
\end{verbatim}
\afterverb
%

A ordem dos itens não importa, já que a chave {\tt 'two'} sempre é mapeada ao valor \verb"'dois'".
% The key {\tt 'two'} always maps to the value \verb"'dois'" so the order
% of the items doesn't matter.

Se a chave não está no dicionário, uma exceção é levantada:
% If the key isn't in the dictionary, you get an exception:

\index{exceção!KeyError}
% \index{exception!KeyError}
\index{KeyError}

\beforeverb
\begin{verbatim}
>>> print eng2ptbr['four']
KeyError: 'four'
\end{verbatim}
\afterverb
%

A função {\tt len} também pode ser usada em dicionários; ela devolve
o número de pares chave-valor:
% The {\tt len} function works on dictionaries; it returns the
% number of key-value pairs:

\index{função len}
% \index{len function}
\index{função!len}
% \index{function!len}

\beforeverb
\begin{verbatim}
>>> len(eng2ptbr)
3
\end{verbatim}
\afterverb
%

Pode-se utilizar o operador {\tt in} para se verificar se algo está representado
como uma \emph{chave} no dicionário (não serve para verificar diretamente
a presença de um valor).
% The {\tt in} operator works on dictionaries; it tells you whether
% something appears as a \emph{key} in the dictionary (appearing
% as a value is not good enough).

\index{membro!dictionário}
% \index{membership!dictionary}
\index{operador in}
% \index{in operator}
\index{operador!in}
% \index{operator!in}

\beforeverb
\begin{verbatim}
>>> 'one' in eng2ptbr
True
>>> 'um' in eng2ptbr
False
\end{verbatim}
\afterverb
%

Para verificar se algo está representado como um valor no dicionário, pode-se usar o método {\tt values}, o qual devolve os valores como uma lista e, desse
modo, o operador {\tt in} pode ser usado:
% To see whether something appears as a value in a dictionary, you
% can use the method {\tt values}, which returns the values as
% a list, and then use the {\tt in} operator:

\index{método values}
% \index{values method}
\index{método!values}
% \index{method!values}

\beforeverb
\begin{verbatim}
>>> vals = eng2ptbr.values()
>>> 'um' in vals
True
\end{verbatim}
\afterverb
%

O operador {\tt in} usa algoritmos diferentes para listas e dicionários. Para
listas é usado um algoritmo de busca linear. Conforme o tamanho da lista aumenta, o tempo de busca aumenta de maneira diretamente proporcional ao tamanho da lista.
Para dicionários, Python usa um algoritmo chamado {\bf tabela de hash}, a qual possui uma propriedade notável---o operador {\tt in} consume a mesma quantidade de tempo para se realizar a busca independente do número de itens existente no dicionário. Aqui não será explicado o porquê das funções de hash serem tão mágicas, mas informações adicionais sobre esse assunto podem ser lidas em \url{pt.wikipedia.org/wiki/Tabela_de_dispersão}.
% The {\tt in} operator uses different algorithms for lists and
% dictionaries.  For lists, it uses a linear search algorithm.
% As the list gets longer, the search time gets
% longer in direct proportion to the length of the list.
% For dictionaries, Python uses an
% algorithm called a {\bf hash table} that has a remarkable property---the
% {\tt in} operator takes about the same amount of time no matter how
% many items there are in a dictionary.  I won't explain
% why hash functions are so magical,
% but you can read more about it at
% \url{wikipedia.org/wiki/Hash_table}.

\index{tabela hash}
% \index{hash table}

\begin{ex}
\label{wordlist2}

\index{definir membro}
% \index{set membership}
\index{membro!definir}
% \index{membership!set}

Escreva um programa que leia as palavras do arquivo {\tt words.txt} e armazene-as como chaves em um dicionário. Os valores não importam. Então, use o operador {\tt in} como uma maneira rápida de verificar se uma string está no dicionário.
% Write a program that reads the words in {\tt words.txt} and
% stores them as keys in a dictionary.  It doesn't matter what the
% values are.  Then you can use the {\tt in} operator
% as a fast way to check whether a string is in
% the dictionary.

\end{ex}

\section{Dicionário como um conjunto de contagens}
%\section{Dictionary as a set of counters}
\label{histogram}

\index{contador}
% \index{counter}

Suponha que dada uma string deseja-se saber quantas vezes aparece cada letra. Há várias maneiras para que isso seja feito:
% Suppose you are given a string and you want to count how many
% times each letter appears.  There are several ways you could do it:

\begin{enumerate}

\item Poderiam ser criadas 26 variáveis, cada uma contendo uma letra do alfabeto. Então, a string poderia ser travessada e, para cada caractere, seria incrementado o contador correspondente, provavelmente utilizando-se operadores condicionais encadeado.
% \item You could create 26 variables, one for each letter of the
% alphabet.  Then you could traverse the string and, for each
% character, increment the corresponding counter, probably using
% a chained conditional.

\item Poderia ser criada uma lista com 26 elementos. Assim, cada caractere poderia ser convertido em um número (usando a função embutida {\tt ord}), o qual seria usado como um índice na lista, e se incrementaria o contador apropriado.
% \item You could create a list with 26 elements.  Then you could
% convert each character to a number (using the built-in function
% {\tt ord}), use the number as an index into the list, and increment
% the appropriate counter.

\item Poderia ser criado um dicionário, onde os caracteres são as chaves e os valores são as contagens correspondentes. Ao se encontrar um caractere pela primeira vez, um item é adicionado ao dicionário. Em seguida, o valor de um dado item seria incrementado.
% \item You could create a dictionary with characters as keys
% and counters as the corresponding values.  The first time you
% see a character, you would add an item to the dictionary.  After
% that you would increment the value of an existing item.

\end{enumerate}

Essas opções realizam a mesma computação, porém cada uma a implementa de um modo diferente.
% Each of these options performs the same computation, but each
% of them implements that computation in a different way.

\index{implementação}
% \index{implementation}

Uma {\bf implementação} é um modo de se executar uma computação; algumas implementações são melhores do que outras. Por exemplo, uma das vantagens de se utilizar a implementação com dicionário é que não há a necessidade de se saber de antemão quais letras aparecem na string, sendo que as letras serão adicionadas ao dicionário conforme for demandado.
% An {\bf implementation} is a way of performing a computation;
% some implementations are better than others.  For example,
% an advantage of the dictionary implementation is that we don't
% have to know ahead of time which letters appear in the string
% and we only have to make room for the letters that do appear.

Eis como o código ficaria:
%Here is what the code might look like:

\beforeverb
\begin{verbatim}
word = 'brontosaurus'
d = dict()
for c in word:
    if c not in d:
        d[c] = 1
    else:
        d[c] = d[c] + 1
print d
\end{verbatim}
\afterverb
%

De fato está sendo construído um {\bf histograma}, que é um termo estatístico para um conjunto de contagens (ou frequências).
% We are effectively computing a {\bf histogram}, which is a statistical
% term for a set of counters (or frequencies).

\index{histograma}
% \index{histogram}
\index{frequência}
% \index{frequency}
\index{travessia}
% \index{traversal}

O laço {\tt for} caminha por toda a string. Em cada iteração, se o caractere {\tt c} não está no dicionário, cria-se um novo item com chave {\tt c} e valor inicial 1 (já que essa letra foi encontrada um vez). Se {\tt c} já está no dicionário, o valor {\tt d[c]} é incrementado.
% The {\tt for} loop traverses
% the string.  Each time through the loop, if the character {\tt c} is
% not in the dictionary, we create a new item with key {\tt c} and the
% initial value 1 (since we have seen this letter once).  If {\tt c} is
% already in the dictionary we increment {\tt d[c]}.

\index{histograma}
% \index{histogram}

Eis a saída do programa:
%Here's the output of the program:

\beforeverb
\begin{verbatim}
{'a': 1, 'b': 1, 'o': 2, 'n': 1, 's': 2, 'r': 2, 'u': 2, 't': 1}
\end{verbatim}
\afterverb
%

O histograma indica que as letras {\tt 'a'} e \verb"'b'" aparecem uma vez; \verb"'o'" aparece duas vezes, e assim por diante.
% The histogram indicates that the letters {\tt 'a'} and \verb"'b'"
% appear once; \verb"'o'" appears twice, and so on.

\index{método get}
% \index{get method}
\index{método!get}
% \index{method!get}

Dicionários têm um método chamado {\tt get}, que recebe como argumento uma chave e um valor padrão. Se a chave se encontra no dicionário, {\tt get} devolve o valor correspondente; caso contrário, devolve o valor padrão. Por exemplo:
% Dictionaries have a method called {\tt get} that takes a key
% and a default value.  If the key appears in the dictionary,
% {\tt get} returns the corresponding value; otherwise it returns
% the default value.  For example:

\beforeverb
\begin{verbatim}
>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print counts.get('jan', 0)
100
>>> print counts.get('tim', 0)
0
\end{verbatim}
\afterverb
%

O método {\tt get} pode ser usado para escrever o histograma de maneira mais concisa.
Pelo fato de {\tt get} automaticamente lidar com a ausência de uma chave no dicionário, quatro linhas de código podem ser reduzidas para uma e o bloco {\tt if} pode ser removido.
% We can use {\tt get} to write our histogram loop more concisely.
% Because the {\tt get} method automatically handles the case where a key
% is not in a dictionary, we can reduce four lines down to one
% and eliminate the {\tt if} statement.

\beforeverb
\begin{verbatim}
word = 'brontosaurus'
d = dict()
for c in word:
    d[c] = d.get(c,0) + 1
print d
\end{verbatim}
\afterverb
%

O uso do método {\tt get} para simplificar esse laço de contagem é um ``idiomatismo'' comum em Python e será usado diversas vezes no decorrer do livro. Desse modo, vale a pena dedicar um tempo e comparar o laço usando {\tt if} e o operador {\tt in} com o laço usando o método {\tt get}. Eles fazem exatamente a mesma coisa, mas o segundo é mais sucinto.
% The use of the {\tt get} method to simplify this counting loop
% ends up being a very commonly used ``idiom'' in Python and
% we will use it many times in the rest of the book.   So you should
% take a moment and compare the loop using the {\tt if} statement
% and {\tt in} operator with the loop using the {\tt get} method.
% They do exactly the same thing, but one is more succinct.
\index{idiomatismo}
% \index{idiom}

\section{Dicionários e arquivos}
%\section{Dictionaries and files}

Um dos usos comuns de dicionários é na contagem da ocorrência de palavras em arquivos de texto.
Comecemos com um arquivo muito simples contendo palavras extraídas de \emph{Romeu e Julieta}.
% One of the common uses of a dictionary is to count the occurrence
% of words in a file with some written text.
% Let's start with a very simple file of
% words taken from the text of \emph{Romeo and Juliet}.

Para os primeiros exemplos, usaremos uma versão mais curta e simplificada do texto, sem pontuações. Em seguida, trabalharemos com o texto da cena com as pontuações incluídas.
% For the first set of examples, we will use a shortened and simplified version
% of the text with no punctuation.  Later we will work with the text of the
% scene with punctuation included.

\beforeverb
\begin{verbatim}
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
\end{verbatim}
\afterverb
%
Escreveremos um programa em Python, que lerá as linhas do arquivo, transformará cada linha em uma lista de palavras e, então, iterará sobre cada palavra na linha contando-a usando um dicionário.
% We will write a Python program to read through the lines of the file,
% break each line into a list of words, and then loop through each
% of the words in the line and count each word using a dictionary.

\index{laços aninhados}
% \index{nested loops}
\index{laço!aninhado}
% \index{loop!nested}
Veremos que temos dois laços {\tt for}. O laço externo lê as linhas do arquivo e o interno percorre cada palavra de uma linha em particular. Este é um exemplo de um padrão chamado {\bf laços aninhados} porque um dos laços é \emph{externo} e o outro é \emph{interno}.
% You will see that we have two {\tt for} loops.  The outer loop is reading the
% lines of the file and the inner loop is iterating through each
% of the words on that particular line.  This is an example
% of a pattern called {\bf nested loops} because one of the loops
% is the \emph{outer} loop and the other loop is the \emph{inner}
% loop.

Pelo fato do laço interno executar todas suas iterações para cada uma que o laço externo faz, diz-se que o laço interno itera ``mais rapidamente'' ao passo que o externo itera mais lentamente.
% Because the inner loop executes all of its iterations each time
% the outer loop makes a single iteration, we think of the inner
% loop as iterating ``more quickly'' and the outer loop as iterating
% more slowly.

\index{Romeu e Julieta}
% \index{Romeo and Juliet}
A combinação dos laços aninhados garante que contaremos todas as palavra de todas as linhas do arquivo de entrada.
% The combination of the two nested loops ensures that we will count
% every word on every line of the input file.

\beforeverb
\begin{verbatim}
fname = raw_input('Digite o nome do arquivo: ')
try:
    fhand = open(fname)
except:
    print 'Arquivo nao pode ser aberto:', fname
    exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print counts
\end{verbatim}
\afterverb
%
Quando rodamos o programa, vemos o resultado bruto das contagens de modo não sorteado.
(o arquivo {\tt romeo.txt} está disponível em \url{www.py4inf.com/code/romeo.txt})
% When we run the program, we see a raw dump of all of the counts in unsorted
% hash order.
% (the {\tt romeo.txt} file is available at
% \url{www.py4inf.com/code/romeo.txt})

\beforeverb
\begin{verbatim}
python count1.py
Digite o nome do arquivo: romeo.txt
{'and': 3, 'envious': 1, 'already': 1, 'fair': 1,
'is': 3, 'through': 1, 'pale': 1, 'yonder': 1,
'what': 1, 'sun': 2, 'Who': 1, 'But': 1, 'moon': 1,
'window': 1, 'sick': 1, 'east': 1, 'breaks': 1,
'grief': 1, 'with': 1, 'light': 1, 'It': 1, 'Arise': 1,
'kill': 1, 'the': 3, 'soft': 1, 'Juliet': 1}
\end{verbatim}
\afterverb
%
É um tanto quanto inconveniente procurar visualmente em um dicionário por palavras mais comuns e suas contagens. Desse modo, precisamos adicionar mais código Python para obter um resultado que seja mais útil.
% It is a bit inconvenient to look through the dictionary to find the
% most common words and their counts, so we need to add some more
% Python code to get us the output that will be more helpful.

\section{Laços de repetição e dicionário}

\index{dicionário!laço de repetição com}
% \index{dictionary!looping with}
\index{laço de repetição!com dicionários}
% \index{looping!with dictionaries}
\index{travessia}
% \index{traversal}

Se um dicionário for usado como a sequência em um bloco {\tt for}, esse iterará sobre as chaves do dicionário. Este laço exibe cada chave e o valor correspondente:
% If you use a dictionary as the sequence
% in a {\tt for} statement, it traverses
% the keys of the dictionary.  This loop
% prints each key and the corresponding value:

\beforeverb
\begin{verbatim}
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print key, counts[key]
\end{verbatim}
\afterverb
%
Que resulta em:
% Here's what the output looks like:

\beforeverb
\begin{verbatim}
jan 100
chuck 1
annie 42
\end{verbatim}
\afterverb
%
Mais uma vez, as chaves não respeitam nenhum tipo de ordenamento.
% Again, the keys are in no particular order.

\index{estilo}
% \index{idiom}
Podemos usar este padrão para implementar os diferentes estilos de laço que foram descritos anteriormente. Por exemplo, se quiséssemos encontrar todas as entradas em um dicionário com valor acima de dez, poderíamos escrever o seguinte código:
% We can use this pattern to implement the various loop idioms
% that we have described earlier.  For example if we wanted
% to find all the entries in a dictionary with a value
% above ten, we could write the following code:

\beforeverb
\begin{verbatim}
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print key, counts[key]
\end{verbatim}
\afterverb
%
O laço {\tt for} itera pelas {\em chaves} do dicionário, então devemos usar o operador de índice para obter o {\em valor} correspondente para cada chave.
Eis o resultado da execução:
% The {\tt for} loop iterates through the
% {\em keys} of the dictionary, so we must
% use the index operator to retrieve the
% corresponding {\em value}
% for each key.
% Here's what the output looks like:

\beforeverb
\begin{verbatim}
jan 100
annie 42
\end{verbatim}
\afterverb
%
Vemos apenas as entradas com valor acima de dez.
% We see only the entries with a value above 10.

\index{método keys}
% \index{keys method}
\index{método!keys}
% \index{method!keys}

Para exibir as chaves em ordem alfabética, deve-se gerar uma lista das chaves do dicionário por meio do método {\tt keys}, disponível em objetos dicionário, e então ordenar essa lista. Em seguida, itera-se pela lista ordenada, procurando cada chave e exibindo os pares chave-valor de modo ordenado, como em:
% If you want to print the keys in alphabetical order, you first
% make a list of the keys in the dictionary using the
% {\tt keys} method available in dictionary objects,
% and then sort that list
% and loop through the sorted list, looking up each
% key and printing out key-value pairs in sorted order
% as follows:

\beforeverb
\begin{verbatim}
counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = counts.keys()
print lst
lst.sort()
for key in lst:
    print key, counts[key]
\end{verbatim}
\afterverb
%
O que gera a seguinte saída:
% Here's what the output looks like:

\beforeverb
\begin{verbatim}
['jan', 'chuck', 'annie']
annie 42
chuck 1
jan 100
\end{verbatim}
\afterverb
%
Primeiramente, pode-se ver a lista não ordenada das chaves, obtida pelo método {\tt keys}. Em seguida, vemos os pares chave-valor gerados no laço {\tt for}.
% First you see the list of keys in unsorted order that
% we get from the {\tt keys} method.  Then we see the key-value
% pairs in order from the {\tt for} loop.

\section{Processamento avançado de texto}
%\section{Advanced text parsing}

\index{Romeu e Julieta}
%\index{Romeo and Juliet}

No exemplo acima, no qual usamos o arquivo {\tt romeo.txt}, todas as pontuações foram removidas para tornar o texto o mais simples possível. O texto original possui muitas pontuações, como mostrado abaixo.
% In the above example using the file {\tt romeo.txt},
% we made the file as simple as possible by removing
% all punctuation by hand.  The actual text
% has lots of punctuation, as shown below.

\beforeverb
\begin{verbatim}
But, soft! what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon,
Who is already sick and pale with grief,
\end{verbatim}
\afterverb
%
Uma vez que a função do Python {\tt split} procura por espaços e trata palavras como tokens separados por espaços, as palavras ``soft!'' e ``soft'' seriam tratadas como \emph{diferentes} e seriam criadas entradas separadas no dicionário para cada uma delas.
% Since the Python {\tt split} function looks for spaces and
% treats words as tokens separated by spaces, we would treat the
% words ``soft!'' and ``soft'' as \emph{different} words and create
% a separate dictionary entry for each word.

Além disso, como o arquivos possui letras capitalizadas, as palavras ``who'' e ``Who'' seriam tratadas como diferentes e teriam contagens diferentes.
% Also since the file has capitalization, we would treat
% ``who'' and ``Who'' as different words with different
% counts.

Podemos solucionar ambos os problemas usando os métodos de string {\tt lower}, {\tt punctuation} e {\tt translate}. Dentre esses três o método {\tt translate} é o mais complexo. Eis a documentação para {\tt translate}:
% We can solve both these problems by using the string
% methods {\tt lower}, {\tt punctuation}, and {\tt translate}.  The
% {\tt translate} is the most subtle of the methods.
% Here is the documentation for {\tt translate}:

\verb"string.translate(s, table[, deletechars])"

\emph{Deleta todos os caracteres de s que estão em deletechars (se presente) e traduz os caracteres usando table, que deve ser uma string com o comprimento de 256 caracteres fornecendo a tradução para cada valor de caractere, indexado pelo sua posição. Se table é None, então apenas a deleção de caracteres é realizada.}
% \emph{Delete all characters from s that are in deletechars (if present),
% and then translate the characters using table, which must
% be a 256-character string giving the translation for each
% character value, indexed by its ordinal. If table is None,
% then only the character deletion step is performed.}

Não iremos especificar o parâmetro {\tt table}, mas iremos usar {\tt deletechars} para deletar todas as pontuações. Iremos utilizar a lista de caracteres que o próprio Python considera como ``pontuação'':
% We will not specify the {\tt table} but we will use
% the {\tt deletechars} parameter to delete all of the punctuation.
% We will even let Python tell us the list of characters
% that it considers ``punctuation'':

\beforeverb
\begin{verbatim}
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
\end{verbatim}
\afterverb
%
Faremos a seguinte modificação em nosso programa:
% We make the following modifications to our program:

\beforeverb
\begin{verbatim}
import string                                          # New Code

fname = raw_input('Digite o nome do arquivo: ')
try:
    fhand = open(fname)
except:
    print 'Arquivo nao pode ser aberto:', fname
    exit()

counts = dict()
for line in fhand:
    line = line.translate(None, string.punctuation)    # New Code
    line = line.lower()                                # New Code
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

print counts
\end{verbatim}
\afterverb
%
O programa se manteve praticamente o mesmo,com a exceção de que usamos {\tt translate} para remover todas as pontuações e {\tt lower} para tornar a linha em caixa baixa.
Note que para Python 2.5 e versões anteriores, {\tt translate} não aceita {\tt None} como primeiro parâmetro. Então, use este código para chamar translate:
% We use {\tt translate} to remove all punctuation and {\tt lower} to
% force the line to lowercase.  Otherwise the program is unchanged.
% Note that for Python 2.5 and earlier, {\tt translate} does not
% accept {\tt None} as the first parameter so use this code instead
% for the translate call:

% OBSERVAÇÂO: Na versão original falta fechar o parênteses após 'punctuation'
\beforeverb
\begin{verbatim}
print a.translate(string.maketrans(' ',' '), string.punctuation)
\end{verbatim}
\afterverb
% \beforeverb
% \begin{verbatim}
% print a.translate(string.maketrans(' ',' '), string.punctuation
% \end{verbatim}
% \afterverb
%
Parte de aprender a ``Arte do Python'' ou ``Pensar pythonicamente'' está em
perceber que Python geralmente tem capacidades embutidas para analisar muitos
dados de problemas comuns. No decorrer do tempo, vê-se exemplos de código e
documentação suficientes para se saber onde procurar para ver se alguém já escreveu alguma coisa que faça seu trabalho mais fácil.
% Part of learning the ``Art of Python'' or ``Thinking Pythonically''
% is realizing that Python
% often has built-in capabilities for many common data analysis
% problems.  Over time, you will see enough example code and read
% enough of the documentation to know where to look to see if someone
% has already written something that makes your job much easier.

A seguir esté um versão abreviada da saída:
% The following is an abbreviated version of the output:

\beforeverb
\begin{verbatim}
Digite o nome do arquivo: romeo-full.txt
{'swearst': 1, 'all': 6, 'afeard': 1, 'leave': 2, 'these': 2,
'kinsmen': 2, 'what': 11, 'thinkst': 1, 'love': 24, 'cloak': 1,
a': 24, 'orchard': 2, 'light': 5, 'lovers': 2, 'romeo': 40,
'maiden': 1, 'whiteupturned': 1, 'juliet': 32, 'gentleman': 1,
'it': 22, 'leans': 1, 'canst': 1, 'having': 1, ...}
\end{verbatim}
\afterverb
%
Buscar informações nessa saída ainda é difícil e podemos usar Python para nos fornecer exatamente o que estamos procurando; contudo, para tanto, precisamos aprender sobre as {\bf tuplas} do Python. Retornaremos a esse exemplo uma vez que aprendermos sobre tuplas.
% Looking through this output is still unwieldy and we can use
% Python to give us exactly what we are looking for, but to do
% so, we need to learn about Python {\bf tuples}.  We
% will pick up this example once we learn about tuples.

\section{Depuração}
% \section{Debugging}
\index{depuração}
% \index{debugging}

Conforme se trabalha com conjuntos de dados maiores, pode ser difícil de depurá-los por exibição e checagem à mão. Eis algumas sugestões para depuração de conjuntos de dados grandes:
% As you work with bigger datasets it can become unwieldy to
% debug by printing and checking data by hand.  Here are some
% suggestions for debugging large datasets:

\begin{description}

\item[Reduza a entrada:] Se possível, reduza o tamanho do conjunto de dados.
Por exemplo, se o programa lê um arquivo de texto, comece com apenas 10 linhas,
ou com o menor exemplo que pode ser construído. Pode-se ainda editar os próprios arquivos, ou (melhor) modificar o programa de tal modo a ler apenas as {\tt n} linhas.
% \item[Scale down the input:] If possible, reduce the size of the
% dataset.  For example if the program reads a text file, start with
% just the first 10 lines, or with the smallest example you can find.
% You can either edit the files themselves, or (better) modify the
% program so it reads only the first {\tt n} lines.

Se houver um erro, pode-se reduzir {\tt n} até o menor valor que manifesta o erro, e, então, aumentá-lo gradualmente conforme se encontra e se corrige os erros.
% If there is an error, you can reduce {\tt n} to the smallest
% value that manifests the error, and then increase it gradually
% as you find and correct errors.

\item[Verificar sumários e tipos:] Ao invés de exibir e verificar o conjunto de dados por completo, considera-se exibir sumarizações dos dados: por exemplo, o número de itens em um dicionário ou o total de uma lista de números.
% \item[Check summaries and types:] Instead of printing and checking the
% entire dataset, consider printing summaries of the data: for example,
% the number of items in a dictionary or the total of a list of numbers.

Valores que não são do tipo correto são uma causa comum de erros de execução.
Para depurar esse tipo de erro, geralmente basta exibir o tipo dos valores em questão.
% A common cause of runtime errors is a value that is not the right
% type.  For debugging this kind of error, it is often enough to print
% the type of a value.

\item[Escreva auto-verificações:] Há momentos em que se pode escrever código para verificar erros automaticamente. Por exemplo, se está calculando-se a média de uma lista de números, pode-se verificar se o resultado não é maior que o maior valor na lista nem menor que o menor valor. Isso é chamado de ``verificação de sanidade'' porque ele detecta resultados que sejam ``completamente ilógicos''.
% \item[Write self-checks:]  Sometimes you can write code to check
% for errors automatically.  For example, if you are computing the
% average of a list of numbers, you could check that the result is
% not greater than the largest element in the list or less than
% the smallest.  This is called a ``sanity check'' because it detects
% results that are ``completely illogical''.

\index{verificação de sanidade}
% \index{sanity check}
\index{verificação de consistência}
% \index{consistency check}

Há outro tipo de teste que compara resultados de duas computações diferentes para ver se esses são consistentes. Tal verificação é chamada de ``verificação de consistência''
% Another kind of check compares the results of two different
% computations to see if they are consistent.  This is called a
% ``consistency check''.

\item[Exiba saídas de maneira aprazível:] Formatar a saída da depuração pode fazer com que seja mais fácil de se detectar erros.
% \item[Pretty print the output:] Formatting debugging output
% can make it easier to spot an error.

\end{description}

Novamente, tempo gasto construindo arcabouços pode reduzir o tempo gasto com depuração.
% Again, time you spend building scaffolding can reduce
% the time you spend debugging.

\index{arcabouço}
% \index{scaffolding}

% Observação: A ordem dos itens na sessão a seguir foi mudada para que se mantivesse a ordenação alfabética após a tradução do inglês para o português
\section{Glossário}
% \section{Glossary}

\begin{description}

  \item[busca:] Uma operação de dicionário que encontra um valor a partir de uma dada chave.
  % \item[lookup:] A dictionary operation that takes a key and finds
  % the corresponding value.
  \index{busca}
  % \index{lookup}

  \item[chave:] Um objeto que aparece em um dicionário como a primeira parte de um par chave-valor.
  % \item[key:] An object that appears in a dictionary as the
  % first part of a key-value pair.
  \index{chave}
  % \index{key}

\item[dicionário:] Um mapeamento entre um conjunto de chaves e seus valores correspondentes.
% \item[dictionary:] A mapping from a set of keys to their
% corresponding values.
\index{dicionário}
% \index{dictionary}

\item[função de hash:] A função usada por uma tabela de hash para calcular a posição de uma chave.
% \item[hash function:] A function used by a hashtable to compute the
% location for a key.
\index{função de hash}
% \index{hash function}

\item[histograma:] Um conjunto de contagens.
% \item[histogram:] A set of counters.
\index{histograma}
% \index{histogram}

\item[implementação:] Uma maneira de se realizar uma computação.
% \item[implementation:] A way of performing a computation.
\index{implementação}
% \index{implementation}

\item[item:] Outro nome para um par chave-valor.
% \item[item:] Another name for a key-value pair.
\index{item!dicionário}
% \index{item!dictionary}

\item[laços aninhados:] Quando há um ou mais laços ``dentro'' de outro laço. O laço interno é executado completamente para cada execução do laço externo.
% \item[nested loops:] When there are one or more loops ``inside'' of
% another loop.  The inner loop runs to completion each time the outer
% loop runs once.
\index{laços aninhados}
% \index{nested loops}
\index{laço!aninhado}
% \index{loop!nested}

\item[par chave-valor:] A representação de um mapeamento de uma chave a um valor.
% \item[key-value pair:] The representation of the mapping from
% a key to a value.
\index{par chave-valor}
% \index{key-value pair}

\item[tabela de hash:] O algoritmo usado para implementar os dicionários de Python.
% \item[hashtable:] The algorithm used to implement Python
% dictionaries.
\index{tabela de hash}
% \index{hashtable}

\item[valor:] Um objeto que aparece em um dicionário como a segunda parte em um par chave-valor. Esse é mais específico do que nosso uso anterior da palavra ``valor''.
% \item[value:] An object that appears in a dictionary as the
% second part of a key-value pair.  This is more specific than
% our previous use of the word ``value''.
\index{valor}
% \index{value}

\end{description}

\section{Exercícios}
% \section{Exercises}

\begin{ex}

Escreva um programa que categorize cada mensagem de e-mail pelo dia da semana que o commit (\url{https://pt.wikipedia.org/wiki/Commit}) foi feito. Para tanto, procure por linhas que comecem com ``From'', então busque pela terceira palavra e mantenha um procedimento de contagem para cada dia da semana. Ao final do programa, exiba o conteúdo do dicionário (ordem não importa).
% Write a program that categorizes each mail message by which
% day of the week the commit was done. To do this look for
% lines that start with ``From'', then look for the
% third word and keep a running count of each of the
% days of the week. At the end of the program print out the
% contents of your dictionary (order does not matter).

\beforeverb
\begin{verbatim}

Amostra de linha:
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

Amostra de execução:
python dow.py
Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}
\end{verbatim}
\afterverb
% \beforeverb
% \begin{verbatim}
%
% Sample Line:
% From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
%
% Sample Execution:
% python dow.py
% Enter a file name: mbox-short.txt
% {'Fri': 20, 'Thu': 6, 'Sat': 1}
% \end{verbatim}
% \afterverb
\end{ex}

\begin{ex}
Escreva um programa que leia um log (\url{https://pt.wikipedia.org/wiki/Log_de_dados}) de correio eletrônico, escreva um histograma usando um dicionário para contar quantas mensagens vieram de cada endereço de e-mail e, por fim, exiba o dicionário.
% Write a program to read through a mail log,
% build a histogram using a dictionary to count how many
% messages have come from each email address,
% and print the dictionary.

\beforeverb
\begin{verbatim}
Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}
\end{verbatim}
\afterverb
\end{ex}

\begin{ex}
Insira código no programa acima para descobrir quem tem mais mensagens no arquivo.
% Add code to the above program to figure out who has the
% most messages in the file.

Após todos os dados terem sido lidos e o dicionário criado, percorra o dicionário usando um laço de máximo (veja Sessão~\ref{maximumloop}) para encontrar quem tem mais mensagens e exiba quantas mensagens existem para essa pessoa.
% After all the data has been read and the dictionary has been
% created, look through the dictionary using a maximum loop
%  (see Section~\ref{maximumloop})
% to find who has the most
% messages and print how many messages the person has.

\beforeverb
\begin{verbatim}
Enter a file name: mbox-short.txt
cwen@iupui.edu 5

Enter a file name: mbox.txt
zqian@umich.edu 195
\end{verbatim}
\afterverb
\end{ex}

\begin{ex}
Este programa leva em consideração o nome do domínio (ao invés do endereço) de onde a mensagem foi mandada e não de quem essa veio (isto é, o endereço de e-mail inteiro). Ao final do programa, exiba o conteúdo do dicionário.
% This program records the domain name (instead of the address)
% where the message was sent from instead of who the mail
% came from (i.e., the whole email address). At the end
% of the program, print out the contents of your dictionary.

\beforeverb
\begin{verbatim}
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}
\end{verbatim}
\afterverb
\end{ex}