{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Manipulação de dados - II" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## *DataFrames*\n", "\n", "Como dissemos anterioremente, o *DataFrame* é a segunda estrutura basilar do *pandas*. Um *DataFrame*:\n", "- é uma tabela, ou seja, é bidimensional;\n", "- tem cada coluna formada como uma *Series* do *pandas*;\n", "- pode ter *Series* contendo tipos de dado diferentes." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Criação de um *DataFrame*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "O método padrão para criarmos um *DataFrame* é através de uma função com mesmo nome.\n", "\n", "```python\n", "df_exemplo = pd.DataFrame(dados_de_interesse, index = indice_de_interesse, \n", " columns = colunas_de_interesse)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Ao criar um *DataFrame*, podemos informar\n", "- `index`: rótulos para as linhas (atributos *index* das *Series*).\n", "- `columns`: rótulos para as colunas (atributos *name* das *Series*)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "No _template_, `dados_de_interesse` pode ser\n", "\n", "* um dicionário de:\n", " * *arrays* unidimensionais do *numpy*;\n", " * listas;\n", " * dicionários;\n", " * *Series* do *pandas*.\n", "* um *array* bidimensional do *numpy*;\n", "* uma *Series* do *Pandas*;\n", "* outro *DataFrame*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### *DataFrame* a partir de dicionários de *Series*\n", "\n", "Neste método de criação, as *Series* do dicionário não precisam possuir o mesmo número de elementos. O *index* do *DataFrame* será dado pela **união** dos *index* de todas as *Series* contidas no dicionário." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplo:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "serie_Idade = pd.Series({'Ana':20, 'João': 19, 'Maria': 21, 'Pedro': 22}, name=\"Idade\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "serie_Peso = pd.Series({'Ana':55, 'João': 80, 'Maria': 62, 'Pedro': 67, 'Túlio': 73}, name=\"Peso\")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "serie_Altura = pd.Series({'Ana':162, 'João': 178, 'Maria': 162, 'Pedro': 165, 'Túlio': 171}, name=\"Altura\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "dicionario_series_exemplo = {'Idade': serie_Idade, 'Peso': serie_Peso, 'Altura': serie_Altura}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df_dict_series = pd.DataFrame(dicionario_series_exemplo)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana20.055162
João19.080178
Maria21.062162
Pedro22.067165
TúlioNaN73171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20.0 55 162\n", "João 19.0 80 178\n", "Maria 21.0 62 162\n", "Pedro 22.0 67 165\n", "Túlio NaN 73 171" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compare este resultado com a criação de uma planilha pelos métodos usuais. Veja que há muita flexibilidade para criarmos ou modificarmos uma tabela.\n", "\n", "Vejamos exemplos sobre como acessar intervalos de dados na tabela." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
Maria2162162
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "Maria 21 62 162" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_series_exemplo, index=['Ana','Maria'])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAltura
Ana55162
Maria62162
\n", "
" ], "text/plain": [ " Peso Altura\n", "Ana 55 162\n", "Maria 62 162" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_series_exemplo, index=['Ana','Maria'], columns=['Peso','Altura'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Neste exemplo, adicionamos a coluna `IMC`, ainda sem valores calculados." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAlturaIMC
Ana55.0162.0NaN
Maria62.0162.0NaN
PaulaNaNNaNNaN
\n", "
" ], "text/plain": [ " Peso Altura IMC\n", "Ana 55.0 162.0 NaN\n", "Maria 62.0 162.0 NaN\n", "Paula NaN NaN NaN" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_series_exemplo, index=['Ana','Maria','Paula'], \n", " columns=['Peso','Altura','IMC'])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df_exemplo_IMC = pd.DataFrame(dicionario_series_exemplo, \n", " columns=['Peso','Altura','IMC'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Agora, mostramos como os valores do IMC podem ser calculados diretamente por computação vetorizada sobre as *Series*." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "df_exemplo_IMC['IMC']=round(df_exemplo_IMC['Peso']/(df_exemplo_IMC['Altura']/100)**2,2)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAlturaIMC
Ana5516220.96
João8017825.25
Maria6216223.62
Pedro6716524.61
Túlio7317124.96
\n", "
" ], "text/plain": [ " Peso Altura IMC\n", "Ana 55 162 20.96\n", "João 80 178 25.25\n", "Maria 62 162 23.62\n", "Pedro 67 165 24.61\n", "Túlio 73 171 24.96" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo_IMC" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### *DataFrame* a partir de dicionários de listas ou *arrays* do *numpy*\n", "\n", "Neste método de criação, os *arrays* ou as listas **devem** possuir o mesmo comprimento. Se o *index* não for informado, o *index* será dado de forma similar ao do objeto tipo *Series*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplo com dicionário de listas:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "dicionario_lista_exemplo = {'Idade': [20,19,21,22,20],\n", " 'Peso': [55,80,62,67,73],\n", " 'Altura': [162,178,162,165,171]}" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
02055162
11980178
22162162
32267165
42073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "0 20 55 162\n", "1 19 80 178\n", "2 21 62 162\n", "3 22 67 165\n", "4 20 73 171" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_lista_exemplo)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mais exemplos:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_lista_exemplo, index=['Ana','João','Maria','Pedro','Túlio'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplos com dicionário de *arrays* do *numpy*:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "dicionario_array_exemplo = {'Idade': np.array([20,19,21,22,20]),\n", " 'Peso': np.array([55,80,62,67,73]),\n", " 'Altura': np.array([162,178,162,165,171])}" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
02055162
11980178
22162162
32267165
42073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "0 20 55 162\n", "1 19 80 178\n", "2 21 62 162\n", "3 22 67 165\n", "4 20 73 171" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_array_exemplo)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mais exemplos:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(dicionario_array_exemplo, index=['Ana','João','Maria','Pedro','Túlio'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### *DataFrame* a partir de uma *Series* do *pandas*\n", "\n", "Neste caso, o *DataFrame* terá o mesmo *index* que a *Series* do *pandas* e apenas uma coluna." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "series_exemplo = pd.Series({'Ana':20, 'João': 19, 'Maria': 21, 'Pedro': 22, 'Túlio': 20})" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
Ana20
João19
Maria21
Pedro22
Túlio20
\n", "
" ], "text/plain": [ " 0\n", "Ana 20\n", "João 19\n", "Maria 21\n", "Pedro 22\n", "Túlio 20" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(series_exemplo)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Caso a *Series* possua um atributo `name` especificado, este será o nome da coluna do *DataFrame*." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "series_exemplo_Idade = pd.Series({'Ana':20, 'João': 19, 'Maria': 21, 'Pedro': 22, 'Túlio': 20}, name=\"Idade\")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Idade
Ana20
João19
Maria21
Pedro22
Túlio20
\n", "
" ], "text/plain": [ " Idade\n", "Ana 20\n", "João 19\n", "Maria 21\n", "Pedro 22\n", "Túlio 20" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(series_exemplo_Idade)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### *DataFrame* a partir de lista de *Series* do *pandas*\n", "\n", "Neste caso, a entrada dos dados da lista no *DataFrame* será feita por linha." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AnaJoãoMariaPedroTúlio
Peso55.080.062.067.073.0
Altura162.0178.0162.0165.0171.0
Idade20.019.021.022.0NaN
\n", "
" ], "text/plain": [ " Ana João Maria Pedro Túlio\n", "Peso 55.0 80.0 62.0 67.0 73.0\n", "Altura 162.0 178.0 162.0 165.0 171.0\n", "Idade 20.0 19.0 21.0 22.0 NaN" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([serie_Peso, serie_Altura, serie_Idade])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos corrigir a orientação usando o método `transpose`." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAlturaIdade
Ana55.0162.020.0
João80.0178.019.0
Maria62.0162.021.0
Pedro67.0165.022.0
Túlio73.0171.0NaN
\n", "
" ], "text/plain": [ " Peso Altura Idade\n", "Ana 55.0 162.0 20.0\n", "João 80.0 178.0 19.0\n", "Maria 62.0 162.0 21.0\n", "Pedro 67.0 165.0 22.0\n", "Túlio 73.0 171.0 NaN" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame([serie_Peso, serie_Altura, serie_Idade]).transpose()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### *DataFrame* a partir de arquivos\n", "\n", "Para criar um *DataFrame* a partir de um arquivo, precisamos de funções do tipo `pd.read_FORMATO`, onde `FORMATO` indica o formato a ser importado sob o pressuposto de que a biblioteca *pandas* foi devidamente importada com `pd`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Os formatos mais comuns são: \n", "\n", "* *csv* (comma-separated values), \n", "* *xls* ou *xlsx* (formatos do Microsoft Excel),\n", "* *hdf5* (comumente utilizado em *big data*), \n", "* *json* (comumente utilizado em páginas da internet)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As funções para leitura correspondentes são: \n", "* `pd.read_csv`, \n", "* `pd.read_excel`, \n", "* `pd.read_hdf`, \n", "* `pd.read_json`, \n", "\n", "respectivamente." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "De todas elas, a função mais utilizada é `read_csv`. Ela possui vários argumentos. Vejamos os mais utilizados:\n", "\n", "* `file_path_or_buffer`: o endereço do arquivo a ser lido. Pode ser um endereço da internet.\n", "* `sep`: o separador entre as entradas de dados. O separador padrão é `,`.\n", "* `index_col`: a coluna que deve ser usada para formar o *index*. O padrão é `None`. Porém pode ser alterado para outro. Um separador comumente encontrado é o `\\t` (TAB).\n", "* `names`: nomes das colunas a serem usadas. O padrão é `None`.\n", "* `header`: número da linha que servirá como nome para as colunas. O padrão é `infer` (ou seja, tenta deduzir automaticamente). Se os nomes das colunas forem passados através do `names`, então `header` será automaticamente considerado como `None`. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Exemplo:** considere o arquivo `data/exemplo_data.csv` contendo:\n", "\n", "```\n", ",coluna_1,coluna_2\n", "2020-01-01,-0.4160923582996922,1.8103644347460834\n", "2020-01-02,-0.1379696602473578,2.5785204825192785\n", "2020-01-03,0.5758273450544708,0.06086648807755068\n", "2020-01-04,-0.017367186564883633,1.2995865328684455\n", "2020-01-05,1.3842792448510655,-0.3817320973859929\n", "2020-01-06,0.5497056238566345,-1.308789022968975\n", "2020-01-07,-0.2822962331437976,-1.6889791765925102\n", "2020-01-08,-0.9897300598660013,-0.028120707936426497\n", "2020-01-09,0.27558240737928663,-0.1776585993494299\n", "2020-01-10,0.6851316082235455,0.5025348904591399\n", "``` \n", "\n", "Para ler o arquivo acima basta fazer:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df_exemplo_0 = pd.read_csv('../database/exemplo_data.csv')" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0coluna_1coluna_2
02020-01-01-0.4160921.810364
12020-01-02-0.1379702.578520
22020-01-030.5758270.060866
32020-01-04-0.0173671.299587
42020-01-051.384279-0.381732
52020-01-060.549706-1.308789
62020-01-07-0.282296-1.688979
72020-01-08-0.989730-0.028121
82020-01-090.275582-0.177659
92020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " Unnamed: 0 coluna_1 coluna_2\n", "0 2020-01-01 -0.416092 1.810364\n", "1 2020-01-02 -0.137970 2.578520\n", "2 2020-01-03 0.575827 0.060866\n", "3 2020-01-04 -0.017367 1.299587\n", "4 2020-01-05 1.384279 -0.381732\n", "5 2020-01-06 0.549706 -1.308789\n", "6 2020-01-07 -0.282296 -1.688979\n", "7 2020-01-08 -0.989730 -0.028121\n", "8 2020-01-09 0.275582 -0.177659\n", "9 2020-01-10 0.685132 0.502535" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo_0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "No exemplo anterior, as colunas receberam nomes corretamentes exceto pela primeira coluna que gostaríamos de considerar como *index*. Neste caso fazemos:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "df_exemplo = pd.read_csv('../database/exemplo_data.csv', index_col=0)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-01-0.4160921.810364
2020-01-02-0.1379702.578520
2020-01-030.5758270.060866
2020-01-04-0.0173671.299587
2020-01-051.384279-0.381732
2020-01-060.549706-1.308789
2020-01-07-0.282296-1.688979
2020-01-08-0.989730-0.028121
2020-01-090.275582-0.177659
2020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-01 -0.416092 1.810364\n", "2020-01-02 -0.137970 2.578520\n", "2020-01-03 0.575827 0.060866\n", "2020-01-04 -0.017367 1.299587\n", "2020-01-05 1.384279 -0.381732\n", "2020-01-06 0.549706 -1.308789\n", "2020-01-07 -0.282296 -1.688979\n", "2020-01-08 -0.989730 -0.028121\n", "2020-01-09 0.275582 -0.177659\n", "2020-01-10 0.685132 0.502535" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### O método *head* do *DataFrame*\n", "\n", "O método `head`, sem argumento, permite que visualizemos as 5 primeiras linhas do *DataFrame*." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-01-0.4160921.810364
2020-01-02-0.1379702.578520
2020-01-030.5758270.060866
2020-01-04-0.0173671.299587
2020-01-051.384279-0.381732
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-01 -0.416092 1.810364\n", "2020-01-02 -0.137970 2.578520\n", "2020-01-03 0.575827 0.060866\n", "2020-01-04 -0.017367 1.299587\n", "2020-01-05 1.384279 -0.381732" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se for passado um argumento com valor `n`, as `n` primeiras linhas são impressas." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-01-0.4160921.810364
2020-01-02-0.1379702.578520
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-01 -0.416092 1.810364\n", "2020-01-02 -0.137970 2.578520" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.head(2)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-01-0.4160921.810364
2020-01-02-0.1379702.578520
2020-01-030.5758270.060866
2020-01-04-0.0173671.299587
2020-01-051.384279-0.381732
2020-01-060.549706-1.308789
2020-01-07-0.282296-1.688979
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-01 -0.416092 1.810364\n", "2020-01-02 -0.137970 2.578520\n", "2020-01-03 0.575827 0.060866\n", "2020-01-04 -0.017367 1.299587\n", "2020-01-05 1.384279 -0.381732\n", "2020-01-06 0.549706 -1.308789\n", "2020-01-07 -0.282296 -1.688979" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.head(7)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### O método `tail` do *DataFrame*\n", "\n", "O método `tail`, sem argumento, retorna as últimas 5 linhas do *DataFrame*." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-060.549706-1.308789
2020-01-07-0.282296-1.688979
2020-01-08-0.989730-0.028121
2020-01-090.275582-0.177659
2020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-06 0.549706 -1.308789\n", "2020-01-07 -0.282296 -1.688979\n", "2020-01-08 -0.989730 -0.028121\n", "2020-01-09 0.275582 -0.177659\n", "2020-01-10 0.685132 0.502535" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.tail()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se for passado um argumento com valor `n`, as `n` últimas linhas são impressas." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-090.275582-0.177659
2020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-09 0.275582 -0.177659\n", "2020-01-10 0.685132 0.502535" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.tail(2)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-04-0.0173671.299587
2020-01-051.384279-0.381732
2020-01-060.549706-1.308789
2020-01-07-0.282296-1.688979
2020-01-08-0.989730-0.028121
2020-01-090.275582-0.177659
2020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-04 -0.017367 1.299587\n", "2020-01-05 1.384279 -0.381732\n", "2020-01-06 0.549706 -1.308789\n", "2020-01-07 -0.282296 -1.688979\n", "2020-01-08 -0.989730 -0.028121\n", "2020-01-09 0.275582 -0.177659\n", "2020-01-10 0.685132 0.502535" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.tail(7)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Atributos de *Series* e *DataFrames*\n", "\n", "Atributos comumente usados para *Series* e *DataFrames* são:\n", "\n", "* `shape`: fornece as dimensões do objeto em questão (*Series* ou *DataFrame*) em formato consistente com o atributo `shape` de um *array* do *numpy*.\n", "* `index`: fornece o índice do objeto. No caso do *DataFrame* são os rótulos das linhas.\n", "* `columns`: fornece as colunas (apenas disponível para *DataFrames*) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplo:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10, 2)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.shape" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "serie_1 = pd.Series([1,2,3,4,5])" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5,)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_1.shape" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05',\n", " '2020-01-06', '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10'],\n", " dtype='object')" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.index" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=5, step=1)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_1.index" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['coluna_1', 'coluna_2'], dtype='object')" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.columns" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se quisermos obter os dados contidos nos *index* ou nas *Series* podemos utilizar a propriedade `.array`." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "[0, 1, 2, 3, 4]\n", "Length: 5, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_1.index.array" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "['coluna_1', 'coluna_2']\n", "Length: 2, dtype: object" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.columns.array" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se o interesse for obter os dados como um `array` do *numpy*, devemos utilizar o método `.to_numpy()`.\n", "\n", "Exemplo:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_1.index.to_numpy()" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['coluna_1', 'coluna_2'], dtype=object)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.columns.to_numpy()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "O método `.to_numpy()` também está disponível em *DataFrames*:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.41609236, 1.81036443],\n", " [-0.13796966, 2.57852048],\n", " [ 0.57582735, 0.06086649],\n", " [-0.01736719, 1.29958653],\n", " [ 1.38427924, -0.3817321 ],\n", " [ 0.54970562, -1.30878902],\n", " [-0.28229623, -1.68897918],\n", " [-0.98973006, -0.02812071],\n", " [ 0.27558241, -0.1776586 ],\n", " [ 0.68513161, 0.50253489]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.to_numpy()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A função do *numpy* `asarray()` é compatível com *index*, *columns* e *DataFrames* do *pandas*:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',\n", " '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',\n", " '2020-01-09', '2020-01-10'], dtype=object)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.asarray(df_exemplo.index)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['coluna_1', 'coluna_2'], dtype=object)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.asarray(df_exemplo.columns)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.41609236, 1.81036443],\n", " [-0.13796966, 2.57852048],\n", " [ 0.57582735, 0.06086649],\n", " [-0.01736719, 1.29958653],\n", " [ 1.38427924, -0.3817321 ],\n", " [ 0.54970562, -1.30878902],\n", " [-0.28229623, -1.68897918],\n", " [-0.98973006, -0.02812071],\n", " [ 0.27558241, -0.1776586 ],\n", " [ 0.68513161, 0.50253489]])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.asarray(df_exemplo)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Informações sobre as colunas de um *DataFrame*\n", "\n", "Para obtermos uma breve descrição sobre as colunas de um *DataFrame* utilizamos o método `info`.\n", "\n", "Exemplo:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Index: 10 entries, 2020-01-01 to 2020-01-10\n", "Data columns (total 2 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 coluna_1 10 non-null float64\n", " 1 coluna_2 10 non-null float64\n", "dtypes: float64(2)\n", "memory usage: 240.0+ bytes\n" ] } ], "source": [ "df_exemplo.info()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Criando arquivos a partir de *DataFrames*\n", "\n", "Para criar arquivos a partir de *DataFrames*, basta utilizar os métodos do tipo `pd.to_FORMATO`, onde `FORMATO` indica o formato a ser exportado e supondo que a biblioteca *pandas* foi importada com `pd`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Com relação aos tipos de arquivo anteriores, os métodos para exportação correspondentes são:\n", "* `.to_csv` ('endereço_do_arquivo'), \n", "* `.to_excel` ('endereço_do_arquivo'), \n", "* `.to_hdf` ('endereço_do_arquivo'), \n", "* `.to_json`('endereço_do_arquivo'), \n", "\n", "onde `endereço_do_arquivo` é uma `str` que contém o endereço do arquivo a ser exportado." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplo:\n", " \n", "Para exportar para o arquivo `exemplo_novo.csv`, utilizaremos o método `.to_csv` ao *DataFrame* `df_exemplo`:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "df_exemplo.to_csv('../database/exemplo_novo.csv')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exemplo COVID-19 PB" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Dados diários de COVID-19 do estado da Paraíba:\n", "\n", "*Fonte: https://superset.plataformatarget.com.br/superset/dashboard/microdados/*" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "dados_covid_PB = pd.read_csv('https://superset.plataformatarget.com.br/superset/explore_json/?form_data=%7B%22slice_id%22%3A1550%7D&csv=true', \n", " sep=',', index_col=0)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Index: 559 entries, 2021-09-26 to 2020-03-16\n", "Data columns (total 7 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 casosAcumulados 559 non-null int64 \n", " 1 casosNovos 559 non-null int64 \n", " 2 descartados 559 non-null int64 \n", " 3 recuperados 559 non-null int64 \n", " 4 obitosAcumulados 559 non-null int64 \n", " 5 obitosNovos 558 non-null float64\n", " 6 Letalidade 559 non-null float64\n", "dtypes: float64(2), int64(5)\n", "memory usage: 34.9+ KB\n" ] } ], "source": [ "dados_covid_PB.info()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-09-2644115513448433633375192942.00.0211
2021-09-2544102116548433133344192922.00.0211
2021-09-2444085622548430133284592903.00.0211
2021-09-2344063120348427633254092873.00.0211
2021-09-2244042829748426533195892846.00.0211
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-09-26 441155 134 484336 333751 \n", "2021-09-25 441021 165 484331 333441 \n", "2021-09-24 440856 225 484301 332845 \n", "2021-09-23 440631 203 484276 332540 \n", "2021-09-22 440428 297 484265 331958 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-09-26 9294 2.0 0.0211 \n", "2021-09-25 9292 2.0 0.0211 \n", "2021-09-24 9290 3.0 0.0211 \n", "2021-09-23 9287 3.0 0.0211 \n", "2021-09-22 9284 6.0 0.0211 " ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.head()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2020-03-20000000.00.0
2020-03-19000000.00.0
2020-03-18000000.00.0
2020-03-17000000.00.0
2020-03-16000000.00.0
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2020-03-20 0 0 0 0 \n", "2020-03-19 0 0 0 0 \n", "2020-03-18 0 0 0 0 \n", "2020-03-17 0 0 0 0 \n", "2020-03-16 0 0 0 0 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2020-03-20 0 0.0 0.0 \n", "2020-03-19 0 0.0 0.0 \n", "2020-03-18 0 0.0 0.0 \n", "2020-03-17 0 0.0 0.0 \n", "2020-03-16 0 0.0 0.0 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.tail()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "dados_covid_PB['estado'] = 'PB'" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidadeestado
data
2021-09-2644115513448433633375192942.00.0211PB
2021-09-2544102116548433133344192922.00.0211PB
2021-09-2444085622548430133284592903.00.0211PB
2021-09-2344063120348427633254092873.00.0211PB
2021-09-2244042829748426533195892846.00.0211PB
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-09-26 441155 134 484336 333751 \n", "2021-09-25 441021 165 484331 333441 \n", "2021-09-24 440856 225 484301 332845 \n", "2021-09-23 440631 203 484276 332540 \n", "2021-09-22 440428 297 484265 331958 \n", "\n", " obitosAcumulados obitosNovos Letalidade estado \n", "data \n", "2021-09-26 9294 2.0 0.0211 PB \n", "2021-09-25 9292 2.0 0.0211 PB \n", "2021-09-24 9290 3.0 0.0211 PB \n", "2021-09-23 9287 3.0 0.0211 PB \n", "2021-09-22 9284 6.0 0.0211 PB " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.head()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "dados_covid_PB.to_csv('../database/dadoscovidpb.csv')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Índices dos valores máximos ou mínimos\n", "\n", "Os métodos `idxmin()` e `idxmax()` retornam o *index* cuja entrada fornece o valor mínimo ou máximo da *Series* ou *DataFrame*. Se houver múltiplas ocorrências de mínimos ou máximos, o método retorna a primeira ocorrência.\n", "\n", "Vamos recriar um *DataFrame* genérico." ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "serie_Idade = pd.Series({'Ana':20, 'João': 19, 'Maria': 21, 'Pedro': 22, 'Túlio': 20}, name=\"Idade\")\n", "serie_Peso = pd.Series({'Ana':55, 'João': 80, 'Maria': 62, 'Pedro': 67, 'Túlio': 73}, name=\"Peso\")\n", "serie_Altura = pd.Series({'Ana':162, 'João': 178, 'Maria': 162, 'Pedro': 165, 'Túlio': 171}, name=\"Altura\")" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "dicionario_series_exemplo = {'Idade': serie_Idade, 'Peso': serie_Peso, 'Altura': serie_Altura}" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "df_dict_series = pd.DataFrame(dicionario_series_exemplo)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assim, podemos localizar quem possui menores idade, peso e altura." ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "Idade João\n", "Peso Ana\n", "Altura Ana\n", "dtype: object" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.idxmin()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "De igual forma, localizamos quem possui maiores idade, peso e altura." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Idade Pedro\n", "Peso João\n", "Altura João\n", "dtype: object" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.idxmax()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Exemplo:** Aplicaremos as funções `idxmin()` e `idxmax()` aos dados do arquivo `data/exemplo_data.csv` para localizar entradas de interesse." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2
2020-01-01-0.4160921.810364
2020-01-02-0.1379702.578520
2020-01-030.5758270.060866
2020-01-04-0.0173671.299587
2020-01-051.384279-0.381732
2020-01-060.549706-1.308789
2020-01-07-0.282296-1.688979
2020-01-08-0.989730-0.028121
2020-01-090.275582-0.177659
2020-01-100.6851320.502535
\n", "
" ], "text/plain": [ " coluna_1 coluna_2\n", "2020-01-01 -0.416092 1.810364\n", "2020-01-02 -0.137970 2.578520\n", "2020-01-03 0.575827 0.060866\n", "2020-01-04 -0.017367 1.299587\n", "2020-01-05 1.384279 -0.381732\n", "2020-01-06 0.549706 -1.308789\n", "2020-01-07 -0.282296 -1.688979\n", "2020-01-08 -0.989730 -0.028121\n", "2020-01-09 0.275582 -0.177659\n", "2020-01-10 0.685132 0.502535" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo = pd.read_csv('../database/exemplo_data.csv', index_col=0); df_exemplo" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df_exemplo = pd.DataFrame(df_exemplo, columns=['coluna_1','coluna_2','coluna_3'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inserimos uma terceira coluna com dados fictícios." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2coluna_3
2020-01-01-0.4160921.8103641.0
2020-01-02-0.1379702.5785202.0
2020-01-030.5758270.0608663.0
2020-01-04-0.0173671.2995874.0
2020-01-051.384279-0.3817325.0
2020-01-060.549706-1.3087896.0
2020-01-07-0.282296-1.6889797.0
2020-01-08-0.989730-0.0281218.0
2020-01-090.275582-0.177659NaN
2020-01-100.6851320.502535NaN
\n", "
" ], "text/plain": [ " coluna_1 coluna_2 coluna_3\n", "2020-01-01 -0.416092 1.810364 1.0\n", "2020-01-02 -0.137970 2.578520 2.0\n", "2020-01-03 0.575827 0.060866 3.0\n", "2020-01-04 -0.017367 1.299587 4.0\n", "2020-01-05 1.384279 -0.381732 5.0\n", "2020-01-06 0.549706 -1.308789 6.0\n", "2020-01-07 -0.282296 -1.688979 7.0\n", "2020-01-08 -0.989730 -0.028121 8.0\n", "2020-01-09 0.275582 -0.177659 NaN\n", "2020-01-10 0.685132 0.502535 NaN" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo['coluna_3'] = pd.Series([1,2,3,4,5,6,7,8,np.nan,np.nan],index=df_exemplo.index)\n", "df_exemplo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Os *index* correspondentes aos menores e maiores valores são datas, evidentemente." ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "coluna_1 2020-01-08\n", "coluna_2 2020-01-07\n", "coluna_3 2020-01-01\n", "dtype: object" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.idxmin()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "coluna_1 2020-01-05\n", "coluna_2 2020-01-02\n", "coluna_3 2020-01-08\n", "dtype: object" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.idxmax()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Reindexação de *DataFrames*\n", "\n", "No *pandas*, o método `reindex`\n", "\n", "* reordena o *DataFrame* de acordo com o conjunto de rótulos inserido como argumento;\n", "* insere valores faltantes caso um rótulo do novo *index* não tenha valor atribuído no conjunto de dados;\n", "* remove valores correspondentes a rótulos que não estão presentes no novo *index*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplos:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AlturaPesoIMC
VictorNaNNaNNaN
Túlio171.073.0NaN
Pedro165.067.0NaN
João178.080.0NaN
\n", "
" ], "text/plain": [ " Altura Peso IMC\n", "Victor NaN NaN NaN\n", "Túlio 171.0 73.0 NaN\n", "Pedro 165.0 67.0 NaN\n", "João 178.0 80.0 NaN" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.reindex(index=['Victor', 'Túlio', 'Pedro', 'João'], columns=['Altura','Peso','IMC'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Remoção de linhas ou colunas de um *DataFrame*\n", "\n", "Para remover linhas ou colunas de um *DataFrame* do *pandas* podemos utilizar o método `drop`. O argumento `axis` identifica o eixo de remoção: `axis=0`, que é o padrão, indica a remoção de uma ou mais linhas; `axis=1` indica a remoção de uma ou mais colunas." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Nos exemplos que segue, note que novos *DataFrames* são obtidos a partir de `df_dict_series` sem que este seja sobrescrito." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.drop('Túlio') # axis=0 implícito " ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
João1980178
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "João 19 80 178\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.drop(['Ana','Maria'], axis=0)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAltura
Ana55162
João80178
Maria62162
Pedro67165
Túlio73171
\n", "
" ], "text/plain": [ " Peso Altura\n", "Ana 55 162\n", "João 80 178\n", "Maria 62 162\n", "Pedro 67 165\n", "Túlio 73 171" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.drop(['Idade'], axis=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Renomeando *index* e *columns*\n", "\n", "O método `rename` retorna uma cópia na qual o *index* (no caso de *Series* e *DataFrames*) e *columns* (no caso de *DataFrames*) foram renomeados. O método aceita como entrada um dicionário, uma *Series* do *pandas* ou uma função." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplo:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "serie_exemplo = pd.Series([1,2,3], index=['a','b','c'])" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "b 2\n", "c 3\n", "dtype: int64" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "abacaxi 1\n", "banana 2\n", "cebola 3\n", "dtype: int64" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo.rename({'a':'abacaxi', 'b':'banana', 'c': 'cebola'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Exemplo:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IPA
a2055162
j1980178
m2162162
p2267165
t2073171
\n", "
" ], "text/plain": [ " I P A\n", "a 20 55 162\n", "j 19 80 178\n", "m 21 62 162\n", "p 22 67 165\n", "t 20 73 171" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.rename(index = {'Ana':'a', 'João':'j', 'Maria':'m', 'Pedro':'p','Túlio':'t'},\n", " columns = {'Idade':'I', 'Peso':'P','Altura':'A'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No próximo exemplo, usamos uma *Series* para renomear os rótulos." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "indice_novo = pd.Series({'Ana':'a', 'João':'j', 'Maria':'m', 'Pedro':'p','Túlio':'t'})" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
a2055162
j1980178
m2162162
p2267165
t2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "a 20 55 162\n", "j 19 80 178\n", "m 21 62 162\n", "p 22 67 165\n", "t 20 73 171" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.rename(index = indice_novo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neste exemplo, usamos a função `str.upper` (altera a `str` para \"todas maiúsculas\") para renomear colunas." ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDADEPESOALTURA
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " IDADE PESO ALTURA\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.rename(columns=str.upper)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Ordenação de *Series* e *DataFrames*\n", "\n", "É possível ordenar ambos pelos rótulos do *index* (para tanto é necessário que eles sejam ordenáveis) ou por valores nas colunas. \n", "\n", "O método `sort_index` ordena a *Series* ou o *DataFrame* pelo *index*. O método `sort_values` ordena a *Series* ou o *DataFrame* pelos valores (escolhendo uma ou mais colunas no caso de *DataFrames*). No caso do *DataFrame*, o argumento `by` é necessário para indicar qual(is) coluna(s) será(ão) utilizada(s) como base para a ordenação." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplos:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Maria 21\n", "Pedro 22\n", "Túlio 20\n", "João 19\n", "Ana 20\n", "dtype: int64" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_desordenada = pd.Series({'Maria': 21, 'Pedro': 22, 'Túlio': 20, 'João': 19, 'Ana':20}); \n", "serie_desordenada" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Ana 20\n", "João 19\n", "Maria 21\n", "Pedro 22\n", "Túlio 20\n", "dtype: int64" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_desordenada.sort_index() # ordenação alfabética" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mais exemplos:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [], "source": [ "df_desordenado = df_dict_series.reindex(index=['Pedro','Maria','Ana','Túlio','João'])" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Pedro2267165
Maria2162162
Ana2055162
Túlio2073171
João1980178
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Pedro 22 67 165\n", "Maria 21 62 162\n", "Ana 20 55 162\n", "Túlio 20 73 171\n", "João 19 80 178" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado.sort_index()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mais exemplos:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "João 19\n", "Túlio 20\n", "Ana 20\n", "Maria 21\n", "Pedro 22\n", "dtype: int64" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_desordenada.sort_values()" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Maria2162162
Ana2055162
Pedro2267165
Túlio2073171
João1980178
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Maria 21 62 162\n", "Ana 20 55 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171\n", "João 19 80 178" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado.sort_values(by=['Altura']) # ordena por 'Altura'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "No caso de \"empate\", podemos utilizar outra coluna para desempatar." ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
Maria2162162
Pedro2267165
Túlio2073171
João1980178
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171\n", "João 19 80 178" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado.sort_values(by=['Altura','Peso']) # usa a coluna 'Peso' para desempatar" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Os métodos `sort_index` e `sort_values` admitem o argumento opcional `ascending`, que permite inverter a ordenação caso tenha valor `False`." ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Túlio2073171
Pedro2267165
Maria2162162
João1980178
Ana2055162
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Túlio 20 73 171\n", "Pedro 22 67 165\n", "Maria 21 62 162\n", "João 19 80 178\n", "Ana 20 55 162" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado.sort_index(ascending=False)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Pedro2267165
Maria2162162
Ana2055162
Túlio2073171
João1980178
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Pedro 22 67 165\n", "Maria 21 62 162\n", "Ana 20 55 162\n", "Túlio 20 73 171\n", "João 19 80 178" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_desordenado.sort_values(by=['Idade'], ascending=False)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Comparação de *Series* e *DataFrames*\n", "\n", "*Series* e *DataFrames* possuem os seguintes métodos de comparação lógica: \n", "\n", "- `eq` (igual);\n", "- `ne` (diferente);\n", "- `lt` (menor do que);\n", "- `gt` (maior do que);\n", "- `le` (menor ou igual a); \n", "- `ge` (maior ou igual a)\n", "\n", "que permitem a utilização dos operadores binários `==`, `!=`, `<`, `>`, `<=` e `>=`, respectivamente. As comparações são realizadas em cada entrada da *Series* ou do *DataFrame*.\n", "\n", "**Observação**: Para que esses métodos sejam aplicados, todos os objetos presentes nas colunas do *DataFrame* devem ser de mesma natureza. Por exemplo, se um *DataFrame* possui algumas colunas numéricas e outras com *strings*, ao realizar uma comparação do tipo `> 1`, um erro ocorrerá, pois o *pandas* tentará comparar objetos do tipo `int` com objetos do tipo `str`, assim gerando uma incompatibilidade.\n", "\n", "Exemplos:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "b 2\n", "c 3\n", "dtype: int64" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "a False\n", "b True\n", "c False\n", "dtype: bool" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo == 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "De outra forma:" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a False\n", "b True\n", "c False\n", "dtype: bool" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo.eq(2)" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a False\n", "b True\n", "c True\n", "dtype: bool" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo > 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ou, na forma funcional:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a False\n", "b True\n", "c True\n", "dtype: bool" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo.gt(1)" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2coluna_3
2020-01-01FalseTrueFalse
2020-01-02FalseTrueTrue
2020-01-03FalseFalseTrue
2020-01-04FalseTrueTrue
2020-01-05TrueFalseTrue
2020-01-06FalseFalseTrue
2020-01-07FalseFalseTrue
2020-01-08FalseFalseTrue
2020-01-09FalseFalseFalse
2020-01-10FalseFalseFalse
\n", "
" ], "text/plain": [ " coluna_1 coluna_2 coluna_3\n", "2020-01-01 False True False\n", "2020-01-02 False True True\n", "2020-01-03 False False True\n", "2020-01-04 False True True\n", "2020-01-05 True False True\n", "2020-01-06 False False True\n", "2020-01-07 False False True\n", "2020-01-08 False False True\n", "2020-01-09 False False False\n", "2020-01-10 False False False" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo > 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Importante:** Ao comparar *np.nan*, o resultado tipicamente é falso:" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan == np.nan" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan > np.nan" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan >= np.nan" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Só é verdadeiro para indicar que é diferente:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan != np.nan" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Neste sentido, podemos ter tabelas iguais sem que a comparação usual funcione:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# 'copy', como o nome sugere, gera uma cópia do DataFrame\n", "df_exemplo_2 = df_exemplo.copy() " ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
coluna_1coluna_2coluna_3
2020-01-01TrueTrueTrue
2020-01-02TrueTrueTrue
2020-01-03TrueTrueTrue
2020-01-04TrueTrueTrue
2020-01-05TrueTrueTrue
2020-01-06TrueTrueTrue
2020-01-07TrueTrueTrue
2020-01-08TrueTrueTrue
2020-01-09TrueTrueFalse
2020-01-10TrueTrueFalse
\n", "
" ], "text/plain": [ " coluna_1 coluna_2 coluna_3\n", "2020-01-01 True True True\n", "2020-01-02 True True True\n", "2020-01-03 True True True\n", "2020-01-04 True True True\n", "2020-01-05 True True True\n", "2020-01-06 True True True\n", "2020-01-07 True True True\n", "2020-01-08 True True True\n", "2020-01-09 True True False\n", "2020-01-10 True True False" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df_exemplo == df_exemplo_2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "O motivo de haver entradas como `False` ainda que `df_exemplo_2` seja uma cópia exata de `df_exemplo` é a presença do `np.nan`. Neste caso, devemos utilizar o método `equals` para realizar a comparação." ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.equals(df_exemplo_2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Os métodos `any`, `all` e a propriedade `empty`\n", "\n", "O método `any` é aplicado a entradas booleanas (verdadeiras ou falsas) e retorna *verdadeiro* se existir alguma entrada verdadeira, ou *falso*, se todas forem falsas. O método `all` é aplicado a entradas booleanas e retorna *verdadeiro* se todas as entradas forem verdadeiras, ou *falso*, se houver pelo menos uma entrada falsa. A propriedade `empty` retorna *verdadeiro* se a *Series* ou o *DataFrame* estiver vazio, ou *falso* caso contrário." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exemplos:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "b 2\n", "c 3\n", "dtype: int64" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a -1\n", "b 0\n", "c 1\n", "dtype: int64" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo_2 = serie_exemplo-2; \n", "serie_exemplo_2" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(serie_exemplo_2 > 0).any()" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(serie_exemplo > 1).all()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Este exemplo reproduz um valor `False` único." ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df_exemplo == df_exemplo_2).all().all()" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "serie_exemplo.empty" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Mais exemplos:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "coluna_1 True\n", "coluna_2 True\n", "coluna_3 True\n", "dtype: bool" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(df_exemplo == df_exemplo_2).any()" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_exemplo.empty" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "df_vazio = pd.DataFrame()" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_vazio.empty" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Seleção de colunas de um *DataFrame*\n", "\n", "Para selecionar colunas de um *DataFrame*, basta aplicar *colchetes* a uma lista contendo os nomes das colunas de interesse." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "No exemplo abaixo, temos um *DataFrame* contendo as colunas `'Idade'`, `'Peso'` e `'Altura'`. Selecionaremos `'Peso'` e `'Altura'`, apenas." ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PesoAltura
Ana55162
João80178
Maria62162
Pedro67165
Túlio73171
\n", "
" ], "text/plain": [ " Peso Altura\n", "Ana 55 162\n", "João 80 178\n", "Maria 62 162\n", "Pedro 67 165\n", "Túlio 73 171" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series[['Peso','Altura']]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se quisermos selecionar apenas uma coluna, não há necessidade de inserir uma lista. Basta utilizar o nome da coluna:" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Ana 55\n", "João 80\n", "Maria 62\n", "Pedro 67\n", "Túlio 73\n", "Name: Peso, dtype: int64" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series['Peso']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Para remover colunas, podemos utilizar o método `drop`." ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Idade
Ana20
João19
Maria21
Pedro22
Túlio20
\n", "
" ], "text/plain": [ " Idade\n", "Ana 20\n", "João 19\n", "Maria 21\n", "Pedro 22\n", "Túlio 20" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.drop(['Peso','Altura'], axis=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Criação de novas colunas a partir de colunas existentes\n", "\n", "Um método eficiente para criar novas colunas a partir de colunas já existentes é `eval`. Neste método, podemos utilizar como argumento uma *string* contendo uma expressão matemática envolvendo nomes de colunas do *DataFrame*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Como exemplo, vamos ver como calcular o IMC no *DataFrame* anterior:" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Ana 20.957171\n", "João 25.249337\n", "Maria 23.624447\n", "Pedro 24.609734\n", "Túlio 24.964946\n", "dtype: float64" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.eval('Peso/(Altura/100)**2')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se quisermos obter um *DataFrame* contendo o IMC como uma nova coluna, podemos utilizar o método `assign` (sem modificar o *DataFrame* original):" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMC
Ana205516220.96
João198017825.25
Maria216216223.62
Pedro226716524.61
Túlio207317124.96
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC\n", "Ana 20 55 162 20.96\n", "João 19 80 178 25.25\n", "Maria 21 62 162 23.62\n", "Pedro 22 67 165 24.61\n", "Túlio 20 73 171 24.96" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series.assign(IMC=round(df_dict_series.eval('Peso/(Altura/100)**2'),2))" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAltura
Ana2055162
João1980178
Maria2162162
Pedro2267165
Túlio2073171
\n", "
" ], "text/plain": [ " Idade Peso Altura\n", "Ana 20 55 162\n", "João 19 80 178\n", "Maria 21 62 162\n", "Pedro 22 67 165\n", "Túlio 20 73 171" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series # não modificado" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se quisermos modificar o *DataFrame* para incluir a coluna IMC fazemos:" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "df_dict_series['IMC']=round(df_dict_series.eval('Peso/(Altura/100)**2'),2)" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMC
Ana205516220.96
João198017825.25
Maria216216223.62
Pedro226716524.61
Túlio207317124.96
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC\n", "Ana 20 55 162 20.96\n", "João 19 80 178 25.25\n", "Maria 21 62 162 23.62\n", "Pedro 22 67 165 24.61\n", "Túlio 20 73 171 24.96" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series # modificado \"in-place\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Seleção de linhas de um *DataFrame*\n", "\n", "Podemos selecionar linhas de um *DataFrame* de diversas formas diferentes. Veremos agora algumas dessas formas.\n", "\n", "Diferentemente da forma de selecionar colunas, para selecionar diretamente linhas de um *DataFrame* devemos utilizar o método `loc` (fornecendo o *index*, isto é, o rótulo da linha) ou o `iloc` (fornecendo a posição da linha):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trabalharemos a seguir com um banco de dados atualizado sobre a COVID-19. Para tanto, importaremos o módulo `datetime` que nos auxiliará com datas." ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [], "source": [ "import datetime" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "dados_covid_PB = pd.read_csv('https://superset.plataformatarget.com.br/superset/explore_json/?form_data=%7B%22slice_id%22%3A1550%7D&csv=true', \n", " sep=',', index_col=0)\n", "\n", "# busca o banco na data D-1, visto que a atualização\n", "# ocorre em D\n", "ontem = (datetime.date.today() - datetime.timedelta(days=1)).strftime('%Y-%m-%d') " ] }, { "cell_type": "code", "execution_count": 125, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-09-2644115513448433633375192942.00.0211
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-09-26 441155 134 484336 333751 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-09-26 9294 2.0 0.0211 " ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.head(1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos ver as informações de um único dia como argumento. Para tanto, excluímos a coluna `'Letalidade'` (valor não inteiro) e convertemos o restante para valores inteiros:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "casosAcumulados 441155\n", "casosNovos 134\n", "descartados 484336\n", "recuperados 333751\n", "obitosAcumulados 9294\n", "obitosNovos 2\n", "Name: 2021-09-26, dtype: int64" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[ontem].drop('Letalidade').astype('int')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos selecionar um intervalo de datas como argumento (excluindo letalidade):" ] }, { "cell_type": "code", "execution_count": 127, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovos
2021-02-011925981014234215149235406812.0
2021-02-02193465867234366149242408214.0
2021-02-031945191054234902149248409614.0
2021-02-04195482963235319149792411014.0
2021-02-051965421060236216150169412313.0
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "2021-02-01 192598 1014 234215 149235 \n", "2021-02-02 193465 867 234366 149242 \n", "2021-02-03 194519 1054 234902 149248 \n", "2021-02-04 195482 963 235319 149792 \n", "2021-02-05 196542 1060 236216 150169 \n", "\n", " obitosAcumulados obitosNovos \n", "2021-02-01 4068 12.0 \n", "2021-02-02 4082 14.0 \n", "2021-02-03 4096 14.0 \n", "2021-02-04 4110 14.0 \n", "2021-02-05 4123 13.0 " ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.index = pd.to_datetime(dados_covid_PB.index) # Convertendo o index de string para data\n", "dados_covid_PB.loc[pd.date_range('2021-02-01',periods=5,freq=\"D\")].drop('Letalidade',axis=1) \n", " #função pd.date_range é muito útil para criar índices a partir de datas." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos colocar uma lista como argumento:" ] }, { "cell_type": "code", "execution_count": 128, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
2021-01-0116706257821989012738836808.00.0220
2021-02-011925981014234215149235406812.00.0211
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "2021-01-01 167062 578 219890 127388 \n", "2021-02-01 192598 1014 234215 149235 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "2021-01-01 3680 8.0 0.0220 \n", "2021-02-01 4068 12.0 0.0211 " ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[pd.to_datetime(['2021-01-01','2021-02-01'])]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Vamos agora examinar os dados da posição 100 (novamente excluindo a coluna letalidade e convertendo para inteiro):" ] }, { "cell_type": "code", "execution_count": 129, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "casosAcumulados 371133\n", "casosNovos 2555\n", "descartados 328941\n", "recuperados 243203\n", "obitosAcumulados 8316\n", "obitosNovos 47\n", "Name: 2021-06-18 00:00:00, dtype: int64" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.iloc[100].drop('Letalidade').astype('int')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos colocar um intervalo como argumento:" ] }, { "cell_type": "code", "execution_count": 130, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovos
data
2021-08-07424983406456236298889904912
2021-08-06424577423455493298554903714
2021-08-0542415457045435729794590235
2021-08-0442358487245314629722290187
2021-08-034227123745135829618090118
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-08-07 424983 406 456236 298889 \n", "2021-08-06 424577 423 455493 298554 \n", "2021-08-05 424154 570 454357 297945 \n", "2021-08-04 423584 872 453146 297222 \n", "2021-08-03 422712 37 451358 296180 \n", "\n", " obitosAcumulados obitosNovos \n", "data \n", "2021-08-07 9049 12 \n", "2021-08-06 9037 14 \n", "2021-08-05 9023 5 \n", "2021-08-04 9018 7 \n", "2021-08-03 9011 8 " ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.iloc[50:55].drop('Letalidade', axis=1).astype('int') " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Seleção de colunas com `loc` e `iloc`\n", "\n", "Podemos selecionar colunas utilizando os métodos `loc` e `iloc` utilizando um argumento adicional." ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosNovosobitosNovos
data
2021-09-261342.0
2021-09-251652.0
2021-09-242253.0
2021-09-232033.0
2021-09-222976.0
.........
2020-03-2000.0
2020-03-1900.0
2020-03-1800.0
2020-03-1700.0
2020-03-1600.0
\n", "

559 rows × 2 columns

\n", "
" ], "text/plain": [ " casosNovos obitosNovos\n", "data \n", "2021-09-26 134 2.0\n", "2021-09-25 165 2.0\n", "2021-09-24 225 3.0\n", "2021-09-23 203 3.0\n", "2021-09-22 297 6.0\n", "... ... ...\n", "2020-03-20 0 0.0\n", "2020-03-19 0 0.0\n", "2020-03-18 0 0.0\n", "2020-03-17 0 0.0\n", "2020-03-16 0 0.0\n", "\n", "[559 rows x 2 columns]" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[:,['casosNovos','obitosNovos']]" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
obitosAcumuladosobitosNovos
data
2021-09-2692942.0
2021-09-2592922.0
2021-09-2492903.0
2021-09-2392873.0
2021-09-2292846.0
.........
2020-03-2000.0
2020-03-1900.0
2020-03-1800.0
2020-03-1700.0
2020-03-1600.0
\n", "

559 rows × 2 columns

\n", "
" ], "text/plain": [ " obitosAcumulados obitosNovos\n", "data \n", "2021-09-26 9294 2.0\n", "2021-09-25 9292 2.0\n", "2021-09-24 9290 3.0\n", "2021-09-23 9287 3.0\n", "2021-09-22 9284 6.0\n", "... ... ...\n", "2020-03-20 0 0.0\n", "2020-03-19 0 0.0\n", "2020-03-18 0 0.0\n", "2020-03-17 0 0.0\n", "2020-03-16 0 0.0\n", "\n", "[559 rows x 2 columns]" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.iloc[:,4:6] # fatiamento na coluna" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Seleção de linhas e colunas específicas com `loc` e `iloc`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Usando o mesmo princípio de *fatiamento* aplicado a *arrays* do numpy, podemos selecionar linhas e colunas em um intervalo específico de forma a obter uma subtabela." ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
obitosAcumuladosobitosNovos
data
2021-06-23845320.0
2021-06-22843322.0
2021-06-21841130.0
2021-06-20838133.0
2021-06-19834832.0
\n", "
" ], "text/plain": [ " obitosAcumulados obitosNovos\n", "data \n", "2021-06-23 8453 20.0\n", "2021-06-22 8433 22.0\n", "2021-06-21 8411 30.0\n", "2021-06-20 8381 33.0\n", "2021-06-19 8348 32.0" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.iloc[95:100,4:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neste exemplo um pouco mais complexo, buscamos casos novos e óbitos novos em um período específico e ordenamos a tabela da data mais recente para a mais antiga." ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosNovosobitosNovos
2020-04-1060.0
2020-04-09244.0
2020-04-08143.0
2020-04-0750.0
2020-04-0610.0
\n", "
" ], "text/plain": [ " casosNovos obitosNovos\n", "2020-04-10 6 0.0\n", "2020-04-09 24 4.0\n", "2020-04-08 14 3.0\n", "2020-04-07 5 0.0\n", "2020-04-06 1 0.0" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[pd.date_range('2020-04-06','2020-04-10'),['casosNovos','obitosNovos']].sort_index(ascending=False)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Suponha que o peso de Ana foi medido corretamente, mas registrado de maneira errônea no *DataFrame* `df_dict_series` como 55." ] }, { "cell_type": "code", "execution_count": 135, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMC
Ana205516220.96
João198017825.25
Maria216216223.62
Pedro226716524.61
Túlio207317124.96
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC\n", "Ana 20 55 162 20.96\n", "João 19 80 178 25.25\n", "Maria 21 62 162 23.62\n", "Pedro 22 67 165 24.61\n", "Túlio 20 73 171 24.96" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Supondo que, na realidade, o valor é 65, alteramos a entrada específica com um simples `loc`. Em seguida, atualizamos a tabela." ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "df_dict_series.loc['Ana','Peso'] = 65\n", "\n", "df_dict_series = df_dict_series.assign(IMC=round(df_dict_series.eval('Peso/(Altura/100)**2'),2)) # O IMC mudou" ] }, { "cell_type": "code", "execution_count": 137, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMC
Ana206516224.77
João198017825.25
Maria216216223.62
Pedro226716524.61
Túlio207317124.96
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC\n", "Ana 20 65 162 24.77\n", "João 19 80 178 25.25\n", "Maria 21 62 162 23.62\n", "Pedro 22 67 165 24.61\n", "Túlio 20 73 171 24.96" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Seleção de linhas através de critérios lógicos ou funções\n", "\n", "Vamos selecionar quais os dias em que houve mais de 40 mortes registradas:" ] }, { "cell_type": "code", "execution_count": 138, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-06-183711332555328941243203831647.00.0224
2021-06-113547152451322301235778805855.00.0227
2021-06-103522643911231258235329800346.00.0227
2021-04-272893191037290111204340672246.00.0232
2021-04-19280967904286168194644648743.00.0231
2021-04-142753011218282121193451631642.00.0229
2021-04-122728111486279801193188623458.00.0229
2021-04-072670081551274623189753605049.00.0227
2021-04-06265457817273026186879600152.00.0226
2021-04-052646401059272719186670594943.00.0225
2021-04-03262500789272207186334586646.00.0223
2021-04-012604841311271069186084579147.00.0222
2021-03-312591731553270560183668574473.00.0222
2021-03-302576201150269399183369567170.00.0220
2021-03-29256470968268985183028560149.00.0218
2021-03-282555021203268866182873555243.00.0217
2021-03-272542991410268309182680550957.00.0217
2021-03-262528891611266785182404545248.00.0216
2021-03-252512781672265533180125540450.00.0215
2021-03-242496061154264150177812535441.00.0214
2021-03-232484521140264146177591531341.00.0214
2021-03-202455641300264128174980521245.00.0212
2021-03-192442641269264122174769516744.00.0212
2021-03-182429951403258188172789512343.00.0211
2021-03-172415921194256574172492508044.00.0210
2021-03-16240398804255491172218503644.00.0209
2021-03-152395941025255162171911499259.00.0208
2021-03-102327711365251742170510479750.00.0206
2021-03-042256721385245919164712461254.00.0204
2020-07-216884411647860525028155841.00.0226
2020-07-156393914777439923695138341.00.0216
2020-07-024953613614827216349104442.00.0211
2020-06-30469571900430701493097746.00.0208
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-06-18 371133 2555 328941 243203 \n", "2021-06-11 354715 2451 322301 235778 \n", "2021-06-10 352264 3911 231258 235329 \n", "2021-04-27 289319 1037 290111 204340 \n", "2021-04-19 280967 904 286168 194644 \n", "2021-04-14 275301 1218 282121 193451 \n", "2021-04-12 272811 1486 279801 193188 \n", "2021-04-07 267008 1551 274623 189753 \n", "2021-04-06 265457 817 273026 186879 \n", "2021-04-05 264640 1059 272719 186670 \n", "2021-04-03 262500 789 272207 186334 \n", "2021-04-01 260484 1311 271069 186084 \n", "2021-03-31 259173 1553 270560 183668 \n", "2021-03-30 257620 1150 269399 183369 \n", "2021-03-29 256470 968 268985 183028 \n", "2021-03-28 255502 1203 268866 182873 \n", "2021-03-27 254299 1410 268309 182680 \n", "2021-03-26 252889 1611 266785 182404 \n", "2021-03-25 251278 1672 265533 180125 \n", "2021-03-24 249606 1154 264150 177812 \n", "2021-03-23 248452 1140 264146 177591 \n", "2021-03-20 245564 1300 264128 174980 \n", "2021-03-19 244264 1269 264122 174769 \n", "2021-03-18 242995 1403 258188 172789 \n", "2021-03-17 241592 1194 256574 172492 \n", "2021-03-16 240398 804 255491 172218 \n", "2021-03-15 239594 1025 255162 171911 \n", "2021-03-10 232771 1365 251742 170510 \n", "2021-03-04 225672 1385 245919 164712 \n", "2020-07-21 68844 1164 78605 25028 \n", "2020-07-15 63939 1477 74399 23695 \n", "2020-07-02 49536 1361 48272 16349 \n", "2020-06-30 46957 1900 43070 14930 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-06-18 8316 47.0 0.0224 \n", "2021-06-11 8058 55.0 0.0227 \n", "2021-06-10 8003 46.0 0.0227 \n", "2021-04-27 6722 46.0 0.0232 \n", "2021-04-19 6487 43.0 0.0231 \n", "2021-04-14 6316 42.0 0.0229 \n", "2021-04-12 6234 58.0 0.0229 \n", "2021-04-07 6050 49.0 0.0227 \n", "2021-04-06 6001 52.0 0.0226 \n", "2021-04-05 5949 43.0 0.0225 \n", "2021-04-03 5866 46.0 0.0223 \n", "2021-04-01 5791 47.0 0.0222 \n", "2021-03-31 5744 73.0 0.0222 \n", "2021-03-30 5671 70.0 0.0220 \n", "2021-03-29 5601 49.0 0.0218 \n", "2021-03-28 5552 43.0 0.0217 \n", "2021-03-27 5509 57.0 0.0217 \n", "2021-03-26 5452 48.0 0.0216 \n", "2021-03-25 5404 50.0 0.0215 \n", "2021-03-24 5354 41.0 0.0214 \n", "2021-03-23 5313 41.0 0.0214 \n", "2021-03-20 5212 45.0 0.0212 \n", "2021-03-19 5167 44.0 0.0212 \n", "2021-03-18 5123 43.0 0.0211 \n", "2021-03-17 5080 44.0 0.0210 \n", "2021-03-16 5036 44.0 0.0209 \n", "2021-03-15 4992 59.0 0.0208 \n", "2021-03-10 4797 50.0 0.0206 \n", "2021-03-04 4612 54.0 0.0204 \n", "2020-07-21 1558 41.0 0.0226 \n", "2020-07-15 1383 41.0 0.0216 \n", "2020-07-02 1044 42.0 0.0211 \n", "2020-06-30 977 46.0 0.0208 " ] }, "execution_count": 138, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[dados_covid_PB['obitosNovos']>40]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Selecionando os dias com mais de 25 óbitos e mais de 1500 casos novos:" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-07-023984401536332530262757865426.00.0217
2021-06-243850182391332313250519847926.00.0220
2021-06-213780642114330969249135841130.00.0222
2021-06-203759502178330721248667838133.00.0223
2021-06-193737722639329903248304834832.00.0223
2021-06-183711332555328941243203831647.00.0224
2021-06-173685782457327929242820826932.00.0224
2021-06-163661212328326517242184823740.00.0225
2021-06-153637932358325032241749819739.00.0225
2021-06-143614352575324438241293815830.00.0226
2021-06-133588602034324327236555812833.00.0226
2021-06-123568262111323444236186809537.00.0227
2021-06-113547152451322301235778805855.00.0227
2021-06-103522643911231258235329800346.00.0227
2021-06-063441741982319055230002784929.00.0228
2021-06-053421921992318096229790782029.00.0229
2021-06-043402002421317390229489779126.00.0229
2021-06-033377792660316011229062776530.00.0230
2021-06-023351192403314538225278773532.00.0231
2021-06-013327161751314520224912770331.00.0232
2021-05-293287712429312011224255761728.00.0232
2021-05-283263421917311013223896758933.00.0233
2021-05-273244251887310639223537755636.00.0233
2021-05-263225382262309492221134752036.00.0233
2021-05-253202762015308275220621748436.00.0234
2021-04-072670081551274623189753605049.00.0227
2021-03-312591731553270560183668574473.00.0222
2021-03-262528891611266785182404545248.00.0216
2021-03-252512781672265533180125540450.00.0215
2021-03-132373131694254238171130489631.00.0206
2020-08-0485760154910650038554190131.00.0222
2020-07-318279416869675335971181126.00.0219
2020-07-237310421328404728566161837.00.0221
2020-07-105911815046956721481122933.00.0208
2020-07-085634415426754919999117126.00.0208
2020-07-075480216516493319373114527.00.0209
2020-06-30469571900430701493097746.00.0208
2020-06-0922452150120650467153427.00.0238
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-07-02 398440 1536 332530 262757 \n", "2021-06-24 385018 2391 332313 250519 \n", "2021-06-21 378064 2114 330969 249135 \n", "2021-06-20 375950 2178 330721 248667 \n", "2021-06-19 373772 2639 329903 248304 \n", "2021-06-18 371133 2555 328941 243203 \n", "2021-06-17 368578 2457 327929 242820 \n", "2021-06-16 366121 2328 326517 242184 \n", "2021-06-15 363793 2358 325032 241749 \n", "2021-06-14 361435 2575 324438 241293 \n", "2021-06-13 358860 2034 324327 236555 \n", "2021-06-12 356826 2111 323444 236186 \n", "2021-06-11 354715 2451 322301 235778 \n", "2021-06-10 352264 3911 231258 235329 \n", "2021-06-06 344174 1982 319055 230002 \n", "2021-06-05 342192 1992 318096 229790 \n", "2021-06-04 340200 2421 317390 229489 \n", "2021-06-03 337779 2660 316011 229062 \n", "2021-06-02 335119 2403 314538 225278 \n", "2021-06-01 332716 1751 314520 224912 \n", "2021-05-29 328771 2429 312011 224255 \n", "2021-05-28 326342 1917 311013 223896 \n", "2021-05-27 324425 1887 310639 223537 \n", "2021-05-26 322538 2262 309492 221134 \n", "2021-05-25 320276 2015 308275 220621 \n", "2021-04-07 267008 1551 274623 189753 \n", "2021-03-31 259173 1553 270560 183668 \n", "2021-03-26 252889 1611 266785 182404 \n", "2021-03-25 251278 1672 265533 180125 \n", "2021-03-13 237313 1694 254238 171130 \n", "2020-08-04 85760 1549 106500 38554 \n", "2020-07-31 82794 1686 96753 35971 \n", "2020-07-23 73104 2132 84047 28566 \n", "2020-07-10 59118 1504 69567 21481 \n", "2020-07-08 56344 1542 67549 19999 \n", "2020-07-07 54802 1651 64933 19373 \n", "2020-06-30 46957 1900 43070 14930 \n", "2020-06-09 22452 1501 20650 4671 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-07-02 8654 26.0 0.0217 \n", "2021-06-24 8479 26.0 0.0220 \n", "2021-06-21 8411 30.0 0.0222 \n", "2021-06-20 8381 33.0 0.0223 \n", "2021-06-19 8348 32.0 0.0223 \n", "2021-06-18 8316 47.0 0.0224 \n", "2021-06-17 8269 32.0 0.0224 \n", "2021-06-16 8237 40.0 0.0225 \n", "2021-06-15 8197 39.0 0.0225 \n", "2021-06-14 8158 30.0 0.0226 \n", "2021-06-13 8128 33.0 0.0226 \n", "2021-06-12 8095 37.0 0.0227 \n", "2021-06-11 8058 55.0 0.0227 \n", "2021-06-10 8003 46.0 0.0227 \n", "2021-06-06 7849 29.0 0.0228 \n", "2021-06-05 7820 29.0 0.0229 \n", "2021-06-04 7791 26.0 0.0229 \n", "2021-06-03 7765 30.0 0.0230 \n", "2021-06-02 7735 32.0 0.0231 \n", "2021-06-01 7703 31.0 0.0232 \n", "2021-05-29 7617 28.0 0.0232 \n", "2021-05-28 7589 33.0 0.0233 \n", "2021-05-27 7556 36.0 0.0233 \n", "2021-05-26 7520 36.0 0.0233 \n", "2021-05-25 7484 36.0 0.0234 \n", "2021-04-07 6050 49.0 0.0227 \n", "2021-03-31 5744 73.0 0.0222 \n", "2021-03-26 5452 48.0 0.0216 \n", "2021-03-25 5404 50.0 0.0215 \n", "2021-03-13 4896 31.0 0.0206 \n", "2020-08-04 1901 31.0 0.0222 \n", "2020-07-31 1811 26.0 0.0219 \n", "2020-07-23 1618 37.0 0.0221 \n", "2020-07-10 1229 33.0 0.0208 \n", "2020-07-08 1171 26.0 0.0208 \n", "2020-07-07 1145 27.0 0.0209 \n", "2020-06-30 977 46.0 0.0208 \n", "2020-06-09 534 27.0 0.0238 " ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[(dados_covid_PB.obitosNovos > 25) & (dados_covid_PB.casosNovos>1500)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Obs**.: Note que podemos utilizar o nome da coluna como um atributo." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Vamos inserir uma coluna sobrenome no `df_dict_series`:" ] }, { "cell_type": "code", "execution_count": 140, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMCSobrenome
Ana206516224.77Silva
João198017825.25PraDo
Maria216216223.62Sales
Pedro226716524.61MachadO
Túlio207317124.96Coutinho
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC Sobrenome\n", "Ana 20 65 162 24.77 Silva\n", "João 19 80 178 25.25 PraDo\n", "Maria 21 62 162 23.62 Sales\n", "Pedro 22 67 165 24.61 MachadO\n", "Túlio 20 73 171 24.96 Coutinho" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_dict_series['Sobrenome'] = ['Silva', 'PraDo', 'Sales', 'MachadO', 'Coutinho']\n", "df_dict_series" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Vamos encontrar as linhas cujo sobrenome termina em \"do\". Para tanto, note que a função abaixo retorna `True` se o final é \"do\" e `False` caso contrário.\n", "\n", "```python\n", "def verifica_final_do(palavra):\n", " return palavra.lower()[-2:] == 'do'\n", "```\n", "**Obs**.: Note que convertemos tudo para minúsculo." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Agora vamos utilizar essa função para alcançar nosso objetivo:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Ana False\n", "João True\n", "Maria False\n", "Pedro True\n", "Túlio False\n", "Name: Sobrenome, dtype: bool" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 'map' aplica a função lambda a cada elemento da *Series*\n", "df_dict_series['Sobrenome'].map(lambda palavra: palavra.lower()[-2:]=='do') " ] }, { "cell_type": "code", "execution_count": 142, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdadePesoAlturaIMCSobrenome
João198017825.25PraDo
Pedro226716524.61MachadO
\n", "
" ], "text/plain": [ " Idade Peso Altura IMC Sobrenome\n", "João 19 80 178 25.25 PraDo\n", "Pedro 22 67 165 24.61 MachadO" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# procurando no df inteiro\n", "df_dict_series.loc[df_dict_series['Sobrenome'].map(lambda palavra: palavra.lower()[-2:]=='do')]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Vamos selecionar as linhas do mês 2 (fevereiro) usando `index.month`:" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-02-282211151392245686158304449625.00.0203
2021-02-272197231416245121158132447120.00.0203
2021-02-262183071186244356157849445118.00.0204
2021-02-252171211594243890157586443314.00.0204
2021-02-242155271320243253157329441915.00.0205
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-02-28 221115 1392 245686 158304 \n", "2021-02-27 219723 1416 245121 158132 \n", "2021-02-26 218307 1186 244356 157849 \n", "2021-02-25 217121 1594 243890 157586 \n", "2021-02-24 215527 1320 243253 157329 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-02-28 4496 25.0 0.0203 \n", "2021-02-27 4471 20.0 0.0203 \n", "2021-02-26 4451 18.0 0.0204 \n", "2021-02-25 4433 14.0 0.0204 \n", "2021-02-24 4419 15.0 0.0205 " ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.loc[dados_covid_PB.index.month==2].head()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Seleção de linhas com o método *query*\n", "\n", "Similarmente ao método `eval`, ao utilizarmos `query`, podemos criar expressões lógicas a partir de nomes das colunas do *DataFrame*." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Assim, podemos reescrever o código\n", "\n", "```python\n", "dados_covid_PB.loc[(dados_covid_PB.obitosNovos>25) & \n", " (dados_covid_PB.casosNovos>1500)]\n", "```\n", "como" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
casosAcumuladoscasosNovosdescartadosrecuperadosobitosAcumuladosobitosNovosLetalidade
data
2021-07-023984401536332530262757865426.00.0217
2021-06-243850182391332313250519847926.00.0220
2021-06-213780642114330969249135841130.00.0222
2021-06-203759502178330721248667838133.00.0223
2021-06-193737722639329903248304834832.00.0223
2021-06-183711332555328941243203831647.00.0224
2021-06-173685782457327929242820826932.00.0224
2021-06-163661212328326517242184823740.00.0225
2021-06-153637932358325032241749819739.00.0225
2021-06-143614352575324438241293815830.00.0226
2021-06-133588602034324327236555812833.00.0226
2021-06-123568262111323444236186809537.00.0227
2021-06-113547152451322301235778805855.00.0227
2021-06-103522643911231258235329800346.00.0227
2021-06-063441741982319055230002784929.00.0228
2021-06-053421921992318096229790782029.00.0229
2021-06-043402002421317390229489779126.00.0229
2021-06-033377792660316011229062776530.00.0230
2021-06-023351192403314538225278773532.00.0231
2021-06-013327161751314520224912770331.00.0232
2021-05-293287712429312011224255761728.00.0232
2021-05-283263421917311013223896758933.00.0233
2021-05-273244251887310639223537755636.00.0233
2021-05-263225382262309492221134752036.00.0233
2021-05-253202762015308275220621748436.00.0234
2021-04-072670081551274623189753605049.00.0227
2021-03-312591731553270560183668574473.00.0222
2021-03-262528891611266785182404545248.00.0216
2021-03-252512781672265533180125540450.00.0215
2021-03-132373131694254238171130489631.00.0206
2020-08-0485760154910650038554190131.00.0222
2020-07-318279416869675335971181126.00.0219
2020-07-237310421328404728566161837.00.0221
2020-07-105911815046956721481122933.00.0208
2020-07-085634415426754919999117126.00.0208
2020-07-075480216516493319373114527.00.0209
2020-06-30469571900430701493097746.00.0208
2020-06-0922452150120650467153427.00.0238
\n", "
" ], "text/plain": [ " casosAcumulados casosNovos descartados recuperados \\\n", "data \n", "2021-07-02 398440 1536 332530 262757 \n", "2021-06-24 385018 2391 332313 250519 \n", "2021-06-21 378064 2114 330969 249135 \n", "2021-06-20 375950 2178 330721 248667 \n", "2021-06-19 373772 2639 329903 248304 \n", "2021-06-18 371133 2555 328941 243203 \n", "2021-06-17 368578 2457 327929 242820 \n", "2021-06-16 366121 2328 326517 242184 \n", "2021-06-15 363793 2358 325032 241749 \n", "2021-06-14 361435 2575 324438 241293 \n", "2021-06-13 358860 2034 324327 236555 \n", "2021-06-12 356826 2111 323444 236186 \n", "2021-06-11 354715 2451 322301 235778 \n", "2021-06-10 352264 3911 231258 235329 \n", "2021-06-06 344174 1982 319055 230002 \n", "2021-06-05 342192 1992 318096 229790 \n", "2021-06-04 340200 2421 317390 229489 \n", "2021-06-03 337779 2660 316011 229062 \n", "2021-06-02 335119 2403 314538 225278 \n", "2021-06-01 332716 1751 314520 224912 \n", "2021-05-29 328771 2429 312011 224255 \n", "2021-05-28 326342 1917 311013 223896 \n", "2021-05-27 324425 1887 310639 223537 \n", "2021-05-26 322538 2262 309492 221134 \n", "2021-05-25 320276 2015 308275 220621 \n", "2021-04-07 267008 1551 274623 189753 \n", "2021-03-31 259173 1553 270560 183668 \n", "2021-03-26 252889 1611 266785 182404 \n", "2021-03-25 251278 1672 265533 180125 \n", "2021-03-13 237313 1694 254238 171130 \n", "2020-08-04 85760 1549 106500 38554 \n", "2020-07-31 82794 1686 96753 35971 \n", "2020-07-23 73104 2132 84047 28566 \n", "2020-07-10 59118 1504 69567 21481 \n", "2020-07-08 56344 1542 67549 19999 \n", "2020-07-07 54802 1651 64933 19373 \n", "2020-06-30 46957 1900 43070 14930 \n", "2020-06-09 22452 1501 20650 4671 \n", "\n", " obitosAcumulados obitosNovos Letalidade \n", "data \n", "2021-07-02 8654 26.0 0.0217 \n", "2021-06-24 8479 26.0 0.0220 \n", "2021-06-21 8411 30.0 0.0222 \n", "2021-06-20 8381 33.0 0.0223 \n", "2021-06-19 8348 32.0 0.0223 \n", "2021-06-18 8316 47.0 0.0224 \n", "2021-06-17 8269 32.0 0.0224 \n", "2021-06-16 8237 40.0 0.0225 \n", "2021-06-15 8197 39.0 0.0225 \n", "2021-06-14 8158 30.0 0.0226 \n", "2021-06-13 8128 33.0 0.0226 \n", "2021-06-12 8095 37.0 0.0227 \n", "2021-06-11 8058 55.0 0.0227 \n", "2021-06-10 8003 46.0 0.0227 \n", "2021-06-06 7849 29.0 0.0228 \n", "2021-06-05 7820 29.0 0.0229 \n", "2021-06-04 7791 26.0 0.0229 \n", "2021-06-03 7765 30.0 0.0230 \n", "2021-06-02 7735 32.0 0.0231 \n", "2021-06-01 7703 31.0 0.0232 \n", "2021-05-29 7617 28.0 0.0232 \n", "2021-05-28 7589 33.0 0.0233 \n", "2021-05-27 7556 36.0 0.0233 \n", "2021-05-26 7520 36.0 0.0233 \n", "2021-05-25 7484 36.0 0.0234 \n", "2021-04-07 6050 49.0 0.0227 \n", "2021-03-31 5744 73.0 0.0222 \n", "2021-03-26 5452 48.0 0.0216 \n", "2021-03-25 5404 50.0 0.0215 \n", "2021-03-13 4896 31.0 0.0206 \n", "2020-08-04 1901 31.0 0.0222 \n", "2020-07-31 1811 26.0 0.0219 \n", "2020-07-23 1618 37.0 0.0221 \n", "2020-07-10 1229 33.0 0.0208 \n", "2020-07-08 1171 26.0 0.0208 \n", "2020-07-07 1145 27.0 0.0209 \n", "2020-06-30 977 46.0 0.0208 \n", "2020-06-09 534 27.0 0.0238 " ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dados_covid_PB.query('obitosNovos>25 and casosNovos>1500') # note que 'and' é usado em vez de '&'" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "latex_metadata": { "affiliation": "Departamento de Computação Científica / UFPB", "author": "Gustavo Oliveira e Andréa Rocha", "date": "Julho de 2020", "title": "Aula 6A - Manipulação de Dados" }, "rise": { "enable_chalkboard": true, "footer": "UFPB - CI - DCC", "header": "Gustavo Oliveira / Andrea Rocha", "theme": "sky" } }, "nbformat": 4, "nbformat_minor": 4 }