From f6235a9506c5ae1bebda1bffd2e0d2388e34dad1 Mon Sep 17 00:00:00 2001
From: discdiver
Date: Fri, 1 May 2020 07:59:55 -0400
Subject: [PATCH] create html version of second python notebook

---
 api-guide/ids-api-guide-python-2.html | 1020 ++++++++++++++++++++-----
 1 file changed, 815 insertions(+), 205 deletions(-)

diff --git a/api-guide/ids-api-guide-python-2.html b/api-guide/ids-api-guide-python-2.html
index 9854e5f..6ea55ad 100644
--- a/api-guide/ids-api-guide-python-2.html
+++ b/api-guide/ids-api-guide-python-2.html
@@ -13017,45 +13017,6 @@
 .highlight .vm { color: #19177C } /* Name.Variable.Magic */
 .highlight .il { color: #666666 } /* Literal.Number.Integer.Long */
country                                      date  ExternalDebtStock
East Asia & Pacific (excluding high income)  2018  1.404732e+12
                                             2017  1.287079e+12
                                             2016  1.170963e+12
                                             2015  1.037230e+12
                                             2014  1.039942e+12
+ + + + + + + + + +
+
+
+

Let's make a copy of our DataFrame so we don't need to call the API again if we want to fix something.

+ +
+
+
+
In [9]:
+
+
+
EXD_df = EXD.copy()
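
As a minimal sketch of why the copy helps, the snippet below uses a small hypothetical DataFrame (`raw`, standing in for the API result) and shows that changes made to the copy leave the original untouched. The names `raw` and `working` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame returned by the API
raw = pd.DataFrame({"ExternalDebtStock": [1.404732e12, 1.287079e12]})

# copy() returns an independent DataFrame (a deep copy by default)
working = raw.copy()
working["ExternalDebtStock"] = working["ExternalDebtStock"] / 1e9

# The original values are unchanged, so no second API call is needed
raw_unchanged = raw["ExternalDebtStock"].iloc[0] == 1.404732e12
print(raw_unchanged)
```

If a cleaning step goes wrong, we can simply start again from the untouched original.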
+
+
@@ -13415,96 +13470,484 @@

4. Explore the data!
-

Data Cleaning

As you saw in the preview of the data in section 3, the DataFrame's format needs to be cleaned up. We want to reshape the data. This will get it ready to present in a table or in a visualization.

+

Data Cleaning

As you saw in the preview of the data in section 3, the DataFrame's format needs to be cleaned up. We want to reshape the data by moving the hierarchical index into the columns and making the date column the index. These changes will make the data ready to present in a table or in a visualization.

+ +
+
+

+
+
+
+

Let's move the index into the columns first.

-
In [8]:
+
In [10]:
-
# Reshape the data
-EXDreshaped = pd.DataFrame(EXD.to_records())
+
EXD_cleaned = EXD_df.reset_index()
+EXD_cleaned.head()
 
+
+
+ + +
+ +
Out[10]:
+ + + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
   country                                      date  ExternalDebtStock
0  East Asia & Pacific (excluding high income)  2018  1.404732e+12
1  East Asia & Pacific (excluding high income)  2017  1.287079e+12
2  East Asia & Pacific (excluding high income)  2016  1.170963e+12
3  East Asia & Pacific (excluding high income)  2015  1.037230e+12
4  East Asia & Pacific (excluding high income)  2014  1.039942e+12
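
To see what `reset_index()` does in isolation, here is a small sketch on a hypothetical two-row DataFrame with the same two-level (country, date) index. The variable names `nested` and `flat` are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index mirroring the (country, date) index of the raw data
index = pd.MultiIndex.from_tuples(
    [
        ("East Asia & Pacific (excluding high income)", "2018"),
        ("East Asia & Pacific (excluding high income)", "2017"),
    ],
    names=["country", "date"],
)
nested = pd.DataFrame({"ExternalDebtStock": [1.404732e12, 1.287079e12]}, index=index)

# reset_index() turns every index level into an ordinary column
# and replaces the index with a default 0..n-1 RangeIndex
flat = nested.reset_index()
print(list(flat.columns))
```

Each index level becomes a regular column, which is why the country name is repeated on every row of the output above.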
+
+
+ +
+ +
+
+
-

The data for the long-term external debt stock is currently in units. To improve a table's or chart's readability, convert the units to billions and round the number to 0 decimal places. To do this, create a function called "formatNum" that you can then run on your DataFrame.

+

Basic statistics

+
+
+
+
+
+
In [11]:
+
+
+
EXD_cleaned.info()
+
+
+
+ +
+
+ + +
+ +
+ + +
+
<class 'pandas.core.frame.DataFrame'>
+RangeIndex: 60 entries, 0 to 59
+Data columns (total 3 columns):
+ #   Column             Non-Null Count  Dtype  
+---  ------             --------------  -----  
+ 0   country            60 non-null     object 
+ 1   date               60 non-null     object 
+ 2   ExternalDebtStock  60 non-null     float64
+dtypes: float64(1), object(2)
+memory usage: 1.5+ KB
+
+
+
+
+
+
-
In [9]:
+
In [12]:
-
# Creating a function that will change units to billions and round to 0 decimal places
-def formatNum(x):
-    y = x/1000000000
-    z = round(y)
-    return(z)
-
-# Running the function on the desired data column
-EXDreshaped.ExternalDebtStock = formatNum(EXDreshaped.ExternalDebtStock)
+
EXD_cleaned.describe(include='all')
 
+
+
+ + +
+ +
Out[12]:
+ + + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
        country                                     date  ExternalDebtStock
count   60                                          60    6.000000e+01
unique  6                                           10    NaN
top     Sub-Saharan Africa (excluding high income)  2015  NaN
freq    10                                          6     NaN
mean    NaN                                         NaN   7.196861e+11
std     NaN                                         NaN   4.553943e+11
min     NaN                                         NaN   1.417278e+11
25%     NaN                                         NaN   3.166273e+11
50%     NaN                                         NaN   5.567326e+11
75%     NaN                                         NaN   1.177590e+12
max     NaN                                         NaN   1.534062e+12
+
+
+ +
+ +
+
+
-

These next two sections of code will clean up the naming of headers and regions. First, it will rename the column headers. Second, it will remove the redundant "(excluding high income)" from the region names. We can instead include that information in the title of the chart.

+

Let's improve the column names.

-
In [10]:
+
In [13]:
-
# Renaming column headers
-EXDclean = EXDreshaped.rename(index=str, columns={
-    "date":"Year",
-    "country":"Region",
-})
+
EXD_cleaned.columns=['Region', 'Year', 'ExternalDebtStock']
+EXD_cleaned.head()
 
+
+
+ + +
+ +
Out[13]:
+ + + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
   Region                                       Year  ExternalDebtStock
0  East Asia & Pacific (excluding high income)  2018  1.404732e+12
1  East Asia & Pacific (excluding high income)  2017  1.287079e+12
2  East Asia & Pacific (excluding high income)  2016  1.170963e+12
3  East Asia & Pacific (excluding high income)  2015  1.037230e+12
4  East Asia & Pacific (excluding high income)  2014  1.039942e+12
+
+
+ +
+ +
+
+ +
+
+
+
+

Make the Year column the index, make it a datetime dtype, and sort it. This will improve the appearance of the DataFrame and make it possible to use pandas plotting methods to visualize the data in a line chart later.

+ +
+
-
In [15]:
+
In [14]:
-
# Remove the "(excluding high income)" from each of the region names
-EXDclean["Region"] = EXDclean["Region"].str.replace("excluding high income","").str.replace(")","").str.replace("(","")
+
EXD_cleaned.index = pd.to_datetime(EXD_cleaned['Year'])
+EXD_cleaned = EXD_cleaned.sort_index()
+EXD_cleaned.head()
 
+
+
+ + +
+ +
Out[14]:
+ + + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
            Region                                             Year  ExternalDebtStock
Year
2009-01-01  Latin America & Caribbean (excluding high income)  2009  7.469656e+11
2009-01-01  South Asia                                         2009  2.933071e+11
2009-01-01  Middle East & North Africa (excluding high inc...  2009  1.417278e+11
2009-01-01  Europe & Central Asia (excluding high income)      2009  9.504303e+11
2009-01-01  East Asia & Pacific (excluding high income)        2009  4.944027e+11
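
The sketch below repeats the same index conversion on a few hypothetical rows. It passes an explicit `format="%Y"` to `pd.to_datetime`, which is an addition for clarity and not something the notebook does. It assumes the years are stored as strings, as the `object` dtype reported by `info()` suggests.

```python
import pandas as pd

# Hypothetical sample rows; Year holds strings, matching the object dtype shown by info()
sample = pd.DataFrame({
    "Region": ["South Asia", "South Asia", "South Asia"],
    "Year": ["2018", "2009", "2013"],
    "ExternalDebtStock": [3.0, 1.0, 2.0],
})

# Parse the year strings into timestamps (January 1 of each year) and use them as the index
sample.index = pd.to_datetime(sample["Year"], format="%Y")

# Sort chronologically so the earliest year comes first
sample = sample.sort_index()
first_date = sample.index[0]
print(first_date)
```

Being explicit about the format avoids surprises: if the years were integers instead of strings, `pd.to_datetime` would by default treat them as nanosecond offsets from 1970, not as calendar years.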
+
+
+ +
+ +
+
+
-

Now our data should be ready to present in a table or visualize in a chart. Let's take a look at the first five lines again so we can compare the cleaned up data to the raw output in section 3.

+

The data for the long-term external debt stock is currently in US dollars. To improve readability, let's convert the values to billions of US dollars and round them to 0 decimal places.

@@ -13514,7 +13957,20 @@

Data Cleaning
In [20]:

-
print(EXDclean.head())
+
EXD_cleaned['ExternalDebtStock'] = round(EXD_cleaned['ExternalDebtStock']/1_000_000_000, 0)
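
Here is the same conversion as a small sketch on three of the 2009 values shown earlier. It uses the `Series.round` method where the notebook uses the built-in `round`; for this purpose the two should be equivalent. The variable names are illustrative.

```python
import pandas as pd

# Three values from the 2009 rows above, in US dollars
debt_usd = pd.Series([7.469656e11, 2.933071e11, 1.417278e11])

# Divide by one billion, then round to 0 decimal places
debt_billions = (debt_usd / 1_000_000_000).round(0)
print(debt_billions.tolist())
```

The underscores in `1_000_000_000` are only a visual aid that Python ignores when reading the number.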
+
+ +
+
+
+ +
+
-
In [13]:
+
In [17]:
-
# Defining the data source
-source = EXDclean
-
-# Creating the chart
-chart = px.line(EXDclean, 
-                x="Year",
-                y="ExternalDebtStock",
-                color="Region",
-                title="Regional Long-term External Debt Stock (excluding High-Income countries)(USD billion)")
-chart.update_layout(
-                plot_bgcolor="white")
-
-# Displaying the chart
-chart
+
EXD_cleaned["Region"] = EXD_cleaned["Region"].str.replace(r"\s*\(excluding high income\)", "", regex=True)
+EXD_cleaned['Region'].value_counts()
 
@@ -13584,92 +14086,200 @@

Data Visualization
-
+
Out[17]:
-
- - + +
+
Middle East & North Africa     10
+East Asia & Pacific            10
+Europe & Central Asia          10
+Latin America & Caribbean      10
+South Asia                     10
+Sub-Saharan Africa             10
+Name: Region, dtype: int64
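
A minimal sketch of the suffix removal on two hypothetical region names follows. It assumes pandas 2.0 or later, where `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed. The raw-string pattern and the leading `\s*` (which also removes the space before the parenthesis) are choices made for this sketch.

```python
import pandas as pd

# Hypothetical sample of region names, one with the suffix and one without
regions = pd.Series([
    "East Asia & Pacific (excluding high income)",
    "South Asia",
])

# The raw string keeps the backslashes intact; \( and \) match literal parentheses.
# regex=True is required in pandas >= 2.0, where the default is a literal match.
cleaned = regions.str.replace(r"\s*\(excluding high income\)", "", regex=True)
print(cleaned.tolist())
```

Removing the preceding space as well means the cleaned names carry no trailing whitespace, which keeps later grouping and legend labels tidy.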
+
+ +
+ +

+
+
+
+
+
+

Now our data should be ready to present in a table or visualize in a chart. Let's take a look at the first five rows again so we can compare the cleaned-up data to the raw output in section 3.

+ +
+
+
+
+
+
In [18]:
+
+
+
EXD_cleaned.head()
+
+
+
+ +
+
+
-
+
Out[18]:
-
+
- - -
- -
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
            Region                      Year  ExternalDebtStock
Year
2009-01-01  Latin America & Caribbean   2009  747.0
2009-01-01  South Asia                  2009  293.0
2009-01-01  Middle East & North Africa  2009  142.0
2009-01-01  Europe & Central Asia       2009  950.0
2009-01-01  East Asia & Pacific         2009  494.0
+
+
+
+
+
+
+
+
+
+

Let's make a basic line graph with pandas (using Matplotlib as the backend plotting engine).

+
+
+
+
+
+
In [19]:
+
+
+
EXD_cleaned.groupby('Region')['ExternalDebtStock'].plot(
+    kind='line', 
+    legend='Region',
+    figsize=(10, 8),
+    title="Regional Long-term External Debt Stock (excluding High-Income countries)(USD billion)"
+);
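
If you'd like to experiment with the chart, one alternative to grouping is to reshape the data so that each region gets its own column. This is a sketch of that reshaping step only, not the notebook's method. It uses hypothetical sample rows, and the name `wide` is made up for illustration.

```python
import pandas as pd

# Hypothetical cleaned rows: datetime index, one row per region and year
dates = pd.to_datetime(["2009", "2009", "2010", "2010"], format="%Y")
long_form = pd.DataFrame(
    {
        "Region": ["South Asia", "East Asia & Pacific", "South Asia", "East Asia & Pacific"],
        "ExternalDebtStock": [293.0, 494.0, 300.0, 500.0],
    },
    index=dates,
)

# pivot() keeps the existing index and spreads the regions across columns
wide = long_form.pivot(columns="Region", values="ExternalDebtStock")
print(wide)
```

Calling `wide.plot(kind='line')` on a DataFrame shaped like this should draw one line per region, with the region names used as legend labels.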
+
+ +
+
+
+ +
+
+ + +
+ +
+ + + + +
+ +
+ +
+ +
+
+ +
+
+
+
+

Feel free to explore the data by modifying the chart or making your own!

+ +
+
+
+
+
+
+

Summary

You've seen how to retrieve data from the World Bank API and use it to make a visualization!

+

We can't wait to see what interesting insights you uncover with data from the World Bank! 🌍

+ +
+