Skip to content

Latest commit

 

History

History
3440 lines (3077 loc) · 75.8 KB

File metadata and controls

3440 lines (3077 loc) · 75.8 KB

Exploratory Data Analysis - Global Terrorism

As a security/defense analyst, we will try to find hot zone of terrorism across the world. We will create a Power BI Dashboard using the dataset Global Terrorism. Overview Page of Dashboard


This project will be split into 2 parts:

  • data preprocessing using Python Notebook: where we will split our one large 181691 records and 128 wide columns into star schema for better analysis and performance.

  • Power BI Dashboard:

    • where we tried to provide general insights such as:

      • total events.
      • top attacks type and most used weapons.
      • most active groups.
      • top targeted nations and units, and reasons behind these attacks.
      • Property Loss in USD per regions and countries.
      • Time Trend Analysis on attacks per region and countries between years [1970 - 2017]. Overview Dashboard Page with region selected | 2014 was the most active year across all Time Period.
    • secondly we presented Countries Insights:

      • Most Active Group per Country.
      • total suicide attacks, kidnapping events, and Property loss.
      • provided a heatmap for attacks per states and cities Countries Insight Page | Iraq is one of the most dangerous countries across world with a total ~23,000 attack with around 3,926 attack taking place on the year 2014.
    • also we produced groups insights such as

      • Active Years
      • Total Attacks
      • Success Rate
      • Is the group use suicidal attacks as a mean for accomplishing goals by checking suicide attacks rate to total attacks.
      • Top attacks type and favorite weapons per group.
      • Most Targeted Institutions. Group Insight Page Group Insight Page 2 Group Insight Page 3 | ETA is one of the oldest Groups working between the years 1972 and 2010, working across all of Europe, with a total attacks of 1650 and a success Rate of 85.45%, they mostly attacked the Civil Guard and Police Targets while mostly using Explosives and Firearms as their main weapons.

Part 1: Data Preprocessing:

After Exploring the data and modeling the Data here is the star schema used for developing the dataset

Schema

Jupyter Notebook
import pandas as pd
import numpy as np

df = pd.read_csv('./Dataset/globalterrorismdb_0718dist.csv', encoding = "ISO-8859-1")

print(*df.columns, sep=' , ')
/var/folders/00/0s56_3496fb7660g8b7bwcyh0000gn/T/ipykernel_70881/2654467892.py:4: DtypeWarning: Columns (4,6,31,33,61,62,63,76,79,90,92,94,96,114,115,121) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv('./Dataset/globalterrorismdb_0718dist.csv', encoding = "ISO-8859-1")


eventid , iyear , imonth , iday , approxdate , extended , resolution , country , country_txt , region , region_txt , provstate , city , latitude , longitude , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , attacktype1 , attacktype1_txt , attacktype2 , attacktype2_txt , attacktype3 , attacktype3_txt , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related
df.shape
(181691, 135)

1-DimLocation

dim_location = df.loc[:,
    ['country_txt',
     'region_txt',
     'provstate',
     'city']
]

replace_regex = '\(((\s*\w*\s*)*)\)|-*|\,|\'|\/|\.*|\(*\)*|\?|\=|\*'

dim_location['city'] = dim_location['city'].\
    replace(to_replace=replace_regex, value="", regex=True).\
    fillna('Unkonwn').\
    str.replace('&', 'and').\
    str.replace('Unkknown', 'Unknown').\
    str.title().\
    str.strip()

dim_location['provstate'] = df['provstate'].\
    replace(to_replace=replace_regex, value="", regex=True).\
    fillna('Unknown').\
    str.replace('é', 'e').\
    str.replace('&', 'and').\
    str.replace('temouchent', 'tecmouchent').\
    str.replace('Bosnia-Herzegovina', 'Bosnia And Herzegovina').\
    str.replace('ñ', 'n').\
    str.title().\
    str.strip()

dim_location.drop_duplicates(inplace=True)
dim_location['Location Id'] = dim_location.reset_index(drop=True).index + 1


dim_location.rename(columns={
                'country_txt': 'Country',
                'region_txt': 'Region',
                'provstate': 'Provenance/State',
                'city': 'City',}, inplace=True)
print(dim_location.shape)
dim_location.head()
(40676, 5)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Country Region Provenance/State City Location Id
0 Dominican Republic Central America & Caribbean Unknown Santo Domingo 1
1 Mexico North America Federal Mexico City 2
2 Philippines Southeast Asia Tarlac Unknown 3
3 Greece Western Europe Attica Athens 4
4 Japan East Asia Fukouka Fukouka 5
df = df.merge(dim_location,
              how='left',
              left_on=['country_txt', 'region_txt', 'provstate', 'city'],
              right_on=['Country', 'Region', 'Provenance/State', 'City',])
df.drop([
    'country',
    'country_txt',
    'region',
    'region_txt',
    'provstate',
    'city',
    'latitude',
    'longitude',
    'Country',
    'Region',
    'Provenance/State',
    'City'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity location ... scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related Location Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 NaN ... NaN NaN NaN PGIS 0 0 0 0 NaN NaN
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN NaN PGIS 0 1 1 1 NaN NaN
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 NaN ... NaN NaN NaN PGIS -9 -9 1 1 NaN 3.0
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN NaN PGIS -9 -9 1 1 NaN 4.0
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN NaN PGIS -9 -9 1 1 NaN 5.0

5 rows × 128 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 128) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , attacktype1 , attacktype1_txt , attacktype2 , attacktype2_txt , attacktype3 , attacktype3_txt , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id

2-DimAttackType

dim_attack_type = df.loc[:, ['attacktype1_txt']]

dim_attack_type['attacktype1_txt'].fillna('Unknown', inplace=True)

dim_attack_type.drop_duplicates(inplace=True)
dim_attack_type['Attack Type Id'] = dim_attack_type.reset_index(drop=True).index + 1

dim_attack_type.rename(columns={'attacktype1_txt': 'Attack Type'}, inplace=True)

print(dim_attack_type.shape)
dim_attack_type
(9, 2)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Attack Type Attack Type Id
0 Assassination 1
1 Hostage Taking (Kidnapping) 2
3 Bombing/Explosion 3
4 Facility/Infrastructure Attack 4
5 Armed Assault 5
12 Hijacking 6
16 Unknown 7
298 Unarmed Assault 8
336 Hostage Taking (Barricade Incident) 9
df = df.merge(dim_attack_type,
              how='left',
              left_on=['attacktype1_txt'],
              right_on=['Attack Type'])

df.drop([
    'attacktype1',
    'attacktype1_txt',
    'attacktype2',
    'attacktype2_txt',
    'attacktype3',
    'attacktype3_txt',
    'Attack Type'], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity location ... scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related Location Id Attack Type Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 NaN ... NaN NaN PGIS 0 0 0 0 NaN NaN 1
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN PGIS 0 1 1 1 NaN NaN 2
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 NaN ... NaN NaN PGIS -9 -9 1 1 NaN 3.0 1
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN PGIS -9 -9 1 1 NaN 4.0 3
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 NaN ... NaN NaN PGIS -9 -9 1 1 NaN 5.0 4

5 rows × 123 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 123) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , targtype1 , targtype1_txt , targsubtype1 , targsubtype1_txt , corp1 , target1 , natlty1 , natlty1_txt , targtype2 , targtype2_txt , targsubtype2 , targsubtype2_txt , corp2 , target2 , natlty2 , natlty2_txt , targtype3 , targtype3_txt , targsubtype3 , targsubtype3_txt , corp3 , target3 , natlty3 , natlty3_txt , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id

3-DimTargetType & DimTarget

dim_target_type = df.loc[:, ['targtype1_txt', 'targsubtype1_txt']]

dim_target_type['targtype1_txt'].fillna('Unknown', inplace=True)
dim_target_type['targsubtype1_txt'].fillna('Unknown', inplace=True)

dim_target_type.drop_duplicates(inplace=True)
dim_target_type['Target Type Id'] = dim_target_type.reset_index(drop=True).index + 1

dim_target_type.rename(columns={
    'targtype1_txt': 'Target Type',
    'targsubtype1_txt': 'Target Subtype',
}, inplace=True)

print(dim_target_type.shape)
dim_target_type.head()
(139, 3)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Target Type Target Subtype Target Type Id
0 Private Citizens & Property Named Civilian 1
1 Government (Diplomatic) Diplomatic Personnel (outside of embassy, cons... 2
2 Journalists & Media Radio Journalist/Staff/Facility 3
3 Government (Diplomatic) Embassy/Consulate 4
5 Police Police Building (headquarters, station, school) 5
dim_target = df.loc[:, [
    'target1',
    'corp1',
    'natlty1_txt'
]]

dim_target['target1'].fillna('Unknown', inplace=True)
dim_target['corp1'].fillna('Unknown', inplace=True)
dim_target['corp1'] = dim_target['corp1'].str.replace('\"', '')
dim_target['natlty1_txt'].fillna('Unknown', inplace=True)

dim_target.drop_duplicates(inplace=True)
dim_target['Target Id'] = dim_target.reset_index(drop=True).index + 1

dim_target.rename(columns={
    'target1': 'Target',
    'corp1': 'Corporate',
    'natlty1_txt': 'Nationality'
}, inplace=True)

print(dim_target.shape)
dim_target.head()
(114866, 4)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Target Corporate Nationality Target Id
0 Julio Guzman Unknown Dominican Republic 1
1 Nadine Chaval, daughter Belgian Ambassador Daughter Belgium 2
2 Employee Voice of America United States 3
3 U.S. Embassy Unknown United States 4
4 U.S. Consulate Unknown United States 5
df = df.merge(dim_target_type,
              how='left',
              left_on=['targtype1_txt', 'targsubtype1_txt'],
              right_on=['Target Type', 'Target Subtype'])
df = df.merge(dim_target,
              how='left',
              left_on=['target1', 'corp1', 'natlty1_txt'],
              right_on=['Target', 'Corporate', 'Nationality'])

df.drop([
    'targtype1', 'targtype1_txt', 'targsubtype1', 'targsubtype1_txt',
    'corp1', 'target1', 'natlty1', 'natlty1_txt', 
    'targtype2', 'targtype2_txt', 'targsubtype2', 'targsubtype2_txt', 
    'corp2', 'target2', 'natlty2', 'natlty2_txt', 
    'targtype3', 'targtype3_txt', 'targsubtype3', 'targsubtype3_txt', 
    'corp3', 'target3', 'natlty3', 'natlty3_txt',
    'Target Type', 'Target Subtype',
    'Target', 'Corporate', 'Nationality'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity location ... dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 NaN ... PGIS 0 0 0 0 NaN NaN 1 1.0 NaN
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 NaN ... PGIS 0 1 1 1 NaN NaN 2 2.0 2.0
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 NaN ... PGIS -9 -9 1 1 NaN 3.0 1 3.0 3.0
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 NaN ... PGIS -9 -9 1 1 NaN 4.0 3 4.0 NaN
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 NaN ... PGIS -9 -9 1 1 NaN 5.0 4 4.0 NaN

5 rows × 101 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 101) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weaptype1 , weaptype1_txt , weapsubtype1 , weapsubtype1_txt , weaptype2 , weaptype2_txt , weapsubtype2 , weapsubtype2_txt , weaptype3 , weaptype3_txt , weapsubtype3 , weapsubtype3_txt , weaptype4 , weaptype4_txt , weapsubtype4 , weapsubtype4_txt , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id

4- DimWeapon

dim_weapon = df.loc[:, [
    'weaptype1_txt',
    'weapsubtype1_txt'
]]

dim_weapon['weaptype1_txt'].fillna('Unknown', inplace=True)
dim_weapon['weapsubtype1_txt'].fillna('Unknown', inplace=True)

dim_weapon.drop_duplicates(inplace=True)
dim_weapon['Weapon Id'] = dim_weapon.reset_index(drop=True).index + 1

dim_weapon.rename(columns={
    'weaptype1_txt': 'Weapon Type',
    'weapsubtype1_txt': 'Weapon Subtype'
}, inplace=True)

print(dim_weapon.shape)
dim_weapon.head()
(45, 3)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Weapon Type Weapon Subtype Weapon Id
0 Unknown Unknown 1
3 Explosives Unknown Explosive Type 2
4 Incendiary Unknown 3
5 Firearms Unknown Gun Type 4
6 Firearms Automatic or Semi-Automatic Rifle 5
df = df.merge(dim_weapon,
              how='left',
              left_on=['weaptype1_txt','weapsubtype1_txt'],
              right_on=['Weapon Type', 'Weapon Subtype'])

df.drop([
    'weaptype1', 'weaptype1_txt', 'weapsubtype1', 'weapsubtype1_txt', 
    'weaptype2', 'weaptype2_txt', 'weapsubtype2', 'weapsubtype2_txt', 
    'weaptype3', 'weaptype3_txt', 'weapsubtype3', 'weapsubtype3_txt', 
    'weaptype4', 'weaptype4_txt', 'weapsubtype4', 'weapsubtype4_txt',
    'Weapon Type', 'Weapon Subtype'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity location ... INT_LOG INT_IDEO INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id Weapon Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 NaN ... 0 0 0 0 NaN NaN 1 1.0 NaN NaN
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 NaN ... 0 1 1 1 NaN NaN 2 2.0 2.0 NaN
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 NaN ... -9 -9 1 1 NaN 3.0 1 3.0 3.0 NaN
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 NaN ... -9 -9 1 1 NaN 4.0 3 4.0 NaN 2.0
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 NaN ... -9 -9 1 1 NaN 5.0 4 4.0 NaN NaN

5 rows × 86 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 86) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , gname , gsubname , gname2 , gsubname2 , gname3 , gsubname3 , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id

5-DimGroup

dim_group = df.loc[:, ['gname']]

dim_group['gname'].fillna('Unknown', inplace=True)

dim_group.drop_duplicates(inplace=True)
dim_group['Group Id'] = dim_group.reset_index(drop=True).index + 1

dim_group.rename(columns={
    'gname': 'Group',
}, inplace=True)

print(dim_group.shape)
dim_group.head()
(3537, 2)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Group Group Id
0 MANO-D 1
1 23rd of September Communist League 2
2 Unknown 3
5 Black Nationalists 4
6 Tupamaros (Uruguay) 5
df = df.merge(dim_group, how='left', left_on=['gname'], right_on=['Group'])

df.drop([
    'gname',
    'gsubname',
    'gname2',
    'gsubname2',
    'gname3',
    'gsubname3',
    'Group'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity location ... INT_IDEO INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id Weapon Id Group Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 NaN ... 0 0 0 NaN NaN 1 1.0 NaN NaN 1
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 NaN ... 1 1 1 NaN NaN 2 2.0 2.0 NaN 2
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 NaN ... -9 1 1 NaN 3.0 1 3.0 3.0 NaN 3
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 NaN ... -9 1 1 NaN 4.0 3 4.0 NaN 2.0 3
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 NaN ... -9 1 1 NaN 5.0 4 4.0 NaN NaN 3

5 rows × 81 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 81) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , location , summary , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , motive , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , weapdetail , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , propcomment , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , ransomnote , hostkidoutcome , hostkidoutcome_txt , nreleased , addnotes , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id

6-DimEventDesc.

dim_event_desc = df.loc[:, [
    'eventid',
    'location',
    'summary',
    'motive',
    'weapdetail',
    'ransomnote',
    'propcomment',
    'addnotes'
]]

dim_event_desc.rename(columns={
    'location': 'Location',
    'summary': 'Summary',
    'motive': 'Motive',
    'weapdetail': 'Weapon Details',
    'ransomnote': 'Ransom Note', 
    'propcomment': 'Property Damage Comment',
    'addnotes': 'Notes'
}, inplace=True)

print(dim_event_desc.shape)
dim_event_desc.head()
(181691, 8)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid Location Summary Motive Weapon Details Ransom Note Property Damage Comment Notes
0 197000000001 NaN NaN NaN NaN NaN NaN NaN
1 197000000002 NaN NaN NaN NaN NaN NaN NaN
2 197001000001 NaN NaN NaN NaN NaN NaN NaN
3 197001000002 NaN NaN NaN Explosive NaN NaN NaN
4 197001000003 NaN NaN NaN Incendiary NaN NaN NaN
df.drop([
    'location',
    'summary',
    'motive',
    'weapdetail',
    'ransomnote',
    'propcomment',
    'addnotes'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity crit1 ... INT_IDEO INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id Weapon Id Group Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 1 ... 0 0 0 NaN NaN 1 1.0 NaN NaN 1
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 1 ... 1 1 1 NaN NaN 2 2.0 2.0 NaN 2
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 1 ... -9 1 1 NaN 3.0 1 3.0 3.0 NaN 3
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 1 ... -9 1 1 NaN 4.0 3 4.0 NaN 2.0 3
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 1 ... -9 1 1 NaN 5.0 4 4.0 NaN NaN 3

5 rows × 74 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 74) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claimmode_txt , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , hostkidoutcome , hostkidoutcome_txt , nreleased , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id

7- DimCalimMode

dim_claim_mode = df.loc[:, ['claimmode_txt']]

dim_claim_mode['claimmode_txt'].fillna('Unknown', inplace=True)

dim_claim_mode.drop_duplicates(inplace=True)
dim_claim_mode['Claim Mode Id'] = dim_claim_mode.reset_index(drop=True).index + 1

dim_claim_mode.rename(columns={'claimmode_txt': 'Calim Mode'}, inplace=True)

print(dim_claim_mode.shape)
dim_claim_mode
(10, 2)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Calim Mode Claim Mode Id
0 Unknown 1
8 Letter 2
129 Call (pre-incident) 3
139 Other 4
225 Personal claim 5
606 Note left at scene 6
786 Call (post-incident) 7
68253 Posted to website, blog, etc. 8
68281 E-mail 9
70371 Video 10
df = df.merge(dim_claim_mode, how='left', left_on=['claimmode_txt'], right_on=['Calim Mode'])

df.drop(['claimmode_txt', 'Calim Mode'], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity crit1 ... INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id Weapon Id Group Id Claim Mode Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 1 ... 0 0 NaN NaN 1 1.0 NaN NaN 1 NaN
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN NaN 2 2.0 2.0 NaN 2 NaN
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 1 ... 1 1 NaN 3.0 1 3.0 3.0 NaN 3 NaN
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN 4.0 3 4.0 NaN 2.0 3 NaN
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN 5.0 4 4.0 NaN NaN 3 NaN

5 rows × 74 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 74) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , ishostkid , nhostkid , nhostkidus , nhours , ndays , divert , kidhijcountry , ransom , ransomamt , ransomamtus , ransompaid , ransompaidus , hostkidoutcome , hostkidoutcome_txt , nreleased , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id

8- FactKidnapping

fact_kidnapping = df.loc[(df['ishostkid'] == 1), [
    'eventid',
    'nhostkid',
    'nhostkidus',
    'nhours',
    'ndays',
    'divert',
    'kidhijcountry',
    'ransom', 
    'ransomamt', 
    'ransomamtus', 
    'ransompaid', 
    'ransompaidus', 
    'hostkidoutcome_txt', 
    'nreleased'
]]

fact_kidnapping = fact_kidnapping.apply(lambda x: x.fillna(np.nan) if x.dtype.kind in 'biufc' else x.fillna('Unknown'))

fact_kidnapping.drop_duplicates(inplace=True)

fact_kidnapping.rename(columns={
    'nhostkid': 'Number Of Hostages Killed',
    'nhostkidus': 'Number Of Hostages Killed (US)',
    'nhours': 'Hours of Kidnapping',
    'ndays': 'Days of Kidnapping',
    'divert': 'Country That Kidnappers Diverted To',
    'kidhijcountry': 'Country of Kidnapping Resolution',
    'ransom': 'Ransom Demanded?', 
    'ransomamt': 'Ransom Amount', 
    'ransomamtus': 'Ransom Amount (US)', 
    'ransompaid': 'Ransom Paid', 
    'ransompaidus': 'Ransom Paid (US)', 
    'hostkidoutcome_txt': 'Kidnapping Outcome', 
    'nreleased': 'Number Released/Escaped/Rescued'
}, inplace=True)

fact_kidnapping
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid Number Of Hostages Killed Number Of Hostages Killed (US) Hours of Kidnapping Days of Kidnapping Country That Kidnappers Diverted To Country of Kidnapping Resolution Ransom Demanded? Ransom Amount Ransom Amount (US) Ransom Paid Ransom Paid (US) Kidnapping Outcome Number Released/Escaped/Rescued
1 197000000002 1.0 0.0 NaN NaN Unknown Mexico 1.0 800000.0 NaN NaN NaN Unknown NaN
27 197001220001 1.0 0.0 NaN NaN Unknown Venezuela 1.0 110000.0 NaN 110000.0 NaN Hostage(s) released by perpetrators 1.0
103 197003030001 1.0 0.0 NaN NaN Unknown Spain 1.0 0.0 NaN NaN NaN Unknown NaN
112 197003060001 1.0 1.0 0.0 2.0 Unknown Guatemala 1.0 0.0 NaN NaN NaN Hostage(s) released by perpetrators 1.0
123 197003110001 1.0 0.0 0.0 4.0 Unknown Brazil 1.0 0.0 NaN NaN NaN Hostage(s) released by perpetrators 1.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
181620 201712280040 3.0 0.0 NaN 2.0 Unknown Unknown 0.0 NaN NaN NaN NaN Successful Rescue 3.0
181630 201712290008 1.0 0.0 NaN -99.0 Unknown Unknown 0.0 NaN NaN NaN NaN Unknown -99.0
181636 201712290017 12.0 0.0 NaN -99.0 Unknown Unknown 0.0 NaN NaN NaN NaN Unknown -99.0
181660 201712300018 11.0 0.0 -99.0 NaN Unknown Unknown 0.0 NaN NaN NaN NaN Successful Rescue 11.0
181684 201712310019 1.0 0.0 14.0 NaN Unknown Unknown 0.0 NaN NaN NaN NaN Successful Rescue 1.0

13572 rows × 14 columns

df.drop([
    'ishostkid' ,
    'nhostkid',
    'nhostkidus',
    'nhours',
    'ndays',
    'divert',
    'kidhijcountry',
    'ransom',
    'ransomamt',
    'ransomamtus',
    'ransompaid',
    'ransompaidus',
    'hostkidoutcome',
    'hostkidoutcome_txt',
    'nreleased'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid iyear imonth iday approxdate extended resolution specificity vicinity crit1 ... INT_MISC INT_ANY related Location Id Attack Type Id Target Type Id Target Id Weapon Id Group Id Claim Mode Id
0 197000000001 1970 7 2 NaN 0 NaN 1.0 0 1 ... 0 0 NaN NaN 1 1.0 NaN NaN 1 NaN
1 197000000002 1970 0 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN NaN 2 2.0 2.0 NaN 2 NaN
2 197001000001 1970 1 0 NaN 0 NaN 4.0 0 1 ... 1 1 NaN 3.0 1 3.0 3.0 NaN 3 NaN
3 197001000002 1970 1 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN 4.0 3 4.0 NaN 2.0 3 NaN
4 197001000003 1970 1 0 NaN 0 NaN 1.0 0 1 ... 1 1 NaN 5.0 4 4.0 NaN NaN 3 NaN

5 rows × 59 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 59) 

eventid , iyear , imonth , iday , approxdate , extended , resolution , specificity , vicinity , crit1 , crit2 , crit3 , doubtterr , alternative , alternative_txt , multiple , success , suicide , guncertain1 , guncertain2 , guncertain3 , individual , nperps , nperpcap , claimed , claimmode , claim2 , claimmode2 , claimmode2_txt , claim3 , claimmode3 , claimmode3_txt , compclaim , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent , propextent_txt , propvalue , scite1 , scite2 , scite3 , dbsource , INT_LOG , INT_IDEO , INT_MISC , INT_ANY , related , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id

9- FactAttackEvent Clean-up

df.drop([
    'iyear', 'imonth', 'iday', 'approxdate',
    'extended', 'resolution', 'specificity', 'vicinity',
    'crit1', 'crit2', 'crit3', 'doubtterr', 'alternative', 'alternative_txt', 'multiple',
    'guncertain1', 'guncertain2', 'guncertain3', 'compclaim', 'propextent',
    'claimmode', 'claim2' , 'claimmode2' , 'claimmode2_txt' , 'claim3' , 'claimmode3' , 'claimmode3_txt',
    'scite1' , 'scite2' , 'scite3' , 'dbsource' , 'INT_LOG' , 'INT_IDEO' , 'INT_MISC' , 'INT_ANY' , 'related'
], axis=1, inplace=True)

df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid success suicide individual nperps nperpcap claimed nkill nkillus nkillter ... property propextent_txt propvalue Location Id Attack Type Id Target Type Id Target Id Weapon Id Group Id Claim Mode Id
0 197000000001 1 0 0 NaN NaN NaN 1.0 NaN NaN ... 0 NaN NaN NaN 1 1.0 NaN NaN 1 NaN
1 197000000002 1 0 0 7.0 NaN NaN 0.0 NaN NaN ... 0 NaN NaN NaN 2 2.0 2.0 NaN 2 NaN
2 197001000001 1 0 0 NaN NaN NaN 1.0 NaN NaN ... 0 NaN NaN 3.0 1 3.0 3.0 NaN 3 NaN
3 197001000002 1 0 0 NaN NaN NaN NaN NaN NaN ... 1 NaN NaN 4.0 3 4.0 NaN 2.0 3 NaN
4 197001000003 1 0 0 NaN NaN NaN NaN NaN NaN ... 1 NaN NaN 5.0 4 4.0 NaN NaN 3 NaN

5 rows × 23 columns

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 23) 

eventid , success , suicide , individual , nperps , nperpcap , claimed , nkill , nkillus , nkillter , nwound , nwoundus , nwoundte , property , propextent_txt , propvalue , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
df.rename(columns={
    'success': 'Successful Attack?',
    'suicide': 'Suicide Attack?',
    'individual': 'Unaffiliated Individual(s)',
    'nperps': 'Number of Perpetrators',
    'nperpcap': 'Number of Perpetrators Captured',
    'claimed': 'Claim of Responsibility?',
    'nkill': 'Number of Fatalities',
    'nkillus': 'Number of Fatalities (US)',
    'nkillter': 'Number of Perpetrator Fatalities',
    'nwound': 'Number of Injured',
    'nwoundus': 'Number of Injured (US)',
    'nwoundte': 'Number of Perpetrators Injured',
    'property': 'Property Damage?',
    'propextent_txt': 'Extent of Property Damage',
    'propvalue': 'Value of Property Damage (USD)'
}, inplace=True)

print(df.shape, '\n')
print(*df.columns, sep=' , ')
(181691, 23) 

eventid , Successful Attack? , Suicide Attack? , Unaffiliated Individual(s) , Number of Perpetrators , Number of Perpetrators Captured , Claim of Responsibility? , Number of Fatalities , Number of Fatalities (US) , Number of Perpetrator Fatalities , Number of Injured , Number of Injured (US) , Number of Perpetrators Injured , Property Damage? , Extent of Property Damage , Value of Property Damage (USD) , Location Id , Attack Type Id , Target Type Id , Target Id , Weapon Id , Group Id , Claim Mode Id
# reorder columns

df = df.loc[:,[
    'eventid',
    'Location Id',
    'Attack Type Id',
    'Group Id',
    'Target Type Id',
    'Target Id',
    'Weapon Id',
    'Claim Mode Id',
    'Successful Attack?',
    'Suicide Attack?',
    'Unaffiliated Individual(s)',
    'Number of Perpetrators',
    'Number of Perpetrators Captured',
    'Claim of Responsibility?',
    'Number of Fatalities',
    'Number of Fatalities (US)',
    'Number of Perpetrator Fatalities',
    'Number of Injured',
    'Number of Injured (US)',
    'Number of Perpetrators Injured',
    'Property Damage?',
    'Extent of Property Damage',
    'Value of Property Damage (USD)'
]]
df = df.apply(lambda x: x.fillna(np.nan) if x.dtype.kind in 'biufc' else x.fillna('Unknown'))

df.loc[
    (df['Number of Perpetrators'] == -99.0) | (df['Number of Perpetrators'] == -9.0),
    'Number of Perpetrators'] = np.nan
df.loc[
    (df['Number of Perpetrators Captured'] == -99.0) | (df['Number of Perpetrators Captured'] == -9.0),
    'Number of Perpetrators Captured'] = np.nan
df.loc[(df['Value of Property Damage (USD)'] == -99.0),'Value of Property Damage (USD)'] = np.nan

df['Claim of Responsibility?'].replace({-9.0: 'Unknown', np.nan: 'Unknown', 0: 'No', 1: 'Yes'}, inplace=True)
df['Property Damage?'].replace({-9.0: 'Unknown', np.nan: 'Unknown', 0: 'No', 1: 'Yes'}, inplace=True)

df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
eventid Location Id Attack Type Id Group Id Target Type Id Target Id Weapon Id Claim Mode Id Successful Attack? Suicide Attack? ... Claim of Responsibility? Number of Fatalities Number of Fatalities (US) Number of Perpetrator Fatalities Number of Injured Number of Injured (US) Number of Perpetrators Injured Property Damage? Extent of Property Damage Value of Property Damage (USD)
0 197000000001 NaN 1 1 1.0 NaN NaN NaN 1 0 ... Unknown 1.0 NaN NaN 0.0 NaN NaN No Unknown NaN
1 197000000002 NaN 2 2 2.0 2.0 NaN NaN 1 0 ... Unknown 0.0 NaN NaN 0.0 NaN NaN No Unknown NaN
2 197001000001 3.0 1 3 3.0 3.0 NaN NaN 1 0 ... Unknown 1.0 NaN NaN 0.0 NaN NaN No Unknown NaN
3 197001000002 4.0 3 3 4.0 NaN 2.0 NaN 1 0 ... Unknown NaN NaN NaN NaN NaN NaN Yes Unknown NaN
4 197001000003 5.0 4 3 4.0 NaN NaN NaN 1 0 ... Unknown NaN NaN NaN NaN NaN NaN Yes Unknown NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
181686 201712310022 40675.0 5 2700 123.0 87673.0 4.0 1.0 1 0 ... Yes 1.0 0.0 0.0 2.0 0.0 0.0 Unknown Unknown NaN
181687 201712310029 28343.0 3 599 10.0 114865.0 17.0 NaN 1 0 ... No 2.0 0.0 0.0 7.0 0.0 0.0 Yes Unknown NaN
181688 201712310030 40676.0 4 2977 18.0 103106.0 9.0 NaN 1 0 ... No 0.0 0.0 0.0 0.0 0.0 0.0 Yes Unknown NaN
181689 201712310031 1592.0 3 3 9.0 114866.0 24.0 NaN 0 0 ... No 0.0 0.0 0.0 0.0 0.0 0.0 Unknown Unknown NaN
181690 201712310032 1795.0 3 3 NaN 52139.0 2.0 NaN 0 0 ... No 0.0 0.0 0.0 0.0 0.0 0.0 No Unknown NaN

181691 rows × 23 columns

dim_location.to_csv('dim_location.csv', index=False)
dim_attack_type.to_csv('dim_attack_type.csv', index=False)
dim_target_type.to_csv('dim_target_type.csv', index=False)
dim_target.to_csv('dim_target.csv', index=False)
dim_weapon.to_csv('dim_weapon.csv', index=False)
dim_group.to_csv('dim_group.csv', index=False)
dim_event_desc.to_csv('dim_event_desc.csv', index=False)
dim_claim_mode.to_csv('dim_claim_mode.csv', index=False)
fact_kidnapping.to_csv('fact_kidnapping.csv', index=False)
df.to_csv('fact_attack_event.csv', index=False)

Part 2: Data Exploration and Analysis Dashboard

Due to limitation of sharing Power BI dashboard, I have will share the screenshots of the dashboard and the link to the dashboard. GlobalTerrorismDashboard.pbix

Dashboard Screenshots

1-Overview:

Dashboard Overview Dashboard Overview Dashboard Overview Dashboard Overview Dashboard Overview Dashboard Overview


2- Country:

Dashboard Overview Dashboard Country Dashboard Country Dashboard Country

2.1- Group by Country:

Dashboard Country Dashboard Country Dashboard Country


3- Group:

Dashboard Group Dashboard Group Dashboard Group Dashboard Group Dashboard Group Dashboard Group Dashboard Group

3.1- Group by Country and Year:

Dashboard Group Dashboard Group Dashboard Group

Please refer to the

  • GlobalTerrorismDashboard.pbix for more details.
    • The dashboard is interactive, you can click on any of the charts to filter the data.
  • The Codebook for more details about the data, its collection process and the data dictionary.

P.S: This Dashboard was developed as part of the Spark Foundation Internship Program, doesn't represent the views of the Analyst. and it is not intended to be used for any other purpose except as a show case study.