This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

This repository contains the scripts needed to prepare, analyze, and visualize data on government interest patents.

There are two steps involved in running the scripts:

1. Data Preparation - The scripts in the 1_R_Data_Prep folder will prepare data for analysis.

2. Data Visualization - The scripts in the 2_Data_Viz_Generate folder will generate visualizations.

Step 1: Data Preparation

a. Download the bulk data files from http://www.patentsview.org/download/.
You will need the following files:
	1. assignee
	2. foreigncitation
	3. government_interest
	4. government_organization
	5. inventor_gender 
	6. nber 
	7. patent 
	8. patent_assignee
	9. patent_govintorg 
	10. patent_inventor
	11. rawassignee
	12. usapplicationcitation
	13. uspatentcitation
	14. wipo
	15. wipo_field

	Save these in the 'data_to_read' folder under the '2_Data_Viz_Generate' folder. 
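
	Before moving on, it can help to confirm that all 15 tables are in place. The following is a minimal sketch, not part of the repository: the helper `missing_tables` and the example path are hypothetical, and because the file extension of the downloads is not stated here, a file counts as present when its name up to the first dot equals the table name. It only checks names, not whether the files are unzipped or complete.

```python
from pathlib import Path

# Table names copied from the list above.
REQUIRED_TABLES = [
    "assignee", "foreigncitation", "government_interest",
    "government_organization", "inventor_gender", "nber", "patent",
    "patent_assignee", "patent_govintorg", "patent_inventor",
    "rawassignee", "usapplicationcitation", "uspatentcitation",
    "wipo", "wipo_field",
]


def missing_tables(folder, required=REQUIRED_TABLES):
    """Return the required table names that have no matching file in `folder`.

    A file matches when its name up to the first dot equals the table name,
    so the extension (unknown here) does not matter.
    """
    present = {
        p.name.split(".", 1)[0] for p in Path(folder).iterdir() if p.is_file()
    }
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical path: replace with your own data_to_read folder.
    folder = Path("2_Data_Viz_Generate") / "data_to_read"
    if folder.is_dir():
        print("Missing tables:", missing_tables(folder) or "none")
    else:
        print("Folder not found:", folder)
```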

b. Go to the folder '1_R_Data_Prep'. Open the scripts from this folder in R/RStudio. 

c. First look at requirements.R. Make sure you change the **input_folder** and **output_folder** variables to match the folder paths where you stored the bulk download files from part a above (Example folder path: "<Your-Path-Here>/government-interest/2_Data_Viz_Generate/data_to_read/"). 

d. Run the script "assignees_looked_up_types.R" first. Then run the remaining R scripts in numerical order. When running the scripts, make sure your R working directory is set to the folder that contains them (Example folder path: *"<Your-Path-Here>/government-interest/1_R_Data_Prep/"*).

Note: Some of the scripts will take time to run since several bulk download tables are large. 
Here are estimated running times:
	* assignees_looked_up_types.R (~ 17 minutes)
	* 1_intermediate_patcit.R (~ 1 hour, 15 minutes)
	* 2_create_core_tables (~ 1 hour, 10 minutes)
	* 3_create_assignee_table (~ 6 minutes)
	* 4_inventor_gender (~ 3 minutes)
	* 5_create_5yr_citation_1thru5 (~ 1 hour)
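
To plan a full run of Step 1, the estimates above can simply be added up. The short sketch below does that arithmetic; the names `ESTIMATED_MINUTES` and `format_minutes` are illustrative only, and the figures are the rough estimates listed above, so actual times will vary by machine.

```python
# Estimated running times in minutes, copied from the list above, in run order.
ESTIMATED_MINUTES = [
    ("assignees_looked_up_types.R", 17),
    ("1_intermediate_patcit.R", 75),        # ~ 1 hour, 15 minutes
    ("2_create_core_tables", 70),           # ~ 1 hour, 10 minutes
    ("3_create_assignee_table", 6),
    ("4_inventor_gender", 3),
    ("5_create_5yr_citation_1thru5", 60),   # ~ 1 hour
]


def format_minutes(total):
    """Format a number of minutes as 'H h M min'."""
    hours, minutes = divmod(total, 60)
    return f"{hours} h {minutes} min"


total = sum(minutes for _, minutes in ESTIMATED_MINUTES)
print(format_minutes(total))  # prints "3 h 51 min"
```

In other words, expect the whole data preparation step to take close to four hours if the scripts are run back to back.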

e. All the temporary tables generated from running the scripts in '1_R_Data_Prep' will be saved to the output folder path you specified ("2_Data_Viz_Generate/data_to_read/" folder).
These tables include the following:
	
	assignees_looked_up_types.R:
		1. assignees_lookedup_types.csv
		2. temp_gi_assignee_type.csv

	Script 1:
		1. temp_num_foreign_documents_cited.csv
		2. temp_num_us_applications_cited.csv
		3. temp_num_us_patents_cited.csv
		4. temp_num_times_cited_by_us_patents.csv
		5. temp_patent_counts_fac_vfinal.csv

	Script 2:
		1. temp_patent_level_all.csv
		2. temp_patent_level_gi_subset.csv
		3. temp_patent_level_nongi_subset.csv

	Script 3:
		1. temp_gi_assignee_type.csv
		2. all_assignees.csv
		3. assignee_type.csv

	Script 4:
		1. temp_govt_associated_inventors_clean.csv
		2. temp_gi_inventor_gender.csv
		3. temp_gi_has_female_inv.csv

	Script 5:
		1. temp_5yr_citations_by_cite_yr1
		2. temp_5yr_citations_by_cite_yr2
		3. temp_5yr_citations_by_cite_yr3
		4. temp_5yr_citations_by_cite_yr4
		5. temp_5yr_citations_by_cite_yr5


f. Additional files: 

1. Go to the following webpage: https://www.aaas.org/programs/r-d-budget-and-policy/historical-trends-federal-rd.
Download the Excel file for "Total R&D by Agency, 1976-2018" under the _By Agency_ section. 

2. Open this file in Excel and transpose the data table. To do this, copy the data table only up to the "Total R&D" section (ignore the "R&D: Defense and Nondefense" section) and use the Paste Special option _transpose the table_.

3. Save the transposed table as a csv file with the name "agencies.csv" in the '2_Data_Viz_Generate/data_to_read/' folder.
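
If you prefer not to transpose by hand, the same operation can be sketched in a few lines of Python. This is an alternative to the Excel steps above, not something the repository provides. It assumes you have already exported the relevant part of the table (up to the "Total R&D" section) to a CSV file; the input file name in the usage note and the function names are hypothetical. It swaps rows and columns only and does not check that the result matches the layout the visualization scripts expect.

```python
import csv


def transpose_rows(rows):
    """Transpose a table given as a list of rows (short rows are padded)."""
    width = max((len(r) for r in rows), default=0)
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    return [list(col) for col in zip(*padded)]


def transpose_csv(src_path, dst_path):
    """Read a CSV file, swap its rows and columns, and write the result."""
    with open(src_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src))
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(transpose_rows(rows))
```

For example, `transpose_csv("total_rd_by_agency.csv", "agencies.csv")` would write the transposed table, where `total_rd_by_agency.csv` stands in for whatever name you gave the exported file.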

Step 2: Data Visualization

a. Go to the folder '2_Data_Viz_Generate'. Open the scripts from this folder in R/RStudio. Make sure your R working directory is set to this folder (Example folder path: *"<Your-Path-Here>/government-interest/2_Data_Viz_Generate/"*).

b. Run through the script "requirements.R". Be sure to set the input and output folder paths.

c. Next, run the script "govIntBrief.R". This generates all visualizations and takes about 41 minutes to run.

d. Run through the "requirements.R" script and then run the script "govIntBrief_assignee.R".

e. Run through the "requirements.R" script and then run the script "govIntBrief_gender.R".

This script will write its output to two folders:
(1) Folder **'data_viz/'**: stores all of the visualizations generated by running this R script
(2) Folder **'out/'**: stores all of the tables generated by running this R script