Wrong assignee/inventor coordinates #92

Open
WWakker opened this issue Nov 13, 2020 · 2 comments

WWakker commented Nov 13, 2020

I checked all assignee and inventor coordinates from 1976 to June 2020 to see how reliable this data is, by testing each coordinate against its country's bounding box (a rectangle with North, East, South, and West bounds). It turns out the data is not very reliable: I found 223,195 errors.
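For anyone who wants to reproduce this kind of check, here is a minimal sketch of a bounding-box test. It is not the exact script used for the analysis. The names `BoundingBox`, `BOUNDING_BOXES`, and `is_inside` are made up for illustration, and the Canada bounds are rough, rounded values that I am assuming only for the example.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Rectangular bounds in decimal degrees (hypothetical helper type)."""
    south: float
    west: float
    north: float
    east: float


# Assumption: approximate, rounded bounds for illustration only.
# A real check would load bounds for every country from a trusted source.
BOUNDING_BOXES = {
    "CA": BoundingBox(south=41.7, west=-141.0, north=83.2, east=-52.6),
}


def is_inside(country_code: str, lat: float, lon: float) -> bool:
    """Return True if (lat, lon) lies within the country's bounding box.

    Note: this simple comparison does not handle boxes that cross the
    antimeridian (180 degrees longitude).
    """
    box = BOUNDING_BOXES[country_code]
    return box.south <= lat <= box.north and box.west <= lon <= box.east


# Waterloo, Ontario (roughly 43.46, -80.52) falls inside the CA box,
# while Waterloo, Belgium (roughly 50.71, 4.40) does not.
print(is_inside("CA", 43.46, -80.52))
print(is_inside("CA", 50.71, 4.40))
```

A bounding box is a coarse filter: a point can be inside the box and still be in a neighbouring country, so a check like this only catches the clear-cut errors.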

Here are some of the problems that I found:

ASSIGNEE:

  • Waterloo, Canada gives coordinates in Belgium in 4217 cases
  • JP has coordinates 0.1, 0.1 in 4847 cases

INVENTOR:

  • TW Taoyuan County gives coordinates 47.7391, 18.1267 in 11075 cases (Hungary)
  • TW Taoyuan Hsien gives coordinates 38.1822, 116.111 in 7492 cases (China)
  • TW Changhua County gives coordinates 5.45, 10.0667 in 6090 cases (Cameroon)
  • KR Ichon-shi gives coordinates 40.2994, 128.202 in 2770 cases (North Korea)

COUNTRY CODES:

  • CA: coordinates in California instead of Canada
  • SC: coordinates in Scotland instead of Seychelles

For example, here's a plot of assignee and inventor coordinates that have country codes "CA" outside of the bounding box of Canada:
[Image: 2_CA_34111_errors]

I'm sharing a CSV file with all errors that I found in a zip file, in case you want to look into this.
coords_outside_bounding_box.zip

@WWakker WWakker changed the title wrong assignee/inventor coordinates Wrong assignee/inventor coordinates Nov 13, 2020
@emelluso

Thank you, @WWakker. We are investigating a replacement for the lat/long lookup data, as it is not consistently accurate for non-U.S. locations. We are looking for alternative sources of location data to improve the accuracy of this visualization!

WWakker commented Nov 16, 2020

Thanks for looking into this!

However, I wouldn't say it is consistently accurate for US locations either. According to my analysis, the US is in ninth place by number of coordinates outside the bounding box. That said, most patents are based in the US, so relative to the total number of US-based patents it is indeed a lot better.
[Image: 9_US_6964_errors]
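To make the absolute-versus-relative point concrete, here is a small sketch that ranks countries by error rate instead of raw error count. The function name `rank_by_error_rate` is hypothetical, and the country codes and counts are placeholders invented for the example, not figures from my analysis.

```python
def rank_by_error_rate(errors, totals):
    """Rank country codes by share of out-of-box coordinates, worst first.

    errors: dict of country code -> number of out-of-box coordinates
    totals: dict of country code -> total number of coordinates
    Countries with a missing or zero total are skipped.
    """
    rates = {
        code: errors[code] / totals[code]
        for code in errors
        if totals.get(code)
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder data: "AA" has more errors in absolute terms,
# but "BB" has the higher error rate.
errors = {"AA": 50, "BB": 10}
totals = {"AA": 1000, "BB": 20}
print(rank_by_error_rate(errors, totals))
```

A country with many records can rank high on raw error count while still having a low rate, which is the situation I'd expect for the US.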
