Trademark Search Support #64
Hi Parker, great tool. I would like your professional opinion: I want to do something fairly simple, but I have been banging my head against it for 3 days now. I am looking for a way to search a brand or trademark like "Nike" and then get a list of possible owners like "Nike, Inc". I can retrieve this information from the website https://tmsearch.uspto.gov/, but I have no idea what API to use 🤷
Hey! Thanks for the note! So, there is no API for that kind of thing, but you're correct that the closest USPTO service is TESS, so you'd need to automate manipulating TESS. I've done something similar for Pat/FT and App/FT on the patent side, neither of which has an API. A good place to start is Freeform TESS searching. I will say, I made a run years ago at including a TESS module in patent_client, and the persistent issue I had was cookie management. TESS was designed in ancient times (for the internet), so it has really obtuse ways of handling state that make it hard to wrap with a client library. That said, I'm highly confident that it can be done, and I would highly encourage you to make a go at it!
Thanks a lot Parker, that explains why I see so few attempts at doing that online!
The patent office has one, but you'll need an API key: https://developer.uspto.gov/api-catalog/tsdr-data-api. I have one, so they aren't hard to get. They have a Swagger UI page for it, but they didn't update it after they added the API key. I have an unsanctioned copy here, https://mustberuss.github.io/TSDR-Swagger/, that I can't get them to adopt. It won't work online (CORS is not allowed on their end), but the generated curl commands work, in case you want to kick the API while you are writing a manager for it. Or even better, my Swagger object can be imported into Postman to give you a nicely loaded collection for the API: https://mustberuss.github.io/TSDR-Swagger/myswagger_v1_tsdr_uspto.json. Also check out https://github.com/Ethan3600/USPTO-Trademark-API; I contributed there a while back. I don't remember all the details, but I don't think it needed an API key. Oops, my bad, for these you need to already know a registration or serial number. Back to trying to scrape TESS...
Exactly, that makes it eternally tough.
Not sure if anyone is working on this, but I found a script here, https://stackoverflow.com/a/43519721, that would do the initial search. Something similar could go against the freeform search page, where the result size can be set up to 500 records. That page lists the codes for each field. My favorite trick is to AND in a Registration Number not equal to 0 to limit the search to marks that have been registered. It's still not clear to me how to produce a manager, model and schema though. I'll need to reread the developer doc.
So, I was poking around on this one, and I think I have a solution. Similar to what I did over on Patent Public Search, I think I can bake the state management into the session object. So here we are - my scratch code for a TESS session that tracks the weird state object! This does this with a few key features:
There's still a ways to go to make this a functioning part of Patent_Client, but I think this solves the hardest piece of the puzzle:

import re
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# The state token has the form {session_id}.{query_id}.{record_id}
state_re = re.compile(r"state=(?P<session_id>\d{4}:\w+)\.(?P<query_id>\d+)\.(?P<record_id>\d+)")


def dict_replace(dictionary, text, replacement):
    # Recursively iterate through a dictionary
    # and replace all occurrences of "text" with "replacement"
    # Used to put the session ID in request information
    if not dictionary:
        return dictionary
    for k, v in dictionary.items():
        if isinstance(v, dict):
            dictionary[k] = dict_replace(v, text, replacement)
        elif isinstance(v, str) and v == text:
            dictionary[k] = replacement
    return dictionary


class TessState():
    def __init__(self, response_body):
        state = state_re.search(response_body).groupdict()
        self.session_id = state['session_id']
        self.query_id = state['query_id']
        self.record_id = state['record_id']

    def __str__(self):
        return f"{self.session_id}.{self.query_id}.{self.record_id}"


class TessSession(requests.Session):
    heartbeat_interval = 30  # seconds between keep-alive requests

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.state = None
        self.state_lock = threading.Lock()
        self.keep_alive_thread = ThreadPoolExecutor(thread_name_prefix="TESS-Keep-Alive")
        self.headers['user-agent'] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"

    def request(self, *args, **kwargs):
        with self.state_lock:
            state = bool(self.state)
        if not state:
            self.login()
        # Substitute the "{{state}}" placeholder with the current state token
        with self.state_lock:
            if "url" in kwargs:  # URL is passed as a keyword argument
                kwargs['url'] = kwargs['url'].replace("{{state}}", str(self.state))
            if "params" in kwargs:
                kwargs['params'] = dict_replace(kwargs['params'], "{{state}}", str(self.state))
            if "data" in kwargs:
                kwargs['data'] = dict_replace(kwargs['data'], "{{state}}", str(self.state))
            if "json" in kwargs:
                kwargs['json'] = dict_replace(kwargs['json'], "{{state}}", str(self.state))
        response = super().request(*args, **kwargs)
        # Every page embeds the next state token, so track it after each response
        if state_re.search(response.text):
            with self.state_lock:
                self.state = TessState(response.text)
        return response

    def login(self):
        login_response = super().request("get", "https://tmsearch.uspto.gov/bin/gate.exe", params={"f": "login", "p_lang": "english", "p_d": "trmk"})
        with self.state_lock:
            self.state = TessState(login_response.text)
            print(f"Logged in! Current State: {self.state}")
        # Kill the existing keep-alive thread
        # (shutdown only cancels pending futures; an already running loop is not interrupted)
        self.keep_alive_thread.shutdown(wait=False, cancel_futures=True)
        # Create a new keep-alive thread
        self.keep_alive_thread = ThreadPoolExecutor(thread_name_prefix="TESS-Keep-Alive")
        self.keep_alive_thread.submit(self.keep_alive)

    def keep_alive(self):
        while True:
            with self.state_lock:
                super().request("get", "https://tmsearch.uspto.gov/bin/gate.exe", params={"f": "tess", "state": str(self.state)})
            time.sleep(self.heartbeat_interval)

    def __del__(self):
        self.keep_alive_thread.shutdown(wait=False, cancel_futures=True)
        super().request("post", "https://tmsearch.uspto.gov/bin/gate.exe", data={"state": str(self.state), "f": "logout", "a_logout": "Logout"})
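To make the state handling easier to poke at without hitting TESS, here is a minimal offline sketch of the two pieces the session relies on: pulling the state token out of a response body, and swapping a "{{state}}" placeholder into request parameters. The helper names `parse_state` and `fill_state` and the sample HTML fragment are made up for illustration; only the token shape (session ID, query ID, record ID separated by dots) comes from the scratch code.

```python
import re

# Token shape taken from the scratch code: session_id.query_id.record_id
STATE_RE = re.compile(
    r"state=(?P<session_id>\d{4}:\w+)\.(?P<query_id>\d+)\.(?P<record_id>\d+)"
)


def parse_state(body):
    """Return the state token found in a response body, or None if absent."""
    match = STATE_RE.search(body)
    if match is None:
        return None
    return "{session_id}.{query_id}.{record_id}".format(**match.groupdict())


def fill_state(params, state, placeholder="{{state}}"):
    """Return a copy of params with each exact placeholder value replaced."""
    filled = {}
    for key, value in params.items():
        if isinstance(value, dict):
            filled[key] = fill_state(value, state, placeholder)
        elif value == placeholder:
            filled[key] = state
        else:
            filled[key] = value
    return filled


# Made-up HTML fragment, for illustration only
sample_body = '<a href="/bin/gate.exe?f=tess&state=4809:abc123.1.1">Search</a>'
state = parse_state(sample_body)
params = fill_state({"f": "tess", "state": "{{state}}"}, state)
```

Unlike the in-place replacement in the scratch code, this version returns a new dictionary, so a caller's template parameters can be reused across requests.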
#70 is a PR to track the work on this. I've abandoned using the separate keep-alive thread for TESS, mostly because it makes testability a total nightmare. Instead, I'm trying to preserve all the necessary state to recreate any given result, so if the session expires, all necessary steps can be "replayed" to get back to the same spot. The biggest issue now is getting to individual TESS records. The way that TESS links to them is using the state object, in the form {session_id}.{query_id}.{record_id}, so if the session ID ever expires, the only way to get back to a record is to replay the request and fetch the matching record ID, which is a pain. In the context of patent client, I think that means that every search result needs to carry a bit of metadata with the original query, so that if the related TESS record needs to be fetched, the query can be repeated. Fun times!
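As a rough illustration of the "replay" idea, the sketch below attaches the original query to every hit so the hit can be located again after a session expires. Everything here is hypothetical: `TrademarkHit`, `run_search`, `refetch`, and the `fake_search` stand-in (which returns made-up serial numbers instead of calling TESS) are not part of patent_client, and matching on serial number is just one assumed way to find the same record again.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class TrademarkHit:
    """A search hit that remembers how it was found (hypothetical model)."""
    serial_number: str
    query: str      # original query text, kept so the search can be replayed
    position: int   # zero-based index of the hit within that query's results


def run_search(search: Callable[[str], List[str]], query: str) -> List[TrademarkHit]:
    """Run a query and tag every hit with the metadata needed to replay it."""
    return [TrademarkHit(sn, query, i) for i, sn in enumerate(search(query))]


def refetch(search: Callable[[str], List[str]], hit: TrademarkHit) -> int:
    """Replay the stored query and return the hit's index in the new results."""
    serials = search(hit.query)
    # Match on serial number rather than position, since results may shift
    if hit.serial_number in serials:
        return serials.index(hit.serial_number)
    raise LookupError(f"{hit.serial_number} not found when replaying {hit.query!r}")


def fake_search(query: str) -> List[str]:
    """Stand-in for a real TESS request; returns made-up serial numbers."""
    return {"nike": ["70000001", "70000002", "70000003"]}.get(query, [])


hits = run_search(fake_search, "nike")
new_index = refetch(fake_search, hits[2])
```

In a real client the index returned by `refetch` would be combined with the fresh session and query IDs to rebuild the record link.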
Looks like we'll have some time to figure this out. I just attended an advanced trademark searching webinar where they mentioned TESS will be replaced in about a year. (The slides are here and the recording will be posted in a few weeks. Learned a few things about regex-like searches etc.)
So, at the end of the day, do we have anything that does a simple trademark search, taking text as input and returning a list of trademarks matching that name?
Nope, not yet. If I am wrong, please scream as loud as you can; this could save me hours on a monthly basis.