Error message when reading an artifact from github-catalog #1479

Open
yoavkatz opened this issue Jan 6, 2025 · 2 comments · Fixed by #1494
yoavkatz commented Jan 6, 2025

This code fails on 1.16.0 and above, and works on 1.15.9.

import unitxt
from unitxt import load_dataset
unitxt.settings.allow_unverified_code = True
recipe = "card=cards.doc_vqa.en,template=templates.qa.with_context.simple,loader_limit=100,demos_taken_from=train,augmentor=augmentors.no_augmentation,demos_removed_from_data=True,max_test_instances=20"
dataset = load_dataset(dataset_query=recipe)['test'].select(range(20))

This is the error:

  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/artifact.py", line 596, in fetch_artifact
    artifact_to_return = catalog.get_with_overwrite(
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/catalog.py", line 63, in get_with_overwrite
    return self.load(name, overwrite_args=overwrite_args)
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/catalog.py", line 53, in load
    return Artifact.load(
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/artifact.py", line 288, in load
    return artifact_link.load(overwrite_args)
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/artifact.py", line 506, in load
    d = artifacts_json_cache(path)
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/utils.py", line 127, in artifacts_json_cache
    return load_json(artifact_path)
  File "/Users/yoavkatz/migrate/v1/unitxt/src/unitxt/utils.py", line 131, in load_json
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'https://raw.githubusercontent.com/IBM/unitxt/1.16.0/src/unitxt/catalog/tasks/qa/with_context.json'

It seems the code tries to load an artifact from a remote catalog as if it were a local file.
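
The failure mode can be reproduced without unitxt. The following is a minimal sketch, not code from the library: it passes the URL from the traceback to the built-in `open()`, which treats it as a file-system path. The helper name `try_open` is made up for illustration.

```python
# The URL from the traceback above, passed to open() as if it were a path.
url = "https://raw.githubusercontent.com/IBM/unitxt/1.16.0/src/unitxt/catalog/tasks/qa/with_context.json"


def try_open(path):
    """Hypothetical helper: return the file contents, or the OSError raised."""
    try:
        with open(path) as f:
            return f.read()
    except OSError as error:
        # open() never performs an HTTP request; it looks for a local file
        # literally named like the URL and fails.
        return error


result = try_open(url)
print(type(result).__name__, result)
```

On the reporter's machine this surfaced as the `FileNotFoundError` shown in the traceback; the exact `OSError` subclass may differ by platform.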

This probably relates to the following code:

    def load(self, overwrite_args: dict) -> Artifact:
        # identify the catalog for the artifact_linked_to
        assert (
            self.artifact_linked_to is not None
        ), "'artifact_linked_to' must be non-None in order to load it from the catalog. Currently, it is None."
        assert isinstance(
            self.artifact_linked_to, str
        ), f"'artifact_linked_to' should be a string (expressing a name of a catalog entry). Currently, its type is: {type(self.artifact_linked_to)}."
        needed_catalog = None
        catalogs = list(Catalogs())
        for catalog in catalogs:
            if self.artifact_linked_to in catalog:
                needed_catalog = catalog

        if needed_catalog is None:
            raise UnitxtArtifactNotFoundError(self.artifact_linked_to, catalogs)

        path = needed_catalog.path(self.artifact_linked_to)
        d = artifacts_json_cache(path)  # <-- this is line 506. It reads the path as a local file.

dafnapension commented Jan 7, 2025

Hi @yoavkatz,
Thanks, an interesting detective task!
The piece of code that looks for a 'fruitful' catalog is:

        catalogs = list(Catalogs())
        for catalog in catalogs:
            if self.artifact_linked_to in catalog:
                needed_catalog = catalog

This loop returns the last matching catalog, not the first. For some reason, the check of whether the needed artifact is in the remote (GitHub) catalog succeeds for the 1.16.0 catalog and fails for 1.15.10. So with 1.15.10 the code never tries to read from the remote catalog, whereas with 1.16.0 it does. However, the code that loads the artifact is written for the local file system only:

def load_json(path):
    with open(path) as f:
        try:
            return json.load(f)

So I did two things:
(1) Fixed the loading code above, which handles only the file system and not the internet: for github_catalog, it now uses requests.get.
(2) Changed the loop above that looks for a fruitful catalog so that it stops at the first fruitful local catalog or, if no local catalog contains the artifact, settles for a remote catalog, whose loading is fixed by (1).
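
The two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in the fix: it swaps in the standard library's `urllib.request.urlopen` for the `requests.get` call mentioned above so that the sketch has no third-party dependency, and the names `is_url`, `pick_catalog`, and the `is_local` attribute are hypothetical.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(path: str) -> bool:
    """Hypothetical helper: True for http(s) locations, False for file paths."""
    return urlparse(path).scheme in ("http", "https")


def load_json(path: str):
    """Load JSON from a URL or a local file, depending on the path."""
    if is_url(path):
        # Remote catalog: fetch over HTTP instead of calling open().
        with urlopen(path, timeout=10) as response:
            return json.load(response)
    with open(path) as f:
        return json.load(f)


def pick_catalog(name, catalogs):
    """Return the first local catalog containing `name`, else the first remote one.

    Assumes each catalog supports `name in catalog` and exposes a boolean
    `is_local` attribute (an assumption made for this sketch).
    """
    remote_match = None
    for catalog in catalogs:
        if name not in catalog:
            continue
        if catalog.is_local:
            return catalog  # stop at the first fruitful local catalog
        if remote_match is None:
            remote_match = catalog  # remember it, but keep looking for a local one
    return remote_match
```

Returning as soon as a local catalog matches avoids the "last match wins" behavior of the original loop, and the remote catalog is used only as a fallback.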

@dafnapension

The problem in reading from github still exists:

from unitxt.artifact import Artifact
from unitxt.catalog import GithubCatalog

github_catalog = GithubCatalog()
path = github_catalog.path("cards.cola[task=tasks.classification.multi_class[metrics=[metrics.accuracy]]]")
artifact = Artifact.load(path=path)

print(artifact.__id__)

# assert employed overwrites
print(artifact.task)

This throws a similar exception: the code tries to fetch an artifact from GitHub as if GitHub were a file system.

dafnapension changed the title from "Error message when linking to artifact" to "Error message when reading an artifact from github-catalog" on Jan 9, 2025