Meeting Notes #1
-
DATA LAKE 8/1/23
-
DATA LAKE 8/8/23
-
DATA LAKE 8/15/23
-
DATA LAKE 8/22/23
-
Datalake Meeting notes, 29/08/2023
Current problem: the CSV/TSV format is a temporary step on the way to making each row a JSON file.
Question: why JSON-LD and not RDF (which Solr works with)?
What would this look like for Cantus Ultimus?
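The "each row a JSON file" step could be sketched roughly as below. This is only an illustration: the column names, the sample rows, and the helper name `rows_to_json_files` are assumptions, since the notes do not give the real CSV/TSV layout.

```python
import csv
import io
import json
import tempfile
from pathlib import Path

# Hypothetical sample data; the real column names are not given in these notes.
SAMPLE_TSV = "id\ttitle\tmode\n1\tExample chant\t3\n2\tExample piece\t1\n"


def rows_to_json_files(tsv_text, out_dir, id_column="id"):
    """Write each row of a TSV string to its own JSON file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # One file per row, named after the (assumed) identifier column.
        path = out_dir / f"{row[id_column]}.json"
        path.write_text(json.dumps(row, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()
written = rows_to_json_files(SAMPLE_TSV, out_dir)
first = json.loads(written[0].read_text(encoding="utf-8"))
```

Every value stays a string here because `csv.DictReader` does no type conversion; adding a JSON-LD `@context` would be a separate step.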
-
Datalake Meeting notes, 05/09/2023
Van: discussion of flattening the database, with each piece/file type as a row (see https://github.com/malajvan/linkedmusic-datalake/tree/main/simssadb).
Features to use in the mock database (SIMSSA+Cantus+??): composers? range? final pitch?
For the Cantus portion, we will need 50 chants with a “composer” (Pope Gregory) and the modes.
Once the files are prepped, @dchiller, @jinh0 and @homework36 will look into indexing in Solr/Elasticsearch to run some queries across databases (“find pieces that are in Phrygian”).
@malajvan to delegate various other issues that need prepping.
Difficulties of dealing with music-specific features discussed. See discussion here: #10
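As a rough picture of what a cross-database query like “find pieces that are in Phrygian” means once the data is flattened, here is a plain-Python stand-in. It is not Solr or Elasticsearch; the rows, field names, and the helper `find_by_field` are all made up for illustration.

```python
# Hypothetical flattened rows, one dict per piece; field names are assumptions.
simssa_rows = [
    {"source": "SIMSSA", "title": "Example piece A", "composer": "Example composer", "mode": "Phrygian"},
    {"source": "SIMSSA", "title": "Example piece B", "composer": "Example composer", "mode": "Dorian"},
]
cantus_rows = [
    {"source": "Cantus", "title": "Example chant", "composer": "Pope Gregory", "mode": "Phrygian"},
]


def find_by_field(datasets, field, value):
    """Return rows from every dataset whose field matches value, ignoring case."""
    wanted = value.casefold()
    return [
        row
        for rows in datasets
        for row in rows
        if str(row.get(field, "")).casefold() == wanted
    ]


phrygian = find_by_field([simssa_rows, cantus_rows], "mode", "phrygian")
```

The query only works across sources because both use the same field name and the same vocabulary for the mode, which is the reason for flattening and reconciling first.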
-
Data Lake Meeting Notes, 12/09/2023
Some choices made in SIMSSA:
- P86, composer: this links to Wikidata
- Note that Q-IDs have the format "wd:Q1234" in order to expand properly using pyld; P-IDs are values and not expanded, and thus use a different format.
Compare to CantusDB JSON-LD:
[Further discussion on properties and context: #13]
Some initial forays into Solr, explanation of schemas:
- Started by indexing the unexpanded file; expanded may be more useful in the future
Future datasets:
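To show why the "wd:Q1234" format matters, here is a small stand-in for the prefix expansion that a JSON-LD processor such as pyld performs from a context. It does not call pyld; `expand_compact_iri` and `expand_row` are hypothetical helpers, and the context only assumes the usual Wikidata entity namespace for the `wd` prefix.

```python
# Assumed context: "wd" maps to the Wikidata entity namespace.
CONTEXT = {"wd": "http://www.wikidata.org/entity/"}


def expand_compact_iri(value, context):
    """Expand 'prefix:suffix' to a full IRI when the prefix is in the context."""
    prefix, sep, suffix = value.partition(":")
    # Leave absolute IRIs such as 'http://...' and unknown prefixes untouched.
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


def expand_row(row, context):
    """Expand string values only; keys such as 'P86' are left as they are."""
    return {
        key: expand_compact_iri(val, context) if isinstance(val, str) else val
        for key, val in row.items()
    }


row = {"P86": "wd:Q1234", "title": "Example piece"}
expanded = expand_row(row, CONTEXT)
```

A bare "Q1234" would have no prefix to match against the context and would stay a plain string, which is the situation the note warns about.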
-
Meeting notes, 19/09/2023
Update on CantusDB data dump:
Discussion of IDs vs values and effect on indexing. Decision to try both. (see also #14)
Desired workflow: want things automatic so that when databases update, we don’t have to do anything (provided the metadata scheme is unchanged).
Note: there is a service on OpenRefine (Reconcile-CSV) that takes two databases and reconciles them with one another; could use this to reconcile old and new?
- SIMSSA flattening update: getting rid of VIAF ID (use to reconcile but not store).
- CantusDB: generated a new file with more+improved columns. #11 (comment)
Properties discussion:
Solr update:
- Manual suffix adding: now automated for Cantus and SIMSSA, but The Session had a breaking use case:
- Implications for Solr: might want to return tunes, recordings or settings, depending on the query ("all recordings by X artist")
Future datasets:
Elasticsearch: still work in progress
To do: For project meeting demo, have an ugly box with a query.
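The automated suffix adding mentioned under the Solr update might look roughly like the sketch below. The notes do not say which suffixes are used; this assumes the dynamic-field suffixes from Solr's default configset (`_s`, `_ss`, `_i`, `_d`, `_b`), and `solr_suffix` and `add_suffixes` are hypothetical helper names.

```python
def solr_suffix(value):
    """Pick a dynamic-field suffix from the Python type of a value (assumed mapping)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "_b"
    if isinstance(value, int):
        return "_i"
    if isinstance(value, float):
        return "_d"
    if isinstance(value, (list, tuple)):
        return "_ss"  # multi-valued strings
    return "_s"


def add_suffixes(record, keep=("id",)):
    """Return a copy of record with a type suffix on every key except those in keep."""
    return {
        key if key in keep else key + solr_suffix(value): value
        for key, value in record.items()
    }


doc = add_suffixes(
    {"id": "chant-1", "title": "Example chant", "mode": 3, "genres": ["antiphon"]}
)
```

A per-type mapping like this works while every record is the same kind of thing; a source that mixes tunes, recordings and settings would likely need an extra field for the record type as well.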
-
9/26
non-reconciled values should just be a string (could be a pure text option); Van is resolving everything to text
Cantus ID has been added
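The rule for non-reconciled values could be expressed as below. This is a sketch only: `cell_value` is a hypothetical helper, and the "wd:" form for reconciled values is borrowed from the 12/09 notes rather than from the actual pipeline.

```python
def cell_value(raw_text, qid=None):
    """Return a prefixed Wikidata ID when reconciled, otherwise the plain text."""
    if qid:
        return f"wd:{qid}"
    # Not reconciled: keep the original string untouched.
    return raw_text


reconciled = cell_value("Example composer", "Q1234")
unreconciled = cell_value("Anonymous")
```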
-
DATA LAKE 10/3
-
DATA LAKE 10/10
OpenRefine (OR) remembers what you've chosen before with Wikidata: export a list of actions and just reapply it (needs the same column names)
schema.org discussion post
TO DO: make sure to document decisions made with Alastair at the upcoming meeting on GitHub
search update discussion post
grabbing the multiple languages from the Wikidata profile for search
Elasticsearch/JSON-LD problem discussion post (#24)
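For the note on grabbing multiple languages from Wikidata, a minimal sketch of pulling labels out of an entity record follows. The sample dict mimics the `labels` section of Wikidata's entity JSON, but its values are invented, and `labels_for_search` is a hypothetical helper; no network request is made.

```python
# Invented sample shaped like the "labels" part of a Wikidata entity record.
entity = {
    "id": "Q1234",
    "labels": {
        "en": {"language": "en", "value": "Example label"},
        "fr": {"language": "fr", "value": "Exemple de libellé"},
    },
}


def labels_for_search(entity, languages=None):
    """Map language code to label text, optionally restricted to some languages."""
    labels = entity.get("labels", {})
    return {
        lang: item["value"]
        for lang, item in labels.items()
        if languages is None or lang in languages
    }


all_labels = labels_for_search(entity)
french_only = labels_for_search(entity, {"fr"})
```

Each label could then be indexed as its own searchable field so that a query in either language finds the same record.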
-
Agenda for 24 October
-
DATA LAKE 10/24
-
SIMSSA to Data Lake: all of SIMSSA DB is sample data imported to CSV
- possibly just dump it into JSON-LD