maherukh/Issue64 #67

Open
wants to merge 3 commits into
base: master
24 changes: 16 additions & 8 deletions README.md
@@ -1,29 +1,37 @@
# CDLI Daily Bulk Data Dump

**Last update was August 2022. Head to the [open-data](https://github.com/cdli-gh/open-data) repository for the current data dumps**
**Last update was August 2022. Head to the [open-data](https://github.com/cdli-gh/open-data) repository for the current data dumps.**

The repository contains a daily dump of all public catalogue and text data from the Cuneiform Digital Library Initiative.

## Getting the data

Make sure you have the Git Large File Storage extentions ([`git-lfs`](https://github.com/git-lfs/git-lfs)) installed, see [here](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) for instructions. For installing under, say, Ubuntu, you can also use
Make sure you have the Git Large File Storage extensions ([`git-lfs`](https://github.com/git-lfs/git-lfs)) installed; see [here](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage) for instructions. To install under, say, Ubuntu, you can also use:

$> curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$> sudo apt-get install git-lfs

Clone the repository
Clone the repository:

To speed up cloning the repository, it is recommended to use the following command:

$> git clone https://github.com/cdli-gh/data --depth 1

The above command only fetches the most recent commit, which is much faster than fetching all historical commits.

Fetch all historical commits using:

$> git clone https://github.com/cdli-gh/data
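
If you started with the `--depth 1` clone and later need the history, the clone can usually be inspected and deepened in place instead of being cloned again. The following is a minimal sketch, assuming Git 2.15 or later (for `rev-parse --is-shallow-repository`); the function name `clone_status` is made up for illustration and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): report whether a
# clone is shallow and how many commits are reachable from HEAD.
clone_status() {
    repo_dir="$1"
    shallow=$(git -C "$repo_dir" rev-parse --is-shallow-repository)
    commits=$(git -C "$repo_dir" rev-list --count HEAD)
    echo "shallow=$shallow commits=$commits"
}

# Usage, from the directory that contains the clone:
#   clone_status data
#   git -C data fetch --unshallow   # download the remaining history
#   clone_status data
```

`git fetch --unshallow` only works on a shallow clone; on a full clone Git reports an error and changes nothing.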

Retrieve Git LSF data:
Retrieve Git LFS data:

$> cd data
$> git lfs fetch
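
`git lfs fetch` downloads the LFS objects into the local LFS store. If a data file in the working tree is still only a few lines long afterwards, it is probably still an LFS pointer file, and `git lfs pull` (fetch plus checkout) should replace it with the real content. The sketch below checks for that case; it relies on the fact that pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The function name `is_lfs_pointer` is hypothetical.

```shell
# Hypothetical helper (not part of this repository): succeed if the given
# file is a Git LFS pointer rather than the real data.
is_lfs_pointer() {
    # Pointer files are small text files whose first line names the spec.
    head -n 1 "$1" 2>/dev/null \
        | grep -qx 'version https://git-lfs\.github\.com/spec/v1'
}

# Usage, from inside the cloned repository:
#   is_lfs_pointer cdli_catalogue_1of2.csv && git lfs pull
```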

## Format
### Text Data
The CDLI transliterations dump is offered in plain text UTF-8 ATF format.
For more information about ATF, visit :
For more information about ATF, visit:

http://oracc.museum.upenn.edu/doc/help/editinginatf/cdliatf/index.html (Scroll down for an example).

@@ -32,13 +40,13 @@ For more information about ATF, visit :
The catalogue is offered in a UTF-8 comma separated format. Most fields are thoroughly explained here:

https://cdli.ucla.edu/?q=cdli-search-information
Our data schema is currently being remodeled, get in touch if you would like a sneak peak!
Our data schema is currently being remodelled; get in touch if you would like a sneak peek!

To view a sample of the catalogue, you can use the head command on a Unix machine using this syntax, while you are in the directory where the file is stored:
To view a sample of the catalogue on a Unix machine, run the `head` command from the directory where the file is stored:
```
head cdli_catalogue_1of2.csv
```
With Windows Power Shell, try
With Windows PowerShell, try:
```
Get-Content *filename* -Head *n*
```
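
On a Unix machine it can also help to list the catalogue's column names before reading any rows. The following is a rough sketch, assuming the first line of the file is a header row and that no header field contains a quoted comma; the function name `csv_columns` is made up for illustration.

```shell
# Hypothetical helper (not part of this repository): print the header
# fields of a CSV file, one per line, prefixed with their position.
# Assumes the header row contains no quoted commas.
csv_columns() {
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | awk '{ print NR ": " $0 }'
}

# Usage, from the directory where the file is stored:
#   csv_columns cdli_catalogue_1of2.csv
```

If the header fields are wrapped in double quotes, the quotes are printed as-is; a proper CSV parser is the safer choice for anything beyond a quick look.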