HenoHocuM/fast_align

 
 


fast_align

fast_align is a simple, fast, unsupervised word aligner.

If you use this software, please cite:

The source code in this repository is provided under the terms of the Apache License, Version 2.0.

Input format

Input to fast_align must be tokenized and sentence-aligned. Each line contains a source-language sentence and its target-language translation, separated by a triple pipe symbol (|||). For example:

doch jetzt ist der Held gefallen . ||| but now the hero has fallen .
neue Modelle werden erprobt . ||| new models are being tested .
doch fehlen uns neue Ressourcen . ||| but we lack new resources .
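
Before running the aligner, it can help to confirm that every corpus line follows this format. The following is a minimal sketch in Python, not part of fast_align itself; the function name `parse_parallel_line` is hypothetical, and it assumes tokens are separated by whitespace, as in the example above.

```python
SEPARATOR = "|||"  # triple pipe symbol described above


def parse_parallel_line(line):
    """Split one corpus line into (source_tokens, target_tokens).

    Raises ValueError if the line does not contain exactly one
    separator or if either side has no tokens.
    """
    parts = line.rstrip("\n").split(SEPARATOR)
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '{SEPARATOR}' in: {line!r}")
    # Assumption: tokens are whitespace-separated, as in the example corpus.
    source_tokens = parts[0].split()
    target_tokens = parts[1].split()
    if not source_tokens or not target_tokens:
        raise ValueError(f"empty source or target side in: {line!r}")
    return source_tokens, target_tokens


if __name__ == "__main__":
    src, tgt = parse_parallel_line(
        "neue Modelle werden erprobt . ||| new models are being tested ."
    )
    print(len(src), len(tgt))
```

Running such a check over the whole file before alignment surfaces malformed lines early, rather than leaving them to be discovered in the aligner's output.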

Compiling and using fast_align

fast_align requires only a C++ compiler; it can be compiled by typing make at the command line prompt.

Run fast_align to see a list of command line options. Here is an example invocation:

./fast_align -i text.fr-en -d -o -v > forward.align
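
The same invocation can be scripted. The sketch below wraps the command shown above in Python using the standard `subprocess` module; the helper names `build_command` and `run_alignment` are hypothetical, and the flags are copied verbatim from the example rather than interpreted (run fast_align without arguments to see what each option does).

```python
import subprocess


def build_command(corpus_path, binary="./fast_align"):
    """Return the argument list for the example invocation above."""
    # Flags are taken as-is from the README example.
    return [binary, "-i", corpus_path, "-d", "-o", "-v"]


def run_alignment(corpus_path, output_path, binary="./fast_align"):
    """Run fast_align and write its standard output to output_path."""
    with open(output_path, "w", encoding="utf-8") as out:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_command(corpus_path, binary), stdout=out, check=True)


if __name__ == "__main__":
    print(" ".join(build_command("text.fr-en")))
```

Calling `run_alignment("text.fr-en", "forward.align")` would be the scripted equivalent of the shell redirect, assuming the compiled binary is in the current directory.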

Output

fast_align produces output in the i-j "Pharaoh" format, where a pair i-j indicates that the ith word (zero-indexed) of the source sentence is aligned to the jth word of the target sentence. For example, a good alignment of the example corpus above would be:

0-0 1-1 2-4 3-2 4-3 5-5 6-6
0-0 1-1 2-2 2-3 3-4 4-5
0-0 1-2 2-1 3-3 4-4 5-5
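
To make the index pairs concrete, the following sketch parses one alignment line and maps it back to the words of its sentence pair. It is an illustration in Python, not part of fast_align; the function names `parse_pharaoh` and `aligned_words` are hypothetical.

```python
def parse_pharaoh(line):
    """Parse a line such as '0-0 1-2' into [(0, 0), (1, 2)]."""
    pairs = []
    for link in line.split():
        i, j = link.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def aligned_words(source, target, alignment):
    """Return (source_word, target_word) for each i-j link."""
    src = source.split()
    tgt = target.split()
    # Indices are zero-based: i indexes the source, j the target.
    return [(src[i], tgt[j]) for i, j in parse_pharaoh(alignment)]


if __name__ == "__main__":
    for pair in aligned_words(
        "doch fehlen uns neue Ressourcen .",
        "but we lack new resources .",
        "0-0 1-2 2-1 3-3 4-4 5-5",
    ):
        print(pair)
```

For the third sentence pair, the links 1-2 and 2-1 pair "fehlen" with "lack" and "uns" with "we", reflecting the different word order in the two languages.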
