Author: Natalia Korchagina. University of Zurich, 2020.
This folder contains resources produced during my PhD project "Temporal entity extraction from historical texts".

Spelling normalization.

1. A neural machine translation model was trained using the dl4mt library.
2. The model is in SpellNormModel.zip.
3. To use the model, unzip it first.
4. Install the dl4mt library: https://github.com/nyu-dl/dl4mt-c2c
5. Run the translation script and pass SpellNormModel.npz as the model.

Example: 

python /home/user/korchagina/dl4mt-c2c/translate/translate_char2char.py -model SpellNormModel.npz  -translate hs_de -saveto normalized_output.txt -source input.csv  > translate.log 2> translate.err

6. To improve the output of the neural MT system, I applied a dictionary lookup. The manually produced mappings between historical and modern tokens described in Chapter 4 were applied in the following order, by domain and temporal relevance: Gold_Norm, LemmData, GerManC.

7. Gold_Norm - a manually normalized subset of the Gold Standard:

Gold_Norm.csv

8. LemmData - historical-modern word pairs from the database of historical terms of the Swiss Law Sources Foundation:

LemmData.csv
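
Example (sketch of the dictionary lookup from step 6):

The snippet below is a minimal illustration of a priority-ordered lookup, not the script used in the thesis. It assumes that each mapping file is a two-column, comma-separated CSV (historical form, modern form), that the input is already tokenized, and that a dictionary match on the historical token overrides the neural MT output for that token. The function names load_mapping and apply_lookup are hypothetical.

```python
import csv
from typing import Dict, List, Sequence


def load_mapping(path: str, delimiter: str = ",") -> Dict[str, str]:
    """Read a two-column CSV (historical, modern) into a dict.

    Assumption: column layout and delimiter are guesses, not taken
    from the actual files; adjust them to the real format.
    """
    mapping: Dict[str, str] = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) >= 2 and row[0]:
                # Keep the first modern form seen for a historical form.
                mapping.setdefault(row[0], row[1])
    return mapping


def apply_lookup(
    historical: Sequence[str],
    nmt_output: Sequence[str],
    dictionaries: Sequence[Dict[str, str]],
) -> List[str]:
    """Replace NMT output with a dictionary entry where one exists.

    Dictionaries are consulted in the order given, so the first one
    has the highest priority. Tokens found in no dictionary keep
    their NMT normalization.
    """
    if len(historical) != len(nmt_output):
        raise ValueError("historical and nmt_output must be aligned")
    result: List[str] = []
    for hist_token, nmt_token in zip(historical, nmt_output):
        for mapping in dictionaries:
            if hist_token in mapping:
                result.append(mapping[hist_token])
                break
        else:
            # No dictionary contains the token: keep the NMT output.
            result.append(nmt_token)
    return result
```

With the real files, the dictionaries would be loaded in the order given in step 6, for example load_mapping("Gold_Norm.csv") first, then load_mapping("LemmData.csv"), then the GerManC mapping, and passed to apply_lookup as a list in that order.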