ead.py contains two functions for a simple analysis of EAD files. save_comments(directory, comments_file) saves all comments found in a directory of EAD documents to a file. count_files_containing_tag(tag, directory) counts how many files in a directory contain a given tag.
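
For orientation, here is a minimal sketch of how two functions with these signatures could be written using only the standard library. It is not the code in ead.py: the helper names (_xml_paths, _parse_with_comments), the output format of the comments file, and the choice to match tags by local name (ignoring any XML namespace) are all assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET


def _xml_paths(directory):
    # Hypothetical helper: yield the .xml files in a directory, in sorted order.
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(".xml"):
            yield os.path.join(directory, name)


def _parse_with_comments(path):
    # Hypothetical helper: insert_comments=True (Python 3.8+) keeps comments
    # that sit inside the root element; comments outside it are not kept.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.parse(path, parser=parser).getroot()


def save_comments(directory, comments_file):
    # Write one line per comment, prefixed with the file it came from
    # (this output format is an assumption).
    with open(comments_file, "w", encoding="utf-8") as out:
        for path in _xml_paths(directory):
            root = _parse_with_comments(path)
            for node in root.iter():
                if node.tag is ET.Comment:
                    text = (node.text or "").strip()
                    out.write(f"{os.path.basename(path)}: {text}\n")


def count_files_containing_tag(tag, directory):
    # Count files in which at least one element has the given local name.
    count = 0
    for path in _xml_paths(directory):
        root = ET.parse(path).getroot()
        for node in root.iter():
            # Drop a leading "{namespace}" part before comparing.
            if isinstance(node.tag, str) and node.tag.rsplit("}", 1)[-1] == tag:
                count += 1
                break
    return count
```

With this sketch, count_files_containing_tag("unittitle", "some_directory") would return the number of files that contain at least one unittitle element; the directory name here is a placeholder.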
There are 14 files that are encoded in UTF-16. ead.py chokes on them:
- EMF.xml
- CS.xml
- KCGB.xml
- FLM.xml
- ANLM.xml
- KCAC.xml
- JMK.xml
- BLM.xml
- AG.xml
- JRNS.xml
- AMT.xml
- EFB.xml
- BRA.xml
- KCAS.xml
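
A list like the one above could be rebuilt with a short check for a UTF-16 byte order mark at the start of each file. This is only a sketch: the function name find_utf16_files is hypothetical, and it assumes the UTF-16 files begin with a BOM, so a UTF-16 file without one would be missed (and a UTF-32 little-endian file, whose BOM starts with the same two bytes, would be reported by mistake).

```python
import codecs
from pathlib import Path

# The two possible UTF-16 byte order marks (little-endian and big-endian).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def find_utf16_files(directory):
    # Return the names of .xml files whose first two bytes are a UTF-16 BOM.
    hits = []
    for path in sorted(Path(directory).glob("*.xml")):
        with open(path, "rb") as f:
            head = f.read(2)
        if head in UTF16_BOMS:
            hits.append(path.name)
    return hits
```

Files reported this way could then be skipped or re-encoded before running ead.py on the directory.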
Also, there is one HTML file among the XML files, aaa-00109.html.
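
Stray files like this one can be spotted by listing everything in the directory that does not have an .xml extension. The function name non_xml_files is hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path


def non_xml_files(directory):
    # Return the names of regular files whose extension is not .xml.
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() != ".xml"
    )
```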