Summary
I assist ePubs, Purdue University Press, in maintaining its open access repository by undertaking the following activities:
- Automating the processing of academic texts to extract the necessary metadata.
- Vetting the metadata using web scraping techniques to ascertain whether a given academic text meets the requirements for release in the open access domain.
- Batch uploading roughly 2,000 academic publications to the Electronic Theses and Dissertations repositories.
- Collaborating with other open access publishers to post Purdue authors’ content from their repositories to our repository at Purdue.
This position requires a significant amount of web scraping and text analysis to work efficiently with scores of documents at once. Time is of the essence: more than 2,000 academic publications must be processed within six months with minimal errors.