Text and data mining (TDM)
The goal is to make digital data optimally reusable for research.
"Text and data mining" is an umbrella term for algorithmic or statistical analysis methods used to discover patterns and structures of meaning in digital text and other data. Text and data mining is typically used to analyse large data sets (big data). Libraries also hold large amounts of digital information. ETH Library advocates making this data available for academic reuse through text and data mining wherever this is legally permissible and useful. Only data relating to the library's holdings is considered, and personal data is excluded.
Areas of activity
ETH Library is active in the following areas of text and data mining:
- As part of the revision of Swiss copyright law, ETH Library champions the introduction of a copyright exception for scientific research, which would allow technical methods to be used to analyse works for research purposes.
- In addition, an internal ETH Library working group addresses the following topics: the legal foundations of text and data mining, the selection of ETH Library holdings suitable for text and data mining, and the use of text and data mining to index content.
Legal advice and support for members of ETH Zurich
The providers of ETH Library's licensed resources (e.g. e-journals, e-books) permit text and data mining only under certain conditions. Further negotiations or an additional agreement are often necessary, as the policies of the publishers Elsevier, Wiley and Springer Nature illustrate.
Please contact us if you are planning a text and data mining project based on resources licensed by ETH Library. We will be glad to clarify the necessary details with the providers concerned. Please note that unauthorised text and data mining breaches the licence terms agreed between ETH Zurich and the providers and may result in loss of access for the whole of ETH Zurich.
In addition to licensed content, there are freely accessible resources that permit unrestricted text mining:
- arXiv: Free access to preprints in physics, mathematics, computer science, statistics, financial mathematics and biology
- BioMed Central: Over 300 BioMed Central, Chemistry Central and SpringerOpen open access journals in biology and medicine
- Chronicling America: Historical American Newspapers: Collection of digitised historical newspapers from 1789 to 1924
- Digital Public Library of America: Access to digital copies of cultural heritage items from American museums, libraries and archives
- Europeana: Digital library with digital copies of academic and cultural heritage from over 2,000 European institutions
- HathiTrust Digital Library: Digital copies from over 120 academic institutions worldwide
- Internet Archive: Access to millions of freely accessible books and texts
- Public Library of Science (PLOS): Access to the contents of journals published by the Public Library of Science, an academic open access publisher
- PubMed Central: Databases and Text Mining Tools: A range of freely available mining tools for searching PubMed Central, an archive of freely accessible content in biology and biomedicine
The Crossref Text and Data Mining service is a free, publisher-independent service offered by Crossref and supported by many publishers (e.g. AIP, APA, APS, Elsevier, HighWire Press, Springer, Taylor & Francis, Walter de Gruyter, Wiley). However, gaining access to the full content often requires negotiations with the providers here too. Please contact us if you have any questions.