What is SciA11y?
This is an experimental prototype created by Semantic Scholar that aims to render scientific paper PDFs in HTML so they can be more easily read by screen readers or on mobile devices. All data are extracted from the PDF, and extraction is an imperfect process. As a result, some components of a paper may be missing, e.g., some figures and math equations may not be extracted. We are working on ways to improve extraction quality.
In the meantime, we want to know whether the current renderer could be useful to you. We have implemented the following features to help navigate each paper:
- A Table of Contents at the start of the document links to all section headers, figures, and tables.
- Referenced papers are listed at the end. Links are provided between inline citations and reference items when available. Links following each reference take you to the first inline mention of that reference in a particular section.
- Figures and tables are available under figure tags, along with captions. When we fail to extract a figure or table, we denote this in the figure caption. Alt-text is not currently extracted; we are working on it.
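As a rough illustration of how a reader-side tool could use this structure, the sketch below collects section headers and figure captions from an HTML document using Python's standard `html.parser`. The sample markup, the `OutlineParser` class, and the `build_outline` helper are assumptions made for this example; the actual SciA11y markup may differ beyond its use of `figure` tags with captions.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect heading and figure-caption text from an HTML document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.entries = []   # list of (kind, text) tuples in document order
        self._kind = None   # kind of element currently being captured, if any
        self._buf = []      # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._kind, self._buf = "section", []
        elif tag == "figcaption":
            self._kind, self._buf = "figure", []

    def handle_data(self, data):
        if self._kind is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._kind is not None and (tag in self.HEADINGS or tag == "figcaption"):
            # Collapse internal whitespace before storing the entry.
            text = " ".join("".join(self._buf).split())
            self.entries.append((self._kind, text))
            self._kind = None


def build_outline(html):
    """Return (kind, text) entries for headings and figure captions."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.entries


# Hypothetical sample markup, not taken from a real SciA11y render.
SAMPLE = """
<h2>Introduction</h2>
<p>Some text.</p>
<figure><img src="fig1.png"><figcaption>Figure 1: Overview.</figcaption></figure>
<h2>Methods</h2>
"""

outline = build_outline(SAMPLE)
for kind, text in outline:
    print(f"{kind}: {text}")
```
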
Please note that table content and mathematical equations are not currently extracted. We would like to include them eventually, but the technology is still in development.
How do I use SciA11y?
The main way to use SciA11y is to search for papers to read. Use the search bar on the homepage to look for a specific paper or to search by keyword. Queries are issued to the Semantic Scholar search API. Search results show only papers that have a CC license (excluding ND variants) and a SciA11y HTML render.
What data is in SciA11y?
Currently, 1.5M papers are available in this demo. These papers have CC licenses which allow us to reproduce their content in an adapted form. Papers in this demo come from a static dataset that was created in April 2020 and consequently does not include any papers published after this date.
Why am I not finding a specific paper in search results?
This demo currently serves a static snapshot of 1.5M papers published before April 2020 for which we have generated a successful parse and for which we have found an appropriate license to redistribute the work in an altered form. It is likely that the paper you are looking for does not meet one or more of these criteria. Additionally, because we use the Semantic Scholar API for search, we currently load only up to the top 5 results returned. If the paper is not in SciA11y or not in the top search results, it will not be shown on the search results page.
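The effect of these two conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `visible_results`, the paper IDs, and the order of operations (truncate to the top 5, then keep papers that are in SciA11y) are hypothetical and are not taken from the actual implementation.

```python
MAX_RESULTS = 5  # the demo loads at most the top 5 results from the search API


def visible_results(ranked_ids, available_ids, limit=MAX_RESULTS):
    """Return the papers that would appear on the results page.

    ranked_ids:    paper IDs in the order returned by the search API
    available_ids: set of paper IDs that have a SciA11y render
    """
    top = ranked_ids[:limit]  # anything ranked below the limit is never loaded
    return [pid for pid in top if pid in available_ids]


# Hypothetical IDs, for illustration only.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
available = {"p2", "p5", "p6"}

shown = visible_results(ranked, available)
print(shown)  # p6 has a render but is ranked sixth, so it is not shown
```

Under these assumptions, a more specific query (for example, the full paper title) gives the paper a better chance of ranking within the loaded results.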
Will this feature be available in Semantic Scholar?
The team plans to introduce this feature in Semantic Scholar, but additional development and testing are needed to make this happen. If you would like this feature or a variant of it to be available in Semantic Scholar, please reach out and let us know!
Please send questions or feedback to Lucy Lu Wang or Jonathan Bragg.