I have been doing some research to understand what the standard way is to read a pptx file with JavaScript/TypeScript in the browser.

A lot of the libraries I have found, like textract, are mainly for Node. I found one library called JS-PPTX, but its last commit was made in 2016, so that's not very promising.

Most of the libraries are about creating a PowerPoint presentation, but what I really need is to read the file and identify the contents of the slides.

I am happy to read the raw file format and try to parse it if that is better, but I just need a way to upload the file and read it with the FileReader API.

Or, if there is a way to convert the pptx to another format that is easier to read, I would be open to that. I found one library called PPTX2HTML, but its last commit is from 2017.

I found this Stack Overflow post, but it is from 2010, so I am hoping the thinking has evolved since then.

"what is the standard way to read a pptx with JavaScript/Typescript in the browser." - there isn't a standard way because there's no standards organization or authority for this kind of activity (i.e. Microsoft has not released a JavaScript API for manipulating Office PowerPoint files). Dai Nov 21, 2020 at 1:52

"read the file and identify the contents of the slides" - this is non-trivial. This probably ranks only slightly lower in difficulty than trying to semantically parse a PDF file: I understand that PowerPoint slides are only minimally structured. Dai Nov 21, 2020 at 1:53

Yeah, I saw that Microsoft has a JS API, but that seems to be for making add-ons within their products. There are libraries to read PDF with JavaScript, but there doesn't seem to be a library for pptx. Do you have any suggestions on how to move forward? Grant Herman Nov 21, 2020 at 2:02

I'd do it server-side, not client-side, using a proven library like Aspose. If this is for a client-side application (Electron? PhoneGap/Cordova?) then if it was for desktop use I'd probably use an external .NET/COM process using the locally-installed Office COM automation library. This assumes Office is installed on the users' computers. Dai Nov 21, 2020 at 2:09

PPTX (see the spec here) is a zipped, XML-based file format that is part of the Microsoft Office Open XML (also known as OOXML or OpenXML) specification, introduced with Microsoft Office 2007.

Browsers can parse XML, so you probably have to:

  • read the file with FileReader,
  • unzip it somehow,
  • parse it with DOMParser,
  • maybe transform it with XSLT.
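
The steps above could be sketched in TypeScript roughly as follows. This is a minimal sketch under stated assumptions, not a tested PPTX reader. All function and type names (`Unzip`, `readAsArrayBuffer`, `slidePaths`, `slideText`, `readPptxText`) are made up for illustration. Browsers have no built-in reader for ZIP archives, so the unzip step is left as a function you supply, for example a thin wrapper around a ZIP library of your choice. The sketch also assumes the conventional `ppt/slides/slideN.xml` part names and only collects text runs (`<a:t>` elements); the optional XSLT step is left out.

```typescript
// Sketch only: every name defined here is hypothetical, not part of any library.

/** Unzip step supplied by the caller: maps archive entry paths to their text content. */
type Unzip = (data: ArrayBuffer) => Promise<Record<string, string>>;

/** Step 1: read the uploaded file into memory with FileReader. */
function readAsArrayBuffer(file: Blob): Promise<ArrayBuffer> {
  return new Promise<ArrayBuffer>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as ArrayBuffer);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(file);
  });
}

/** Pick the slide parts out of the entry names and sort them by slide number. */
function slidePaths(entryNames: string[]): string[] {
  // Assumption: slides use the conventional ppt/slides/slideN.xml naming.
  const pattern = /^ppt\/slides\/slide(\d+)\.xml$/;
  const found: { path: string; index: number }[] = [];
  for (const name of entryNames) {
    const match = pattern.exec(name);
    if (match) found.push({ path: name, index: Number(match[1]) });
  }
  return found.sort((a, b) => a.index - b.index).map((s) => s.path);
}

/** Step 3: parse one slide's XML with DOMParser and collect its <a:t> text runs. */
function slideText(xml: string): string[] {
  const DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  const runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
  const texts: string[] = [];
  for (let i = 0; i < runs.length; i++) {
    texts.push(runs[i].textContent ?? "");
  }
  return texts;
}

/** Steps 1 to 3 combined: returns one array of text runs per slide. */
async function readPptxText(file: Blob, unzip: Unzip): Promise<string[][]> {
  const entries = await unzip(await readAsArrayBuffer(file)); // step 2: unzip
  return slidePaths(Object.keys(entries)).map((path) => slideText(entries[path]));
}
```

`slidePaths` sorts numerically so that `slide10.xml` comes after `slide2.xml`. Note that the file names are only a convention: the presentation order is defined in `ppt/presentation.xml` and its relationships, so a more careful reader would follow those instead.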