XSL to get text from Apple Pages documents

Pages is Apple’s basic word processor, part of their iWork suite of applications. It’s not a bad program, but a number of months ago I needed to switch to MS Word for the Mac.

Well, this morning I was looking through some old files and found a text document I wanted to print that I had done using Pages. Unfortunately, I had removed iWork from my Mac, so I no longer had the software to open the Pages document.

After a cursory search on the Internet for a program that would let me open Pages docs without having the program itself, I came up empty-handed.

So, I inspected the Pages document and realized it was a package. (Right-click on the document icon and choose Show Package Contents.) The package contained an index.xml.gz file, which I unzipped. Inside it I found the body of my document amid a whole bunch of XML markup.
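If you would rather script the unzip step than do it by hand, here is a minimal Python sketch. It assumes the package is an ordinary directory that holds index.xml.gz at its top level, as mine did. The function name unpack_index and the output file name are my own choices for illustration, not anything Pages defines.

```python
import gzip
from pathlib import Path


def unpack_index(package_dir, out_name="index.xml"):
    """Decompress index.xml.gz found inside a Pages package directory.

    Assumes the package is a plain directory with index.xml.gz at its
    top level. Returns the path of the decompressed file.
    """
    package = Path(package_dir)
    source = package / "index.xml.gz"
    target = package / out_name
    # gzip.open streams the compressed file; read() yields the raw XML bytes.
    with gzip.open(source, "rb") as compressed:
        target.write_bytes(compressed.read())
    return target
```

Calling unpack_index("MyLetter.pages") (a made-up document name) would write index.xml next to the compressed file, inside the package.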

I momentarily considered reconstructing the text in TextWrangler, but thought it might be fun to write an XSLT file to do the work.

Please note that this is a first draft meant to retrieve the text from my document. It will not handle anything fancy, just text. It only tries to turn each chunk of text into a plain paragraph in HTML, suitable for copying and pasting out of a browser window. Use at your own risk. 🙂

Ok, here’s the textFromPages.xsl file.

Others may take this initial XSL file and do what they will with it. I hope that if you take this and make it better, you’ll comment on this post to let me (and others) know.

To have it be useful to you, you’ll need to know how to apply an XSL transformation to a source XML file (specifically the index.xml from Pages).

Hint: Firefox will do the transformation for you if you include the proper xml-stylesheet directive right after the XML prologue in the source XML file. It looks like this: <?xml-stylesheet href="textFromPages.xsl" type="text/xsl" ?>
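The index.xml file can be large and awkward to edit by hand, so adding the directive with a few lines of Python may be easier. This is only a sketch: the function name add_stylesheet_directive is made up for illustration, and it assumes that the XML declaration, if there is one, sits at the very start of the text.

```python
import re

# The directive from the hint above, pointing at textFromPages.xsl.
STYLESHEET_PI = '<?xml-stylesheet href="textFromPages.xsl" type="text/xsl" ?>'


def add_stylesheet_directive(xml_text, directive=STYLESHEET_PI):
    """Return xml_text with the directive placed right after the XML declaration.

    If the text has no XML declaration, the directive goes at the very top.
    """
    # Match only a real declaration such as <?xml version="1.0"?>;
    # the \s after "xml" keeps <?xml-stylesheet ...?> from matching.
    match = re.match(r"<\?xml\s[^>]*\?>", xml_text)
    if match is None:
        return directive + "\n" + xml_text
    end = match.end()
    return xml_text[:end] + "\n" + directive + xml_text[end:]
```

To use it, read index.xml as text, pass the contents through the function, and write the result back out in the same folder as textFromPages.xsl before opening it in the browser.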

XSL to transform XHTML pages

I don’t know why it took me this long to realize this. I’ve been writing XHTML for a couple of years now, and I started playing with XSL stylesheets around the same time, but it only just occurred to me in a real way that I can probably use XSLT to transform my XHTML pages (at my business site, for instance) into forms more useful to other devices, such as cell phones and PDAs.

It probably took me so long because XHTML looks so much like HTML to me that it didn’t completely sink in that it is truly XML. Yet it is, namespaces and all.
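A quick way to convince yourself that XHTML really is XML is to hand it to a plain XML parser. The Python sketch below does that with the standard library. The sample page and the function name paragraph_texts are invented for illustration, and a real page with a DOCTYPE and named entities such as &nbsp; may need more care than this.

```python
import xml.etree.ElementTree as ET

# The standard XHTML namespace; every element in the page lives in it.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A made-up, minimal XHTML page used only for this example.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My business site</title></head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <p>Second <em>paragraph</em>.</p>
  </body>
</html>"""


def paragraph_texts(xhtml_text):
    """Parse XHTML as ordinary XML and return the text of each <p> element."""
    root = ET.fromstring(xhtml_text)
    # ElementTree spells namespaced tags as {namespace}localname.
    paragraphs = root.iter("{%s}p" % XHTML_NS)
    # itertext() also gathers text inside nested inline elements like <em>.
    return ["".join(p.itertext()) for p in paragraphs]
```

The fact that the parser needs the namespace to find the p elements is the "namespaces and all" part in practice. An XSLT stylesheet matching against XHTML has to declare the same namespace.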

Now that I realize this, I appreciate even more its role as an intermediary between HTML and XML. XHTML doesn’t need XSL to transform it or style it. It is so close to real HTML that even older browsers can handle it fine, and it works very smoothly with CSS as is.

So, this realization basically just means that making my site more available on handhelds is even easier than I first thought. Granted, I haven’t gotten into the sticky details of it all yet…