UnDefined

Jonahlyn's Personal Blog

Using jQuery to Extract the Contents of a Web Page

I wanted to get the text contents of this web page into Excel. Sure, it’s backed by a database (if you can call Access a database :-p), but I thought it would be fun to play with jQuery instead. This is probably something that could also be done with Python or Perl, but like I said, I wanted to play with jQuery.

The page itself does not use jQuery, but it’s possible to add jQuery to any page with the awesome jQuerify bookmarklet, so that’s what I did. Then I fired up the console in Chrome and started hacking.
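
A bookmarklet like that presumably boils down to injecting a script tag into the page. Here is a minimal sketch of the idea, not the actual jQuerify source: the `loadJQuery` name is made up for illustration, and the CDN URL and jQuery version are just one possible choice.

```javascript
// Hypothetical helper: inject a <script> tag that loads jQuery into a page.
// `doc` is the page's document; `src` is optional and defaults to an
// assumed CDN build (swap in whichever version you prefer).
function loadJQuery(doc, src) {
  var script = doc.createElement('script');
  script.src = src || 'https://code.jquery.com/jquery-3.7.1.min.js';
  doc.head.appendChild(script);
  return script;
}

// In the browser console: loadJQuery(document);
// Once the script has loaded, window.jQuery should be available.
```

Pages with a strict Content Security Policy may block the injected script, in which case this approach will not work.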

First, I added the link URL as text after each link.
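
That first step might look something like the sketch below. The `appendLinkUrls` name and the `a[href]` selector are assumptions for illustration; the exact command used on the page may well have been different.

```javascript
// Hypothetical helper: put each link's URL, as plain text, right after the link.
// `$` is the jQuery function (window.jQuery once it has been added to the page).
function appendLinkUrls($) {
  return $('a[href]').each(function () {
    // this.href is the fully resolved URL; $(this).attr('href') would give
    // the raw attribute value instead. A text node is used so the URL is
    // never parsed as HTML.
    $(this).after(document.createTextNode(' ' + this.href));
  });
}

// In the browser console: appendLinkUrls(jQuery);
```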

Next, I executed a series of commands to get rid of things like headings, paragraphs, breaks, images, extra spaces, etc.
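
As a rough illustration of that cleanup, here is a sketch that assumes the unwanted elements are removed outright and that the text is read from `body`. The tag list and the `stripClutter` and `collapseWhitespace` names are guesses for the example, not the actual commands.

```javascript
// Collapse every run of whitespace (including non-breaking spaces and
// newlines) into a single space, then trim both ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Hypothetical helper: remove the unwanted elements, then return the
// remaining page text with the whitespace tidied up.
// `$` is the jQuery function.
function stripClutter($) {
  var $body = $('body');
  // Assumed tag list; adjust it to match the page's actual markup.
  $body.find('h1, h2, h3, h4, h5, h6, p, br, img').remove();
  return collapseWhitespace($body.text());
}

// In the browser console: stripClutter(jQuery);
```

Note that removing an element also removes everything inside it, so a link sitting inside a removed paragraph goes with it.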

Finally, I re-attempted the whole thing as a single command using jQuery’s end() method and its ability to chain method calls together.
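
A single chained version could look roughly like this. It is a sketch under the same assumptions as before (hypothetical `extractInOneChain` name, guessed tag list), not the original command. The key point is that find() narrows the selection and end() steps back to the previous one, so the chain can keep returning to `body`.

```javascript
// Hypothetical helper: do the whole extraction in one chained expression.
// `$` is the jQuery function.
function extractInOneChain($) {
  return $('body')
    .find('a[href]')                        // narrow to the links
      .each(function () {
        // add each link's URL as plain text right after the link
        $(this).after(document.createTextNode(' ' + this.href));
      })
    .end()                                  // back to $('body')
    .find('h1, h2, h3, h4, h5, h6, p, br, img')
      .remove()                             // drop the clutter (assumed tag list)
    .end()                                  // back to $('body') again
    .text()                                 // what is left, as a string
    .replace(/\s+/g, ' ')
    .trim();
}

// In the browser console: extractInOneChain(jQuery);
```

Both each() and remove() return the same set they were called on, which is why one end() after each find() is enough to get back to `body`.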

The less-than-optimal markup made this a really good test of my knowledge, and I learned a lot. Thanks for reading!