Forgive me if this gets long-winded, but you did ask me to be more specific.

The quick answer: I want to take the financial statements that all U.S. corporations submit and find out the truth (e.g. look for the next Enron or Worldcom). To accomplish that goal, the site I'm building would need to accomplish multiple subgoals.
1. It needs to download new SEC filings from Edgar Online as they become available. These are the XML source documents. This is something that I'd like to schedule daily or weekly, and there are 2000+ U.S. companies that each file maybe 10 of these statements per year.
2. I need to read the values for every element in the source documents. Some of the source values I'd be reusing a lot, some of them I'd need only once. All of these documents use a schema, so I should know the structure they all have. HOWEVER, there are 2000+ elements. Would it be most efficient to read each document into memory all at once, or to process it line by line?
3. Whichever way the server reads the data, it then needs to perform a lot of functions on the elements and write the results to an output XML document.
4. After the output XML document is created, we'd need to validate its structure against a schema that I am creating. Would PHP be best suited to accomplish this, or perhaps something else? I seem to remember reading that the default XML parser that comes with PHP doesn't validate, but I could be wrong.
If you could advise me on the above 4 issues then I'd be in OK shape. But if you're feeling especially helpful, I'm wondering something else. Once we have the validated XML output, do you know of any nice libraries to transform that into a PDF file for presentation?
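
To make steps 2 and 3 a bit more concrete, here is a rough sketch of what I have in mind, written in Python only because it is compact (I'm not committed to any language yet). Everything in it is made up for illustration: the sample document, the element names (Revenues, NetIncome), the NetMargin calculation, and the output layout. The real filings use namespaces and far more elements. It reads the document as a stream rather than loading the whole tree, keeps only the values it needs, and writes one derived value to a new XML document.

```python
import io
import xml.etree.ElementTree as ET

# Made-up stand-in for one filing; real documents have 2000+ elements
# and namespaced tags, which this sketch ignores.
SAMPLE_FILING = b"""<filing>
  <Revenues>1000</Revenues>
  <NetIncome>150</NetIncome>
  <Assets>5000</Assets>
</filing>"""


def read_values(source, wanted):
    """Stream through the document, keeping only the elements in `wanted`."""
    values = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in wanted and elem.text:
            values[elem.tag] = float(elem.text)
        elem.clear()  # drop the element's contents once we've looked at it
    return values


def build_output(values):
    """Compute a derived figure and write it to a new XML document (hypothetical layout)."""
    root = ET.Element("analysis")
    margin = ET.SubElement(root, "NetMargin")
    margin.text = f"{values['NetIncome'] / values['Revenues']:.4f}"
    return ET.tostring(root, encoding="unicode")


values = read_values(io.BytesIO(SAMPLE_FILING), {"Revenues", "NetIncome"})
output_xml = build_output(values)
print(output_xml)
```

The part I'm unsure about is whether streaming like this is actually worth it at my document sizes, or whether loading each file whole would be simpler and fast enough.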