Tim, you raise some very important points which should be thought through carefully before attempting to create a CAT module.
Mike Linacre, the author of the paper linked to above, is the programmer behind the Rasch analysis packages "Winsteps" and "Facets"; details at www.winsteps.com. Rasch analysis belongs to a family of psychometric models called item response theory (IRT). These are very widely used; see, for example, this paper by Edward Wolfe.
Unidimensionality has been argued about for decades in the psychometric literature. Basically, in order for a summed test score to be meaningful, all items must share a single underlying trait. Even in a very carefully developed test, there will be other small dimensions within the data: ideally, every question contributes to the main factor while also contributing a very small unique factor, so technically there are as many dimensions as there are questions. The question is thus not whether a test is perfectly unidimensional (only a single-question test meets that standard), but whether the main trait is large enough that the dataset is "usefully" or "essentially" unidimensional. If it is not, you either need to report the results as two or more separate scores (which you cannot meaningfully add together) or use a multidimensional model such as the ones Wu and Adams discuss.
Multidimensional models are complex, controversial, and need extremely large datasets to give any advantage over unidimensional models, so they are utterly impractical for Moodle. Basically, if you want to implement CAT, one assumption you can't avoid is that all your test questions must largely measure the same unidimensional trait. Many classroom assessments will not meet this standard, so CAT will only benefit users with some grounding in psychometrics and test development. An analogy is comparing a rocket with a bicycle: rockets are faster, but they require rather more technical know-how from their users before that speed is actually beneficial.
Another fundamental assumption of the IRT model Linacre uses is that persons and items can be mapped onto a single invariant scale. This is essentially built into the definition of measurement underlying that model, so if you don't accept the assumption, you need a different model. Unfortunately, rejecting it implicitly rejects CAT as well, because CAT depends on exactly that common scale: without it, on what basis would you match items to persons?
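To make the shared-scale idea concrete, here is a minimal Python sketch of the dichotomous Rasch model (the simplest IRT model, and the one underlying Winsteps). The function name is my own invention for illustration; person ability and item difficulty sit on the same logit scale, which is precisely what lets a CAT match items to persons:

```python
import math

def p_correct(theta, b):
    """Dichotomous Rasch model: probability that a person of
    ability theta answers an item of difficulty b correctly.
    Both parameters live on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A person and an item at the same point on the scale:
# the person has exactly a 50% chance of success.
print(p_correct(1.0, 1.0))               # 0.5

# An item 2 logits easier than the person's ability:
print(round(p_correct(1.0, -1.0), 3))    # 0.881
```

The 50% crossing point is the key property: "this item suits this person" has a precise meaning only because both are measured on one invariant scale, which is the assumption at issue.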
Those are theoretical issues that have been argued about for decades in the psychometric literature. Personally, I'm satisfied that the theoretical objections to CAT have been adequately addressed, but I'm unconvinced that a Moodle CAT module is a practical undertaking. CAT is of questionable value for classroom-level testing: you really need a minimum of 1000 students to make it work, and ideally considerably more, so it's not clear to me that there would be much genuine demand for it.
A related problem is that CAT needs very large item banks to be effective. Test security is critical, so you must avoid recycling questions too frequently, to minimise the chance of students being given the same questions their friends saw in an earlier administration. Although it is apparently possible to pilot CAT with as few as 200 questions, 2000 or more seems a more realistic minimum, plus continual development of new questions to replace older ones as they become compromised. In other words, for a high-stakes test you would probably need to write 2000 high-quality questions in the first year, and then another 100 per month for as long as the test is in use. Classroom teachers rarely have training in writing test questions, so the majority of teacher-written items will probably not perform consistently enough to be suitable for CAT.
Finally, there are some basic technical issues with administering CAT over a network rather than as a local installation. Security is an obvious one: high-stakes CAT simply doesn't work if the item bank isn't secure, and I don't see the point of using CAT for low-stakes testing. Another is that every time a student responds to a CAT question, an algorithm must decide whether to administer another question and, if so, which question is most suitable next. If all of that processing is done by a central server, performance will suffer during periods of high demand, or the server may crash completely. Students who take the test at busy times may thus be penalised unless some thought is given to these issues.
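To give a feel for the per-response work the server would be doing, here is a deliberately simplified sketch of that select-and-update loop. Under the Rasch model the most informative item is the unused one whose difficulty is closest to the current ability estimate; the step-halving ability update below is a crude stand-in for the maximum-likelihood or Bayesian estimation a real CAT engine would run, and all names are my own:

```python
def next_item(theta_hat, bank, used):
    """Pick the unused item whose difficulty is closest to the
    current ability estimate (maximum information under Rasch)."""
    candidates = [b for b in bank if b not in used]
    return min(candidates, key=lambda b: abs(b - theta_hat))

def run_cat(respond, bank, max_items=5):
    """One adaptive session.  `respond(difficulty)` is a callback
    returning True/False.  The step-halving update is purely
    illustrative; real CATs re-estimate ability properly and also
    apply a stopping rule based on the standard error."""
    theta, step, used = 0.0, 1.0, []
    for _ in range(max_items):
        b = next_item(theta, bank, used)
        used.append(b)
        theta += step if respond(b) else -step
        step /= 2  # shrink the jump as evidence accumulates
    return theta, used

# Simulated examinee who answers everything correctly: the
# estimate climbs and progressively harder items are chosen.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
theta, used = run_cat(lambda b: True, bank, max_items=4)
print(used)   # [0.0, 1.0, 2.0, 3.0]
```

Even in this toy form, the point stands: the whole loop must run between a student's answer and the next question appearing on screen, once per response per student, which is where a central server under load becomes the bottleneck.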
On the whole, CAT is a technically fascinating development, but for most testing purposes it strikes me as a solution looking for a problem. It's not something I would rush into without a great deal of consideration.