I implemented the EM algorithm for Probabilistic Latent Semantic Indexing (pLSI) in Python.
Usage:
python pLSI.py input-file-name number-of-topics maximum-number-of-iterations log-likelihood-difference-threshold
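The last three arguments control the EM loop: how many topics to fit, the maximum number of iterations, and how small the change in log-likelihood must be before stopping. The following is a minimal sketch of such a loop, not the code in pLSI.py. The function name plsi_em, the dictionary-based data layout, and the reading of the threshold as an absolute difference between consecutive log-likelihoods are all assumptions made for illustration, and the weights are assumed to be positive.

```python
import math
import random


def plsi_em(counts, num_topics, max_iters, ll_threshold, seed=0):
    """Illustrative pLSI EM (hypothetical helper, not taken from pLSI.py).

    counts maps (doc_id, word_id) -> positive weight.
    Returns (p_z_d, p_w_z, log_likelihood).
    """
    rng = random.Random(seed)
    docs = sorted({d for d, _ in counts})
    words = sorted({w for _, w in counts})
    topics = range(num_topics)

    def random_dist(keys):
        # Strictly positive random distribution over the given keys.
        raw = {k: rng.random() + 1e-3 for k in keys}
        total = sum(raw.values())
        return {k: v / total for k, v in raw.items()}

    p_z_d = {d: random_dist(topics) for d in docs}   # P(z | d)
    p_w_z = {z: random_dist(words) for z in topics}  # P(w | z)

    prev_ll = None
    ll = float("-inf")
    for _ in range(max_iters):
        new_z_d = {d: {z: 0.0 for z in topics} for d in docs}
        new_w_z = {z: {w: 0.0 for w in words} for z in topics}
        ll = 0.0
        for (d, w), n in counts.items():
            # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
            joint = [p_z_d[d][z] * p_w_z[z][w] for z in topics]
            total = sum(joint)
            ll += n * math.log(total)  # log-likelihood under current parameters
            for z in topics:
                resp = n * joint[z] / total
                new_z_d[d][z] += resp
                new_w_z[z][w] += resp
        # M-step: renormalize the accumulated expected weights.
        for d in docs:
            s = sum(new_z_d[d].values())
            p_z_d[d] = {z: v / s for z, v in new_z_d[d].items()}
        for z in topics:
            s = sum(new_w_z[z].values())
            p_w_z[z] = {w: v / s for w, v in new_w_z[z].items()}
        # Assumed stopping rule: small change between consecutive iterations.
        if prev_ll is not None and abs(ll - prev_ll) < ll_threshold:
            break
        prev_ll = ll
    return p_z_d, p_w_z, ll


toy = {(1, 10): 2.0, (1, 11): 1.0, (2, 11): 3.0, (2, 12): 1.0}
p_z_d, p_w_z, ll = plsi_em(toy, num_topics=2, max_iters=50, ll_threshold=1e-6)
print(ll)
```

The returned log-likelihood is the value computed during the last E-step, so it describes the parameters before the final M-step update.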
Input File Format:
The input file format is simple. Each line should have the following form:
document-ID word-ID TFIDF-value
where document-ID and word-ID are integers and TFIDF-value is a float.
TFIDF-value does not have to be a TF-IDF weight; any numeric value, e.g., a term frequency, will work.
These three fields in each line can be separated by spaces or tabs.
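As an illustration of this format, here is a small sketch of a reader for such lines. The function name parse_triples and the nested-dictionary result are assumptions chosen for the example and are not taken from pLSI.py.

```python
from collections import defaultdict


def parse_triples(lines):
    """Parse 'document-ID word-ID value' lines into {doc_id: {word_id: value}}."""
    weights = defaultdict(dict)
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        doc_id, word_id, value = int(fields[0]), int(fields[1]), float(fields[2])
        weights[doc_id][word_id] = value
    return dict(weights)


# Lines may mix spaces and tabs as separators.
sample = ["1 10 0.5", "1\t11\t2.0", "2  10 1.25"]
print(parse_triples(sample))
```

To read a real file, the same function can be given an open file object, since iterating over it yields one line at a time.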
References:
Thomas Hofmann. Probabilistic Latent Semantic Indexing. SIGIR. 1999.
Das et al. Google News Personalization: Scalable Online Collaborative Filtering. WWW. 2007.