Spreadsheet walk-through |
||||||

On this page, I'm going to attempt to walk through the spreadsheet step by step. It might be handy to print this page, and have the spreadsheet open in front of you. Column A contains the Greek words used in the study. They are printed in B-Greek. Columns B through T contain the raw data for the study. Each column represents one synoptic category. Each row is one vocabulary item. Each cell is the count or absolute frequency of the vocabulary item in that synoptic category. Row 811 has sum of all the rows above it. Column V is the sum of all the columns to the right. Column W it the relative frequency of each vocabulary item in all of the synoptic categories combined. Columns Y and Z are for doing pair-wise comparisons of the synoptic categories. When the macro program, associated with this sheet is run, it will take each possible pair of categories, and examine them. The goal is to see if the relative frequency of vocabulary items in one category can help predict the relative frequency of vocabulary items in the other category. The categories under examination are placed in columns Y and Z while they are examined. The results are placed in the matrix in columns AP through BJ. To view the program hit Alt-PF11, and in the box at left select modules, and then module 1. Hit Alt-PF11 to return to the main sheet. To run program from the main sheet go to - Tools: Macro: Macros: Macro1: Run. It will take about 10 minutes to run. Column AB shows the estimated absolute frequency of the vocabulary item in the test category (Y), based on the relative frequency of the vocabulary item in all the categories combined, and the total number of occurrences of all vocabulary items in the test category. This will be used as the Gamma in a Poisson distribution. Also see my attempt at explaining the math. Column AF is like AB except that it uses information from the predictor category (Z), instead of information from the combination of all the categories. Column AG will be a weighted combination of the two columns AB and AF. The weighting factors, beta, and (1 - beta) , are in cells AM3 and AM4. Beta is found using Excel's solver feature. I'll say more about it below. Now we begin a procedure known as the maximum likelihood method. In column AC, the we calculate the likelihood of the actual number of occurrences of the vocabulary item in the test category (Y), based on the Poisson distribution with the gamma given in column AB. Column AH does the same, except it uses the fitted gamma. Columns AD and AI take the natural log of the columns to their left. We do this, because we would have to multiply out all the probabilities in the columns to the left to get the total probability, which would be a tiny number. To avoid dealing with such tiny numbers, we add the logs of the probabilities rather than multiplying the probabilities. The results of adding up the logs are in row 811. The cell AI-811 is the cell that Excel's solver will try to maximize by adjusting the beta parameter. The beta for each pair of categories is the one that maximizes this cell AI-811 which represents the log of the overall probability. Finally, we can use a likelihood ratio test to compare cells AI-811 and cell AD-811. The test statistic for this comparison is in cells AM5 and the final probability is in cell AM6. These results are then placed in the result matrix. Columns AP through BJ. In the result matrix, the top number in each pair is the calculated beta, and the bottom number in each pair is the probability that any relation between these categories is just chance. Because there are about 180 separate comparisons, and we'd like to have about 95% confidence that all the relations that we have found are correct, we should look at relations where the probability is less than .0003 |
||||||

Back to synoptic main page |