Disaggregated data analysis is a menacing term for a simple process. It is a type of analysis in which you look at results for separate groups of respondents, with the intent of getting more accurate information about what the data tells you and how you can use the data to improve programs. Just as looking at individual respondent scores can often be too detailed to help with overall program decisions, looking at whole-group results is often too general to guide programmatic decision making. For example, knowing the average reading percentile ranking for a grade level doesn't tell you how your English Learners are performing compared to your native English-speaking students, or how girls are performing compared to boys. You need to dig a little deeper to get the information you really need to determine program effectiveness and plan program modifications.
Generally speaking, basic disaggregation should always be done by grade (for students), gender, ethnicity, and language status (if applicable). Often, socioeconomic status is added to this list. However, you will want to add other levels of analysis that answer your specific evaluation questions. For example, if you want to know whether there is any difference between the achievement of students who regularly attended the after-school tutorial program and those who did not, you will need to determine a few things. First, you have to decide what "regularly attended" means (e.g., attendance for at least 30 days). Then you disaggregate the data to compare the achievement of those who attended the after-school program for at least 30 days with the achievement of those who did not.
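The after-school tutorial example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student IDs, attendance counts, and scores are made up, the function name `disaggregate_by_attendance` is hypothetical, and the 30-day cutoff is simply the definition of "regularly attended" used in the example above.

```python
from statistics import mean

# Hypothetical records: (student_id, days_attended_tutorial, reading_score).
# These values are invented purely for illustration.
records = [
    ("s01", 45, 78),
    ("s02", 12, 64),
    ("s03", 30, 82),
    ("s04", 0, 70),
    ("s05", 33, 74),
    ("s06", 8, 66),
]

# The working definition of "regularly attended" from the example above.
REGULAR_ATTENDANCE_DAYS = 30


def disaggregate_by_attendance(rows, threshold=REGULAR_ATTENDANCE_DAYS):
    """Split scores into regular and non-regular attendees and average each group."""
    groups = {"regular": [], "not_regular": []}
    for _student_id, days, score in rows:
        key = "regular" if days >= threshold else "not_regular"
        groups[key].append(score)
    # Report the group size next to the mean so small groups are easy to spot.
    return {
        name: {"n": len(scores), "mean": mean(scores) if scores else None}
        for name, scores in groups.items()
    }


summary = disaggregate_by_attendance(records)
for name, stats in summary.items():
    print(name, stats)
```

Changing the threshold changes who lands in each group, which is why you need to settle the definition of "regularly attended" before you compare results.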
There are many ways you can look at that data. That is why your evaluation questions are so important. What do you really want to know?
You can also examine disaggregated data over time by using a matched score analysis, in which you look at assessment results for particular students or program participants over time, including only those who have been in your program and were assessed at each point in the period you are examining. For example, you can examine two years' worth of data for students by grade level, language status, and socioeconomic status, but you will learn more about the real effectiveness of your program if you include only the students who have scores from both years (matched scores = matching each student's score from one year to the next). Otherwise, a change in the group average may reflect students entering or leaving the program rather than growth among the students you served.
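To make the idea of matched scores concrete, here is a minimal Python sketch. The student IDs and scores are invented, and the function name `matched_scores` is hypothetical. It assumes each year's results are stored as a dictionary keyed by student ID.

```python
from statistics import mean

# Hypothetical scores keyed by student ID for two consecutive years.
# s02 has no year-two score and s05 has no year-one score.
year1 = {"s01": 60, "s02": 55, "s03": 70, "s04": 65}
year2 = {"s01": 68, "s03": 75, "s04": 66, "s05": 80}


def matched_scores(first, second):
    """Return {student_id: (first_score, second_score)} for students with both scores."""
    matched_ids = first.keys() & second.keys()
    return {sid: (first[sid], second[sid]) for sid in sorted(matched_ids)}


pairs = matched_scores(year1, year2)

# Average gain for the matched students only.
matched_gain = mean(s2 - s1 for s1, s2 in pairs.values())

# Difference between the two whole-group averages, with no matching.
unmatched_gain = mean(year2.values()) - mean(year1.values())

print("matched students:", list(pairs))
print("matched gain:", round(matched_gain, 2))
print("unmatched gain:", round(unmatched_gain, 2))
```

In this made-up data the unmatched comparison shows a larger gain than the matched one, because a lower-scoring student left and a higher-scoring student arrived. The same matching step can be applied within each subgroup (grade level, language status, and so on) before you compare years.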
In short, there are many ways to look at data. The closer you look, the more you'll see.