One of my previous roles included working as a Performance Analyst in Coventry City Council's adult social care department. Whilst in this role I was asked to support the production of a list of clients to receive the annual Adult Social Care Survey (details at the Health and Social Care Information Centre's website). The HSCIC required us to create a list of our service users to receive the survey (the 'sample'), which would be created from the list of people receiving support on a certain date (the 'population' from which the sample will be selected). The list would need to include representative number of people from four different categories (e.g. 25% of the sample must be people with learning difficulties, 15% would be aged 65+ in a residential care setting, etc.). We were also asked to remove certain people from the sample if it was considered inappropriate for them to be contacted (e.g. the person lacked the mental capacity to be capable of completing the survey).
So far, this was not very complicated, but management felt that the conditions applied by the HSCIC may not produce a truely representative sample that would cover the full range of our client base. For example, as people over the age of 85 are more likely to suffer from dementia, they are more likely to be excluded from the sample and therefore it is less likely that their views will be heard. I was therefore tasked to produce a sample that was representative to a much more granular level that the four groupings that the HSCIC applied. The final requirement included a list of over 30 conditions that must be met! Graciously, I was allowed some tollerance on these conditions (e.g. representative to within 10%).
The problem that I now faced was to produce a sample that:
It can be hard to grasp the problem speaking in abstract terms, so let's look at a specific example. Consider a standard deck of playing cards. Let's say that you want to produce a (braodly representative) sample as follows:
The challenge that you face is that these criteria are not mutually exclsive. If you have the correct number of kings/queens/jacks but you need to add an extra red, the next red that you draw might be a king meaning that you've now got too many kings/queens/jacks!
Keeping track of this when you only have a few criteria isn't too bad, but as you add more and more criteria it gets more and more difficult to handle.
This kind of problem cries out for an automated solution. However, after looking we were unable to find a product that could deliver. Therefore, I have produced a VBA macro to run in MS Excel. The macro works on this principal:
My MS Excel Sampling tool is available under the GNU General Public Licence version 3 (GNU GPLv3). This means that it can be used for free by anyone (individual or corporation) for both commerical and non-comercial tasks. However, it should be said that I not a very good VBA developer, and use of this solutions is at the users own risk. I will also add that I would be very welcoming of improvements to the code from any willing contributors (which would be made available to users under the same terms as those described above).