Re: Samples how do we get them?
You might be interested in the following note. It was offered in response to a question about samples and how you know that you have a suitable sample. It almost all rests on what is called precision and on your sampling frame.
Population and Sample Selection
In research we usually deal with a sample from the area under study. There is no simple way to select a sample or calculate a sample size; it depends on what you are doing and which research method you are using. In general, what follows strictly applies only to a survey.
Population – the complete set of things, people or events that you are studying and on which you wish to pronounce or say something of value. Normally the population is large or very large and you cannot hope to collect evidence from every one of what might be a huge number of things, people or events. Happily, it turns out statistically that a sample of sufficient size can give a high degree of accuracy regarding any conclusions we might reach on the population as a whole.
Sampling Frame – this simply means a list of all those eligible to be included in the study. Notice that the population is not usually the same as the sampling frame; the frame is just a convenient way to identify sample points. For example, if you were surveying the population of Southampton you might, for convenience, use a telephone directory to find suitable people, but obviously that is not going to be everybody, just a list of possibilities. It follows that the choice of sampling frame is vital if we are to have a reliable sample.
Sample – a subset of the population extracted from the sampling frame and from which evidence is collected.
Precision – precision is about how credible a sample is, that is, how precisely it represents the population. The question is therefore: how do we know when we have a precise and credible sample? Notice that a valid response depends on a credible sampling frame, because only then can we be sure we are selecting the right sample points from which to gain relevant evidence.
Selection and Randomness – assuming we have specified our sampling frame accurately, those in it are clearly not just randomly selected, but once we have the frame we must select from it to obtain the actual sample. Most often we try to create a “Random Sample”, which means that everyone in the frame has an equal chance of being selected as part of your sample. There are other ways of selecting a sample, often related to practical necessity or other special considerations, and a summary can be found at https://sites.google.com/site/researchmethodsdirect/sampling-strategeis
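As a rough illustration of the "equal chance" idea, here is a minimal Python sketch of drawing a simple random sample from a sampling frame. The frame contents, the function name and the seed are all invented for the example; a real frame would be your directory or list of eligible people.

```python
import random


def draw_random_sample(frame, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from the frame.

    Every entry in the frame has the same chance of being chosen.
    """
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(frame), sample_size)


# Hypothetical frame: 1,000 directory entries
frame = [f"entry-{i:04d}" for i in range(1000)]
sample = draw_random_sample(frame, 50, seed=1)
print(len(sample), "sample points drawn from a frame of", len(frame))
```

Recording the seed is optional, but it lets someone else reproduce exactly the same selection later.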
Bias – can be introduced by the choices you make, either through the design itself or through features of the collection process. Most commonly it arises from using invalid groups, the method of distribution (out-of-date list, email etc.), non-responses or the language used to collect the data.
Sample Size – sample size is important, but how big a sample do you need? Interestingly, it turns out it’s not a simple proportion of the population. As a simple analogy, consider a large pot of soup; how much do you need to taste to decide if it's got enough salt? Clearly, one does not need a huge amount, so here we might usefully think of the full pot as representing the population; a spoonful is the sample and the size of the spoon is the sample size. It is not easy to find a calculation for sample size that will work in every case, and there are numerous formulae for doing it based on different scenarios. However, it is accepted on most courses that anything less than 35 respondents is not really acceptable. A rough formula, based on the normal distribution and 95% confidence limits, is n = 1500p(1 - p)/r
Estimated Sample Size – this is the estimate of the number of sample points needed.
Prevalence (p) – prevalence of the variable of interest; the proportion of the returned questionnaires that meet the sample criteria. It is always hard to know what this value might be, so one might decide to use, say, 85%, which is the value used in the worked example.
Expected Rate of Return (r) – not every questionnaire you send out will be returned, so one builds in an estimate so that you have at least some assurance of a minimum sample size.

Example – suppose we expect that 85% (0.85) of the returned forms meet the criteria and we estimate a poor return of just 50% of questionnaires; then we have:
N = 1500*0.85*(1-0.85) = 1275*0.15 = 191.25, or about 190 rounded, is the estimated required sample size before allowing for the rate of return
N* = 0.5 * 190 = 95 is the expected return if only 190 questionnaires are sent out, and as this is greater than 35 it is reasonable
N** = 190/0.5 = 380 questionnaires to be sent out if we hope to get the full sample size; this is the figure the formula gives once the division by r is applied.
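The arithmetic in the example can be checked with a few lines of Python. This is only a sketch: the function names are mine, and the constant 1500 is simply taken from the rough formula in the note. It is close to (1.96/0.05)^2, roughly 1537, which suggests a margin of error of about plus or minus 5% at 95% confidence, but that reading is my assumption rather than something the note states.

```python
import math

ROUGH_CONSTANT = 1500  # taken as-is from the rough formula in the note


def required_sample_size(p, constant=ROUGH_CONSTANT):
    """Completed responses needed: constant * p * (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must be a proportion between 0 and 1")
    return constant * p * (1 - p)


def questionnaires_to_send(p, r, constant=ROUGH_CONSTANT):
    """Questionnaires to send out: required sample size / r, rounded up."""
    if not 0 < r <= 1:
        raise ValueError("r must be greater than 0 and at most 1")
    return math.ceil(required_sample_size(p, constant) / r)


needed = required_sample_size(0.85)
to_send = questionnaires_to_send(0.85, 0.5)
print(f"required sample: {needed:.0f}, questionnaires to send: {to_send}")
```

With p = 0.85 and r = 0.5 this gives about 191 for the required sample and 383 questionnaires to send, which the example rounds to 190 and 380.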
Sample Style – a sample can be one-dimensional or multi-dimensional. In one-dimensional studies all the respondents share a similar set of characteristics; in the multi-dimensional case there may be several sets of respondents selected on different sets of criteria. Be careful not to confuse the word dimension here, which is about variations in the respondents, with data dimensions, which are about the problem space itself.
Selection Criteria – define as accurately as you can what a sample point looks like, in the sense that you can identify it when you see it. For example, if you use a questionnaire you must say accurately who the questionnaire goes to, and when it comes back you must be able to check that the respondent actually meets the sample criteria, as in reality anyone might have filled it in. Typical sampling methods are random, purposeful, stratified, snowball and quota, and it is important that you select one that is suitable and convenient. See http://sites.google.com/site/researchmethodsdirect/data-collection-methods
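Of the methods listed, stratified sampling is perhaps the easiest to show in code. The sketch below is one simple way to do it, taking the same fraction at random from each group; the "role" field, the group sizes and the function name are invented for illustration and are not from the note.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Take the same fraction at random from every stratum in the frame.

    stratum_of is a function returning the stratum label for one unit.
    At least one unit is taken from each stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame of 100 staff in three roles
frame = (
    [{"id": i, "role": "helpdesk"} for i in range(60)]
    + [{"id": i, "role": "network"} for i in range(60, 90)]
    + [{"id": i, "role": "manager"} for i in range(90, 100)]
)
sample = stratified_sample(frame, lambda unit: unit["role"], 0.1, seed=7)
counts = Counter(unit["role"] for unit in sample)
print(dict(counts))
```

The point of the design is that small groups (the managers here) cannot be missed by chance, which can happen with a plain random draw.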
Inference – remember that you use evidence from the sample to draw conclusions about the population. It follows that the accuracy of the conclusions depends on whether the sample is precise enough to have the same characteristics as the population.
Response Rates – one can improve response rates by being aware that providing data is a cost to each respondent, so the main principle is to reduce the effort involved and increase the benefit. For example, one might keep any collection tasks short, explain the purpose and value, give incentives, assure anonymity and send reminders. But make sure survey questions are well thought out and match the study objective, and always remember that it is better to have a sample that properly represents the population even if the precision is lower.
Survey Administration – knowing what the data is goes hand in hand with knowing where the data is, so that you can go and collect it. For surveys the main methods are postal, web-based, face-to-face interviews, telephone interviews and direct observation, but when selecting a method or methods take into account the target groups and where they are located.
Ethical Profile – you need to be clear about what you are doing, the way you are doing it and what you are asking for, so that it is ethically acceptable. Two things are at stake: the results may be biased, and the results may not be acceptable in the sense that they cannot be ethically used. In simple terms, you have to feel certain that the information you seek is legitimately available to you.
Design of Questions – obviously one has to choose survey questions with care, but in general each question should deal with ONE idea at a time, avoid jargon or colloquialisms, be simple and direct with normal speech patterns, and avoid negatives, because people who read quickly may miss them and in many cases they make later data processing difficult.
Vehicle – the primary mechanism employed to collect data: interview, questionnaire, observation, role playing, seminar, focus groups, document searching and so on; the main ones are listed in table 3.
Recording Profile – describe how the data will be physically recorded. Typically this might include: written report/transcripts, record sheets, video, sound recording, computer logging, excerpts from documents and so on.
Model or Simulate and Pilot – strictly this is NOT a step that one records anywhere, but it acts as a check. I recommend that you invent some data just to see that what you have said makes sense and that you can write it down. I could, for example, invent a few job profiles for people who work in IT support services, and by that means feel confident that I know what I am looking for as data.
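If it helps, the "invent some data" check can be done in a few lines of Python. Everything here is made up for the purpose: the field names, the three job profiles and the helper function are hypothetical, and the third profile is left incomplete on purpose to show what the check catches.

```python
# Fields I think each job profile should record (an assumption, adjust to suit)
REQUIRED_FIELDS = ("job_title", "years_in_role", "main_tasks")

# Invented pilot data, not real respondents
invented_profiles = [
    {"job_title": "Help desk analyst", "years_in_role": 2,
     "main_tasks": ["password resets", "ticket triage"]},
    {"job_title": "Network technician", "years_in_role": 5,
     "main_tasks": ["switch configuration", "fault finding"]},
    {"job_title": "Support team leader", "years_in_role": 8,
     "main_tasks": []},  # deliberately incomplete
]


def missing_fields(profile, required=REQUIRED_FIELDS):
    """List the required fields that are absent or empty in one profile."""
    return [
        field for field in required
        if field not in profile or profile[field] in ("", None, [])
    ]


problems = {
    profile["job_title"]: missing_fields(profile)
    for profile in invented_profiles
}
for title, gaps in problems.items():
    print(title, "->", gaps if gaps else "complete")
```

If an invented record is awkward to write down, that is usually a sign the selection criteria or the recording profile need another look before any real data is collected.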