# Activity Sampling - equations and explanations

## What is Activity Sampling?

Activity sampling, also referred to as work sampling, is a method of collecting data through observation in which you record what is happening at sampled moments rather than running a continuous observational study. It was developed by L.H.C. Tippett in 1927 as a way of monitoring large numbers of machines and workers spread out over a large space in the textile industry.

Observation is a very good way of collecting data for a study or of understanding a phenomenon. It can provide reliable data, providing you know what you're observing (more so than asking participants to report it themselves, as participants might hide the truth from researchers if it makes them look bad). Of course, observations come with their own drawbacks, such as people acting differently when being observed, which skews the data. A big drawback of observational studies, though, is the time they take to conduct: a continuous study of one working day takes a full day of observation.

This is where activity sampling can help. Rather than doing a continuous observation, the researcher takes samples of activities at random intervals and notes what is occurring at that time. With this kind of sampling, the behaviour of those being observed is less likely to be affected, as they won't necessarily know when the observations will take place. Some may change their work behaviour for the entire study; however, an advantage of activity sampling is the ability to spread samples over several days or weeks, which greatly reduces the chance that short-term variations skew your results.
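As a rough illustration of "random intervals", here is a minimal Python sketch that picks random observation times within a working day. The 09:00 to 17:00 day, the date, the count of 12 and the function name `random_observation_times` are all assumptions made up for this example, not part of any standard method.

```python
import random
from datetime import datetime, timedelta


def random_observation_times(start, end, count, seed=None):
    """Pick `count` distinct random minutes between start and end, in time order."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    total_minutes = int((end - start).total_seconds() // 60)
    # sample() draws without replacement, so no minute is chosen twice
    offsets = sorted(rng.sample(range(total_minutes), count))
    return [start + timedelta(minutes=m) for m in offsets]


# Hypothetical working day (assumed for illustration only)
day_start = datetime(2024, 1, 8, 9, 0)
day_end = datetime(2024, 1, 8, 17, 0)

times = random_observation_times(day_start, day_end, count=12, seed=1)
for t in times:
    print(t.strftime("%H:%M"))
```

Keeping the schedule to yourself is the point: the person being observed shouldn't be able to predict the next sample.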

Conducting an activity sampling study is relatively straightforward; however, like most data collection methods, it requires a pilot study first. The pilot gives you the ratios of time spent in each activity category, which you need to work out things like how many observations the full study requires (more on this later).

## Setting up an activity sampling study

Setting up a study is relatively easy. All you need is an activity you wish to observe, categories of activity for it and the consent of those being observed. For the purpose of this post, we'll use the example of recording when a worker is using the printer in a job which requires a decent amount of printing/copying/scanning. We can assume we have the consent of the individual (they're imaginary, I can't see why they'd complain) and we have an activity - using the printer.

For each activity it is important to have clearly defined categories that the activity can fall into. You can think of these as 'states' for the activity. Every activity has two basic states - it is either in use or it isn't. The number of categories you have depends on your activity and what you wish to record. For the printer, it could have states such as:

• Printing
• Copying
• Scanning
• Not in use

For the purpose of explanation, we'll stick with two categories - in use or not in use.
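To make the idea of collapsing states into two categories concrete, here is a small Python sketch that tallies recorded states and turns them into ratios. The list of observations is invented for illustration, and the names `IN_USE_STATES` and `category_ratios` are my own.

```python
from collections import Counter

# The detailed printer states that count as "in use"
IN_USE_STATES = {"printing", "copying", "scanning"}


def category_ratios(observations):
    """Collapse detailed states into in use / not in use and return their ratios."""
    counts = Counter(
        "in use" if state in IN_USE_STATES else "not in use"
        for state in observations
    )
    total = len(observations)
    return {cat: counts[cat] / total for cat in ("in use", "not in use")}


# Made-up sample data: 20 observations, 8 of them with the printer in use
observations = (
    ["printing"] * 3 + ["copying"] * 3 + ["scanning"] * 2 + ["not in use"] * 12
)

ratios = category_ratios(observations)
print(ratios)  # {'in use': 0.4, 'not in use': 0.6}
```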

So we have our consent, our activity and our categories. The only thing we need now is to know how many observations to make. And for that, we conduct our pilot study.

## Equation 1 - Working out how many observations you need

It seems odd, but to know how many observations you need to conduct, you first need to take some observations. This is because the equation for working out the number of observations you require needs to know the ratio between categories. The equation is as follows:

$\ N = \frac{ou}{E^2}$

Where N is the number of observations you need, o is the ratio of observations where the activity was occupied (in use), u is the ratio of the unoccupied state (not in use) and E is the desired error rate.

Let's say that in our pilot study we observed our participant 20 times, and in those 20 times we saw them using the printer 8 times. That gives us a ratio of 8/20 (or 2/5 or 0.4) for the 'in use' state of the printer and 12/20 (3/5 or 0.6) for the 'not in use' state.
For our study we might be happy with an error rate of 0.05 (the E). Common choices are 0.05 or 0.01, meaning we accept that the ratio we measure may be off by roughly 5 or 1 percentage points respectively.

So our equation would look something like this:

$\ N = \frac{0.4 \times 0.6}{0.05^2}$

$\ N = \frac{0.24}{0.0025}$

$\ N = 96$

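From this, we can see that we need 96 samples for our measured printer usage to carry an error rate of 0.05. For comparison, an error rate of 0.01 would need 2400 samples. Note that this simplified equation treats E as one standard error of the measured ratio; to state a formal 95% confidence interval of ±E, statistics texts multiply the top of the fraction by a z-score squared (1.96² ≈ 3.84), which would roughly quadruple these figures.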
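If you'd rather not do this by hand, the equation fits in a few lines of Python. This is only a sketch of the simplified formula above; the function name `required_observations` is my own, and it uses exact fractions so that floating-point rounding can't push the answer up by one.

```python
import math
from fractions import Fraction


def required_observations(in_use, total, error):
    """N = (o * u) / E^2, rounded up to a whole observation.

    in_use and total are pilot-study counts; error is E given as a
    string such as "0.05" so it converts to an exact fraction.
    """
    o = Fraction(in_use, total)  # ratio of 'in use' observations
    u = 1 - o                    # ratio of 'not in use' observations
    e = Fraction(error)
    return math.ceil(o * u / e**2)


# Pilot study from the example: 8 'in use' out of 20 observations
print(required_observations(8, 20, "0.05"))  # 96
print(required_observations(8, 20, "0.01"))  # 2400
```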

## Equation 2 - working out the error rate when you only have samples

Doing 96 samples is all very well and good but what happens if you took your samples before knowing how many you needed? What happens if you don't have the time or other resources to conduct the pilot study and the full study at 96 samples?

In this case, we can use a second equation, which works out the error rate we have achieved from the number of samples we actually took. It is simply the first equation rearranged for E:

$\ E = \sqrt{\frac{ou}{n}}$

The letters all stand for the same things, except that n is the number of samples we actually took rather than the number we need. This time we would take our ratios from our samples. Let's say we took 40 samples over a working day. With our ratio of 0.4 for the printer being in use, our equation becomes this:

$\ E = \sqrt{\frac{0.4 \times 0.6}{40}}$

$\ E = \sqrt{\frac{0.24}{40}}$

$\ E = \sqrt{0.006}$

$\ E \approx 0.08$

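From this, we can see that with 40 samples our error rate is about 0.08, higher than the 0.05 we were aiming for. In other words, 40 samples are not enough: the measured ratio of 0.4 could plausibly be off by around 8 percentage points.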
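The same calculation as a Python sketch, in case you want to check the error rate as your samples come in. The function name `error_rate` is my own, and the 16 'in use' observations are an assumption chosen to give the 0.4 ratio from the example.

```python
import math


def error_rate(in_use, total):
    """E = sqrt(o * u / n), using the ratios from the samples themselves."""
    o = in_use / total  # ratio of 'in use' samples
    u = 1 - o           # ratio of 'not in use' samples
    return math.sqrt(o * u / total)


# 40 samples with the printer in use in 16 of them (a ratio of 0.4)
e = error_rate(16, 40)
print(round(e, 3))  # 0.077
```

Because n sits under a square root, halving the error rate takes four times as many samples.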

## Conclusions

Activity sampling is used in research to obtain quick glimpses into working patterns. It is not as detailed as a continuous observational study; however, with the right number of samples, we can be statistically confident that our data reflects what is really happening.
There are some limitations, however. Activity sampling analysis can only tell us the likely error in our data; it cannot tell us exactly how accurate it is (for that we would need a continuous study). As such, while we may have conducted our 40 samples, there is no guarantee that the moments we happened to sample capture every category we have. With only two categories (in use or not) this is less of a risk, but it is never guaranteed.
The more samples you obtain, the less chance variance will cause issues. However, it can take a long time to obtain enough samples. If we wanted an error rate of 0.01 in our example (needing 2400 samples) and we took one sample every 5 minutes of an 8 hour working day (96 samples per day), we would need to observe our worker for 25 days before we had enough samples. That's five working weeks, a little over one working month, which is a long time to be collecting data.
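To see how the study length changes with different sampling intervals, the arithmetic above can be sketched in Python. The function name `days_needed` is my own, and it assumes samples are taken at a fixed interval throughout every working day.

```python
import math


def days_needed(samples_required, interval_minutes, hours_per_day):
    """Working days needed to collect the samples at a fixed interval."""
    samples_per_day = (hours_per_day * 60) // interval_minutes
    return math.ceil(samples_required / samples_per_day)


print(days_needed(2400, 5, 8))  # 25
print(days_needed(96, 5, 8))    # 1
```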

## References

For those conducting academic work, or who want a more academic delve into activity sampling, here are some papers I've read and used in my work which you might find useful. All were available to me via Google Scholar by searching for the title.

Kelly, Joe, 1964. The study of executive behaviour by activity sampling. Human Relations.
Thomas, H. Randolph, 1991. Labor Productivity and Work Sampling: The Bottom Line. Journal of Construction Engineering and Management, 117(3), pp.423–444.
Robinson, Mark A., 2010. Work sampling: Methodological advances and new applications. Human Factors and Ergonomics in Manufacturing & Service Industries, 20(1), pp.42–60.