Upper Limb Function 3 Months Post‐Stroke: How Accurate Are Physiotherapist Predictions? (2025)

1 Introduction

In developed countries, stroke is the leading cause of disability (Krishnamurthi et al. 2020). In Denmark, the incidence of stroke is approximately 346 per 100,000 person-years (Danish National Board of Health 2014). The related personal and societal consequences are major (Jennum et al. 2015; Jørgensen et al. 1997).

It has been reported that 48% of stroke survivors in the acute phase and 30%–66% in the chronic phase suffer from upper limb (UL) impairment after stroke (Nijland et al. 2010; Kwakkel et al. 2003). Impairment in UL function strongly affects the ability to perform activities of daily living (ADL) (Dromerick et al. 2006) and quality of life (Franceschini et al. 2010; Nichols-Larsen et al. 2005). An accurate prognosis of UL function can determine whether the primary focus of UL rehabilitation should revolve around compensating strategies or improving dexterity. This is highly relevant for providing targeted and efficient rehabilitation.

Several studies regarding the prediction of UL function post-stroke have been published (Franceschini et al. 2010; Nichols-Larsen et al. 2005; Bembenek et al. 2012; Byblow et al. 2015; Kim and Winstein 2017; Stinear et al. 2017). These studies used a variety of validated UL function measures such as the Action Research Arm Test (ARAT), Fugl–Meyer Motor Assessment Upper Extremity (FMA), and Shoulder Abduction Finger Extension score (SAFE) to predict recovery of UL function post-stroke (Franceschini et al. 2010; Nichols-Larsen et al. 2005; Bembenek et al. 2012; Byblow et al. 2015; Kim and Winstein 2017; Stinear et al. 2017). In recent years, the prediction potential of biomarkers, particularly the assessment of corticospinal tract integrity, has also been explored (Bembenek et al. 2012; Byblow et al. 2015; Kim and Winstein 2017; Stinear et al. 2017), and multifactorial prediction models such as the Predict Recovery Potential (PREP2) algorithm have emerged (Stinear et al. 2017).

A survey from 2018 showed that only 35% of occupational therapists and physiotherapists (PTs) were aware of prediction models and only 9% used prediction models in their clinical practice (Kiær et al. 2021). Therapists generally expressed that UL prediction models may be an aid to their clinical reasoning (Lundquist, Pallesen, et al. 2021; Connell et al. 2021), and particularly those who have used the models in their clinical practice seemed positive (Connell et al. 2021). Still, many senior therapists believe that prediction models are mainly relevant for recent graduates, while senior therapists may draw on experience (Lundquist, Pallesen, et al. 2021).

The few studies that have focused on therapists' ability to predict UL function report prediction accuracies ranging from 51.5% to 72% (Nijland et al. 2013; Kwakkel et al. 2000; Korner-Bitensky et al. 1989; Kent et al. 1993). In these studies, ARAT scores, functional tasks, and independence in ADL were predicted from 9 weeks to 6 months post-stroke. However, the studies were limited by factors such as a lack of blinding at the follow-up assessment, predictions that did not rely solely on PTs' clinical reasoning, and a predicted outcome that concerned only a simple ADL task (raising a cup to the mouth). Furthermore, to our knowledge, no published studies have compared the accuracy of PTs' UL predictions with the accuracy of a UL prediction model.

Therefore, we aimed to: (1) Determine the prediction accuracy of UL function 3 months post-stroke when predictions are made 2 weeks post-stroke, based on the knowledge and experience of PTs in neurorehabilitation. (2) Investigate whether the accuracy of PT predictions was influenced by their seniority within stroke rehabilitation and/or their level of continued education. (3) Compare the accuracy of PT predictions to that of an algorithm applied 2 weeks post-stroke reported by Lundquist, Nielsen, et al. (2021).

2 Methods

This study analyses secondary data collected from a prospective, observational study assessing the accuracy of an algorithm applied 2 weeks post-stroke (Lundquist, Nielsen, et al. 2021).

Stroke patients admitted to xxx Neurorehabilitation and Research Center in xxx from June 2018 to October 2019 were included if the following criteria were met:

  • First or recurrent hemorrhagic or ischemic stroke verified by CT or MRI scan

  • Admitted within 2 weeks after stroke

  • SAFE score < 10

  • Age ≥ 18 years

  • Ability to cognitively comply with examinations, defined by a Functional Independence Measure cognitive score ≥ 11 in combination with the rehabilitation team considering the patient able to participate

Exclusion criteria were as follows:

The primary PT of each patient's rehabilitation team conducted the UL prediction. Based on their experience and clinical reasoning, they answered a survey regarding the prediction of UL function 3 months post-stroke. To ensure that the PTs had enough time to see the patient and make the observations and examinations they found relevant, the predictions were based on at least 3 sessions with the patient within the first 2 weeks following stroke onset. The PTs were asked to assign the predicted UL function based on the ARAT score. The ARAT evaluates 19 tests of arm motor function, both distally and proximally. Each test is given an ordinal score of 0, 1, 2, or 3, with higher values indicating better arm motor status. The total ARAT score is the sum of the 19 tests, and thus the maximum score is 57 (Yozbatiran et al. 2008). Previous studies have predicted UL function in one of four categories based on the ARAT (Stinear et al. 2017; Lundquist, Nielsen, et al. 2021). For the present study, the same four categories were used and the PTs predicted UL function at 3 months after stroke in one of the following categories:

  1. POOR: The patient will not regain any useful function in the UL within 12 weeks post-stroke. The patient will not be able to use or include the paretic hand in activities of daily living (ARAT score 0–12).

  2. LIMITED: The patient will regain minimal function of the UL within 12 weeks post-stroke. The patient will, with compensating strategies, be able to include the paretic hand in some activities of daily living (ARAT score 13–33).

  3. GOOD: The patient will regain good or very good function in the UL within 12 weeks post-stroke. The patient will be able to include the paretic hand in most activities of daily living, though with reduced power and dexterity (ARAT score 34–50).

  4. EXCELLENT: The patient will most likely regain complete or almost complete function of the UL within 12 weeks post-stroke (ARAT score 51–57).
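
The four ARAT score bands above amount to a simple lookup. The following is a minimal sketch of that mapping; the function name `arat_category` is a hypothetical helper introduced here for illustration and is not part of the study's materials.

```python
def arat_category(score: int) -> str:
    """Map a total ARAT score (0-57) to one of the four outcome categories.

    Cut-offs follow the bands listed in the text:
    0-12 POOR, 13-33 LIMITED, 34-50 GOOD, 51-57 EXCELLENT.
    """
    if not 0 <= score <= 57:
        raise ValueError("ARAT total score must be between 0 and 57")
    if score <= 12:
        return "POOR"
    if score <= 33:
        return "LIMITED"
    if score <= 50:
        return "GOOD"
    return "EXCELLENT"


if __name__ == "__main__":
    # Print the category at each band boundary.
    for s in (0, 12, 13, 33, 34, 50, 51, 57):
        print(s, arat_category(s))
```

A follow-up ARAT score can be passed through the same function, so that predicted and observed categories are compared on identical cut-offs.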

The following information regarding the PT who predicted the UL function was obtained: seniority in neurology/neurological rehabilitation (number of years), and amount of continuing education (number of courses attended). To ensure that predictions were made within 2 weeks of stroke onset, the date of prediction was registered. At the 3-month follow-up, the patient's UL function was assessed by one of three experienced research therapists using the ARAT. Research therapists were blinded to the predicted level of UL function and were not involved in patient care. To enhance inter-rater reliability, research therapists received a thorough introduction on how to administer the ARAT, and a comprehensive manual based on previous research was provided (Yozbatiran et al. 2008). Furthermore, before commencing the study and 3 months into the study, several patients were assessed by all research therapists, and their results were discussed until agreement was achieved (Lundquist, Nielsen, et al. 2021).

2.1 Statistical Methods

Correct Classification Rates (CCR) and 95% confidence intervals (CI) were calculated to determine the accuracy of the PT predictions when compared to the actual ARAT score at follow-up. CCR and CI were calculated overall. For all four categories, the CCR, sensitivity (the percentage of patients that the PTs correctly predicted would achieve a specific ARAT outcome category), specificity (the percentage of patients that the PTs correctly predicted would NOT achieve a specific ARAT outcome category), positive predictive value (PPV), negative predictive value (NPV), and confidence intervals were calculated. Due to clustering of the data (mean number of predictions per PT, n = 2), multilevel mixed-effects logistic regression was used to explore the odds ratio for correct classification within the four ARAT categories. The influence of the PTs' seniority and continued education on prediction accuracy was explored using the number of years of experience and the number of courses attended in the field of neurorehabilitation as covariates in the regression analyses. Years of experience were entered as a continuous variable and the number of courses as a categorical variable (0–4, 5–9, or ≥10 courses). Multicollinearity between the variables “seniority” and “courses” was assessed using post-estimations of the Variance Inflation Factor.

McNemar's test was applied to compare the PT prediction accuracy to that of an algorithm applied 2 weeks post-stroke to the same patients.

All analyses were 2-tailed with a significance level of 5% for rejecting the null hypothesis and were conducted in Stata 12. A statistician was consulted for the analysis of data.

3 Results

Eighty-eight patients from the original prospective longitudinal study by Lundquist, Nielsen, et al. (2021) were enrolled in this study, and data from 81 patients were included in the analyses. Reasons for the seven exclusions were: the PT forgot to make the prediction (n = 2), the completed prediction survey was lost in the mail (n = 1), or the ARAT follow-up test was not completed (n = 4). Baseline predictions were made by 40 PTs. The mean number of patients evaluated by each PT was 2 (range: 1–7). Baseline characteristics of participating patients and PTs are provided in Table 1.

TABLE 1. Baseline characteristics.

Patients (n = 81)
  Days from stroke onset to inclusion, mean ± SD     13 ± 1.47
  Age in years, mean ± SD                            64.01 ± 10.64
  Sex, n (%)
    Male                                             47 (58.02)
    Female                                           34 (41.98)
  Stroke type, n (%)
    Ischemic                                         64 (79.01)
    Hemorrhagic                                      17 (20.99)
  Dominant hand affected (n = 80), n (%)
    Right                                            67 (84)
    Left                                             13 (16)
  First stroke, n (%)
    Yes                                              76 (94)
    No                                               5 (6)
  FIM score (n = 78), mean (SD)                      70 (23)
    FIM motor subscore                               46 (18)
    FIM cognitive subscore                           24 (6)
  ARAT score, median (IQR)                           15 (4–39)
Therapists (n = 40)
  Experience (a), median (IQR)                       15 (10–17)
  Courses (b), n (%)
    0–4                                              9 (23)
    5–9                                              18 (45)
    ≥10                                              13 (33)
  • Abbreviations: ARAT = Action Research Arm Test, FIM = Functional Independence Measure, IQR = interquartile range, SD = standard deviation.
  • (a) Experience: number of years in the neurological fields of physiotherapy.
  • (b) Courses: amount of continuing education (number of courses attended).

At baseline, 7 patients (9%) were predicted to have poor UL function at 3 months post-stroke, 18 patients (22%) were predicted limited, 40 patients (50%) were predicted good, and 16 patients (20%) were predicted excellent UL function (Table 2). At 3 months post-stroke, 33 of 81 patients achieved the predicted category, and the overall CCR was 41% (95% CI: 30–51). In 16 of 81 patients (20%), the prediction was too optimistic, and the patients did not achieve the predicted UL function. In 32 of 81 patients (40%), the prediction was too pessimistic, and the actual UL function at 3 months exceeded the predicted function (Table 2). Specificity was highest in the outcome categories excellent (91% [95% CI: 78–98]) and poor (97% [95% CI: 90–100]). Sensitivity was below 60% for all four categories. CCR was calculated for each of the four categories and was highest for patients with a predicted excellent UL recovery, followed by the category poor UL recovery (Table 3). NPV was above 60% for all categories (Table 3).
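
The overall figures in this paragraph can be recomputed from the cell counts in Table 2. The sketch below is illustrative only and is not the authors' analysis code (the study used Stata). The names `TABLE2` and `summarize` are hypothetical, and the normal-approximation (Wald) interval is an assumption: the source does not state which interval method was used, although this one gives bounds that round to the reported 30–51.

```python
import math

# Rows: therapist prediction; columns: actual outcome at 3 months.
# Order in both directions: excellent, good, limited, poor (counts from Table 2).
TABLE2 = [
    [12, 3, 1, 0],
    [22, 12, 6, 0],
    [3, 5, 4, 6],
    [0, 1, 1, 5],
]


def summarize(table):
    """Overall CCR with a Wald 95% CI, plus optimistic/pessimistic counts."""
    size = len(table)
    n = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(size))
    # Categories run from best to worst, so a column index above the row
    # index means the actual outcome was worse than predicted.
    optimistic = sum(table[i][j] for i in range(size) for j in range(size) if j > i)
    pessimistic = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    ccr = correct / n
    half_width = 1.96 * math.sqrt(ccr * (1 - ccr) / n)  # assumed Wald interval
    return {
        "n": n,
        "correct": correct,
        "optimistic": optimistic,
        "pessimistic": pessimistic,
        "ccr": ccr,
        "ci_low": ccr - half_width,
        "ci_high": ccr + half_width,
    }


if __name__ == "__main__":
    for key, value in summarize(TABLE2).items():
        print(key, round(value, 3) if isinstance(value, float) else value)
```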

TABLE 2. Predicted and actual ARAT categories and agreement between them.

                                           Actual ARAT outcome category at 3 months (n)
Therapist prediction of UL function (n)    Excellent   Good       Limited    Poor       Total, n (%)
Excellent                                  12*         3          1          0          16 (20%)
Good                                       22          12*        6          0          40 (50%)
Limited                                    3           5          4*         6          18 (22%)
Poor                                       0           1          1          5*         7 (9%)
Total, n (%)                               37 (46%)    21 (26%)   12 (15%)   11 (14%)   81 (100%)
  • Note: * Patients for whom the outcome category was equivalent to the predicted category (n = 33).

TABLE 3. Accuracy of the therapists' predictions.

                      CCR % (95 CI)   Sensitivity % (95 CI)   Specificity % (95 CI)   PPV % (95 CI)   NPV % (95 CI)
Overall (n = 81)      41 (30–51)
Excellent (n = 16)    75 (49–90)      32 (18–50)              91 (78–98)              75 (48–93)      62 (49–73)
Good (n = 40)         30 (18–46)      57 (34–78)              53 (40–66)              30 (17–47)      78 (62–89)
Limited (n = 18)      22 (9–47)       33 (10–65)              80 (68–88)              22 (6–48)       87 (77–94)
Poor (n = 7)          71 (32–93)      46 (17–77)              97 (90–100)             71 (29–96)      92 (83–97)
  • Abbreviations: 95 CI = 95% confidence intervals, CCR = correct classification rate, NPV = negative predictive value, PPV = positive predictive value.
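
The per-category point estimates in Table 3 follow from treating each category in turn as the "positive" class in Table 2 (one-vs-rest). The sketch below is a minimal illustration of that calculation, not the authors' code; `COUNTS` and `category_metrics` are hypothetical names, and confidence intervals are omitted because the source does not state the interval method. The results agree with Table 3 to within one percentage point (sensitivity for "poor" computes to 5/11, about 45.5%).

```python
CATEGORIES = ["excellent", "good", "limited", "poor"]

# Rows: therapist prediction; columns: actual outcome (counts from Table 2).
COUNTS = [
    [12, 3, 1, 0],
    [22, 12, 6, 0],
    [3, 5, 4, 6],
    [0, 1, 1, 5],
]


def category_metrics(table, k):
    """One-vs-rest sensitivity, specificity, PPV and NPV for category index k."""
    n = sum(sum(row) for row in table)
    tp = table[k][k]                          # predicted k and achieved k
    fp = sum(table[k]) - tp                   # predicted k, achieved something else
    fn = sum(row[k] for row in table) - tp    # achieved k, predicted something else
    tn = n - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # equals the per-category CCR in Table 3
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    for k, name in enumerate(CATEGORIES):
        metrics = category_metrics(COUNTS, k)
        print(name, {m: round(100 * v, 1) for m, v in metrics.items()})
```

Note that the per-category CCR reported in Table 3 coincides with the PPV, since both divide the correctly predicted patients by the number of patients predicted to fall in that category.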

Multilevel mixed-effects logistic regression analysis was used to calculate the OR for a true prediction (Table 4). Compared to excellent, the crude OR for a true prediction when UL function at 3 months post-stroke was predicted poor was 0.69 (95% CI: 0.10–7.10). Crude OR for UL function predicted as limited or good was 0.05 (95% CI: 0.01–0.59) and 0.09 (95% CI: 0.01–0.63), respectively. The estimates remained similar after adjustment for seniority and number of courses (analyses 2 and 3, Table 4).

TABLE 4. Multilevel mixed-effects logistic regression analyses showing the association between correct prediction at 3 months post-stroke and the predicted ARAT category, adjusted for seniority and number of continuing education courses.

                          Regression analysis 1       Regression analysis 2       Regression analysis 3
                          Crude estimate              Adjusted for seniority      Adjusted for seniority and number of courses
                          OR (95% CI)         p       OR (95% CI)         p       OR (95% CI)         p
Prediction
  Poor ARAT; 0–12         0.69 (0.07–7.07)    0.75    0.67 (0.06–6.88)    0.73    0.67 (0.07–6.88)    0.74
  Limited ARAT; 13–33     0.05 (0.01–0.59)    0.02    0.06 (0.01–0.63)    0.02    0.06 (0.01–0.71)    0.03
  Good ARAT; 34–50        0.09 (0.01–0.63)    0.02    0.09 (0.01–0.67)    0.02    0.11 (0.02–0.73)    0.02
  Excellent ARAT; 51–57   Reference                   Reference                   Reference
Seniority (years)                                     1.05 (0.96–1.15)    0.3     1.03 (0.91–1.17)    0.64
Courses (n)
  0–4                                                                             Reference
  5–9                                                                             1.95 (0.24–15.7)    0.53
  ≥10                                                                             1.78 (0.15–20.8)    0.65
  • Note: Multilevel mixed-effects logistic regression of correct classification of upper limb function 3 months post-stroke and predicted ARAT scores. Analyses 2 and 3 are adjusted for the continuous variable seniority (years) and the categorical variable courses in the neurological field.
  • Abbreviations: 95% CI = 95% confidence interval, OR = odds ratio, p = p-value.

Multilevel mixed-effects logistic regression analyses showed a tendency towards increased prediction accuracy with seniority. Crude OR for a true prediction increased by 5% per year; OR = 1.05 (95% CI: 0.96–1.15); however, the estimate was not statistically significant. A tendency towards improved prediction accuracy with continued education was also found. Compared with 0–4 courses, the OR for a true prediction was higher for therapists with more courses: OR for 5–9 courses = 1.95 (95% CI: 0.24–15.70) and OR for ≥10 courses = 1.78 (95% CI: 0.15–20.80); however, the estimates were not statistically significant (Table 4).
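
To illustrate what an odds ratio of 1.05 per year implies, the sketch below compounds the per-year estimate over several years. This assumes the effect is linear on the log-odds scale, which is the usual modelling assumption for a continuous covariate in logistic regression. The function name `odds_ratio_over_years` is hypothetical, and the 10-year horizon is an arbitrary illustration rather than a figure from the study.

```python
def odds_ratio_over_years(or_per_year: float, years: float) -> float:
    """Compound a per-year odds ratio over a number of years.

    Assumes a log-linear effect: each additional year multiplies the odds
    of a correct prediction by the same factor.
    """
    return or_per_year ** years


if __name__ == "__main__":
    years = 10  # arbitrary illustrative horizon
    point = odds_ratio_over_years(1.05, years)
    low = odds_ratio_over_years(0.96, years)
    high = odds_ratio_over_years(1.15, years)
    print(f"OR over {years} years: {point:.2f} ({low:.2f} to {high:.2f})")
```

Because the interval for the per-year estimate spans 1, the compounded interval also spans 1, in keeping with the estimate not being statistically significant.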

The post-estimations of multicollinearity between the variables: “seniority” and “courses” within the logistic regression analyses showed a tolerance value of 0.7, indicating a strong positive linear relationship between the two variables. This is unsurprising, as it reflects the interconnection between seniority and the number of courses in a clinical setting.
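
For reference, tolerance and the Variance Inflation Factor are reciprocals of each other, and tolerance equals one minus the R² obtained when one predictor is regressed on the others. The sketch below applies these standard identities to the reported tolerance of 0.7; the function name `collinearity_from_tolerance` is hypothetical.

```python
def collinearity_from_tolerance(tolerance: float) -> dict:
    """Convert a tolerance value into the equivalent VIF and auxiliary R-squared.

    Identities used: VIF = 1 / tolerance and tolerance = 1 - R^2, where R^2
    comes from regressing one predictor on the remaining predictors.
    """
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must lie in the interval (0, 1]")
    return {"vif": 1 / tolerance, "r_squared": 1 - tolerance}


if __name__ == "__main__":
    result = collinearity_from_tolerance(0.7)
    print({name: round(value, 2) for name, value in result.items()})
```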

McNemar's test revealed a tendency, but not a statistically significant superiority, in the prediction accuracy of an algorithm applied 2 weeks post-stroke compared to the PT predictions (odds ratio 2 [95% CI: 0.96–4.39], McNemar p = 0.0455, exact McNemar p = 0.0652).
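
McNemar's test uses only the discordant pairs, that is, patients for whom one method was correct and the other was not. The sketch below shows the calculation in plain Python. The source does not report the discordant-pair counts; the values 24 and 12 used here are an assumption, chosen only because they are consistent with the reported odds ratio and both p-values. The function name `mcnemar` is hypothetical, and the chi-square version is computed without continuity correction.

```python
import math


def mcnemar(b: int, c: int):
    """Return (odds ratio, chi-square p-value, exact two-sided p-value).

    b and c are the two discordant-pair counts, for example pairs where only
    the algorithm was correct (b) and pairs where only the PT was correct (c).
    """
    n = b + c
    odds_ratio = b / c
    chi2 = (b - c) ** 2 / n  # no continuity correction
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    # Exact version: two-sided binomial tail with success probability 0.5.
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return odds_ratio, p_chi2, p_exact


if __name__ == "__main__":
    # Assumed discordant counts (not reported in the source).
    odds_ratio, p_chi2, p_exact = mcnemar(24, 12)
    print(f"OR = {odds_ratio:.2f}, p = {p_chi2:.4f}, exact p = {p_exact:.4f}")
```

With small numbers of discordant pairs the chi-square approximation and the exact test can fall on opposite sides of the 5% threshold, as they do here.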

4 Discussion

Overall, CCR of PT predictions regarding UL function 3 months post-stroke was 41% and varied across the 4 UL prediction categories. CCR was highest if UL recovery was predicted as excellent (75%) or poor (71%) and lowest if UL recovery was predicted as limited (22%) or good (30%). Though the overall prediction accuracy was notably higher than what could be generated by chance (25%), it was lower than that reported in the current literature. Nijland et al. found a CCR of 60% when the UL prediction was carried out 72 h after stroke onset, increasing to 72% when carried out 10 days post-stroke (Nijland et al. 2013). Although the study populations in the present study and the study by Nijland et al. are comparable, several differences exist. First, in the study by Nijland et al., predictions referred to UL recovery 6 months post-stroke compared to 3 months in the present study. Second, the UL function was assessed in only three ARAT categories, thus leaving less room for incorrect predictions and giving a higher expected accuracy by chance (33.3%). Third, in their study, the therapists who performed the initial prediction also conducted the follow-up assessment and were consequently not blinded to baseline predictions. Knowledge of baseline predictions could have influenced outcome assessment, which could explain the higher prediction accuracy. Kwakkel et al. (2000) reported estimates of prediction accuracies ranging from 51.5% to 63.6%. However, the therapists in the study were instructed to use the ARAT score to optimize their prediction of the outcome, which we believe enhanced the chances of correct prediction. In contrast, PTs in the present study were not provided with any specific instructions on how to reach their predicted category but were allowed to base their predictions on the knowledge and tests they deemed best. Kent et al. reported prediction accuracies of 60% (Kent et al. 1993). However, predictions were not carried out within a predefined time range post-stroke, nor was the time span between prediction and follow-up assessment standardized. This leaves many possible explanations for the higher accuracy compared to what was found in the present study.

In this study, we observed a tendency, but not a statistically significant difference, towards improved prediction accuracy with seniority. This is in line with findings reported by Nijland et al. (2013), Kwakkel et al. (2000), and Korner-Bitensky et al. (1989). Like the present study, the study by Nijland et al. mainly included experienced therapists (median seniority 24 years [IQR = 23–27]). The lack of recent graduates and little variation in years of experience may explain why “years of experience” did not influence the accuracy of prediction in either study. The group of therapists responsible for the predictions in the study by Kwakkel et al. varied more in terms of experience; however, they did not find that prediction accuracy was influenced by experience either (Kwakkel et al. 2000). In the present study, the number of courses completed in neurorehabilitation was associated with prediction accuracy, though this finding was not statistically significant (Table 4). This is in line with findings from previous studies (Nijland et al. 2013; Kwakkel et al. 2000). Importantly, in the present study, we did not obtain knowledge on specific course content, and we do not know to what extent it was specifically related to functional recovery of the UL. In all of these studies, the study populations were small, and thus findings regarding seniority and courses may simply reflect the small sample size.

The PREP2 algorithm is based on the SAFE score, the age of the patient, the NIHSS, and motor-evoked potentials elicited by transcranial magnetic stimulation within the first week post-stroke to predict UL functional capacity at 3 months after stroke (Stinear et al. 2017). The PREP2 algorithm showed an overall accuracy of 75% (95% CI: 45%–84%) in a New Zealand stroke population (Stinear et al. 2017). Barth et al. applied an algorithm without a biomarker to a US stroke population within the first week post-stroke and found an overall accuracy of 61% (95% CI: 46%–75%) (Barth et al. 2022). The present study population is a subset of the study population in Lundquist, Nielsen, et al. (2021). In both studies, predictions were performed within 2 weeks post-stroke onset, and the predicted outcomes were UL function 3 months post-stroke. In the parent study, the predictions were based on an algorithm and an overall CCR of 60% (95% CI: 51–71) was found (Lundquist, Nielsen, et al. 2021). In summary, the accuracy found in studies on prediction models seems comparable to or higher than that found in the present study. One explanation for the overall lower CCR in the present study is that the PTs appear to have erred on the side of caution, with 40% of their predictions being too pessimistic.

In both the present study and the study by Lundquist, Nielsen, et al. (2021), similar tendencies were observed, with the CCR being highest for predicted “poor” and “excellent” outcomes compared to “good” and “limited” outcomes. The explanations for the lower prediction accuracy for midrange outcomes are mainly unknown. Patients generally experience the greatest amount of improvement in UL function in the first month post-stroke (Bernhardt et al. 2017). In patients with no or little function at baseline, the room for improvement is large, but the individual improvement varies, and patients may end up in any of the four outcome categories. From clinical experience, we know that efficient UL rehabilitation and hence recovery are influenced by cognitive abilities such as neglect, the motivation of the patient, and the degree of spasticity developed post-stroke (Onno van der Groen et al. 2024). In contrast, in patients with already good UL function at baseline, there is little room for improvement and these patients will likely end up with an excellent UL function 3 months post-stroke. Thus, an excellent outcome is easier to predict than a limited or good one.

In the present study, loss to follow-up was small (n = 7 [8.6%]) and was neither associated with the PT predictions nor with the 3-month follow-up. Therefore, we do not expect this loss to follow-up to be associated with bias. The survey provided to the PTs regarding prediction categories has not been validated. However, the prediction categories (poor, limited, good, excellent) were described in detail and developed based on the established and validated ARAT test (McDonnell 2008; Platz et al. 2005). To ensure a high reliability of the outcome assessment, known ARAT limitations were accounted for by a thorough ARAT calibration process of all investigators (McDonnell 2008; Carpinella et al. 2014). Furthermore, ARAT assessments were performed by three experienced research therapists not involved in patient care and blinded to the predicted 3-month UL recovery. Thus, the risk of bias should be minimal.

The generalizability of this study is limited to some extent as the study population consisted of selected patients referred to inpatient rehabilitation. However, the inclusion of patients with diverse cognitive and UL function deficits enhances the potential applicability of the findings to similar rehabilitation settings.

In conclusion, PT predictions of UL function 3 months post-stroke based on their experience in neurorehabilitation showed an accuracy of 41% (95% CI: 30–51). Prediction accuracy was higher for the recovery levels “excellent” and “poor” compared to midrange levels (“limited” and “good”) on the ARAT scale. No statistically significant improvement in prediction accuracy was found with seniority or continuing education courses. The results indicate that physiotherapists may strengthen their clinical reasoning through the use of prediction models. However, to date, the accuracy of prediction models still seems to depend on biomarkers and specific time points after stroke.

4.1 Implications on Physiotherapy Practice

  • Accurate UL function prognosis is crucial for targeted rehabilitation

  • Based on their experience in neurorehabilitation, physiotherapists' predictions of UL function 3 months post-stroke showed an accuracy of 41% (95% CI: 30%–51%).

  • The use of standardized clinical assessments or prediction models may be a useful adjunct to a physiotherapist's clinical reasoning.

Acknowledgments

We thank the physiotherapists at the Hammel Neurorehabilitation and Research Center for their participation.

    Ethics Statement

    The study was reported to the Danish Data Protection Agency and approved by the regional ethics committee for the Central Denmark Region (project number: 628213).

    Consent

    Written informed consent was obtained from all included patients in accordance with the Helsinki Declaration.

    Conflicts of Interest

    The authors declare no conflicts of interest.

    Permission to Reproduce Material From Other Sources

    The authors have nothing to report.

    Open Research

    Anonymized data are available on request.
