
Part 3: From Language Learning to Human–AI Collaboration: Rethinking How We Measure Skills in Motion

In Part 2, we explored the skills wheel and spider graph. Today, let’s turn to the other side of the equation: how we measure skills in motion, especially as human–AI collaboration becomes a skill in its own right.


All illustrations were created by me using Canva.

Let me take a little detour today into my work in academia. In my “life prior to the corporate world,” I worked at the intersection of language and culture acquisition, where assessments were everywhere. Grades and certificates often shaped students’ trajectories, and they had shaped my own trajectory for most of my (student) life as well. So it became important to me to push for transparency: progress reports, constant dialogue, clear rubrics. To me, personally as well as a professor, a grade was just a snapshot. Passing a vocabulary quiz doesn’t prove you can hold a real conversation; a TOEFL score shows a level but not how, or how quickly, someone improves once immersed. Language acquisition is a journey. And so is life and building skills, whether for work or leisure.


In one class, a student who struggled on every vocab quiz jumped two proficiency bands after spending a month in a language immersion program, living with a host family. He didn’t memorize faster; he learned to navigate ambiguity in real time. That’s when I began adding a progress layer to static assessments. Little did I know I would be reminded of that years later…


Organizations fall into a similar trap. Promotions arrive after people have already performed at the next level. But in a world where skills and skills demand fluctuate more, and more quickly, than ever, is that even fair? Or, more to the point, is it a sign that we can’t yet measure readiness in real time? Too often, the supports that should enable growth become gatekeepers; or, with AI, the supports are not in place yet because everything changes so quickly.


This is where the concept of skills volatility comes in. In a recent issue of his newsletter Skills Work!, Brandon Carson describes the concept of skills volatility. As he says, it isn’t about jobs vanishing overnight; it’s about tasks being rewritten faster than companies can update workflows and learning. With 39% of core skills projected to change by 2030 (WEF), static snapshots won’t cut it; we need living, dynamic measures.


Why Dynamic Assessments Matter


A dynamic assessment approach shifts the focus from proof to trajectory. Just as in language learning we track how (fast) someone moves from guided practice to spontaneous speech, in the workplace we can track how (quickly) skills adapt, transfer, and scale across new contexts.

It then becomes about seeing the trajectory of growth, the context that shapes it, and the scaffolding that supports it. Scaffolding is the bridge: enough support to stretch beyond comfort, gradually removed as skills take flight.

An attempt to visualize; maybe the arrows should go both ways?!

When we turn to human–AI collaboration as an example, additional questions arise. Work in that context requires new skills like prompt framing, judgment under uncertainty, and reflective oversight of AI output. Designing fair assessments for this blended reality is no longer optional but a strategic imperative as we develop competence and move from AI literacy to fluency and, ideally, enablement.


What Dynamic Assessment Actually Measures


To summarize what we’ve discussed so far, dynamic assessment is about more than testing whether someone knows something at a single moment in time. It captures growth in three ways: trajectory, context, and scaffolding.


In language learning, this shows up when a student doesn’t just memorize a word but uses it flexibly in conversation. At work, it’s the difference between knowing Python today and being able to pick up a new programming language tomorrow when the problem demands it. That’s trajectory.


Context matters just as much. Students who ace grammar drills often stumble in real-life situations. Likewise, professionals may recite agile principles but struggle when a project derails. This is closely tied to transfer, which brings us to the next piece: scaffolding.


Scaffolding is the structured support that accelerates growth. A teacher may provide sentence starters until the student internalizes them. In the workplace, scaffolding might come in the form of an AI copilot. The real measure isn’t whether someone performs flawlessly with support, but how quickly they grow once the scaffolding fades.


Dynamic assessment, then, isn’t about checking a box. It’s about watching how quickly someone learns, how flexibly they apply knowledge, and how sustainably they grow when given the right support.
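If you want to picture what that could look like in practice, here is a minimal sketch in Python. All names, scales, and numbers are my own illustrative assumptions, not a prescribed model: the same skill is observed several times, in different contexts, with the level of scaffolding recorded alongside the level of performance, so growth becomes something you can actually see.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SkillObservation:
    """One observation of a skill in motion (all fields are illustrative)."""
    skill: str            # e.g. "prompt framing"
    observed_on: date
    context: str          # where the skill was shown, e.g. "live client workshop"
    proficiency: int      # 1 (guided) .. 5 (spontaneous, independent)
    scaffolding: int      # 3 = heavy support, 0 = none

def trajectory(observations: list[SkillObservation]) -> float:
    """Growth rate: proficiency gained per observation over time."""
    ordered = sorted(observations, key=lambda o: o.observed_on)
    if len(ordered) < 2:
        return 0.0
    return (ordered[-1].proficiency - ordered[0].proficiency) / (len(ordered) - 1)

# Usage: three snapshots of the same skill become a trajectory, not a single grade.
history = [
    SkillObservation("prompt framing", date(2025, 1, 10), "guided exercise", 2, 3),
    SkillObservation("prompt framing", date(2025, 3, 5), "team project", 3, 2),
    SkillObservation("prompt framing", date(2025, 5, 20), "client workshop", 4, 1),
]
print(trajectory(history))  # 1.0 band per observation, with decreasing support
```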


From Outcomes to Observations


So how do we measure in this more dynamic way? Here, education offers a powerful blueprint: Evidence-Centered Design (ECD).


ECD was originally developed to ensure that assessments didn’t just produce scores, but that every score was backed by evidence of what someone knows and can do. Instead of measuring outputs in isolation, it links claims, evidence, and tasks in a systematic way:


Claim: Define what you want to say about the learner.

In a workplace, this might sound like: “Can adapt workflows with AI responsibly.”


Evidence: Specify what behaviors would prove that claim.

For example, identifying risks, setting evaluation metrics, running tests, and documenting mitigations.


Task: Design authentic activities that elicit that evidence.

For example: revising a flawed AI output under time pressure, with scaffolding first and independence later.


What makes ECD particularly valuable is that it resists the trap of static snapshots. It pushes us to ask: If this is the claim, what would I need to see over time to be convinced? In other words, it bakes trajectory and context into the design itself.
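To make the chain tangible, here is a small sketch in Python, purely illustrative. The structure is my own shorthand, not an official ECD schema; it simply writes the claim, its evidence, and the tasks down explicitly so every task traces back to the claim it is meant to support.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    scaffolded: bool            # support in place first, independence later

@dataclass
class EvidenceRule:
    behavior: str               # observable behavior that would support the claim
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Claim:
    statement: str
    evidence: list[EvidenceRule] = field(default_factory=list)

# Illustrative example: the workplace claim from above, traced down to tasks.
claim = Claim(
    statement="Can adapt workflows with AI responsibly",
    evidence=[
        EvidenceRule(
            behavior="Identifies risks and documents mitigations",
            tasks=[Task("Revise a flawed AI output under time pressure", scaffolded=True),
                   Task("Revise a flawed AI output independently", scaffolded=False)],
        ),
        EvidenceRule(
            behavior="Sets evaluation metrics and runs tests",
            tasks=[Task("Define acceptance criteria for an AI-assisted report", scaffolded=True)],
        ),
    ],
)
```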


In language learning, this shift was transformative. Rather than treating a grammar quiz as proof of proficiency, frameworks like ACTFL or the CEFR moved toward performance descriptors in real contexts: Can the learner negotiate meaning in conversation? Can they use language to complete a task? Similarly, Poehner & Infante (2019) showed how Vygotskian dynamic assessment uses scaffolding not just to test, but to mediate development. Assessments then become part of the growth process itself.


For workplace learning, the same principle applies. A claim about “AI fluency” is meaningless without evidence of how someone interacts with the technology in context. And evidence is meaningless without tasks designed to reveal it. As Black & Wiliam (1998) argued in their “Inside the Black Box” work, assessment is not just about measurement; it’s one of the strongest levers for learning if designed well. (Add in constructive, immediate feedback and we are really getting somewhere.)


Rubrics and Task Mapping: The “How”


In my own work, I use rubrics to capture the deeper human capabilities needed in an AI-shaped workplace. For example, we can look at the core skills for human–AI collaboration, such as problem framing, iteration, metacognition, and ethics. I’ll dive into that in a separate blog soon.

Example core skills that form the foundation of human–AI collaboration.

Carson offers a complementary practice: task-mapping enterprise critical roles (ECRs). He suggests identifying 3–4 high-volume tasks per critical role (especially those newly AI-affected) and mapping evidence of “what good looks like,” the signals available, and the KPI targets.
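What might such a task map look like? Here is a hypothetical example; the role, signals, and targets are invented for illustration, but it captures the elements Carson lists: the high-volume task, what good looks like, the signals available, and the KPI target.

```python
# Illustrative task map for one enterprise-critical role (all values hypothetical).
task_map = {
    "role": "Customer Support Lead",
    "tasks": [
        {
            "task": "Triage incoming tickets with AI-suggested categories",
            "what_good_looks_like": "Accepts, corrects, or escalates suggestions with rationale",
            "signals": ["override rate", "escalation notes", "peer review"],
            "kpi_target": "Less than 10% miscategorized tickets after review",
        },
        {
            "task": "Draft customer responses from AI first drafts",
            "what_good_looks_like": "Edits for tone, accuracy, and policy before sending",
            "signals": ["edit distance from draft", "customer satisfaction score"],
            "kpi_target": "CSAT at or above 4.5 on assisted responses",
        },
    ],
}
```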


Together, rubrics and task maps provide both the depth (how we think) and the breadth (where volatility strikes) that a skills strategy requires.


Volatility Index + AI Maturity Curve


Carson proposes building a volatility index as a way to quantify which tasks are most at risk of disruption. I see this as a powerful diagnostic lens that naturally feeds into the AI Literacy → Fluency → Enablement maturity curve.

Identifying volatility and moving through it.

The index, via volatility clusters, tells us where we are fragile. The curve shows us how to move forward.

Side by side, they create both urgency and a pathway.
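I imagine the index could start as something as simple as a weighted score per task. The sketch below is my own reading of the idea; the factors, weights, and thresholds are assumptions, not Carson’s formula.

```python
# A hypothetical volatility score per task: how exposed is this task to change?
# Factors, weights, and thresholds are illustrative assumptions, not a published formula.
def volatility_score(ai_exposure: float, change_frequency: float, skill_gap: float) -> float:
    """Each factor is rated 0..1; the result is a 0..1 volatility score."""
    weights = {"ai_exposure": 0.5, "change_frequency": 0.3, "skill_gap": 0.2}
    return (weights["ai_exposure"] * ai_exposure
            + weights["change_frequency"] * change_frequency
            + weights["skill_gap"] * skill_gap)

tasks = {
    "Draft status reports": volatility_score(0.9, 0.7, 0.4),
    "Negotiate vendor contracts": volatility_score(0.3, 0.2, 0.5),
}

# Cluster tasks into bands to see where literacy, fluency, or enablement work is most urgent.
for task, score in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
    band = "high" if score >= 0.6 else "medium" if score >= 0.3 else "low"
    print(f"{task}: {score:.2f} ({band} volatility)")
```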


Analogy: Cloud Migration and Language Learning


Carson points to the 2010s cloud migration as an earlier example of volatility: training failed not because people couldn’t “use the tool” but because the work itself changed.


I’ve seen the same pattern in language learning. Passing grammar exams doesn’t prepare you for the messy, real-time flux of conversation. Likewise, training for AI must go beyond tool use; it must rewire how we think and work.


Multiple Lenses, Richer Pictures


I’m surprised (and then again, I’m not) by how much we can actually glean from language acquisition and assessment methods. Language assessment uses self-reflection, peer feedback, and teacher evaluation and observation in context, and nowadays, in many cases, AI-supported or system-captured perspectives as well. Each lens reveals a different aspect of growth.

We grow through self, peers, managers, and systems together.

Workplace assessment should do the same. Self-assessment captures ownership and self-awareness. Peer feedback highlights collaboration and the spaces where connection matters. Manager feedback and observations contextualize growth within the organizational frame, mapped to performance and overarching goals. And finally, system analytics surface patterns of working. Together, they create a fairer picture of skills in motion.
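To show what triangulation could look like in practice, here is a minimal sketch; the weights and numbers are illustrative assumptions. The point is that no single lens decides alone, and disagreement between lenses becomes a prompt for dialogue rather than a verdict.

```python
# Combine self, peer, manager, and system perspectives into one picture.
# Weights are illustrative; the design goal is that no single lens decides alone.
def triangulate(self_rating: float, peer: float, manager: float, system: float) -> dict:
    lenses = {"self": self_rating, "peer": peer, "manager": manager, "system": system}
    weights = {"self": 0.2, "peer": 0.3, "manager": 0.3, "system": 0.2}
    combined = sum(weights[k] * lenses[k] for k in lenses)
    spread = max(lenses.values()) - min(lenses.values())
    return {
        "combined": round(combined, 2),
        "disagreement": round(spread, 2),  # a large spread invites a conversation, not a verdict
        "lenses": lenses,
    }

print(triangulate(self_rating=3.5, peer=4.0, manager=3.0, system=3.8))
```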


Using AI in Assessment with Caution


Just as automated scoring in language testing risks penalizing accents, AI workplace tools can introduce bias or over-automation. AI can scaffold, record, or surface insights, but it should augment, not replace, human judgment.


Blueprint for Practice


If we take these lessons seriously, workplace assessments can look more like the best language classrooms: authentic tasks, tracked progress, and blended perspectives.


Imagine defining future-critical “can do” statements, designing role-plays and simulations, capturing not just outputs but learning journeys, applying rubrics that reveal growth, and triangulating self, peer, and system input.


That’s not a checklist; it’s a mindset. A commitment to measuring growth in motion rather than fitting people into snapshots.


Carson names the structural challenge, volatility. I extend the lens to the human experience: how we think, decide, and create alongside AI.


This is where metacognition, curiosity, and culture become anchors. Volatility may be inevitable, but with the right design, it can become fuel for resilience and reinvention.


What do you think?


Join in on the discussion via LinkedIn.








References


Future of Work & AI


World Economic Forum. (2025). Future of Jobs Report 2025.


OECD. (2021). OECD Skills Outlook 2021: Learning for Life.


Tankelevitch, L., Kewenig, V., Simkute, A., Scott, A. E., Sarkar, A., & Sellen, A. (2024). The Metacognitive Demands and Opportunities of Generative AI. CHI 2024.


National Institute of Standards and Technology (NIST). (2023). AI Risk Management Framework (RMF).


U.S. Equal Employment Opportunity Commission (EEOC). (2023). Assessing Adverse Impact in Software, Algorithms, and Artificial Intelligence used in Employment Selection Procedures.


Skills Volatility & Organizational Learning


Carson, B. (2025). Skills Volatility: Preparing for a Future of Rapid Task Change. LinkedIn Article/Post.


Organizational Psychology & Learning Science


Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the Workplace: Development of a Taxonomy of Adaptive Performance. Journal of Applied Psychology, 85(4), 612–624.


DeRue, D. S., Ashford, S. J., & Myers, C. G. (2012). Learning Agility: A Construct Whose Time Has Come. Industrial and Organizational Psychology, 5(3), 258–279.


Lievens, F., & Sackett, P. R. (2012). The Validity of Interpersonal Skills Assessment via Situational Judgment Tests for Predicting Academic Success and Job Performance. International Journal of Selection and Assessment, 20(4), 339–344.


Black, P., & Wiliam, D. (1998). Inside the Black Box: Raising Standards Through Classroom Assessment. Phi Delta Kappan, 80(2), 139–148.


Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77(1), 81–112.


Language Learning, Culture & Assessment


Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford University Press.


Canale, M., & Swain, M. (1980). Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing. Applied Linguistics, 1(1), 1–47.


ACTFL. (2012). ACTFL Proficiency Guidelines 2012.


Kramsch, C. (1993). Context and Culture in Language Teaching. Oxford University Press.


Council of Europe. (2001/2020 updates). Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR).


Poehner, M. E., & Infante, P. (2019). Mediated Development: A Vygotskian Approach to Transforming Second Language Learner Abilities. Frontiers in Psychology, 10, 853.

