What Scores From the Diagnostic Assessment of Reading
Introduction
Many existing learning and assessment systems generate scores, levels, and ranks to evaluate students' learning outcomes. This single-outcome style of evaluation has caused many problems, such as hurting students' self-esteem, heightening excessive competition, and increasing the learning burden, which are not conducive to the overall development of students (Lei, 2020). Therefore, new approaches are needed to improve outcome evaluation at the stage of basic education by keeping the evaluation content consistent with the curriculum standards, providing diagnostic information on students' strengths and weaknesses in learning, and offering evidence for schools to implement intervention measures.
Cognitive diagnostic models (CDMs) are confirmatory latent class models that combine cognitive theory and psychometric models to reveal the internal structure of a given ability by estimating an individual's knowledge and skill mastery status (Leighton and Gierl, 2007). CDMs can group examinees into similar latent classes and thus can compensate for the deficiency of the single-outcome results generated via classical test theory and traditional item response theory (Ravand and Robitzsch, 2018). Due to the need for formative evaluation and instruction, CDMs have become popular in educational settings. However, Ravand and Baghaei (2020) noted that, in recent decades, over 95% of CDM studies have been methodological or simulation-oriented, approximately 4% have been retrofitting studies, and less than 1% have focused on real diagnostic test development. Therefore, real CDM application studies have rarely found their way into educational systems, probably because of the lack of reliability and validity evidence and thus limited confidence in the information provided by CDMs (Sessoms and Henson, 2018). There is still a wide gap between CDMs and educational practices, and true CDM studies that develop diagnostic tests from scratch are urgently needed (Alderson, 2010; Sessoms and Henson, 2018; Ravand and Baghaei, 2020).
CDA Framework
One of the ultimate purposes of CDMs is to make inferences about which attributes an examinee has mastered using a diagnostic assessment. That is, CDA offers valuable information on the diagnostic quality of test items as well as the skill mastery patterns of test-takers, distinguishing those who have not mastered an item's required skills (non-masters) from those who have (masters). CDA frameworks have been proposed and optimized since Rupp and Templin (2008) published the first didactic introduction (de la Torre and Chiu, 2016; Ravand and Baghaei, 2020). In general, the structure of CDA depends on two major elements: the cognitive theory section and the CDM section.
The first step in CDA is to specify the latent attributes that a test-taker must possess to solve an item. The generic term "attribute" is defined as posited knowledge and thinking skills (de la Torre and Douglas, 2004) or a description of the processes, subskills, and strategies that are vital for the successful execution of a particular task (Leighton et al., 2004). Once the target attributes are defined via domain experts or think-aloud protocols, individual test items can be coded at the point of item development into a Q-matrix, an incidence matrix that transforms cognitive attributes into observable item response patterns (Tatsuoka, 1990; Li, 2011). It is essential to point out that diagnostic feedback is valid only when the attribute specification is complete, the items effectively measure the targeted attributes, and the attributes are correctly specified in the Q-matrix (Ravand and Baghaei, 2020). The quality of inferences about students is unlikely to be ensured in retrofitting studies, as they commonly include items that fail to adequately tap specific cognitive attributes (Gierl and Cui, 2008; Chen and de la Torre, 2014).
Then, CDMs are utilized to group examinees with similar skill mastery profiles and to evaluate the diagnostic capacity of items and tests, thus revealing the degree to which they can measure the postulated attributes (Ravand and Robitzsch, 2018). CDMs make various assumptions to reveal the internal structure of a given ability by estimating the interactions among attributes (Leighton and Gierl, 2007). Representative CDMs can mainly be classified into three types: compensatory, non-compensatory, and general models. In compensatory CDMs, mastering one or more targeted attributes can compensate for other attributes that are not mastered. The deterministic input noisy-or-gate model (DINO; Templin and Henson, 2006) and the additive CDM (A-CDM; de la Torre, 2011) are the most representative compensatory CDMs. In contrast, if an attribute has not been mastered, the probability of a correct response under a non-compensatory CDM will be low, as other mastered attributes cannot fully compensate for it. Representative non-compensatory CDMs include the deterministic input noisy-and-gate model (DINA; Haertel, 1989) and the reduced reparameterized unified model (R-RUM; Hartz, 2002). General CDMs allow the estimation of both compensatory and non-compensatory interactions among attributes within the same test, which has effectively led to the unification of various CDMs. The most famous general model is the generalized DINA model (G-DINA; de la Torre, 2011), which can be reduced to the abovementioned CDMs simply by constraining specific parameters to zero or changing link functions.
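To make these model families concrete, the following sketch (not part of the original study) fits the saturated G-DINA model and three of its reduced forms with the GDINA R package; the bundled simulated dataset sim10GDINA and all settings are illustrative assumptions rather than the authors' code.

```r
# A minimal sketch, assuming the GDINA package and its bundled simulated
# data (sim10GDINA: 10 items, 3 attributes) are installed.
library(GDINA)

dat <- sim10GDINA$simdat  # dichotomous responses (persons x items)
Q   <- sim10GDINA$simQ    # item-by-attribute Q-matrix

# Saturated general model: main effects plus all attribute interactions
fit_gdina <- GDINA(dat = dat, Q = Q, model = "GDINA", verbose = 0)

# Reduced models arise from constraining the item response function
fit_dina <- GDINA(dat = dat, Q = Q, model = "DINA", verbose = 0) # non-compensatory
fit_dino <- GDINA(dat = dat, Q = Q, model = "DINO", verbose = 0) # compensatory
fit_acdm <- GDINA(dat = dat, Q = Q, model = "ACDM", verbose = 0) # additive/compensatory
```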
Like other statistical models, a CDM has no value if it fits the data poorly (de la Torre and Lee, 2010). Specifically, the fit of CDMs can be ascertained in two ways. Relative fit indices evaluate whether the fit of one model differs significantly from that of another, and the model with smaller relative fit values is judged to fit the data better (Lei and Li, 2016). According to previous research, three well-known relative fit indices are also applicable to CDM studies: the −2 log-likelihood (−2LL), Akaike's information criterion (AIC), and the Bayesian information criterion (BIC; Lei and Li, 2016). In addition, absolute fit indices examine the adequacy of a single model (Liu et al., 2017). For instance, a model can be considered a good fit only if the value of the standardized root mean square residual (SRMSR) is less than 0.05 (Maydeu-Olivares, 2013; George and Robitzsch, 2015). Furthermore, the max χ2, the maximum of the χ2 test statistics of independence across all item pairs, was found to be sensitive in detecting model misfit (Chen and Thissen, 1997; Lei and Li, 2016). A significant p value for max χ2 suggests that the model fits poorly (George and Robitzsch, 2015).
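Continuing the sketch above, these indices can be obtained as follows. Note that modelfit() and itemfit() are GDINA-package summaries, while the max χ2 statistic referred to in this paper comes from the CDM package's output; treat this as an illustrative workflow rather than the study's exact code.

```r
# Sketch of relative and absolute fit checks for the fits above.
-2 * logLik(fit_dina)           # -2LL; smaller is better
AIC(fit_dina); AIC(fit_gdina)   # relative fit: information criteria
BIC(fit_dina); BIC(fit_gdina)
anova(fit_dina, fit_gdina)      # likelihood ratio test (DINA nested in G-DINA)

modelfit(fit_gdina)             # absolute fit: M2, RMSEA2, SRMSR
itemfit(fit_gdina)              # item-level residual statistics
```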
CDM Applications in Reading Tests
As one of the most frequently assessed skills, reading is considered a prerequisite for success in school and life (Kim and Wagner, 2015). As a complex, multicomponent ability, the internal structure of reading comprehension has been widely discussed (Barnes, 2015). For example, the construction-integration model regards reading as a meaning-construction process that involves interaction between reader and text and is influenced strongly by background knowledge (Kintsch, 1991; Snow, 2002). This model characterizes reading as an iterative and context-dependent process by which readers integrate information from a text (Compton and Pearson, 2016). In contrast, theorists of component models have pointed out that certain language knowledge, cognitive processes, and reading strategies make relatively independent contributions to reading comprehension (Cain et al., 2004; Cain, 2009). These models indicate that subcomponents of reading, including but not limited to vocabulary, syntax, morphology, semantics, inference, reasoning, discourse comprehension, working memory, and comprehension monitoring, are strong and persistent predictors for readers from childhood to adulthood (Aaron et al., 2008; Kim, 2017). Although many studies have found that Chinese reading and English reading share much in common (Mo, 1992; Chen et al., 1993), a consensus has not been reached on the number of subcomponents involved at different developmental stages. For example, Mo (1992) proposed that the structure of Chinese reading displays a "replacing developmental pattern." Factor analysis of a reading test battery suggested that 75% of the variance in grade-6 students' reading ability was explained by six factors: word decoding, integration and coherence, inference, retention and storage, fast reading, and transfer ability. As grades increased to the secondary and high school levels, the influences of the abovementioned factors remained important but were partly replaced by newly emerging, higher-level factors such as generalization ability, evaluation ability, and semantic inference ability.
Early research on reading cognitive diagnosis tried to explore the separability of reading ability and to identify whether there are relatively independent cognitive components, processes, or skills in reading. For example, Jang (2009) found that evidence from Markov chain Monte Carlo estimation supported the separability of reading into nine attributes, and that most LanguEdge test items had good diagnostic and discrimination power to measure the attributes well. Since then, CDMs have been applied to retrofit data from large-scale reading assessments such as the Progress in International Reading Literacy Study (PIRLS), the Programme for International Student Assessment (PISA), the Test of English as a Foreign Language (TOEFL), the Michigan English Language Assessment Battery (MELAB), and the Iranian National University Entrance Exam (e.g., Jang, 2005; Sawaki et al., 2009; Li, 2011; Chen and de la Torre, 2014; Chen and Chen, 2016; Ravand, 2016; Ravand and Robitzsch, 2018; Javidanmehr and Sarab, 2019; George and Robitzsch, 2021; Toprak-Yildiz, 2021). Many studies have used one preset CDM for reading tests, including the DINA (George and Robitzsch, 2021), Fusion (Jang, 2009; Li, 2011), LCDM (Toprak and Cakir, 2021), or G-DINA (Ravand, 2016) models. Only a few have compared multiple CDMs, finding that general models, such as G-DINA or the LCDM, had better fits for reading assessment data (Chen and Chen, 2016; Li et al., 2016; Ravand and Robitzsch, 2018; Javidanmehr and Sarab, 2019). In some cases, compensatory models such as the A-CDM or LLM have shown fits relatively close to those of general models (Li et al., 2016; Chen and de la Torre, 2014). Therefore, researchers have called for further comparison of general and reduced CDMs for optimal performance and for an understanding of the interaction mechanism among reading attributes.
In the context of real CDA applications in reading assessment, research is relatively scarce. One notable effort was conducted by Xie (2014), in which a reading comprehension assessment of modern Chinese prose for junior high school students was developed and validated. Fusion model results revealed an unstructured attribute hierarchy of Chinese reading composed of word decoding, formal schema, information extraction, information inference, content analysis, content generalization, and text evaluation. In addition, Toprak and Cakir (2021) examined the second language reading comprehension ability of Turkish adults with a cognitive diagnostic reading test developed under the CDA framework.
We collected a total of 15 relevant empirical reading studies across various age groups and language backgrounds and summarized a list of candidate attributes (see Supplementary Table 1 for details) and CDMs for the next phases of test development and analysis. This detailed review yielded six commonly specified cognitive attributes: vocabulary, syntax, retrieving information, making inferences, integration, and evaluation. Text-related attributes, such as narrative text, expository text, and discontinuous text, were also specified in studies of PIRLS and PISA. However, the abovementioned large-scale reading assessments were mostly designed and developed under a unidimensional item response theory approach. CDM implementations to extract diagnostic feedback from such retrofitted data may raise severe issues with model fit, item characteristics, and diagnostic inferences (Rupp and Templin, 2008; Gierl et al., 2010; Sessoms and Henson, 2018).
Primary school students are in the key stages of reading development, during which they need to transition from "learning to read" to "reading to learn" and begin to encounter difficulties with new comprehension requirements (Carlson et al., 2014). The need for suitable instruction and reading materials as scaffolding is felt mostly at the primary level; therefore, assessing the extent to which students' reading ability and subskills grow is valuable during their primary school years. However, students' reading ability grows so much over the course of their schooling that a single-booklet testing design for all grades is beset with problems (Brennan, 2006). Multilevel booklet designs are typically adopted, in which the contents and difficulty can be purposefully varied to balance test precision and efficiency. Nevertheless, to the best of our knowledge, all CDM implementations have been conducted on a single reading booklet for second language learners or for students in grade 4 and above. Several authors (e.g., Ravand, 2016; Sessoms and Henson, 2018) have briefly noted that CDM applications might be specific to particular characteristics of items or students. The construct equivalence of reading attributes and the generalizability of CDMs to other key developmental stages of reading remain unproven.
To address these issues, this study had three goals: (a) to illustrate how the cognitive diagnostic assessment (CDA) framework can be applied to develop the Diagnostic Chinese Reading Comprehension Assessment (DCRCA) for primary students at various key stages, (b) to evaluate the attribute equivalence and model fit adequacy of the CDMs for different developmental stages, and (c) to validate the diagnostic inferences of the DCRCA about primary students' reading subskills. To answer these questions, the study was mainly concerned with the structure of cognitive models of Chinese reading, the model-data fit evaluation of CDMs for three reading booklets, the validation of diagnostic psychometric properties, and the skill mastery profiles of primary students. This procedure can shed light on the limited CDA applications in reading test development and provide new methodologies for exploring reading skill structure. To the best of our knowledge, this is the first reading assessment whose CDM model fit, diagnostic reliability, and validity have been examined at various developmental stages.
Materials and Methods
The development and validation of the reading assessment followed the guidelines of the CDA framework (Ravand and Baghaei, 2020). The research processes are outlined in Figure 1.
Figure 1. An overview of the research processes.
Attribute Specification
Reading attributes were specified through multiple steps involving domain experts and test-takers, who participated in the determination of the core reading attributes for further curricular use.
Literature review: Candidate attributes were summarized by reviewing 15 empirically validated studies (see Supplementary Table 1), especially those on Chinese reading and the native language reading of primary students (Xie, 2014; Yun, 2017; George and Robitzsch, 2021; Toprak-Yildiz, 2021). This detailed review yielded six commonly specified cognitive attributes, including retrieving information, making inferences, integration and summarization, evaluation, vocabulary, and syntax, as well as three text-related attributes: narrative text, expository text, and discontinuous text.
Expert panel judgments: As reading attributes are highly dependent on the characteristics of Chinese reading and the framework of reading education, researchers invited five experts in reading assessment or instruction to obtain their judgments of large-scale reading assessments and the Chinese Language Curriculum Standard for Compulsory Education (abbreviated as the curriculum standard). The "syntax" attribute was first excluded because the curriculum standard does not advocate any grammar teaching or evaluation at the primary school level but emphasizes helping students comprehend naturally occurring materials in a real language environment (Ministry of Education, 2011). Vocabulary is considered as important as reading comprehension at the primary level; therefore, this skill was excluded here and evaluated by the Chinese Character Recognition Assessment in the test battery. Infrequent attributes were also discussed case by case. For example, formal schema (Xie, 2014) was excluded because it might conflate text evaluation with text-type attributes. The importance of literary text (i.e., narrative text and poetry) at the primary level has been emphasized by the curriculum standard as well as by large-scale assessments, including PIRLS and PISA. However, inconsistencies in other text types have been observed. The curriculum standard merges expository text (drawn from PIRLS) and discontinuous text (drawn from PISA) into practical text, as they have similarities in their reading objectives and strategies (Compulsory Education Curriculum and Textbook Committee of the Ministry of Education, 2012). After discussion, all experts agreed that this inconsistency was worth further evaluation against empirical results.
Student think-aloud protocols: To clarify the cognitive procedures that test-takers go through, 15 students from grades 2 to 6 were selected for think-aloud protocols. These students verbalized their thoughts while solving sample items. Based on their answers and oral explanations, researchers identified clues to cognitive processes with an eye on the attributes inferred from the previous procedures. Overall, researchers specified and defined an initial set of eight attributes that might be crucial for primary school students (Table 1).
Table 1. Definitions of the initial reading attributes.
Test Development
According to the curriculum standard, reading instruction can be divided into three key stages at the primary level: key stage one covers grades 1 to 2, key stage two covers grades 3 to 4, and key stage three covers grades 5 to 6. Therefore, three booklets of reading diagnosis items were compiled, one for students at each key stage. An initial common Q-matrix for the three booklets was intentionally designed, as each item reflects one of the four cognitive processes of reading comprehension (α1–α4) and one text-related attribute (α5, α6a, and α6b). The genre and complexity of texts were controlled, as they are important factors in assessing reading comprehension (Collins et al., 2020). Fragments of literary texts (including fairy tales, stories, fables, narratives, novels, and children's poems) and practical texts (including explanatory texts, simple argumentative articles, and discontinuous texts) were carefully selected and modified as item stems. A Chinese readability formula (Liu et al., 2021) was adopted to calculate the length, token types, lexical difficulty, function word ratio, and overall difficulty of each text. The average text length of the three booklets ranges from 150.60 to 278.57 characters, and the average text difficulty levels for the three booklets are 3.38, 3.69, and 4.40 (for details, please see Supplementary Table 2). Therefore, the three booklets are composed of conceptually appropriate short texts with increasing complexity.
The item generation procedures were as follows: mapping cognitive and text-type attributes to compile 73 draft multiple-choice items, an expert review to cross-validate the Q-matrix, and item refinement following the expert review. Then, after the first pilot using two booklets for grade 1–2 and grade 3–6 students (n = 378), 17 problematic items were removed according to the item discrimination index (item-total correlation < 0.19), and several items were modified. Grade 1 students were excluded from further study because they could not adapt to the computer assessment procedures. The second pilot included 56 items in three booklets, and each booklet consisted of 18–20 items. Pilot data were obtained from 5,949 grade 2–6 students. Both classical test theory and two-parameter logistic (2PL) item response model analyses were conducted. Five items with unsatisfactory discrimination (item-total correlation < 0.30 or IRT discrimination < 0.50) and three items with moderate to large differential item functioning on gender (effect size > 0.88) were removed. A total of 48 items were retained, and four items were modified or rearranged for facility (passing rates by grade < 0.20 or > 0.90). The four cognitive attributes were intentionally balanced in testing frequency (4 to 5 times per attribute), and the proportions of literary and practical texts were similar across the three booklets. Therefore, as shown in the last line of Table 2, the total testing frequencies of the attributes were similar in the three final booklets, with slight differences in item order and proportions of text type.
Table 2. Initial Q-Matrices.
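For illustration, a Q-matrix of the kind shown in Table 2 can be encoded directly in R; the items, attribute names, and 0/1 entries below are hypothetical stand-ins rather than the actual DCRCA items.

```r
# Hypothetical fragment of a booklet Q-matrix: rows are items, columns are
# attributes (four cognitive processes plus two text types); a 1 means the
# item requires that attribute. All entries are illustrative only.
Q <- matrix(c(
  # retrieve infer integrate evaluate literary practical
    1,       0,    0,        0,       1,       0,   # item 1
    0,       1,    0,        0,       1,       0,   # item 2
    0,       0,    1,        0,       0,       1,   # item 3
    0,       0,    0,        1,       0,       1    # item 4
), ncol = 6, byrow = TRUE)
colnames(Q) <- c("retrieve", "infer", "integrate", "evaluate",
                 "literary", "practical")
```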
Measures
The Diagnostic Chinese Reading Comprehension Assessment (DCRCA)
The DCRCA was developed as a multiple-choice, computer-based, online reading comprehension assessment to identify the cognitive processes used while understanding literary or practical short passages. The final DCRCA for grades 2 to 6 comprises three booklets, and each booklet contains 16 items. These items required students to answer multiple-choice questions on their comprehension of short passages. Students' responses were scored dichotomously (0 = incorrect, 1 = correct) for each item. As already described, each item was intentionally constructed by experts to align with precisely one of the four processes of reading comprehension (α1–α4) and one text-related attribute (α5–α6). The total testing frequencies of the attributes were similar in the three final booklets, while the short passages in the three booklets were compiled with increasing complexity. Cronbach's α values for the three booklets were 0.82, 0.71, and 0.64.
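As a small illustration of how such booklet reliabilities can be computed, the ltm package (used later for the 2PL analyses) provides a Cronbach's α routine; dat below stands for a hypothetical persons-by-items 0/1 response matrix for one booklet.

```r
# Sketch, assuming `dat` holds one booklet's dichotomous responses.
library(ltm)
cronbach.alpha(dat)                      # point estimate of alpha
cronbach.alpha(dat, CI = TRUE, B = 500)  # optional bootstrap confidence interval
```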
The Chinese Word Recognition Assessment
The Chinese word recognition assessment was adopted for validation purposes. It was adapted from the Chinese character recognition task (Li et al., 2012) to measure students' word recognition skills. Students listened to the sound of a word composed of a given Chinese character and then chose the correct character from three distractor options. A total of 150 character items were collected based on Chinese language textbooks (Shu et al., 2003). The maximum score of this assessment was 150. The internal reliability of the assessment was 0.91.
Sample
The study was conducted as part of a regional reading education project in Changchun City, China. The project aims to investigate the development of primary students' reading ability, recommend books suitable for reading, and provide corresponding reading courses. A total of 21,466 grade 2 to grade 6 students from 20 primary schools completed the assessments in November 2020, accounting for 94.1% of the total sample. Students were aged from 7.3 to 13.2 years, and 52.4% were male.
Procedure
Considering the large number of students participating in the DCRCA, the organization and implementation were completed by the Chinese language teachers and computer teachers of each class. Researchers trained all teachers and provided them with standardized assessment manuals. The assessments were administered collectively via an online web page, which presented one item at a time to students. The web page set all items as compulsory, so there were no missing values in the formal test as long as the student submitted successfully. Considering primary students' computer proficiency, students only needed to click medium-sized options with a mouse to answer all questions. Students took approximately 20 min to successively complete the test battery, including the Chinese Word Recognition Assessment and the DCRCA. All students received an assessment report with a recommended reading list and learning suggestions one month after testing.
Analysis
Data were analyzed using RStudio (R Core Team, 2021). As a correctly specified Q-matrix is considered a prerequisite for model-data fit and low bias in diagnostic classifications (Rupp and Templin, 2008; Kunina-Habenicht et al., 2012), both theoretical and empirical procedures (de la Torre and Chiu, 2016) were applied iteratively to obtain the best attribute numbers and the best item-attribute relationships using the "GDINA" package, version 2.8.0 (Ma and de la Torre, 2020). The "CDM" package, version 7.5-15, was used for fitting CDMs (e.g., DINA, DINO, R-RUM, A-CDM, and G-DINA) based on the MMLE/EM algorithm (George et al., 2016; Robitzsch and George, 2019). The CDM package allows the estimation of rich sets of models, fit indices, and diagnostic validity with various emphases, which can help researchers find the most appropriate model. Two-parameter logistic item response theory (2PL-IRT) statistics were calculated using the ltm package (Rizopoulos, 2006).
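The toolchain just described might be wired together as in the following sketch; dat and Q stand for one booklet's responses and Q-matrix, and the calls shown are standard package entry points rather than the authors' actual scripts.

```r
# Sketch of the analysis toolchain, assuming `dat` and `Q` are loaded.
library(GDINA)  # empirical Q-matrix validation (de la Torre & Chiu, 2016)
library(CDM)    # CDM estimation via the MMLE/EM algorithm
library(ltm)    # 2PL IRT statistics

fit_g <- GDINA::GDINA(dat, Q, model = "GDINA", verbose = 0)
GDINA::Qval(fit_g, method = "PVAF", eps = 0.95)  # suggested q-vector revisions

fit_cdm <- CDM::gdina(dat, q.matrix = Q, rule = "GDINA")  # saturated G-DINA
fit_2pl <- ltm::ltm(dat ~ z1)  # item discriminations and difficulties
```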
Results
Q-Matrix Validation
Three types of Q-matrices were created for each booklet to evaluate the applicability of the attributes. Q1 contained only the four commonly agreed-upon cognitive attributes (α1–α4), Q2 added two text-type attributes (α5 and α6) to Q1 with reference to the curriculum standard, and Q3 added three text-type attributes (α5, α6a, and α6b) to Q1 with reference to PISA and PIRLS. These Q-matrices were compared based on the model-data fit of the G-DINA model and likelihood ratio tests (see Table 3).
Table 3. Model-data fit results for Q-matrix validation.
The SRMSR values of all Q-matrices were acceptable (below the 0.05 rule of thumb suggested by Maydeu-Olivares, 2013), while none of the Q1 matrices could be accepted based on the max χ2. The -2LL and AIC values suggested a direction of improvement from Q1 to Q2, while the fit values of Q2 and Q3 were close in all booklets. Likelihood ratio tests were conducted between adjacent Q-matrices within each booklet. We found that (1) all Q2 and Q3 fits were significantly better than the Q1 fits (p < 0.001); (2) the -2LL and AIC differences between Q2 and Q3 were small and unstable, as p values fluctuated around significance boundaries for booklets KS1 to KS3 (p ≈ 0.006, 1.00, and 0.049, respectively); and (3) the BIC consistently favored Q2 over Q3, as Q2 was more compact and efficient. In summary, the fit indices showed similarities across booklets, suggesting that the attribute structure was the same across key stages. Based on the above results, we chose Q2 as the basis for finalizing the item-attribute relationships.
An empirical Q-matrix validation procedure was conducted on all Q2 matrices to compare the proportion of variance accounted for (PVAF) by plausible q-vectors for a given item (de la Torre and Chiu, 2016). A given q-vector was deemed correct if it was the simplest vector with a PVAF above 0.95. The validation results suggested no modification for booklet KS2 or KS3 and generated suggested q-vectors for items 6 and 15 in booklet KS1. This indicated a relatively high attribute-wise agreement between the provisional and data-driven Q-matrices across all booklets. After expert revisions and iterative modeling, researchers concluded that the suggested changes in the Q-matrix were consistent with what the items truly assessed. The likelihood ratio test suggested that the fit of the finalized Q2 was significantly better than that of the initial Q2 and slightly better than that of Q3 for booklet KS1. The final Q-matrices are given in Table 4.
Table 4. Final Q-Matrices.
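A sketch of the PVAF-based validation step described above, continuing from the saturated G-DINA fit fit_g in the Analysis sketch; eps = 0.95 mirrors the 0.95 PVAF cutoff, and item 6 is used only as an example of a flagged item.

```r
# PVAF-based empirical Q-matrix validation (de la Torre & Chiu, 2016).
qv <- Qval(fit_g, method = "PVAF", eps = 0.95)
qv$sug.Q            # data-driven suggested Q-matrix for expert review
plot(qv, item = 6)  # mesa plot for inspecting one flagged item
```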
Model Comparison
To select the optimal CDM for the whole assessment and to reveal the relationships among reading attributes, we compared five representative CDMs, namely the DINA, DINO, R-RUM, A-CDM, and G-DINA models, for each booklet using the final Q-matrices. As Table 5 shows, the five CDMs performed stably across booklets. The AIC and -2LL values for the G-DINA models were the lowest in the three booklets, followed by those of the A-CDM and R-RUM models, while the values of the more parsimonious DINO and DINA models were observably worse. The BIC favored the A-CDM, G-DINA, and A-CDM in booklets KS1 to KS3, respectively. Likelihood ratio tests suggested that none of the other CDMs fit as well as the G-DINA model. For the absolute fit values, the SRMSR values of all CDMs were below 0.05. However, only the G-DINA model had insignificant max χ2 values in all cases, indicating a good fit to the data, while the DINO and DINA models were consistently rejected by the significance of max χ2 in all cases. It is evident that the G-DINA model fits the entire assessment data reasonably better than the more parsimonious reduced models.
Table 5. Model fit comparison of CDMs using the final Q-matrices.
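The per-booklet comparison can be scripted as below; this is a schematic re-creation with the GDINA package (the study itself fit these models with the CDM package), so the tooling shown is an assumption rather than the original code.

```r
# Sketch: refit the five candidate CDMs on one booklet and compare.
library(GDINA)
models <- c("DINA", "DINO", "RRUM", "ACDM", "GDINA")
fits <- lapply(models, function(m) GDINA(dat, Q, model = m, verbose = 0))
names(fits) <- models

data.frame(AIC = sapply(fits, AIC), BIC = sapply(fits, BIC))
anova(fits$ACDM, fits$GDINA)  # LR test of a reduced model against G-DINA
```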
Reliability and Validity
Pattern accuracy (Pa) and pattern consistency (Pc) indices show the degree to which examinees are accurately and consistently classified as masters and non-masters (Cui et al., 2012). Therefore, they were adopted as indicators of reliability in Table 6. The Pa values for each separate attribute were between 0.68 and 0.95, and the Pc values were between 0.63 and 0.92. Despite a lack of consensus on general guidelines for what constitutes high or acceptable reliability (Templin and Bradshaw, 2013), these results indicate an above-adequate capacity for measuring students' reading attributes.
Table 6. Mastery classification reliability.
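Under the assumption that the model was fit with the CDM package (fit_cdm in the Analysis sketch), pattern accuracy and consistency in the spirit of Cui et al. (2012) can be estimated as follows; the function and its n.sims argument are CDM-package facilities, not code from the study.

```r
# Sketch of simulation-based Pa/Pc estimation for a CDM-package fit.
library(CDM)
acc <- cdm.est.class.accuracy(fit_cdm, n.sims = 1000)
acc  # attribute- and pattern-level accuracy (Pa) and consistency (Pc)
```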
Evidence of internal validity was provided using item mastery plots to quantify the discriminatory and diagnostic capacities of the test items (Roussos et al., 2007; von Davier and Lee, 2019). Figure 2 shows the item proportion correct for masters versus non-masters. The average item proportion correct difference was 0.53, and the differences for 41 of the 48 items were greater than 0.40. This high value indicates a good fit between models and data, suggesting a strong diagnostic ability of the items and the DCRCA. In addition, this provided a valuable tool for finding poor items. For example, the differences for items 5 and 9 in booklet KS2 were smaller than 0.30. An in-depth examination suggested that these items were difficult; therefore, the item proportion correct for masters tended to be close to that for non-masters.
Figure 2. Item mastery plots.
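The master/non-master contrast behind Figure 2 can be re-created roughly as follows, assuming the GDINA-package fit fit_g, response matrix dat, and Q-matrix Q from the earlier sketches; the EAP-based mastery rule and the helper code are illustrative approximations of the plotted quantities.

```r
# Illustrative sketch: per-item proportion-correct gap between masters
# and non-masters of each item's required attributes.
skills <- personparm(fit_g, what = "EAP")     # persons x attributes, 0/1
is_master <- sapply(seq_len(nrow(Q)), function(j) {
  req <- which(Q[j, ] == 1)                   # attributes item j requires
  rowSums(skills[, req, drop = FALSE]) == length(req)
})                                            # persons x items, TRUE = master

p_gap <- sapply(seq_len(ncol(dat)), function(j) {
  mean(dat[is_master[, j], j]) - mean(dat[!is_master[, j], j])
})
round(p_gap, 2)  # gaps above ~0.40 flag strongly diagnostic items
```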
To further verify the external validity, the correlations between scores on the DCRCA and the Chinese word recognition test were calculated. Word recognition scores were positively correlated with reading scores [KS1, r(4251) = 0.69, p < 0.001; KS2, r(8863) = 0.65, p < 0.001; KS3, r(8352) = 0.57, p < 0.001]. To summarize, the results suggest that the reliability and validity of the DCRCA are satisfactory.
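The external validity check reduces to a standard Pearson test per booklet; reading_total and word_recog below are hypothetical per-student totals, not variables from the study's data files.

```r
# Sketch of the external validity check for one booklet's sample.
cor.test(reading_total, word_recog)  # Pearson r, with df = n - 2
```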
Skill Profiles
CDMs classify test-takers into latent classes, which represent skill mastery/non-mastery profiles for the attributes specified in the Q-matrix. With the six-attribute Q-matrix structure, there are 64 theoretically possible latent classes (2^6). For space considerations, only 15 skill profiles of the grade 2 students are presented in Table 7, as 49 classes showed posterior probabilities lower than 0.1%, suggesting that these skill classes may not be relevant to the data. Among the remaining 15 classes, the latent class [111111], mastery of all the subskills, had the highest posterior probability, followed by [000000], mastery of none of the subskills. The CDM revealed that the other dominant latent classes were [000011] and [111100], to which 27.15% of the test-takers belong. The profile [000011] might reflect children's knowledge of and experience in reading the specific text genres in the given items, while the profile [111100] might reflect children's skills and experience in answering specific reading tasks. This result supports the RAND report (RAND Reading Study Group, 2002) in that mastery of the first four cognitive attributes and of the last two text attributes may be relatively independent sources of variance in reading comprehension scores.
Table 7. Latent classes and posterior probabilities.
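The latent-class proportions behind Table 7 can be extracted from a GDINA-package fit as sketched below; fit_g and the display code are illustrative assumptions.

```r
# Sketch: dominant skill profiles and their posterior probabilities.
post <- extract(fit_g, what = "posterior.prob")  # 1 x 2^K class proportions
patterns <- attributepattern(6)                  # all 64 profiles for K = 6
tab <- data.frame(profile = apply(patterns, 1, paste0, collapse = ""),
                  prob = round(as.numeric(post), 4))
head(tab[order(-tab$prob), ], 4)                 # e.g., 111111, 000000, ...
```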
Discussion and Conclusion
This study developed and validated an instrument for diagnosing the strengths and weaknesses of Chinese reading comprehension ability at the primary level. In response to criticism about the lack of true CDA research for educational purposes, the DCRCA was designed to meet the requirements of the Chinese curriculum standard under the CDA framework proposed by Ravand and Baghaei (2020). Multiple steps were applied to maximize the diagnostic capacity and effectiveness of the DCRCA, including (1) gathering information about previous reading models and assessments; (2) specifying attribute lists based on the literature, student think-aloud protocols, and expert review; (3) standardized test development and pilots; (4) empirical comparisons and refinements of Q-matrices and CDMs; and (5) reliability and validity analyses using the formal test data. The results indicate that the overall quality of the DCRCA is satisfactory and that the diagnostic classifications are reliable, accurate, and valid.
Following multiple procedures of attribute specification, model-data fit comparison, and empirical validation, the Q-matrix construction yielded six final reading attributes: four cognitive attributes that are consistent with cognitive processing and previous empirical studies of reading, and two text-related attributes that were synthesized from large-scale assessment frameworks and the Chinese curriculum standard. Adding the text-related attributes significantly improved the model-data fit of the Q-matrices, implying that genre or background knowledge of different text types might be vital to successful reading. The literary text attribute is consistent with previous research, while the practical text attribute is newly extracted in CDM studies on reading. Our attempt to combine the expository text and discontinuous text attributes may reveal their similarity in reading strategies and is worth further investigation. The validation of text-related attributes also improved the application value and scope of the DCRCA, because these attributes come from the experiences of educators and thus might be easier to recognize and train (Perfetti et al., 2005). Besides, the six-attribute structure has been scrutinized as a theoretical framework of reading comprehension for students at different developmental stages. This result provides evidence regarding the construct of primary-level Chinese reading and the DCRCA from both theoretical and empirical perspectives.
The choice of CDM is critical in all CDA studies, as the optimal model not only caters to the diagnostic demands of the assessment but also reveals the interrelationships of attributes in the given domain. Five representative CDMs were compared, and the superiority of the G-DINA model was supported across all booklets and model-data fit indices. Therefore, it is safe to analyze the DCRCA with the saturated G-DINA model, which appeared to be flexible in accommodating diverse relationships among reading skills (Chen and Chen, 2016; Li et al., 2016; Ravand, 2016). The A-CDM showed the level of fit closest to that of the G-DINA model. From a theoretical perspective, the A-CDM can be viewed as a special case of the G-DINA model that estimates only the main effects of attributes, as the difference between the two models is that the G-DINA model additionally estimates interactions among latent skills (de la Torre, 2011). Therefore, given that the majority of the DCRCA items were designed to map one cognitive process and one text type of reading, our findings support Stanovich's (1980) interactive view of reading, which holds that both cognitive processes and text-related attributes are crucial and interactive in the successful execution of reading comprehension.
In addition, our results showed that the absolute fit indices preferred neither compensatory (A-CDM and DINO) nor non-compensatory (R-RUM and DINA) types of CDMs, and the max χ2 rejected all the reduced models in booklet KS2. Consequently, the current results are not sufficient to assert that the relationships among reading attributes are either compensatory or non-compensatory. This is consistent with the findings of Jang (2009), Li et al. (2016), and Javidanmehr and Sarab (2019), who also argued for the coexistence of compensatory and non-compensatory relationships among latent reading subcomponents.
The present study examined the diagnostic reliability and validity of the DCRCA. Reliability evidence is generally considered essential support for interpreting test results. The pattern accuracy and consistency indices (Cui et al., 2012) suggested that the DCRCA reliably measures multiple reading attributes. Validity analyses are rarely conducted, with less than 22% of studies providing such information according to a literature review (Sessoms and Henson, 2018). Therefore, construct, internal, and external validity evidence is provided for the Q-matrix and the DCRCA. The Q-matrix validation results suggest that the provisional Q-matrices have an approximately 95% attribute-wise agreement rate across booklets, which provides strong evidence for the construct validity of the Q-matrix constructions (Deonovic et al., 2019). The internal validity evidence showed that the average proportion correct differences were sufficiently large for most of the test items, indicating that these items have satisfactory diagnostic capacity to differentiate masters from non-masters of reading. The mean score differences of only 4% of the items were less than 0.3, much lower than the proportion of 23% in retrofitted studies (Jang, 2009). This might be because retrofitting studies had to include many items that were weakly associated with the targeted attributes. The possible presence of nondiagnostic items could lead to critical issues in the validity of measures of skill competencies, and thus the test inferences might be limited.
The present study contributes to instructional practices at the elementary school level, as the assessment can provide reliable, valid, and useful diagnostic information. This is the first empirical study that attempts to provide evidence of construct invariance in diagnosing Chinese reading attributes at different primary grades. As reading assessment can function as formative assessment, such diagnostic feedback could be further utilized by teachers and educators for monitoring learning progressions and providing remedial instruction in reading courses and programs. However, some limitations are also worth enumerating. First, the present research did not examine how diagnostic feedback is perceived and utilized by students and teachers in classroom settings. More studies are needed to reveal the influences of CDA applications. Second, as the DCRCA was not equated vertically, the attribute mastery states can be compared only within each key stage. Future studies are needed to apply appropriate longitudinal CDMs (Zhan et al., 2019) or vertical equating methods (von Davier et al., 2008) for CDA to investigate the developmental course of students' reading attributes. Third, the present study did not include a sufficient number of items to assess attributes α6a and α6b. The Q-matrices may not be exhaustive enough to capture all aspects of reading comprehension, which probably limits the present study. Therefore, caution should be taken in interpreting our final results, and explorations of a more balanced Q-matrix construction are needed in the future. Finally, although the results related to model fit and item parameters were fairly acceptable, future research should seek to improve the psychometric properties to make the study inferences more reliable. Therefore, this study was only a start. A deeper understanding of CDM applications may be derived by interpreting the dominant skill classes as learning states and combinations of skill classes as learning paths and learning progressions (Wu et al., 2020). Future studies are needed to help instructors design suitable learning plans with fine-grained diagnostic reports of students. In addition, more well-designed items can be generated and scaled as formative and summative assessments to satisfy expectations from the curriculum standard. With the assistance of the DCRCA, teachers could design their own classroom reading materials and assessments around the learning objectives that they wish students to attain.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethics Statement
The studies involving human participants were reviewed and approved by the Ethics and Human Safety Committee, Faculty of Psychology, Beijing Normal University. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.
Author Contributions
YL and JL conceived the study. YL and MZ organized the pilots and analyzed the original data. MZ developed the test items and conducted the think-aloud protocols. YL collected the formal test data, analyzed the data, and wrote the manuscript. JL provided technical advice. All authors contributed to the article and approved the submitted version.
Funding
This work was supported by the National Natural Science Foundation of China (31861143039) and the National Key R&D Program of China (2019YFA0709503).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.786612/full#supplementary-material
References
Aaron, P. G., Joshi, R. M., Gooden, R., and Bentum, K. E. (2008). Diagnosis and treatment of reading disabilities based on the component model of reading: an alternative to the discrepancy model of LD. J. Learn. Disabil. 41, 67–84. doi: 10.1177/0022219407310838
Alderson, J. C. (2010). "Cognitive diagnosis and Q-matrices in language assessment": a commentary. Lang. Assess. Q. 7, 96–103. doi: 10.1080/15434300903426748
Barnes, M. A. (2015). "What do models of reading comprehension and its development have to contribute to a science of comprehension instruction and assessment for adolescents?" in Improving Reading Comprehension of Middle and High School Students. eds. Santi, K. L., and Reed, D. K. (Cham: Springer International Publishing), 1–18.
Brennan, R. L. (2006). Educational Measurement. 4th Edn. Rowman & Littlefield Publishers.
Cain, K. (2009). Making Sense of Text: Skills That Support Text Comprehension and Its Development. Perspectives on Language and Literacy. Springer.
Cain, K., Oakhill, J., and Bryant, P. (2004). Children's reading comprehension ability: concurrent prediction by working memory, verbal ability, and component skills. J. Educ. Psychol. 96, 31–42. doi: 10.1037/0022-0663.96.1.31
Carlson, S. E., Seipel, B., and McMaster, K. (2014). Development of a new reading comprehension assessment: identifying comprehension differences among readers. Learn. Individ. Differ. 32, 40–53. doi: 10.1016/j.lindif.2014.03.003
Chen, H., and Chen, J. (2016). Retrofitting non-cognitive-diagnostic reading assessment under the generalized DINA model framework. Lang. Assess. Q. 13, 218–230. doi: 10.1080/15434303.2016.1210610
Chen, J., and de la Torre, J. (2014). A procedure for diagnostically modeling extant large-scale assessment data: the case of the Programme for International Student Assessment in reading. Psychology 5:1967. doi: 10.4236/psych.2014.518200
Chen, W.-H., and Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. J. Educ. Behav. Stat. 22, 265–289. doi: 10.3102/10769986022003265
Collins, A. A., Compton, D. L., Lindström, E. R., and Gilbert, J. K. (2020). Performance variations across reading comprehension assessments: examining the unique contributions of text, activity, and reader. Read. Writ. 33, 605–634. doi: 10.1007/s11145-019-09972-5
Common Core State Standards Initiative (2010). Common Core State Standards for English Language Arts & Literacy in History/Social Studies, Science, and Technical Subjects. Common Core State Standards Initiative.
Compulsory Education Curriculum and Textbook Committee of the Ministry of Education (2012). Interpretation of the Chinese Language Curriculum Standard for Compulsory Education (2011 Edition) (in Chinese). Beijing: Higher Education Press.
Compton, D. L., and Pearson, P. D. (2016). Identifying robust variations associated with reading comprehension skill: the search for pressure points. J. Res. Educ. Eff. 9, 223–231. doi: 10.1080/19345747.2016.1149007
Cui, Y., Gierl, M. J., and Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. J. Educ. Meas. 49, 19–38. doi: 10.1111/j.1745-3984.2011.00158.x
de la Torre, J., and Lee, Y.-S. (2010). A note on the invariance of the DINA model parameters. J. Educ. Meas. 47, 115–127. doi: 10.1111/j.1745-3984.2009.00102.x
Deonovic, B., Chopade, P., Yudelson, M., de la Torre, J., and von Davier, A. A. (2019). "Application of cognitive diagnostic models to learning and assessment systems," in Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages. eds. von Davier, M., and Lee, Y.-S. (Cham: Springer International Publishing).
George, A. C., and Robitzsch, A. (2015). Cognitive diagnosis models in R: a didactic. Quant. Meth. Psych. 11, 189–205. doi: 10.20982/tqmp.11.3.p189
George, A. C., and Robitzsch, A. (2021). Validating theoretical assumptions about reading with cognitive diagnosis models. Int. J. Test. 21, 105–129. doi: 10.1080/15305058.2021.1931238
George, A. C., Robitzsch, A., Kiefer, T., Groß, J., and Ünlü, A. (2016). The R package CDM for cognitive diagnosis models. J. Stat. Softw. 74, 1–24. doi: 10.18637/jss.v074.i02
Gierl, M. J., Alves, C., and Majeau, R. T. (2010). Using the attribute hierarchy method to make diagnostic inferences about examinees' knowledge and skills in mathematics: an operational implementation of cognitive diagnostic assessment. Int. J. Test. 10, 318–341. doi: 10.1080/15305058.2010.509554
Gierl, M. J., and Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement 6, 263–268. doi: 10.1080/15366360802497762
Grabe, W. (2009). Reading in a Second Language: Moving From Theory to Practice. Cambridge: Cambridge University Press.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. J. Educ. Meas. 26, 301–321. doi: 10.1111/j.1745-3984.1989.tb00336.x
Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: blending theory with practicality. [doctoral dissertation]. University of Illinois at Urbana-Champaign.
Jang, E. E. (2005). A validity narrative: effects of reading skills diagnosis on teaching and learning in the context of NG TOEFL. [doctoral dissertation]. University of Illinois at Urbana-Champaign.
Jang, E. E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: validity arguments for fusion model application to LanguEdge assessment. Lang. Test. 26, 31–73. doi: 10.1177/0265532208097336
Javidanmehr, Z., and Sarab, M. R. A. (2019). Retrofitting non-diagnostic reading comprehension assessment: application of the G-DINA model to a high stakes reading comprehension test. Lang. Assess. Q. 16, 294–311. doi: 10.1080/15434303.2019.1654479
Kim, Y.-S. G. (2017). Why the simple view of reading is not simplistic: unpacking component skills of reading using a direct and indirect effect model of reading (DIER). Sci. Stud. Read. 21, 310–333. doi: 10.1080/10888438.2017.1291643
Kim, Y.-S. G., and Wagner, R. K. (2015). Text (oral) reading fluency as a construct in reading development: an investigation of its mediating role for children from grades 1 to 4. Sci. Stud. Read. 19, 224–242. doi: 10.1080/10888438.2015.1007375
Kintsch, W. (1991). "The role of knowledge in discourse comprehension: a construction-integration model," in Advances in Psychology. Vol. 79. eds. Stelmach, G. E., and Vroon, P. A. (North-Holland), 107–153.
Kunina-Habenicht, O., Rupp, A. A., and Wilhelm, O. (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. J. Educ. Meas. 49, 59–81. doi: 10.1111/j.1745-3984.2011.00160.x
Lei, X. (2020). On the problems of educational evaluation reform in China. China Examinations, No. 341(09), 13–17.
Lei, P.-W., and Li, H. (2016). Performance of fit indices in choosing correct cognitive diagnostic models and Q-matrices. Appl. Psychol. Meas. 40, 405–417. doi: 10.1177/0146621616647954
Leighton, J. P., and Gierl, M. J. (2007). Cognitive Diagnostic Assessment for Education: Theory and Applications. 1st Edn. Cambridge: Cambridge University Press.
Leighton, J. P., Gierl, M. J., and Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: a variation on Tatsuoka's rule-space approach. J. Educ. Meas. 41, 205–237. doi: 10.1111/j.1745-3984.2004.tb01163.x
Li, H., Hunter, C. V., and Lei, P.-W. (2016). The selection of cognitive diagnostic models for a reading comprehension test. Lang. Test. 33, 391–409. doi: 10.1177/0265532215590848
Li, H., Shu, H., McBride-Chang, C., Liu, H., and Peng, H. (2012). Chinese children's character recognition: visuo-orthographic, phonological processing and morphological skills. J. Res. Read. 35, 287–307. doi: 10.1111/j.1467-9817.2010.01460.x
Liu, R., Huggins-Manley, A. C., and Bulut, O. (2017). Retrofitting diagnostic classification models to responses from IRT-based assessment forms. Educ. Psychol. Meas. 78, 357–383. doi: 10.1177/0013164416685599
Liu, M., Li, Y., Wang, X., Gan, L., and Li, H. (2021). Leveled reading for primary students: construction and evaluation of Chinese readability formulas based on textbooks. Appl. Linguis. 2, 116–126. doi: 10.16499/j.cnki.1003-5397.2021.02.010
Ma, W., and de la Torre, J. (2020). GDINA: an R package for cognitive diagnosis modeling. J. Stat. Softw. 93, 1–26. doi: 10.18637/jss.v093.i14
Maydeu-Olivares, A. (2013). Goodness-of-fit assessment of item response theory models. Measurement 11, 71–101. doi: 10.1080/15366367.2013.831680
Ministry of Education (2011). The Chinese Language Curriculum Standard for Compulsory Education. 2011th Edn. Beijing: Beijing Normal University Press.
Mo, L. (1992). Study on the characteristics of the development of the Chinese reading ability structure of middle and primary school students. Acta Psychol. Sin. 24, 12–20.
O'Reilly, T., and Sheehan, K. M. (2009). Cognitively based assessment of, for, and as learning: a framework for assessing reading competency. ETS Res. Rep. Ser. 2009, 1–43. doi: 10.1002/j.2333-8504.2009.tb02183.x
Perfetti, C. A., Landi, N., and Oakhill, J. (2005). "The acquisition of reading comprehension skill," in The Science of Reading: A Handbook. eds. Snowling, M. J., and Hulme, C. (Blackwell Publishing).
RAND Reading Study Group (2002). Reading for Understanding: Toward an R&D Program in Reading Comprehension. Santa Monica, CA: RAND.
Ravand, H. (2016). Application of a cognitive diagnostic model to a high-stakes reading comprehension test. J. Psychoeduc. Assess. 34, 782–799. doi: 10.1177/0734282915623053
Ravand, H., and Baghaei, P. (2020). Diagnostic classification models: recent developments, practical issues, and prospects. Int. J. Test. 20, 24–56. doi: 10.1080/15305058.2019.1588278
Ravand, H., and Robitzsch, A. (2018). Cognitive diagnostic model of best choice: a study of reading comprehension. Educ. Psychol. 38, 1255–1277. doi: 10.1080/01443410.2018.1489524
Rizopoulos, D. (2006). ltm: an R package for latent variable modelling and item response theory analyses. J. Stat. Softw. 17, 1–25. doi: 10.18637/jss.v017.i05
Robitzsch, A., and George, A. C. (2019). "The R package CDM for diagnostic modeling," in Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages (Cham: Springer International Publishing), 549–572.
Roussos, L. A., DiBello, L. V., Stout, W., Hartz, S. M., Henson, R. A., and Templin, J. L. (2007). "The fusion model skills diagnosis system," in Cognitive Diagnostic Assessment for Education: Theory and Applications. eds. Leighton, J., and Gierl, M. (Cambridge: Cambridge University Press), 275–318.
Rupp, A. A., and Templin, J. L. (2008). Unique characteristics of diagnostic classification models: a comprehensive review of the current state-of-the-art. Measurement 6, 219–262. doi: 10.1080/15366360802490866
Sawaki, Y., Kim, H.-J., and Gentile, C. (2009). Q-matrix construction: defining the link between constructs and test items in large-scale reading and listening comprehension assessments. Lang. Assess. Q. 6, 190–209. doi: 10.1080/15434300902801917
Sessoms, J., and Henson, R. A. (2018). Applications of diagnostic classification models: a literature review and critical commentary. Measurement 16, 1–17. doi: 10.1080/15366367.2018.1435104
Shu, H., Chen, X., Anderson, R. C., Wu, N., and Xuan, Y. (2003). Properties of school Chinese: implications for learning to read. Child Dev. 74, 27–47. doi: 10.1111/1467-8624.00519
Stanovich, K. E. (1980). Toward an interactive-compensatory model of individual differences in the development of reading fluency. Read. Res. Q. 16, 32–71. doi: 10.2307/747348
Tatsuoka, K. K. (1990). "Toward an integration of item-response theory and cognitive error diagnosis," in Diagnostic Monitoring of Skill and Knowledge Acquisition. eds. Frederiksen, N., Glaser, R., Lesgold, A., and Shafto, M. G. (Hillsdale, NJ: Erlbaum), 453–488.
Templin, J., and Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. J. Classif. 30, 251–275. doi: 10.1007/s00357-013-9129-4
Toprak, T. E., and Cakir, A. (2021). Examining the L2 reading comprehension ability of adult ELLs: developing a diagnostic test within the cognitive diagnostic assessment framework. Lang. Test. 38, 106–131. doi: 10.1177/0265532220941470
Toprak-Yildiz, T. E. (2021). An international comparison using cognitive diagnostic assessment: fourth graders' diagnostic profile of reading skills on PIRLS 2016. Stud. Educ. Eval. 70:101057. doi: 10.1016/j.stueduc.2021.101057
van Dijk, T. A., and Kintsch, W. (1983). Strategies of Discourse Comprehension. New York: Academic Press.
von Davier, A., Carstensen, C. H., and von Davier, M. (2008). "Linking competencies in horizontal, vertical, and longitudinal settings and measuring growth," in Assessment of Competencies in Educational Contexts. eds. Hartig, J., Klieme, E., and Leutner, D. (New York: Hogrefe & Huber), 121–149.
von Davier, M., and Lee, Y.-S. (eds.) (2019). Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages. Cham: Springer International Publishing.
Wu, X., Wu, R., Chang, H.-H., Kong, Q., and Zhang, Y. (2020). International comparative study on PISA mathematics achievement test based on cognitive diagnostic models. Front. Psychol. 11:2230. doi: 10.3389/fpsyg.2020.02230