LAB: Multiple Correspondence Analysis (MCA)
M1 MIDS/MFA/LOGOS |
Année 2024 |
Besides the usual packages (tidyverse
, …), we shall require FactoMineR
and related packages.
The GSS
dataset
We will use data coming from the General Social Survey (GSS). The data can be retrieved using the gssr package. If needed, install gssr and its companion package gssrdoc.
Loading required package: gssr
Package loaded. To attach the GSS data, type data(gss_all) at the console.
For the codebook, type data(gss_dict).
For the panel data and documentation, type e.g. data(gss_panel08_long) and data(gss_panel_doc).
For help on a specific GSS variable, type ?varname at the console.
Loading required package: gssrdoc
The data we will use are panel data made available by the GSS. In order to explore them, it is very useful to load gss_dict and gss_panel_doc.
Check the online help.
Code
?gss_panel_doc
gss_dict
and gss_panel_doc
are dataframes that can be queried:
id | text |
---|---|
relhhd8 | 1622. What is (PERSON)'s relationship to (HEAD OF HOUSEHOLD)? |
denom16 | 115a. If Protestant: What specific denomination is that, if any? |
conjudge | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? j. U.S. Supreme Court. |
wordh | 1612. We would like to know something about how people go about guessing words they do not know. On this card are listed some words--you may know some of them, and you may not know quite a few of them. On each line the first word is in capital letters like BEAST. Then there are five other words. Tell me the number of the word that comes closest to the meaning of the word in capital letters. For example, if the word in capital letters is BEAST, you would say "4" since "animal" come closer to BEAST than any of the other words. If you wish, I will read the words to you. These words are difficult for almost everyone just give me your best guess if you are not sure of the answer. |
compwage | 1087. Do you believe your fixed annual wages in calendar year 2005 were higher or lower than those of employees with similar experience and job descriptions in other companies in your region? Please answer on a 1 to 5 scale. |
In the panel questionnaire, some questions have race in their id field. Check the online help again.
Code
?race
The answer looks something like this:
Question 24. What race do you consider yourself?
And it contains a brief summary of the answers given through the years.
Year | Black | Other | White | iap (NA) | Total |
---|---|---|---|---|---|
2010 | 311 | 183 | 1550 | - | 2044 |
2012 | 301 | 196 | 1477 | - | 1974 |
2014 | 386 | 262 | 1890 | - | 2538 |
(this is just an excerpt)
Code
id | description | text |
---|---|---|
race | RACE | 24. What race do you consider yourself? |
racecen3 | RACECEN3 | 1602. What is your race? Indicate one or more races that you consider yourself to be. |
natrace | NATRACE | 68. We are faced with many problems in this country, none of some of these problems, and for each one I'd like you to tell me whether you think we're spending too much money on it, too little money, or about the right amount. h. Improving the conditions of Blacks. |
natracey | NATRACEY | 69. We are faced with many problems in this country, none of some of these problems, and for each one I'd like you to tell me whether you think we're spending too much money on it, too little money, or about the right amount. h. Assistance to blacks. |
intrace3 | INTRACE3 | What is your race? Indicate one or more races that you consider yourself to be. |
The data set we will use comes from the 2010 panel data.
Code
data("gss_panel10_long")
gss_panel10_long
# A tibble: 6,132 × 1,200
firstid wave oversamp sampcode sample form formwt vpsu vstrat
<fct> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl> <dbl> <dbl+>
1 1 1 1 501 9 [2000 FP] 2 [ALTERNAT… 1 1 2240
2 1 2 1 501 9 [2000 FP] 2 [ALTERNAT… 1 1 2240
3 1 3 1 501 9 [2000 FP] 2 [ALTERNAT… 1 1 2240
4 2 1 1 501 9 [2000 FP] 1 [STANDARD… 1 1 2240
5 2 2 1 501 9 [2000 FP] 1 [STANDARD… 1 1 2240
6 2 3 1 501 9 [2000 FP] 1 [STANDARD… 1 1 2240
7 3 1 1 501 9 [2000 FP] 1 [STANDARD… 1 2 2240
8 3 2 1 501 9 [2000 FP] 1 [STANDARD… 1 2 2240
9 3 3 1 501 9 [2000 FP] 1 [STANDARD… 1 2 2240
10 4 1 1 501 9 [2000 FP] 2 [ALTERNAT… 1 2 2240
# ℹ 6,122 more rows
# ℹ 1,191 more variables: samptype <dbl+lbl>, wtpan12 <dbl+lbl>,
# wtpan123 <dbl+lbl>, wtpannr12 <dbl+lbl>, wtpannr123 <dbl+lbl>,
# id <dbl+lbl>, mar1 <dbl+lbl>, mar2 <dbl+lbl>, mar3 <dbl+lbl>,
# mar4 <dbl+lbl>, mar5 <dbl+lbl>, mar6 <dbl+lbl>, mar7 <dbl+lbl>,
# mar8 <dbl+lbl>, mar9 <dbl+lbl>, mar10 <dbl>, mar11 <dbl+lbl>, mar12 <dbl>,
# mar13 <dbl>, mar14 <dbl>, abany <dbl+lbl>, abdefect <dbl+lbl>, …
The panel initially comprised roughly 2000 respondents, who were interviewed in 2010, 2012, and 2014 (the three waves).
# A tibble: 1,304 × 4
firstid wave id sex
<fct> <dbl> <dbl+lbl> <dbl+lbl>
1 1 3 10001 1 [MALE]
2 2 3 10002 2 [FEMALE]
3 3 3 10003 2 [FEMALE]
4 6 3 10004 1 [MALE]
5 7 3 10005 2 [FEMALE]
6 9 3 10006 2 [FEMALE]
7 10 3 10007 2 [FEMALE]
8 11 3 10008 2 [FEMALE]
9 12 3 10009 1 [MALE]
10 13 3 10010 1 [MALE]
# ℹ 1,294 more rows
Some respondents left the survey. Attrition can be monitored with a query like the one sketched below.
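Here is one possible query (a sketch; it assumes that a respondent's id is missing for the waves they did not answer):
Code
gss_panel10_long |>
  select(firstid, wave, id) |>
  filter(!is.na(id)) |>
  count(wave, name = "respondents")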
The confidence topic
GSS surveys are made of a huge number of questions. Not all questions were asked to every respondent: each question was asked to about two thirds of the respondents. Questions relate, among other things, to demographic features (age, sex, level of education, employment, and so on). Answers to these questions can be used to determine whether the panel sample is representative: the data can be compared with the latest census data (there is a census every ten years in the US).
A number of questions are related to the confidence topic. Respondents were asked about the level of confidence they put in a wide array of institutions.
Code
id | description | text |
---|---|---|
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. |
conbus | CONBUS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? b. Major companies. |
conclerg | CONCLERG | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? c. Organized religion. |
coneduc | CONEDUC | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? d. Education. |
confed | CONFED | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? e. Executive branch of the federal government. |
conlabor | CONLABOR | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? f. Organized labor. |
conpress | CONPRESS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? g. Press. |
conmedic | CONMEDIC | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? h. Medicine. |
contv | CONTV | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? i. Television. |
conjudge | CONJUDGE | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? j. U.S. Supreme Court. |
consci | CONSCI | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? k. Scientific community. |
conlegis | CONLEGIS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? l. Congress |
conarmy | CONARMY | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? m. Military |
For institutions like Science, Medicine, or the Executive branch of the federal government, respondents were asked whether they have a great deal of confidence, only some confidence, or hardly any confidence in the institution. The same 3-level Likert scale was used for all institutions.
From the gss_panel10_long
dataset, extract columns corresponding to questions from the confidence topic
We first gather the id
of the questions pertaining to the confidence topic.
Code
confidence_ids <- gss_panel_doc |>
filter(str_detect(text, "confidence")) |>
slice_sample(n=1, by=text) |>
pull(id)
Then we project the panel data on the questions pertaining to the confidence topic. We also keep the respondent key, sex, and the relevant wave.
Code
firstid | wave | id_1: id | sex_1: RESPONDENTS SEX | confinan_1: CONFID IN BANKS & FINANCIAL INSTITUTIONS | conbus_1: CONFIDENCE IN MAJOR COMPANIES | conclerg_1: CONFIDENCE IN ORGANIZED RELIGION | coneduc_1: CONFIDENCE IN EDUCATION | confed_1: CONFID. IN EXEC BRANCH OF FED GOVT | conlabor_1: CONFIDENCE IN ORGANIZED LABOR | conpress_1: CONFIDENCE IN PRESS | conmedic_1: CONFIDENCE IN MEDICINE | contv_1: CONFIDENCE IN TELEVISION | conjudge_1: CONFID. IN UNITED STATES SUPREME COURT | consci_1: CONFIDENCE IN SCIENTIFIC COMMUNITY | econsci | conlegis_1: CONFIDENCE IN CONGRESS | conarmy_1: CONFIDENCE IN MILITARY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1225 | 1 | 1225 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1511 | 1 | 1511 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1153 | 1 | 1153 | 2 | 3 | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | NA | 2 | 1 |
1585 | 1 | 1585 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | NA | 1 | 3 |
940 | 1 | 940 | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 2 | 1 | 2 | 3 | 2 | NA | 3 | 2 |
110 | 1 | 110 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
559 | 1 | 559 | 2 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
1536 | 1 | 1536 | 1 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 3 | 2 | 2 | NA | 3 | 1 |
846 | 1 | 846 | 2 | 3 | 2 | 2 | 3 | 3 | 2 | 3 | 1 | 3 | 2 | 1 | 4 | 3 | 2 |
220 | 1 | 220 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
Let us go back to the documentation.
Code
id | description | text | properties_1 | marginals_1 |
---|---|---|---|---|
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/265"), c("CONFINAN_1", "CONFINAN_1", "CONFINAN_1") | c(30.1, 55.2, 14.7, NA, NA, 100), c("395", "725", "193", "671", "16", "2,000"), c("1", "2", "3", "0", "8", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", "Total"), c("CONFINAN_1", "CONFINAN_1", "CONFINAN_1", "CONFINAN_1", "CONFINAN_1", "CONFINAN_1") |
conbus | CONBUS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? b. Major companies. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/247"), c("CONBUS_1", "CONBUS_1", "CONBUS_1") | c(17.6, 62.3, 20.1, NA, NA, NA, 100), c("228", "806", "260", "671", "32", "3", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONBUS_1", "CONBUS_1", "CONBUS_1", "CONBUS_1", "CONBUS_1", "CONBUS_1", "CONBUS_1") |
conclerg | CONCLERG | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? c. Organized religion. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/250"), c("CONCLERG_1", "CONCLERG_1", "CONCLERG_1") | c(25.1, 52.5, 22.4, NA, NA, NA, 100), c("324", "678", "289", "671", "35", "3", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONCLERG_1", "CONCLERG_1", "CONCLERG_1", "CONCLERG_1", "CONCLERG_1", "CONCLERG_1", "CONCLERG_1") |
coneduc | CONEDUC | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? d. Education. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/259"), c("CONEDUC_1", "CONEDUC_1", "CONEDUC_1") | c(27.1, 56.5, 16.4, NA, NA, 100), c("359", "747", "217", "671", "6", "2,000"), c("1", "2", "3", "0", "8", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", "Total"), c("CONEDUC_1", "CONEDUC_1", "CONEDUC_1", "CONEDUC_1", "CONEDUC_1", "CONEDUC_1") |
confed | CONFED | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? e. Executive branch of the federal government. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/262"), c("CONFED_1", "CONFED_1", "CONFED_1") | c(13.8, 45.4, 40.8, NA, NA, NA, 100), c("179", "590", "531", "671", "26", "3", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONFED_1", "CONFED_1", "CONFED_1", "CONFED_1", "CONFED_1", "CONFED_1", "CONFED_1") |
conlabor | CONLABOR | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? f. Organized labor. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/289"), c("CONLABOR_1", "CONLABOR_1", "CONLABOR_1") | c(12.3, 58.1, 29.6, NA, NA, 100), c("155", "731", "372", "671", "71", "2,000"), c("1", "2", "3", "0", "8", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", "Total"), c("CONLABOR_1", "CONLABOR_1", "CONLABOR_1", "CONLABOR_1", "CONLABOR_1", "CONLABOR_1") |
conpress | CONPRESS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? g. Press. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/298"), c("CONPRESS_1", "CONPRESS_1", "CONPRESS_1") | c(10, 48.5, 41.5, NA, NA, NA, 100), c("132", "638", "546", "671", "12", "1", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONPRESS_1", "CONPRESS_1", "CONPRESS_1", "CONPRESS_1", "CONPRESS_1", "CONPRESS_1", "CONPRESS_1") |
conmedic | CONMEDIC | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? h. Medicine. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/295"), c("CONMEDIC_1", "CONMEDIC_1", "CONMEDIC_1") | c(40.1, 49.3, 10.6, NA, NA, 100), c("530", "651", "140", "671", "8", "2,000"), c("1", "2", "3", "0", "8", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", "Total"), c("CONMEDIC_1", "CONMEDIC_1", "CONMEDIC_1", "CONMEDIC_1", "CONMEDIC_1", "CONMEDIC_1") |
contv | CONTV | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? i. Television. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/328"), c("CONTV_1", "CONTV_1", "CONTV_1") | c(9.4, 48.2, 42.3, NA, NA, NA, 100), c("124", "634", "556", "671", "14", "1", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONTV_1", "CONTV_1", "CONTV_1", "CONTV_1", "CONTV_1", "CONTV_1", "CONTV_1") |
conjudge | CONJUDGE | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? j. U.S. Supreme Court. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/286"), c("CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1") | c(31.8, 50.3, 17.8, NA, NA, NA, 100), c("410", "649", "230", "671", "38", "2", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1", "CONJUDGE_1") |
consci | CONSCI | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? k. Scientific community. | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/319"), c("CONSCI_1", "CONSCI_1", "CONSCI_1") | c(41.6, 51, 7.3, NA, NA, NA, 100), c("527", "646", "93", "671", "62", "1", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONSCI_1", "CONSCI_1", "CONSCI_1", "CONSCI_1", "CONSCI_1", "CONSCI_1", "CONSCI_1") |
conlegis | CONLEGIS | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? l. Congress | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/292"), c("CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1") | c(10.8, 51.2, 37.9, NA, NA, NA, 100), c("141", "666", "493", "671", "28", "1", "2,000"), c("1", "2", "3", "0", "8", "9", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", NA, "Total"), c("CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1", "CONLEGIS_1") |
conarmy | CONARMY | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? m. Military | c("Data type", "Missing-data codes", "Record/column"), c("numeric", "0,8,9", "1/244"), c("CONARMY_1", "CONARMY_1", "CONARMY_1") | c(46.8, 39.8, 13.4, NA, NA, 100), c("613", "521", "175", "671", "20", "2,000"), c("1", "2", "3", "0", "8", NA), c("A GREAT DEAL", "ONLY SOME", "HARDLY ANY", "IAP", "DK", "Total"), c("CONARMY_1", "CONARMY_1", "CONARMY_1", "CONARMY_1", "CONARMY_1", "CONARMY_1") |
The level labels for the answers to the confidence questions are:
Code
New names:
New names:
• `id` -> `id...1`
• `id` -> `id...6`
Code
foo |> gt::gt()
id...1 | description | text | property | value...5 | id...6 | percent | n | value...9 | label | id...11 |
---|---|---|---|---|---|---|---|---|---|---|
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | 30.1 | 395 | 1 | A GREAT DEAL | CONFINAN_1 |
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | 55.2 | 725 | 2 | ONLY SOME | CONFINAN_1 |
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | 14.7 | 193 | 3 | HARDLY ANY | CONFINAN_1 |
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | NA | 671 | 0 | IAP | CONFINAN_1 |
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | NA | 16 | 8 | DK | CONFINAN_1 |
confinan | CONFINAN | far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? a. Banks and financial institutions. | Data type | numeric | CONFINAN_1 | 100.0 | 2,000 | NA | Total | CONFINAN_1 |
- IAP stands for "respondent was not asked the question".
- DK stands for "Don't Know".
If respondents were not given the opportunity to provide a response to a question because it was not included in the survey they were administered, then their responses are coded as “IAP.”
Code
panel_doc <- gssrdoc::gss_panel_doc
Table wrangling
Before proceeding to Multiple Correspondence Analysis (MCA), let us select the set of active variables.
Project gss_panel10_long
on firstid, wave, id, sex,
and columns with names in
confinan conbus conclerg coneduc confed conlabor conpress conmedic contv conjudge consci conlegis conarmy
Filter so as to keep only wave 1
.
Transform all relevant columns into factors.
Code
confidence_panel10 <- gss_panel10_long |>
filter(wave==1) |>
select(firstid, wave, id, sex, contains(confidence_ids)) |>
mutate(across(starts_with("con"), as_factor)) |>
mutate(sex=as_factor(sex))
Code
confidence_10 <- gss_panel_doc |>
filter(id %in% confidence_ids) |>
select(-ends_with("2"), -ends_with("3")) |>
unnest(
cols=c(properties_1), names_repair = "universal") |>
unnest(
cols=c(marginals_1), names_repair = "universal") |>
mutate(text=str_squish(text)) |>
mutate(text= str_match(text, "\\? [\\w]\\.(.*)")) |>
mutate(text=text[,2]) |>
filter(property == "Missing-data codes") |>
select(-starts_with("id"), -property, -`value...5`)
New names:
New names:
• `id` -> `id...1`
• `id` -> `id...6`
Elementary statistics
Use skimr
to summarize your dataset.
Code
confidence_panel10 |>
skimr::skim(where(is.factor)) |>
select(-factor.ordered, -skim_type)|>
filter(str_starts(skim_variable,"con")) |>
gt::gt() |>
gt::fmt_number(decimals=3) |>
gt::tab_caption(
caption = "GSS Panel data 2010 First wave, Topic confidence"
)
skim_variable | n_missing | complete_rate | factor.n_unique | factor.top_counts |
---|---|---|---|---|
confinan | 680.000 | 0.667 | 3.000 | ONL: 656, HAR: 564, A G: 144, IAP: 0 |
conbus | 705.000 | 0.655 | 3.000 | ONL: 848, HAR: 317, A G: 174, IAP: 0 |
conclerg | 725.000 | 0.645 | 3.000 | ONL: 720, HAR: 328, A G: 271, IAP: 0 |
coneduc | 681.000 | 0.667 | 3.000 | ONL: 793, A G: 367, HAR: 203, IAP: 0 |
confed | 698.000 | 0.659 | 3.000 | ONL: 620, HAR: 499, A G: 227, IAP: 0 |
conlabor | 744.000 | 0.636 | 3.000 | ONL: 781, HAR: 373, A G: 146, IAP: 0 |
conpress | 689.000 | 0.663 | 3.000 | ONL: 621, HAR: 594, A G: 140, IAP: 0 |
conmedic | 681.000 | 0.667 | 3.000 | ONL: 647, A G: 558, HAR: 158, IAP: 0 |
contv | 681.000 | 0.667 | 3.000 | ONL: 677, HAR: 521, A G: 165, IAP: 0 |
conjudge | 709.000 | 0.653 | 3.000 | ONL: 706, A G: 400, HAR: 229, IAP: 0 |
consci | 738.000 | 0.639 | 3.000 | ONL: 683, A G: 544, HAR: 79, IAP: 0 |
conlegis | 697.000 | 0.659 | 3.000 | ONL: 635, HAR: 587, A G: 125, IAP: 0 |
conarmy | 688.000 | 0.663 | 3.000 | A G: 720, ONL: 507, HAR: 129, IAP: 0 |
There are a lot of missing data in your data set.
How are missing cells related?
Drop rows with missing data in the confidence questions.
What are the dimensions of your data set?
Code
confidence_data <- confidence_panel10 |>
mutate(across(starts_with("con"), .fns=\(x) !is.na(x), .names="not_{.col}_isna")) |>
rowwise() |>
filter(all(c_across(ends_with("isna")))) |>
ungroup() |>
select(-ends_with("isna"), -firstid, -wave, -econsci) |>
mutate(across(starts_with("con"), fct_drop), sex=as_factor(sex))
confidence_data |>
slice_sample(n=10) |>
gt::gt()
id_1: id | sex_1: RESPONDENTS SEX | confinan_1: CONFID IN BANKS & FINANCIAL INSTITUTIONS | conbus_1: CONFIDENCE IN MAJOR COMPANIES | conclerg_1: CONFIDENCE IN ORGANIZED RELIGION | coneduc_1: CONFIDENCE IN EDUCATION | confed_1: CONFID. IN EXEC BRANCH OF FED GOVT | conlabor_1: CONFIDENCE IN ORGANIZED LABOR | conpress_1: CONFIDENCE IN PRESS | conmedic_1: CONFIDENCE IN MEDICINE | contv_1: CONFIDENCE IN TELEVISION | conjudge_1: CONFID. IN UNITED STATES SUPREME COURT | consci_1: CONFIDENCE IN SCIENTIFIC COMMUNITY | conlegis_1: CONFIDENCE IN CONGRESS | conarmy_1: CONFIDENCE IN MILITARY |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1894 | FEMALE | ONLY SOME | ONLY SOME | A GREAT DEAL | A GREAT DEAL | ONLY SOME | ONLY SOME | HARDLY ANY | A GREAT DEAL | ONLY SOME | A GREAT DEAL | A GREAT DEAL | ONLY SOME | ONLY SOME |
1639 | FEMALE | ONLY SOME | ONLY SOME | ONLY SOME | A GREAT DEAL | ONLY SOME | ONLY SOME | ONLY SOME | A GREAT DEAL | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | A GREAT DEAL |
718 | MALE | HARDLY ANY | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | A GREAT DEAL | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | HARDLY ANY |
1101 | FEMALE | ONLY SOME | ONLY SOME | HARDLY ANY | ONLY SOME | A GREAT DEAL | HARDLY ANY | A GREAT DEAL | ONLY SOME | A GREAT DEAL | A GREAT DEAL | ONLY SOME | ONLY SOME | A GREAT DEAL |
1120 | FEMALE | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | HARDLY ANY | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME |
1417 | FEMALE | ONLY SOME | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | HARDLY ANY | HARDLY ANY | ONLY SOME | A GREAT DEAL | ONLY SOME | A GREAT DEAL |
1454 | MALE | HARDLY ANY | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | ONLY SOME | HARDLY ANY | A GREAT DEAL | HARDLY ANY | A GREAT DEAL | A GREAT DEAL | HARDLY ANY | A GREAT DEAL |
1598 | MALE | HARDLY ANY | ONLY SOME | ONLY SOME | A GREAT DEAL | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | ONLY SOME | HARDLY ANY | A GREAT DEAL |
62 | FEMALE | A GREAT DEAL | A GREAT DEAL | ONLY SOME | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | ONLY SOME | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL |
1157 | MALE | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | ONLY SOME | ONLY SOME | HARDLY ANY | A GREAT DEAL | ONLY SOME | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL | A GREAT DEAL |
Explore possible associations between the answers to the different confidence questions.
How would you test independence between the answers about confidence in Science and confidence in the Army?
TODO
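One possible approach (a sketch, mirroring the chi-square tests performed at the end of this lab): cross-tabulate the two variables and apply a chi-square independence test.
Code
confidence_data |>
  select(consci, conarmy) |>
  table() |>
  chisq.test()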
The case for using MCA
In order to construct a field of ideological and political attitudes, which will subsequently be used as a reference for locating the empirical typologies of response styles and survey compliance, we apply multiple correspondence analysis (MCA). MCA belongs to the family of techniques used in geometric data analysis (Le Roux and Rouanet 2004). It allows for the extraction of the most important dimensions in a set of categorical variables, and the graphical representation of variable categories and individuals relative to each other in a coordinate system. Distances between categories as well as individuals can be interpreted as a measure of (dis)similarity: If categories often co-appear in individual’s responses, they are located close together in the space produced by MCA. Rare co-appearances, accordingly, result in a larger distance between the respective categories. Furthermore, illustrative variables can passively be projected into the field, a technique that has been termed visual regression analysis (Lebart et al. 1984). Whereas the space is determined by the distances between the categories of active variables, passive variables do not alter the structure of the constructed field, but appear in their average and hence most likely position.
From https://doi.org/10.1007/s11135-016-0458-3
In this lab, we won’t look at the field of ideological and political attitudes, but rather at the field of confidence level in a variety of institutions.
Multiple Correspondence Analysis
The input of multiple correspondence analysis is a data frame \(X\) with \(n\) rows and \(p\) categorical columns. Multiple Correspondence Analysis starts by building the indicator matrix. The indicator matrix is built by one-hot encoding of each categorical variable.
A categorical variable \(V_j\) (factor) with \(q\) levels is mapped to \(q\) \(\{0,1\}\) -valued variables \(V_{j,r}\) for \(r \leq q\)
If levels are indexed by \(\{1, \ldots, q\}\) and the value of the categorical variable \(V_j\) in row \(i\) is \(k \in \{1, \ldots, q\}\), the binary encoding is \[k \mapsto \underbrace{0,\ldots, 0}_{k-1}, 1, \underbrace{0, \ldots, 0}_{q-k}\]
The indicator matrix has as many rows as the data matrix
The number of columns of the indicator matrix is the sum of the number of levels of the categorical variables/columns of the data matrix
The indicator matrix is a numerical matrix. It is suitable for factorial methods. A sketch of its construction is given below.
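For concreteness, here is a minimal sketch of how the indicator matrix could be built with base R's model.matrix(), applied to the confidence_data table defined above (switching contrasts off yields one column per level):
Code
# Build the indicator (one-hot) matrix: each factor with q_j levels
# contributes q_j binary columns.
conf_factors <- confidence_data |> select(starts_with("con"))
Z <- model.matrix(
  ~ . - 1,
  data = conf_factors,
  contrasts.arg = lapply(conf_factors, contrasts, contrasts = FALSE)
)
dim(Z)  # n rows, q columns (q = total number of levels)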
Recall \(X\) is the data matrix with \(n\) rows (individuals) and \(p\) categorical columns (variables)
For \(j \in \{1, \ldots, p\}\), let \(J_j\) denote the number of levels(categories) of variable \(j\)
Let \(q = \sum_{j\leq p} J_j\) be the sum of the number of levels throughout the variables
Let \(Z\) be the indicator matrix with \(n\) rows and \(q\) columns
For \(j\leq p\) and \(k \leq J_j\), let \(\langle j, k\rangle = \sum_{j'<j} J_{j'}+k\)
Let \(N = n \times p = \sum_{i\leq n} \sum_{m \leq q} Z_{i,m}\) and \[P = \frac{1}{N} Z\]
(the correspondence matrix for MCA)
The row wise sums of correspondence matrix \(P\) are all equal to \(1/n=p/N\)
The column wise sum of the correspondence matrix \(P\) for the \(k\)th level of the \(j\)th variable of \(X\) ( \(j \leq p\) ) is \[N_{\langle j,k\rangle}/N = f_{\langle j,k\rangle}/p\]
where \(f_{\langle j,k\rangle}\) stands for the relative frequency of level \(k\) of the \(j\)th variable
\[D_r = \frac{1}{n}\text{Id}_n\qquad D_c =\text{diag}\left(\frac{f_{\langle j,k\rangle}}{p}\right)_{j \leq p, k\leq J_j}\]
In MCA, we compute the SVD \(U \times D \times V^\top\) of the standardized residuals matrix:
\[S = D_r^{-1/2}\times \left(P - r\times c^\top\right) \times D_c^{-1/2} = \sqrt{n}\left(P - r\times c^\top\right) \times D_c^{-1/2}\]
Coefficient \(i, \langle j, k\rangle\) of \(S\) is \[\frac{\mathbb{I}_{i, \langle j, k\rangle}- f_{\langle j,k\rangle}}{\sqrt{n\, p\, f_{\langle j,k\rangle}}}\] where \(\mathbb{I}_{i, \langle j, k\rangle}\) denotes the corresponding entry of the indicator matrix \(Z\)
MCA consists in computing the SVD of the standardized residuals matrix \(S = U \times D \times V^\top\)
From the SVD, we get
- \(D_r^{-1/2} \times U\) standardized coordinates of rows
- \(D_c^{-1/2} \times V\) standardized coordinates of columns
- \(D_r^{-1/2} \times U \times D\) principal coordinates of rows
- \(D_c^{-1/2} \times V \times D\) principal coordinates of columns
- Squared singular values: the principal inertia
When calling svd(.)
, the argument should be \[D_r^{1/2}\times \left(D_r^{-1} \times P \times D_c^{-1}- \mathbf{I}\times \mathbf{I}^\top \right)\times D_c^{1/2}= D_r^{-1/2}\times \left( P - r \times c^\top \right)\times D_c^{-1/2}\]
As
\[D_r^{-1} \times P \times D_c^{-1} - \mathbf{I}\mathbf{I}^\top = (D_r^{-1/2} \times U)\times D \times (D_c^{-1/2}\times V)^\top,\]
the product \((D_r^{-1/2} \times U)\times D \times (D_c^{-1/2}\times V)^\top\) is the extended SVD of \(D_r^{-1} \times P \times D_c^{-1} - \mathbf{I}\mathbf{I}^\top\) with respect to \(D_r\) and \(D_c\)
Perform MCA on the indicator matrix.
You may use FactoMineR::MCA(). It is interesting to compute the correspondence analysis in your own way, by preparing the matrix that is handed to svd() and returning a named list containing all relevant information; a sketch is given below.
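Here is a possible sketch of such a hand-rolled implementation, following the notation of the previous section (Z is an indicator matrix such as the one sketched earlier, p the number of categorical variables). It is not the exact algorithm used by FactoMineR, which handles row and column weights internally.
Code
# Hand-rolled MCA (sketch): SVD of the standardized residuals matrix S.
mca_svd <- function(Z, p) {
  n <- nrow(Z)
  P <- Z / (n * p)                      # correspondence matrix
  r_mass <- rowSums(P)                  # row masses, all equal to 1/n
  c_mass <- colSums(P)                  # column masses f_{<j,k>}/p
  S <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %*% t(c_mass)) %*% diag(1 / sqrt(c_mass))
  s <- svd(S)
  list(
    singular_values = s$d,
    inertia         = s$d^2,
    row_principal   = diag(1 / sqrt(r_mass)) %*% s$u %*% diag(s$d),
    col_principal   = diag(1 / sqrt(c_mass)) %*% s$v %*% diag(s$d)
  )
}
# e.g. res_diy <- mca_svd(Z, p = 13)    # 13 confidence variables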
We lazily rely on FactoMineR::MCA.
Code
res.mca <- confidence_data |>
select(starts_with("con")) |>
FactoMineR::MCA(graph=F)
The result is an object of S3
class MCA
.
If you did use FactoMineR::MCA()
, explain the organization of the result.
The result of FactoMineR::MCA(...)
is a named and nested list with five elements:
eig
- a matrix/array containing enough information to build a screeplot.
call
- a list of 9, containing the call to MCA(), an object of type language, telling (in principle) the user how MCA() was called. However, this is a quoted expression: we need to evaluate the arguments in the calling environment to understand what is going on.
Element call also contains the table margin distributions marge.col and marge.row. The truncation rank ncp (number of components) can be set before computing the SVD (default value is 5). Element \(X\) stores the data table that was effectively used in the analysis.
ind
- Information gathered from the SVD to facilitate the analysis of individuals (respondents).
var
- a list structured in the same way as element ind. Used for variables/categories profiles analysis.
svd
- a list of 3, just as the result of svd(), containing the singular values and the left and right singular vectors of matrix \(...\)
print(res.mca)
 | Name | Description |
---|---|---|
1 | $eig | "eigenvalues" |
2 | $var | "results for the variables (categories)" |
3 | $var$coord | "coord. of the categories" |
4 | $var$cos2 | "cos2 for the categories" |
5 | $var$contrib | "contributions of the categories" |
6 | $var$v.test | "v-test for the categories" |
7 | $ind | "results for the individuals" |
8 | $ind$coord | "coord. for the individuals" |
9 | $ind$cos2 | "cos2 for the individuals" |
10 | $ind$contrib | "contributions of the individuals" |
11 | $quali.sup | "results for the supplementary categorical variables" |
12 | $quali.sup$coord | "coord. for the supplementary categories" |
13 | $quali.sup$cos2 | "cos2 for the supplementary categories" |
14 | $quali.sup$v.test | "v-test for the supplementary categories" |
15 | $call | "intermediate results" |
16 | $call$marge.col | "weights of columns" |
17 | $call$marge.li | "weights of rows" |
res.mca$svd is not part of the printed output!
- eig is computed from the singular values in res.mca$svd
- var contains material for plotting information about categories and variables on factorial planes
- ind contains material for plotting information about individuals on factorial planes
In principle, all relevant information can be gathered from components svd, call$marge.li, and call$marge.col.
Screeplots
Draw screeplots. Why are they useful? Comment briefly.
We first define a tidying function for preparing the screeplot. It is designed along the lines of method tidy
for objects of class prcomp
. It consists in building a data frame from component eig
of the MCA
object.
Code
tidy_mca_scree <- function(res_mca){
res_mca$eig |>
as_tibble(rownames="Dim") |>
rename(
`% Inertia` = `percentage of variance`,
`Cumulative % Inertia` = `cumulative percentage of variance`
) |>
mutate(
PC = as.integer(str_match(Dim, 'dim ([0-9]*)')[,2])
)
}
We can now build a column plot.
Code
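# The chunk body is folded in the rendered page; a minimal sketch, reusing the
# tidy_mca_scree() helper defined above, could be:
tidy_mca_scree(res.mca) |>
  ggplot() +
  aes(x = PC, y = `% Inertia`) +
  geom_col() +
  geom_line(aes(y = `Cumulative % Inertia`)) +
  geom_point(aes(y = `Cumulative % Inertia`)) +
  labs(
    x = "Principal component",
    y = "Share of inertia (%)",
    caption = "GSS data, panel 2010, wave 1"
  )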
Note that the share of cumulated inertia doesn't grow very fast with the number of components. The situation is different from what we encountered when performing Correspondence Analysis or Principal Component Analysis. This tells us that we will have to look beyond the first two axes, which is commonplace with multiple correspondence analysis. Look at the reference below, where most of the interesting information is found on the 3rd and 4th axes.
Individuals
Perform Individual profiles analysis.
What are the classical plots? How can you build them from the output of FactoMineR::MCA?
Build the table of row contributions (the so-called \(\cos^2\))
Attribute ind
of objects of class MCA
(exported from FactoMineR
) is the starting point of any row profiles analysis.
We design an augment method for objects of class MCA. This is like what is done for objects of class prcomp, and different from what we did with objects of class CA. Indeed, MCA on the indicator matrix handles a sample just like PCA, and not a contingency table like CA.
Attribute ind
is a named list made of \(4\) components.
coord
-
a matrix with named rows and columns. The number of rows of coord matches the number of rows of the table fed to MCA (here, the number of respondents in the panel). The number of columns matches the rank of the truncated SVD that underlies Multiple Correspondence Analysis; here it is the default truncation \(5\).
The principal coordinates locate each individual along the principal components (axes).
The columns of coord
are pairwise orthogonal in the usual inner product space.
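A quick sanity check (a sketch): if the columns of coord are indeed pairwise orthogonal, the off-diagonal entries of their cross-product matrix should be numerically zero.
Code
round(crossprod(res.mca$ind$coord), 3)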
Function augment_mca
Bind the columns of coord
with the columns of the original dataset.
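The augment_mca() helper is not part of FactoMineR; a minimal sketch of what it could look like is:
Code
# Sketch of an augment method for MCA objects: bind the principal coordinates
# of the individuals to the columns of the original data set.
augment_mca <- function(res_mca, data) {
  res_mca$ind$coord |>
    tibble::as_tibble() |>
    dplyr::bind_cols(data)
}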
Code
res.mca |>
augment_mca(confidence_data) |>
glimpse()
Rows: 1,183
Columns: 20
$ `Dim 1` <dbl> 1.35465733, -0.10861408, -0.33683136, 0.72450001, 0.72525336,…
$ `Dim 2` <dbl> 0.41190167, -0.90245225, -0.45183706, 0.25425970, 0.10225521,…
$ `Dim 3` <dbl> 0.161371575, 0.010268886, 0.107542556, -0.009273224, -0.38139…
$ `Dim 4` <dbl> 0.141129674, 0.310980059, -0.097880877, -0.085996832, 0.36106…
$ `Dim 5` <dbl> 0.227285260, 0.086562193, 0.016680275, -0.115998402, -0.06681…
$ id <dbl+lbl> 4, 6, 7, 9, 11, 15, 19, 20, 21, 24, 26, 28, 31, 41, 4…
$ sex <dbl+lbl> 2, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 1, …
$ confinan <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, HARDLY ANY, ONLY SOME, ON…
$ conbus <fct> ONLY SOME, ONLY SOME, ONLY SOME, ONLY SOME, ONLY SOME, ONLY S…
$ conclerg <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, A GREAT DEAL, A GREAT DEA…
$ coneduc <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, ONLY SOME, A GREAT DEAL, …
$ confed <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, A GREAT DEAL, A GREAT DEA…
$ conlabor <fct> A GREAT DEAL, ONLY SOME, HARDLY ANY, ONLY SOME, HARDLY ANY, O…
$ conpress <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, HARDLY ANY, ONLY SOME, HA…
$ conmedic <fct> A GREAT DEAL, ONLY SOME, HARDLY ANY, A GREAT DEAL, ONLY SOME,…
$ contv <fct> ONLY SOME, ONLY SOME, HARDLY ANY, ONLY SOME, ONLY SOME, ONLY …
$ conjudge <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, A GREAT DEAL, A GREAT DEA…
$ consci <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, A GREAT DEAL, ONLY SOME, …
$ conlegis <fct> ONLY SOME, ONLY SOME, ONLY SOME, A GREAT DEAL, A GREAT DEAL, …
$ conarmy <fct> A GREAT DEAL, ONLY SOME, ONLY SOME, A GREAT DEAL, A GREAT DEA…
Note that we can recover ind$coord
from the left singular vectors and the singular values.
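A hedged check (it assumes that res.mca$svd$U stores the standardized row coordinates and res.mca$svd$vs the singular values, which is FactoMineR's convention as far as we can tell):
Code
ncp <- ncol(res.mca$ind$coord)
recovered <- sweep(res.mca$svd$U[, 1:ncp, drop = FALSE], 2,
                   res.mca$svd$vs[1:ncp], `*`)
all.equal(unname(recovered), unname(res.mca$ind$coord))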
The output of augment_mca is the basis of the individual plots. Individual plots are defined by picking a pair of principal axes. Here we use the 3rd and the 4th axes, and we map the sex column from the original dataset to the colour aesthetic.
Code
res.mca |>
augment_mca(confidence_data) |>
ggplot() +
aes(x=`Dim 3`, y=`Dim 4`, colour=sex) +
geom_point(alpha=.2, size=.2) +
coord_fixed() +
scale_colour_viridis_d()
cos2
- Coefficients of matrix cos2 are the share of the row's inertia accounted for by the corresponding cell in coord.
Code
res.mca |>
augment_mca(confidence_data) |>
ggplot() +
aes(x=`Dim 1`, y=`Dim 2`, colour=sex) +
geom_point(alpha=.2, size=.2) +
coord_fixed() +
scale_colour_viridis_d()
contrib
- Coefficients of matrix contrib give the contribution of each individual to the inertia of each axis.
Not too surprisingly, coord, contrib, and cos2 share the same row names and column names.
Variables/Categories
Perform column profiles (categories) analysis. You may use factoextra::fviz_mca_var().
The var component of an MCA object is related to the right singular vectors of the residual matrix.
Code
p <- res.mca |>
factoextra::fviz_mca_var() +
coord_fixed()
p +
ggforce::facet_zoom(
xy=`Dim 1`>=0 & `Dim 2`>=0,
xlim=c(0,2),ylim=c(0,2))
Following what has been done for objects of class prcomp
, we design a tidy_mca_var
function that turns the var
component into a dataframe. Row names, which correspond to categories, are turned into a column we call levels.
Code
tidy_mca_var <- function(res_mca){
res_mca$var$coord |>
as_tibble(rownames="levels")
}
A value in column levels identifies a category, that is, a topic (science, religion, army, …) and a confidence level (a value on the Likert scale). We enrich the data frame by extracting the variable (topic) and the value (sentiment).
We are then ready to plot an analog of the correlation circle.
Code
tidied_mca_var <- tidy_mca_var(res.mca) |>
mutate(sentiment=str_split(as.character(levels),'_')) |>
mutate(topic=map_chr(sentiment, \(x)x[1]), sentiment=map_chr(sentiment, \(x)x[2]))
p <- tidied_mca_var |>
ggplot() +
aes(x=`Dim 1`, y=`Dim 2`, label=levels, color=topic, shape=sentiment) +
geom_point(alpha=.5, show.legend = F) +
coord_fixed() +
scale_color_viridis_d() +
theme(legend.position="none") +
labs(
title = "MCA: Variables categories",
subtitle = "Color: topic, Shape: confidence level",
caption = "GSS data, panel 2010, wave 1"
)
p |>
plotly::ggplotly()
Resorting to plotly
allows us to mitigate the overplotting effect.
The plot on the first two axes seems easy to interpret. As we move along the first axis from the left, we see growing confidence.
On the positive part of the second axis, we have extreme confidence levels, either high or low. This suggests that respondents follow broad patterns: either they trust all institutions, or none of them, or they trust all of them a little.
How could we support such a narrative? One possible check is sketched below.
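One possible check (a sketch, reusing the augment_mca() helper defined above): count, for each respondent, the number of extreme answers and relate this count to the position on the first factorial plane.
Code
res.mca |>
  augment_mca(confidence_data) |>
  rowwise() |>
  mutate(
    n_great  = sum(c_across(starts_with("con")) == "A GREAT DEAL"),
    n_hardly = sum(c_across(starts_with("con")) == "HARDLY ANY")
  ) |>
  ungroup() |>
  ggplot() +
  aes(x = `Dim 1`, y = `Dim 2`, colour = n_great - n_hardly) +
  geom_point(alpha = .3, size = .5) +
  coord_fixed() +
  scale_colour_viridis_c() +
  labs(colour = "# 'great deal' - # 'hardly any'")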
What is the v.test
component of the var
component of an MCA
object?
Here is a chatbot answer:
In Multiple Correspondence Analysis (MCA), the v-test (also known as the test-value or t-test value) is a statistical measure used to assess the significance of the association between a category and a dimension (factor). It helps determine which categories contribute the most to the definition of a given dimension.
Definition and Calculation
The v-test is computed as:
\[v = \frac{p - P}{\sqrt{P(1 - P)/n}}\]
where: - \(p\) = proportion of individuals in a given category for the specific dimension, - \(P\) = overall proportion of individuals in that category across all dimensions, - \(n\) = total number of individuals.
Alternatively, the v-test can also be expressed as:
\[v = \frac{\text{category coordinate}}{\text{standard deviation of the coordinate}}\]
This essentially provides a z-score, which follows a standard normal distribution under the null hypothesis.
Interpretation
- A high absolute v-test value (typically above 2 or below -2) indicates that the category is strongly associated with the given dimension.
- Positive v-test: The category is overrepresented in that dimension.
- Negative v-test: The category is underrepresented in that dimension.
- If the absolute value of the v-test is below 2, the association is not statistically significant (at a 5% significance level).
Why is it useful?
- It helps interpret MCA results quantitatively rather than relying only on visual inspection.
- It aids in identifying key categories that structure the data on a given axis.
- It provides statistical validation for category contributions to dimensions.
Would you like a practical example or interpretation of a specific MCA result?
Symmetric plots
Build the symmetric plots (biplots) for multiple correspondence analysis.
From the shelf
We resort to factoextra
. In principle, the biplot can be constructed from the coord
matrices in the var
and the ind
components of the MCA
object.
Code
p <- res.mca |>
factoextra::fviz_mca_biplot(
label= "var",
alpha.ind=.5
)
p
Code
# p |> plotly::ggplotly()
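As noted above, the biplot can also be assembled by hand from the two coord matrices; a minimal sketch:
Code
ind_coords <- tibble::as_tibble(res.mca$ind$coord)
var_coords <- tibble::as_tibble(res.mca$var$coord, rownames = "levels")
ggplot() +
  geom_point(data = ind_coords, aes(x = `Dim 1`, y = `Dim 2`),
             alpha = .2, size = .5) +
  geom_text(data = var_coords, aes(x = `Dim 1`, y = `Dim 2`, label = levels),
            size = 2.5, colour = "red") +
  coord_fixed() +
  labs(title = "Hand-rolled MCA biplot (sketch)")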
Mosaicplots
MCA can be complemented by mosaic plots, double decker plots, chi-square tests, and correspondence analyses between pairs of variables.
Draw a mosaic plot to visualize the association between confidence levels in Science and confidence levels in Medicine.
Code
confidence_data |>
select(consci, conmedic) |>
table() |>
mosaicplot(color=T)
Code
confidence_data |>
select(consci, conmedic) |>
table() |>
chisq.test() |>
broom::glance() |>
gt::gt() |>
gt::tab_caption("Chi2 independence test for Confidence in Science and Confidence in Medicine")
statistic | p.value | parameter | method |
---|---|---|---|
137.8878 | 7.994493e-29 | 4 | Pearson's Chi-squared test |
Code
confidence_data |>
select(consci, conclerg) |>
table() |>
chisq.test() |>
broom::glance() |>
gt::gt() |>
gt::tab_caption("Chi2 independence test for Confidence in Science and Confidence in Clergy")
statistic | p.value | parameter | method |
---|---|---|---|
39.27785 | 6.103953e-08 | 4 | Pearson's Chi-squared test |
Code
t_sex_clerg_sci <- confidence_data |>
select(sex, conclerg, consci) |>
table()
t_sex_clerg_sci |>
vcd::doubledecker( )
Code
t_sex_clerg_sci |>
mosaicplot(color=T, main="Confidence in Clergy and Science")
Further references
Barth, Alice and Schmitz, Andreas. 2018. Response quality and ideological dispositions: an integrative approach using geometric and classifying techniques. Quality & Quantity
When analyzing survey data, response quality has consequential implications for substantial conclusions. Differences in response quality are usually explained by personality, or socio-demographic or cognitive characteristics. Little, however, is known about how respondents’ political attitudes, values, and opinions impact on quality aspects. This is a striking analytical omission, as potential associations between political values and various forms of response biases and artefacts call into question surveys’ ability to represent ‘public opinion’. In this contribution, response quality is traced back to respondents’ political and ideological dispositions. For this purpose, a relational understanding of response quality is applied that takes into account different aspects of response behaviors, as well as the interrelations between these indicators. Using data from the US General Social Survey (2010–2014), an empirical typology of response quality is created via finite mixture analysis. The resulting classes are then related to positions in the US field of ideological dispositions constructed via multiple correspondence analysis. The analyses reveal that there are (1) different combinations of response patterns and thus different empirical response types, and (2) that these types of response quality systematically vary with regard to the respondents’ political and ideological (dis)positions. Implications of the findings for public opinion surveys are discussed.