Title: | Interactively Measure Occupations in Interviews and Beyond |
---|---|
Description: | Perform interactive occupation coding during interviews as described in Peycheva, D., Sakshaug, J., Calderwood, L. (2021) <doi:10.2478/jos-2021-0042> and Schierholz, M., Gensicke, M., Tschersich, N., Kreuter, F. (2018) <doi:10.1111/rssa.12297>. Generate suggestions for occupational categories based on free text input, with pre-trained machine learning models in German and a ready-to-use shiny application provided for quick and easy data collection. |
Authors: | Jan Simson [aut, cre] |
Maintainer: | Jan Simson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.3 |
Built: | 2025-02-12 04:27:28 UTC |
Source: | https://github.com/occupationmeasurement/occupationmeasurement |
The Algorithm used here corresponds to Algorithm #10 in (Schierholz, 2019). Note: This function should not be used directly, but rather as a step / algorithm in get_job_suggestions.
algo_similarity_based_reasoning( text_processed, sim_name = "wordwise", probabilities = occupationMeasurement::pretrained_models$similarity_based_reasoning, ... )
algo_similarity_based_reasoning( text_processed, sim_name = "wordwise", probabilities = occupationMeasurement::pretrained_models$similarity_based_reasoning, ... )
text_processed |
The processed user input. Will be provided by get_job_suggestions. |
sim_name |
Which similarity measure to use. Possible values are "wordwise" or "substring". |
probabilities |
Trained probabilities to be used, defaults to the one bundled with the package. See pretrained_models. This pretrained model always predicts a 5-digit code from the 2010 German Classification of Occupations, with some exceptions: -0004 stands for 'Not precise enough/uncodable', -0006 stands for 'Multiple Jobs', -0012 stands for 'Blue-collar workers', -0019 stands for 'Volunteer/Social Service', and -0030 stands for 'Student assistant'. |
... |
Additional arguments may be passed from |
A data.table with suggestions or NULL if no suggestions were found.
Schierholz, M. (2019). New Methods for Job and Occupation Classification (Ph.D. Thesis). University of Mannheim.
## Not run: # Use with default settings if (interactive()) { get_job_suggestions( "Arzt", steps = list( simbased_default = list( algorithm = algo_similarity_based_reasoning ) ) ) } # Use with substring similarity if (interactive()) { get_job_suggestions( "Arzt", steps = list( simbased_substring = list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "substring" ) ) ) ) } # Comparison of algo_similarity_based_reasoning() with get_job_suggestions() # Example of using algo_similarity_based_reasoning() directly. Not recommended. if (interactive()) { algo_similarity_based_reasoning( preprocess_string("Arzt"), sim_name = "wordwise" )[order(score, decreasing = TRUE)] } # Same output as before, but the function is more adaptable. if (interactive()) { get_job_suggestions( "Arzt", suggestion_type = "kldb-2010", num_suggestions = 1500, steps = list( simbased_default = list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "wordwise" ) ) ) )[, list(kldb_id, score, sim_name, kldb_id_title = title)] } ## End(Not run)
## Not run: # Use with default settings if (interactive()) { get_job_suggestions( "Arzt", steps = list( simbased_default = list( algorithm = algo_similarity_based_reasoning ) ) ) } # Use with substring similarity if (interactive()) { get_job_suggestions( "Arzt", steps = list( simbased_substring = list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "substring" ) ) ) ) } # Comparison of algo_similarity_based_reasoning() with get_job_suggestions() # Example of using algo_similarity_based_reasoning() directly. Not recommended. if (interactive()) { algo_similarity_based_reasoning( preprocess_string("Arzt"), sim_name = "wordwise" )[order(score, decreasing = TRUE)] } # Same output as before, but the function is more adaptable. if (interactive()) { get_job_suggestions( "Arzt", suggestion_type = "kldb-2010", num_suggestions = 1500, steps = list( simbased_default = list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "wordwise" ) ) ) )[, list(kldb_id, score, sim_name, kldb_id_title = title)] } ## End(Not run)
Start the occupation coding API.
api( start = TRUE, log_to_file = FALSE, file = system.file("plumber", "api", "plumber.R", package = "occupationMeasurement"), log_to_console = TRUE, log_filepath = file.path("output", "log_api.csv"), require_identifier = FALSE, allow_origin = NULL )
api( start = TRUE, log_to_file = FALSE, file = system.file("plumber", "api", "plumber.R", package = "occupationMeasurement"), log_to_console = TRUE, log_filepath = file.path("output", "log_api.csv"), require_identifier = FALSE, allow_origin = NULL )
start |
Whether to immediately start the api. (Defaults to TRUE) |
log_to_file |
Whether to requests should be logged in a file. Defaults to FALSE. Note: The file format used here is a CSV file for easier analysis. |
file |
Path to the |
log_to_console |
Whether to requests should be logged in the console. Defaults to TRUE. |
log_filepath |
The path to a CSV file in which to save the structured logs. |
require_identifier |
Whether an identifier has to be added to api requests in order to match / identify requests afterwards. Defaults to FALSE. |
allow_origin |
Domain from which to allow cross origin requests (CORS). If the API is running on a different domain / server than the application using it, the website's root has to be provided here e.g. "https://occupationMeasurement.github.io". For more information see the plumber security page, and MDN. Defaults to NULL to not set any header at all. |
A Plumber router
vignette("api")
if (interactive()) { # Get the plumber router router <- api( start = FALSE, # If this is TRUE, the log directory will immediately be created log_to_file = FALSE ) # Start the router plumber::pr_run(router) } if (interactive()) { # Immediately start the API api(start = TRUE) }
if (interactive()) { # Get the plumber router router <- api( start = FALSE, # If this is TRUE, the log directory will immediately be created log_to_file = FALSE ) # Start the router plumber::pr_run(router) } if (interactive()) { # Immediately start the API api(start = TRUE) }
Printing the returned instance or returning it without saving it in a variable will start the app.
app( questionnaire = questionnaire_web_survey(), app_settings = create_app_settings(save_to_file = TRUE), css_file = NULL, resource_dir = system.file("www", package = "occupationMeasurement"), ... )
app( questionnaire = questionnaire_web_survey(), app_settings = create_app_settings(save_to_file = TRUE), css_file = NULL, resource_dir = system.file("www", package = "occupationMeasurement"), ... )
questionnaire |
The questionnaire to load. (Defaults to the questionnaire returned by questionnaire_web_survey().) |
app_settings |
The app_settings to use. Check the documentation for create_app_settings to learn about the options. |
css_file |
Path to a CSS file to be included in the app. |
resource_dir |
From which directory to static files e.g. styles. If you want to load additional resources from outside the package, you should rather do so with shiny::addResourcePath rather than with this parameter. |
... |
Any additional parameters will be forwarded to shiny::shinyApp(). |
A shiny app instance.
vignette("app")
, questionnaire_web_survey()
## Not run: app_instance <- app( app_settings = create_app_settings( # Important to save results from the app save_to_file = TRUE ) ) # Start the app if (interactive()) { app_instance } ## End(Not run)
## Not run: app_instance <- app( app_settings = create_app_settings( # Important to save results from the app save_to_file = TRUE ) ) # Start the app if (interactive()) { app_instance } ## End(Not run)
Berufs-Hilfsklassifikation mit Tätigkeitsbeschreibungen.
auxco
auxco
A list with data.tables:
categories
data.table. Main list of AuxCO categories including their descriptions etc.
distinctions
data.table. List of highly similar AuxCO categories that one may want to present to disambiguate between them.
followup_questions
data.table. Follow-up questions to specify final codings based on AuxCO categories. Includes the questions' answer options as well as information on how to encode more complex occupations which depend on multiple answers.
mapping_from_isco
data.table. Mapping from ISCO-08 categories to AuxCO categories.
mapping_from_kldb
data.table. Mapping from KldB 2010 categories to AuxCO categories.
Schierholz, Malte; Brenner, Lorraine; Cohausz, Lea; Damminger, Lisa; Fast, Lisa; Hörig, Ann-Kathrin; Huber, Anna-Lena; Ludwig, Theresa; Petry, Annabell; Tschischka, Laura (2018): Vorstellung einer Hilfsklassifikation mit Tätigkeitsbeschreibungen für Zwecke der Berufskodierung. (IAB-Discussion Paper, 2018), Nürnberg, 45 S. https://www.iab.de/183/section.aspx/Publikation/k180509301
https://github.com/occupationMeasurement/auxiliary-classification, load_auxco()
Buttons to navigate between pages.
button_next(label = "Weiter") button_previous(label = "Zurück")
button_next(label = "Weiter") button_previous(label = "Zurück")
label |
What label the button should have. |
shiny Action Button
button_previous()
: Go to the previous page
## Not run: very_simple_page <- new_page( page_id = "example", render = function(session, run_before_output, input, output, ...) { list( shiny::tags$h1("My test page"), button_previous(), button_next() ) } ) ## End(Not run)
## Not run: very_simple_page <- new_page( page_id = "example", render = function(session, run_before_output, input, output, ...) { list( shiny::tags$h1("My test page"), button_previous(), button_next() ) } ) ## End(Not run)
This is the primary and most convenient way of configuring the app.
create_app_settings( save_to_file, suggestion_type = "auxco-1.2.x", default_num_suggestions = 5, require_respondent_id = FALSE, warn_before_leaving = FALSE, skip_followup_types = c(), response_output_dir = file.path("output", "responses"), handle_data = NULL, get_job_suggestion_params = NULL, display_page_ids = TRUE, default_tense = "present", default_extra_instructions = "on", verbose = TRUE, .validate = TRUE )
create_app_settings( save_to_file, suggestion_type = "auxco-1.2.x", default_num_suggestions = 5, require_respondent_id = FALSE, warn_before_leaving = FALSE, skip_followup_types = c(), response_output_dir = file.path("output", "responses"), handle_data = NULL, get_job_suggestion_params = NULL, display_page_ids = TRUE, default_tense = "present", default_extra_instructions = "on", verbose = TRUE, .validate = TRUE )
save_to_file |
Should responses be saved as files in response_output_dir? Defaults to use the SAVE_TO_FILE environment variable. We recommend setting this to TRUE. |
suggestion_type |
Which type of suggestion to use / provide. Possible options are "auxco-1.2.x" and "kldb-2010". |
default_num_suggestions |
The number of suggestions to generate and display to users. Accepts all positive integers. Defaults to 5. |
require_respondent_id |
Are respondent_ids required? Defaults to FALSE |
warn_before_leaving |
Should users be warned that their progress will be lost upon leaving the site? Defaults to FALSE. |
skip_followup_types |
A vector of strings corresponding to the question_type of followup_question that should be skipped. Allowed ' values: c("anforderungsniveau", "aufsicht", "spezialisierung", "sonstige") |
response_output_dir |
Path to the directory in which to store data
from the app. Defaults to |
handle_data |
Callback function to handle data from the app. This setting takes a function that get's passed 3 parameters: table_name (A reference name indicating which data to save), data (A dataframe of data to save), session (the user's current session). |
get_job_suggestion_params |
List of parameters to pass to
get_job_suggestion. Refer to |
display_page_ids |
Whether |
default_tense |
We may not always want to ask for the current occupation, but maybe also for the previous occupation in case of pensioners etc. with a value of "past". Possible values are "present" (default), "past". This setting can be overwritten on a session-by-session basis with the URL-Query parameter "tense". |
default_extra_instructions |
Display additional instructions for e.g. an interviewer conducting an interview. Possible values are "on" (default), "off". This setting can be overwritten on a session-by-session basis with the URL-Query parameter "extra_instructions". |
verbose |
Should additional output be printed when running? Defaults to TRUE. |
.validate |
Whether the created app_settings should be validated. Defaults to TRUE. |
A list of app_settings.
app_settings <- create_app_settings( # Important to save results from the app save_to_file = TRUE, require_respondent_id = TRUE )
app_settings <- create_app_settings( # Important to save results from the app save_to_file = TRUE, require_respondent_id = TRUE )
The final occupation code will depend on the suggestion_id
and,
possibly, on followup_answers
, depending on the suggestion_id
provided. See
occupationMeasurement::auxco$followup_questions
for a list of suggestion_ids (=auxco_id)
and their respective recommended follow-up questions.
get_final_codes( suggestion_id, followup_answers = list(), standardized_answer_levels = NULL, approximate_standardized_answer_levels = TRUE, code_type = c("isco_08", "kldb_10"), verbose = TRUE, suggestion_type = "auxco-1.2.x", suggestion_type_options = list() )
get_final_codes( suggestion_id, followup_answers = list(), standardized_answer_levels = NULL, approximate_standardized_answer_levels = TRUE, code_type = c("isco_08", "kldb_10"), verbose = TRUE, suggestion_type = "auxco-1.2.x", suggestion_type_options = list() )
suggestion_id |
Id of the suggestion |
followup_answers |
A named list of the question_ids with their respective answers to the followup_questions. Question ids correspond to list names, answers correspond to list values. |
standardized_answer_levels |
A named list of standardized isco
answer levels. Names in the list correspond to the type of isco standard,
values correspond to the level itself.
Possible standardized answer types are: "isco_skill_level" and
"isco_supervisor_manager".
These can be used instead of some followup questions if
the information is available already from a different source.
Please note that standardized answer levels are not available for all
question types. For a list of options please take a look at the
followup questions included in the auxco for example via
|
approximate_standardized_answer_levels |
(default TRUE) Follow up questions were designed to provide answer options that are not in conflict with suggestion_id. standardized_answer_levels can be in conflict with suggestion_id, and then no exact matches exist. With approximation, the answer option that is closest to the standardized_answer_levels provided, will be used. |
code_type |
Which type of codes should be returned. Multiple codes can be returned at the same time. Supported types of codes are "isco_08" and "kldb_10". Defaults to "isco_08" and "kldb_10". |
verbose |
(default TRUE) whether to return a |
suggestion_type |
Which suggestion type is being used. Only auxco-based suggestion_types are supported. |
suggestion_type_options |
A list with options for generating
suggestions. Supported options:
- |
The interview situation may not allow to ask these follow-up questions. Some default, but
suboptimal occupation code is returned if followup_answers
is missing.
If followup_answers
is missing or incomplete, one may wish to insert/infer the missing information
by using standardized_answer_levels
.
A named list corresponding to the code_type(s) specified. Includes a message
if verbose = TRUE
## Not run: get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", followup_answers = list( # The first answer option in the first followup question "Q9076_1" = 2 ) ) # The same, but using standardized answer levels get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", standardized_answer_levels = list( # A response corresponding to the standard ISCO Level "supervisor" "isco_supervisor_manager" = "isco_supervisor" ) ) # Same example with approximate matching, due to conflicting information: # External data suggest the person is not a supervisor, but the person still # says she does supervisory tasks (Führungsaufgaben, as encoded in "9076"). # If approximate_standardized_answer_levels = TRUE (the default), the # selected answer "9076" trumps the external data and we will code this # person as a supervisor. get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", standardized_answer_levels = list( # A response corresponding to the standard ISCO Level "not manager nor supervisor" "isco_supervisor_manager" = "isco_not_supervising" ) ) ## End(Not run)
## Not run: get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", followup_answers = list( # The first answer option in the first followup question "Q9076_1" = 2 ) ) # The same, but using standardized answer levels get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", standardized_answer_levels = list( # A response corresponding to the standard ISCO Level "supervisor" "isco_supervisor_manager" = "isco_supervisor" ) ) # Same example with approximate matching, due to conflicting information: # External data suggest the person is not a supervisor, but the person still # says she does supervisory tasks (Führungsaufgaben, as encoded in "9076"). # If approximate_standardized_answer_levels = TRUE (the default), the # selected answer "9076" trumps the external data and we will code this # person as a supervisor. get_final_codes( # Führungsaufgaben mit Personalverantwortung bei der Lebensmittelherstellung "9076", standardized_answer_levels = list( # A response corresponding to the standard ISCO Level "not manager nor supervisor" "isco_supervisor_manager" = "isco_not_supervising" ) ) ## End(Not run)
Get potential follow-up questions for a suggestion.
get_followup_questions( suggestion_id, tense = "present", suggestion_type = "auxco-1.2.x", suggestion_type_options = list(), include_answer_codes = FALSE )
get_followup_questions( suggestion_id, tense = "present", suggestion_type = "auxco-1.2.x", suggestion_type_options = list(), include_answer_codes = FALSE )
suggestion_id |
Id of the suggestion |
tense |
Which tense i.e. time to use for questions & answers, this can be "present" or "past". Defaults to "present". |
suggestion_type |
Which suggestion type is being used. Only auxco-based suggestion_types are supported. |
suggestion_type_options |
A list with options for generating
suggestions. Supported options:
- |
include_answer_codes |
Whether answer options should contain
information on the associated codes. Defaults to FALSE.
(Only for internal use, use |
List of followup questions and their answer options.
## Not run: # Get followup questions for "Post- und Zustelldienste" get_followup_questions("1004") ## End(Not run)
## Not run: # Get followup questions for "Post- und Zustelldienste" get_followup_questions("1004") ## End(Not run)
Each page in the questionnaire can have multiple items on it.
get_item_data( session, page_id, item_id = NULL, key = c("all", "question_text", "response_text", "response_id"), default = NULL )
get_item_data( session, page_id, item_id = NULL, key = c("all", "question_text", "response_text", "response_id"), default = NULL )
session |
The shiny session |
page_id |
The page for which to retrieve data. |
item_id |
The item for which to retrieve data.
This has to be different for different items on the same page.
Since most pages contain only a single question/item, |
key |
The key for which to retrieve a value. (Optional) If no key is provided, the items's whole data will be returned. |
default |
A default value to return if the key or page is not present in the questionnaire data. |
The items's data.
## Not run: # Set up a "fake" shiny session to store data session <- shiny::MockShinySession$new() session$userData <- list( current_page_id = "other_page", questionnaire_data = list( example_page = list() ) ) # This code is expected to be run in e.g. run_before or run_after # It doesn't really make sense to run this code outside set_item_data( session = session, page_id = "example_page", question_text = "How are you?" ) # This code is expected to be run in e.g. run_before get_item_data( session = session, page_id = "example_page" ) ## End(Not run)
## Not run: # Set up a "fake" shiny session to store data session <- shiny::MockShinySession$new() session$userData <- list( current_page_id = "other_page", questionnaire_data = list( example_page = list() ) ) # This code is expected to be run in e.g. run_before or run_after # It doesn't really make sense to run this code outside set_item_data( session = session, page_id = "example_page", question_text = "How are you?" ) # This code is expected to be run in e.g. run_before get_item_data( session = session, page_id = "example_page" ) ## End(Not run)
Given a text
input, find up to num_suggestions
possible occupation categories.
get_job_suggestions( text, suggestion_type = "auxco-1.2.x", num_suggestions = 5, suggestion_type_options = list(), aggregate_score_threshold = 0.02, item_score_threshold = 0, distinctions = TRUE, steps = list(simbased_wordwise = list(algorithm = algo_similarity_based_reasoning, parameters = list(sim_name = "wordwise")), simbased_substring = list(algorithm = algo_similarity_based_reasoning, parameters = list(sim_name = "substring"))), include_general_id = FALSE )
get_job_suggestions( text, suggestion_type = "auxco-1.2.x", num_suggestions = 5, suggestion_type_options = list(), aggregate_score_threshold = 0.02, item_score_threshold = 0, distinctions = TRUE, steps = list(simbased_wordwise = list(algorithm = algo_similarity_based_reasoning, parameters = list(sim_name = "wordwise")), simbased_substring = list(algorithm = algo_similarity_based_reasoning, parameters = list(sim_name = "substring"))), include_general_id = FALSE )
text |
The raw text input from the user. |
suggestion_type |
Which type of suggestion to use / provide. Possible options are "auxco-1.2.x" and "kldb-2010". |
num_suggestions |
The maximum number of suggestions to show. This is an upper bound and less suggestions may be returned. Defaults to 5. |
suggestion_type_options |
A list with options for generating
suggestions. Supported options:
- |
aggregate_score_threshold |
A single value or named list of thresholds
between 0 and 1. If it is a list, each entry should correspond to one of
the |
item_score_threshold |
A threshold between 0 and 1 (usually very small, default 0). Results from any step will only be returned if they are greater than the specified threshold. Allows the removal of highly implausible suggestions. |
distinctions |
Whether or not to add additional distinctions to similar occupational categories to the source code. Defaults to TRUE. |
steps |
A list with the algorithms to use and their parameters. Each entry of the list should contain a nested list with two entries: algorithm (the algorithm's function itself) and parameters (the parameters to pass onto the algorithm). Each algorithm will also always have access to a default set of three parameters:
list( # try similarity "one word at most 1 letter different" first list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "wordwise", min_aggregate_prob = 0.535 ) ), # since everything else failed, try "substring" similarity list( algorithm = algo_similarity_based_reasoning, parameters = list( sim_name = "substring", min_aggregate_prob = 0.02 ) ) ) |
include_general_id |
Whether a general column, called "id" should always be returned. This will automatically contain the appropriate id for different suggestion_types i.e. for "auxco-1-2.x" it will contain the same data as the column "auxco_id". |
The procedure implemented here is, roughly speaking, as follows:
Predict categories from KldB 2010, including their scores. The first algorithm mentioned in steps
is used (default: algo_similarity_based_reasoning()
).
Convert the predicted KldB 2010 categories to suggestion_type
(default: auxco-1.2.x
, an n:m mapping, scores are mapped accordingly.). See internal function convert_suggestions()
for details.
Remove predicted categories if their score is below item_score_threshold
and only keep the num_suggestions
top-ranked suggestions.
Start anew, trying the next algorithm in steps
, if the the top-ranked suggestions have a low chance to be correct. (Technically, this happens if the summed score of the num_suggestions
top-ranked suggestions is below aggregate_score_threshold
.)
If suggestion_type == "auxco-1.2.x"
and distinctions == TRUE
, insert additional and (highly) similar categories or replace existing ones. See internal function add_distinctions_auxco()
. Reorder and keep only the num_suggestions
top-ranked suggestions. Auxco categories which were added during this step can be identified by their scores: It equals 0.05 for categories with high similarity and 0.005 for categories with medium similarity.
A data.table with suggestions or NULL if no suggestions were found.
## Not run: if (interactive()) { get_job_suggestions("Koch") } if (interactive()) { get_job_suggestions("Schlosser") } ## End(Not run)
## Not run: if (interactive()) { get_job_suggestions("Koch") } if (interactive()) { get_job_suggestions("Schlosser") } ## End(Not run)
Expects data to be saved as files.
get_responses(app_settings = create_app_settings(save_to_file = TRUE))
get_responses(app_settings = create_app_settings(save_to_file = TRUE))
app_settings |
The app_settings configuration, should be the same as
used in |
A combined data.table of user data (based on results_overview) or NULL if there are no files.
## Not run: app_settings <- create_app_settings(save_to_file = TRUE) if (interactive()) { get_responses(app_settings = app_settings) } ## End(Not run)
## Not run: app_settings <- create_app_settings(save_to_file = TRUE) if (interactive()) { get_responses(app_settings = app_settings) } ## End(Not run)
Categories from the International Standard Classification of Occupations - ISCO-08. ISCO-08 is a hierarchical classification, consisting of 10 (1-digit) major groups, 43 (2-digit) sub-major groups, 130 (3-digit) minor groups, and 436 (4-digit) unit groups, all of them included in this data set.
isco_08_en
isco_08_en
A data frame with 619 rows and 3 variables:
code
character. Unique ISCO-08 identifier / code.
label
character. Short label / title for the category.
description
character. Detailed description of the category.
Source: https://esco.ec.europa.eu This service uses the ESCO classification of the European Commission. The descriptions used here are taken from the ESCO classification (v1.1, Occupations pillar) of the European Commission, which is based on ISCO-08.
More information on the ISCO-08: https://isco-ilo.netlify.app/en/isco-08/, https://www.ilo.org/public/english/bureau/stat/isco/isco08/
This function loads the Auxiliary Classification of Occupations (AuxCO) by reading CSVs from the specified directory, while loading e.g. ids in the correct format. Data is loaded into a named list matching the format expected by other functions in this package.
load_auxco(dir, add_explanations = TRUE)
load_auxco(dir, add_explanations = TRUE)
dir |
The path to the directory which holds the CSVs. |
add_explanations |
Whether explanations should be added to some of the harder to understand task descriptions. Defaults to TRUE. |
This package also includes an already loaded version of the auxco, which can be used straight away without calling this function.
A list with multiple data.tables.
https://github.com/occupationMeasurement/auxiliary-classification, auxco
## Not run: # This function expects the CSV files from # https://github.com/occupationMeasurement/auxiliary-classification/releases/ # to be there. path_to_auxco <- "auxco" if (dir.exists(path_to_auxco)) { load_auxco(path_to_auxco) } ## End(Not run)
## Not run: # This function expects the CSV files from # https://github.com/occupationMeasurement/auxiliary-classification/releases/ # to be there. path_to_auxco <- "auxco" if (dir.exists(path_to_auxco)) { load_auxco(path_to_auxco) } ## End(Not run)
Use load_kldb_raw() to load the whole dataset.
load_kldb_raw( cache_dir = getOption("occupationMeasurement.cache_dir", tempdir()) ) load_kldb(cache_dir = getOption("occupationMeasurement.cache_dir", tempdir()))
load_kldb_raw( cache_dir = getOption("occupationMeasurement.cache_dir", tempdir()) ) load_kldb(cache_dir = getOption("occupationMeasurement.cache_dir", tempdir()))
cache_dir |
The path to the directory where the downloaded data should be stored.
We recommend setting this to "cache" to store data in the working directory. This
will prevent reloading the data time and time again. This can be set globally via
|
Source: https://www.klassifikationsserver.de/klassService/index.jsp?variant=kldb2010
More information on the KldB 2010: https://statistik.arbeitsagentur.de/DE/Navigation/Grundlagen/Klassifikationen/Klassifikation-der-Berufe/KldB2010-Fassung2020/KldB2010-Fassung2020-Nav.html The KldB 2010 has been revised in 2020. These changes have not been implemented here yet.
A cleaned / slimmed version of the KldB 2010.
load_kldb_raw()
: Load raw KldB 2010 dataset.
## Not run: # We recommend using a non-temporary directory for caching, so data is # downloaded only once and not time and time again cache_dir <- tempdir() # Note: The dataset will be downloaded from the internet # Load the cleaned dataset load_kldb(cache_dir = cache_dir) # Load the raw dataset load_kldb_raw(cache_dir = cache_dir) ## End(Not run)
## Not run: # We recommend using a non-temporary directory for caching, so data is # downloaded only once and not time and time again cache_dir <- tempdir() # Note: The dataset will be downloaded from the internet # Load the cleaned dataset load_kldb(cache_dir = cache_dir) # Load the raw dataset load_kldb_raw(cache_dir = cache_dir) ## End(Not run)
Each page corresponds to a page within the app/questionnaire.
new_page( page_id, render, condition = NULL, run_before = NULL, render_before = NULL, render_after = NULL, run_after = NULL )
new_page( page_id, render, condition = NULL, run_before = NULL, render_before = NULL, render_after = NULL, run_after = NULL )
page_id |
A unique string identifiying this page. (Required) This will be used to store data. |
render |
Function to render the page. (Required)
It is expected, that the function returns a list of shiny tags.
Its output will be combined with |
condition |
Function to check whether the page should be shown.
When this function returns |
run_before |
Function that prepares data to render the page.
Called immediately after condition (if |
render_before |
Called exactly like |
render_after |
Called exactly like |
run_after |
Function that handles the user input when they leave the page. This function has access to the shiny session and shiny input. |
Pages are rendered by calling the different life-cycle functions one after another. The order in which they are called is as follows:
condition
(session
)
Only if this evaluated to TRUE
, continue.
run_before
(session
)
render_before
(session
, run_before_output
, input
, output
)
render
(session
, run_before_output
, input
, output
)
render_after
(session
, run_before_output
, input
, output
)
The outputs from render_before
, render
& render_after
are
stitched together to produce the final HTML output of the page.
run_after
(session
, input
, output
)
Run when the user leaves the page (=clicks the next button). Any
user input has to be handled here. For each question asked, one will
typically call set_item_data()
to save the collected data
internally.
Each of the life-cycle functions above is annotated with the
paramaters it has access to. session
, input
and output
are
passed directly from shiny and correspond to the objects made available by
shiny::shinyServer()
, run_before_output
is available for convenience and
corresponds to whatever is returned by run_before
.
Some side-effects occur:
occupationMeasurement:::init_page_data
is called before 1. run_before
.
It sets up an internal representation of the page data to be saved.
occupationMeasurement:::finalize_data
is called before 6. run_before
.
occupationMeasurement:::save_page_data
is called after 6. run_before
.
It saves the responses on a hard drive, i.e. it appends the responses
from this page to table_name == "answers"
. See the vignette
and create_app_settings()
for details.
Use of render_before
, render_after
is discouraged if not necessary,
as these two life-cycle functions have only been added to allow for easier
modification and extension of existing pages.
A new page
object.
## Not run: very_simple_page <- new_page( page_id = "example", render = function(session, run_before_output, input, output, ...) { list( shiny::tags$h1("My test page"), button_previous(), button_next() ) } ) # Example where we also save some data page_that_saves_two_items <- new_page( page_id = "questions_1_and_2", render = function(session, run_before_output, page, input, output, ...) { list( shiny::tags$h1("Questions"), shiny::textAreaInput( inputId = "day_freetext", label = "How was your day? Please give a detailed answer.", value = get_item_data( session = session, page_id = page$page_id, item_id = "day_freetext", key = "response_text" ) ), shiny::tags$p("How would you rate your day? On a scale of 1 - 4"), radioButtons( inputId = "day_radio", label = NULL, width = "100%", choices = list(One = 1, Two = 2, Three = 3, Four = 4), selected = get_item_data( session = session, page_id = page$page_id, item_id = "day_radio", key = "response_id", default = character(0) ) ), button_previous(), button_next() ) }, run_after = function(session, page, input, ...) { set_item_data( session = session, page_id = page$page_id, item_id = "day_freetext", response_text = input$day_freetext ) set_item_data( session = session, page_id = page$page_id, item_id = "day_radio", response_id = input$day_radio ) } ) questionnaire_that_saves_two_items <- list( page_that_saves_two_items, # So we have a next page to go to very_simple_page ) if (interactive()) { app(questionnaire = questionnaire_that_saves_two_items) } ## End(Not run)
## Not run: very_simple_page <- new_page( page_id = "example", render = function(session, run_before_output, input, output, ...) { list( shiny::tags$h1("My test page"), button_previous(), button_next() ) } ) # Example where we also save some data page_that_saves_two_items <- new_page( page_id = "questions_1_and_2", render = function(session, run_before_output, page, input, output, ...) { list( shiny::tags$h1("Questions"), shiny::textAreaInput( inputId = "day_freetext", label = "How was your day? Please give a detailed answer.", value = get_item_data( session = session, page_id = page$page_id, item_id = "day_freetext", key = "response_text" ) ), shiny::tags$p("How would you rate your day? On a scale of 1 - 4"), radioButtons( inputId = "day_radio", label = NULL, width = "100%", choices = list(One = 1, Two = 2, Three = 3, Four = 4), selected = get_item_data( session = session, page_id = page$page_id, item_id = "day_radio", key = "response_id", default = character(0) ) ), button_previous(), button_next() ) }, run_after = function(session, page, input, ...) { set_item_data( session = session, page_id = page$page_id, item_id = "day_freetext", response_text = input$day_freetext ) set_item_data( session = session, page_id = page$page_id, item_id = "day_radio", response_id = input$day_radio ) } ) questionnaire_that_saves_two_items <- list( page_that_saves_two_items, # So we have a next page to go to very_simple_page ) if (interactive()) { app(questionnaire = questionnaire_that_saves_two_items) } ## End(Not run)
Show a page with multiple radio button options where once can be picked.
page_choose_one_option( page_id, question_text = "Please pick one of the following options", list_of_choices = list(One = 1, Two = 2, Three = 3), choice_labels = NULL, next_button = TRUE, previous_button = TRUE, run_before = NULL, run_after = NULL, ... )
page_choose_one_option( page_id, question_text = "Please pick one of the following options", list_of_choices = list(One = 1, Two = 2, Three = 3), choice_labels = NULL, next_button = TRUE, previous_button = TRUE, run_before = NULL, run_after = NULL, ... )
page_id |
A unique string identifiying this page. Used to store data. |
question_text |
The question / text to display. This can be either a string, which will simply be displayed or a function to dynamically determine the question_text. |
list_of_choices |
A list of answering options.
This can either be just a simple list of values or a named list with the
names corresponding to what the user sees and the values corresponding to
the actually saved values.
e.g. with |
choice_labels |
List or vector of only the choice names to be shown. This has to be matched by an equal-length vector in list_of_choices. |
next_button |
Whether to show the button to navigate to the next page? Defaults to TRUE. |
previous_button |
Whether to show the button to navigate to the preivous page? Defaults to TRUE. |
run_before |
Similar to |
run_after |
Similar to |
... |
Other parametrs are passed on to |
A page object.
## Not run: one_page_questionnaire <- list( page_choose_one_option( "test_page_radio", question_text = "Hello there! Please pick your favorite number from the options below:", list_of_choices = list(One = 1, Two = 2, Three = 3) ), page_final() ) if (interactive()) { app(questionnaire = one_page_questionnaire) } ## End(Not run)
## Not run: one_page_questionnaire <- list( page_choose_one_option( "test_page_radio", question_text = "Hello there! Please pick your favorite number from the options below:", list_of_choices = list(One = 1, Two = 2, Three = 3) ), page_final() ) if (interactive()) { app(questionnaire = one_page_questionnaire) } ## End(Not run)
Page to receive feedback on how well the chosen suggestion fits
page_feedback(is_interview = FALSE, ...)
page_feedback(is_interview = FALSE, ...)
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
... |
All additional parameters are passed first passed on to
|
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_select_suggestion(), page_feedback() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_select_suggestion(), page_feedback() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
This page saves data in results_overview and marks the questionnaire as complete.
page_final(...)
page_final(...)
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_final() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_final() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
Here, the description of the job can be entered in an open freetext field and suggestions will be generated based on the input.
page_first_freetext( is_interview = FALSE, aggregate_score_threshold = 0.535, ... )
page_first_freetext( is_interview = FALSE, aggregate_score_threshold = 0.535, ... )
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
aggregate_score_threshold |
The total sum of the scores of the
suggestions has to be higher than this threshold for suggestions to be
shown. The parameter is passed on to |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
To disambiguate between similar occupations. Depending on the suggestion, multiple followup questions can be shown.
page_followup(index, is_interview = FALSE, ...)
page_followup(index, is_interview = FALSE, ...)
index |
The index of the followup question (1-based).
To show the first followup question (if there are any) use
page_followup(index = 1), to show a potential second followup question use
page_followup(index = 2).
For example |
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
Show a page with a text field where free text can be entered.
page_freetext( page_id, question_text = "Please enter your answer in the box below", is_interview = FALSE, no_answer_checkbox = TRUE, next_button = TRUE, previous_button = TRUE, trigger_next_on_enter = TRUE, render_question_text = TRUE, run_before = NULL, run_after = NULL, ... )
page_freetext( page_id, question_text = "Please enter your answer in the box below", is_interview = FALSE, no_answer_checkbox = TRUE, next_button = TRUE, previous_button = TRUE, trigger_next_on_enter = TRUE, render_question_text = TRUE, run_before = NULL, run_after = NULL, ... )
page_id |
A unique string identifiying this page. Used to store data. |
question_text |
The question / text to display. This can be either a string, which will simply be displayed or a function to dynamically determine the question_text. |
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
no_answer_checkbox |
Whether to provide a checkbox to denote that no answer has been provided. |
next_button |
Whether to show the button to navigate to the next page? Defaults to TRUE. |
previous_button |
Whether to show the button to navigate to the preivous page? Defaults to TRUE. |
trigger_next_on_enter |
Whether the next button is triggered when one presses enter. Defaults to TRUE. There are known issues with IE11. |
render_question_text |
Whether the question text should be displayed?
Only set this to FALSE, if you wish to change the rendering of the
question_text by e.g. using |
run_before |
Similar to |
run_after |
Similar to |
... |
Other parametrs are passed on to |
A page object.
## Not run: page_freetext( "test_page_freetext", question_text = "Hello there! Please fill in your name below:", no_answer_checkbox = TRUE ) ## End(Not run)
## Not run: page_freetext( "test_page_freetext", question_text = "Hello there! Please fill in your name below:", no_answer_checkbox = TRUE ) ## End(Not run)
An additional freetext page to show when no suggestion has been selected.
page_none_selected_freetext(is_interview = FALSE, ...)
page_none_selected_freetext(is_interview = FALSE, ...)
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
This page is only meant for demonstration purposes. Users can see what they
entered and which code was being saved. The page is only included in the
questionnaire_demo()
, but not in the other questionnaire templates.
page_results(...)
page_results(...)
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2), page_results() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2), page_results() ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
If the first freetext question didn't provide satisfactory results, ask for more details and try again.
page_second_freetext( combine_input_with_first = TRUE, is_interview = FALSE, aggregate_score_threshold = 0.02, ... )
page_second_freetext( combine_input_with_first = TRUE, is_interview = FALSE, aggregate_score_threshold = 0.02, ... )
combine_input_with_first |
Should input be combined with the previous question? |
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
aggregate_score_threshold |
The total sum of the scores of the
suggestions has to be higher than this threshold for suggestions to be
shown. The parameter is passed on to |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
Display the generated suggestions for the user to pick one.
page_select_suggestion(is_interview = FALSE, ...)
page_select_suggestion(is_interview = FALSE, ...)
is_interview |
Should the page show slightly different / additional instructions and answer options for an interview that is conducted by another person? Defaults to FALSE. |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list( page_first_freetext(), page_second_freetext(), page_select_suggestion(), page_none_selected_freetext(), page_followup(1), page_followup(2) ) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
Providing an introduction and greeting participants.
page_welcome( title = "Herzlich Willkommen zum Modul zur automatischen Berufskodierung!", ... )
page_welcome( title = "Herzlich Willkommen zum Modul zur automatischen Berufskodierung!", ... )
title |
The heading with which to greet participants. |
... |
All additional parameters are passed to |
A page object.
## Not run: my_questionnaire <- list(page_welcome) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
## Not run: my_questionnaire <- list(page_welcome) if (interactive()) { app(questionnaire = my_questionnaire) } ## End(Not run)
Replace some common characters / character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents and removes punctuation, empty spaces and the word "Diplom".
preprocess_string(verbatim, lang = "de")
preprocess_string(verbatim, lang = "de")
verbatim |
The character vector to process. |
lang |
The language the text is in. Currently only German is supported. Defaults to "de" (German). |
charToRaw()
helps to find UTF-8 characters.
The same character vector after processing
## Not run: preprocess_string(c( "Verkauf von B\u00fcchern, Schreibwaren", "Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)" )) ## End(Not run)
## Not run: preprocess_string(c( "Verkauf von B\u00fcchern, Schreibwaren", "Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen", "Industriemechaniker", "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)" )) ## End(Not run)
Pretrained ML models to be used with the package.
pretrained_models
pretrained_models
A nested list with pretrained machine learning models:
similarity_based_reasoning
list. Contains pretrained models to be used with algo_similarity_based_reasoning()
.
similarity_based_reasoning$wordwise
list. Contains the pretrained model to be used for providing suggestions using full wordwise matching.
similarity_based_reasoning$substring
list. Contains the pretrained model to be used for providing suggestions using substring matching.
This training data always predicts a 5-digit code from the 2010 German Classification of Occupations, with some exceptions: -0004 stands for 'Not precise enough/uncodable', -0006 stands for 'Multiple Jobs', -0012 stands for 'Blue-collar workers', -0019 stands for 'Volunteer/Social Service', and -0030 stands for 'Student assistant'.
Data from the following surveys were pooled:
Antoni, M., Drasch, K., Kleinert, C., Matthes, B., Ruland, M. and Trahms, A. (2010): Arbeiten und Lernen im Wandel * Teil 1: Überblick über die Studie, FDZ-Methodenreport 05/2010, Forschungsdatenzentrum der Bundesagentur für Arbeit im Institut für Arbeitsmarkt- und Berufsforschung, Nuremberg.
Rohrbach-Schmidt, D., Hall, A. (2013): BIBB/BAuA Employment Survey 2012, BIBB-FDZ Data and Methodological Reports Nr. 1/2013. Version 4.1, Federal Institute for Vocational Education and Training (Research Data Centre), Bonn.
Lange, C., Finger, J., Allen, J., Born, S., Hoebel, J., Kuhnert, R., Müters, S., Thelen, J., Schmich, P., Varga, M., von der Lippe, E., Wetzstein, M., Ziese, T. (2017): Implementation of the European Health Interview Survey (EHIS) into the German Health Update (GEDA), Archives of Public Health, 75, 1–14.
Hoffmann, R., Lange, M., Butschalowsky, H., Houben, R., Schmich, P., Allen, J., Kuhnert, R., Schaffrath Rosario, A., Gößwald, A. (2018): KiGGS Wave 2 Cross-Sectional Study—Participant Acquisition, Response Rates and Representativeness, Journal of Health Monitoring, 3, 78–91. (only wave 2)
Trappmann, M., Beste, J., Bethmann, A., Müller, G. (2013): The PASS Panel Survey after Six Waves, Journal for Labour Market Research, 46, 275–281. (only wave 10)
Job titles were taken from the following publication:
Bundesagentur für Arbeit (2019). Gesamtberufsliste der Bundesagentur für Arbeit. Stand: 03.01.2019. https://download-portal.arbeitsagentur.de/files/.
Basically, leaving some data anonymization steps aside, we count for each job title from the Gesamtberufsliste (and some additional titles/texts) how many responses from all surveys are similar to this job title, separately for each coded category. Similarity is calculated in two ways, implying that we obtain two different counts: SubstringSimilarity refers to situations where the job title from the Gesamtberufsliste is a substring of the verbal answer; WordwiseSimilarity refers to situations where a word from the verbal answer is identical to a job title from the Gesamtberufsliste, except that one character from this word is allowed to change (Levenshtein distance). These counts are available as two separate files in the data-raw/training-data/ directory of this package. The algorithm to create these counts is available inside an R-package at https://github.com/malsch/occupationCoding, along with further documentation.
train_similarity_based_reasoning()
is then used to train the ML models. See data-raw/pretrained_models.R for the raw counts and further details.
algo_similarity_based_reasoning()
, train_similarity_based_reasoning()
, https://github.com/malsch/occupationCoding
View the function's code itself to see the used pages.
questionnaire_demo(show_feedback_page = TRUE)
questionnaire_demo(show_feedback_page = TRUE)
show_feedback_page |
Show the |
Note, that this function has more complex code to create the additional pages.
A questionnaire for app()
i.e. a list of pages.
## Not run: # Inspect the code to create the questionnaire_demo print(questionnaire_demo) if (interactive()) { # Run the app with the questionnaire_demo app(questionnaire = questionnaire_demo()) } ## End(Not run)
## Not run: # Inspect the code to create the questionnaire_demo print(questionnaire_demo) if (interactive()) { # Run the app with the questionnaire_demo app(questionnaire = questionnaire_demo()) } ## End(Not run)
A questionnaire for Computer-assisted Interviewing (CAI), i.e. telephone interviewing or personal interviewing. In both modes, interviewer asks questions to an interviewee.
questionnaire_interviewer_administered(show_feedback_page = TRUE)
questionnaire_interviewer_administered(show_feedback_page = TRUE)
show_feedback_page |
Show the |
View the function's code to see the used pages. This function is meant as a template that can be changed to meet your requirements.
A questionnaire for app()
i.e. a list of pages.
## Not run: # Inspect the code to create the questionnaire_interviewer_administered print(questionnaire_interviewer_administered) if (interactive()) { # Run the app with the questionnaire_interviewer_administered app(questionnaire = questionnaire_interviewer_administered()) } ## End(Not run)
## Not run: # Inspect the code to create the questionnaire_interviewer_administered print(questionnaire_interviewer_administered) if (interactive()) { # Run the app with the questionnaire_interviewer_administered app(questionnaire = questionnaire_interviewer_administered()) } ## End(Not run)
The basic default questionnaire. View the function's code to see the used pages. This function is meant as a template that can be changed to meet your requirements.
questionnaire_web_survey(show_feedback_page = TRUE)
questionnaire_web_survey(show_feedback_page = TRUE)
show_feedback_page |
Show the |
A questionnaire for app()
, i.e. a list of pages.
## Not run: # Inspect the code to create the questionnaire_web_survey print(questionnaire_web_survey) if (interactive()) { # Run the app with the questionnaire_web_survey app(questionnaire = questionnaire_web_survey()) } if (interactive()) { # This is used by default within app app() } ## End(Not run)
## Not run: # Inspect the code to create the questionnaire_web_survey print(questionnaire_web_survey) if (interactive()) { # Run the app with the questionnaire_web_survey app(questionnaire = questionnaire_web_survey()) } if (interactive()) { # This is used by default within app app() } ## End(Not run)
There can be multiple items on any given page. Items can be different
questions, or multiple variables that need to be saved from a single
question. The question_text
is typically
saved in run_before
and the reply (response_text
and/or response_id
) is
typically saved in run_after
.
set_item_data( session, page_id, item_id = NULL, question_text = NULL, response_text = NULL, response_id = NULL )
set_item_data( session, page_id, item_id = NULL, question_text = NULL, response_text = NULL, response_id = NULL )
session |
The shiny session |
page_id |
The page for which to retrieve data. |
item_id |
The item for which to set/update data.
This has to be different for different items on the same page.
Since most pages contain only a single question/item, |
question_text |
The question's text. (optional) |
response_text |
The user's response in text form. (optional) |
response_id |
The user's response as an id from a set of choices. (optional) |
nothing
## Not run: # Set up a "fake" shiny session to store data session <- shiny::MockShinySession$new() session$userData <- list( current_page_id = "other_page", questionnaire_data = list( example_page = list() ) ) # This code is expected to be run in e.g. run_before or run_after # It doesn't really make sense to run this code outside set_item_data( session = session, page_id = "example_page", question_text = "How are you?" ) set_item_data( session = session, page_id = "example_page", response_id = 3, response_text = "I'm doing great! (response_id = 3)" ) ## End(Not run)
## Not run: # Set up a "fake" shiny session to store data session <- shiny::MockShinySession$new() session$userData <- list( current_page_id = "other_page", questionnaire_data = list( example_page = list() ) ) # This code is expected to be run in e.g. run_before or run_after # It doesn't really make sense to run this code outside set_item_data( session = session, page_id = "example_page", question_text = "How are you?" ) set_item_data( session = session, page_id = "example_page", response_id = 3, response_text = "I'm doing great! (response_id = 3)" ) ## End(Not run)
This function requires the mvtnorm package.
train_similarity_based_reasoning( anonymized_data, num_allowed_codes = 1291, coding_index_w_codes, coding_index_without_codes = NULL, preprocessing = list(stopwords = NULL, stemming = NULL, strPreprocessing = TRUE, removePunct = FALSE), dist_type = c("wordwise", "substring", "fulltext"), dist_control = list(method = "osa", weight = c(d = 1, i = 1, s = 1, t = 1)), threshold = c(max = 3, use = 1), simulation_control = list(n.draws = 250, check_normality = FALSE) )
train_similarity_based_reasoning( anonymized_data, num_allowed_codes = 1291, coding_index_w_codes, coding_index_without_codes = NULL, preprocessing = list(stopwords = NULL, stemming = NULL, strPreprocessing = TRUE, removePunct = FALSE), dist_type = c("wordwise", "substring", "fulltext"), dist_control = list(method = "osa", weight = c(d = 1, i = 1, s = 1, t = 1)), threshold = c(max = 3, use = 1), simulation_control = list(n.draws = 250, check_normality = FALSE) )
anonymized_data |
|
num_allowed_codes |
the number of allowed codes in the target classification. There are 1286 categories in the KldB 2010 plus 5 special codes in both anonymized training data sets, so the default value is 1291. |
coding_index_w_codes |
a data.table with columns
|
coding_index_without_codes |
(not used, but automatically determined) Any words from |
preprocessing |
a list with elements
|
dist_type |
How to calculate similarity between entries from both coding_indices and verbal answers from the survey? Three options are currently supported. Since we use the
|
dist_control |
If |
threshold |
A numeric vector with two elements. If |
simulation_control |
a list with two components,
|
a list with components
Contains all entries from the coding index. dist = "official" if the entry stems from coding_index_w_codes and dist = selfcreated if the entry stems from coding_index_without_codes. string.prob
is used for weighting purposes (model averaging) if a new verbal answer is similar to multiple strings. unobserved.mean.theta
gives a probability (usually very low) for any category that was not observed in the training data together with this string.
mean.theta
is the probability for code
given that an incoming verbal answer is similar to string
. Only available if this code was at least a single time observed with this string (Use unobserved.mean.theta
otherwise).
Number of categories in the classification.
The input parameter stored to replicate preprocessing with incoming data.
The input parameter stored to replicate distance calculations with incoming data.
The input parameter stored to replicate distance calculations with incoming data.
The input parameter stored to replicate distance calculations with incoming data.
The input parameters controlling the Monte Carlo simulation.
Schierholz, Malte (2019): New methods for job and occupation classification. Dissertation, Mannheim. https://madoc.bib.uni-mannheim.de/50617/, pp. 206-208 and p. 268, pp. 308-320
https://github.com/malsch/occupationCoding (function trainSimilarityBasedReasoning2 is implemented here)
pretrained_models, which were created using this function.