1 Using Positron for Reproducible Research in R

Reproducible research is a cornerstone of scientific integrity – it means that other researchers (or you in the future) can duplicate the results of a study using the same data, code, and methods. In recent years, a broad “reproducibility crisis” has been recognized across sciences: a 2016 survey in Nature found that over 70% of researchers had failed to reproduce someone else’s results, and even about half failed to reproduce their own prior results. Social sciences and management fields, including international business (IB), are not immune to this crisis. In fact, top IB scholars have noted that it’s not surprising many findings don’t hold up because common methodological practices allow researchers to “search for a maximally predictive model” through trial-and-error and not report those exploratory steps. This kind of undisclosed data mining or “capitalization on chance” can lead to results that only appear significant by luck and cannot be reproduced in new data (Aguinis, Cascio, & Ramani, 2017). The stakes are high: if business and policy decisions are informed by unreproducible analyses, trust in empirical evidence erodes. Therefore, ensuring that results can be independently reproduced is critical for credible, cumulative knowledge in both academia and industry.

Achieving reproducibility requires both the right tools and the right practices to support transparent, documented workflows. Two key components are literate programming and version control. Literate programming means weaving narrative and analysis together, so that a research report isn’t just tables and figures but also the code that generated them embedded alongside explanatory text. Tools like R Markdown (now succeeded by Quarto) enable this by allowing you to mix prose and code in one document. Version control, typically through Git and platforms like GitHub, provides a complete history of changes to your code and data, enhancing transparency and collaboration. Using these tools together, you can ensure that every figure or model result in your report is backed by code that is tracked, reviewable, and runnable by others.

Until recently, RStudio was the go-to integrated development environment (IDE) for R users, offering an all-in-one interface for script editing, interactive R console, plots, and environment viewing. Meanwhile, Visual Studio Code (VS Code) rose in popularity as a lightweight, highly customizable code editor with an extensive extension ecosystem, though setting it up for R (and ensuring features like code completion and plotting work smoothly) required some configuration. Now, Posit – the company behind RStudio – has introduced Positron, a new IDE that combines the best of both worlds. Positron is built on the same open-source core as VS Code (the Code OSS engine) but is tailored for data science, with native support for R and Python out-of-the-box. Positron aims to be a “polyglot” IDE for data science – you can use it primarily with R (as we do in this textbook), but it also seamlessly supports Python and other languages, reflecting the reality that modern machine learning often blends ecosystems. The interface will feel familiar: you still get the classic four-panel layout for code, console, plots, and environment (like RStudio) alongside VS Code’s flexible sidebar for file navigation, Git integration, extensions, etc. In fact, when you first launch Positron, it looks like a combination of RStudio and VS Code, with a sidebar on the left and a multi-panel workspace on the right.

Crucially for reproducible research, Positron encourages the use of script files and Quarto notebooks for all analysis – meaning every step is recorded in code rather than being an ephemeral GUI action. This is vital for scientific ML workflows: for example, if you try several data preprocessing techniques or model hyperparameters, you should have a code record of those attempts. With Positron and Quarto, you’ll document these in a literate programming format, and with Git version control you’ll have a time-stamped history of changes. This makes it far less likely to fall prey to “secret sauce” issues where results can’t be reproduced because something was changed without record. Positron also inherits VS Code’s robustness and modularity. Under the hood it runs R and Python via a kernel system (named Ark for R) rather than tying the IDE directly to a single R process, which means you can even crash R without crashing the IDE and you can easily switch between different R versions or between R and Python in the same session. This hybrid approach – RStudio’s user-friendly features with VS Code’s flexibility – provides an ideal platform for modern data science research. You get integrated support for Quarto documents (for mixing text, code, results, and citations), and built-in Git integration for version control – all in one place, no heavy configuration required.

In the context of machine learning (ML) for international business, these tools and practices are especially important. ML models often involve complex pipelines (data cleaning, feature engineering, model training, evaluation) that can be difficult to replicate exactly unless carefully documented. They are also prone to issues like randomness (e.g., random weight initialization or data splits) and hyperparameter tuning that can lead to variability in results. By using literate programming, you ensure that every parameter choice and step (from data preprocessing to model fitting) is logged in the code. By using version control, you avoid the common situation of “code spaghetti” or mysterious file versions – instead, you have a clear log of how the model evolved. And by integrating your analysis narrative with code (via Quarto in Positron), you can directly tie interpretations (for instance, discussing feature importance or cross-validation performance) to the exact code and data that produced them. In a field like international business, where analyses might inform cross-country strategy or policy, having this level of transparency bolsters the credibility of insights. It also facilitates collaboration: a colleague in another country can pull your GitHub repository and rerun the analysis on their machine, perhaps applying it to new data from a different region, to test the generalizability (which touches on replicability as distinct from reproducibility).

This chapter provides a comprehensive guide to setting up and using Positron for R-based data science in the social sciences (with a focus on ML applications in IB). We will walk through initial installation and configuration, managing packages, using Git/GitHub for version control, creating Quarto documents (for HTML, PDF, or Word outputs) with proper APA-formatted citations (with help from Zotero for reference management), and implementing best practices for reproducible, transparent research. Throughout, we’ll highlight special considerations for machine learning workflows – such as handling random seeds for reproducibility, tracking model versions, and possibly integrating Python when needed – to ensure that even complex modeling exercises can be reproduced and audited. By the end of this chapter, you should have your environment ready and understand how each tool contributes to a reproducible research pipeline.

1.1 Setting Up Positron for R and Quarto Development

Prerequisites: Before beginning, ensure you have a working internet connection and administrative rights on your computer to install software. We will install the core components needed: R, Positron, and (if not already included) the Quarto CLI. Positron comes pre-configured with much of what you need for R, but we’ll verify and tweak some settings. This section provides a step-by-step guide to get your environment ready for R programming and Quarto authoring in Positron.

Installing R and Positron

1. Install R: Download and install the latest version of R from the Comprehensive R Archive Network (CRAN) for your operating system. On Windows or macOS, run the installer and follow the prompts. (On Windows, it’s recommended to enable the option to “Write R version to the registry” for easier discovery by other applications.) On Linux, you can use your distribution’s package manager (e.g., apt on Debian/Ubuntu) or download the binary from CRAN and install it. After installation, confirm that R is accessible: open a terminal (or Command Prompt on Windows) and run R --version to ensure R runs and reports its version.

2. Install Positron: Positron is available from Posit (the company formerly known as RStudio). Download the Positron installer from the official Posit website (as of 2025, Posit provides installers for Windows, macOS, and Linux). Install Positron like you would any application. Note that Positron was in public beta in 2024 but has become stable and is actively developed – you may want to get the latest release. Launch Positron once installed – you’ll see an interface that looks like a blend of VS Code and RStudio. There’s a left sidebar (with icons for file explorer, search, source control, extensions, etc.), and the main area can be split into an editor and panels for console, terminals, plots, and the environment. Positron should automatically detect your R installation. You can verify this by opening a new R script in Positron and running a simple command: for example, in a new file type R.version.string and press Ctrl+Enter (or Cmd+Enter on Mac). This should send the command to the built-in R console and display the R version, confirming Positron found and launched R. (Positron’s default keybindings allow Ctrl/Cmd + Enter to run the current line or selected code, just like in RStudio.) If R doesn’t start or isn’t found, check Positron’s settings under Interpreters: you can specify the R binary path if needed, but in most cases Positron’s auto-discovery works out-of-the-box.

3. Verify R support and language tools: One advantage of Positron is that it comes pre-configured with rich R support, including the R kernel and language server. In plain terms, Positron bundles an R engine backend called Ark (which acts similarly to Jupyter’s IRkernel) to handle code execution and IntelliSense (auto-completion and inline help). This means you likely do not need to separately install the languageserver R package or the VS Code R extension – Positron has these capabilities built in. Ark leverages the R language server under the hood, providing auto-completion, function signature hints, and documentation pop-ups as you code. To test this, try typing a function name in an R script and see if suggestions appear. For example, create a new R script and type lm( and then hit the Tab key – you should see a tooltip with the function’s arguments and perhaps a snippet of its documentation, indicating that language intelligence is working. Another test: type mean( and see if a help bubble shows the function usage and description. If these work, Positron’s R language support is active. In case you don’t see such features, it’s worth installing the languageserver package in R (install.packages("languageserver")) because on rare occasions the bundled server might not start until that package is available. However, in our experience, a fresh Positron install with R should have these features working immediately. The key benefit is that you get a smart editor that can auto-complete object names, functions, and even suggest correct argument names, which speeds up coding and reduces errors.

4. [Optional] Enhanced R console (radian): In prior VS Code setups, many R users liked to use the radian console (a third-party R console with syntax highlighting and better history) for an improved interactive experience. Positron’s built-in R console already has some advantages: it’s tightly integrated with the editor (sending code from editor to console is seamless) and displays output in the IDE’s panels. There isn’t a straightforward way to swap out the console to radian within Positron, and frankly, most users won’t find it necessary. Positron’s console supports multiline editing, syntax coloring, and other niceties by default. So this step can be skipped for Positron. (If you really prefer a different console, you could still open an external terminal within Positron and run radian or plain R there, but you would lose some of the IDE integration). The takeaway is that Positron gives a good interactive R experience by default, suitable for prototyping code or inspecting data. If you also work with Python, Positron similarly provides a built-in Python console (using IPython/Jupyter technology), so you can execute Python code line-by-line just as you do with R. This is a boon for ML workflows where you might use Python for certain tasks (e.g., a scikit-learn model) alongside R – Positron can handle both in one IDE process, avoiding context-switching between tools.

5. Install additional R packages for development: Ensure you have the R packages that support R Markdown/Quarto execution and any other packages you’ll need for data analysis or modeling. A good starting point is to install rmarkdown (which will bring in knitr and other dependencies) and tidyverse (a collection of useful data science packages). Open the R console in Positron (for example, click on the Console panel or use Terminal > New R Terminal) and run: install.packages(c("rmarkdown", "tidyverse")). This will download and install the packages from CRAN. Since Quarto uses knitr (via rmarkdown) to execute R code chunks, having those packages ensures smooth rendering of documents. If you plan to do machine learning in R, you might also install the tidymodels suite (install.packages("tidymodels")), which includes packages like rsample, parsnip, recipes, and so on for modeling. The tidymodels meta-package is very useful for organizing ML workflows in a consistent grammar, and we will use it in later chapters for classification and regression tasks. Additionally, if you anticipate using specialized packages (for example, caret, randomForest, xgboost, or others often used in ML), you should install them now or as needed. On Windows, if you encounter issues installing packages that require compilation, you might need Rtools installed (a toolkit for building R packages). Many common packages have pre-compiled binaries on CRAN, so you may not hit that roadblock. It’s a good idea to update your packages regularly with update.packages(), but during a project you might hold off on updates to maintain consistency (we’ll discuss project-specific package management later).

Quarto Installation and Integration in Positron

6. Install Quarto CLI (if needed): Quarto is the open-source publishing system we’ll use for reproducible documents (it’s the successor to R Markdown, supporting multiple languages and output formats). Positron bundles the Quarto CLI out-of-the-box, so in most cases you do not need a separate installation. To verify, open a terminal in Positron (Terminal > New Terminal, which gives you a system shell, not the R console) and run quarto --version. You should see Quarto’s version printed (e.g., Quarto 1.3.340 or a later version). If that command reports an error or you get “command not found,” then something is amiss – possibly your Positron installation might be outdated or Quarto not in PATH. In that case, you can manually install Quarto by downloading it from quarto.org (they have installers for all platforms). But typically, installing Positron also sets up Quarto. Quarto is a separate tool under the hood (similar to how RStudio requires a separate Pandoc/LaTeX, etc.), but Positron’s integration makes it feel built-in. After any manual Quarto installation or update, again check quarto --version to ensure the system recognizes it. Having Quarto available is crucial because it’s what will render your .qmd documents into the final outputs (HTML, PDF, etc.).

7. Verify the Quarto extension in Positron: Along with the CLI, Positron includes the Quarto extension for the editor. This is what provides syntax highlighting, previews, and notebook-like execution within Quarto documents. You can confirm it’s enabled by clicking the Extensions icon in Positron’s sidebar (the square icon) and searching for “Quarto”. You should see an extension named “Quarto” which is likely already installed and enabled (Positron comes with it). This extension gives you an integrated visual preview and the ability to execute code cells inside .qmd files. In other words, it bridges Quarto with the IDE’s R/Python execution engine. For example, if you have a Python chunk in a Quarto document, the extension will route that to the active Python kernel and show output, whereas an R chunk goes to the R kernel – all coordinated seamlessly. If for some reason the Quarto extension isn’t present (perhaps in an older Positron build), you can install it via the Open VSX marketplace (Positron uses Open VSX, not Microsoft’s marketplace, for extensions). But this should not be necessary in current versions. One quick test after verifying installation is to open a sample Quarto file and see if the editor recognizes it (you should see markdown highlighting, and chunk options colored differently, etc.). We’ll create a Quarto document in a moment to test the full integration.

8. [Optional] Configure TeX for PDF output: If you will render Quarto documents to PDF, a LaTeX distribution is required (because Quarto/Pandoc uses LaTeX to typeset PDF). This is not specific to Positron – it’s a general requirement for converting Markdown to PDF. A lightweight way to get LaTeX on your system is TinyTeX (an R-based installer for TeX Live). You can install TinyTeX by running install.packages("tinytex") followed by tinytex::install_tinytex() in R. Alternatively, on Windows you might install MikTeX, or on macOS MacTeX, or use TeX Live on Linux via package manager. If you skip this step, you can still produce HTML and Word outputs, but PDF rendering will likely fail with an error about missing LaTeX. Many R users prefer TinyTeX because it’s small and Quarto will automatically use it. After installing a TeX distribution, try rendering a Quarto document to PDF to confirm it works (we’ll do a test render shortly). If the PDF creation works without errors, you’re set. If there are errors about missing packages, TinyTeX by default will auto-install them, so give it a moment or two. The first PDF compilation might be slow as it downloads needed LaTeX packages, but subsequent ones will be faster.

9. [Optional] Remote development setup: One powerful feature inherited from VS Code is the ability to run your work on a remote server while using the local IDE as the interface. Positron supports Remote SSH sessions, meaning you can connect to a remote Linux server (for example, a university HPC cluster or a cloud VM with GPUs for heavy ML training) and use Positron as if you were working locally. In a remote-SSH session, the Positron UI (your editor, etc.) runs on your local machine, but all the code, data, and computation happen on the remote machine. This is extremely useful for scaling up analyses or training large models without leaving the comfort of the IDE. To use this, you need SSH access to the server and an SSH client on your local machine. In Positron, open the Command Palette (Ctrl+Shift+P) and run “Remote SSH: Connect to Host…”. If it’s your first time, it will prompt for a host address (user@hostname) and possibly ask to install the VS Code server on the remote. Positron uses an open-source implementation of VS Code’s remote server to enable this. Once connected, you’ll see a new Positron window with a green “[SSH: hostname]” indicator in the status bar. Now, any folder you open or terminal you launch in that window operates on the remote server. You can edit files, run R or Python, and even render Quarto documents as if you were on the remote machine – because you are! This means you can harness a powerful server (with more CPU/RAM or special hardware) for your computations, but still do the editing and visualization through your local screen. For example, you could train a machine learning model on a large dataset stored on a remote server; your Quarto document in Positron can execute that training code remotely and produce figures, which then display in your local Plots pane. One thing to note: when you first connect to a remote, Positron will not have your extensions installed there. The interface will show separate sections for “Local – Installed” vs “Remote – Installed” extensions. If you need certain extensions on the remote (like the Zotero citation picker or Python/Jupyter support), install them in the remote context (Positron will handle this via the same Extensions panel). Remote development is an advanced but highly rewarding setup for team collaborations or heavy computations. If you’re working solely on a local machine for now, you can skip it. But keep in mind this capability exists – as your IB machine learning projects grow, you might leverage cloud computing, and Positron will scale with you. (As of late 2024, remote-SSH in Positron was marked experimental but “just works” for most cases. It’s expected to become a standard feature.)

10. Final check: At this stage, you should have R and Positron (with Quarto) set up. Let’s do a final verification of everything working together. In Positron, open the Command Palette and run “Quarto: Check”. This will print a diagnostic in the console, listing Quarto’s version, the Pandoc version, and whether R is detected. You should see something like “R: 4.3.1 – OK” in that output. Next, try creating a new Quarto document: click File > New File, and choose Quarto Document, or use the palette with “Quarto: New File”. Select an HTML or PDF article template if prompted (or the default empty document). A new .qmd file will open with some example content (if you used a template). Now, with that file active, find the Render or Preview button – usually a magnifying glass icon at the top-right of the editor. Click it to render the document. Positron will call Quarto to render, and if all is well, a preview pane will appear showing the formatted output (by default, an HTML preview within the IDE). You might see the document’s title, some headings, and even a plot if it was an example file. If this works, it confirms that R, Quarto, and Positron’s integration are all functioning. If there’s an error, read the error message: common issues could be missing R packages (e.g., if the document tried to use one you haven’t installed), or missing LaTeX (if you tried PDF without TeX installed). Resolve any issues (install packages, etc.) and render again. Finally, try an interactive execution: in the .qmd file, find a code chunk (delimited by ```). Click the little “play” icon to the left of the chunk (appears when you hover there) to run that chunk. The code will execute and its output will appear in the console or plots pane. This confirms that you can iteratively work with Quarto documents. Congratulations – your environment is ready for reproducible research! From here on, your analysis, whether a simple regression or a complex ML model, can be done in a way that others (and you in the future) can retrace every step.

1.2 Using Git and GitHub within Positron

One of the pillars of reproducible research is version control – tracking changes in your project’s files over time. The Git system is the de facto standard for this, and services like GitHub (or GitLab, Bitbucket, etc.) are popular platforms for hosting repositories and collaborating. Positron, like VS Code, has built-in integration with Git, making it convenient to manage version control without leaving the IDE. In this section, we’ll cover using Git inside Positron: creating a repository for your project, making commits, branching, and syncing with a remote repository on GitHub. By keeping your analysis under version control, you ensure that every change to your code or data is recorded and can be rolled back or scrutinized. This greatly enhances transparency (for example, a collaborator or reviewer can see exactly what changed from one version of the analysis to the next) and facilitates teamwork.

Initial Git setup: First, make sure Git is installed on your system. On Windows, if you didn’t have it already, installing Positron might not include Git by default, so you should install Git for Windows from git-scm.com (this will also provide a Bash shell which is handy, and Positron’s terminal can use it). On macOS, Git is usually available via the Xcode Command Line Tools. On Linux, install the git package via your package manager if needed. Once Git is available, you should perform a one-time configuration of your user identity. Open a terminal and run:

git config --global user.name "Your Name"
git config --global user.email "your.email@domain.com"

This sets your name and email, which will be associated with your commits (and show up on GitHub). It’s important for attribution and collaboration. If you skip this, Git will nag you when making the first commit.

Now, in Positron, the left sidebar has a Source Control view (an icon that looks like a branch or Y-shape). If you open a folder that is already a Git repository, Positron will automatically detect it and show the repository status here. If your project folder is not yet under Git, the Source Control panel will say “No source control providers registered.”

Creating and Initializing a Repository

Let’s start a new Git repository for a project:

Starting a new repo from scratch: Suppose you have a project folder (e.g., MLIB-chapter1/) with some files (maybe an R script or a Quarto doc you just created). Open that folder in Positron (File > Open Folder…). Then click on the Source Control icon. You should see an option to Initialize Repository (sometimes a blue button or a small initialization icon). Click it. Under the hood, this runs git init in the folder, creating a hidden .git directory that tracks version history. After initialization, Positron’s Source Control view will list your files as “untracked” (with a U icon for each). Now you have a local Git repository – but nothing is committed yet.
Cloning an existing repo: If instead you want to start by cloning an existing repository (for example, a template or an assignment repository provided by an instructor on GitHub), you can do so directly from Positron. On the Welcome page or Source Control view, select Clone Repository. Enter the repository URL (for instance, the HTTPS URL from GitHub like https://github.com/YourUsername/YourRepo.git). Positron will ask where to save it on your computer. Choose a location, and it will clone the repo (download all the files and commit history). Once done, it likely will automatically open the folder for you. You’ll now see the files and Git history in Positron.

Whether you inited a new repo or cloned an existing one, the next steps are to start recording changes with commits.

Basic Git Workflow: Staging, Committing, and Pushing

Git’s workflow can be summarized as: edit files → stage changes → commit → push (to remote). Here’s how to do those in Positron:

Stage changes: After editing or adding files, the Source Control panel will show them under CHANGES. A new file appears with a U (untracked), a modified file with M, etc. You need to stage the changes you want to include in the next commit. In Positron, hovering over a file in the changes list will reveal a + icon. Click it to stage that file. The file moves to a STAGED CHANGES section. Alternatively, you can right-click the file and choose Stage Changes. If you have many files and want to include all, there’s an option to Stage All Changes (the “…” menu in Source Control or a button). Staging is like selecting which changes will go into the snapshot.
Commit: Once staging is done, you commit the changes to the repository’s history. In the Source Control view, there’s a text box at the top for a commit message. Type a concise message that describes what you did, e.g., “Initialize repository with README and setup script” or “Added data cleaning step for country dataset”. Good commit messages help others (and you) understand the purpose of changes. Now either press Ctrl+Enter (Cmd+Enter on Mac) or click the ✔️ Commit button (usually appears after you start typing a message). After committing, the staged changes clear (they’re now recorded in history), and the file statuses might reset to unmodified (if they were new, they’ll disappear from the list; if modified, they’ll no longer be in the changes list because working directory matches the commit).
Connect to GitHub (remote): If you created a new repo locally, you might want to publish it to GitHub for backup and collaboration. On GitHub, create a new repository through their web interface (no need to initialize it with anything on GitHub side, since you have local files). Then in Positron, you have two approaches:
1. Using the terminal: Open a terminal and run:
```
git remote add origin https://github.com/YourUsername/YourRepo.git  
git branch -M main  
git push -u origin main  
```
  This adds a remote named “origin” pointing to your GitHub repo, renames your local branch to main (if it wasn’t already), and pushes the commits. The -u flag sets the upstream so that future git push knows where to go.
2. Using Positron UI: Often, after your first commit, Positron will show a Publish Branch button or prompt in the Source Control view. If you click that, it will guide you to log in to GitHub (Positron might open a browser for GitHub authentication or use a personal access token) and then create the remote repo or link to an existing one. It streamlines the above terminal steps.
Once linked, the bottom left of Positron’s status bar typically shows the branch name (e.g., “main”) and an icon for synchronization. Also, your Source Control view will indicate if you are ahead or behind the remote after commits.
Push commits: Pushing sends your local commits to the remote repository on GitHub. If you followed the steps above, pushing the main branch is as simple as clicking the sync icon (🔄) or using the “… menu” and selecting Push. After a successful push, your changes are now visible on GitHub. If you go to the GitHub page for the repo, you’ll see your files and commit message. It’s good practice to push relatively often, especially after major changes or at end of day, so that your work is backed up and teammates can see it.
Pull updates: If you’re working with others (or even pulling from another machine you use), you need to incorporate changes made elsewhere. GitHub will have commits from others once they push. Positron indicates if your local branch is behind (e.g., it might show an down-arrow with a number). Click the down-arrow icon or do Pull from the menu to fetch and merge those changes into your local copy. After pulling, your local files update to include the new commits. Always consider pulling before you start a day’s work to ensure you have the latest version and to minimize potential conflicts.

(Under the hood, the UI actions correspond to git commands: staging = git add, commit = git commit, push = git push, pull = git pull. You can always open a Positron integrated terminal and run these manually if you prefer or need advanced options, and Positron will still pick up the changes.)

With these basics, you can track the evolution of your project. For example, if you make a mistake or want to see what changed between two versions of your analysis script, you can use Git’s history and diff tools. Positron’s Source Control panel allows you to click a file and see the diff (differences) between the working copy and the last commit, or between commits. This is extremely useful for debugging: if a result changed, you can pinpoint exactly which code change caused it by looking at diffs. It’s also invaluable for machine learning experiments – you might commit each significant experiment or refactoring with a message, making it easier to recall what you tried (e.g., “Commit: switched to random forest model with 100 trees” and later “Commit: added normalization to features”). If a colleague says, “the previous version of the model was performing better,” you can retrieve that version from Git and compare.

Branching and Managing Changes

A branch in Git is like an independent line of development. The default branch when you initialize is typically called main (or master in older repos). Branching is a powerful way to isolate changes until they’re ready to be merged. In a collaborative setting, each person might work on their own branch for a new feature or analysis section. Even when working alone, you might use branches to try out a new approach (say, a different model or a major code refactor) without disturbing the stable main analysis.

In Positron, creating and switching branches is easy:

Creating a new branch: Look at the bottom status bar; you’ll see the current branch name (e.g., “main”). Click it, and a branch menu appears. Choose “Create new branch”, then type a name, like “experiment-new-features” or “refactor-data-import”. Use a descriptive name for clarity. Press Enter, and Positron will create the branch and switch you to it. This is equivalent to git checkout -b new-branch-name. Now any commits you make will be on this new branch, separate from main. The Source Control view will still track changes the same way, but be mindful of which branch you’re on (the status bar always shows it). It’s often wise to commit or stash any uncommitted changes before switching branches, to avoid confusion.
Working on a branch: On your new branch, go ahead and make changes. For instance, if you’re trying a different machine learning model, you might edit the R script to implement the new model. You can commit those changes on this branch as you go. The main branch remains untouched by these commits. This means if your experiment doesn’t pan out, you can simply switch back to main and abandon the branch, without cluttering main’s history. Conversely, if the experiment is successful, you’ll want to integrate it back.
Switching branches: To switch back to an existing branch (like main), click the branch name in the status bar and select the branch you want (e.g., “main”). If you have uncommitted changes on the current branch that conflict with the target branch, Git will warn you. It’s best to have a clean working state (commit or stash changes) before switching. When you switch, the files in your project will update to reflect the state of that branch. Positron will essentially load the project as it was at the last commit of that branch. This might mean files appear to “undo” changes if those changes were only on another branch – don’t panic, it’s just Git showing you the other branch’s version. Everything on your experiment branch is still there; you just need to switch back to see it.
Merging branches: When you’re ready to bring the changes from a branch into another (typically into main), you perform a merge. One way: switch to the target branch (e.g., switch to main), then in the Source Control “…” menu choose Merge Branch… and select your feature branch. Alternatively, use the terminal: git checkout main then git merge experiment-new-features. If there are no conflicts, the changes will merge and you can commit the merge (if not done automatically). If there are merge conflicts (meaning both branches edited the same part of a file differently), Positron will highlight these conflicts in the files, showing markers like <<<<<< HEAD and >>>>>> branch-name. You will need to manually edit those sections to decide what the merged content should be, then mark the file as resolved (stage it) and commit. Positron’s editor has shortcuts to accept one side or the other or combine changes. For example, if you and a colleague both edited the same paragraph in the Quarto document, you’ll see both versions and choose what the final text should be. Merging can be complex, but if you frequently commit and pull (so branches don’t diverge too much) it’s usually manageable. In our context, perhaps you created a branch to add a new figure to the report; meanwhile, the main branch had minor text edits. When merging, you’ll reconcile any overlapping changes.
Deleting branches: After a successful merge, you can delete the branch (since its work is now integrated). This can be done with git branch -d branchname in terminal or possibly via the UI branch menu. Deleting a branch in Git only deletes the pointer, not the commits (they remain in history). It’s mainly housekeeping to avoid dozens of stale branch names. On GitHub, if you merged via a Pull Request (discussed next), it often gives an option to delete the branch.

Using branches helps maintain a clean main line of development, which ideally always contains a working version of your project. For example, you might stipulate that the main branch reflects the version corresponding to the paper’s current draft, while experimental branches are used for trying alternative analyses or addressing reviewer comments. This way, you can work on new ideas without risk to the main document and only merge them when they’re finalized and checked.

Pull Requests and Collaborative Workflows

When using a hosted Git service like GitHub for collaboration, a common practice is to use Pull Requests (PRs). A PR is a feature on GitHub (and similar platforms) that lets you formally propose changes from one branch to another and have a discussion around those changes. It’s essentially a request: “Please pull my changes into the main branch.” PRs are extremely useful for code review and team communication. Even in academic collaborations, treating each major addition or revision as a PR can improve the quality of the project by involving multiple eyes on the changes.

Positron can integrate with GitHub’s PR system via extensions (e.g., the “GitHub Pull Requests and Issues” extension, which might come pre-installed or can be added via Open VSX). If you sign in to GitHub through Positron, you can manage PRs inside the IDE. However, you can also use the GitHub web interface for PRs, which is quite user-friendly.

Here’s a typical PR workflow:

Creating a PR: Let’s say you created a branch feature-new-model and committed a series of changes implementing a new machine learning model in your analysis. You push this branch to GitHub (git push -u origin feature-new-model). On GitHub, you’ll see a prompt “Compare & pull request” for that branch. Click it to open a PR. You’ll be asked to provide a title and description. For example: “Add random forest model for comparison” and describe the changes, results, etc. Assign reviewers if applicable. Alternatively, within Positron, if the GitHub PR extension is active, you might see a prompt or a button to create a PR from the branch. The extension will let you fill in the details and submit without leaving the IDE. In either case, a PR is now open on GitHub which anyone with access can review.
Reviewing and discussing: Team members (or your future self) can look at the PR on GitHub, which will show all the commits and a diff of all changes. They can comment on specific lines (e.g., “Is this parameter tuned properly?” or “Great use of ggplot here.”) or overall. You can respond to feedback by making further commits to the same branch – those new commits will automatically be added to the PR. This fosters an iterative improvement: maybe a collaborator spots a mistake in a data transformation; they comment, you fix it on your branch and push, and the PR updates. All this conversation is recorded, contributing to the documentation of the project’s development.
Merging the PR: Once everyone is satisfied (or if you’re working solo and you consider the feature complete), the PR can be merged. Usually, you or a repo maintainer will click “Merge pull request” on GitHub. There are a few merge options (create a merge commit, squash commits, etc.), but the default is to merge with a commit that brings all changes into main and closes the PR. After merging, don’t forget to pull the changes into your local main branch if you weren’t on it. Positron will then update your local files to match. If you opened the PR from within Positron’s extension, it may also offer a merge button and will handle the sync for you.
Cleaning up: After merge, you can delete the branch (GitHub usually offers a button to do so on the closed PR page). This keeps the repository tidy. If using Positron, switch back to main and delete the branch locally (Positron’s branch menu or git branch -d). This ensures you don’t accidentally keep working on an old branch that’s already merged.
Issues and project management: GitHub also provides Issues for tracking tasks/bugs and Projects/Boards for organizing work. If you use those, Positron’s GitHub integration can show issues (and even allow creating or commenting on them). For instance, you might have an issue “Improve model accuracy for region X” or “Update data source for 2025 figures.” You or collaborators can reference issues in commits or PRs (using #issue-number in commit messages or PR description), which creates links and traceability. While not strictly necessary for reproducibility, using issues helps ensure that planned analyses or revisions don’t get forgotten. It complements reproducibility by adding context – an issue discussion might record why a certain analysis decision was made (e.g., “we decided to use random forest after discussing issue #12 about non-linearity in data”).

In sum, using Git and GitHub within Positron allows you to treat your research project like a true software project – which it essentially is when you’re writing code for analysis. This practice is increasingly encouraged by top journals. Some journals in economics and management now require authors to submit code and data, and having a GitHub repository ready makes this straightforward. It also means anyone who wants to verify or build upon your work can easily access the exact code that produced each result, thus enabling reproducibility. For teaching or team projects, it means no more emailing code files around (which inevitably leads to confusion like “Final_final2.R”). Instead, there’s one central repository where the latest version lives, and a history to consult if needed.

Before moving on, a quick tip: remember to version control not just your code, but also important metadata like the Quarto document and bibliography. You generally should not put large raw data files in Git if they’re above a few dozen MB – Git isn’t great with big binaries – but you can keep small data files under version control. For larger data, you might instead provide a script to download them or use Git LFS. For example, if your analysis uses a large panel dataset, you could store a subset or example in the repo, and instructions for obtaining the full data (perhaps due to license restrictions). Everything needed to regenerate results (except the large data) should be in the repo. This way, the environment is as reproducible as possible.

1.3 Managing R Packages and Environments in Positron

Managing packages in R is a critical part of making your research reproducible. Different package versions can yield slightly different results (or even break code). Positron does not introduce a new way of handling R packages – it relies on the standard R library mechanism – but it’s worth discussing best practices for package management in a reproducible workflow. In this section, we’ll cover how to install and update packages within Positron, and how to use tools like {renv} to create project-specific libraries. We will also touch on handling Python packages if your analysis uses Python (since Positron supports both, you might have a scenario where R calls Python, e.g., via reticulate, or you have separate Python scripts for certain tasks).

Installing and Updating R Packages

As mentioned earlier, you can install packages by running install.packages("packagename") in the R console within Positron. For example, to install the car package (Companion to Applied Regression) you’d run install.packages("car") and watch the console as it downloads and installs. This is identical to how you’d do it in RStudio or base R. There isn’t a GUI menu for packages in Positron as of 2025 (RStudio’s Packages pane doesn’t have an equivalent here yet), but using code to install is actually more reproducible (since you can include those commands in a setup script or Quarto document).

Installing multiple packages: You can install several at once by providing a vector, e.g., install.packages(c("dplyr", "ggplot2", "xgboost")). If you attempt to use a package that isn’t installed, R will throw an error. A quick way to install missing packages in a script is to use a snippet like:

packages <- c("dplyr", "ggplot2", "xgboost")
install.packages(setdiff(packages, rownames(installed.packages())))

This will install only those not already installed. This could be included in your Quarto doc’s setup chunk (with caution – in a rendered document you might not want to auto-install, but it’s useful for scripts).

Updating packages: Over time, you may want to update to get bug fixes or new features. You can do this with update.packages() which will prompt you to select CRAN mirrors and show packages that have newer versions. Alternatively, tools like {checkmate} or RStudio’s package manager (if you have that separately) can help. In Positron, since there’s no dedicated update UI, you’d run the command or update via command line. Be mindful that updating packages in the middle of a project can change results (especially for ML packages if algorithms change or default hyperparameters differ). A reproducible approach is to freeze package versions once you start writing up results. This is where {renv} comes in, which we’ll discuss shortly.

Removing packages: If needed, remove.packages("packagename") will uninstall a package. This is rarely necessary unless you want to downgrade a package by removing the newer version and installing an older one, or you’re freeing space. Removing doesn’t erase the package from any lockfiles or records you keep, so you’d also update those records.

Using different CRAN-like repositories: In some cases, you might use MRAN (Microsoft’s CRAN snapshot service) or Posit’s Package Manager to get deterministic package sets. For instance, using {checkpoint} or {groundhog} packages allows you to install packages as of a particular date, which improves reproducibility across time. However, as of 2025, MRAN (which let you pin to daily snapshots) has been superseded by Posit’s Public Package Manager with snapshot capabilities. If absolute reproducibility is needed, you can document that all packages were from CRAN as of a certain date or use install.packages(..., repos="https://packagemanager.posit.co/cran/2023-12-01") to get versions as of that date. This is somewhat advanced; an easier path is using renv which essentially automates capturing exact versions.

Python packages (if using Python in Positron): Since Positron supports Python, you might be using libraries like pandas, numpy, scikit-learn, etc., for certain tasks. If you run a Python chunk or open a Python terminal, you’ll need those packages in your Python environment. Positron itself doesn’t manage Python environments, so you should manage them via pip or conda as usual. One nice thing is that if you use R’s reticulate package to call Python from R (common in RMarkdown/Quarto scenarios to leverage a Python library in an R project), reticulate can be configured to use a virtualenv or conda env, and you can include a requirements.txt or environment.yml to specify Python deps. But going deep into Python is outside our scope here; just note that reproducibility concerns apply to Python libraries as well – you’d want to record which version of, say, TensorFlow you used, if your IB project involves training a neural network in Python.

Project-Specific Libraries with renv

For truly reproducible research, it’s not enough to know which packages you used; you need to know which versions. This ensures that if someone tries to rerun your analysis in the future, they can get the same results even if newer versions of packages have been released since. R’s solution for this is the {renv} package (short for reproducible environment). It allows each project to have its own library of packages and a lockfile that records exact versions (and sources) of each package used.

Using renv in Positron is straightforward, as it doesn’t depend on the IDE. You would do the following (usually at project start or once you have the major packages installed):

Install renv (if not already): install.packages("renv").
Initialize renv for the project: call renv::init() in the R console while your project folder is open. This will:
- Create a new folder renv/library in the project, where project-specific package versions will live.
- Write an renv.lock file listing all the current packages (and their versions) you have in your default library that are being used in the project (it tries to detect package usage, or at least captures what’s currently installed).
- Switch the R session to use the project library (so .libPaths() is changed; you’ll see messages about this).
- It may also offer to snapshot the current state.
After init, moving forward, any install.packages() you do will install into renv/library/project-name/ rather than your user library. This isolates the project. If you have multiple projects requiring different package versions, renv keeps them separate so they don’t conflict.
As you add new packages or update them, run renv::snapshot() to update the lockfile. The lockfile is a plain JSON that lists each package, version, and source (CRAN or GitHub, etc.). This file should be committed to Git! It’s the record of your environment.
When someone else (or you on another machine) gets the project, they can run renv::restore() to install the exact versions from the lockfile. Renv will fetch packages from CRAN archives or cache or wherever necessary to match versions. This way, the analysis should run the same. Keep in mind, some packages (especially in ML) have non-deterministic algorithms or rely on system-specific libraries (e.g., XGBoost using different CPU instruction sets), but version consistency is a big step toward reproducibility.

In Positron, renv works as usual. One minor thing: when renv activates (which it does automatically when you open an renv-enabled project), you’ll see the R console showing the path to the renv library. Positron’s Environment pane will only list packages in that library for autocompletion, etc. If you try to use a package that’s not installed in the renv library, you need to install it (even if you had it globally, because renv library is separate).

Using renv does add a bit of overhead (each project may duplicate packages unless you use renv’s global cache feature which links to a central store to avoid actual duplication of files). But the benefits for a long-term project are huge: you can return to it a year later and restore the environment exactly. It’s highly recommended for collaborative projects or any research that might be audited or extended by others.

If you choose not to use renv, at the very least you should document your session info. For example, at the end of your Quarto document or in an Appendix, include the output of sessionInfo() or the more refined sessioninfo::session_info(). This prints R version, OS, and each package with version. That way, someone could manually assemble an environment that matches (it’s less convenient, but still possible). In fact, many journals now encourage authors to provide session info in appendices for transparency.

For Python, a similar philosophy applies. If you used Python, provide a requirements.txt or environment.yml. If your analysis is mostly R but with a bit of Python via reticulate, renv can snapshot the Python dependencies too (it can integrate with Python requirements). This is advanced, but worth mentioning.

Other Package Management Tips

Positron currently doesn’t have a GUI for managing packages (like installing from CRAN with a click). This is by design minimalistic, and arguably encourages scripting these actions. However, some conveniences: you can use the Command Palette in Positron and search for “R: Install Package” – there is a command that will prompt you for a package name and run install.packages for you. It’s essentially a shortcut. Similarly, there’s “R: Update Packages” which calls update.packages. These come from the R extension capabilities (provided by Ark in Positron). Feel free to use them, but remember to also note those actions in code or documentation if they affect your analysis.

One pitfall to avoid: Do not manually change objects in the Environment pane expecting that to carry over. In RStudio one could, say, tweak a dataframe by editing it in the spreadsheet view. That’s not directly possible in Positron (there’s a data viewer, but it’s read-only). This is actually good: it forces you to make changes via code, which is then recorded. Always perform data cleaning or transformations with code, not by manual steps. This ensures that someone else can run the raw data through the same code and get the same cleaned data.

Finally, if at some point you need to break out of Positron for package management (perhaps installing an older package from source, or debugging a library loading issue), you can always run R in a terminal or use RStudio just for that step. The resulting library will still be usable by Positron. But these cases should be rare. The typical flow is: you write code, get an error like “package XYZ not found”, you install the package in the console, maybe add that install line to a setup script (or use renv to lock it), and proceed.

By diligently managing your packages and documenting their versions, you eliminate a huge class of “it works on my machine” problems. For example, if a future reader of your work tries to rerun your code and it fails, the first thing they might check is package versions. If you have an renv lockfile or session info, they can compare and identify any discrepancies. This is especially relevant for ML results, where changes in a modeling package (like an update to how random forest calculates variable importance) could lead to slightly different outputs. Locking the version avoids any ambiguity about what was used.

1.4 Creating and Rendering Quarto Documents in Positron

Now we bring it all together: writing a literate programming document that combines prose, code, and results. Quarto is the tool for this job. A Quarto document (.qmd file) is a markdown file with embedded code chunks that can produce output (tables, figures, etc.) which gets included in the final rendered document. In Positron, Quarto is first-class: it has integrated preview and execution. Let’s go through how to create a Quarto document in Positron, structure it, and produce outputs such as HTML, PDF, or Word. Along the way we’ll demonstrate with a simple example analysis to illustrate key features (like cross-references, citations, and embedding plots).

Creating a New Quarto Document

To create a Quarto document in Positron, you can either use the menu or command palette:

Via menu: File > New File > Quarto Document (if available). Positron might prompt you to choose a document type (e.g., Article, HTML vs PDF, etc.). If you choose one, it will pre-populate the file with a sample title, author, and a bit of content.
Via command palette: Press Ctrl+Shift+P and start typing “Quarto New”. Select Quarto: New File. This does similar steps, possibly offering template options.

However you do it, save the new file with a .qmd extension, e.g., analysis.qmd. At the top of a Quarto file is typically a YAML header delineated by --- lines. For example:

---
title: "Education and Saving Rates Across Countries"
author: "Your Name"
date: "2025-07-08"
format:
  html: default
  pdf: default
editor: visual
bibliography: refs.bib
csl: apa.csl
---

Let’s break this down:

title/author/date: self-explanatory, they’ll appear in the document. Use date or omit it; format is flexible.
format: here we listed two formats (HTML and PDF) under the format key. This means when we render, Quarto will be prepared to output both. If you just want one format by default, you could specify one (or Quarto defaults to HTML if not specified). We can always render to others via command line or options.
editor: visual is an option that if you open the file in RStudio or certain contexts it would default to the WYSIWYG editor. In Positron, you can ignore that or set it; it doesn’t affect source mode.
bibliography: points to a BibTeX file (refs.bib) that contains reference data for citations.
csl: specifies a Citation Style Language file to format citations. Here we put apa.csl which would format references in APA 7th edition style (assuming we have the APA CSL file in our project directory).

After the YAML, you write the document in markdown interspersed with code chunks. Let’s outline a simple example in an international business context. We will examine whether countries with younger populations tend to have lower saving rates, illustrating a concept from development economics (the life-cycle saving hypothesis by Modigliani). We’ll use R’s built-in LifeCycleSavings dataset, which has data on 50 countries around 1970 including sr (savings rate) and pop15 (percentage of population under age 15). Our Quarto document will include some text, a figure, a table, and an inline equation or two, plus a citation to the original theory.

Here’s how we might structure the Quarto content (following the YAML):

````markdown # Introduction

Does demographic structure impact economic outcomes? According to the life-cycle savings hypothesis (Modigliani, 1986), countries with younger populations tend to have lower savings rates, as working-age adults support more dependents and thus save less (this is offset later when populations age). We examine this theory by analyzing data on national saving rates and age demographics.

Modigliani (1986) was a Nobel-winning economist who formalized the life-cycle hypothesis of saving – we cite his lecture here as an example. (modigliani1986?)

Data

We use the LifeCycleSavings dataset (built into R), which contains data on the aggregate personal saving rate and demographic indicators for 50 countries around 1970. The key variables are sr (savings rate as percentage of disposable income) and pop15 (percentage of population under 15 years old). For context, pop15 ranges from about 23% (in richer, older countries) up to 47% (in very young populations).

summary(LifeCycleSavings[c("sr", "pop15")])
#        sr            pop15     
#  Min.   : 0.60   Min.   :21.4  
#  1st Qu.: 6.97   1st Qu.:26.2  
#  Median :10.51   Median :32.6  
#  Mean   : 9.67   Mean   :35.1  
#  3rd Qu.:12.62   3rd Qu.:44.1  
#  Max.   :21.10   Max.   :47.6

Table 1. Summary statistics for savings rate (sr) and youth population percentage (pop15) across 50 countries.

Table 1.1: Descriptive stats for saving rate (sr) and youth population (pop15) for 50 countries.

	Savings Rate (%)	Pop under 15 (%)
	Min. : 0.60	Min. :21.4
	1st Qu.: 6.97	1st Qu.:26.2
	Median :10.51	Median :32.6
	Mean : 9.67	Mean :35.1
	3rd Qu.:12.62	3rd Qu.:44.1
	Max. :21.10	Max. :47.6

The summary above shows that the average saving rate is around 11% of income, and the average percentage of population under 15 is about 35%. We will investigate the relationship between these two variables next.

Analysis

First, let’s visualize the relationship. Figure Figure 1.1 is a scatterplot of youth population percentage vs. saving rate:

plot(LifeCycleSavings$pop15, LifeCycleSavings$sr,
     xlab="Population under 15 (%)", ylab="Savings Rate (%)",
     pch=19, col="steelblue")
abline(lm(sr ~ pop15, data=LifeCycleSavings), col="tomato", lwd=2)

Figure 1.1: Scatterplot of youth population (%) vs. saving rate (%) for 50 countries.

As we see in Figure Figure 1.1, there is an evident negative relationship: countries with a higher percentage of young people tend to have lower savings rates. The red line is a linear fit indicating this trend.

Next, we quantify the relationship with a simple linear regression:

model <- lm(sr ~ pop15, data = LifeCycleSavings)
summary(model)
# 
# Call:
# lm(formula = sr ~ pop15, data = LifeCycleSavings)
# 
# Residuals:
#    Min     1Q Median     3Q    Max 
# -8.637 -2.374  0.349  2.022 11.155 
# 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  17.4966     2.2797    7.67  6.8e-10 ***
# pop15        -0.2230     0.0629   -3.55  0.00089 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 
# Residual standard error: 4.03 on 48 degrees of freedom
# Multiple R-squared:  0.208,   Adjusted R-squared:  0.191 
# F-statistic: 12.6 on 1 and 48 DF,  p-value: 0.000887

The model estimates that for every 1 percentage point increase in the youth population, the national saving rate decreases by about -0.22 percentage points on average. This coefficient is statistically significant (p ≈ < 0.001), supporting the hypothesis that younger populations are associated with lower saving.

To ensure reproducibility, we set a seed (not actually needed for this simple regression, but good practice for any analysis involving randomness):

set.seed(42)

(Setting a seed is crucial in machine learning workflows whenever there’s randomness, such as train-test splits or model initialization, to allow exact replication of results.)

Discussion

Our findings align with the life-cycle hypothesis: countries with very youthful populations (e.g., many developing countries in 1970 with high fertility rates) show lower savings, whereas countries with older populations save more, presumably because a larger share of individuals are in prime saving years. This simple analysis is, of course, bivariate. In reality, many factors influence savings (income levels, growth expectations, financial institutions, etc.), and age structure is just one piece. A more complete model might include those factors. However, for this demonstration, the focus is on workflow: we have created a fully reproducible analysis document.

Anyone reading this document can see the raw data summary (Table 1), the plotted relationship (Figure 1), and the model output, and they can trace each result to the exact code that produced it. Moreover, using tools like Git and Quarto ensures that if we extend this analysis – for instance, adding data for more years or applying a machine learning model to predict savings rates – all changes will be documented. This transparency builds trust in the results. As Aguinis et al. (2017) note, improving reproducibility in IB research requires addressing practices that hide trial-and-error; here, we’ve shown every step explicitly:contentReferenceoaicite:20.

Finally, the entire environment (R version, package versions) is documented, so this analysis can be reproduced even years later. For example, we used R 4.5.1 and the packages listed in the References (with versions). Ensuring such stability is especially important when methods or software might change – for instance, if a newer version of a machine learning algorithm tweaked its default settings, having our code and package versions pinned means we (or others) can get the original results for comparison.

1.5 Conclusion

In this chapter, we demonstrated how to set up and use Positron, Quarto, and Git for a reproducible research workflow, with an example drawn from international business and development economics. We installed the necessary tools, configured Positron for R (and Python) work, and walked through creating a literate programming document that integrates analysis and narrative. By using Positron, we benefit from an IDE that supports multi-language data science, allowing us to, for example, mix R and Python seamlessly and leverage modern editing features (like IntelliSense) for efficient coding:contentReferenceoaicite:21. Quarto enabled us to present our analysis with clarity – combining text, tables, figures, and citations in a single document that can be rendered to publication-quality output (HTML for quick sharing, PDF for formal reports, or Word if needed for collaboration with colleagues who prefer track changes). Git version control provided a safety net and collaboration framework, ensuring that every change is tracked and that we can use branches/PRs to manage contributions and experiments.

For machine learning in international business, these tools are not just conveniences but essential safeguards. ML projects often involve playing with different data transformations, model architectures, or tuning parameters. It’s easy to lose track of what was done unless one practices strict reproducibility. By scripting everything (no manual clicks) and using version control, we make certain that we can always retrace our steps. If someone asks, “How did you preprocess the data for the neural network model?” we can point to the exact code in the Quarto document or a committed script. If a collaborator is skeptical about a result, we can offer the repository for them to run themselves, or we can roll back to a previous state to double-check if a recent change introduced an issue. These capabilities mitigate the concern raised in the IB reproducibility editorial about undisclosed “several trial-and-error steps”:contentReferenceoaicite:22 – here, all steps (even the exploratory ones on branches) are potentially disclosed in the code history.

Another advantage of this integrated workflow is in producing outputs for dissemination. With Quarto, one can easily produce an HTML report that could be published online (for instance, on GitHub Pages or Posit’s Connect server), allowing interactive exploration (Quarto can embed interactive plots or even shiny apps). Or we can produce PDF/Word outputs that comply with academic journal formats. We used an APA citation style to format references, which is common in social sciences (including many IB journals). The citation integration (via Zotero or other reference managers) ensures that all our citations are consistent and complete. Instead of manually formatting bibliography entries, we let Quarto handle it, reducing the chance of errors. And with tools like the Zotero extension, adding a new reference to support a statement is as easy as a keyboard shortcut and search:contentReferenceoaicite:23, which encourages proper citation practices (thus our document cites both the original theory and a recent editorial on reproducibility).

From a practical standpoint, Positron being based on VS Code means it’s highly extensible. As you become more comfortable, you can customize the editor, install additional extensions (for example, spell check for prose, or linters for code to maintain style). It also means Positron will continue to benefit from improvements to VS Code’s core and Posit’s enhancements for data science. The fact that Positron supports remote development and multiple languages makes it a future-proof choice for projects that might grow in scope. For instance, maybe your IB analysis starts in R but later you incorporate a deep learning model in Python – Positron can handle that in the same project, and Quarto can still be the glue that holds the narrative together with results from both R and Python.

The combination of literate programming, version control, and modern IDEs addresses many of the reproducibility challenges in computational research. In the words of research methodology, it makes our work transparent, traceable, and trustworthy. By adopting these practices early (as a student or researcher), one develops habits that not only make replication by others easier but also make one’s own work more efficient and less error-prone. In the context of machine learning, where experiments can be complex and data large, having a disciplined workflow can save you from costly mistakes (like losing a model because you forgot how you derived it) and allow you to concentrate on the insights rather than worrying about logistics. As we move into subsequent chapters on regression, classification, unstructured data, and geospatial analysis, these same principles will apply. Each chapter will build on this foundation, demonstrating how to implement specific methods (from linear models to neural networks) in a reproducible manner. By the end of this textbook, you should not only have learned various machine learning techniques for international business problems, but also how to document and share your analyses so that others can trust and verify them – contributing to a more reproducible and robust scientific enterprise.

References

Aguinis, H., Cascio, W. F., & Ramani, R. S. (2017). Science’s reproducibility and replicability crisis: International business is not immune. Journal of International Business Studies, 48(6), 653–663:contentReferenceoaicite:24.
Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature News, 533(7604), 452–454:contentReferenceoaicite:25.
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d’Alché-Buc, F., Fox, E., & Larochelle, H. (2021). Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research, 22(211), 1–20:contentReferenceoaicite:26.
Jumping Rivers (2024). Positron vs RStudio – Is it time to switch? (Blog post by T. Roe, Dec 5, 2024). – Discusses Positron as a “polyglot” IDE and differences from RStudio:contentReferenceoaicite:27:contentReferenceoaicite:28.
Vuorre, M. (2025). How to Add Citations from Zotero to Quarto Documents. (Blog post, June 6, 2025). – Describes the VS Code Zotero extension for inserting citations:contentReferenceoaicite:29.
Posit (2025). Positron Documentation – Remote SSH. – Official docs on using Positron’s remote development feature:contentReferenceoaicite:30.
Quarto Project (2025). Using Quarto with Positron. – Official Quarto documentation on Positron integration:contentReferenceoaicite:31.
Appsilon (2024). Introducing Positron: A New, Yet Familiar IDE for R and Python. (Blog post by D. Radečić, July 4, 2024). – Overview of Positron’s features and multi-language support:contentReferenceoaicite:32:contentReferenceoaicite:33.
Marwick, B. (2017). Computational reproducibility in archaeological research: Basic principles and a case study of their implementation. Journal of Archaeological Method and Theory, 24(2), 424–450. (An example of reproducible research in practice, advocating for sharing code and data).
Modigliani, F. (1986). Life cycle, individual thrift, and the wealth of nations. American Economic Review, 76(3), 297–313. (Classic exposition of the life-cycle hypothesis of saving, cited in our example analysis).

# Using Positron for Reproducible Research in R Reproducible research is a cornerstone of scientific integrity – it means that other researchers (or you in the future) can **duplicate the results of a study using the same data, code, and methods**. In recent years, a broad *“reproducibility crisis”* has been recognized across sciences: a 2016 survey in *Nature* found that over 70% of researchers had failed to reproduce someone else’s results, and even about half failed to reproduce their own prior results. Social sciences and management fields, including **international business (IB)**, are not immune to this crisis. In fact, top IB scholars have noted that it’s not surprising many findings don’t hold up because common methodological practices allow researchers to “search for a maximally predictive model” through trial-and-error and not report those exploratory steps. This kind of **undisclosed data mining** or “capitalization on chance” can lead to results that only appear significant by luck and cannot be reproduced in new data (Aguinis, Cascio, & Ramani, 2017). The stakes are high: if business and policy decisions are informed by unreproducible analyses, trust in empirical evidence erodes. Therefore, ensuring that results can be independently reproduced is critical for credible, cumulative knowledge in both academia and industry. Achieving reproducibility requires both the right tools and the right practices to support transparent, documented workflows. Two key components are **literate programming** and **version control**. Literate programming means weaving narrative and analysis together, so that a research report isn’t just tables and figures but also the code that generated them embedded alongside explanatory text. Tools like R Markdown (now succeeded by Quarto) enable this by allowing you to mix prose and code in one document. Version control, typically through Git and platforms like GitHub, provides a complete history of changes to your code and data, enhancing transparency and collaboration. Using these tools together, you can ensure that every figure or model result in your report is backed by code that is tracked, reviewable, and runnable by others. Until recently, RStudio was the go-to integrated development environment (IDE) for R users, offering an all-in-one interface for script editing, interactive R console, plots, and environment viewing. Meanwhile, Visual Studio Code (VS Code) rose in popularity as a **lightweight, highly customizable code editor** with an extensive extension ecosystem, though setting it up for R (and ensuring features like code completion and plotting work smoothly) required some configuration. Now, Posit – the company behind RStudio – has introduced **Positron**, a new IDE that combines the best of both worlds. Positron is built on the same open-source core as VS Code (the Code OSS engine) but is tailored for data science, with **native support for R and Python out-of-the-box**. Positron aims to be a “polyglot” IDE for data science – you can use it primarily with R (as we do in this textbook), but it also seamlessly supports Python and other languages, reflecting the reality that modern machine learning often blends ecosystems. The interface will feel familiar: you still get the classic four-panel layout for code, console, plots, and environment (like RStudio) alongside VS Code’s flexible sidebar for file navigation, Git integration, extensions, etc. In fact, when you first launch Positron, it looks like a combination of RStudio and VS Code, with a sidebar on the left and a multi-panel workspace on the right. Crucially for reproducible research, Positron encourages the use of script files and Quarto notebooks for all analysis – meaning every step is recorded in code rather than being an ephemeral GUI action. This is vital for scientific ML workflows: for example, if you try several data preprocessing techniques or model hyperparameters, you should have a code record of those attempts. With Positron and Quarto, you’ll document these in a literate programming format, and with Git version control you’ll have a time-stamped history of changes. This makes it far less likely to fall prey to “secret sauce” issues where results can’t be reproduced because something was changed without record. Positron also inherits VS Code’s robustness and modularity. Under the hood it runs R and Python via a kernel system (named **Ark** for R) rather than tying the IDE directly to a single R process, which means you can even crash R without crashing the IDE and you can easily switch between different R versions or between R and Python in the same session. This hybrid approach – RStudio’s user-friendly features with VS Code’s flexibility – provides an ideal platform for modern data science research. You get integrated support for **Quarto** documents (for mixing text, code, results, and citations), and built-in Git integration for version control – all in one place, no heavy configuration required. In the context of **machine learning (ML) for international business**, these tools and practices are especially important. ML models often involve complex pipelines (data cleaning, feature engineering, model training, evaluation) that can be difficult to replicate exactly unless carefully documented. They are also prone to issues like randomness (e.g., random weight initialization or data splits) and hyperparameter tuning that can lead to variability in results. By using literate programming, you ensure that every parameter choice and step (from data preprocessing to model fitting) is logged in the code. By using version control, you avoid the common situation of “code spaghetti” or mysterious file versions – instead, you have a clear log of how the model evolved. And by integrating your analysis narrative with code (via Quarto in Positron), you can directly tie interpretations (for instance, discussing feature importance or cross-validation performance) to the exact code and data that produced them. In a field like international business, where analyses might inform cross-country strategy or policy, having this level of transparency bolsters the credibility of insights. It also facilitates collaboration: a colleague in another country can pull your GitHub repository and rerun the analysis on their machine, perhaps applying it to new data from a different region, to test the generalizability (which touches on *replicability* as distinct from reproducibility). This chapter provides a comprehensive guide to setting up and using Positron for R-based data science in the social sciences (with a focus on ML applications in IB). We will walk through initial installation and configuration, managing packages, using Git/GitHub for version control, creating Quarto documents (for HTML, PDF, or Word outputs) with proper APA-formatted citations (with help from Zotero for reference management), and implementing best practices for reproducible, transparent research. Throughout, we’ll highlight special considerations for machine learning workflows – such as handling random seeds for reproducibility, tracking model versions, and possibly integrating Python when needed – to ensure that even complex modeling exercises can be reproduced and audited. By the end of this chapter, you should have your environment ready and understand how each tool contributes to a reproducible research pipeline. ## Setting Up Positron for R and Quarto Development **Prerequisites:** Before beginning, ensure you have a working internet connection and administrative rights on your computer to install software. We will install the core components needed: R, Positron, and (if not already included) the Quarto CLI. Positron comes pre-configured with much of what you need for R, but we’ll verify and tweak some settings. This section provides a step-by-step guide to get your environment ready for R programming and Quarto authoring in Positron. ### Installing R and Positron **1. Install R:** Download and install the latest version of R from the Comprehensive R Archive Network (CRAN) for your operating system. On Windows or macOS, run the installer and follow the prompts. (On Windows, it’s recommended to enable the option to “Write R version to the registry” for easier discovery by other applications.) On Linux, you can use your distribution’s package manager (e.g., `apt` on Debian/Ubuntu) or download the binary from CRAN and install it. After installation, confirm that R is accessible: open a terminal (or Command Prompt on Windows) and run `R --version` to ensure R runs and reports its version. **2. Install Positron:** Positron is available from Posit (the company formerly known as RStudio). Download the Positron installer from the official Posit website (as of 2025, Posit provides installers for Windows, macOS, and Linux). Install Positron like you would any application. Note that Positron was in public beta in 2024 but has become stable and is actively developed – you may want to get the latest release. Launch Positron once installed – you’ll see an interface that looks like a blend of VS Code and RStudio. There’s a left sidebar (with icons for file explorer, search, source control, extensions, etc.), and the main area can be split into an editor and panels for console, terminals, plots, and the environment. Positron should automatically detect your R installation. You can verify this by opening a new R script in Positron and running a simple command: for example, in a new file type `R.version.string` and press **Ctrl+Enter** (or **Cmd+Enter** on Mac). This should send the command to the built-in R console and display the R version, confirming Positron found and launched R. (Positron’s default keybindings allow **Ctrl/Cmd + Enter** to run the current line or selected code, just like in RStudio.) If R doesn’t start or isn’t found, check Positron’s settings under **Interpreters**: you can specify the R binary path if needed, but in most cases Positron’s auto-discovery works out-of-the-box. **3. Verify R support and language tools:** One advantage of Positron is that it comes pre-configured with rich R support, including the R *kernel* and language server. In plain terms, Positron bundles an R engine backend called **Ark** (which acts similarly to Jupyter’s IRkernel) to handle code execution and IntelliSense (auto-completion and inline help). This means you likely do **not** need to separately install the `languageserver` R package or the VS Code R extension – Positron has these capabilities built in. Ark leverages the R language server under the hood, providing auto-completion, function signature hints, and documentation pop-ups as you code. To test this, try typing a function name in an R script and see if suggestions appear. For example, create a new R script and type `lm(` and then hit the **Tab** key – you should see a tooltip with the function’s arguments and perhaps a snippet of its documentation, indicating that language intelligence is working. Another test: type `mean(` and see if a help bubble shows the function usage and description. If these work, Positron’s R language support is active. In case you don’t see such features, it’s worth installing the `languageserver` package in R (`install.packages("languageserver")`) because on rare occasions the bundled server might not start until that package is available. However, in our experience, a fresh Positron install with R should have these features working immediately. The key benefit is that you get a **smart editor** that can auto-complete object names, functions, and even suggest correct argument names, which speeds up coding and reduces errors. **4. \[Optional\] Enhanced R console (radian):** In prior VS Code setups, many R users liked to use the `radian` console (a third-party R console with syntax highlighting and better history) for an improved interactive experience. Positron’s built-in R console already has some advantages: it’s tightly integrated with the editor (sending code from editor to console is seamless) and displays output in the IDE’s panels. There isn’t a straightforward way to swap out the console to `radian` within Positron, and frankly, most users won’t find it necessary. Positron’s console supports multiline editing, syntax coloring, and other niceties by default. So this step can be skipped for Positron. (If you *really* prefer a different console, you could still open an external terminal within Positron and run radian or plain R there, but you would lose some of the IDE integration). The takeaway is that Positron gives a good interactive R experience by default, suitable for prototyping code or inspecting data. If you also work with Python, Positron similarly provides a built-in Python console (using IPython/Jupyter technology), so you can execute Python code line-by-line just as you do with R. This is a boon for ML workflows where you might use Python for certain tasks (e.g., a scikit-learn model) alongside R – Positron can handle both in one IDE process, avoiding context-switching between tools. **5. Install additional R packages for development:** Ensure you have the R packages that support R Markdown/Quarto execution and any other packages you’ll need for data analysis or modeling. A good starting point is to install **rmarkdown** (which will bring in **knitr** and other dependencies) and **tidyverse** (a collection of useful data science packages). Open the R console in Positron (for example, click on the Console panel or use *Terminal \> New R Terminal*) and run: `install.packages(c("rmarkdown", "tidyverse"))`. This will download and install the packages from CRAN. Since Quarto uses **knitr** (via rmarkdown) to execute R code chunks, having those packages ensures smooth rendering of documents. If you plan to do machine learning in R, you might also install the **tidymodels** suite (`install.packages("tidymodels")`), which includes packages like **rsample**, **parsnip**, **recipes**, and so on for modeling. The **tidymodels** meta-package is very useful for organizing ML workflows in a consistent grammar, and we will use it in later chapters for classification and regression tasks. Additionally, if you anticipate using specialized packages (for example, **caret**, **randomForest**, **xgboost**, or others often used in ML), you should install them now or as needed. On Windows, if you encounter issues installing packages that require compilation, you might need Rtools installed (a toolkit for building R packages). Many common packages have pre-compiled binaries on CRAN, so you may not hit that roadblock. It’s a good idea to update your packages regularly with `update.packages()`, but during a project you might hold off on updates to maintain consistency (we’ll discuss project-specific package management later). ### Quarto Installation and Integration in Positron **6. Install Quarto CLI (if needed):** Quarto is the open-source publishing system we’ll use for reproducible documents (it’s the successor to R Markdown, supporting multiple languages and output formats). Positron **bundles the Quarto CLI** out-of-the-box, so in most cases you do not need a separate installation. To verify, open a terminal in Positron (*Terminal \> New Terminal*, which gives you a system shell, not the R console) and run `quarto --version`. You should see Quarto’s version printed (e.g., *Quarto 1.3.340* or a later version). If that command reports an error or you get “command not found,” then something is amiss – possibly your Positron installation might be outdated or Quarto not in PATH. In that case, you can manually install Quarto by downloading it from [quarto.org](https://quarto.org) (they have installers for all platforms). But typically, installing Positron also sets up Quarto. Quarto is a separate tool under the hood (similar to how RStudio requires a separate Pandoc/LaTeX, etc.), but Positron’s integration makes it feel built-in. After any manual Quarto installation or update, again check `quarto --version` to ensure the system recognizes it. Having Quarto available is crucial because it’s what will render your `.qmd` documents into the final outputs (HTML, PDF, etc.). **7. Verify the Quarto extension in Positron:** Along with the CLI, Positron includes the **Quarto extension** for the editor. This is what provides syntax highlighting, previews, and notebook-like execution within Quarto documents. You can confirm it’s enabled by clicking the **Extensions** icon in Positron’s sidebar (the square icon) and searching for "Quarto". You should see an extension named "Quarto" which is likely already installed and enabled (Positron comes with it). This extension gives you an integrated visual preview and the ability to execute code cells inside `.qmd` files. In other words, it bridges Quarto with the IDE’s R/Python execution engine. For example, if you have a Python chunk in a Quarto document, the extension will route that to the active Python kernel and show output, whereas an R chunk goes to the R kernel – all coordinated seamlessly. If for some reason the Quarto extension isn’t present (perhaps in an older Positron build), you can install it via the Open VSX marketplace (Positron uses Open VSX, not Microsoft’s marketplace, for extensions). But this should not be necessary in current versions. One quick test after verifying installation is to open a sample Quarto file and see if the editor recognizes it (you should see markdown highlighting, and chunk options colored differently, etc.). We’ll create a Quarto document in a moment to test the full integration. **8. \[Optional\] Configure TeX for PDF output:** If you will render Quarto documents to PDF, a LaTeX distribution is required (because Quarto/Pandoc uses LaTeX to typeset PDF). This is not specific to Positron – it’s a general requirement for converting Markdown to PDF. A lightweight way to get LaTeX on your system is **TinyTeX** (an R-based installer for TeX Live). You can install TinyTeX by running `install.packages("tinytex")` followed by `tinytex::install_tinytex()` in R. Alternatively, on Windows you might install MikTeX, or on macOS MacTeX, or use TeX Live on Linux via package manager. If you skip this step, you can still produce HTML and Word outputs, but PDF rendering will likely fail with an error about missing LaTeX. Many R users prefer TinyTeX because it’s small and Quarto will automatically use it. After installing a TeX distribution, try rendering a Quarto document to PDF to confirm it works (we’ll do a test render shortly). If the PDF creation works without errors, you’re set. If there are errors about missing packages, TinyTeX by default will auto-install them, so give it a moment or two. The first PDF compilation might be slow as it downloads needed LaTeX packages, but subsequent ones will be faster. **9. \[Optional\] Remote development setup:** One powerful feature inherited from VS Code is the ability to run your work on a remote server while using the local IDE as the interface. Positron supports **Remote SSH** sessions, meaning you can connect to a remote Linux server (for example, a university HPC cluster or a cloud VM with GPUs for heavy ML training) and use Positron as if you were working locally. In a remote-SSH session, the Positron UI (your editor, etc.) runs on your local machine, but all the code, data, and computation happen on the remote machine. This is extremely useful for scaling up analyses or training large models without leaving the comfort of the IDE. To use this, you need SSH access to the server and an SSH client on your local machine. In Positron, open the Command Palette (**Ctrl+Shift+P**) and run "**Remote SSH: Connect to Host...**". If it’s your first time, it will prompt for a host address (`user@hostname`) and possibly ask to install the VS Code server on the remote. Positron uses an open-source implementation of VS Code’s remote server to enable this. Once connected, you’ll see a new Positron window with a green "\[SSH: hostname\]" indicator in the status bar. Now, any folder you open or terminal you launch in that window operates on the remote server. You can edit files, run R or Python, and even render Quarto documents as if you were on the remote machine – because you are! This means you can harness a powerful server (with more CPU/RAM or special hardware) for your computations, but still do the editing and visualization through your local screen. For example, you could train a machine learning model on a large dataset stored on a remote server; your Quarto document in Positron can execute that training code remotely and produce figures, which then display in your local Plots pane. One thing to note: when you first connect to a remote, Positron will not have your extensions installed there. The interface will show separate sections for “Local – Installed” vs “Remote – Installed” extensions. If you need certain extensions on the remote (like the Zotero citation picker or Python/Jupyter support), install them in the remote context (Positron will handle this via the same Extensions panel). Remote development is an advanced but highly rewarding setup for team collaborations or heavy computations. If you’re working solely on a local machine for now, you can skip it. But keep in mind this capability exists – as your IB machine learning projects grow, you might leverage cloud computing, and Positron will scale with you. (As of late 2024, remote-SSH in Positron was marked experimental but “just works” for most cases. It’s expected to become a standard feature.) **10. Final check:** At this stage, you should have R and Positron (with Quarto) set up. Let’s do a final verification of everything working together. In Positron, open the Command Palette and run **“Quarto: Check”**. This will print a diagnostic in the console, listing Quarto’s version, the Pandoc version, and whether R is detected. You should see something like “R: 4.3.1 – OK” in that output. Next, try creating a new Quarto document: click **File \> New File**, and choose *Quarto Document*, or use the palette with “Quarto: New File”. Select an HTML or PDF article template if prompted (or the default empty document). A new `.qmd` file will open with some example content (if you used a template). Now, with that file active, find the **Render** or **Preview** button – usually a magnifying glass icon at the top-right of the editor. Click it to render the document. Positron will call Quarto to render, and if all is well, a preview pane will appear showing the formatted output (by default, an HTML preview within the IDE). You might see the document’s title, some headings, and even a plot if it was an example file. If this works, it confirms that R, Quarto, and Positron’s integration are all functioning. If there’s an error, read the error message: common issues could be missing R packages (e.g., if the document tried to use one you haven’t installed), or missing LaTeX (if you tried PDF without TeX installed). Resolve any issues (install packages, etc.) and render again. Finally, try an interactive execution: in the `.qmd` file, find a code chunk (delimited by \`\`\`). Click the little “play” icon to the left of the chunk (appears when you hover there) to run that chunk. The code will execute and its output will appear in the console or plots pane. This confirms that you can iteratively work with Quarto documents. Congratulations – your environment is ready for reproducible research! From here on, your analysis, whether a simple regression or a complex ML model, can be done in a way that others (and you in the future) can retrace every step. ## Using Git and GitHub within Positron One of the pillars of reproducible research is **version control** – tracking changes in your project’s files over time. The Git system is the de facto standard for this, and services like GitHub (or GitLab, Bitbucket, etc.) are popular platforms for hosting repositories and collaborating. Positron, like VS Code, has built-in integration with Git, making it convenient to manage version control without leaving the IDE. In this section, we’ll cover using Git inside Positron: creating a repository for your project, making commits, branching, and syncing with a remote repository on GitHub. By keeping your analysis under version control, you ensure that every change to your code or data is recorded and can be rolled back or scrutinized. This greatly enhances transparency (for example, a collaborator or reviewer can see exactly what changed from one version of the analysis to the next) and facilitates teamwork. **Initial Git setup:** First, make sure Git is installed on your system. On Windows, if you didn’t have it already, installing Positron might not include Git by default, so you should install Git for Windows from [git-scm.com](https://git-scm.com) (this will also provide a Bash shell which is handy, and Positron’s terminal can use it). On macOS, Git is usually available via the Xcode Command Line Tools. On Linux, install the `git` package via your package manager if needed. Once Git is available, you should perform a one-time configuration of your user identity. Open a terminal and run: ``` bash git config --global user.name "Your Name" git config --global user.email "your.email@domain.com" ``` This sets your name and email, which will be associated with your commits (and show up on GitHub). It’s important for attribution and collaboration. If you skip this, Git will nag you when making the first commit. Now, in Positron, the left sidebar has a **Source Control** view (an icon that looks like a branch or Y-shape). If you open a folder that is already a Git repository, Positron will automatically detect it and show the repository status here. If your project folder is not yet under Git, the Source Control panel will say “No source control providers registered.” ### Creating and Initializing a Repository Let’s start a new Git repository for a project: - **Starting a new repo from scratch:** Suppose you have a project folder (e.g., `MLIB-chapter1/`) with some files (maybe an R script or a Quarto doc you just created). Open that folder in Positron (**File \> Open Folder...**). Then click on the Source Control icon. You should see an option to *Initialize Repository* (sometimes a blue button or a small initialization icon). Click it. Under the hood, this runs `git init` in the folder, creating a hidden `.git` directory that tracks version history. After initialization, Positron’s Source Control view will list your files as “untracked” (with a `U` icon for each). Now you have a local Git repository – but nothing is committed yet. - **Cloning an existing repo:** If instead you want to start by cloning an existing repository (for example, a template or an assignment repository provided by an instructor on GitHub), you can do so directly from Positron. On the Welcome page or Source Control view, select **Clone Repository**. Enter the repository URL (for instance, the HTTPS URL from GitHub like `https://github.com/YourUsername/YourRepo.git`). Positron will ask where to save it on your computer. Choose a location, and it will clone the repo (download all the files and commit history). Once done, it likely will automatically open the folder for you. You’ll now see the files and Git history in Positron. Whether you inited a new repo or cloned an existing one, the next steps are to start recording changes with commits. ### Basic Git Workflow: Staging, Committing, and Pushing Git’s workflow can be summarized as: edit files → stage changes → commit → push (to remote). Here’s how to do those in Positron: - **Stage changes:** After editing or adding files, the Source Control panel will show them under **CHANGES**. A new file appears with a U (untracked), a modified file with M, etc. You need to *stage* the changes you want to include in the next commit. In Positron, hovering over a file in the changes list will reveal a **+** icon. Click it to stage that file. The file moves to a **STAGED CHANGES** section. Alternatively, you can right-click the file and choose *Stage Changes*. If you have many files and want to include all, there’s an option to **Stage All Changes** (the “…” menu in Source Control or a button). Staging is like selecting which changes will go into the snapshot. - **Commit:** Once staging is done, you commit the changes to the repository’s history. In the Source Control view, there’s a text box at the top for a commit message. Type a concise message that describes what you did, e.g., “Initialize repository with README and setup script” or “Added data cleaning step for country dataset”. Good commit messages help others (and you) understand the purpose of changes. Now either press **Ctrl+Enter** (Cmd+Enter on Mac) or click the ✔️ **Commit** button (usually appears after you start typing a message). After committing, the staged changes clear (they’re now recorded in history), and the file statuses might reset to unmodified (if they were new, they’ll disappear from the list; if modified, they’ll no longer be in the changes list because working directory matches the commit). - **Connect to GitHub (remote):** If you created a new repo locally, you might want to publish it to GitHub for backup and collaboration. On GitHub, create a new repository through their web interface (no need to initialize it with anything on GitHub side, since you have local files). Then in Positron, you have two approaches: 1. **Using the terminal:** Open a terminal and run: ``` bash git remote add origin https://github.com/YourUsername/YourRepo.git git branch -M main git push -u origin main ``` This adds a remote named “origin” pointing to your GitHub repo, renames your local branch to **main** (if it wasn’t already), and pushes the commits. The `-u` flag sets the upstream so that future `git push` knows where to go. 2. **Using Positron UI:** Often, after your first commit, Positron will show a **Publish Branch** button or prompt in the Source Control view. If you click that, it will guide you to log in to GitHub (Positron might open a browser for GitHub authentication or use a personal access token) and then create the remote repo or link to an existing one. It streamlines the above terminal steps. Once linked, the bottom left of Positron’s status bar typically shows the branch name (e.g., “main”) and an icon for synchronization. Also, your Source Control view will indicate if you are ahead or behind the remote after commits. - **Push commits:** Pushing sends your local commits to the remote repository on GitHub. If you followed the steps above, pushing the main branch is as simple as clicking the sync icon (🔄) or using the “… menu” and selecting **Push**. After a successful push, your changes are now visible on GitHub. If you go to the GitHub page for the repo, you’ll see your files and commit message. It’s good practice to push relatively often, especially after major changes or at end of day, so that your work is backed up and teammates can see it. - **Pull updates:** If you’re working with others (or even pulling from another machine you use), you need to incorporate changes made elsewhere. GitHub will have commits from others once they push. Positron indicates if your local branch is behind (e.g., it might show an down-arrow with a number). Click the down-arrow icon or do **Pull** from the menu to fetch and merge those changes into your local copy. After pulling, your local files update to include the new commits. Always consider pulling before you start a day’s work to ensure you have the latest version and to minimize potential conflicts. *(Under the hood, the UI actions correspond to `git` commands: staging = `git add`, commit = `git commit`, push = `git push`, pull = `git pull`. You can always open a Positron integrated terminal and run these manually if you prefer or need advanced options, and Positron will still pick up the changes.)* With these basics, you can track the evolution of your project. For example, if you make a mistake or want to see what changed between two versions of your analysis script, you can use Git’s history and diff tools. Positron’s Source Control panel allows you to click a file and see the *diff* (differences) between the working copy and the last commit, or between commits. This is extremely useful for debugging: if a result changed, you can pinpoint exactly which code change caused it by looking at diffs. It’s also invaluable for machine learning experiments – you might commit each significant experiment or refactoring with a message, making it easier to recall what you tried (e.g., “Commit: switched to random forest model with 100 trees” and later “Commit: added normalization to features”). If a colleague says, "the previous version of the model was performing better," you can retrieve that version from Git and compare. ### Branching and Managing Changes A **branch** in Git is like an independent line of development. The default branch when you initialize is typically called *main* (or *master* in older repos). Branching is a powerful way to isolate changes until they’re ready to be merged. In a collaborative setting, each person might work on their own branch for a new feature or analysis section. Even when working alone, you might use branches to try out a new approach (say, a different model or a major code refactor) without disturbing the stable main analysis. In Positron, creating and switching branches is easy: - **Creating a new branch:** Look at the bottom status bar; you’ll see the current branch name (e.g., “main”). Click it, and a branch menu appears. Choose **“Create new branch”**, then type a name, like “experiment-new-features” or “refactor-data-import”. Use a descriptive name for clarity. Press Enter, and Positron will create the branch and switch you to it. This is equivalent to `git checkout -b new-branch-name`. Now any commits you make will be on this new branch, separate from main. The Source Control view will still track changes the same way, but be mindful of which branch you’re on (the status bar always shows it). It’s often wise to commit or stash any uncommitted changes before switching branches, to avoid confusion. - **Working on a branch:** On your new branch, go ahead and make changes. For instance, if you’re trying a different machine learning model, you might edit the R script to implement the new model. You can commit those changes on this branch as you go. The main branch remains untouched by these commits. This means if your experiment doesn’t pan out, you can simply switch back to main and abandon the branch, without cluttering main’s history. Conversely, if the experiment is successful, you’ll want to integrate it back. - **Switching branches:** To switch back to an existing branch (like main), click the branch name in the status bar and select the branch you want (e.g., “main”). If you have uncommitted changes on the current branch that conflict with the target branch, Git will warn you. It’s best to have a clean working state (commit or stash changes) before switching. When you switch, the files in your project will update to reflect the state of that branch. Positron will essentially load the project as it was at the last commit of that branch. This might mean files appear to “undo” changes if those changes were only on another branch – don’t panic, it’s just Git showing you the other branch’s version. Everything on your experiment branch is still there; you just need to switch back to see it. - **Merging branches:** When you’re ready to bring the changes from a branch into another (typically into main), you perform a merge. One way: switch to the target branch (e.g., switch to `main`), then in the Source Control “…” menu choose *Merge Branch...* and select your feature branch. Alternatively, use the terminal: `git checkout main` then `git merge experiment-new-features`. If there are no conflicts, the changes will merge and you can commit the merge (if not done automatically). If there *are* merge conflicts (meaning both branches edited the same part of a file differently), Positron will highlight these conflicts in the files, showing markers like `<<<<<< HEAD` and `>>>>>> branch-name`. You will need to manually edit those sections to decide what the merged content should be, then mark the file as resolved (stage it) and commit. Positron’s editor has shortcuts to accept one side or the other or combine changes. For example, if you and a colleague both edited the same paragraph in the Quarto document, you’ll see both versions and choose what the final text should be. Merging can be complex, but if you frequently commit and pull (so branches don’t diverge too much) it’s usually manageable. In our context, perhaps you created a branch to add a new figure to the report; meanwhile, the main branch had minor text edits. When merging, you’ll reconcile any overlapping changes. - **Deleting branches:** After a successful merge, you can delete the branch (since its work is now integrated). This can be done with `git branch -d branchname` in terminal or possibly via the UI branch menu. Deleting a branch in Git only deletes the pointer, not the commits (they remain in history). It’s mainly housekeeping to avoid dozens of stale branch names. On GitHub, if you merged via a Pull Request (discussed next), it often gives an option to delete the branch. Using branches helps maintain a clean main line of development, which ideally always contains a working version of your project. For example, you might stipulate that the main branch reflects the version corresponding to the paper’s current draft, while experimental branches are used for trying alternative analyses or addressing reviewer comments. This way, you can work on new ideas without risk to the main document and only merge them when they’re finalized and checked. ### Pull Requests and Collaborative Workflows When using a hosted Git service like GitHub for collaboration, a common practice is to use **Pull Requests (PRs)**. A PR is a feature on GitHub (and similar platforms) that lets you formally propose changes from one branch to another and have a discussion around those changes. It’s essentially a request: “Please pull my changes into the main branch.” PRs are extremely useful for code review and team communication. Even in academic collaborations, treating each major addition or revision as a PR can improve the quality of the project by involving multiple eyes on the changes. Positron can integrate with GitHub’s PR system via extensions (e.g., the “GitHub Pull Requests and Issues” extension, which might come pre-installed or can be added via Open VSX). If you sign in to GitHub through Positron, you can manage PRs inside the IDE. However, you can also use the GitHub web interface for PRs, which is quite user-friendly. Here’s a typical PR workflow: - **Creating a PR:** Let’s say you created a branch `feature-new-model` and committed a series of changes implementing a new machine learning model in your analysis. You push this branch to GitHub (`git push -u origin feature-new-model`). On GitHub, you’ll see a prompt “Compare & pull request” for that branch. Click it to open a PR. You’ll be asked to provide a title and description. For example: “Add random forest model for comparison” and describe the changes, results, etc. Assign reviewers if applicable. Alternatively, within Positron, if the GitHub PR extension is active, you might see a prompt or a button to create a PR from the branch. The extension will let you fill in the details and submit without leaving the IDE. In either case, a PR is now open on GitHub which anyone with access can review. - **Reviewing and discussing:** Team members (or your future self) can look at the PR on GitHub, which will show all the commits and a diff of all changes. They can comment on specific lines (e.g., “Is this parameter tuned properly?” or “Great use of ggplot here.”) or overall. You can respond to feedback by making further commits to the same branch – those new commits will automatically be added to the PR. This fosters an iterative improvement: maybe a collaborator spots a mistake in a data transformation; they comment, you fix it on your branch and push, and the PR updates. All this conversation is recorded, contributing to the documentation of the project’s development. - **Merging the PR:** Once everyone is satisfied (or if you’re working solo and you consider the feature complete), the PR can be merged. Usually, you or a repo maintainer will click “Merge pull request” on GitHub. There are a few merge options (create a merge commit, squash commits, etc.), but the default is to merge with a commit that brings all changes into main and closes the PR. After merging, don’t forget to **pull** the changes into your local main branch if you weren’t on it. Positron will then update your local files to match. If you opened the PR from within Positron’s extension, it may also offer a merge button and will handle the sync for you. - **Cleaning up:** After merge, you can delete the branch (GitHub usually offers a button to do so on the closed PR page). This keeps the repository tidy. If using Positron, switch back to main and delete the branch locally (Positron’s branch menu or `git branch -d`). This ensures you don’t accidentally keep working on an old branch that’s already merged. - **Issues and project management:** GitHub also provides **Issues** for tracking tasks/bugs and **Projects/Boards** for organizing work. If you use those, Positron’s GitHub integration can show issues (and even allow creating or commenting on them). For instance, you might have an issue “Improve model accuracy for region X” or “Update data source for 2025 figures.” You or collaborators can reference issues in commits or PRs (using `#issue-number` in commit messages or PR description), which creates links and traceability. While not strictly necessary for reproducibility, using issues helps ensure that planned analyses or revisions don’t get forgotten. It complements reproducibility by adding context – an issue discussion might record why a certain analysis decision was made (e.g., “we decided to use random forest after discussing issue #12 about non-linearity in data”). In sum, using Git and GitHub within Positron allows you to treat your research project like a true software project – which it essentially is when you’re writing code for analysis. This practice is increasingly encouraged by top journals. Some journals in economics and management now require authors to submit code and data, and having a GitHub repository ready makes this straightforward. It also means anyone who wants to verify or build upon your work can easily access the exact code that produced each result, thus enabling reproducibility. For teaching or team projects, it means no more emailing code files around (which inevitably leads to confusion like “Final_final2.R”). Instead, there’s one central repository where the latest version lives, and a history to consult if needed. Before moving on, a quick tip: remember to version control not just your code, but also important metadata like the Quarto document and bibliography. You generally should *not* put large raw data files in Git if they’re above a few dozen MB – Git isn’t great with big binaries – but you can keep small data files under version control. For larger data, you might instead provide a script to download them or use Git LFS. For example, if your analysis uses a large panel dataset, you could store a subset or example in the repo, and instructions for obtaining the full data (perhaps due to license restrictions). Everything needed to regenerate results (except the large data) should be in the repo. This way, the environment is as reproducible as possible. ## Managing R Packages and Environments in Positron Managing packages in R is a critical part of making your research reproducible. Different package versions can yield slightly different results (or even break code). Positron does not introduce a new way of handling R packages – it relies on the standard R library mechanism – but it’s worth discussing best practices for package management in a reproducible workflow. In this section, we’ll cover how to install and update packages within Positron, and how to use tools like **{renv}** to create project-specific libraries. We will also touch on handling Python packages if your analysis uses Python (since Positron supports both, you might have a scenario where R calls Python, e.g., via reticulate, or you have separate Python scripts for certain tasks). ### Installing and Updating R Packages As mentioned earlier, you can install packages by running `install.packages("packagename")` in the R console within Positron. For example, to install the **car** package (Companion to Applied Regression) you’d run `install.packages("car")` and watch the console as it downloads and installs. This is identical to how you’d do it in RStudio or base R. There isn’t a GUI menu for packages in Positron as of 2025 (RStudio’s Packages pane doesn’t have an equivalent here yet), but using code to install is actually more reproducible (since you can include those commands in a setup script or Quarto document). **Installing multiple packages:** You can install several at once by providing a vector, e.g., `install.packages(c("dplyr", "ggplot2", "xgboost"))`. If you attempt to use a package that isn’t installed, R will throw an error. A quick way to install missing packages in a script is to use a snippet like: ``` r packages <- c("dplyr", "ggplot2", "xgboost") install.packages(setdiff(packages, rownames(installed.packages()))) ``` This will install only those not already installed. This could be included in your Quarto doc’s setup chunk (with caution – in a rendered document you might not want to auto-install, but it’s useful for scripts). **Updating packages:** Over time, you may want to update to get bug fixes or new features. You can do this with `update.packages()` which will prompt you to select CRAN mirrors and show packages that have newer versions. Alternatively, tools like **{checkmate}** or RStudio’s package manager (if you have that separately) can help. In Positron, since there’s no dedicated update UI, you’d run the command or update via command line. Be mindful that updating packages in the middle of a project can change results (especially for ML packages if algorithms change or default hyperparameters differ). A reproducible approach is to **freeze package versions** once you start writing up results. This is where **{renv}** comes in, which we’ll discuss shortly. **Removing packages:** If needed, `remove.packages("packagename")` will uninstall a package. This is rarely necessary unless you want to downgrade a package by removing the newer version and installing an older one, or you’re freeing space. Removing doesn’t erase the package from any lockfiles or records you keep, so you’d also update those records. **Using different CRAN-like repositories:** In some cases, you might use MRAN (Microsoft’s CRAN snapshot service) or Posit’s Package Manager to get deterministic package sets. For instance, using **{checkpoint}** or **{groundhog}** packages allows you to install packages as of a particular date, which improves reproducibility across time. However, as of 2025, MRAN (which let you pin to daily snapshots) has been superseded by Posit’s Public Package Manager with snapshot capabilities. If absolute reproducibility is needed, you can document that all packages were from CRAN as of a certain date or use `install.packages(..., repos="https://packagemanager.posit.co/cran/2023-12-01")` to get versions as of that date. This is somewhat advanced; an easier path is using renv which essentially automates capturing exact versions. **Python packages (if using Python in Positron):** Since Positron supports Python, you might be using libraries like pandas, numpy, scikit-learn, etc., for certain tasks. If you run a Python chunk or open a Python terminal, you’ll need those packages in your Python environment. Positron itself doesn’t manage Python environments, so you should manage them via pip or conda as usual. One nice thing is that if you use R’s **reticulate** package to call Python from R (common in RMarkdown/Quarto scenarios to leverage a Python library in an R project), reticulate can be configured to use a virtualenv or conda env, and you can include a requirements.txt or environment.yml to specify Python deps. But going deep into Python is outside our scope here; just note that reproducibility concerns apply to Python libraries as well – you’d want to record which version of, say, TensorFlow you used, if your IB project involves training a neural network in Python. ### Project-Specific Libraries with renv For truly reproducible research, it’s not enough to know *which* packages you used; you need to know *which versions*. This ensures that if someone tries to rerun your analysis in the future, they can get the same results even if newer versions of packages have been released since. R’s solution for this is the **{renv}** package (short for reproducible environment). It allows each project to have its own library of packages and a lockfile that records exact versions (and sources) of each package used. Using renv in Positron is straightforward, as it doesn’t depend on the IDE. You would do the following (usually at project start or once you have the major packages installed): 1. Install renv (if not already): `install.packages("renv")`. 2. Initialize renv for the project: call `renv::init()` in the R console while your project folder is open. This will: - Create a new folder `renv/library` in the project, where project-specific package versions will live. - Write an `renv.lock` file listing all the current packages (and their versions) you have in your default library that are being used in the project (it tries to detect package usage, or at least captures what’s currently installed). - Switch the R session to use the project library (so `.libPaths()` is changed; you’ll see messages about this). - It may also offer to snapshot the current state. 3. After init, moving forward, any `install.packages()` you do will install into `renv/library/project-name/` rather than your user library. This isolates the project. If you have multiple projects requiring different package versions, renv keeps them separate so they don’t conflict. 4. As you add new packages or update them, run `renv::snapshot()` to update the lockfile. The lockfile is a plain JSON that lists each package, version, and source (CRAN or GitHub, etc.). This file should be committed to Git! It’s the record of your environment. 5. When someone else (or you on another machine) gets the project, they can run `renv::restore()` to install the exact versions from the lockfile. Renv will fetch packages from CRAN archives or cache or wherever necessary to match versions. This way, the analysis should run the same. Keep in mind, some packages (especially in ML) have non-deterministic algorithms or rely on system-specific libraries (e.g., XGBoost using different CPU instruction sets), but version consistency is a big step toward reproducibility. In Positron, renv works as usual. One minor thing: when renv activates (which it does automatically when you open an renv-enabled project), you’ll see the R console showing the path to the renv library. Positron’s Environment pane will only list packages in that library for autocompletion, etc. If you try to use a package that’s not installed in the renv library, you need to install it (even if you had it globally, because renv library is separate). Using renv does add a bit of overhead (each project may duplicate packages unless you use renv’s global cache feature which links to a central store to avoid actual duplication of files). But the benefits for a long-term project are huge: you can return to it a year later and restore the environment exactly. It’s highly recommended for collaborative projects or any research that might be audited or extended by others. If you choose not to use renv, at the very least you should document your session info. For example, at the end of your Quarto document or in an Appendix, include the output of `sessionInfo()` or the more refined `sessioninfo::session_info()`. This prints R version, OS, and each package with version. That way, someone could manually assemble an environment that matches (it’s less convenient, but still possible). In fact, many journals now encourage authors to provide session info in appendices for transparency. For Python, a similar philosophy applies. If you used Python, provide a `requirements.txt` or `environment.yml`. If your analysis is mostly R but with a bit of Python via reticulate, renv can snapshot the Python dependencies too (it can integrate with Python requirements). This is advanced, but worth mentioning. ### Other Package Management Tips Positron currently doesn’t have a GUI for managing packages (like installing from CRAN with a click). This is by design minimalistic, and arguably encourages scripting these actions. However, some conveniences: you can use the Command Palette in Positron and search for “R: Install Package” – there is a command that will prompt you for a package name and run install.packages for you. It’s essentially a shortcut. Similarly, there’s “R: Update Packages” which calls update.packages. These come from the R extension capabilities (provided by Ark in Positron). Feel free to use them, but remember to also note those actions in code or documentation if they affect your analysis. One pitfall to avoid: **Do not manually change objects in the Environment pane expecting that to carry over.** In RStudio one could, say, tweak a dataframe by editing it in the spreadsheet view. That’s not directly possible in Positron (there’s a data viewer, but it’s read-only). This is actually good: it forces you to make changes via code, which is then recorded. Always perform data cleaning or transformations with code, not by manual steps. This ensures that someone else can run the raw data through the same code and get the same cleaned data. Finally, if at some point you need to break out of Positron for package management (perhaps installing an older package from source, or debugging a library loading issue), you can always run R in a terminal or use RStudio just for that step. The resulting library will still be usable by Positron. But these cases should be rare. The typical flow is: you write code, get an error like "package XYZ not found", you install the package in the console, maybe add that install line to a setup script (or use renv to lock it), and proceed. By diligently managing your packages and documenting their versions, you eliminate a huge class of “it works on my machine” problems. For example, if a future reader of your work tries to rerun your code and it fails, the first thing they might check is package versions. If you have an renv lockfile or session info, they can compare and identify any discrepancies. This is especially relevant for ML results, where changes in a modeling package (like an update to how random forest calculates variable importance) could lead to slightly different outputs. Locking the version avoids any ambiguity about what was used. ## Creating and Rendering Quarto Documents in Positron Now we bring it all together: writing a **literate programming document** that combines prose, code, and results. Quarto is the tool for this job. A Quarto document (`.qmd` file) is a markdown file with embedded code chunks that can produce output (tables, figures, etc.) which gets included in the final rendered document. In Positron, Quarto is first-class: it has integrated preview and execution. Let’s go through how to create a Quarto document in Positron, structure it, and produce outputs such as HTML, PDF, or Word. Along the way we’ll demonstrate with a simple example analysis to illustrate key features (like cross-references, citations, and embedding plots). ### Creating a New Quarto Document To create a Quarto document in Positron, you can either use the menu or command palette: - **Via menu:** *File \> New File \> Quarto Document* (if available). Positron might prompt you to choose a document type (e.g., Article, HTML vs PDF, etc.). If you choose one, it will pre-populate the file with a sample title, author, and a bit of content. - **Via command palette:** Press **Ctrl+Shift+P** and start typing “Quarto New”. Select **Quarto: New File**. This does similar steps, possibly offering template options. However you do it, save the new file with a `.qmd` extension, e.g., `analysis.qmd`. At the top of a Quarto file is typically a YAML header delineated by `---` lines. For example: ``` yaml --- title: "Education and Saving Rates Across Countries" author: "Your Name" date: "2025-07-08" format: html: default pdf: default editor: visual bibliography: refs.bib csl: apa.csl --- ``` Let’s break this down: - **title/author/date:** self-explanatory, they’ll appear in the document. Use date or omit it; format is flexible. - **format:** here we listed two formats (HTML and PDF) under the format key. This means when we render, Quarto will be prepared to output both. If you just want one format by default, you could specify one (or Quarto defaults to HTML if not specified). We can always render to others via command line or options. - **editor: visual** is an option that if you open the file in RStudio or certain contexts it would default to the WYSIWYG editor. In Positron, you can ignore that or set it; it doesn’t affect source mode. - **bibliography:** points to a BibTeX file (refs.bib) that contains reference data for citations. - **csl:** specifies a Citation Style Language file to format citations. Here we put `apa.csl` which would format references in APA 7th edition style (assuming we have the APA CSL file in our project directory). After the YAML, you write the document in markdown interspersed with code chunks. Let’s outline a simple example in an international business context. We will examine whether countries with younger populations tend to have lower saving rates, illustrating a concept from development economics (the life-cycle saving hypothesis by Modigliani). We’ll use R’s built-in **LifeCycleSavings** dataset, which has data on 50 countries around 1970 including `sr` (savings rate) and `pop15` (percentage of population under age 15). Our Quarto document will include some text, a figure, a table, and an inline equation or two, plus a citation to the original theory. Here’s how we might structure the Quarto content (following the YAML): \`\`\`\`markdown \# Introduction Does demographic structure impact economic outcomes? According to the life-cycle savings hypothesis (Modigliani, 1986), countries with younger populations tend to have lower savings rates, as working-age adults support more dependents and thus save less (this is offset later when populations age). We examine this theory by analyzing data on national saving rates and age demographics. *Modigliani (1986) was a Nobel-winning economist who formalized the life-cycle hypothesis of saving – we cite his lecture here as an example.* [@modigliani1986] #### Data We use the `LifeCycleSavings` dataset (built into R), which contains data on the aggregate personal saving rate and demographic indicators for 50 countries around 1970. The key variables are **sr** (savings rate as percentage of disposable income) and **pop15** (percentage of population under 15 years old). For context, `pop15` ranges from about 23% (in richer, older countries) up to 47% (in very young populations). ```{r} #| echo: true summary(LifeCycleSavings[c("sr", "pop15")]) ``` *Table 1. Summary statistics for savings rate (sr) and youth population percentage (pop15) across 50 countries.* ```{r} #| label: tbl-summary #| echo: false #| tbl-cap: "Descriptive stats for saving rate (sr) and youth population (pop15) for 50 countries." library(knitr) kable(summary(LifeCycleSavings[c("sr","pop15")]), col.names = c("Savings Rate (%)","Pop under 15 (%)")) ``` The summary above shows that the average saving rate is around 11% of income, and the average percentage of population under 15 is about 35%. We will investigate the relationship between these two variables next. #### Analysis First, let’s visualize the relationship. Figure @fig-scatter is a scatterplot of youth population percentage vs. saving rate: ```{r} #| label: fig-scatter #| fig-cap: "Scatterplot of youth population (%) vs. saving rate (%) for 50 countries." plot(LifeCycleSavings$pop15, LifeCycleSavings$sr, xlab="Population under 15 (%)", ylab="Savings Rate (%)", pch=19, col="steelblue") abline(lm(sr ~ pop15, data=LifeCycleSavings), col="tomato", lwd=2) ``` As we see in Figure @fig-scatter, there is an evident negative relationship: countries with a higher percentage of young people tend to have lower savings rates. The red line is a linear fit indicating this trend. Next, we quantify the relationship with a simple linear regression: ```{r} model <- lm(sr ~ pop15, data = LifeCycleSavings) summary(model) ``` The model estimates that for every 1 percentage point increase in the youth population, the national saving rate decreases by about `r round(coef(model)["pop15"], 2)` percentage points on average. This coefficient is statistically significant (*p* ≈ `r ifelse(coef(summary(model))["pop15","Pr(>|t|)"] < 0.001, "< 0.001", round(coef(summary(model))["pop15","Pr(>|t|)"], 3))`), supporting the hypothesis that younger populations are associated with lower saving. To ensure reproducibility, we set a seed (not actually needed for this simple regression, but good practice for any analysis involving randomness): ```{r} set.seed(42) ``` (Setting a seed is crucial in machine learning workflows whenever there’s randomness, such as train-test splits or model initialization, to allow exact replication of results.) #### Discussion Our findings align with the life-cycle hypothesis: countries with very youthful populations (e.g., many developing countries in 1970 with high fertility rates) show lower savings, whereas countries with older populations save more, presumably because a larger share of individuals are in prime saving years. This simple analysis is, of course, bivariate. In reality, many factors influence savings (income levels, growth expectations, financial institutions, etc.), and age structure is just one piece. A more complete model might include those factors. However, for this demonstration, the focus is on workflow: we have created a fully reproducible analysis document. Anyone reading this document can see the raw data summary (Table 1), the plotted relationship (Figure 1), and the model output, and they can trace each result to the exact code that produced it. Moreover, using tools like Git and Quarto ensures that if we extend this analysis – for instance, adding data for more years or applying a machine learning model to predict savings rates – all changes will be documented. This transparency builds trust in the results. As Aguinis et al. (2017) note, improving reproducibility in IB research requires addressing practices that hide trial-and-error; here, we’ve shown every step explicitly:contentReference[oaicite:20]{index="20"}. Finally, the entire environment (R version, package versions) is documented, so this analysis can be reproduced even years later. For example, we used R `r getRversion()` and the packages listed in the References (with versions). Ensuring such stability is especially important when methods or software might change – for instance, if a newer version of a machine learning algorithm tweaked its default settings, having our code and package versions pinned means we (or others) can get the original results for comparison. ## Conclusion In this chapter, we demonstrated how to set up and use Positron, Quarto, and Git for a reproducible research workflow, with an example drawn from international business and development economics. We installed the necessary tools, configured Positron for R (and Python) work, and walked through creating a literate programming document that integrates analysis and narrative. By using Positron, we benefit from an IDE that supports multi-language data science, allowing us to, for example, mix R and Python seamlessly and leverage modern editing features (like IntelliSense) for efficient coding:contentReference[oaicite:21]{index="21"}. Quarto enabled us to present our analysis with clarity – combining text, tables, figures, and citations in a single document that can be rendered to publication-quality output (HTML for quick sharing, PDF for formal reports, or Word if needed for collaboration with colleagues who prefer track changes). Git version control provided a safety net and collaboration framework, ensuring that every change is tracked and that we can use branches/PRs to manage contributions and experiments. For machine learning in international business, these tools are not just conveniences but essential safeguards. ML projects often involve playing with different data transformations, model architectures, or tuning parameters. It’s easy to lose track of what was done unless one practices strict reproducibility. By scripting everything (no manual clicks) and using version control, we make certain that we can always retrace our steps. If someone asks, “How did you preprocess the data for the neural network model?” we can point to the exact code in the Quarto document or a committed script. If a collaborator is skeptical about a result, we can offer the repository for them to run themselves, or we can roll back to a previous state to double-check if a recent change introduced an issue. These capabilities mitigate the concern raised in the IB reproducibility editorial about undisclosed “several trial-and-error steps”:contentReference[oaicite:22]{index="22"} – here, all steps (even the exploratory ones on branches) are potentially disclosed in the code history. Another advantage of this integrated workflow is in producing outputs for dissemination. With Quarto, one can easily produce an HTML report that could be published online (for instance, on GitHub Pages or Posit’s Connect server), allowing interactive exploration (Quarto can embed interactive plots or even shiny apps). Or we can produce PDF/Word outputs that comply with academic journal formats. We used an APA citation style to format references, which is common in social sciences (including many IB journals). The citation integration (via Zotero or other reference managers) ensures that all our citations are consistent and complete. Instead of manually formatting bibliography entries, we let Quarto handle it, reducing the chance of errors. And with tools like the Zotero extension, adding a new reference to support a statement is as easy as a keyboard shortcut and search:contentReference[oaicite:23]{index="23"}, which encourages proper citation practices (thus our document cites both the original theory and a recent editorial on reproducibility). From a practical standpoint, Positron being based on VS Code means it’s highly extensible. As you become more comfortable, you can customize the editor, install additional extensions (for example, spell check for prose, or linters for code to maintain style). It also means Positron will continue to benefit from improvements to VS Code’s core and Posit’s enhancements for data science. The fact that Positron supports remote development and multiple languages makes it a future-proof choice for projects that might grow in scope. For instance, maybe your IB analysis starts in R but later you incorporate a deep learning model in Python – Positron can handle that in the same project, and Quarto can still be the glue that holds the narrative together with results from both R and Python. The combination of literate programming, version control, and modern IDEs addresses many of the reproducibility challenges in computational research. In the words of research methodology, it makes our work **transparent, traceable, and trustworthy**. By adopting these practices early (as a student or researcher), one develops habits that not only make replication by others easier but also make one’s own work more efficient and less error-prone. In the context of machine learning, where experiments can be complex and data large, having a disciplined workflow can save you from costly mistakes (like losing a model because you forgot how you derived it) and allow you to concentrate on the insights rather than worrying about logistics. As we move into subsequent chapters on regression, classification, unstructured data, and geospatial analysis, these same principles will apply. Each chapter will build on this foundation, demonstrating how to implement specific methods (from linear models to neural networks) in a reproducible manner. By the end of this textbook, you should not only have learned various machine learning techniques for international business problems, but also how to document and share your analyses so that others can trust and verify them – contributing to a more reproducible and robust scientific enterprise. ## References {.unnumbered} - Aguinis, H., Cascio, W. F., & Ramani, R. S. (2017). *Science’s reproducibility and replicability crisis: International business is not immune*. **Journal of International Business Studies, 48**(6), 653–663:contentReference[oaicite:24]{index="24"}. - Baker, M. (2016). *1,500 scientists lift the lid on reproducibility*. **Nature News**, 533(7604), 452–454:contentReference[oaicite:25]{index="25"}. - Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d'Alché-Buc, F., Fox, E., & Larochelle, H. (2021). *Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program)*. **Journal of Machine Learning Research, 22**(211), 1–20:contentReference[oaicite:26]{index="26"}. - Jumping Rivers (2024). *Positron vs RStudio – Is it time to switch?* (Blog post by T. Roe, Dec 5, 2024). – Discusses Positron as a “polyglot” IDE and differences from RStudio:contentReference[oaicite:27]{index="27"}:contentReference[oaicite:28]{index="28"}. - Vuorre, M. (2025). *How to Add Citations from Zotero to Quarto Documents*. (Blog post, June 6, 2025). – Describes the VS Code Zotero extension for inserting citations:contentReference[oaicite:29]{index="29"}. - Posit (2025). *Positron Documentation – Remote SSH*. – Official docs on using Positron’s remote development feature:contentReference[oaicite:30]{index="30"}. - Quarto Project (2025). *Using Quarto with Positron*. – Official Quarto documentation on Positron integration:contentReference[oaicite:31]{index="31"}. - Appsilon (2024). *Introducing Positron: A New, Yet Familiar IDE for R and Python*. (Blog post by D. Radečić, July 4, 2024). – Overview of Positron’s features and multi-language support:contentReference[oaicite:32]{index="32"}:contentReference[oaicite:33]{index="33"}. - Marwick, B. (2017). *Computational reproducibility in archaeological research: Basic principles and a case study of their implementation*. **Journal of Archaeological Method and Theory, 24**(2), 424–450. (An example of reproducible research in practice, advocating for sharing code and data). - Modigliani, F. (1986). *Life cycle, individual thrift, and the wealth of nations*. **American Economic Review, 76**(3), 297–313. (Classic exposition of the life-cycle hypothesis of saving, cited in our example analysis).