All about modifying the Spark configuration in an Amazon EMR notebook

Sep 26, 2021

Stuck with modifying the Spark configuration in an Amazon EMR notebook? Find out how the experienced Support Engineers at Bobcares go about it.

If you want to customize the configuration for a Spark job in an Amazon EMR notebook, you have come to the right place. Our Support Engineers are well versed in Server Management issues, big and small.

About Amazon EMR notebook & modifying the Spark configuration in an Amazon EMR notebook

Did you know that an Amazon EMR notebook is actually a serverless Jupyter notebook? It uses the Sparkmagic kernel as a client, which lets it work interactively with Spark on the remote EMR cluster through an Apache Livy server. Because of this, you set the Spark configuration with Sparkmagic commands. A custom configuration is useful when you need to do the following:

  • To edit executor cores and executor memory for a Spark job.
  • To change Spark’s resource allocation.

Modifying the Spark configuration in an Amazon EMR notebook

Modify current Session

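  1. First, run %%configure in a Jupyter notebook cell to modify the job configuration. If a session is already running, the -f flag forces it to be dropped and recreated with the new settings. For instance: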
    %%configure -f
    {"executorMemory":"4G"}
  2. After that, for additional settings that you would usually pass with the --conf option, use the "conf" key. For instance, you can use a nested JSON as seen below:
    %%configure -f
    {"conf":{"spark.dynamicAllocation.enabled":"false"}}
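
If you generate these cells for several notebooks, it can help to build the JSON in code rather than typing it by hand. The following is a minimal Python sketch, not part of Sparkmagic: build_configure_cell is a hypothetical helper name, and it assumes the Livy session fields executorMemory, executorCores, and conf. It only produces the cell text; you still paste or run that text in the notebook yourself.

```python
import json


def build_configure_cell(executor_memory=None, executor_cores=None, spark_conf=None):
    """Return the text of a %%configure cell (hypothetical helper, not a Sparkmagic API)."""
    payload = {}
    if executor_memory is not None:
        payload["executorMemory"] = executor_memory
    if executor_cores is not None:
        payload["executorCores"] = executor_cores
    if spark_conf:
        # Spark properties go under "conf" as strings, as in the examples above.
        # Python booleans are written as "true"/"false", which is the form Spark expects.
        payload["conf"] = {
            key: str(value).lower() if isinstance(value, bool) else str(value)
            for key, value in spark_conf.items()
        }
    # -f forces an already running session to be recreated with the new settings.
    return "%%configure -f\n" + json.dumps(payload)


cell = build_configure_cell(
    executor_memory="4G",
    executor_cores=2,
    spark_conf={"spark.dynamicAllocation.enabled": False},
)
print(cell)
```

Building the payload with json.dumps avoids quoting mistakes such as single quotes or a trailing comma, which would make the cell body invalid JSON.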

Verify the configuration change was successful

  1. First, run %%info on the client side to verify the current session configuration. For instance, here is a sample output:
    Current session configs: {'executorMemory': '4G', 'conf': {'spark.dynamicAllocation.enabled': 'false'}, 'kind': 'pyspark'}
  2. Next, check the /var/log/livy/livy-livy-server.out log on the EMR cluster. If the Spark session has started, you will see a log entry similar to this:
    20/06/24 10:11:22 INFO InteractiveSession$: Creating Interactive session 2: [owner: null, request: [kind: pyspark, proxyUser: None, executorMemory: 4G, conf: spark.dynamicAllocation.enabled -> false, heartbeatTimeoutInSecond: 0]]
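
The client-side check can also be scripted. The following is a rough Python sketch that assumes the %%info output contains a "Current session configs:" line in the same form as the sample above; the real output may contain more lines or differ between Sparkmagic versions. The function names parse_session_configs and find_mismatches are hypothetical.

```python
import ast

PREFIX = "Current session configs: "


def parse_session_configs(info_output):
    """Extract the session config dict from text copied out of %%info."""
    for line in info_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            # The sample output is a Python-style dict literal (single quotes),
            # so ast.literal_eval is used instead of json.loads.
            return ast.literal_eval(line[len(PREFIX):])
    raise ValueError("no 'Current session configs' line found")


def flatten(configs):
    """Merge top-level settings and the nested 'conf' properties into one dict."""
    flat = {key: value for key, value in configs.items() if key != "conf"}
    flat.update(configs.get("conf", {}))
    return flat


def find_mismatches(actual, expected):
    """Return {setting: (expected, actual)} for every expected setting that differs."""
    actual_flat = flatten(actual)
    return {
        key: (value, actual_flat.get(key))
        for key, value in flatten(expected).items()
        if actual_flat.get(key) != value
    }


sample = (
    "Current session configs: {'executorMemory': '4G', "
    "'conf': {'spark.dynamicAllocation.enabled': 'false'}, 'kind': 'pyspark'}"
)
configs = parse_session_configs(sample)
expected = {"executorMemory": "4G", "conf": {"spark.dynamicAllocation.enabled": "false"}}
print(find_mismatches(configs, expected))  # prints {} when every expected setting matches
```

An empty result means every expected setting appears in the session with the expected value; any other result lists the setting together with the expected and actual values.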

Conclusion

In conclusion, we saw how to modify the Spark configuration in an Amazon EMR notebook with %%configure, and how to verify the change with %%info and the Livy server log. Without a doubt, the experts at Bobcares are proficient at handling different Server Management challenges.
