Questions in topic: "notebooks"
https://forums.databricks.com/questions/topics/single/131.html
The latest questions for the topic "notebooks"

How to reduce run time of a "Hello World" program executing on Spark cluster in Azure Databricks?
https://forums.databricks.com/questions/17103/how-to-reduce-run-time-of-a-hello-world-program-ex.html
<p>I'm running a simple Hello World program through an Azure Databricks Python notebook, implemented as a job on a Spark cluster with 1 driver node and 2 worker nodes. After the job executes, the reported duration is 12 seconds, where I would expect 2-3 seconds.</p><p>Below is the code snippet in the notebook:</p><pre>print("Hello World")</pre><p>I want to understand why a simple task of printing "Hello World" takes so long when the code is run on a distributed Spark cluster. Is there a way to decrease this time? If not, does anyone have a logical explanation specific to this program?</p>
spark, python, notebooks, azure databricks, job_clusters | Mon, 04 Mar 2019 06:05:10 GMT | amanpreet kaur

Accessing Databricks' REST API within Databricks' notebook (with no internet gateway)
https://forums.databricks.com/questions/16932/accessing-databricks-rest-api-within-databricks-no.html
<p>Hi
This is the situation:</p><ul>
<li>I have a Databricks notebook</li> <li>From within the notebook, I want to invoke the Databricks REST API (I want to automate the process of creating new folders and notebooks in the Workspace)</li> <li>The catch is that our Databricks environment is not connected to the Internet (the internet gateway is not attached to the deployment VPC), so I can't really hit the public endpoint</li></ul><p>
Is there a way to solve this? I am basically looking for the Databricks CLI in the notebook, or something like:</p><pre>dbutils.api.workspace.list()</pre>
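<p>A hedged sketch of one possibility (there is no `dbutils.api`; this is not a confirmed Databricks feature): the REST API is served by the workspace host itself, so if that host is routable from the cluster (for example via a VPC endpoint) a plain HTTP call from the notebook may work. The host and token below are placeholders:</p><pre>import json
import urllib.parse
import urllib.request

def api_url(host, endpoint):
    """Build a Databricks REST API 2.0 URL from a workspace host."""
    return host.rstrip("/") + "/api/2.0/" + endpoint

def workspace_list(host, token, path="/"):
    """List workspace objects; token is a personal access token."""
    url = api_url(host, "workspace/list") + "?path=" + urllib.parse.quote(path, safe="")
    req = urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("objects", [])

# e.g. workspace_list("https://your-workspace-host", "your-token", "/Users")</pre><p>Whether this resolves at all without an internet gateway depends entirely on your VPC routing, so treat it as something to test, not a guarantee.</p>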
notebooks, rest api, databricks rest api, databricks cli | Fri, 22 Feb 2019 12:04:20 GMT | berkamilanml@gmail.com

Where does databricks save my notebooks?
https://forums.databricks.com/questions/16856/where-does-databricks-save-my-notebooks.html
<p>Thank you for your help. I am using the free trial version and I am wondering: where does Databricks save my notebooks? I know I can add version control and export them, but are they saved in one of my S3 buckets, or somewhere inside Databricks?</p><p>Thank you so much,</p>
notebooks, s3, bucket | Mon, 18 Feb 2019 16:00:42 GMT | amm

How should I install nltk package on Azure Databricks to work with spark clusters?
https://forums.databricks.com/questions/16684/how-should-i-install-nltk-package-on-azure-databri.html
<p>I'm trying to run nltk Python code in an Azure Databricks notebook. I have installed the nltk==3.5.2 library in the Workspace. When I run the following commands:</p><pre>import nltk
nltk.download()</pre><p>the NLTK downloader keeps running indefinitely. I saw other answers as well but was not able to understand them properly, e.g. https://forums.databricks.com/questions/1343/how-can-i-use-nltk-natural-language-toolkit-with-d.html?sort=votes and https://forums.databricks.com/questions/445/importing-nltk-and-downloding-corpus.html .</p><p>Can anyone give a clear explanation of the above issue and help me work with the nltk package?</p>
python, notebooks, azure databricks, nltk | Fri, 08 Feb 2019 12:37:21 GMT | amanpreet kaur

You cannot use dbutils within a spark job or otherwise pickle it
https://forums.databricks.com/questions/16546/you-cannot-use-dbutils-within-a-spark-job-or-other.html
<p>
I have a Databricks notebook where I pull in messages from an Event Hub and want to write them to blob storage which I have mounted to `/mnt/eventhubtarget`:</p>
<pre>query = messages.writeStream.option("checkpointLocation", "/mnt/eventhubtarget/rawcheckpoint").foreach(processRow)
query.start().awaitTermination()
</pre><p>
In `processRow`, it appears I am not able to use `dbutils`:</p>
<pre>def processRow(row):
    fileName = GenerateComplexFileName(row["Body"])
    dbutils.fs.put("/mnt/eventhubtarget/" + fileName, row["Body"])
</pre><p>
This throws the error I have copied below - what should I be doing in this case?</p><p>
&gt;You cannot use dbutils within a spark job or otherwise pickle it. If you need to use getArguments within a spark job, you have to get the argument before using it in the job. For example, if you have the following code:</p>
<pre>myRdd.map(lambda i: dbutils.args.getArgument("X") + str(i))
Then you should use it this way:
argX = dbutils.args.getArgument("X")
myRdd.map(lambda i: argX + str(i))
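
# The same pattern applies to the streaming question above: avoid calling
# dbutils inside foreach. A hedged sketch (GenerateComplexFileName is the
# asker's own helper; on Databricks a mount is also visible as a local path
# under /dbfs, so plain Python file I/O can stand in for dbutils.fs.put):
def write_body(base_dir, file_name, body):
    with open(base_dir.rstrip("/") + "/" + file_name, "w") as f:
        f.write(body)

def processRow(row):
    write_body("/dbfs/mnt/eventhubtarget", GenerateComplexFileName(row["Body"]), row["Body"])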
</pre>
databricks, notebooks, dbutils | Thu, 31 Jan 2019 18:24:06 GMT | Chris Hardie

Stopping cells from running in parallel during runAll
https://forums.databricks.com/questions/16233/stopping-cells-from-running-in-parallel-during-run.html
<p>Sorry if I've forgotten something trivial here...</p><p>In previous notebooks, Run All submitted each cell consecutively to the cluster (with textbox focus automatically moving along with cell progress), which is the behavior I'd like. In my current notebook/cluster, Run All is submitting all cells concurrently.</p><p>Thanks</p>
notebooks | Thu, 03 Jan 2019 19:34:36 GMT | thelazydogsback

Calling Python notebook from R notebook which return a value to R notebook
https://forums.databricks.com/questions/16009/calling-python-notebook-from-r-notebook-which-retu.html
<p>I am calling a Python notebook from an R notebook with</p><pre>%run path_to_notebook/test1</pre><p>The command runs the Python notebook and prints its value when it finishes, but I am not able to return the value from Python to R instead of just printing it.</p><pre>dbutils.notebook.run</pre><p>does not work in R, and you cannot call R notebooks from Python notebooks.</p><p>Does anyone have a workaround for this problem?</p>
python, notebooks, azure databricks, r | Tue, 11 Dec 2018 10:43:18 GMT | apoorv parmar

Find and replace across all jobs / notebooks
https://forums.databricks.com/questions/15870/find-and-replace-across-all-jobs-notebooks.html
<p>Is there a way to parse through all the notebooks and find a specific string? We're using Python.</p>
notebooks, jobs, find | Mon, 26 Nov 2018 15:46:24 GMT | Casin

Displaying HTML Output
https://forums.databricks.com/questions/15404/displaying-html-output.html
<p>
I am trying to display the HTML output from <a href="https://github.com/pandas-profiling/pandas-profiling" target="_blank">pandas-profiling</a> in a Databricks notebook, or to read in an HTML file and display that.</p><pre>import pandas as pd
import pandas_profiling
df = pd.read_csv("/dbfs/FileStore/tables/my_data.csv", header='infer', parse_dates=True, encoding='UTF-8')
profile = pandas_profiling.ProfileReport(df)
Out[13]: &lt;pandas_profiling.ProfileReport at 0x7f30e0b55780&gt;
profile.to_html()
Out[19]: '&lt;!doctype html&gt;\n\n&lt;html lang="en"&gt;\n&lt;head&gt;\n &lt;meta charset="utf-8"&gt;\n\n &lt;title&gt;Profile report&lt;/title&gt;...*** WARNING: skipped 429584 bytes of output ***</pre><p>Error on displayHTML():</p><pre>displayHTML(profile)
Py4JException: An exception was raised by the Python Proxy. Return Message: x</pre><p>
I could also output to html file:</p><pre>profile.to_file(outputfile="/tmp/my_profiling.html")</pre><p>
But how would I read that back in? Is there a %html command?
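<p>A hedged sketch of a likely fix: `displayHTML` expects an HTML string, so passing the `ProfileReport` object itself would trigger the Py4J proxy error above, whereas passing `profile.to_html()`, or the contents of the saved file, should work. The helper name below is made up:</p><pre>def report_html(profile_or_path):
    """Return HTML text for displayHTML, from a report object or a saved file."""
    if hasattr(profile_or_path, "to_html"):
        return profile_or_path.to_html()
    with open(profile_or_path, encoding="utf-8") as f:
        return f.read()

# In the notebook:
# displayHTML(report_html(profile))
# displayHTML(report_html("/tmp/my_profiling.html"))</pre>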
</p>
notebooks, display, displayhtml, html | Mon, 08 Oct 2018 17:52:19 GMT | david

download URL for databricks /FileStore contents not working. will CLI work if i used community edition?
https://forums.databricks.com/questions/15086/download-url-for-databricks-filestore-contents-not.html
<p>Note: this question was originally posted on Stack Overflow:</p><p>https://stackoverflow.com/questions/52286157/download-url-for-databricks-filestore-contents-not-working-will-cli-work-if-i</p><p>I am experimenting with the Databricks cloud-deployed Spark service.
I created some data and would like to download it to my machine rather than lose it.
This post:
https://stackoverflow.com/questions/49019706/databricks-download-a-dbfs-filestore-file-to-my-local-machine/49021261#49021261
suggested going to the URL -&gt; https://community.cloud.databricks.com/files whilst logged in to databricks
(I am using community edition by the way). </p><p>So, I tried that URL and I got back a plain text document with the content "1". That was not so useful. So, I then looked up the instructions for using the Databricks CLI, and
according to this page -&gt; https://docs.databricks.com/api/latest/authentication.html#token-management
I need to set up a personal access token, which I generate by clicking on the 'Access Tokens'
tab of the Account Settings screen. However, I found no 'Access Tokens' tab. </p><p>So, I'm wondering: </p><ul>
<li>Can I use the Databricks CLI to download the files I created (or is this only available for paid users)?</li><li>If it is possible to use the CLI, how do I generate an access token?</li><li>If the CLI is not the way to go, what is the proper URL to use to view the FileStore contents?</li></ul><p>Just for completeness' sake, I will note that when I did an 'ls' of /FileStore, I found that folder to be non-empty:</p><pre>%fs ls /FileStore/</pre>
<pre> dbfs:/FileStore/chris.txt chris.txt 20044
dbfs:/FileStore/tables/
etc etc..</pre>
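<p>A hedged sketch of the usual workaround (the `?o=` parameter is community folklore rather than documented behavior): files under `dbfs:/FileStore` are typically served at the `/files/` path of the workspace, and on Community Edition the numeric workspace id from your browser's address bar usually has to be appended as `?o=...` for the URL to resolve:</p><pre>def filestore_url(host, path, org_id=None):
    """Map a dbfs:/FileStore path to its browser download URL."""
    url = host.rstrip("/") + "/files/" + path.lstrip("/")
    if org_id is not None:
        url += "?o=" + str(org_id)
    return url

# e.g. filestore_url("https://community.cloud.databricks.com", "chris.txt", 123456)
# then open that URL in the browser session where you are logged in</pre><p>The org id 123456 above is a placeholder; use the value shown in your own workspace URL.</p>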
spark, notebooks, filestore | Thu, 13 Sep 2018 02:02:31 GMT | buildlackey

Does scheduling a spark job to run every day in Databricks start up the Spark cluster automatically?
https://forums.databricks.com/questions/15076/does-scheduling-a-spark-job-to-run-every-day-in-da.html
<p>I have created a Spark job in Databricks and want to schedule it to run once every 24 hours, but my Spark cluster on Databricks is configured to shut down if it's inactive for 2 hours (for cost reasons).</p><p>Does scheduling the job to run every day automatically start up the Spark cluster, or do I need to write a script for that?</p>
spark, notebooks, azure databricks, job scheduling, azure spark | Tue, 11 Sep 2018 20:49:46 GMT | Maryam khalaji

How to migrate notebooks/jobs/clusters from one Databricks account to another?
https://forums.databricks.com/questions/14916/how-to-migrate-notebooksjobsclusters-from-one-data.html
<p>Hi Team,</p><p>Our business case requires that we decommission one Databricks account and move its entire content (notebooks, jobs, configurations, et cetera) to a newly created account.</p><p>Kindly help us understand whether this feature is currently available in Databricks.</p>
databricks, notebooks, jobs, migration | Mon, 27 Aug 2018 06:17:20 GMT | VishalSains

Editing notebooks is very slow and laggy lately, with 3-5 second delay when typing or selecting text. Is there anything that can be done?
https://forums.databricks.com/questions/14847/editing-notebooks-is-very-slow-and-laggy-lately-wi.html
<p>It doesn't really matter what computer we are on or what the internet connection is; most of our notebooks are very slow to work on. They execute just fine as far as speed goes, but editing in the UI has a lot of lag.</p>
notebooks, performance, ui, lag | Fri, 17 Aug 2018 21:39:46 GMT | tnilsson@artemishealth.com

Process keep running but disconnects from notebook
https://forums.databricks.com/questions/14726/process-keep-running-but-disconnects-from-notebook.html
<p>Hi,</p><p>I've noticed that sometimes, if I leave a long-running Python process (i.e. an executing cell, for example training a deep neural network), the process keeps running on the driver BUT the cell does not seem to be executing anymore. This happens often if I disconnect and log in again from another connection/computer.</p><p>I can see that the process is still running by checking the standard output logs of the driver in the cluster menu.</p><p>The notebook still seems to be "connected" because I can't execute new cells, but the cell that I left running is not producing any output, nor can I cancel the process. Is there a way to "reconnect" the notebook somehow? I have been training a NN for a couple of days now and I would like to retrieve the objects/results.</p>
notebooks, bug | Fri, 03 Aug 2018 08:46:46 GMT | gpadres

Dropdown visually modified but value not updated when filled conditionally
https://forums.databricks.com/questions/14707/dropdown-visually-modified-but-value-not-updated-w.html
<p>Hi,</p><p>I only get this issue when using two dropdowns in the same notebook. The idea is to fill the options of the second conditionally from the value of the first.</p><p>I can make the UI show the new values when the first control is modified, but the value I retrieve from the second dropdown remains the original unless it is changed in the UI.</p><p>An example to illustrate it, from this starting screen:<br><img src="/storage/attachments/870-capture1.png"></p><p>And after updating the DD1 value to "2" I obtain:</p><p><img src="/storage/attachments/871-capture2.png"></p><p>Is this the expected behavior? Does anyone know a workaround?</p><p>Thank you.</p>
databricks, notebooks, widgets, ui, dropdowns | Wed, 01 Aug 2018 16:05:11 GMT | Enrique Gurdiel
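<p>A hedged sketch of a common workaround for the dropdown question above (the widget names `dd1`/`dd2` and the options table are made up): recreate the dependent widget with `remove` followed by `dropdown` whenever the first one changes, so the stored value is reset together with the displayed options:</p><pre>def rebuild_dependent(widgets, options_for, first="dd1", second="dd2"):
    """Recreate widget `second` from the current value of `first`.
    `widgets` is dbutils.widgets inside a notebook."""
    opts = options_for[widgets.get(first)]
    try:
        widgets.remove(second)   # drop the stale widget and its stored value
    except Exception:
        pass                     # first run: the widget does not exist yet
    widgets.dropdown(second, opts[0], opts)
    return opts

# In a notebook: rebuild_dependent(dbutils.widgets, {"1": ["a"], "2": ["x", "y"]})</pre>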