Custom preprocessors that help convert notebook content into MDX

Cell Tag Cheatsheet

You can enable or disable these preprocessors with special comments placed in a cell. Here is a list of all special comments:

All comments start with #meta or #cell_meta, which are both aliases for the same thing. For brevity, we will use #meta in this cheatsheet.

Black code formatting

#meta:tag=black will apply black code formatting.

Show/Hide Cells

  1. Remove entire cells: #meta:tag=remove_cell or #meta:tag=hide
  2. Remove output: #meta:tag=remove_output or #meta:tag=hide_outputs or #meta:tag=hide_output
  3. Remove input: same as for output, with input in place of output (for example, #meta:tag=remove_input).

Hiding specific lines of output

  1. Remove lines of output containing keywords: #meta:filter_words=FutureWarning,MultiIndex
  2. Limit the number of lines of output: #meta:limit=6 shows only the first 6 lines

Hiding specific lines of input (code)

Use the comment #meta_hide_line to hide a specific line of code:

def show():
    a = 2
    b = 3 #meta_hide_line

Selecting Metaflow Steps

You can selectively show Metaflow steps in the output logs:

  1. Show one step: #meta:show_steps=<step_name>
  2. Show multiple steps: #meta:show_steps=<step1_name>,<step2_name>

class InjectMeta[source]

InjectMeta(*args:Any, **kwargs:Any) :: Preprocessor

Allows you to inject metadata into a cell for further preprocessing with a comment.

To inject metadata, make a comment in a cell with the following pattern: #cell_meta:{key=value}. Note that #meta is an alias for #cell_meta.

For example, consider the following code:

_test_file = 'test_files/hello_world.ipynb'
first_cell = read_nb(_test_file)['cells'][0]
print(first_cell['source'])
#meta:show_steps=start,train
print('hello world')

At the moment, this cell has no metadata:

print(first_cell['metadata'])
{}

However, after we process this notebook with InjectMeta, the appropriate metadata will be injected:

c = Config()
c.NotebookExporter.preprocessors = [InjectMeta]
exp = NotebookExporter(config=c)
cells, _ = exp.from_filename(_test_file)
first_cell = json.loads(cells)['cells'][0]

assert first_cell['metadata'] == {'nbdoc': {'show_steps': 'start,train'}}
first_cell['metadata']
{'nbdoc': {'show_steps': 'start,train'}}
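
The comment pattern above can be illustrated with a small stand-alone parser. This is only a minimal sketch and not the code InjectMeta uses: the function name `parse_meta_comments` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches `#meta:key=value` or `#cell_meta:key=value`
# on a line of its own. This is an assumption, not nbdoc's own regex.
_META_RE = re.compile(r"^[ \t]*#[ \t]*(?:cell_)?meta:(\w+)=(.*?)[ \t]*$", re.MULTILINE)


def parse_meta_comments(source: str) -> dict:
    """Collect key/value pairs from special comments in a cell's source."""
    return {key: value for key, value in _META_RE.findall(source)}


source = "#meta:show_steps=start,train\nprint('hello world')"
print({"nbdoc": parse_meta_comments(source)})
# {'nbdoc': {'show_steps': 'start,train'}}
```

The values are kept as raw strings, such as 'start,train', which matches the metadata shown above. Splitting on commas is left to whichever later step consumes the value.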

class StripAnsi[source]

StripAnsi(*args:Any, **kwargs:Any) :: Preprocessor

Strip Ansi Characters.

Gets rid of colors that are streamed from standard out, which can interfere with static site generators:

c, _ = run_preprocessor([StripAnsi], 'test_files/run_flow.ipynb')
assert not _re_ansi_escape.findall(c)
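
To illustrate what stripping ANSI characters means, here is a minimal sketch that uses a commonly used regular expression for ANSI escape sequences. The names `ANSI_ESCAPE` and `strip_ansi` are hypothetical, and the pattern may differ from the `_re_ansi_escape` pattern used in this module.

```python
import re

# Matches CSI sequences such as "\x1b[31m" as well as single-character escapes.
# Assumption: this pattern is illustrative and may not equal nbdoc's own.
ANSI_ESCAPE = re.compile(r"\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])")


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences (colors, cursor movement) from text."""
    return ANSI_ESCAPE.sub("", text)


colored = "\x1b[31mTask failed\x1b[0m"
print(strip_ansi(colored))  # Task failed
```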

class InsertWarning[source]

InsertWarning(*args:Any, **kwargs:Any) :: Preprocessor

Insert Autogenerated Warning Into Notebook after the first cell.

This preprocessor inserts a warning in the markdown destination that the file is autogenerated. This warning is inserted in the second cell so we do not interfere with front matter.

c, _ = run_preprocessor([InsertWarning], 'test_files/hello_world.ipynb', display_results=True)
assert "<!-- WARNING: THIS FILE WAS AUTOGENERATED!" in c
```python
#meta:show_steps=start,train
print('hello world')
```

<CodeOutputBlock lang="python">

```
    hello world
```

</CodeOutputBlock>

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! Instead, edit the notebook w/the location & name as this file. -->


```python

```
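
The idea of placing the warning in the second cell can be sketched on a plain list of cell dictionaries. This is a simplified illustration that assumes cells are ordinary dicts. The helper `insert_warning` is hypothetical and is not the method InsertWarning implements.

```python
WARNING = (
    "<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! "
    "Instead, edit the notebook w/the location & name as this file. -->"
)


def insert_warning(cells: list) -> list:
    """Return a new cell list with a markdown warning cell at index 1."""
    warning_cell = {"cell_type": "markdown", "metadata": {}, "source": WARNING}
    # The first cell stays in place so that front matter remains at the top.
    return cells[:1] + [warning_cell] + cells[1:]


cells = [
    {"cell_type": "code", "metadata": {}, "source": "print('hello world')"},
    {"cell_type": "code", "metadata": {}, "source": ""},
]
print([c["cell_type"] for c in insert_warning(cells)])
```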

class RmEmptyCode[source]

RmEmptyCode(*args:Any, **kwargs:Any) :: Preprocessor

Remove empty code cells.

Notice how this notebook has an empty code cell at the end:

show_plain_md('test_files/hello_world.ipynb')
```python
#meta:show_steps=start,train
print('hello world')
```

    hello world



```python

```

With RmEmptyCode these empty code cells are stripped from the markdown:

c, _ = run_preprocessor([RmEmptyCode], 'test_files/hello_world.ipynb', display_results=True)
assert len(re.findall('```python',c)) == 1
```python
#meta:show_steps=start,train
print('hello world')
```

<CodeOutputBlock lang="python">

```
    hello world
```

</CodeOutputBlock>
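
A minimal sketch of the filtering idea, assuming cells are plain dictionaries in the notebook JSON layout. The helper `remove_empty_code_cells` is hypothetical and only illustrates the behavior described above.

```python
def remove_empty_code_cells(cells: list) -> list:
    """Drop code cells whose source is empty or whitespace only."""

    def is_empty_code(cell: dict) -> bool:
        # In notebook JSON, `source` may be a string or a list of strings;
        # "".join handles both forms.
        source = "".join(cell.get("source", ""))
        return cell.get("cell_type") == "code" and not source.strip()

    return [cell for cell in cells if not is_empty_code(cell)]


cells = [
    {"cell_type": "code", "source": "print('hello world')"},
    {"cell_type": "code", "source": ""},
]
print(len(remove_empty_code_cells(cells)))  # 1
```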

class MetaflowTruncate[source]

MetaflowTruncate(*args:Any, **kwargs:Any) :: Preprocessor

Remove the preamble and timestamp from Metaflow output.

When you run a Metaflow flow, you are presented with a fair amount of boilerplate before the job starts running that is not necessary to show in the documentation:

show_plain_md('test_files/run_flow.ipynb')
```python
#meta:show_steps=start
!python myflow.py run
```

    Metaflow 2.5.3 executing MyFlow for user:hamel
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    2022-03-14 17:28:44.983 Workflow starting (run-id 1647304124981100):
    2022-03-14 17:28:44.990 [1647304124981100/start/1 (pid 41951)] Task is starting.
    2022-03-14 17:28:45.630 [1647304124981100/start/1 (pid 41951)] this is the start
    2022-03-14 17:28:45.704 [1647304124981100/start/1 (pid 41951)] Task finished successfully.
    2022-03-14 17:28:45.710 [1647304124981100/end/2 (pid 41954)] Task is starting.
    2022-03-14 17:28:46.348 [1647304124981100/end/2 (pid 41954)] this is the end
    2022-03-14 17:28:46.422 [1647304124981100/end/2 (pid 41954)] Task finished successfully.
    2022-03-14 17:28:46.423 Done!
    

We don't need to see the beginning part that validates the graph, and we don't need the time-stamps either. We can remove these with the MetaflowTruncate preprocessor:

c, _ = run_preprocessor([MetaflowTruncate], 'test_files/run_flow.ipynb', display_results=True)
assert 'Validating your flow...' not in c
```python
#meta:show_steps=start
!python myflow.py run
```

<CodeOutputBlock lang="python">

```
     Workflow starting (run-id 1647304124981100):
     [1647304124981100/start/1 (pid 41951)] Task is starting.
     [1647304124981100/start/1 (pid 41951)] this is the start
     [1647304124981100/start/1 (pid 41951)] Task finished successfully.
     [1647304124981100/end/2 (pid 41954)] Task is starting.
     [1647304124981100/end/2 (pid 41954)] this is the end
     [1647304124981100/end/2 (pid 41954)] Task finished successfully.
     Done!
    
```

</CodeOutputBlock>
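
The two transformations, dropping the preamble and removing timestamps, can be sketched with a single regular expression. This is an illustration only: the timestamp format is inferred from the logs above, and `truncate_metaflow_log` is a hypothetical helper, not the code behind MetaflowTruncate.

```python
import re

# Timestamp prefix such as "2022-03-14 17:28:44.983" (format inferred from the logs above).
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")


def truncate_metaflow_log(log: str) -> str:
    """Drop everything before the first timestamped line, then strip timestamps."""
    lines = log.splitlines()
    # Index of the first timestamped line; falls back to 0 if none is found.
    start = next((i for i, line in enumerate(lines) if _TIMESTAMP.match(line)), 0)
    return "\n".join(_TIMESTAMP.sub("", line) for line in lines[start:])


log = "\n".join([
    "Metaflow 2.5.3 executing MyFlow for user:hamel",
    "Validating your flow...",
    "    The graph looks good!",
    "2022-03-14 17:28:44.983 Workflow starting (run-id 1647304124981100):",
    "2022-03-14 17:28:46.423 Done!",
])
print(truncate_metaflow_log(log))
```

As in the rendered output above, the space that followed each timestamp is kept, so every remaining line starts with a single space.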

class UpdateTags[source]

UpdateTags(*args:Any, **kwargs:Any) :: Preprocessor

Create cell tags based upon comment #cell_meta:tags=<tag>

Consider this Python notebook prior to processing. The comments can be used to configure the visibility of cells.

  • #cell_meta:tags=remove_output will just remove the output
  • #cell_meta:tags=remove_input will just remove the input
  • #cell_meta:tags=remove_cell will remove both the input and output

Note that you can use #cell_meta:tag or #cell_meta:tags as they are both aliases for the same thing. Here is a notebook before preprocessing:

show_plain_md('test_files/visibility.ipynb')
# Configuring Cell Visibility

#### Cell with the comment `#cell_meta:tag=remove_output`


```
#cell_meta:tag=remove_output
print('the output is removed, so you can only see the print statement.')
```

    the output is removed, so you can only see the print statement.


#### Cell with the comment `#cell_meta:tag=remove_input`


```
#cell_meta:tag=remove_input
print('hello, you cannot see the code that created me.')
```

    hello, you cannot see the code that created me.


#### Cell with the comment `#cell_meta:tag=remove_cell`


```
#cell_meta:tag=remove_cell
print('you will not be able to see this cell at all')
```

    you will not be able to see this cell at all



```
#cell_meta:tags=remove_input,remove_output
print('you will not be able to see this cell at all either')
```

    you will not be able to see this cell at all either


UpdateTags is meant to be used with InjectMeta and TagRemovePreprocessor to configure the visibility of cells in rendered docs. Here you can see what the notebook looks like after pre-processing:

_test_file = 'test_files/visibility.ipynb'
c = Config()
c.TagRemovePreprocessor.remove_cell_tags = ("remove_cell",)
c.TagRemovePreprocessor.remove_all_outputs_tags = ('remove_output',)
c.TagRemovePreprocessor.remove_input_tags = ('remove_input',)
c.MarkdownExporter.preprocessors = [InjectMeta, UpdateTags, TagRemovePreprocessor]
exp = MarkdownExporter(config=c)
result = exp.from_filename(_test_file)[0]

# show the results
assert 'you will not be able to see this cell at all either' not in result
print(result)
# Configuring Cell Visibility

#### Cell with the comment `#cell_meta:tag=remove_output`


```
#cell_meta:tag=remove_output
print('the output is removed, so you can only see the print statement.')
```

#### Cell with the comment `#cell_meta:tag=remove_input`

    hello, you cannot see the code that created me.


#### Cell with the comment `#cell_meta:tag=remove_cell`
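
The step that UpdateTags contributes to this pipeline, turning injected metadata into cell tags, can be sketched as follows. It assumes the injected metadata lives under the nbdoc key (as in the InjectMeta example) and that cell tags live in the cell's metadata under tags, as in the Jupyter notebook format. The function `update_tags` is a hypothetical stand-in, not the preprocessor itself.

```python
def update_tags(cell: dict) -> dict:
    """Copy comma-separated tags from injected metadata into the cell's tag list."""
    metadata = cell.setdefault("metadata", {})
    injected = metadata.get("nbdoc", {})
    # `tag` and `tags` are treated as aliases, as described above.
    raw = injected.get("tags") or injected.get("tag") or ""
    new_tags = [t.strip() for t in raw.split(",") if t.strip()]
    if new_tags:
        existing = metadata.get("tags", [])
        metadata["tags"] = existing + [t for t in new_tags if t not in existing]
    return cell


cell = {
    "cell_type": "code",
    "source": "print('you will not be able to see this cell at all either')",
    "metadata": {"nbdoc": {"tags": "remove_input,remove_output"}},
}
print(update_tags(cell)["metadata"]["tags"])
# ['remove_input', 'remove_output']
```

Once the tags are present, TagRemovePreprocessor can act on them using the remove_cell_tags, remove_all_outputs_tags, and remove_input_tags settings shown in the configuration above.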

class MetaflowSelectSteps[source]

MetaflowSelectSteps(*args:Any, **kwargs:Any) :: Preprocessor

Hide Metaflow steps in output based on cell metadata.

MetaflowSelectSteps is meant to be used with InjectMeta to only show specific steps in the output logs from Metaflow.

For example, if you want to show only the start and train steps in your flow, you would annotate your cell with #cell_meta:show_steps=start,train. The general pattern is #cell_meta:show_steps=<step_name>.

Note that show_step and show_steps are aliases for convenience, so you don't need to worry about the s at the end.

In the below example, #cell_meta:show_steps=start,train shows the start and train steps, whereas #cell_meta:show_steps=train only shows the train step:

c, _ = run_preprocessor([InjectMeta, MetaflowSelectSteps], 
                        'test_files/run_flow_showstep.ipynb', 
                        display_results=True)
assert 'end' not in c
```
#cell_meta:show_steps=start,train
!python myflow.py run
```

<CodeOutputBlock lang="">

```
    ...
    2022-02-15 14:01:14.810 [1644962474801237/start/1 (pid 46758)] Task is starting.
    2022-02-15 14:01:15.433 [1644962474801237/start/1 (pid 46758)] this is the start
    2022-02-15 14:01:15.500 [1644962474801237/start/1 (pid 46758)] Task finished successfully.
    ...
    2022-02-15 14:01:15.507 [1644962474801237/train/2 (pid 46763)] Task is starting.
    2022-02-15 14:01:16.123 [1644962474801237/train/2 (pid 46763)] the train step
    2022-02-15 14:01:16.188 [1644962474801237/train/2 (pid 46763)] Task finished successfully.
    ...
```

</CodeOutputBlock>


```
#cell_meta:show_steps=train
!python myflow.py run
```

<CodeOutputBlock lang="">

```
    ...
    2022-02-15 14:01:18.924 [1644962478210532/train/2 (pid 46783)] Task is starting.
    2022-02-15 14:01:19.566 [1644962478210532/train/2 (pid 46783)] the train step
    2022-02-15 14:01:19.632 [1644962478210532/train/2 (pid 46783)] Task finished successfully.
    ...
```

</CodeOutputBlock>
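
Step selection can be sketched by grouping log lines on the step name that appears inside the square brackets. This is a rough illustration under assumptions: the bracket format is inferred from the logs above, the sample log lines are made up, and `select_steps` is a hypothetical helper rather than the logic MetaflowSelectSteps uses.

```python
import re

# Step name as it appears in "[<run-id>/<step>/<task-id> (pid N)]".
_STEP = re.compile(r"\[\d+/(\w+)/\d+ \(pid \d+\)\]")


def select_steps(log: str, show_steps: str) -> str:
    """Keep only lines from the requested steps, separated by '...' markers."""
    wanted = [s.strip() for s in show_steps.split(",") if s.strip()]
    lines_by_step = {}
    for line in log.splitlines():
        match = _STEP.search(line)
        if match:
            lines_by_step.setdefault(match.group(1), []).append(line)
    out = []
    for step in wanted:
        if step in lines_by_step:
            out.append("...")  # marks omitted lines before this step
            out.extend(lines_by_step[step])
    out.append("...")  # marks omitted lines after the last shown step
    return "\n".join(out)


# Made-up sample log in the same shape as the Metaflow output above.
log = "\n".join([
    "Metaflow executing MyFlow",
    "[1/start/1 (pid 10)] this is the start",
    "[1/train/2 (pid 11)] the train step",
    "[1/end/3 (pid 12)] this is the end",
])
print(select_steps(log, "start,train"))
```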

class FilterOutput[source]

FilterOutput(*args:Any, **kwargs:Any) :: Preprocessor

Hide Output Based on Keywords.

If we want to exclude output with certain keywords, we can use the #meta:filter_words comment. For example, if we wanted to ignore all output that contains the text FutureWarning or MultiIndex we can use the comment:

#meta:filter_words=FutureWarning,MultiIndex

Consider this output below:

show_plain_md('test_files/strip_out.ipynb')
```python
#meta:filter_words=FutureWarning,MultiIndex
#meta:show_steps=end
!python serialize_xgb_dmatrix.py run
```

    /Users/hamel/opt/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
      from pandas import MultiIndex, Int64Index
    Metaflow 2.5.3 executing SerializeXGBDataFlow for user:hamel
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    2022-03-30 07:04:02.315 Workflow starting (run-id 1648649042312116):
    2022-03-30 07:04:02.322 [1648649042312116/start/1 (pid 2459)] Task is starting.
    2022-03-30 07:04:03.122 [1648649042312116/start/1 (pid 2459)] /Users/hamel/opt/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
    2022-03-30 07:04:03.508 [1648649042312116/start/1 (pid 2459)] from pandas import MultiIndex, Int64Index
    2022-03-30 07:04:03.510 [1648649042312116/start/1 (pid 2459)] Task finished successfully.
    2022-03-30 07:04:03.517 [1648649042312116/end/2 (pid 2462)] Task is starting.
    2022-03-30 07:04:04.315 [1648649042312116/end/2 (pid 2462)] /Users/hamel/opt/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
    2022-03-30 07:04:04.563 [1648649042312116/end/2 (pid 2462)] there are 5 rows in the data.
    2022-03-30 07:04:04.705 [1648649042312116/end/2 (pid 2462)] from pandas import MultiIndex, Int64Index
    2022-03-30 07:04:04.707 [1648649042312116/end/2 (pid 2462)] Task finished successfully.
    2022-03-30 07:04:04.707 Done!
    

Notice how the lines containing the terms FutureWarning or MultiIndex are stripped out:

c, _ = run_preprocessor([InjectMeta, FilterOutput], 
                        'test_files/strip_out.ipynb', 
                        display_results=True)
assert 'FutureWarning:' not in c and 'from pandas import MultiIndex, Int64Index' not in c
```python
#meta:filter_words=FutureWarning,MultiIndex
#meta:show_steps=end
!python serialize_xgb_dmatrix.py run
```

<CodeOutputBlock lang="python">

```
    Metaflow 2.5.3 executing SerializeXGBDataFlow for user:hamel
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    2022-03-30 07:04:02.315 Workflow starting (run-id 1648649042312116):
    2022-03-30 07:04:02.322 [1648649042312116/start/1 (pid 2459)] Task is starting.
    2022-03-30 07:04:03.510 [1648649042312116/start/1 (pid 2459)] Task finished successfully.
    2022-03-30 07:04:03.517 [1648649042312116/end/2 (pid 2462)] Task is starting.
    2022-03-30 07:04:04.563 [1648649042312116/end/2 (pid 2462)] there are 5 rows in the data.
    2022-03-30 07:04:04.707 [1648649042312116/end/2 (pid 2462)] Task finished successfully.
    2022-03-30 07:04:04.707 Done!
    
```

</CodeOutputBlock>
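
The filtering rule amounts to dropping every output line that contains one of the keywords. A minimal sketch, where `filter_output` is a hypothetical helper and the sample text is shortened from the output above:

```python
def filter_output(text: str, filter_words: str) -> str:
    """Remove lines that contain any of the comma-separated keywords."""
    words = [w.strip() for w in filter_words.split(",") if w.strip()]
    kept = [line for line in text.splitlines() if not any(w in line for w in words)]
    return "\n".join(kept)


output = "\n".join([
    "compat.py:36: FutureWarning: pandas.Int64Index is deprecated",
    "  from pandas import MultiIndex, Int64Index",
    "there are 5 rows in the data.",
])
print(filter_output(output, "FutureWarning,MultiIndex"))
# there are 5 rows in the data.
```

Matching is a plain substring test, so a keyword removes the whole line wherever it appears in it.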

class Limit[source]

Limit(*args:Any, **kwargs:Any) :: Preprocessor

Limit the number of lines of output, based on the #meta:limit comment.

c, _ = run_preprocessor([InjectMeta, Limit], 
                        'test_files/limit.ipynb', 
                        display_results=True)

_res = """```
    hello
    hello
    hello
    hello
    hello
    ...
```"""
assert _res in c
```python
#meta:limit=6
!python serialize_xgb_dmatrix.py run
```

<CodeOutputBlock lang="python">

```
    /Users/hamel/opt/anaconda3/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
      from pandas import MultiIndex, Int64Index
    Metaflow 2.5.3 executing SerializeXGBDataFlow for user:hamel
    Validating your flow...
        The graph looks good!
    Running pylint...
    ...
```

</CodeOutputBlock>


```python
#meta:limit=5
print('\n'.join(['hello']*10))
```

<CodeOutputBlock lang="python">

```
    hello
    hello
    hello
    hello
    hello
    ...
```

</CodeOutputBlock>
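
The truncation shown above keeps the first N lines and appends a `...` marker. A minimal sketch of that rule, where `limit_lines` is a hypothetical helper and not the Limit preprocessor itself:

```python
def limit_lines(text: str, limit: int) -> str:
    """Keep the first `limit` lines and mark the omission with '...'."""
    lines = text.splitlines()
    if len(lines) <= limit:
        return text  # nothing to cut
    return "\n".join(lines[:limit] + ["..."])


print(limit_lines("\n".join(["hello"] * 10), 5))
```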

class HideInputLines[source]

HideInputLines(*args:Any, **kwargs:Any) :: Preprocessor

Hide lines of code in code cells with the comment #meta_hide_line at the end of a line of code.

You can use the special comment #meta_hide_line to hide a specific line of code in a code cell. This is what the code looks like before:

show_plain_md('test_files/hide_lines.ipynb')
```python
def show():
    a = 2
    b = 3 #meta_hide_line
```

and after:

c, _ = run_preprocessor([InjectMeta, HideInputLines], 
                        'test_files/hide_lines.ipynb', 
                        display_results=True)
```python
def show():
    a = 2
```
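
The rule can be sketched as a filter over the lines of a cell's source. The helper `hide_input_lines` is hypothetical and only mirrors the behavior shown above:

```python
def hide_input_lines(source: str) -> str:
    """Drop every line of code that ends with the #meta_hide_line comment."""
    kept = [
        line
        for line in source.splitlines()
        if not line.rstrip().endswith("#meta_hide_line")
    ]
    return "\n".join(kept)


source = "def show():\n    a = 2\n    b = 3 #meta_hide_line"
print(hide_input_lines(source))
```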

class WriteTitle[source]

WriteTitle(*args:Any, **kwargs:Any) :: Preprocessor

Modify the code-fence with the filename upon %%writefile cell magic.

WriteTitle creates the proper code-fence with a title in the situation where the %%writefile magic is used.

For example, here are contents before pre-processing:

show_plain_md('test_files/writefile.ipynb')
A test notebook


```python
%%writefile myflow.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    
    @step
    def start(self):
        print('this is the start')
        self.next(self.train)
    
    @step
    def train(self):
        print('the train step')
        self.next(self.end)
    
    @step
    def end(self):
        print('this is the end')

if __name__ == '__main__':
    MyFlow()
```

    Overwriting myflow.py



```python
%%writefile hello.txt

Hello World
```

    Overwriting hello.txt


When we use WriteTitle, you will see the code-fence will change appropriately:

c, _ = run_preprocessor([WriteTitle], 'test_files/writefile.ipynb', display_results=True)
assert '```py title="myflow.py"' in c and '```txt title="hello.txt"' in c
A test notebook


```py title="myflow.py"
%%writefile myflow.py
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    
    @step
    def start(self):
        print('this is the start')
        self.next(self.train)
    
    @step
    def train(self):
        print('the train step')
        self.next(self.end)
    
    @step
    def end(self):
        print('this is the end')

if __name__ == '__main__':
    MyFlow()
```


```txt title="hello.txt"
%%writefile hello.txt

Hello World
```
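
How the code-fence could be derived from the %%writefile line can be sketched as follows. This is an illustration under assumptions: the function `opening_fence`, the regular expression, and the txt fallback for files without an extension are all hypothetical and not taken from WriteTitle.

```python
import re
from pathlib import PurePosixPath

FENCE = "`" * 3  # three backticks, built programmatically
_WRITEFILE = re.compile(r"^%%writefile\s+(\S+)", re.MULTILINE)


def opening_fence(source: str) -> str:
    """Return the opening code-fence for a cell, titled when %%writefile is used."""
    match = _WRITEFILE.search(source)
    if match is None:
        return FENCE + "python"
    filename = match.group(1)
    # Use the file extension as the fence language; "txt" is an assumed fallback.
    extension = PurePosixPath(filename).suffix.lstrip(".") or "txt"
    return f'{FENCE}{extension} title="{filename}"'


print(opening_fence("%%writefile myflow.py\nfrom metaflow import FlowSpec, step"))
print(opening_fence("%%writefile hello.txt\n\nHello World"))
```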

class CleanFlags[source]

CleanFlags(*args:Any, **kwargs:Any) :: Preprocessor

A preprocessor to remove Flags

c, _ = run_preprocessor([CleanFlags], _gen_nb())
assert '#notest' not in c

class CleanMagics[source]

CleanMagics(*args:Any, **kwargs:Any) :: Preprocessor

A preprocessor to remove cell magic commands and #cell_meta: comments

CleanMagics strips magic cell commands %% so they do not appear in rendered markdown files:

c, _ = run_preprocessor([WriteTitle, CleanMagics], 'test_files/writefile.ipynb', display_results=True)
assert '%%' not in c
A test notebook


```py title="myflow.py"
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    
    @step
    def start(self):
        print('this is the start')
        self.next(self.train)
    
    @step
    def train(self):
        print('the train step')
        self.next(self.end)
    
    @step
    def end(self):
        print('this is the end')

if __name__ == '__main__':
    MyFlow()
```


```txt title="hello.txt"
Hello World
```

Here is how CleanMagics works on the file with the Metaflow log outputs from earlier. Notice that the #cell_meta comments are gone:

c, _ = run_preprocessor([InjectMeta, MetaflowSelectSteps, CleanMagics], 
                        'test_files/run_flow_showstep.ipynb', display_results=True)
```
!python myflow.py run
```

<CodeOutputBlock lang="">

```
    ...
    2022-02-15 14:01:14.810 [1644962474801237/start/1 (pid 46758)] Task is starting.
    2022-02-15 14:01:15.433 [1644962474801237/start/1 (pid 46758)] this is the start
    2022-02-15 14:01:15.500 [1644962474801237/start/1 (pid 46758)] Task finished successfully.
    ...
    2022-02-15 14:01:15.507 [1644962474801237/train/2 (pid 46763)] Task is starting.
    2022-02-15 14:01:16.123 [1644962474801237/train/2 (pid 46763)] the train step
    2022-02-15 14:01:16.188 [1644962474801237/train/2 (pid 46763)] Task finished successfully.
    ...
```

</CodeOutputBlock>


```
!python myflow.py run
```

<CodeOutputBlock lang="">

```
    ...
    2022-02-15 14:01:18.924 [1644962478210532/train/2 (pid 46783)] Task is starting.
    2022-02-15 14:01:19.566 [1644962478210532/train/2 (pid 46783)] the train step
    2022-02-15 14:01:19.632 [1644962478210532/train/2 (pid 46783)] Task finished successfully.
    ...
```

</CodeOutputBlock>
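
Both kinds of removal, cell magics and meta comments, can be sketched as one line filter. The helper `clean_magics` is hypothetical, and the patterns are assumptions based on the comment syntax described in the cheatsheet:

```python
import re

# Lines that consist of a #meta: or #cell_meta: comment (assumed syntax).
_META_LINE = re.compile(r"\s*#\s*(?:cell_)?meta:")


def clean_magics(source: str) -> str:
    """Remove %% cell magic lines and #meta / #cell_meta comment lines."""
    kept = [
        line
        for line in source.splitlines()
        if not line.lstrip().startswith("%%") and not _META_LINE.match(line)
    ]
    # Strip the blank lines that the removed lines leave behind at the edges.
    return "\n".join(kept).strip()


print(clean_magics("%%writefile hello.txt\n\nHello World"))
print(clean_magics("#cell_meta:show_steps=train\n!python myflow.py run"))
```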

class Black[source]

Black(*args:Any, **kwargs:Any) :: Preprocessor

Format code that has a cell tag black

Black is a preprocessor that formats cells that have the cell tag black with Python black code formatting. You can apply tags via the notebook interface or with the comment #meta:tag=black.

This is how cell formatting looks before black formatting:

show_plain_md('test_files/black.ipynb')
Format with black


```python
#meta:tag=black
j = [1,
     2,
     3
]
```


```python
%%writefile black_test.py
#meta:tag=black


def very_important_function(template: str, *variables, file: os.PathLike, engine: str, header: bool = True, debug: bool = False):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, 'w') as f:
        pass
```

After black is applied, the code looks like this:

c, _ = run_preprocessor([InjectMeta, UpdateTags, CleanMagics, Black], 'test_files/black.ipynb', display_results=True)
assert '[1, 2, 3]' in c
assert 'very_important_function(\n    template: str,' in c
Format with black


```python
j = [1, 2, 3]
```


```python
def very_important_function(
    template: str,
    *variables,
    file: os.PathLike,
    engine: str,
    header: bool = True,
    debug: bool = False
):
    """Applies `variables` to the `template` and writes to `file`."""
    with open(file, "w") as f:
        pass
```

class CatFiles[source]

CatFiles(*args:Any, **kwargs:Any) :: Preprocessor

Cat arbitrary files with %cat

class BashIdentify[source]

BashIdentify(*args:Any, **kwargs:Any) :: Preprocessor

A preprocessor to identify bash commands and mark them appropriately

When we issue a shell command in a notebook with !, we need to change the code-fence from python to bash and remove the !:

c, _ = run_preprocessor([MetaflowTruncate, CleanMagics, BashIdentify], 'test_files/run_flow.ipynb', display_results=True)
assert "```bash" in c and '!python' not in c
```bash
python myflow.py run
```

<CodeOutputBlock lang="bash">

```
     Workflow starting (run-id 1647304124981100):
     [1647304124981100/start/1 (pid 41951)] Task is starting.
     [1647304124981100/start/1 (pid 41951)] this is the start
     [1647304124981100/start/1 (pid 41951)] Task finished successfully.
     [1647304124981100/end/2 (pid 41954)] Task is starting.
     [1647304124981100/end/2 (pid 41954)] this is the end
     [1647304124981100/end/2 (pid 41954)] Task finished successfully.
     Done!
    
```

</CodeOutputBlock>
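
The detection rule can be sketched as: if every non-empty line of a cell starts with !, treat the cell as bash and drop the !. This is an assumption about the rule for illustration. `identify_bash` is a hypothetical helper, not the BashIdentify implementation.

```python
def identify_bash(source: str) -> tuple:
    """Return (language, cleaned_source) for a code cell."""
    lines = [line for line in source.splitlines() if line.strip()]
    if lines and all(line.lstrip().startswith("!") for line in lines):
        # Remove the leading "!" and any space that follows it.
        cleaned = "\n".join(line.lstrip()[1:].lstrip() for line in lines)
        return "bash", cleaned
    return "python", source


print(identify_bash("!python myflow.py run"))
# ('bash', 'python myflow.py run')
```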

class CleanShowDoc[source]

CleanShowDoc(*args:Any, **kwargs:Any) :: Preprocessor

Ensure that ShowDoc output gets cleaned in the associated notebook.

_result, _ = run_preprocessor([CleanShowDoc], 'test_files/doc.ipynb')
assert '<HTMLRemove>' not in _result
print(_result)
```python
from fastcore.all import test_eq
from nbdoc.showdoc import ShowDoc
```


<DocSection type="function" name="test_eq" module="fastcore.test" link="https://github.com/fastcore/tree/masterhttps://github.com/fastai/fastcore/tree/master/fastcore/test.py#L34">
<SigArgSection>
<SigArg name="a" /><SigArg name="b" />
</SigArgSection>
<Description summary="`test` that `a==b`" />

</DocSection>


Composing Preprocessors Into A Pipeline

Let's see how you can compose all of these preprocessors together to process notebooks appropriately:

get_mdx_exporter[source]

get_mdx_exporter(template_file='ob.tpl')

A mdx notebook exporter which composes many pre-processors together.

get_mdx_exporter combines all of the previous preprocessors, along with the built-in TagRemovePreprocessor, to allow for hiding cell inputs/outputs based on cell tags. Here is an example of markdown generated from a notebook with the default preprocessing:

show_plain_md('test_files/example_input.ipynb')
---
title: my hello page title
description: my hello page description
hide_table_of_contents: true
---
## This is a test notebook

This is a shell command:


```python
! echo hello
```

    hello


We are writing a python script to disk:


```python
%%writefile myflow.py

from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    
    @step
    def start(self):
        print('this is the start')
        self.next(self.end)
    
    @step
    def end(self):
        print('this is the end')

if __name__ == '__main__':
    MyFlow()
```

    Overwriting myflow.py


Another shell command where we run a flow:


```python
#cell_meta:show_steps=start
! python myflow.py run
```

    Metaflow 2.5.3 executing MyFlow for user:hamel
    Validating your flow...
        The graph looks good!
    Running pylint...
        Pylint is happy!
    2022-03-10 22:52:37.069 Workflow starting (run-id 1646981557065941):
    2022-03-10 22:52:37.077 [1646981557065941/start/1 (pid 54733)] Task is starting.
    2022-03-10 22:52:37.752 [1646981557065941/start/1 (pid 54733)] this is the start
    2022-03-10 22:52:37.841 [1646981557065941/start/1 (pid 54733)] Task finished successfully.
    2022-03-10 22:52:37.849 [1646981557065941/end/2 (pid 54736)] Task is starting.
    2022-03-10 22:52:38.519 [1646981557065941/end/2 (pid 54736)] this is the end
    2022-03-10 22:52:38.604 [1646981557065941/end/2 (pid 54736)] Task finished successfully.
    2022-03-10 22:52:38.604 Done!
    

This is a normal python cell:


```python
a = 2
a
```




    2



The next cell has a cell tag of `remove_input`, so you should only see the output of the cell:


```python
#meta:tag=remove_input
print('hello, you should not see the print statement that produced me')
```

    hello, you should not see the print statement that produced me


Pandas DataFrame:


```python
import pandas as pd
pd.read_csv('https://github.com/outerbounds/.data/raw/main/hospital_readmission.csv').head(3).iloc[:, :3]
```




<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>time_in_hospital</th>
      <th>num_lab_procedures</th>
      <th>num_procedures</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>14</td>
      <td>41</td>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>30</td>
      <td>0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>5</td>
      <td>66</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
</div>



A matplotlib plot:


```python
from matplotlib import pyplot as plt
plt.plot(range(20), range(20))
plt.plot(range(10), range(10))
plt.show()
```


    
![png](output_15_0.png)
    


Here is the same notebook, but with all of the preprocessors that we defined in this module. Additionally, we hide the input of the cell that prints hello, you should not see the print statement... by using the built-in TagRemovePreprocessor:

exp = get_mdx_exporter()
print(exp.from_filename('test_files/example_input.ipynb')[0])
---
title: my hello page title
description: my hello page description
hide_table_of_contents: true
---


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! Instead, edit the notebook w/the location & name as this file. -->

## This is a test notebook

This is a shell command:


```bash
echo hello
```

<CodeOutputBlock lang="bash">

```
    hello
```

</CodeOutputBlock>

We are writing a python script to disk:


```py title="myflow.py"
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    
    @step
    def start(self):
        print('this is the start')
        self.next(self.end)
    
    @step
    def end(self):
        print('this is the end')

if __name__ == '__main__':
    MyFlow()
```

Another shell command where we run a flow:


```bash
python myflow.py run
```

<CodeOutputBlock lang="bash">

```
    ...
     [1646981557065941/start/1 (pid 54733)] Task is starting.
     [1646981557065941/start/1 (pid 54733)] this is the start
     [1646981557065941/start/1 (pid 54733)] Task finished successfully.
    ...
```

</CodeOutputBlock>

This is a normal python cell:


```python
a = 2
a
```

<CodeOutputBlock lang="python">

```
    2
```

</CodeOutputBlock>

The next cell has a cell tag of `remove_input`, so you should only see the output of the cell:

<CodeOutputBlock lang="python">

```
    hello, you should not see the print statement that produced me
```

</CodeOutputBlock>

Pandas DataFrame:


```python
import pandas as pd
pd.read_csv('https://github.com/outerbounds/.data/raw/main/hospital_readmission.csv').head(3).iloc[:, :3]
```
    
<HTMLOutputBlock >




```html
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>time_in_hospital</th>
      <th>num_lab_procedures</th>
      <th>num_procedures</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>14</td>
      <td>41</td>
      <td>0</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
      <td>30</td>
      <td>0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>5</td>
      <td>66</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
</div>
```



</HTMLOutputBlock>

A matplotlib plot:


```python
from matplotlib import pyplot as plt
plt.plot(range(20), range(20))
plt.plot(range(10), range(10))
plt.show()
```

<CodeOutputBlock lang="python">

```
    
![png](_example_input_files/output_15_0.png)
    
```

</CodeOutputBlock>