# PicoStream

PicoStream is a high-performance Python application for streaming data from a PicoScope to an HDF5 file, with an optional, decoupled live visualization. It is designed for robust, high-speed data logging where data integrity is critical.

## Features

- **Robust Architecture**: Separates data acquisition from disk I/O using a producer-consumer pattern with a large, shared memory buffer pool to prevent data loss.
- **Zero-Risk Live Plotting**: The plotter reads from the HDF5 file, not the live data stream. This ensures that a slow or crashing GUI cannot interfere with data acquisition.
- **Efficient Visualization**: Uses `pyqtgraph` and a Numba-accelerated min-max decimation algorithm to display large datasets with minimal CPU impact.
- **Flexible Data Reading**: The included `PicoStreamReader` class allows for easy post-processing, including on-the-fly decimation ('mean' or 'min_max').
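
The producer-consumer split with a preallocated buffer pool can be sketched with the standard library's `queue` and `threading` modules. This is a minimal illustration, not PicoStream's implementation. The pool size, buffer length, and the names `free_q`, `filled_q`, `producer`, and `consumer` are hypothetical, and a Python list stands in for the HDF5 file.

```python
import queue
import threading

import numpy as np

# Hypothetical sizes; PicoStream's real pool dimensions are not documented here.
NUM_BUFFERS = 8
BUFFER_SAMPLES = 1024

# Preallocate the pool once so the acquisition loop never allocates memory.
pool = [np.empty(BUFFER_SAMPLES, dtype=np.int16) for _ in range(NUM_BUFFERS)]
free_q = queue.Queue()    # indices of buffers ready to be filled
filled_q = queue.Queue()  # indices of buffers waiting to be written
for i in range(NUM_BUFFERS):
    free_q.put(i)

written = []  # stands in for the HDF5 file


def producer(num_blocks: int) -> None:
    """Simulate the acquisition thread: fill free buffers and hand them off."""
    for block in range(num_blocks):
        idx = free_q.get()    # waits if the writer has fallen behind
        pool[idx][:] = block  # placeholder for data copied from the driver
        filled_q.put(idx)
    filled_q.put(None)        # sentinel: acquisition finished


def consumer() -> None:
    """Simulate the writer thread: persist filled buffers, then recycle them."""
    while (idx := filled_q.get()) is not None:
        written.append(pool[idx].copy())  # a real writer would append to HDF5
        free_q.put(idx)                   # return the buffer to the pool


t_prod = threading.Thread(target=producer, args=(100,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(len(written))  # 100
```

Because buffers are recycled rather than allocated per block, memory use stays fixed, and in this sketch a producer that outruns the writer simply waits on `free_q`. How PicoStream itself reacts when its pool is exhausted is not described in this README.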

## Installation

1. Install the package in editable mode from the repository root: `pip install -e .`

2. Install the official PicoSDK from Pico Technology. On Linux this can take additional work; see the AUR wiki for details.

## Usage

The primary way to use the package is through the `picostream` command-line tool.

### Acquisition

The following command starts an acquisition at 62.5 MS/s and saves the data to `my_data.hdf5` with a live plot.

```bash
picostream -s 62.5 -o my_data.hdf5 --plot
```

Run `picostream --help` for a full list of options.
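
When sizing disks, it helps to know what the 62.5 MS/s example implies. The arithmetic below assumes each sample is stored as one 16-bit ADC count (suggested by a maximum ADC value of 32767), no hardware downsampling, and no HDF5 compression; adjust the hypothetical `bytes_per_sample` if your files differ.

```python
# Worked arithmetic for the 62.5 MS/s acquisition example.
# Assumption: one 16-bit (2-byte) ADC count per sample, uncompressed.
sample_rate_sps = 62.5e6
bytes_per_sample = 2

sample_interval_ns = 1e9 / sample_rate_sps                      # time between samples
data_rate_mb_s = sample_rate_sps * bytes_per_sample / 1e6       # sustained write rate
gb_per_hour = sample_rate_sps * bytes_per_sample * 3600 / 1e9   # file growth per hour

print(f"{sample_interval_ns} ns, {data_rate_mb_s} MB/s, {gb_per_hour} GB/h")
# 16.0 ns, 125.0 MB/s, 450.0 GB/h
```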

### Viewing an Existing File

The plotter can be run as a standalone tool to view any compatible HDF5 file.

```bash
python -m picostream.dfplot /path/to/your/data.hdf5
```

## Data Analysis with `PicoStreamReader`

The output HDF5 file contains raw ADC counts and metadata. The `PicoStreamReader` class in `picostream.reader` is the recommended way to read and process this data.

### Example: Processing a File in Chunks

The following example uses `PicoStreamReader` to iterate through a large file in chunks, so analysis never requires loading the entire dataset into memory.

```python
import numpy as np
from picostream.reader import PicoStreamReader

# Use the reader as a context manager
with PicoStreamReader('my_data.hdf5') as reader:
    # Metadata is available as attributes after opening
    sample_rate_sps = 1e9 / reader.sample_interval_ns
    print(f"File contains {reader.num_samples:,} samples.")
    print(f"Sample rate: {sample_rate_sps / 1e6:.2f} MS/s")

    # Example: Iterate through the file with 10x decimation
    print("\nProcessing data with 10x 'min_max' decimation...")
    for times, voltages_mv in reader.get_block_iter(
        chunk_size=10_000_000, decimation_factor=10, decimation_mode='min_max'
    ):
        # The 'times' and 'voltages_mv' arrays are now decimated.
        # Process the smaller data chunk here.
        print(f"  - Processed a chunk of {voltages_mv.size} decimated points.")
        
    print("\nFinished processing.")
```
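
Whole-file statistics can be accumulated inside the same chunked loop. The sketch below uses a hypothetical `fake_block_iter` generator in place of `reader.get_block_iter()` so it runs without a capture file; sample indices stand in for timestamps. Only running sums persist between iterations, so memory use is bounded by one chunk.

```python
import numpy as np


def fake_block_iter(num_samples: int, chunk_size: int):
    """Hypothetical stand-in for reader.get_block_iter(): yields (times, voltages_mv)."""
    rng = np.random.default_rng(0)
    for start in range(0, num_samples, chunk_size):
        n = min(chunk_size, num_samples - start)
        times = np.arange(start, start + n)  # sample indices as placeholder times
        yield times, rng.normal(0.0, 5.0, n)


count = 0
total = 0.0
total_sq = 0.0
peak = 0.0
for _times, voltages_mv in fake_block_iter(num_samples=2_500_000, chunk_size=1_000_000):
    # Update running sums; each chunk can be discarded after this.
    count += voltages_mv.size
    total += voltages_mv.sum()
    total_sq += np.square(voltages_mv).sum()
    peak = max(peak, float(np.abs(voltages_mv).max()))

mean_mv = total / count
rms_mv = np.sqrt(total_sq / count)
print(f"{count:,} samples, mean={mean_mv:.3f} mV, rms={rms_mv:.3f} mV, peak={peak:.1f} mV")
```

For very long captures, a numerically stabler update such as Welford's algorithm may be preferable to summing squares.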

## API Reference: `PicoStreamReader`

The `PicoStreamReader` provides a simple and efficient interface for accessing data.

### Initialization

#### `__init__(self, hdf5_path: str)`
Initializes the reader. The file is opened and metadata is read when the object is used as a context manager.

### Metadata Attributes

These attributes are populated from the HDF5 file's metadata when the reader is opened.

-   `num_samples: int`: Total number of raw samples in the dataset.
-   `sample_interval_ns: float`: The time interval between samples in nanoseconds.
-   `voltage_range_v: float`: The configured voltage range (e.g., `20.0` for ±20 V).
-   `max_adc_val: int`: The maximum ADC count value (e.g., 32767).
-   `analog_offset_v: float`: The configured analog offset in Volts.
-   `downsample_mode: str`: The hardware downsampling mode used during acquisition (`'average'` or `'aggregate'`).
-   `hardware_downsample_ratio: int`: The hardware downsampling ratio used.
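
These attributes are enough to convert raw ADC counts to voltages directly. The `adc_to_mv` helper below is a hypothetical sketch, not part of `picostream`. It assumes a count of `max_adc_val` corresponds to `+voltage_range_v`, and that the analog offset was added to the input before digitization, so it is subtracted afterwards. Check the sign convention against `PicoStreamReader`'s own output before relying on it.

```python
import numpy as np


def adc_to_mv(counts, voltage_range_v: float, max_adc_val: int,
              analog_offset_v: float = 0.0) -> np.ndarray:
    """Convert raw ADC counts to millivolts (hypothetical helper).

    Assumes +max_adc_val maps to +voltage_range_v and that the analog
    offset was applied before digitization, so it is removed here.
    """
    counts = np.asarray(counts, dtype=np.float64)
    volts = counts / max_adc_val * voltage_range_v - analog_offset_v
    return volts * 1e3


print(adc_to_mv([32767, 0, -16384], voltage_range_v=20.0, max_adc_val=32767))
```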

### Data Access Methods

#### `get_block_iter(self, chunk_size: int = 1_000_000, decimation_factor: int = 1, decimation_mode: str = "mean") -> Generator`
Returns a generator that yields data blocks as `(times, voltages_mv)` tuples covering the entire dataset. This is the recommended method for processing large files.
-   `chunk_size`: The number of *raw* samples to read from the file for each chunk.
-   `decimation_factor`: The factor by which to decimate the data.
-   `decimation_mode`: The decimation method (`'mean'` or `'min_max'`).
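
The two decimation modes can be illustrated with plain NumPy. The `decimate` function below is a hypothetical sketch, not `PicoStreamReader`'s Numba-accelerated code. 'mean' averages each bin of `decimation_factor` samples, while 'min_max' keeps each bin's minimum and maximum (two output points per bin here) so short spikes remain visible. Whether the reader returns one or two points per bin in 'min_max' mode, and how it treats a partial final bin, are assumptions.

```python
import numpy as np


def decimate(data: np.ndarray, factor: int, mode: str = "mean") -> np.ndarray:
    """Illustrative decimation; trailing samples that do not fill a bin are dropped."""
    if factor <= 1:
        return data
    n_bins = data.size // factor
    bins = data[: n_bins * factor].reshape(n_bins, factor)
    if mode == "mean":
        return bins.mean(axis=1)
    if mode == "min_max":
        # Interleave each bin's min and max so the signal envelope is preserved.
        out = np.empty(2 * n_bins, dtype=data.dtype)
        out[0::2] = bins.min(axis=1)
        out[1::2] = bins.max(axis=1)
        return out
    raise ValueError(f"unknown decimation mode: {mode!r}")


x = np.array([0, 1, 9, 2, 3, 4, -5, 6], dtype=np.float64)
print(decimate(x, 4, "mean").tolist())     # [3.0, 2.0]
print(decimate(x, 4, "min_max").tolist())  # [0.0, 9.0, -5.0, 6.0]
```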

#### `get_next_block(self, chunk_size: int, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple | None`
Retrieves the next sequential block of data. Returns `None` when the end of the file is reached. Use `reset()` to start over.

#### `get_block(self, size: int, start: int = 0, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple`
Retrieves a specific block of data from the file.
-   `size`: The number of *raw* samples to retrieve.
-   `start`: The starting sample index.

#### `reset(self) -> None`
Resets the internal counter for `get_next_block()` to the beginning of the file.
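
The sequential and random-access methods differ mainly in who tracks the read position. The `ArrayBlockCursor` class below is a hypothetical in-memory model of the semantics described above; it works on a NumPy array and omits decimation. It assumes that `get_block()` leaves the `get_next_block()` counter untouched and that a final partial block is returned before `None`; neither detail is specified in this README.

```python
import numpy as np


class ArrayBlockCursor:
    """Hypothetical model of get_block/get_next_block/reset over an array."""

    def __init__(self, data: np.ndarray):
        self._data = data
        self._pos = 0  # next raw sample index for get_next_block()

    def get_block(self, size: int, start: int = 0) -> np.ndarray:
        # Random access: does not move the sequential cursor.
        if not 0 <= start <= self._data.size:
            raise IndexError(f"start {start} out of range")
        return self._data[start:start + size]

    def get_next_block(self, chunk_size: int):
        if self._pos >= self._data.size:
            return None  # end of data reached
        block = self.get_block(chunk_size, self._pos)
        self._pos += block.size
        return block

    def reset(self) -> None:
        self._pos = 0


cursor = ArrayBlockCursor(np.arange(10))
while (block := cursor.get_next_block(4)) is not None:
    print(block.tolist())  # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
cursor.reset()
print(cursor.get_next_block(4).tolist())  # [0, 1, 2, 3]
```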


## Acknowledgements

This package began as a fork of [JoshHarris2108/pico_streaming](https://github.com/JoshHarris2108/pico_streaming) (which is unlicensed). I acknowledge and appreciate Josh's original idea and architecture.