PicoStream
PicoStream is a high-performance Python application for streaming data from a PicoScope to an HDF5 file, with an optional, decoupled live visualization. It is designed for robust, high-speed data logging where data integrity is critical.
Features
- Robust Architecture: Separates data acquisition from disk I/O using a producer-consumer pattern with a large, shared memory buffer pool to prevent data loss.
- Zero-Risk Live Plotting: The plotter reads from the HDF5 file, not the live data stream. This ensures that a slow or crashing GUI cannot interfere with data acquisition.
- Efficient Visualization: Uses pyqtgraph and a Numba-accelerated min-max decimation algorithm to display large datasets with minimal CPU impact (a minimal sketch of the idea follows this list).
- Live-Only Mode: Optional buffer size limiting for pure real-time monitoring without generating large files.
- Flexible Data Reading: The included PicoStreamReader class allows for easy post-processing, including on-the-fly decimation ('mean' or 'min_max').
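Min-max decimation keeps the minimum and maximum of each bin of samples, so narrow peaks remain visible after downsampling. The NumPy sketch below is illustrative only; the function name is made up and this is not the package's Numba-accelerated implementation.

import numpy as np

def minmax_decimate(samples: np.ndarray, factor: int) -> np.ndarray:
    # Keep the min and max of each group of `factor` samples so that
    # peaks survive downsampling. Sketch only; PicoStream's internal,
    # Numba-accelerated routine may differ in details.
    n = (samples.size // factor) * factor      # drop the ragged tail
    bins = samples[:n].reshape(-1, factor)     # one row per bin
    out = np.empty(2 * bins.shape[0], dtype=samples.dtype)
    out[0::2] = bins.min(axis=1)               # even slots hold bin minima
    out[1::2] = bins.max(axis=1)               # odd slots hold bin maxima
    return out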
Installation
- pip install -e .
- Install the official PicoSDK from Pico Technology. On Linux this can take some additional work; see the AUR wiki for details.
Usage
The primary way to use the package is through the picostream command-line tool.
Acquisition
The following commands show different acquisition modes:
# Standard acquisition (saves all data)
picostream -s 62.5 -o my_data.hdf5 --plot
# Live-only mode (limited buffer)
picostream -s 62.5 --plot --max-buff-sec 60
Run picostream --help for a full list of options.
Live-Only Mode
For pure real-time monitoring without saving large amounts of data, use the --max-buff-sec option:
picostream -s 62.5 --plot --max-buff-sec 60
This limits the HDF5 file to the most recent 60 seconds of data, automatically overwriting older data as new samples arrive. It is well suited to continuous monitoring applications where only recent data needs to be visible.
Benefits:
- Prevents disk space issues during long acquisitions
- Maintains full plotting functionality
- File size stays constant after the initial buffer fill (see the size estimate below)
- No performance impact on acquisition
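To gauge how much disk the rolling buffer occupies, you can estimate it from the sample rate and buffer length. The sketch below assumes raw 16-bit samples (2 bytes each), which is an assumption about the stored format rather than something documented here.

# Rough size of the rolling buffer for the command above.
sample_rate_sps = 62.5e6      # -s 62.5 (MS/s)
buffer_seconds = 60           # --max-buff-sec 60
bytes_per_sample = 2          # assumption: raw 16-bit ADC counts
buffer_bytes = sample_rate_sps * buffer_seconds * bytes_per_sample
print(f"Rolling buffer is roughly {buffer_bytes / 1e9:.1f} GB")  # about 7.5 GB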
Viewing an Existing File
The plotter can be run as a standalone tool to view any compatible HDF5 file.
python -m picostream.dfplot /path/to/your/data.hdf5
Data Analysis with PicoStreamReader
The output HDF5 file contains raw ADC counts and metadata. The PicoStreamReader class in picostream.reader is the recommended way to read and process this data.
Example: Processing a File in Chunks
Here is an example of how to use PicoStreamReader to iterate through a large file and perform analysis without loading the entire dataset into memory.
import numpy as np
from picostream.reader import PicoStreamReader

# Use the reader as a context manager
with PicoStreamReader('my_data.hdf5') as reader:
    # Metadata is available as attributes after opening
    sample_rate_sps = 1e9 / reader.sample_interval_ns
    print(f"File contains {reader.num_samples:,} samples.")
    print(f"Sample rate: {sample_rate_sps / 1e6:.2f} MS/s")

    # Example: Iterate through the file with 10x decimation
    print("\nProcessing data with 10x 'min_max' decimation...")
    for times, voltages_mv in reader.get_block_iter(
        chunk_size=10_000_000, decimation_factor=10, decimation_mode='min_max'
    ):
        # The 'times' and 'voltages_mv' arrays are now decimated.
        # Process the smaller data chunk here.
        print(f" - Processed a chunk of {voltages_mv.size} decimated points.")

print("\nFinished processing.")
API Reference: PicoStreamReader
The PicoStreamReader provides a simple and efficient interface for accessing data.
Initialization
__init__(self, hdf5_path: str)
Initializes the reader. The file is opened and metadata is read when the object is used as a context manager.
Metadata Attributes
These attributes are populated from the HDF5 file's metadata when the reader is opened.
- num_samples: int: Total number of raw samples in the dataset.
- sample_interval_ns: float: The time interval between samples in nanoseconds.
- voltage_range_v: float: The configured voltage range (e.g., 20.0 for ±20 V).
- max_adc_val: int: The maximum ADC count value (e.g., 32767).
- analog_offset_v: float: The configured analog offset in volts.
- downsample_mode: str: The hardware downsampling mode used during acquisition ('average' or 'aggregate').
- hardware_downsample_ratio: int: The hardware downsampling ratio used.
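These attributes are also enough to scale raw ADC counts yourself if you bypass the reader's built-in conversion. The snippet below uses the conventional linear PicoScope scaling and is a hedged sketch; verify the exact scaling (including the sign of the offset) against the voltages returned by PicoStreamReader.

import numpy as np

def adc_counts_to_volts(counts: np.ndarray, reader) -> np.ndarray:
    # Illustrative linear scaling using the reader's metadata attributes.
    # Assumption: volts = counts / max_adc_val * voltage_range_v + analog_offset_v.
    scale = reader.voltage_range_v / reader.max_adc_val
    return counts.astype(np.float64) * scale + reader.analog_offset_v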
Data Access Methods
get_block_iter(self, chunk_size: int = 1_000_000, decimation_factor: int = 1, decimation_mode: str = "mean") -> Generator
Returns a generator that yields data blocks as (times, voltages) tuples for the entire dataset. This is the recommended method for processing large files.
- chunk_size: The number of raw samples to read from the file for each chunk.
- decimation_factor: The factor by which to decimate the data.
- decimation_mode: The decimation method ('mean' or 'min_max').
get_next_block(self, chunk_size: int, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple | None
Retrieves the next sequential block of data. Returns None when the end of the file is reached. Use reset() to start over.
get_block(self, size: int, start: int = 0, decimation_factor: int = 1, decimation_mode: str = "mean") -> Tuple
Retrieves a specific block of data from the file.
- size: The number of raw samples to retrieve.
- start: The starting sample index.
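For example, to read roughly one second of data starting ten seconds into a file (a usage sketch; the decimation settings are arbitrary):

from picostream.reader import PicoStreamReader

with PicoStreamReader('my_data.hdf5') as reader:
    samples_per_second = int(round(1e9 / reader.sample_interval_ns))
    times, voltages_mv = reader.get_block(
        size=samples_per_second,        # ~1 s of raw samples
        start=10 * samples_per_second,  # skip the first 10 s
        decimation_factor=100,          # keep the returned arrays small
        decimation_mode='min_max',
    )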
reset(self) -> None
Resets the internal counter for get_next_block() to the beginning of the file.
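A minimal sketch of sequential reading with get_next_block() and reset(); the chunk size and decimation factor are arbitrary:

from picostream.reader import PicoStreamReader

with PicoStreamReader('my_data.hdf5') as reader:
    # Walk the file block by block; None signals the end of the dataset.
    while (block := reader.get_next_block(chunk_size=5_000_000,
                                          decimation_factor=10)) is not None:
        times, voltages_mv = block
        # Analyze the block here.

    # Rewind so a second pass starts from the beginning of the file.
    reader.reset()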
Changelog
Version 0.2.0
- New Feature: Added live-only mode with the --max-buff-sec option for buffer size limiting
- Enhancement: Improved the plotter to handle buffer resets gracefully
- Enhancement: Added total sample count tracking across buffer resets
- Enhancement: Skip verification step in live-only mode for better performance
- Documentation: Updated README with live-only mode usage examples
Version 0.1.0
- Initial release with core streaming and plotting functionality
Acknowledgements
This package began as a fork of JoshHarris2108/pico_streaming (which is unlicensed). I acknowledge and appreciate Josh's original idea/architecture.
