KeyValueDatastore

Datastore for key-value pair data for use with mapreduce

Description

KeyValueDatastore objects are associated with files containing key-value pair data that are outputs of or inputs to mapreduce. Use the KeyValueDatastore properties to specify how you want to access the data. Use dot notation to view or modify a particular property of a KeyValueDatastore object:

ds = datastore("mapredout.mat");
ds.ReadSize = 20;

You can also specify the value of KeyValueDatastore properties using name-value arguments when you create the datastore with the datastore function:

ds = datastore("mapredout.mat","ReadSize",20);

Creation

Create KeyValueDatastore objects using the datastore function.

Properties

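Files included in the datastore, specified as an n-by-1 cell array of character vectors or string array, where each character vector or string is a full path to a file. These are the files specified by the location argument to the datastore function. The location argument contains full paths to files on a local file system, a network file system, or a supported remote location such as Amazon S3™, Windows Azure® Blob Storage, and HDFS™. For more information, see Work with Remote Data.

The files must be either MAT-files or sequence files generated by the mapreduce function.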

Example: ["C:\dir\data\file1.mat";"C:\dir\data\file2.mat"]

Example: ["s3://bucketname/path_to_files/your_file01.mat"; …]

Data Types: cell | string

File type, specified as either "mat" for MAT-files or "seq" for sequence files. By default, the output of mapreduce running against Hadoop® is a datastore containing sequence files. By default, the output of all other mapreduce operations is a datastore containing MAT-files.

Data Types: cell | string

Maximum number of key-value pairs to read in a call to the read or preview functions, specified as a positive integer.
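As a rough illustration of how ReadSize bounds a single call, the sketch below assumes the sample file mapredout.mat used elsewhere on this page is on the MATLAB path. The value 3 is arbitrary.

```matlab
% Sketch: assumes mapredout.mat (the sample file used on this page)
% is on the MATLAB path. ReadSize = 3 is an arbitrary choice.
ds = datastore("mapredout.mat","ReadSize",3);

T = read(ds);      % table of key-value pairs, at most ReadSize rows
P = preview(ds);   % peek at the data; also limited by ReadSize

fprintf("read returned %d rows, preview returned %d rows\n", ...
    height(T), height(P));
```

Lowering ReadSize reduces the memory needed per call at the cost of more calls to read.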

Alternate file system root paths, specified as the name-value argument consisting of "AlternateFileSystemRoots" and a string vector or a cell array. Use "AlternateFileSystemRoots" when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines of a different platform, you must use "AlternateFileSystemRoots" to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify "AlternateFileSystemRoots" as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify "AlternateFileSystemRoots" as a cell array containing multiple rows, where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify "AlternateFileSystemRoots" as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify "AlternateFileSystemRoots" as a cell array of cell arrays of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets','/mynetwork2/datasets','S:\datasets'}}

The value of "AlternateFileSystemRoots" must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

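  • Each row specifies multiple root paths, and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.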

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell
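The conditions above can be sketched as follows. This is only an illustration: it assumes mapredout.mat is on the MATLAB path, and the second root, /mynetwork/datasets, is a hypothetical mount point standing in for wherever the same files appear on the other machine.

```matlab
% Sketch under stated assumptions, not a definitive setup.
file      = which("mapredout.mat");   % full path to the sample file
localRoot = fileparts(file);          % root that actually contains the file
otherRoot = "/mynetwork/datasets";    % hypothetical equivalent root elsewhere

% One set of equivalent roots, given as a string vector.
altRoots = [string(localRoot), otherRoot];

ds = datastore(file, ...
    "Type","keyvalue", ...
    "AlternateFileSystemRoots",altRoots);

disp(ds.AlternateFileSystemRoots)
```

At least one entry (here localRoot) points to the real location of the files, which is the last condition in the list above.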

Object Functions

hasdata Determine if data is available to read
numpartitions Number of datastore partitions
partition Partition a datastore
preview Preview subset of data in datastore
read Read data in datastore
readall Read all data in datastore
reset Reset datastore to initial state
transform Transform datastore
combine Combine data from multiple datastores
isPartitionable Determine whether datastore is partitionable
isShuffleable Determine whether datastore is shuffleable
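A short sketch of how several of these functions fit together, assuming the sample file mapredout.mat is on the MATLAB path. The variable names are illustrative only.

```matlab
% Sketch: assumes mapredout.mat is on the MATLAB path.
ds = datastore("mapredout.mat");

allPairs = readall(ds);        % every key-value pair in one table

n   = numpartitions(ds);       % default number of partitions
sub = partition(ds,n,1);       % datastore covering the first partition

count = 0;
while hasdata(sub)             % read the partition batch by batch
    T = read(sub);
    count = count + height(T);
end

reset(ds);                     % return ds to its initial state
fprintf("%d of %d pairs are in partition 1 of %d\n", ...
    count, height(allPairs), n);
```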

Examples

Create a datastore from the sample file, mapredout.mat, which is an output file of the mapreduce function.

fs = matlab.io.datastore.FileSet("mapredout.mat");
ds = datastore(fs,"Type","keyvalue")
ds = 
  KeyValueDatastore with properties:

    Files: {
           '...\matlab\toolbox\matlab\demos\mapredout.mat'
           }

Set the ReadSize property to 8 so that each call to read reads at most 8 key-value pairs.

ds.ReadSize = 8
ds = 
  KeyValueDatastore with properties:

                       Files: {
                              '...\matlab\toolbox\matlab\demos\mapredout.mat'
                              }
                    ReadSize: 8 key-value pairs
                    FileType: 'mat'
    AlternateFileSystemRoots: {}

Read 8 key-value pairs at a time using the read function in a while loop. The loop executes until there is no more data available to read and hasdata(ds) returns false.

while hasdata(ds)
    T = read(ds);
end

Show the last set of key-value pairs read.

T
T=5×2 table
     Key       Value  
    ______    ________

    {'OO'}    {[3090]}
    {'TZ'}    {[ 216]}
    {'XE'}    {[2357]}
    {'9E'}    {[ 521]}
    {'YV'}    {[ 849]}
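The loop above keeps only the last batch. If every batch is needed, one possible variation (a sketch, assuming the same sample file is on the MATLAB path) collects the tables in a cell array and concatenates them at the end. The names batches and allPairs are illustrative.

```matlab
% Sketch: accumulate each batch instead of overwriting T.
ds = datastore("mapredout.mat","ReadSize",8);

batches = {};                    % one table per call to read
while hasdata(ds)
    batches{end+1} = read(ds);   %#ok<SAGROW> small number of batches
end

allPairs = vertcat(batches{:});  % stack the batches into a single table
disp(height(allPairs))
```

For data that fits in memory, readall(ds) gives the same table in one call; the batched form is mainly useful when each batch is processed before the next is read.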

Limitations

  • KeyValueDatastore does not support sequence files written in R2013b. Rewrite the sequence files using a version of MATLAB between R2014a and R2018a.

Version History

Introduced in R2014b