woc.local
Returns the 32 bit FNV-1a hash value for the given data.
>>> hex(fnvhash('foo'))
'0xa9f37ed7'
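For readers unfamiliar with FNV-1a, the following is a minimal pure-Python sketch of the 32-bit variant. The name `fnv1a_32` is hypothetical and is not part of this module; it takes bytes, whereas the module's own `fnvhash` may accept other input types.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = 0x811C9DC5  # FNV 32-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV 32-bit prime, truncated to 32 bits
    return h


print(hex(fnv1a_32(b"foo")))  # 0xa9f37ed7
```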
Perl BER unpacking. BER is a way to pack several variable-length integers into one binary string; this function does the reverse. Format definition: http://perldoc.perl.org/functions/pack.html (see the "w" template description).
Parameters
- buf: a binary string with packed values
Returns
a list of unpacked values
>>> unber(b'\x00\x83M')
[0, 461]
>>> unber(b'\x83M\x96\x14')
[461, 2836]
>>> unber(b'\x99a\x89\x12')
[3297, 1170]
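The decoding rule behind these examples can be sketched in a few lines of Python: each integer is stored base-128, most significant group first, with the high bit set on every byte except the last. The name `unber_sketch` is hypothetical and this is an illustration of the format, not necessarily how the module implements it.

```python
def unber_sketch(buf: bytes) -> list:
    """Decode a sequence of BER compressed integers (Perl pack 'w' format)."""
    values = []
    acc = 0
    for byte in buf:
        acc = (acc << 7) | (byte & 0x7F)  # append the low 7 bits
        if not byte & 0x80:               # high bit clear: last byte of this integer
            values.append(acc)
            acc = 0
    return values


print(unber_sketch(b"\x00\x83M"))  # [0, 461]
```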
Get length of uncompressed data from a header of Compress::LZF output.
See the Compress::LZF sources (LZF.xs, decompress_sv) for the definition of this bit magic: https://metacpan.org/source/MLEHMANN/Compress-LZF-3.8/LZF.xs
Parameters
- raw_data: data compressed with Perl Compress::LZF
Returns
(header_size, uncompressed_content_length) in bytes
>>> lzf_length(b'\xc4\x9b')
(2, 283)
>>> lzf_length(b'\xc3\xa4')
(2, 228)
>>> lzf_length(b'\xc3\x8a')
(2, 202)
>>> lzf_length(b'\xca\x87')
(2, 647)
>>> lzf_length(b'\xe1\xaf\xa9')
(3, 7145)
>>> lzf_length(b'\xe0\xa7\x9c')
(3, 2524)
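The examples above are consistent with a UTF-8-like length prefix: the number of leading 1 bits in the first byte gives the header size, and each following byte contributes 6 bits. The sketch below follows that reading; `lzf_length_sketch` is a hypothetical name, and the error handling is an assumption that may be stricter than the original C code.

```python
def lzf_length_sketch(raw_data: bytes) -> tuple:
    """Return (header_size, uncompressed_length) from a UTF-8-style length prefix."""
    first = raw_data[0]
    if not first & 0x80:
        return 1, first  # single-byte header: lengths below 128

    # Count the leading 1 bits of the first byte; that is the header size.
    size, mask = 0, 0x80
    while first & mask:
        size += 1
        mask >>= 1
    if size < 2 or size > 6 or len(raw_data) < size:
        raise ValueError("unsupported or truncated length header")

    length = first & (mask - 1)               # payload bits of the first byte
    for byte in raw_data[1:size]:
        length = (length << 6) | (byte & 0x3F)  # 6 payload bits per continuation byte
    return size, length


print(lzf_length_sketch(b"\xe1\xaf\xa9"))  # (3, 7145)
```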
LZF wrapper that handles the Perl-specific tweaks in Compress::LZF.
This function extracts the uncompressed-size header and then performs the usual LZF decompression.
Parameters
- raw_data: data compressed with Perl Compress::LZF
Returns
unpacked data
Try to decompress raw_data, return raw_data if it fails
Slice raw_data into 20-byte chunks and hex-encode each of them. It returns a tuple so that the result is cacheable.
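A minimal sketch of this slicing, assuming the input is a concatenation of 20-byte SHA-1 digests. The name `slice20_sketch` is hypothetical; a tuple is returned because tuples are hashable, so the result can be stored in a cache.

```python
def slice20_sketch(raw_data: bytes) -> tuple:
    """Split raw_data into 20-byte chunks and hex-encode each chunk."""
    return tuple(
        raw_data[i:i + 20].hex()
        for i in range(0, len(raw_data), 20)
    )


digests = slice20_sketch(bytes(range(40)))
print(len(digests), len(digests[0]))  # 2 40
```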
Decode raw_data as UTF-8; detect the encoding if UTF-8 decoding fails.
Cache TCHashDB objects
Get shard id
Decode values from tch maps.
Decode a tree binary object into tuples.
Python: 4.77 µs, Cython: 280 ns. Reference: https://stackoverflow.com/questions/14790681/
>>> decode_tree(b'100644 .gitignore\x00\x8e\x9e\x1f...')
[('100644', '.gitignore', '8e9e1...'), ...]
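As an illustration of the binary layout being parsed, a git tree body is a sequence of entries of the form `<mode> <name>\0<20-byte SHA-1>`. The pure-Python sketch below walks that layout; `decode_tree_sketch` is a hypothetical name, and it assumes entry names are valid UTF-8, which real repositories do not guarantee.

```python
def decode_tree_sketch(data: bytes) -> list:
    """Parse a raw git tree body into (mode, name, sha_hex) tuples."""
    entries = []
    pos = 0
    while pos < len(data):
        nul = data.index(b"\x00", pos)                  # end of "<mode> <name>"
        mode, _, name = data[pos:nul].partition(b" ")
        sha = data[nul + 1:nul + 21]                    # 20 raw SHA-1 bytes
        entries.append((mode.decode(), name.decode(), sha.hex()))
        pos = nul + 21
    return entries


# Synthetic input: two entries with made-up digests.
raw = b"100644 .gitignore\x00" + bytes(range(20)) + b"40000 src\x00" + bytes(range(20, 40))
print(decode_tree_sketch(raw)[0][:2])  # ('100644', '.gitignore')
```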
Decode git commit objects into tuples.
Python: 2.35 µs, Cython: 855 ns. Reference: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
>>> decode_commit(b'tree f1b66dcca490b5c4455af319bc961a34f69c72c2\n...')
('f1b66dcca490b5c4455af319bc961a34f69c72c2',
('c19ff598808b181f1ab2383ff0214520cb3ec659',),
('Audris Mockus <audris@utk.edu> 1410029988', '1410029988', '-0400'),
('Audris Mockus <audris@utk.edu>', '1410029988', '-0400'),
'News for Sep 5, 2014\n')
Decode git tag objects into tuples.
>>> decode_tag(b'object fcadcb9366d4a011039e384affa10961e99cf2c4\ntype commit\ntag eccube-2.11.1\ntagger nanasess ... 1303788649 +0000\n\nAdded tags/eccube-2.11.1\n')
('fcadcb9366d4a011039e384affa10961e99cf2c4', 'commit', 'eccube-2.11.1', 'nanasess ...', ...)
Read a .large. file and return its content.
Parameters
- path: path to the file
- dtype: data type
- offset: offset to start reading. It is either 0 or after the last separator.
- length: length to read. It should be longer than the longest record.
Returns
a tuple of bytes and the next offset, None if EOF. Returned bytes must not begin or end with a separator.
Initialize local WoC maps with a profile.
Parameters
- profile_path: path to the WoC profile. If not provided, use ./wocprofile.json, ~/.wocprofile.json, /home/wocprofile.json, or /etc/wocprofile.json.
- version: version of the profile; defaults to the latest version. Can be a single version like 'R' or a list of versions like ['R', 'U'].
- on_large: how to handle large files; defaults to 'all' (read all content). Use 'ignore' to ignore large files, or 'head' to read only the first chunk.
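The default profile lookup described for profile_path could be sketched as below. This is only an illustration: `find_profile` and `DEFAULT_PROFILE_PATHS` are hypothetical names, and checking the candidates in the listed order and taking the first existing file is an assumption, not confirmed behavior of the library.

```python
import os

# Candidate locations as listed in the parameter description above.
DEFAULT_PROFILE_PATHS = (
    "./wocprofile.json",
    "~/.wocprofile.json",
    "/home/wocprofile.json",
    "/etc/wocprofile.json",
)


def find_profile(candidates=DEFAULT_PROFILE_PATHS):
    """Return the first candidate path that exists as a file, or None."""
    for candidate in candidates:
        path = os.path.expanduser(candidate)  # resolve a leading "~"
        if os.path.isfile(path):
            return path
    return None


print(find_profile())  # a path, or None if no profile is installed
```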
Equivalent to getValues in the WoC Perl API.
Parameters
- map_name: The name of the map, e.g. 'c2p', 'c2r', 'P2c'
- key: The key of the object. For git objects, it is the SHA-1 hash of the object (in bytes or hex string). For other objects like Author, it is the name of the object.
Returns
The value of the object. Can be a list of strings, a tuple of strings, or a list of tuples of strings. Please refer to the documentation for details.
>>> self.get_values('P2c', 'user2589_minicms')
['05cf84081b63cda822ee407e688269b494a642de', ...]
>>> self.get_values('c2r', 'e4af89166a17785c1d741b8b1d5775f3223f510f')
('9531fc286ef1f4753ca4be9a3bf76274b929cdeb', 27)
>>> self.get_values('b2fa', '05fe634ca4c8386349ac519f899145c75fff4169')
('1410029988',
'Audris Mockus <audris@utk.edu>',
'e4af89166a17785c1d741b8b1d5775f3223f510f')
Similar to get_values, but returns a generator instead of a list. This is useful when querying large maps (on_large='all').
Parameters
- map_name: The name of the map, e.g. 'c2p', 'c2r', 'P2c'
- key: The key of the object. For git objects, it is the SHA-1 hash of the object (in bytes or hex string). For other objects like Author, it is the name of the object.
Returns
A generator over the values of the object. Each item can be a string or a tuple of strings. Please refer to the documentation for details.
>>> list(self.iter_values('P2c', 'user2589_minicms'))
['05cf84081b63cda822ee407e688269b494a642de', ...]
Equivalent to showCnt in the WoC Perl API.
Parameters
- obj_name: The name of the object, e.g. 'blob', 'tree', 'commit'
- key: The key of the object. It is the SHA-1 hash of the object (in bytes or hex string).
Returns
The content of the object. Can be a list of tuples of strings, a string, or a tuple of strings.
>>> self.show_content('blob', '05fe634ca4c8386349ac519f899145c75fff4169')
'This is the content of the blob'
>>> self.show_content('tree', '7a374e58c5b9dec5f7508391246c48b73c40d200')
[('100644', '.gitignore', '8e9e1...'), ...]
>>> self.show_content('commit', 'e4af89166a17785c1d741b8b1d5775f3223f510f')
('f1b66dcca490b5c4455af319bc961a34f69c72c2',
('c19ff598808b181f1ab2383ff0214520cb3ec659',),
('Audris Mockus <audris@utk.edu> 1410029988', '1410029988', '-0400'),
('Audris Mockus <audris@utk.edu>', '1410029988', '-0400'),
'News for Sep 5, 2014\n')
Count the number of keys in a map.
Parameters
- map_name: The name of the mapping / object, e.g. 'c2p', 'c2r', 'commit'.
Returns
The number of keys in the tch databases plus the number of large files.
>>> self.count('c2r')
12345
Iterate over all keys in a map.
Parameters
- map_name: The name of the mapping / object, e.g. 'c2p', 'c2r', 'commit'. When on_large is 'ignore', keys in large maps are excluded.
Returns
A generator of keys in the map.
>>> for key in self.iter_map('P2c'):
... print(key) # hash or encoded string
Inherited Members
- woc.base.WocMapsBase
- maps
- objects