When working with a large dataset that cannot fit into 2 GB of RAM, how can we push the data into DataFrames?
One way is to chunk it into groups:
#--------------------------------------------------------------------------
import itertools

def grouper_it(n, iterable):
    """
    creates chunks of cursor row objects to make the memory
    footprint more manageable
    """
    it = iter(iterable)
    while True:
        # lazily take up to n items from the underlying iterator
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            # the source iterator is exhausted; stop yielding chunks
            return
        # re-attach the first element and hand the chunk back as an iterator
        yield itertools.chain((first_el,), chunk_it)
This code takes an iterable object (one that defines next() in Python 2.7 or __next__() in Python 3.4) and yields smaller iterators of at most n elements each, where n is a whole number (integer).
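To see the chunking behavior on its own, here is a quick illustration with a plain Python list instead of a cursor (the values are made up for demonstration; it assumes grouper_it as defined above is in scope):

numbers = range(10)          # stand-in for a SearchCursor
for chunk in grouper_it(n=4, iterable=numbers):
    # each chunk is a lazy iterator; it is only realized when consumed
    print(list(chunk))       # [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]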
Example Usage:
import itertools
import os
import json
import arcpy
import pandas as pd
# 'fc' is the path to the feature class being read
with arcpy.da.SearchCursor(fc, ["Field1", "Field2"]) as rows:
    groups = grouper_it(n=50000, iterable=rows)
    for group in groups:
        # materialize the current chunk of cursor rows into a DataFrame
        df = pd.DataFrame.from_records(group, columns=rows.fields)
        df['Field1'] = "Another Value"
        # append each chunk to the output CSV
        df.to_csv(r"\\server\test.csv", mode='a')
        del group
        del df
    del groups
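One thing to watch with mode='a': to_csv writes the column header on every append by default, so the combined file ends up with a header row per chunk. A minimal sketch of one way to handle that inside the loop above (the first_chunk flag is my own addition, not part of the original example):

first_chunk = True
for group in groups:
    df = pd.DataFrame.from_records(group, columns=rows.fields)
    # write the header only for the first chunk, then append without it
    df.to_csv(r"\\server\test.csv", mode='w' if first_chunk else 'a',
              header=first_chunk, index=False)
    first_chunk = False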
This is one way to manage your memory footprint by loading records in smaller chunks. Some considerations on 'n': I found that the number of columns, the field lengths, and the data types all affect what size of 'n' works well.
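If you are unsure what to pick for 'n', one rough approach (my own suggestion, not part of the original workflow) is to build a DataFrame from a small sample chunk, measure its footprint with memory_usage, and scale up to a target size:

import pandas as pd

# hypothetical sample: in practice, build this from the first small chunk
# of cursor rows using the pattern shown above
sample_df = pd.DataFrame.from_records(
    [("value %i" % i, i * 1.5) for i in range(1000)],
    columns=["Field1", "Field2"])

bytes_per_row = sample_df.memory_usage(deep=True, index=True).sum() / float(len(sample_df))
target_bytes = 500 * 1024 * 1024          # aim well under the 2 GB ceiling
n = int(target_bytes / bytes_per_row)     # rough chunk size to try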