Load large files into a buffer variable

(First Steps in FBSL v3)

Post by Stefan Schnell » Mon Apr 06, 2015 9:20 pm

Hello community,

I want to load a 920 MB file in binary mode into a buffer variable, but FBSL crashes. What is the file size limit for loading a file into a buffer? Is there a way to catch such errors? Is there another way to load very large files?

Thanks in advance.

Cheers
Stefan
Visit my homepage
or meet me at XING or at the SCN
Stefan Schnell
FBSL geek
Posts: 132
Joined: Thu Aug 22, 2013 7:48 am
Location: Germany - Oberirsen

Re: Load large files into a buffer variable

Post by Gerome » Tue Apr 07, 2015 9:49 am

Dear Stefan,

Could you please post the exact script you used so that I can reproduce the bug?
Thanks in advance.
Yours,

(¯`·._.·[Gerome GUILLEMIN]·._.·´¯)
:: Full SETUP w. HELP 05th of December 2011 ::
http://www.fbsl.net/setup/FBSLv3.exe [full v3.4.10 installation pack]
http://www.fbsl.net/setup/FBSLv3bin.zip [minimal upgrade to v3.4.10]
Laissons les jolies femmes aux hommes sans imagination. / Let us leave pretty women to men without imagination.(M.Proust)
The success is a defeat for the one who does not want to dance any more! (H.F. Thiefaine)
Gerome
FBSL Administrator
Posts: 3149
Joined: Sat Mar 12, 2005 9:06 pm
Location: Paris -- France

Re: Load large files into a buffer variable

Post by Stefan Schnell » Wed Apr 08, 2015 1:51 pm

Hello Gerome,

sure:

Code:
//-Begin----------------------------------------------------------------

  //-Directives---------------------------------------------------------
    #Option Explicit
    #AppType Console

  //-Includes-----------------------------------------------------------
    #Include <Include/Windows.inc>

  //-Sub Test-----------------------------------------------------------
    Sub Test(FileName As String)

      //-Variables------------------------------------------------------
        Dim hFile As Integer
        Dim Buffer As String

      hFile = FileOpen(FileName, Binary)
      ? hFile
      If hFile Then
        ? "Start"
          Buffer = FileGet(hFile, FileLen(FileName))
        ? "End"
        FileClose(hFile)
      End If

    End Sub

  //-Sub Main-----------------------------------------------------------
    Sub Main()
      Test("Test.txt")                         //-Size 971.824.271 Bytes
      Pause
    End Sub

//-End------------------------------------------------------------------


Thanks for your help.

Cheers
Stefan

Re: Load large files into a buffer variable

Post by Gerome » Wed Apr 08, 2015 3:50 pm

Hello Stefan,

I have reproduced the error you reported with FileGet(). While waiting for a fix, here is a fully functional script that loads and stores a 930 MB CSV text file without any problem.
For the moment, prefer this solution; it works.

Code:
     Dim vt[] = ArrayFromFile("TestHUGE3.txt", 128)
     Dim nbLines = Ubound(vt)
     ? "Number of lines read = ", nbLines
     ? "First line is : ", vt[0]
     ? "Last line is : ", vt[nbLines-1]

Yours,


Re: Load large files into a buffer variable

Post by Mike Lobanovsky » Wed Apr 08, 2015 6:00 pm

Gentlemen,


Analysis

1. FileGet(BINARY) will be the fastest way to load the file contents into one contiguous chunk of memory in one go. However, if the file is unreasonably large, say, on the order of half a gigabyte or more on a 32-bit platform, there simply may be no contiguous memory chunk of such size available, because memory tends to get fragmented as the program runs, especially when many string operations are involved. In this case the memory allocation stage of a FileGet() call will fail.

FBSL relies on the general-purpose memory manager implemented in the msvcrt.dll system library for its allocation operations. Later C runtime libraries such as msvcrt10.dll/msvcrt20.dll/msvcrt40.dll/msvcr70.dll/msvcr100.dll/msvcr110.dll/etc. may offer better general-purpose memory management with less fragmentation, but only msvcrt.dll is guaranteed to be present on a clean installation of any Windows version (Windows 95 up to Windows 10 TP) and also in all MS Windows emulations like ReactOS or Wine for Linux and Mac OS X. Please don't forget that the general-purpose Freestyle BASIC Script Language runs on any of these platforms almost equally well. 8)

Linking FBSL against another C runtime would make FBSL highly system-dependent and might break its compatibility with older Windows versions and/or alien platforms. I don't think that FBSL will be linked to any other C runtime library for as long as I remain the de facto FBSL project lead. :)

2. FileInput() and FileGets() that read lines from a text file one by one will be the slowest solutions because both of these FBSL functions are internally built around the C function fgets(), which is relatively slow in its msvcrt.dll implementation. This implementation uses unbuffered hard disk access, i.e. it reads text lines directly from the hard disk in each call rather than from the hard disk cache. Thus it is heavily dependent on the mechanical parameters of your hard disk(s) rather than FBSL's interpretative speed or actual CPU throughput. Use these two functions with relatively small text files only.

3. ArrayFromFile() will be the optimum solution for unreasonably large files. It is also built around fgets() internally, but all allocations are made as needed in small line-long memory chunks only and interconnected in a doubly linked list (all FBSL "dynamic arrays" are in fact doubly linked lists of Variant variables), hence the chance of allocation failure is negligibly low. Since the entire allocation/reading/interconnection loop runs in pure machine code rather than in FBSL's interpretative For/Next loop, the overall speed is very high despite the unbuffered fgets() calls.
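
The line-by-line scheme in point 3 can be sketched in plain C. This is only an illustration of the idea (fgets() into a reusable scratch buffer, one small allocation per line, nodes chained into a doubly linked list), not FBSL's actual source. The names LineNode, LineList, line_list_read and line_list_free are made up for the example, and a line longer than max_len - 1 characters is simply split across several nodes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One node per text line. Hypothetical layout, not FBSL's Variant list. */
typedef struct LineNode {
    struct LineNode *prev;
    struct LineNode *next;
    char *text;                 /* line contents without the line ending */
} LineNode;

typedef struct {
    LineNode *head;
    LineNode *tail;
    size_t count;
} LineList;

/* Release every node and its text, leaving an empty list. */
void line_list_free(LineList *list)
{
    LineNode *node = list->head;
    while (node) {
        LineNode *next = node->next;
        free(node->text);
        free(node);
        node = next;
    }
    list->head = list->tail = NULL;
    list->count = 0;
}

/* Read fp line by line into list. max_len is the scratch buffer size
   (assumption: at least 2). Returns 0 on success, -1 on failure. */
int line_list_read(FILE *fp, int max_len, LineList *list)
{
    list->head = list->tail = NULL;
    list->count = 0;
    if (max_len < 2) return -1;

    char *scratch = malloc((size_t)max_len);
    if (!scratch) return -1;

    while (fgets(scratch, max_len, fp)) {
        size_t len = strcspn(scratch, "\r\n");   /* drop CR/LF */
        LineNode *node = malloc(sizeof *node);
        char *text = malloc(len + 1);            /* small, line-sized chunk */
        if (!node || !text) {                    /* allocation failed: undo */
            free(node);
            free(text);
            free(scratch);
            line_list_free(list);
            return -1;
        }
        memcpy(text, scratch, len);
        text[len] = '\0';

        node->text = text;
        node->next = NULL;
        node->prev = list->tail;                 /* link to previous line */
        if (list->tail) list->tail->next = node;
        else            list->head = node;
        list->tail = node;
        list->count++;
    }
    free(scratch);
    return 0;
}
```

Because every allocation is only as large as one line, no single request ever needs a large contiguous block, which is why this approach tolerates heap fragmentation far better than one huge buffer.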


Conclusion

For text/CSV files that you expect to be unreasonably large, prefer ArrayFromFile(). For unreasonably large chunks of binary data, prefer allocating your own memory buffer with AllocPtr() (you can check its return value for NULL in case of allocation failure and react accordingly) and calling the fopen()/fread()/fclose() C functions from msvcrt.dll, which you can #DllDeclare or #DllInputs in your script, to implement your own custom file access.
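
For reference, here is what the binary-load advice boils down to on the C side. It is a sketch in plain C rather than FBSL: malloc() stands in for AllocPtr(), load_file is a hypothetical helper name, and the size is obtained with fseek()/ftell(), which limits this version to files that fit in a long (under 2 GB where long is 32 bits wide). The point is that the allocation result is tested for NULL, so an oversized request becomes an error return the caller can handle instead of a crash.

```c
#include <stdio.h>
#include <stdlib.h>

/* Load a whole file into one heap block (hypothetical helper).
   Returns NULL if the file cannot be opened, sized, allocated or read
   in full; on success *out_len receives the number of bytes loaded and
   the caller must free() the returned pointer. */
unsigned char *load_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;

    unsigned char *buf = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);                   /* file size in bytes */
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            /* One contiguous request; this is the step that can fail
               when the heap is fragmented, so test it for NULL. */
            buf = malloc(size > 0 ? (size_t)size : 1);
            if (buf && fread(buf, 1, (size_t)size, fp) != (size_t)size) {
                free(buf);                       /* short read: give up */
                buf = NULL;
            }
            if (buf) *out_len = (size_t)size;
        }
    }
    fclose(fp);
    return buf;
}
```

A caller checks the return value: NULL means the load did not complete, and the program can then fall back to reading the file in smaller pieces.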
Mike
"Я старый солдат, мадам, и не знаю слов любви."
"I am an old soldier, ma'am, and I don't know the words of love."
"Je suis un vieux soldat, madame, et je ne connais pas les mots d'amour."
"Ich bin ein alter Soldat, gnädige Frau, und ich weiß nicht die Worte der Liebe."

__________________________________________________________________________________________________________________________________________________
(3.2GHz i5 Core Quad, 8GB RAM / 2 x nVidia GTX 550Ti SLI-bridged, 2GB VRAM)
(x86 Win XP Pro Russian Sp3/x86 Win Vista Ultimate Sp2/x64 Win 7 Ultimate Sp1/Wine in x64 elementaryOS Luna)
Mike Lobanovsky
FBSL Administrator
Posts: 1823
Joined: Tue Apr 19, 2005 8:22 am
Location: Republic of Belarus

Re: Load large files into a buffer variable

Post by Stefan Schnell » Fri Apr 10, 2015 12:22 pm

Hello Gerome and Mike,

thank you very much for your help and explanations.

Best regards
Stefan

